Abstract
Genome-Wide Association Study (GWAS) aims to identify genetic variants that are significantly associated with genetic traits. GWAS data often contain 0.5 to 1 million Single Nucleotide Polymorphisms (SNPs) genotyped from thousands of individuals, so stringent statistical significance thresholds are pre-defined for multiple-testing adjustment, e.g., p-value < 10⁻⁸ for single-SNP detection and at least < 10⁻¹² for SNP-SNP interaction detection. Such stringent thresholds are used for computational efficiency, but they hinder the discovery of many true genetic variants, and more practical approaches are needed to conduct GWAS. In this paper, we propose a machine learning approach to identify groups of predictive SNPs in GWAS analysis. Our method differs from other methods in that it first translates genomics knowledge into SNP groupings as priors, then selects the most predictive SNP groups using linear regression regularized by group-sparsity constraints, solved by Group-lasso (lasso: Least Absolute Shrinkage and Selection Operator). The selected SNP groups compose a sparse feature space that yields higher predictive power for continuous trait prediction. We conduct experiments on the SiMES (Singapore Malay Eye Study) data set, with 3280 Malay individuals genotyped on Illumina 610 quad arrays. We investigate one discrete trait (glaucoma) and two glaucoma-related quantitative traits, optic Disc-Cup-Ratio (CDR) and Intraocular Pressure (IOP). The hypothesis is that, with more biological knowledge embedded, a learning mechanism yields higher predictive power. Our preliminary results support this hypothesis. Further analysis reveals that our approach can identify groups of SNPs highly associated with a particular genetic trait, in spite of the small sample size and the incomplete biological knowledge.
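The group-sparse regression described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's solver, group weights, or SNP grouping, so everything below is an assumption: the data are synthetic, the five groups of four SNPs are hypothetical stand-ins for knowledge-derived groups, the group penalty is weighted by the square root of the group size (a common convention), and the solver is plain proximal gradient descent.

```python
import numpy as np

def group_lasso(X, y, groups, lam, n_iter=500):
    """Minimize (1/2n)||y - Xw||^2 + lam * sum_g sqrt(|g|) * ||w_g||_2
    by proximal gradient descent (illustrative, not the paper's solver)."""
    n, p = X.shape
    w = np.zeros(p)
    # Step size 1/L, where L is the largest eigenvalue of X^T X / n.
    step = n / (np.linalg.norm(X, 2) ** 2)
    group_ids = np.unique(groups)
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n       # gradient of the squared loss
        z = w - step * grad                # plain gradient step
        for g in group_ids:                # block soft-thresholding per group
            idx = groups == g
            norm = np.linalg.norm(z[idx])
            thresh = step * lam * np.sqrt(idx.sum())
            scale = max(0.0, 1.0 - thresh / norm) if norm > 0 else 0.0
            z[idx] = scale * z[idx]
        w = z
    return w

# Synthetic genotypes coded 0/1/2 (hypothetical data, not SiMES).
rng = np.random.default_rng(0)
n, n_groups, group_size = 200, 5, 4
groups = np.repeat(np.arange(n_groups), group_size)
X = rng.integers(0, 3, size=(n, n_groups * group_size)).astype(float)
X -= X.mean(axis=0)                        # center each SNP column

# Only groups 0 and 2 influence the simulated quantitative trait.
w_true = np.zeros(X.shape[1])
w_true[groups == 0] = [1.0, -1.0, 0.5, 0.5]
w_true[groups == 2] = [0.8, 0.8, -0.8, 0.8]
y = X @ w_true + 0.5 * rng.standard_normal(n)
y -= y.mean()

w_hat = group_lasso(X, y, groups, lam=0.2)
selected = [int(g) for g in np.unique(groups)
            if np.linalg.norm(w_hat[groups == g]) > 0]
print("selected SNP groups:", selected)
```

Because the penalty acts on the Euclidean norm of each group's coefficients, a group is either dropped as a whole or kept as a whole, which is what makes the selected feature space sparse at the group level rather than at the level of individual SNPs.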
Original language | English |
---|---|
Pages (from-to) | 107-114 |
Number of pages | 8 |
Journal | Procedia Computer Science |
Volume | 11 |
Publication status | Published - 2012 |
Externally published | Yes |
Event | 3rd International Conference on Computational Systems-Biology and Bioinformatics, CSBio 2012 - Bangkok, Thailand. Duration: 3 Oct 2012 → 5 Oct 2012 |
Keywords
- Genome wide association study (GWAS)
- Group-lasso
- Least absolute shrinkage and selection operator (lasso)
- Regularized linear regression
- Single nucleotide polymorphism (SNP)
ASJC Scopus subject areas
- General Computer Science