Bioinformatics. 2012 Jan 15;28(2):229-37.

doi: 10.1093/bioinformatics/btr649. Epub 2011 Dec 6.

Identifying quantitative trait loci via group-sparse multitask regression and feature selection: an imaging genetics study of the ADNI cohort

Hua Wang¹, Feiping Nie, Heng Huang, Sungeun Kim, Kwangsik Nho, Shannon L Risacher, Andrew J Saykin, Li Shen; Alzheimer's Disease Neuroimaging Initiative

PMID: 22155867
PMCID: PMC3259438
DOI: 10.1093/bioinformatics/btr649

Hua Wang et al. Bioinformatics. 2012.

Affiliation

¹ Department of Computer Science and Engineering, University of Texas at Arlington, Arlington, TX 76019, USA.

Abstract

Motivation: Recent advances in high-throughput genotyping and brain imaging techniques enable new approaches to study the influence of genetic variation on brain structures and functions. Traditional association studies typically employ independent and pairwise univariate analysis, which treats single nucleotide polymorphisms (SNPs) and quantitative traits (QTs) as isolated units and ignores important underlying interacting relationships between the units. New methods are proposed here to overcome this limitation.

Results: Taking into account the interlinked structure within and between SNPs and imaging QTs, we propose a novel Group-Sparse Multi-task Regression and Feature Selection (G-SMuRFS) method to identify quantitative trait loci for multiple disease-relevant QTs and apply it to a study in mild cognitive impairment and Alzheimer's disease. Built upon regression analysis, our model uses a new form of regularization, the group ℓ_{2,1}-norm (G_{2,1}-norm), to incorporate the biological group structures among SNPs induced from their genetic arrangement. The new G_{2,1}-norm considers the regression coefficients of all the SNPs in each group with respect to all the QTs together and enforces sparsity at the group level. In addition, an ℓ_{2,1}-norm regularization is utilized to couple feature selection across multiple tasks to make use of the shared underlying mechanism among different brain regions. The effectiveness of the proposed method is demonstrated by both clearly improved prediction performance in empirical evaluations and a compact set of selected SNP predictors relevant to the imaging QTs.
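
As a rough illustration of the Results paragraph, the sketch below writes down one reading of the G-SMuRFS objective, min over W of ‖WᵀX − Y‖_F² + γ₁‖W‖_{G_{2,1}} + γ₂‖W‖_{2,1}. Here X (d SNPs × n subjects) holds genotypes, Y (c QTs × n subjects) holds imaging measures, W is d × c, ‖W‖_{G_{2,1}} sums the Frobenius norms of the row blocks belonging to each SNP group, and ‖W‖_{2,1} sums the ℓ₂ norms of the individual SNP rows. The matrix layout, the names `gamma1`/`gamma2`/`fit_gsmurfs_sketch`, and the solver are assumptions: the iteratively reweighted update is a standard technique for ℓ_{2,1}-type penalties and is not claimed to be the authors' released software.

```python
import numpy as np


def g21_norm(W, groups):
    """Group l_{2,1}-norm: sum over SNP groups of the Frobenius norm of the
    rows of W (d SNPs x c QTs) in each group, i.e. all QTs taken together."""
    return sum(np.linalg.norm(W[idx, :]) for idx in groups)


def l21_norm(W):
    """l_{2,1}-norm: sum over SNPs of the l2 norm of each row of W."""
    return np.linalg.norm(W, axis=1).sum()


def objective(W, X, Y, groups, gamma1, gamma2):
    """||W^T X - Y||_F^2 + gamma1 * ||W||_{G_{2,1}} + gamma2 * ||W||_{2,1}.
    Assumed layout: X is d x n genotypes, Y is c x n QTs."""
    residual = W.T @ X - Y
    return (np.linalg.norm(residual) ** 2
            + gamma1 * g21_norm(W, groups)
            + gamma2 * l21_norm(W))


def fit_gsmurfs_sketch(X, Y, groups, gamma1, gamma2, n_iter=50, eps=1e-8):
    """Hypothetical iteratively reweighted solver (a generic technique for
    l_{2,1}-type penalties, not necessarily the published algorithm).
    `groups` is assumed to be a partition of SNP indices."""
    d = X.shape[0]
    XXt = X @ X.T  # d x d
    XYt = X @ Y.T  # d x c
    # Ridge-style warm start so every row and group norm starts out nonzero.
    W = np.linalg.solve(XXt + np.eye(d), XYt)
    for _ in range(n_iter):
        # Group term: every SNP in group k gets weight 1 / (2 ||W^k||_F).
        d_group = np.zeros(d)
        for idx in groups:
            d_group[idx] = 1.0 / (2.0 * max(np.linalg.norm(W[idx, :]), eps))
        # Row term: SNP i gets weight 1 / (2 ||w^i||_2).
        d_row = 1.0 / (2.0 * np.maximum(np.linalg.norm(W, axis=1), eps))
        # Stationarity of the reweighted problem: (XX^T + g1 D~ + g2 D) W = XY^T.
        A = XXt + gamma1 * np.diag(d_group) + gamma2 * np.diag(d_row)
        W = np.linalg.solve(A, XYt)
    return W
```

Each update solves (XXᵀ + γ₁D̃ + γ₂D)W = XYᵀ, where D̃ and D are diagonal weights built from the previous iterate. Groups or SNPs whose coefficients shrink toward zero receive ever larger weights, which is what produces sparsity at both the group level and the individual SNP level.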

Availability: Software is publicly available at: http://ranger.uta.edu/%7eheng/imaging-genetics/.

Figures

**Fig. 1.**
Top 37 AD risk factor genes used in this study and the numbers of their SNPs.

**Fig. 2.**
Pairwise LD correlation coefficients (r²>0.2 in blue) among the 1224 SNPs used in this study. The SNPs clearly form groups.
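
Fig. 2 and the "grouped by r²>0.2" experiments imply that SNPs can be grouped by LD, but the caption does not spell out the grouping rule. The following is a hypothetical sketch only: it approximates r² by the squared Pearson correlation of genotype dosages (Haploview, used in this study for the LD maps, reports haplotype-based r² instead) and takes the connected components of the graph linking every SNP pair with r² > 0.2 as the groups. The function name `ld_groups` is illustrative.

```python
import numpy as np


def ld_groups(genotypes, threshold=0.2):
    """Hypothetical LD grouping.

    genotypes: n_subjects x n_snps array of minor-allele counts (0/1/2).
    r^2 is approximated by the squared Pearson correlation of dosages.
    Returns the connected components of the graph whose edges are SNP pairs
    with r^2 > threshold, as sorted lists of column indices.
    """
    r2 = np.corrcoef(genotypes, rowvar=False) ** 2
    n_snps = r2.shape[0]
    parent = list(range(n_snps))  # union-find forest over SNPs

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n_snps):
        for j in range(i + 1, n_snps):
            if r2[i, j] > threshold:  # NaN (monomorphic SNP) never links
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n_snps):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

Connected components make grouping transitive: two SNPs can share a group without exceeding the threshold with each other, as long as a chain of SNP pairs above the threshold links them.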

**Fig. 3.**
VBM ROIs used in this study are mapped onto a brain.

**Fig. 4.**
Illustration of the proposed G-SMuRFS method. We incorporate the group structural information of the genetic markers through a new group ℓ_{2,1}-norm regularization (‖W‖_{G_{2,1}}) and enforce ℓ_{2,1}-norm regularization (‖W‖_{2,1}) to jointly select prominent SNPs across all endophenotypes.

**Fig. 5.**
Performance comparison: The mean and SD of the root mean square errors (RMSEs) obtained from five cross-validation trials in each experiment are plotted, where each error bar indicates ±1 SD. (a) FreeSurfer imaging phenotypes; (b) VBM imaging phenotypes.
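
Fig. 5 reports prediction error as the mean and SD of RMSEs over five cross-validation trials. A minimal sketch of that bookkeeping follows, assuming each trial is one fold of a random 5-fold split, that RMSE is pooled over all QTs in the held-out fold, and that the SD is the sample SD (ddof=1). The `fit` and `predict` callables are hypothetical placeholders for any of the compared methods.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted QT values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def cv_rmse(X, Y, fit, predict, k=5, seed=0):
    """Mean and sample SD of per-fold RMSEs.

    X: d x n predictors, Y: c x n targets (subjects are columns).
    fit(X_train, Y_train) -> model; predict(model, X_test) -> c x n_test.
    """
    n = X.shape[1]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    scores = []
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(n), test_idx)
        model = fit(X[:, train_idx], Y[:, train_idx])
        scores.append(rmse(Y[:, test_idx], predict(model, X[:, test_idx])))
    scores = np.array(scores)
    return float(scores.mean()), float(scores.std(ddof=1))
```

Fixing `seed` keeps the same folds for every method, so differences in the error bars reflect the methods rather than the split.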

**Fig. 6.**
(a) and (b) Heat maps of RMSEs for predicting VBM (a) and FreeSurfer (b) measures using LR, RR, G-SMuRFS with SNPs grouped by gene, and G-SMuRFS with SNPs grouped by r²>0.2; the top 10 SNPs were used in both G-SMuRFS variants. (c) RMSEs for predicting VBM measures using the four methods, mapped onto the brain volume.

**Fig. 7.**
Regression coefficients are visualized for top 10 selected SNPs in each of the four experiments (from top to bottom): (i) group by r²>0.2, regression on VBM measures; (ii) group by gene, regression on VBM measures; (iii) group by r²>0.2, regression on FreeSurfer measures; and (iv) group by gene, regression on FreeSurfer measures.
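
The captions refer to the top 10 selected SNPs without stating the ranking rule. One natural choice under an ℓ_{2,1} penalty, assumed in this sketch rather than taken from the paper, is to score each SNP by the ℓ₂ norm of its row in the coefficient matrix W (d SNPs × c QTs), since that row norm is exactly what the penalty drives to zero for unselected SNPs. The helper name `top_k_snps` is hypothetical.

```python
import numpy as np


def top_k_snps(W, snp_ids, k=10):
    """Rank SNPs by the l2 norm of their row of W (d SNPs x c QTs) and
    return the k highest-scoring (snp_id, score) pairs, best first."""
    scores = np.linalg.norm(W, axis=1)
    order = np.argsort(-scores, kind="stable")[:k]  # stable: ties keep input order
    return [(snp_ids[i], float(scores[i])) for i in order]
```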

**Fig. 8.**
Pairwise LD in a group of 46 SNPs proximal to SORCS1. The r² values of the LD maps were computed with Haploview and visualized with WGAViewer. The top panel is the ideogram of the chromosome; the vertical red line marks the relative location of the locus of interest. The second panel plots regression coefficients ×100 for each SNP for the FreeSurfer data, with the two top hits, rs765651 and rs1931600, marked by red lines. The third panel plots regression coefficients ×100 for each SNP for the VBM data, with the two top hits, rs1931600 and rs1936488, marked by red lines. The fourth panel shows the recent selection score (Voight *et al.*, 2006). The bottom panel shows the LD pattern among the 46 SNPs.

References

1. Argyriou A., et al. Multi-task feature learning. In: Advances in Neural Information Processing Systems. Vol. 19. MIT Press; 2007. pp. 41–48.
2. Ashburner J., Friston K. Voxel-based morphometry–the methods. Neuroimage. 2000;11:805–821.
3. Ballard D.H., et al. Comparisons of multi-marker association methods to detect association between a candidate region and disease. Genet. Epidemiol. 2010;34:201–212.
4. Barrett J.C., et al. Haploview: analysis and visualization of LD and haplotype maps. Bioinformatics. 2005;21:263–265.
5. Bertram L., et al. Systematic meta-analyses of Alzheimer disease genetic association studies: the AlzGene database. Nat. Genet. 2007;39:17–23.
