Bioinformatics. 2012 Jan 15;28(2):229-37.
doi: 10.1093/bioinformatics/btr649. Epub 2011 Dec 6.

Identifying quantitative trait loci via group-sparse multitask regression and feature selection: an imaging genetics study of the ADNI cohort

Hua Wang et al. Bioinformatics.

Abstract

Motivation: Recent advances in high-throughput genotyping and brain imaging techniques enable new approaches to study the influence of genetic variation on brain structures and functions. Traditional association studies typically employ independent and pairwise univariate analysis, which treats single nucleotide polymorphisms (SNPs) and quantitative traits (QTs) as isolated units and ignores important underlying interacting relationships between the units. New methods are proposed here to overcome this limitation.

Results: Taking into account the interlinked structure within and between SNPs and imaging QTs, we propose a novel Group-Sparse Multi-task Regression and Feature Selection (G-SMuRFS) method to identify quantitative trait loci for multiple disease-relevant QTs and apply it to a study in mild cognitive impairment and Alzheimer's disease. Built upon regression analysis, our model uses a new form of regularization, the group ℓ2,1-norm (G2,1-norm), to incorporate the biological group structures among SNPs induced from their genetic arrangement. The new G2,1-norm considers the regression coefficients of all the SNPs in each group with respect to all the QTs together and enforces sparsity at the group level. In addition, an ℓ2,1-norm regularization is utilized to couple feature selection across multiple tasks to make use of the shared underlying mechanism among different brain regions. The effectiveness of the proposed method is demonstrated by both clearly improved prediction performance in empirical evaluations and a compact set of selected SNP predictors relevant to the imaging QTs.
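
The two regularizers described above can be illustrated with a minimal Python sketch. It is not the authors' released software. It assumes a coefficient matrix W with one row per SNP and one column per QT, a squared-error loss, and hypothetical names (l21_norm, group_l21_norm, objective, gamma1, gamma2) chosen only for this illustration. The group norm is taken here as the sum, over SNP groups, of the Frobenius norm of each group's rows, which matches the description of treating all SNPs in a group jointly across all QTs.

```python
import numpy as np

def l21_norm(W):
    """l2,1-norm: sum of the Euclidean norms of the rows of W (one row per SNP)."""
    return float(np.sum(np.linalg.norm(W, axis=1)))

def group_l21_norm(W, groups):
    """Group l2,1-norm (assumed form): for each SNP group, take the Frobenius
    norm of that group's rows across all QTs, then sum over groups."""
    groups = np.asarray(groups)
    return float(sum(np.linalg.norm(W[groups == g]) for g in np.unique(groups)))

def objective(X, Y, W, groups, gamma1, gamma2):
    """Squared-error loss plus both penalties (assumed form of the objective).
    X: samples x SNPs, Y: samples x QTs, W: SNPs x QTs."""
    residual = X @ W - Y
    loss = float(np.sum(residual ** 2))
    return loss + gamma1 * group_l21_norm(W, groups) + gamma2 * l21_norm(W)

# Toy example: 4 SNPs in 2 groups, 2 QTs. Only the first group is active.
W = np.array([[3.0, 0.0],
              [0.0, 4.0],
              [0.0, 0.0],
              [0.0, 0.0]])
groups = [0, 0, 1, 1]
X = np.eye(4)
Y = np.zeros((4, 2))

row_penalty = l21_norm(W)                 # penalizes each SNP row separately
group_penalty = group_l21_norm(W, groups) # penalizes each SNP group as a block
total = objective(X, Y, W, groups, gamma1=1.0, gamma2=1.0)
```

In this toy case the group penalty is smaller than the row penalty because the two active SNPs share a group, which is the sense in which the group term favors selecting or discarding whole groups while the row term encourages each SNP to be selected jointly across QTs.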

Availability: Software is publicly available at: http://ranger.uta.edu/%7eheng/imaging-genetics/.

Figures

Fig. 1.
Top 37 AD risk factor genes used in this study and the numbers of their SNPs.
Fig. 2.
Pairwise LD correlation coefficients (r² > 0.2 in blue) among the 1224 SNPs used in this study. The SNPs clearly form groups.
Fig. 3.
VBM ROIs used in this study are mapped onto a brain.
Fig. 4.
Illustration of the proposed G-SMuRFS method. We incorporate the group structural information of the genetic markers through a new group ℓ2,1-norm regularization ($\|W\|_{G_{2,1}}$), and enforce ℓ2,1-norm regularization ($\|W\|_{2,1}$) to jointly select prominent SNPs across all endophenotypes.
Fig. 5.
Performance comparison: The mean and SD of the root mean square errors (RMSEs) obtained from five cross-validation trials in each experiment are plotted, where each error bar indicates ±1 SD. (a) FreeSurfer imaging phenotypes; (b) VBM imaging phenotypes.
Fig. 6.
Panels (a) and (b) show heat maps of the RMSEs for predicting VBM (a) and FreeSurfer (b) measures using LR, RR, our G-SMuRFS method with SNPs grouped by gene, and G-SMuRFS with SNPs grouped by r² > 0.2, where the top 10 SNPs were used in our G-SMuRFS methods. In (c), the RMSEs for predicting VBM measures using the four methods are mapped onto the brain volume.
Fig. 7.
Regression coefficients are visualized for the top 10 selected SNPs in each of the four experiments (from top to bottom): (i) group by r² > 0.2, regression on VBM measures; (ii) group by gene, regression on VBM measures; (iii) group by r² > 0.2, regression on FreeSurfer measures; and (iv) group by gene, regression on FreeSurfer measures.
Fig. 8.
Pairwise LD in a group of 46 SNPs proximal to SORCS1. The r² values of the LD maps are determined by Haploview and visualized with WGAViewer. The top panel is the ideogram of the chromosome, and the vertical red line represents the relative location of the locus of interest. In the second panel, the regression coefficients (×100) are plotted for each SNP for the FreeSurfer data, where the two top hits, rs765651 and rs1931600, are labeled with red lines. In the third panel, the regression coefficients (×100) are plotted for each SNP for the VBM data, where the two top hits, rs1931600 and rs1936488, are labeled with red lines. The fourth panel shows the recent selection score (Voight et al., 2006). The bottom panel shows the LD pattern among the 46 SNPs.
