Stat Appl Genet Mol Biol. 2012 Jan 6;11(1):Article 7.
doi: 10.2202/1544-6115.1755.

Fast identification of biological pathways associated with a quantitative trait using group lasso with overlaps

Matt Silver et al. Stat Appl Genet Mol Biol.

Abstract

Where causal SNPs (single nucleotide polymorphisms) tend to accumulate within biological pathways, the incorporation of prior pathways information into a statistical model is expected to increase the power to detect true associations in a genetic association study. Most existing pathways-based methods rely on marginal SNP statistics and do not fully exploit the dependence patterns among SNPs within pathways. We use a sparse regression model, with SNPs grouped into pathways, to identify causal pathways associated with a quantitative trait. Notable features of our "pathways group lasso with adaptive weights" (P-GLAW) algorithm include the incorporation of all pathways in a single regression model, an adaptive pathway weighting procedure that accounts for factors biasing pathway selection, and the use of a bootstrap sampling procedure for the ranking of important pathways. P-GLAW takes account of the presence of overlapping pathways and uses a novel combination of techniques to optimise model estimation, making it fast to run, even on whole genome datasets. In a comparison study with an alternative pathways method based on univariate SNP statistics, our method demonstrates high sensitivity and specificity for the detection of important pathways, showing the greatest relative gains in performance where marginal SNP effect sizes are small.

Figures

Figure 1
The problem of overlapping pathways: here there are three pathways, G1, G2 and G3, two of which overlap. A: Standard formulation. Pathway parameter vectors β1 and β2 overlap, since they have SNPs in common (shaded dark grey). Where an overlapping SNP has a non-zero coefficient, only G3 can be selected independently. B: Formulation with duplicated SNPs. An expanded parameter vector, β*, is created by duplicating overlapping SNPs (dotted line). β1* and β2* now enter the model separately, so that pathways can be selected independently.
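The duplication step described in panel B can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `expand_overlaps`, the list-of-rows genotype matrix, and the toy pathway definitions are all hypothetical.

```python
def expand_overlaps(X, pathways):
    """Duplicate overlapping SNP columns so each pathway owns its own block.

    X        : list of rows (one per subject), each a list of SNP values
    pathways : dict mapping pathway name -> list of SNP column indices in X
    Returns (X_star, groups): the expanded matrix and, for each pathway,
    the column indices it owns in X_star.
    """
    column_order = []  # original column index behind each expanded column
    groups = {}
    for name, snps in pathways.items():
        start = len(column_order)
        column_order.extend(snps)  # a shared SNP is appended once per pathway
        groups[name] = list(range(start, start + len(snps)))
    X_star = [[row[j] for j in column_order] for row in X]
    return X_star, groups


# Toy data (hypothetical): SNP 2 belongs to both G1 and G2; G3 is disjoint.
X = [[0, 1, 2, 0, 1],
     [1, 1, 0, 2, 0]]
pathways = {"G1": [0, 1, 2], "G2": [2, 3], "G3": [4]}
X_star, groups = expand_overlaps(X, pathways)
```

After expansion the groups are disjoint column blocks, so a standard (non-overlapping) group lasso solver could in principle be applied to the expanded matrix.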
Figure 2
SNP to pathway mapping.
Figure 3
Frequency distribution of ADNI SNPs by number of pathways they map to. SNPs are mapped to genes within 10 kbp. The data set consists of 8,078 SNPs and 551 pathways.
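A rough sketch of how such a SNP-to-gene-to-pathway mapping and its frequency distribution might be computed is given here. The data layout (chromosome and position tuples, gene intervals, pathways as gene sets), the function name `pathways_per_snp`, and the toy values are assumptions for illustration only; the caption does not specify how the 10 kbp window is applied, so a symmetric window around each gene is assumed.

```python
from collections import Counter


def pathways_per_snp(snp_pos, genes, pathways, window=10_000):
    """Count, for each SNP, the pathways it maps to via nearby genes.

    snp_pos  : dict SNP id -> (chromosome, position)
    genes    : dict gene id -> (chromosome, start, end)
    pathways : dict pathway id -> set of gene ids
    window   : a SNP maps to a gene if it lies within this many base pairs
    """
    counts = {}
    for snp, (chrom, pos) in snp_pos.items():
        hit_genes = {
            g for g, (g_chrom, start, end) in genes.items()
            if g_chrom == chrom and start - window <= pos <= end + window
        }
        hit_pathways = {p for p, members in pathways.items() if hit_genes & members}
        counts[snp] = len(hit_pathways)
    return counts


# Toy data (hypothetical identifiers and coordinates).
snp_pos = {"rs1": ("1", 5_000), "rs2": ("1", 60_000), "rs3": ("2", 1_000)}
genes = {"A": ("1", 12_000, 20_000), "B": ("1", 55_000, 70_000)}
pathways = {"P1": {"A", "B"}, "P2": {"B"}}

counts = pathways_per_snp(snp_pos, genes, pathways)
histogram = Counter(counts.values())  # number of SNPs per pathway count
```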
Figure 4
Distributions of C across 500 MC simulations for the 6 scenarios described in Table 1. Where SNPs are distributed within a single gene (scenarios (c) and (f)), the number of causal pathways tends to be larger, since a single gene can map to multiple pathways. Where SNPs are distributed randomly across Gϕ (scenarios (a), (b), (d), and (e)), this number tends to be smaller, particularly where the number of causal SNPs is large (scenarios (a) and (d)).
Figure 5
Application of bias-adjusted weighting procedure to the data used in the simulation study. R = 40,000, with a different null response, y ~ N(0, 1), at each MC simulation. α = 0.98. (a) Empirical pathway selection frequency distribution, Π*, with standard pathway size weighting, wl = Sl. D = 2.24. Dotted horizontal line shows the expected distribution, Πl = 1/L ≃ 0.002. (b) Π* with bias-adjusted weights after 10 iterations. D = 0.12. (c) Variation of weighting adjustment factor w(τ)/w(τ–1) with dl at a single iteration, with α = 0.98. Each point represents the adjustment to a single wl, l = 1,…,L. (d) Decrease in K-L divergence, D, over 10 iterations.
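The K-L divergence D reported in this caption measures how far the empirical selection frequencies are from the uniform distribution 1/L expected under the null. A minimal sketch follows, assuming D is computed as the divergence of the empirical distribution from the uniform one using natural logarithms; the caption does not state the direction or the log base, and the function name `kl_from_uniform` is hypothetical.

```python
import math


def kl_from_uniform(selection_counts):
    """K-L divergence of empirical selection frequencies from uniform 1/L.

    selection_counts : list with one selection count per pathway
    Pathways that were never selected contribute zero (0 * log 0 := 0).
    """
    total = sum(selection_counts)
    n_pathways = len(selection_counts)
    divergence = 0.0
    for count in selection_counts:
        if count > 0:
            p = count / total
            # p / (1 / L) simplifies to p * L
            divergence += p * math.log(p * n_pathways)
    return divergence


unbiased = kl_from_uniform([10, 10, 10, 10])  # every pathway equally likely
biased = kl_from_uniform([40, 0, 0, 0])       # one pathway always selected
```

Under this definition an unbiased selection procedure gives a divergence of zero, and the value grows as selection concentrates on a few pathways.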
Figure 6
Comparison of ranking performance: adaptive weighting scheme (section 2.3) vs. standard pathway size weighting (13). S = 10; δk = 0.005; SNPs randomly distributed across Gϕ. (a) ROC curves illustrating power to identify at least one causal pathway in the top 100. Power is averaged across 500 simulations. (b) Distribution of ranking power, p100, across 500 simulations. This is the proportion |C100*|/|C| of causal pathways in C that are ranked in the top 100 pathways. Notches indicate 95% confidence intervals for the true median. (c) Distribution of the power-adjusted, normalised, weighted ranking score, R, across 500 simulations. The final ‘50+’ column includes simulations for which no causal pathway was ranked in the top 100, i.e. C100* = ∅; R = 100.
Figure 7
ROC curves illustrating the proportion of simulations with rk1 ≤ z, for ranks z = 1, 2, …, 100. Power is averaged across 500 simulations. False positive rate = (z – 1)/L. Scenarios corresponding to the higher SNP effect size (δk = 0.005) are presented in the left-hand column, with the equivalent scenarios at the lower effect size (δk = 0.001) on the right.
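One way such a curve could be assembled from simulation output is sketched here. It takes the criterion at each rank threshold z to be that the best-ranked causal pathway sits at rank z or better, which is an interpretation of the caption and not confirmed by it; the function name `rank_roc` and the toy ranks are hypothetical.

```python
def rank_roc(best_causal_ranks, n_pathways, max_rank=100):
    """Build (false positive rate, power) points for thresholds z = 1..max_rank.

    best_causal_ranks : for each simulation, the rank of its top-ranked
                        causal pathway (1 = best)
    n_pathways        : total number of pathways, L
    """
    n_sims = len(best_causal_ranks)
    curve = []
    for z in range(1, max_rank + 1):
        # Fraction of simulations with a causal pathway at rank z or better
        power = sum(1 for r in best_causal_ranks if r <= z) / n_sims
        false_positive_rate = (z - 1) / n_pathways
        curve.append((false_positive_rate, power))
    return curve


# Toy example: four simulations, L = 551 pathways, thresholds 1 to 5.
curve = rank_roc([1, 3, 3, 250], n_pathways=551, max_rank=5)
```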
Figure 8
Box plots of the distribution of ranking power, p100, across 500 simulations. This is the proportion |C100*|/|C| of causal pathways in C that are ranked in the top 100 pathways. Notches indicate 95% confidence intervals for the true median.
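For a single simulation, the ranking power described in this caption is the share of causal pathways that appear among the top-ranked pathways. A minimal sketch follows, with a hypothetical function name and toy pathway labels:

```python
def ranking_power(ranked_pathways, causal, top=100):
    """Proportion of causal pathways that appear in the top `top` ranks.

    ranked_pathways : pathway ids ordered from most to least important
    causal          : collection of causal pathway ids (must be non-empty)
    """
    causal = set(causal)
    if not causal:
        raise ValueError("at least one causal pathway is required")
    top_ranked = set(ranked_pathways[:top])
    return len(top_ranked & causal) / len(causal)


# Toy example: only p1 of the three causal pathways is in the top 3.
power = ranking_power(["p3", "p1", "p7", "p2"], {"p1", "p2", "p9"}, top=3)
```

Repeating this over all simulations would give the distribution summarised in the box plots.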
Figure 9
Distribution of the power-adjusted, normalised, weighted ranking score, R, across 500 simulations. The final ‘50+’ column includes simulations for which no causal pathway was ranked in the top 100, i.e. C100* = ∅; R = 100.
