Nat Genet. 2013 Apr;45(4):400-5, 405e1-3.
doi: 10.1038/ng.2579. Epub 2013 Mar 3.

Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies

Nilanjan Chatterjee et al. Nat Genet. 2013 Apr.

Abstract

We report a new method to estimate the predictive performance of polygenic models for risk prediction and assess predictive performance for ten complex traits or common diseases. Using estimates of effect-size distribution and heritability derived from current studies, we project that although 45% of the variance of height has been attributed to SNPs, a model trained on one million people may only explain 33.4% of variance of the trait. Models based on current studies allow for identification of 3.0%, 1.1% and 7.0% of the populations at twofold or higher than average risk for type 2 diabetes, coronary artery disease and prostate cancer, respectively. Tripling of sample sizes could elevate these percentages to 18.8%, 6.1% and 12.2%, respectively. The utility of polygenic models for risk prediction will depend on achievable sample sizes for the training data set, the underlying genetic architecture and the inclusion of information on other risk factors, including family history.
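
The "twofold or higher than average risk" percentages quoted above can be illustrated with a small calculation. The sketch below is not the authors' method. It assumes, purely for illustration, that relative risk in the population is log-normally distributed with mean 1, so that the log relative risk is normal with standard deviation `sigma` and mean `-sigma**2 / 2`. The function name and the `sigma` values are hypothetical.

```python
from math import log
from statistics import NormalDist


def fraction_at_or_above(relative_risk: float, sigma: float) -> float:
    """Fraction of the population whose risk is at least `relative_risk`
    times the population average.

    Assumption (not from the paper): log relative risk ~ Normal(mu, sigma^2),
    with mu = -sigma^2 / 2 so that the average relative risk equals 1.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    mu = -sigma ** 2 / 2  # centers the log-normal so its mean is exactly 1
    z = (log(relative_risk) - mu) / sigma
    return 1.0 - NormalDist().cdf(z)


if __name__ == "__main__":
    # Hypothetical spreads of the polygenic log relative risk.
    for sigma in (0.3, 0.5, 1.0):
        share = fraction_at_or_above(2.0, sigma)
        print(f"sigma={sigma:.1f}: {share:.1%} at twofold or higher risk")
```

Under this assumption, a more discriminating polygenic score (larger `sigma`, over the range shown) places a larger share of the population above the twofold threshold, which is the qualitative pattern the abstract describes as sample sizes grow.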

Figures

Figure 1. Predictive correlation coefficient (PCC) for polygenic models and corresponding optimal significance level for SNP selection under three models for polygenic architectures for adult height.
Each model assumes that a total of 45% of the phenotypic variance of adult height can be explained by common SNPs included in standard GWAS platforms involving M = 200,000 independent SNPs. The effect-size distribution for susceptibility SNPs is assumed to follow an exponential distribution (black line), a mixture of two exponential distributions (red line) or a mixture of three exponential distributions (blue line). Panels (a) and (b) show the expected value of squared PCC and the corresponding optimal significance level (α_opt), respectively, as a function of sample size (N). Panel (c) compares PCC values reported in a predictive analysis of the GIANT study (dashed line) with the corresponding theoretical expected values under the three different models.
Figure 2. Expected predictive correlation coefficient (PCC) for polygenic models at optimal significance level for SNP selection for four quantitative traits.
For HDL and BMI, the range of performance is shown corresponding to the estimate of h_g^2 (yellow line) and its associated 95% confidence interval (dark blue region). For LDL and TC, for which a direct estimate of h_g^2 is not available, a range of values is chosen based on constraints imposed by the observed discoveries. For all traits, the underlying effect-size distribution is assumed to follow a mixture of three exponential distributions, which together with h_g^2 is calibrated to explain observed discoveries from the largest GWAS (see Methods).
Figure 3. Expected AUC statistics at optimal significance level for SNP selection for five disease traits.
For all diseases except CAD, the range of performance is shown corresponding to the estimate of h_g^2 (yellow line) and its associated 95% confidence interval (dark blue region). For CAD, for which a direct estimate of h_g^2 is not available, a range of values is chosen based on constraints imposed by the observed discoveries. For all traits, the underlying effect-size distribution is assumed to follow a mixture of two or three exponential distributions, which together with h_g^2 is calibrated to explain observed discoveries from the largest GWAS (see Methods).
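
As a rough guide to reading AUC values like those in Figure 3, the sketch below converts the spread of a polygenic score into an AUC. It is an illustration under stated assumptions, not the calculation used in the paper: the score is taken to be the log relative risk, normally distributed with standard deviation `sigma` in controls, and shifted upward by `sigma**2` with the same variance in cases (the standard result for a rare disease under a log-linear risk model). The function name and `sigma` values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def auc_from_sigma(sigma: float) -> float:
    """AUC of a normally distributed log-relative-risk score.

    Assumption (not from the paper): scores in cases and controls are both
    normal with standard deviation `sigma`, and the case mean exceeds the
    control mean by sigma^2. The AUC is then Phi(sigma / sqrt(2)).
    """
    if sigma < 0:
        raise ValueError("sigma must be non-negative")
    return NormalDist().cdf(sigma / sqrt(2))


if __name__ == "__main__":
    # Hypothetical score spreads; sigma = 0 corresponds to no discrimination.
    for sigma in (0.0, 0.5, 1.0):
        print(f"sigma={sigma:.1f}: AUC={auc_from_sigma(sigma):.3f}")
```

A score with no spread gives an AUC of 0.5, and the AUC rises toward 1 as `sigma` grows, so larger training samples that yield a more variable and more accurate score translate into higher AUC.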
