Projecting the performance of risk prediction based on polygenic analyses of genome-wide association studies

Nilanjan Chatterjee¹, Bill Wheeler, Joshua Sampson, Patricia Hartge, Stephen J Chanock, Ju-Hyun Park

Affiliation

¹ Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Department of Health and Human Services, Rockville, Maryland, USA.

PMID: 23455638
PMCID: PMC3729116
DOI: 10.1038/ng.2579

Nilanjan Chatterjee et al. Nat Genet. 2013 Apr;45(4):400-5, 405e1-3. doi: 10.1038/ng.2579. Epub 2013 Mar 3.

Abstract

We report a new method to estimate the predictive performance of polygenic models for risk prediction and assess predictive performance for ten complex traits or common diseases. Using estimates of effect-size distribution and heritability derived from current studies, we project that although 45% of the variance of height has been attributed to SNPs, a model trained on one million people may only explain 33.4% of variance of the trait. Models based on current studies allow for identification of 3.0%, 1.1% and 7.0% of the populations at twofold or higher than average risk for type 2 diabetes, coronary artery disease and prostate cancer, respectively. Tripling of sample sizes could elevate these percentages to 18.8%, 6.1% and 12.2%, respectively. The utility of polygenic models for risk prediction will depend on achievable sample sizes for the training data set, the underlying genetic architecture and the inclusion of information on other risk factors, including family history.
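The link between the spread of a polygenic score and the share of people at twofold or higher than average risk can be illustrated with a small calculation. The sketch below is not the authors' projection method: it assumes a rare disease whose risk is proportional to the exponential of a normally distributed score with variance `sigma2`, and both `fraction_above_relative_risk` and the example variances are hypothetical names and values chosen for illustration.

```python
import math
from statistics import NormalDist


def fraction_above_relative_risk(sigma2: float, fold: float = 2.0) -> float:
    """Share of the population whose risk is at least `fold` times the average.

    Assumption (not from the source): risk is proportional to exp(S) with
    S ~ N(mu, sigma2). The population-average risk then scales with
    exp(mu + sigma2 / 2), so relative risk = exp(S - mu - sigma2 / 2).
    """
    if sigma2 <= 0:
        return 0.0  # no score variance: everyone is at average risk
    sigma = math.sqrt(sigma2)
    # Standardized score needed to reach `fold` times the average risk.
    z = (math.log(fold) + sigma2 / 2.0) / sigma
    return 1.0 - NormalDist().cdf(z)


if __name__ == "__main__":
    # Illustrative variances only; these are not estimates from the paper.
    for sigma2 in (0.1, 0.3, 0.5):
        share = fraction_above_relative_risk(sigma2)
        print(f"sigma2={sigma2:.1f}: {100 * share:.2f}% at >= 2x average risk")
```

Under these assumptions the identifiable share grows as the score variance grows, which is consistent with the abstract's point that larger training samples raise the percentage of the population that can be flagged.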

Figures

**Figure 1. Predictive correlation coefficient (PCC) for polygenic models and corresponding optimal significance level for SNP selection under three models for polygenic architectures for adult height**
Each model assumes that a total of 45% of the phenotypic variance of adult height can be explained by common SNPs included in standard GWAS platforms involving M=200,000 independent SNPs. The effect-size distribution for susceptibility SNPs is assumed to follow an exponential distribution (black line), a mixture of two exponential distributions (red line) or a mixture of three exponential distributions (blue line). Panels (a) and (b) show the expected value of squared PCC and the corresponding optimal significance level (α_opt), respectively, as a function of sample size (N). Panel (c) compares PCC values reported in a predictive analysis of the GIANT study (dashed line) with the corresponding theoretical expected values under the three different models.
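As a rough point of comparison for how squared PCC might grow with sample size, the sketch below uses a simpler, swapped-in approximation often attributed to Daetwyler et al. (2008). It assumes an infinitesimal model in which all M independent SNPs are included with least-squares weights, so it ignores both the SNP selection by significance level and the exponential-mixture effect-size distributions described in the caption. The function name `expected_r2_all_snps` is hypothetical.

```python
def expected_r2_all_snps(h2_g: float, n_samples: float, m_snps: float) -> float:
    """Approximate expected squared predictive correlation (R^2).

    Assumption (not the paper's method): every one of `m_snps` independent
    SNPs enters the score, each with an equally small effect, giving
    R^2 ~= h2_g / (1 + m_snps / (n_samples * h2_g)).
    """
    if h2_g <= 0 or n_samples <= 0:
        return 0.0
    return h2_g / (1.0 + m_snps / (n_samples * h2_g))


if __name__ == "__main__":
    # h2_g and M follow the Figure 1 caption; the sample sizes are illustrative.
    h2_g, m_snps = 0.45, 200_000
    for n in (10_000, 100_000, 1_000_000, 10_000_000):
        r2 = expected_r2_all_snps(h2_g, n, m_snps)
        print(f"N={n:>10,}: expected R^2 ~= {r2:.3f}")
```

In this approximation R² approaches `h2_g` only as N becomes large relative to M, which gives some intuition for why the variance explained by a trained model can fall short of the variance attributed to SNPs.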

**Figure 2. Expected predictive correlation coefficient (PCC) for polygenic models at optimal significance level for SNP selection for four quantitative traits.**
For HDL and BMI, the range of performance is shown corresponding to the estimate of $h_{g}^{2}$ (yellow line) and the associated 95% confidence interval (dark blue region). For LDL and TC, for which a direct estimate of $h_{g}^{2}$ is not available, a range of values is chosen based on constraints imposed by the observed discoveries. For all traits, the underlying effect-size distribution is assumed to follow a mixture of three exponential distributions, which together with $h_{g}^{2}$ is calibrated to explain observed discoveries from the largest GWAS (see **Methods**).

**Figure 3. Expected AUC statistics at optimal significance level for SNP selection for five disease traits.**
For all diseases except CAD, the range of performance is shown corresponding to the estimate of $h_{g}^{2}$ (yellow line) and the associated 95% confidence interval (dark blue region). For CAD, for which a direct estimate of $h_{g}^{2}$ is not available, a range of values is chosen based on constraints imposed by the observed discoveries. For all traits, the underlying effect-size distribution is assumed to follow a mixture of two or three exponential distributions, which together with $h_{g}^{2}$ is calibrated to explain observed discoveries from the largest GWAS (see **Methods**).
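For readers who want to relate an AUC value to the variance of a polygenic risk score, a minimal sketch follows. It assumes a rare disease and a log-risk score that is normally distributed with the same variance `sigma2` in cases and controls, with the case mean shifted upward by `sigma2`. Under those assumptions AUC = Φ(√(σ²/2)). The function name `auc_from_log_risk_variance` is hypothetical, and this is not presented as the calculation behind the figure.

```python
import math
from statistics import NormalDist


def auc_from_log_risk_variance(sigma2: float) -> float:
    """AUC implied by a normally distributed log-risk score.

    Assumption (not from the source): controls have S ~ N(mu, sigma2) and
    cases have S ~ N(mu + sigma2, sigma2). The case-minus-control difference
    is then N(sigma2, 2 * sigma2), so
    AUC = P(difference > 0) = Phi(sqrt(sigma2 / 2)).
    """
    if sigma2 < 0:
        raise ValueError("sigma2 must be non-negative")
    return NormalDist().cdf(math.sqrt(sigma2 / 2.0))


if __name__ == "__main__":
    # Illustrative variances only; these are not estimates from the paper.
    for sigma2 in (0.0, 0.2, 0.5, 1.0):
        print(f"sigma2={sigma2:.1f}: AUC ~= {auc_from_log_risk_variance(sigma2):.3f}")
```

A score with zero variance gives AUC = 0.5 (no discrimination), and AUC rises toward 1 as the variance of the log-risk score grows.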

Grants and funding

ZIA CP010181/ImNIH/Intramural NIH HHS/United States
