Am J Hum Genet. 2020 Jul 2;107(1):46-59.
doi: 10.1016/j.ajhg.2020.05.004. Epub 2020 May 28.

Non-parametric Polygenic Risk Prediction via Partitioned GWAS Summary Statistics

Sung Chun et al. Am J Hum Genet. 2020.

Abstract

In complex trait genetics, the ability to predict phenotype from genotype is the ultimate measure of our understanding of genetic architecture underlying the heritability of a trait. A complete understanding of the genetic basis of a trait should allow for predictive methods with accuracies approaching the trait's heritability. The highly polygenic nature of quantitative traits and most common phenotypes has motivated the development of statistical strategies focused on combining myriad individually non-significant genetic effects. Now that predictive accuracies are improving, there is a growing interest in the practical utility of such methods for predicting risk of common diseases responsive to early therapeutic intervention. However, existing methods require individual-level genotypes or depend on accurately specifying the genetic architecture underlying each disease to be predicted. Here, we propose a polygenic risk prediction method that does not require explicitly modeling any underlying genetic architecture. We start with summary statistics in the form of SNP effect sizes from a large GWAS cohort. We then remove the correlation structure across summary statistics arising due to linkage disequilibrium and apply a piecewise linear interpolation on conditional mean effects. In both simulated and real datasets, this new non-parametric shrinkage (NPS) method can reliably allow for linkage disequilibrium in summary statistics of 5 million dense genome-wide markers and consistently improves prediction accuracy. We show that NPS improves the identification of groups at high risk for breast cancer, type 2 diabetes, inflammatory bowel disease, and coronary heart disease, all of which have available early intervention or prevention treatments.

Keywords: genome-wide association study; linkage disequilibrium; non-parametric prediction; phenotype prediction; polygenic score; prognosis; summary statistics.

Conflict of interest statement

S.K. is a co-founder, chief executive officer, and a board member of Verve Therapeutics.

Figures

Figure 1
Overview of Non-Parametric Shrinkage (NPS). (A) For unlinked markers, NPS partitions SNPs into K subgroups by splitting the GWAS effect sizes (β̂_j) at cut-offs b_0, b_1, …, b_K. Partitioned risk scores G_ik are calculated for each partition k and individual i using an independent genotype-level training cohort. The per-partition shrinkage weights ω_k are determined by the separation of G_ik between training case subjects and control subjects. Estimating the per-partition shrinkage weights is a far easier problem than estimating per-SNP effects: the training sample size is small but still larger than the number of partitions, whereas for per-SNP effects, the GWAS sample size is considerably smaller than the number of markers in the genome. This procedure “shrinks” the estimated effect sizes without relying on any specific assumption about the distribution of true effect sizes. (B) For markers in LD, genotypes and estimated effects are first decorrelated by a linear projection P in non-overlapping windows of ∼2.5 Mb in length, and then NPS is applied to the projected data. The size of the black dots indicates genotype frequencies in the population. Before projection, genotypes at SNPs 1 and 2 are correlated due to LD (D), and thus the sampling errors of the estimated effects (β̂_j | β_j) are also correlated between adjacent SNPs. The projection P neutralizes both correlation structures. The axes of projection are marked by red dashed lines. β_j denotes the true genetic effect at SNP j. N_g is the sample size of the GWAS cohort.
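Panel (A) can be illustrated with a minimal sketch for unlinked markers. It is not the authors' implementation. The function names, the use of |β̂_j| with interior cut-offs for partitioning, and the standardized case/control mean difference used as the "separation" measure are all assumptions made for illustration.

```python
import numpy as np

def partition_scores(beta_hat, genotypes, cutoffs):
    """Partitioned risk scores G[i, k] for individual i and partition k.

    Assumption: SNPs are binned by |beta_hat| at the interior cut-offs
    (b_1 .. b_{K-1}); the caption does not spell out the binning rule.
    """
    beta_hat = np.asarray(beta_hat, dtype=float)
    genotypes = np.asarray(genotypes, dtype=float)
    part = np.digitize(np.abs(beta_hat), cutoffs)  # partition index per SNP
    K = len(cutoffs) + 1
    G = np.zeros((genotypes.shape[0], K))
    for k in range(K):
        mask = part == k
        # Sum of genotype * estimated effect over the SNPs in partition k
        G[:, k] = genotypes[:, mask] @ beta_hat[mask]
    return G

def partition_weights(G, is_case):
    """Per-partition weights from the case/control separation of G.

    Assumption: separation is the difference in means divided by the
    average within-group variance (a hypothetical choice for this sketch).
    """
    G = np.asarray(G, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    diff = G[is_case].mean(axis=0) - G[~is_case].mean(axis=0)
    var = 0.5 * (G[is_case].var(axis=0, ddof=1) + G[~is_case].var(axis=0, ddof=1))
    # Partitions with no variance in the training cohort get weight 0
    return np.divide(diff, var, out=np.zeros_like(diff), where=var > 0)
```

Under these assumptions, a polygenic score for a new cohort would be `partition_scores(beta_hat, new_genotypes, cutoffs) @ weights`, so only K weights are learned from the training cohort rather than one per SNP.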
Figure 2
Per-Partition Shrinkage Weights Estimated by Non-Parametric Shrinkage (NPS) Approximate the Conditional Mean Effects in the Decorrelated Space. (A) NPS shrinkage weights ω_k (red line) compared to the theoretical optimum (black line), λ_lj/(λ_lj + M/(N_g h²)), under infinitesimal architecture. The partition of largest eigenvalues, S_10, is marked by a gray box. (B) Conditional mean effects estimated by NPS (red line) in sub-partitions of S_10 by |η̂_lj| under infinitesimal architecture. The theoretical line (black) is the average over all λ_lj in S_10. (C and D) Conditional mean effects estimated by NPS (red line) in sub-partitions of S_10 (C) and S_2 (D) on intervals of |η̂_lj| under non-infinitesimal architecture with a causal SNP fraction of 1%. The true conditional means (black) were estimated over 40 simulation runs. The mean NPS shrinkage weights (red line) and their 95% CIs (red shade) were estimated from five replicates. Gray vertical lines indicate partitioning cut-offs. The no-shrinkage line (green) indicates ω_k = 1. The number of markers M is 101,296. The discovery GWAS size N_g equals M. The heritability h² is 0.5.
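The decorrelated space and the theoretical line in panel (A) can be sketched as follows. This is a hedged illustration, not the published code. It assumes the projection is built from an eigendecomposition of a window's LD matrix, and it reads the caption's flattened theoretical optimum as λ/(λ + M/(N_g h²)). The function names and the eigenvalue tolerance `tol` are hypothetical.

```python
import numpy as np

def decorrelating_projection(ld, tol=1e-8):
    """Eigenvalues and a projection P for one window's LD matrix.

    Assumption: P = Lambda^{-1/2} Q^T, where ld = Q Lambda Q^T, so that
    P @ ld @ P.T is the identity on the retained eigenvectors.
    """
    eigvals, eigvecs = np.linalg.eigh(np.asarray(ld, dtype=float))
    keep = eigvals > tol  # drop near-singular directions
    eigvals, eigvecs = eigvals[keep], eigvecs[:, keep]
    P = (eigvecs / np.sqrt(eigvals)).T  # scale each eigenvector by 1/sqrt(lambda)
    return eigvals, P

def infinitesimal_shrinkage(eigvals, M, Ng, h2):
    """Theoretical shrinkage lambda / (lambda + M / (Ng * h2)).

    Assumption: this is the intended form of the caption's formula under
    infinitesimal architecture.
    """
    eigvals = np.asarray(eigvals, dtype=float)
    return eigvals / (eigvals + M / (Ng * h2))
```

With the caption's settings (N_g equal to M and h² = 0.5), the expression reduces to λ/(λ + 2), so projections with small eigenvalues are shrunk strongly toward zero while those with large eigenvalues are left closer to their estimated values. Decorrelated effects would be obtained as `P @ beta_hat` within each window.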
