HGG Adv. 2022 Jan 21;3(2):100091.
doi: 10.1016/j.xhgg.2022.100091. eCollection 2022 Apr 14.

Stability of polygenic scores across discovery genome-wide association studies


Laura M Schultz et al. HGG Adv.

Abstract

Polygenic scores (PGS) are commonly evaluated in terms of their predictive accuracy at the population level by the proportion of phenotypic variance they explain. To be useful for precision medicine applications, they also need to be evaluated at the individual level when phenotypes are not necessarily already known. We investigated the stability of PGS in European American (EUR) and African American (AFR)-ancestry individuals from the Philadelphia Neurodevelopmental Cohort and the Adolescent Brain Cognitive Development study using different discovery genome-wide association study (GWAS) results for post-traumatic stress disorder (PTSD), type 2 diabetes (T2D), and height. We found that pairs of EUR-ancestry GWAS for the same trait had genetic correlations >0.92. However, PGS calculated from pairs of same-ancestry and different-ancestry GWAS had correlations that ranged from <0.01 to 0.74. PGS stability was greater for height than for PTSD or T2D. A series of height GWAS in the UK Biobank suggested that correlation between PGS is strongly dependent on the extent of sample overlap between the discovery GWAS. Focusing on the upper end of the PGS distribution, different discovery GWAS do not consistently identify the same individuals in the upper quantiles, with the best case being 60% of individuals above the 80th percentile of PGS overlapping from one height GWAS to another. The degree of overlap decreases sharply as higher quantiles, less heritable traits, and different-ancestry GWAS are considered. PGS computed from different discovery GWAS have only modest correlation at the individual level, underscoring the need to proceed cautiously with integrating PGS into precision medicine applications.
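The two individual-level comparisons described in the abstract, the correlation between two sets of PGS and the overlap of individuals in the upper quantiles, can be illustrated with a minimal sketch. It is not the authors' code: the function name `top_quantile_overlap`, the simulated scores, and the choice to express overlap as a fraction of the first score's top group are all assumptions made for illustration.

```python
import numpy as np

def top_quantile_overlap(pgs_a, pgs_b, percentile=80.0):
    """Fraction of individuals at or above `percentile` of pgs_a who are
    also at or above the same percentile of pgs_b.

    Both arrays must list the same individuals in the same order.
    """
    pgs_a = np.asarray(pgs_a, dtype=float)
    pgs_b = np.asarray(pgs_b, dtype=float)
    top_a = pgs_a >= np.percentile(pgs_a, percentile)
    top_b = pgs_b >= np.percentile(pgs_b, percentile)
    return np.sum(top_a & top_b) / np.sum(top_a)

# Simulated data (an assumption, not study data): two scores that share a
# common signal plus independent noise, giving a correlation near 0.5.
rng = np.random.default_rng(0)
n = 5000
shared = rng.standard_normal(n)
score_1 = shared + rng.standard_normal(n)
score_2 = shared + rng.standard_normal(n)

r = np.corrcoef(score_1, score_2)[0, 1]
overlap_80 = top_quantile_overlap(score_1, score_2, 80)
overlap_95 = top_quantile_overlap(score_1, score_2, 95)
print(f"r = {r:.3f}, overlap at 80th = {overlap_80:.2f}, at 95th = {overlap_95:.2f}")
```

In this toy setting the overlap shrinks as the percentile threshold rises, which is the qualitative pattern the abstract reports for real PGS.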

Keywords: Adolescent Brain Cognitive Development study; African American; PRS-CS; PTSD; Philadelphia Neurodevelopmental Cohort; UK Biobank; ancestry; height; methods development; type 2 diabetes.


Conflict of interest statement

R.B. reports serving on the scientific board and owning stock in Taliaz Health, with no conflict of interest relevant to this work. The other authors declare no competing interests.

Figures

Figure 1
First and second principal components of cohort genotypes. Principal components (PCs) were computed and projected to a 1000 Genomes reference using KING (Manichaikul et al.52). Colors indicate inferred genetic ancestry for the (A) 9,206 Philadelphia Neurodevelopmental Cohort (PNC) and (B) 10,318 Adolescent Brain Cognitive Development (ABCD) genotyped samples.
Figure 2
Reproducibility of Bayesian posterior effects computed by PRS-CS. As illustrated for chromosome 3 (76,064 SNPs) and chromosome 21 (15,447 SNPs) using the Nievergelt et al. EUR PTSD discovery GWAS with the PNC EUR dataset, posterior effects were more strongly correlated between PRS-CS runs as the number of MCMC iterations (and burn-in iterations) increased.
Figure 3
Reproducibility of PGS across multiple runs of PRS-CS. PC-adjusted standardized PGS computed from posterior effects generated by two runs of PRS-CS using the same PTSD discovery GWAS from Nievergelt et al. had correlations greater than r = 0.999 for both the EUR (n = 5,239) and AFR (n = 3,260) cohorts of PNC.
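The caption refers to "PC-adjusted standardized PGS" without spelling out the procedure. One common way to obtain such scores is to regress the raw PGS on the ancestry PCs, keep the residuals, and z-score them. The sketch below assumes that approach; the function name, the number of PCs (10), and the simulated inputs are hypothetical and are not taken from the paper.

```python
import numpy as np

def pc_adjust_and_standardize(pgs, pcs):
    """Residualize raw PGS on ancestry PCs by ordinary least squares,
    then scale the residuals to mean 0 and standard deviation 1."""
    pgs = np.asarray(pgs, dtype=float)
    pcs = np.asarray(pcs, dtype=float)
    # Design matrix: an intercept column followed by the PCs.
    design = np.column_stack([np.ones(len(pgs)), pcs])
    coef, *_ = np.linalg.lstsq(design, pgs, rcond=None)
    residuals = pgs - design @ coef
    return (residuals - residuals.mean()) / residuals.std(ddof=1)

# Simulated inputs (assumptions): 1,000 individuals, 10 PCs, and a raw
# score that partly tracks the PCs.
rng = np.random.default_rng(42)
pcs = rng.standard_normal((1000, 10))
raw_pgs = pcs @ rng.standard_normal(10) + rng.standard_normal(1000)

adjusted = pc_adjust_and_standardize(raw_pgs, pcs)
print(adjusted.mean(), adjusted.std(ddof=1))
```

After adjustment the scores are uncorrelated with every PC included in the regression, so remaining differences between individuals are not explained by those axes of ancestry.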
Figure 4
Correlation between PGS computed from two different AFR-ancestry PTSD discovery GWAS for AFR-ancestry individuals. Significant positive correlations were observed between the AFR PGS computed from the PGC Freeze 1 and Freeze 2 AFR PTSD GWAS for both the PNC (r = 0.696, t(3,258) = 55.26, p < 2 × 10^−16) and ABCD (r = 0.657, t(1,739) = 36.34, p < 2 × 10^−16) AFR cohorts.
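The t statistics in this caption are consistent with the standard test of a Pearson correlation, t = r * sqrt(df / (1 - r^2)) with df = n - 2. The short check below uses only the rounded values printed in the caption; the function name `t_from_r` is an illustrative choice, and the paper's own software is not shown on this page.

```python
import math

def t_from_r(r, df):
    """t statistic for testing a Pearson correlation r against zero,
    with df = n - 2 degrees of freedom."""
    return r * math.sqrt(df / (1.0 - r * r))

# Values as printed in the caption (r is rounded to three decimals).
t_pnc = t_from_r(0.696, 3258)   # PNC AFR cohort, n = 3,260
t_abcd = t_from_r(0.657, 1739)  # ABCD AFR cohort

print(f"PNC: t = {t_pnc:.2f}, ABCD: t = {t_abcd:.2f}")
```

The recomputed values come out at about 55.3 and 36.3. The small gap from the reported 55.26 for PNC is what one would expect when r has been rounded before the calculation.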
Figure 5
Correlation between PGS computed from two different EUR-ancestry discovery GWAS for EUR-ancestry individuals. Pairs of PGS computed for the EUR samples of PNC (n = 5,239) and ABCD (n = 5,815) using two different EUR discovery GWAS for PTSD, T2D, and height all showed significant positive correlations.
Figure 6
Correlation between PGS computed from AFR-ancestry and EUR-ancestry discovery GWAS for AFR-ancestry individuals. Pairs of PGS computed for the AFR samples of PNC and ABCD from the newer EUR and AFR discovery GWAS were not significantly correlated for either PTSD or T2D, but there was a significant positive correlation for height.
Figure 7
Correlation between PGS computed from EUR-ancestry and AFR-ancestry discovery GWAS for EUR-ancestry individuals. Pairs of PGS computed for the EUR samples of PNC and ABCD from the newer EUR and AFR discovery GWAS were not significantly correlated for either PTSD or T2D, but there was a significant positive correlation for height.
Figure 8
Correlation between PGS computed from seven white British height GWAS for an independent test set of 8,107 unrelated white British individuals from the UK Biobank. GWAS A and GWAS B were each run for n = 134,000 non-overlapping, unrelated white British individuals using sex, age at height measurement, and the first 20 ancestry PCs as covariates. The GWAS A and GWAS B samples were combined to run GWAS AB (n = 268,000). GWAS C was run using a random subsample (n = 75,000) of the individuals included in GWAS A, and GWAS E was run using a random subsample (n = 10,000) of the individuals included in GWAS C. The same relationship exists between GWAS B, GWAS D (n = 75,000), and GWAS F (n = 10,000). The strength of the correlation between PGS is driven by both GWAS sample size and the degree of sample overlap between the GWAS. ∗∗∗p < 0.001.
Figure 9
Contributions of GWAS sample size and proportional sample overlap to the correlation between height PGS. Height GWAS A and GWAS B were each run for n = 134,000 non-overlapping, unrelated white British individuals using sex, age at height measurement, and the first 20 ancestry PCs as covariates. The GWAS A and GWAS B samples were combined to run GWAS AB (n = 268,000). GWAS C was run using a random subsample (n = 75,000) of the individuals included in GWAS A, and GWAS E was run using a random subsample (n = 10,000) of the individuals included in GWAS C. The same relationship exists between GWAS B, GWAS D (n = 75,000), and GWAS F (n = 10,000). Black dots correspond to the Pearson correlation coefficients for height PGS computed from pairs of discovery GWAS with no sample overlap. When the PGS were computed from overlapping discovery GWAS, the correlation coefficients are depicted using colored dots; the legend lists the number of samples in common as well as the proportion of samples in common for each color. Error bars denote 95% confidence intervals. PGS from pairs of discovery GWAS are more strongly correlated when there is a higher proportion of sample overlap between the GWAS.
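The nested subsampling design in this caption can be reproduced with placeholder sample IDs to see how many individuals each pair of GWAS shares. This is a sketch under assumptions: the integer IDs are synthetic, the random seed is arbitrary, and expressing the proportion as shared samples divided by the larger GWAS is only one possible definition, since the caption does not state which denominator the figure legend uses.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic IDs standing in for the 268,000 discovery individuals.
ids = rng.permutation(268_000)

gwas = {
    "A": ids[:134_000],   # first non-overlapping half
    "B": ids[134_000:],   # second non-overlapping half
    "AB": ids,            # A and B combined
}
# Nested random subsamples: C within A, E within C; D within B, F within D.
gwas["C"] = rng.choice(gwas["A"], 75_000, replace=False)
gwas["E"] = rng.choice(gwas["C"], 10_000, replace=False)
gwas["D"] = rng.choice(gwas["B"], 75_000, replace=False)
gwas["F"] = rng.choice(gwas["D"], 10_000, replace=False)

def shared_samples(x, y):
    """Return (number shared, shared / size of the larger GWAS).

    The denominator is an assumption made for illustration.
    """
    n_shared = np.intersect1d(gwas[x], gwas[y]).size
    return n_shared, n_shared / max(gwas[x].size, gwas[y].size)

for pair in [("A", "B"), ("A", "C"), ("C", "E"), ("AB", "A"), ("C", "D")]:
    print(pair, shared_samples(*pair))
```

Pairs drawn from opposite halves (for example C and D) share no individuals, which corresponds to the black dots in the figure, while nested pairs (for example A and C) share every individual in the smaller GWAS.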
Figure 10
Comparison of the samples comprising the top PGS quantiles for the PNC AFR cohort. (A) The samples located at the top 20%, 10%, and 5% of the PTSD PGS distribution were virtually the same when PGS were computed twice using the same discovery GWAS. For example, 644 out of the 652 samples (98.7%) at or above the 80th percentile were the same between the two batches of PGS. (B) The overlap between samples at all three quantiles dropped substantially when the PGS computed from the AFR PGC Freeze 1 PTSD discovery GWAS were compared with those computed from the AFR Freeze 2 PTSD discovery GWAS (Nievergelt et al.39), with the degree of overlap being reduced at higher quantiles. (C) The degree of overlap was further reduced when comparing PGS computed from an AFR-ancestry discovery GWAS to those computed from a EUR-ancestry GWAS for PTSD (Nievergelt et al.39), T2D, and height (Marouli et al.44). For context, the green bars depict the number of samples included at or above the 80th percentile (n = 652), 90th percentile (n = 326), and 95th percentile (n = 163). Additional results can be found in Tables S10 and S11.
Figure 11
Comparison of the samples comprising the top PGS quantiles for the PNC EUR cohort. (A) The EUR samples located within the top 20%, 10%, and 5% of the PTSD PGS distribution were nearly the same when PGS were computed twice using the same EUR discovery GWAS (Nievergelt et al.39). For example, 1,026 out of the 1,048 samples (97.9%) at or above the 80th percentile were the same between the two runs of PRS-CS. (B) The overlap between samples at all three quantiles dropped substantially when the PGS computed from two different EUR discovery GWAS were compared for PTSD, T2D, and height. (C) The degree of overlap was dramatically reduced when comparing PGS computed from an AFR-ancestry discovery GWAS with those computed from a EUR-ancestry GWAS for PTSD (Nievergelt et al.39), T2D, and height. Green bars depict the number of samples included at or above the 80th percentile (n = 1,048), 90th percentile (n = 524), and 95th percentile (n = 262). Additional results can be found in Tables S12 and S13.


