HGG Adv. 2022 Jan 21;3(2):100091.

doi: 10.1016/j.xhgg.2022.100091. eCollection 2022 Apr 14.

Stability of polygenic scores across discovery genome-wide association studies

Laura M Schultz^{1,2}, Alison K Merikangas^{1,2,3}, Kosha Ruparel^{2,4}, Sébastien Jacquemont^{5,6}, David C Glahn^{7,8}, Raquel E Gur^{2,4,9}, Ran Barzilay^{2,4,9}, Laura Almasy^{1,2,3}

Affiliations

¹ Department of Biomedical and Health Informatics, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.
² Lifespan Brain Institute, Children's Hospital of Philadelphia and Penn Medicine, Philadelphia, PA 19104, USA.
³ Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
⁴ Department of Psychiatry, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.
⁵ UHC Sainte-Justine Research Center, Université de Montréal, Montréal, QC H3T 1C5, Canada.
⁶ Department of Pediatrics, Université de Montréal, Montréal, QC H3T 1C5, Canada.
⁷ Tommy Fuss Center for Neuropsychiatric Disease Research, Boston Children's Hospital, Boston, MA, USA.
⁸ Department of Psychiatry, Harvard Medical School, Boston, MA, USA.
⁹ Department of Child and Adolescent Psychiatry and Behavioral Sciences, Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA.

PMID: 35199043
PMCID: PMC8841810
DOI: 10.1016/j.xhgg.2022.100091

Laura M Schultz et al. HGG Adv. 2022.

Abstract

Polygenic scores (PGS) are commonly evaluated in terms of their predictive accuracy at the population level by the proportion of phenotypic variance they explain. To be useful for precision medicine applications, they also need to be evaluated at the individual level when phenotypes are not necessarily already known. We investigated the stability of PGS in European American (EUR) and African American (AFR)-ancestry individuals from the Philadelphia Neurodevelopmental Cohort and the Adolescent Brain Cognitive Development study using different discovery genome-wide association study (GWAS) results for post-traumatic stress disorder (PTSD), type 2 diabetes (T2D), and height. We found that pairs of EUR-ancestry GWAS for the same trait had genetic correlations >0.92. However, PGS calculated from pairs of same-ancestry and different-ancestry GWAS had correlations that ranged from <0.01 to 0.74. PGS stability was greater for height than for PTSD or T2D. A series of height GWAS in the UK Biobank suggested that correlation between PGS is strongly dependent on the extent of sample overlap between the discovery GWAS. Focusing on the upper end of the PGS distribution, different discovery GWAS do not consistently identify the same individuals in the upper quantiles, with the best case being 60% of individuals above the 80th percentile of PGS overlapping from one height GWAS to another. The degree of overlap decreases sharply as higher quantiles, less heritable traits, and different-ancestry GWAS are considered. PGS computed from different discovery GWAS have only modest correlation at the individual level, underscoring the need to proceed cautiously with integrating PGS into precision medicine applications.

Keywords: Adolescent Brain Cognitive Development study; African American; PRS-CS; PTSD; Philadelphia Neurodevelopmental Cohort; UK Biobank; ancestry; height; methods development; type 2 diabetes.

Conflict of interest statement

R.B. reports serving on the scientific board and owning stock in Taliaz Health, with no conflict of interest relevant to this work. The other authors declare no competing interests.

Figures

**Figure 1**
First and second principal components of cohort genotypes. Principal components (PCs) were computed and projected to a 1000 Genomes reference using KING (Manichaikul et al.⁵²). Colors indicate inferred genetic ancestry for the (A) 9,206 Philadelphia Neurodevelopmental Cohort (PNC) and (B) 10,318 Adolescent Brain Cognitive Development (ABCD) genotyped samples.

**Figure 2**
Reproducibility of Bayesian posterior effects computed by PRS-CS. As illustrated for chromosome 3 (76,064 SNPs) and chromosome 21 (15,447 SNPs) using the Nievergelt et al. EUR PTSD discovery GWAS with the PNC EUR dataset, posterior effects were more strongly correlated between PRS-CS runs as the number of MCMC iterations (and burn-in iterations) increased.

**Figure 3**
Reproducibility of PGS across multiple runs of PRS-CS. PC-adjusted standardized PGS computed from posterior effects generated by two runs of PRS-CS using the same PTSD discovery GWAS from Nievergelt et al. had correlations greater than r = 0.999 for both the EUR (n = 5,239) and AFR (n = 3,260) cohorts of PNC.
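As a minimal sketch of what "PC-adjusted standardized PGS" could mean in practice, the Python below regresses raw scores on ancestry PCs, z-scores the residuals, and correlates two runs. The function name, the use of 10 PCs, and the simulated data are illustrative assumptions; the caption does not specify the authors' exact procedure.

```python
import numpy as np

def pc_adjust_and_standardize(pgs, pcs):
    """Regress raw PGS on ancestry PCs, then z-score the residuals.

    Hypothetical helper: the caption only says the PGS were
    PC-adjusted and standardized, not how.
    """
    pgs = np.asarray(pgs, dtype=float)
    pcs = np.asarray(pcs, dtype=float)
    # Design matrix: intercept column plus one column per PC.
    design = np.column_stack([np.ones(len(pgs)), pcs])
    # Ordinary least squares fit of PGS on the PCs.
    coef, *_ = np.linalg.lstsq(design, pgs, rcond=None)
    residuals = pgs - design @ coef
    # Standardize residuals to mean 0 and sample SD 1.
    return (residuals - residuals.mean()) / residuals.std(ddof=1)

# Simulated stand-in data (not from the study): two noisy runs of the
# same underlying score, both confounded by the first PC.
rng = np.random.default_rng(0)
n = 1000
pcs = rng.normal(size=(n, 10))  # 10 PCs is an assumption
signal = rng.normal(size=n)
run1 = 0.5 * pcs[:, 0] + signal + rng.normal(scale=0.01, size=n)
run2 = 0.5 * pcs[:, 0] + signal + rng.normal(scale=0.01, size=n)

adj1 = pc_adjust_and_standardize(run1, pcs)
adj2 = pc_adjust_and_standardize(run2, pcs)
r = np.corrcoef(adj1, adj2)[0, 1]
print(f"run-to-run correlation: r = {r:.4f}")
```

Residualizing before standardizing ensures that the comparison between runs reflects the scores themselves rather than shared ancestry structure.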

**Figure 4**
Correlation between PGS computed from two different AFR-ancestry PTSD discovery GWAS for AFR-ancestry individuals. Significant positive correlations were observed between the AFR PGS computed from the PGC Freeze 1 and Freeze 2 AFR PTSD GWAS for both the PNC (r = 0.696, t(3,258) = 55.26, p < 2 × 10⁻¹⁶) and ABCD (r = 0.657, t(1,739) = 36.34, p < 2 × 10⁻¹⁶) AFR cohorts.
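The reported t statistics can be roughly recovered from r and the sample size using the standard test for a Pearson correlation, t = r·sqrt(n − 2)/sqrt(1 − r²). The sketch below is an illustration, not the authors' code; the ABCD sample size of 1,741 is inferred from the reported degrees of freedom (df = n − 2), and small differences from the published values are expected because r is rounded to three decimals.

```python
import math

def correlation_t_statistic(r, n):
    """Return (t, df) for testing a Pearson correlation against zero."""
    df = n - 2
    t = r * math.sqrt(df) / math.sqrt(1.0 - r * r)
    return t, df

# PNC AFR cohort: n = 3,260 (given in the Figure 3 caption).
t_pnc, df_pnc = correlation_t_statistic(0.696, 3260)
# ABCD AFR cohort: n = 1,741 is inferred from df = 1,739.
t_abcd, df_abcd = correlation_t_statistic(0.657, 1741)

print(f"PNC:  t({df_pnc}) = {t_pnc:.2f}")
print(f"ABCD: t({df_abcd}) = {t_abcd:.2f}")
```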

**Figure 5**
Correlation between PGS computed from two different EUR-ancestry discovery GWAS for EUR-ancestry individuals. Pairs of PGS computed for the EUR samples of PNC (n = 5,239) and ABCD (n = 5,815) using two different EUR discovery GWAS for PTSD, T2D, and height all showed significant positive correlations.

**Figure 6**
Correlation between PGS computed from AFR-ancestry and EUR-ancestry discovery GWAS for AFR-ancestry individuals. Pairs of PGS computed for the AFR samples of PNC and ABCD from the newer EUR and AFR discovery GWAS were not significantly correlated for either PTSD or T2D, but there was a significant positive correlation for height.

**Figure 7**
Correlation between PGS computed from EUR-ancestry and AFR-ancestry discovery GWAS for EUR-ancestry individuals. Pairs of PGS computed for the EUR samples of PNC and ABCD from the newer EUR and AFR discovery GWAS were not significantly correlated for either PTSD or T2D, but there was a significant positive correlation for height.

**Figure 8**
Correlation between PGS computed from seven white British height GWAS for an independent test set of 8,107 unrelated white British individuals from the UK Biobank. GWAS A and GWAS B were each run for n = 134,000 non-overlapping, unrelated white British individuals using sex, age at height measurement, and the first 20 ancestry PCs as covariates. The GWAS A and GWAS B samples were combined to run GWAS AB (n = 268,000). GWAS C was run using a random subsample (n = 75,000) of the individuals included in GWAS A, and GWAS E was run using a random subsample (n = 10,000) of the individuals included in GWAS C. The same relationship exists between GWAS B, GWAS D (n = 75,000), and GWAS F (n = 10,000). The strength of the correlation between PGS is driven by both GWAS sample size and the degree of sample overlap between the GWAS. ∗∗∗p < 0.001.

**Figure 9**
Contributions of GWAS sample size and proportional sample overlap to the correlation between height PGS. Height GWAS A and GWAS B were each run for n = 134,000 non-overlapping, unrelated white British individuals using sex, age at height measurement, and the first 20 ancestry PCs as covariates. The GWAS A and GWAS B samples were combined to run GWAS AB (n = 268,000). GWAS C was run using a random subsample (n = 75,000) of the individuals included in GWAS A, and GWAS E was run using a random subsample (n = 10,000) of the individuals included in GWAS C. The same relationship exists between GWAS B, GWAS D (n = 75,000), and GWAS F (n = 10,000). Black dots correspond to the Pearson correlation coefficients for height PGS computed from pairs of discovery GWAS with no sample overlap. When the PGS were computed from overlapping discovery GWAS, the correlation coefficients are depicted using colored dots; the legend lists the number of samples in common as well as the proportion of samples in common for each color. Error bars denote 95% confidence intervals. PGS from pairs of discovery GWAS are more strongly correlated when there is a higher proportion of sample overlap between the GWAS.
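The nested sampling design described in this caption can be sketched as follows. This is an illustration under stated assumptions: participant IDs are hypothetical integer indices, the random seed is arbitrary, and only the count of shared samples is computed because the caption does not state which denominator was used for the proportion of samples in common.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical participant indices standing in for 268,000 individuals.
ids = rng.permutation(268_000)

gwas = {}
gwas["A"] = ids[:134_000]           # first non-overlapping half
gwas["B"] = ids[134_000:]           # second non-overlapping half
gwas["AB"] = ids                    # A and B combined
# Nested random subsamples: C within A, E within C.
gwas["C"] = rng.choice(gwas["A"], size=75_000, replace=False)
gwas["E"] = rng.choice(gwas["C"], size=10_000, replace=False)
# Same structure on the other side: D within B, F within D.
gwas["D"] = rng.choice(gwas["B"], size=75_000, replace=False)
gwas["F"] = rng.choice(gwas["D"], size=10_000, replace=False)

def shared_samples(x, y):
    """Number of individuals present in both discovery samples."""
    return np.intersect1d(gwas[x], gwas[y]).size

for pair in [("A", "B"), ("A", "C"), ("C", "E"), ("AB", "A"), ("E", "F")]:
    print(pair, shared_samples(*pair))
```

Pairs drawn from opposite halves (for example C and D, or E and F) share no individuals, which corresponds to the black dots; pairs along the same nesting chain share every individual of the smaller sample.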

**Figure 10**
Comparison of the samples comprising the top PGS quantiles for the PNC AFR cohort. (A) The samples located at the top 20%, 10%, and 5% of the PTSD PGS distribution were virtually the same when PGS were computed twice using the same discovery GWAS. For example, 644 out of the 652 samples (98.7%) at or above the 80th percentile were the same between the two batches of PGS. (B) The overlap between samples at all three quantiles dropped substantially when the PGS computed from the AFR PGC Freeze 1 PTSD discovery GWAS were compared with those computed from the AFR Freeze 2 PTSD discovery GWAS (Nievergelt et al.³⁹), with the degree of overlap being reduced at higher quantiles. (C) The degree of overlap was further reduced when comparing PGS computed from an AFR-ancestry discovery GWAS to those computed from a EUR-ancestry GWAS for PTSD (Nievergelt et al.³⁹), T2D, and height (Marouli et al.⁴⁴). For context, the green bars depict the number of samples included at or above the 80th percentile (n = 652), 90th percentile (n = 326), and 95th percentile (n = 163). Additional results can be found in Tables S10 and S11.

**Figure 11**
Comparison of the samples comprising the top PGS quantiles for the PNC EUR cohort. (A) The EUR samples located within the top 20%, 10%, and 5% of the PTSD PGS distribution were nearly the same when PGS were computed twice using the same EUR discovery GWAS (Nievergelt et al.³⁹). For example, 1,026 out of the 1,048 samples (97.9%) at or above the 80th percentile were the same between the two runs of PRS-CS. (B) The overlap between samples at all three quantiles dropped substantially when the PGS computed from two different EUR discovery GWAS were compared for PTSD, T2D, and height. (C) The degree of overlap was dramatically reduced when comparing PGS computed from an AFR-ancestry discovery GWAS with those computed from a EUR-ancestry GWAS for PTSD (Nievergelt et al.³⁹), T2D, and height. Green bars depict the number of samples included at or above the 80th percentile (n = 1,048), 90th percentile (n = 524), and 95th percentile (n = 262). Additional results can be found in Tables S12 and S13.
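The top-quantile comparison in Figures 10 and 11 can be sketched as a simple set overlap: flag the individuals at or above a percentile cutoff under each PGS, then count how many are flagged by both. The function name and the simulated scores below are assumptions for illustration; only the cohort size (n = 5,239) and the correlation of 0.74 (the upper end of the range reported in the abstract) are taken from the text.

```python
import numpy as np

def top_quantile_overlap(pgs_a, pgs_b, percentile):
    """Count individuals at or above `percentile` in both score vectors.

    Returns (n_shared, n_top, fraction), where n_top is the number of
    individuals at or above the cutoff in pgs_a.
    """
    pgs_a = np.asarray(pgs_a, dtype=float)
    pgs_b = np.asarray(pgs_b, dtype=float)
    top_a = pgs_a >= np.percentile(pgs_a, percentile)
    top_b = pgs_b >= np.percentile(pgs_b, percentile)
    n_top = int(top_a.sum())
    n_shared = int((top_a & top_b).sum())
    return n_shared, n_top, n_shared / n_top

# Simulated scores (not study data): two standard-normal PGS vectors
# with an expected correlation of 0.74.
rng = np.random.default_rng(1)
n, rho = 5239, 0.74
pgs1 = rng.normal(size=n)
pgs2 = rho * pgs1 + np.sqrt(1 - rho**2) * rng.normal(size=n)

for pct in (80, 90, 95):
    shared, top, frac = top_quantile_overlap(pgs1, pgs2, pct)
    print(f"{pct}th percentile: {shared}/{top} shared ({frac:.1%})")
```

With continuous scores and no ties, the cutoffs for n = 5,239 select 1,048, 524, and 262 individuals, matching the green-bar counts in the Figure 11 caption.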

Cited by

Genetic predisposition for negative affect predicts mental health burden during the COVID-19 pandemic.
Schowe AM, Godara M, Czamara D, Adli M, Singer T, Binder EB. Eur Arch Psychiatry Clin Neurosci. 2025 Feb;275(1):61-73. doi: 10.1007/s00406-024-01795-y. Epub 2024 Apr 8. PMID: 38587666.
Considerations for the application of polygenic scores to clinical care of individuals with substance use disorders.
Kember RL, Davis CN, Feuer KL, Kranzler HR. J Clin Invest. 2024 Oct 15;134(20):e172882. doi: 10.1172/JCI172882. PMID: 39403926. Review.
Impact of Copy Number Variants and Polygenic Risk Scores on Psychopathology in the UK Biobank.
Mollon J, Schultz LM, Huguet G, Knowles EEM, Mathias SR, Rodrigue A, Alexander-Bloch A, Saci Z, Jean-Louis M, Kumar K, Douard E, Almasy L, Jacquemont S, Glahn DC. Biol Psychiatry. 2023 Oct 1;94(7):591-600. doi: 10.1016/j.biopsych.2023.01.028. Epub 2023 Feb 9. PMID: 36764568.
Gene-based polygenic risk scores analysis of alcohol use disorder in African Americans.
Lai D, Schwantes-An TH, Abreu M, Chan G, Hesselbrock V, Kamarajan C, Liu Y, Meyers JL, Nurnberger JI Jr, Plawecki MH, Wetherill L, Schuckit M, Zhang P, Edenberg HJ, Porjesz B, Agrawal A, Foroud T. Transl Psychiatry. 2022 Jul 5;12(1):266. doi: 10.1038/s41398-022-02029-2. PMID: 35790736.
Impact of individual level uncertainty of lung cancer polygenic risk score (PRS) on risk stratification.
Wang X, Zhang Z, Ding Y, Chen T, Mucci L, Albanes D, Landi MT, Caporaso NE, Lam S, Tardon A, Chen C, Bojesen SE, Johansson M, Risch A, Bickeböller H, Wichmann HE, Rennert G, Arnold S, Brennan P, McKay JD, Field JK, Shete SS, Le Marchand L, Liu G, Andrew AS, Kiemeney LA, Zienolddiny-Narui S, Behndig A, Johansson M, Cox A, Lazarus P, Schabath MB, Aldrich MC, Hung RJ, Amos CI, Lin X, Christiani DC. Genome Med. 2024 Feb 5;16(1):22. doi: 10.1186/s13073-024-01298-4. PMID: 38317189.

References

1. Ma Y., Zhou X. Genetic prediction of complex traits with polygenic scores: a statistical review. Trends Genet. 2021;37:995–1011.
2. Vilhjálmsson B.J., Yang J., Finucane H.K., Gusev A., Lindström S., Ripke S., Genovese G., Loh P.-R., Bhatia G., Do R., et al. Modeling linkage disequilibrium increases accuracy of polygenic risk scores. Am. J. Hum. Genet. 2015;97:576–592.
3. Lloyd-Jones L.R., Zeng J., Sidorenko J., Yengo L., Moser G., Kemper K.E., Wang H., Zheng Z., Magi R., Esko T., et al. Improved polygenic prediction by Bayesian multiple regression on summary statistics. Nat. Commun. 2019;10:5086.
4. Ge T., Chen C.-Y., Ni Y., Feng Y.-C.A., Smoller J.W. Polygenic prediction via Bayesian regression and continuous shrinkage priors. Nat. Commun. 2019;10:1776.
5. Ni G., Zeng J., Revez J.A., Wang Y., Zheng Z., Ge T., et al. A comparison of ten polygenic score methods for psychiatric disorders applied across multiple cohorts. Biol. Psychiatry. 2021;90:611–620.
