Biol Psychiatry. 2021 Nov 1;90(9):611-620.
doi: 10.1016/j.biopsych.2021.04.018. Epub 2021 May 4.

A Comparison of Ten Polygenic Score Methods for Psychiatric Disorders Applied Across Multiple Cohorts

Guiyan Ni et al. Biol Psychiatry.

Abstract

Background: Polygenic scores (PGSs), which assess the genetic risk of individuals for a disease, are calculated as a weighted count of risk alleles identified in genome-wide association studies. PGS methods differ in which DNA variants are included and the weights assigned to them; some require an independent tuning sample to help inform these choices. PGSs are evaluated in independent target cohorts with known disease status. Variability between target cohorts is observed in applications to real data sets, which could reflect a number of factors, e.g., phenotype definition or technical factors.
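The "weighted count of risk alleles" described above can be illustrated with a minimal sketch. The function name, the variant weights, and the genotypes below are hypothetical toy values, not data or code from the study; in practice the weights would come from GWAS summary statistics processed by one of the PGS methods.

```python
def polygenic_score(dosages, weights):
    """Weighted count of risk alleles for one individual.

    dosages: risk-allele counts per variant (0, 1 or 2)
    weights: per-variant weights, e.g. log odds ratios from a GWAS
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must have the same length")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical toy data: three variants, two individuals.
weights = [0.10, 0.05, 0.20]
person_a = [2, 0, 1]  # carries three risk alleles in total
person_b = [0, 1, 0]  # carries one risk allele

score_a = polygenic_score(person_a, weights)  # 2*0.10 + 0*0.05 + 1*0.20
score_b = polygenic_score(person_b, weights)  # 1*0.05
```

PGS methods differ only in which variants enter `weights` and what values they take; the scoring step itself stays the same.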

Methods: The Psychiatric Genomics Consortium Working Groups for schizophrenia and major depressive disorder bring together many independently collected case-control cohorts. We used these resources (31,328 schizophrenia cases, 41,191 controls; 248,750 major depressive disorder cases, 563,184 controls) in repeated application of leave-one-cohort-out meta-analyses, each used to calculate and evaluate PGS in the left-out (target) cohort. Ten PGS methods (the baseline PC+T method and 9 methods that model genetic architecture more formally: SBLUP, LDpred2-Inf, LDpred-funct, LDpred2, Lassosum, PRS-CS, PRS-CS-auto, SBayesR, MegaPRS) were compared.
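The leave-one-cohort-out design can be sketched as follows. This is an illustration under stated assumptions, not the consortium's pipeline: it assumes an inverse-variance weighted fixed-effect meta-analysis of a single variant, the cohort names and estimates are hypothetical, and the additional exclusion of a tuning cohort is omitted for brevity.

```python
import math


def fixed_effect_meta(estimates):
    """Inverse-variance weighted meta-analysis of (beta, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    return beta, math.sqrt(1.0 / total)


def leave_one_cohort_out(cohort_stats):
    """Yield (target, (beta, se)); each meta-analysis excludes the target cohort."""
    for target in cohort_stats:
        rest = [stats for name, stats in cohort_stats.items() if name != target]
        yield target, fixed_effect_meta(rest)


# Hypothetical per-cohort (beta, standard error) for one variant.
cohort_stats = {
    "cohort_a": (0.10, 0.05),
    "cohort_b": (0.20, 0.05),
    "cohort_c": (0.30, 0.10),
}

# One discovery estimate per left-out (target) cohort.
loco = dict(leave_one_cohort_out(cohort_stats))
```

Each entry of `loco` holds discovery estimates that never saw the target cohort, so a PGS built from them can be evaluated in that cohort without overlap between discovery and target samples.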

Results: Compared with PC+T, the other 9 methods gave higher prediction statistics, MegaPRS, LDpred2, and SBayesR significantly so, explaining up to 9.2% of the variance in liability for schizophrenia across 30 target cohorts, an increase of 44%. For major depressive disorder across 26 target cohorts, these statistics were 3.5% and 59%, respectively.

Conclusions: Although the methods that more formally model genetic architecture have similar performance, MegaPRS, LDpred2, and SBayesR rank highest in most comparisons and are recommended in applications to psychiatric disorders.

Keywords: LDpred2; Lassosum; Major depressive disorder; MegaPRS; PRS-CS; Polygenic scores; Psychiatric disorders; Risk prediction; SBayesR; Schizophrenia.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1. Prediction results for SCZ case/control status using different PGS methods.
The PGSs were constructed from SCZ GWAS summary statistics excluding the target cohort and a tuning cohort (shading legend). Each bar reflects the median across 30 target cohorts; the whiskers show the 95% confidence interval for comparing medians. (A) The area under the curve (AUC) statistic, which can be interpreted as the probability that a case ranks higher than a control. (B) The proportion of variance explained by PGS on the liability scale, assuming a population lifetime risk of 1%. (C) The odds ratio for being a case, comparing the top 10% with the bottom 10% of PGS. (D) The odds of being a case in the top 10% of PGS relative to the odds of being a case in the middle of the PGS distribution, where the middle was calculated as the average of the odds ratios of the top 10% ranked on PGS relative to the 5th decile and to the 6th decile. PC+T (also known as P+T), the benchmark method, is shown in orange. Pink shows the methods that use an infinitesimal model assumption. Green shows the methods that model the genetic architecture: light green for methods that use a tuning cohort to determine the genetic architecture of a trait, and dark green for methods that learn the genetic architecture from the discovery sample without a tuning cohort. Dark orange shows MegaPRS using the BLD-LDAK model, which assumes that the distribution of a SNP's effect depends on its allele frequency, LD, and functional annotation. MegaPRS assigns four priors to each SNP: LASSO, Bridge, BOLT-LMM, and BayesR. Each prior has different hyperparameters, which are identified using the tuning cohort. The dashed grey lines are the maximum of the average across the four tuning cohorts. The sample sizes of the tuning cohorts are swe6: 1094 cases, 1219 controls; lie2: 137 cases, 269 controls; msaf: 327 cases, 139 controls; gras: 1086 cases, 1232 controls.
Figure 2. Sensitivity analyses using different tuning cohorts comparing different PGS methods.
Differences in the AUC for SCZ of each PGS method when using different tuning cohorts. The different bars for each method (x-axis) refer to different validation cohorts ordered by sample size. The y-axis is the AUC difference when using an alternative tuning cohort (i.e., lie2 (137 cases, 269 controls), msaf (327 cases, 139 controls), or gras (1086 cases, 1232 controls)) compared with ‘swe6’ (1094 cases, 1219 controls). The MAF QC threshold is 0.1. Note: SBLUP, LDpred2-Inf, LDpred-funct, PRS-CS-auto, and SBayesR do not need a tuning cohort but serve as a benchmark for the other methods, which do. Results for these methods still differ when a different tuning cohort is left out because the discovery GWAS also changes.
Figure 3. Sensitivity analyses using different MAF quality control thresholds.
Differences in the AUC for SCZ of each PGS method when using different MAF QC thresholds. The different bars for each method (x-axis) refer to different validation cohorts ordered by sample size. The y-axis is the AUC difference between analyses using (A) MAF < 0.05 and MAF < 0.1, and (B) MAF < 0.01 and MAF < 0.1 as the QC threshold. The tuning cohort is ‘swe6’.
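Two of the evaluation statistics named in the Figure 1 caption can be sketched in a few lines. This is a simplified illustration, not the study's evaluation code: the AUC is computed directly from its pairwise-ranking interpretation, the odds ratio is taken from raw counts without covariate adjustment (an assumption), and all scores and labels are hypothetical. The toy example uses the top and bottom 25% rather than 10% only because it has 20 individuals.

```python
def auc(case_scores, control_scores):
    """Probability that a random case outranks a random control (ties count 0.5)."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


def top_vs_bottom_odds_ratio(scores, is_case, fraction=0.1):
    """Odds of being a case in the top fraction of PGS vs the bottom fraction."""
    ranked = sorted(zip(scores, is_case))
    n = max(1, int(len(ranked) * fraction))

    def odds(group):
        cases = sum(1 for _, y in group if y)
        controls = len(group) - cases
        return cases / controls  # fails if a group contains no controls

    # Also fails if the bottom group contains no cases (odds of zero).
    return odds(ranked[-n:]) / odds(ranked[:n])


# Hypothetical toy data for the AUC.
example_auc = auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2])  # 8 of 9 pairs favour the case

# Hypothetical toy data for the odds ratio: 20 individuals with scores 1..20.
scores = list(range(1, 21))
case_scores_set = {3, 8, 12, 16, 17, 19, 20}
is_case = [s in case_scores_set for s in scores]
example_or = top_vs_bottom_odds_ratio(scores, is_case, fraction=0.25)  # (4/1) / (1/4) = 16.0
```

The pairwise AUC loop is quadratic in sample size, which is fine for illustration; rank-based formulas give the same value more efficiently for large cohorts.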
