Nat Commun. 2023 Aug 5;14(1):4702.
doi: 10.1038/s41467-023-40330-w.

Multi-PGS enhances polygenic prediction by combining 937 polygenic scores

Clara Albiñana et al. Nat Commun.

Abstract

The predictive performance of polygenic scores (PGS) is largely dependent on the number of samples available to train the PGS. Increasing the sample size for a specific phenotype is expensive and takes time, but this sample size can be effectively increased by using genetically correlated phenotypes. We propose a framework to generate multi-PGS from thousands of publicly available genome-wide association studies (GWAS) with no need to individually select the most relevant ones. In this study, the multi-PGS framework increases prediction accuracy over single PGS for all included psychiatric disorders and other available outcomes, with prediction $R^2$ increases of up to 9-fold for attention-deficit/hyperactivity disorder compared to a single PGS. We also generate multi-PGS for phenotypes without an existing GWAS and for case-case predictions. We benchmark the multi-PGS framework against other methods and highlight its potential application to new emerging biobanks.

Conflict of interest statement

C.M.B. reports: Lundbeckfonden (grant recipient); Pearson (author, royalty recipient); Equip Health Inc. (Stakeholder Advisory Board). B.M.N. is a member of the scientific advisory board at Deep Genomics and Neumora, consultant of the scientific advisory board for Camp4 Therapeutics and consultant for Merck. B.J.V. is on Allelica’s international advisory board. The remaining authors declare no competing interests.

Figures

Fig. 1. Overview of the multi-PGS framework.
The framework consists of three sequential steps. Step 1) Build PGS library: construct an agnostic library of single-GWAS PGS from publicly available GWAS summary statistics resources. Step 2) Train multi-PGS models: train multi-PGS models with fivefold cross-validation, using the PGS library from Step 1 and the target outcome. Step 3) Benchmark the resulting multi-PGS models from Step 2 in terms of prediction accuracy and risk stratification.
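The fivefold cross-validation in Step 2 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the helper names (`kfold_indices`, `cross_validate`), the random fold assignment, and the pluggable `fit`/`predict` callables are all assumptions made for the example. A trivial mean predictor stands in for the lasso or XGBoost models named in the captions.

```python
import random


def kfold_indices(n, k=5, seed=0):
    """Shuffle sample indices and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validate(X, y, fit, predict, k=5, seed=0):
    """Return out-of-fold predictions: each sample is predicted by a
    model trained only on the other k-1 folds."""
    folds = kfold_indices(len(y), k, seed)
    preds = [None] * len(y)
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        for i in test_idx:
            preds[i] = predict(model, X[i])
    return preds


# Hypothetical placeholder model: predicts the training-set mean outcome.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_mean(model, x):
    return model


y_demo = [float(v) for v in range(10)]
X_demo = [[v] for v in y_demo]  # one hypothetical PGS column per sample
oof_preds = cross_validate(X_demo, y_demo, fit_mean, predict_mean, k=5, seed=1)
```

Keeping the model behind `fit` and `predict` makes the point that the held-out fold never contributes to the weights used to score it, whichever learner is plugged in.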
Fig. 2. Performance of the different risk scores including covariates.
Comparison between the per-disorder attention-deficit/hyperactivity disorder (ADHD), affective disorder (AFF), anorexia nervosa (AN), autism spectrum disorder (ASD), bipolar disorder (BD) and schizophrenia (SCZ) single GWAS PGS (specific details in SD2) and the multi-PGS trained with 937 PGS in terms of (A) liability-adjusted $R^2$ and (B) log odds ratios of the top risk score quintile compared to the middle risk score quintiles. All models included sex, age and the first 20 PCs as covariates for training and calculating the risk score on the test set in a fivefold cross-validation scheme. The MultiPGS_lasso and MultiPGS_xgboost were trained with lasso regression and XGBoost respectively, using the 937 PGS and the covariates as explanatory variables. The MultiPGS_lassoPGS_xgboostCOV was generated with lasso regression, combining the 937 PGS and the predicted values of an XGBoost model that included only the covariates. 95% confidence intervals were calculated from 10,000 bootstrap samples of the mean adjusted $R^2$ or logOR, where the adjusted $R^2$ was the variance explained by the full model after accounting for the variance explained by a logistic regression covariates-only model as $R^2_{\mathrm{adjusted}} = (R^2_{\mathrm{full}} - R^2_{\mathrm{cov}})/(1 - R^2_{\mathrm{cov}})$. Prevalences used for the liability are shown beneath each disorder label and case-control ratios are available in SD2. All association logOR for all quintiles are available in SF6.
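The adjusted R2 defined in this caption, together with a percentile bootstrap interval for a mean, can be sketched as below. The caption does not say which units were resampled, so resampling a list of per-fold estimates is an assumption of this example, as are the function names and the numeric inputs, which are invented for illustration.

```python
import random
import statistics


def adjusted_r2(r2_full, r2_cov):
    """Variance explained beyond covariates: (R2_full - R2_cov) / (1 - R2_cov)."""
    if r2_cov >= 1.0:
        raise ValueError("r2_cov must be below 1")
    return (r2_full - r2_cov) / (1.0 - r2_cov)


def bootstrap_mean_ci(values, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the mean of `values`.

    Assumption: `values` are the units being resampled with replacement
    (here, hypothetical per-fold adjusted R2 estimates).
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lower = means[int((alpha / 2) * n_boot)]
    upper = means[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented example values, not results from the paper.
fold_estimates = [
    adjusted_r2(full, cov)
    for full, cov in [(0.10, 0.04), (0.11, 0.04), (0.09, 0.05),
                      (0.12, 0.05), (0.10, 0.03)]
]
ci_low, ci_high = bootstrap_mean_ci(fold_estimates, n_boot=2000, seed=42)
```

The denominator rescales the gain by the variance the covariates leave unexplained, so a model that adds nothing beyond sex, age and PCs scores exactly zero.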
Fig. 3. Comparison between single-phenotype and multi-phenotype PGS (multi-PGS and wMT-SBLUP).
Mean liability-adjusted $R^2$ estimates for attention-deficit/hyperactivity disorder (ADHD), affective disorder (AFF), anorexia nervosa (AN), autism spectrum disorder (ASD), bipolar disorder (BD) and schizophrenia (SCZ), obtained with multi-phenotype predictors (colored bars: multiPGS_lasso, multiPGS_lasso_s, wMT-SBLUP) or single-phenotype PGS (grayscale bars: single LDpred2-auto PGS). The 5 single-phenotype PGS shown were selected based on the top-ranking absolute lasso weights. The adjusted $R^2$ estimates are the mean of the fivefold cross-validation training-testing subsets. CIs were calculated from 10,000 bootstrap samples of the mean. The numbers inside each multi-phenotype predictor correspond to the number of PGS included in each model. Both the simplified multi-PGS (multiPGS_lasso_s) and wMT-SBLUP predictors were calculated by keeping the top PGS with absolute lasso weights >0.01 from the full multi-PGS, including the top 5 shown in the figure.
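The selection rule in this caption (keep PGS whose absolute lasso weight exceeds 0.01, then rank by absolute weight) can be illustrated with a short sketch. The PGS names and weights below are hypothetical, and `select_top_pgs` is a name introduced only for this example.

```python
def select_top_pgs(lasso_weights, threshold=0.01):
    """Keep PGS with |weight| > threshold, ordered by decreasing |weight|."""
    kept = {name: w for name, w in lasso_weights.items() if abs(w) > threshold}
    return dict(sorted(kept.items(), key=lambda kv: abs(kv[1]), reverse=True))


# Hypothetical lasso weights from a fitted multi-PGS model.
example_weights = {
    "pgs_a": 0.120,
    "pgs_b": -0.045,
    "pgs_c": 0.010,   # exactly at the threshold, so it is dropped
    "pgs_d": 0.002,
    "pgs_e": -0.200,
}

selected = select_top_pgs(example_weights)
top5 = list(selected)[:5]  # at most five names, largest |weight| first
```

Ranking on the absolute value matters because a strongly negative weight is as informative for prediction as a strongly positive one.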
Fig. 4. Performance of the PGS trained with different data.
Comparison between the per-disorder attention-deficit/hyperactivity disorder (ADHD), affective disorder (AFF), anorexia nervosa (AN), autism spectrum disorder (ASD), bipolar disorder (BD) and schizophrenia (SCZ) single GWAS PGS (PGS_single_GWAS; details in SD2), the per-disorder BLUP PGS and the multi-PGS in terms of (A) liability-adjusted $R^2$ and (B) log odds ratios of the top quintile compared to the middle quintile. The multiPGS_lasso_excluding_single_GWAS represents the PGS where the specific single GWAS PGS was removed from the set of 937 PGS. All models were adjusted for sex, age and the first 20 PCs. The adjusted liability $R^2$ shows the mean of the fivefold cross-validation training-testing subsets. CIs were calculated from 10,000 bootstrap samples of the mean adjusted $R^2$ or logOR, where the adjusted $R^2$ was the variance explained by the full model after accounting for the variance explained by a logistic regression covariates-only model as $R^2_{\mathrm{adjusted}} = (R^2_{\mathrm{full}} - R^2_{\mathrm{cov}})/(1 - R^2_{\mathrm{cov}})$. Prevalences used for the liability are shown beneath each disorder label and case-control ratios are available in SD2. All association OR for all quintiles are available in SF14.
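To make the top-versus-middle quintile comparison concrete, the sketch below computes a crude log odds ratio from a 2x2 count table. It is an unadjusted simplification: the caption describes covariate-adjusted models, which this example does not reproduce. The function name, the 0.5 continuity correction and the toy data are assumptions of the example.

```python
import math


def quintile_log_or(scores, cases):
    """Unadjusted log odds ratio of case status, top vs middle score quintile.

    scores: risk score per individual; cases: 1 for case, 0 for control.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    bounds = [round(k * n / 5) for k in range(6)]
    groups = [order[bounds[k]:bounds[k + 1]] for k in range(5)]

    def odds(idx):
        n_case = sum(cases[i] for i in idx)
        n_ctrl = len(idx) - n_case
        # Add 0.5 to each cell (a common continuity correction) to avoid
        # dividing by zero when a quintile has no cases or no controls.
        return (n_case + 0.5) / (n_ctrl + 0.5)

    return math.log(odds(groups[4]) / odds(groups[2]))


# Toy data: 100 individuals, 10 cases in the top quintile, 2 in the middle.
toy_scores = list(range(100))
toy_cases = [1 if (i in (40, 41) or i >= 90) else 0 for i in range(100)]
toy_log_or = quintile_log_or(toy_scores, toy_cases)
```

A positive value means case status is more common in the highest-scoring fifth of individuals than in the middle fifth.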
Fig. 5. Examples of the prediction accuracy of multi-PGS vs. top predictive single-GWAS-PGS on register-based phenotypes.
Comparison between a per-phenotype single GWAS PGS (the top-ranked PGS with the largest weight from the lasso multi-PGS model on each outcome, details in SD4) and the multi-PGS trained with 937 PGS in terms of adjusted $R^2$. The set of outcomes includes (a) other outcomes with available GWAS, (b) outcomes with no available GWAS, (c) continuous phenotypes from the MBR and (d) case-case predictions. All models included sex, age and the first 20 PCs for training the different PGS weights and calculating the risk score on the test set in a fivefold cross-validation scheme. CIs were calculated from 10,000 bootstrap samples of the mean adjusted $R^2$, where the adjusted $R^2$ was the variance explained by the full model after accounting for the variance explained by a logistic regression covariates-only model as $R^2_{\mathrm{adjusted}} = (R^2_{\mathrm{full}} - R^2_{\mathrm{cov}})/(1 - R^2_{\mathrm{cov}})$. The number next to the multiPGS bar indicates the number of non-zero mean lasso weights across the 5 cross-validation subsets.
