Bioinformatics. 2021 Apr 1;36(22-23):5424-5431.
doi: 10.1093/bioinformatics/btaa1029.

LDpred2: better, faster, stronger

Florian Privé et al. Bioinformatics. 2021.

Abstract

Motivation: Polygenic scores have become a central tool in human genetics research. LDpred is a popular method for deriving polygenic scores based on summary statistics and a matrix of correlation between genetic variants. However, LDpred has limitations that may reduce its predictive performance.

Results: Here, we present LDpred2, a new version of LDpred that addresses these issues. We also provide two new options in LDpred2: a 'sparse' option that can learn effects that are exactly 0, and an 'auto' option that directly learns the two LDpred parameters from data. We benchmark the predictive performance of LDpred2 against the previous version (LDpred1) on simulated and real data, demonstrating substantial improvements in robustness and predictive accuracy. We then show that LDpred2 also outperforms other recently developed polygenic score methods, with a mean AUC over the 8 real traits analyzed here of 65.1%, compared to 63.8% for lassosum, 62.9% for PRS-CS and 61.5% for SBayesR. Note that LDpred2 provides more accurate polygenic scores when run genome-wide instead of per chromosome.

Availability and implementation: LDpred2 is implemented in R package bigsnpr.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1. Two variants of LDpred1 are compared with four variants of LDpred2 (run per chromosome) in the seven simulation scenarios summarized in Table 1. Briefly, the first part of the scenario name corresponds to the location of causal variants, the second part is the heritability (in %), the third part is the number of causal variants, and the prevalence is always 15%. Bars present the mean and 95% CI of 10 000 non-parametric bootstrap replicates of the mean AUC of 10 simulations for each scenario. Corresponding values are reported in Supplementary Table S1.

Fig. 2. LDpred1-grid is compared to LDpred2-grid when varying GWAS sample size in scenario ‘all_40_3000’. Bars present the mean and 95% CI of 10 000 non-parametric bootstrap replicates of the mean AUC of 10 simulations for each scenario. Corresponding values are reported in Supplementary Table S2.

Fig. 3. Two variants of LDpred1 are compared with four variants of LDpred2 (run genome-wide) in the real data applications using published external summary statistics. Bars present AUC values on the test set of UKBB (mean and 95% CI from 10 000 bootstrap samples). Corresponding values are reported in Supplementary Table S3.

Fig. 4. LDpred2 is compared with C+T, SCT, lassosum, PRS-CS and SBayesR in the real data applications using published external summary statistics. Bars present AUC values on the test set of UKBB (mean and 95% CI from 10 000 bootstrap samples). Corresponding values are reported in Supplementary Table S4.
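
The figure captions report AUC values summarized as a mean and 95% CI over 10 000 non-parametric bootstrap replicates. The Python sketch below illustrates one common way to compute such a summary, using a percentile bootstrap; it is not the authors' code (LDpred2 itself is implemented in the R package bigsnpr). The function names `auc` and `bootstrap_ci`, the choice of a percentile interval, and the example AUC values are all assumptions made for illustration.

```python
import random
from statistics import mean


def auc(scores, labels):
    """AUC as the probability that a random case outscores a random control.

    Ties count as 1/2. Labels are assumed to be 1 (case) or 0 (control).
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_ci(values, stat=mean, n_boot=10_000, level=0.95, seed=0):
    """Percentile bootstrap: resample `values` with replacement n_boot times.

    Returns (mean of replicates, lower bound, upper bound).
    The percentile interval is an assumption; the captions do not say
    which interval construction was used.
    """
    rng = random.Random(seed)
    n = len(values)
    reps = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return mean(reps), reps[lo_idx], reps[hi_idx]


# Hypothetical AUCs from 10 simulations of one scenario (made-up numbers).
sim_aucs = [0.71, 0.69, 0.73, 0.70, 0.72, 0.68, 0.74, 0.71, 0.70, 0.72]
center, lower, upper = bootstrap_ci(sim_aucs)
print(f"mean AUC {center:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

For the simulation figures, the resampled units are the 10 per-simulation AUCs, as above. For a test-set AUC as in Figs. 3 and 4, the analogous approach would resample individuals and recompute `auc` on each resample.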
