Am J Hum Genet. 2019 Dec 5;105(6):1213-1221.

doi: 10.1016/j.ajhg.2019.11.001. Epub 2019 Nov 21.

Making the Most of Clumping and Thresholding for Polygenic Scores

Florian Privé¹, Bjarni J Vilhjálmsson², Hugues Aschard³, Michael G B Blum⁴

Affiliations

¹ Laboratoire TIMC-IMAG, UMR 5525, Univ. Grenoble Alpes, CNRS, La Tronche, France; Department of Economics and Business Economics, National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark. Electronic address: florian.prive.21@gmail.com.
² Department of Economics and Business Economics, National Centre for Register-Based Research, Aarhus University, Aarhus, Denmark.
³ Centre de Bioinformatique, Biostatistique et Biologie Intégrative (C3BI), Institut Pasteur, Paris, France.
⁴ Laboratoire TIMC-IMAG, UMR 5525, Univ. Grenoble Alpes, CNRS, La Tronche, France. Electronic address: michael.blum@univ-grenoble-alpes.fr.

PMID: 31761295
PMCID: PMC6904799
DOI: 10.1016/j.ajhg.2019.11.001

Florian Privé et al. Am J Hum Genet. 2019.

Abstract

Polygenic prediction has the potential to contribute to precision medicine. Clumping and thresholding (C+T) is a widely used method to derive polygenic scores. When using C+T, several p value thresholds are tested to maximize the predictive ability of the derived polygenic scores. Along with this p value threshold, we propose to tune three other hyper-parameters for C+T. We implement an efficient way to derive thousands of different C+T scores corresponding to a grid over four hyper-parameters. For example, it takes a few hours to derive 123K different C+T scores for 300K individuals and 1M variants using 16 physical cores. We find that optimizing over these four hyper-parameters improves the predictive performance of C+T in both simulations and real data applications, as compared to tuning only the p value threshold. A particularly large increase can be noted when predicting depression status, from an AUC of 0.557 (95% CI: [0.544-0.569]) when tuning only the p value threshold to an AUC of 0.592 (95% CI: [0.580-0.604]) when tuning all four hyper-parameters we propose for C+T. We further propose stacked clumping and thresholding (SCT), a polygenic score that results from stacking all derived C+T scores. Instead of choosing one set of hyper-parameters that maximizes prediction in some training set, SCT learns an optimal linear combination of all C+T scores by using an efficient penalized regression. We apply SCT to eight different case-control diseases in the UK Biobank data and find that SCT substantially improves prediction accuracy, with an average AUC increase of 0.035 over standard C+T.
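
As an illustration of the grid idea described in the abstract, the following is a minimal toy sketch in pure Python, not the authors' implementation. The abstract does not name the three additional hyper-parameters; the sketch assumes they are the clumping r² threshold, the clumping window size, and an imputation-quality (INFO) threshold. All function names (`clump`, `ct_score`, `ct_grid`) and all data values are hypothetical.

```python
from itertools import product


def clump(pvals, pos, r2, r2_thr, window, keep):
    """Greedy clumping: visit variants by increasing p value and keep a variant
    only if no already-kept variant within `window` has r2 above `r2_thr`."""
    selected = []
    candidates = [j for j in range(len(pvals)) if keep[j]]
    for j in sorted(candidates, key=lambda j: pvals[j]):
        if all(abs(pos[j] - pos[k]) > window or r2[j][k] <= r2_thr
               for k in selected):
            selected.append(j)
    return selected


def ct_score(genotypes, betas, pvals, clumped, p_thr):
    """Thresholding: sum effect-weighted allele counts over clumped variants
    whose p value is at most `p_thr`."""
    idx = [j for j in clumped if pvals[j] <= p_thr]
    return [sum(betas[j] * g[j] for j in idx) for g in genotypes]


def ct_grid(genotypes, betas, pvals, pos, r2, info, grid):
    """One C+T score vector per combination of the four hyper-parameters."""
    scores = {}
    for info_thr, r2_thr, window in product(grid["info"], grid["r2"], grid["window"]):
        keep = [x >= info_thr for x in info]          # assumed INFO filter
        clumped = clump(pvals, pos, r2, r2_thr, window, keep)
        for p_thr in grid["p"]:                       # clumping is reused across p thresholds
            scores[(info_thr, r2_thr, window, p_thr)] = ct_score(
                genotypes, betas, pvals, clumped, p_thr)
    return scores


# Toy data (made up): 4 variants, 2 individuals.
pvals = [1e-8, 1e-6, 0.02, 0.3]
betas = [0.5, 0.4, -0.2, 0.1]
pos = [100, 150, 5000, 9000]
info = [0.99, 0.95, 0.80, 0.99]
r2 = [[1.0, 0.9, 0.0, 0.0],
      [0.9, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 0.0],
      [0.0, 0.0, 0.0, 1.0]]
genotypes = [[0, 1, 2, 0], [2, 2, 0, 1]]
grid = {"info": [0.3, 0.9], "r2": [0.2, 0.95], "window": [500], "p": [1e-5, 0.05, 1.0]}

scores = ct_grid(genotypes, betas, pvals, pos, r2, info, grid)
print(len(scores), "C+T scores derived")
```

Standard C+T would pick the single grid entry that predicts best in a training set. SCT, as the abstract describes it, would instead treat each score vector as a feature and fit a penalized regression on all of them; that stacking step is not shown here.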

Keywords: C+T; PRS; UK Biobank; clumping and thresholding; complex traits; polygenic risk scores; stacking.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Figure 1**
Results of the Six Simulation Scenarios with Well-Imputed Variants. Scenarios are: (100) 100 random causal variants; (10K) 10,000 random causal variants; (1M) all 1M variants are causal; (2chr) 100 variants of chromosome 1 and all variants of chromosome 2 are causal, with half of the heritability on each chromosome; (err) 10,000 random causal variants, but 10% of the GWAS effects are reported with an opposite effect; (HLA) 7,105 causal variants in a long-range LD region of chromosome 6. Values are the mean and 95% CI from 10⁴ non-parametric bootstrap replicates of the mean AUC of 10 simulations for each scenario. The blue dotted line represents the maximum achievable AUC for these simulations (87.5% for a prevalence of 10% and a heritability of 50%; see Equation 3 of Wray et al.³⁰). See corresponding values in Table S1.
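
The 87.5% ceiling quoted in this caption can be checked numerically. The sketch below assumes that the cited Equation 3 has the usual liability-threshold form, AUC_max = Φ((i − v)h² / √(h²[(1 − h²i(i − T)) + (1 − h²v(v − T))])), and that the 50% heritability is on the liability scale; neither the formula nor the scale is stated in the caption itself. The function name `max_auc` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def max_auc(prevalence, h2_liability):
    """Maximum AUC of a genetic predictor under a liability-threshold model
    (assumed form of the equation cited in the caption)."""
    std_normal = NormalDist()
    t = std_normal.inv_cdf(1 - prevalence)       # liability threshold T
    z = std_normal.pdf(t)                        # normal density at the threshold
    i = z / prevalence                           # mean liability of cases
    v = -i * prevalence / (1 - prevalence)       # mean liability of controls
    numerator = (i - v) * h2_liability
    denominator = sqrt(h2_liability * ((1 - h2_liability * i * (i - t))
                                       + (1 - h2_liability * v * (v - t))))
    return std_normal.cdf(numerator / denominator)


# Prevalence of 10% and heritability of 50%, as in the caption.
print(round(max_auc(0.10, 0.50), 3))  # about 0.875
```

Under these assumptions the result agrees with the 87.5% given in the caption.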

**Figure 2**
Results of the Real Data Applications with Large Training Size. AUC values on the test set of UKBB (mean and 95% CI from 10⁴ bootstrap samples). Training SCT and choosing optimal hyper-parameters for C+T and lassosum use 63%–90% of the individuals reported in Table 1. See corresponding values in Table S2.

References

1. Wray N.R., Goddard M.E., Visscher P.M. Prediction of individual genetic risk to disease from genome-wide association studies. Genome Res. 2007;17:1520–1528.
2. Purcell S.M., Wray N.R., Stone J.L., Visscher P.M., O’Donovan M.C., Sullivan P.F., Sklar P., International Schizophrenia Consortium. Common polygenic variation contributes to risk of schizophrenia and bipolar disorder. Nature. 2009;460:748–752.
3. Dudbridge F. Power and predictive accuracy of polygenic risk scores. PLoS Genet. 2013;9:e1003348.
4. Wray N.R., Lee S.H., Mehta D., Vinkhuyzen A.A., Dudbridge F., Middeldorp C.M. Research review: Polygenic methods and their application to psychiatric traits. J. Child Psychol. Psychiatry. 2014;55:1068–1087.
5. Euesden J., Lewis C.M., O’Reilly P.F. PRSice: polygenic risk score software. Bioinformatics. 2015;31:1466–1468.
