J Comput Biol. 2020 Apr;27(4):599-612.
doi: 10.1089/cmb.2019.0325. Epub 2020 Feb 20.

Efficient Estimation and Applications of Cross-Validated Genetic Predictions to Polygenic Risk Scores and Linear Mixed Models

Joel Mefford et al. J Comput Biol. 2020 Apr.

Abstract

Large-scale cohorts with combined genetic and phenotypic data, coupled with methodological advances, have produced increasingly accurate genetic predictors of complex human phenotypes called polygenic risk scores (PRSs). In addition to the potential translational impacts of identifying at-risk individuals, PRSs are being used for a growing list of scientific applications, including causal inference, identifying pleiotropy and genetic correlation, and powerful gene-based and mixed-model association tests. Existing PRS approaches rely on external large-scale genetic cohorts that have also measured the phenotype of interest. They further require matching on ancestry and genotyping platform or imputation quality. In this work, we present a novel reference-free method to produce a PRS that does not rely on an external cohort. We show that naive implementations of reference-free PRSs result in either substantial overfitting or prohibitive increases in computational time. We show that our algorithm avoids both of these issues and can produce informative in-sample PRSs over a single cohort without overfitting. We then demonstrate several novel applications of reference-free PRSs, including detection of pleiotropy across 246 metabolic traits and efficient mixed-model association testing.

Keywords: BLUP; PCA; PRS; linear mixed model; polygenic risk score.

Conflict of interest statement

The authors declare they have no competing financial interests.

Figures

FIG. 1.
Independence of cvBLUPs and nongenetic factors. Correlations of genetic predictions, BLUPs and cvBLUPs, true genetic factors Zb, and independent environmental factors ε in a simulation of a continuous phenotype with h² = 50%, 1000 subjects, and 1000 independent SNPs having random effect sizes. BLUPs are correlated with ε, while cvBLUPs are not. Lines and p-values are from linear regression fits. R² values: A: 0.21, B: 0.0019, C: 0.64, D: 0.38. BLUPs, Best Linear Unbiased Predictors; cvBLUPs, cross-validated Best Linear Unbiased Predictors; SNP, single-nucleotide polymorphism.
FIG. 2.
Cross-trait correlations with cvBLUPs. Correlations of phenotypes (rows) and genetic predictions (cvBLUPs, columns) across 246 phenotypes. Many cvBLUPs are strongly correlated with additional phenotypes.
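
The simulation described for FIG. 1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the algorithm, so the sketch uses the standard leave-one-out identity for linear smoothers to obtain cross-validated predictions from a single fit. The function names (`simulate`, `blup_and_cvblup`), the uniform allele-frequency range, the normal effect-size distribution, and the use of the true simulated h² in place of an estimate are all assumptions made for illustration.

```python
import numpy as np


def simulate(n=1000, m=1000, h2=0.5, seed=0):
    """Simulate a continuous phenotype y = Zb + eps with heritability h2."""
    rng = np.random.default_rng(seed)
    # Assumption: independent SNPs with allele frequencies drawn from U(0.1, 0.9).
    freqs = rng.uniform(0.1, 0.9, size=m)
    G = rng.binomial(2, freqs, size=(n, m)).astype(float)
    Z = (G - G.mean(axis=0)) / G.std(axis=0)      # standardize each SNP
    # Assumption: normally distributed effects, scaled so var(Zb) is about h2.
    b = rng.normal(0.0, np.sqrt(h2 / m), size=m)
    g = Z @ b                                     # true genetic factor Zb
    eps = rng.normal(0.0, np.sqrt(1.0 - h2), size=n)
    return Z, g, eps, g + eps


def blup_and_cvblup(Z, y, h2):
    """Return in-sample BLUPs and leave-one-out predictions from one fit."""
    n, m = Z.shape
    K = Z @ Z.T / m                               # genetic relationship matrix
    lam = (1.0 - h2) / h2                         # ratio of noise to genetic variance
    # Smoother matrix H = K (K + lam I)^{-1}; K and the inverse commute.
    H = np.linalg.solve(K + lam * np.eye(n), K)
    blup = H @ y
    h = np.diag(H)
    # Leave-one-out identity: remove each subject's own phenotype
    # from their prediction without refitting n separate models.
    cvblup = (blup - h * y) / (1.0 - h)
    return blup, cvblup


if __name__ == "__main__":
    Z, g, eps, y = simulate()
    blup, cvblup = blup_and_cvblup(Z, y, h2=0.5)
    for name, pred in (("BLUP", blup), ("cvBLUP", cvblup)):
        r_eps = np.corrcoef(pred, eps)[0, 1]
        r_g = np.corrcoef(pred, g)[0, 1]
        print(f"{name}: corr with eps = {r_eps:.3f}, corr with Zb = {r_g:.3f}")
```

In this sketch the in-sample BLUP for each subject includes a contribution from that subject's own phenotype, which is why it tracks ε; the leave-one-out correction removes that contribution at the cost of one matrix solve rather than one refit per subject.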
