Am J Hum Genet. 2022 Jan 6;109(1):12-23.

doi: 10.1016/j.ajhg.2021.11.008.

Portability of 245 polygenic scores when derived from the UK Biobank and applied to 9 ancestry groups from the same cohort

Florian Privé¹, Hugues Aschard², Shai Carmi³, Lasse Folkersen⁴, Clive Hoggart⁵, Paul F O'Reilly⁵, Bjarni J Vilhjálmsson⁶

Affiliations

¹ National Centre for Register-Based Research, Aarhus University, Aarhus 8210, Denmark. Electronic address: florian.prive.21@gmail.com.
² Department of Computational Biology, Institut Pasteur, Paris 75015, France; Program in Genetic Epidemiology and Statistical Genetics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.
³ Braun School of Public Health and Community Medicine, The Hebrew University of Jerusalem, Jerusalem 9112102, Israel.
⁴ Danish National Genome Center, Copenhagen 2300, Denmark.
⁵ Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
⁶ National Centre for Register-Based Research, Aarhus University, Aarhus 8210, Denmark; Bioinformatics Research Centre, Aarhus University, Aarhus 8000, Denmark.

PMID: 34995502
PMCID: PMC8764121
DOI: 10.1016/j.ajhg.2021.11.008

Florian Privé et al. Am J Hum Genet. 2022.


Erratum in

Portability of 245 polygenic scores when derived from the UK Biobank and applied to 9 ancestry groups from the same cohort.
Privé F, Aschard H, Carmi S, Folkersen L, Hoggart C, O'Reilly PF, Vilhjálmsson BJ. Am J Hum Genet. 2022 Feb 3;109(2):373. doi: 10.1016/j.ajhg.2022.01.007. PMID: 35120604.

Abstract

The low portability of polygenic scores (PGSs) across global populations is a major concern that must be addressed before PGSs can be used for everyone in the clinic. Indeed, prediction accuracy has been shown to decay as a function of the genetic distance between the training and test cohorts. However, such cohorts differ not only in their genetic distance but also in their geographical distance and their data collection and assaying, conflating multiple factors. In this study, we examine the extent to which PGSs are transferable between ancestries by deriving polygenic scores for 245 curated traits from the UK Biobank data and applying them in nine ancestry groups from the same cohort. By restricting both training and testing to the UK Biobank data, we reduce the risk of environmental and genotyping confounding from using different cohorts. We define the nine ancestry groups at a sub-continental level, based on a simple, robust, and effective method that we introduce here. We then apply two different predictive methods to derive polygenic scores for all 245 phenotypes and show a systematic and dramatic reduction in portability of PGSs trained using Northwestern European individuals and applied to nine ancestry groups. These analyses demonstrate that prediction already drops off within European ancestries and reduces globally in proportion to genetic distance. Altogether, our study provides unique and robust insights into the PGS portability problem.

Keywords: ancestry; polygenic scores; portability.

Conflict of interest statement

Declaration of interests: S.C. is a paid consultant to MyHeritage. The other authors declare no competing interests.

Figures

**Figure 1**
The first eight PC scores of the UK Biobank (field 22009), colored by the homogeneous ancestry group we infer for these individuals. Only 50,000 individuals, chosen at random, are represented. “NA” means that the corresponding individual is not categorized in any of the nine ancestry groups.

**Figure 2**
Partial correlation and 95% CI in the UK test set versus in a test set from another ancestry group. Each point represents a phenotype, and training has been performed with penalized regression on UK individuals (training 1 in Table 1) and HapMap3 variants. The slope (in blue) is computed using Deming regression accounting for standard errors in both x and y, fixing the intercept at 0. The square of this slope is provided above each plot, which we report as the relative predictive performance compared to testing in the “United Kingdom” ancestry group.
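The zero-intercept Deming fit described in the Figure 2 legend can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes one common errors-in-variables formulation, in which the slope minimizes squared residuals scaled by the combined variance of both axes, and it finds the minimum with a plain golden-section search. The function name, the search bracket, and the toy data are hypothetical.

```python
import math

def deming_slope_through_origin(x, y, sx, sy, lower=0.0, upper=2.0, tol=1e-10):
    """Slope b minimizing sum((y - b*x)^2 / (sy^2 + b^2 * sx^2)), intercept fixed at 0.

    x, y   : point estimates (e.g. partial correlations in two test sets)
    sx, sy : their standard errors
    Assumption: the loss is unimodal on [lower, upper].
    """
    def loss(b):
        return sum((yi - b * xi) ** 2 / (syi ** 2 + (b * sxi) ** 2)
                   for xi, yi, sxi, syi in zip(x, y, sx, sy))

    inv_phi = (math.sqrt(5) - 1) / 2  # golden-ratio step
    left, right = lower, upper
    c = right - inv_phi * (right - left)
    d = left + inv_phi * (right - left)
    while right - left > tol:
        if loss(c) < loss(d):
            right = d   # minimum lies in [left, d]
        else:
            left = c    # minimum lies in [c, right]
        c = right - inv_phi * (right - left)
        d = left + inv_phi * (right - left)
    return (left + right) / 2

# Toy, noise-free data: performance in the other group is 0.8 times the UK value.
x_uk = [0.1, 0.2, 0.3, 0.4]
y_other = [0.8 * v for v in x_uk]
se = [0.01] * 4

slope = deming_slope_through_origin(x_uk, y_other, se, se)
relative_performance = slope ** 2  # squared slope, as reported above each plot
print(round(slope, 4), round(relative_performance, 4))
```

Squaring the slope converts a ratio of (partial) correlations into a ratio of variance explained, which is why the legend reports the squared value.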

**Figure 3**
Relative variance explained compared to the UK versus PC distance from the UK. PC distances are computed using Euclidean distance between geometric medians of the first 16 reported PC scores (field 22009) of each ancestry group. Relative performance values are the ones reported in Figure 2. The slope and standard errors are computed internally by the function geom_smooth(method = "lm") of the R package ggplot2.
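The PC distance in the Figure 3 legend (Euclidean distance between geometric medians) can be illustrated with a short sketch. It assumes Weiszfeld iterations as the way to approximate the geometric median, which the legend does not specify, and it uses hypothetical 2-dimensional toy points in place of the 16 PC scores; the function names are illustrative only.

```python
import math

def geometric_median(points, n_iter=200, eps=1e-12):
    """Approximate the point minimizing the sum of Euclidean distances (Weiszfeld)."""
    dim = len(points[0])
    # Start from the coordinate-wise mean.
    center = [sum(p[k] for p in points) / len(points) for k in range(dim)]
    for _ in range(n_iter):
        weighted_sum = [0.0] * dim
        weight_total = 0.0
        for p in points:
            # Inverse-distance weight; eps guards against division by zero.
            w = 1.0 / max(math.dist(p, center), eps)
            weight_total += w
            for k in range(dim):
                weighted_sum[k] += w * p[k]
        center = [v / weight_total for v in weighted_sum]
    return center

def pc_distance(group_a, group_b):
    """Euclidean distance between the geometric medians of two groups of PC scores."""
    return math.dist(geometric_median(group_a), geometric_median(group_b))

# Toy groups: two squares whose centers are (1, 1) and (4, 5).
group_a = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
group_b = [(3.0, 4.0), (5.0, 4.0), (3.0, 6.0), (5.0, 6.0)]
print(pc_distance(group_a, group_b))
```

The geometric median is less sensitive to outlying individuals than the mean, which is a plausible reason to prefer it when summarizing the center of an ancestry group.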

**Figure 4**
Zoomed Manhattan plot for lipoprotein(a) concentration. The phenotypic variance explained per variant is computed as $r^{2} = t^{2} / (n + t^{2})$, where t is the t-score from GWAS and n is the degrees of freedom (the sample size minus the number of variables in the model, i.e., the covariates used in the GWAS, the intercept, and the variant). The GWAS includes all variants with an imputation INFO score larger than 0.3 and within a 500 kb radius around the top hit from the GWAS performed in the UK training set and on the HapMap3 variants, represented by a vertical dotted line.
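The per-variant formula in the Figure 4 legend is simple enough to check by hand. The sketch below is only a worked illustration of that formula; the function names and the example numbers are hypothetical and do not come from the study.

```python
def degrees_of_freedom(n_samples, n_covariates):
    """Sample size minus the variables in the model: covariates, intercept, variant."""
    return n_samples - (n_covariates + 2)

def variance_explained(t_score, df):
    """Phenotypic variance explained by one variant: r^2 = t^2 / (n + t^2)."""
    return t_score ** 2 / (df + t_score ** 2)

# Hypothetical example: 1,000 individuals, 8 covariates, t-score of 10.
df = degrees_of_freedom(1000, 8)
r2 = variance_explained(10.0, df)
print(df, round(r2, 4))
```

Because t² sits in both the numerator and the denominator, r² stays between 0 and 1 and approaches 1 only when the t-score is very large relative to the degrees of freedom.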

**Figure 5**
Predictive performance with LDpred2-auto for eight phenotypes, when using either HapMap3 variants or the 1M most significant variants. One phenotype is shown in each panel. Bars represent the 95% confidence intervals. Phecode 174.1: breast cancer; 185: prostate cancer; 411.4: coronary artery disease. HM3, HapMap3; top1M, the 1M most significant variants out of more than 8M common variants (see Material and methods).

