Cell Genom. 2023 Sep 14;3(10):100408.
doi: 10.1016/j.xgen.2023.100408. eCollection 2023 Oct 11.

Polygenic prediction across populations is influenced by ancestry, genetic architecture, and methodology

Ying Wang et al. Cell Genom. 2023.

Abstract

Polygenic risk scores (PRSs) developed from multi-ancestry genome-wide association studies (GWASs), PRSmulti, hold promise for improving PRS accuracy and generalizability across populations. To establish best practices for leveraging the increasing diversity of genomic studies, we investigated how various factors affect the performance of PRSmulti compared with PRSs constructed from single-ancestry GWASs (PRSsingle). Through extensive simulations and empirical analyses, we showed that PRSmulti overall outperformed PRSsingle in understudied populations, except when the understudied population represented a small proportion of the multi-ancestry GWAS. Furthermore, integrating PRSs based on local ancestry-informed GWASs and large-scale, European-based PRSs improved predictive performance in understudied African populations, especially for less polygenic traits with large-effect ancestry-enriched variants. Our work highlights the importance of diversifying genomic studies to achieve equitable PRS performance across ancestral populations and provides guidance for developing PRSs from multiple studies.

Keywords: genetic architecture; genome-wide association studies; multi-ancestry; polygenic risk scores.

Conflict of interest statement

H.H. received consultancy fees from Ono Pharmaceutical and honorarium from Xian Janssen Pharmaceutical.

Figures

Figure 1
Study design in both simulations and empirical analyses.
(1) In the context of single-ancestry GWASs, we randomly split individuals in European (EUR) and other minority populations, including East Asian and African populations, into equally sized bins. Simulations involved a total of 52 bins per population, each containing 10,000 individuals. For empirical analysis, bin number was dependent on the sample size of the respective phenotype in that population (Table S3), with 5,000 individuals per bin. A GWAS was conducted within each bin for each individual population, followed by meta-analysis of GWASs from various numbers of bins within each population. To construct PRSs derived from single-ancestry GWASs (PRSsingle) in the target population, we applied P + T for both simulations and empirical analyses, utilizing PRS-CS for the latter. Subsequently, we combined PRSsingle developed from EUR GWAS (PRSEUR_GWAS) and other minority population-based GWAS (PRSMinor_GWAS) through a linear weighted strategy (denoted as PRSweighted, highlighted in red box) for empirical analyses. Note that PRSweighted was also developed using PRS-CSx, which utilizes GWAS summary statistics from multiple populations.
(2) For meta-analyzed multi-ancestry GWASs (referred to as Meta), we ran meta-analyses on EUR GWASs and Minor GWASs with varying ancestry compositions. In simulations, we incrementally included 4 bins from EUR GWASs for the meta-analysis, while in empirical analyses, we increased the number to 8 bins. Simultaneously, we varied the number of bins in Minor GWASs from 1 to the total number. Following the meta-analysis, we constructed PRSs based on Meta (referred to as PRSmulti), using the P + T method for simulations, and employing both P + T and PRS-CS for empirical analyses.
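The Figure 1 caption describes pooling per-bin GWAS results by meta-analysis but does not say which scheme was used. As a minimal sketch, assuming a fixed-effect inverse-variance-weighted scheme (an assumption, not confirmed on this page), the pooling step could look like the following. The function name `ivw_meta` and the toy effect sizes are hypothetical and serve only as illustration.

```python
import numpy as np


def ivw_meta(betas, ses):
    """Fixed-effect inverse-variance-weighted meta-analysis (assumed scheme).

    betas, ses: array-likes of shape (n_bins, n_variants) holding per-bin
    effect estimates and their standard errors.
    Returns the pooled effect and pooled standard error for each variant.
    """
    betas = np.asarray(betas, dtype=float)
    ses = np.asarray(ses, dtype=float)
    weights = 1.0 / ses**2                   # precision of each bin's estimate
    pooled_var = 1.0 / weights.sum(axis=0)   # variance of the pooled estimate
    pooled_beta = (weights * betas).sum(axis=0) * pooled_var
    return pooled_beta, np.sqrt(pooled_var)


# Toy input: 3 bins, 2 variants, equal standard errors in every bin.
betas = [[0.10, -0.02], [0.12, 0.01], [0.08, -0.01]]
ses = [[0.05, 0.05], [0.05, 0.05], [0.05, 0.05]]
pooled_beta, pooled_se = ivw_meta(betas, ses)
```

With equal standard errors the pooled effect reduces to the plain mean across bins, and the pooled standard error shrinks with the square root of the number of bins, which is why adding bins to the meta-analysis increases power.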
Figure 2
Improvement of PRS accuracy through meta-analyzed multi-ancestry GWASs compared with large-scale EUR GWASs across 6 simulated genetic architectures. The multi-ancestry GWASs included populations of EUR and East Asian (EAS) ancestry, with the EAS sample size varying as indicated on the x axis. For illustrative purposes, we present the results using 32 EUR bins, each consisting of 10,000 individuals, which were included in both EUR GWASs and multi-ancestry GWASs. The PRS was separately evaluated in African (AFR), EAS, and EUR populations. Full results are shown in Table S1. Mc indicates the number of causal variants, and h² refers to SNP-based heritability. In each panel, the red vertical dashed line indicates the point where an equal number of bins from EUR and EAS populations was included in the multi-ancestry GWAS. The error bars represent the SEs of predictive accuracy differences between PRSmulti and PRSEUR_GWAS.
Figure 3
Genetic architecture of 17 studied traits between the BioBank Japan (BBJ) and the UK Biobank (UKBB). The error bars show the standard deviation of the corresponding estimate. The vertical dashed line marks the median estimate. Full results are shown in Table S4. The phenotypes were ranked according to their polygenicity estimates using GWASs from the UKBB, including: BMI (body mass index); height; DBP (diastolic blood pressure); SBP (systolic blood pressure); WBC (white blood cell count); lymphocyte (lymphocyte count); RBC (red blood cell count); neutrophil (neutrophil count); HB (hemoglobin concentration); HT (hematocrit percentage); eosinophil (eosinophil count); PLT (platelet count); monocyte (monocyte count); MCV (mean corpuscular volume); MCH (mean corpuscular hemoglobin); basophil (basophil count); and MCHC (mean corpuscular hemoglobin concentration).
Figure 4
Accuracy improvement of PRS in the UKBB-EAS population using multi-ancestry GWASs compared with using EUR GWASs for P + T and PRS-CS. The multi-ancestry GWASs were obtained by meta-analyzing EUR GWASs and EAS GWASs, with the EAS sample size from the BBJ varying as indicated on the x axis. For illustrative purposes, we present the results using 64 EUR bins, each containing 5,000 individuals, which were included in both EUR GWASs and multi-ancestry GWASs. The y axis is the accuracy difference of PRSs when using multi-ancestry GWASs (PRSmulti) compared with using EUR GWASs (PRSEUR_GWAS). The error bars indicate the SE of accuracy improvement. The red dashed line is y = 0. We showed the results for 7 traits with SNP-based heritability >0.1 in both the BBJ and the UKBB, and they were ranked by polygenicity estimates using the UKBB (Figure 3). Abbreviations are the same as in Figure 3. Full results are shown in Table S7.
Figure 5
Predictive accuracy using different PRS methods in the UKBB-EAS population. We showed the results for 7 traits with SNP-based heritability >0.1 in both the BBJ and the UKBB. Traits were ranked by polygenicity estimates using the UKBB (Figure 3). Boxes represent the first and third quartiles, with the whiskers extending to 1.5-fold the interquartile range. Abbreviations are the same as in Figure 3. Full results are shown in Tables S8 and S9.
Figure 6
Accuracy of PRSs derived from local ancestry-informed GWASs vs. other discovery GWASs in the UKBB-AFR population. AFRTractor denotes the AFR-specific GWAS performed using Tractor on the UKBB admixed AFR-EUR individuals. EURstandard refers to standard GWASs performed on the EUR population in the UKBB. Metastandard is the meta-analysis performed on AFRTractor and EURstandard. Furthermore, we constructed a weighted PRS by combining PRSs generated from AFRTractor and EURstandard through a linear weighted approach. The figure shows the results for traits with SNP-based heritability >0.1 in the UKBB-AFR. Full results are shown in Table S10.
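The Figure 6 caption mentions combining two PRSs through a linear weighted approach, but this page does not give the fitting procedure. One plausible reading, sketched here under the assumption that the weights come from ordinary least squares on a held-out tuning sample, is shown below. The function names, the simulated scores, and the tuning/test split are all hypothetical.

```python
import numpy as np


def fit_prs_weights(prs_a, prs_b, phenotype):
    """Regress the phenotype on two PRSs; returns [intercept, w_a, w_b]."""
    design = np.column_stack([np.ones(len(phenotype)), prs_a, prs_b])
    coef, *_ = np.linalg.lstsq(design, phenotype, rcond=None)
    return coef


def combine_prs(coef, prs_a, prs_b):
    """Apply the fitted weights to obtain a single weighted PRS."""
    return coef[0] + coef[1] * prs_a + coef[2] * prs_b


def r_squared(score, phenotype):
    """Squared Pearson correlation between a score and the phenotype."""
    return np.corrcoef(score, phenotype)[0, 1] ** 2


# Simulated stand-ins for two PRSs and a phenotype (not real data).
rng = np.random.default_rng(0)
n = 2000
prs_a = rng.standard_normal(n)
prs_b = rng.standard_normal(n)
phenotype = 0.3 * prs_a + 0.5 * prs_b + rng.standard_normal(n)

# Fit weights on the first half, evaluate on the second half.
tune, test = slice(0, 1000), slice(1000, None)
coef = fit_prs_weights(prs_a[tune], prs_b[tune], phenotype[tune])
combined = combine_prs(coef, prs_a[test], prs_b[test])

r2_combined = r_squared(combined, phenotype[test])
r2_a_only = r_squared(prs_a[test], phenotype[test])
```

Fitting the weights on one subset and scoring on another keeps the accuracy estimate from being inflated by overfitting, which matters when the tuning sample from the target population is small.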
Figure 7
General practices for developing PRSs using different discovery GWASs. We summarized the general practice for developing PRSs (A) using single-ancestry GWASs (PRSsingle) and (B) using GWASs from multiple ancestries (PRSmulti or PRSweighted). rg, cross-ancestry genetic correlation; hd² and ht², SNP-based heritability in discovery and target populations, respectively; Nd, discovery GWAS sample size; Md, the number of genome-wide independent segments in the discovery population.
