G3 (Bethesda). 2020 Nov 5;10(11):4027-4036.
doi: 10.1534/g3.120.401658.

Polygenic Scores for Height in Admixed Populations


Bárbara D Bitarello et al. G3 (Bethesda).

Abstract

Polygenic risk scores (PRS) use the results of genome-wide association studies (GWAS) to predict quantitative phenotypes or disease risk at an individual level, and provide a potential route to the use of genetic data in personalized medical care. However, a major barrier to the use of PRS is that the majority of GWAS come from cohorts of European ancestry. The predictive power of PRS constructed from these studies is substantially lower in non-European ancestry cohorts, although the reasons for this are unclear. To address this question, we investigate the performance of PRS for height in cohorts with admixed African and European ancestry, allowing us to evaluate ancestry-related differences in PRS predictive accuracy while controlling for environment and cohort differences. We first show that the predictive accuracy of height PRS increases linearly with European ancestry and is partially explained by European ancestry segments of the admixed genomes. We show that recombination rate, differences in allele frequencies, and differences in marginal effect sizes across ancestries all contribute to the decrease in predictive power, but none of these effects explain the decrease on its own. Finally, we demonstrate that prediction for admixed individuals can be improved by using a linear combination of PRS that includes ancestry-specific effect sizes, although this approach is at present limited by the small size of non-European ancestry discovery cohorts.

Keywords: GenPred; Genomic prediction; Shared data resources; admixture; ancestry; height; polygenic scores.

Figures

Figure 1
Partial-R² as a function of European ancestry in admixed populations. Each admixed dataset is split up into quantiles of European ancestry proportion. Each quantile has between 886 and 2,175 individuals, and plotted values represent the median of each bin. Vertical bars represent 95% confidence intervals estimated from case resampling bootstrap (1,000 replicates). The dashed line shows the regression with standard errors shaded in light gray. A: Using all segments of the genome. B: Using only European ancestry segments. The orange lines represent the equation $y = 0.15\,p_{eur}^{k}$, for $k \in \{1, 1.5, 2\}$. k = 1 and k = 2 represent the extreme cases where the predictive power in admixed individuals comes entirely from European ancestry segments of the genomes (k = 1) or is uniformly distributed across the whole genome (k = 2).
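The orange reference curves in the Figure 1 caption can be tabulated with a few lines of Python. This is only an illustrative sketch: the function name `reference_curve` and the parameter name `scale` are hypothetical, and the constant 0.15 and the exponents 1, 1.5 and 2 are taken directly from the caption.

```python
def reference_curve(p_eur, k, scale=0.15):
    """Reference value y = scale * p_eur**k, as written in the Figure 1 caption.

    p_eur is the European ancestry proportion (0 to 1); k is the exponent.
    The name `scale` is an illustrative choice, not taken from the paper.
    """
    if not 0.0 <= p_eur <= 1.0:
        raise ValueError("p_eur must be a proportion between 0 and 1")
    return scale * p_eur ** k


if __name__ == "__main__":
    # Tabulate the three curves at a few ancestry proportions.
    for p in (0.25, 0.5, 0.75, 1.0):
        values = [round(reference_curve(p, k), 4) for k in (1, 1.5, 2)]
        print(p, values)
```

For any ancestry proportion strictly between 0 and 1, the k = 2 curve lies below the k = 1 curve, and all three curves meet at 0.15 when the proportion is 1.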
Figure 2
Predictive power of linear combinations of PRS. Relative partial-R² increase for HRS_afr (N = 2,251), JHS_afr (N = 1,773), and WHI_afr (N = 6,863) from three linear combinations of PRSeur and PRSafr. The dashed line represents no difference in performance between the linear combinations and PRSeur. For PRSc1 and PRSc2, α represents the constant weight given to the African component across individuals. PRSc2, in addition to α, weights the African component based on individual African ancestry. PRSc3 uses European effect sizes for PRS effect alleles falling in European ancestry segments, and a linear combination of European and African effect sizes (weighted by α) for PRS effect alleles falling in African ancestry segments (Equations 3-5).
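The three combinations described in the Figure 2 caption might be sketched as follows. Equations 3-5 are not reproduced on this page, so the exact functional forms below are assumptions inferred from the caption's wording, and every function and parameter name (`prs_c1`, `prs_c2`, `prs_c3`, `p_afr`, `n_eur`, `n_afr`) is hypothetical.

```python
def prs_c1(prs_eur, prs_afr, alpha):
    """Assumed form: a constant weight alpha on the African-derived score."""
    return alpha * prs_afr + (1.0 - alpha) * prs_eur


def prs_c2(prs_eur, prs_afr, alpha, p_afr):
    """Assumed form: the African weight is alpha scaled by the individual's
    African ancestry proportion p_afr (0 to 1)."""
    weight = alpha * p_afr
    return weight * prs_afr + (1.0 - weight) * prs_eur


def prs_c3(snps, alpha):
    """Assumed form: per-SNP score using local ancestry.

    Each item of snps is (beta_eur, beta_afr, n_eur, n_afr), where n_eur and
    n_afr count effect alleles on European and African ancestry segments.
    """
    total = 0.0
    for beta_eur, beta_afr, n_eur, n_afr in snps:
        # European segments use the European effect size unchanged.
        total += n_eur * beta_eur
        # African segments use an alpha-weighted mix of the two effect sizes.
        total += n_afr * (alpha * beta_afr + (1.0 - alpha) * beta_eur)
    return total
```

Under these assumed forms, setting α = 0 recovers the European-only score in all three cases, which corresponds to the dashed no-difference line in the figure.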
Figure 3
Effect of recombination rates on predictive power. A and B: PRS SNPs from each dataset were binned into quartiles of African American recombination rate. Absolute (A) and relative (B) partial-R² for subsets of SNPs divided by the total partial-R² for each dataset (Table 1). Vertical bars show 95% bootstrap confidence intervals. C: Correlation between PRS SNP effect sizes from Europeans and Admixed Africans in the WHI_afr dataset. The inset shows a qq-plot of $\chi^2_{diff}$ for PRS SNPs. The dashed line shows the regression with standard errors shaded in light gray. D: X-axis, recombination rate in cM/20 Kb. Y-axis, statistic for the difference in betas between European and African ancestries (Equation 1) in WHI_afr. Cut-off at 15 for display purposes excludes 10 data points. The dashed line shows the regression with standard errors shaded in light gray. Red points represent the median recombination rate for each of 20 quantiles of recombination rate.
Figure 4
Imputed data. A: Partial-R² as a function of European ancestry, where each admixed dataset is split up into quantiles of European ancestry proportion. Vertical bars show 95% confidence intervals estimated from case resampling bootstrap (1,000 replicates). The dashed line shows the regression with standard errors shaded in light gray. B: Partial-R² for two clumping strategies (100 and 500 Kb windows with either P < 0.005 or P < 0.00005) for imputed and genotyped sets of SNPs. C: Additive genetic variance ratio for PRS SNPs (Equation 7).
Figure 5
Unweighted PRS and the effect of local allele frequency differences on effect size differences. A: Partial-R² for an unweighted PRS that uses the sign but not the magnitude of each SNP effect (Methods). Each admixed population is split up into quantiles of European ancestry proportion. Vertical bars represent 95% confidence intervals estimated from a case resampling bootstrap (1,000 replicates). The dashed line shows the regression with standard errors shaded in light gray. B: X-axis, mean squared frequency difference for PRS SNPs for European and African ancestries in a 6 Kb window around each PRS SNP (Methods). Frequencies were calculated per dataset (HRS_eur, HRS_afr) for the causal allele. Y-axis, statistic for the difference in betas between European and Admixed African ancestries (Equation 1) in WHI_afr. Cut-off at 15 for display purposes excludes 15 data points. The dashed line shows the regression with standard errors shaded in light gray. Red points represent the median recombination rate for each of 5 quantiles of mean squared difference.
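The distinction between a standard PRS and the unweighted PRS of Figure 5A can be illustrated with a short sketch. The Methods are not reproduced on this page, so this assumes the usual additive form (a sum of effect size times effect-allele dosage) for the weighted score and replaces each effect size by its sign for the unweighted one. The function names and the toy numbers in the usage note are hypothetical.

```python
def _check_lengths(betas, dosages):
    if len(betas) != len(dosages):
        raise ValueError("betas and dosages must have the same length")


def weighted_prs(betas, dosages):
    """Assumed standard form: sum of effect size times effect-allele dosage."""
    _check_lengths(betas, dosages)
    return sum(b * d for b, d in zip(betas, dosages))


def unweighted_prs(betas, dosages):
    """Uses only the sign of each effect (+1, -1, or 0), not its magnitude."""
    _check_lengths(betas, dosages)
    # (b > 0) - (b < 0) evaluates to 1, -1, or 0 depending on the sign of b.
    return sum(((b > 0) - (b < 0)) * d for b, d in zip(betas, dosages))
```

With toy effect sizes `[0.2, -0.1, 0.05]` and dosages `[2, 1, 0]`, the weighted score is approximately 0.3, while the unweighted score is 2 - 1 + 0 = 1.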
