Meta-Analysis
Nat Genet. 2023 Apr;55(4):549-558.
doi: 10.1038/s41588-023-01338-6. Epub 2023 Mar 20.

Causal effects on complex traits are similar for common variants across segments of different continental ancestries within admixed individuals

Kangcheng Hou et al. Nat Genet. 2023 Apr.

Abstract

Individuals of admixed ancestries (for example, African Americans) inherit a mosaic of ancestry segments (local ancestry) originating from multiple continental ancestral populations. This offers the unique opportunity of investigating the similarity of genetic effects on traits across ancestries within the same population. Here we introduce an approach to estimate correlation of causal genetic effects ($r_{\mathrm{admix}}$) across local ancestries and analyze 38 complex traits in African-European admixed individuals (N = 53,001) to observe very high correlations (meta-analysis $r_{\mathrm{admix}} = 0.95$, 95% credible interval 0.93–0.97), much higher than correlation of causal effects across continental ancestries. We replicate our results using regression-based methods from marginal genome-wide association study summary statistics. We also report realistic scenarios where regression-based methods yield inflated heterogeneity-by-ancestry due to ancestry-specific tagging of causal effects, and/or polygenicity. Our results motivate genetic analyses that assume minimal heterogeneity in causal effects by ancestry, with implications for the inclusion of ancestry-diverse individuals in studies.

Figures

Extended Data Fig. 1 | Consistency of $r_{\mathrm{admix}}$ for shared traits across studies.
We compared the estimated $r_{\mathrm{admix}}$ for shared traits across studies, both as point estimates $\hat{r}_{\mathrm{admix}}$ (a–c) and as $\log_{10}(p)$ for the one-sided test of $H_0: r_{\mathrm{admix}} = 1$ (d–f; Methods). The three traits with the most significant p-values for $H_0: r_{\mathrm{admix}} = 1$ (height, triglycerides and total cholesterol) are annotated. The number of traits shared across studies ($n_{\mathrm{common}}$) and the Spearman correlation p-value are shown in the title of each panel. Overall, the estimated $\hat{r}_{\mathrm{admix}}$ for shared traits was only weakly consistent across studies, although the p-values for $H_0: r_{\mathrm{admix}} = 1$ were significantly consistent. Numerical results are reported in Supplementary Table 7.
Extended Data Fig. 2 | $r_{\mathrm{admix}}$ estimation is robust to the assumption of $r_{\mathrm{admix}} > 0$.
We performed $r_{\mathrm{admix}}$ estimation under the alternative assumption of $-1 \le r_{\mathrm{admix}} \le 1$ in the real trait analysis in PAGE, in light of potential scenarios of effect sizes in opposite directions. We compared the estimated $r_{\mathrm{admix}}$ when assuming $0 \le r_{\mathrm{admix}} \le 1$ (default; Methods) and when assuming $-1 \le r_{\mathrm{admix}} \le 1$. Left: comparison of point estimates of $r_{\mathrm{admix}}$ across 24 traits in PAGE. Right: comparison of the meta-analyzed log-likelihood. Results obtained from the two methods are highly consistent.
Extended Data Fig. 3 | $r_{\mathrm{admix}}$ estimation is robust to genetic architecture and SNP set.
We performed $r_{\mathrm{admix}}$ estimation under alternative assumptions about genetic architecture and SNP set in the real trait analysis across PAGE and UKBB. We compared p-values (for the one-sided test of $H_0: r_{\mathrm{admix}} = 1$) from our default setting (frequency-dependent genetic architecture and imputed SNPs; Table 1) to those obtained using GCTA genetic architecture and imputed SNPs (a), and to those obtained using frequency-dependent genetic architecture and HM3 SNPs (b). Numerical results are reported in Supplementary Table 8.
Extended Data Fig. 4 | $r_{\mathrm{admix}}$ estimation is robust to subsetting PAGE African American individuals based on genotype PCs.
(a) We subsetted PAGE individuals with a self-identified race/ethnicity label of 'African American' (total N = 17,327) based on genotype PCs and retained N = 17,167 individuals. The estimated $r_{\mathrm{admix}}$ was highly consistent between using all PAGE African American individuals (default) and using the subset selected on genotype PCs. (b) Comparison of point estimates of $r_{\mathrm{admix}}$ across 24 traits in PAGE. (The dot at the bottom left of the figure corresponds to the MCHC trait, which has a small sample size of 3,650.) (c) Comparison of the meta-analyzed log-likelihood. Results obtained from the two sets of individuals are highly consistent.
Extended Data Fig. 5 | Comparing estimated $r_{\mathrm{admix}}$ between alternative method formulations and default method.
Each dot corresponds to a trait. (a) Comparing results of the default method and of directly optimizing and estimating $\sigma_g^2$ and $\rho_g$. (b) Comparing results of the default method and of directly optimizing and estimating $\sigma_{g,1}^2$, $\sigma_{g,2}^2$ (different variance components per ancestry) and $\rho_g$. See Supplementary Table 9 and Supplementary Note for details.
Extended Data Fig. 6 | Multiple conditionally independent association signals for loci with heterogeneity by ancestry.
The upper panels show the two-sided association p-values and the lower panels show the fine-mapping PIP. Different colors in the PIP plots correspond to different credible sets. (a) MCH at 16p13.3 for UK Biobank European-African admixed individuals. (b) RBC at 16p13.3 for UK Biobank European-African admixed individuals. (c) CRP at 1q23.2 for PAGE European-African admixed individuals.
Extended Data Fig. 7 | Simulations with single causal variant.
Simulations were based on 100 regions each spanning 20 Mb on chromosome 1 and 17,299 PAGE individuals. In each simulation, we randomly selected a single causal variant and simulated quantitative phenotypes in which the causal variant had the same causal effect across ancestries and was expected to explain a fixed amount of heritability (0.2%, 0.6%, 1.0%). Each panel shows one metric for both causal and clumped variants. (a) False positive rate (FPR) of the HET test. (b) Deming regression slope with $\beta_{\mathrm{afr}} \sim \beta_{\mathrm{eur}}$. (c) Deming regression slope with $\beta_{\mathrm{eur}} \sim \beta_{\mathrm{afr}}$. (d) Pearson correlation. (e) OLS regression slope with $\beta_{\mathrm{afr}} \sim \beta_{\mathrm{eur}}$. (f) OLS regression slope with $\beta_{\mathrm{eur}} \sim \beta_{\mathrm{afr}}$. The 95% confidence intervals were based on 100 random subsamplings, with each sample consisting of 500 SNPs (Methods). Numerical results are reported in Supplementary Table 13.
Extended Data Fig. 8 | Simulation with multiple causal variants at other sample sizes (Fig. 6d-f).
Simulations were based on chromosome 1 (515,087 SNPs) and 17,299 PAGE individuals. We drew 62, 125, 250, 500 and 1,000 causal variants to simulate different levels of polygenicity, such that on average there were approximately 0.25, 0.5, 1.0, 2.0 and 4.0 causal variants per Mb. The heritability explained by all causal variants was fixed at $h_g^2 = 10\%$. (a-c) False positive rate of the HET test for the causal variants and clumped variants. (d-f) Deming regression slope of estimated ancestry-specific effects ($\beta_{\mathrm{eur}} \sim \beta_{\mathrm{afr}}$) for the causal variants and clumped variants. The 95% confidence intervals were based on 100 random subsamplings, with each subsample consisting of n = 50, 100 or 500 SNPs (instead of n = 1,000 SNPs in Fig. 6c, d) (Methods).
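The relation between the number of causal variants drawn and the quoted densities per Mb can be checked with a short calculation. This is a minimal sketch, not the authors' simulation code: the chromosome 1 length of roughly 249 Mb is an assumption (the caption does not state it), and the even split of heritability across causal variants is an illustrative simplification of "explained by all causal variants".

```python
# Assumption: human chromosome 1 spans roughly 249 Mb (not stated in the caption).
CHR1_LENGTH_MB = 249.0
TOTAL_H2 = 0.10  # heritability explained by all causal variants (from the caption)

causal_counts = [62, 125, 250, 500, 1000]

# Average number of causal variants per Mb for each polygenicity setting.
densities = [n / CHR1_LENGTH_MB for n in causal_counts]

# Illustrative simplification: heritability split evenly across causal variants.
per_variant_h2 = [TOTAL_H2 / n for n in causal_counts]

for n, d, h in zip(causal_counts, densities, per_variant_h2):
    print(f"{n:5d} causal variants: {d:.2f} per Mb, {h:.2e} heritability each")
```

As polygenicity increases at fixed total heritability, each causal variant explains less variance, so individual effects become harder to estimate precisely.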
Extended Data Fig. 9 | Additional results for simulations with single causal variant with varying $\beta_{\mathrm{eur}} : \beta_{\mathrm{afr}}$ and $h_g^2$.
Simulations were based on 100 regions each spanning 20 Mb on chromosome 1 from 17,299 PAGE individuals. In each simulation, we randomly selected a single causal variant and simulated quantitative phenotypes in which the causal variant had varying causal effects across ancestries and was expected to explain a fixed amount of heritability (0.2%, 0.6%, 1.0%, 2.0%, 5.0%). We provide results for both causal variants and LD-clumped variants. We separate results into two rows for better visualization: upper row (a-c), $\beta_{\mathrm{eur}} : \beta_{\mathrm{afr}} = 0.9, 1.0, 1.1$; lower row (d-f), $\beta_{\mathrm{eur}} : \beta_{\mathrm{afr}} = 0.0$. We show results for the false positive rate (FPR) of the HET test, the Deming regression slope with $\beta_{\mathrm{eur}} \sim \beta_{\mathrm{afr}}$, and the OLS regression slope with $\beta_{\mathrm{eur}} \sim \beta_{\mathrm{afr}}$. The 95% confidence intervals were based on 100 random subsamplings, with each sample consisting of 500 SNPs (Methods). Numerical results and further discussions are provided in Supplementary Table 15.
Fig. 1 | Concepts of estimating similarity in the causal effects across local ancestries.
a, For a given trait, with phased genotype (paternal haplotype at the top and maternal haplotype at the bottom) and inferred local ancestry (denoted by color), we investigate whether $\beta_{s,\mathrm{afr}} \approx \beta_{s,\mathrm{eur}}$ across causal SNPs $s$. b, We focus on estimating the genome-wide correlation of genetic effects across ancestries, $r_{\mathrm{admix}} = \mathrm{Cor}[\beta_{\mathrm{afr}}, \beta_{\mathrm{eur}}]$, which is the regression slope (orange line) of ancestry-specific causal effects. For reference, the gray dashed line corresponds to $\beta_{\mathrm{afr}} = \beta_{\mathrm{eur}}$.
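The statement that the correlation of ancestry-specific effects coincides with a regression slope can be illustrated numerically. The sketch below assumes the two sets of effects have equal variance (in which case slope and correlation agree) and draws hypothetical effects from a bivariate normal; the function name `simulate_effects` and all parameter values are illustrative assumptions. It is not the paper's estimator, which infers $r_{\mathrm{admix}}$ from phenotypes rather than from known causal effects.

```python
import numpy as np

def simulate_effects(r_admix, n_snps, var=1.0, seed=0):
    """Draw hypothetical (beta_afr, beta_eur) pairs with correlation r_admix
    and equal variance `var` in both ancestries."""
    rng = np.random.default_rng(seed)
    cov = var * np.array([[1.0, r_admix], [r_admix, 1.0]])
    return rng.multivariate_normal(np.zeros(2), cov, size=n_snps)

betas = simulate_effects(r_admix=0.95, n_snps=100_000)
beta_afr, beta_eur = betas[:, 0], betas[:, 1]

# Pearson correlation of the ancestry-specific effects.
corr = np.corrcoef(beta_afr, beta_eur)[0, 1]

# OLS slope of beta_eur regressed on beta_afr; equals the correlation
# when both effect vectors have the same variance.
slope = np.cov(beta_afr, beta_eur)[0, 1] / np.var(beta_afr, ddof=1)

print(f"correlation = {corr:.3f}, slope = {slope:.3f}")
```

If the variances differed between ancestries, the slope would be the correlation scaled by the ratio of standard deviations, so the two quantities would no longer match.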
Fig. 2 | Results of genetic correlation $r_{\mathrm{admix}}$ estimation in genome-wide simulations.
Simulations were based on 17,299 PAGE individuals and 6.9 million genome-wide imputed variants with MAF > 0.5% in both ancestries. We fixed the proportion of causal variants $p_{\mathrm{causal}}$ as 0.1% and varied genetic correlation $r_{\mathrm{admix}} = 0.90$, 0.95 and 1.0. a, Impact of using HapMap3 or imputed variants in estimation. We varied simulated genome-wide heritability $h_g^2 = 0.1$, 0.25 and 0.5. b, Impact of selecting common variants at different MAF thresholds in estimation. $h_g^2$ was fixed to 0.25, and imputed variants at different MAF thresholds were used in estimation. c, Impact of prior assumption in estimation. $h_g^2$ was fixed to 0.25, and imputed variants were used in estimation. For each simulated genetic architecture, we plot the mode and 95% credible interval based on the meta-analysis across 100 simulations (Methods). Numerical results are reported in Supplementary Tables 1-4 (including results for other $p_{\mathrm{causal}}$, $r_{\mathrm{admix}}$).
Fig. 3 | Similarity of causal effects and marginal effects across local ancestries meta-analyzed across PAGE, UKBB and AoU.
a, We plot the trait-specific estimated $r_{\mathrm{admix}}$ for 16 traits. For each trait, dots denote the estimation modes; bold lines and thin lines denote 50% and 95% highest density credible intervals, respectively. Traits are ordered according to the total number of individuals included in the estimation (shown in parentheses). These traits are selected to be displayed either because they have the largest total sample sizes, or because the associated SNPs of these traits exhibit heterogeneity in marginal effects (see the panel on the right). We also display the meta-analysis results across 60 study–trait pairs (38 unique traits). Numerical results are provided in Table 1. b, Comparison of $r_{\mathrm{admix}}$ (n = 38 traits) to meta-analysis results from transcontinental genetic correlation of African versus European (n = 26 traits) and East Asian versus European (n = 31 traits). Point estimates and 95% confidence intervals are denoted using triangles and lines. c, We plot the ancestry-specific marginal effects for 217 GWAS significant clumped trait–SNP pairs across 60 study–trait pairs. Trait–SNP pairs with significant heterogeneity in marginal effects by ancestry ($p_{\mathrm{HET}} < 0.05/217$ via HET test) are denoted in color (non-significant trait–SNP pairs are denoted as black dots; some black dots with large differences across ancestries were not significant because of the large standard errors in estimated effects). Numerical results are reported in Supplementary Table 11. Point estimates and 95% confidence intervals for Deming regression slopes of $\hat{\beta}^{(m)}_{s,\mathrm{eur}} \sim \hat{\beta}^{(m)}_{s,\mathrm{afr}}$ are provided either for all 217 SNPs (red), or for 193 SNPs after excluding 24 MCH-associated SNPs (blue). RBC, red blood cell; CRP, C-reactive protein; LDL, low-density lipoprotein cholesterol; HDL, high-density lipoprotein cholesterol; TC, total cholesterol; BMI, body mass index; WHR, waist-to-hip ratio.
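The significance threshold in panel c is a Bonferroni correction of 0.05 over 217 trait–SNP pairs. The sketch below computes that threshold and applies it with a simple Wald-type test for a difference between two effect estimates. The Wald test is a stand-in chosen for illustration and is not the paper's HET test; it also assumes the two estimates are independent, which may not hold for ancestry-specific effects estimated in the same individuals. All effect sizes and standard errors are hypothetical.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

def wald_het_p(beta_afr, se_afr, beta_eur, se_eur):
    """Wald-type test of beta_afr == beta_eur.
    Assumption: the two estimates are independent (no covariance term)."""
    z = (beta_afr - beta_eur) / math.sqrt(se_afr ** 2 + se_eur ** 2)
    return two_sided_p(z)

n_pairs = 217                 # trait-SNP pairs tested (from the caption)
threshold = 0.05 / n_pairs    # Bonferroni-corrected significance threshold

# Hypothetical estimates: one pair with similar effects, one with different effects.
p_similar = wald_het_p(0.10, 0.02, 0.11, 0.02)
p_different = wald_het_p(0.30, 0.02, 0.10, 0.02)

print(f"threshold = {threshold:.2e}")
print(f"similar effects:   p = {p_similar:.3g}, significant = {p_similar < threshold}")
print(f"different effects: p = {p_different:.3g}, significant = {p_different < threshold}")
```

A pair with a visibly large difference in point estimates can still fail this threshold when its standard errors are large, which matches the caption's remark about non-significant black dots.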
Fig. 4 | Induced heterogeneities in marginal effects across local ancestries.
a, Illustration of how different LD patterns across local ancestries can induce differential tagging between a causal SNP and a tag SNP (b) or another causal SNP (c). LD strengths between the two SNPs are indicated both by the thickness of arrows and by the color shades of '*' elements in the LD matrices. b, Example of a single causal SNP with no heterogeneity. Causal effects are the same across local ancestries, and the estimated marginal effects at the causal SNP will also be very similar given sufficient sample size. However, because of differential tagging across local ancestries, the estimated marginal effects evaluated at the tag SNP differ. c, Example of multiple causal SNPs with no heterogeneity. Causal effects for both SNPs are the same across local ancestries. In this example, the correlation between the two causal variants is higher for genotypes in African local ancestries than for those in European local ancestries. Therefore, African ancestry-specific genotypes tag more effects, creating different ancestry-specific marginal effects at each causal SNP.
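The two scenarios can be reproduced with a small calculation. The sketch relies on the standard approximation that, for standardized genotypes, marginal effects equal the LD matrix times the causal effects; treating each local ancestry with its own LD matrix in this way is a simplifying assumption, and the LD values and effect sizes are hypothetical.

```python
import numpy as np

def marginal_effects(ld, causal):
    """Marginal per-SNP effects implied by causal effects under LD matrix `ld`
    (standardized genotypes assumed): marginal = LD @ causal."""
    return np.asarray(ld) @ np.asarray(causal)

def ld_matrix(r):
    """2x2 LD (correlation) matrix for two SNPs with correlation r."""
    return np.array([[1.0, r], [r, 1.0]])

# Scenario b: SNP 1 is causal, SNP 2 is a non-causal tag.
# Causal effects are identical across ancestries; only LD differs (hypothetical values).
causal_b = [0.1, 0.0]
marg_b_afr = marginal_effects(ld_matrix(0.3), causal_b)
marg_b_eur = marginal_effects(ld_matrix(0.8), causal_b)

# Scenario c: both SNPs are causal with identical effects across ancestries;
# LD between them is higher on African than on European local ancestry.
causal_c = [0.1, 0.1]
marg_c_afr = marginal_effects(ld_matrix(0.6), causal_c)
marg_c_eur = marginal_effects(ld_matrix(0.2), causal_c)

print("b, tag SNP marginal effect (afr, eur):", marg_b_afr[1], marg_b_eur[1])
print("c, SNP 1 marginal effect (afr, eur):  ", marg_c_afr[0], marg_c_eur[0])
```

In scenario b the causal SNP has the same marginal effect in both ancestries while the tag SNP does not; in scenario c both causal SNPs show larger marginal effects on the ancestry with stronger LD, even though the causal effects are identical.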
Fig. 5 | Pitfalls of including local ancestry in estimating heterogeneity.
In each simulation, we selected a single causal variant and simulated quantitative phenotypes in which the causal variant explains heritability $h_g^2 = 0.6\%$; we also varied the ratio of effects across ancestries, $\beta_{\mathrm{eur}} : \beta_{\mathrm{afr}}$. a, False positive rate in null simulations with $\beta_{\mathrm{eur}} : \beta_{\mathrm{afr}} = 1.0$. b, Power to detect $\beta_{\mathrm{eur}} \neq \beta_{\mathrm{afr}}$ in power simulations with $\beta_{\mathrm{eur}} : \beta_{\mathrm{afr}} > 1$. We did not include 'lanc regressed' because it is not well calibrated in null simulations. We plot the mean and 95% confidence intervals, calculated via 100 random subsamplings with each sample consisting of 500 SNPs (Methods). Numerical results are reported in Supplementary Table 12.
Fig. 6 | Miscalibration of HET test/Deming regression/OLS regression in simulations with $r_{\mathrm{admix}} = 1$.
a–c, Simulations with a single causal variant. Each causal variant had the same causal effects across local ancestries and explained a fixed amount of heritability (0.2%, 0.6% and 1.0%): false positive rate (FPR) of the HET test (a); Deming regression slope (b) and OLS regression slope (c) of $\hat{\beta}^{(m)}_{\mathrm{eur}} \sim \hat{\beta}^{(m)}_{\mathrm{afr}}$. Numerical results are reported in Supplementary Table 13. d–f, Simulations with multiple causal variants, where we simulated different levels of polygenicity, such that on average there were approximately 0.25, 0.5, 1.0, 2.0 and 4.0 causal variants per Mb; causal variants had the same causal effects across local ancestries, and the heritability explained by all causal variants was fixed at $h_g^2 = 10\%$: FPR of the HET test (d); Deming regression slope (e) and OLS regression slope (f) of $\hat{\beta}^{(m)}_{\mathrm{eur}} \sim \hat{\beta}^{(m)}_{\mathrm{afr}}$. The 95% confidence intervals were based on 100 random subsamplings with each sample consisting of 1,000 SNPs (Methods). Results for other numbers of SNPs used for subsampling are shown in Extended Data Fig. 8. Numerical results are reported in Supplementary Table 14.
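The difference between the two slope estimators can be seen in a toy example where both ancestry-specific estimates are noisy measurements of the same underlying effect. The sketch below uses the textbook Deming regression formula with a known, constant ratio of error variances; whether the paper's analysis uses this exact form (rather than, for example, per-SNP standard errors) is not stated in the captions, so this is an assumption. Sample sizes and noise levels are hypothetical.

```python
import numpy as np

def ols_slope(x, y):
    """Ordinary least squares slope of y on x (ignores noise in x)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

def deming_slope(x, y, delta=1.0):
    """Deming regression slope of y on x.
    delta = (error variance in y) / (error variance in x), assumed known."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    sxx = np.var(x, ddof=1)
    syy = np.var(y, ddof=1)
    sxy = np.cov(x, y)[0, 1]
    diff = syy - delta * sxx
    return (diff + np.sqrt(diff ** 2 + 4.0 * delta * sxy ** 2)) / (2.0 * sxy)

# Hypothetical setup: identical true effects in both ancestries,
# each estimated with independent noise of equal variance.
rng = np.random.default_rng(0)
true_beta = rng.normal(0.0, 0.10, size=5000)
beta_afr_hat = true_beta + rng.normal(0.0, 0.05, size=5000)
beta_eur_hat = true_beta + rng.normal(0.0, 0.05, size=5000)

ols = ols_slope(beta_afr_hat, beta_eur_hat)
deming = deming_slope(beta_afr_hat, beta_eur_hat)

print(f"OLS slope    = {ols:.3f}  (attenuated toward 0 by noise in the predictor)")
print(f"Deming slope = {deming:.3f}  (close to 1 when the error ratio is correct)")
```

In this idealized setting Deming regression recovers a slope near 1 while OLS is attenuated. The caption's point is that with differential tagging and polygenicity, the marginal effects themselves differ by ancestry, so even Deming regression can deviate from 1 when causal effects are identical.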
