PLoS Biol. 2013 Sep;11(9):e1001661.
doi: 10.1371/journal.pbio.1001661. Epub 2013 Sep 17.

Generalization and dilution of association results from European GWAS in populations of non-European ancestry: the PAGE study

Christopher S Carlson et al. PLoS Biol. 2013 Sep.

Abstract

The vast majority of genome-wide association study (GWAS) findings reported to date are from populations with European Ancestry (EA), and it is not yet clear how broadly the genetic associations described will generalize to populations of diverse ancestry. The Population Architecture Using Genomics and Epidemiology (PAGE) study is a consortium of multi-ancestry, population-based studies formed with the objective of refining our understanding of the genetic architecture of common traits emerging from GWAS. In the present analysis of five common diseases and traits, including body mass index, type 2 diabetes, and lipid levels, we compare direction and magnitude of effects for GWAS-identified variants in multiple non-EA populations against EA findings. We demonstrate that, in all populations analyzed, a significant majority of GWAS-identified variants have allelic associations in the same direction as in EA, with none showing a statistically significant effect in the opposite direction, after adjustment for multiple testing. However, 25% of tagSNPs identified in EA GWAS have significantly different effect sizes in at least one non-EA population, and these differential effects were most frequent in African Americans where all differential effects were diluted toward the null. We demonstrate that differential LD between tagSNPs and functional variants within populations contributes significantly to dilute effect sizes in this population. Although most variants identified from GWAS in EA populations generalize to all non-EA populations assessed, genetic models derived from GWAS findings in EA may generate spurious results in non-EA populations due to differential effect sizes. Regardless of the origin of the differential effects, caution should be exercised in applying any genetic risk prediction model based on tagSNPs outside of the ancestry group in which it was derived. 
Models based directly on functional variation may generalize more robustly, but the identification of functional variants remains challenging.

Conflict of interest statement

The authors have declared that no competing interests exist. The contents of this paper are solely the responsibility of the authors and do not necessarily represent the official views of the NIH, the Centers for Disease Control, the Indian Health Service, or any other funding agency.

Figures

Figure 1. Generalization analysis in the PAGE populations.
We plot the ratio of the non-EA effect estimate to the EA effect estimate on the y-axis as an indicator of both consistency of direction (positive values are consistent with effects in the same direction) and relative magnitude of effect (consistent but weaker effects in the non-EA population will have ratios between 0 and 1). The p value for trait association in the PAGE European American population (pEA) is an indicator of the strength of the original association. For each index SNP, we plot this ratio against −log10(pEA). Data points are colored as follows: ambiguous SNPs are light blue (formula image and formula image), strictly generalized SNPs are dark blue (formula image and formula image), differentially generalized SNPs are dark red (formula image and formula image), and differential SNPs are pink (formula image and formula image). The y-axis has been constrained to (−4,4) for illustrative purposes; some loci yielded ratios outside this range, but pEA>0.05 for all of these. As expected, larger non-EA populations show less scatter in the ratio than the smaller non-EA populations (particularly Pacific Islanders), consistent with more precise estimates of the non-EA effect in the larger non-EA populations. Two clear trends are apparent in these plots. First, there is a trend toward ratios greater than zero in all populations, especially for stronger effects in EA (−log10(pEA)>10), reflecting consistency of direction between EA and non-EA populations. Second, a trend toward ratios greater than zero but less than one is observed in African Americans, representing the trend toward dilution in this population, relative to EA. The second trend is not apparent in the other non-EA populations. Similar plots of the ratio against observed allele frequency in the non-EA populations demonstrate that the allele frequency distribution for differential observations in AA is not different from the distribution of either ambiguous or strictly generalized loci, so the significantly diluted effects are not attributable to variants with low allele frequency in this population (Figure S1).
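As a reading aid for the quantities plotted in Figure 1, the following minimal Python sketch computes the non-EA to EA effect ratio and −log10(pEA) for one index SNP, and clips the ratio to the (−4, 4) display range mentioned in the caption. The function names and the numeric inputs are hypothetical illustrations, not values or code from the PAGE study, and the color-coding criteria are not reproduced because they are not recoverable from this page.

```python
import math


def effect_ratio(beta_non_ea, beta_ea):
    """Ratio of the non-EA effect estimate to the EA effect estimate."""
    if beta_ea == 0:
        raise ValueError("EA effect estimate must be non-zero")
    return beta_non_ea / beta_ea


def describe_ratio(ratio):
    """Interpret the ratio the way the Figure 1 caption describes it."""
    if ratio <= 0:
        return "not in the same direction"
    if ratio < 1:
        return "same direction, weaker in non-EA"
    return "same direction, at least as strong in non-EA"


def neg_log10_p(p_value):
    """x-axis value: -log10 of the EA association p value."""
    return -math.log10(p_value)


def clip_for_plot(ratio, limit=4.0):
    """Constrain the ratio to (-limit, limit) for display only."""
    return max(-limit, min(limit, ratio))


# Hypothetical effect estimates for a single index SNP.
ratio = effect_ratio(beta_non_ea=0.05, beta_ea=0.10)
print(describe_ratio(ratio), neg_log10_p(1e-10), clip_for_plot(ratio))
```

A ratio strictly between 0 and 1, as in this hypothetical input, corresponds to the diluted-effect pattern the caption describes for African Americans.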
Figure 2. Dilution of effect size at PSRC1 for LDL.
In panel (a), we show a locuszoom plot for the tagSNP rs599839 and LDL, using imputed data in a meta-analysis of more than 100,000 European individuals (image from the GLGC consortium locuszoom website [31]). The y-axis plots −log10(p value), which is a proxy for effect size, assuming similar allele frequencies. In panel (a) the size of the dot for each tagSNP represents the effective number of samples for which imputed data were available. The cluster of overlapping red dots at the top represents a bin of SNPs that are in very strong LD with the tagSNP, and have indistinguishable effect sizes in the EA study. Panel (b) shows data from our metabochip analysis in African Americans, but with dots color-coded using LD from the EA population. The scale of the y-axis has changed due to dramatically different sample sizes, but p value is still a useful proxy for effect size. Note how the tagSNP and several strongly associated SNPs (red data points) have effect sizes indistinguishable from background, while several other EA strongly associated SNPs remain significant, including rs12740374, the strongest signal in our data. Panel (c) shows our metabochip data again, but now color coding LD with the tagSNP rs599839 in our AA samples, rather than using EA LD. Rs599839 continues to tag several SNPs strongly in AA, and these are all among the SNPs with nonsignificant effect sizes in AA, while the SNPs with strongest residual signal are weakly tagged in AA. These data suggest that rs12740374 is the functional SNP; if so, then differential LD between rs12740374 and rs599839 in EA (r2>0.8) and AA (r2<0.2) would explain the diluted effect observed at rs599839 in AA.
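The dilution argument in the Figure 2 caption can be illustrated with a small calculation. The sketch below rests on assumptions that are not stated in the source: a single functional variant with the same per-standard-deviation effect in both populations, standardized genotypes, and no other signal in the region. Under those assumptions the expected magnitude of the marginal effect at a tagSNP is the functional effect multiplied by the square root of r2 between the two SNPs. The function names are hypothetical, and the r2 inputs are the boundary values quoted in the caption (r2>0.8 in EA, r2<0.2 in AA).

```python
import math


def expected_tag_effect(beta_functional, r2):
    """Expected magnitude of the marginal tagSNP effect (standardized genotypes).

    Assumes one functional variant; r2 is its squared correlation with the tagSNP.
    """
    if not 0.0 <= r2 <= 1.0:
        raise ValueError("r2 must lie in [0, 1]")
    return abs(beta_functional) * math.sqrt(r2)


def dilution_ratio(r2_target, r2_reference):
    """Expected tagSNP effect in the target population relative to the reference."""
    if r2_reference <= 0.0:
        raise ValueError("reference r2 must be positive")
    return math.sqrt(r2_target / r2_reference)


# Boundary values from the caption: r2 = 0.8 in EA and r2 = 0.2 in AA.
print(expected_tag_effect(1.0, 0.8))  # tagSNP effect in EA, per unit functional effect
print(expected_tag_effect(1.0, 0.2))  # tagSNP effect in AA, per unit functional effect
print(dilution_ratio(0.2, 0.8))       # AA tagSNP effect relative to EA
```

Because the caption gives inequalities rather than exact r2 values, the ratio at these boundary values is an upper bound on the expected AA to EA ratio under the stated assumptions.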
Figure 3. Examples of loci without evidence of association in AAmchip or fine mapping EA signal.
(a) At rs16996148 (CILP2/LDL) we are reasonably well powered, and no significant associations were observed in AA, suggesting that either the associated variant or the synthetic allele that tags it is EA-restricted. Similar null results at (b) rs5219 (KCNJ11/T2D) and (c) rs17145738 (MLXIPL/logTG) were underpowered to draw strong conclusions. (d) At rs780094 (GCKR/logTG) and (e) rs599839 (PSRC1/LDL) the index tagSNP from EA showed significantly diluted signal in AA (purple dot). However, in each region a tagged SNP showed an effect size consistent with the EA index tagSNP, and after adjustment for this variant no residual evidence for association was observed at any additional variants in the region. (f) At rs2954029 (TRIB1/logTG) a similar effect was observed, except that the strongest AA association was imperfectly tagged in EA (r2 = 0.33).
Figure 4. Examples of secondary alleles in the AA population.
(a) At rs28927680 (APOA1/C3/A4/A5 gene cluster, logTG) the index tagSNP fine maps (red point in upper right of a). Panel (b) shows residual signal in the same region after adjustment for genotype at this variant, and significant secondary signals are observed. (c) At FTO, the SNPs tagged by rs9939069 in EA are all null in the subsample, but a secondary association is observed at a very low-frequency SNP (rs75569526, MAF 1% in AAmchip). In this example the secondary SNP is the only significant association in the region from our subsample analysis. Panels (d–f) illustrate multiple, independent associations at CETP. At CETP, the significant residual signal after adjusting for the best signal in each EA-tagged bin (Figure S2) is consistent with multiple factors that might contribute to differential signal in the region. The number of independent statistical associations observed within the locus is a rough proxy for the number of functional alleles. Here we show a series of LocusZoom plots sequentially adjusting results for the SNP with the strongest observed association in the previous cycle. LD in EA samples is color coded relative to rs3764261 in all panels, and the region-wide threshold for significance after Bonferroni adjustment for the 84 SNPs genotyped in the 25 kb region (residual p<1.1×10−4) is shown as a horizontal red line. (d) CETP/HDL regional data adjusted only for ancestry. The strongest observed association at rs17231520 is indicated with an arrowhead. (e) After adjustment for genotype at rs17231520, the strongest residual association at rs4783961 is indicated with an arrowhead. (f) After adjustment for genotype at rs17231520 and rs4783961, the strongest residual association is still significant. These results suggest the presence of at least three statistically independent associations with HDL in the CETP region, in the AA population. Assuming that the functional variation has been directly genotyped, rather than tagged by LD, this would indicate the presence of at least three functional alleles, clustered within a 5 kb window spanning the putative CETP promoter region.
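The sequential adjustment described for panels (d–f) can be sketched as a simple forward conditional scan. This is an illustrative pure-Python outline, not the authors' pipeline: it omits the ancestry adjustment mentioned in the caption, uses a normal approximation for p values, and all names (`sequential_conditional`, `genotypes`, `threshold`) are hypothetical. Each round picks the SNP with the strongest residual association, stops if its p value does not pass the threshold, and otherwise removes that SNP's contribution from the phenotype and from every remaining SNP before the next round.

```python
import math


def _center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _residualize(vector, basis):
    """Remove from `vector` its projection onto one centered basis vector."""
    bb = _dot(basis, basis)
    if bb == 0:
        return list(vector)
    coef = _dot(vector, basis) / bb
    return [v - coef * b for v, b in zip(vector, basis)]


def sequential_conditional(genotypes, phenotype, threshold, max_rounds=10):
    """Forward conditional scan.

    genotypes: dict mapping SNP id -> list of allele dosages (one per sample).
    Returns a list of (snp_id, approximate_p) in the order selected.
    """
    n = len(phenotype)
    y = _center(phenotype)
    resid = {snp: _center(g) for snp, g in genotypes.items()}
    selected = []
    for _ in range(max_rounds):
        dof = n - 2 - len(selected)
        yy = _dot(y, y)
        if dof <= 0 or yy < 1e-9 or not resid:
            break
        best_snp, best_z = None, 0.0
        for snp, x in resid.items():
            xx = _dot(x, x)
            if xx < 1e-9:  # fully explained by SNPs already selected
                continue
            r = _dot(x, y) / math.sqrt(xx * yy)  # partial correlation
            r = max(-0.999999, min(0.999999, r))
            z = r * math.sqrt(dof / (1.0 - r * r))
            if best_snp is None or abs(z) > abs(best_z):
                best_snp, best_z = snp, z
        if best_snp is None:
            break
        # Two-sided p value, normal approximation to the t statistic.
        p = math.erfc(abs(best_z) / math.sqrt(2.0))
        if p >= threshold:
            break
        selected.append((best_snp, p))
        basis = resid.pop(best_snp)
        y = _residualize(y, basis)
        resid = {snp: _residualize(x, basis) for snp, x in resid.items()}
    return selected
```

Under these assumptions, the number of SNPs returned plays the role of the caption's count of statistically independent associations; the region-wide threshold quoted in the caption could be passed as `threshold=1.1e-4`.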

Comment in

  • Deep genealogy and the dilution of risk.
    Roberts RG. PLoS Biol. 2013 Sep;11(9):e1001660. doi: 10.1371/journal.pbio.1001660. Epub 2013 Sep 17. PMID: 24068892.
