Nat Genet. 2022 Sep;54(9):1345-1354.
doi: 10.1038/s41588-022-01158-0. Epub 2022 Aug 22.

Gene-environment correlations across geographic regions affect genome-wide association studies

Abdel Abdellaoui et al. Nat Genet. 2022 Sep.

Abstract

Gene-environment correlations affect associations between genetic variants and complex traits in genome-wide association studies (GWASs). Here we showed in up to 43,516 British siblings that educational attainment polygenic scores capture gene-environment correlations, and that migration extends these gene-environment correlations beyond the family to broader geographic regions. We then ran GWASs on 56 complex traits in up to 254,387 British individuals. Controlling for geographic regions significantly decreased the heritability for socioeconomic status (SES)-related traits, most strongly for educational attainment and income. For most traits, controlling for regions significantly reduced genetic correlations with educational attainment and income, most significantly for body mass index/body fat, sedentary behavior and substance use, consistent with gene-environment correlations related to regional socio-economic differences. The effects of controlling for birthplace and current address suggest both passive and active sources of gene-environment correlations. Our results show that the geographic clustering of DNA and SES introduces gene-environment correlations that affect GWAS results.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Schematic illustration of gene–environment correlations (rGE).
The geographic region at the bottom is the birthplace of the sibling pair, who migrate as adults to two different regions at the top; the sibling with a higher polygenic score migrates to the top-left region with healthier environmental influences. Passive gene–environment correlations occur when the environment that parents provide (at the bottom) correlates with a heritable trait, and active gene–environment correlations come about when heritable behaviors (for example, migration; from bottom to top) lead to correlations between polygenic effects and environmental factors.
Fig. 2. Assessment centers, local authority and MSOA regions.
The two bottom maps show the locations of birthplace and current address of participants analyzed in the GWASs. Histograms show distributions of these participants across assessment centers and geographic regions for birthplaces and current addresses. All maps and histograms show the same 254,577 UK Biobank participants. Maps were adapted from 2011 Census aggregate data (UK Data Service, February 2017 edition). Office for National Statistics; National Records of Scotland; Northern Ireland Statistics and Research Agency (2017).
Fig. 3. Results of sibling educational attainment polygenic score analyses for 56 complex traits.
Sample sizes range from 11,093 to 43,516 (see Supplementary Table 1 for sample size per trait). Error bars in a, c and d indicate 95% confidence intervals. a, Comparison of the within-effect estimate of model 2 and the within-effect estimate of model 4. Trait names are shown for the ten traits that showed a significant decrease (p-value of the difference based on 1,000 bootstraps). b, Comparison of the marginal R² of models 2 and 4 (that is, variance explained by all fixed effects, including age and sex). Trait names are shown for the six traits for which the difference between models 2 and 4 was more than 2%. c, Comparison of the between-family effect and between-region effect estimates of model 4. d, Comparison of the within-region effect and between-region effect estimates of model 5.
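The within-family and between-family effects named in this caption are typically obtained by splitting each sibling's polygenic score into a family mean and a deviation from that mean. The sketch below illustrates that common decomposition only; it is not taken from the paper, whose exact model specifications are not given here, and the function name and input layout are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def split_polygenic_scores(scores):
    """Split polygenic scores into between- and within-family components.

    `scores` is a list of (family_id, polygenic_score) pairs (a hypothetical
    input layout). Returns (family_id, between, within) triples in the same
    order, where `between` is the family mean and `within` is the individual's
    deviation from that mean.
    """
    by_family = defaultdict(list)
    for family_id, pgs in scores:
        by_family[family_id].append(pgs)

    family_mean = {fam: mean(values) for fam, values in by_family.items()}

    return [
        (family_id, family_mean[family_id], pgs - family_mean[family_id])
        for family_id, pgs in scores
    ]


# Made-up scores for two sibling pairs, for illustration only.
siblings = [("fam1", 1.0), ("fam1", 3.0), ("fam2", 0.5), ("fam2", -0.5)]
for family_id, between, within in split_polygenic_scores(siblings):
    print(family_id, between, within)
```

Regressing a trait on both components at once would then give separate within-family and between-family estimates; the within component sums to zero inside each family by construction.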
Fig. 4. SNP-based heritabilities of 56 complex traits, corrected and uncorrected for geographic region.
The panels show the estimated SNP-based heritabilities after controlling for local authority or MSOA regions based on birthplace and/or current address. Error bars indicate 95% confidence intervals. Trait names are shown for traits with a significant change in SNP-based heritability (FDR-corrected P < 0.05). Sample sizes of the GWASs ranged from 63,780 to 254,387 (see Supplementary Table 1 for sample size per trait).
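The captions flag traits by an FDR-corrected P < 0.05 without naming the correction procedure. As a minimal sketch, assuming the widely used Benjamini-Hochberg step-up procedure (an assumption, not something stated in the source), adjusted P values can be computed as follows; the input values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values, in the input order.

    Assumes the BH step-up procedure; the source does not say which FDR
    method was used.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical per-trait P values.
raw = [0.01, 0.04, 0.03, 0.20]
adjusted = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adjusted]
print(adjusted, significant)
```

A trait would be labeled in the figure only if its adjusted value falls below 0.05.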
Fig. 5. The ratio of decrease in SNP-based heritability after controlling for geographic region for 56 complex traits.
The ratio of the decrease was computed by dividing the corrected SNP-based heritability estimate by the uncorrected SNP-based heritability estimate. Error bars indicate 95% confidence intervals. Asterisks indicate FDR-corrected P < 0.05, corresponding to a significant change in SNP-based heritability. Sample sizes of the GWASs ranged from 63,780 to 254,387 (see Supplementary Table 1 for sample size per trait).
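The ratio described in this caption is a single division of the corrected estimate by the uncorrected one. A minimal sketch follows; the function name and the heritability values are hypothetical and are not results from the paper.

```python
def heritability_ratio(h2_corrected, h2_uncorrected):
    """Corrected SNP-based heritability divided by the uncorrected estimate.

    A value below 1 means the estimate dropped after controlling for region.
    """
    if h2_uncorrected <= 0:
        raise ValueError("uncorrected heritability must be positive")
    return h2_corrected / h2_uncorrected


# Made-up values: an estimate that halves after controlling for region.
print(heritability_ratio(0.10, 0.20))  # prints 0.5
```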
Fig. 6. The change in absolute genetic correlations with educational attainment (EA, top) and household income (HI, bottom).
The genetic correlations were computed with LD score regression. The change in absolute genetic correlation is shown to visualize the change in the strength of the genetic relationships with educational attainment or household income (the directions of the genetic correlations vary between traits and are displayed in Extended Data Figs. 3 and 4). Error bars indicate 95% confidence intervals. Asterisks indicate FDR-corrected P < 0.05, corresponding to a significant change in genetic correlation. Sample sizes of the GWASs ranged from 63,780 to 254,387 (see Supplementary Table 1 for sample size per trait).
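Because the direction of the genetic correlation differs between traits, the caption compares magnitudes rather than signed values. The sketch below shows one way to compute such a change; the sign convention (corrected minus uncorrected) and the example values are assumptions for illustration, not taken from the paper.

```python
def change_in_absolute_rg(rg_uncorrected, rg_corrected):
    """Change in the strength of a genetic correlation, ignoring its sign.

    Assumed convention: negative values mean the correlation moved toward
    zero after controlling for region.
    """
    return abs(rg_corrected) - abs(rg_uncorrected)


# Made-up example: a negative correlation that weakens from -0.40 to -0.25.
delta = change_in_absolute_rg(-0.40, -0.25)
print(round(delta, 2))  # prints -0.15
```

Using magnitudes means a weakening positive correlation and a weakening negative correlation both produce a negative change, so traits can be shown on one axis.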
Extended Data Fig. 1. Results of sibling educational attainment polygenic score analyses for 56 complex traits.
Sample sizes range from 11,093 to 43,516 (see Supplementary Table 1 for sample size per trait). Panel a shows the comparison between the individual-level effect estimate of model 1 and the within-family effect estimate from model 2. Panel b shows the comparison between the between-family effect estimate of model 2 and the between-family effect estimate from model 4. A total of 38 traits showed a significant decrease (FDR-corrected P value of the difference based on 1,000 bootstraps), of which the top five are: household income (p = 1.6 × 10^−103), educational attainment (p = 1.6 × 10^−83), time spent watching TV (p = 4 × 10^−61), body fat (p = 7 × 10^−43) and BMI (p = 5 × 10^−40).
Extended Data Fig. 2. Variation explained by regional differences.
Linear mixed model results, with phenotype as the dependent variable and region as a random effect (N = 77,111–309,387 unrelated individuals). All FDR-corrected P values were < 0.05.
Extended Data Fig. 3. SNP-based heritabilities of 56 complex traits, corrected and uncorrected for geographic variables.
The panels show the estimated SNP-based heritabilities after controlling for latitude + longitude or assessment centers based on birthplace and/or current address. Error bars indicate 95% confidence intervals. Trait names are shown for traits that showed a significant change in SNP-based heritability (FDR-corrected P < 0.05). Sample sizes of the GWASs range from 63,780 to 254,387 (see Supplementary Table 1 for sample size per trait).
Extended Data Fig. 4. The ratio of decrease in SNP-based heritability after controlling for geographic variables (latitude + longitude or assessment centers) for 56 complex traits.
The ratio of the decrease is computed by dividing the corrected SNP-based heritability estimate by the uncorrected estimate. Error bars indicate 95% confidence intervals. Asterisks indicate FDR-corrected P < 0.05, corresponding to a significant change in SNP-based heritability. Sample sizes of the GWASs range from 63,780 to 254,387 (see Supplementary Table 1 for sample size per trait).
Extended Data Fig. 5. The change in absolute genetic correlations with educational attainment (EA, top) and household income (HI, bottom) after controlling for geographic variables (latitude + longitude or assessment centers).
The genetic correlations were computed with LDSC regression. We display the change in absolute genetic correlation to visualize the change in the strength of the genetic relationships with EA/HI (the directions of the genetic correlations vary between traits and are displayed in Extended Data Fig. 6). Error bars indicate 95% confidence intervals. Asterisks indicate FDR-corrected P < 0.05, corresponding to a significant change in genetic correlation. Sample sizes of the GWASs range from 63,780 to 254,387 (see Supplementary Table 1 for sample size per trait).
Extended Data Fig. 6. Genetic correlations (rg) with educational attainment (EA) and household income (HI) as computed with LDSC regression, before and after controlling for Local Authority or MSOA region, based on birthplace, current address, or birthplace + current address.
Error bars indicate 95% confidence intervals. Sample sizes of the GWASs range from 63,780 to 254,387 (see Supplementary Table 1 for sample size per trait).
Extended Data Fig. 7. Genetic correlations (rg), as computed with LDSC regression, with cognitive and non-cognitive skills as extracted from the educational attainment GWAS by Demange et al. (2019), before and after controlling for MSOA region, based on birthplace + current address.
Error bars indicate 95% confidence intervals. Sample sizes of the GWASs range from 63,780 to 254,387 (see Supplementary Table 1 for sample size per trait).
Extended Data Fig. 8. The change in absolute genetic correlations with cognitive and non-cognitive skills as extracted from the educational attainment GWAS by Demange et al. (2019) after controlling for MSOA region, based on birthplace + current address.
The genetic correlations were computed with LDSC regression. We display the change in absolute genetic correlation to visualize the change in the strength of the genetic relationships (the directions of the genetic correlations vary between traits and are displayed in Extended Data Fig. 7). Error bars indicate 95% confidence intervals. Asterisks indicate FDR-corrected P < 0.05, corresponding to a significant change in genetic correlation. Sample sizes of the GWASs range from 63,780 to 254,387 (see Supplementary Table 1 for sample size per trait).
Extended Data Fig. 9. SNP-based heritability estimates of educational attainment (EA) under different GWAS designs.
Error bars indicate 95% confidence intervals. The sample sizes are: Lee et al.: N = 1,131,881; Abdellaoui et al. (2022): N = 252,521; Wu et al. (2020): N = 24,434; Young et al. (2020)/Wu et al. (2020): N = 22,207; Howe et al. (2022): N = 128,777. The results from Abdellaoui et al. show SNP-based heritability estimates after controlling for MSOA regions.
Extended Data Fig. 10. Directed acyclic graphs (DAGs) of the possible causal processes underlying the gene–environment correlations.
P = parental influence, BP = birthplace, CA = current address, SNP = single nucleotide polymorphism, Y = phenotypic outcome.
