[Preprint]. 2024 Feb 14:2024.02.12.579913.
doi: 10.1101/2024.02.12.579913.

Controlling for polygenic genetic confounding in epidemiologic association studies

Zijie Zhao et al. bioRxiv.

Abstract

Epidemiologic associations estimated from observational data are often confounded by genetics due to pervasive pleiotropy among complex traits. Many studies either neglect genetic confounding altogether or rely on adjusting for polygenic scores (PGS) in regression analysis. In this study, we show that the commonly employed PGS approach is inadequate for removing genetic confounding because of measurement error and model misspecification. To tackle this challenge, we introduce PENGUIN, a principled framework for polygenic genetic confounding control based on variance component estimation. In addition, we present extensions of this approach that can estimate genetically unconfounded associations using GWAS summary statistics alone as input, as well as associations between multiple generations of study samples. Through simulations, we demonstrate superior statistical properties of PENGUIN compared with existing approaches. Applying our method to multiple population cohorts, we reveal and remove substantial genetic confounding in the associations of educational attainment with various complex traits and between parental and offspring education. Our results show that PENGUIN is an effective solution for genetic confounding control in observational data analysis, with broad applications in future epidemiologic association studies.
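
The measurement-error point in the abstract can be illustrated with a toy simulation. The sketch below is not PENGUIN and does not reproduce the authors' simulations; it is a minimal illustration under assumed settings (a single latent genetic component `g`, unit-variance noise, a true exposure effect `beta` of 0.2, and a PGS modeled as `g` plus independent noise). All names and parameter values are hypothetical. It compares the exposure coefficient from a marginal regression, a regression adjusting for the noisy PGS, and an "oracle" regression adjusting for the true genetic component.

```python
import numpy as np


def simulate(n=20000, beta=0.2, noise_sd=1.0, seed=0):
    """Toy model of genetic confounding (assumed setup, not the paper's)."""
    rng = np.random.default_rng(seed)
    g = rng.standard_normal(n)                    # latent genetic confounder
    x = g + rng.standard_normal(n)                # exposure, partly genetic
    y = beta * x + g + rng.standard_normal(n)     # outcome, confounded by g
    pgs = g + noise_sd * rng.standard_normal(n)   # PGS = g measured with error

    def exposure_slope(covariates):
        # Ordinary least squares with intercept; return the coefficient on x.
        design = np.column_stack([np.ones(n), x] + covariates)
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        return coef[1]

    return {
        "marginal": exposure_slope([]),          # no adjustment
        "pgs_adjusted": exposure_slope([pgs]),   # adjust for noisy PGS
        "oracle": exposure_slope([g]),           # adjust for true g
    }


if __name__ == "__main__":
    for name, estimate in simulate().items():
        print(f"{name}: {estimate:.3f}")
```

Under these assumptions, adjusting for the noisy PGS moves the estimate toward the true effect but leaves residual confounding, whereas adjusting for the true genetic component recovers it. In real data the true component is unobserved, which is the gap a variance-component approach is meant to address.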

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Figure 1. Workflow for removing genetic confounding effects in observational data analysis.
(A) Both PENGUIN and PENGUIN-S quantify the exposure-outcome association while adjusting for all SNPs in the analysis. PENGUIN uses both summary-level GWAS data and individual-level phenotype data as inputs. PENGUIN-S employs a summary-statistics-based approach without relying on any individual-level data. (B) Methods using PGS as input hypothesize that the genetic confounding can be quantified by PGS of a particular trait. These methods demand independent GWAS summary statistics and individual-level genotype and phenotype data for PGS calculation and association analysis.
Figure 2. Simulation results.
(A and C) Exposure-outcome associations estimated by different methods across 100 replications. (B and D) 95% confidence interval coverage for each method across 100 replications. GWAS summary-level data and individual-level testing dataset are independent in A-B (0% sample overlap) while all testing samples are included in the GWAS data in C-D (100% sample overlap). Y-axis: exposure effects for A and C and 95% coverage for B and D; X-axis: true exposure effect size. Red dashed lines are true effects in A and C and 95% coverage threshold in B and D. Across settings shown in this figure, proportion of causal variants is 0.1% and true genetic confounding effects are the entire genetic component for the exposure. Detailed simulation settings are described in Supplementary Table 1. Results for other simulation settings are summarized in Supplementary Tables 2–4 and Supplementary Figures 1–6.
Figure 3. EA effects on various complex traits in UKB.
(A-B) Comparing EA effects on complex traits estimated by PENGUIN, PGSX, and marginal regression. Effect size point estimates and standard errors for quantitative and binary traits are shown in panels A and B, respectively. Y-axis: outcome trait; X-axis: EA effect size estimates (per SD change in EA). (C) Comparison between EA effects estimated by PENGUIN and PENGUIN-S for 15 quantitative traits. Y-axis: PENGUIN estimates; X-axis: PENGUIN-S estimates. The black solid line is the diagonal line. Standard error bars from both approaches are shown for all data points. Full association results are summarized in Supplementary Tables 7–8 and Supplementary Figures 8–9.
Figure 4. Associations of EA with SCZ and BD.
(A-B) Genetic correlation between SCZ/BD and cognitive/non-cognitive/full EA genetic components. Y-axis: EA genetic components; X-axis: genetic correlation. (C-D) Estimated EA effects on SCZ and BD after controlling for cog/non-cog/full EA genetic components. Y-axis: effect size estimates (per SD change in EA); X-axis: adjusted EA genetic components. Standard error bars are shown for all data points. Full association results, heritability and genetic covariance/correlation estimates are summarized in Supplementary Table 9. SCZ: schizophrenia; BD: bipolar disorder.
Figure 5. Effects of parental EA on offspring EA.
Paternal EA (A) and maternal EA (B) effect sizes are estimated from PENGUIN, PGSY, and marginal regression. Y-axis: EA effect size estimates (per SD change in parental EA); X-axis: association approaches. Additional association results are summarized in Supplementary Table 10 and Supplementary Figure 11.
