Review
Annu Rev Genomics Hum Genet. 2018 Aug 31;19:303-327.
doi: 10.1146/annurev-genom-083117-021731. Epub 2018 Apr 25.

Inferring Causal Relationships Between Risk Factors and Outcomes from Genome-Wide Association Study Data

Stephen Burgess et al. Annu Rev Genomics Hum Genet.

Abstract

An observational correlation between a suspected risk factor and an outcome does not necessarily imply that interventions on levels of the risk factor will have a causal impact on the outcome (correlation is not causation). If genetic variants associated with the risk factor are also associated with the outcome, then this increases the plausibility that the risk factor is a causal determinant of the outcome. However, if the genetic variants in the analysis do not have a specific biological link to the risk factor, then causal claims can be spurious. We review the Mendelian randomization paradigm for making causal inferences using genetic variants. We consider monogenic analysis, in which genetic variants are taken from a single gene region, and polygenic analysis, which includes variants from multiple regions. We focus on answering two questions: When can Mendelian randomization be used to make reliable causal inferences, and when can it be used to make relevant causal inferences?

Keywords: causal inference; drug discovery; genetic epidemiology; instrumental variable; target validation.
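The abstract's central point can be illustrated with a small simulation. A confounded observational association can appear even when there is no causal effect. A variant that affects the outcome only through the risk factor does not reproduce that spurious association, but a variant with an additional route to the outcome does. This is a minimal sketch under a hypothetical linear model. The allele frequency, effect sizes, and the function names `ols_slope` and `simulate` are illustrative assumptions and are not taken from the review. The direct variant-to-outcome term stands in for a form of horizontal pleiotropy.

```python
import random


def ols_slope(x, y):
    """Least-squares slope of y regressed on x (intercept included)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def simulate(n=50_000, pleiotropic_effect=0.0, seed=1):
    """Return (observational slope, Mendelian randomization ratio estimate).

    Hypothetical data-generating model (illustrative values only):
      G ~ Binomial(2, 0.3)              allele count of one variant
      U ~ N(0, 1)                       unmeasured confounder
      X = 0.5*G + U + noise             risk factor
      Y = U + pleiotropic_effect*G + noise
    X has no causal effect on Y. A nonzero pleiotropic_effect gives G a
    path to Y that bypasses X.
    """
    rng = random.Random(seed)
    g, x, y = [], [], []
    for _ in range(n):
        gi = int(rng.random() < 0.3) + int(rng.random() < 0.3)
        ui = rng.gauss(0.0, 1.0)
        g.append(gi)
        x.append(0.5 * gi + ui + rng.gauss(0.0, 1.0))
        y.append(ui + pleiotropic_effect * gi + rng.gauss(0.0, 1.0))
    observational = ols_slope(x, y)            # biased by the confounder U
    ratio = ols_slope(g, y) / ols_slope(g, x)  # beta_GY / beta_GX
    return observational, ratio


if __name__ == "__main__":
    for effect in (0.0, 0.2):
        obs, mr = simulate(pleiotropic_effect=effect)
        print(f"pleiotropy={effect}: observational={obs:.3f}, ratio={mr:.3f}")
```

In this model the observational slope converges to about 0.48 (cov(X, Y) / var(X) = 1 / 2.105), even though X has no effect on Y. The ratio estimate from the valid variant converges to 0. Adding a direct G-to-Y effect of 0.2 pulls the ratio estimate toward 0.2 / 0.5 = 0.4, which is a spurious causal claim.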

Figures

Figure 1
(a) Diagram illustrating pleiotropy (horizontal pleiotropy). The genetic variant is separately associated with the risk factor and covariate via different causal pathways. (b) Diagram illustrating mediation (vertical pleiotropy). The genetic variant is directly associated with the risk factor, and the association with the covariate is a downstream consequence of the risk factor.
Figure 2
(a) Genetic associations with risk factor and outcome (coronary heart disease risk) for eight genetic variants that have biological links to low-density lipoprotein (LDL) cholesterol. The blue lines are 95% confidence intervals for the genetic associations (all associations are oriented to the LDL cholesterol–increasing allele); the vertical axis is plotted on a log scale. (b) Variant-specific causal estimates (odds ratio for coronary heart disease per 1-mmol/L increase in LDL cholesterol) from the ratio method for eight variants. The solid blue lines are 95% confidence intervals for the causal estimates; the dashed blue line is the inverse-variance weighted estimate.
Figure 3
(a) Genetic associations with risk factor and outcome (type 2 diabetes risk) for eight genetic variants that have biological links to low-density lipoprotein (LDL) cholesterol. The blue lines are 95% confidence intervals for the genetic associations (all associations are oriented to the LDL cholesterol–increasing allele); the vertical axis is plotted on a log scale. (b) Variant-specific causal estimates (odds ratio for type 2 diabetes per 1-mmol/L increase in LDL cholesterol) from the ratio method for eight variants. The solid blue lines are 95% confidence intervals for the causal estimates; the dashed blue line is the inverse-variance weighted estimate.
Figure 4
(a) Genetic associations with risk factor [high-density lipoprotein (HDL) cholesterol] and outcome (coronary heart disease risk) for 86 genetic variants. The blue lines are 95% confidence intervals for the genetic associations (all associations are oriented to the HDL cholesterol–increasing allele). (b) Genetic associations with risk factor [low-density lipoprotein (LDL) cholesterol] and outcome (Alzheimer’s disease risk) for 76 genetic variants. The blue lines are 95% confidence intervals for the genetic associations (all associations are oriented to the LDL cholesterol–increasing allele).
Figure 5
Schematic diagrams illustrating different assumptions made in (a) Mendelian randomization (in which the genetic variant is assumed to associate directly with the risk factor and with the outcome only via the risk factor) and (b) colocalization (in which the genetic variant is allowed to associate with both traits directly, and causal effects may occur between the traits in either direction or not at all).
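The Figure 2 and Figure 3 captions refer to variant-specific causal estimates from the ratio method and to an inverse-variance weighted (IVW) estimate. A minimal sketch of both follows, computed from per-variant summary statistics. It assumes the common first-order standard error for the ratio estimate, which ignores uncertainty in the variant-risk factor association, and it assumes a fixed-effect IVW estimator. The review may use refinements not shown here, and the function names are hypothetical. For a binary outcome such as coronary heart disease, `beta_y` is a log odds ratio. Exponentiating the estimate therefore gives an odds ratio per unit of the risk factor, for example per 1-mmol/L increase in LDL cholesterol.

```python
import math


def ratio_estimates(beta_x, beta_y, se_y):
    """Per-variant ratio (Wald) estimates beta_y/beta_x with first-order SEs.

    beta_x: variant associations with the risk factor (per allele)
    beta_y: variant associations with the outcome (log odds ratios for a
            binary outcome), oriented to the same allele as beta_x
    se_y:   standard errors of beta_y
    """
    return [(by / bx, sy / abs(bx)) for bx, by, sy in zip(beta_x, beta_y, se_y)]


def ivw_estimate(beta_x, beta_y, se_y):
    """Fixed-effect inverse-variance weighted estimate and its SE."""
    num = sum(bx * by / sy ** 2 for bx, by, sy in zip(beta_x, beta_y, se_y))
    den = sum(bx ** 2 / sy ** 2 for bx, sy in zip(beta_x, se_y))
    return num / den, math.sqrt(1.0 / den)


def to_odds_ratio(log_or, se, z=1.96):
    """Odds ratio with a normal-approximation 95% confidence interval."""
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)
```

The IVW estimate equals a weighted mean of the ratio estimates, with weights `beta_x**2 / se_y**2` (the inverse of the first-order variances). As a result, variants with stronger associations with the risk factor, or with more precisely estimated outcome associations, dominate the pooled estimate.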
