Am J Epidemiol. 2013 Oct 1;178(7):1177-84.
doi: 10.1093/aje/kwt084. Epub 2013 Jul 17.

Efficient design for Mendelian randomization studies: subsample and 2-sample instrumental variable estimators

Brandon L Pierce et al. Am J Epidemiol. 2013.

Abstract

Mendelian randomization (MR) is a method for estimating the causal relationship between an exposure and an outcome using a genetic factor as an instrumental variable (IV) for the exposure. In the traditional MR setting, data on the IV, exposure, and outcome are available for all participants. However, obtaining complete exposure data may be difficult in some settings, due to high measurement costs or lack of appropriate biospecimens. We used simulated data sets to assess statistical power and bias for MR when exposure data are available for a subset (or an independent set) of participants. We show that obtaining exposure data for a subset of participants is a cost-efficient strategy, often having negligible effects on power in comparison with a traditional complete-data analysis. The size of the subset needed to achieve maximum power depends on IV strength, and maximum power is approximately equal to the power of traditional IV estimators. Weak IVs are shown to lead to bias towards the null when the subsample is small and towards the confounded association when the subset is relatively large. Various approaches for confidence interval calculation are considered. These results have important implications for reducing the costs and increasing the feasibility of MR studies.

Keywords: Mendelian randomization; epidemiologic methods; instrumental variable.
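The subsample design described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' code. It assumes a single biallelic variant as the IV, a linear data-generating model with one unmeasured confounder U, and a simple ratio (Wald-type) estimator in which the first-stage slope (exposure on genotype) is estimated in a subsample of size nX and the reduced-form slope (outcome on genotype) in the full sample of size nY. The function names, the allele frequency of 0.3, and the default parameter values are illustrative assumptions.

```python
import numpy as np


def simulate_mr_data(n_y=10_000, beta_xy=0.2, r2=0.025, beta_u=0.2,
                     maf=0.3, seed=0):
    """Simulate genotype G, exposure X, and outcome Y under an assumed linear model.

    Assumptions (not taken from the paper): additive genotype coding,
    var(X) close to 1, and a confounder U with equal effects on X and Y.
    """
    resid_var_x = 1.0 - r2 - beta_u ** 2
    if resid_var_x <= 0:
        raise ValueError("r2 + beta_u**2 must be below 1")
    rng = np.random.default_rng(seed)
    g = rng.binomial(2, maf, size=n_y).astype(float)
    u = rng.normal(size=n_y)
    # Scale the genotype effect so that G explains roughly r2 of var(X).
    beta_gx = np.sqrt(r2 / (2 * maf * (1 - maf)))
    x = beta_gx * g + beta_u * u + rng.normal(scale=np.sqrt(resid_var_x), size=n_y)
    y = beta_xy * x + beta_u * u + rng.normal(size=n_y)
    return g, x, y


def ols_slope(predictor, response):
    """Slope from a simple linear regression of response on predictor."""
    cov = np.cov(predictor, response)
    return cov[0, 1] / cov[0, 0]


def subsample_iv_estimate(g, x, y, n_x):
    """Ratio IV estimate using exposure data from only the first n_x participants.

    Participants are simulated independently, so the first n_x rows
    stand in for a random subsample with measured exposure.
    """
    first_stage = ols_slope(g[:n_x], x[:n_x])   # G -> X, subsample only
    reduced_form = ols_slope(g, y)              # G -> Y, all participants
    return reduced_form / first_stage


g, x, y = simulate_mr_data()
for n_x in (10_000, 5_000, 1_000):
    print(n_x, round(subsample_iv_estimate(g, x, y, n_x), 3))
```

A single simulated data set gives one noisy estimate per nX. Assessing power or bias as in the abstract would require repeating the simulation many times and summarizing the estimates, which is omitted here for brevity.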

Figures

Figure 1.
Power (left) and median standard error (right) of the subsample instrumental-variable (IV) estimate for different values of the causal effect size (βXY) and the sample size of the first-stage regression (nX), with a strong IV (R² = 0.025), a sample size for the reduced-form regression (nY) of 10,000, and a confounding variable with equal effects on X and Y (βUX = βUY = 0.2). βXY values are 0.0 (filled diamond), 0.05 (open diamond), 0.1 (filled triangle), 0.15 (open triangle), 0.2 (filled square), and 0.3 (open square).
Figure 2.
Power (left) and median standard error (right) of the subsample instrumental-variable (IV) estimate for different values of the first-stage R² and the sample size of the first-stage regression (nX), with a constant effect size (βXY = 0.2), a sample size for the reduced-form regression (nY) of 10,000, and a confounding variable with equal effects on X and Y (βUX = βUY = 0.2). First-stage R² values are 0.002 (filled diamond), 0.004 (open diamond), 0.007 (filled triangle), 0.01 (open triangle), 0.0015 (filled square), 0.2 (open square), 0.03 (filled circle), and 0.05 (open circle).
Figure 3.
Bias in the subsample instrumental-variable (IV) estimate in confounded (left) and unconfounded (right) scenarios for different values of the average first-stage F statistic and the relative size of the subsample used in the first-stage regression (nX:nY), with a constant causal effect size (βXY = 0.1) and a confounding variable with equal effects on X and Y (βUX = βUY = 0.3). Values for nX:nY are 1 (filled diamond), 0.75 (open diamond), 0.5 (filled triangle), 0.25 (open triangle), and 0.1 (filled square). The sample size for the reduced-form regression equation (nY, on the right vertical axis) is shown as dots connected with a dashed line.
Figure 4.
Bias in the subsample instrumental-variable (IV) estimate for different values of the first-stage R² and the relative size of the sample used in the first-stage regression (nX:nY). The sample size for the reduced-form regression equation (nY) is 10,000 (top), 3,000 (middle), and 1,000 (bottom), with a constant causal effect size (βXY = 0.1) and a confounding variable with equal effects on X and Y (βUX = βUY = 0.3). Values for nX:nY are 1 (filled diamond), 0.75 (open diamond), 0.5 (filled triangle), 0.25 (open triangle), and 0.1 (filled square).
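
The captions describe IV strength both as a first-stage R² and as an average first-stage F statistic. For a single instrument the two are linked by the standard regression identity F = R²(n − 2)/(1 − R²), where n is the first-stage sample size. The helper below is a rough sketch of that conversion. The function name is hypothetical, and plugging in a population R² gives only an approximation to the average F seen across simulated data sets.

```python
def approx_first_stage_f(r2: float, n_x: int, n_instruments: int = 1) -> float:
    """Approximate first-stage F statistic from R-squared and sample size.

    Uses the usual overall F-test identity for a linear regression with
    n_instruments predictors: F = (R2 / k) / ((1 - R2) / (n - k - 1)).
    """
    k = n_instruments
    return (r2 / k) / ((1.0 - r2) / (n_x - k - 1))


# Illustration with nY = 10,000, an assumed R-squared of 0.01, and the
# nX:nY ratios listed in the captions above.
n_y = 10_000
for ratio in (1, 0.75, 0.5, 0.25, 0.1):
    n_x = int(n_y * ratio)
    print(ratio, n_x, round(approx_first_stage_f(0.01, n_x), 1))
```

For a fixed R², the approximate F statistic shrinks roughly in proportion to nX, which is one way to see why smaller first-stage subsamples behave like weaker instruments.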

