Genet Epidemiol. 2016 Nov;40(7):597-608.
doi: 10.1002/gepi.21998. Epub 2016 Sep 14.

Bias due to participant overlap in two-sample Mendelian randomization


Stephen Burgess et al. Genet Epidemiol. 2016 Nov.

Abstract

Mendelian randomization analyses are often performed using summarized data. The causal estimate from a one-sample analysis (in which data are taken from a single data source) with weak instrumental variables is biased in the direction of the observational association between the risk factor and outcome, whereas the estimate from a two-sample analysis (in which data on the risk factor and outcome are taken from non-overlapping datasets) is less biased, and any bias is in the direction of the null. When genetic consortia with partially overlapping sets of participants are used, the direction and extent of bias are uncertain. In this paper, we perform simulation studies to investigate the magnitude of bias and Type 1 error rate inflation arising from sample overlap. We consider both a continuous outcome and a case-control setting with a binary outcome. For a continuous outcome, bias due to sample overlap is a linear function of the proportion of overlap between the samples. So, in the case of a null causal effect, if the relative bias of the one-sample instrumental variable estimate is 10% (corresponding to an F parameter of 10), then the relative bias with 50% sample overlap is 5%, and with 30% sample overlap is 3%. In a case-control setting, if risk factor measurements are only included for the control participants, unbiased estimates are obtained even in a one-sample setting. However, if risk factor data on both control and case participants are used, then bias with a binary outcome is similar to that with a continuous outcome. Consortia releasing publicly available data on the associations of genetic variants with continuous risk factors should provide estimates that exclude case participants from case-control samples.
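The arithmetic behind the rule of thumb in the abstract can be checked with a few lines of Python. The sketch assumes, as the F = 10 example implies, that the one-sample relative bias is approximately 1/F and that it scales linearly with the overlap proportion (0 for a two-sample analysis, 1 for a one-sample analysis). `relative_bias` is a hypothetical helper name, and the result is an approximation rather than an exact bias formula.

```python
def relative_bias(overlap: float, f_stat: float) -> float:
    """Approximate relative bias of an IV estimate under partial sample overlap.

    Assumption (approximation only): one-sample relative bias is about 1/F,
    and bias scales linearly with the overlap proportion.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be a proportion between 0 and 1")
    if f_stat <= 0:
        raise ValueError("F must be positive")
    return overlap / f_stat


if __name__ == "__main__":
    # Reproduce the worked example: F = 10 at several overlap proportions.
    for overlap in (1.0, 0.5, 0.3, 0.0):
        print(f"overlap {overlap:.0%}: relative bias ~ {relative_bias(overlap, 10):.0%}")
```

Run as a script, the loop prints relative biases of 10%, 5%, 3% and 0% for overlaps of 100%, 50%, 30% and 0%, matching the figures quoted in the abstract.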

Keywords: Mendelian randomization; aggregated data; instrumental variables; summarized data; weak instrument bias.


Figures

Figure 1
Mean two‐stage least squares/inverse‐variance weighted estimates plotted against sample overlap for different values of instrument strength (α=0.4, circle; α=0.6, triangle; α=0.8, plus) and different values of the confounder effect on the outcome (βU=0.6, black solid line; βU=1, mid‐gray dashed line; βU=2, light‐gray dotted line). Left panel: positive causal effect (βX=0.2); right panel: null causal effect (βX=0)
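The kind of simulation summarized in Figure 1 can be sketched as follows. This is a minimal illustration under assumptions made here, not the paper's design: the data-generating model (20 hypothetical variants with allele frequency 0.3, a standard-normal confounder U), the sample size, the per-variant effect `alpha`, and the helper names `per_variant_ols` and `simulate_ivw_mean` are all hypothetical, so `alpha` is not on the same scale as the α values in the caption. Each replicate draws one pool of individuals, takes a risk-factor sample and an outcome sample that share a chosen fraction of participants, computes per-variant summary statistics in each, and combines them with the inverse-variance weighted (IVW) estimator.

```python
import numpy as np


def per_variant_ols(g, z):
    """Regress z on each column of g separately; return slopes and standard errors."""
    gc = g - g.mean(axis=0)
    zc = z - z.mean()
    sxx = np.sum(gc**2, axis=0)
    beta = gc.T @ zc / sxx
    resid = zc[:, None] - gc * beta          # residuals for each univariable fit
    sigma2 = np.sum(resid**2, axis=0) / (g.shape[0] - 2)
    return beta, np.sqrt(sigma2 / sxx)


def simulate_ivw_mean(overlap, n=1000, n_snps=20, alpha=0.15, beta_x=0.0,
                      beta_u=1.0, maf=0.3, n_reps=200, seed=1):
    """Mean IVW estimate across simulated datasets (hypothetical model):

    G_j ~ Binomial(2, maf), U ~ N(0, 1)
    X = alpha * sum_j G_j + U + e_x
    Y = beta_x * X + beta_u * U + e_y
    Both samples have n participants; a fraction `overlap` appear in both.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be a proportion between 0 and 1")
    rng = np.random.default_rng(seed)
    m = int(round(overlap * n))              # number of shared participants
    pool = 2 * n - m                         # distinct individuals required
    estimates = []
    for _ in range(n_reps):
        g = rng.binomial(2, maf, size=(pool, n_snps)).astype(float)
        u = rng.standard_normal(pool)
        x = alpha * g.sum(axis=1) + u + rng.standard_normal(pool)
        y = beta_x * x + beta_u * u + rng.standard_normal(pool)

        a = slice(0, n)                      # risk-factor sample
        b = slice(n - m, 2 * n - m)          # outcome sample; rows n-m..n-1 shared with a

        bx, _ = per_variant_ols(g[a], x[a])
        by, se_y = per_variant_ols(g[b], y[b])
        w = 1.0 / se_y**2                    # first-order inverse-variance weights
        estimates.append(np.sum(w * bx * by) / np.sum(w * bx**2))
    return float(np.mean(estimates))
```

With a null causal effect (`beta_x=0`), the mean estimate should sit near zero with no overlap and move toward the confounded observational association as the overlap grows, consistent with the linear relationship described in the abstract.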

