[Preprint]. 2025 Aug 1:2025.07.30.25332465.
doi: 10.1101/2025.07.30.25332465.

Mind the gap: characterizing bias due to population mismatch in two-sample Mendelian randomization


Jack Li et al. medRxiv.


Abstract

Mendelian randomization (MR) is a statistical method for estimating causal effects using genetic variants as instrumental variables. In two-sample MR (2SMR), genetic associations with the exposure and with the outcome are estimated in different study samples. For valid inference, these studies must include individuals from the same population. Using studies from different populations may bias the 2SMR estimate because of differences in linkage disequilibrium (LD) or in genetic effects on the exposure trait. We show that violating the same-population assumption biases the causal estimate towards zero on average and does not increase the rate of false positives. We verify this result in a broad survey of 2SMR estimates, comparing estimates made with matching and mismatching populations across 546 trait pairs measured in 2 to 7 ancestries. We find that most population-mismatched estimates are attenuated towards zero relative to their corresponding population-matched estimates, and that greater genetic distance between study populations is associated with greater shrinkage. We observe bias even when the mismatched populations share the same continental ancestry. However, we also find that in some cases, using a larger exposure study with mismatching ancestry can improve power by dramatically increasing precision. These results show that even intra-continental population mismatch can bias 2SMR estimates, but they also suggest that the power of 2SMR in understudied populations could be improved by properly leveraging larger, mismatching study populations.
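The two headline claims (attenuation on average, no extra false positives) can be seen in an idealized calculation. This is a sketch under assumptions the abstract does not state: no pleiotropy, sampling error ignored, and the inverse-variance weighted (IVW) estimator with first-order weights. The symbols $\theta$ (true causal effect), $\sigma_{Y,j}$ (standard error of the outcome association), and $\kappa_{e,t}$ are introduced here for illustration, with $t$ taken as the population of the outcome study and $e$ as that of the exposure study.

```latex
\begin{align}
  \beta_{Y,j,t} &= \theta\,\beta_{X,j,t}
    && \text{(no pleiotropy, no sampling error)} \\
  \hat\theta_j &= \frac{\beta_{Y,j,t}}{\beta_{X,j,e}},
  \qquad v_j = \frac{\beta_{X,j,e}^{2}}{\sigma_{Y,j}^{2}}
    && \text{(ratio estimate and IVW weight)} \\
  \hat\theta_{\mathrm{IVW}} &= \frac{\sum_j v_j\,\hat\theta_j}{\sum_j v_j}
    = \frac{\sum_j \sigma_{Y,j}^{-2}\,\beta_{X,j,e}\,\beta_{Y,j,t}}
           {\sum_j \sigma_{Y,j}^{-2}\,\beta_{X,j,e}^{2}} \\
  &= \theta\,\kappa_{e,t},
  \qquad
  \kappa_{e,t} = \frac{\sum_j \sigma_{Y,j}^{-2}\,\beta_{X,j,e}\,\beta_{X,j,t}}
                      {\sum_j \sigma_{Y,j}^{-2}\,\beta_{X,j,e}^{2}}
\end{align}
```

Here $\kappa_{e,t}$ is the weighted regression slope, through the origin, of $\beta_{X,j,t}$ on $\beta_{X,j,e}$. If instruments are chosen for strong associations in $e$ that are weaker in $t$, then $0 < \kappa_{e,t} < 1$ and the estimate is shrunk towards zero; if $\theta = 0$, the estimate is zero whatever $\kappa_{e,t}$ is, so mismatch alone does not create a spurious effect in this idealization.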


Figures

Figure 1:
(A) Diagram depicting a 2SMR study and the calculation of the causal estimate as a weighted average of ratio estimates. (B) Two possible sources of differences between $\beta_{X,j,t}$ and $\beta_{X,j,e}$, the associations of variant $j$ with exposure $X$ in population $t$ and in population $e$. Left: LD patterns are the same in the two populations, but the causal variant has a different effect on the exposure. Right: the causal variant has the same effect in both populations, but LD structure differs. In both cases, the exposure-instrument association is strong in population $e$ but weaker in population $t$.
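Panel A's weighted average of ratio estimates corresponds to the standard IVW estimator, and the right-hand scenario of panel B can be mimicked by scaling a causal effect by an LD correlation. The sketch below assumes one causal variant per instrument, a tag-variant association equal to the LD correlation $r$ times the causal effect (standardized genotypes), first-order weights, and no pleiotropy; `ivw_estimate` and all numbers are hypothetical, not taken from the paper.

```python
import numpy as np

def ivw_estimate(beta_x, beta_y, se_y):
    """IVW causal estimate as a weighted average of per-variant ratio estimates.

    Ratio estimate for variant j: beta_y[j] / beta_x[j]
    First-order weight for variant j: beta_x[j]**2 / se_y[j]**2
    """
    beta_x = np.asarray(beta_x, dtype=float)
    beta_y = np.asarray(beta_y, dtype=float)
    se_y = np.asarray(se_y, dtype=float)
    ratios = beta_y / beta_x
    weights = beta_x**2 / se_y**2
    return float(np.sum(weights * ratios) / np.sum(weights))

# Hypothetical setting: the causal variants have the same effect on X in both
# populations, but LD between each instrument and its causal variant differs.
theta = 0.5                                  # true causal effect of X on Y
beta_causal = np.array([0.10, 0.08, 0.12])   # causal-variant effects on X
r_e = np.array([0.95, 0.90, 0.98])           # LD correlation in population e
r_t = np.array([0.60, 0.85, 0.40])           # weaker LD in population t

beta_x_e = r_e * beta_causal    # instrument-exposure associations in e
beta_x_t = r_t * beta_causal    # instrument-exposure associations in t
beta_y_t = theta * beta_x_t     # instrument-outcome associations in t
se_y = np.full(3, 0.01)         # outcome standard errors (arbitrary)

matched = ivw_estimate(beta_x_t, beta_y_t, se_y)
mismatched = ivw_estimate(beta_x_e, beta_y_t, se_y)
print(round(matched, 4), round(mismatched, 4))
```

With these made-up values the matched estimate recovers 0.5, while the mismatched estimate is about 0.29, because every instrument's association with the exposure is overstated by the stronger LD in population $e$.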
Figure 2:
Study design of the cross-population survey of 2SMR estimates. Right: an example exposure trait-outcome study group for the effect of urate on gout, with UK Biobank defining the target population. Exposure studies are color-coded by how closely their populations match the target population: exact match (dark blue), continental match but a different subpopulation (light blue), or continental mismatch (orange).
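The three-way color coding amounts to a small classification rule. This is a hypothetical sketch: populations are represented as (continent, subpopulation) label pairs of our own choosing, which is not necessarily how the survey encoded them.

```python
from enum import Enum

class PopulationMatch(Enum):
    """Match categories, valued by the Figure 2 color for each."""
    EXACT = "dark blue"          # same population as the target
    CONTINENTAL = "light blue"   # same continental ancestry, different subpopulation
    MISMATCH = "orange"          # different continental ancestry

def classify_exposure_study(exposure_pop, target_pop):
    """Classify an exposure study's population relative to the target population.

    Each population is a hypothetical (continent, subpopulation) tuple,
    e.g. ("EUR", "GBR").
    """
    if exposure_pop == target_pop:
        return PopulationMatch.EXACT
    if exposure_pop[0] == target_pop[0]:
        return PopulationMatch.CONTINENTAL
    return PopulationMatch.MISMATCH

# Hypothetical labels for a target population and three exposure studies
target = ("EUR", "GBR")
for pop in [("EUR", "GBR"), ("EUR", "FIN"), ("EAS", "JPT")]:
    category = classify_exposure_study(pop, target)
    print(pop, category.name, category.value)
```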
Figure 3:
Forest plots of 2SMR estimates of the effect of LDL cholesterol on coronary heart disease from GRAPPLE at two instrument-selection p-value thresholds. Points indicate effect estimates and intervals indicate 95% confidence intervals. Color indicates whether an estimate was made with exactly matching (dark blue), continentally matching (light blue), or continentally mismatching (red) exposure studies. A star indicates that an estimate is significantly different from the reference estimate (FDR < 0.05), when a reference estimate is available.
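The caption flags estimates that differ from the reference at FDR < 0.05 but does not name the test. As one plausible sketch, assuming a two-sided z-test on the difference that treats each estimate as independent of the reference (in practice they share outcome data, so this is only an approximation) and Benjamini-Hochberg control of the FDR; the function names and numbers are hypothetical.

```python
import numpy as np
from math import erfc, sqrt

def difference_pvalues(estimates, ses, ref_estimate, ref_se):
    """Two-sided p-values for H0: estimate == reference, assuming independence."""
    pvals = []
    for est, se in zip(estimates, ses):
        z = (est - ref_estimate) / sqrt(se**2 + ref_se**2)
        pvals.append(erfc(abs(z) / sqrt(2)))  # equals 2 * (1 - Phi(|z|))
    return np.array(pvals)

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of hypotheses rejected by the BH step-up procedure at FDR q."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = q * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()  # largest rank meeting its threshold
        reject[order[: k + 1]] = True
    return reject

# Hypothetical mismatched estimates compared with one reference (matched) estimate
pvals = difference_pvalues([0.30, 0.48, 0.10], [0.05, 0.05, 0.05], 0.50, 0.04)
flags = benjamini_hochberg(pvals)
print(pvals, flags)
```

Here the first and third estimates would be starred and the second would not.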
Figure 4:
Population-pair-specific shrinkage coefficients versus approximate Fst distance between populations. Shrinkage coefficients for each population pair were estimated using SIMEX. The dashed line shows the fitted line from univariable inverse-variance weighted regression of Fst on the estimated shrinkage coefficient. Vertical bars show the 95% confidence interval for each shrinkage coefficient.
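The caption does not describe how SIMEX was applied to obtain the population-pair shrinkage coefficients. As a generic illustration only, the sketch below applies standard simulation-extrapolation (SIMEX) to a regression slope attenuated by noise of known standard deviation $\sigma_u$ in the regressor: it adds extra noise of variance $\lambda\sigma_u^2$ for several values of $\lambda$, averages the re-estimated slopes, fits a quadratic in $\lambda$, and extrapolates to $\lambda = -1$ (no measurement error). The quadratic extrapolant, the $\lambda$ grid, `simex_slope`, and the simulated data are all assumptions.

```python
import numpy as np

def simex_slope(w, y, sigma_u, lambdas=(0.0, 0.5, 1.0, 1.5, 2.0), n_rep=50, seed=None):
    """Return (SIMEX-corrected slope, naive slope) of y on w, where w = x + noise
    with known standard deviation sigma_u."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w, dtype=float)
    y = np.asarray(y, dtype=float)
    mean_slopes = []
    for lam in lambdas:
        if lam == 0.0:
            # lambda = 0 is the naive (uncorrected) fit; no simulation needed
            mean_slopes.append(np.polyfit(w, y, 1)[0])
            continue
        # Simulation step: inflate the measurement-error variance to (1 + lam) * sigma_u**2
        slopes = [
            np.polyfit(w + np.sqrt(lam) * sigma_u * rng.standard_normal(w.size), y, 1)[0]
            for _ in range(n_rep)
        ]
        mean_slopes.append(np.mean(slopes))
    # Extrapolation step: fit a quadratic in lambda and evaluate it at lambda = -1
    coefs = np.polyfit(lambdas, mean_slopes, 2)
    return float(np.polyval(coefs, -1.0)), float(mean_slopes[0])

# Simulated data: true slope 1, regressor observed with noise of SD 1 (reliability 0.5)
rng = np.random.default_rng(0)
n = 2000
x = rng.standard_normal(n)
y = 1.0 * x + 0.5 * rng.standard_normal(n)
w = x + 1.0 * rng.standard_normal(n)

simex, naive = simex_slope(w, y, sigma_u=1.0, seed=1)
print(round(naive, 3), round(simex, 3))
```

In this simulation the naive slope is near 0.5, and the SIMEX estimate moves substantially towards the true slope of 1 without fully reaching it, a known limitation of the quadratic extrapolant.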

