Review
Genet Epidemiol. 2023 Feb;47(1):3-25.
doi: 10.1002/gepi.22506. Epub 2022 Oct 23.

Statistical methods for cis-Mendelian randomization with two-sample summary-level data

Apostolos Gkatzionis et al. Genet Epidemiol. 2023 Feb.

Abstract

Mendelian randomization (MR) is the use of genetic variants to assess the existence of a causal relationship between a risk factor and an outcome of interest. Here, we focus on two-sample summary-data MR analyses with many correlated variants from a single gene region, particularly on cis-MR studies which use protein expression as a risk factor. Such studies must rely on a small, curated set of variants from the studied region; using all variants in the region requires inverting an ill-conditioned genetic correlation matrix and results in numerically unstable causal effect estimates. We review methods for variable selection and estimation in cis-MR with summary-level data, ranging from stepwise pruning and conditional analysis to principal components analysis, factor analysis, and Bayesian variable selection. In a simulation study, we show that the various methods have comparable performance in analyses with large sample sizes and strong genetic instruments. However, when weak instrument bias is suspected, factor analysis and Bayesian variable selection produce more reliable inferences than simple pruning approaches, which are often used in practice. We conclude by examining two case studies, assessing the effects of low-density lipoprotein-cholesterol and serum testosterone on coronary heart disease risk using variants in the HMGCR and SHBG gene regions, respectively.

Keywords: Bayesian variable selection; Mendelian randomization; correlated instruments; factor analysis; principal components analysis; pruning.
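As a rough illustration of the pruning-then-estimation workflow the abstract refers to, the following Python sketch greedily prunes correlated variants and then computes a generalized inverse-variance weighted estimate from summary statistics. The function names (`ld_prune`, `correlated_ivw`), the r² threshold, and the toy inputs are assumptions made for illustration and do not come from the paper. The estimator is the standard generalized least squares formula for correlated variants, which may differ from the exact implementations the authors reviewed.

```python
import numpy as np


def ld_prune(p_values, rho, r2_threshold=0.1):
    """Greedy pruning (illustrative): visit variants from most to least
    significant and keep one only if its squared correlation with every
    variant already kept is below r2_threshold."""
    order = np.argsort(p_values)
    kept = []
    for j in order:
        if all(rho[j, k] ** 2 < r2_threshold for k in kept):
            kept.append(int(j))
    return sorted(kept)


def correlated_ivw(beta_x, beta_y, se_y, rho):
    """Generalized IVW estimate for correlated variants.

    beta_x : variant-risk factor associations
    beta_y : variant-outcome associations
    se_y   : standard errors of beta_y
    rho    : genetic correlation (LD) matrix of the variants
    """
    beta_x = np.asarray(beta_x, dtype=float)
    beta_y = np.asarray(beta_y, dtype=float)
    se_y = np.asarray(se_y, dtype=float)
    # Covariance of the variant-outcome associations: se_i * se_j * rho_ij
    omega = np.outer(se_y, se_y) * rho
    # Solve a linear system instead of forming an explicit inverse;
    # this still becomes unstable when rho is ill-conditioned.
    w_x = np.linalg.solve(omega, beta_x)
    denom = beta_x @ w_x
    estimate = (w_x @ beta_y) / denom
    std_error = np.sqrt(1.0 / denom)
    return estimate, std_error


if __name__ == "__main__":
    # Toy inputs (hypothetical): variants 0 and 1 are in strong LD.
    rho = np.array([[1.0, 0.9, 0.0],
                    [0.9, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
    p_values = np.array([1e-8, 1e-5, 1e-3])
    beta_x = np.array([0.20, 0.18, 0.10])
    beta_y = 0.5 * beta_x
    se_y = np.array([0.01, 0.01, 0.01])

    print("condition number of rho:", np.linalg.cond(rho))
    keep = ld_prune(p_values, rho, r2_threshold=0.1)
    est, se = correlated_ivw(beta_x[keep], beta_y[keep],
                             se_y[keep], rho[np.ix_(keep, keep)])
    print("kept variants:", keep, "estimate:", est, "SE:", se)
```

Checking the condition number of the correlation matrix before estimation gives a quick indication of when using all variants in a region is likely to be numerically unstable, which is the situation that motivates pruning and the dimension-reduction methods listed in the abstract.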

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. A causal diagram representation of the three assumptions of Mendelian randomization. Here, X represents the risk factor, Y the outcome, G the genetic instrument and U denotes confounders of the X-Y relationship.
Figure 2. Directed acyclic graph illustrating why pleiotropic bias is less likely in MR studies of protein expression than in studies using downstream phenotypic traits as risk factors. Here, G denotes the genetic instrument, P protein expression, B the downstream biomarker, Y the outcome and U denotes confounding factors. Both types of MR studies are affected by pleiotropy due to “direct” effects of the genetic instrument on the outcome (G → Y pathway), but standard MR analyses are also subject to pleiotropy due to potential effects of the protein on the outcome (G → P → Y pathway) which is not the case for studies of disease progression. MR, Mendelian randomization.
Figure 3. Left: genetic correlations in the HMGCR region. Right: Manhattan plot of p values for associations of HMGCR variants with LDL‐cholesterol.
Figure 4. Left: genetic correlations in the SHBG region. Right: Manhattan plot of p values for associations of SHBG variants with testosterone levels.
