Bioinformatics. 2021 May 1;37(4):531-541.
doi: 10.1093/bioinformatics/btaa778.

MR-Clust: clustering of genetic variants in Mendelian randomization with similar causal estimates


Christopher N Foley et al. Bioinformatics. 2021.

Abstract

Motivation: Mendelian randomization is an epidemiological technique that uses genetic variants as instrumental variables to estimate the causal effect of a risk factor on an outcome. We consider a scenario in which the causal estimates based on each variant in turn differ more strongly than expected by chance alone, but the variants can be divided into distinct clusters such that all variants in a cluster have similar causal estimates. This scenario is likely to occur when a risk factor influences an outcome through several distinct causal mechanisms with different magnitudes of causal effect. We have developed an algorithm, MR-Clust, that finds such clusters of variants and so can identify variants that reflect distinct causal mechanisms. Two features of our clustering algorithm are that it accounts for differential uncertainty in the causal estimates, and that it includes 'null' and 'junk' clusters to provide protection against the detection of spurious clusters.
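The clustering idea described above can be illustrated with a small expectation-maximization (EM) sketch. It assumes summary statistics for each variant (the association with the risk factor, beta_x, and with the outcome, beta_y, plus the standard error se_y), forms one ratio estimate per variant, and fits a mixture with a null cluster fixed at zero, a given number of substantive clusters, and a junk cluster. Each cluster density uses the variant's own standard error, which is one way to account for differential uncertainty. The function names, the first-order standard error, the uniform junk density and the user-supplied starting means are all assumptions for illustration; this is not the interface of the mrclust package or the authors' exact model.

```python
import math


def ratio_estimates(beta_x, beta_y, se_y):
    """Per-variant ratio estimates theta_j = beta_Yj / beta_Xj with first-order
    standard errors se(beta_Yj) / |beta_Xj|. Uncertainty in beta_Xj is ignored
    (an assumption of this sketch)."""
    theta = [by / bx for bx, by in zip(beta_x, beta_y)]
    se = [sy / abs(bx) for bx, sy in zip(beta_x, se_y)]
    return theta, se


def normal_pdf(x, mean, sd):
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_cluster(theta, se, init_means, n_iter=500, tol=1e-10):
    """Hypothetical EM sketch: a null cluster (mean fixed at 0),
    len(init_means) substantive clusters with free means, and a junk cluster.

    Every cluster density uses the variant's own standard error, so precise
    estimates must agree more tightly with a cluster mean than imprecise ones.
    The junk cluster is a uniform density over the observed range (assumed).
    Component order in the outputs: [null, cluster_1, ..., cluster_K, junk].
    """
    n, k = len(theta), len(init_means)
    span = max(theta) - min(theta)
    junk_density = 1.0 / span if span > 0 else 1.0
    means = list(init_means)
    weights = [1.0 / (k + 2)] * (k + 2)
    prev_ll = -math.inf
    resp = []
    for _ in range(n_iter):
        # E-step: conditional probability of each component for each variant
        resp, ll = [], 0.0
        for t, s in zip(theta, se):
            dens = [normal_pdf(t, 0.0, s)]
            dens += [normal_pdf(t, m, s) for m in means]
            dens.append(junk_density)
            joint = [w * d for w, d in zip(weights, dens)]
            total = sum(joint)
            ll += math.log(total)
            resp.append([j / total for j in joint])
        # M-step: mixing weights, then inverse-variance weighted cluster means
        weights = [sum(r[c] for r in resp) / n for c in range(k + 2)]
        for c in range(k):
            num = sum(r[c + 1] * t / s ** 2 for r, t, s in zip(resp, theta, se))
            den = sum(r[c + 1] / s ** 2 for r, s in zip(resp, se))
            if den > 0:
                means[c] = num / den
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return means, weights, resp
```

With ten estimates that sit near 0, 0.5 and -0.4 (all with standard error 0.02) and starting means of 0.4 and -0.3, the two substantive means move to roughly 0.50 and -0.40 and the near-zero estimates go to the null cluster. A full method would also have to choose the number of substantive clusters; this sketch simply takes it as given.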

Results: Our algorithm correctly detected the number of clusters in a simulation analysis, outperforming methods that either do not account for uncertainty or do not include null and junk clusters. In an applied example considering the effect of blood pressure on coronary artery disease risk, the method detected four clusters of genetic variants. A post hoc hypothesis-generating search suggested that variants in the cluster with a negative effect of blood pressure on coronary artery disease risk were more strongly related to trunk fat percentage and other adiposity measures than variants not in this cluster.

Availability and implementation: MR-Clust can be downloaded from https://github.com/cnfoley/mrclust.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
Directed acyclic graph illustrating relationships between three genetic variants that are valid IVs with a risk factor, outcome and confounders of the risk factor–outcome associations. The causal effect of the risk factor on the outcome is indicated by θ
Fig. 2.
Scenarios that could lead to clustered heterogeneity, defined as the case where causal estimates from multiple variants tend towards a number of distinct values as the sample size increases. Clustered heterogeneity could arise in a number of ways: the mechanisms may represent distinct components of the risk factor, or distinct pathways by which the risk factor may influence the outcome, or intermediaries on the causal pathway from the genetic variant to the outcome
Fig. 3.
Results from the simulation study with sample size N = 1000 for MR-Clust (with and without junk cluster), Mclust and TAGM methods under four scenarios for the Rand index (top panel) and the number of clusters identified (bottom panel). Points represent median values across simulated datasets, and vertical bars represent the first and ninth deciles. The horizontal line in the bottom panel represents the true number of clusters in each scenario. Two versions of each method are presented: (A) each variant is assigned to the cluster with the greatest conditional probability; (B) variants are only assigned to a cluster if the conditional probability is ≥0.8, and clusters are only displayed if at least 4 variants are assigned to the cluster
Fig. 4.
Results from the simulation study with sample size N = 5000 for MR-Clust (with and without junk cluster), Mclust and TAGM methods under four scenarios for the Rand index (top panel) and the number of clusters identified (bottom panel). Points represent median values across simulated datasets, and vertical bars represent the first and ninth deciles. The horizontal line in the bottom panel represents the true number of clusters in each scenario. Two versions of each method are presented: (A) each variant is assigned to the cluster with the greatest conditional probability; (B) variants are only assigned to a cluster if the conditional probability is ≥0.8, and clusters are only displayed if at least 4 variants are assigned to the cluster
Fig. 5.
Kernel-weighted density plot of cluster means identified by MR-Clust method in simulation scenario 4. Dashed vertical lines represent the true values of the cluster means
Fig. 6.
Genetic associations with blood pressure traits (mmHg) and coronary artery disease risk (log odds) per additional blood pressure-increasing allele. Each genetic variant is represented by a point. Error bars are 95% confidence intervals for the genetic associations. Colours represent the clusters, and dotted lines represent the cluster means. Top row: method version (A), in which each variant is assigned to the cluster with the greatest conditional probability. Bottom row: method version (B), in which variants are only assigned to a cluster if the conditional probability is ≥0.8, and clusters are only displayed if at least 4 variants are assigned to the cluster. Left column: systolic blood pressure; middle column: diastolic blood pressure; and right column: pulse pressure
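The two assignment rules used in the figure captions, versions (A) and (B), and the Rand index used to score clusterings in the simulations can be sketched as follows. The input is a table of per-variant conditional probabilities (one row per variant, one column per cluster), such as the responsibilities from an EM fit. The function names are hypothetical, the 0.8 threshold is applied here as "at least 0.8" (an assumption), and variants left unassigned under version (B) are labelled None. The Rand index below is the standard unadjusted definition; how the original analysis treated unassigned variants when computing it is not stated here, so the function simply compares two complete labellings.

```python
from collections import Counter
from itertools import combinations


def assign_max(resp):
    """Version (A): assign each variant to the cluster with the greatest
    conditional probability."""
    return [max(range(len(r)), key=lambda c: r[c]) for r in resp]


def assign_thresholded(resp, prob=0.8, min_size=4):
    """Version (B): keep a variant only if its greatest conditional probability
    is at least `prob`, then keep only clusters with at least `min_size`
    variants. Variants that end up unassigned get the label None."""
    labels = []
    for r in resp:
        best = max(range(len(r)), key=lambda c: r[c])
        labels.append(best if r[best] >= prob else None)
    counts = Counter(c for c in labels if c is not None)
    return [c if c is not None and counts[c] >= min_size else None for c in labels]


def rand_index(a, b):
    """Unadjusted Rand index: the fraction of variant pairs on which two
    labellings agree about being in the same or in different clusters."""
    pairs = list(combinations(range(len(a)), 2))
    if not pairs:
        return 1.0
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


if __name__ == "__main__":
    resp = [[0.9, 0.1], [0.95, 0.05], [0.85, 0.15], [0.81, 0.19],
            [0.6, 0.4], [0.1, 0.9], [0.2, 0.8]]
    print(assign_max(resp))          # [0, 0, 0, 0, 0, 1, 1]
    print(assign_thresholded(resp))  # [0, 0, 0, 0, None, None, None]
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
```

In the usage example, cluster 1 is dropped under version (B) because only two variants pass the probability threshold, fewer than the minimum of four. The Rand index ignores the arbitrary numbering of clusters, which is why two labellings that only swap cluster names score 1.0.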
