Front Genet. 2021 Apr 30;12:651812.
doi: 10.3389/fgene.2021.651812. eCollection 2021.

MRPC: An R Package for Inference of Causal Graphs

Md Bahadur Badsha et al. Front Genet.

Abstract

Understanding the causal relationships between variables is a central goal of many scientific inquiries. Causal relationships may be represented by directed edges in a graph (or, equivalently, a network). In biology, for example, a gene regulatory network may be viewed as a type of causal network, where X→Y represents gene X regulating (i.e., being causal to) gene Y. However, existing general-purpose graph inference methods often produce a high number of false edges, whereas current causal inference methods developed for observational data in genomics can handle only limited types of causal relationships. We present MRPC (a PC algorithm with the principle of Mendelian Randomization), an R package that learns causal graphs with improved accuracy over existing methods. Our algorithm builds on the powerful PC algorithm (named after its developers Peter Spirtes and Clark Glymour), a canonical algorithm in computer science for learning directed acyclic graphs. The improvements in MRPC result in increased accuracy in identifying v-structures (i.e., X→Y←Z) and robustness to how the nodes are arranged in the input data. In the special case of genomic data that contain genotypes and phenotypes (e.g., gene expression) at the individual level, MRPC incorporates the principle of Mendelian randomization as constraints on edge direction to help orient the edges. MRPC allows for inference of causal graphs not only for general purposes but also for biomedical data, where multiple types of data may be input to provide evidence for causality. The R package is available on CRAN and is free, open-source software under a GPL (≥2) license.
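The role of Mendelian randomization as a constraint on edge direction can be illustrated with a minimal Python sketch. It is not the MRPC implementation: the function name `orient_genotype_edges` and the node names are hypothetical, and the sketch assumes only that an edge between a genotype node and a phenotype node is oriented from genotype to phenotype, leaving all other edges undirected for later steps.

```python
def orient_genotype_edges(skeleton_edges, genotype_nodes):
    """Orient every genotype-phenotype edge as genotype -> phenotype.

    skeleton_edges: iterable of 2-tuples of node names (undirected edges).
    genotype_nodes: set of node names treated as genetic variants.
    Returns (directed, undirected): directed is a set of (parent, child)
    pairs; undirected holds the edges this rule alone cannot orient.
    """
    directed, undirected = set(), set()
    for a, b in skeleton_edges:
        a_is_geno = a in genotype_nodes
        b_is_geno = b in genotype_nodes
        if a_is_geno and not b_is_geno:
            directed.add((a, b))
        elif b_is_geno and not a_is_geno:
            directed.add((b, a))
        else:
            # Phenotype-phenotype (or genotype-genotype) edges need other
            # evidence, such as conditional independence tests.
            undirected.add(frozenset((a, b)))
    return directed, undirected


# Hypothetical skeleton: V1 is a genetic variant, T1 and T2 are phenotypes.
skeleton = [("T1", "V1"), ("T1", "T2")]
directed, undirected = orient_genotype_edges(skeleton, {"V1"})
print(sorted(directed))                 # [('V1', 'T1')]
print([sorted(e) for e in undirected])  # [['T1', 'T2']]
```

In this toy case the constraint fixes V1→T1 regardless of the order in which the edge endpoints were listed, while the T1–T2 edge is left for other orientation rules.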

Keywords: R package; causal inference; gene regulatory networks; graphical models; networks; principle of Mendelian randomization.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Basic causal graphs under the principle of Mendelian randomization. (A) The five basic (inferred) causal graphs. Each includes a genotype node (also an instrumental variable), V1, and two phenotype nodes, T1 and T2. (B) Two DAGs M5 and M6 are Markov equivalent, and can both be represented by M4.
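Markov equivalence can be checked with the standard criterion that two DAGs are equivalent exactly when they share the same skeleton and the same v-structures. The Python sketch below applies that criterion. The edge lists are hypothetical: the caption does not list the edges of M5 and M6, so `dag_a` and `dag_b` are assumed examples that differ only in the direction of the T1–T2 edge.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a DAG given as (parent, child) pairs."""
    return {frozenset(e) for e in edges}


def v_structures(edges):
    """Return unshielded colliders as (x, y, z), meaning x -> y <- z,
    where x and z are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pars in parents.items():
        for x, z in combinations(sorted(pars), 2):
            if frozenset((x, z)) not in skel:
                found.add((x, child, z))
    return found


def markov_equivalent(edges_a, edges_b):
    """Same skeleton and same v-structures."""
    return (skeleton(edges_a) == skeleton(edges_b)
            and v_structures(edges_a) == v_structures(edges_b))


# Assumed example: both DAGs have V1 -> T1 and V1 -> T2; only T1-T2 flips.
dag_a = [("V1", "T1"), ("V1", "T2"), ("T1", "T2")]
dag_b = [("V1", "T1"), ("V1", "T2"), ("T2", "T1")]

# A chain and a collider share a skeleton but not their v-structures.
chain = [("V1", "T1"), ("T1", "T2")]
collider = [("V1", "T1"), ("T2", "T1")]

print(markov_equivalent(dag_a, dag_b))    # True
print(markov_equivalent(chain, collider)) # False
```

Because the two assumed triangle graphs contain no unshielded collider, data alone cannot distinguish them, which is why a single graph with an undirected or bidirected T1–T2 edge can stand for both.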
FIGURE 2
The four modules in the MRPC package. Inputs are listed on the left and outputs on the right. The inference module, at the center of the package, may take the correlation matrix from real or simulated data as input and outputs a graph object, the core of which is the (asymmetric) adjacency matrix. For genomic data, we require that the genotype (instrumental variable) nodes are placed in the data matrix before the phenotype nodes. Thus, the rows and columns of the correlation matrix and the adjacency matrix also start with genotypes, followed by phenotypes. The simulation module can generate a data matrix from which the correlation matrix may be derived and used as input to the inference module. A graph object, constructed directly or provided by the inference module, can be passed through the visualization module for displaying the graph topology and for clustering nodes into modules. The difference between two graph objects (e.g., true and inferred graphs, or graphs inferred by two different methods) may be evaluated by multiple metrics in the assessment module.
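The genotype-first ordering and the asymmetric adjacency matrix can be sketched in a few lines of Python. This is an illustration only, not the package's code: the helper names `genotypes_first` and `adjacency_matrix` are hypothetical, and the convention that entry `[i][j] = 1` encodes an edge from node `i` to node `j` is an assumption made for this example.

```python
def genotypes_first(columns, genotype_nodes):
    """Return column names with genotype nodes ahead of phenotype nodes,
    keeping the original relative order inside each group."""
    genos = [c for c in columns if c in genotype_nodes]
    phenos = [c for c in columns if c not in genotype_nodes]
    return genos + phenos


def adjacency_matrix(nodes, directed_edges):
    """Build an asymmetric 0/1 matrix; amat[i][j] == 1 means nodes[i] -> nodes[j]
    (a convention assumed for this sketch)."""
    index = {name: i for i, name in enumerate(nodes)}
    amat = [[0] * len(nodes) for _ in nodes]
    for parent, child in directed_edges:
        amat[index[parent]][index[child]] = 1
    return amat


# Hypothetical data columns in arbitrary order; V1 is the genetic variant.
columns = ["T1", "V1", "T2"]
nodes = genotypes_first(columns, {"V1"})
amat = adjacency_matrix(nodes, [("V1", "T1"), ("T1", "T2")])

print(nodes)  # ['V1', 'T1', 'T2']
for row in amat:
    print(row)
# [0, 1, 0]
# [0, 0, 1]
# [0, 0, 0]
```

With genotypes placed first, the leading rows and columns of the matrix correspond to the instrumental variables, and a nonzero entry in a genotype column would flag an edge pointing into a genetic variant.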
FIGURE 3
Visualization of a complex graph in MRPC. (A) The true graph includes 14 genetic variants and 8 phenotype nodes. (B) The inferred graph. (C) The dendrogram of the inferred graph with four modules identified when the minimum module size is set to 5. (D) Redrawing the inferred graph based on the dendrogram. Nodes of the same color belong to the same module.
FIGURE 4
Impact of outliers on graph inference. (A) The true graph under which data of sample size 1,000 with and without outliers were simulated. (B) Inference by MRPC, pc, pc.stable, mmpc, hc, and mmhc on the simulated data that do not contain outliers, using Pearson correlation as input. (C) Inference by the same functions on the simulated data that contain 10 outliers, using Pearson correlation as input. In (B) and (C), the blacklist argument in pc.stable, mmpc, hc, and mmhc was used to disallow edges pointing to a genetic variant. (D) Inference by MRPC and pc on the simulated data with 10 outliers, using robust correlation as input.
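Why outliers matter for a correlation-based input can be seen in a small Python sketch. It uses toy data rather than the simulation in the figure, and it swaps in Spearman rank correlation as a simple stand-in for a robust estimator; this is not the robust correlation method used by MRPC. The `ranks` helper assumes the data contain no ties.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """0-based ranks of the values; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order):
        result[i] = rank
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Toy data: a perfect linear relationship, then one extreme outlier.
clean_x = list(range(20))
clean_y = [2 * x + 1 for x in clean_x]
noisy_x = clean_x + [100]
noisy_y = clean_y + [-100]

print(pearson(clean_x, clean_y))   # close to 1
print(pearson(noisy_x, noisy_y))   # negative: one point reverses the sign
print(spearman(noisy_x, noisy_y))  # stays clearly positive
```

A single extreme point is enough to flip the sign of the Pearson estimate here, while the rank-based estimate still reflects the relationship among the remaining points. A graph learned from the distorted matrix could therefore gain or lose edges.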
FIGURE 5
Examples of the GEUVADIS data analysis accounting for confounding variables. Each of the five sets in (A–E) contains an eQTL and multiple genes. These genes have been identified by GEUVADIS to be significantly associated with the corresponding eQTL. We derived the principal components (PCs) from the whole-genome gene expression matrix and identified the PCs that are significantly associated with the eQTLs or genes. We applied MRPC to the eQTL-gene set without and with the associated PCs. The PCs can have diverse relationships with the genes.
