mSystems. 2024 Oct 22;9(10):e0130323.
doi: 10.1128/msystems.01303-23. Epub 2024 Sep 6.

Unfolding and de-confounding: biologically meaningful causal inference from longitudinal multi-omic networks using METALICA

Daniel Ruiz-Perez et al. mSystems.

Abstract

A key challenge in the analysis of microbiome data is the integration of multi-omic datasets and the discovery of interactions between microbial taxa, their expressed genes, and the metabolites they consume and/or produce. To improve the state of the art in inferring biologically meaningful multi-omic interactions, we sought to address some of the most fundamental issues in causal inference from longitudinal multi-omics microbiome data sets. We developed METALICA, a suite of tools and techniques that can infer interactions between microbiome entities. METALICA introduces novel unrolling and de-confounding techniques used to uncover multi-omic entities that are believed to act as confounders for some of the relationships that may be inferred using standard causal inference tools. The results lend support to predictions about biological models and processes by which microbial taxa interact with each other in a microbiome. The unrolling process helps identify putative intermediaries (genes and/or metabolites) that explain the interactions between microbes; the de-confounding process identifies putative common causes that may cause spurious relationships to be inferred. METALICA was applied to networks inferred by existing causal discovery and network inference algorithms from a multi-omics data set resulting from a longitudinal study of IBD microbiomes. The most significant unrollings and de-confoundings were manually validated using the existing literature and databases.

Importance: We have developed a suite of tools and techniques capable of inferring interactions between microbiome entities. METALICA introduces novel techniques called unrolling and de-confounding that are employed to uncover multi-omic entities considered to be confounders for some of the relationships that may be inferred using standard causal inference tools. To evaluate our method, we conducted tests on the inflammatory bowel disease (IBD) dataset from the iHMP longitudinal study, which we pre-processed in accordance with our previous work. From this dataset, we generated various subsets, encompassing different combinations of metagenomics, metabolomics, and metatranscriptomics datasets. Using these multi-omics datasets, we demonstrate how the unrolling process aids in the identification of putative intermediaries (genes and/or metabolites) that explain the interactions between microbes. Additionally, the de-confounding process identifies potential common causes that may cause spurious relationships to be inferred. The most significant unrollings and de-confoundings were manually validated using the existing literature and databases.
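
The two ideas described above can be illustrated with a small sketch. This is not the METALICA implementation; it is a minimal illustration that assumes a network is given as a plain list of directed edges. The function names (`unroll_edge`, `find_common_causes`), the `max_intermediates` parameter, and the nodes `metabolite_X`, `taxon_A`, and `taxon_B` are hypothetical. The unrolling example reuses the Eubacterium siraeum → uridine kinase → cytidine → Bacteroides thetaiotaomicron chain reported in Fig 4.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Map each source node to the set of nodes it points to."""
    adj = defaultdict(set)
    for src, dst in edges:
        adj[src].add(dst)
    return adj


def unroll_edge(src, dst, multi_omic_edges, node_type, max_intermediates=2):
    """Find directed paths src -> ... -> dst in the multi-omic network
    whose intermediate nodes are all genes or metabolites.

    Illustrative assumption: a taxon-taxon edge is 'unrolled' when at least
    one such path exists. Self-edges (src == dst) are handled as well.
    """
    adj = build_adjacency(multi_omic_edges)
    paths = []
    stack = [(src, [src])]
    while stack:
        node, path = stack.pop()
        for nxt in sorted(adj[node]):
            if nxt == dst and len(path) >= 2:
                # Reached the target through at least one intermediary.
                paths.append(path + [dst])
            elif (
                nxt not in path
                and node_type.get(nxt) in {"gene", "metabolite"}
                and len(path) - 1 < max_intermediates
            ):
                stack.append((nxt, path + [nxt]))
    return paths


def find_common_causes(src, dst, multi_omic_edges):
    """Return nodes that point to both src and dst (putative confounders)."""
    adj = build_adjacency(multi_omic_edges)
    return sorted(
        c for c, children in adj.items()
        if c not in (src, dst) and src in children and dst in children
    )


# Toy multi-omic network. The first three edges follow Fig 4; the last two
# are hypothetical and exist only to demonstrate de-confounding.
edges = [
    ("Eubacterium siraeum", "uridine kinase"),
    ("uridine kinase", "cytidine"),
    ("cytidine", "Bacteroides thetaiotaomicron"),
    ("metabolite_X", "taxon_A"),
    ("metabolite_X", "taxon_B"),
]
node_type = {
    "Eubacterium siraeum": "taxon",
    "Bacteroides thetaiotaomicron": "taxon",
    "uridine kinase": "gene",
    "cytidine": "metabolite",
    "metabolite_X": "metabolite",
    "taxon_A": "taxon",
    "taxon_B": "taxon",
}

paths = unroll_edge(
    "Eubacterium siraeum", "Bacteroides thetaiotaomicron", edges, node_type
)
confounders = find_common_causes("taxon_A", "taxon_B", edges)
print(paths)
print(confounders)
```

In this sketch, a taxa-only edge would be reported as unrolled when `paths` is non-empty, and as potentially spurious when `confounders` is non-empty. How METALICA scores or ranks such candidates is not specified in the abstract and is not modeled here.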

Keywords: causal inference; de-confounding; longitudinal microbiome analysis; multi-omic integration; unfolding.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig 1
Samples of the two-time-slice DBN networks for the four different multi-omic subsets produced by PALM. Self-edges are not displayed to avoid clutter. Networks were learned with a maximum number of parents of 3. The four networks show the nodes representing variables from each omics data source organized in two large circles, one representing the variables for the current time point (blue) and the other for the next time point (orange). Node shapes represent the omics data source of the variable. Taxa nodes are represented as filled circles, metabolites as filled squares, genes as filled diamonds, and clinical variables as filled triangles. Red edges represent negative regression coefficients and green edges represent positive ones. Edge width is proportional to the regression coefficient and edge opacity to the bootstrap score. Finally, node opacity is proportional to abundance. (a) DBN learned with just taxa abundance (T). The data set included the abundance of 27 bacteria and a clinical variable indicating the week the sample was obtained, and resulted in a network with 95 edges. (b) DBN learned with taxa and metabolites (TM). A set of 19 metabolites was added to the previous data set, and 164 edges were learned in this network. (c) DBN learned with the taxa and genes data set (TG). A set of 34 genes was added to the taxa data set, and a network with 230 edges was learned. (d) DBN learned with the 27 taxa, 34 genes, and 19 metabolites (TGM), resulting in a total of 311 edges.
Fig 2
Heatmap showing the proportion of edges unrolled by METALICA in the Crohn’s disease data sets for the networks obtained from PyCausal (TETRAD) as the alpha parameter varies, using data sets with and without temporal alignment. The last column shows the overall bootstrap score.
Fig 3
Heatmap showing percentages of edges unrolled by METALICA in the Crohn’s disease data sets for all the methods averaged over all parameter choices. The last column shows the overall bootstrap score.
Fig 4
Biologically confirmed unrolling. The edge Eubacterium siraeum → Bacteroides thetaiotaomicron learned in G_T is unrolled into Eubacterium siraeum → uridine kinase → cytidine → Bacteroides thetaiotaomicron in G_TGM.
Fig 5
Biologically confirmed unrolling. The edge Bacteroides stercoris → Bacteroides stercoris learned in G_T is unrolled into Bacteroides stercoris → uridine kinase → cytidine → Bacteroides stercoris in G_TGM.
