Patterns (N Y). 2022 Mar 24;3(5):100473.
doi: 10.1016/j.patter.2022.100473. eCollection 2022 May 13.

Essential Regression: A generalizable framework for inferring causal latent factors from multi-omic datasets

Xin Bing et al. Patterns (N Y).

Abstract

High-dimensional cellular and molecular profiling of biological samples highlights the need for analytical approaches that can integrate multi-omic datasets to generate prioritized causal inferences. Current methods are limited by the high dimensionality of the combined datasets, the differences in their data distributions, and the difficulty of integrating them to infer causal relationships. Here, we present Essential Regression (ER), a novel latent-factor-regression-based interpretable machine-learning approach that addresses these problems by identifying latent factors and their likely cause-effect relationships with system-wide outcomes or properties of interest. ER can integrate many multi-omic datasets without structural or distributional assumptions regarding the data, and it outperforms a range of state-of-the-art methods in prediction. ER can be coupled with probabilistic graphical modeling, thereby strengthening the causal inferences. The utility of ER is demonstrated using multi-omic systems immunology datasets to generate and validate novel cellular and molecular inferences in a wide range of contexts, including immunosenescence and immune dysregulation.

Keywords: causal inference; dimensionality reduction; interpretable machine learning; latent model; machine learning; systems biology; systems immunology.

Conflict of interest statement

The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
Essential Regression: A novel interpretable machine-learning approach to uncover causal latent factors from high-dimensional multi-omic datasets. (A) Schematic illustrating the different kinds of multi-omic datasets typically used in systems analyses and the key advantages of the methods introduced in this study over existing approaches. (B) Schematic summarizing the steps in ER. (C) Schematic summarizing the steps in Composite Regression. (D–G) Comparison of the predictive performance of PLS, PFR, LASSO, and ER on simulated datasets across a range of parameter settings.
Figure 2
Identifying causal signatures of age-induced immunosenescent responses to the Zostavax vaccine. (A) Schematic summarizing the input data and the problem of interest. (B) ROC curves for the different methods at discriminating between elderly people and younger adults in an LOOCV framework. (C) Pearson correlations of the different methods at predicting age as a continuous variable, as measured in an LOOCV framework. (D) CausalMGM on all Zs identified by ER. The Markov blanket is highlighted with a blue border and bold font. A directed edge X → Y indicates X is a cause of Y, while a bidirected edge X ←→ Y indicates the presence of a latent confounder that is a common cause of X and Y. A partially oriented edge X o→ Y indicates that Y is not a cause of X but that either X or a latent confounder causes Y. An unoriented edge indicates that directionality could not be inferred for that edge. (E) Network distances, in the causal graph generated by CausalMGM, between the outcome variable of interest and the significant and non-significant Zs identified by ER. p value calculated using a Mann-Whitney U test. (F) Correlations involving the NK cell latent factor, B cell latent factor, and age. Top panels show correlations between the NK cell latent factor and age (top left), and the B cell latent factor and age (top right). Bottom panels show correlations between the NK cell latent factor and the B cell latent factor without correcting for age (bottom left) and after correcting for age (bottom right). (G) Correlations between NK cells and B cells in the context of vaccination against SARS-CoV-2 in a non-human primate (NHP) model. (H) Correlations between NK cells and B cells in the context of vaccination against SARS-CoV-2 in an NHP model, after correcting for treatment (vaccination arm) and time point. (I) Mechanistic insights obtained from ER.
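Panel F contrasts the NK cell and B cell latent factors "without correcting for age" and "after correcting for age". The caption does not state how the correction was done; one common approach, assumed here, is to regress each variable on the covariate by ordinary least squares and correlate the residuals (a first-order partial correlation). The sketch below uses only the Python standard library; the function names and the toy values are hypothetical and are not data from the study.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residualize(y, z):
    """Residuals of y after OLS regression on one covariate z (with intercept)."""
    mz, my = mean(z), mean(y)
    szz = sum((c - mz) ** 2 for c in z)
    beta = sum((c - mz) * (b - my) for c, b in zip(z, y)) / szz
    alpha = my - beta * mz
    return [b - (alpha + beta * c) for b, c in zip(y, z)]


def partial_corr(x, y, z):
    """Correlation of x and y after removing the linear effect of z from both."""
    return pearson(residualize(x, z), residualize(y, z))


# Hypothetical toy data: z plays the role of the covariate (e.g. age);
# x and y both rise with z but are otherwise unrelated.
z = [1, 2, 3, 4, 5]
x = [2, 1, 3, 3, 6]
y = [2, 0, 3, 6, 4]

print(round(pearson(x, y), 3), round(partial_corr(x, y, z), 3))
```

With these toy values, x and y correlate at about 0.60 only because both depend on z; once z is regressed out, the partial correlation is 0. This illustrates how a shared dependence on age can create an apparent association between two factors.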
Figure 3
Identifying differences in vaccine-induced transcriptomic profiles over time. (A) Schematic summarizing the input data and the problem of interest. (B) Ternary classification accuracy of the different methods at discriminating among G1, G2, and G3 in a replicated k-fold cross-validation framework. (C) Confusion matrix summarizing the performance of the different methods at discriminating among G1, G2, and G3 in an LOOCV framework. (D) ROC curves for the different methods at discriminating G3 from G1 and G2 combined in an LOOCV framework. (E) ROC curves for the different methods at discriminating G3 from G1 in an LOOCV framework. (F) Fraction of true G3 samples correctly classified as G3 (as measured in an LOOCV framework). (G) CausER graph, i.e., CausalMGM applied to the significant Zs from ER. The Markov blanket is highlighted with a blue border and bold font. A directed edge X → Y indicates X is a cause of Y, while a bidirected edge X ←→ Y indicates the presence of a latent confounder that is a common cause of X and Y. A partially oriented edge X o→ Y indicates that Y is not a cause of X but that either X or a latent confounder causes Y. An unoriented edge indicates that directionality could not be inferred for that edge. (H) Heatmap of genes in CausER hits (significant Zs in the Markov blanket) for G1 and G3 samples. (I) Heatmap of genes in CausER hits (significant Zs in the Markov blanket) for G1, G2, and G3 samples.
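The captions compare methods either "in an LOOCV framework" (leave-one-out cross-validation) or "in a replicated k-fold cross-validation framework". The index splits behind these two schemes can be sketched as below. This is a minimal, unstratified sketch under stated assumptions: the number of folds, the number of replicates, the random seed, and any stratification used in the study are not given in these captions, and the function names are hypothetical.

```python
import random


def loocv_splits(n):
    """Yield (train, test) index lists for leave-one-out cross-validation."""
    for i in range(n):
        train = [j for j in range(n) if j != i]
        yield train, [i]


def replicated_kfold_splits(n, k, replicates, seed=0):
    """Yield (replicate, train, test) index lists.

    Each replicate reshuffles the n sample indices and partitions them
    into k disjoint folds; every fold serves once as the test set.
    Folds are not stratified by class (an assumption of this sketch).
    """
    rng = random.Random(seed)
    for r in range(replicates):
        idx = list(range(n))
        rng.shuffle(idx)
        # Deal shuffled indices round-robin into k folds of near-equal size.
        folds = [idx[f::k] for f in range(k)]
        for f in range(k):
            test = sorted(folds[f])
            train = sorted(j for g, fold in enumerate(folds) if g != f for j in fold)
            yield r, train, test


# Example: 10 samples, 3 folds, 2 replicates -> 6 train/test splits.
for r, train, test in replicated_kfold_splits(10, 3, 2):
    print(r, test)
```

Averaging performance over several reshuffled replicates makes the k-fold estimate less dependent on any single random partition, whereas LOOCV is deterministic and holds out each sample exactly once.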
Figure 4
Elucidating markers of latent and active tuberculosis (Tb). (A) Schematic summarizing the input data and the problem of interest. (B) Classification accuracy of the different methods at discriminating between latent and active Tb, measured in a replicated k-fold cross-validation framework. (C) Heatmap of features in the single CausER hit.
Figure 5
Uncovering specific immune parameters from term and pre-term infants that do not achieve stereotypic convergence. (A) Schematic summarizing the input data and the problem of interest. (B) Classification accuracy of the different methods at discriminating between term and pre-term births using immune profiles at 3 months after birth, measured in a replicated k-fold cross-validation framework. (C) ROC curves for the different methods at discriminating between term and pre-term births, as measured in an LOOCV framework. (D) Heatmap of features (plasma proteins and immune cells) in the single hit (the significant Z identified by ER in the Markov blanket of the outcome). (E) Mechanistic insights obtained from ER.
