Integration of mechanistic immunological knowledge into a machine learning pipeline improves predictions

Anthony Culos^{1,2,3}, Amy S Tsai^{1,3}, Natalie Stanley^{1,2}, Martin Becker^{1,2}, Mohammad S Ghaemi^{1,2,4}, David R McIlwain⁵, Ramin Fallahzadeh^{1,2}, Athena Tanada^{1,2}, Huda Nassar^{1,2}, Camilo Espinosa^{1,2}, Maria Xenochristou^{1,2}, Edward Ganio¹, Laura Peterson^{1,6}, Xiaoyuan Han¹, Ina A Stelzer¹, Kazuo Ando¹, Dyani Gaudilliere¹, Thanaphong Phongpreecha^{1,2,7}, Ivana Marić^{1,6}, Alan L Chang^{1,2}, Gary M Shaw⁶, David K Stevenson⁶, Sean Bendall⁷, Kara L Davis⁶, Wendy Fantl^{5,8,9}, Garry P Nolan⁷, Trevor Hastie^{2,10}, Robert Tibshirani^{2,10}, Martin S Angst^{1,11}, Brice Gaudilliere^{1,6,11}, Nima Aghaeepour^{1,2,6,11}

Affiliations

¹ Department of Anesthesiology, Perioperative and Pain Medicine, Stanford University School of Medicine, Stanford, CA, USA.
² Department of Biomedical Data Sciences, Stanford University, Stanford, CA, USA.
³ These authors contributed equally: Anthony Culos, Amy S. Tsai.
⁴ Digital Technologies Research Centre, National Research Council Canada, Toronto, Ontario, Canada.
⁵ Department of Microbiology and Immunology, Baxter Laboratory in Stem Cell Biology, Stanford University School of Medicine, Stanford, CA, USA.
⁶ Department of Pediatrics, Division of Neonatal and Developmental Medicine, Stanford University School of Medicine, Stanford, CA, USA.
⁷ Department of Pathology, Stanford University School of Medicine, Stanford, CA, USA.
⁸ Department of Obstetrics and Gynecology, Stanford University School of Medicine, Stanford, CA, USA.
⁹ Department of Urology, Stanford University School of Medicine, Stanford, CA, USA.
¹⁰ Department of Statistics, Stanford University, Stanford, CA, USA.
¹¹ These authors jointly supervised this work: Martin S. Angst, Brice Gaudilliere, Nima Aghaeepour.

PMID: 33294774
PMCID: PMC7720904
DOI: 10.1038/s42256-020-00232-8

Anthony Culos et al. Nat Mach Intell. 2020 Oct;2(10):619-628. doi: 10.1038/s42256-020-00232-8. Epub 2020 Oct 12.

Abstract

The dense network of interconnected cellular signalling responses that are quantifiable in peripheral immune cells provides a wealth of actionable immunological insights. Although high-throughput single-cell profiling techniques, including polychromatic flow and mass cytometry, have matured to a point that enables detailed immune profiling of patients in numerous clinical settings, the limited cohort size and high dimensionality of data increase the possibility of false-positive discoveries and model overfitting. We introduce a generalizable machine learning platform, the immunological Elastic-Net (iEN), which incorporates immunological knowledge directly into the predictive models. Importantly, the algorithm maintains the exploratory nature of the high-dimensional dataset, allowing for the inclusion of immune features with strong predictive capabilities even if not consistent with prior knowledge. In three independent studies our method demonstrates improved predictions for clinically relevant outcomes from mass cytometry data generated from whole blood, as well as a large simulated dataset. The iEN is available under an open-source licence.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

**Fig. 1 |. The immunological Elastic-Net analysis pipeline.**
a,b, Immunological prior knowledge for each feature, in response to each ex vivo stimulation condition, is extracted by a panel of experts (a) and encoded into a prior knowledge tensor to guide the model optimization process (b). c,d, Individuals within the cohort of study (c) provide blood samples, which are subsequently stimulated with ligands ex vivo to activate various signalling pathways of the immune system (d). e, This produces single-cell measurements of the immune system, resulting in a complex network of cell types and signalling pathways representing both innate and adaptive immunity. f,g, This dataset is then fed into the iEN algorithm (f) for predictive modelling of the outcome of interest (g).

**Fig. 2 |. Integration of immunological priors.**
a, Overview of the LTP study. A correlation network of intracellular signalling responses, measured in peripheral immune cells and coloured by ex vivo stimulation status, is visualized. Edges represent significant (P < 0.05) pairwise correlation after Bonferroni adjustment for multiple hypothesis correction. Node sizes represent the significance of correlations with the response variable (gestational age during term pregnancy). b, Immune features that were congruent with domain-specific knowledge as determined by a panel of five immunologists were refined into a tensor and used to determine the node colour of the correlation network. Here, immune features that have a value of 1 (full agreement among the panel) are coloured red and all other immune features are coloured black. c, The network is coloured by the standard deviation of scores assigned to each feature by the panel of immunologists. Overall, the consistency of the prior knowledge among panel members is higher in the features with a higher score, indicating a stronger agreement regarding the top features that should be prioritized by the algorithm and disagreements regarding features inconsistent with prior knowledge.
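
As a rough illustration of panels b and c, the sketch below averages panel scores into a per-feature prior and reports the spread among panel members. The score matrix, the four example features, and the function name `summarize_panel` are hypothetical and are not taken from the study; the sketch also flattens the tensor into a single feature axis for brevity.

```python
import numpy as np

# Hypothetical scores: 5 panel members (rows) x 4 immune features (columns),
# each in [0, 1], where 1 means "consistent with prior knowledge".
panel_scores = np.array([
    [1.0, 0.75, 0.25, 0.0],
    [1.0, 1.00, 0.50, 0.0],
    [1.0, 0.75, 0.00, 0.5],
    [1.0, 1.00, 0.25, 0.0],
    [1.0, 0.75, 0.50, 0.0],
])


def summarize_panel(scores):
    """Return per-feature mean prior, sample SD, and a full-agreement mask."""
    scores = np.asarray(scores, dtype=float)
    prior = scores.mean(axis=0)              # value used as the prior
    spread = scores.std(axis=0, ddof=1)      # disagreement among panel members
    full_agreement = np.all(scores == 1.0, axis=0)  # every member scored 1
    return prior, spread, full_agreement


prior, spread, full_agreement = summarize_panel(panel_scores)
for j in range(panel_scores.shape[1]):
    print(f"feature {j}: prior={prior[j]:.2f} "
          f"sd={spread[j]:.2f} full_agreement={full_agreement[j]}")
```

In this toy matrix only the first feature would be coloured red under the rule described in panel b, since it is the only one scored 1 by every panel member.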

**Fig. 3 |. Prior knowledge and sparsification.**
Visualization of an example of the impact of prior immunological knowledge on various features in an iEN model. As φ is increased across the x axis (increased impact of prior knowledge), the contributions of each feature to the final model (y axis) change to select models consistent with immunological priors. We have highlighted two examples where a feature is emphasized or de-emphasized (in red and black, respectively) by prior knowledge. In this example, the STAT1 response to IFN-α stimulation in regulatory T cells is prioritized, as STAT1 is downstream of the IFN-α/β receptor and is integral to the homeostasis and function of these cells. Conversely, the prpS6 response to stimulation by IL-2 and IL-6 in non-classical monocytes is increasingly deprioritized as this signalling response is inconsistent with prior understanding of these signal transduction pathways in this cell type; IL-2 primarily drives T-cell differentiation through the Janus kinase (JAK)/STAT pathway. Similarly, IL-6 primarily activates the JAK/STAT pathway and IL-6 receptors are expressed only in a subset of immune cells. This confirms not only that integration of the priors can modify the algorithm’s behaviour, but also that the intensity of this impact can be controlled through the free parameter φ.
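
One way to picture the role of φ is a prior-dependent rescaling of feature columns before an ordinary elastic-net fit. The sketch below is an illustration under stated assumptions, not the published iEN implementation: the weighting function `exp(-φ(1 - prior))`, the small coordinate-descent solver, the simulated data, and all function names are hypothetical choices made for this example.

```python
import numpy as np


def prior_weights(priors, phi):
    """Map priors in [0, 1] to column scales (assumed form); phi = 0 gives ones."""
    priors = np.asarray(priors, dtype=float)
    return np.exp(-phi * (1.0 - priors))


def soft_threshold(z, t):
    return np.sign(z) * max(abs(z) - t, 0.0)


def elastic_net_cd(X, y, lam=0.1, alpha=0.5, n_iter=200):
    """Coordinate descent for
    (1/2n)||y - Xb||^2 + lam * (alpha*||b||_1 + (1 - alpha)/2 * ||b||^2)."""
    n, p = X.shape
    beta = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = np.array(y, dtype=float)  # residual for beta = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual.
            rho = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1 - alpha))
            resid += X[:, j] * (beta[j] - new)
            beta[j] = new
    return beta


def fit_prior_weighted_en(X, y, priors, phi, lam=0.1, alpha=0.5):
    """Scale columns by the prior weights, fit, and map coefficients back."""
    w = prior_weights(priors, phi)
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    beta_scaled = elastic_net_cd(Xc * w, yc, lam, alpha)
    return w * beta_scaled  # coefficients on the original feature scale


# Simulated data: features 0 and 1 both drive the outcome, but only
# feature 0 is supported by the (hypothetical) prior.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))
y = X[:, 0] + X[:, 1] + 0.1 * rng.normal(size=60)
priors = [1.0, 0.0, 0.5, 0.5, 0.5]

for phi in (0.0, 1.0, 3.0, 10.0):
    print(phi, np.round(fit_prior_weighted_en(X, y, priors, phi), 3))
```

With φ = 0 every weight equals 1 and the fit reduces to a plain elastic net. As φ grows, columns with a low prior are shrunk towards zero, which is equivalent to penalizing their coefficients more heavily, so they need a stronger association with the outcome to stay in the model.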

**Fig. 4 |. Incorporation of prior knowledge improves predictions in two clinical studies and a simulated experiment.**
a, Boxplot of Pearson correlation P values calculated on out-of-sample predictions from repeated 10-fold cross-validation of EN (black) and iEN (red) models for the LTP dataset. b, Validation of the LTP model on an independent validation cohort. These predictions were compared against the true response variable using −log₁₀ Pearson correlation P values. c, Boxplots of Wilcoxon rank-sum test P values similarly calculated on out-of-sample predictions for the ChP dataset (null hypothesis: the sample-to-class assignment probabilities produced by the model are equal between the two outcome classes). Comparison of model performance for the respective datasets demonstrated improved predictions for the iEN, as shown by −log₁₀ P values. d, A simulated study with varying cohort sizes of simulated ‘patients’ with 700 features demonstrated a larger gain (measured by −log₁₀ Pearson’s test P values) for the integration of prior immunological knowledge in datasets with a relatively small cohort size and a large number of features. Locally fitted polynomial curves of prediction performance over multiple cohort sizes are displayed with 95% confidence intervals (CIs). e–h, Root-mean-square error (r.m.s.e.) values demonstrating the effect sizes of the models in a–d.
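
The scoring described in this caption can be sketched as follows: collect out-of-sample predictions over repeated 10-fold cross-validation, then summarize each repeat by the −log₁₀ Pearson correlation P value and the r.m.s.e. This is a simplified illustration, not the authors' evaluation code. A closed-form ridge regression stands in for the EN and iEN models purely to keep the example short, the data are simulated, and the function names are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr


def ridge_fit_predict(X_tr, y_tr, X_te, lam=1.0):
    """Stand-in model (ridge, not EN/iEN): fit on the training fold, predict the test fold."""
    mu_x, mu_y = X_tr.mean(axis=0), y_tr.mean()
    Xc = X_tr - mu_x
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X_tr.shape[1]),
                           Xc.T @ (y_tr - mu_y))
    return (X_te - mu_x) @ beta + mu_y


def repeated_kfold_scores(X, y, k=10, repeats=5, seed=0):
    """Return one row per repeat: (-log10 Pearson P value, r.m.s.e.)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    scores = []
    for _ in range(repeats):
        perm = rng.permutation(n)
        preds = np.empty(n)
        for fold in np.array_split(perm, k):
            train = np.setdiff1d(perm, fold)  # all samples outside this fold
            preds[fold] = ridge_fit_predict(X[train], y[train], X[fold])
        _, p_value = pearsonr(preds, y)       # every prediction is out-of-sample
        rmse = np.sqrt(np.mean((preds - y) ** 2))
        scores.append((-np.log10(p_value), rmse))
    return np.array(scores)


# Simulated cohort: 60 'patients', 10 features, 3 of them informative.
rng = np.random.default_rng(42)
X = rng.normal(size=(60, 10))
y = X[:, :3].sum(axis=1) + rng.normal(size=60)

scores = repeated_kfold_scores(X, y)
print("median -log10(P):", np.median(scores[:, 0]))
print("median r.m.s.e.:", np.median(scores[:, 1]))
```

Reporting both quantities mirrors the pairing of panels a–d with e–h: the P value reflects how reliably predictions track the outcome, while the r.m.s.e. conveys the size of the prediction error.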

**Fig. 5 |. iEN is robust to errors in the prior knowledge tensor.**
a–d, Various levels of noise were artificially added to the prior knowledge values, as indicated by the r.m.s.e. values of the true prior values versus the simulated ones (x axis). As the value on the x axis increases, the amount of noise in the simulated prior increases until all priors are sampled from a uniform random distribution (vertical dashed line). Reassuringly, at this point, the performance of the iEN is close to that of the EN (with no priors), as indicated by a horizontal dashed line. Importantly, the iEN continues to outperform the EN (horizontal dashed line) even for high amounts of error in the priors. All curves are locally fitted polynomials of predictive performance with the shaded region representing 95% CI.
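
The x axis of this figure can be illustrated with a small sketch that degrades a prior step by step and measures the r.m.s.e. against the true values. The linear blend towards a uniform random draw is an assumption made here for illustration; the caption does not specify how the noise was generated. The prior values and function names are hypothetical.

```python
import numpy as np


def rmse(a, b):
    """Root-mean-square error between two arrays of prior values."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))


def blend_with_noise(true_prior, random_prior, t):
    """Interpolate from the true prior (t = 0) to a uniform random draw (t = 1)."""
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    return (1.0 - t) * np.asarray(true_prior) + t * np.asarray(random_prior)


rng = np.random.default_rng(0)
true_prior = rng.choice([0.0, 0.5, 1.0], size=200)  # hypothetical prior values
random_prior = rng.uniform(0.0, 1.0, size=200)      # fully uninformative draw

for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    noisy = blend_with_noise(true_prior, random_prior, t)
    print(f"t={t:.2f}  r.m.s.e.={rmse(true_prior, noisy):.3f}")
```

At t = 1 the simulated prior carries no information about the true one, which corresponds to the vertical dashed line in the figure.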
