Bioinformatics. 2020 Dec 30;36(Suppl_2):i822-i830.

doi: 10.1093/bioinformatics/btaa861.

Inferring signaling pathways with probabilistic programming

David Merrell¹ ², Anthony Gitter¹ ² ³

Affiliations

¹ Department of Computer Sciences, University of Wisconsin-Madison, Madison, WI 53706, USA.
² Morgridge Institute for Research, Madison, WI 53715, USA.
³ Department of Biostatistics and Medical Informatics, University of Wisconsin-Madison, Madison, WI 53726, USA.

PMID: 33381832
PMCID: PMC7773483
DOI: 10.1093/bioinformatics/btaa861

David Merrell et al. Bioinformatics. 2020.

Abstract

Motivation: Cells regulate themselves via dizzyingly complex biochemical processes called signaling pathways. These are usually depicted as a network, where nodes represent proteins and edges indicate their influence on each other. In order to understand diseases and therapies at the cellular level, it is crucial to have an accurate understanding of the signaling pathways at work. Since signaling pathways can be modified by disease, the ability to infer signaling pathways from condition- or patient-specific data is highly valuable. A variety of techniques exist for inferring signaling pathways. We build on past works that formulate signaling pathway inference as a Dynamic Bayesian Network structure estimation problem on phosphoproteomic time course data. We take a Bayesian approach, using Markov Chain Monte Carlo to estimate a posterior distribution over possible Dynamic Bayesian Network structures. Our primary contributions are (i) a novel proposal distribution that efficiently samples sparse graphs and (ii) the relaxation of common restrictive modeling assumptions.

Results: We implement our method, named Sparse Signaling Pathway Sampling, in Julia using the Gen probabilistic programming language. Probabilistic programming is a powerful methodology for building statistical models. The resulting code is modular, extensible and legible. The Gen language, in particular, allows us to customize our inference procedure for biological graphs and ensure efficient sampling. We evaluate our algorithm on simulated data and the HPN-DREAM pathway reconstruction challenge, comparing our performance against a variety of baseline methods. Our results demonstrate the vast potential for probabilistic programming, and Gen specifically, for biological network inference.

Availability and implementation: Find the full codebase at https://github.com/gitter-lab/ssps.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

**Fig. 1.**
Our generative model. (Top) Plate notation. DBN parameters $β_{j}$ and $σ_{j}^{2}$ have been marginalized out. (Bottom) Full probabilistic specification:

$λ_{j} \sim \mathrm{Uniform}(λ_{min}, λ_{max}) \quad \forall j \in \{1, \dots, |V|\}$

$z_{ij} \mid c_{ij}, λ_{j} \sim \mathrm{Bernoulli}\left(\frac{e^{-λ_{j}}}{e^{-c_{ij} λ_{j}} + e^{-λ_{j}}}\right) \quad \forall i, j \in \{1, \dots, |V|\}$

$p(σ_{j}^{2}) \propto \frac{1}{σ_{j}^{2}} \quad \forall j \in \{1, \dots, |V|\}$

$β_{j} \mid σ_{j}^{2} \sim N\left(0, T σ_{j}^{2} (B_{j}^{⊤} B_{j})^{-1}\right) \quad \forall j \in \{1, \dots, |V|\}$

$X_{+,j} \mid B_{j}, β_{j}, σ_{j}^{2} \sim N(B_{j} β_{j}, σ_{j}^{2} I) \quad \forall j \in \{1, \dots, |V|\}$

We usually set $λ_{min} ≃ 3$ and $λ_{max} = 15$. If $λ_{min} > 0$ is too small, Markov chains will occasionally be initialized with very large numbers of edges, causing computational issues. The method is insensitive to $λ_{max}$ as long as it is sufficiently large. Notice the improper prior $1 / σ_{j}^{2}$. In this specification, $B_{j}$ denotes $X_{-, {pa}_{Z}(j)}$; i.e. the parents of vertex $j$ depend on the edge existence variables $Z$
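The edge prior in Fig. 1 can be illustrated with a short sampling sketch. This is not the authors' Julia/Gen code; it is a plain-Python illustration of the first two lines of the specification only. The function names, the nested-list layout `C[i][j]` for the prior confidences $c_{ij}$, and the default bounds (taken from the caption's "usually" values) are assumptions made for this example.

```python
import math
import random


def edge_inclusion_probability(c_ij, lam_j):
    """Prior probability that edge i -> j exists, for confidence c_ij in [0, 1]."""
    # Algebraically equal to exp(-lam) / (exp(-c * lam) + exp(-lam)),
    # rewritten as a logistic function for numerical stability.
    return 1.0 / (1.0 + math.exp(lam_j * (1.0 - c_ij)))


def sample_edge_prior(C, lam_min=3.0, lam_max=15.0, rng=None):
    """Draw (lam, Z) from the prior; C[i][j] is the confidence in edge i -> j."""
    rng = rng or random.Random()
    n = len(C)
    # One sparsity parameter lambda_j per child vertex j.
    lam = [rng.uniform(lam_min, lam_max) for _ in range(n)]
    # Each edge-existence variable z_ij is an independent Bernoulli draw.
    Z = [
        [rng.random() < edge_inclusion_probability(C[i][j], lam[j]) for j in range(n)]
        for i in range(n)
    ]
    return lam, Z


# Hypothetical 2-vertex example: full confidence in the two cross edges.
lam, Z = sample_edge_prior([[0.0, 1.0], [1.0, 0.0]], rng=random.Random(0))
```

With $c_{ij} = 1$ the inclusion probability is exactly 0.5 for any $λ_{j}$, while with $c_{ij} = 0$ it shrinks toward zero as $λ_{j}$ grows, which is how a larger $λ_{j}$ encourages sparser parent sets.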

**Fig. 2.**
Action probabilities as a function of parent set size. The reference size $\hat{s}$ is determined from prior knowledge. It approximates the size of a ‘typical’ parent set. When $s < \hat{s}$ , add-parent is most probable; when $s > \hat{s}$ , remove-parent is most probable; and when $s = \hat{s}$ , all actions have equal probability
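The qualitative behavior described in Fig. 2 can be reproduced with a small sketch. The caption does not give the functional form, so the exponential weights below are an assumption chosen only to satisfy the three stated properties; they are not the authors' formula. The third action name (`swap-parent`), the function name and the `n_candidates` argument are likewise hypothetical.

```python
import math


def action_probabilities(s, s_hat, n_candidates):
    """Illustrative action probabilities for a parent set of size s.

    s_hat is the reference ('typical') parent set size; n_candidates is the
    number of vertices that could be parents. The weight formula is assumed.
    """
    if s_hat <= 0 or n_candidates < 1 or not 0 <= s <= n_candidates:
        raise ValueError("require s_hat > 0, n_candidates >= 1, 0 <= s <= n_candidates")
    t = (s - s_hat) / s_hat  # signed, scaled distance from the reference size
    weights = {
        "add-parent": math.exp(-t),    # favored when s < s_hat
        "swap-parent": 1.0,            # neutral: leaves the set size unchanged
        "remove-parent": math.exp(t),  # favored when s > s_hat
    }
    if s == 0:  # nothing to remove or swap out
        weights["remove-parent"] = weights["swap-parent"] = 0.0
    if s == n_candidates:  # nothing left to add or swap in
        weights["add-parent"] = weights["swap-parent"] = 0.0
    total = sum(weights.values())
    return {action: w / total for action, w in weights.items()}


probs = action_probabilities(s=1, s_hat=3, n_candidates=10)
```

A proposal like this is asymmetric, so a Metropolis-Hastings sampler using it would need to include the ratio of forward and reverse proposal probabilities in its acceptance probability.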

**Fig. 3.**
Heatmap of AUCPR values from the simulation study. Both DBN-based techniques (SSPS and the exact method) score well on this, since the data are generated by a DBN. On large problems the exact DBN method needs strict in-degree constraints, leading to poor prediction quality. LASSO and FunChisq both perform relatively weakly. See Supplementary Figure S2 for representative ROC and precision-recall curves
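For readers unfamiliar with AUCPR, the following sketch computes average precision, one common estimator of the area under the precision-recall curve, from per-edge scores and ground-truth edge labels. Whether the study used this estimator or another (for example, trapezoidal integration) is not stated here, so treat the choice as an assumption; the function name is also hypothetical.

```python
def average_precision(scores, labels):
    """Mean of precision@k over the ranks k at which a true edge appears.

    scores: predicted confidence for each candidate edge.
    labels: 1 (or True) if the edge is in the true network, else 0.
    Ties in scores are broken by input order (the sort is stable).
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one true edge")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for k, (_, is_edge) in enumerate(ranked, start=1):
        if is_edge:
            hits += 1
            total += hits / k  # precision among the top-k predictions
    return total / n_pos


# Hypothetical scores for four candidate edges, two of which are true.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```

In this toy example the true edges sit at ranks 1 and 3, so the result is (1/1 + 2/3) / 2 = 5/6.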

**Fig. 4.**
Heatmap of differential performance against the prior knowledge, measured by AUCPR paired t-statistics. SSPS consistently outperforms the prior knowledge across problem sizes and shows robustness to errors in the prior knowledge
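A paired t-statistic of the kind named in Fig. 4 can be sketched as follows. The caption does not say what is paired, so this example assumes one AUCPR value per replicate for the method and one for the prior-knowledge baseline on the same replicate; the function name and the numbers are purely illustrative.

```python
import math
import statistics


def paired_t_statistic(method_scores, baseline_scores):
    """t = mean(d) / (sd(d) / sqrt(n)) for paired differences d = method - baseline."""
    if len(method_scores) != len(baseline_scores) or len(method_scores) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [m - b for m, b in zip(method_scores, baseline_scores)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("all paired differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Made-up AUCPR values for three replicates.
t = paired_t_statistic([0.6, 0.7, 0.8], [0.5, 0.5, 0.5])
```

A positive value means the method scored higher than the baseline on average across the paired replicates; larger magnitudes indicate a more consistent difference.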

**Fig. 5.**
Methods’ performances across contexts in the HPN-DREAM Challenge. MCMC is stochastic, so we run SSPS 5 times; the error bars show the range of AUCROC scores. The other methods are all deterministic and require no error bars. See Supplementary Figure S3 for example predicted networks, Supplementary Figure S4 for AUCPR scores and Supplementary Figure S5 for representative ROC and precision-recall curves

Grants and funding

T32 LM012413/LM/NLM NIH HHS/United States
