BMC Bioinformatics. 2019 May 2;20(1):220.
doi: 10.1186/s12859-019-2798-1.

WASABI: a dynamic iterative framework for gene regulatory network inference

Arnaud Bonnaffoux et al. BMC Bioinformatics.

Abstract

Background: Inference of gene regulatory networks from gene expression data has been a long-standing and notoriously difficult task in systems biology. Recently, single-cell transcriptomic data have been massively used for gene regulatory network inference, with both successes and limitations.

Results: In the present work we propose an iterative algorithm called WASABI, dedicated to inferring a causal dynamical network from time-stamped single-cell data, which tackles some of the limitations associated with current approaches. We first introduce the concept of waves, which posits that the information provided by an external stimulus will affect genes one by one through a cascade, like waves spreading through a network. This concept allows us to infer the network one gene at a time, after genes have been ordered according to their time of regulation. We then demonstrate the ability of WASABI to correctly infer small networks, which have been simulated in silico using a mechanistic model consisting of coupled piecewise-deterministic Markov processes for the proper description of gene expression at the single-cell level. We finally apply WASABI to in vitro data generated from an avian model of erythroid differentiation. The structure of the resulting gene regulatory network sheds new light on the molecular mechanisms controlling this process. In particular, we find no evidence for hub genes and a much more distributed network structure than expected. Interestingly, we find that a majority of genes are under the direct control of the differentiation-inducing stimulus.

Conclusions: Together, these results demonstrate WASABI's versatility and its ability to tackle some general issues in gene regulatory network inference. It is our hope that WASABI will prove useful in helping biologists to fully exploit the power of time-stamped single-cell data.

Keywords: Erythropoiesis; Gene network inference; High parallel computing; Multiscale modelling; Proteomic; Single-cell transcriptomics; T2EC.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The results of this work will be exploited within the frame of a new company VIDIUM for which AB will serve as CSO.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
WASABI at a glance. a Schematic view of a GRN: the stimulus is represented by a yellow flash, genes by blue circles and interactions by green (activation) or red (inhibition) arrows. The stimulus-induced information propagation is represented by blue arcs corresponding to wave times. Genes and interactions that are not affected by information at a given wave time are shaded. At wave time 5, gene C feeds information back to genes A and B through a feedback interaction, creating a backflow wave. b Promoter wave times: promoter wave times correspond to inflection points of gene promoter activity, defined as the kon/(kon+koff) ratio. c Protein wave times: protein wave times correspond to inflection points of the mean protein level. d Inference process. Blue arrows represent interactions selected for calibration. Based on the classification of promoter waves, genes are iteratively added to the previously inferred sub-GRN to obtain a new, expanded GRN. Calibration is performed by comparing marginal RNA distributions between in silico and in vitro data. Inference is initialized by calibrating the interactions of early genes with the stimulus, which gives the initial sub-GRN. Later genes are added one by one, each tested with a subset of potential regulators for which a protein wave time is close enough to the promoter wave time of the added gene. Each resulting sub-GRN is selected according to its fit distance to in vitro data. If the fit distance is too large, the sub-GRN can be eliminated (red cross). An important benefit of this process is that the sub-GRN calibrations can be parallelized over several cores, which results in a computation time that is linear in the number of genes. Note that only a fraction of all tested sub-GRNs is shown.
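The wave-time definitions in panels b and c, and the regulator pre-selection in panel d, can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that kon and koff are available as time series on a common grid, locates inflection points as sign changes of a finite-difference second derivative, and uses a hypothetical tolerance `max_gap` (the caption only says "close enough") applied to the absolute time difference. The function names are illustrative.

```python
import numpy as np


def promoter_activity(kon, koff):
    """Promoter activity kon / (kon + koff), evaluated elementwise over time."""
    kon = np.asarray(kon, dtype=float)
    koff = np.asarray(koff, dtype=float)
    return kon / (kon + koff)


def inflection_times(t, y):
    """Times at which the curvature of y(t) changes sign.

    The second derivative is approximated by applying np.gradient twice,
    and each sign change is located by linear interpolation between the
    two neighbouring samples. Samples where the second derivative is
    exactly zero are not treated as crossings in this sketch.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    d2 = np.gradient(np.gradient(y, t), t)
    waves = []
    for i in range(len(t) - 1):
        if d2[i] * d2[i + 1] < 0.0:
            frac = d2[i] / (d2[i] - d2[i + 1])
            waves.append(t[i] + frac * (t[i + 1] - t[i]))
    return waves


def candidate_regulators(promoter_wave, protein_waves, max_gap):
    """Genes having at least one protein wave within max_gap of promoter_wave.

    protein_waves maps a gene name to its list of protein wave times
    (a gene can have several). max_gap is a hypothetical tolerance.
    """
    return [
        gene
        for gene, waves in protein_waves.items()
        if any(abs(promoter_wave - w) <= max_gap for w in waves)
    ]
```

For a gene whose koff decays exponentially while kon stays constant, the promoter activity is a sigmoid and `inflection_times` returns a single wave time at its midpoint.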
Fig. 2
Cascade in silico GRN. a Cascade-type GRNs are generated to study wave dynamics. Genes correspond to in vitro ones with their estimated parameters. S1 corresponds to the stimulus. Genes are identified by their ID in our gene list. b Based on 10 in silico GRNs, we compare the promoter wave times of early genes (blue) with those of other genes (red). For clarity, only promoter waves with a wave time lower than 15 h are displayed. c For each interaction of the 10 in silico GRNs, we compute the estimated promoter wave time of the regulated gene minus the protein wave time of its regulator. The distribution of this promoter/protein wave time difference is given for all interactions of all in silico GRNs.
Fig. 3
In silico cascade GRN inference. a The cascade GRN. Gene parameters were taken from in vitro estimations to mimic realistic behavior. Experimental data were generated to obtain time courses of transcriptomic data, at single-cell and population scale, and also proteomic data at population scale. b WASABI was run to infer the in silico cascade GRN and generated 88 candidates. A dot represents a network candidate with its associated fit distance and inference quality (percentage of true interactions). The true GRN is inferred (red dot, 100% quality). The acceptable maximum fit distance (green dashed line) corresponds to the variability of the true GRN fit distance. Its computation is detailed in panel c. Three GRN candidates (including the true one) have a fit distance below the threshold. c The variability of the true GRN fit distance (green dashed line in panels b and c) is estimated as the threshold below which 95% of true GRN fit distances fall. The fit distance distribution is represented for the true GRN (green) and candidates (blue) for the cascade in silico GRN benchmark. True GRNs are calibrated by WASABI directed inference while candidates are inferred from non-directed inference. The fit distance measures how closely the data generated by a candidate match the reference experimental data.
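The acceptance rule in panel c amounts to taking a percentile of the true-GRN fit distances and keeping the candidates below it. The following is a minimal sketch under stated assumptions: fit distances are plain numbers, the percentile is computed with NumPy's default linear interpolation, and the function names and the candidate dictionary are hypothetical rather than taken from WASABI.

```python
import numpy as np


def fit_distance_threshold(true_grn_distances, q=95.0):
    """Threshold below which q percent of the true-GRN fit distances fall."""
    return float(np.percentile(np.asarray(true_grn_distances, dtype=float), q))


def acceptable_candidates(candidate_distances, threshold):
    """Names of candidates whose fit distance does not exceed the threshold.

    candidate_distances maps a (hypothetical) candidate name to its fit distance.
    """
    return [name for name, d in candidate_distances.items() if d <= threshold]
```

A candidate is then retained whenever its own fit distance is no larger than the value returned by `fit_distance_threshold` for repeated calibrations of the true network.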
Fig. 4
In silico GRN with feedbacks. a Addition of one positive feedback onto the cascade GRN. b WASABI was run to infer the in silico cascade GRN with a positive feedback and generated 59 candidates, 31 of which have an acceptable fit distance. See legend to Fig. 3b for details. c Addition of one negative feedback onto the cascade GRN. d WASABI was run to infer the in silico cascade GRN with a negative feedback and generated 476 candidates, all of which have an acceptable fit distance. See legend to Fig. 3b for details.
Fig. 5
Promoter and protein wave time distributions. Distribution of in vitro promoter (a) and protein (b) wave times for all genes estimated from RNA and proteomic data at population scale. Counts represent number of genes. Note: a gene can have several waves for its promoter or protein
Fig. 6
Inference from in vitro data. a In vitro interaction consensus matrix. Each square in the matrix represents either the absence of any interaction, in black, or the presence of an interaction, the frequency of which is color-coded, between the considered regulator ID (row) and regulated gene ID (column). The first row corresponds to stimulus interactions. b Best candidate. Green: positive interaction; red: negative interaction; plain lines: interactions found in 100% of the candidates; dashed lines: interactions found only in some of the candidates; orange: genes whose product participates in the sterol synthesis pathway; purple: the last 5 genes added during iterative inference.
Fig. 7
GRN mechanistic and stochastic model. Our GRN model is composed of coupled piecewise-deterministic Markov processes. In this example, 2 genes are coupled. A gene i is represented by its promoter state (dashed box), which can switch randomly from OFF to ON and from ON to OFF at mean rates kon,i and koff,i, respectively. When the promoter state is ON, mRNA molecules are continuously produced at rate s0,i. mRNA molecules are constantly degraded at rate d0,i. Proteins are constantly translated from mRNA at rate s1,i and degraded at rate d1,i. The interaction between a regulator gene j and a target gene i is defined by the dependence of kon,i and koff,i on the protein level Pj of gene j and the interaction parameter θi,j. Likewise, a stimulus (yellow flash) can regulate a gene i by modulating its kon,i and koff,i switching rates with interaction parameter θi,0.
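To make the single-gene part of this model concrete, here is a minimal Python sketch of one uncoupled gene. It is a fixed-step Euler approximation, not the authors' simulator: the promoter switches with probability rate × dt per step, and mRNA and protein follow dM/dt = s0·E − d0·M and dP/dt = s1·M − d1·P. Rates are held constant, and regulation through θ is omitted because its functional form is not given here. The function name and default step size are illustrative assumptions.

```python
import numpy as np


def simulate_gene(kon, koff, s0, d0, s1, d1, t_end, dt=0.01, seed=0):
    """Fixed-step simulation of one gene of the two-state promoter model.

    Returns arrays E (promoter state, 0 = OFF, 1 = ON), M (mRNA level)
    and P (protein level), each of length n_steps + 1, starting from
    E = 0, M = 0, P = 0. dt must be small compared with 1/kon, 1/koff,
    1/d0 and 1/d1 for the approximation to be reasonable.
    """
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_end / dt))
    u = rng.random(n_steps)  # one uniform draw per step for promoter switching
    E = np.zeros(n_steps + 1)
    M = np.zeros(n_steps + 1)
    P = np.zeros(n_steps + 1)
    for k in range(n_steps):
        # The promoter leaves ON at rate koff and leaves OFF at rate kon.
        rate = koff if E[k] == 1.0 else kon
        E[k + 1] = 1.0 - E[k] if u[k] < rate * dt else E[k]
        # Deterministic flow between promoter jumps (explicit Euler step).
        M[k + 1] = M[k] + dt * (s0 * E[k] - d0 * M[k])
        P[k + 1] = P[k] + dt * (s1 * M[k] - d1 * P[k])
    return E, M, P
```

Over a long run, the fraction of time the promoter spends ON should approach kon/(kon+koff), the promoter activity used to define promoter waves in Fig. 1, and the mean mRNA level should approach (s0/d0) times that fraction.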
Fig. 8
Parameters estimation workflow. Schematic view of the WASABI workflow with 3 main steps: (1) individual gene parameter estimation (red zone), (2) wave sorting (green zone) and (3) iterative inference of network interactions (blue zone). The wave concept is introduced in the “Results” section. Model parameters (square boxes) are estimated from experimental data (flasks) with a specific method (grey hexagons). All methods are detailed in the “Methods” section. Estimated data relative to waves are represented by round boxes. Input arrows represent data required by methods to compute parameters. There are 3 types of experimental data: (i) bulk transcription inhibition kinetics (green flask), (ii) single-cell transcriptomic data (blue flask) and (iii) proteomic data (orange flask). Model parameters are specific to each gene, except for θ, which is specific to a pair of regulator/regulated genes. Notation is consistent with Eq. (1); γauto represents the exponent term of the auto-positive feedback interaction. Only d0(t), d1(t) and s1(t) are time dependent. One gene can have several wave times.
