Cell Syst. 2020 Mar 25;10(3):265-274.e11.
doi: 10.1016/j.cels.2020.02.003. Epub 2020 Mar 4.

Inferring Causal Gene Regulatory Networks from Coupled Single-Cell Expression Dynamics Using Scribe

Xiaojie Qiu et al. Cell Syst. 2020.

Abstract

Here, we present Scribe (https://github.com/aristoteleo/Scribe-py), a toolkit for detecting and visualizing causal regulatory interactions between genes and explore the potential for single-cell experiments to power network reconstruction. Scribe employs restricted directed information to determine causality by estimating the strength of information transferred from a potential regulator to its downstream target. We apply Scribe and other leading approaches for causal network reconstruction to several types of single-cell measurements and show that there is a dramatic drop in performance for "pseudotime"-ordered single-cell data compared with true time-series data. We demonstrate that performing causal inference requires temporal coupling between measurements. We show that methods such as "RNA velocity" restore some degree of coupling through an analysis of chromaffin cell fate commitment. These analyses highlight a shortcoming in experimental and computational methods for analyzing gene regulation at single-cell resolution and suggest ways of overcoming it.

Keywords: RNA velocity; Scribe; causal network inference; coupled dynamics; gene regulatory network inference; pseudotime; real time; single-cell RNA-seq; single-cell trajectories; slam-seq.

Conflict of interest statement

Declaration of Interests: The authors declare no competing interests.

Figures

Figure 1. Scribe, a toolkit for inferring and visualizing causal regulations.
(A) Inferring regulatory networks from gene expression data is challenging because the number of regulatory interactions that must be evaluated grows much more quickly than the number of genes in the analysis. (B) Ordering single-cell data in “pseudotime” or tracking how fluctuations in a regulator are followed by changes in a putative target in the same individual cells could boost power to detect causal regulatory interactions. (C) Scribe detects causality in four types of single-cell measurement datasets (“pseudotime”, “live-image”, “RNA-velocity” and “real-time”) using the metric restricted directed information (RDI). Scribe relies on RDI (Rahimzamani and Kannan, 2016) to quantify the information transferred from the potential regulator to the target under some time delay, conditioned on the target’s own past, in these pseudo-time-series data. A gene often has strong memory of its immediate previous state (Yt−1), but RDI gives a highly positive causality score from the putative regulator to the target only when a strong relationship remains between the regulator’s history and the target’s present after conditioning on the target’s history (Case 1 vs. Case 2).
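The conditioning idea in panel C can be illustrated with a minimal sketch that estimates I(X_{t-d}; Y_t | Y_{t-1}), the information a regulator's delayed past carries about the target's present beyond the target's own previous state. This is an illustration only: it uses a simple binned plug-in estimator chosen for brevity, which is an assumption and not a claim about the estimator Scribe implements. The function names (`discretize`, `conditional_mutual_information`, `rdi_sketch`, `simulate_pair`), the bin count, and the toy dynamics are all hypothetical.

```python
import math
import random
from collections import Counter


def discretize(values, n_bins=4):
    """Map real values to equal-width bin indices 0..n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant input
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def conditional_mutual_information(x, y, z):
    """Plug-in estimate of I(X; Y | Z) in bits for discrete sequences."""
    n = len(x)
    c_xyz = Counter(zip(x, y, z))
    c_xz = Counter(zip(x, z))
    c_yz = Counter(zip(y, z))
    c_z = Counter(z)
    total = 0.0
    for (xi, yi, zi), count in c_xyz.items():
        ratio = count * c_z[zi] / (c_xz[(xi, zi)] * c_yz[(yi, zi)])
        total += (count / n) * math.log2(ratio)
    return total


def rdi_sketch(regulator, target, delay=1, n_bins=4):
    """Estimate I(X_{t-delay}; Y_t | Y_{t-1}) on binned time series."""
    x = discretize(regulator, n_bins)
    y = discretize(target, n_bins)
    start = max(delay, 1)                       # first t with both pasts defined
    x_past = x[start - delay:len(x) - delay]    # X_{t-delay}
    y_now = y[start:]                           # Y_t
    y_prev = y[start - 1:len(y) - 1]            # Y_{t-1}
    return conditional_mutual_information(x_past, y_now, y_prev)


def simulate_pair(n=2000, seed=0):
    """Toy data (hypothetical): X drives Y with a one-step delay."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [0.0]
    for t in range(1, n):
        y.append(0.6 * y[-1] + 0.8 * x[t - 1] + 0.1 * rng.gauss(0, 1))
    return x, y


if __name__ == "__main__":
    x, y = simulate_pair()
    print("X -> Y:", rdi_sketch(x, y))
    print("Y -> X:", rdi_sketch(y, x))
```

On the toy data, the score in the true direction (X to Y) should be clearly larger than in the reverse direction, because X is generated independently of Y's past. Plug-in estimates on binned data are biased upward for small samples, so the reverse score is small but not exactly zero.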
Figure 2. Live imaging dataset of C. elegans’ early embryogenesis captures transcription expression dynamics hierarchy.
(A) Scheme used by Murray et al. for measuring transcription factor protein expression dynamics in real time for every cell during early C. elegans embryogenesis. (B) Single-cell, lineage-resolved fluorescence data capture the temporal dynamics of E lineage master regulators during C. elegans embryogenesis. The expression of each gene is scaled to lie between 0 and 1 and then smoothed using LOESS regression; the same processing is used in C. (C) Expression dynamics for 265 reporter TFs along the lineage leading to the Ealap cell. (D) Scribe reconstructs the causal regulatory network for the four master regulators (end-1/3, elt-2/7). The outlined box corresponds to previously known regulations. (E) A scheme for the multi-scale network for panel B. (F) An integrative multiscale model for E lineage specification. Zoom in to see the network architecture in detail. (G) Lineage-specific (AB, P, MS, E, D, C) causal networks for the curated master regulators constructed with Scribe, shown as a hive plot.
Figure 3. Scribe recovers a core regulatory network responsible for myelopoiesis.
(A) A core network describes key regulators during the specification of monocytes and granulocytes (Olsson et al., 2016). (B) Examples of gene-target pair kinetic curves over pseudotime along the monocyte lineage. (C) Scribe infers the expected core regulatory network interactions for myelopoiesis. (D) Visualization of combinatorial gene regulation from Irf8 and Gfi1 to Zeb2 or Per3. (E) The normalized rank of lineage-specific genes’ total outgoing RDI sum. (F) Lineage-specific network of significant regulators during erythropoiesis. Edges supported by the STRING database are colored as red lines. For panels E (F), BEAM analysis was used to identify significant branching genes associated with the four (one) lineage bifurcation events shown in the haematopoietic trajectory from Qiu et al. (2017a), based on the Paul dataset (Paul et al., 2015). The top 1,000 differentially expressed genes associated with each bifurcation were chosen to build a causal network for each relevant lineage. A set of TFs relevant to specific lineages described previously is used for panel E or F. Neu: Neutrophil; Ery: Erythroid; Mk: Megakaryocyte; Mono: Monocyte; DC: Dendritic Cell; BE: Basophil / Eosinophil. (G, H) Receiver operating characteristic (ROC) curves (G, top) and area under the curve (AUC) (H, bottom) of the inferred causal network based on Scribe, GC and CCM, from left to right, on the dendritic cell (DC) dataset, the granulocyte or monocyte branch of the Olsson dataset, and the erythroid branch of the Paul dataset. Four different variants of causal inference implemented in Scribe are tested: RDI (L = 0): the default RDI method without conditioning on any other gene; RDI (L = 1): the RDI method conditioned on the incoming gene with the highest causality score, excluding the current target; uRDI: the method based on the uniformization technique applied to the actual distribution in RDI; uRDI (L = 1): the uRDI method, also conditioned on the incoming gene with the highest causality score, excluding the current target. (I) The network of the gene set included in panel F, retrieved from the STRING database.
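As a reminder of what the AUC in panels G and H summarizes, the following minimal sketch computes it from a list of edge scores and binary reference labels, using the rank interpretation of AUC (the probability that a randomly chosen true edge scores higher than a randomly chosen false edge, with ties counted as one half). The function name `roc_auc` and the example scores are hypothetical and are not taken from the paper's benchmarks.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a true edge > score of a false edge); ties count 0.5."""
    pos = [s for s, is_true in zip(scores, labels) if is_true]
    neg = [s for s, is_true in zip(scores, labels) if not is_true]
    if not pos or not neg:
        raise ValueError("need at least one true and one false edge")
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical causality scores for four candidate edges; 1 marks an
    # edge present in the reference network, 0 an absent one.
    print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]))
```

This pairwise form is quadratic in the number of edges, which is fine for small curated networks; for large candidate sets a rank-based computation would be preferable.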
Figure 4. Causal inference in Scribe with RNA-velocity.
(A) RNA-velocity vectors projected onto the first two latent dimensions. A small subset of arrows is used to visualize the velocity field of the cells. S: Sympathoblasts; C: Chromaffin; SCP: Schwann Cell Progenitor. The color of each cell corresponds to the cluster id from Fig 5B of Furlan et al. (2017). (B) A core causal network for chromaffin cell commitment inferred from RNA-velocity. The gene set is collected from Furlan et al. (2017). CLR (context likelihood of relatedness) regularization is used to remove spurious causal edges in the network (see STAR Methods). (C) Two potential coherent feed-forward loop (FFL) motifs of chromaffin differentiation are discovered from the core network. Edge width corresponds to causal regulation strength. (D) Visualization of the six causal regulation pairs in the feed-forward loops Eya1-Phox2a-Erbb3 and Gata3-Phox2a-Notch1 (see STAR Methods for details). (E) Visualization of the combinatorial regulation logic for the two feed-forward loops in panel C with Scribe. For both panels D and E, a grid with 625 cells (25 on each dimension) is used. Similarly, expected values are scaled by the maximum to obtain a range from 0 to 1. (F) Scribe’s ability to detect causal regulatory interactions is limited by the single-cell measurement technology used. Technologies that provide measurements that are coupled across time and between genes provide more power for inference than conventional single-cell RNA-seq experiments.

References

    1. Aibar S, González-Blas CB, Moerman T, Huynh-Thu VA, Imrichova H, Hulselmans G, Rambow F, Marine J-C, Geurts P, Aerts J, et al. (2017). SCENIC: single-cell regulatory network inference and clustering. Nat. Methods 14, 1083–1086.
    2. Alon U (2007). Network motifs: theory and experimental approaches. Nat. Rev. Genet. 8, 450–461.
    3. Amit I, Garber M, Chevrier N, Leite AP, Donner Y, Eisenhaure T, Guttman M, Grenier JK, Li W, Zuk O, et al. (2009). Unbiased Reconstruction of a Mammalian Transcriptional Network Mediating Pathogen Responses. Science 326, 257–263.
    4. Babtie AC, Chan TE, and Stumpf MPH (2017). Learning regulatory models for cell development from single cell transcriptomic data. Current Opinion in Systems Biology 5, 72–81.
    5. Bar-Joseph Z, Gitter A, and Simon I (2012). Studying and modelling dynamic biological processes using time-series gene expression data. Nat. Rev. Genet. 13, 552–564.
