PLoS Comput Biol. 2024 Jul 8;20(7):e1011620.
doi: 10.1371/journal.pcbi.1011620. eCollection 2024 Jul.

scBoolSeq: Linking scRNA-seq statistics and Boolean dynamics

Gustavo Magaña-López et al. PLoS Comput Biol. 2024.

Abstract

Boolean networks are widely used to model the qualitative dynamics of cell fate processes by describing how the binary activation states of genes and transcription factors change over time. Bridging such qualitative states with quantitative measurements of gene expression in cells, such as scRNA-seq, is a cornerstone of data-driven model construction and validation. On one hand, scRNA-seq binarisation is a key step for inferring and validating Boolean models. On the other hand, generating synthetic scRNA-seq data from baseline Boolean models provides an important asset for benchmarking inference methods. However, linking characteristics of scRNA-seq datasets, including dropout events, with Boolean states is a challenging task. We present scBoolSeq, a method for the bidirectional linking of scRNA-seq data and the Boolean activation states of genes. Given a reference scRNA-seq dataset, scBoolSeq computes statistical criteria to classify the empirical gene pseudocount distributions as unimodal, bimodal, or zero-inflated, and fits a probabilistic model of dropouts with gene-dependent parameters. From these learnt distributions, scBoolSeq can both binarise scRNA-seq datasets and generate synthetic scRNA-seq datasets from Boolean traces, as issued from Boolean networks, using biased sampling and dropout simulation. We present a case study demonstrating the application of scBoolSeq's binarisation scheme in data-driven model inference. Furthermore, we compare synthetic scRNA-seq data generated by scBoolSeq with data generated by BoolODE for the same Boolean network model. The comparison shows that our method better reproduces the statistics of real scRNA-seq datasets, such as the mean-variance and mean-dropout relationships, while exhibiting clearly defined trajectories in two-dimensional projections of the data.
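The idea of generating synthetic expression values from Boolean states by biased sampling followed by dropout simulation can be illustrated with a toy sketch. This is not scBoolSeq's implementation or API: the function name `sample_from_boolean`, the half-normal sampling scheme, and the single shared `mu`, `sigma` and `dropout_prob` parameters are all assumptions made for illustration, whereas the method described above learns gene-dependent parameters from a reference dataset.

```python
import numpy as np

def sample_from_boolean(states, mu=5.0, sigma=1.0, dropout_prob=0.2, rng=None):
    """Toy synthetic pseudocounts from a Boolean trace (hypothetical helper).

    states: 2-D array of 0/1 values, rows = samples, columns = genes.
    mu, sigma, dropout_prob: illustrative parameters shared by all genes
    (an assumption; a real model would use per-gene values).
    """
    rng = np.random.default_rng() if rng is None else rng
    states = np.asarray(states, dtype=bool)
    # Half-normal offsets are always non-negative.
    offsets = np.abs(rng.normal(0.0, sigma, size=states.shape))
    # Biased sampling: active genes land above mu, inactive genes below it.
    values = np.where(states, mu + offsets, mu - offsets)
    values = np.clip(values, 0.0, None)  # pseudocounts cannot be negative
    # Dropout simulation: each entry is zeroed independently.
    dropped = rng.random(states.shape) < dropout_prob
    values[dropped] = 0.0
    return values

trace = np.array([[1, 0, 1],
                  [0, 0, 1]])
synthetic = sample_from_boolean(trace, rng=np.random.default_rng(0))
```

Passing a seeded generator keeps the toy output reproducible; setting `dropout_prob=0.0` isolates the biased-sampling step.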

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
From left to right: (1) A branching trajectory constructed by merging two Boolean simulations, each leading to a different stable state. (2) A binarised expression matrix, having genes as columns and samples as rows. (3) A pseudocount matrix (same format as the Boolean matrix). (4) A STREAM-plot reconstructing the branching trajectory from synthetic data generated from the Boolean simulations [15]. scBoolSeq can be used to go from gene expression matrices (such as (3)) to Boolean matrices (such as (2)) and vice-versa.
Fig 2. Mean-Variance and Mean-Dropout Rate relationships of HVGs in different datasets.
Each blue dot represents the average of 100 samples for a given gene.
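The per-gene quantities behind mean-variance and mean-dropout plots can be computed directly from a pseudocount matrix. The sketch below is a minimal illustration, assuming cells as rows, genes as columns, and a dropout counted as an exactly-zero entry; the function name `gene_statistics` is hypothetical and the toy matrix is made up.

```python
import numpy as np

def gene_statistics(pseudocounts):
    """Per-gene mean, sample variance and dropout rate (hypothetical helper).

    pseudocounts: 2-D array, rows = cells (samples), columns = genes.
    A dropout is counted here as an exactly-zero entry (an assumption).
    """
    x = np.asarray(pseudocounts, dtype=float)
    mean = x.mean(axis=0)
    variance = x.var(axis=0, ddof=1)      # unbiased sample variance
    dropout_rate = (x == 0).mean(axis=0)  # fraction of zero entries per gene
    return mean, variance, dropout_rate

# Toy matrix: 4 cells, 3 genes.
toy = np.array([[0.0, 2.0, 5.0],
                [0.0, 4.0, 5.0],
                [3.0, 0.0, 5.0],
                [1.0, 2.0, 5.0]])
mean, variance, dropout_rate = gene_statistics(toy)
```

Plotting `variance` and `dropout_rate` against `mean` gives one point per gene.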
Fig 3. Category-dependent binarisation accounts for different shapes in empirical pseudocount distributions.
For each category, plots show the empirical distribution for a selected gene in the GSE81682 dataset, and the portion of the values that is binarised with parameters z = “?” for the zero-inflated case, q = 0.05 and α = 0 for the unimodal case, and θ = 0.95 for the bimodal case.
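One plausible reading of the unimodal parameters is a quantile rule with Tukey-style margins: values below the q-th quantile minus α times the interquartile range become 0, values above the (1 − q)-th quantile plus α times the interquartile range become 1, and everything in between stays undetermined. The sketch below illustrates that reading only; the function name `binarise_unimodal`, the NaN encoding of undetermined values, and the exact threshold formula are assumptions rather than the published definition.

```python
import numpy as np

def binarise_unimodal(values, q=0.05, alpha=0.0):
    """Partial binarisation of one unimodal gene (hypothetical helper).

    Returns 0.0 for low outliers, 1.0 for high outliers and NaN for values
    left undetermined. The threshold formula is an assumed interpretation
    of the q and alpha parameters.
    """
    x = np.asarray(values, dtype=float)
    lo, hi = np.quantile(x, [q, 1.0 - q])
    q1, q3 = np.quantile(x, [0.25, 0.75])
    iqr = q3 - q1
    out = np.full(x.shape, np.nan)    # undetermined by default
    out[x < lo - alpha * iqr] = 0.0   # clearly low expression
    out[x > hi + alpha * iqr] = 1.0   # clearly high expression
    return out

binarised = binarise_unimodal(np.arange(100.0))
```

With α = 0 only the two tails of the distribution are binarised, which matches the caption's point that just part of the values receives a Boolean state.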
Fig 4
Left: Distribution of categories among the studied datasets. Right: Proportion of binarised values across datasets using the default parameters for each distribution type. These proportions are determined by both the categories and the specified thresholds. They were obtained using parameters z = “?” for the zero-inflated case, q = 0.05 and α = 0 for the unimodal case, and θ = 0.95 for the bimodal case. The dropout rate threshold for marking a gene as discarded was set to 0.99.
Fig 5
Left: Simplified view of the set of minimal TF-TF interactions employed in the Boolean models reproducing the differentiation process. For display, all leaf nodes with an in-degree of 1 were recursively removed from the GRN. The full filtered GRN obtained with BoNesis is provided in S6 Fig. Right: Top Gene Ontology Terms related to the 184 genes of the filtered GRN.
Fig 6. Artificial Boolean models and generated synthetic scRNA-seq data.
Left: Influence graphs of the Boolean models. See S1 Notebooks for Boolean functions. Right: Two-dimensional projection of the synthetic scRNA-seq data generated by applying scBoolSeq to Boolean traces simulated from the models on the left; we used PCA and locally linear embedding (LLE) for (b) and (d), and t-SNE for (f). Dots are labelled with a description of the Boolean state from which they were generated: for (b) it is the number of active genes; for (d) and (f) the labels refer to the dynamical nature of the states in the three branches of the differentiation process.
Fig 7. Comparison of the per-gene Mean-Variance and Mean-DropOutRate profiles of reference dataset GSE122466 (red), BoolODE (blue), and scBoolSeq (green).
QC represents the quantile below which BoolODE simulates dropouts with a constant probability DP.
Fig 8. Python code snippet showing basic usage of scBoolSeq for binarisation and synthetic data generation from reference scRNA-seq data and Boolean states.

References

    1. Kerkhofs J, Roberts S, Luyten F, Van Oosterwyck H, Geris L. A Boolean network approach to developmental engineering. In: TERMIS-EU 2011, Date: 2011/06/06-2011/06/10, Location: Granada; 2011.
    2. Kerkhofs J, Roberts SJ, Luyten FP, van Oosterwyck H, Geris L. Relating the chondrocyte gene network to growth plate morphology: From genes to phenotype. PLoS ONE. 2012;7(4):1–11. doi: 10.1371/journal.pone.0034729
    3. Lesage R, Kerkhofs J, Geris L. Computational modeling and reverse engineering to reveal dominant regulatory interactions controlling osteochondral differentiation: Potential for regenerative medicine. Frontiers in Bioengineering and Biotechnology. 2018;6(NOV):1–16. doi: 10.3389/fbioe.2018.00165
    4. Nestorowa S, Hamey FK, Pijuan Sala B, Diamanti E, Shepherd M, Laurenti E, et al. A single-cell resolution map of mouse hematopoietic stem and progenitor cell differentiation. Blood. 2016;128(8):e20–e31. doi: 10.1182/blood-2016-05-716480
    5. Hérault L, Poplineau M, Duprez E, Remy É. A novel Boolean network inference strategy to model early hematopoiesis aging. Computational and Structural Biotechnology Journal. 2023;21:21–33. doi: 10.1016/j.csbj.2022.10.040
