Bioinformatics. 2025 Nov 1;41(11):btaf581.
doi: 10.1093/bioinformatics/btaf581.

Scalable inference and identifiability of kinetic parameters for transcriptional bursting from single cell data


Junhao Gu et al. Bioinformatics.

Abstract

Motivation: Stochastic gene expression and cell-to-cell heterogeneity have attracted growing interest in recent years, enabled by advances in single-cell measurement technologies. These studies are increasingly complemented by quantitative biophysical modeling, often within the framework of stochastic biochemical kinetic models. However, inferring the parameters of such models (i.e., the kinetic rates of biochemical reactions) remains a technical and computational challenge, particularly when the goal is to leverage high-throughput single-cell sequencing data.

Results: In this work, we develop a computational pipeline, based on a reference library of chemical master equation (CME) model solutions, to infer the kinetic parameters underlying noisy mRNA distributions in single-cell RNA sequencing data, using the commonly applied stochastic telegraph model. The approach fits kinetic parameters to steady-state distributions measured across a population of cells in snapshot data. The pipeline also supports comprehensive analysis of parameter identifiability, in both a priori (studying model properties in the absence of data) and a posteriori (in the context of a particular dataset) settings. It performs both tasks, inference and identifiability analysis, efficiently and at scale, and it helps disentangle the contributions of experimental noise and of structural properties of the model to the uncertainty in inferred parameters. We found that for the telegraph model, most of the parameter space is not practically identifiable from single-cell RNA sequencing data, and that low experimental capture rates worsen identifiability. Our methodological framework could be extended to other data types when fitting small biochemical network models.
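The reference library described above holds steady-state mRNA distributions of the telegraph model. A minimal sketch of computing one library entry is shown below. It assumes rates are expressed in units of the mRNA degradation rate (k_deg = 1), truncates the mRNA count at a hypothetical cutoff n_max with synthesis blocked at that boundary, and obtains the steady state with a dense linear solve; the function names are illustrative and are not taken from the TelegraphLikelihoodInfer repository.

```python
import numpy as np


def telegraph_steady_state(k_syn, k_off, k_on, n_max=200):
    """Joint steady-state distribution P(g, n) of the telegraph model (sketch).

    Rates are in units of the mRNA degradation rate (k_deg = 1, assumed).
    n_max is an assumed truncation of the mRNA count.
    Returns shape (2, n_max + 1): row 0 = inactive promoter G, row 1 = active G*.
    """
    n_states = 2 * (n_max + 1)
    A = np.zeros((n_states, n_states))  # CME generator, dp/dt = A @ p

    def idx(g, n):
        return g * (n_max + 1) + n

    def add(src, dst, rate):
        # Probability flows from state src to state dst at the given rate.
        A[dst, src] += rate
        A[src, src] -= rate

    for n in range(n_max + 1):
        add(idx(0, n), idx(1, n), k_on)   # promoter switches on: G -> G*
        add(idx(1, n), idx(0, n), k_off)  # promoter switches off: G* -> G
        if n < n_max:                     # synthesis from G* only; blocked at the cutoff
            add(idx(1, n), idx(1, n + 1), k_syn)
        if n > 0:                         # each mRNA degrades at rate 1
            for g in (0, 1):
                add(idx(g, n), idx(g, n - 1), float(n))

    # The balance equations are linearly dependent; replace one with sum(p) = 1.
    A[0, :] = 1.0
    b = np.zeros(n_states)
    b[0] = 1.0
    p = np.clip(np.linalg.solve(A, b), 0.0, None)  # drop tiny negative round-off
    return (p / p.sum()).reshape(2, n_max + 1)


def telegraph_pmf(k_syn, k_off, k_on, n_max=200):
    """Marginal mRNA count distribution P(n), summed over promoter states."""
    return telegraph_steady_state(k_syn, k_off, k_on, n_max).sum(axis=0)


# Identifiable example parameter set from Figure 2: k_syn = 10, k_off = 0.1, k_on = 0.05.
pmf = telegraph_pmf(10.0, 0.1, 0.05)
mean = float(np.arange(pmf.size) @ pmf)
```

Two standard telegraph-model results give a quick check on such a solver: the promoter is active a fraction k_on/(k_on + k_off) of the time (1/3 here), and the mean mRNA count is k_syn · k_on/(k_on + k_off) (about 3.33 here). Dividing each row of the joint array by its sum gives the mRNA distributions conditioned on the inactive and active promoter states.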

Availability and implementation: All code relevant to this work is available at https://github.com/Read-Lab-UCI/TelegraphLikelihoodInfer, archival DOI: https://doi.org/10.5281/zenodo.16915450.


Figures

Figure 1.
Computational pipeline workflow. (A) mRNA distribution computed from the telegraph model, in which the promoter switches between inactive and active states. Parameter sets are sampled on a 3D grid library over the parameters k_syn, k_off, and k_on (see Section 2). (B) Representative experimentally measured target distribution; the negative log-likelihood (−LL) of each sampled parameter set θ is obtained by comparing the target with the corresponding computed distribution. Alternatively, the target distribution can be obtained from synthetic data (i.e. model-generated distributions) for a priori identifiability analysis. (C) The coarse-grained 3D −LL surface, i.e. the −LL of every simulated mRNA distribution in the model library against the target distribution. (D) A schematic slice of the 3D −LL surface illustrating the optimization procedure: optimization is performed only within search bounds obtained from the initially sampled coarse-grained −LL surface. (E) After optimization, the profile likelihood (PL) of each parameter is computed and confidence intervals are derived from it (see Section 2 and Section SI 1, available as supplementary data at Bioinformatics online).
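The comparison in panels (B) to (D) can be sketched as follows, assuming the library is stored as an array of model pmfs indexed by the (k_syn, k_off, k_on) grid. For a target with h(n) cells holding n mRNAs, −LL(θ) = −Σ_n h(n) log P(n | θ). The probability floor eps and the rule used to derive search bounds (one grid step either side of the coarse minimum along each axis) are assumptions for illustration; the abstract does not say how the bounds are chosen, and the function names are hypothetical.

```python
import numpy as np


def neg_log_likelihood(library, counts, eps=1e-12):
    """-LL of one gene's per-cell counts under every library distribution.

    library: array (..., n_max + 1) of model pmfs, e.g. (n_syn, n_off, n_on, n_max + 1).
    counts:  1D array of non-negative integer mRNA counts, one per cell.
    eps:     probability floor (assumed) so zero-probability or out-of-range counts stay finite.
    Returns an array with the library's leading (grid) shape.
    """
    n_max = library.shape[-1] - 1
    # Histogram h(n); every count above n_max is pooled into one overflow bin.
    hist = np.bincount(np.minimum(counts, n_max + 1), minlength=n_max + 2)
    log_p = np.log(np.maximum(library, eps))
    return -(log_p @ hist[: n_max + 1]) - hist[n_max + 1] * np.log(eps)


def search_bounds(nll, axes):
    """Locate the coarse -LL minimum and bound each parameter by its grid neighbours.

    axes: one sorted 1D array of grid values per parameter, matching nll's dimensions.
    The one-step-either-side rule is a heuristic chosen for illustration.
    """
    best = np.unravel_index(np.argmin(nll), nll.shape)
    bounds = []
    for values, i in zip(axes, best):
        bounds.append((values[max(i - 1, 0)], values[min(i + 1, len(values) - 1)]))
    return best, bounds
```

The returned bounds would then constrain a local optimizer over continuous parameter values, as in panel (D); that refinement step is omitted from the sketch.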
Figure 2.
Profile likelihoods (PL) for two representative parameter sets, each with 200 cells. (A, B, C) A representative identifiable parameter set [k_syn: 10, k_off: 0.1, k_on: 0.05]; (D, E, F) a representative practically unidentifiable parameter set [k_syn: 10, k_off: 0.01, k_on: 0.05]. (A, D) Original computed distributions (black) and sample replicates (red); (B, E) the 3D −LL surface projected onto the 2D plane of burst frequency and burst size (shown as a scatter plot of sorted −LL values; where points overlap, smaller values are drawn in front); (C, F) PL of the three parameters k_syn, k_off, and k_on. For the red dots and stripes, color intensity indicates how often a value occurs as a replicate maximum-likelihood estimate (MLE). The PL covers the parameter range in which the replicate MLEs fall. The green horizontal lines mark the threshold of 1.92, half the 95% quantile of the χ² distribution with one degree of freedom.
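A coarse version of the profile likelihood and its confidence interval can be read directly off a gridded −LL surface: for each value of one parameter, minimize −LL over the other two, then keep the values whose profile lies within 1.92 of the minimum. This is a sketch that assumes the grid is fine enough; in the pipeline described here the profiles are computed after optimization, and the helper names below are hypothetical. The interval is taken as the outermost grid values under the threshold, so any gaps inside it are ignored.

```python
import numpy as np

# Threshold on Delta(-LL): half the 95% quantile of chi-square with 1 df (3.841 / 2).
CHI2_HALF_95 = 1.92


def grid_profile(nll, axis):
    """Coarse profile likelihood: minimise the gridded -LL over every other parameter axis."""
    others = tuple(a for a in range(nll.ndim) if a != axis)
    return nll.min(axis=others)


def profile_ci(values, profile, threshold=CHI2_HALF_95):
    """Grid-resolution CI: outermost values with profile - min(profile) <= threshold.

    Returns (lo, hi, bounded_lo, bounded_hi). A bounded_* flag is False when the
    interval runs into the edge of the sampled range on that side.
    """
    inside = np.flatnonzero(profile - profile.min() <= threshold)
    lo, hi = inside[0], inside[-1]
    return values[lo], values[hi], bool(lo > 0), bool(hi < len(values) - 1)
```

When the interval reaches an edge of the sampled range (bounded_lo or bounded_hi is False), the profile never rises above the threshold on that side, which is the usual signature of practical non-identifiability seen in panels (D, E, F).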
Figure 3.
Global a priori identifiability landscape over the entire studied parameter space at different capture rates, for 10K cells. (A, B) Results for 100% capture rate. (A) mRNA distributions for representative parameter sets. (B) Identifiability, measured by APM, at each ground-truth point in the 3D parameter space, shown for each parameter separately (left three columns) and overall (last column; maximum APM over all parameters). Distributions in (A) correspond to the grayscale dots in the corresponding 3D surfaces in (B). (C, D) Same as (A, B), but with a 30% experimental capture rate.
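A reduced capture rate is commonly modeled as binomial thinning: each mRNA molecule is detected independently with probability p, so the observed distribution is P_obs(m) = Σ_{n ≥ m} P(n) C(n, m) p^m (1 − p)^(n − m). The sketch below applies this model to a pmf on 0..n_max. Whether the pipeline uses exactly this capture model is an assumption here rather than something stated in the abstract, and the function name is hypothetical.

```python
import numpy as np


def binomial_capture(pmf, capture_rate):
    """Observed-count pmf when each mRNA is captured independently with probability p.

    Binomial thinning is an assumed capture model. pmf[n] is the true-count
    distribution on 0..n_max; capture_rate p must lie in (0, 1].
    Returns pmf_obs with pmf_obs[m] = sum_n pmf[n] * C(n, m) * p**m * (1 - p)**(n - m).
    """
    if not 0.0 < capture_rate <= 1.0:
        raise ValueError("capture_rate must be in (0, 1]")
    if capture_rate == 1.0:
        return pmf.copy()
    n = np.arange(pmf.size)
    log_fact = np.concatenate(([0.0], np.cumsum(np.log(n[1:]))))  # log n!
    m = n[:, None]    # observed count (rows)
    k = n[None, :]    # true count (columns)
    valid = m <= k    # cannot observe more molecules than are present
    diff = np.where(valid, k - m, 0)
    log_kernel = (log_fact[k] - log_fact[m] - log_fact[diff]
                  + m * np.log(capture_rate) + diff * np.log1p(-capture_rate))
    kernel = np.where(valid, np.exp(log_kernel), 0.0)
    return kernel @ pmf
```

A convenient check is that thinning a Poisson(λ) distribution yields Poisson(pλ), so a Poisson(10) input at a 30% capture rate should return Poisson(3).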
Figure 4.
The effect of cell number and capture rate. (A) Fraction of identifiable parameter sets in the whole library grid versus number of cells, at different capture rates. (B–D) Profile likelihoods for a representative parameter set {k_syn: 3.5, k_off: 0.1, k_on: 0.23} at 100% capture rate with 1K (light blue), 10K (blue), and 100K (violet) cells. (E) The mRNA distribution conditioned on the active (G*) and inactive (G) promoter states. (F–I) Corresponding results for the same parameter set as in (B–E), at 30% capture rate.
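The dependence on cell number can be explored with synthetic replicates, in the spirit of the a priori analysis: draw n_cells counts from a ground-truth distribution, find the library entry with the lowest −LL, and repeat. The sketch below is a minimal version that works on a flattened library and draws each replicate's histogram directly from a multinomial; the probability floor, the seed handling, and the function name are assumptions for illustration.

```python
import numpy as np


def replicate_grid_mles(pmf_library, true_pmf, n_cells, n_reps, seed=0, eps=1e-12):
    """Library index of the -LL minimum for each synthetic replicate dataset.

    pmf_library: (K, n_max + 1) candidate pmfs, i.e. the parameter grid flattened.
    true_pmf:    (n_max + 1,) ground-truth pmf from which cells are sampled.
    eps:         probability floor (assumed) to keep log-probabilities finite.
    """
    rng = np.random.default_rng(seed)
    log_lib = np.log(np.maximum(pmf_library, eps))
    p = true_pmf / true_pmf.sum()
    mles = np.empty(n_reps, dtype=int)
    for r in range(n_reps):
        hist = rng.multinomial(n_cells, p)      # histogram of n_cells i.i.d. counts
        mles[r] = np.argmin(-(log_lib @ hist))  # grid MLE = lowest -LL
    return mles
```

Because the expected −LL gap between a candidate and the ground truth grows linearly with the number of cells, replicate MLEs concentrate as n_cells increases; this is the mechanism by which larger cell numbers can narrow profile likelihoods.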
Figure 5.
Inferred kinetic parameters of the telegraph model for two datasets: SS3 cast mouse fibroblasts (CAST/EiJ × C57BL/6J), with cell numbers ranging from 6 to 224 (mean 208), and HUES64 human embryonic stem cells, with 1112 cells. Each dot corresponds to a gene in the dataset. Color indicates the identifiability of the gene, as quantified by the APM metric derived from profile likelihood-based CIs (maximum over all three parameters). Only a small fraction of genes are inferred with narrow CIs and thus meet the identifiability criterion (APM < 1).
