NPJ Syst Biol Appl. 2025 Jan 11;11(1):6.
doi: 10.1038/s41540-025-00490-5.

Deterministic patterns in single-cell transcriptomic data

Zhixing Cao et al. NPJ Syst Biol Appl.

Abstract

We report the existence of deterministic patterns in statistical plots of single-cell transcriptomic data. We develop a theory showing that the patterns are neither artifacts introduced by the measurement process nor due to underlying biological mechanisms. Rather they naturally emerge from finite sample size effects. The theory precisely predicts the patterns in data from multiplexed error-robust fluorescence in situ hybridization and five different types of single-cell sequencing platforms.

Conflict of interest statement

Competing interests: The authors declare no competing interests.

Figures

Fig. 1. Mean-Fano factor plots of six genomic datasets using various types of scRNA-seq protocols and MERFISH.
Each plot shows a deterministic pattern; one pattern is highlighted for VASA-seq data. Each point represents the mean and Fano factor computed from the transcript counts of a gene in a finite number nc of cells (some genes may have the same coordinates). Information on each dataset can be found in Supplementary Note 1.
Fig. 2. Analysis of the patterns in the mean-Fano factor plot of single-cell transcriptomic data.
a Points in the mean-Fano factor plot generated using the 10x Genomics v3 technology are arranged on distinct curves. We highlight three of them and call them Curves 0, 1 and 2. The inset zooms in on the curves and shows the existence of a periodic distance between the points on the same curve. The vertical distance between two successive points on the same curve is denoted by Δy.
b For all six datasets in Fig. 1, Δy (computed from Curves 0–2) is a multiple of 1/nc where nc is the number of cells used to compute the mean and the Fano factor of transcript number fluctuations. Note that in 80% of cases, Δy = 1/nc (solid circles).
c Variation in the number of transcripts per cell for three genes A, B, and C on Curves 0, 1, and 2, respectively, in (a). For gene A, cells only have 0 or 1 transcript. For gene B, cells have 0–2 transcripts but only one cell has 2 transcripts. For gene C, cells have 0–2 transcripts but only two cells have 2 transcripts. Note that mi is the number of cells with exactly i transcripts.
d Fraction of cells with exactly zero transcripts for genes on Curves 0 (top), 1 (middle) and 2 (bottom) for all six datasets.
e Same as (d) but showing fraction of cells with exactly one transcript. Table S1 summarizes the median values in (d, e).
f Number of cells with exactly two transcripts for genes on Curves 0 (top), 1 (middle) and 2 (bottom) for all six datasets.
Fig. 3. Mean-Fano factor plots of the six sequencing datasets in Fig. 1 and the theoretical predictions given by Eqs. (3)–(4).
The curves pass through all points in the dataset, thus verifying the accuracy of the theory.
Fig. 4. Theory predicts the patterns in the mean-Fano factor plot of VASA-seq data.
The dots are calculated from the data (each represents a gene) and the open circles have x-coordinate given by Eq. (2) and y-coordinate given by ⟨n⟩ = i/nc, where nc is the sample size of 101 cells and i is a positive integer. Note that some open circles lack corresponding dots due to missing data points. This absence occurs either because the gene corresponding to the point does not exist or because it was not detected, as typically only a small fraction of the transcriptome of each cell is captured by sequencing methods.
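
The finite-sample origin of the curves described in the Fig. 2 caption can be illustrated with a short calculation. The sketch below is not the authors' code, and Eqs. (2)–(4) are not reproduced on this page. It assumes the Fano factor is the population variance (dividing by nc) over the mean, and the helper names `gene_on_curve` and `mean_fano` are hypothetical. It builds synthetic genes in which m2 cells carry 2 transcripts, m1 cells carry 1 transcript, and the remaining cells carry none, so that m2 = 0, 1, 2 corresponds to Curves 0, 1 and 2 as described for genes A, B and C.

```python
from fractions import Fraction


def gene_on_curve(nc, m1, m2):
    """Synthetic counts for one gene (hypothetical helper):
    m2 cells with 2 transcripts, m1 cells with 1, the rest with 0."""
    if m1 + m2 > nc:
        raise ValueError("m1 + m2 cannot exceed the number of cells")
    return [2] * m2 + [1] * m1 + [0] * (nc - m1 - m2)


def mean_fano(counts):
    """Exact mean and Fano factor of a list of integer counts.

    Assumption: the variance is the population variance (divide by nc),
    not the unbiased sample variance (divide by nc - 1).
    """
    nc = len(counts)
    mean = Fraction(sum(counts), nc)
    if mean == 0:
        raise ValueError("Fano factor is undefined when the mean is zero")
    second_moment = Fraction(sum(c * c for c in counts), nc)
    variance = second_moment - mean ** 2
    return mean, variance / mean


nc = 101  # sample size quoted for the VASA-seq data in the Fig. 4 caption

# Curve 0: cells hold only 0 or 1 transcript, so Fano = 1 - mean.
curve0 = [mean_fano(gene_on_curve(nc, m1, 0)) for m1 in range(1, 6)]

# Vertical spacing between successive points on Curve 0.
spacings = [a[1] - b[1] for a, b in zip(curve0, curve0[1:])]
assert all(dy == Fraction(1, nc) for dy in spacings)

for mean, fano in curve0:
    print(f"mean = {mean}, Fano factor = {fano}")
```

Under these assumptions every gene on Curve 0 satisfies Fano = 1 − ⟨n⟩, and since the mean can only take the values i/nc, successive points are separated vertically by exactly 1/nc, matching the Δy = 1/nc spacing reported in Fig. 2b. Exact rational arithmetic (`fractions.Fraction`) is used so that the spacing can be compared without floating-point tolerance.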
