Gigascience. 2019 Sep 1;8(9):giz106.
doi: 10.1093/gigascience/giz106.

Evaluating stably expressed genes in single cells

Yingxin Lin et al. Gigascience.

Abstract

Background: Single-cell RNA-seq (scRNA-seq) profiling has revealed remarkable variation in transcription, suggesting that expression of many genes at the single-cell level is intrinsically stochastic and noisy. Yet, on the cell population level, a subset of genes traditionally referred to as housekeeping genes (HKGs) are found to be stably expressed in different cell and tissue types. It is therefore critical to ask whether stably expressed genes (SEGs) can be identified on the single-cell level and, if so, how their expression stability can be assessed. We have previously proposed a computational framework for ranking expression stability of genes in single cells for scRNA-seq data normalization and integration. In this study, we perform detailed evaluation and characterization of SEGs derived from this framework.

Results: Here, we show that gene expression stability indices derived from the early human and mouse development scRNA-seq datasets and the "Mouse Atlas" dataset are reproducible and conserved across species. We demonstrate that SEGs identified from single cells based on their stability indices are considerably more stable than HKGs defined previously from cell populations across diverse biological systems. Our analyses indicate that SEGs are inherently more stable at the single-cell level and that their characteristics are reminiscent of HKGs, suggesting their potential role in sustaining essential functions in individual cells.

Conclusions: SEGs identified in this study have immediate utility both for understanding variation and stability of single-cell transcriptomes and for practical applications such as scRNA-seq data normalization. Our framework for calculating gene stability index, "scSEGIndex," is incorporated into the scMerge Bioconductor R package (https://sydneybiox.github.io/scMerge/reference/scSEGIndex.html) and can be used for identifying genes with stable expression in scRNA-seq datasets.

Keywords: gene expression variability; housekeeping genes; scRNA-seq; single cells; stably expressed genes.
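As a rough illustration of what a per-gene stability index can look like, the following minimal Python sketch combines three simple features (fraction of zeros, variance, and a one-way F-statistic across predefined cell classes) by averaging their scaled ranks. This is not the scSEGIndex implementation from scMerge: the function names, the choice of features, and the rank-averaging rule are all assumptions made for illustration, and the mixture-model features of the actual framework are omitted.

```python
from statistics import mean, pvariance


def rank_fraction(values):
    """Scale the rank of each value to [0, 1]; ties are broken by position."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    for rank, i in enumerate(order):
        ranks[i] = rank / (len(values) - 1) if len(values) > 1 else 0.0
    return ranks


def f_statistic(values, labels):
    """One-way ANOVA F-statistic of values grouped by class label.

    Assumes at least two classes and more cells than classes.
    """
    groups = {}
    for v, lab in zip(values, labels):
        groups.setdefault(lab, []).append(v)
    grand = mean(values)
    k, n = len(groups), len(values)
    between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values()) / (k - 1)
    within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g) / (n - k)
    if within == 0:
        # No within-class spread: constant genes get 0, otherwise +inf.
        return 0.0 if between == 0 else float("inf")
    return between / within


def toy_stability_index(expr, labels):
    """Hypothetical index: expr maps gene -> expression per cell.

    Lower zero fraction, variance, and F-statistic all count as "more stable";
    the index is 1 minus the mean scaled rank over the three features.
    """
    genes = list(expr)
    zero_frac = [sum(v == 0 for v in expr[g]) / len(expr[g]) for g in genes]
    variance = [pvariance(expr[g]) for g in genes]
    f_stat = [f_statistic(expr[g], labels) for g in genes]
    per_feature = [rank_fraction(f) for f in (zero_frac, variance, f_stat)]
    return {g: 1 - mean(col[i] for col in per_feature) for i, g in enumerate(genes)}


# Made-up toy data: three genes measured in six cells from two classes.
expr = {
    "stable": [5, 5, 6, 5, 6, 5],
    "variable": [0, 0, 1, 9, 8, 10],
    "sparse": [0, 0, 0, 0, 3, 0],
}
labels = ["a", "a", "a", "b", "b", "b"]
index = toy_stability_index(expr, labels)
```

Rank averaging is used here only because it needs no tuning and puts features with different units on a common scale; the published framework may combine its features differently.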

Figures

Figure 1:
Schematic illustration of the computational framework for deriving gene stability index on the single-cell level. (A) Stability features extracted directly from the mixture model are colored in blue. Those extracted from additional scRNA-seq data characteristics are in red. The overall stability index is derived from the combination of all stability features. (B) Comparison of Gamma-Gaussian and Gamma-Gamma mixture models on 4 scRNA-seq datasets (i.e. E-MTAB-3929, GSE45719, GSE60361, E-MTAB-4079). y-axis represents the percentage of times a given model is selected by Bayesian information criterion. (C) Evaluation metrics used for evaluating gene expression stability in scRNA-seq datasets.
Figure 2:
Characterizing gene stability features in single cells for human and mouse. (A) Percentage of zeros per gene across individual cells. (B) Fitted values of mixing proportion (λ), and variance (σ²) and mean (μ) in the Gaussian component (top panels) of the mixture model for each gene. Regularized percentage of zeros, F-statistics computed from predefined cell class (bottom left panel), and stability index derived for each gene (bottom right panel), respectively. (C) Scatter plot of stability index calculated from 2 random subsamplings of cells from human and mouse development datasets. Mean Pearson’s correlation coefficient and standard deviation were calculated from pairwise comparison of 10 repeated random subsamplings on each dataset. (D) Scatter plot and correlation of stability indices calculated from each of 3 datasets. P-values denote t-distribution test on Pearson’s correlation coefficient.
Figure 3:
Comparison of SEGs identified on individual cell level using scRNA-seq with HKGs defined on cell population level using bulk transcriptome data. (A) Scatter plot showing mean expression (x-axis) and variance (y-axis) of each gene (gray circles) across profiled single cells. Open red circles represent SEGs identified from early human development data (hSEG) in this study whereas dark and light blue solid circles represent HKGs defined previously using bulk microarray [15] and RNA-seq data [10]. (B) Same as (A) but for SEGs identified from early mouse development data (mSEG*; light blue points) and the union of these identified from both mouse development and mouse atlas datasets (mSEG; green circles). (C) Venn diagrams showing overlaps of hSEGs and HKGs defined using bulk microarray and RNA-seq. (D) Overlap of all human and mouse gene lists. (E) Expression patterns of example genes that are defined as SEGs using scRNA-seq data but not as HKGs using bulk microarray or RNA-seq data (RPL26 and RPL36) and vice versa (HINT and AGPAT1) across individual cells. (F) Expression patterns for GAPDH and ACTB in human and mouse (Gapdh and Actb) across individual cells.
Figure 4:
Stability of SEGs and HKGs in human and mouse development scRNA-seq datasets. (A) PCA plots generated from human development data using all expressed genes, HKGs, or hSEGs. Cells are colored by their predefined developmental stages. (B) PCA plots generated from mouse development data using all expressed genes or mSEGs. Cells are colored by their predefined types and developmental stages. (C) Schematic showing the quantification of concordance of k-means clustering with predefined cell classes using a panel of metrics. (D) Bar plots of concordance between k-means clustering and predefined cell class labels, using all expressed genes, HKGs identified from microarray and RNA-seq data, SEGs identified from this study for human (hSEGs) and mouse (mSEGs), and size-matched subset of HKGs to SEGs and vice versa.
Figure 5:
Characterization of stability index with sequence and gene characteristics. (A) Pearson correlation analyses of human and mouse gene stability features with respect to genomic structural and evolutionary gene features. P-values >0.001 are displayed. (B) Box plots of various gene characteristics for SEGs, HKGs, and all expressed genes. Coloured box captures lower quartile and upper quartile with median displayed as horizontal line in the middle. Dotted lines and bars represent whiskers. (C) Overrepresentation analyses of SEGs that are common between hSEG and mSEG (common SEGs); and HKGs that are common between HKG microarray and HKG RNA-seq (common HKGs), using Gene Ontology (GO) and Reactome databases. (D) Comparison of conservation for common SEGs and common HKGs in human and mouse genomes. P-values were calculated from a 2-sided Wilcoxon rank sum test.
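The evaluation described for Figure 4 (C, D) scores how well a k-means clustering agrees with predefined cell class labels. The caption does not name the individual metrics in its panel, so the sketch below uses the adjusted Rand index purely as an assumed example of such a concordance measure; the function name and the toy labels are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same cells.

    1.0 means identical partitions (up to renaming of clusters);
    values near 0 mean agreement no better than chance.
    """
    n = len(labels_true)
    if n != len(labels_pred) or n < 2:
        raise ValueError("need two labelings of the same length, with at least 2 cells")
    # Pairs of cells that share both a class and a cluster.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    # Pairs that share a class, and pairs that share a cluster.
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        # Degenerate case, e.g. both labelings put every cell in one group.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


# Toy example: cluster IDs are arbitrary, so a pure relabeling still scores 1.0.
classes = ["zygote", "zygote", "blastocyst", "blastocyst"]
perfect = adjusted_rand_index(classes, [1, 1, 0, 0])
crossed = adjusted_rand_index(classes, [0, 1, 0, 1])
```

In an analysis like the one in the caption, `labels_pred` would come from running k-means on a chosen gene set (all expressed genes, HKGs, or SEGs), so that the score can be compared across gene sets.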

References

    1. Martinez-Jimenez CP, Eling N, Chen HC, et al. Aging increases cell-to-cell transcriptional variability upon immune stimulation. Science. 2017;355(6332):1433–6.
    2. Marinov GK, Williams BA, McCue K, et al. From single-cell to cell-pool transcriptomes: stochasticity in gene expression and RNA splicing. Genome Res. 2014;24(3):496–510.
    3. Kolodziejczyk AA, Kim JK, Svensson V, et al. The technology and biology of single-cell RNA sequencing. Mol Cell. 2015;58(4):610–20.
    4. Suter DM, Molina N, Gatfield D, et al. Mammalian genes are transcribed with widely different bursting kinetics. Science. 2011;332(6028):472–4.
    5. Fukaya T, Lim B, Levine M. Enhancer control of transcriptional bursting. Cell. 2016;166(2):358–68.
