Sci Rep. 2016 Sep 27;6:33892.
doi: 10.1038/srep33892.

Identifying and removing the cell-cycle effect from single-cell RNA-Sequencing data

Martin Barron et al. Sci Rep. 2016.

Abstract

Single-cell RNA-Sequencing (scRNA-Seq) is a revolutionary technique for discovering and describing cell types in heterogeneous tissues, yet its expression measurements often suffer from large systematic bias. A major source of this bias is the cell cycle, which introduces large within-cell-type heterogeneity that can obscure the differences in expression between cell types. The currently available method for removing the cell-cycle effect (scLVM) cannot identify this effect effectively and risks removing other biological components of interest, compromising downstream analysis. We present ccRemover, a new method that reliably identifies the cell-cycle effect and removes it. ccRemover preserves other biological signals of interest in the data and can therefore serve as an important pre-processing step for many scRNA-Seq analyses. We demonstrate the effectiveness of ccRemover on simulated data and three real scRNA-Seq datasets, where it improves the ability of existing clustering algorithms to distinguish between cell types.
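
The abstract does not describe how ccRemover identifies the cell-cycle effect. Purely to illustrate the general idea of removing a signal tied to a known gene set while keeping other structure, the sketch below runs PCA on a genes-by-cells matrix, flags components whose gene loadings concentrate on a supplied list of cell-cycle genes, and subtracts only those components. The function name `remove_gene_set_components`, the mean-absolute-loading ratio rule and `ratio_threshold` are assumptions made for this example; this is not the ccRemover implementation or its test statistic.

```python
import numpy as np


def remove_gene_set_components(expr, is_cc_gene, n_components=10, ratio_threshold=2.0):
    """Illustrative sketch (not ccRemover): drop principal components whose gene
    loadings concentrate on a known set of cell-cycle genes.

    expr: genes-by-cells matrix, e.g. log-transformed expression.
    is_cc_gene: boolean mask over genes marking known cell-cycle genes.
    Returns the corrected matrix and the indices of the removed components.
    """
    expr = np.asarray(expr, dtype=float)
    mask = np.asarray(is_cc_gene, dtype=bool)
    gene_means = expr.mean(axis=1, keepdims=True)
    centered = expr - gene_means

    # Thin SVD of the centred data: columns of u are gene loadings,
    # rows of vt (scaled by s) give each component's cell scores.
    u, s, vt = np.linalg.svd(centered, full_matrices=False)

    removed = []
    for k in range(min(n_components, len(s))):
        if s[k] <= 1e-10 * s[0]:
            break  # remaining components carry (numerically) no variance
        cc_loading = np.abs(u[mask, k]).mean()
        other_loading = np.abs(u[~mask, k]).mean()
        # Hypothetical rule: flag the component if cell-cycle genes load on it
        # much more heavily, on average, than the remaining genes do.
        if cc_loading > ratio_threshold * other_loading:
            removed.append(k)

    corrected = centered.copy()
    for k in removed:
        corrected -= s[k] * np.outer(u[:, k], vt[k])
    return corrected + gene_means, removed
```

Only a single pass over the leading components is shown; a practical method would also need a principled test for whether a loading difference is meaningful, and might repeat the search after each removal.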


Figures

Figure 1. The simulation data projected onto its first two principal components.
The cell types are represented by different shapes (circle, triangle) and the cell-cycle time point of each cell by different colors (red, blue, green). (a) Original data. The cells form six clusters corresponding to the combinations of cell type and cell-cycle status. (b) scLVM corrected data (one latent factor removed). The cells form three clusters corresponding to cell-cycle status. (c) scLVM corrected data (three latent factors removed). No distinct clusters are observed. (d) ccRemover corrected data. The cells split into two groups corresponding to the cell types.
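
Figure 1 plots each cell's scores on the first two principal components. A minimal NumPy sketch of that projection follows, assuming a genes-by-cells input matrix; the caption does not state the preprocessing (normalization, log transform, gene filtering) used for the figure, and the function name is hypothetical.

```python
import numpy as np


def project_first_two_pcs(expr):
    """Return the coordinates of each cell on the first two principal components.

    expr: genes-by-cells matrix (e.g. log expression); the result is cells-by-2.
    """
    expr = np.asarray(expr, dtype=float)
    centered = expr - expr.mean(axis=1, keepdims=True)  # centre each gene
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    # The score of every cell on component k is s[k] * vt[k].
    return (s[:2, None] * vt[:2]).T
```

Coloring the returned points by cell-cycle stage and drawing them with one marker per cell type would give a plot in the style of Figure 1.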
Figure 2. Density plots of selected genes from the T-cell data.
The densities are shown for the original (red), scLVM corrected (green) and ccRemover corrected (blue) data. The genes were selected from among the top-ranked genes on Cyclebase. The original data display bimodal densities, which are common in scRNA-Seq data and indicate genes whose expression switches on and off. When the cell-cycle effect is removed with ccRemover or scLVM, these bimodal densities disappear.
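
The on/off bimodality described for Figure 2 can be checked numerically by estimating one gene's density across cells and counting its peaks. The sketch below uses a plain Gaussian kernel density estimate with a caller-chosen bandwidth and ignores peaks lower than a fraction of the tallest one. The function names, `bandwidth` and `rel_height` are assumptions for illustration, not the procedure used to draw the figure.

```python
import numpy as np


def gaussian_kde_1d(samples, grid, bandwidth):
    """Gaussian kernel density estimate of `samples`, evaluated at `grid`."""
    samples = np.asarray(samples, dtype=float)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    norm = len(samples) * bandwidth * np.sqrt(2.0 * np.pi)
    return np.exp(-0.5 * z ** 2).sum(axis=1) / norm


def count_modes(samples, bandwidth, n_grid=512, rel_height=0.1):
    """Count peaks of the estimated density that reach at least
    rel_height times the height of the tallest peak."""
    samples = np.asarray(samples, dtype=float)
    grid = np.linspace(samples.min() - 3 * bandwidth,
                       samples.max() + 3 * bandwidth, n_grid)
    density = gaussian_kde_1d(samples, grid, bandwidth)
    mid = density[1:-1]
    # A grid point is a peak if it rises above its left neighbour and does
    # not drop below its right neighbour (so a two-point plateau counts once).
    is_peak = (mid > density[:-2]) & (mid >= density[2:])
    return int(np.sum(mid[is_peak] >= rel_height * density.max()))
```

The bandwidth matters: too small a value splits a single peak into spurious ones, while too large a value can merge genuine "off" and "on" modes.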
Figure 3. Dendrogram plots from the hierarchical clustering on the original, ccRemover corrected and scLVM corrected glioblastoma data.
The tumor of origin of each cell is indicated by color: MGH26 (yellow), MGH28 (purple), MGH29 (orange), MGH30 (blue) and MGH31 (red). The clustering assignments are displayed as boxes separating the cells. (a) Original data. The clusters contain significant numbers of misclassified cells; in particular, the MGH28, MGH30 and MGH31 clusters contain many cells from other tumors. (b) scLVM corrected data. Clustering accuracy improves relative to the original data; however, the MGH26 and MGH30 cells are now mixed between clusters. (c) ccRemover corrected data. The purity of the clusters improves significantly compared with the original and scLVM corrected data. The MGH28 cluster is now much purer and contains only a few cells from other tumors.
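
Figure 3 rests on hierarchical clustering and on how pure each cluster is with respect to tumor of origin. The caption does not give the linkage, distance or purity definition used, so the sketch below assumes average linkage on Euclidean distances and the common purity score (each cluster is credited with the count of its most frequent tumor label). Function names are hypothetical, and the naive merging loop is only meant for small examples.

```python
from collections import Counter

import numpy as np


def average_linkage_clusters(points, n_clusters):
    """Naive agglomerative clustering with average linkage on Euclidean
    distances: repeatedly merge the closest pair until n_clusters remain.

    points: cells-by-features matrix. Returns one integer label per cell.
    """
    points = np.asarray(points, dtype=float)
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None  # (average distance, index a, index b)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a stays valid
    labels = np.empty(len(points), dtype=int)
    for label, members in enumerate(clusters):
        labels[members] = label
    return labels


def purity(cluster_labels, true_labels):
    """Fraction of cells that carry the majority true label of their cluster."""
    total = 0
    for c in set(cluster_labels):
        members = [t for k, t in zip(cluster_labels, true_labels) if k == c]
        total += Counter(members).most_common(1)[0][1]
    return total / len(true_labels)
```

A purity of 1.0 means every cluster contains cells from a single tumor; mixing such as that described for panels (a) and (b) lowers the score.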
Figure 4. Bar plots of the clustering assignments for the lung adenocarcinoma cells.
(a) Original data. The LC.PT and LC.PT_RE cells split into two clusters, each containing a roughly equal proportion of cells from each sample, indicating that 4-means clustering failed to separate the cells from these two samples. (b) scLVM corrected data. As with the original data, clustering fails to split the LC.PT and LC.PT_RE cells into separate clusters. (c) ccRemover corrected data. The separation of the LC.PT and LC.PT_RE cells between the clusters improves significantly, with one cluster dominated by LC.PT cells and the other by LC.PT_RE cells.
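
"4-means" here is k-means clustering with four clusters, and the bar plots summarize how many cells from each sample fall into each cluster. A minimal sketch of both steps follows. The deterministic farthest-point initialization, the function names and the toy sample assignments are assumptions for illustration; the initialization, restarts and input features used in the paper are not given in this caption.

```python
from collections import Counter

import numpy as np


def kmeans(points, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-point initialisation.

    points: cells-by-features matrix. Returns one integer cluster label per cell.
    """
    points = np.asarray(points, dtype=float)
    # Initialise with the first cell, then repeatedly add the cell that is
    # farthest from all centres chosen so far.
    centers = [points[0]]
    for _ in range(1, k):
        sq_dist = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(sq_dist)])
    centers = np.array(centers)

    labels = None
    for _ in range(n_iter):
        # Assignment step: nearest centre for every cell.
        sq_dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        new_labels = sq_dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centre to the mean of its cells (skip empty ones).
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels


def cells_per_sample(labels, samples):
    """Count the cells from each sample within each cluster, i.e. the
    numbers a stacked bar plot of cluster assignments displays."""
    table = {}
    for label, sample in zip(np.asarray(labels).tolist(), samples):
        table.setdefault(label, Counter())[sample] += 1
    return table
```

A cluster whose counter is split roughly evenly between LC.PT and LC.PT_RE corresponds to the failure mode described in panels (a) and (b).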
Figure 5. Heat maps of gene expression in the lung adenocarcinoma dataset.
The cell-cycle genes were chosen from the top-ranked cell-cycle genes on Cyclebase and are ordered by their cell-cycle peak time. The cells are ordered by a hierarchical clustering of the original data, and this order is the same in each heat map. (a) Original data. Blocks of similar expression indicate cells at similar cell-cycle time points, revealing the presence of cell-cycle effects. (b) scLVM corrected data. The blocks of similar expression are reduced but still apparent. The colors are more balanced because the correction reduces the range of expression levels. (c) ccRemover corrected data. The obvious blocks are no longer present in the corrected dataset.
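
The heat maps are comparable because every panel uses the same gene order (by peak time) and the same cell order (from clustering the original data), so differences between panels reflect the correction rather than the ordering. A minimal sketch of that bookkeeping follows; the function name is hypothetical, and the peak times and cell order are assumed to be supplied by the caller.

```python
import numpy as np


def arrange_for_heatmaps(matrices, peak_times, cell_order):
    """Apply one gene order and one cell order to several genes-by-cells matrices.

    Genes (rows) are sorted by cell-cycle peak time; cells (columns) follow
    `cell_order`, e.g. the leaf order of a clustering of the original data.
    Using the same permutation for every matrix keeps the panels comparable.
    """
    gene_order = np.argsort(np.asarray(peak_times), kind="stable")
    cell_order = np.asarray(cell_order)
    return [np.asarray(m)[gene_order][:, cell_order] for m in matrices]
```

Passing the original, scLVM corrected and ccRemover corrected matrices together guarantees that all three panels share identical row and column orders.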
