Nucleic Acids Res. 2018 Feb 16;46(3):e14.
doi: 10.1093/nar/gkx1113.

A sparse differential clustering algorithm for tracing cell type changes via single-cell RNA-sequencing data

Martin Barron et al. Nucleic Acids Res. 2018.

Abstract

Cell types in cell populations change as the condition changes: some cell types die out, new cell types may emerge and surviving cell types evolve to adapt to the new condition. Using single-cell RNA-sequencing data that measure the gene expression of cells before and after the condition change, we propose an algorithm, SparseDC, which identifies cell types, traces their changes across conditions and identifies marker genes for these changes. By solving a unified optimization problem, SparseDC completes all three tasks simultaneously. SparseDC is computationally efficient, and we demonstrate its accuracy on both simulated and real data.

Figures

Figure 1.
A toy example of cell type changes and different categories of marker genes. (A and B) The composition of the cell population changes as the condition changes. Different colors denote different cell types. The blue and red cells are preserved in condition B but have changed as indicated by the stars. On the other hand, the green cells have died out and a new purple cell type has emerged. The proportion of cell types present in the population has also changed. (C and D) Different categories of marker genes for the red cell type. A marker gene for a cell type is a gene whose expression is consistent in cells of this type and also different from the background. In the plot, the background expression is shown in dark red, and expression higher than the background is shown in yellow. The brighter the yellow, the higher the expression. Gene 1 is a housekeeping marker gene. Gene 2 is a condition-dependent marker gene: although it is a marker gene in both conditions, its expression is lower (less bright yellow) in condition B. Gene 3 is no longer a marker gene in condition B, as its expression there is the same as the background; it is thus a condition-A-specific marker gene. Gene 4 is a condition-B-specific marker gene. Gene 5 is a null gene.
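The marker gene categories in the Figure 1 caption can be written as a small decision rule. The sketch below is only an illustration, not the SparseDC implementation: it assumes each gene is summarized by one value per condition for a given cell type, that the background has been shifted to zero, and that the function name `marker_category`, the tolerance `tol` and the toy values are all hypothetical.

```python
def marker_category(center_a: float, center_b: float, tol: float = 1e-8) -> str:
    """Classify a gene for one cell type from its values in conditions A and B.

    Assumption: the background expression is 0, so a value within `tol`
    of 0 means the gene does not differ from the background.
    """
    is_marker_a = abs(center_a) > tol
    is_marker_b = abs(center_b) > tol

    if not is_marker_a and not is_marker_b:
        return "null"
    if is_marker_a and not is_marker_b:
        return "condition-A-specific"
    if is_marker_b and not is_marker_a:
        return "condition-B-specific"
    # Marker in both conditions: same level means housekeeping,
    # a different level means condition-dependent.
    if abs(center_a - center_b) <= tol:
        return "housekeeping"
    return "condition-dependent"


# Toy values mimicking genes 1 to 5 for the red cell type in Figure 1.
toy_genes = {
    "Gene 1": (2.0, 2.0),
    "Gene 2": (2.0, 1.0),
    "Gene 3": (2.0, 0.0),
    "Gene 4": (0.0, 2.0),
    "Gene 5": (0.0, 0.0),
}

for name, (a, b) in toy_genes.items():
    print(name, marker_category(a, b))
```

With these toy values the five genes fall into the five categories in the order the caption lists them.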
Figure 2.
The average classification rates from the simulation tests. The cluster scenario refers to the cell composition in each condition as displayed in Table 1. Different levels of marker gene sparsity are represented by the different shades. The error bars represent the standard error of the results from the 100 simulations. (A) All marker genes are housekeeping marker genes. (B) Half of the marker genes are condition-specific marker genes.
Figure 3.
The average sensitivity and specificity from the simulation tests. The cluster scenario refers to the cell composition in each condition as displayed in Table 1. Different levels of marker gene sparsity are represented by the different shades. The error bars represent the standard error of the results from the 100 simulations. (A and B) Sensitivity and specificity for simulations with all housekeeping marker genes. (C and D) Sensitivity and specificity for simulations with half condition-specific marker genes.
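For readers who want the quantities in Figures 2 and 3 spelled out, the sketch below computes sensitivity, specificity and a standard error across repeated runs using their standard definitions. It assumes that a "positive" is a gene that is truly a marker and that a detection is a gene flagged as a marker; the function names and the toy inputs are hypothetical, and this is not the authors' evaluation code.

```python
import math
import statistics


def sensitivity_specificity(truth, predicted):
    """Return (sensitivity, specificity) for two equal-length boolean sequences.

    truth[i] is True if gene i is a real marker (assumed known in a simulation);
    predicted[i] is True if gene i was flagged as a marker.
    """
    tp = sum(1 for t, p in zip(truth, predicted) if t and p)
    fn = sum(1 for t, p in zip(truth, predicted) if t and not p)
    tn = sum(1 for t, p in zip(truth, predicted) if not t and not p)
    fp = sum(1 for t, p in zip(truth, predicted) if not t and p)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


def standard_error(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(values) / math.sqrt(len(values))


# One toy simulation with five genes.
truth = [True, True, False, False, False]
predicted = [True, False, False, False, True]
sens, spec = sensitivity_specificity(truth, predicted)

# Toy per-simulation results; the paper's figures average over 100 simulations.
toy_rates = [0.8, 0.9, 1.0]
print(sens, spec, statistics.mean(toy_rates), standard_error(toy_rates))
```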
Figure 4.
Heatmaps of the gene expression of condition-specific and condition-dependent marker genes for the neural cluster (GW), detected by SparseDC in the Pollen data. (A) Condition A and (B) condition B correspond to how the data was split into two conditions as described in the text. For the plot labels, 2338 and 2339 represent the cell types CRL-2338 and CRL-2339, respectively. The color bars at the top of the plots represent the cell type of each of the cells. The top five genes are condition-specific marker genes for the neural cluster in condition A (‘AS’ was added to the gene names to denote this type of marker gene). The next nine genes are condition-dependent marker genes for the neural cluster which are upregulated in condition A (‘AD’ was added to the gene names to denote this type of marker gene). The last gene is a condition-dependent marker gene for the neural cluster in condition B (‘BD’ was added to the gene name to denote this type of marker gene).
Figure 5.
The heatmaps display the expression measurements for the top 10 upregulated marker genes detected by SparseDC in the Pollen data for each of the cell types in each condition. For a cell type k, the top 10 upregulated marker genes are the genes with the ten largest positive center values for that cell type in condition A or condition B. (A) Condition A and (B) condition B correspond to how the data was split into two conditions as described in the text. The color bars above the heatmaps indicate the cell type of each of the cells, while the color bars along the left side of the heatmaps indicate which of the cell types each of the genes was detected as a marker for. For the plot labels, 2338 and 2339 represent the cell types CRL-2338 and CRL-2339, respectively. In the heatmap for condition A, there are clear blocks of similar expression for the marker genes of all the present cell types. Similar blocks can be seen in the heatmap for condition B for the cell types which are present. For example, there are clear blocks of high expression for the Kera marker genes in both heatmaps as this type is present in both conditions, while there is only a block for the BJ marker genes in the heatmap for condition B since the BJ cells are only present in condition B.
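The selection rule in the Figure 5 caption (keep the genes with the largest positive values for a cell type) can be sketched as a short ranking step. This is an illustration under assumptions: the per-gene values are taken as already estimated and stored in a plain dictionary, and the function name `top_upregulated` and the gene labels are hypothetical.

```python
def top_upregulated(centers, n=10):
    """Return up to n gene names with the largest strictly positive values.

    `centers` maps gene name -> estimated value for one cell type in one
    condition. Genes with zero or negative values are never selected.
    """
    positive = [(gene, value) for gene, value in centers.items() if value > 0]
    positive.sort(key=lambda item: item[1], reverse=True)
    return [gene for gene, _ in positive[:n]]


# Hypothetical values for one cell type in one condition.
toy_centers = {"g1": 0.5, "g2": -1.2, "g3": 2.0, "g4": 0.0, "g5": 1.1}
print(top_upregulated(toy_centers, n=2))
```

When fewer than n genes have a positive value, the function simply returns all of them, so a cell type with few upregulated markers yields a shorter list.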
Figure 6.
Heatmaps of the expression of the top 10 upregulated housekeeping marker genes detected by SparseDC for the Llorens–Bobadilla data. The top 10 housekeeping marker genes are identified as the 10 genes which have the largest positive center value in both conditions, ischemic injured (A) and naive (B). The color bars at the top represent the clusters of the cells, while the color bars at the side represent the marker genes for each cluster. The numbers on the plot correspond to the clusters found in the data, where cluster 1 contains the likely qNSC cells, cluster 2 contains the likely oligodendrocyte cells, cluster 3 contains the likely aNSC cells and cluster 4 contains the likely neuroblast cells. For all of the cell clusters there are clear blocks relating to the marker genes for the cluster.

