Nat Commun. 2017 Apr 6:8:14825.
doi: 10.1038/ncomms14825.

Sensitive detection of rare disease-associated cell subsets via representation learning

Eirini Arvaniti et al. Nat Commun. 2017.

Abstract

Rare cell populations play a pivotal role in the initiation and progression of diseases such as cancer. However, the identification of such subpopulations remains a difficult task. This work describes CellCnn, a representation learning approach to detect rare cell subsets associated with disease using high-dimensional single-cell measurements. Using CellCnn, we identify paracrine signalling-, AIDS onset- and rare CMV infection-associated cell subsets in peripheral blood, and extremely rare leukaemic blast populations in minimal residual disease-like situations with frequencies as low as 0.01%.

Conflict of interest statement

The authors declare no competing financial interests.

Figures

Figure 1. CellCnn overview and demonstration.
(a) CellCnn convolutional neural network architecture. CellCnn takes as input groups of single-cell measurements (multi-cell inputs), where each group is annotated with a phenotype. Node activities in the convolutional layer are defined as weighted sums over single-cell molecular profiles. Nodes in the pooling layer evaluate the presence (max pooling) or frequency (mean pooling) of specific cell subsets. The output of the network estimates the sample-associated phenotype (e.g., disease condition, expected survival). Network training optimizes the weights to match the phenotypes of the training data set. Trained filter weights correspond to molecular profiles of relevant cell subsets and allow for assignment of the cell subset membership of individual cells (cell-filter response). (b) Illustration of cell-filter response computations for individual cells. For instance, the marker profiles of cells 1 and 3 exhibit a perfect match and no match with the weights of filters 1 and 2, respectively, and therefore result in a high (red) and a low (blue) cell-filter response. (c) CellCnn classification of GM-CSF (un-)stimulated peripheral blood mononuclear cell populations monitored with mass cytometry. t-SNE projection based on all cell-type-defining surface markers (not used by CellCnn), coloured by cell-filter response. (d) Density-based clustering of high cell-filter response regions using all cell-type-defining surface markers reveals two distinct cell types, namely monocytes (CD33+) and dendritic cells (CD123+). (e) Histograms of the signalling markers (used by CellCnn) showing greatest differential abundance in terms of the Kolmogorov–Smirnov two-sample test between the whole-cell population and the selected cell subsets. (f) Response of individual cells (grouped by manually gated cell types) is shown for both conditions. The cell-filter response is significantly higher for monocytes and dendritic cells in the stimulated sample.
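The forward pass described in panels (a) and (b) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the ReLU nonlinearity, the single sigmoid output unit, and all function and parameter names (`cell_filter_response`, `forward`, `out_weights`) are assumptions made for the example; the caption itself only specifies weighted sums followed by max or mean pooling.

```python
import math

def cell_filter_response(cell, weights, bias=0.0):
    """Weighted sum over one cell's marker profile; the ReLU is an assumption."""
    z = sum(w * x for w, x in zip(weights, cell)) + bias
    return max(0.0, z)

def forward(cells, filters, biases, out_weights, out_bias, pooling="max"):
    """Map one multi-cell input to (pooled filter activities, predicted probability).

    cells:   list of marker profiles, one list of floats per cell
    filters: one weight vector per filter, same length as a marker profile
    """
    pooled = []
    for weights, bias in zip(filters, biases):
        responses = [cell_filter_response(c, weights, bias) for c in cells]
        if pooling == "max":
            # presence of a matching cell subset
            pooled.append(max(responses))
        else:
            # frequency-like summary of a matching cell subset
            pooled.append(sum(responses) / len(responses))
    # Assumed output layer: logistic unit on the pooled activities.
    logit = sum(v * p for v, p in zip(out_weights, pooled)) + out_bias
    return pooled, 1.0 / (1.0 + math.exp(-logit))
```

With max pooling a single strongly responding cell is enough to activate a filter, which is why that mode suits very rare subsets, whereas mean pooling tracks how frequent the responding cells are.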
Figure 2. CellCnn analysis of immune cell populations associated with AIDS onset in HIV patients.
(a) Kaplan–Meier plots for high- and low-risk patient cohort according to CellCnn survival prediction (P=3.03e-03, log-rank test, computation time: 1 h, single laptop core) and state of the art: Citrus (P=2.97e-02, 3 days, 24 Intel Xeon cores). (b) Reconstruction of cell subsets predicting AIDS-free survival in HIV-infected patients. Cells selected by CellCnn filters are highlighted (in red) on the t-SNE map computed from all test samples. A distinct area is occupied by each selected subpopulation. Filters 1 and 2 are positively associated with survival, whereas filter 3 is negatively associated. Average frequency of the selected cell subsets in 10 test patients with lowest/highest survival times is reported. (c) Histograms of measured marker abundances for the whole-cell population and the selected cell subsets.
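For readers unfamiliar with the curves in panel (a), the following is a minimal sketch of the Kaplan–Meier (product-limit) estimator that such plots are based on. It is a generic textbook version, not code from the study; the function name and the input encoding (1 for an observed event, 0 for a censored patient) are assumptions, and the log-rank test used to compare the two cohorts is not shown.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one observed event.

    times:  follow-up time per patient
    events: 1 if the event (e.g. AIDS onset) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # Both events and censored patients leave the risk set.
        n_at_risk -= n_removed
    return curve
```

One curve would be computed per risk group (high and low predicted risk) and the two curves plotted together.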
Figure 3. Detection of rare CMV seropositivity-associated cell populations.
(a) Visualization of the cell subsets selected by CellCnn and Citrus across 100 Monte Carlo cross-validation (CV) repetitions. Centroids of selected populations are highlighted on a t-SNE map computed from all samples using 20,000 cells per individual (see Methods for details). The cell population most frequently (81 out of 100 times) selected by CellCnn is positively associated with prior CMV infection, whereas the second most frequent cell subset is negatively associated with CMV seropositivity. (b) t-SNE map colour-coded according to abundance of selected markers. The top-left subplot depicts the cell subset most frequently selected by CellCnn, corresponding to cluster 1 in (a) (see Methods for details). This cell subset corresponds to a memory-like (NKG2C+, CD57+) NK (CD56+, CD3−) and NK T (CD56+, CD3+) cell population. (c) Histograms of selected marker abundances for the whole-cell population and the cell subset most frequently selected by CellCnn. (d) Boxplot of area under the ROC curve (ROC AUC) on the test samples for 100 Monte Carlo CV repetitions. The median test ROC AUC for CellCnn is equal to 1.
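The evaluation scheme in panels (a) and (d) combines repeated random train/test splits with a ROC AUC computed on each held-out set. The sketch below illustrates both pieces under stated assumptions: the helper names, the `test_fraction` parameter and the seed are hypothetical (the caption gives only the number of repetitions, 100), and the AUC is computed with the standard pairwise-ranking definition rather than any particular library.

```python
import random

def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def monte_carlo_splits(n_samples, n_repeats, test_fraction, seed=0):
    """Yield (train_indices, test_indices) for repeated random subsampling."""
    rng = random.Random(seed)
    n_test = max(1, round(n_samples * test_fraction))
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        yield indices[n_test:], indices[:n_test]
```

A ROC AUC of 1 on a test split means every seropositive test sample received a higher score than every seronegative one.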
Figure 4. Identification of in silico spike-in rare leukaemic blast populations for two AML subgroups.
(a) The spiked-in subset (frequency=0.1%) of blast cells from a cytogenetically normal (CN) patient is highlighted in red on the left plot (ground truth) and compared with cells identified by CellCnn, which are marked in red on the right plot. (b) Similar setting as (a) for a spiked-in subset of blast cells from a core-binding-factor translocation (CBF) patient. (c,d) Similar settings as (a,b) for spiked-in subsets of blast cells with even lower frequency (0.01%). (e) Histograms of selected cell surface markers for the disease-associated cell populations identified by CellCnn. The markers presented highlight the differences of blast cell immunophenotypic profiles between CBF and CN patients. CBF, core binding factor translocation; CN, cytogenetically normal.
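An in silico spike-in of the kind shown here can be sketched as follows. This is an illustrative construction only: the function name, the choice to keep the total cell count fixed, and the random subsampling are assumptions, and the study's actual procedure is described in its Methods, which are not part of this page.

```python
import random

def spike_in(background, blasts, frequency, seed=0):
    """Mix blast cells into a background sample at roughly the given frequency.

    Returns (cells, is_blast), where is_blast[i] is 1 for spiked-in cells.
    The total number of cells equals len(background).
    """
    rng = random.Random(seed)
    n_total = len(background)
    n_spike = max(1, round(frequency * n_total))
    if n_spike > len(blasts):
        raise ValueError("not enough blast cells for the requested frequency")
    chosen_blasts = rng.sample(blasts, n_spike)
    kept_background = rng.sample(background, n_total - n_spike)
    labelled = [(c, 0) for c in kept_background] + [(c, 1) for c in chosen_blasts]
    rng.shuffle(labelled)
    cells = [c for c, _ in labelled]
    is_blast = [flag for _, flag in labelled]
    return cells, is_blast
```

For a sample of 10,000 cells, a frequency of 0.1% corresponds to 10 blast cells and 0.01% to a single blast cell, which gives a sense of how rare the target populations in panels (c) and (d) are.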
Figure 5. Benchmark results on the identification of in silico spike-in rare leukaemic blast populations for two AML subclasses.
(a) Whole-sample representation learned by CellCnn for various AML blast cell population frequencies. The three classes are well separated (linearly separable) in the CellCnn-based representation space (i.e., when projected to the two most relevant AML-specific filters). (b) Comparison to the baseline methods for whole-sample representation (Citrus; moment-based: multi-cell input summary profiles; denoising autoencoder (ref. 26)) for AML blast population at 0.1%. The three classes are not well separated in the representation space learned by these approaches. (c) Comparison to baseline methods for single-cell classification (LR, logistic regression; outlier, distance-based outlier detection; RF, random forests; SVM, support vector machines; Citrus (ref. 9)) for AML blast population at 0.1%. For all methods except Citrus, average precision–recall curves for recovery of blast cells on the test samples are reported. Shadowed areas indicate 95% confidence intervals. Citrus does not provide a precision–recall series; therefore, a single precision–recall point is computed for each test sample. (d) Single-cell classification performance of CellCnn for various low AML blast cell population frequencies. Average precision–recall curves on the test samples are reported with shadowed areas indicating 95% confidence intervals. CBF, core binding factor translocation; CN, cytogenetically normal.
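The precision–recall curves in panels (c) and (d) are obtained by ranking cells by a per-cell score and sweeping a threshold down the ranking. A minimal sketch, assuming binary ground-truth labels (1 for a blast cell) and one score per cell such as a cell-filter response; the function name is hypothetical and tied scores are not treated specially:

```python
def precision_recall_points(labels, scores):
    """Return [(recall, precision)] after each cell in descending score order."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive cell")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    true_positives = 0
    points = []
    for rank, i in enumerate(order, start=1):
        true_positives += labels[i]
        points.append((true_positives / n_pos, true_positives / rank))
    return points
```

A method that returns only a hard selection of cells, as the caption notes for Citrus, yields a single (recall, precision) point per sample rather than a full curve.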

References

    1. Hanahan D. & Weinberg R. A. Hallmarks of cancer: the next generation. Cell 144, 646–674 (2011).
    2. Grün D. et al. Single-cell messenger RNA sequencing reveals rare intestinal cell types. Nature 525, 251–255 (2015).
    3. Kalisky T., Blainey P. & Quake S. R. Genomic analysis at the single-cell level. Annu. Rev. Genet. 45, 431–445 (2011).
    4. Bendall S. C. et al. Single-cell mass cytometry of differential immune and drug responses across a human hematopoietic continuum. Science 332, 687–696 (2011).
    5. Battich N., Stoeger T. & Pelkmans L. Image-based transcriptomics in thousands of single human cells at single-molecule resolution. Nat. Methods 10, 1127–1133 (2013).
