Commun Biol. 2023 Sep 22;6(1):971.
doi: 10.1038/s42003-023-05325-9.

COSMOS: a platform for real-time morphology-based, label-free cell sorting using deep learning


Mahyar Salek et al. Commun Biol.

Erratum in

Abstract

Cells are the singular building blocks of life, and a comprehensive understanding of morphology, among other properties, is crucial to the assessment of underlying heterogeneity. We developed Computational Sorting and Mapping of Single Cells (COSMOS), a platform based on Artificial Intelligence (AI) and microfluidics to characterize and sort single cells based on real-time deep learning interpretation of high-resolution brightfield images. Supervised deep learning models were applied to characterize and sort cell lines and dissociated primary tissue based on high-dimensional embedding vectors of morphology, without the need for biomarker labels or stains/dyes. We demonstrate COSMOS capabilities with multiple human cell lines and tissue samples. These early results suggest that the embedding space of our neural networks can capture and recapitulate deep visual characteristics and can be used to efficiently purify unlabeled viable cells with desired morphological traits. Our approach resolves a technical gap in the ability to perform real-time deep learning assessment and sorting of cells based on high-resolution brightfield images.

Conflict of interest statement

All authors declare the following competing interests: current or former employment at Deepcell, Inc.

Figures

Fig. 1. COSMOS platform workflow and schematic.
(a) System diagram. The hardware includes Fluidics (Fluid Control and Valve Control Modules), the Optics and Imaging Module, and a Hardware Control Unit for auto-focusing and auto-alignment (Tracking and Automation Modules). The software includes Classifier, Controller, and Data Storage modules. (b) Data annotation workflow. AI-assisted image annotation software is used to cluster individual cell images. A human expert uses the labeling tool to adjust and batch-label the cell clusters. In the example shown, one acute myeloid leukemia (AML) cell was misclustered with a group of PBMCs, and an image showing debris was misclustered with a group of NSCLC cells. These errors are corrected by the “Expert Clean-up” step. The annotated cells are then integrated into the Deep Cell Atlas (DCA). (c) Model training and validation process. Annotated cell images are split into independent training and validation image sets. A schematic of AI image analysis with the InceptionV3 model architecture is shown. The fully connected layer of the architecture is used for cell clustering and UMAP visualization. The softmax layer generates per-cell classification and prediction probabilities. (d) Real-time AI-based sorting workflow. Images of single cells are converted to a vector, and a user-selected classifier assesses each cell. The embedding vector generated by the model is used to visualize the sample profile (e.g., the UMAP depiction is drawn based on the embeddings). Real-time inferences guide a sorting decision based on user preferences. See “Methods” for comprehensive details.
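The per-cell decision described in panels c and d (softmax probabilities feeding a sorting decision) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `sort_decision`, the example logits, and the 0.9 probability threshold are hypothetical, and the class names are borrowed from the Fig. 2 legend only for illustration.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


def sort_decision(
    logits: Sequence[float],
    class_names: Sequence[str],
    target_class: str,
    min_probability: float = 0.9,  # hypothetical user preference
) -> tuple[str, float, bool]:
    """Return (predicted class, its probability, whether to sort the cell)."""
    probabilities = softmax(logits)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    predicted = class_names[best]
    keep = predicted == target_class and probabilities[best] >= min_probability
    return predicted, probabilities[best], keep


# Hypothetical class list and logits for two imaged cells.
CLASSES = ["PBMC", "NSCLC", "HCC", "fNRBC"]
confident_cell = sort_decision([0.1, 5.0, 0.2, 0.0], CLASSES, "NSCLC")
ambiguous_cell = sort_decision([1.0, 1.2, 0.9, 0.8], CLASSES, "NSCLC")
print(confident_cell, ambiguous_cell)
```

Requiring both the predicted class and a minimum probability is one simple way to trade yield for purity; the actual decision rule used on the instrument is described in the paper's Methods, not here.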
Fig. 2. Quantitative morphological assessment of single cells and performance of Circulating Cell Classifier in identifying cells.
(a) Number of cell images used in the training set for each of the cell classes. (b) Representative images of NSCLC, HCC, PBMC, and fNRBC cells captured by COSMOS. (c) UMAP projection of cell embeddings sampled from cell classes analyzed by the model. Each point represents a single cell. (d) Heatmap representation of the embedding space. Each column is a single cell for HCC, NSCLC, PBMC, and fNRBC classes. Each row is an embedding dimension. A.U., arbitrary units. (e) Confusion matrix representing Circulating Cell Classifier prediction accuracy (x-axis) versus ground truth (y-axis) on the validation set. (f) Estimated precision-recall curves at different proportions for positive selection of NSCLCs, HCCs, and fNRBCs in a PBMC background. Precision corresponds to the estimated purity and recall to the yield of target cells, based on an in silico mixture of datasets of known cell types. Three curves are shown for different target cell proportions: 1:1000, 1:10,000, and 1:100,000. (g) Purity of pre-sorted and sorted cells estimated by comparing allele fractions with an SNP panel to the known genotypes at indicated spike-in ratios. (h) Frameshift mutation assay in the TP53 gene (c.572_572delC). (i) The indicated number of A549 cells were spiked into whole blood, and samples were processed on COSMOS for malignant cell identification and sorting. Sorted cell purity and fold enrichment were quantified by SNP analysis.
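To see why the curves in panel f depend on the target cell proportion, the following sketch computes expected purity (precision) from a classifier's recall and false positive rate at a 1:N target-to-background ratio. It is a standard prevalence reweighting, not necessarily the authors' in silico mixing procedure, and the recall and false positive rate values are hypothetical.

```python
def estimated_precision(
    recall: float,
    false_positive_rate: float,
    background_per_target: float,
) -> float:
    """Expected purity of the selected fraction at a 1:N target:background ratio.

    Per single target cell there are `background_per_target` background cells,
    so expected true positives are `recall` and expected false positives are
    `false_positive_rate * background_per_target`.
    """
    true_positives = recall
    false_positives = false_positive_rate * background_per_target
    selected = true_positives + false_positives
    if selected == 0:
        return 0.0  # nothing was selected, so purity is undefined; report 0
    return true_positives / selected


# Hypothetical operating point evaluated at the three ratios named in the caption.
for ratio in (1_000, 10_000, 100_000):
    purity = estimated_precision(
        recall=0.9, false_positive_rate=1e-5, background_per_target=ratio
    )
    print(f"1:{ratio} -> estimated purity {purity:.3f}")
```

At a fixed operating point, the same false positive rate contributes more contaminating cells as the background grows, which is why purity falls as the target becomes rarer.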
Fig. 3. Performance of COSMOS in identifying and isolating target cells.
(a) Confusion matrix representing Lung Cancer Classifier prediction accuracy (x-axis) vs the ground truth (y-axis) on the validation dataset. (b) Workflow schematic of COSMOS sorting and downstream molecular analysis of DTCs applied to (c–i). (c) Allele frequency of KRAS mutation (chr12:25245351C>A) and TP53 mutations (chr17:7673783C>A and chr17:7675208C>T) detected in four pre-sorted and sorted DTC aliquots following processing on two COSMOS instruments across two experimental runs. (d–f) WGA and CNV analysis of pre-sorted and sorted samples. Each data point represents a 1 Mb bin. Red and blue colors indicate different chromosomes. GM12878 genomic DNA was used as a baseline control for copy number normalization. (g) scRNA-Seq gene expression t-SNE plot with all 924 feature-selected genes for pre-sorted (dark blue) and post-sorted (light blue) cells shown as an overlay. (h) Pseudo-color EpCAM gene expression level. (i) Gene expression correlation plot of mean (log10(molecules per cell per gene)) for pre-sorted and sorted cells from the EpCAM+/PTPRC(CD45) cluster. Each data point is a gene. The gene expression correlation coefficient (R²) was 0.98.
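The kind of comparison reported in panel i can be sketched as a squared Pearson correlation on log10-transformed per-gene means. This is an assumption about how the reported R² is defined, not a statement of the authors' pipeline, and the expression values below are made up for illustration.

```python
import math
from typing import Sequence


def log10_r_squared(pre_sorted: Sequence[float], sorted_cells: Sequence[float]) -> float:
    """Squared Pearson correlation of log10 mean expression, one value per gene.

    Inputs are mean molecules per cell for each gene and must be positive,
    because log10 is undefined at zero (math.log10 raises ValueError).
    """
    if len(pre_sorted) != len(sorted_cells) or len(pre_sorted) < 2:
        raise ValueError("need two equally sized inputs with at least two genes")
    x = [math.log10(value) for value in pre_sorted]
    y = [math.log10(value) for value in sorted_cells]
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)


# Made-up per-gene means; a uniform two-fold shift leaves the correlation perfect.
pre = [1.0, 10.0, 100.0, 1000.0]
post = [2.0, 20.0, 200.0, 2000.0]
print(log10_r_squared(pre, post))
```

Because the statistic is computed on log-transformed means, a uniform scaling between the two samples (for example, a difference in capture efficiency) does not lower the correlation.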
