[Preprint]. 2024 Mar 22:2023.11.01.565120.
doi: 10.1101/2023.11.01.565120.

Quality Control for Single Cell Analysis of High-plex Tissue Profiles using CyLinter

Gregory J Baker et al. bioRxiv.

Abstract

Tumors are complex assemblies of cellular and acellular structures patterned on spatial scales from microns to centimeters. Study of these assemblies has advanced dramatically with the introduction of high-plex spatial profiling. Image-based profiling methods reveal the intensities and spatial distributions of 20-100 proteins at subcellular resolution in 10^3-10^7 cells per specimen. Despite extensive work on methods for extracting single-cell data from these images, all tissue images contain artefacts such as folds, debris, antibody aggregates, optical aberrations and image processing errors that arise from imperfections in specimen preparation, data acquisition, image assembly, and feature extraction. We show that these artefacts dramatically impact single-cell data analysis, obscuring meaningful biological interpretation. We describe an interactive quality control software tool, CyLinter, that identifies and removes data associated with imaging artefacts. CyLinter greatly improves single-cell analysis, especially for archival specimens sectioned many years prior to data collection, such as those from clinical trials.

Keywords: CyLinter; cancer; multiplex image analysis; quality control (QC); single-cell data; spatial omics; spatial profiling.

Conflict of interest statement

P.K.S. is a cofounder and member of the Board of Directors of Glencoe Software, a member of the Board of Directors for Applied Biomath and a member of the Scientific Advisory Board for RareCyte, NanoString and Montai Health; he holds equity in Glencoe and RareCyte. P.K.S. is a consultant for Merck. P.K.S. declares that none of these relationships have influenced the content of this manuscript. E.A.M. reports compensated service on Scientific Advisory Boards for AstraZeneca, BioNTech and Merck; uncompensated service on Steering Committees for Bristol Myers Squibb and Roche/Genentech; speakers’ honoraria and travel support from Merck Sharp & Dohme; and institutional research support from Roche/Genentech (via an SU2C grant) and Gilead. She also reports research funding from Susan Komen for the Cure for which she serves as a Scientific Advisor, and uncompensated participation as a member of the American Society of Clinical Oncology Board of Directors. J.L.G. serves or has previously served on advisory boards and/or as a scientific advisory board member for Array BioPharma/Pfizer, AstraZeneca, BD Biosciences, Carisma, Codagenix, Duke Street Bio, GlaxoSmithKline, Kowa, Kymera, OncoOne and Verseau Therapeutics, and has research grants from Array BioPharma/Pfizer, Duke Street Bio, Eli Lilly, GlaxoSmithKline and Merck. The other authors declare no competing interests.

Figures

Extended Data Fig. 1 |. Recurring artefacts in whole slide immunofluorescence images of tissue and their effects on tissue-derived single-cell data.
a, Left: Field of view from Dataset 1 (TOPACIO, specimen 110) showing a tissue fold (ROI, dashed white outline) as viewed in channels PDL1 (colormap) and Hoechst (gray). Right: UMAP embedding of 19-channel single-cell data from the image at left colored by PDL1 intensity (top left), cells contained within the ROI (bottom left), and HDBSCAN cluster (center right). Cells in cluster 5 (labeled) are those affected by the tissue fold and form a discrete cluster in UMAP space. b, Clustered heatmap showing channel z-scores for HDBSCAN clusters from panel (a) demonstrating that cluster 5 cells (those affected by the tissue fold) are artificially bright for all channels, presumably due to a combination of tissue overlap and insufficient antibody washing. c, Left: Field of view from Dataset 2 (CRC) showing two illumination aberrations (ROIs, dashed white outlines) as viewed in channels CD163 (colormap) and Hoechst (gray). Right: UMAP embedding of 21-channel single-cell data from the image at left colored by CD163 intensity (left) and whether the cells fall within one of the two ROIs (right). d, UMAP embedding of the 52-channel single-cell data shown in Fig. 1j (Dataset 7, large intestine, CODEX) after cells affected by the five illumination aberrations have been removed. Three groups of cells bright for CD3 remain (groups 1-3). Image galleries at right show 4 examples of each cell type in representative channels: group 1 = CD8+ T cells, group 2 = CD4+ T cells, group 3 = undefined cells immunoreactive to all 52 channels (not due to microscopy artefacts). e, Clustered heatmap showing channel z-scores for HDBSCAN clusters from panel (d) demonstrating that group 3 cells are bright for all 52 channels despite not being affected by microscopy artefacts.
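The channel z-score comparison described in panels (b) and (e) can be illustrated with a minimal sketch. It assumes z-scores are computed per channel across cluster means, which the caption does not spell out; the intensity values, the cutoff `Z_CUTOFF`, and all variable names are hypothetical and are not taken from CyLinter.

```python
from statistics import mean, pstdev

# Hypothetical single-cell records: (cluster label, {channel: intensity}).
cells = [
    (0, {"PDL1": 1.0, "CD3": 2.2, "CD8": 0.9}),
    (0, {"PDL1": 1.0, "CD3": 1.8, "CD8": 1.1}),
    (1, {"PDL1": 2.1, "CD3": 1.0, "CD8": 2.0}),
    (1, {"PDL1": 1.9, "CD3": 1.0, "CD8": 2.0}),
    (5, {"PDL1": 10.5, "CD3": 9.0, "CD8": 11.0}),
    (5, {"PDL1": 9.5, "CD3": 9.0, "CD8": 11.0}),
]
channels = ["PDL1", "CD3", "CD8"]
clusters = sorted({label for label, _ in cells})

# Mean intensity of each channel within each cluster.
cluster_means = {
    k: {ch: mean(v[ch] for label, v in cells if label == k) for ch in channels}
    for k in clusters
}

# Z-score each channel across clusters (population standard deviation).
z_scores = {k: {} for k in clusters}
for ch in channels:
    column = [cluster_means[k][ch] for k in clusters]
    mu, sigma = mean(column), pstdev(column)
    for k in clusters:
        z_scores[k][ch] = (cluster_means[k][ch] - mu) / sigma

# Clusters that are bright in every channel are candidates for artefacts.
Z_CUTOFF = 1.0  # hypothetical threshold, chosen for this toy example
bright_everywhere = [
    k for k in clusters if all(z_scores[k][ch] > Z_CUTOFF for ch in channels)
]
print(bright_everywhere)
```

In this toy table only cluster 5 is elevated in every channel, mirroring the pattern the caption attributes to cells inside a tissue fold. As panel (e) notes, uniform brightness alone does not prove an artefact, so such clusters still need visual review.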
Extended Data Fig. 2 |. Evaluation of pre-QC cell clustering results from Dataset 6 (large intestine, CODEX) and Dataset 2 (CRC, CyCIF).
a, UMAP embedding of Dataset 6 showing ~3.8×10^4 cells colored by HDBSCAN cluster (numbered 0-31). Black scatter points represent unclustered cells (10.5% of cells). b, Silhouette scores for CODEX clusters shown in panel (a). Cluster 29 exhibits cells with negative silhouette scores indicative of under-clustering. c, Clustered heatmap of clusters from Dataset 6 showing mean signal intensities of clustering cells normalized across clusters (row-wise). d, Correlated, non-specific signals in a region of Dataset 6 as seen in channels MUC6 (red), CD154 (green), and NKG2D (blue). Yellow dots highlight cluster 0 cells which have formed due to this artefact; Hoechst (gray) shown for reference. e, Tissue fold in a region of Dataset 6 as seen in channels GATA3 (red), CD68 (green), and CD66 (blue). Yellow dots highlight cluster 9 cells which have formed due to this artefact; Hoechst (gray) shown for reference. f, Image blur in a region of Dataset 6 as seen in channels HLADR (red), CD206 (green), and CD38 (blue). Yellow dots highlight cluster 13 cells which have formed due to this artefact; Hoechst (DNA, gray) shown for reference. g, Location of CRC cluster 3 cells shown in panel (g) revealing no regional bias in the distribution of cells. h, Top three most highly expressed markers (1: green, 2: red, 3: blue) for the 25 members of CRC cluster 3 (memory helper T) cells represented by the rugplots of Fig. 2n. White asterisks highlight cells shown in enlarged format in Fig. 2m. A single white pixel at the center of each image patch highlights the reference cell. Nuclear segmentation outlines (translucent white outlines) and Hoechst (gray) shown for reference. i, Regression plots showing correlation (two-sided, Pearson R, p < 0.05) among CD4, CD45, and CD45RO marker expression by 1.9×10^3 CRC cluster 3 cells. j, CRC cluster 3 cells shown in panel (h) after signal intensity cutoffs have been adjusted per image to improve the homogeneity of their appearance.
White asterisks highlight cells shown in enlarged format in Fig. 2o. k, CRC cluster 3 cells shown in panels (h) and (j) with channels shown separately for clarity: Hoechst (gray), CD4 (green), CD45 (red), CD45RO (blue). Top panels show cells before contrast adjustment (panel h), bottom panels show cells after contrast adjustment (panel j). l, Top three most highly expressed markers (1: green, 2: red, 3: blue) for 25 CRC cluster 7 (Treg) cells. A single white pixel at the center of each image patch highlights the reference cell. Nuclear segmentation outlines (translucent white outlines); Hoechst (gray) shown for reference. m, Regression plots showing strong correlation (two-sided, Pearson R, p < 0.05) among CD4, CD45, and CD45RO marker expression of 1.9×10^3 CRC cluster 7 cells. n, Regression plots showing weak correlation (two-sided, Pearson R, p < 0.05) between FOXP3 and CD4, CD45, and CD45RO marker expression of 1.9×10^3 CRC cluster 7 cells.
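Panel (b) reads negative silhouette scores as a sign of under-clustering. The sketch below shows how such a score arises, using the standard silhouette definition on one-dimensional toy values with absolute distance. It is not CyLinter's implementation; the data, labels, and function name are hypothetical, and at least two clusters with non-identical points are assumed.

```python
def silhouette_scores(points, labels):
    """Per-point silhouette coefficients for 1-D points (absolute distance)."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Mean distance to the other members of the point's own cluster.
        same = [abs(p - q) for j, (q, l) in enumerate(zip(points, labels))
                if l == own and j != i]
        if not same:
            scores.append(0.0)  # usual convention for single-member clusters
            continue
        a = sum(same) / len(same)
        # Smallest mean distance to the members of any other cluster.
        b = min(
            sum(abs(p - q) for q, l in zip(points, labels) if l == other)
            / labels.count(other)
            for other in set(labels) if other != own
        )
        scores.append((b - a) / max(a, b))
    return scores

# Hypothetical feature values: the third cell lies next to cluster "B"
# but has been grouped with cluster "A".
points = [0.0, 0.2, 5.0, 5.1, 5.3]
labels = ["A", "A", "A", "B", "B"]

scores = silhouette_scores(points, labels)
negative = [i for i, s in enumerate(scores) if s < 0]
print(negative)
```

The one cell grouped with a distant population is closer, on average, to the neighbouring cluster than to its own, so its score is negative. That is the situation the caption describes for cluster 29.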
Extended Data Fig. 3 |. Evaluation of pre-QC cell clustering results from Dataset 1 (TOPACIO).
a, Spatial distribution of unclustered (ambiguous) cells (green dots) from the pre-QC TOPACIO embedding shown in Fig. 3a as represented by specimen 55, which exhibits no discernable spatial pattern of sampling bias; Hoechst (gray) shown for reference. b, Stacked bar charts showing the relative contribution of each patient specimen to each cluster. c, TOPACIO specimen 55 at low (left) and high (right) magnification showing Hoechst signals for the first three imaging cycles: cycles 1 (green), 2 (red), and 3 (blue) have been superimposed to demonstrate a cross-cycle image alignment problem at the bottom of this specimen. Small white box at the bottom-right of the low magnification image shows the location of the higher magnification image. White dots in the high magnification image highlight TOPACIO cluster 15 cells which have formed due to this image alignment artefact.
Extended Data Fig. 4 |. Identifying and removing noisy single-cell data points with CyLinter.
CyLinter workflow (see project website for implementation details: https://labsyspharm.github.io/cylinter/modules/). a, Aggregate data (automated): raw spatial feature tables for all specimens in a batch are merged into a single Pandas (Python) dataframe. b, ROI selection (interactive or automated): multi-channel images are viewed to identify and gate on regions of tissue affected by microscopy artefacts (negative selection mode) or areas of tissue devoid of artefacts (positive selection mode). b1-b4, Demonstration of automated artefact detection in CyLinter: b1, CyLinter’s selectROIs module showing artefacts in the CDKN1A (green) channel of Dataset 3 (EMIT TMA, core 18, mesothelioma). b2, Transformed version of the original CDKN1A image such that artefacts appear as large, bright regions relative to channel intensity variations associated with true signal of immunoreactive cells which are suppressed. b3, Local intensity maxima are identified in the transformed image and a flood fill algorithm is used to create a pixel-level binary mask indicating regions of tissue affected by artefacts. In this example, the method identifies three artefacts in the image: one fluorescence aberration at the top of the core, and two tissue folds at the bottom of the core. b4, CyLinter’s selectROIs module showing the binary artefact mask (translucent gray shapes) and their corresponding local maxima (red dots) defining each of the three artefacts. c, DNA intensity filter (interactive): histogram sliders are used to define lower and upper bounds on nuclear counterstain signal intensity. Cells between cutoffs are visualized as scatter points at their spatial coordinates in the corresponding tissue for gate confirmation or refinement. d, Segmentation area filter (interactive): histogram sliders are used to define lower and upper bounds on cell segmentation area (pixel counts).
Cells between cutoffs are visualized as scatter points at their spatial coordinates in the corresponding tissue for gate confirmation or refinement. e, Cross-cycle correlation filter (interactive): applicable to multi-cycle experiments. Histogram sliders are used to define lower and upper bounds on the log-transformed ratio of DNA signals between the first and last imaging cycles. Cells between cutoffs are visualized as scatter points at their spatial coordinates in their corresponding tissues for gate confirmation or refinement. f, Log transformation (automated): single-cell data are log-transformed. g, Channel outliers filter (interactive): the distribution of cells according to antibody signal intensity is viewed for all specimens as a facet grid of scatter plots (or hexbin plots) against cell area (y-axes). Lower and upper percentile cutoffs are applied to remove outliers. Outliers are visualized as scatter points at their spatial coordinates in their corresponding tissues for gate confirmation or refinement. h, MetaQC (interactive): unsupervised clustering methods (UMAP or TSNE followed by HDBSCAN clustering) are used to correct for gating bias in prior data filtration modules by thresholding on the percent of each cluster composed of clean (maintained) or noisy (redacted) cells. i, Principal component analysis (PCA, automated): PCA is performed and Horn’s parallel analysis is used to determine the number of PCs associated with non-random variation in the dataset. j, Image contrast adjustment (interactive): channel contrast settings are optimized for visualization on reference tissues which are applied to all specimens in the cohort. k, Unsupervised clustering (interactive): UMAP (or TSNE) and HDBSCAN are used to identify unique cell states in a given cohort of tissues. Manual gating can also be performed to identify cell populations. 
l, Compute clustered heatmap (automated): clustered heatmap is generated showing channel z-scores for identified clusters (or gated populations). m, Compute frequency statistics (automated): pairwise t statistics on the frequency of each identified cluster or gated cell population between groups of tissues specified in CyLinter’s configuration file (cylinter_config.yml, e.g., treated vs. untreated, response vs. no response, etc.) are computed. n, Evaluate cluster membership (automated): cluster quality is checked by visualizing galleries of example cells drawn at random from each cluster identified in the clustering module (panel k).
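The filtering steps in panels (a), (c), (d) and (g) amount to merging per-specimen tables and keeping cells between lower and upper cutoffs. The following is a minimal standard-library sketch of that idea, not CyLinter's code. CyLinter itself works on a Pandas dataframe with interactively chosen gates; here the column names, cutoff values, and helper functions are hypothetical, and the log transformation of panel (f) is omitted for brevity.

```python
import math

def percentile(values, q):
    """Linearly interpolated percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def gate(cells, key, lower, upper):
    """Keep cells whose value for `key` lies within [lower, upper]."""
    return [c for c in cells if lower <= c[key] <= upper]

def prune_outliers(cells, key, lower_pct, upper_pct):
    """Keep cells between two percentile cutoffs of one channel."""
    values = [c[key] for c in cells]
    return gate(cells, key, percentile(values, lower_pct),
                percentile(values, upper_pct))

def make_table(specimen, rows):
    """Build a toy feature table from (DNA, area, CD3) tuples."""
    return [{"specimen": specimen, "dna": d, "area": a, "cd3": m}
            for d, a, m in rows]

# Hypothetical per-specimen feature tables.
specimen_a = make_table("A", [(50, 120, 1.0), (900, 130, 1.2),
                              (1000, 125, 40.0), (1100, 2000, 1.1)])
specimen_b = make_table("B", [(950, 110, 0.9), (1050, 140, 1.3),
                              (6000, 135, 1.0), (980, 20, 1.1),
                              (1020, 115, 1.4)])

cells = specimen_a + specimen_b               # a: aggregate data
cells = gate(cells, "dna", 500, 5000)         # c: DNA intensity filter
cells = gate(cells, "area", 50, 500)          # d: segmentation area filter
cells = prune_outliers(cells, "cd3", 0, 90)   # g: channel outliers filter
print(len(cells))
```

Of the nine toy cells, the two with dim or saturated nuclei, the two with implausible segmentation areas, and the single CD3 outlier are removed in turn, leaving four.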
Extended Data Fig. 5 |. Over-segmentation in Dataset 2 (CRC, CyCIF) and Cleaning of Dataset 6 (large intestine, CODEX) with CyLinter.
a, Gating of cells in the CRC image (Dataset 2) according to nuclear segmentation area shows that this image contains several over-segmented nuclei (i.e., nuclei split into multiple segmentation objects). b, Fraction of cells in Dataset 6 (large intestine, CODEX, specimen 1) redacted by each QC filter in the CyLinter pipeline. Dropped ROIs = cells dropped by selectROIs module, Dim/over-saturated nuclei = cells dropped by dnaIntensity module, Segmentation errors = cells dropped by areaFilter module, Unstable cells = cells dropped by cycleCorrelation module, Channel outliers = cells dropped by pruneOutliers module, Artefact-free = cells remaining after QC. c, UMAP embedding of post-QC CODEX clusters showing ~3.1×10^4 cells colored by HDBSCAN cluster. Black scatter points represent unclustered cells (10.1% of cells). d, Silhouette scores for post-QC CODEX clusters shown in panel (c). e, Post-QC CODEX clustered heatmap showing mean signal intensities of clustering cells normalized across clusters (row-wise). Five (5) meta-clusters defined by the clustered heatmap dendrogram at the left are highlighted. f-h, Top three most highly expressed markers (1: green, 2: red, 3: blue) for clusters 0 (lymphatic endothelial cells, f), 15 (mast cells, g), and 17 (M2 macrophages, h). A single white pixel at the center of each image highlights the reference cell. Nuclear segmentation outlines (translucent outlines) and Hoechst (gray) shown for reference.
Extended Data Fig. 6 |. Location of cells redacted by CyLinter in Dataset 2 (CRC) and Dataset 1 (TOPACIO) and Post-QC TOPACIO UMAP embedding colored by patient ID.
a, Cells redacted by CyLinter from Dataset 2 (CRC) demonstrating no discernable bias in the removal of cells from the image with the exception of those areas highlighted by the white arrows which were affected by focal artefacts and removed using CyLinter’s selectROIs module. b, Cells redacted by CyLinter from three arbitrary specimens from Dataset 1 (TOPACIO) demonstrating no discernable bias in the removal of cells from the images with the exception of those areas highlighted by the white arrows which were affected by focal artefacts and removed using CyLinter’s selectROIs module. c, UMAP embedding of post-QC TOPACIO data shown in Fig. 6b colored by specimen ID demonstrating patient-specific clustering in tumor cell populations, but not immune and stromal populations (refer to Fig. 6b,d,e–i and Online Supplementary Fig. 8 for cluster phenotype identities).
Fig. 1 |. Recurring artefacts in whole slide immunofluorescence images of tissue and their effects on tissue-derived single-cell data.
a, Top: Field of view from Dataset 6 (large intestine, CODEX, specimen 1) with a tissue fold (ROI, dashed white outline) as viewed in channels SOX9 (colormap) and Hoechst (gray). Bottom: UMAP embedding of 57-channel single-cell data from the image above colored by SOX9 intensity (top left), cells contained within the ROI (top right), and HDBSCAN cluster (bottom center). Cluster 1 cells (labeled) are those affected by the tissue fold and form a discrete cluster in UMAP space. b, Clustered heatmap showing channel z-scores for HDBSCAN clusters from panel (a) demonstrating that cluster 1 cells (those affected by the tissue fold) are artificially bright for all channels, presumably due to a combination of tissue overlap and insufficient antibody washing. c, Left: Antibody aggregate in the CD63 channel (colormap) of Dataset 3 (EMIT TMA, core 68, normal tonsil). Hoechst (gray), Ki67 (red), CD32 (green), αSMA (orange), and panCK (blue) are shown for context. Right: UMAP embedding of 20-channel single-cell data from the image shown at left colored by CD63 intensity (top) and whether cells fall within the ROI (bottom). d, Autofluorescent fiber in Dataset 1 (TOPACIO, specimen 128) as seen in channels 53BP1 (green) and Hoechst (gray). e, Necrosis in a region of tissue from Dataset 1 (TOPACIO, specimen 39) as seen in the CD3 channel (green). f, Coverslip air bubbles (green asterisks) in Dataset 1 (TOPACIO, specimen 48) as seen in the Hoechst channel (gray). g, Out-of-focus region of tissue in Dataset 1 (TOPACIO, specimen 55) as seen in the Hoechst channel (gray). h, Uneven tile illumination in Dataset 4 (HNSCC, CODEX, section 1) as seen in an empty Cy5 channel (green); Hoechst (gray) shown for tissue context. The standard deviation among per-tile median signal intensities was 19.9 arbitrary fluorescence units (AFU), 27.6% of the range (134-206 AFU).
i, Bottom: Illumination aberration in the pCREB channel (colormap) of Dataset 3 (EMIT TMA, core 95, dedifferentiated liposarcoma) with nuclear segmentation outlines (translucent contours) shown for reference. Top: Line plot demonstrating that artificial pCREB signals of single cells affected by the aberration reach an order of magnitude above background. j, Top: Field of view from Dataset 7 (large intestine, CODEX, specimen 2) showing five illumination aberrations (ROIs, dashed white outlines) as viewed in channels CD3 (colormap) and Hoechst (gray). Bottom: UMAP embedding of 52-channel single-cell data from the image above colored by CD3 intensity (left) and whether the cells fall within one of the five different ROIs (right). k, Tile stitching errors in Dataset 5 (mIHC, normal human tonsil) as seen in the PD1 (green) channel. l, Cross-cycle image registration error in Dataset 3 (EMIT TMA, core 64, leiomyosarcoma) as demonstrated by the superimposition of cycle 1 Hoechst signal (gray) and cycle 9 pCREB signal (green). m, Cross-cycle tissue movement in Dataset 1 (TOPACIO, specimen 80) as demonstrated by the superimposition of Hoechst signals from three different imaging cycles: 1 (red), 2 (green), 3 (blue). n, Progressive tissue loss in Dataset 3 (EMIT TMA, core 1, normal kidney cortex) across 10 imaging cycles as observed in the Hoechst channel (gray) where overt tissue loss can be seen by cycle 8. o, UMAP embedding of cells from Dataset 3 (EMIT TMA, core 1, normal kidney cortex) colored by whether cells remained stable (gray data points) or became detached (blue data points) over the course of imaging demonstrating that unstable cells form discrete clusters in UMAP space.
Fig. 2 |. Evaluation of pre-QC cell clustering results from Dataset 2 (CRC).
a, UMAP embedding of CRC data showing ~9.8×10^5 cells colored by HDBSCAN cluster (numbered 0-21). Black scatter points represent unclustered (ambiguous) cells. b, Silhouette scores for CRC clusters shown in panel (a). Clusters 6, 15, 17, and 21 exhibit cells with negative silhouette scores indicative of under-clustering. c, Clustered heatmap for CRC data showing mean signals of clustering cells normalized across clusters (row-wise). Four (4) meta-clusters defined by the heatmap dendrogram are highlighted. d, Cluster 6 cells (yellow dots) in a region of the CRC image demonstrating the co-clustering of distinct populations of B cells (CD20, blue), memory T cells (CD45RO, red), and stromal cells (desmin, green); Hoechst (gray) shown for reference. e, Anti-desmin antibody aggregates (red) in a region of the CRC image. Yellow dots highlight cluster 9 cells which have formed due to this artefact; Hoechst (gray) shown for reference. f, Anti-vimentin antibody aggregates (red) in a region of the CRC image. Yellow dots highlight cluster 11 cells that have formed due to this artefact; Hoechst (gray) shown for reference. g, Autofluorescent fiber in a region of the CRC image as seen in channels PD1 (magenta) and PD-L1 (green). Yellow dots highlight cluster 9 cells which have formed due to this artefact; Hoechst (gray) shown for reference. h, Cell loss in a region of the CRC image as indicated by anucleate segmentation outlines (green). Yellow dots highlight cluster 14 cells which have formed due to this artefact; Hoechst (gray) shown for reference. i, Contaminating (non-colonic) tissue at the top of the CRC image immunoreactive to anti-vimentin antibodies (cyan) comprising CRC cluster 10 (yellow dots); Hoechst (gray) shown for reference. j, Region of tissue at the bottom-left of the CRC image unexposed to antibodies during imaging cycle 3 which led to the formation of CRC clusters 2, 8, and 19; channels CD3 (colormap) and Hoechst (gray) shown for reference.
k-m, Top three most highly expressed markers (1: green, 2: red, 3: blue) for clusters 0 (keratinocytes, k), 1 (crypt-forming mucosal epithelial cells, l), and 3 (memory helper T cells, m). A single white pixel at the center of each image patch highlights the reference cell. Nuclear segmentation outlines (translucent white outlines) and Hoechst (gray) shown for reference. n, Density histograms showing the distribution of cluster 3 cells according to channels CD4 (green outline), CD45 (red outline), and CD45RO (blue outline) superimposed on distributions of total cells according to the same channels (gray outlines). Rugplots at the bottom of each histogram show where 25 members of cluster 3 shown in panel (m) and Extended Data Fig. 2h reside in each distribution. o, Cluster 3 cells shown in panel (m) and Extended Data Fig. 2h after signal intensity cutoffs have been adjusted per image to improve the homogeneity of their appearance.
Fig. 3 |. Evaluation of pre-QC cell clustering results from Dataset 1 (TOPACIO).
a, UMAP embedding of ~3×10^6 cells from the TOPACIO dataset colored by HDBSCAN cluster. Black scatter points represent unclustered (ambiguous) cells. b, Silhouette scores for TOPACIO clusters shown in panel (a). c, Line plot showing cell counts per TOPACIO cluster. Clusters with cell counts below the horizontal dashed red line are those with fewer than 3K cells which are highlighted in the TOPACIO embedding (inset) by red scatter points at their relative positions. d, Clustered heatmap of clusters from TOPACIO data showing mean signal intensities of clustering cells normalized across clusters (row-wise). Six (6) meta-clusters defined by the heatmap dendrogram at the left are highlighted. e, TOPACIO embedding colored by meta-clusters shown in panel (d). f-h, Top three most highly expressed markers (1: green, 2: red, 3: blue) for TOPACIO clusters 4 (f), 174 (g), and 197 (h) which were all severely affected by dataset noise. A single white pixel at the center of each image highlights the reference cell. Nuclear segmentation outlines (translucent white outlines) and Hoechst (gray) are shown for reference. i, Bar chart showing the average percentage of image tiles affected by a visual artefact across the 25 TOPACIO specimens; marker identities at left denote the affected channel. j, Stacked bar chart showing the cumulative percentage of channel-specific image tiles per TOPACIO specimen affected by miscellaneous visual artefacts. Because these artefacts can impact multiple channels at the same time, cumulative percentages can be higher than 100%. Inset shows an example illumination aberration in the CD163 channel of TOPACIO specimen 73. Categories for tissue biopsy method and patient treatment response are indicated below each bar. Artefacts were found to be less abundant in tissue resections as compared to fine-needle and punch-needle biopsies as determined by one-way ANOVA followed by pairwise Tukey’s HSD (F = 10.27, p = 0.0007; fine-needle vs. resection mean difference = 204.83, p-adj = 0.0145; resection vs. punch-needle mean difference = −283.0, p-adj = 0.0029).
Fig. 4 |. Identifying and removing noisy single-cell data points with CyLinter.
a, Schematic representation of the CyLinter workflow. Modules are colored by type: data filtration (red), metaQC (green), cell clustering/visualization (blue). b-e, CyLinter input: b, Multiplex image file, c, Cell ID mask, d, Cell segmentation outlines, e, Single-cell feature table. f, Negative ROI selection in CyLinter. Dataset 2 (CRC) is shown with ROIs (yellow outlines) applied to various artefacts in the CD163 channel which will be dropped from subsequent analysis. g, Positive ROI selection in CyLinter. Dataset 1 (TOPACIO, specimen 152) is shown with ROIs (yellow outlines) applied to regions devoid of artefacts in the FOXP3 channel which will be retained for further analysis. h, Filtering dim nuclei. Top: Density histogram of mean Hoechst signal for cells in Dataset 3 (EMIT TMA, core 12, non-neoplastic lung). Bottom: Hoechst (colormap) in a region of the same core demonstrating dim nuclei (green dots) falling to the left of the red gate in the corresponding histogram. Nuclear segmentation outlines are shown for reference (translucent outlines). i, Filtering bright nuclei. Top: Density histogram of mean Hoechst signal for Dataset 1 (TOPACIO, specimen 110). Bottom: Hoechst (colormap) in a region of the same specimen demonstrating bright nuclei (green dots) caused by tissue bunching that fall to the right of the gate in the corresponding histogram. Nuclear segmentation outlines are shown for reference (translucent outlines). j, Filtering over-segmented cells. Top: Density histogram of mean Hoechst signal for Dataset 2 (CRC). Bottom: Hoechst (colormap) in a region of the specimen demonstrating over-segmented cells (green dots) falling to the left of the red gate in the corresponding histogram. Nuclear segmentation outlines are shown for reference (translucent outlines). k, Filtering under-segmented cells. Top: Density histogram of mean Hoechst signal for Dataset 3 (EMIT TMA, core 84, non-neoplastic colon).
Bottom: Hoechst (colormap) in a region of the specimen demonstrating under-segmented cells (green dots) falling to the right of the red gate in the corresponding histogram. Nuclear segmentation outlines are shown for reference (translucent outlines). l, Filtering unstable cells. Top: Density histogram of the log(ratio) between Hoechst signals from the first and last CyCIF imaging cycles for Dataset 3 (EMIT TMA, core 74, renal cell carcinoma). Bottom: Hoechst (last cycle, colormap) superimposed on Hoechst (first cycle, gray) in a region of the specimen demonstrating the selection of stable cells (green dots) falling to the left of the red gate in the corresponding histogram. Nuclear segmentation outlines are shown for reference (translucent outlines). Note: unlike panels (h-k) which highlight cells that will be excluded from an analysis, cells highlighted in this panel will be retained for further analysis. m, Filtering channel outliers. Top: Scatter plot showing CD3 (x-axis) vs. nuclear segmentation area (y-axis) of cells from Dataset 1 (TOPACIO, specimen 152) before (left) and after (right) outlier removal and signal rescaling (0-1). Bottom: CD3 (colormap) and Hoechst (gray) signals in a region of the same specimen with CD3+ cells (green dots) falling to the right of the red gate in the scatter plot in which outliers have been removed. Nuclear segmentation outlines are shown for reference (translucent outlines).
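The unstable-cell filter in panel (l) gates on the log(ratio) of Hoechst signals between the first and last imaging cycles. A minimal sketch of that calculation follows. It assumes the ratio is taken as first cycle over last cycle and uses a base-10 logarithm; neither is stated in the caption. The intensities and the `LOWER_CUTOFF` and `UPPER_CUTOFF` values are hypothetical.

```python
import math

def log_dna_ratio(first_cycle, last_cycle):
    """log10 of first-cycle over last-cycle nuclear signal (both must be > 0)."""
    return math.log10(first_cycle / last_cycle)

# Hypothetical (first cycle, last cycle) mean Hoechst intensities per cell.
hoechst = [(1000.0, 950.0), (1200.0, 1100.0), (1000.0, 10.0), (800.0, 820.0)]

# Hypothetical gate: cells whose log ratio stays near zero kept their
# nuclear signal across cycles and are treated as stable.
LOWER_CUTOFF, UPPER_CUTOFF = -0.5, 0.5

ratios = [log_dna_ratio(first, last) for first, last in hoechst]
stable = [i for i, r in enumerate(ratios) if LOWER_CUTOFF <= r <= UPPER_CUTOFF]
print(stable)
```

Under this assumed ratio direction, a cell that detaches before the last cycle loses most of its Hoechst signal and lands far to the right of the gate. This is consistent with the caption's note that the retained (stable) cells fall to the left.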
Fig. 5 |. Cleaning Dataset 2 (CRC) with CyLinter.
a, Fraction of cells in Dataset 2 redacted by each QC filter in the CyLinter pipeline. Dropped ROIs = cells dropped by selectROIs module, Dim/over-saturated nuclei = cells dropped by dnaIntensity module, Segmentation errors = cells dropped by areaFilter module, Unstable cells = cells dropped by cycleCorrelation module, Channel outliers = cells dropped by pruneOutliers module, Artefact-free = cells remaining after QC. b, UMAP embedding of post-QC CRC data showing ~9.3×10^5 cells colored by HDBSCAN cluster. Black scatter points represent unclustered (ambiguous) cells. c, Silhouette scores for post-QC CRC clusters shown in panel (b). d, Clustered heatmap of post-QC CRC clusters showing mean signal intensities of clustered cells normalized across clusters (row-wise). Six (6) meta-clusters defined by the clustered heatmap dendrogram at the left are highlighted. e-g, Top three most highly expressed markers (1: green, 2: red, 3: blue) for post-QC CRC clusters 42 (B cells, e), 52 (CD8+ T cells near blood vessels, formed as a side effect of spatial crosstalk, f), and 74 (vascular endothelial cells, g). A single white pixel at the center of each image highlights the reference cell. Nuclear segmentation outlines (translucent outlines) and Hoechst (gray) shown for reference. h, Overlap between pre-QC CRC clusters (rows) and post-QC CRC clusters (columns) showing pre- and post-QC clusters have a one-to-many correspondence. i, Pre-QC CRC embedding showing the position of cluster 6 (red, inset) and its composition according to post-QC CRC clusters. j, Locations of cells in pre-QC cluster 6 colored by their post-QC cluster label showing that pre-QC cluster 6 is composed of cells occupying distinct regions throughout the muscularis propria of the CRC image, a non-cancerous, smooth muscle-rich region of tissue. k, Mean signal intensities for post-QC CRC cluster 13 cells. Black arrows point to bright channels consistent with both epithelial cells and CD8+ T cells.
l, Post-QC CRC cluster 13 cells (white dots) shown in context of the CRC image demonstrating spatial crosstalk between keratin+ tumor cells (blue) and CD8+ T cells (orange). Nuclear segmentation outlines (translucent outlines) shown for reference.
Fig. 6 |. Cleaning Dataset 1 (TOPACIO) with CyLinter.
a, Fraction of cells in the TOPACIO dataset redacted by each QC filter in the CyLinter pipeline. Dropped ROIs = cells dropped by selectROIs module, Dim/over-saturated nuclei = cells dropped by dnaIntensity module, Segmentation errors = cells dropped by areaFilter module, Unstable cells = cells dropped by cycleCorrelation module, Channel outliers = cells dropped by pruneOutliers module, Artefact-free = cells remaining after QC. b, UMAP embedding of TOPACIO data showing ~3.0×10^6 cells colored by HDBSCAN cluster. Black scatter points represent unclustered (ambiguous) cells. c, Silhouette scores for post-QC TOPACIO clusters shown in panel (b). Cluster 42 is an under-clustered population. d, Clustered heatmap for clusters from post-QC TOPACIO data showing mean signal intensities of clustered cells normalized across clusters (row-wise). Four (4) meta-clusters defined by the clustered heatmap dendrogram at the left are highlighted. e-i, Top three most highly expressed markers (1: green, 2: red, 3: blue) for clusters 0 (Tregs: phenotype 1, e), 17 (Tregs: phenotype 2, f), 21 (breast cancer cells with DNA damage, g), 35 (CD4+ T cells near breast cancer cells, h), and 42 (breast cancer cells without DNA damage, i). A single white pixel at the center of each image highlights the reference cell. Nuclear segmentation outlines (translucent outlines) and Hoechst (gray) shown for reference. j, Left: Pre-QC TOPACIO UMAP embedding (also shown in Fig. 3a) with the location of five clusters selected and highlighted at random. Right: Location of the cells from the four pre-QC clusters shown in the embedding at left in the context of the post-QC TOPACIO UMAP embedding (also shown in panel b) demonstrating that these pre-QC clusters actually consisted of different cell types. Image patches of cells representing post-QC clusters are shown at far right.
