Nat Methods. 2024 Dec;21(12):2248-2259.
doi: 10.1038/s41592-024-02328-0. Epub 2024 Oct 30.

Quality control for single-cell analysis of high-plex tissue profiles using CyLinter

Gregory J Baker et al. Nat Methods. 2024 Dec.

Abstract

Tumors are complex assemblies of cellular and acellular structures patterned on spatial scales from microns to centimeters. Study of these assemblies has advanced dramatically with the introduction of high-plex spatial profiling. Image-based profiling methods reveal the intensities and spatial distributions of 20–100 proteins at subcellular resolution in 10³–10⁷ cells per specimen. Despite extensive work on methods for extracting single-cell data from these images, all tissue images contain artifacts such as folds, debris, antibody aggregates, optical aberrations and image processing errors that arise from imperfections in specimen preparation, data acquisition, image assembly and feature extraction. Here we show that these artifacts dramatically impact single-cell data analysis, obscuring meaningful biological interpretation. We describe an interactive quality control software tool, CyLinter, that identifies and removes data associated with imaging artifacts. CyLinter greatly improves single-cell analysis, especially for archival specimens sectioned many years before data collection, such as those from clinical trials.

Conflict of interest statement

Competing interests: P.K.S. is a cofounder and member of the Board of Directors of Glencoe Software, a member of the Board of Directors for Applied Biomath and a member of the Scientific Advisory Board for RareCyte, NanoString and Montai Health; he holds equity in Glencoe and RareCyte. P.K.S. is a consultant for Merck and declares that none of these relationships has influenced the content of this manuscript. E.A.M. reports compensated service on Scientific Advisory Boards for AstraZeneca, BioNTech and Merck; uncompensated service on Steering Committees for Bristol Myers Squibb and Roche/Genentech; speakers’ honoraria and travel support from Merck Sharp & Dohme; and institutional research support from Roche/Genentech (via an SU2C grant) and Gilead. She also reports research funding from Susan Komen for the Cure for which she serves as a Scientific Advisor, and uncompensated participation as a member of the American Society of Clinical Oncology Board of Directors. J.L.G. serves or has previously served on advisory boards and/or as a scientific advisory board member for Array BioPharma/Pfizer, AstraZeneca, BD Biosciences, Carisma, Codagenix, Duke Street Bio, GlaxoSmithKline, Kowa, Kymera, OncoOne and Verseau Therapeutics and has research grants from Array BioPharma/Pfizer, Duke Street Bio, Eli Lilly, GlaxoSmithKline and Merck. The other authors declare no competing interests.

Figures

Fig. 1
Fig. 1. Recurring artifacts in whole-slide immunofluorescence images of tissue and their effects on tissue-derived single-cell data.
a, Top: Dataset 6 (large intestine, CODEX, specimen 1) containing a tissue fold (ROI, dashed white outline) as seen in channels SOX9 (colormap) and Hoechst (gray). Bottom: UMAP embedding of 57-channel single-cell data from the image above colored by SOX9 intensity (top left), ROI inclusion (top right) and HDBSCAN cluster (bottom center). Cluster 1 cells are those affected by the fold. b, Channel z scores for HDBSCAN clusters in a demonstrating that cluster 1 cells are artificially bright for all markers. c, Left: antibody aggregate in the CD63 channel (colormap) of Dataset 3 (EMIT TMA, core 68, normal tonsil). Other channels shown for context. Right: UMAP embedding of 20-channel single-cell data from the image shown at left colored by CD63 intensity (top) and ROI inclusion (bottom). d, Autofluorescent fiber in Dataset 1 (TOPACIO, specimen 128) as seen in channels 53BP1 (green) and Hoechst (gray). e, Necrosis in a region of tissue from Dataset 1 (TOPACIO, specimen 39) as seen in the CD3 channel (green). f, Coverslip air bubbles (green asterisks) in Dataset 1 (TOPACIO, specimen 48) as seen in the Hoechst channel (gray). g, Out-of-focus region of tissue in Dataset 1 (TOPACIO, specimen 55) as seen in the Hoechst channel (gray). h, Uneven tile illumination in Dataset 4 (HNSCC, CODEX, section 1) as seen in an empty Cy5 channel (green). AFU, arbitrary fluorescence units; s.d., standard deviation. i, Bottom: illumination aberration in the pCREB channel (colormap) of Dataset 3 (EMIT TMA, core 95, dedifferentiated liposarcoma) with superimposed nuclear segmentation outlines (translucent contours). Top: line plot demonstrating that artifactual per cell pCREB signals reach an order of magnitude above background. j, Top: field of view from Dataset 7 (large intestine, CODEX, specimen 2) showing five illumination aberrations (ROIs, dashed white outlines) in the CD3 channel (colormap). 
Bottom: UMAP embedding of 52-channel single-cell data from the image above colored by CD3 intensity (left) and ROI inclusion (right). k, Tile stitching errors in Dataset 5 (mIHC, normal human tonsil) as seen in the PD1 (green) channel. l, Cross-cycle image registration error in Dataset 3 (EMIT TMA, core 64, leiomyosarcoma) as demonstrated by the superimposition of cycle 1 Hoechst (gray) and cycle 9 pCREB (green) signals. m, Cross-cycle tissue movement in Dataset 1 (TOPACIO, specimen 80) as demonstrated by the superimposition of Hoechst signals from sequential imaging cycles: 1 (red), 2 (green) and 3 (blue). n, Progressive tissue loss in Dataset 3 (EMIT TMA, core 1, normal kidney cortex) across ten imaging cycles as observed in the Hoechst channel (gray). o, UMAP embedding of cells from Dataset 3 (EMIT TMA, core 1, normal kidney cortex) colored by stability.
Fig. 2
Fig. 2. Evaluation of pre-QC cell clustering results from Dataset 2 (CRC).
a, UMAP embedding of CRC data showing ~9.8 × 10⁵ cells colored by HDBSCAN cluster. Black scatter points represent unclustered (ambiguous) cells. b, Silhouette scores for CRC clusters shown in a. c, Mean signals of clustering cells in the CRC dataset normalized across clusters (row-wise). Four (4) meta-clusters defined by the heatmap dendrogram are highlighted. d, Cluster 6 cells (yellow points) in a region of the CRC image demonstrating the co-clustering of B cells (CD20, blue), memory T cells (CD45RO, red) and stromal cells (desmin, green). e, Anti-desmin antibody aggregates (red) in the CRC image. Yellow points highlight cluster 9 cells formed due to this artifact. f, Anti-vimentin antibody aggregates (red) in the CRC image. Yellow points highlight cluster 11 cells formed due to this artifact. g, Autofluorescent fiber in the CRC image as seen in channels PD1 (magenta) and PDL1 (green). Yellow points highlight cluster 12 cells formed due to this artifact. h, Cell loss in the CRC image as indicated by anucleate segmentation outlines (green). Yellow points highlight cluster 14 cells formed due to this artifact. i, Contaminating (noncolonic) tissue in the CRC image immunoreactive to anti-vimentin antibodies (cyan) comprising CRC cluster 10 (yellow points). j, Region of tissue in the CRC image unexposed to antibodies during imaging cycle 3 leading to the formation of CRC clusters 2, 8 and 19 as observed in the CD3 channel (colormap). k–m, Top three most highly expressed markers (1, green; 2, red; 3, blue) for clusters 0 (keratinocytes, k), 1 (crypt-forming mucosal epithelial cells, l), and 3 (memory helper T cells, m). A single white pixel at the center of each image patch highlights the reference cell. Nuclear segmentation outlines (translucent white outlines) and Hoechst (gray) shown for reference. 
n, Density histograms showing the distribution of cluster 3 cells according to channels CD4 (green outline), CD45 (red outline) and CD45RO (blue outline) superimposed on distributions of total cells according to the same channels (gray outlines). Rugplots at the bottom of each histogram show where 25 cluster 3 cells (shown in Extended Data Fig. 2h) reside in each distribution. o, Cluster 3 cells shown in m after per channel and per image adjustment of signal intensity cutoffs to improve their homogeneity of appearance.
Fig. 3
Fig. 3. Evaluation of pre-QC cell clustering results from Dataset 1 (TOPACIO).
a, UMAP embedding of ~3 × 10⁶ cells drawn randomly from the ~1.9 × 10⁷ total segmented nuclei to reduce computing time colored by HDBSCAN cluster. Black scatter points represent unclustered (ambiguous) cells. b, Silhouette scores for TOPACIO clusters shown in a. c, Line plot showing cell counts per TOPACIO cluster. Clusters with cell counts below the horizontal dashed red line are those with fewer than 3,000 cells highlighted in the TOPACIO embedding (inset) by red scatter points at their relative positions. d, Mean signal intensities of clustering cells in the pre-QC TOPACIO dataset normalized across clusters (row-wise). Six meta-clusters defined by the heatmap dendrogram at the left are highlighted. e, TOPACIO embedding colored by meta-clusters shown in d. f–h, Top three most highly expressed markers (1, green; 2, red; 3, blue) for clusters 4 (f), 174 (g) and 197 (h), which were all severely affected by dataset noise. A single white pixel at the center of each image highlights the reference cell. Nuclear segmentation outlines (translucent white outlines) and Hoechst (gray) are shown for reference. i, The average percentage of image tiles affected by visual artifacts in each channel among the 25 TOPACIO specimens. j, Stacked bar chart showing the cumulative percentage of channel-specific image tiles affected by visual artifacts per TOPACIO specimen. Note that, because these data represent cumulative percentages across imaging channels, the total y-axis percentage may exceed 100%. Inset shows an example illumination aberration in the CD163 channel of TOPACIO specimen 73. Categories for tissue biopsy method and patient treatment response are indicated below each specimen. 
Artifacts were less abundant in tissue resections compared to fine-needle and punch-needle biopsies as determined by one-way ANOVA followed by pairwise Tukey’s HSD (F = 10.27, P = 0.0007; fine-needle versus resection mean difference 204.83, Padj = 0.0145; resection versus punch-needle mean difference −283.0, Padj = 0.0029).
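The note in panel j about totals above 100% can be illustrated with a small arithmetic sketch. The channel names are taken from this caption, but the percentages are hypothetical values chosen for illustration only; they are not data from the study.

```python
# Hypothetical percentages of image tiles affected by artifacts in each
# channel of a single specimen (illustrative values, not study data).
tiles_affected_pct = {"CD163": 45.0, "FOXP3": 40.0, "PDL1": 30.0}

# Each channel-specific percentage is bounded by 100 ...
largest_single_channel = max(tiles_affected_pct.values())

# ... but stacking the channels sums them, so the bar height can pass 100.
cumulative_pct = sum(tiles_affected_pct.values())

print(f"largest single channel: {largest_single_channel}%")
print(f"cumulative across channels: {cumulative_pct}%")
```

Because the same tile can be affected in more than one channel, the stacked total measures artifact burden across channels rather than the share of tiles affected.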
Fig. 4
Fig. 4. Identifying and removing noisy single-cell data points with CyLinter.
a, Schematic representation of the CyLinter workflow with modules colored by type; viz, visualization. b–e, CyLinter input files. f, Demonstration of negative ROI selection in CyLinter. Dataset 2 (CRC) is shown with ROIs (yellow outlines) applied to various artifacts in the CD163 channel to be dropped from subsequent analysis. g, Demonstration of positive ROI selection in CyLinter. Dataset 1 (TOPACIO, specimen 152) is shown with ROIs (yellow outlines) applied to regions devoid of artifacts in the FOXP3 channel to be retained for further analysis. h–l, Data filtration techniques implemented by CyLinter for the removal of dim nuclei (h, as demonstrated in EMIT TMA, core 12, nonneoplastic lung), bright nuclei (i, as demonstrated in TOPACIO, specimen 110), oversegmented nuclei (j, as demonstrated in the CRC image), undersegmented nuclei (k, as demonstrated in EMIT TMA, core 84, nonneoplastic colon) and unstable nuclei (l, as demonstrated in EMIT TMA, core 74, renal cell carcinoma). The top plots show density histograms of mean Hoechst signal for cells in the given tissue. The bottom images show Hoechst signal (colormap) in a region of the same tissue with cells falling within the green region in the above histogram highlighted by green points. Nuclear segmentation outlines are shown for reference in all cases (translucent outlines). Note that, unlike h–k, which highlight cells to be excluded from analysis, cells highlighted in l are to be retained for further analysis. m, Filtering channel outliers. Top: scatter plot showing CD3 (x axis) versus nuclear segmentation area (y axis) of cells from TOPACIO specimen 152 before (left) and after (right) outlier removal. Bottom: CD3 (colormap) and Hoechst (gray) signals in a region of the same specimen with CD3+ cells (green points) falling to the right of the red gate in the scatter plot in which outliers have been removed. Nuclear segmentation outlines (translucent outlines) shown for reference.
Fig. 5
Fig. 5. Cleaning Dataset 2 (CRC) with CyLinter.
a, Fraction of cells in Dataset 2 redacted by each QC filter in the CyLinter pipeline. Dropped ROIs, cells dropped by selectROIs module; dim/oversaturated nuclei, cells dropped by dnaIntensity module; segmentation errors, cells dropped by areaFilter module; unstable cells, cells dropped by cycleCorrelation module; channel outliers, cells dropped by pruneOutliers module; artifact-free, cells remaining after QC. b, UMAP embedding of post-QC CRC data showing ~9.3 × 10⁵ cells colored by HDBSCAN cluster. Black scatter points represent unclustered (ambiguous) cells. c, Silhouette scores for post-QC CRC clusters shown in b. d, Mean signal intensities for clustering cells in post-QC CRC data normalized across clusters (row-wise). Six meta-clusters defined by the clustered heatmap dendrogram at the left are highlighted. e–g, Top three most highly expressed markers (1, green; 2, red; 3, blue) for post-QC CRC clusters 42 (B cells, e), 52 (CD8+ T cells near blood vessels, formed as a byproduct of spatial crosstalk, f) and 74 (vascular endothelial cells, g). A single white pixel at the center of each image highlights the reference cell. Nuclear segmentation outlines (translucent outlines) and Hoechst (gray) shown for reference. h, Overlap between pre-QC CRC clusters (rows) and post-QC CRC clusters (columns) showing a one-to-many correspondence between pre- and post-QC clusters. i, Pre-QC CRC embedding showing the position of cluster 6 (red, inset) and its composition according to post-QC CRC clusters. j, Locations of cells in pre-QC cluster 6 colored by their post-QC cluster labels revealing that pre-QC cluster 6 was in fact composed of multiple cell states occupying distinct regions throughout the muscularis propria of the CRC image, a noncancerous, smooth muscle-rich region of tissue. k, Mean signal intensities for post-QC CRC cluster 13 cells. The black arrows highlight bright channels consistent with both proliferating epithelial cells and CD8+ T cells. 
l, Post-QC CRC cluster 13 cells (white points) shown in context of the CRC image demonstrating more than 30 instances of spatial crosstalk between keratin+ tumor cells (blue) and CD8+ T cells (orange). Nuclear segmentation outlines (translucent outlines) shown for reference.
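The per-filter fractions in panel a can be thought of as sequential bookkeeping: each filter sees only the cells that survived the previous one, so the fractions plus the artifact-free remainder sum to one. The following is a minimal sketch under that assumption. The function `redaction_fractions`, the cell fields and the thresholds are hypothetical and are not CyLinter's API; only the category labels come from the caption.

```python
def redaction_fractions(cells, filters):
    """Apply filters in order and report the fraction of all cells
    dropped by each one, plus the fraction that survives every filter."""
    total = len(cells)
    remaining = list(cells)
    fractions = {}
    for name, keep in filters:
        kept = [cell for cell in remaining if keep(cell)]
        fractions[name] = (len(remaining) - len(kept)) / total
        remaining = kept
    fractions["artifact-free"] = len(remaining) / total
    return fractions


# Hypothetical cells: ROI membership, mean DNA signal, segmentation area.
cells = [
    {"in_roi": True, "dna": 500, "area": 100},   # inside a dropped ROI
    {"in_roi": False, "dna": 5, "area": 100},    # dim nucleus
    {"in_roi": False, "dna": 500, "area": 900},  # undersegmented nucleus
    {"in_roi": False, "dna": 500, "area": 100},  # passes every filter
]

# Hypothetical gates; in practice these are chosen interactively.
filters = [
    ("dropped ROIs", lambda c: not c["in_roi"]),
    ("dim/oversaturated nuclei", lambda c: 50 <= c["dna"] <= 5000),
    ("segmentation errors", lambda c: 30 <= c["area"] <= 400),
]

fractions = redaction_fractions(cells, filters)
for name, fraction in fractions.items():
    print(f"{name}: {fraction:.2f}")
```

Because the filters run in sequence, a cell that would fail several filters is counted only under the first one that removes it.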
Fig. 6
Fig. 6. Cleaning Dataset 1 (TOPACIO) with CyLinter.
a, Fraction of cells in the TOPACIO dataset redacted by each QC filter in the CyLinter pipeline. Dropped ROIs, cells dropped by selectROIs module; dim/oversaturated nuclei, cells dropped by dnaIntensity module; segmentation errors, cells dropped by areaFilter module; unstable cells, cells dropped by cycleCorrelation module; channel outliers, cells dropped by pruneOutliers module; artifact-free, cells remaining after QC. b, UMAP embedding of TOPACIO data showing ~3.0 × 10⁶ cells colored by HDBSCAN cluster. Black scatter points represent unclustered (ambiguous) cells. c, Silhouette scores for post-QC TOPACIO clusters shown in b revealing cluster 42 as an underclustered population. d, Mean signal intensities for clustering cells in the post-QC TOPACIO dataset normalized across clusters (row-wise). Four meta-clusters defined by the clustered heatmap dendrogram at the left are highlighted. e–i, Top three most highly expressed markers (1, green; 2, red; 3, blue) for clusters 0 (TReg cells: phenotype 1, e), 17 (TReg cells: phenotype 2, f), 21 (breast cancer cells with DNA damage, g), 35 (CD4+ T cells near breast cancer cells, h) and 42 (breast cancer cells without DNA damage, i). A single white pixel at the center of each image highlights the reference cell. Nuclear segmentation outlines (translucent outlines) and Hoechst (gray) shown for reference. j, Left: pre-QC TOPACIO UMAP embedding (also shown in Fig. 3a) with the location of five clusters selected at random highlighted. Right: location of the cells from the four pre-QC clusters shown in the embedding at left in the context of the post-QC TOPACIO UMAP embedding (also shown in b) demonstrating that these pre-QC clusters in fact consisted of multiple cell states. Far right: image patches of cells representing post-QC clusters 5, 10, 24 and 39.
Extended Data Fig. 1
Extended Data Fig. 1. Recurring artifacts in whole-slide immunofluorescence images of tissue and their effects on tissue-derived, single-cell data.
a, Left: Field of view from Dataset 1 (TOPACIO, specimen 110) showing a tissue fold (ROI, dashed white outline) as viewed in channels PDL1 (colormap) and Hoechst (gray). Right: UMAP embedding of 19-channel single-cell data from the left image colored by PDL1 intensity (top left), ROI inclusion (bottom left), and HDBSCAN cluster (center right). Cells in cluster 5 are those affected by the tissue fold. b, Channel z scores for HDBSCAN clusters in panel (a) demonstrating that cluster 5 cells (those affected by the tissue fold) are artificially bright for all channels, likely due to a combination of tissue overlap and insufficient antibody washing. c, Left: Field of view from Dataset 2 (CRC) showing two illumination aberrations (ROIs, dashed white outlines) as viewed in channels CD163 (colormap) and Hoechst (gray). Right: UMAP embedding of 21-channel single-cell data from the left image colored by CD163 intensity (left) and inclusion in one of the two ROIs (right). d, UMAP embedding of the 52-channel single-cell data shown in Fig. 1j (Dataset 7, large intestine, CODEX) after cells affected by the five illumination aberrations have been removed. Three groups of cells bright for CD3 remain (groups 1–3). Image galleries at right show four examples of each cell type in representative channels: group 1 = CD8+ T cells, group 2 = CD4+ T cells, group 3 = undefined cells immunoreactive to all 52 channels. e, Channel z scores for HDBSCAN clusters in (d) demonstrating that group 3 cells are bright for all 52 channels despite not being affected by microscopy artifacts.
Extended Data Fig. 2
Extended Data Fig. 2. Evaluation of pre-QC cell clustering results from Datasets 6 (large intestine, CODEX) and 2 (CRC, CyCIF).
a, UMAP embedding of Dataset 6 showing ~3.8 × 10⁴ cells colored by HDBSCAN cluster. Black scatter points represent ambiguous cells (10.5% of total). b, Silhouette scores for CODEX clusters in (a) revealing cluster 29 as an under-clustered population. c, Mean signal intensities of clustering cells from Dataset 6 normalized across clusters (row-wise). d, Correlated, non-specific signals in a region of Dataset 6 as seen in channels MUC6 (red), CD154 (green), and NKG2D (blue). Yellow points highlight cluster 0 cells formed due to this artifact. e, Tissue fold in a region of Dataset 6 as seen in channels GATA3 (red), CD68 (green), and CD66 (blue). Yellow points highlight cluster 9 cells formed due to this artifact. f, Image blur in a region of Dataset 6 as seen in channels HLADR (red), CD206 (green), and CD38 (blue). Yellow points highlight cluster 13 cells formed due to this artifact. g, Location of CRC cluster 3 cells in (g) revealing no spatial bias in the distribution of these cells. h, Top three most highly expressed markers (1: green, 2: red, 3: blue) for the 25 members of CRC cluster 3 (memory helper T) cells represented by the rugplots of Fig. 2n. White asterisks highlight cells shown in enlarged format in Fig. 2m. A single white pixel at the center of each image patch highlights the reference cell. Nuclear segmentation outlines (translucent white outlines) and Hoechst (gray) shown for reference. i, Regression plots showing correlation among CD4, CD45, and CD45RO marker expression by 1.9 × 10³ CRC cluster 3 cells (two-sided, Pearson R, p < 0.05). j, CRC cluster 3 cells in (h) after signal intensity cutoffs have been adjusted per image to improve their homogeneity of appearance. White asterisks highlight cells shown in enlarged format in (Fig. 2o). k, CRC cluster 3 cells with channels shown separately for clarity. Top panels show cells before contrast adjustment (h), bottom panels show cells after contrast adjustment (j). 
l, Top three most highly expressed markers (1: green, 2: red, 3: blue) for 25 CRC cluster 7 (TReg) cells. A single white pixel at the center of each image patch highlights the reference cell. Nuclear segmentation outlines (translucent white outlines) and Hoechst (gray) shown for reference. m, Regression plots showing strong correlation among CD4, CD45, and CD45RO marker expression of 1.9 × 10³ CRC cluster 7 cells (two-sided, Pearson R, p < 0.05). n, Regression plots showing weak correlation between FOXP3 and CD4, CD45, and CD45RO marker expression of 1.9 × 10³ CRC cluster 7 cells (two-sided, Pearson R, p < 0.05).
Extended Data Fig. 3
Extended Data Fig. 3. Evaluation of pre-QC cell clustering results from Dataset 1 (TOPACIO).
a, Spatial distribution of unclustered (ambiguous) cells from the pre-QC TOPACIO embedding in Fig. 3a as seen in specimen 55 (green scatter points), exhibiting no discernable spatial pattern of sampling bias; Hoechst (gray) shown for tissue context. b, Stacked bar charts showing the relative contribution of each patient specimen to pre-QC TOPACIO clusters on log10 scale. c, TOPACIO specimen 55 at low (left) and high (right) magnification showing the superimposition of Hoechst signals for the first three imaging cycles: 1 (green), 2 (red), and 3 (blue) demonstrating a cross-cycle image alignment error at the bottom of this image. Box at the bottom-right of the low magnification image shows the location of the higher magnification image. White points in the high magnification image highlight TOPACIO cluster 15 cells formed as a consequence of this image alignment artifact.
Extended Data Fig. 4
Extended Data Fig. 4. Identifying and removing noisy single-cell data points with CyLinter.
CyLinter workflow (see project website for implementation details: https://labsyspharm.github.io/cylinter/modules/). a, Aggregate data (automated): raw spatial feature tables for all specimens in a batch are merged into a single Pandas (Python) dataframe. b, ROI selection (interactive or automated): multi-channel images are viewed to identify and gate on regions of tissue affected by microscopy artifacts (negative selection mode) or areas of tissue devoid of artifacts (positive selection mode). b1-b4, Demonstration of automated artifact detection in CyLinter: b1, CyLinter’s selectROIs module showing artifacts in the CDKN1A (green) channel of Dataset 3 (EMIT TMA, core 18, mesothelioma). b2, Transformed version of the original CDKN1A image such that artifacts appear as large, bright regions relative to channel intensity variations associated with true signals, which are suppressed. b3, Local intensity maxima are identified in the transformed image and a flood fill algorithm is used to create a pixel-level binary mask indicating regions of tissue affected by artifacts. b4, CyLinter’s selectROIs module showing the binary artifact mask (translucent gray shapes) and their corresponding local maxima (red scatter points) for three artifacts in the image. c, DNA intensity filter (interactive): histogram sliders are used to define lower and upper bounds on nuclear counterstain signal intensity. Cells between cutoffs are visualized as scatter points at their spatial coordinates in the corresponding tissue for gate confirmation or refinement. d, Segmentation area filter (interactive): histogram sliders are used to define lower and upper bounds on cell segmentation area (pixel counts). Cells between cutoffs are visualized as scatter points at their spatial coordinates in the corresponding tissue for gate confirmation or refinement. e, Cross-cycle correlation filter (interactive): applicable to multi-cycle experiments. 
Histogram sliders are used to define lower and upper bounds on the log-transformed ratio of DNA signals between the first and last imaging cycles (log10(DNA1/DNAn)). Cells between cutoffs are visualized as scatter points at their spatial coordinates in their corresponding tissues for gate confirmation or refinement. f, Log transformation (automated): single-cell data are log10-transformed. g, Channel outliers filter (interactive): the distribution of cells according to antibody signal intensity is viewed for all specimens as a facet grid of scatter plots (or hexbin plots) against cell area (y-axes). Lower and upper percentile cutoffs are applied to remove outliers. Outliers are visualized as scatter points at their spatial coordinates in their corresponding tissues for gate confirmation or refinement. h, MetaQC (interactive): unsupervised clustering methods (UMAP or t-SNE followed by HDBSCAN clustering) are used to correct for gating bias in prior data filtration modules by thresholding on the percent of each cluster composed of clean (maintained) or noisy (redacted) cells. i, Principal component analysis (PCA, automated): PCA is performed and Horn’s parallel analysis is used to determine the number of PCs associated with non-random variation in the dataset. j, Image contrast adjustment (interactive): channel contrast settings are optimized for visualization on reference tissues which are applied to all specimens in the cohort. k, Unsupervised clustering (interactive): UMAP (or t-SNE) and HDBSCAN are used to identify unique cell states in a given cohort of tissues. Manual gating can also be performed to identify cell populations. l, Compute clustered heatmap (automated): clustered heatmap is generated showing channel z scores for identified clusters (or gated populations). 
m, Compute frequency statistics (automated): pairwise t statistics on the frequency of each identified cluster or gated cell population between groups of tissues specified in CyLinter’s configuration file (cylinter_config.yml) are computed (for example, treated vs. untreated, response vs. no response, etc.). n, Evaluate cluster membership (automated): cluster quality is checked by visualizing galleries of example cells drawn at random from each cluster identified in the clustering module (k).
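The cross-cycle correlation filter in panel e gates cells on log10(DNA1/DNAn), the log-transformed ratio of DNA signal in the first and last imaging cycles. The following is a minimal sketch of that idea in plain Python, not CyLinter's implementation. The function names, field names, signal values and the ±0.5 cutoffs are assumptions made for illustration; in CyLinter the bounds are set interactively with histogram sliders.

```python
import math


def cycle_log_ratio(dna_first, dna_last):
    """Return log10(DNA_1 / DNA_n) for one cell (both signals must be > 0)."""
    return math.log10(dna_first / dna_last)


def filter_stable_cells(cells, lower, upper):
    """Keep cells whose log10 ratio lies between the lower and upper cutoffs."""
    kept = []
    for cell in cells:
        ratio = cycle_log_ratio(cell["dna_1"], cell["dna_n"])
        if lower <= ratio <= upper:
            kept.append(cell)
    return kept


# Hypothetical mean Hoechst signals in the first and last cycles.
cells = [
    {"id": 1, "dna_1": 1000.0, "dna_n": 900.0},  # nearly unchanged
    {"id": 2, "dna_1": 1200.0, "dna_n": 12.0},   # signal lost by last cycle
    {"id": 3, "dna_1": 800.0, "dna_n": 1000.0},  # slightly brighter at end
]

# Hypothetical cutoffs for this illustration.
stable = filter_stable_cells(cells, lower=-0.5, upper=0.5)
stable_ids = [cell["id"] for cell in stable]
print(stable_ids)
```

A cell that detaches from the slide between cycles loses most of its DNA signal, so its ratio is large and positive and it falls outside the gate. Cells that stay in place have ratios near zero.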
Extended Data Fig. 5
Extended Data Fig. 5. Over-segmentation in Dataset 2 (CRC, CyCIF) and Cleaning of Dataset 6 (large intestine, CODEX) with CyLinter.
a, CyLinter-based gating of cells in the CRC image (Dataset 2) according to nuclear segmentation area showing that this image contains several over-segmented nuclei (that is, single nuclei split into multiple segmentation objects). b, Fraction of cells in Dataset 6 (large intestine, CODEX, specimen 1) redacted by each QC filter in the CyLinter pipeline. Dropped ROIs, cells dropped by selectROIs module; dim/oversaturated nuclei, cells dropped by dnaIntensity module; segmentation errors, cells dropped by areaFilter module; unstable cells, cells dropped by cycleCorrelation module; channel outliers, cells dropped by pruneOutliers module; artifact-free, cells remaining after QC. c, UMAP embedding of post-QC CODEX clusters showing ~3.1 × 10⁴ cells colored by HDBSCAN cluster. Black scatter points represent ambiguous cells (10.1% of total). d, Silhouette scores for post-QC CODEX clusters in (c). e, Mean signal intensities for clustering cells in post-QC CODEX data normalized across clusters (row-wise). Five (5) meta-clusters defined by the clustered heatmap dendrogram at the left are highlighted. f-h, Top three most highly expressed markers (1: green, 2: red, 3: blue) for clusters 0 (lymphatic endothelial cells, f), 15 (mast cells, g), and 17 (M2 macrophages, h). A single white pixel at the center of each image highlights the reference cell. Nuclear segmentation outlines (translucent outlines) and Hoechst (gray) shown for reference.
Extended Data Fig. 6
Extended Data Fig. 6. Location of cells redacted by CyLinter in Datasets 2 (CRC) and 1 (TOPACIO) and Post-QC TOPACIO UMAP embedding colored by patient ID.
a, b: Cells redacted by CyLinter from Dataset 2 (CRC, a) and three arbitrary specimens from Dataset 1 (TOPACIO, b) demonstrating no discernable bias in the removal of cells from the image with the exception of areas affected by focal artifacts removed using CyLinter’s selectROIs module (white arrows). These results are representative of the other 22 tissues in the TOPACIO cohort. c, UMAP embedding of post-QC TOPACIO data shown in (Fig. 6b) colored by specimen ID demonstrating patient-specific clustering in tumor cell populations, but not immune and stromal populations (for cluster phenotype identities, refer to Fig. 6b, d, e–i and Online Supplementary Fig. 8).
