[Preprint]. 2023 Jul 24:2023.07.21.548450.
doi: 10.1101/2023.07.21.548450.

Addressing persistent challenges in digital image analysis of cancerous tissues


Sandhya Prabhakaran et al. bioRxiv. .


Abstract

The National Cancer Institute (NCI) supports many research programs and consortia, many of which use imaging as a major modality for characterizing cancerous tissue. A trans-consortia Image Analysis Working Group (IAWG) was established in 2019 with a mission to disseminate imaging-related work and foster collaborations. In 2022, the IAWG held a virtual hackathon focused on addressing challenges of analyzing high-dimensional datasets from fixed cancerous tissues. Standard image processing techniques have automated feature extraction, but the next generation of imaging data requires more advanced methods to fully utilize the available information. In this perspective, we discuss current limitations of the automated analysis of multiplexed tissue images, first steps toward a deeper understanding of these limitations, solutions that have been developed, new or refined approaches developed during the Image Analysis Hackathon 2022, and areas where further effort is required. The outstanding problems addressed in the hackathon fell into three main themes: 1) challenges to cell type classification and assessment, 2) translation and visual representation of spatial aspects of high-dimensional data, and 3) scaling digital image analyses to large (multi-TB) datasets. We describe the rationale for each specific challenge and the progress made toward addressing it during the hackathon. We also suggest areas that would benefit from more focus and offer insight into broader challenges that the community will need to address as new technologies are developed and integrated into the broad range of image-based modalities and analytical resources already in use within the cancer research community.

Keywords: Multiplexed images; artifact removal; cancer; domain representation; image analysis; scalability; thumbnail generation.


Figures

Figure 1: Strategies for artifact detection and correction.
A) Examples of common imaging artifacts in fluorescence microscopy. From left to right: miscellaneous fluorescent contaminants, autofluorescent lint fibers, air bubbles causing refractive index mismatch, antibody hindrance (a broad region of low antibody reactivity), and out-of-focus tissue. B) CyCIF datasets used for the artifact-related hackathon challenges, featuring human colorectal cancer and tonsil tissue. C) A fibrous artifact and illumination errors are visible (left) and manually annotated (middle) to facilitate their detection and suppression (right). D) ROC curve analysis of artifact detection performance for a multilayer perceptron trained on mean immunomarker signals alone (Features, FS1 in main text, left), or on Features plus segmentation-based nuclear morphology attributes (Nuc Morph) and pixel-level image statistics (Pixel Thumb; FS3 in main text, right). Also see Supplementary Figure 2. E) Comparison of images before (left) and after (right) automatic artifact correction. Artifacts that were sufficiently removed are shown with green boxes; unresolved artifacts with red boxes. Image intensity was left unperturbed by artifact correction in tissues without artifacts (examples shown with blue boxes).
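The artifact-detection approach in panel D can be sketched as a binary classifier over per-cell features scored with ROC analysis. The following is a minimal illustration on synthetic data; the feature values, network size, and thresholds are assumptions for demonstration, not the hackathon's actual feature sets or model.

```python
# Sketch: flag artifact-affected cells from per-cell features with a small
# MLP, then score with ROC AUC (cf. Figure 1D). Synthetic data only; the
# two features stand in for mean immunomarker signal and a morphology score.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1000  # cells per class

# Artifact-affected cells (label 1) have shifted feature distributions.
clean = rng.normal(loc=[1.0, 0.5], scale=0.3, size=(n, 2))
artifact = rng.normal(loc=[2.0, 1.2], scale=0.3, size=(n, 2))
X = np.vstack([clean, artifact])
y = np.array([0] * n + [1] * n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
clf.fit(X_tr, y_tr)

# Area under the ROC curve on held-out cells.
auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
print(f"ROC AUC: {auc:.2f}")
```

In the figure, the same comparison is made between feature sets (FS1 vs. FS3) rather than between synthetic classes.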
Figure 2: Spatial spillover and visual comparisons of cell type calling.
A) Example of spatial cross-talk of adjacent cells in CyCIF stained images of tonsil. Boundaries of cells identified by segmentation are indicated by the dashed cyan lines and distinct cells are numbered. Pixel intensities from different markers are indicated by distinct colors. Spatial spillover of CD3 into adjacent cells is indicated by cyan arrows. B) Uniform Manifold Approximation and Projection (UMAP) of cell features and spatial representation of cells in a 200×200px tile before and after REDSEA. A novel Cluster 5 identified by REDSEA captures isolated cells at the image border (indicated by triangles). C) A traditional heatmap and (D) violin-matrix of cell data separated into clusters using the HDBSCAN algorithm. E) Visualizations generated by a web-based interactive tool for inspecting and comparing clustered data in a spatial context. Scatterplots of cells in UMAP embeddings with cells colored by cluster membership as a result of the respective clustering algorithms (top row) and colored by silhouette coefficients (bottom row). The plots are synchronized in navigation (zooming, panning, selections).
Figure 3: Image representation learning by VAEs and for thumbnail generation.
A) Each VAE implementation was qualitatively assessed for its ability to distinguish control (PBS)-treated from TGF-β-treated MCF10A cells using all morpho-spatial features or the top 10 most variable (var) features, compared with preselecting the top 10 most discriminatory (discr) features extracted from the images. Feature space is reduced to two dimensions using UMAP embedding. Class labels of TGF-β- or PBS-treated cells are shown in pink and blue, respectively. B) Example thumbnail images. Each panel shows a thumbnail (or associated comparative plot) generated by the methods described in the main text (panel labels). All approaches were applied to a 0.9 mm² (9-megapixel) 9-channel CyCIF image of a human tonsil germinal center.
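One simple baseline for the thumbnail-generation task in panel B is per-channel contrast rescaling, assignment of a few channels to RGB, and block-average downsampling. The sketch below is a generic illustration under assumed shapes and channel choices, not any specific method from the hackathon.

```python
# Sketch: reduce a (channels, H, W) multiplexed image to a small RGB
# thumbnail via robust contrast stretching and block averaging.
import numpy as np

def make_thumbnail(img, rgb_channels=(0, 1, 2), factor=8):
    """img: (channels, H, W) float array; returns (H//factor, W//factor, 3)."""
    c, h, w = img.shape
    h, w = h - h % factor, w - w % factor  # crop to a multiple of factor
    out = np.zeros((h // factor, w // factor, 3))
    for slot, ch in enumerate(rgb_channels):
        plane = img[ch, :h, :w]
        lo, hi = np.percentile(plane, [1, 99])  # robust contrast stretch
        plane = np.clip((plane - lo) / (hi - lo + 1e-9), 0, 1)
        # Block-average downsample by reshaping into factor x factor tiles.
        out[..., slot] = plane.reshape(h // factor, factor,
                                       w // factor, factor).mean(axis=(1, 3))
    return out

# Random 9-channel stand-in for a CyCIF tile.
demo = np.random.default_rng(2).random((9, 256, 256))
thumb = make_thumbnail(demo)
print(thumb.shape)
```

More sophisticated approaches (as compared in the figure) must also decide which of the many channels to summarize and how to preserve rare, bright structures that block averaging washes out.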
Figure 4: Data processing and visualization pipeline developed during the challenge for Neuroglancer.
Highly multiplexed CyCIF data are stored as multi-channel imaging volumes (top, left), where each volume represents one channel. For simplicity, volumes are depicted as single slices in this figure. Each volume is segmented, either via thresholding or more complex machine learning approaches, and stored as a binary segmentation volume (top, middle). Subsequently, for each segmentation volume (i.e., segmented channel) the geometry of the segmented structures is extracted and stored as a geometry mesh for subsequent 3D surface rendering (top, right). The visualization pipeline supports a slice view that can combine an original imaging volume with several segmentation volumes (bottom, left) and a 3D view (bottom, right). The 3D view can represent the volume as extracted meshes or via a clipping plane.
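The "segmentation volume to geometry" step can be illustrated with a pure-NumPy surface extraction that finds the exposed faces of foreground voxels. Production pipelines typically use marching cubes and Neuroglancer's precomputed mesh format instead; this sketch, including the `exposed_faces` helper, is only an assumed minimal stand-in for the idea.

```python
# Sketch: extract the exposed (boundary) voxel faces of a binary
# segmentation volume, the blocky precursor to a smooth surface mesh.
import numpy as np

def exposed_faces(seg):
    """Count boundary faces of a binary volume: faces of foreground voxels
    whose neighbor across the face is background (or outside the volume)."""
    seg = seg.astype(bool)
    padded = np.pad(seg, 1, constant_values=False)
    total = 0
    for axis in range(3):
        for shift in (-1, 1):
            # Neighbor of each voxel one step along +/- axis.
            neighbor = np.roll(padded, shift, axis=axis)
            total += np.count_nonzero(padded & ~neighbor)
    return total

# Synthetic segmentation: a 4x4x4 cube of foreground voxels in an 8^3 grid.
seg = np.zeros((8, 8, 8), dtype=np.uint8)
seg[2:6, 2:6, 2:6] = 1
print(exposed_faces(seg))  # 6 sides x 16 faces each = 96
```

Triangulating those exposed faces (two triangles per face) yields a renderable mesh; marching cubes produces the smoother surfaces shown in the figure's 3D view.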

References

    1. Wagner RP. Rudolph Virchow and the genetic basis of somatic ecology. Genetics. 1999;151(3):917–920. - PMC - PubMed
    1. Hajdu SI. A note from history: landmarks in history of cancer, part 4. Cancer. 2012;118(20):4914–4928. - PubMed
    1. The human body at cellular resolution: the NIH Human Biomolecular Atlas Program. Nature. 2019;574(7777):187–192. - PMC - PubMed
    1. Smith JM, Conroy RM. The NIH common fund Human Biomolecular Atlas Program (HuBMAP): Building a framework for mapping the human body. The FASEB Journal. 2018;32:818–2.
    1. Regev A, Teichmann SA, Lander ES, et al. The human cell atlas. elife. 2017;6:e27041. - PMC - PubMed
