Bioinformatics. 2019 Sep 15;35(18):3461-3467.
doi: 10.1093/bioinformatics/btz083.

Structured crowdsourcing enables convolutional segmentation of histology images

Mohamed Amgad et al. Bioinformatics. 2019.

Abstract

Motivation: While deep-learning algorithms have demonstrated outstanding performance in semantic image segmentation tasks, large annotation datasets are needed to create accurate models. Annotation of histology images is challenging due to the effort and experience required to carefully delineate tissue structures, and difficulties related to sharing and markup of whole-slide images.

Results: We recruited 25 participants, ranging in experience from senior pathologists to medical students, to delineate tissue regions in 151 breast cancer slides using the Digital Slide Archive. Inter-participant discordance was systematically evaluated, revealing low discordance for tumor and stroma, and higher discordance for more subjectively defined or rare tissue classes. Feedback provided by senior participants enabled the generation and curation of 20 000+ annotated tissue regions. Fully convolutional networks trained using these annotations were highly accurate (mean AUC=0.945), and the scale of annotation data provided notable improvements in image classification accuracy.

Availability and implementation: Dataset is freely available at: https://goo.gl/cNM4EL.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Study overview. (A) Slides from the TNBC cohort were reviewed for difficulty and the study coordinator selected a single representative ROI in each slide. (B) Participants were recruited on social media from medical student interest groups. Documentation and instructional videos were developed to train participants in breast cancer pathology and the use of DSA annotation tools. A spreadsheet lists slide-level descriptions of histologic features for each of the 151 images to aid in training. (C) Participants were each assigned six slides based on experience. Challenging slides were assigned to faculty/pathology residents, while standard slides were distributed among all participants. (D) The DSA was used by participants to draw the outlines of tissue regions in their assigned slides/ROIs. A Slack workspace enabled less experienced users to ask questions and receive guidance from the more experienced users. (E) Ten evaluation ROIs were identified in the slides and were annotated by all participants in an unsupervised manner to enable inter-participant comparisons. (F) Agreement between each pair of participants was evaluated using the Dice coefficient to generate an inter-participant discordance matrix.
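Panel F describes pairwise agreement measured with the Dice coefficient and collected into a discordance matrix. The following is a minimal sketch of that idea, not the authors' implementation. It assumes binary masks stored as flat lists of 0/1 and assumes discordance is defined as 1 minus Dice; the function names `dice` and `discordance_matrix` and the toy masks are hypothetical.

```python
from itertools import combinations


def dice(mask_a, mask_b):
    """Dice coefficient between two binary masks given as flat 0/1 sequences."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must cover the same number of pixels")
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


def discordance_matrix(masks):
    """Symmetric matrix of (1 - Dice) for every pair of participants.

    Assumption: discordance is 1 - Dice; the source does not state the formula.
    """
    n = len(masks)
    matrix = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d = 1.0 - dice(masks[i], masks[j])
        matrix[i][j] = matrix[j][i] = d
    return matrix


# Toy example: three hypothetical participants annotating a 4-pixel region.
masks = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
matrix = discordance_matrix(masks)
for row in matrix:
    print([round(value, 2) for value in row])
```

In a multi-class setting, one such matrix could be computed per tissue class by binarizing each participant's mask for that class.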
Fig. 2.
Screenshot of the DSA and HistomicsTK web interface. The main viewport allows panning and zooming within the slide. Annotations are grouped by class into layers (middle right panel) whose style properties, such as color and fill, can be adjusted (bottom right panel). Other features include control of annotation transparency, an interactive mode to highlight individual annotations, and the ability to download the WSI, regions of interest or annotations. Annotation properties can also be programmatically manipulated using the DSA API.
Fig. 3.
Evaluation slide set concordance and model accuracy. (A) Inter-participant discordance matrices for SP, JP, NP and AL. (B) 2-D MDS plots of participant discordance. (C, D) Testing accuracy and confusion of comparison models trained on evaluation set ROIs from SPs (cyan) and NPs (magenta), measured against post-correction masks from the core set. Confusion matrix values are percentages relative to total pixel count. (Color version of this figure is available at Bioinformatics online.)
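The caption notes that confusion matrix values are percentages of the total pixel count, rather than row-normalized rates. A minimal sketch of that normalization follows, assuming per-pixel labels stored as flat lists; the function name `confusion_percentages` and the toy labels are hypothetical and do not come from the study.

```python
def confusion_percentages(true_labels, pred_labels, classes):
    """Confusion matrix with each cell as a percentage of all pixels.

    Rows index the reference class, columns index the predicted class.
    """
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("at least one pixel is required")
    index = {name: k for k, name in enumerate(classes)}
    counts = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, pred_labels):
        counts[index[t]][index[p]] += 1
    total = len(true_labels)
    # Dividing by the grand total means all cells together sum to 100.
    return [[100.0 * c / total for c in row] for row in counts]


# Toy example with four pixels and two classes.
classes = ["tumor", "stroma"]
truth = ["tumor", "tumor", "tumor", "stroma"]
predicted = ["tumor", "tumor", "stroma", "stroma"]
percentages = confusion_percentages(truth, predicted, classes)
print(percentages)
```

With this normalization, a rare class contributes small values in every cell of its row, so the matrix reflects class prevalence as well as error rates.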
Fig. 4.
Model performance over the testing set. (A) Visualization of full semantic segmentation model predictions on testing set regions of interest. Color codes used: red (tumor); transparent (stroma); cyan (inflammatory infiltrates); yellow (necrosis). (B) Area under ROC curve for semantic segmentation algorithm, broken down by region class. (C) Effect of training sample size on scale-dependent patch classification models. Each point represents the macro-average AUC of a single model, trained on different sets of randomly selected slides. (Color version of this figure is available at Bioinformatics online.)
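Panel C summarizes each model by its macro-average AUC. As a rough illustration, the sketch below computes a one-vs-rest AUC per class from the pairwise ranking (Mann-Whitney) definition and then takes the unweighted mean across classes. This is an assumed formulation, since the source does not state how the AUC was computed; the function names, class names and scores are hypothetical.

```python
def binary_auc(labels, scores):
    """AUC as the probability that a positive outscores a negative (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def macro_average_auc(true_labels, class_scores):
    """Unweighted mean of one-vs-rest AUCs.

    class_scores maps each class name to a list of per-sample scores.
    Returns the macro average and the per-class AUC values.
    """
    per_class = {}
    for name, scores in class_scores.items():
        one_vs_rest = [1 if t == name else 0 for t in true_labels]
        per_class[name] = binary_auc(one_vs_rest, scores)
    return sum(per_class.values()) / len(per_class), per_class


# Toy example: four samples, two classes, made-up scores.
truth = ["tumor", "tumor", "stroma", "stroma"]
scores = {
    "tumor": [0.9, 0.4, 0.5, 0.1],
    "stroma": [0.2, 0.5, 0.6, 0.7],
}
macro_auc, per_class_auc = macro_average_auc(truth, scores)
print(macro_auc, per_class_auc)
```

Because the macro average weights every class equally, rare classes such as necrosis influence the summary as much as common ones.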
