IEEE Trans Med Imaging. 2022 May;41(5):1176-1187. doi: 10.1109/TMI.2021.3135002. Epub 2022 May 2.

PathAL: An Active Learning Framework for Histopathology Image Analysis


Wenyuan Li et al. IEEE Trans Med Imaging. 2022 May.

Abstract

Deep neural networks, in particular convolutional networks, have rapidly become a popular choice for analyzing histopathology images. However, training these models relies heavily on a large number of samples manually annotated by experts, which is cumbersome and expensive. In addition, it is difficult to obtain a perfect set of labels due to the variability between expert annotations. This paper presents a novel active learning (AL) framework for histopathology image analysis, named PathAL. To reduce the required number of expert annotations, PathAL selects two groups of unlabeled data in each training iteration: a set of "informative" samples that require additional expert annotation, and a set of "confident predictive" samples that are automatically added to the training set using the model's pseudo-labels. To reduce the impact of noisy-labeled samples in the training set, PathAL systematically identifies noisy samples and excludes them to improve the generalization of the model. Our model advances existing AL methods for medical image analysis in two ways. First, we present a selection strategy to improve classification performance with fewer manual annotations. Unlike traditional methods that focus only on finding the most uncertain samples with low prediction confidence, we discover a large number of high-confidence samples from the unlabeled set and automatically add them for training with assigned pseudo-labels. Second, we design a method to distinguish between noisy samples and hard samples using a heuristic approach. We exclude the noisy samples while preserving the hard samples to improve model performance. Extensive experiments demonstrate that our proposed PathAL framework achieves promising results on a prostate cancer Gleason grading task, obtaining similar performance with 40% fewer annotations compared to the fully supervised learning scenario. An ablation study is provided to analyze the effectiveness of each component in PathAL, and a pathologist reader study is conducted to validate our proposed algorithm.
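The iteration described in the abstract (discard noisy samples, send uncertain samples to experts, pseudo-label confident samples) can be sketched as follows. This is a minimal illustrative sketch, not the authors' code: the function name, thresholds, and the use of a single per-sample confidence score are all assumptions.

```python
def pathal_step(labeled, unlabeled, confidence, tau_info=0.5, tau_conf=0.9, noisy=frozenset()):
    """One illustrative PathAL-style iteration (names/thresholds are assumptions).

    labeled / unlabeled: lists of sample ids
    confidence: dict mapping unlabeled id -> model confidence in its prediction
    noisy: ids flagged by the noise monitor
    Returns (new_labeled, new_unlabeled, to_annotate, pseudo_labeled).
    """
    # 1) discard samples the noise monitor flagged
    labeled = [s for s in labeled if s not in noisy]
    # 2) most uncertain unlabeled samples -> request expert annotation
    to_annotate = [s for s in unlabeled if confidence[s] < tau_info]
    # 3) highly confident unlabeled samples -> add with pseudo-labels
    pseudo = [s for s in unlabeled if confidence[s] >= tau_conf]
    new_labeled = labeled + to_annotate + pseudo
    new_unlabeled = [s for s in unlabeled if s not in to_annotate and s not in pseudo]
    return new_labeled, new_unlabeled, to_annotate, pseudo
```

In this sketch, samples with mid-range confidence stay in the unlabeled pool for later iterations, which matches the paper's idea of only spending expert effort on the most informative cases.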


Figures

Fig. 1.
(a) Schematics of our proposed PathAL. The core algorithm of PathAL consists of three steps in the ith iteration: 1) discarding noisy samples Ni; 2) requesting human experts to annotate informative samples Ii and adding them to Li+1; and 3) adding confident predictive samples Ci with their “pseudo-labels” to Li+1. The curriculum classification (CC) algorithm and overfitting-to-underfitting (O2U) monitor are used to select Ni, Ii, and Ci. (b) Illustration of the CC algorithm. Tissues from one slide are mapped to a single point in deep feature space, where K-Means clustering is used to group them into subsets. The CC algorithm is applied to each subset, and image complexity is classified as “easy”, “medium”, or “hard” based on local density. (c) Principles for determining Ni and Ci based on CC and O2U results. A sample that is classified as “easy” based on its complexity but has large training loss variation is more likely to be incorrectly annotated. If it is classified as “hard” for its complexity, it is more likely to be a difficult sample. If a sample’s complexity is classified as “easy” and the variation of its predictive entropy under the current model is low, we have higher confidence that the current prediction is correct.
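The caption's CC step labels samples within each k-means subset by local density. A minimal sketch of that idea, assuming local density is measured as inverse mean k-nearest-neighbor distance and complexity is assigned by density tercile (both assumptions; the paper's exact definitions may differ):

```python
import math

def complexity_by_density(points, k=3):
    """Label samples in one k-means subset as easy/medium/hard by local density.

    points: list of feature vectors (tuples) belonging to one cluster.
    Density = inverse of the mean distance to the k nearest neighbors;
    the densest third is "easy", the sparsest third is "hard".
    """
    densities = []
    for i, p in enumerate(points):
        nn = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)[:k]
        densities.append(1.0 / (1e-9 + sum(nn) / len(nn)))  # inverse mean k-NN distance
    order = sorted(range(len(points)), key=lambda i: -densities[i])
    labels = [None] * len(points)
    third = math.ceil(len(points) / 3)
    for rank, i in enumerate(order):
        labels[i] = "easy" if rank < third else ("medium" if rank < 2 * third else "hard")
    return labels
```

An isolated point far from its cluster neighbors ends up "hard", while points in a dense region are "easy", mirroring the caption's density-based complexity classes.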
Fig. 2.
Illustration of data pre-processing steps. A binary tissue mask is first extracted; the mid-line is then found using morphological closing; the mid-line is then partitioned into patches based on the patch size and overlap; finally, the blue ratio of each patch is calculated and the top k patches are selected.
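The final step ranks patches by blue ratio. A minimal sketch of that selection, using one blue-ratio formulation common in the histopathology literature (the paper's exact definition may differ, and the per-patch mean-RGB representation here is an assumption):

```python
def blue_ratio(rgb_mean):
    """One common blue-ratio score from the literature: emphasizes pixels
    that are strongly blue relative to red and green (hematoxylin-stained
    nuclei). rgb_mean is a (R, G, B) tuple of mean channel values."""
    r, g, b = rgb_mean
    return (100.0 * b / (1.0 + r + g)) * (256.0 / (1.0 + r + g + b))

def top_k_patches(patches, k):
    """Keep the k patches with the highest blue ratio.
    Each patch is a dict with a "mean_rgb" entry (an assumed representation)."""
    return sorted(patches, key=lambda p: blue_ratio(p["mean_rgb"]), reverse=True)[:k]
```

A patch dominated by blue (nucleus-rich tissue) outranks pale or background-heavy patches, which is why blue ratio is a cheap proxy for tissue content.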
Fig. 3.
(a) t-SNE plot in deep feature space. Each point in the figure represents a slide whose color indicates its ISUP grade. As training progressed, different ISUP grades became more separable in the deep feature space, indicating the model captured more essential information to make correct predictions. (b) The trend of “grade concentration”, which measures the ISUP grade distribution within subsets clustered by k-means. The insets of the figure demonstrate typical ISUP distributions for the subsets. At the beginning of training, the ISUP grades were more diffuse, while at the end of training, each cluster concentrated on fewer grades. (c)(d) The training loss for every sample in Li, and predictive entropy for every sample in Ui, during the O2U process.
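The caption does not give a formula for "grade concentration"; one natural proxy is one minus the normalized entropy of the ISUP-grade histogram within a cluster. The sketch below uses that proxy purely for illustration (the metric definition is an assumption, not the paper's):

```python
import math
from collections import Counter

def grade_concentration(grades):
    """Illustrative concentration proxy: 1 - normalized entropy of the
    ISUP-grade histogram within one cluster (an assumed metric).
    1.0 = every slide has the same grade; 0.0 = uniform over all six grades."""
    counts = Counter(grades)
    n = len(grades)
    probs = [c / n for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    max_entropy = math.log(6)  # six ISUP grade groups (0-5)
    return 1.0 - entropy / max_entropy
```

Under this proxy, the trend described in the caption (diffuse grades early, concentrated grades late) would show up as the score rising toward 1.0 over training.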
Fig. 4.
(a) Toy example for a 2-class classification. The dotted circle indicates the decision boundary, with class 0 (C=0) inside the circle and class 1 (C=1) outside it. Samples far from the decision boundary are considered easy samples (blue for C=0 and orange for C=1), while samples closer to the decision boundary are considered hard samples. We also randomly insert noisy samples (indicated by larger purple dots) that have wrong labels. (b) A heat map of the averaged predictive entropy of each sample during the O2U process. (c) A confusion matrix of easy, hard, and noisy samples, with the horizontal axis representing the classified results and the vertical axis representing the original categories.
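The heuristic the figure illustrates (easy-looking sample with unstable training loss under O2U's cyclic learning rate is likely mislabeled; unstable loss on a hard sample just means it is difficult) can be sketched as below. The variance threshold and per-sample loss-history representation are assumptions for illustration:

```python
from statistics import pstdev

def flag_samples(loss_history, complexity, var_thresh=0.5):
    """Sketch of the Fig. 4 heuristic (threshold is an assumption).

    loss_history: dict id -> list of training losses across O2U LR cycles
    complexity:   dict id -> "easy" / "medium" / "hard" from the CC step
    A sample with high loss variation is "noisy" if CC called it easy
    (label likely wrong), otherwise "hard" (genuinely difficult); stable
    losses mean "clean".
    """
    flags = {}
    for sid, losses in loss_history.items():
        high_var = pstdev(losses) > var_thresh
        if high_var and complexity[sid] == "easy":
            flags[sid] = "noisy"   # easy-looking but unstable -> likely mislabeled
        elif high_var:
            flags[sid] = "hard"    # unstable and genuinely complex -> keep
        else:
            flags[sid] = "clean"
    return flags
```

This captures the key asymmetry in the caption: high loss variation alone is ambiguous, and it is the combination with CC complexity that separates noisy labels from hard samples.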
Fig. 5.
(a) Performance comparison between PathAL and other AL baselines. (b) QWK for each group (Ni, Ci, Ii) during training. (c) Percentage of noisy samples returned for training in later iterations. (d) Performance comparison between PathAL and random sampling with noisy-sample injection.

References

    1. Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, Van Der Laak JA, Van Ginneken B, and Sánchez CI, “A survey on deep learning in medical image analysis,” Medical image analysis, vol. 42, pp. 60–88, 2017. - PubMed
    1. Yang C-W, Lin T-P, Huang Y-H, Chung H-J, Kuo J-Y, Huang WJ, Wu HH, Chang Y-H, Lin AT, and Chen K-K, “Does extended prostate needle biopsy improve the concordance of gleason scores between biopsy and prostatectomy in the taiwanese population?” Journal of the Chinese Medical Association, vol. 75, no. 3, pp. 97–101, 2012. - PubMed
    1. Cheplygina V, de Bruijne M, and Pluim JP, “Not-so-supervised: a survey of semi-supervised, multi-instance, and transfer learning in medical image analysis,” Medical image analysis, vol. 54, pp. 280–296, 2019. - PubMed
    1. Budd S, Robinson EC, and Kainz B, “A survey on active learning and human-in-the-loop deep learning for medical image analysis,” arXiv preprint arXiv:1910.02923, 2019. - PubMed
    1. Dgani Y, Greenspan H, and Goldberger J, “Training a neural network based on unreliable human annotation of medical images,” in 2018 IEEE 15th International Symposium on Biomedical Imaging (ISBI 2018). IEEE, 2018, pp. 39–42.

Publication types