Nat Mach Intell. 2019 Feb;1(2):112-119.
doi: 10.1038/s42256-019-0018-3. Epub 2019 Feb 11.

An integrated iterative annotation technique for easing neural network training in medical image analysis

Brendon Lutnick et al. Nat Mach Intell. 2019 Feb.

Abstract

Neural networks promise to bring robust, quantitative analysis to medical fields. However, their adoption is limited by the technicalities of training these networks and the required volume and quality of human-generated annotations. To address this gap in the field of pathology, we have created an intuitive interface for data annotation and the display of neural network predictions within a commonly used digital pathology whole-slide viewer. This strategy used a 'human-in-the-loop' to reduce the annotation burden. We demonstrate that segmentation of human and mouse renal microcompartments is repeatedly improved when humans interact with automatically generated annotations throughout the training process. Finally, to show the adaptability of this technique to other medical imaging fields, we demonstrate its ability to iteratively segment human prostate glands from radiology imaging data.

Conflict of interest statement

Competing interests The authors declare no competing interests.

Figures

Fig. 1 | Iterative H-AI-L pipeline overview.
Schematic representation of the H-AI-L pipeline for training semantic segmentation of WSIs. Several rounds of training are performed using human expert feedback to optimize performance, which improves the efficiency of network training when only a limited number of annotated WSIs is initially available.
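
The loop described in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function `iterative_annotation` and the callables `predict`, `correct` and `train` are hypothetical placeholders for the network inference step, the expert's correction in the slide viewer, and network training, respectively.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


def iterative_annotation(
    batches: Sequence[Sequence[Any]],
    predict: Callable[[Any, Any], Any],
    correct: Callable[[Any, Optional[Any]], Any],
    train: Callable[[Optional[Any], List[Tuple[Any, Any]]], Any],
) -> Tuple[Any, List[Tuple[Any, Any]]]:
    """Run one human-in-the-loop round per batch of new images.

    All four arguments are hypothetical stand-ins; none of them is an
    API from the paper or from any library.
    """
    model = None
    labelled: List[Tuple[Any, Any]] = []
    for batch in batches:
        # Iteration 0 has no trained model yet, so the expert receives no
        # draft (None) and annotates from scratch.
        drafts = [
            predict(model, image) if model is not None else None
            for image in batch
        ]
        # The expert corrects each draft; corrected labels join the training set.
        for image, draft in zip(batch, drafts):
            labelled.append((image, correct(image, draft)))
        # Retrain (or fine-tune) on everything labelled so far.
        model = train(model, labelled)
    return model, labelled
```

Each pass adds newly corrected annotations to the training set, so later batches start from better drafts and should need less manual correction.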
Fig. 2 | H-AI-L pipeline performance analysis for glomerular segmentation on holdout mouse WSIs.
a, Average annotation time per glomerulus as a function of annotation iteration. The data are averaged per WSI and normalized by the number of glomeruli in each WSI. The 0th iteration was performed without pre-existing predicted annotations, whereas subsequent iterations use network predictions as initial annotations that the annotator can correct. b, F1 score of glomerular segmentation of four holdout mouse renal WSIs as a function of training iteration. c, Run times for glomerular segmentation prediction on holdout mouse renal WSIs using H-AI-L with multi-pass (two-stage segmentation) versus full-resolution segmentation. d, Example of a mouse WSI with segmented glomeruli (×40, H&E-stained). Network predictions are outlined in green. The error bars indicate ±1 standard deviation.
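
For readers unfamiliar with the F1 score reported in panel b, the sketch below computes it for a pair of binary masks. It assumes a pixel-wise comparison of flattened 0/1 masks; the caption does not say whether the authors scored pixels or whole objects, so treat that as an assumption. The function name `f1_score_masks` is illustrative.

```python
from typing import Sequence


def f1_score_masks(pred: Sequence[int], truth: Sequence[int]) -> float:
    """Pixel-wise F1 between two flattened binary masks (1 = glomerulus)."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of pixels")
    tp = sum(1 for p, t in zip(pred, truth) if p and t)      # true positives
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)  # false positives
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)  # false negatives
    denom = 2 * tp + fp + fn
    # Convention chosen here (an assumption): two empty masks agree perfectly.
    return 2 * tp / denom if denom else 1.0


# One pixel correct, one spurious, one missed: F1 = 2 / (2 + 1 + 1) = 0.5
print(f1_score_masks([1, 1, 0, 0], [1, 0, 1, 0]))
```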
Fig. 3 | H-AI-L human annotation errors (mouse data).
a–d, Comparison of initial manual annotations from iteration 0 (a,c) with their respective final network predictions from iteration 5 (b,d). These examples were selected due to poor manual annotation, where the glomerulus was not annotated (a) or showed poorly drawn boundaries (c). These images are captured at ×40, and tissue was stained using H&E.
Fig. 4 | Multiclass nuclei prediction on a mouse WSI.
Several examples of multi-class nuclei predictions are visualized on a mouse WSI (×40, PAS-stained). Here, transfer learning was used to adapt the high-resolution network from above (Fig. 2) to segment nuclei classes. This network was trained using 143 labelled mouse glomeruli. The low-resolution network was kept unchanged for the initial detection of glomeruli. We expect the results to significantly improve using more labelled training data.
Fig. 5 | Multiclass IFTA prediction on a holdout human renal WSI.
Segmentation of healthy and sclerotic glomeruli, as well as IFTA regions from human renal biopsy WSI (×40, PAS-stained). Due to the non-sparse nature of IFTA regions, these predictions were made using only a high-resolution pass. This is a screenshot of Aperio ImageScope, which we use to interactively visualize the network predictions.
Fig. 6 | H-AI-L method performance analysis for human prostate segmentation from T2 MRI slices.
a, Segmentation performance as a function of training iteration, evaluated on 7 patient holdout MRI images (224 slices). Performance was evaluated on a patient basis. We note that despite the decline in network precision after iteration 6, the F1 score improves as a result of increasing sensitivity. b, The prediction performance on added training data, before network training. This figure shows the prediction performance on newly added data with respect to the expert-corrected annotation, and is evaluated on a patient basis (data from four new patients were added at the beginning of each training iteration). c, The percentage of prostate regions where network prediction performance (F1 score) fell below an acceptable threshold (percentage of slices that needed expert correction) as a function of training iteration. We define acceptable performance as F1 score > 0.88. Using this criterion, expert annotation of new data is reduced by 92% by the fifth iteration. d, A randomly selected example of a T2 MRI slice with segmented prostate; the network predictions are outlined in green. The error bars indicate ±1 standard deviation. A detailed breakdown of the training and validation datasets is available in Supplementary Table 1.
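
Two points in this caption can be illustrated with a short sketch: F1 is the harmonic mean of precision and sensitivity, so it can rise while precision falls, and the share of slices needing correction follows from the stated acceptance criterion (F1 > 0.88). The function names and all numeric inputs below are made up for illustration and are not values from the paper.

```python
from typing import Sequence

F1_THRESHOLD = 0.88  # acceptance criterion stated in the caption


def f1_from_rates(precision: float, sensitivity: float) -> float:
    """Harmonic mean of precision and sensitivity."""
    total = precision + sensitivity
    return 2 * precision * sensitivity / total if total else 0.0


def fraction_needing_correction(
    slice_f1: Sequence[float], threshold: float = F1_THRESHOLD
) -> float:
    """Share of slices whose F1 is not above the threshold."""
    if not slice_f1:
        raise ValueError("need at least one slice score")
    flagged = sum(1 for score in slice_f1 if score <= threshold)
    return flagged / len(slice_f1)


# Illustrative numbers only: precision drops (0.95 -> 0.92) but the gain in
# sensitivity (0.80 -> 0.90) still raises F1.
earlier = f1_from_rates(0.95, 0.80)
later = f1_from_rates(0.92, 0.90)
print(later > earlier)

# Two of these four hypothetical slices fall at or below the threshold.
print(fraction_needing_correction([0.95, 0.88, 0.70, 0.91]))
```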
Fig. 7 | Annotation time-savings using the H-AI-L method while comparing to baseline segmentation speed.
H-AI-L plots showing the annotation time per region normalized with respect to the baseline annotation speed of each annotator for the result shown in Fig. 2a. An exponential decay distribution (H-AI-L curve) is fitted to each annotator, where the H-AI-L factor is the exponential time constant: a derivation can be found in the Methods. The vertical lines are gaps between iterations (where the network was trained). The area under the H-AI-L curve represents the normalized annotation time per annotator. This can be compared to the area of the normalized baseline region, which represents the normalized annotation time without the H-AI-L method. a, The time-savings by annotator 1 (calculated to be 81.3%) when creating the training set used to train the glomerular segmentation network in Fig. 2. b, Annotator 2 was 82.0% faster. c, Annotator 3 was 72.7% faster. While the y axis in these plots is not a direct measure of network performance, it is highly correlated. The spike in annotation time seen at 600 regions is data from a WSI with severe glomerular damage from diabetic nephropathy. Future work will involve deriving optimal iterative training strategies based on information mined via such plots, with a goal of reducing annotation burdens for expert annotators.
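
The area comparison described in this caption can be sketched under a simplifying assumption: that normalized annotation time follows t(n) = exp(-n / tau), with n the region index and tau the H-AI-L factor, and that the baseline is a constant 1. The authors' exact derivation is in the Methods, which are not reproduced here, so this functional form and the function names are assumptions made for illustration.

```python
import math
from typing import Sequence


def fit_time_constant(normalized_times: Sequence[float]) -> float:
    """Fit tau in t(n) = exp(-n / tau) by least squares on log(t).

    Assumes every time is positive and that t(0) is about 1, so the line
    in log space is forced through the origin.
    """
    num = sum(n * math.log(t) for n, t in enumerate(normalized_times))
    den = sum(n * n for n in range(len(normalized_times)))
    if den == 0 or num >= 0:
        raise ValueError("need at least two points that decay over time")
    return -den / num  # slope is num / den, and tau = -1 / slope


def time_savings(tau: float, n_regions: int) -> float:
    """1 - (area under the decay curve) / (area under the baseline)."""
    if tau <= 0 or n_regions <= 0:
        raise ValueError("tau and n_regions must be positive")
    area_curve = tau * (1.0 - math.exp(-n_regions / tau))  # integral of exp(-n/tau)
    area_baseline = float(n_regions)                       # baseline time is 1 per region
    return 1.0 - area_curve / area_baseline


# Synthetic data generated from the assumed model with tau = 50.
times = [math.exp(-n / 50.0) for n in range(200)]
tau = fit_time_constant(times)
print(round(tau, 3), round(time_savings(tau, 200), 3))
```

Under this model a smaller tau means annotation time falls faster, so more of the baseline area is saved for the same number of regions.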
