PLoS One. 2021 Jun 25;16(6):e0253829. doi: 10.1371/journal.pone.0253829. eCollection 2021.

Harnessing clinical annotations to improve deep learning performance in prostate segmentation

Karthik V Sarma et al. PLoS One. 2021.

Abstract

Purpose: Developing large-scale datasets with research-quality annotations is challenging due to the high cost of refining clinically generated markup into high-precision annotations. We evaluated the direct use of a large dataset with only clinically generated annotations in the development of high-performance segmentation models for small research-quality challenge datasets.

Materials and methods: We used a large retrospective dataset from our institution comprising 1,620 clinically generated segmentations, along with two challenge datasets (PROMISE12: 50 patients; ProstateX-2: 99 patients). We trained a 3D U-Net convolutional neural network (CNN) segmentation model on our entire dataset and used that model as a template for training models on the challenge datasets. We also trained versions of the template model using ablated proportions of our dataset and evaluated the relative benefit of those templates for the final models. Finally, we trained a version of the template model using an out-of-domain brain cancer dataset and evaluated the relative benefit of that template for the final models. We used five-fold cross-validation (CV) for all training and evaluation across our entire dataset.
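Five-fold CV partitions the dataset into disjoint folds, each held out once for evaluation while the remaining four train the model. A minimal sketch of how such a split could be generated (illustrative only; the paper does not publish its splitting code, and the function name here is our own):

```python
import random

def five_fold_splits(n_samples: int, seed: int = 0):
    """Partition sample indices into five disjoint folds; each fold is
    held out once for validation while the remaining four train."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # deterministic shuffle
    folds = [idx[i::5] for i in range(5)]     # round-robin assignment
    return [(sorted(set(idx) - set(f)), sorted(f)) for f in folds]

# Every sample appears in exactly one validation fold.
splits = five_fold_splits(50)
```

Shuffling before the round-robin assignment avoids folds that mirror acquisition order, which matters when consecutive studies share a scanner or time period.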

Results: Our model achieved state-of-the-art performance on our large dataset (mean overall Dice 0.916, average Hausdorff distance 0.135 across CV folds). Using this model as a pre-trained template for refinement on the two external datasets significantly enhanced performance (30% and 49% improvements in Dice scores, respectively). Mean overall Dice and mean average Hausdorff distance were 0.912 and 0.150 for the ProstateX-2 dataset, and 0.852 and 0.581 for the PROMISE12 dataset. Even small quantities of template-training data enhanced performance, with significant improvements when 5% or more of the data were used.
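The soft Dice coefficient reported above measures voxelwise overlap between a probabilistic prediction and a binary ground-truth mask. A minimal sketch of the standard formulation, 2·Σ(p·g)/(Σp + Σg) (assumed here; the paper does not print its exact implementation):

```python
import numpy as np

def soft_dice(pred: np.ndarray, truth: np.ndarray, eps: float = 1e-7) -> float:
    """Soft Dice coefficient: twice the voxelwise intersection divided by
    the total mass of both volumes (eps guards against division by zero)."""
    pred = pred.ravel().astype(float)
    truth = truth.ravel().astype(float)
    intersection = np.sum(pred * truth)
    return float((2.0 * intersection + eps) / (pred.sum() + truth.sum() + eps))

mask = np.array([[0, 1, 1], [0, 1, 0]])
score = soft_dice(mask, mask)  # perfect overlap scores ~1.0
```

Because `pred` may be a continuous softmax output rather than a hard mask, this metric is differentiable and is commonly used directly as a training loss (as 1 − Dice).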

Conclusion: We trained a state-of-the-art model using unrefined clinical prostate annotations and found that its use as a template model significantly improved performance in other prostate segmentation tasks, even when trained with only 5% of the original dataset.


Conflict of interest statement

LSM and AMP report a financial interest in Avenda Health outside the submitted work. BT reports IP-related royalties from Philips outside the submitted work. The NIH has cooperative research and development agreements with NVIDIA, Philips, Siemens, Xact Robotics, Celsion Corp, and Boston Scientific outside the submitted work. The NIH has research partnerships with Angiodynamics, ArciTrax, and Exact Imaging outside the submitted work. CWA has received research equipment from NVIDIA Corporation, outside the submitted work. No commercial funding or equipment was used in the execution of this study. No other authors have competing interests to disclose.

Figures

Fig 1
Fig 1. 3D U-Net model diagram and preprocessing steps.
A) Network diagram of the 3D U-Net used in this study. Numbers within the ovals represent the number of feature maps at that layer. Connections represent network operations: 3x3x3 3D convolution (“Conv”), 2x2x2 max pooling (“Max Pool”), 3x3x3 3D transposed convolution (“Deconv”), skip feature map concatenation (“Concat”), batch normalization (“BN”), rectified linear unit activation (“ReLU”), and softmax output (“Softmax”). B) Process diagram of preprocessing steps. Once images were imported from the archive (either PACS or challenge download), N4ITK bias field correction was applied. Images were then resampled to 1 mm isotropic resolution and IQR normalized. During training, real-time augmentation was applied to each input image to create the training sample for that epoch.
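The IQR normalization step in panel B can be sketched as centering each volume’s intensities at the median and scaling by the interquartile range, a robust alternative to z-scoring. The exact formula and function name here are our assumptions, not taken from the paper:

```python
import numpy as np

def iqr_normalize(volume: np.ndarray) -> np.ndarray:
    """Subtract the median intensity and divide by the interquartile
    range, so the normalization resists outlier voxel values."""
    q1, med, q3 = np.percentile(volume, [25, 50, 75])
    return (volume - med) / (q3 - q1)

vol = np.arange(101.0)          # toy "volume" with known quartiles
norm = iqr_normalize(vol)       # median -> 0, IQR -> 1
```

Median/IQR scaling is less sensitive than mean/std scaling to the bright outlier voxels common in MRI, which is why it is a popular intensity-normalization choice for this modality.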
Fig 2
Fig 2. Example UCLA baseline model segmentations.
The orange contour depicts ground truth segmentation and the shaded blue area depicts model segmentation. A) Example apex, midgland, and base slice from a sample in the primary dataset with a high metric on evaluation. The soft Dice coefficient for this sample was 0.928, and the average Hausdorff distance was 0.085. Images of all of the slices for this study are presented in S1 Fig. B) Example apex, midgland, and base slice from a sample in the primary dataset with a low metric on evaluation. The soft Dice coefficient for this sample was 0.738, and the average Hausdorff distance was 0.935. Images of all of the slices for this study are presented in S2 Fig.
Fig 3
Fig 3. Evaluation metrics for PX2 and P12 datasets.
Soft Dice coefficients (A) and average Hausdorff distances (B) for every sample in the ProstateX-2 (PX2, n = 99) and PROMISE12 (P12, n = 50) datasets, after evaluation of the baseline, BraTS, and refined primary baseline models. Each solid dot represents a single training example. The models trained by refining the BraTS pretrained model or the primary baseline pretrained model both exhibited improved performance and reduced variance on both evaluation metrics, with the refined primary baseline model exhibiting the highest performance and lowest variance. Detailed statistics are available in Tables 1, 2, and 4.
Fig 4
Fig 4. Soft Dice coefficients for models trained with ablated dataset.
Soft Dice coefficients for models trained using the ablated primary dataset (“Primary”) or trained using an ablated primary model as the weight initializer (“FT”). PX2 = ProstateX-2, P12 = PROMISE12, FT = fine-tuned. Performance of the fine-tuned models improved significantly once 5% of the primary dataset was used to train the ablated primary baseline model, with the benefit leveling off at 60% of the dataset.
