Training deep-learning segmentation models from severely limited data

Yao Zhao¹,², Dong Joo Rhee¹,², Carlos Cardenas¹, Laurence E Court¹, Jinzhong Yang¹

Affiliations

¹ Department of Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA.
² The University of Texas MD Anderson Graduate School of Biomedical Science, Houston, TX, USA.

PMID: 33474727
PMCID: PMC8058262
DOI: 10.1002/mp.14728

Yao Zhao et al. Med Phys. 2021 Apr;48(4):1697-1706. doi: 10.1002/mp.14728. Epub 2021 Feb 19.

Abstract

Purpose: To enable the generation of high-quality deep-learning segmentation models from a severely limited number of contoured cases (e.g., ~10 cases).

Methods: Thirty head and neck computed tomography (CT) scans with well-defined contours were deformably registered to 200 CT scans of the same anatomic site without contours. The resulting deformation vector fields were used to train a principal component analysis (PCA) model for each of the 30 contoured CT scans, capturing the mean deformation and its most prominent modes of variation. Each PCA model can produce an unlimited number of synthetic CT scans and corresponding contours by applying random deformations. We used 300, 600, 1000, and 2000 synthetic CT scans and contours generated from one PCA model to train V-Net, a 3D convolutional neural network architecture, to segment the parotid and submandibular glands. We repeated the training using the same numbers of training cases generated from 7, 10, 20, and 30 PCA models, with the data distributed evenly among the PCA models. Performance of the segmentation models was evaluated with Dice similarity coefficients between auto-generated contours and physician-drawn contours on 162 test CT scans for the parotid glands and another 21 test CT scans for the submandibular glands.
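
The PCA step described in the Methods can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each deformation vector field (DVF) is stored as a (Z, Y, X, 3) NumPy array of voxel displacements, that new deformations are generated by drawing each principal-mode coefficient from a zero-mean Gaussian with that mode's variance, and that 10 modes are retained. The abstract states none of these details. The function names `fit_dvf_pca` and `sample_dvf` are hypothetical, and random arrays stand in for real registration output.

```python
import numpy as np


def fit_dvf_pca(dvfs: np.ndarray, n_components: int):
    """Fit a PCA model to a stack of deformation vector fields.

    dvfs: array of shape (n_cases, Z, Y, X, 3), displacements in voxels
          (a hypothetical layout chosen for this sketch).
    Returns the flattened mean DVF, the leading principal directions
    (n_components x n_features), and the standard deviation along each.
    """
    n_cases = dvfs.shape[0]
    data = dvfs.reshape(n_cases, -1)          # one row per registration
    mean = data.mean(axis=0)
    centered = data - mean
    # Thin SVD of the centered data: rows of vt are the principal directions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    k = min(n_components, vt.shape[0])
    # Square roots of the PCA eigenvalues (sample variance along each mode).
    stddev = s[:k] / np.sqrt(max(n_cases - 1, 1))
    return mean, vt[:k], stddev


def sample_dvf(mean, components, stddev, shape, rng):
    """Draw one random DVF: the mean plus Gaussian-weighted principal modes."""
    coeffs = rng.standard_normal(len(stddev)) * stddev  # assumed Gaussian model
    return (mean + coeffs @ components).reshape(shape)


rng = np.random.default_rng(0)
dvf_shape = (8, 8, 8, 3)
# Random stand-in for DVFs from registering one contoured CT to 200 other CTs.
dvfs = rng.normal(size=(200, *dvf_shape))
mean, comps, sd = fit_dvf_pca(dvfs, n_components=10)
synthetic = [sample_dvf(mean, comps, sd, dvf_shape, rng) for _ in range(5)]
print(comps.shape, synthetic[0].shape)  # (10, 1536) (8, 8, 8, 3)
```

Drawing each coefficient independently treats the modes as uncorrelated Gaussian variables, which matches the diagonal covariance PCA produces. A practical pipeline might also clip the coefficients (for example, to a few standard deviations) to avoid implausible deformations; whether the authors did so is not stated in the abstract.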

Results: Dice values varied with the number of synthetic CT scans and the number of PCA models used to train the network. Using 2000 synthetic CT scans generated from 10 PCA models, we achieved Dice values of 82.8% ± 6.8% for the right parotid, 82.0% ± 6.9% for the left parotid, and 74.2% ± 6.8% for the submandibular glands. These results are comparable to those obtained from state-of-the-art auto-contouring approaches, including a deep learning network trained on more than 1000 contoured patients and a multi-atlas algorithm using 12 well-contoured atlases. Improvement was marginal when more than 10 PCA models or more than 2000 synthetic CT scans were used.
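
The Dice similarity coefficient used for these comparisons is 2|A ∩ B| / (|A| + |B|) for an automatic mask A and a physician-drawn mask B, so 100% means perfect overlap and 0% means none. A minimal NumPy sketch follows; the function name `dice` and the convention of returning 1.0 when both masks are empty are illustrative choices, not taken from the paper.

```python
import numpy as np


def dice(auto_mask: np.ndarray, manual_mask: np.ndarray) -> float:
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) for binary masks."""
    a = auto_mask.astype(bool)
    b = manual_mask.astype(bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * np.logical_and(a, b).sum() / denom


# Toy 1D example: each mask has 4 voxels, and 3 of them overlap.
auto = np.array([0, 1, 1, 1, 1, 0])
manual = np.array([0, 0, 1, 1, 1, 1])
print(f"{dice(auto, manual):.1%}")  # 75.0%
```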

Conclusions: We demonstrated an effective data augmentation approach to train high-quality deep learning segmentation models from a limited number of well-contoured patient cases.

Keywords: Auto-segmentation; convolutional neural networks; data augmentation; deep learning; principal component analysis.

Figures

**Fig. 1.**
General workflow of the principal component analysis (PCA) approach for generating synthetic CT scans. Deformation vector fields (DVFs) from deformable image registration (DIR) between a well-contoured CT scan and other CT scans are used to build the PCA model. The PCA model can then simulate random deformations of the contoured CT scan, creating an unlimited number of synthetic CT scans with contours.

**Fig. 2.**
Comparison of original and synthetic CT scans with contours. The top row shows the original CT scans, and the bottom row shows the corresponding synthetic CT scans.

**Fig. 3.**
Dice values as a function of the number of synthetic CT scans (left column) and the number of PCA models used to generate them (right column) for training the V-Net. Different numbers of PCA models were used to generate the synthetic CT scans (e.g., PCA1 denotes one PCA model), and the synthetic CT scans were distributed evenly among the PCA models. (a) Left parotid; (b) right parotid; (c) submandibular gland.

**Fig. 4.**
Physician-drawn contours (red colorwash) compared with auto-segmentation results (blue contours) from our networks trained on 2000 synthetic CT scans generated by 10 PCA models. (a) Parotid glands (Dice = 86.9%). (b) Submandibular glands (Dice = 82.0%).

**Fig. 5.**
Dice values as a function of the number of synthetic CT scans used to train the V-Net. Synthetic CT scans were generated from 10 PCA models and distributed evenly among them.
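
The last step of the Fig. 1 workflow, applying a simulated deformation to the contoured CT, amounts to resampling the image and its contour masks through the same DVF so that the synthetic contours stay aligned with the synthetic CT. The sketch below is an assumption-laden illustration, not the authors' code: it assumes a pull-back convention (each output voxel x takes its value from position x + u(x)), displacements in voxel units ordered (dz, dy, dx), and nearest-neighbour lookup for brevity. The abstract specifies none of these, and `warp_nearest` is a hypothetical name. Nearest-neighbour lookup keeps masks binary; for CT intensities, trilinear interpolation (for example `scipy.ndimage.map_coordinates` with `order=1`) would be the more usual choice.

```python
import numpy as np


def warp_nearest(volume: np.ndarray, dvf: np.ndarray) -> np.ndarray:
    """Resample `volume` at x + u(x) using nearest-neighbour lookup.

    volume: (Z, Y, X) array, either CT intensities or a binary contour mask.
    dvf:    (Z, Y, X, 3) displacements in voxels, ordered (dz, dy, dx)
            (a hypothetical convention chosen for this sketch).
    """
    grid = np.indices(volume.shape, dtype=float)   # voxel coordinates, (3, Z, Y, X)
    coords = grid + np.moveaxis(dvf, -1, 0)        # positions to sample from
    idx = np.rint(coords).astype(int)              # nearest voxel
    for axis, size in enumerate(volume.shape):
        np.clip(idx[axis], 0, size - 1, out=idx[axis])  # clamp at the volume border
    return volume[idx[0], idx[1], idx[2]]


ct = np.arange(4 * 4 * 4, dtype=float).reshape(4, 4, 4)
mask = np.zeros((4, 4, 4), dtype=np.uint8)
mask[1:3, 1:3, 1:3] = 1
dvf = np.zeros((4, 4, 4, 3))
dvf[..., 2] = 1.0  # every voxel samples its +x neighbour, so content moves toward -x
synthetic_ct = warp_nearest(ct, dvf)
synthetic_mask = warp_nearest(mask, dvf)  # same DVF keeps the contour aligned
print(synthetic_mask[1, 1])  # [1 1 0 0]
```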

Grants and funding

P30 CA016672/CA/NCI NIH HHS/United States