J Appl Clin Med Phys. 2022 Aug;23(8):e13647.
doi: 10.1002/acm2.13647. Epub 2022 May 17.

Automatic contouring QA method using a deep learning-based autocontouring system

Dong Joo Rhee et al. J Appl Clin Med Phys. 2022 Aug.

Abstract

Purpose: To determine the most accurate similarity metric when using an independent system to verify automatically generated contours.

Methods: A reference autocontouring system (the primary system, used to create clinical contours) and a verification autocontouring system (a secondary system, used to test the primary contours) each contoured 6 female pelvic structures: UteroCervix (uterus + cervix), CTVn (nodal clinical target volume [CTV]), PAN (para-aortic lymph nodes), bladder, rectum, and kidneys. This produced paired contours on 49 CT scans from our institution and 38 from other institutions. Additionally, clinically acceptable and unacceptable contours were manually generated on the 49 internal CT scans. Eleven similarity metrics (volumetric Dice similarity coefficient [DSC], Hausdorff distance, 95% Hausdorff distance, mean surface distance, and surface DSC with tolerances from 1 to 10 mm) were calculated between the reference and the verification autocontours, and between the manually generated contours and the verification autocontours. A support vector machine (SVM) was used to determine, for each structure, the threshold that separates clinically acceptable from clinically unacceptable contours. The 11 metrics were investigated individually and in certain combinations. Linear, radial basis function, sigmoid, and polynomial kernels were tested with the combinations of metrics as inputs for the SVM.
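To make the similarity metrics concrete, the following is a minimal sketch of how a volumetric DSC, a surface DSC at a given tolerance, a Hausdorff distance, and a mean surface distance might be computed between a reference and a verification contour. It is not the authors' implementation. The function names, the toy coordinates, the point-count approximation of the surface DSC (the fraction of surface points lying within the tolerance of the other surface), and the pooled definition of the mean surface distance are all assumptions made for illustration. The 95% Hausdorff distance is left out because its percentile convention varies between implementations.

```python
import math


def volumetric_dsc(voxels_a, voxels_b):
    """Volumetric Dice similarity coefficient of two voxel-index collections."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 1.0  # assumption: two empty contours count as identical
    return 2.0 * len(a & b) / (len(a) + len(b))


def nearest_distances(src, dst):
    """Distance (mm) from each surface point in src to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def surface_metrics(surface_a, surface_b, tolerance_mm):
    """Point-based surface DSC, Hausdorff distance, and pooled mean surface distance."""
    distances = (nearest_distances(surface_a, surface_b)
                 + nearest_distances(surface_b, surface_a))
    within = sum(d <= tolerance_mm for d in distances)
    return {
        # Fraction of all surface points within the tolerance of the other surface.
        "surface_dsc": within / len(distances),
        # Largest nearest-neighbour distance in either direction.
        "hausdorff_mm": max(distances),
        # Mean over both directions pooled together (one of several conventions).
        "mean_surface_distance_mm": sum(distances) / len(distances),
    }


# Hypothetical toy data, not taken from the study.
reference_voxels = [(0, 0), (0, 1), (1, 0), (1, 1)]
verification_voxels = [(0, 1), (1, 1), (0, 2), (1, 2)]
reference_surface = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
verification_surface = [(0.0, 0.0), (1.0, 0.0), (2.0, 3.0)]

dsc = volumetric_dsc(reference_voxels, verification_voxels)
metrics = surface_metrics(reference_surface, verification_surface, tolerance_mm=1.0)
print(dsc, metrics)
```

In this toy case one verification point lies 3 mm from the reference surface, so the surface DSC at a 1 mm tolerance drops below 1 even though most of the surface agrees. This sensitivity to local deviations is the property a tolerance-based surface metric is meant to capture.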

Results: The highest contouring error detection accuracies were 0.91 for the UteroCervix, 0.90 for the CTVn, 0.89 for the PAN, 0.92 for the bladder, 0.95 for the rectum, and 0.97 for the kidneys; these were achieved using surface DSCs with a tolerance of 1, 2, or 3 mm. The linear kernel was the most accurate and consistent when a combination of metrics was used as the input for the SVM. However, the best model built on a combination of metrics was no more accurate than the best model built on a single surface DSC.
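The idea of a single-metric threshold evaluated by threefold cross-validation can be sketched as follows. This is a deliberately simplified illustration, not the authors' SVM code: it covers only the hard-margin, single-feature special case of a linear SVM, in which the decision boundary is the midpoint between the closest acceptable and unacceptable metric values. Overlapping classes, which the reported accuracies below 1 imply for the real data, would need a soft-margin SVM with a finite penalty parameter C. All function names and surface DSC values are hypothetical.

```python
def max_margin_threshold(acceptable, unacceptable):
    """Hard-margin 1-D linear SVM boundary: midpoint between the two support vectors."""
    highest_unacceptable = max(unacceptable)
    lowest_acceptable = min(acceptable)
    if highest_unacceptable >= lowest_acceptable:
        # Assumption of this sketch: the training classes do not overlap.
        raise ValueError("classes overlap; a soft-margin SVM is needed")
    return (highest_unacceptable + lowest_acceptable) / 2.0


def accuracy(threshold, acceptable, unacceptable):
    """Fraction of contours classified correctly by 'metric > threshold'."""
    correct = sum(v > threshold for v in acceptable)
    correct += sum(v <= threshold for v in unacceptable)
    return correct / (len(acceptable) + len(unacceptable))


def threefold_accuracies(acceptable, unacceptable):
    """Hold out every third sample in turn; fit on the rest, score on the held-out part."""
    scores = []
    for fold in range(3):
        train_ok = [v for i, v in enumerate(acceptable) if i % 3 != fold]
        test_ok = [v for i, v in enumerate(acceptable) if i % 3 == fold]
        train_bad = [v for i, v in enumerate(unacceptable) if i % 3 != fold]
        test_bad = [v for i, v in enumerate(unacceptable) if i % 3 == fold]
        threshold = max_margin_threshold(train_ok, train_bad)
        scores.append(accuracy(threshold, test_ok, test_bad))
    return scores


# Hypothetical surface DSC values (2 mm tolerance), invented for illustration.
acceptable_dsc = [0.93, 0.88, 0.77, 0.95, 0.84, 0.90]
unacceptable_dsc = [0.41, 0.66, 0.58, 0.72, 0.35, 0.62]

full_threshold = max_margin_threshold(acceptable_dsc, unacceptable_dsc)
fold_scores = threefold_accuracies(acceptable_dsc, unacceptable_dsc)
print(full_threshold, fold_scores)
```

Two of the three folds misclassify one held-out contour because the borderline sample is missing from the training split. This shows, on a small scale, why the threshold depends on having enough examples near the class boundary.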

Conclusions: We distinguished clinically acceptable contours from clinically unacceptable contours with accuracies of 0.89 to 0.97 for the targets and critical structures in patients with cervical cancer; the most accurate similarity metric was surface DSC with a tolerance of 1, 2, or 3 mm.

Keywords: auto-contour; deep learning; similarity metrics.

Conflict of interest statement

This work was partially funded by the National Cancer Institute and Varian Medical Systems.

Figures

FIGURE 1
Examples of manually generated, clinically acceptable (green) and unacceptable (red) contours for the (a) UteroCervix, (b) bladder, (c) right kidney, and (d) rectum. (e) The reference autocontour (yellow) was clinically unacceptable when the verification autocontour (blue) was clinically acceptable. (f) Both the reference and the verification autocontours were clinically unacceptable
FIGURE 2
(a) Diagram of the data acquisition process for automatic contour QA model development. (b) Each set was split equally into three for threefold cross‐validation. QA, quality assurance
FIGURE 3
Average accuracies of the contour QA model with an individual metric for each structure with various penalty parameters, C. The error bar represents ±1 standard deviation from threefold cross‐validation. QA, quality assurance
FIGURE 4
The ROC curves with a surface DSC with a tolerance of 2 mm, the best metric to predict the clinical acceptability of the automatically generated contours. DSC, Dice similarity coefficient
FIGURE 5
Average accuracies of the SVM model with multiple metrics for each structure. The error bar represents ±1 standard deviation. Four different kernels (linear, polynomial, rbf, and sigmoid) were tested. rbf, radial basis function; SVM, support vector machine
FIGURE 6
False positives can make the thresholds more generous (blue dashed lines) than the desired thresholds (brown dashed lines) and result in having more false negatives in clinical situations
FIGURE 7
The surface DSC distributions of the clinically acceptable and unacceptable kidney contours with (left) and without (right) the manually generated contours. With the manual contours, the threshold can be determined confidently; without them, it can lie anywhere between the blue and red dashed lines because there is too little data. DSC, Dice similarity coefficient
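For readers unfamiliar with how ROC curves such as those in Figure 4 are built from a single metric, the sketch below sweeps a threshold over surface DSC values and records the true and false positive rates at each step. It rests on assumptions not stated in the source: a clinically unacceptable contour is treated as the positive class, a contour is flagged when its surface DSC falls at or below the threshold, and the data are invented for illustration.

```python
def roc_points(surface_dsc, is_error):
    """(FPR, TPR) pairs as the flagging threshold rises from low to high DSC."""
    n_pos = sum(is_error)             # clinically unacceptable contours
    n_neg = len(is_error) - n_pos     # clinically acceptable contours
    ranked = sorted(zip(surface_dsc, is_error))  # lowest DSC is flagged first
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        value = ranked[i][0]
        # Process ties together so each threshold yields exactly one point.
        while i < len(ranked) and ranked[i][0] == value:
            if ranked[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (FPR, TPR) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical surface DSC values and labels (1 = clinically unacceptable).
dsc_values = [0.95, 0.90, 0.85, 0.70, 0.60, 0.40]
error_labels = [0, 0, 1, 0, 1, 1]

curve = roc_points(dsc_values, error_labels)
auc = area_under_curve(curve)
print(curve, auc)
```

Each point on the curve corresponds to one candidate threshold, so choosing an operating point on the ROC curve is equivalent to choosing how generous the acceptance threshold is.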
