Eur Radiol. 2024 Oct;34(10):6639-6651.
doi: 10.1007/s00330-024-10714-7. Epub 2024 Mar 27.

Enhancing a deep learning model for pulmonary nodule malignancy risk estimation in chest CT with uncertainty estimation

Dré Peeters et al. Eur Radiol. 2024 Oct.

Abstract

Objective: To investigate the effect of uncertainty estimation on the performance of a Deep Learning (DL) algorithm for estimating malignancy risk of pulmonary nodules.

Methods and materials: In this retrospective study, we integrated an uncertainty estimation method into a previously developed DL algorithm for nodule malignancy risk estimation. Uncertainty thresholds were developed using CT data from the Danish Lung Cancer Screening Trial (DLCST), containing 883 nodules (65 malignant) collected between 2004 and 2010. We used thresholds on the 90th and 95th percentiles of the uncertainty score distribution to categorize nodules into certain and uncertain groups. External validation was performed on clinical CT data from a tertiary academic center containing 374 nodules (207 malignant) collected between 2004 and 2012. DL performance was measured using area under the ROC curve (AUC) for the full set of nodules, for the certain cases and for the uncertain cases. Additionally, nodule characteristics were compared to identify trends for inducing uncertainty.
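The thresholding step can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each ensemble member outputs a malignancy probability and that "mean entropy" (the term used in the Fig. 2 caption) means the average binary entropy of those member outputs. The function names, the linear-interpolation percentile, and the sample probabilities are all hypothetical.

```python
import math

def binary_entropy(p, eps=1e-12):
    """Entropy in bits of a Bernoulli distribution with probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def mean_entropy(member_probs):
    """Assumed uncertainty score: average of the per-member entropies."""
    return sum(binary_entropy(p) for p in member_probs) / len(member_probs)

def percentile(values, q):
    """q-th percentile (0-100), linear interpolation between order statistics."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

def split_by_uncertainty(scores, threshold):
    """Indices of certain (score <= threshold) and uncertain (score > threshold) nodules."""
    certain = [i for i, s in enumerate(scores) if s <= threshold]
    uncertain = [i for i, s in enumerate(scores) if s > threshold]
    return certain, uncertain

# Hypothetical development data: one list of member probabilities per nodule.
dev_outputs = [
    [0.02, 0.03, 0.01],
    [0.95, 0.97, 0.96],
    [0.40, 0.55, 0.60],
    [0.10, 0.12, 0.08],
    [0.50, 0.45, 0.52],
]
dev_scores = [mean_entropy(m) for m in dev_outputs]
threshold_90 = percentile(dev_scores, 90)  # fixed once on the development set
certain_idx, uncertain_idx = split_by_uncertainty(dev_scores, threshold_90)
```

For external validation, the same `threshold_90` would be applied unchanged to the uncertainty scores of the clinical dataset rather than recomputed there, which matches the use of DLCST-derived thresholds described above.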

Results: The DL algorithm performed significantly worse in the uncertain group compared to the certain group of DLCST (AUC 0.62 (95% CI: 0.49, 0.76) vs 0.93 (95% CI: 0.88, 0.97); p < .001) and the clinical dataset (AUC 0.62 (95% CI: 0.50, 0.73) vs 0.90 (95% CI: 0.86, 0.94); p < .001). The uncertain group included larger benign nodules as well as more part-solid and non-solid nodules than the certain group.
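The group-wise comparison can be illustrated with a short sketch that computes AUC separately for the full set, the certain group, and the uncertain group. The AUC here is the rank-based (Mann-Whitney) estimate; the labels, risk scores, and group flags are invented for illustration, and the confidence intervals and significance test reported above are not reproduced because the abstract does not state how they were computed.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (malignant, benign) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one malignant and one benign case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_by_group(labels, risks, is_uncertain):
    """AUC for all nodules, for the certain group, and for the uncertain group."""
    def subset(flag):
        keep = [i for i, u in enumerate(is_uncertain) if u == flag]
        return [labels[i] for i in keep], [risks[i] for i in keep]
    return {
        "all": roc_auc(labels, risks),
        "certain": roc_auc(*subset(False)),
        "uncertain": roc_auc(*subset(True)),
    }

# Hypothetical data: 1 = malignant, 0 = benign.
labels = [0, 0, 1, 1, 0, 1, 1, 0]
risks = [0.1, 0.2, 0.8, 0.9, 0.6, 0.4, 0.7, 0.3]
is_uncertain = [False, False, False, False, True, True, False, True]

result = auc_by_group(labels, risks, is_uncertain)
```

In this toy data the certain group is perfectly separated while the uncertain group is at chance level, which mirrors the direction, though not the magnitude, of the reported difference.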

Conclusion: The integrated uncertainty estimation showed excellent performance for identifying uncertain cases in which the DL-based nodule malignancy risk estimation algorithm had significantly worse performance.

Clinical relevance statement: Deep Learning algorithms often lack the ability to gauge and communicate uncertainty. For safe clinical implementation, uncertainty estimation is of pivotal importance to identify cases where the deep learning algorithm harbors doubt in its prediction.

Key points:
• Deep learning (DL) algorithms often lack uncertainty estimation, which could reduce the risk of errors and improve safety during clinical adoption of the DL algorithm.
• Uncertainty estimation identifies pulmonary nodules in which the discriminative performance of the DL algorithm is significantly worse.
• Uncertainty estimation can further enhance the benefits of the DL algorithm and improve its safety and trustworthiness.

Keywords: Deep learning; Multiple pulmonary nodules; Tomography (X-ray computed); Uncertainty.

Conflict of interest statement

The authors declare the following competing interests:

MP receives grants from Canon Medical Systems, Siemens Healthineers; royalties from Mevis Medical Solutions; and payment for lectures from Canon Medical Systems and Siemens Healthineers. The host institution of MP is a minority shareholder in Thirona. He reports no other relationships that are related to the subject matter of the article.

The host institution of CJ receives research grants and royalties from MeVis Medical Solutions, Bremen, Germany, and payment for lectures from Canon Medical Systems. CJ is a collaborator in a public-private research project where Radboudumc collaborates with Philips Medical Systems (Best, the Netherlands). CJ is a member of the Scientific Editorial Board for European Radiology (Imaging Informatics and Artificial Intelligence). He has not taken part in the selection or review processes for this article. He reports no other relationships that are related to the subject matter of the article.

The host institution of HH receives grants from Siemens Healthineers. He reports no other relationships that are related to the subject matter of the article.

RV is supported by an institutional research grant from Siemens Healthineers.

Figures

Fig. 1
Flow chart of the data collection and selection of pulmonary nodules. a Nodules from the DLCST were used for the development of the uncertainty estimations. b Incidental nodules from the clinical dataset for validation of the uncertainty estimations. Retrieval errors may be due to anonymization, image quality, protected patients, or scan availability. Radiologist score of 0, 1, or 2 indicates no lesion, a benign nodule, or indeterminate nodule in the tumor-bearing lobe, respectively
Fig. 2
Schematic overview of how the uncertainty score is utilized to split the dataset into a certain and uncertain group. A nodule block (50 × 50 × 50 mm) is used as input of the DL algorithm that outputs a malignancy risk and uncertainty score. The uncertainty score is determined based on the score of the individual algorithms in the ensemble. A 90th/95th percentile cut-off value on the uncertainty distribution of all nodules in the dataset is used to split it into a certain and uncertain group to compare algorithm performance. The uncertainty distribution is based on the mean entropy of the individual outputs of the DL algorithm
Fig. 3
AUC for a nodule malignancy risk estimation task when using mean entropy to determine certain and uncertain cases of the DLCST dataset. AUC: area under the receiver operating characteristic curve
Fig. 4
AUC for a nodule malignancy risk estimation task when using the DLCST 90th and 95th percentile threshold of mean entropy to determine certain and uncertain cases of the clinical dataset. AUC: area under the receiver operating characteristic curve
Fig. 5
Examples of uncertain cases from the Danish Lung Cancer Screening Trial (DLCST) dataset and the Clinical dataset. Numbers in the bottom right corner of each image indicate the predicted DL malignancy risk, with an extent of color filling in the rings that is proportional to the malignancy risk. A malignancy risk of 0 represents the lowest risk, and 1 represents the highest risk. Arrows indicate the nodule location. DL: Deep Learning Malignancy Risk Estimation. Small: < 6 mm, medium: ≥ 6 to < 8 mm and large: ≥ 8 mm
Fig. 6
Examples of certain cases from the Danish Lung Cancer Screening Trial (DLCST) dataset and the Clinical dataset. Numbers in the bottom right corner of each image indicate the predicted DL malignancy risk, with an extent of color filling in the rings that is proportional to the malignancy risk. A malignancy risk of 0 represents the lowest risk, and 1 represents the highest risk. Arrows indicate the nodule location. DL: Deep Learning Malignancy Risk Estimation. Small: < 6 mm, medium: ≥ 6 to < 8 mm, and large: ≥ 8 mm
