Med Phys. 2024 Mar;51(3):1687-1701. doi: 10.1002/mp.16937. Epub 2024 Jan 15.

Robust explanation supervision for false positive reduction in pulmonary nodule detection


Qilong Zhao et al. Med Phys. 2024 Mar.

Abstract

Background: Lung cancer is the deadliest and second most common cancer in the United States, in part because early-stage disease often produces no symptoms. Pulmonary nodules are small abnormal regions that can be correlated with the occurrence of lung cancer. Early detection of these nodules is critical because it can significantly improve patients' survival rates. Thoracic thin-slice computed tomography (CT) scanning has emerged as a widely used method for diagnosing and assessing the prognosis of lung abnormalities.

Purpose: The standard clinical workflow for detecting pulmonary nodules relies on radiologists analyzing CT images to assess the risk factors of cancerous nodules. However, this approach can be error-prone because nodules form for various reasons, such as pollutants and infections. Deep learning (DL) algorithms have recently demonstrated remarkable success in medical image classification and segmentation. As DL becomes an ever more important assistant to radiologists in nodule detection, it is imperative to ensure that the algorithm and the radiologist can understand each other's decisions. This study aims to develop a framework integrating explainable AI methods to achieve accurate pulmonary nodule detection.

Methods: A robust and explainable detection (RXD) framework is proposed, focusing on reducing false positives in pulmonary nodule detection. Its implementation is based on an explanation supervision method, which uses radiologists' nodule contours as supervision signals to force the model to learn nodule morphologies, improving its ability to learn from small datasets. In addition, two imputation methods are applied to the nodule region annotations to reduce the noise within human annotations and give the model robust attributions that meet human expectations. Sets of 480, 265, and 265 CT images from the public Lung Image Database Consortium and Image Database Resource Initiative (LIDC-IDRI) dataset are used for training, validation, and testing, respectively.
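The explanation-supervision idea described above can be sketched as a weighted sum of a prediction loss and a penalty on the distance between the model's attention map and the (imputed) nodule annotation. The function names, the binary cross-entropy prediction loss, and the MSE explanation penalty below are illustrative assumptions, not the paper's exact implementation:

```python
# Sketch of an explanation-supervision objective: the total training loss
# combines a standard prediction loss with a penalty on the mismatch between
# the model's attention map and the nodule-region annotation.
import math

def bce(p, y):
    """Binary cross-entropy for a single predicted probability p and label y."""
    eps = 1e-7
    p = min(max(p, eps), 1 - eps)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))

def attention_mse(attn, annot):
    """Mean squared error between a flattened attention map and annotation mask."""
    return sum((a - m) ** 2 for a, m in zip(attn, annot)) / len(attn)

def supervision_loss(pred, label, attn, annot, alpha):
    """Total loss = prediction loss + alpha * explanation loss."""
    return bce(pred, label) + alpha * attention_mse(attn, annot)

# An attention map that exactly matches the annotation adds no penalty:
annot = [0.0, 1.0, 1.0, 0.0]
loss = supervision_loss(0.9, 1, annot, annot, alpha=1.0)
```

The hyper-parameter alpha here plays the role of the weighting studied in Figure 4, trading off classification accuracy against explanation alignment.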

Results: Using only 10, 30, 50, and 100 training samples sequentially, our method consistently improves the classification performance and explanation quality of the baseline in terms of Area Under the Curve (AUC) and Intersection over Union (IoU). In particular, our framework with a learnable imputation kernel improves IoU over the baseline by 24.0% to 80.0%. A pre-defined Gaussian imputation kernel achieves an even greater improvement, from 38.4% to 118.8% over the baseline. Compared to the baseline trained on 100 samples, our method shows a smaller drop in AUC when trained on fewer samples. A comprehensive comparison of interpretability shows that our method aligns better with expert opinions.
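The IoU scores above compare a thresholded attention map against the ground-truth nodule mask (the effect of the threshold itself is examined in Figures 10 and 11). A minimal sketch of that metric, with illustrative names:

```python
# Sketch of explanation-quality scoring: binarize the attention map at a
# threshold, then compute intersection-over-union with the binary mask.
def iou(attention, mask, threshold=0.5):
    """IoU between a thresholded flat attention map and a flat 0/1 mask."""
    pred = [1 if a >= threshold else 0 for a in attention]
    inter = sum(p & m for p, m in zip(pred, mask))
    union = sum(p | m for p, m in zip(pred, mask))
    return inter / union if union else 1.0  # two empty masks agree perfectly

# One attention pixel overlaps the mask, three pixels are in the union:
score = iou([0.9, 0.6, 0.2, 0.1], [1, 0, 1, 0])  # 1/3
```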

Conclusions: A pulmonary nodule detection framework was demonstrated using public thoracic CT image datasets. The framework integrates the robust explanation supervision (RES) technique to ensure accurate nodule classification and morphologically faithful explanations. The method can reduce radiologists' workload and enable them to focus on the diagnosis and prognosis of potentially cancerous pulmonary nodules at an early stage, improving outcomes for lung cancer patients.

Keywords: deep learning; explainable AI; pulmonary nodule detection.


Conflict of interest statement

The authors have no conflicts of interest to disclose.

Figures

Figure 1.
Examples of positive and negative samples, along with the corresponding explanation annotations. The explanation annotation of the positive sample is a black and white mask, where the white region represents the nodule region, which has a value of 1 at each pixel. The black region represents the non-nodule region, which has a value of 0 at each pixel. As a comparison, the explanation annotation of the negative sample is a black mask with a value of 0 for each pixel.
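The mask construction in the Figure 1 caption can be sketched directly: pixels inside the radiologist's nodule contour get value 1, all others 0, and a negative sample's annotation is an all-zero mask. The function name and coordinate representation are illustrative assumptions:

```python
# Sketch of building the binary explanation annotations from Figure 1:
# a positive sample's mask is 1 inside the nodule region and 0 elsewhere;
# a negative sample's mask is all zeros.
def make_annotation(size, nodule_pixels=None):
    """Return a size x size 0/1 mask; `nodule_pixels` is a set of (row, col)
    coordinates inside the nodule contour, or None for a negative sample."""
    nodule_pixels = nodule_pixels or set()
    return [[1 if (r, c) in nodule_pixels else 0 for c in range(size)]
            for r in range(size)]

positive = make_annotation(4, {(1, 1), (1, 2), (2, 1), (2, 2)})
negative = make_annotation(4)  # all-zero mask for a negative sample
```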
Figure 2.
Overview of the RXD framework. Subfigure (a) illustrates the input data, which includes 2D image slices and nodule region annotations. For RXD-L, the input annotations keep their original shape (i.e., 224 × 224), while for RXD-G, the input annotations are first down-sampled. Eventually, RXD-L and RXD-G produce attention maps of size 7 × 7. Subfigure (b) shows the overall model architecture when the backbone model is ResNet18. Subfigure (c) shows the output, including the intermediate results (imputations and attention maps) of the model. In the case of RXD-L, the imputation is an annotation processed by the imputation layer. In the case of RXD-G, the imputation is an annotation processed by the Gaussian kernel. The attention map is produced by Grad-CAM (described in Section 2.3.3) and can be used to generate visualizations. Subfigure (d) contains symbol descriptions.
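The caption states only that RXD-G annotations are down sampled from 224 × 224 to the 7 × 7 attention-map size; average pooling over 32 × 32 blocks (224 / 7 = 32) is one plausible way to do this, sketched below as an assumption:

```python
# Sketch of down-sampling a square binary annotation to the attention-map
# resolution by average pooling over equal blocks (e.g., 224 -> 7 uses
# 32 x 32 blocks). The pooling choice is illustrative, not the paper's code.
def downsample(mask, out_size=7):
    """Average-pool a square 2D mask down to out_size x out_size."""
    n = len(mask)
    block = n // out_size  # assumes n is divisible by out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            total = sum(mask[i * block + r][j * block + c]
                        for r in range(block) for c in range(block))
            row.append(total / (block * block))
        out.append(row)
    return out

# A 4x4 mask pooled to 2x2: each output cell averages a 2x2 block.
pooled = downsample([[1, 1, 0, 0]] * 4, out_size=2)
```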
Figure 3.
Illustration of the different imputation methods of RXD-G and RXD-L, where L-imputation and G-imputation represent the imputation methods of RXD-L and RXD-G, respectively. For RXD-G, original explanation annotations are first down-sampled to 7 × 7. Then, the down-sampled annotations are imputed by a 3 × 3 Gaussian kernel with a Gaussian standard deviation of 0. For RXD-L, original explanation annotations are imputed by a convolution kernel of size 64, stride 32, and padding 16.
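The G-imputation step amounts to convolving the down-sampled 7 × 7 annotation with a normalized 3 × 3 Gaussian kernel (zero padding at the borders). The concrete 1-2-1 kernel weights below are a common discrete Gaussian approximation chosen for illustration; the paper specifies only the kernel size and standard deviation:

```python
# Sketch of Gaussian imputation: smooth a small binary annotation with a
# normalized 3x3 Gaussian-style kernel, using zero padding at the borders.
def gaussian_impute(mask):
    """Convolve a square 2D mask with a normalized 3x3 1-2-1 kernel."""
    k = [[1, 2, 1], [2, 4, 2], [1, 2, 1]]  # weights sum to 16
    n = len(mask)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            acc = 0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    r, c = i + di, j + dj
                    if 0 <= r < n and 0 <= c < n:
                        acc += k[di + 1][dj + 1] * mask[r][c]
            out[i][j] = acc / 16
    return out

# A single annotated pixel spreads into a soft neighborhood after imputation:
smoothed = gaussian_impute([[0, 0, 0], [0, 1, 0], [0, 0, 0]])
```

Spreading hard 0/1 annotation edges into soft values is what lets noisy human contours act as a tolerant supervision signal rather than a pixel-exact target.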
Figure 4.
Mean AUC and IoU for different hyper-parameter α values of RXD-G with a sample size of (a) 10, (b) 30, (c) 50 and (d) 100 through five trials.
Figure 5.
Percentage decrease in AUC of baseline and RXD-G trained on smaller sample sizes compared to the AUC of baseline trained on 100 samples through five trials. The gray and blue bars represent the decrease in AUC for baseline and RXD-G, respectively. At each sample size, the longer the bar, the more the AUC decreases compared to the AUC of the baseline trained on 100 samples.
Figure 6.
Selected explanation visualization results of all methods with sample size of 10. In the first column, the ground truth nodule region is circled in red. The model-generated explanations are represented by heatmaps overlaid on the original images, where the warmer colored regions are given more importance. Baseline refers to ResNet18, MG and HAICS are comparative methods, while RXD-G and RXD-L are our methods.
Figure 7.
Selected explanation visualization results of all methods with sample size of 30. In the first column, the ground truth nodule region is circled in red. The model-generated explanations are represented by heatmaps overlaid on the original images, where the warmer colored regions are given more importance. Baseline refers to ResNet18, MG and HAICS are comparative methods, while RXD-G and RXD-L are our methods.
Figure 8.
Selected explanation visualization results of all methods with sample size of 50. In the first column, the ground truth nodule region is circled in red. The model-generated explanations are represented by heatmaps overlaid on the original images, where the warmer colored regions are given more importance. Baseline refers to ResNet18, MG and HAICS are comparative methods, while RXD-G and RXD-L are our methods.
Figure 9.
Selected explanation visualization results of all methods with sample size of 100. In the first column, the ground truth nodule region is circled in red. The model-generated explanations are represented by heatmaps overlaid on the original images, where the warmer colored regions are given more importance. Baseline refers to ResNet18, MG and HAICS are comparative methods, while RXD-G and RXD-L are our methods.
Figure 10.
The IoU obtained by each method using different attention value thresholds when the sample size is 50. The horizontal axis shows the different attention value thresholds, and the vertical axis shows the average IoU. All data points are the average of the model test results obtained through five trials.
Figure 11.
The IoU obtained by each method using different attention value thresholds when the sample size is 100. The horizontal axis shows the different attention value thresholds, and the vertical axis shows the average IoU. All data points are the average of the model test results obtained through five trials.
