Sci Rep. 2021 Apr 16;11(1):8366.
doi: 10.1038/s41598-021-87762-2.

Data valuation for medical imaging using Shapley value and application to a large-scale chest X-ray dataset

Siyi Tang et al. Sci Rep.

Abstract

The reliability of machine learning models can be compromised when trained on low quality data. Many large-scale medical imaging datasets contain low quality labels extracted from sources such as medical reports. Moreover, images within a dataset may have heterogeneous quality due to artifacts and biases arising from equipment or measurement errors. Therefore, algorithms that can automatically identify low quality data are highly desired. In this study, we used data Shapley, a data valuation metric, to quantify the value of training data to the performance of a pneumonia detection algorithm in a large chest X-ray dataset. We characterized the effectiveness of data Shapley in identifying low quality versus valuable data for pneumonia detection. We found that removing training data with high Shapley values decreased the pneumonia detection performance, whereas removing data with low Shapley values improved the model performance. Furthermore, there were more mislabeled examples in low Shapley value data and more true pneumonia cases in high Shapley value data. Our results suggest that low Shapley value indicates mislabeled or poor quality images, whereas high Shapley value indicates data that are valuable for pneumonia detection. Our method can serve as a framework for using data Shapley to denoise large-scale medical imaging datasets.
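
For orientation, the data Shapley value mentioned above can be written out as follows. This is the definition as it is usually given in the data Shapley literature, not a formula quoted from this abstract. The symbols are assumptions introduced here: D is the training set with n points, V(S) is the performance score of a model trained on a subset S, and C is an arbitrary positive constant.

```latex
\phi_i \;=\; C \sum_{S \subseteq D \setminus \{i\}}
  \frac{V\bigl(S \cup \{i\}\bigr) - V(S)}{\binom{n-1}{|S|}}
```

Each term is the marginal change in performance when datum i is added to a subset S. A datum whose inclusion tends to lower V across subsets therefore receives a low or negative value, which is consistent with the reading above that low values flag mislabeled or poor quality images.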

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Overview of our method. (a) The input data were chest X-ray images and their corresponding labels (1 for pneumonia and 0 for no pneumonia) from the ChestX-ray14 dataset. (b) To compute data Shapley values for the training data, we first extracted feature vectors from a pre-trained convolutional neural network (CNN), CheXNet. Next, we applied TMC-Shapley to approximate the Shapley value of each training datum, where the supervised learning algorithm was logistic regression and the predictor performance score was prediction accuracy for pneumonia.
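
The pipeline in panel (b) might be sketched roughly as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the 2-D toy features from the hypothetical helper `make_toy_data` stand in for CheXNet feature vectors, the logistic regression is a small hand-written gradient-descent version, and the names `train_logreg`, `accuracy`, `tmc_shapley`, `n_perms` and `tol` are all introduced here for illustration.

```python
import math
import random


def train_logreg(X, y, steps=50, lr=0.5):
    """Fit logistic regression by batch gradient descent; the bias is the last weight."""
    d = len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(steps):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
            p = 1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, z))))  # clamp z for stability
            for j in range(d):
                grad[j] += (p - yi) * xi[j]
            grad[-1] += p - yi
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def accuracy(w, X, y):
    """Fraction of points whose predicted label (1 if z > 0, else 0) matches y."""
    correct = 0
    for xi, yi in zip(X, y):
        z = sum(wj * xj for wj, xj in zip(w, xi)) + w[-1]
        correct += int((z > 0) == (yi == 1))
    return correct / len(X)


def tmc_shapley(X, y, X_val, y_val, n_perms=30, tol=0.01, seed=0):
    """Truncated Monte Carlo estimate of each training point's Shapley value."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])

    def utility(idx):
        # Performance score: validation accuracy of a model trained on subset idx.
        # Assumption: the empty subset is scored with all-zero weights.
        if not idx:
            return accuracy([0.0] * (d + 1), X_val, y_val)
        return accuracy(train_logreg([X[i] for i in idx], [y[i] for i in idx]), X_val, y_val)

    full_score = utility(list(range(n)))
    values = [0.0] * n
    for t in range(1, n_perms + 1):
        perm = list(range(n))
        rng.shuffle(perm)
        prev, prefix = utility([]), []
        for i in perm:
            prefix.append(i)
            if abs(full_score - prev) < tol:
                marginal = 0.0  # truncation: score is already close to the full-data score
            else:
                new = utility(prefix)
                marginal, prev = new - prev, new
            values[i] += (marginal - values[i]) / t  # running mean over permutations
    return values


def make_toy_data(n, flip=0, seed=0):
    """Two 2-D Gaussian blobs; the first `flip` labels are deliberately flipped."""
    rng = random.Random(seed)
    X, y = [], []
    for k in range(n):
        label = k % 2
        centre = 1.0 if label else -1.0
        X.append([rng.gauss(centre, 0.7), rng.gauss(centre, 0.7)])
        y.append(1 - label if k < flip else label)
    return X, y


X_train, y_train = make_toy_data(20, flip=3, seed=1)
X_val, y_val = make_toy_data(40, seed=2)
values = tmc_shapley(X_train, y_train, X_val, y_val)
ranked = sorted(range(len(values)), key=lambda i: values[i])
print("lowest-valued training indices:", ranked[:3])
```

In this toy setup the training points at indices 0 to 2 carry flipped labels, so one would hope to see them near the bottom of the ranking, although a Monte Carlo estimate on so few points does not guarantee it. With `tol=0` no truncation occurs and the values sum to the full-data score minus the empty-set score.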
Figure 2
(a)–(c) Effects of removing high value data points on pneumonia detection performance. We removed the most valuable data points from the training set, as ranked by the TMC-Shapley, leave-one-out (LOO) and uniform sampling (random) methods. We trained a new logistic regression model each time a further 1% of the training data points was removed. The x-axis shows the percentage of training data removed, and the y-axis shows the model performance on the held-out test set in terms of (a) accuracy, (b) precision and (c) recall. Removing the most valuable data points identified by the TMC-Shapley method decreased the model performance more than removing those identified by LOO or removing data at random. We note that after more than 20% of the training data had been removed, the precision and recall scores for TMC-Shapley increased slightly, which might be because the percentage of positive cases increased once 20% of the training data had been removed (see Supplementary Figure S2a). (d)–(f) Effects of removing low value data points on pneumonia detection performance. Conversely, we removed the least valuable data points from the training set. Removing the least valuable data points identified by the TMC-Shapley method improved the model performance in terms of (d) accuracy and (f) recall.
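
The removal experiment described in this caption could be sketched along the following lines. This is an illustrative outline only: `removal_curve` and `fit_and_score` are hypothetical names, the 1-D threshold classifier is a stand-in for retraining logistic regression, and the toy data and `toy_values` are made up for the example rather than taken from the study.

```python
def removal_curve(values, fit_and_score, step_frac=0.01, max_frac=0.5, remove_high=True):
    """Retrain after removing data in order of value; return (fraction removed, score) pairs.

    values        -- one valuation score per training point (e.g. from TMC-Shapley or LOO)
    fit_and_score -- callable taking the kept training indices, returning a test-set score
    remove_high   -- True removes the most valuable points first, False the least valuable
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i], reverse=remove_high)
    step = max(1, round(step_frac * n))
    curve, removed = [], 0
    while removed <= max_frac * n and removed < n:
        kept = order[removed:]
        curve.append((removed / n, fit_and_score(kept)))
        removed += step
    return curve


# Toy stand-in data: one feature per point; label 1 tends to have larger x.
# Indices 0 and 1 are deliberately mislabeled.
train_x = [2.1, -1.8, -1.0, 1.2, -0.7, 0.9, -1.3, 1.5]
train_y = [0, 1, 0, 1, 0, 1, 0, 1]
test_set = [(-1.1, 0), (1.0, 1), (-0.4, 0), (0.6, 1)]


def fit_and_score(kept):
    """Stand-in learner: threshold halfway between class means, scored by test accuracy."""
    pos = [train_x[i] for i in kept if train_y[i] == 1]
    neg = [train_x[i] for i in kept if train_y[i] == 0]
    if not pos or not neg:
        return 0.5  # assumption: fall back to chance level if a class is missing
    thr = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return sum((x > thr) == (y == 1) for x, y in test_set) / len(test_set)


# Hypothetical valuation scores, as if produced by a data valuation method.
toy_values = [-0.05, -0.04, 0.02, 0.03, 0.01, 0.02, 0.02, 0.03]

for frac, score in removal_curve(toy_values, fit_and_score, step_frac=0.125, remove_high=False):
    print(f"removed {frac:.3f} of training data -> accuracy {score:.2f}")
```

Calling the same function with a random ordering of `values`, or with LOO scores, would give the baseline curves that the caption compares against.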
Figure 3
Example heatmaps for (a) low value images that were mislabeled as pneumonia, (b) low value images that were mislabeled as no pneumonia, and (c) high value images that were mislabeled as pneumonia. (a) Heatmaps show low activations in relevant areas in the lung but high activations in irrelevant areas outside the lung, suggesting that the input feature vectors favored no pneumonia over pneumonia. (b) Heatmaps show high activations in lung areas indicating pneumonia, suggesting that the input feature vectors favored pneumonia over no pneumonia. (c) Heatmaps show high activations in abnormal lung areas. The first three images show abnormal opacity. The fourth image shows interstitial patterns, predominantly in the upper lung fields. The last image shows an abnormal mass in the upper right lung field (i.e. the upper left of the image).
