Sci Rep. 2021 Apr 16;11(1):8366.

doi: 10.1038/s41598-021-87762-2.

Data valuation for medical imaging using Shapley value and application to a large-scale chest X-ray dataset

Siyi Tang¹, Amirata Ghorbani¹, Rikiya Yamashita², Sameer Rehman³, Jared A Dunnmon⁴, James Zou¹ ² ⁴, Daniel L Rubin⁵ ⁶

Affiliations

¹ Department of Electrical Engineering, Stanford University, Stanford, CA, USA.
² Department of Biomedical Data Science, Stanford University, Stanford, CA, USA.
³ Department of Radiology, Stanford University, Stanford, CA, USA.
⁴ Department of Computer Science, Stanford University, Stanford, CA, USA.
⁵ Department of Biomedical Data Science, Stanford University, Stanford, CA, USA. dlrubin@stanford.edu.
⁶ Department of Radiology, Stanford University, Stanford, CA, USA. dlrubin@stanford.edu.

PMID: 33863957
PMCID: PMC8052417
DOI: 10.1038/s41598-021-87762-2

Siyi Tang et al. Sci Rep. 2021.

Abstract

The reliability of machine learning models can be compromised when they are trained on low-quality data. Many large-scale medical imaging datasets contain low-quality labels extracted from sources such as medical reports. Moreover, images within a dataset may have heterogeneous quality due to artifacts and biases arising from equipment or measurement errors. Therefore, algorithms that can automatically identify low-quality data are highly desired. In this study, we used data Shapley, a data valuation metric, to quantify the value of training data to the performance of a pneumonia detection algorithm in a large chest X-ray dataset. We characterized the effectiveness of data Shapley in identifying low-quality versus valuable data for pneumonia detection. We found that removing training data with high Shapley values decreased the pneumonia detection performance, whereas removing data with low Shapley values improved the model performance. Furthermore, there were more mislabeled examples in low Shapley value data and more true pneumonia cases in high Shapley value data. Our results suggest that a low Shapley value indicates mislabeled or poor-quality images, whereas a high Shapley value indicates data that are valuable for pneumonia detection. Our method can serve as a framework for using data Shapley to denoise large-scale medical imaging datasets.
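
For orientation, the data Shapley value referred to in the abstract is commonly written as below. The notation is an assumption introduced here, not taken from this page: D is the training set of n points, V(S) is the performance score of a model trained on a subset S, and C is a constant (C = 1/n gives the classical Shapley value).

```latex
\phi_i \;=\; C \sum_{S \subseteq D \setminus \{i\}}
  \frac{V\bigl(S \cup \{i\}\bigr) - V(S)}{\binom{n-1}{|S|}}
\qquad\text{equivalently, for } C = \tfrac{1}{n}:\qquad
\phi_i \;=\; \mathbb{E}_{\pi}\!\left[\, V\bigl(S_{\pi}^{i} \cup \{i\}\bigr) - V\bigl(S_{\pi}^{i}\bigr) \right]
```

Here S_π^i denotes the points that precede i in a uniformly random permutation π of D. The permutation form is what Monte Carlo approximations sample from.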

Conflict of interest statement

The authors declare no competing interests.

Figures

**Figure 1**
Overview of our method. (a) The input data were chest X-ray images and their corresponding labels (1 for pneumonia and 0 for no pneumonia) from the ChestX-ray14 dataset. (b) To compute data Shapley values for the training data, we first extracted feature vectors from a pre-trained convolutional neural network (CNN), CheXNet. Next, we applied TMC-Shapley to approximate the Shapley value of each training datum, where the supervised learning algorithm was logistic regression, and the predictor performance score was prediction accuracy for pneumonia.
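
The pipeline in panel (b) can be illustrated with a minimal sketch of truncated Monte Carlo (TMC) Shapley. This is not the authors' code: the synthetic features stand in for CheXNet feature vectors, the gradient-descent logistic regression stands in for whatever solver was actually used, and the function names, the majority-class baseline for very small subsets, and the parameters `n_perms` and `tol` are all assumptions made for illustration.

```python
import numpy as np

def fit_logreg(X, y, steps=200, lr=0.5):
    """Plain gradient-descent logistic regression (assumed stand-in solver)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def score(X_tr, y_tr, X_val, y_val):
    """Performance score V(S): validation accuracy of a model trained on S."""
    if len(y_tr) == 0 or len(np.unique(y_tr)) < 2:
        # Assumption: with no data or a single class, use the majority-class baseline.
        m = float(np.mean(y_val))
        return max(m, 1.0 - m)
    w = fit_logreg(X_tr, y_tr)
    Xb = np.hstack([X_val, np.ones((len(X_val), 1))])
    return float(np.mean((Xb @ w > 0) == (y_val == 1)))

def tmc_shapley(X, y, X_val, y_val, n_perms=20, tol=0.01, seed=0):
    """Approximate each training point's Shapley value by sampling permutations."""
    rng = np.random.default_rng(seed)
    n = len(y)
    values = np.zeros(n)
    full = score(X, y, X_val, y_val)            # V(D)
    empty = score(X[:0], y[:0], X_val, y_val)   # V(empty set)
    for t in range(1, n_perms + 1):
        perm = rng.permutation(n)
        prev = empty
        marginals = np.zeros(n)
        for j, i in enumerate(perm):
            if abs(full - prev) < tol:
                break  # truncation: remaining marginal contributions are taken as 0
            cur = score(X[perm[: j + 1]], y[perm[: j + 1]], X_val, y_val)
            marginals[i] = cur - prev
            prev = cur
        values += (marginals - values) / t      # running mean over permutations
    return values

# Synthetic demo: 30 training points, the first 5 with deliberately flipped labels.
rng = np.random.default_rng(0)
X_all = rng.normal(size=(60, 5))
y_all = (X_all @ np.array([1.5, -1.0, 0.5, 0.0, 0.0]) > 0).astype(int)
X_tr, y_tr = X_all[:30], y_all[:30].copy()
X_val, y_val = X_all[30:], y_all[30:]
y_tr[:5] = 1 - y_tr[:5]

values = tmc_shapley(X_tr, y_tr, X_val, y_val)
print("five lowest-valued training indices:", np.argsort(values)[:5])
```

With `tol=0.0` no permutation is truncated, so the estimated values sum to V(D) minus the empty-set score. A positive `tol` trades that exactness for fewer model fits, which is the point of the truncation.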

**Figure 2**
(a)–(c) Effects of removing high value data points on pneumonia detection performance. We removed the most valuable data points from the training set, as ranked by TMC-Shapley, leave-one-out (LOO) and uniform sampling (random) methods. We trained a new logistic regression model each time 1% of the training data points were removed. The x-axis shows the percentage of training data removed, and the y-axis shows the model performance on the held-out test set in terms of (a) accuracy, (b) precision and (c) recall. Removing the most valuable data points identified by the TMC-Shapley method decreased the model performance more than using LOO or randomly removing data. We note that after removing more than 20% of the training data, the precision and recall scores for TMC-Shapley values increased slightly, which might be because the percentage of positive cases increased after 20% of the training data were removed (see Supplementary Figure S2a). (d)–(f) Effects of removing low value data points on pneumonia detection performance. Conversely, we removed the least valuable data points from the training set. Removing the least valuable data points identified by the TMC-Shapley method improved the model performance in terms of prediction (d) accuracy and (f) recall.
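
The removal experiment in this caption can be sketched roughly as follows. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the data are synthetic, the `values` array holds random placeholder scores (which corresponds to the uniform-sampling baseline, not real TMC-Shapley values), and the names `fit_and_score` and `removal_curve` are hypothetical. Only accuracy is tracked here.

```python
import numpy as np

def fit_and_score(X_tr, y_tr, X_te, y_te, steps=300, lr=0.5):
    """Train a gradient-descent logistic regression and return test accuracy."""
    Xb = np.hstack([X_tr, np.ones((len(X_tr), 1))])  # append a bias column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y_tr) / len(y_tr)
    Xt = np.hstack([X_te, np.ones((len(X_te), 1))])
    return float(np.mean((Xt @ w > 0) == (y_te == 1)))

def removal_curve(X_tr, y_tr, X_te, y_te, values,
                  step=0.01, max_frac=0.30, high_first=True):
    """Retrain after removing each additional `step` fraction of ranked data."""
    order = np.argsort(values)
    if high_first:
        order = order[::-1]  # most valuable points are removed first
    n = len(values)
    fracs, accs = [], []
    for s in range(int(round(max_frac / step)) + 1):
        k = int(round(s * step * n))   # number of points removed so far
        keep = order[k:]
        fracs.append(s * step)
        accs.append(fit_and_score(X_tr[keep], y_tr[keep], X_te, y_te))
    return np.array(fracs), np.array(accs)

# Synthetic demo with placeholder value scores.
rng = np.random.default_rng(1)
X_all = rng.normal(size=(300, 5))
y_all = (X_all @ np.array([1.0, -1.0, 0.5, 0.0, 0.0]) > 0).astype(int)
X_tr, y_tr, X_te, y_te = X_all[:200], y_all[:200], X_all[200:], y_all[200:]
values = rng.random(200)  # placeholder: substitute real per-point valuations here

fracs, accs = removal_curve(X_tr, y_tr, X_te, y_te, values, high_first=True)
for f, a in zip(fracs[::10], accs[::10]):
    print(f"removed {f:.0%} of training data -> test accuracy {a:.3f}")
```

Calling `removal_curve` with `high_first=False` gives the counterpart for panels (d)–(f), where the least valuable points are removed first.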

**Figure 3**
Example heatmaps for (a) low value images that were mislabeled as pneumonia, (b) low value images that were mislabeled as no pneumonia, (c) high value images that were mislabeled as pneumonia. (a) Heatmaps show low activations in relevant areas in the lung but high activations in irrelevant areas outside the lung, suggesting that the input feature vectors favored no pneumonia over pneumonia. (b) Heatmaps show high activations in lung areas indicating pneumonia, suggesting that the input feature vectors favored pneumonia over no pneumonia. (c) Heatmaps show high activations in abnormal lung areas. The first three images show abnormal opacity. The fourth image shows interstitial patterns, predominantly in the upper lung fields. The last image shows an abnormal mass in the upper right lung field (i.e. upper left of the image).

Cited by

Towards More Efficient Data Valuation in Healthcare Federated Learning using Ensembling.
Kumar S, Lakshminarayanan A, Chang K, Guretno F, Mien IH, Kalpathy-Cramer J, Krishnaswamy P, Singh P. Distrib Collab Fed Learn Afford AI Healthc Resour Div Glob Health (2022). 2022 Sep;13573:119-129. doi: 10.1007/978-3-031-18523-6_12. Epub 2022 Oct 7. PMID: 36745141. Free PMC article.
Development and validation of interpretable machine learning models for inpatient fall events and electronic medical record integration.
Shim S, Yu JY, Jekal S, Song YJ, Moon KT, Lee JH, Yeom KM, Park SH, Cho IS, Song MR, Heo S, Hong JH. Clin Exp Emerg Med. 2022 Dec;9(4):345-353. doi: 10.15441/ceem.22.354. Epub 2022 Sep 21. PMID: 36128798. Free PMC article.
Generalization-a key challenge for responsible AI in patient-facing clinical applications.
Goetz L, Seedat N, Vandersluis R, van der Schaar M. NPJ Digit Med. 2024 May 21;7(1):126. doi: 10.1038/s41746-024-01127-3. PMID: 38773304. Free PMC article.
Evaluation of perceived urgency from single-trial EEG data elicited by upper-body vibration feedback using deep learning.
Alsuradi H, Shen J, Park W, Eid M. Sci Rep. 2024 Aug 23;14(1):19604. doi: 10.1038/s41598-024-70508-1. PMID: 39179642. Free PMC article.
Data Valuation with Gradient Similarity.
Evans NJ, Mills GB, Wu G, Song X, McWeeney S. ArXiv [Preprint]. 2024 May 13:arXiv:2405.08217v1. PMID: 38800649. Free PMC article. Preprint.

Grants and funding

U01 MH098953/MH/NIMH NIH HHS/United States