Disparities in dermatology AI performance on a diverse, curated clinical image set

Roxana Daneshjou¹,², Kailas Vodrahalli³, Roberto A Novoa¹,⁴, Melissa Jenkins¹, Weixin Liang⁵, Veronica Rotemberg⁶, Justin Ko¹, Susan M Swetter¹, Elizabeth E Bailey¹, Olivier Gevaert², Pritam Mukherjee², Michelle Phung¹, Kiana Yekrang¹, Bradley Fong¹, Rachna Sahasrabudhe¹, Johan A C Allerup¹, Utako Okata-Karigane⁷, James Zou²,³,⁵,⁸, Albert S Chiou¹

Affiliations

¹ Department of Dermatology, Stanford School of Medicine, Redwood City, CA, USA.
² Department of Biomedical Data Science, Stanford School of Medicine, Stanford, CA, USA.
³ Department of Electrical Engineering, Stanford University, Stanford, CA, USA.
⁴ Department of Pathology, Stanford School of Medicine, Stanford, CA, USA.
⁵ Department of Computer Science, Stanford University, Stanford, CA, USA.
⁶ Dermatology Service, Memorial Sloan Kettering Cancer Center, New York, NY, USA.
⁷ Department of Dermatology, Keio University School of Medicine, Tokyo, Japan.
⁸ Chan-Zuckerberg Biohub, San Francisco, CA, USA.

PMID: 35960806
PMCID: PMC9374341
DOI: 10.1126/sciadv.abq6147

Roxana Daneshjou et al. Sci Adv. 2022 Aug 12;8(32):eabq6147. doi: 10.1126/sciadv.abq6147. Epub 2022 Aug 12.

Abstract

An estimated 3 billion people lack access to dermatological care globally. Artificial intelligence (AI) may aid in triaging skin diseases and identifying malignancies. However, most AI models have not been assessed on images of diverse skin tones or uncommon diseases. Thus, we created the Diverse Dermatology Images (DDI) dataset, the first publicly available, expertly curated, and pathologically confirmed image dataset with diverse skin tones. We show that state-of-the-art dermatology AI models exhibit substantial limitations on the DDI dataset, particularly on dark skin tones and uncommon diseases. We find that dermatologists, who often label AI datasets, also perform worse on images of dark skin tones and uncommon diseases. Fine-tuning AI models on the DDI images closes the performance gap between light and dark skin tones. These findings identify important weaknesses and biases in dermatology AI that should be addressed for reliable application to diverse patients and diseases.

Figures

**Fig. 1. DDI dataset and algorithm performance.**
Row 1: Performance of all three AI models and the majority vote of an ensemble of dermatologists on the entire DDI dataset (A), FST I–II (B), and FST V–VI (C). Row 2: Performance across the DDI common diseases dataset with the performance of all algorithms and ensemble of dermatologists on the entire DDI common diseases dataset (D), FST I–II (E), and FST V–VI (F). Row 3: Example images from the entire DDI dataset for all skin tones (G), FST I–II (H), and FST V–VI (I). Photo Credit: DDI dataset, Stanford School of Medicine.

**Fig. 2. Algorithm performance after fine-tuning.**
Fine-tuned DeepDerm (A) and HAM10000 (B) on the DDI dataset (as described in Materials and Methods) compared to baseline (first three bars in each panel). Fine-tuning closes the gap between FST I–II and FST V–VI performance and leads to overall performance improvement. Ninety-five percent confidence interval is calculated using bootstrapping across the 20 seeds for both baseline and fine-tuned models to allow direct comparison.
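
The bootstrapped confidence interval mentioned in the Fig. 2 caption can be illustrated with a minimal sketch. This is not the authors' code: the caption does not say which bootstrap variant or which performance metric was used, so the percentile method, the function name `bootstrap_ci`, and the per-seed scores below are all assumptions made up for illustration.

```python
import random


def bootstrap_ci(values, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Assumption: a plain percentile bootstrap; the source does not state
    which variant was used.
    """
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample the per-seed scores with replacement, same size as the input.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical scores from 20 training seeds (made-up numbers, not study results).
scores = [0.70 + 0.01 * (i % 5) for i in range(20)]
low, high = bootstrap_ci(scores)
print(f"95% CI for the mean score: [{low:.3f}, {high:.3f}]")
```

Applying the same procedure to both the baseline and the fine-tuned runs, as the caption describes, is what makes the two intervals directly comparable.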

Grants and funding

P30 CA008748/CA/NCI NIH HHS/United States
