Sci Adv. 2022 Aug 12;8(32):eabq6147.
doi: 10.1126/sciadv.abq6147. Epub 2022 Aug 12.

Disparities in dermatology AI performance on a diverse, curated clinical image set

Roxana Daneshjou et al. Sci Adv.

Abstract

An estimated 3 billion people lack access to dermatological care globally. Artificial intelligence (AI) may aid in triaging skin diseases and identifying malignancies. However, most AI models have not been assessed on images of diverse skin tones or uncommon diseases. Thus, we created the Diverse Dermatology Images (DDI) dataset, the first publicly available, expertly curated, and pathologically confirmed image dataset with diverse skin tones. We show that state-of-the-art dermatology AI models exhibit substantial limitations on the DDI dataset, particularly on dark skin tones and uncommon diseases. We find that dermatologists, who often label AI datasets, also perform worse on images of dark skin tones and uncommon diseases. Fine-tuning AI models on the DDI images closes the performance gap between light and dark skin tones. These findings identify important weaknesses and biases in dermatology AI that should be addressed for reliable application to diverse patients and diseases.

Figures

Fig. 1. DDI dataset and algorithm performance.
Row 1: Performance of all three AI models and the majority vote of an ensemble of dermatologists on the entire DDI dataset (A), FST I–II (B), and FST V–VI (C). Row 2: Performance across the DDI common diseases dataset with the performance of all algorithms and ensemble of dermatologists on the entire DDI common diseases dataset (D), FST I–II (E), and FST V–VI (F). Row 3: Example images from the entire DDI dataset for all skin tones (G), FST I–II (H), and FST V–VI (I). Photo Credit: DDI dataset, Stanford School of Medicine.
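The caption above compares the AI models against the majority vote of an ensemble of dermatologists. A minimal sketch of majority voting over per-reader labels follows, assuming each reader gives a single label per image. The function name `majority_vote`, the example labels, and the choice to return `None` on a tie are illustrative assumptions; the caption does not say how many dermatologists voted or how ties were resolved.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most frequent label, or None when the top count is tied.

    Tie handling is an assumption made for this sketch, not the paper's rule.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    ranked = Counter(labels).most_common()
    top_label, top_count = ranked[0]
    # A tie exists if the runner-up has the same count as the leader.
    if len(ranked) > 1 and ranked[1][1] == top_count:
        return None
    return top_label


# Hypothetical reader labels for one image (not data from the paper).
reader_labels = ["malignant", "benign", "malignant"]
consensus = majority_vote(reader_labels)
```

Using an odd number of readers with a binary label avoids ties altogether, which is one reason ensembles are often sized that way.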
Fig. 2. Algorithm performance after fine-tuning.
Fine-tuned DeepDerm (A) and HAM10000 (B) on the DDI dataset (as described in Materials and Methods) compared to baseline (first three bars in each panel). Fine-tuning closes the gap between FST I–II and FST V–VI performance and leads to overall performance improvement. Ninety-five percent confidence interval is calculated using bootstrapping across the 20 seeds for both baseline and fine-tuned models to allow direct comparison.
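The caption above reports 95% confidence intervals obtained by bootstrapping across 20 seeds. One common way to do this is a percentile bootstrap over the per-seed scores, sketched below under stated assumptions: the function name `bootstrap_ci`, the number of resamples, and the placeholder scores are all hypothetical, and the paper may have used a different bootstrap variant or metric.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = []
    for _ in range(n_resamples):
        # Resample the per-seed scores with replacement, same size as the original.
        sample = [values[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(sample))
    means.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the resampled means.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return means[lo_idx], means[hi_idx]


# Placeholder scores for 20 seeds (illustrative only, NOT values from the paper).
scores = [0.70 + 0.01 * (i % 5) for i in range(20)]
low, high = bootstrap_ci(scores)
```

Applying the same procedure to the baseline and fine-tuned runs puts both intervals on the same footing, which is what the caption means by allowing direct comparison.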
