Proc Natl Acad Sci U S A. 2020 Jun 9;117(23):12592-12594.
doi: 10.1073/pnas.1919012117. Epub 2020 May 26.

Gender imbalance in medical imaging datasets produces biased classifiers for computer-aided diagnosis

Agostina J Larrazabal et al. Proc Natl Acad Sci U S A. 2020.

Abstract

Artificial intelligence (AI) systems for computer-aided diagnosis and image-based screening are being adopted worldwide by medical institutions. In such a context, generating fair and unbiased classifiers becomes of paramount importance. The research community of medical image computing is making great efforts in developing more accurate algorithms to assist medical doctors in the difficult task of disease diagnosis. However, little attention is paid to the way databases are collected and how this may influence the performance of AI systems. Our study sheds light on the importance of gender balance in medical imaging datasets used to train AI systems for computer-assisted diagnosis. We provide empirical evidence supported by a large-scale study, based on three deep neural network architectures and two well-known publicly available X-ray image datasets used to diagnose various thoracic diseases under different gender imbalance conditions. We found a consistent decrease in performance for underrepresented genders when a minimum balance is not fulfilled. This raises the alarm for national agencies in charge of regulating and approving computer-assisted diagnosis systems, which should include explicit gender balance and diversity recommendations. We also establish an open problem for the academic medical image computing community which needs to be addressed by novel algorithms endowed with robustness to gender imbalance.

Keywords: computer-aided diagnosis; deep learning; gender bias; gendered innovations; medical image analysis.

Conflict of interest statement

The authors declare no competing interest.

Figures

Fig. 1.
Experimental results for a DenseNet-121 (18) classifier trained with images from the NIH dataset (16, 19) for 14 thoracic diseases under different gender imbalance ratios. (A) The box plots aggregate the results for 20 folds, training with male-only (blue) and female-only (orange) patients. Both models are evaluated on male (Top) and female (Bottom) test folds. A consistent decrease in performance is observed when using male patients for training and female patients for testing (and vice versa). (B and C) AUC achieved for two exemplar diseases under a gradient of gender imbalance ratios, from 0% of female images in training data to 100%, with increments of 25%. In B, 1 and 2 show the results when testing on male patients, while, in C, 1 and 2 present the results when testing on female patients. Statistical significance according to the Mann–Whitney U test is denoted by **** (P ≤ 0.00001), *** (0.00001 < P ≤ 0.0001), ** (0.0001 < P ≤ 0.001), * (0.001 < P ≤ 0.01), and not significant (ns) (P > 0.01).
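
The imbalance gradient and the significance notation described in the caption can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the fixed training-set size, the seed, and the patient identifiers are all assumptions made for illustration. Only the 25% increments and the P-value brackets are taken from the caption.

```python
import random

# Female fractions in the training data, as listed in the caption (0% to 100%).
FEMALE_RATIOS = (0.0, 0.25, 0.5, 0.75, 1.0)


def sample_with_female_ratio(male_ids, female_ids, n_total, female_ratio, seed=0):
    """Draw a training subset of n_total items with the given female fraction.

    Hypothetical helper: keeping n_total constant across ratios is an
    assumption, chosen so that only the gender mix changes between runs.
    """
    rng = random.Random(seed)
    n_female = round(n_total * female_ratio)
    n_male = n_total - n_female
    if n_female > len(female_ids) or n_male > len(male_ids):
        raise ValueError("not enough images to build the requested subset")
    subset = rng.sample(female_ids, n_female) + rng.sample(male_ids, n_male)
    rng.shuffle(subset)  # avoid a female-then-male ordering in the subset
    return subset


def significance_label(p_value):
    """Map a P value to the star notation used in the caption."""
    if p_value <= 1e-5:
        return "****"
    if p_value <= 1e-4:
        return "***"
    if p_value <= 1e-3:
        return "**"
    if p_value <= 1e-2:
        return "*"
    return "ns"


if __name__ == "__main__":
    # Made-up identifiers standing in for image or patient IDs.
    males = [f"M{i}" for i in range(200)]
    females = [f"F{i}" for i in range(200)]
    for ratio in FEMALE_RATIOS:
        subset = sample_with_female_ratio(males, females, 100, ratio)
        n_female = sum(item.startswith("F") for item in subset)
        print(f"target female ratio {ratio:.2f}: {n_female} of {len(subset)} female")
    print(significance_label(0.0005))
```

In an actual experiment, the P value passed to `significance_label` would come from a Mann–Whitney U test comparing per-fold AUC values between two training conditions; that step is omitted here to keep the sketch free of external dependencies.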

References

    1. Litjens G., et al., A survey on deep learning in medical image analysis. Med. Image Anal. 42, 60–88 (2017).
    2. Lindsey R., et al., Deep neural network improves fracture detection by clinicians. Proc. Natl. Acad. Sci. U.S.A. 115, 11591–11596 (2018).
    3. Esteva A., et al., Dermatologist-level classification of skin cancer with deep neural networks. Nature 542, 115–118 (2017).
    4. De Fauw J., et al., Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat. Med. 24, 1342–1350 (2018).
    5. Chandrasekaran B., On evaluating artificial intelligence systems for medical diagnosis. AI Mag. 4, 34–34 (1983).