Mitigating Bias in Radiology Machine Learning: 1. Data Handling

Pouria Rouzrokh et al. Radiol Artif Intell. 2022 Aug 24;4(5):e210290. doi: 10.1148/ryai.210290. eCollection 2022 Sep.

Abstract

Minimizing bias is critical to the adoption and implementation of machine learning (ML) in clinical practice. Systematic mathematical biases produce consistent and reproducible differences between the observed and expected performance of ML systems, resulting in suboptimal performance. Such biases can be traced back to various phases of ML development: data handling, model development, and performance evaluation. This report presents 12 suboptimal practices in the data handling of an ML study, explains how those practices can lead to biases, and describes what may be done to mitigate them. The authors employ an arbitrary and simplified framework that splits ML data handling into four steps: data collection, data investigation, data splitting, and feature engineering. Examples from the available research literature are provided. A Google Colaboratory Jupyter notebook includes code examples that demonstrate the suboptimal practices and steps to prevent them. © RSNA, 2022.

Keywords: Bias; Computer-aided Diagnosis (CAD); Convolutional Neural Network (CNN); Data Handling; Deep Learning; Machine Learning.


Conflict of interest statement

Disclosures of conflicts of interest: P.R. No relevant relationships. B.K. No relevant relationships. S.F. No relevant relationships. M.M. No relevant relationships. D.V.V.G. No relevant relationships. Y.S. No relevant relationships. K.Z. No relevant relationships. G.M.C. Member of the Radiology: Artificial Intelligence trainee editorial board. B.J.E. Grant from NCI; stock/stock options in FlowSIGMA, VoiceIT, and Yunu; consultant to the editor for Radiology: Artificial Intelligence.

Figures

Figure 1: An arbitrary framework for defining data handling, consisting of four different steps: data collection, data investigation, data splitting, and feature engineering. The errors introduced in this report for each step are also summarized. EDA = exploratory data analysis.
Figure 2: A mosaic of random radiographs collected from our institutional dataset of patients who underwent total hip arthroplasty. Despite the reduced resolution of the individual images, a quick look at this mosaic reveals valuable insights for developers who wish to train models on this dataset: for example, the radiographs have different views, not all radiographs show prostheses, radiographs come from patients of both sexes (pelvic anatomy differs between male and female patients), different prosthesis brands appear in the data, and some radiographs have outlier intensities (presenting darker or brighter than expected).
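A mosaic like the one in Figure 2 can be assembled in a few lines of NumPy for quick visual EDA; the sketch below is illustrative only (the thumbnail size, grid shape, and random "radiographs" are assumptions, not the authors' data or code).

```python
import numpy as np

def make_mosaic(images, n_rows, n_cols):
    """Tile equally sized grayscale images into a single 2D mosaic array."""
    h, w = images[0].shape
    mosaic = np.zeros((n_rows * h, n_cols * w), dtype=images[0].dtype)
    for idx, img in enumerate(images[:n_rows * n_cols]):
        r, c = divmod(idx, n_cols)  # row-major placement on the grid
        mosaic[r * h:(r + 1) * h, c * w:(c + 1) * w] = img
    return mosaic

# Hypothetical example: 12 random 64 x 64 "radiograph" thumbnails on a 3 x 4 grid.
rng = np.random.default_rng(0)
thumbs = [rng.integers(0, 256, (64, 64), dtype=np.uint8) for _ in range(12)]
grid = make_mosaic(thumbs, 3, 4)
```

In practice the thumbnails would be downsampled radiographs, and the resulting array can be displayed with any image viewer to spot outliers at a glance.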
Figure 3: Schematic description of (A) traditional train-validation-test splitting, (B) fivefold cross-validation (k = 5, where k is the number of folds), and (C) fivefold nested cross-validation (k = m = 5, where k is the number of folds in the first-level cross-validation and m is the number of folds in the second-level cross-validation).
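The splitting schemes in Figure 3 can be reproduced with scikit-learn's KFold; the sketch below is a minimal illustration on toy data, not the article's notebook code, and the sample count, fold counts, and random seed are assumptions.

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)  # 10 toy samples, 2 features each

# Fivefold cross-validation (k = 5): each sample lands in exactly one test fold.
outer = KFold(n_splits=5, shuffle=True, random_state=42)
outer_test_folds = []
for train_idx, test_idx in outer.split(X):
    outer_test_folds.append(test_idx)
    # Nested cross-validation (m = 5): split the outer training set again to
    # tune hyperparameters without ever touching the held-out outer test fold.
    inner = KFold(n_splits=5, shuffle=True, random_state=42)
    for inner_train_idx, val_idx in inner.split(X[train_idx]):
        pass  # fit candidate models here; select on validation performance
    # Refit the chosen model on all of X[train_idx]; score it on X[test_idx].
```

Note that for patient-level data, a group-aware splitter (e.g., GroupKFold keyed on patient ID) is needed so that images from one patient never appear in both training and test folds.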
Figure 4: Example of how improper feature removal from imaging data may lead to bias. (A) Chest radiograph in a male patient with pneumonia. (B) Segmentation mask for the lungs, generated using a deep learning model. (C) The chest radiograph cropped on the basis of the segmentation mask. If the cropped radiograph is fed to a subsequent classifier for detecting consolidations, the consolidation located behind the heart will be missed (arrow, A). This occurs because primary feature removal using the segmentation model was not valid and unnecessarily removed the portion of the lung located behind the heart.
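One simple guard against the failure mode in Figure 4 is to crop to the bounding box of the segmentation mask padded by a safety margin, rather than to the mask itself; the sketch below uses assumed toy data and an assumed margin, and is only a mitigation idea, not the article's recommendation (which is to validate the segmentation before using it for feature removal).

```python
import numpy as np

def crop_to_mask(image, mask, margin=16):
    """Crop image to the bounding box of a binary mask, padded by a margin
    so that structures just outside the mask (e.g., retrocardiac lung) are
    less likely to be discarded."""
    rows = np.any(mask, axis=1)
    cols = np.any(mask, axis=0)
    r0, r1 = np.where(rows)[0][[0, -1]]  # first/last rows touching the mask
    c0, c1 = np.where(cols)[0][[0, -1]]  # first/last columns touching the mask
    r0 = max(r0 - margin, 0)
    r1 = min(r1 + margin, image.shape[0] - 1)
    c0 = max(c0 - margin, 0)
    c1 = min(c1 + margin, image.shape[1] - 1)
    return image[r0:r1 + 1, c0:c1 + 1]

# Toy 128 x 128 "radiograph" with a mask covering rows/columns 40..87.
img = np.zeros((128, 128), dtype=np.uint8)
msk = np.zeros_like(img, dtype=bool)
msk[40:88, 40:88] = True
crop = crop_to_mask(img, msk, margin=16)
```

The margin trades a slightly larger input for a lower risk of cutting away lung that the segmentation model missed; it does not fix an invalid segmentation, which still needs to be checked directly.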
