Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2023 Oct;42(5):101248.
doi: 10.1016/j.accpm.2023.101248. Epub 2023 May 20.

Availability of information needed to evaluate algorithmic fairness - A systematic review of publicly accessible critical care databases

Affiliations

Availability of information needed to evaluate algorithmic fairness - A systematic review of publicly accessible critical care databases

Nicholas Fong et al. Anaesth Crit Care Pain Med. 2023 Oct.

Abstract

Background: Machine learning (ML) may improve clinical decision-making in critical care settings, but intrinsic biases in datasets can introduce bias into predictive models. This study aims to determine if publicly available critical care datasets provide relevant information to identify historically marginalized populations.

Method: We conducted a review to identify the manuscripts that report the training/validation of ML algorithms using publicly accessible critical care electronic medical record (EMR) datasets. The datasets were reviewed to determine if the following 12 variables were available: age, sex, gender identity, race and/or ethnicity, self-identification as an indigenous person, payor, primary language, religion, place of residence, education, occupation, and income.

Results: 7 publicly available databases were identified. Medical Information Mart for Intensive Care (MIMIC) reports information on 7 of the 12 variables of interest, Sistema de Informação de Vigilância Epidemiológica da Gripe (SIVEP-Gripe) on 7, COVID-19 Mexican Open Repository on 4, and eICU on 4. Other datasets report information on 2 or fewer variables. All 7 databases included information about sex and age. Four databases (57%) included information about whether a patient identified as native or indigenous. Only 3 (43%) included data about race and/or ethnicity. Two databases (29%) included information about residence, and one (14%) included information about payor, language, and religion. One database (14%) included information about education and patient occupation. No databases included information on gender identity and income.

Conclusion: This review demonstrates that critical care publicly available data used to train AI algorithms do not include enough information to properly look for intrinsic bias and fairness issues towards historically marginalized populations.

Keywords: Artificial Intelligence; Bias; Dataset; Fairness; Machine learning; Publicly available.

PubMed Disclaimer

Similar articles

Cited by

Publication types

LinkOut - more resources