2024 Aug;156:104677.
doi: 10.1016/j.jbi.2024.104677. Epub 2024 Jun 13.

Assessing fairness in machine learning models: A study of racial bias using matched counterparts in mortality prediction for patients with chronic diseases


Yifei Wang et al. J Biomed Inform. 2024 Aug.

Abstract

Objective: Existing approaches to fairness evaluation often overlook systematic differences in the social determinants of health, like demographics and socioeconomics, among comparison groups, potentially leading to inaccurate or even contradictory conclusions. This study aims to evaluate racial disparities in predicting mortality among patients with chronic diseases using a fairness detection method that considers systematic differences.

Methods: We created five datasets from Mass General Brigham's electronic health records (EHR), each focusing on a different chronic condition: congestive heart failure (CHF), chronic kidney disease (CKD), chronic obstructive pulmonary disease (COPD), chronic liver disease (CLD), and dementia. For each dataset, we developed separate machine learning models to predict 1-year mortality and examined racial disparities by comparing prediction performances between Black and White individuals. We compared racial fairness evaluation between the overall Black and White individuals versus their counterparts who were Black and matched White individuals identified by propensity score matching, where the systematic differences were mitigated.
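The matching step described above can be sketched as 1:1 nearest-neighbor propensity score matching without replacement. This is a minimal illustration, not the authors' implementation; the function name, greedy matching strategy, and synthetic covariates are all assumptions for demonstration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def match_counterparts(X, treated):
    """1:1 nearest-neighbor propensity score matching without replacement.

    X       : (n, d) covariates (e.g. age, BMI, comorbidity index)
    treated : boolean mask, True for the group to be matched (e.g. Black patients)
    Returns (treated indices, matched control indices), aligned pairwise.
    """
    # Propensity score: P(group membership | covariates) from a logistic model
    ps = LogisticRegression(max_iter=1000).fit(X, treated).predict_proba(X)[:, 1]
    t_idx = np.flatnonzero(treated)
    c_idx = np.flatnonzero(~treated)
    # Rank all controls by propensity-score distance to each treated unit
    nn = NearestNeighbors(n_neighbors=len(c_idx)).fit(ps[c_idx, None])
    _, order = nn.kneighbors(ps[t_idx, None])
    used, matches = set(), []
    for row in order:  # greedily take the closest not-yet-used control
        for j in row:
            if j not in used:
                used.add(j)
                matches.append(c_idx[j])
                break
    return t_idx, np.array(matches)

# Synthetic demo: 200 controls, 50 treated with shifted covariates
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (200, 3)), rng.normal(0.5, 1.0, (50, 3))])
treated = np.array([False] * 200 + [True] * 50)
t_idx, m_idx = match_counterparts(X, treated)
```

After matching, covariate balance would be re-checked (as the Results section does) before comparing model performance between the two matched groups.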

Results: We identified significant differences between the overall Black and White individuals in age, gender, marital status, education level, smoking status, health insurance type, body mass index, and Charlson comorbidity index (p < 0.001). Among the matched Black and White subpopulations identified through propensity score matching, significant differences persisted only for particular covariates, and at weaker significance levels: insurance type in the CHF cohort (p = 0.043), insurance type (p = 0.005) and education level (p = 0.016) in the CKD cohort, and body mass index in the dementia cohort (p = 0.041); no other covariates differed significantly. When examining mortality prediction models across the five study cohorts, we compared fairness evaluations before and after mitigating systematic differences. We found significant differences in the CHF cohort, with p-values of 0.021 and 0.001 for F1 measure and sensitivity with the AdaBoost model, and 0.014 and 0.003 for F1 measure and sensitivity with the MLP model, respectively.

Discussion and conclusion: This study contributes to research on fairness assessment by focusing on the examination of systematic disparities and underscores the potential for revealing racial bias in machine learning models used in clinical settings.

Keywords: Chronic Disease; Electronic Health Records; Fairness Analysis; Machine Learning; Mortality Prediction; Racism.


Conflict of interest statement

Declaration of competing interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Declaration of generative AI and AI-assisted technologies in the writing process: During the preparation of this work the author(s) used ChatGPT to improve writing. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.

Figures

Figure 1.
Training-validation data splitting method with counterparts. The total population is initially divided into three distinct parts: Black, White-matched, and Others. The Black and White-matched groups serve as counterparts. The “Others” group comprises the remaining population, including non-matched White individuals and those of races other than Black or White. The splitting method for the matched White population mirrors that of the Black population to maintain counterpart matching; the “Others” group is split at random. These segments, labeled B1, M1, and O1, are concatenated to form the final training set.
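The splitting scheme in the caption above can be sketched as follows: the Black split is mirrored onto the matched White counterparts so each pair lands in the same fold, while "Others" are split independently. This is an illustrative sketch under stated assumptions; the function name, 80/20 fraction, and index layout are hypothetical, not the paper's code.

```python
import numpy as np

def counterpart_split(black_idx, matched_white_idx, other_idx,
                      train_frac=0.8, seed=0):
    """Train/validation split that keeps matched Black/White pairs together.

    black_idx and matched_white_idx are aligned: matched_white_idx[i] is
    the propensity-matched counterpart of black_idx[i].
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(black_idx))      # shuffle pair positions
    n_tr = int(train_frac * len(black_idx))
    tr, va = order[:n_tr], order[n_tr:]
    # Mirror the Black split onto the matched White counterparts (B1 / M1)
    b1, m1 = black_idx[tr], matched_white_idx[tr]
    b2, m2 = black_idx[va], matched_white_idx[va]
    # "Others" (O1) are split independently at random
    o = rng.permutation(other_idx)
    n_o = int(train_frac * len(o))
    train = np.concatenate([b1, m1, o[:n_o]])
    val = np.concatenate([b2, m2, o[n_o:]])
    return train, val

# Demo with hypothetical patient indices
b = np.arange(0, 50)        # Black patients
w = np.arange(100, 150)     # their matched White counterparts, pairwise aligned
o = np.arange(200, 300)     # everyone else
train, val = counterpart_split(b, w, o)
```

Keeping each matched pair in the same fold ensures the fairness comparison on the validation set is still between true counterparts.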
Figure 2.
Performance comparison on Black, White-matched, White-total, and the total population. Five datasets are studied, each focusing on a different chronic condition: congestive heart failure (CHF), chronic kidney disease (CKD), chronic obstructive pulmonary disease (COPD), chronic liver disease (CLD), and dementia. For each chronic disease group, four types of 1-year mortality prediction models are compared, including Logistic Regression (LR), Random Forest (RF), AdaBoost (Ada), and Multilayer Perceptron (MLP). The model performance is assessed using sensitivity, F1 score, and AUROC.
Figure 3.
Comparison of fairness evaluation methods on CHF and Dementia Cohorts. For each type of disease, two models (Ada and MLP) and two fairness assessments (Δf1 and ΔSe) are considered. Each scatter plot compares the results of fairness evaluation between counterparts (i.e., Black vs. White-matched; x-axis) and fairness evaluation between overall groups (i.e., Black vs. White-total; y-axis) from five repeated experiments. Each point on the scatter plot represents a pair of results from the same experiment. P-values are from paired t-tests conducted to assess the significance of differences between the two fairness evaluation methods.
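The Δf1 and ΔSe gaps and the paired t-test described in this caption can be sketched as below. This is a minimal illustration, not the authors' code; the toy labels and the gap values for the two evaluation methods are made-up numbers chosen only to show the mechanics.

```python
import numpy as np
from scipy import stats
from sklearn.metrics import f1_score, recall_score

def fairness_gaps(y_true, y_pred, group):
    """Δf1 and ΔSe between two groups (0 = Black, 1 = comparison group)."""
    a, b = group == 0, group == 1
    d_f1 = f1_score(y_true[a], y_pred[a]) - f1_score(y_true[b], y_pred[b])
    d_se = recall_score(y_true[a], y_pred[a]) - recall_score(y_true[b], y_pred[b])
    return d_f1, d_se

# Toy predictions for two groups of four patients each
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 1])
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])
d_f1, d_se = fairness_gaps(y_true, y_pred, group)

# Paired t-test over repeated experiments: gap measured against matched
# counterparts vs. gap measured against the overall comparison group
# (illustrative values only, not results from the paper)
gaps_matched = np.array([0.02, 0.03, 0.01, 0.04, 0.02])
gaps_overall = np.array([0.08, 0.10, 0.07, 0.11, 0.09])
t_stat, p_value = stats.ttest_rel(gaps_matched, gaps_overall)
```

A significant p-value here would mean the two evaluation methods disagree, i.e. the apparent bias changes once systematic covariate differences are mitigated.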

