J Biomed Inform. 2023 Feb;138:104294. doi: 10.1016/j.jbi.2023.104294. Epub 2023 Jan 24.

Evaluating and mitigating bias in machine learning models for cardiovascular disease prediction


Fuchen Li et al. J Biomed Inform. 2023 Feb.

Abstract

Objective: The study investigates whether machine learning-based predictive models for cardiovascular disease (CVD) risk assessment perform equivalently across demographic groups (such as race and gender) and whether bias mitigation methods can reduce any bias present in the models. This matters because systematic bias may be introduced when health data are collected and preprocessed, which could degrade model performance on certain demographic sub-cohorts. We investigate this using electronic health record data and several machine learning models.

Methods: The study used a large de-identified electronic health record (EHR) dataset from Vanderbilt University Medical Center (VUMC). Machine learning (ML) algorithms including logistic regression, random forest, gradient-boosting trees, and long short-term memory networks were applied to build multiple predictive models. Model bias and fairness were evaluated using equal opportunity difference (EOD, where 0 indicates fairness) and disparate impact (DI, where 1 indicates fairness). We also evaluated the fairness of a non-ML baseline model, the American Heart Association (AHA) Pooled Cohort Risk Equations (PCEs). In addition, we compared the performance of three de-biasing methods: removing protected attributes (e.g., race and gender), resampling the imbalanced training dataset by sample size, and resampling by the proportion of people with CVD outcomes.
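For context, both fairness metrics can be computed directly from binary predictions split by a protected attribute. The following is a minimal sketch, not the authors' code; the function name, the group coding (1 = unprivileged), and the toy data are illustrative assumptions.

import numpy as np

def fairness_metrics(y_true, y_pred, unprivileged):
    """Equal opportunity difference (EOD) and disparate impact (DI).

    y_true, y_pred: binary arrays of true labels and model predictions.
    unprivileged:   boolean array, True for the unprivileged group.
    EOD = TPR(unprivileged) - TPR(privileged); 0 indicates fairness.
    DI  = P(y_pred=1 | unprivileged) / P(y_pred=1 | privileged); 1 indicates fairness.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    unpriv = np.asarray(unprivileged, dtype=bool)
    priv = ~unpriv

    def tpr(mask):
        # true positive rate within a group: P(y_pred=1 | y_true=1, group)
        positives = mask & (y_true == 1)
        return y_pred[positives].mean()

    def positive_rate(mask):
        # predicted positive rate within a group: P(y_pred=1 | group)
        return y_pred[mask].mean()

    eod = tpr(unpriv) - tpr(priv)
    di = positive_rate(unpriv) / positive_rate(priv)
    return eod, di

# Toy example where the model under-predicts for the unprivileged group.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 1 = unprivileged
print(fairness_metrics(y_true, y_pred, group == 1))  # EOD = -0.5, DI = 0.5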

Results: The study cohort included 109,490 individuals (mean [SD] age 47.4 [14.7] years; 64.5% female; 86.3% White; 13.7% Black). The experimental results suggested that most ML models had smaller EOD and DI than the PCEs. For ML models, the mean EOD ranged from -0.001 to 0.018 and the mean DI ranged from 1.037 to 1.094 across race groups. EOD and DI were larger across gender groups, with EOD ranging from 0.131 to 0.136 and DI ranging from 1.535 to 1.587. Among the debiasing methods, removing protected attributes did not significantly reduce bias for most ML models, and resampling by sample size did not consistently decrease bias either. Resampling by case proportion reduced the EOD and DI for gender groups but slightly reduced accuracy in many cases.
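As a concrete illustration of the resampling-by-case-proportion strategy evaluated above, the sketch below oversamples CVD cases within each demographic group until the group-level case proportion matches the overall rate. This is one plausible reading of the method as described in the abstract, not the authors' implementation; the column names (gender, cvd) are hypothetical.

import pandas as pd

def resample_by_case_proportion(df, group_col="gender", label_col="cvd", seed=0):
    """Oversample CVD cases within each demographic group so that every group
    ends up with approximately the same proportion of positive outcomes as
    the full training set (assumed procedure; details may differ from the paper).
    """
    target = df[label_col].mean()  # overall CVD case proportion
    pieces = []
    for _, grp in df.groupby(group_col):
        pos = grp[grp[label_col] == 1]
        neg = grp[grp[label_col] == 0]
        # positives needed so that n_pos / (n_pos + n_neg) == target
        n_pos_needed = int(round(target * len(neg) / (1.0 - target)))
        if len(pos) > 0 and n_pos_needed > len(pos):
            extra = pos.sample(n_pos_needed - len(pos), replace=True, random_state=seed)
            grp = pd.concat([grp, extra])
        pieces.append(grp)
    # shuffle the combined, rebalanced training set
    return pd.concat(pieces).sample(frac=1, random_state=seed).reset_index(drop=True)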

Conclusions: In the VUMC cohort, both the PCEs and the ML models were biased against women, suggesting the need to investigate and correct gender disparities in CVD risk prediction. Resampling by proportion reduced bias for gender groups but not for race groups.

Keywords: Bias mitigation; Cardiovascular diseases; Clinical predictive models; Electronic health records; Fairness; Machine learning.

Conflict of interest statement

Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1.
Comparison of the mean and 95% confidence interval (CI) of fairness metrics for the ACC/AHA model, ML models trained with PCE features, and ML models trained with all EHR features. The bars represent 95% CIs calculated from 10 repeated runs using a t-distribution with 9 degrees of freedom. A: Equal opportunity difference (EOD) across two race groups (Black and White). B: Disparate impact (DI) across race groups. C: Equal opportunity difference across two gender groups (male and female). D: Disparate impact across gender groups. The reference value (gray dashed line) for fairness is 0 for EOD and 1.0 for DI.
Fig. 2.
A comparison of EODs and DIs for three ML models before and after debiasing. Error bars represent the standard deviation. A: Change in EOD across race groups after removing the protected attribute, resampling by size, and resampling by proportion, relative to the original (pre-debiasing) value for each model. B: Change in DI across race groups before and after debiasing. C: Change in EOD across gender groups before and after debiasing. D: Change in DI across gender groups before and after debiasing.
Fig. 3.
Fairness vs. AUROC before and after debiasing for the gender groups. Each point is the mean value over 10 data splits. The shape indicates the debiasing method used (or none), and the color indicates the ML model.
