J Biomed Inform. 2023 Feb;138:104294. doi: 10.1016/j.jbi.2023.104294. Epub 2023 Jan 24.

Evaluating and mitigating bias in machine learning models for cardiovascular disease prediction


Fuchen Li et al. J Biomed Inform. 2023 Feb.

Abstract

Objective: The study investigates whether machine learning-based predictive models for cardiovascular disease (CVD) risk assessment perform equivalently across demographic groups (such as race and gender) and whether bias mitigation methods can reduce any bias present in the models. This matters because systematic bias may be introduced when health data are collected and preprocessed, which could degrade model performance on certain demographic sub-cohorts. We investigate this using electronic health record data and several machine learning models.

Methods: The study used a large de-identified electronic health record (EHR) dataset from Vanderbilt University Medical Center (VUMC). Machine learning (ML) algorithms including logistic regression, random forest, gradient-boosting trees, and long short-term memory networks were applied to build multiple predictive models. Model bias and fairness were evaluated using equal opportunity difference (EOD, where 0 indicates fairness) and disparate impact (DI, where 1 indicates fairness). We also evaluated the fairness of a non-ML baseline model, the American Heart Association (AHA) Pooled Cohort Risk Equations (PCEs). In addition, we compared the performance of three de-biasing methods: removing protected attributes (e.g., race and gender), resampling the imbalanced training dataset by sample size, and resampling by the proportion of people with CVD outcomes.
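For context, both fairness metrics can be computed directly from binary predictions split by a protected attribute. The following is a minimal sketch, not the authors' code; the function name, the group coding (1 = unprivileged), and the toy data are illustrative assumptions.

import numpy as np

def fairness_metrics(y_true, y_pred, unprivileged):
    """Equal opportunity difference (EOD) and disparate impact (DI).

    y_true, y_pred: binary arrays of true labels and model predictions.
    unprivileged:   boolean array, True for the unprivileged group.
    EOD = TPR(unprivileged) - TPR(privileged); 0 indicates fairness.
    DI  = P(y_pred=1 | unprivileged) / P(y_pred=1 | privileged); 1 indicates fairness.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    unpriv = np.asarray(unprivileged, dtype=bool)
    priv = ~unpriv

    def tpr(mask):
        # true positive rate within a group: P(y_pred=1 | y_true=1, group)
        positives = mask & (y_true == 1)
        return y_pred[positives].mean()

    def positive_rate(mask):
        # predicted positive rate within a group: P(y_pred=1 | group)
        return y_pred[mask].mean()

    eod = tpr(unpriv) - tpr(priv)
    di = positive_rate(unpriv) / positive_rate(priv)
    return eod, di

# Toy example where the model under-predicts for the unprivileged group.
y_true = np.array([1, 1, 0, 0, 1, 1, 0, 0])
y_pred = np.array([1, 1, 0, 0, 1, 0, 0, 0])
group  = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # 1 = unprivileged
print(fairness_metrics(y_true, y_pred, group == 1))  # EOD = -0.5, DI = 0.5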

Results: The study cohort included 109,490 individuals (mean [SD] age 47.4 [14.7] years; 64.5% female; 86.3% White; 13.7% Black). The experimental results suggested that most ML models had smaller EOD and DI than the PCEs. For ML models, the mean EOD ranged from -0.001 to 0.018 and the mean DI ranged from 1.037 to 1.094 across race groups. EOD and DI were larger across gender groups, with EOD ranging from 0.131 to 0.136 and DI ranging from 1.535 to 1.587. Among the debiasing methods, removing protected attributes did not significantly reduce bias for most ML models, and resampling by sample size did not consistently decrease bias either. Resampling by case proportion reduced the EOD and DI for gender groups but slightly reduced accuracy in many cases.
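As a concrete illustration of the resampling-by-case-proportion strategy evaluated above, the sketch below oversamples CVD cases within each demographic group until the group-level case proportion matches the overall rate. This is one plausible reading of the method as described in the abstract, not the authors' implementation; the column names (gender, cvd) are hypothetical.

import pandas as pd

def resample_by_case_proportion(df, group_col="gender", label_col="cvd", seed=0):
    """Oversample CVD cases within each demographic group so that every group
    ends up with approximately the same proportion of positive outcomes as
    the full training set (assumed procedure; details may differ from the paper).
    """
    target = df[label_col].mean()  # overall CVD case proportion
    pieces = []
    for _, grp in df.groupby(group_col):
        pos = grp[grp[label_col] == 1]
        neg = grp[grp[label_col] == 0]
        # positives needed so that n_pos / (n_pos + n_neg) == target
        n_pos_needed = int(round(target * len(neg) / (1.0 - target)))
        if len(pos) > 0 and n_pos_needed > len(pos):
            extra = pos.sample(n_pos_needed - len(pos), replace=True, random_state=seed)
            grp = pd.concat([grp, extra])
        pieces.append(grp)
    # shuffle the combined, rebalanced training set
    return pd.concat(pieces).sample(frac=1, random_state=seed).reset_index(drop=True)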

Conclusions: In the VUMC cohort, both the PCEs and the ML models were biased against women, suggesting the need to investigate and correct gender disparities in CVD risk prediction. Resampling by proportion reduced bias for gender groups but not for race groups.

Keywords: Bias mitigation; Cardiovascular diseases; Clinical predictive models; Electronic health records; Fairness; Machine learning.

Conflict of interest statement

Declaration of Competing Interest The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Fig. 1.
Comparison of the mean and 95% confidence interval (CI) of fairness metrics for the ACC/AHA model, ML models trained with PCE features, and ML models trained with all EHR features. The bars represent 95% CIs calculated from 10 repeated runs using a t-distribution with 9 degrees of freedom. A: Equal opportunity difference (EOD) across two race groups (Black and White). B: Disparate impact (DI) across race groups. C: Equal opportunity difference across two gender groups (male and female). D: Disparate impact across gender groups. The reference value (gray dashed line) for fairness is 0 for EOD and 1.0 for DI.
Fig. 2.
A comparison of EODs and DIs for three ML models before and after debiasing. Error bars represent the standard deviation. A: Change in EOD across race groups after removing the protected attribute, resampling by size, and resampling by proportion, relative to the original (pre-debiasing) value for each model. B: Change in DI across race groups before and after debiasing. C: Change in EOD across gender groups before and after debiasing. D: Change in DI across gender groups before and after debiasing.
Fig. 3.
Fairness vs. AUROC before and after debiasing for the gender groups. Each point is the mean value over 10 data splits. The shape indicates the debiasing method used (or none), and the color indicates the ML model.
