Ensemble learning to enhance accurate identification of patients with glaucoma using electronic health records

Affiliations

¹ Department of Medicine, Stanford University, Stanford, CA 94305, United States.
² Department of Ophthalmology and Visual Sciences, University of Michigan, Ann Arbor, MI 48105, United States.
³ Department of Ophthalmology, Byers Eye Institute, Stanford University, Stanford, CA 94305, United States.
⁴ Department of Health Management and Policy, School of Public Health, University of Michigan, Ann Arbor, MI 48109, United States.

PMID: 40799932
PMCID: PMC12342940
DOI: 10.1093/jamiaopen/ooaf080

Tushar Mungle et al. JAMIA Open. 2025 Aug 10;8(4):ooaf080. doi: 10.1093/jamiaopen/ooaf080. eCollection 2025 Aug.

Abstract

Objectives: Existing ophthalmology studies on clinical phenotype identification in real-world datasets (RWD) rely exclusively on structured data elements (SDE). We evaluated the performance, generalizability, and fairness of multimodal ensemble models that integrate real-world SDE and free-text data, compared with SDE-only models, for identifying patients with glaucoma.

Materials and methods: This retrospective cross-sectional study involved 2 health systems, University of Michigan (UoM) and Stanford University (SU), and included 1728 patients who visited eye clinics during 2012-2021. Free-text embeddings extracted using BioClinicalBERT were combined with SDE. EditedNearestNeighbours (ENN) undersampling and Borderline Synthetic Minority Over-sampling Technique (bSMOTE) were used to address class imbalance. Lasso Regression (LR), Random Forest (RF), and Support Vector Classifier (SVC) models were trained on imbalanced (imb) and resampled UoM data, along with a bagging ensemble method. Models were externally validated with SU data. Fairness was assessed using equalized odds difference (EOD) and Target Probability Difference (TPD).
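The fusion and resampling steps above can be illustrated with a minimal, dependency-free sketch. It assumes a binary label (1 = glaucoma, the minority class), that each note has already been turned into a fixed-length vector (for example a pooled BioClinicalBERT embedding, which is not computed here), and simple early fusion by concatenation. ENN follows the "drop a majority row unless all k neighbours agree" rule, and Borderline-SMOTE follows the variant that oversamples only "danger" minority rows. The function names and the k and m defaults are illustrative assumptions, not the authors' code or settings.

```python
import math
import random


def fuse(sde_features, note_embedding):
    """Early fusion: concatenate structured features with a note embedding.
    ASSUMPTION: the embedding is precomputed (e.g. pooled BioClinicalBERT)."""
    return list(sde_features) + list(note_embedding)


def nearest(X, i, k, pool=None):
    """Indices of the k nearest neighbours of X[i] (Euclidean distance),
    drawn from `pool` (default: all rows), excluding i itself."""
    candidates = [j for j in (range(len(X)) if pool is None else pool) if j != i]
    candidates.sort(key=lambda j: math.dist(X[i], X[j]))
    return candidates[:k]


def edited_nearest_neighbours(X, y, k=3, minority=1):
    """ENN undersampling: keep every minority row; keep a majority row only
    if all of its k nearest neighbours share its label."""
    keep = [
        i for i in range(len(X))
        if y[i] == minority or all(y[j] == y[i] for j in nearest(X, i, k))
    ]
    return [X[i] for i in keep], [y[i] for i in keep]


def borderline_smote(X, y, n_new, k=5, m=10, minority=1, seed=0):
    """Borderline-SMOTE: synthesize minority rows only around 'danger' rows,
    i.e. minority rows whose m nearest neighbours are at least half, but not
    all, majority class. ASSUMPTION: k and m defaults are illustrative."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    danger = [
        i for i in minority_idx
        if m / 2 <= sum(y[j] != minority for j in nearest(X, i, m)) < m
    ]
    if not danger or len(minority_idx) < 2:
        return list(X), list(y)  # nothing to interpolate between
    synthetic = []
    for _ in range(n_new):
        i = rng.choice(danger)
        # Partner is one of the k nearest *minority* neighbours of row i.
        j = rng.choice(nearest(X, i, k, pool=minority_idx))
        gap = rng.random()  # position along the segment X[i] -> X[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return list(X) + synthetic, list(y) + [minority] * n_new
```

In keeping with standard practice, resampling would be applied only to the training data (here, UoM), with ENN and bSMOTE used separately to produce the two training sets behind LR_ENN and LR_bSMOTE; the external SU validation data are left as-is.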

Results: Among the 900 UoM and 828 SU patients, 10% and 23%, respectively, had glaucoma as confirmed by ophthalmologists. At UoM, multimodal LR_imb (F1 = 76.60 [61.90-88.89]; AUROC = 95.41 [87.01-99.63]) outperformed unimodal RF_imb (F1 = 69.77 [52.94-83.64]; AUROC = 97.72 [95.95-99.18]) and the ICD-coding method (F1 = 53.01 [39.51-65.43]; AUROC = 90.10 [84.59-93.93]) in terms of F1. Bagging (BM = LR_ENN + LR_bSMOTE) further improved performance, achieving an F1 of 83.02 [70.59-92.86] and an AUROC of 97.59 [92.98-99.88]. During external validation, BM achieved the highest F1 (68.47 [62.61-73.75]), outperforming unimodal (F1 = 51.26 [43.80-58.13]) and multimodal LR_imb (F1 = 62.46 [55.95-68.24]). EOD for BM revealed lower disparities for sex (<0.1), race (<0.5), and ethnicity (<0.5), and BM showed the least uncertainty in TPD analysis compared with traditional models.
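The abstract describes BM as a combination of LR_ENN and LR_bSMOTE but does not state how their outputs are merged. The sketch below assumes soft voting (averaging the two models' predicted probabilities) with a 0.5 cutoff, then computes F1 and EOD from first principles. EOD here follows the common definition (used, for example, by Fairlearn's `equalized_odds_difference`): the larger of the between-group gap in true-positive rate and the between-group gap in false-positive rate, so 0 means parity. Function names are hypothetical; note the abstract reports F1 and AUROC on a 0-100 scale.

```python
def soft_vote(*prob_lists):
    """Average positive-class probabilities from several fitted models.
    ASSUMPTION: the paper's exact combination rule is not given in the abstract."""
    return [sum(ps) / len(ps) for ps in zip(*prob_lists)]


def threshold(probs, cutoff=0.5):
    """Convert probabilities to 0/1 labels (ASSUMPTION: 0.5 cutoff)."""
    return [int(p >= cutoff) for p in probs]


def _counts(y_true, y_pred):
    """Confusion-matrix counts (tp, fp, fn, tn) for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def f1_score(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN) for the glaucoma (positive) class."""
    tp, fp, fn, _ = _counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def tpr_fpr(y_true, y_pred):
    """True- and false-positive rates; simplification: 0.0 if a denominator is 0."""
    tp, fp, fn, tn = _counts(y_true, y_pred)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def equalized_odds_difference(y_true, y_pred, groups):
    """Largest between-group gap in TPR or FPR across demographic groups."""
    rates = []
    for g in set(groups):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates.append(tpr_fpr([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    tprs, fprs = zip(*rates)
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))
```

For instance, if one sex group has TPR 1.0 and FPR 0.0 while the other has TPR 0.5 and FPR 0.5, the EOD is 0.5; the abstract's "<0.1" for sex means both gaps stayed below 0.1.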

Discussion: Multimodal ensemble models integrating structured and unstructured EHR data outperformed traditional SDE-only models while achieving fair predictions across demographic subgroups. Among ensemble methods, bagging demonstrated better generalizability than stacking, particularly when training data were limited.

Conclusion: This approach can enhance phenotype discovery to enable future research studies using RWD, leading to better patient management and clinical outcomes.

Keywords: class imbalance; clinical notes; ensemble learning; generalizability and fairness; real-world data.

Conflict of interest statement

None declared.

Figures

**Figure 1.**
Overall methodology for the study.

**Figure 2.**
Model interpretability and explainability, illustrating the contribution of various features in distinguishing patients with and without glaucoma for multimodal LR models trained with different sampling techniques on internal data (University of Michigan). (A) and (B) represent the model coefficients and SHAP values for LR_ENN, respectively; (C) and (D) represent the model coefficients and SHAP values for LR_bSMOTE, respectively. LR: Lasso Regression; ENN: EditedNearestNeighbours; bSMOTE: Borderline Synthetic-Minority Oversampling Technique; SHAP: SHapley Additive exPlanations; *“No pathology, optic nerve” captures information such as “no disc hemorrhage.”

**Figure 3.**
Comparison of fairness for various models using target probability difference for demographic subgroups. UoM: University of Michigan; SU: Stanford University; LR: Lasso Regression; ENN: EditedNearestNeighbours; bSMOTE: Borderline Synthetic-Minority Oversampling Technique. Note: The extension of a violin plot below 0 or above 1 does not indicate probabilities outside the 0-1 range; it is an artifact of the kernel density estimation used to depict the data distribution.
