JAMIA Open. 2025 Aug 10;8(4):ooaf080.
doi: 10.1093/jamiaopen/ooaf080. eCollection 2025 Aug.

Ensemble learning to enhance accurate identification of patients with glaucoma using electronic health records


Tushar Mungle et al. JAMIA Open. 2025.

Abstract

Objectives: Existing ophthalmology studies that identify clinical phenotypes in real-world datasets (RWD) rely exclusively on structured data elements (SDE). We evaluated the performance, generalizability, and fairness of multimodal ensemble models that integrate real-world SDE and free-text data, compared with SDE-only models, for identifying patients with glaucoma.

Materials and methods: This is a retrospective cross-sectional study involving two health systems, University of Michigan (UoM) and Stanford University (SU), and 1728 patients who visited eye clinics between 2012 and 2021. Free-text embeddings extracted using BioClinicalBERT were combined with SDE. EditedNearestNeighbours (ENN) undersampling and Borderline Synthetic Minority Oversampling Technique (bSMOTE) were used to address class imbalance. Lasso Regression (LR), Random Forest (RF), and Support Vector Classifier (SVC) models were trained on imbalanced (imb) and resampled UoM data, along with a bagging ensemble method. Models were externally validated on SU data. Fairness was assessed using equalized odds difference (EOD) and target probability difference (TPD).
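The two resampling steps named above can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the abstract does not report neighbourhood sizes or which library was used (imbalanced-learn, for example, provides `EditedNearestNeighbours` and `BorderlineSMOTE` classes). The sketch assumes each row of `X` is already a multimodal feature vector (SDE concatenated with a BioClinicalBERT note embedding), uses Euclidean distance, a majority-vote ENN rule with `k=3`, and Borderline-SMOTE variant 1 with `m=5`; the function names and toy data are hypothetical.

```python
import math
import random
from collections import Counter


def _neighbours(X, i, k, pool=None):
    """Return indices of the k points in `pool` nearest to X[i] (Euclidean), excluding i."""
    candidates = range(len(X)) if pool is None else pool
    others = [j for j in candidates if j != i]
    others.sort(key=lambda j: math.dist(X[i], X[j]))
    return others[:k]


def edited_nearest_neighbours(X, y, majority, k=3):
    """Hypothetical helper: drop majority-class samples whose k nearest
    neighbours vote for a different label. Minority samples are always kept."""
    keep = []
    for i, label in enumerate(y):
        if label != majority:
            keep.append(i)
            continue
        votes = Counter(y[j] for j in _neighbours(X, i, k))
        if votes.most_common(1)[0][0] == label:
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def borderline_smote(X, y, minority, n_new, m=5, k=3, seed=0):
    """Hypothetical helper (Borderline-SMOTE variant 1): synthesize minority
    samples only near the class boundary, interpolating towards minority neighbours."""
    rng = random.Random(seed)
    minority_idx = [i for i, label in enumerate(y) if label == minority]
    # "Danger" samples: at least half, but not all, of the m nearest neighbours
    # are majority samples (all-majority neighbourhoods are treated as noise).
    danger = [
        i for i in minority_idx
        if m / 2 <= sum(y[j] != minority for j in _neighbours(X, i, m)) < m
    ]
    synthetic = []
    for _ in range(n_new):
        if not danger:
            break
        i = rng.choice(danger)
        nbrs = _neighbours(X, i, k, pool=minority_idx)
        if not nbrs:
            break
        j = rng.choice(nbrs)
        gap = rng.random()  # uniform in [0, 1): position along the segment X[i] -> X[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(X[i], X[j])])
    return list(X) + synthetic, list(y) + [minority] * len(synthetic)


# Toy 2-D stand-ins for multimodal feature vectors (hypothetical data; 1 = glaucoma).
X = [[0, 0], [0, 1], [1, 0], [1, 1], [5, 5], [5, 6], [6, 5], [5.5, 5.5]]
y = [0, 0, 0, 0, 1, 1, 1, 0]
X_enn, y_enn = edited_nearest_neighbours(X, y, majority=0)
X_bs, y_bs = borderline_smote(X, y, minority=1, n_new=3)
print(len(X_enn), len(X_bs))  # → 7 11
```

Each function returns its own resampled training set, mirroring the separate LRENN and LRbSMOTE models in the abstract. Resampling is normally applied to the training split only, so that evaluation data keep their real class balance.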

Results: Of the 900 UoM and 828 SU patients, 10% and 23%, respectively, had glaucoma as confirmed by ophthalmologists. At UoM, multimodal LRimb achieved a higher F1 (F1 = 76.60 [61.90-88.89]; AUROC = 95.41 [87.01-99.63]) than unimodal RFimb (F1 = 69.77 [52.94-83.64]; AUROC = 97.72 [95.95-99.18]) and the ICD-coding method (F1 = 53.01 [39.51-65.43]; AUROC = 90.10 [84.59-93.93]). Bagging (BM = LRENN + LRbSMOTE) further improved performance, achieving an F1 of 83.02 [70.59-92.86] and an AUROC of 97.59 [92.98-99.88]. During external validation, BM achieved the highest F1 (68.47 [62.61-73.75]), outperforming the unimodal (F1 = 51.26 [43.80-58.13]) and multimodal LRimb (F1 = 62.46 [55.95-68.24]) models. BM showed lower EOD disparities for sex (<0.1), race (<0.5), and ethnicity (<0.5), and the least uncertainty in TPD analysis compared with traditional models.
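The abstract reports F1 as a percentage and fairness as EOD but does not spell out the formulas. The sketch below assumes the common definitions: F1 = 2TP / (2TP + FP + FN), and EOD as the larger of the between-group gaps in true-positive rate and false-positive rate (the worst-case definition used, for example, by Fairlearn's `equalized_odds_difference`). The function names and toy labels are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels coded 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    return tp, fp, fn, tn


def f1(y_true, y_pred):
    """F1 = 2TP / (2TP + FP + FN); defined as 0 when there are no true positives."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def equalized_odds_difference(y_true, y_pred, groups):
    """Worst-case gap in true-positive rate or false-positive rate across
    demographic groups (0 means every group has identical error rates)."""
    tprs, fprs = [], []
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        tp, fp, fn, tn = confusion_counts([y_true[i] for i in idx],
                                          [y_pred[i] for i in idx])
        if tp + fn == 0 or fp + tn == 0:
            raise ValueError(f"group {g!r} needs both positive and negative cases")
        tprs.append(tp / (tp + fn))
        fprs.append(fp / (fp + tn))
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


# Hypothetical labels: the model misses one glaucoma case in group "M" only.
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
sex = ["F", "F", "F", "F", "M", "M", "M", "M"]
print(round(100 * f1(y_true, y_pred), 2))  # → 85.71
print(equalized_odds_difference(y_true, y_pred, sex))  # → 0.5
```

Group "F" has a true-positive rate of 1.0 and group "M" 0.5, while both have a false-positive rate of 0, so the EOD is driven entirely by the 0.5 gap in sensitivity.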

Discussion: Multimodal ensemble models integrating structured and unstructured EHR data outperformed traditional SDE-only models while achieving fair predictions across demographic subgroups. Among ensemble methods, bagging demonstrated better generalizability than stacking, particularly when training data were limited.
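The abstract writes the bagging model as BM = LRENN + LRbSMOTE but does not state how the two members' outputs are combined. A minimal sketch, assuming simple averaging of each patient's predicted glaucoma probability (soft voting) with a 0.5 threshold: unlike stacking, which fits an additional meta-model on base-model outputs and therefore consumes extra training data, averaging has no parameters to fit, which is one plausible reason it may generalize better when training data are limited. The function name and probabilities are hypothetical.

```python
def average_ensemble(member_probs, threshold=0.5):
    """Average per-patient probabilities across ensemble members, then threshold.

    member_probs: one list of predicted probabilities per base model,
    all scoring the same patients in the same order.
    """
    n_models = len(member_probs)
    n_patients = len(member_probs[0])
    if any(len(p) != n_patients for p in member_probs):
        raise ValueError("all members must score the same patients")
    avg = [sum(p[i] for p in member_probs) / n_models for i in range(n_patients)]
    return avg, [int(a >= threshold) for a in avg]


# Hypothetical outputs of an ENN-trained and a bSMOTE-trained LR model.
p_lr_enn = [0.90, 0.40, 0.20]
p_lr_bsmote = [0.70, 0.70, 0.10]
probs, labels = average_ensemble([p_lr_enn, p_lr_bsmote])
print(labels)  # → [1, 1, 0]
```

Here the second patient falls below the threshold for the ENN-trained model alone (0.40) but is flagged by the averaged ensemble (about 0.55).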

Conclusion: This approach can enhance phenotype discovery and enable future research using RWD, leading to better patient management and clinical outcomes.

Keywords: class imbalance; clinical notes; ensemble learning; generalizability and fairness; real-world data.


Conflict of interest statement

None declared.

Figures

Figure 1. Overall methodology for the study.
Figure 2. Model interpretability and explainability illustrating the contribution of various features to distinguishing patients with and without glaucoma, for multimodal LR models trained with different sampling techniques on internal data (University of Michigan). (A) and (B) show the model coefficients and SHAP values for LRENN, respectively; (C) and (D) show the model coefficients and SHAP values for LRbSMOTE, respectively. LR: Lasso Regression; ENN: EditedNearestNeighbours; bSMOTE: Borderline Synthetic Minority Oversampling Technique; SHAP: SHapley Additive exPlanations; *“No pathology, optic nerve” captures information such as “no disc hemorrhage.”
Figure 3. Comparison of fairness for various models using target probability difference (TPD) across demographic subgroups. UoM: University of Michigan; SU: Stanford University; LR: Lasso Regression; ENN: EditedNearestNeighbours; bSMOTE: Borderline Synthetic Minority Oversampling Technique. Note: The extension of the violin plots below 0 or above 1 does not represent probability values below 0 or above 1; it is an artifact of the kernel density estimation used to show the data distribution.
