This is a preprint.
OphthaBERT: Automated Glaucoma Diagnosis from Clinical Notes
- PMID: 40585161
- PMCID: PMC12204413
- DOI: 10.1101/2025.06.08.25329151
Abstract
Glaucoma is a leading cause of irreversible blindness worldwide, making early intervention crucial. Research into the underpinnings of glaucoma often relies on electronic health records (EHRs) to identify patients with glaucoma and their subtypes. However, current methods for identifying glaucoma patients from EHRs are often inaccurate or infeasible at scale, relying on International Classification of Diseases (ICD) codes or manual chart reviews. To address this limitation, we introduce (1) OphthaBERT, a powerful general clinical ophthalmology language model trained on over 2 million diverse clinical notes, and (2) a fine-tuned variant of OphthaBERT that automatically extracts binary and subtype glaucoma diagnoses from clinical notes. The base OphthaBERT model is a robust encoder, outperforming state-of-the-art clinical encoders in masked token prediction on out-of-distribution ophthalmology clinical notes and in binary glaucoma classification with limited data. We report significant binary classification performance improvements in low-data regimes (p < 0.001, Bonferroni corrected). OphthaBERT also achieves superior classification performance for both binary and subtype diagnosis, outperforming even fine-tuned large decoder-only language models at a fraction of the computational cost. We demonstrate a 0.23-point increase in macro-F1 for subtype diagnosis over ICD codes and strong binary classification performance when externally validated at Wilmer Eye Institute. OphthaBERT provides an interpretable, equitable framework for general ophthalmology language modeling and automated glaucoma diagnosis.
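The abstract reports a 0.23-point macro-F1 improvement for subtype diagnosis over ICD codes. Macro-F1 averages the per-class F1 scores, so rare glaucoma subtypes weigh as much as common ones. A minimal sketch of the metric (the subtype labels below are illustrative, not the paper's data):

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of per-class F1 scores,
    so each class counts equally regardless of prevalence."""
    classes = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1_scores.append(
            2 * precision * recall / (precision + recall)
            if precision + recall else 0.0
        )
    return sum(f1_scores) / len(f1_scores)

# Illustrative subtype predictions (hypothetical, not from the paper):
y_true = ["POAG", "POAG", "PACG", "NTG"]
y_pred = ["POAG", "PACG", "PACG", "NTG"]
score = macro_f1(y_true, y_pred)  # mean of per-class F1 over NTG, PACG, POAG
```

Because the average is unweighted, a classifier that ignores an uncommon subtype is penalized heavily, which is why macro-F1 (rather than accuracy) is the natural headline metric for subtype diagnosis.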
Conflict of interest statement
N.Z. receives consulting fees from Sanofi.