Enhanced Phenotype Identification of Common Ocular Diseases in Real-World Datasets
- PMID: 40212931
- PMCID: PMC11985028
- DOI: 10.1016/j.xops.2025.100717
Abstract
Objective: For studies using real-world data, accurately identifying patients with phenotypes of interest is challenging. To identify cohorts of interest, most studies exclusively use the International Classification of Diseases (ICD) billing codes, which can be limiting. We developed a method to accurately identify the presence or absence of 3 common ocular diseases (diabetic retinopathy [DR], age-related macular degeneration [AMD], and glaucoma) using electronic health record (EHR) data.
Design: Database study.
Participants: Three thousand nine hundred fourteen eyes from 1957 patients at 2 Sight OUtcomes Research CollaborativE (SOURCE) Ophthalmology Data Repository sites.
Methods: We developed enhanced phenotype identification (EPI) algorithms that search EHR fields, including eye examination findings, orders, charges, medication prescriptions, and surgery data, for evidence that a patient has glaucoma, DR, or AMD. We trained our EPI models using gold standard assessments of the EHR by ophthalmologists for the presence/absence of these conditions, compared the performance of our EPI models to models developed using ICD codes alone, and validated the performance of the models using data from another SOURCE site.
Main outcome measures: Area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (AUPRC), and model calibration.
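As an illustration of the outcome measures above, the following minimal Python sketch computes AUC, a step-wise AUPRC (average precision), and a Brier score for two sets of predictions against the same labels. It is not the authors' code: the function names, the toy labels, and the toy scores are hypothetical, and the abstract does not state how calibration was assessed, so the Brier score is only one assumed way to summarize it.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve; tied scores share one threshold."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp, ap, prev_recall = 0, 0.0, 0.0
    i = 0
    while i < len(order):
        j = i
        # Consume every example that shares the current score.
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            tp += y_true[order[j]]
            j += 1
        recall = tp / total_pos
        precision = tp / j  # j examples are predicted positive at this threshold
        ap += (recall - prev_recall) * precision
        prev_recall = recall
        i = j
    return ap


def brier_score(y_true: Sequence[int], probs: Sequence[float]) -> float:
    """Mean squared error of predicted probabilities (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Hypothetical toy data: 1 = condition confirmed on chart review, 0 = absent.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
epi_probs = [0.9, 0.8, 0.4, 0.5, 0.2, 0.1, 0.1, 0.05]  # made-up model probabilities
icd_flags = [1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0]   # made-up "ICD code present" flag

for name, scores in [("EPI (toy)", epi_probs), ("ICD-only (toy)", icd_flags)]:
    print(name, round(roc_auc(labels, scores), 3), round(average_precision(labels, scores), 3))
print("Brier (toy EPI):", round(brier_score(labels, epi_probs), 4))
```

When the condition is uncommon, AUPRC is more sensitive than AUC to false positives among the flagged cases, which may be why the abstract reports both measures.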
Results: The AUCs of our EPI models were better than ICD-only models for glaucoma (0.97 vs. 0.90), DR (0.997 vs. 0.98), and AMD (0.99 vs. 0.95). The AUPRCs of our EPI models were also much better than ICD-only models for glaucoma (0.79 vs. 0.32), DR (0.96 vs. 0.84), and AMD (0.74 vs. 0.55). When testing on patients from a second SOURCE site, the AUC and AUPRC for glaucoma (0.93, 0.74), DR (0.98, 0.77), and AMD (0.96, 0.64) were slightly worse than the primary site but still quite high. However, for all 3 conditions, model calibration was worse at the second site.
Conclusions: Leveraging machine learning, we developed EPI models to accurately identify most patients with glaucoma, DR, and AMD in real-world datasets. The EPI models significantly outperform ICD-only models in identifying patients confirmed to have these conditions. These findings underscore the potential of using comprehensive EHR data combined with advanced machine learning techniques to improve the accuracy of patient phenotype identification, leading to better patient management and clinical outcomes.
Financial disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
Keywords: Diabetic retinopathy; Electronic health records; Glaucoma; Machine learning; Macular degeneration.
© 2025 by the American Academy of Ophthalmology.