Ophthalmol Sci. 2025 Jan 24;5(4):100717.
doi: 10.1016/j.xops.2025.100717. eCollection 2025 Jul-Aug.

Enhanced Phenotype Identification of Common Ocular Diseases in Real-World Datasets

Joshua D Stein et al. Ophthalmol Sci.

Abstract

Objective: For studies using real-world data, accurately identifying patients with phenotypes of interest is challenging. To identify cohorts of interest, most studies exclusively use the International Classification of Diseases (ICD) billing codes, which can be limiting. We developed a method to accurately identify the presence or absence of 3 common ocular diseases (diabetic retinopathy [DR], age-related macular degeneration [AMD], and glaucoma) using electronic health record (EHR) data.

Design: Database study.

Participants: Three thousand nine hundred fourteen eyes from 1957 patients at 2 Sight OUtcomes Research CollaborativE (SOURCE) Ophthalmology Data Repository sites.

Methods: We developed enhanced phenotype identification (EPI) algorithms that search EHR fields, including eye examination findings, orders, charges, medication prescriptions, and surgery data, for evidence that a patient has glaucoma, DR, or AMD. We trained our EPI models using gold standard assessments of the EHR by ophthalmologists for the presence/absence of these conditions, compared the performance of our EPI models with that of models developed using ICD codes alone, and validated model performance using data from another SOURCE site.

Main outcome measures: Area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (AUPRC), and model calibration.
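The three outcome measures can be illustrated with a minimal sketch in plain Python. Everything here is an assumption made for illustration: the function names are hypothetical, the labels and predicted probabilities are made-up toy values rather than study data, and because the abstract does not say how calibration was assessed, the Brier score and calibration-in-the-large are shown as two common choices, not as the authors' method.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Tied scores count as half a win (the Mann-Whitney formulation).
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Step-wise estimate of the area under the precision-recall curve.

    Averages the precision observed at the rank of each true positive.
    Assumes no tied scores; ties would be kept in input order.
    """
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(y_prob, y_true), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / total_pos


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def calibration_in_the_large(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean predicted probability minus observed prevalence (0 is ideal)."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)


# Toy data (hypothetical): 1 = condition confirmed by a grader, 0 = absent.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_probs = [0.9, 0.2, 0.7, 0.6, 0.1, 0.4]

if __name__ == "__main__":
    print("AUC:", roc_auc(toy_labels, toy_probs))
    print("AUPRC:", average_precision(toy_labels, toy_probs))
    print("Brier score:", brier_score(toy_labels, toy_probs))
    print("Calibration-in-the-large:", calibration_in_the_large(toy_labels, toy_probs))
```

AUC depends only on how cases are ranked, whereas AUPRC is sensitive to how rare the condition is, which is one reason the two measures can diverge when a phenotype is uncommon. Calibration measures are separate again: a model can rank patients well and still produce probabilities that run systematically high or low.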

Results: The AUCs of our EPI models were higher than those of ICD-only models for glaucoma (0.97 vs. 0.90), DR (0.997 vs. 0.98), and AMD (0.99 vs. 0.95). The AUPRCs of our EPI models were also much higher than those of ICD-only models for glaucoma (0.79 vs. 0.32), DR (0.96 vs. 0.84), and AMD (0.74 vs. 0.55). When testing on patients from a second SOURCE site, the AUC and AUPRC for glaucoma (0.93, 0.74), DR (0.98, 0.77), and AMD (0.96, 0.64) were slightly lower than at the primary site but still high. However, for all 3 conditions, model calibration was worse at the second site.

Conclusions: Leveraging machine learning, we developed EPI models to accurately identify most patients with glaucoma, DR, and AMD in real-world datasets. The EPI models significantly outperform ICD-only models in identifying patients confirmed to have these conditions. These findings underscore the potential of using comprehensive EHR data combined with advanced machine learning techniques to improve the accuracy of patient phenotype identification, leading to better patient management and clinical outcomes.

Financial disclosures: Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.

Keywords: Diabetic retinopathy; Electronic health records; Glaucoma; Machine learning; Macular degeneration.


