Leveraging weak supervision to perform named entity recognition in electronic health records progress notes to identify the ophthalmology exam
- PMID: 36179600
- PMCID: PMC9901505
- DOI: 10.1016/j.ijmedinf.2022.104864
Abstract
Objective: To develop deep learning models to recognize ophthalmic examination components from clinical notes in electronic health records (EHR) using a weak supervision approach.
Methods: A corpus of 39,099 ophthalmology notes weakly labeled for 24 examination entities was assembled from the EHR of one academic center. Four pre-trained transformer-based language models (DistilBert, BioBert, BlueBert, and ClinicalBert) were fine-tuned to this named entity recognition task and compared to a baseline regular expression model. Models were evaluated on the weakly labeled test dataset, a human-labeled sample of that set, and a human-labeled independent dataset.
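The abstract does not describe how the weak labels were produced. As an illustration only, the following minimal sketch assumes that each note comes with structured exam field values whose text can be located in the note and turned into token-level BIO tags for fine-tuning. The function name `weak_bio_labels`, the entity names `LENS` and `CORNEA`, and the sample note are hypothetical and are not taken from the paper.

```python
import re

def weak_bio_labels(note, field_values):
    """Tag whitespace tokens with BIO labels by matching known field values.

    Hypothetical helper: assumes each entity's value appears verbatim in the note.
    """
    # Whitespace tokens with their character offsets: (start, end, text).
    tokens = [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", note)]
    labels = ["O"] * len(tokens)
    for entity, value in field_values.items():
        # Literal, case-insensitive match that may not start or end inside a
        # word, so that "clear" does not match inside "nuclear".
        pattern = r"(?<!\w)" + re.escape(value) + r"(?!\w)"
        for m in re.finditer(pattern, note, flags=re.IGNORECASE):
            inside = False
            for i, (start, end, _) in enumerate(tokens):
                # Label every still-unlabeled token that overlaps the match.
                if start < m.end() and end > m.start() and labels[i] == "O":
                    labels[i] = ("I-" if inside else "B-") + entity
                    inside = True
    return [(tok, lab) for (_, _, tok), lab in zip(tokens, labels)]

note = "Lens: 2+ nuclear sclerosis OU. Cornea: clear."
fields = {"LENS": "2+ nuclear sclerosis", "CORNEA": "clear"}
tagged = weak_bio_labels(note, fields)
for token, label in tagged:
    print(f"{token}\t{label}")
```

Labels produced this way are noisy by construction (for example, punctuation stays attached to tokens), which is one reason to also evaluate against human-annotated samples, as the study does.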
Results: On the weakly labeled test set, all transformer-based models had recall > 0.93, with precision ranging from 0.815 to 0.843. The baseline model had lower recall (0.769) and precision (0.682). On the human-annotated sample, the baseline model had high recall (0.962, 95% CI 0.955-0.967) with variable precision across entities (0.081-0.999). The Bert models had recall ranging from 0.771 to 0.831 and precision ≥ 0.973. On the independent dataset, BlueBert had a precision of 0.926 and a recall of 0.458. The baseline model had better recall (0.708, 95% CI 0.674-0.738) but worse precision (0.399, 95% CI 0.352-0.451).
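For readers less familiar with these metrics, the sketch below shows how precision, recall, and a 95% confidence interval for a proportion can be computed from entity-level counts. The abstract does not state which interval method the authors used; the Wilson score interval here is an assumption chosen for illustration, and the counts are made up rather than taken from the paper.

```python
import math

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    if n == 0:
        return 0.0, 1.0
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    # Clamp to [0, 1]: a proportion cannot fall outside this range.
    return max(0.0, center - half), min(1.0, center + half)

# Hypothetical counts for one entity type.
tp, fp, fn = 80, 20, 40
precision, recall = precision_recall(tp, fp, fn)
precision_ci = wilson_interval(tp, tp + fp)
recall_ci = wilson_interval(tp, tp + fn)
print(f"precision={precision:.3f}, 95% CI {precision_ci[0]:.3f}-{precision_ci[1]:.3f}")
print(f"recall={recall:.3f}, 95% CI {recall_ci[0]:.3f}-{recall_ci[1]:.3f}")
```

Because both bounds are clamped to the unit interval, this method never yields a negative lower bound or an upper bound above 1.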
Conclusion: We developed the first deep learning system to recognize eye examination components from clinical notes, leveraging a novel opportunity for weak supervision. Transformer-based models had high precision on human-annotated labels, whereas the baseline model had poor precision but higher recall. This system may be used to improve cohort and feature identification using free-text notes. Our weakly supervised approach may help amass large datasets of domain-specific entities from EHRs in many fields.
Keywords: Deep learning; Electronic health records; Named entity recognition; Natural language processing; Ophthalmology; Weak supervision.
Copyright © 2022 Elsevier B.V. All rights reserved.
Conflict of interest statement
Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.