Int J Med Inform. 2022 Nov;167:104864. doi: 10.1016/j.ijmedinf.2022.104864. Epub 2022 Sep 16.

Leveraging weak supervision to perform named entity recognition in electronic health records progress notes to identify the ophthalmology exam


Sophia Y Wang et al. Int J Med Inform. 2022 Nov.

Abstract

Objective: To develop deep learning models to recognize ophthalmic examination components from clinical notes in electronic health records (EHR) using a weak supervision approach.

Methods: A corpus of 39,099 ophthalmology notes weakly labeled for 24 examination entities was assembled from the EHR of one academic center. Four pre-trained transformer-based language models (DistilBERT, BioBERT, BlueBERT, and ClinicalBERT) were fine-tuned for this named entity recognition task and compared against a baseline regular-expression model. Models were evaluated on the weakly labeled test dataset, a human-labeled sample of that set, and a human-labeled independent dataset.
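As a rough illustration of the fine-tuning setup described above, the sketch below uses the Hugging Face transformers token-classification API. The checkpoint name, BIO label scheme, toy training example, and hyperparameters are assumptions for illustration, not the authors' reported configuration.

```python
# Minimal sketch: fine-tuning a pretrained transformer for token-level NER.
# Assumptions (not from the paper): distilbert-base-uncased checkpoint, a BIO
# tagging scheme over the 24 exam entities, and a single toy training example.
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

MODEL_NAME = "distilbert-base-uncased"  # swap in BioBERT/BlueBERT/ClinicalBERT checkpoints
NUM_LABELS = 2 * 24 + 1                 # B-/I- tags for 24 entities plus "O"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForTokenClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)

# Toy weakly labeled note fragment standing in for the 39,099-note corpus.
enc = tokenizer("iop: 14 mmHg ou", return_tensors="pt")
labels = torch.full(enc["input_ids"].shape, -100)  # -100 = position ignored by the loss
labels[0, 1] = 1                                   # tag one token with an (arbitrary) entity id

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
loss = model(input_ids=enc["input_ids"],
             attention_mask=enc["attention_mask"],
             labels=labels).loss
loss.backward()   # one illustrative gradient step
optimizer.step()
```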

Results: On the weakly labeled test set, all transformer-based models had recall > 0.93, with precision varying from 0.815 to 0.843. The baseline model had lower recall (0.769) and precision (0.682). On the human-annotated sample, the baseline model had high recall (0.962, 95% CI 0.955-0.967) with variable precision across entities (0.081-0.999). BERT models had recall ranging from 0.771 to 0.831, and precision ≥ 0.973. On the independent dataset, precision was 0.926 and recall 0.458 for BlueBERT. The baseline model had better recall (0.708, 95% CI 0.674-0.738) but worse precision (0.399, 95% CI 0.352-0.451).
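For context on how such figures are typically computed, here is a minimal entity-level precision/recall calculation over BIO-tagged sequences using the seqeval library; the toy gold and predicted tags are invented, and this is not necessarily the authors' evaluation code.

```python
# Entity-level precision/recall over BIO tag sequences (toy data, one note).
from seqeval.metrics import classification_report, precision_score, recall_score

y_true = [["O", "B-VA", "I-VA", "O", "B-IOP"]]  # gold: one VA span, one IOP span
y_pred = [["O", "B-VA", "I-VA", "O", "O"]]      # predicted: the VA span only

print(precision_score(y_true, y_pred))        # 1.0 -> every predicted entity is correct
print(recall_score(y_true, y_pred))           # 0.5 -> one of two gold entities was found
print(classification_report(y_true, y_pred))  # per-entity breakdown
```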

Conclusion: We developed the first deep learning system to recognize eye examination components from clinical notes, leveraging a novel opportunity for weak supervision. Transformer-based models had high precision on human-annotated labels, whereas the baseline model had poor precision but higher recall. This system may be used to improve cohort and feature identification using free-text notes. Our weakly supervised approach may help amass large datasets of domain-specific entities from EHRs in many fields.

Keywords: Deep learning; Electronic health records; Named entity recognition; Natural language processing; Ophthalmology; Weak supervision.


Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Figure 1. Example SmartForm and corresponding clinical progress note
The leftmost panel shows the SmartForm template that clinicians use to enter text documenting different parts of the eye exam into discrete labeled fields. This information can then be imported via customizable templates into each clinician's progress notes. The progress notes are then stored in a research database. VA = visual acuity; sc = without correction (sine correctione); IOP = intra-ocular pressure; L/L = lids and lashes; C/S = conjunctiva and sclera; K = cornea; AC = anterior chamber; Ant Vit = anterior vitreous; HPI = history of present illness; f/u = follow-up.
Figure 2. Preprocessing pipeline for clinical progress notes and corresponding SmartForm entity labels
An example progress note and its corresponding SmartForm documentation are shown, along with the process by which SmartForm-labeled entities are assigned to individual words in the progress note. Notes with individual words labeled as entities are tokenized, split into shorter subdocuments, and WordPiece-tokenized as appropriate for input into each BERT model. Label Name: entity label describing a portion of the eye examination. Measurement: the measurement associated with an examination component. Token: a single element of text for computational processing. Labels: entity labels assigned to each token for computational processing. sleodll = slit lamp exam, right eye, lids and lashes; sleosll = slit lamp exam, left eye, lids and lashes; sleodcs = slit lamp exam, right eye, conjunctiva and sclera; sleoscs = slit lamp exam, left eye, conjunctiva and sclera.
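A minimal sketch of the label-alignment step this caption describes, assuming word-level weak labels are copied onto the first WordPiece of each word and continuation pieces are masked with -100 (a common convention, not necessarily the authors' exact policy); the example text and tags are invented, reusing the SmartForm label names above.

```python
# Propagate word-level weak labels onto WordPiece tokens for BERT input.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")

words = ["ll:", "normal", "cs:", "white", "and", "quiet"]  # invented exam text
word_labels = ["O", "B-sleodll", "O", "B-sleodcs", "I-sleodcs", "I-sleodcs"]
label2id = {lab: i for i, lab in enumerate(sorted(set(word_labels)))}

enc = tokenizer(words, is_split_into_words=True, truncation=True)

aligned, prev = [], None
for word_idx in enc.word_ids():
    if word_idx is None:            # special tokens like [CLS]/[SEP]
        aligned.append(-100)
    elif word_idx != prev:          # first subword carries the word's label
        aligned.append(label2id[word_labels[word_idx]])
    else:                           # continuation subwords are ignored by the loss
        aligned.append(-100)
    prev = word_idx

print(list(zip(enc.tokens(), aligned)))
```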
Figure 3. Examples of Common Types of BERT and Baseline Model Prediction Error
Examples of common model mistakes on text from the test set are given in the left column, with corresponding explanations on the right. Areas highlighted in yellow show the model prediction. Areas boxed in red indicate where the model made a mistake, either in the predicted label or by failing to recognize any entity.

References

    1. Vaswani A, Shazeer N, Parmar N, et al. Attention Is All You Need. arXiv [cs.CL]. 2017. http://arxiv.org/abs/1706.03762
    2. Lee J, Yoon W, Kim S, et al. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics. 2020;36:1234-40.
    3. Huang K, Altosaar J, Ranganath R. ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission. arXiv [cs.CL]. 2019. http://arxiv.org/abs/1904.05342
    4. Alsentzer E, Murphy JR, Boag W, et al. Publicly Available Clinical BERT Embeddings. arXiv [cs.CL]. 2019. http://arxiv.org/abs/1904.03323
    5. Peng Y, Yan S, Lu Z. Transfer Learning in Biomedical Natural Language Processing: An Evaluation of BERT and ELMo on Ten Benchmarking Datasets. arXiv [cs.CL]. 2019. http://arxiv.org/abs/1906.05474
