Named Entity Recognition in Electronic Health Records: A Methodological Review

María C Durango¹, Ever A Torres-Silva¹, Andrés Orozco-Duque¹,²

Affiliations

¹ Grupo de Investigación e Innovación Biomédica, Instituto Tecnológico Metropolitano, Antioquia, Colombia.
² Facultad de Ingenierías, Universidad de Medellín, Antioquia, Colombia.

PMID: 37964451
PMCID: PMC10651400
DOI: 10.4258/hir.2023.29.4.286

María C Durango et al. Healthc Inform Res. 2023 Oct;29(4):286-300. doi: 10.4258/hir.2023.29.4.286. Epub 2023 Oct 31.

Abstract

Objectives: A substantial portion of the data contained in Electronic Health Records (EHR) is unstructured, often appearing as free text. This format restricts its potential utility in clinical decision-making. Named entity recognition (NER) methods address the challenge of extracting pertinent information from unstructured text. The aim of this study was to outline the current NER methods and trace their evolution from 2011 to 2022.

Methods: We conducted a methodological literature review of NER methods, with a focus on distinguishing the classification models, the types of tagging systems, and the languages employed in various corpora.

Results: Several methods have been documented for automatically extracting relevant information from EHRs using natural language processing techniques such as NER and relation extraction (RE). These methods can automatically extract concepts, events, attributes, and other data, as well as the relationships between them. Most NER studies conducted thus far have utilized corpora in English or Chinese. Additionally, the bidirectional encoder representations from transformers (BERT) architecture, combined with the BIO tagging system, is the most frequently reported classification scheme. We found only a limited number of papers on the implementation of NER or RE tasks in EHRs within a specific clinical domain.
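
As an illustration of the BIO tagging system mentioned above, the following minimal Python sketch groups token-level tags into entity mentions. In BIO tagging, "B-" marks the first token of an entity, "I-" marks a continuation token, and "O" marks a token outside any entity. The function name `bio_to_spans`, the example sentence, and the labels PROBLEM and TREATMENT are hypothetical and are not taken from the reviewed studies.

```python
def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) pairs."""
    spans = []
    current_type, current_tokens = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity; close any open one first.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open entity.
            current_tokens.append(token)
        else:
            # "O" or an inconsistent I- tag closes any open entity.
            if current_tokens:
                spans.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        spans.append((current_type, " ".join(current_tokens)))
    return spans


# Hypothetical clinical sentence with hypothetical entity labels.
tokens = ["Patient", "denies", "chest", "pain", "after", "starting", "metoprolol", "."]
tags = ["O", "O", "B-PROBLEM", "I-PROBLEM", "O", "O", "B-TREATMENT", "O"]
print(bio_to_spans(tokens, tags))
# [('PROBLEM', 'chest pain'), ('TREATMENT', 'metoprolol')]
```

In a typical pipeline, a sequence classifier such as a fine-tuned BERT model would predict one tag per token, and a decoding step of this kind would recover the entity mentions from those predictions.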

Conclusions: EHRs play a pivotal role in gathering clinical information and could serve as the primary source for automated clinical decision support systems. However, the creation of new corpora from EHRs in specific clinical domains is essential to facilitate the swift development of NER and RE models applied to EHRs for use in clinical practice.

Keywords: Clinical Decision Support System; Deep Learning; Electronic Health Records; Natural Language Processing; Supervised Machine Learning.

Conflict of interest statement

No potential conflict of interest relevant to this article was reported.

Figures

**Figure 1**
Flow diagram of the methodological review process. EHR: Electronic Health Record, NLP: natural language processing, NER: named entity recognition.

**Figure 2**
Timeline of named entity recognition models. ML: machine learning, LSTM: long short-term memory, BiLSTM: bidirectional long short-term memory, CNN: convolutional neural network, CRF: conditional random field, RNN: recurrent neural network, BiGRU: bidirectional gated recurrent unit, BERT: bidirectional encoder representations from transformers.

**Figure 3**
Named entity recognition approaches and types of tagging. GRU: gated recurrent unit, BiGRU: bidirectional gated recurrent unit, CNN: convolutional neural network, RNN: recurrent neural network, LSTM: long short-term memory, BiLSTM: bidirectional long short-term memory, ML: machine learning.

**Figure 4**
Corpus languages, types of models, and named entity recognition targets. ML: machine learning.
