Int J Med Inform. 2022 Aug:164:104805.
doi: 10.1016/j.ijmedinf.2022.104805. Epub 2022 May 25.

Evaluation of clinical named entity recognition methods for Serbian electronic health records

Aleksandar Kaplar et al. Int J Med Inform. 2022 Aug.

Abstract

Background and objectives: The importance of clinical natural language processing (NLP) has increased with the adoption of electronic health records (EHRs). One of the critical tasks in clinical NLP is named entity recognition (NER). Clinical NER in the Serbian language is a severely under-researched area. The few approaches that have been proposed so far are based on rules or machine-learning models with hand-crafted features, while current state-of-the-art models have not been explored. The objective of this paper is to assess the performance of state-of-the-art NER methods on clinical narratives in the Serbian language.

Materials and methods: We designed an experimental setup for a comprehensive evaluation of state-of-the-art NER models. The gold standard corpus used for the evaluation consists of discharge summaries from the Clinic for Nephrology at the University Clinical Center of Serbia. The following models were evaluated: conditional random fields (CRF), multilingual transformers (BERT Multilingual and XLM RoBERTa), long short-term memory (LSTM) recurrent neural networks, and ensembles of these models. In addition, we investigated the necessity of the pretraining task for transformer-based models and the use of pretrained word embeddings with the LSTM model.

Results: Among the individual models, CRF had the best precision, the pretrained BERT Multilingual model had the best recall, and the LSTM model had the best F1 score. The best overall performance was achieved by combining the existing models in a majority voting ensemble, which reached an F1 score of 0.892. These results are similar to the inter-annotator agreement on our gold standard corpus and are comparable to existing state-of-the-art results for clinical NER reported in the literature.
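Illustrative sketch (not from the paper): the abstract does not describe how the majority voting ensemble was implemented, so the following Python snippet only shows one plausible token-level form of the idea. The function name `majority_vote`, the tie-breaking rule (fall back to one designated model), and the entity labels `B-DRUG` and `B-SYMPTOM` are all assumptions made for the example.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]], fallback: int = 0) -> List[str]:
    """Combine per-token labels from several NER models by majority vote.

    predictions[m][t] is the label that model m assigned to token t.
    If no label wins a strict majority for a token, the label from the
    model at index `fallback` is used (an assumed tie-breaking rule).
    """
    if not predictions:
        return []
    n_tokens = len(predictions[0])
    if any(len(p) != n_tokens for p in predictions):
        raise ValueError("all models must label the same number of tokens")

    voted: List[str] = []
    for t in range(n_tokens):
        votes = [p[t] for p in predictions]
        label, count = Counter(votes).most_common(1)[0]
        # Accept the top label only if more than half of the models agree.
        if count * 2 > len(votes):
            voted.append(label)
        else:
            voted.append(votes[fallback])
    return voted


# Hypothetical outputs of three models on a four-token sentence.
crf = ["O", "B-DRUG", "O", "O"]
bert = ["O", "B-DRUG", "B-SYMPTOM", "O"]
lstm = ["O", "B-DRUG", "B-SYMPTOM", "B-DRUG"]

ensemble = majority_vote([crf, bert, lstm])
print(ensemble)
```

Voting on each token independently can produce label sequences that are not valid under a BIO scheme, so a real pipeline would likely need a repair step or span-level voting; whether the authors did either is not stated in the abstract.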

Conclusion: Existing state-of-the-art models can, without major modifications, provide viable results for clinical named entity recognition in languages with the complexity of Serbian.

Keywords: BERT; Clinical named entity recognition; Electronic health records; Serbian language; Transformers.
