Clinical concept annotation with contextual word embedding in active transfer learning environment
- PMID: 39711738
- PMCID: PMC11660282
- DOI: 10.1177/20552076241308987
Abstract
Objective: This study presents an active learning approach that automatically extracts clinical concepts from unstructured data and classifies them into explicit categories (Problem, Treatment, and Test) while preserving high precision and recall. The approach is demonstrated through experiments on the public i2b2 datasets.
Methods: Initial labeled data are acquired with a lexical-based approach in amounts sufficient to start the active learning process. A contextual word embedding similarity approach, using BERT-based models such as ClinicalBERT, DistilBERT, and SCIBERT, then automatically classifies unlabeled clinical concepts into the explicit categories. Additionally, deep learning models and large language models (LLMs) are trained on the labeled data acquired through active learning.
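As an illustration of the embedding similarity step described in the Methods, the following is a minimal sketch, not the authors' implementation. It assumes each category is represented by a single prototype vector and that an unlabeled concept takes the label of the most similar prototype by cosine similarity. The three-dimensional vectors, the `classify_concept` helper, and the `threshold` parameter are hypothetical stand-ins; in practice the vectors would come from a model such as ClinicalBERT or SCIBERT.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


def classify_concept(concept_vec, prototypes, threshold=0.5):
    """Return (label, score) for the most similar category prototype.

    The label is None when the best score falls below `threshold`
    (an assumed cutoff, not taken from the paper), so that uncertain
    concepts can be routed to a human annotator instead.
    """
    scores = {
        label: cosine_similarity(concept_vec, vec)
        for label, vec in prototypes.items()
    }
    best = max(scores, key=scores.get)
    if scores[best] < threshold:
        return None, scores[best]
    return best, scores[best]


# Toy 3-dimensional vectors standing in for real contextual embeddings.
prototypes = {
    "Problem": np.array([1.0, 0.1, 0.0]),
    "Treatment": np.array([0.0, 1.0, 0.1]),
    "Test": np.array([0.1, 0.0, 1.0]),
}

unlabeled = np.array([0.9, 0.2, 0.1])
label, score = classify_concept(unlabeled, prototypes)
print(label, round(score, 3))
```

Returning `None` below the cutoff is one simple way to decide which concepts are sent back for manual labeling in an active learning loop; the abstract does not state which selection rule the study used.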
Results: On the i2b2 datasets (426 clinical notes), the lexical-based method achieved precision, recall, and F1-score of 76%, 70%, and 73%, respectively. SCIBERT performed best in active transfer learning, yielding a precision of 70.84%, recall of 77.40%, F1-score of 73.97%, and accuracy of 69.30%, surpassing the other models compared. Among deep learning models, convolutional neural networks (CNNs) trained with embeddings (BERTBase, DistilBERT, SCIBERT, ClinicalBERT) achieved training accuracies of 92-95% and testing accuracies of 89-93%, higher than the other deep learning models. The LLMs were also evaluated individually; among them, ClinicalBERT achieved the highest performance, with a training accuracy of 98.4% and a testing accuracy of 96%.
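The reported F1-scores can be checked against the reported precision and recall values, since F1 is the harmonic mean of the two. The short sketch below does that arithmetic; the `f1_score` helper is written here for illustration and is not taken from the paper.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Lexical-based method: precision 76%, recall 70%.
lexical_f1 = f1_score(0.76, 0.70)

# SCIBERT in active transfer learning: precision 70.84%, recall 77.40%.
scibert_f1 = f1_score(0.7084, 0.7740)

print(f"lexical F1: {lexical_f1:.2%}")  # about 72.88%, which rounds to the reported 73%
print(f"SCIBERT F1: {scibert_f1:.2%}")  # about 73.97%, matching the reported value
```

Both computed values agree with the figures stated in the Results.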
Conclusions: The proposed methodology enhances clinical concept extraction by integrating active learning with models such as SCIBERT and CNNs. It improves annotation efficiency while maintaining high accuracy, showing potential for clinical applications.
Keywords: Clinical concept extraction; active transfer learning; clinical concept annotation; contextual word embedding; information extraction; large language models.
© The Author(s) 2024.
Conflict of interest statement
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.