Bioinformatics. 2024 Mar 29;40(4):btae163. doi: 10.1093/bioinformatics/btae163.

Advancing entity recognition in biomedicine via instruction tuning of large language models

Vipina K Keloth et al. Bioinformatics.

Abstract

Motivation: Large Language Models (LLMs) have the potential to revolutionize the field of Natural Language Processing, excelling not only in text generation and reasoning tasks but also in zero-/few-shot learning, swiftly adapting to new tasks with minimal fine-tuning. LLMs have also demonstrated great promise in biomedical and healthcare applications. However, when it comes to Named Entity Recognition (NER), particularly within the biomedical domain, LLMs fall short of the effectiveness exhibited by fine-tuned domain-specific models. One key reason is that NER is typically conceptualized as a sequence labeling task, whereas LLMs are optimized for text generation and reasoning.
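
To make the contrast concrete, the sketch below (illustrative only; the sentence, tokens, and tag names are hypothetical and not taken from the paper) shows the same sentence framed as a BIO sequence-labeling target versus a generative target that an LLM can emit directly.

```python
# Illustrative contrast between the two NER framings (hypothetical example,
# not the paper's exact format).

sentence = "Mutations in BRCA1 are linked to breast cancer."
tokens = sentence.rstrip(".").split()

# Sequence-labeling view: one BIO tag per token (what BERT-style taggers predict).
bio_tags = ["O", "O", "B-Gene", "O", "O", "O", "B-Disease", "I-Disease"]
assert len(tokens) == len(bio_tags)

# Generative view: the model writes the entities out as text,
# which is the kind of output an instruction-tuned LLM produces naturally.
generative_target = "Gene: BRCA1 | Disease: breast cancer"

print(list(zip(tokens, bio_tags)))
print(generative_target)
```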

Results: We developed an instruction-based learning paradigm that transforms biomedical NER from a sequence labeling task into a generation task. This paradigm is end-to-end and streamlines training and evaluation by automatically repurposing pre-existing biomedical NER datasets. We further developed BioNER-LLaMA using the proposed paradigm with LLaMA-7B as the foundational LLM. We conducted extensive testing of BioNER-LLaMA across three widely recognized biomedical NER datasets covering entities related to diseases, chemicals, and genes. The results revealed that BioNER-LLaMA consistently achieved F1-scores 5%-30% higher than the few-shot learning performance of GPT-4 on datasets with different biomedical entities. We show that a general-domain LLM can match the performance of rigorously fine-tuned PubMedBERT models and PMC-LLaMA, a biomedical-specific language model. Our findings underscore the potential of the proposed paradigm for developing general-domain LLMs that can rival state-of-the-art performance in multi-task, multi-domain biomedical and health applications.
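
As a rough illustration of the automatic repurposing step, the snippet below converts a BIO-annotated record into an (instruction, input, output) triple suitable for instruction tuning; the field names and prompt wording are assumptions for illustration, not the exact template used by BioNER-LLaMA.

```python
# Minimal sketch of turning an existing BIO-annotated NER example into an
# instruction-tuning record. The prompt template and field names are
# hypothetical; the paper's actual template (see Figure 2) may differ.

def bio_to_entities(tokens, tags):
    """Collect entity surface forms from BIO tags."""
    entities, current = [], []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                entities.append(" ".join(current))
            current = [token]
        elif tag.startswith("I-") and current:
            current.append(token)
        else:
            if current:
                entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities

def to_instruction_record(tokens, tags, entity_type="disease"):
    sentence = " ".join(tokens)
    entities = bio_to_entities(tokens, tags)
    return {
        "instruction": f"Extract all {entity_type} mentions from the input sentence.",
        "input": sentence,
        "output": ", ".join(entities) if entities else "None",
    }

record = to_instruction_record(
    ["Patients", "with", "type", "2", "diabetes", "were", "enrolled", "."],
    ["O", "O", "B-Disease", "I-Disease", "I-Disease", "O", "O", "O"],
)
print(record)
```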

Availability and implementation: Datasets and other resources are available at https://github.com/BIDS-Xu-Lab/BioNER-LLaMA.


Conflict of interest statement

The authors do not have any conflicts of interest to disclose.

Figures

Figure 1. Framework for the development of instruction-tuned large language models utilizing existing NER datasets.

Figure 2. A prompt example used for instruction tuning to extract a disease mention (constructed from the NCBI disease dataset).

Figure 3. Error analysis on 100 samples of BioNER-LLaMA predictions.

Figure 4. Variations in precision, recall, and F1 values when the size of the training dataset is reduced to 10%, 25%, and 50% of the full training data for BioNER-LLaMA.
