Benchmarking large language models for biomedical natural language processing applications and recommendations
- PMID: 40188094
- PMCID: PMC11972378
- DOI: 10.1038/s41467-025-56989-2
Abstract
The rapid growth of biomedical literature poses challenges for manual knowledge curation and synthesis. Biomedical Natural Language Processing (BioNLP) automates the process. While Large Language Models (LLMs) have shown promise in general domains, their effectiveness in BioNLP tasks remains unclear due to limited benchmarks and practical guidelines. We perform a systematic evaluation of four LLMs (GPT and LLaMA representatives) on 12 BioNLP benchmarks across six applications. We compare their zero-shot, few-shot, and fine-tuning performance with the traditional fine-tuning of BERT or BART models. We also examine inconsistencies, missing information, and hallucinations, and perform a cost analysis. Here, we show that traditional fine-tuning outperforms zero- or few-shot LLMs in most tasks. However, closed-source LLMs like GPT-4 excel in reasoning-related tasks such as medical question answering. Open-source LLMs still require fine-tuning to close performance gaps. We find issues like missing information and hallucinations in LLM outputs. These results offer practical insights for applying LLMs in BioNLP.
© 2025. This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply.
Conflict of interest statement
Competing interests: Dr. Jingcheng Du and Dr. Hua Xu have research-related financial interests at Melax Technologies Inc. The remaining authors declare no competing interests.
Update of
- Benchmarking large language models for biomedical natural language processing applications and recommendations. ArXiv [Preprint]. 2025 Apr 25:arXiv:2305.16326v5. ArXiv. 2025. Update in: Nat Commun. 2025 Apr 06;16(1):3280. doi: 10.1038/s41467-025-56989-2. PMID: 41031069 Free PMC article. Updated. Preprint.
