Do LLMs Surpass Encoders for Biomedical NER?
- PMID: 40787150
- PMCID: PMC12335919
- DOI: 10.1109/ICHI64645.2025.00048
Abstract
Recognizing spans of biomedical concepts and their types (e.g., drug or gene) in free text, often called biomedical named entity recognition (NER), is a basic component of information extraction (IE) pipelines. Without a strong NER component, other applications, such as knowledge discovery and information retrieval, are not practical. The state of the art in NER has shifted from traditional ML models to deep neural networks, with transformer-based encoder models (e.g., BERT) emerging as the current standard. However, decoder models (also called large language models or LLMs) are gaining traction in IE. LLM-driven NER often ignores positional information owing to the generative nature of decoder models, and LLMs are computationally very expensive (in both inference time and hardware needs). Hence, it is worth exploring whether they actually excel at biomedical NER and assessing the associated trade-offs (performance vs. efficiency). This is exactly what we do in this effort, employing the same BIO entity tagging scheme (which retains positional information) on five different datasets with varying proportions of longer entities. Our results show that the chosen LLMs (Mistral and Llama, 8B range) often outperform the best encoder models (BERT-(un)cased, BiomedBERT, and DeBERTav3, 300M range) by 2-8% in F-score, except on one dataset, where they equal encoder performance. This gain is more prominent among longer entities of length ≥ 3 tokens. However, LLMs are one to two orders of magnitude more expensive at inference time and may need cost-prohibitive hardware. Thus, when performance differences are small or real-time user feedback is needed, encoder models may still be more suitable than LLMs.
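To make the BIO tagging scheme and the entity-length analysis concrete, here is a minimal sketch of how BIO-tagged tokens map to entity spans whose lengths can then be bucketed (e.g., the ≥ 3-token entities the abstract highlights). The tokens, entity types, and the helper `bio_to_spans` are illustrative assumptions, not code or data from the paper.

```python
def bio_to_spans(tokens, tags):
    """Collect (entity_text, entity_type, length_in_tokens) spans
    from a BIO-tagged token sequence (B- begins a span, I- continues
    a span of the same type, O is outside any entity)."""
    spans, current = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append(current)
            current = ([tok], tag[2:])           # start a new span
        elif tag.startswith("I-") and current and tag[2:] == current[1]:
            current[0].append(tok)               # extend the open span
        else:                                    # O tag or malformed I-
            if current:
                spans.append(current)
            current = None
    if current:
        spans.append(current)
    return [(" ".join(toks), etype, len(toks)) for toks, etype in spans]

# Hypothetical biomedical sentence with gene and drug mentions.
tokens = ["Epidermal", "growth", "factor", "receptor", "mutations",
          "respond", "to", "gefitinib"]
tags = ["B-Gene", "I-Gene", "I-Gene", "I-Gene", "O", "O", "O", "B-Drug"]

spans = bio_to_spans(tokens, tags)
# spans → [("Epidermal growth factor receptor", "Gene", 4),
#          ("gefitinib", "Drug", 1)]

# The "longer entities" the paper stratifies by: length ≥ 3 tokens.
long_entities = [s for s in spans if s[2] >= 3]
```

Because each span keeps its token position and length, span-level F-scores can be computed separately for short and long entities, which is the comparison driving the paper's length-stratified results.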
Keywords: encoder models; large language models; named entity recognition.