J Am Med Inform Assoc. 2024 Sep 1;31(9):1821-1832. doi: 10.1093/jamia/ocae122.

BioInstruct: instruction tuning of large language models for biomedical natural language processing


Hieu Tran et al. J Am Med Inform Assoc. 2024.

Abstract

Objectives: To enhance the performance of large language models (LLMs) in biomedical natural language processing (BioNLP) by introducing a domain-specific instruction dataset and examining its impact when combined with multi-task learning principles.

Materials and methods: We created BioInstruct, a dataset comprising 25 005 instructions, to instruction-tune LLMs (LLaMA 1 and 2, 7B and 13B versions). The instructions were created by prompting the GPT-4 language model with 3 seed samples randomly drawn from a pool of 80 human-curated instructions. We employed Low-Rank Adaptation (LoRA) for parameter-efficient fine-tuning. We then evaluated these instruction-tuned LLMs on several BioNLP tasks, which can be grouped into 3 major categories: question answering (QA), information extraction (IE), and text generation (GEN). We also examined whether the categories (eg, QA, IE, and generation) of instructions affected model performance.
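As a rough illustration of the parameter-efficient fine-tuning step described above, the Python sketch below attaches a LoRA adapter to a LLaMA 2 7B checkpoint using the Hugging Face transformers and peft libraries. It is a minimal sketch, not the authors' code: the model identifier, adapter rank, scaling factor, dropout, target modules, and prompt format are illustrative assumptions rather than values reported in the paper.

    # Minimal LoRA setup sketch (assumed hyperparameters; not the authors' code).
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base_model_id = "meta-llama/Llama-2-7b-hf"  # assumed base checkpoint
    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    model = AutoModelForCausalLM.from_pretrained(base_model_id)

    # LoRA trains small low-rank update matrices instead of the full weights.
    lora_config = LoraConfig(
        r=16,                                 # adapter rank (assumed)
        lora_alpha=32,                        # scaling factor (assumed)
        lora_dropout=0.05,                    # dropout on adapter inputs (assumed)
        target_modules=["q_proj", "v_proj"],  # attention projections to adapt (assumed)
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # reports the small trainable fraction

    # Each BioInstruct example would then be rendered as a single training string,
    # eg an instruction/input/response template, and passed to a standard
    # causal-language-model trainer such as transformers.Trainer.

In a setup like this only the adapter weights are updated, which is what makes fine-tuning 7B- and 13B-parameter models tractable with modest compute.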

Results and discussion: Compared with LLMs that were not instruction-tuned, our instruction-tuned LLMs demonstrated marked performance gains: 17.3% in QA on the average accuracy metric, 5.7% in IE on the average F1 metric, and 96% in generation tasks on the average GPT-4 score metric. Our 7B-parameter instruction-tuned LLaMA 1 model was competitive with, or even surpassed, other LLMs in the biomedical domain that were also fine-tuned from LLaMA 1 with vast domain-specific data or a variety of tasks. Our results also show that the performance gain is significantly higher when instruction fine-tuning is conducted with closely related tasks. These findings align with observations from multi-task learning, suggesting synergies between the 2 tasks.

Conclusion: The BioInstruct dataset serves as a valuable resource, and instruction-tuned LLMs lead to the best-performing BioNLP applications.

Keywords: information extraction; instruction tuning; large language models; multi-task learning; natural language inference; question answering; text generation.


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1.
Distribution of our BioInstruct dataset. (A) Task type distribution of 25 005 natural language instructions. (B) The top 20 most common root verbs (inner circle) and their top 4 direct noun objects (outer circle) in the generated instructions.
Figure 2.
Performance of different tasks in BioInstruct. Each scatter plot corresponds to an evaluation subtask. Each colored dot within a plot represents a different training task. The black dot represents the baseline performance of LLaMA 2 7B without BioInstruct fine-tuning. The purple dot represents the performance of LLaMA 2 7B fine-tuned on all BioInstruct tasks. We then ablate BioInstruct. Above each plot, the first row gives the best single fine-tuning task; the second row gives the best fine-tuning task when combined with task A, where task A is the same as the evaluation task.
Figure 3.
Performance on different evaluation tasks when LLaMA 2 7B is fine-tuned on a varying number of instruction samples from BioInstruct.
