J Am Med Inform Assoc. 2024 Sep 1;31(9):1833-1843. doi: 10.1093/jamia/ocae045.

PMC-LLaMA: toward building open-source language models for medicine


Chaoyi Wu et al. J Am Med Inform Assoc. 2024.

Abstract

Objective: Recently, large language models (LLMs) have showcased remarkable capabilities in natural language understanding. While demonstrating proficiency in everyday conversations and question-answering (QA) situations, these models frequently struggle in domains that require precision, such as medical applications, due to their lack of domain-specific knowledge. In this article, we describe the procedure for building a powerful, open-source language model specifically designed for medical applications, termed PMC-LLaMA.

Materials and methods: We adapt a general-purpose LLM to the medical domain through data-centric knowledge injection, integrating 4.8M biomedical academic papers and 30K medical textbooks, followed by comprehensive domain-specific instruction fine-tuning covering medical QA, reasoning rationales, and conversational dialogues, totaling 202M tokens.
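The article's training code is not reproduced here, but the two-stage flow described above can be sketched with the Hugging Face transformers API. Everything below is an illustrative assumption rather than the authors' implementation: the LLaMA-2 checkpoint name, the JSONL corpus files with "text", "instruction", and "response" fields, the prompt template, and the hyperparameters; the distributed-training and memory details needed for a 13B model are omitted.

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "meta-llama/Llama-2-13b-hf"  # assumed general-purpose backbone
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)

# Stage 1: data-centric knowledge injection, i.e. continued causal-LM training
# on biomedical papers and textbooks (the file name is a placeholder).
corpus = load_dataset(
    "json", data_files={"train": "papers_and_textbooks.jsonl"}
)["train"].map(tokenize, batched=True, remove_columns=["text"])

Trainer(
    model=model,
    args=TrainingArguments("stage1_knowledge_injection",
                           per_device_train_batch_size=1, num_train_epochs=1),
    train_dataset=corpus,
    data_collator=collator,
).train()

# Stage 2: medical-specific instruction tuning on QA, reasoning rationales,
# and dialogue, formatted with a simple instruction/response template.
def to_prompt(example):
    return {"text": f"### Instruction:\n{example['instruction']}\n"
                    f"### Response:\n{example['response']}"}

instructions = (
    load_dataset("json", data_files={"train": "medical_instructions.jsonl"})["train"]
    .map(to_prompt)
    .map(tokenize, batched=True,
         remove_columns=["instruction", "response", "text"])
)

Trainer(
    model=model,
    args=TrainingArguments("stage2_instruction_tuning",
                           per_device_train_batch_size=1, num_train_epochs=3),
    train_dataset=instructions,
    data_collator=collator,
).train()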

Results: In evaluations on various public medical QA benchmarks and in manual ratings, our lightweight PMC-LLaMA, with only 13B parameters, exhibits superior performance, even surpassing ChatGPT. All models, codes, and datasets for instruction tuning will be released to the research community.
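The abstract does not spell out the benchmark scoring protocol; as a hedged illustration only, one common way to evaluate a causal LM on multiple-choice medical QA is to pick the answer option to which the model assigns the highest likelihood. The checkpoint name and prompt format below are assumptions, not the authors' evaluation harness.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "axiong/PMC_LLaMA_13B"  # assumed checkpoint name, for illustration
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

def sequence_log_prob(text):
    """Total log-probability the model assigns to a text (teacher forcing)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    return log_probs.gather(-1, ids[:, 1:].unsqueeze(-1)).sum().item()

def predict(question, options):
    """Pick the option (e.g., {"A": ..., "B": ...}) with the highest likelihood.

    The shared question prefix scores identically across options; length
    normalization is a common refinement not shown here.
    """
    return max(
        options,
        key=lambda k: sequence_log_prob(f"Question: {question}\nAnswer: {options[k]}"),
    )

Benchmark accuracy is then the fraction of items for which the predicted option matches the gold answer.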

Discussion: Our contributions are 3-fold: (1) we build an open-source LLM for the medical domain. We believe the proposed PMC-LLaMA model can promote further development of foundation models in medicine, serving as a trainable generative language backbone for medical applications; (2) we conduct thorough ablation studies to demonstrate the effectiveness of each proposed component and to show how different training data and model scales affect medical LLMs; (3) we contribute a large-scale, comprehensive dataset for instruction tuning.

Conclusion: In this article, we systematically investigate the process of building an open-source, medical-specific LLM, PMC-LLaMA.

Keywords: ChatGPT; biomedical NLP; generative language models; large language models.


Conflict of interest statement

None declared.

Figures

Figure 1.
On the left, a general comparison of PMC-LLaMA with LLaMA-2 and GPT-3.5. On the right, a comparison of model sizes: PMC-LLaMA is much smaller than the others.
Figure 2.
Distribution of medical textbook categories. Box sizes denote the number of books in each category.
Figure 3.
The training pipeline of PMC-LLaMA. The training flow is separated into 2 parts: data-centric knowledge injection and medical-specific instruction tuning. For knowledge injection, we collect 4.8M biomedical academic papers and 30K medical books to inject knowledge into LLaMA. The instruction-tuning stage mainly considers 3 aspects: medical conversation, medical rationale question answering, and knowledge graphs, containing 202M tokens in total.
Figure 4.
Examples of 3 instruction prompting cases from PMC-LLaMA and GPT-3.5. (A) compares their responses to a patient's query, where PMC-LLaMA proposes more concrete suggestions. (B) probes microbiology knowledge; PMC-LLaMA analyzes both correct and incorrect options, enhancing the comprehensiveness of the analysis. (C) examines the models' grasp of pharmacology, where they respond with roughly equivalent answers. The correct options are marked in bold.
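For readers who want to reproduce prompting cases like those in Figure 4, the following minimal inference sketch shows one way to query an instruction-tuned checkpoint. The repository name, prompt template, and example query are assumptions for illustration, not taken from the article.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "axiong/PMC_LLaMA_13B"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Hypothetical patient query in an assumed instruction/response template.
prompt = (
    "### Instruction:\n"
    "A patient reports persistent heartburn after meals. "
    "What lifestyle changes would you suggest?\n"
    "### Response:\n"
)
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:],
                       skip_special_tokens=True))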
