Cureus. 2023 Jun 24;15(6):e40895. doi: 10.7759/cureus.40895. eCollection 2023 Jun.

ChatDoctor: A Medical Chat Model Fine-Tuned on a Large Language Model Meta-AI (LLaMA) Using Medical Domain Knowledge

Yunxiang Li et al. Cureus. 2023.

Abstract

Objective: The primary aim of this research was to address the limitations in the medical knowledge of prevalent large language models (LLMs), such as ChatGPT, by creating a specialized language model with enhanced accuracy in medical advice.

Methods: We achieved this by adapting and refining the Large Language Model Meta-AI (LLaMA) using a dataset of 100,000 patient-doctor dialogues sourced from a widely used online medical consultation platform. These conversations were cleaned and anonymized to address privacy concerns. In addition to refining the model, we incorporated a self-directed information retrieval mechanism that allows the model to access and use real-time information from online sources such as Wikipedia, as well as data from curated offline medical databases.

Results: Fine-tuning the model on real-world patient-doctor interactions significantly improved its ability to understand patient needs and provide informed advice. Equipping the model with self-directed information retrieval from reliable online and offline sources yielded substantial further improvements in the accuracy of its responses.

Conclusion: Our proposed ChatDoctor represents a meaningful advancement in medical LLMs, with markedly improved understanding of patient inquiries and more accurate advice. Given the high stakes and low error tolerance of the medical field, such gains in accuracy and reliability are not merely beneficial but essential.
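As a rough illustration of the fine-tuning step described in the Methods, the sketch below fine-tunes a LLaMA checkpoint on instruction-formatted patient-doctor pairs with the Hugging Face transformers Trainer. The checkpoint name, dataset path, prompt template, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

# Minimal fine-tuning sketch; checkpoint, data path, prompt template,
# and hyperparameters are assumptions for illustration.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

model_name = "decapoda-research/llama-7b-hf"  # assumed base checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Each record is assumed to hold a patient "question" and a doctor "answer".
raw = load_dataset("json", data_files="patient_doctor_dialogues.json")["train"]

def to_features(example):
    # Instruction-style prompt: patient query followed by the doctor's reply.
    text = ("Below is a patient's question. Write a doctor's response.\n\n"
            f"### Patient: {example['question']}\n"
            f"### Doctor: {example['answer']}{tokenizer.eos_token}")
    tokens = tokenizer(text, truncation=True, max_length=512,
                       padding="max_length")
    tokens["labels"] = tokens["input_ids"].copy()  # causal-LM objective
    return tokens

train_set = raw.map(to_features, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="chatdoctor-ft", num_train_epochs=3,
                           per_device_train_batch_size=4, learning_rate=2e-5,
                           bf16=True),
    train_dataset=train_set,
)
trainer.train()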

Keywords: ai chatbot; chat gpt; gpt; large language model; llama.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. A summary of the process of gathering the patient-physician conversation dataset and the steps involved in training the ChatDoctor model.
Figure 2. Overview of the autonomous ChatDoctor model based on information retrieval from an external knowledge brain.
Figure 3. Samples from our offline disease database, whose entries consist of symptoms, clinical test/treatment approaches, and medication suggestions.
Figure 4. Autonomous keyword extraction for information retrieval.
Figure 5. Autonomous information retrieval from the disease database through the prompt.
Figure 6. Instructing ChatDoctor to read the retrieved domain knowledge and provide a reliable answer (a sketch of the full retrieval loop follows the figure list).
Figure 7. Comparison between ChatGPT and the autonomous ChatDoctor on relatively new medical diseases/terms. ChatGPT cannot recognize the term Mpox (also known as monkeypox), while our ChatDoctor, with the help of the external knowledge brain, provides a precise answer about the relevant medical tests for Mpox.
Figure 8. Comparison between ChatGPT and the autonomous ChatDoctor. ChatGPT provided a more general answer about otitis, while ChatDoctor, with the help of the external knowledge brain, provided a more specialized response about treatments for otitis.
Figure 9. Comparison between ChatGPT and the autonomous ChatDoctor. ChatGPT is unfamiliar with the medication "Daybue", which received Food and Drug Administration (FDA) approval in early 2023. ChatDoctor, with the help of the external knowledge brain, accurately identified the purpose of Daybue (trofinetide).
Figure 10. Example 1: a patient suffering from a unilateral headache expressed concern about a potential association with a brain tumor. Our ChatDoctor accurately proposed sinusitis as a possible cause of the headache, mirroring the diagnosis provided by the physician from iCliniq. ChatGPT, on the other hand, failed to deliver a congruent interpretation of the root cause of the one-sided headache.
Figure 11. Example 2: a patient reported having had a white lump in their throat for several months and expressed concern about potential cancer. All three sources (iCliniq, ChatGPT, and ChatDoctor) suggested that the patient could be dealing with abnormally enlarged lymph nodes. Both iCliniq and ChatDoctor additionally recommended a biopsy and radiological diagnosis if initial treatments proved unsuccessful, whereas ChatGPT's response was limited to advising the patient to consult an Ear, Nose, and Throat (ENT) specialist.
Figure 12. Example 3: a patient reported sharp back pain during exercise that intensified with breathing and rotation of the torso or neck, and was unsure whether urgent medical attention was necessary. ChatDoctor's answer was closer to iCliniq's than ChatGPT's.
Figure 13. Example 4: a patient experienced blurred vision and was particularly concerned about the health of their left eye. Taking into account the patient's history of retinal detachment, all three sources (iCliniq, ChatGPT, and ChatDoctor) advised seeking professional consultation with an ophthalmologist for comprehensive assessment and prompt treatment. Owing to its limitations in providing medical diagnoses and advice, ChatGPT did not speculate on the cause of the diminished vision, whereas both iCliniq and ChatDoctor identified retinal detachment or bleeding as potential issues.
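Taken together, Figures 4-6 describe a three-step retrieval loop: extract keywords from the patient's message, match them against the external knowledge brain (Wikipedia or the offline disease database), and answer with the retrieved text in context. The sketch below illustrates that loop; the prompt wording, the database schema, and the chat() helper are assumptions for illustration, not the authors' exact implementation.

# Sketch of the autonomous retrieval loop (Figures 4-6). Prompt wording,
# database schema, and chat() are illustrative assumptions; chat() stands
# for a completion call to the fine-tuned ChatDoctor model.
import json

def chat(prompt: str) -> str:
    """Placeholder for a completion call to the fine-tuned model."""
    raise NotImplementedError

def extract_keywords(question: str) -> list[str]:
    # Step 1 (Figure 4): have the model extract search keywords itself.
    reply = chat("Extract the medical keywords from the following question, "
                 f"separated by commas:\n{question}")
    return [k.strip() for k in reply.split(",") if k.strip()]

def retrieve(keywords: list[str], db_path: str = "disease_database.json") -> str:
    # Step 2 (Figure 5): match keywords against the offline disease database,
    # whose entries hold symptoms, tests/treatments, and medications (Figure 3).
    with open(db_path) as f:
        database = json.load(f)  # assumed: a list of dicts with string fields
    hits = [entry for entry in database
            if any(k.lower() in entry["disease"].lower()
                   or k.lower() in entry["symptoms"].lower()
                   for k in keywords)]
    return "\n".join(json.dumps(h) for h in hits[:3])  # keep the context short

def answer(question: str) -> str:
    # Step 3 (Figure 6): instruct the model to read the retrieved knowledge
    # and ground its reply in it.
    knowledge = retrieve(extract_keywords(question))
    return chat("Use the following medical knowledge to answer reliably.\n"
                f"Knowledge:\n{knowledge}\n\nQuestion: {question}\nAnswer:")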

