Int J Environ Res Public Health. 2023 Feb 15;20(4):3378.
doi: 10.3390/ijerph20043378.

Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study

Takanobu Hirosawa et al. Int J Environ Res Public Health.

Abstract

The diagnostic accuracy of differential diagnoses generated by artificial intelligence (AI) chatbots, including the generative pretrained transformer 3 (GPT-3) chatbot (ChatGPT-3), is unknown. This study evaluated the accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical vignettes with common chief complaints. General internal medicine physicians created clinical cases, correct diagnoses, and five differential diagnoses for ten common chief complaints. The rate of correct diagnosis by ChatGPT-3 within the ten differential-diagnosis lists was 28/30 (93.3%). The rate of correct diagnosis by physicians was still superior to that by ChatGPT-3 within the five differential-diagnosis lists (98.3% vs. 83.3%, p = 0.03). The rate of correct diagnosis by physicians was also superior to that by ChatGPT-3 in the top diagnosis (93.3% vs. 53.3%, p < 0.001). The rate of consistent differential diagnoses among physicians within the ten differential-diagnosis lists generated by ChatGPT-3 was 62/88 (70.5%). In summary, this study demonstrates the high diagnostic accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical cases with common chief complaints. This suggests that AI chatbots such as ChatGPT-3 can generate a well-differentiated diagnosis list for common chief complaints. However, the order of these lists can be improved in the future.
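
The two rates that the abstract reports as raw counts (28/30 and 62/88) can be checked with a few lines of arithmetic. The following is a minimal sketch, not code from the study; the helper name `rate_percent` is hypothetical and introduced only for illustration.

```python
def rate_percent(correct: int, total: int) -> float:
    """Return correct/total as a percentage rounded to one decimal place.

    Hypothetical helper for illustration; it is not part of the study.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * correct / total, 1)


# Counts taken directly from the abstract.
top10_rate = rate_percent(28, 30)        # correct diagnosis within ten-item lists
consistency_rate = rate_percent(62, 88)  # differentials consistent with physicians

print(f"ChatGPT-3 correct within ten differentials: {top10_rate}%")
print(f"Consistent differential diagnoses: {consistency_rate}%")
```

Both values agree with the percentages stated in the abstract (93.3% and 70.5%). The abstract does not give raw counts for the physician comparisons, so those are not recomputed here.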

Keywords: AI chatbot; artificial intelligence; clinical decision support; diagnosis; diagnostic accuracy; generative pretrained transformers; natural language processing.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Study design.
Figure 2. (a) Example of a differential-diagnosis list generated by ChatGPT-3. (b) Explanation of the differential-diagnosis list example generated by ChatGPT-3.
