Diagnostic Accuracy of Differential-Diagnosis Lists Generated by Generative Pretrained Transformer 3 Chatbot for Clinical Vignettes with Common Chief Complaints: A Pilot Study

Takanobu Hirosawa¹, Yukinori Harada¹, Masashi Yokose¹, Tetsu Sakamoto¹, Ren Kawamura¹, Taro Shimizu¹

PMID: 36834073
PMCID: PMC9967747
DOI: 10.3390/ijerph20043378

Int J Environ Res Public Health. 2023 Feb 15;20(4):3378. doi: 10.3390/ijerph20043378.

Affiliation

¹ Department of Diagnostic and Generalist Medicine, Dokkyo Medical University, Tochigi 321-0293, Japan.

Abstract

The diagnostic accuracy of differential diagnoses generated by artificial intelligence (AI) chatbots, including the generative pretrained transformer 3 (GPT-3) chatbot (ChatGPT-3), is unknown. This study evaluated the accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical vignettes with common chief complaints. General internal medicine physicians created clinical cases, correct diagnoses, and five differential diagnoses for ten common chief complaints. The rate of correct diagnosis by ChatGPT-3 within the ten differential-diagnosis lists was 28/30 (93.3%). The rate of correct diagnosis by physicians was still superior to that by ChatGPT-3 within the five differential-diagnosis lists (98.3% vs. 83.3%, p = 0.03). The rate of correct diagnosis by physicians was also superior to that by ChatGPT-3 in the top diagnosis (93.3% vs. 53.3%, p < 0.001). The rate of consistent differential diagnoses among physicians within the ten differential-diagnosis lists generated by ChatGPT-3 was 62/88 (70.5%). In summary, this study demonstrates the high diagnostic accuracy of differential-diagnosis lists generated by ChatGPT-3 for clinical cases with common chief complaints. This suggests that AI chatbots such as ChatGPT-3 can generate a well-differentiated diagnosis list for common chief complaints. However, the order of these lists can be improved in the future.
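
The two rates that the abstract reports as fractions (28/30 and 62/88) can be rechecked with a few lines of Python. This is only a minimal arithmetic sketch: the helper name `rate_percent` is hypothetical and not part of the study, and the block does not attempt to reproduce the reported p-values, since the abstract does not state which statistical test was used.

```python
def rate_percent(numerator: int, denominator: int) -> float:
    """Return numerator/denominator as a percentage rounded to one decimal place."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return round(100 * numerator / denominator, 1)


# Correct diagnosis by ChatGPT-3 within the ten-item lists (fraction from the abstract).
top10_rate = rate_percent(28, 30)

# Physician differential diagnoses that also appeared in the ChatGPT-3 ten-item lists.
consistency_rate = rate_percent(62, 88)

print(f"Correct diagnosis within ten-item lists: {top10_rate}%")
print(f"Consistent differential diagnoses: {consistency_rate}%")
```

Both values agree with the percentages given in the abstract (93.3% and 70.5%).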

Keywords: AI chatbot; artificial intelligence; clinical decision support; diagnosis; diagnostic accuracy; generative pretrained transformers; natural language processing.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

**Figure 2**
(a) Example of a differential-diagnosis list generated by ChatGPT-3. (b) Explanation of the differential-diagnosis list example generated by ChatGPT-3.

See this image and copyright information in PMC
