Performance of GPT-4 and GPT-3.5 in generating accurate and comprehensive diagnoses across medical subspecialties
- PMID: 38305423
- DOI: 10.1097/JCMA.0000000000001064
Abstract
Artificial intelligence has shown promising potential for diagnosing complex medical cases, and Generative Pre-Trained Transformer 4 (GPT-4) is the most recent advance in this field. This study compared the diagnostic performance of GPT-4 with that of its predecessor, GPT-3.5, using 81 complex medical case records from the New England Journal of Medicine. The cases were categorized as cognitive impairment, infectious disease, rheumatology, or drug reactions. GPT-4 achieved a primary diagnostic accuracy of 38.3%, which rose to 71.6% when differential diagnoses were included. In 84.0% of cases, the primary diagnosis could be reached by conducting the investigations suggested by GPT-4. GPT-4 outperformed GPT-3.5 in all subspecialties except drug reactions. GPT-4 performed best in infectious diseases and drug reactions and underperformed in cognitive impairment cases. These findings indicate that GPT-4 can provide reasonably accurate diagnoses, comprehensive differential diagnoses, and appropriate investigations, although its performance varies across subspecialties.
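The abstract reports three case-level rates over the same 81 cases: primary-diagnosis accuracy, accuracy including differentials, and the share of cases where the suggested investigations led to the diagnosis. The Python sketch below shows one way such rates could be tallied. It is an illustration, not the authors' scoring method: the `CaseResult` fields and the helper names are hypothetical. The per-case counts (31, 58, and 68 of 81) are not taken from the paper. They are back-calculated from the published percentages, and each is the only integer count out of 81 that rounds to the reported value.

```python
from dataclasses import dataclass


@dataclass
class CaseResult:
    """One case scored against its final diagnosis (hypothetical fields)."""
    primary_correct: bool           # primary diagnosis matched the final diagnosis
    in_differential: bool           # final diagnosis appeared in the differential list
    investigations_led_to_dx: bool  # suggested investigations would reach the diagnosis


def percent(count: int, total: int) -> float:
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * count / total, 1)


def counts_consistent_with(pct: float, total: int) -> list[int]:
    """All integer counts out of `total` that round to the reported percentage."""
    return [k for k in range(total + 1) if percent(k, total) == pct]


def summarize(results: list[CaseResult]) -> dict[str, float]:
    n = len(results)
    primary = sum(r.primary_correct for r in results)
    # "Including differentials": correct if the primary diagnosis OR any
    # listed differential matches the final diagnosis.
    with_diff = sum(r.primary_correct or r.in_differential for r in results)
    investigations = sum(r.investigations_led_to_dx for r in results)
    return {
        "primary": percent(primary, n),
        "including_differentials": percent(with_diff, n),
        "investigations": percent(investigations, n),
    }


def synthetic_cases(n: int, primary: int, with_diff: int, investigations: int) -> list[CaseResult]:
    """Synthetic case list matching given counts; NOT the study's per-case data."""
    return [
        CaseResult(
            primary_correct=i < primary,
            in_differential=i < with_diff,
            investigations_led_to_dx=i < investigations,
        )
        for i in range(n)
    ]


if __name__ == "__main__":
    for pct in (38.3, 71.6, 84.0):
        print(pct, counts_consistent_with(pct, 81))
    print(summarize(synthetic_cases(81, 31, 58, 68)))
```

Treating "including differentials" as a superset of primary accuracy (the `or` in `summarize`) is an assumption that matches the abstract's wording "improved to 71.6% when differential diagnoses were included."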
Copyright © 2024, the Chinese Medical Association.
Conflict of interest statement
Conflicts of interest: The authors declare that they have no conflicts of interest related to the subject matter or materials discussed in this article.