Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2024 Mar 1;87(3):259-260.
doi: 10.1097/JCMA.0000000000001064. Epub 2024 Feb 2.

Performance of GPT-4 and GPT-3.5 in generating accurate and comprehensive diagnoses across medical subspecialties

Affiliations

Performance of GPT-4 and GPT-3.5 in generating accurate and comprehensive diagnoses across medical subspecialties

Dik Wai Anderson Luk et al. J Chin Med Assoc. .

Abstract

Artificial intelligence has demonstrated a promising potential for diagnosing complex medical cases, with Generative Pre-Trained Transformer 4 (GPT-4) being the most recent advancement in this field. This study evaluated the diagnostic performance of the GPT-4 in comparison with that of its predecessor, GPT-3.5, using 81 complex medical case records from the New England Journal of Medicine . The cases were categorized as cognitive impairment, infectious disease, rheumatology, or drug reactions. The GPT-4 achieved a primary diagnostic accuracy of 38.3%, which improved to 71.6% when differential diagnoses were included. In 84.0% of cases, primary diagnoses were made by conducting investigations suggested by GPT-4. GPT-4 outperformed GPT-3.5 in all subspecialties except for drug reactions. GPT-4 demonstrated the highest performance in infectious diseases and drug reactions, whereas it underperformed in cases of cognitive impairment. These findings indicate that GPT-4 can provide reasonably accurate diagnoses, comprehensive differential diagnoses, and appropriate investigations. However, its performance varies across subspecialties.

PubMed Disclaimer

Conflict of interest statement

Conflicts of interest: The authors declare that they have no conflicts of interest related to the subject matter or materials discussed in this article.

References

    1. Lee P, Bubeck S, Petro J. Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. N Engl J Med 2023;388:1233–9.
    1. Kanjee Z, Crowe B, Rodman A. Accuracy of a generative artificial intelligence model in a complex diagnostic challenge. JAMA 2023;330:78–80.
    1. Shea YF, Lee CMY, Ip WCT, Luk DWA, Wong SSW. Use of GPT-4 to analyze medical records of patients with extensive investigations and delayed diagnosis. JAMA Netw Open 2023;6:e2325000.
    1. Shea YF, Ma NC. Limitations of GPT-4 in analyzing real-life medical notes related to cognitive impairment. Psychogeriatrics 2023;23:885–7.

MeSH terms

LinkOut - more resources