JAMA. 2023 Jul 3;330(1):78-80.
doi: 10.1001/jama.2023.8288.

Accuracy of a Generative Artificial Intelligence Model in a Complex Diagnostic Challenge

Zahir Kanjee et al. JAMA. 2023.
No abstract available

Plain language summary

This study assesses the diagnostic accuracy of the Generative Pre-trained Transformer 4 (GPT-4) artificial intelligence (AI) model in a series of challenging cases.

Conflict of interest statement

Conflict of Interest Disclosures: Dr Kanjee reported receipt of royalties for books edited and membership on a paid advisory board for medical education products not related to AI from Wolters Kluwer, as well as honoraria for continuing medical education delivered from Oakstone Publishing. Dr Crowe reported employment with Solera Health. No other disclosures were reported.

Figures

Figure. Performance of Generative Pre-trained Transformer 4 (GPT-4)
Histogram of GPT-4’s performance. Performance scale scores (Bond et al): 5 = the actual diagnosis was suggested in the differential; 4 = the suggestions included something very close, but not exact; 3 = the suggestions included something closely related that might have been helpful; 2 = the suggestions included something related, but unlikely to be helpful; 0 = no suggestions close to the target diagnosis. (The scale does not contain a score of 1.)
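
The scoring scale described in the caption can be sketched as a small lookup table with a tally function that produces the counts behind a histogram like this one. This is a minimal illustration, not the authors' analysis code: the names `BOND_SCALE` and `tally_scores` are hypothetical, and the example scores are invented for demonstration and are not data from the study.

```python
from collections import Counter

# Scale levels as described in the figure caption (Bond et al).
# Note that the scale has no score of 1.
BOND_SCALE = {
    5: "the actual diagnosis was suggested in the differential",
    4: "the suggestions included something very close, but not exact",
    3: "the suggestions included something closely related that might have been helpful",
    2: "the suggestions included something related, but unlikely to be helpful",
    0: "no suggestions close to the target diagnosis",
}


def tally_scores(scores):
    """Count cases per scale level, rejecting values that are not on the scale."""
    invalid = [s for s in scores if s not in BOND_SCALE]
    if invalid:
        raise ValueError(f"scores not on the scale: {invalid}")
    counts = Counter(scores)
    # Report every level, including levels with zero cases, highest first.
    return {level: counts.get(level, 0) for level in sorted(BOND_SCALE, reverse=True)}


# Hypothetical scores for illustration only; NOT results from the study.
example_scores = [5, 5, 4, 0, 3, 5, 2]
print(tally_scores(example_scores))
```

Rejecting a score of 1 explicitly guards against data-entry errors, since that value is not defined on the scale.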

References

    1. Kung TH, Cheatham M, Medenilla A, et al. Performance of ChatGPT on USMLE: potential for AI-assisted medical education using large language models. PLOS Digit Health. 2023;2(2):e0000198. doi:10.1371/journal.pdig.0000198
    2. Bond WF, Schwartz LM, Weaver KR, Levick D, Giuliano M, Graber ML. Differential diagnosis generators: an evaluation of currently available computer programs. J Gen Intern Med. 2012;27(2):213-219. doi:10.1007/s11606-011-1804-8
    3. Fritz P, Kleinhans A, Raoufi R, et al. Evaluation of medical decision support systems (DDX generators) using real medical cases of varying complexity and origin. BMC Med Inform Decis Mak. 2022;22(1):254. doi:10.1186/s12911-022-01988-2
    4. Ledley RS, Lusted LB. Reasoning foundations of medical diagnosis; symbolic logic, probability, and value theory aid our understanding of how physicians reason. Science. 1959;130(3366):9-21. doi:10.1126/science.130.3366.9
    5. Dorr DA, Adams L, Embí P. Harnessing the promise of artificial intelligence responsibly. JAMA. 2023;329(16):1347-1348. doi:10.1001/jama.2023.2771
