Comparative Study

JAMA Intern Med. 2024 May 1;184(5):581-583. doi: 10.1001/jamainternmed.2024.0295.

Clinical Reasoning of a Generative Artificial Intelligence Model Compared With Physicians

Stephanie Cabral et al. JAMA Intern Med.
No abstract available

Plain language summary

This cross-sectional study compares the ability of a large language model to process medical data and display clinical reasoning with that of attending physicians and residents.


Conflict of interest statement

Conflict of Interest Disclosures: Dr Kanjee reported receiving personal fees from Wolters Kluwer and Oakstone Publishing outside the submitted work. Dr Crowe reported being an employee of Solera Health outside the submitted work. Dr Abdulnour reported being an employee of Massachusetts Medical Society, which owns NEJM Healer, an interactive application that teaches clinical reasoning, during the conduct of the study. Dr Rodman reported receiving funding from the Gordon and Betty Moore Foundation. No other disclosures were reported.

Figures

Figure. Distribution of Revised-IDEA (R-IDEA) Scores
Distribution of R-IDEA scores is displayed across 232 sections, stratified by respondent type: chatbot (n = 80), attending physicians (n = 80), and residents (n = 72). The dashed vertical line delineates the low score (0-7) and high score (8-10) categories used in the regression analysis. For visual clarity, scores of 0 were aggregated with scores of 1.

