NPJ Digit Med. 2024 Jan 24;7(1):20.
doi: 10.1038/s41746-024-01010-1.

Diagnostic reasoning prompts reveal the potential for large language model interpretability in medicine

Thomas Savage et al. NPJ Digit Med.

Abstract

One of the major barriers to using large language models (LLMs) in medicine is the perception that they use uninterpretable methods to make clinical decisions, methods that are inherently different from the cognitive processes of clinicians. In this manuscript, we develop diagnostic reasoning prompts to study whether LLMs can imitate clinical reasoning while accurately forming a diagnosis. We find that GPT-4 can be prompted to mimic the common clinical reasoning processes of clinicians without sacrificing diagnostic accuracy. This is significant because an LLM that can imitate clinical reasoning to provide an interpretable rationale offers physicians a means to evaluate whether an LLM's response is likely correct and can be trusted for patient care. Prompting methods that use diagnostic reasoning have the potential to mitigate the "black box" limitations of LLMs, bringing them one step closer to safe and effective use in medicine.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. GPT-3.5 CoT and diagnostic reasoning rationale examples.
Example GPT-3.5 rationales responding to a traditional CoT prompt as well as diagnostic reasoning prompts. LLM response and rationale results for the entire test set can be found in Supplementary Information 1.
Fig. 2. GPT-4 CoT and diagnostic reasoning rationale examples.
Example GPT-4 rationales responding to the question posed in Fig. 1. LLM response and rationale results for the entire test set can be found in Supplementary Information 1.
Fig. 3. Proposed LLM workflow.
(a) Current LLM workflow. (b) Proposed LLM workflow.
