Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2023 Oct;20(10):1004-1009.
doi: 10.1016/j.jacr.2023.06.008. Epub 2023 Jul 8.

Use of Large Language Models to Predict Neuroimaging

Affiliations

Use of Large Language Models to Predict Neuroimaging

Lleayem Nazario-Johnson et al. J Am Coll Radiol. 2023 Oct.

Abstract

Purpose: Large language models (LLMs) have demonstrated a level of competency within the medical field. The aim of this study was to explore the ability of LLMs to predict the best neuroradiologic imaging modality given specific clinical presentations. In addition, the authors seek to determine if LLMs can outperform an experienced neuroradiologist in this regard.

Methods: ChatGPT and Glass AI, a health care-based LLM by Glass Health, were used. ChatGPT was prompted to rank the three best neuroimaging modalities while taking the best responses from Glass AI and the neuroradiologist. The responses were compared with the ACR Appropriateness Criteria for 147 conditions. Clinical scenarios were passed into each LLM twice to account for stochasticity. Each output was scored out of 3 on the basis of the criteria. Partial scores were given for nonspecific answers.

Results: ChatGPT and Glass AI scored 1.75 and 1.83, respectively, with no statistically significant difference. The neuroradiologist scored 2.20, significantly outperforming both LLMs. ChatGPT was also found to be the more inconsistent of the two LLMs, with the score difference between both outputs being statistically significant. Additionally, scores between different ranks output by ChatGPT were statistically significant.

Conclusions: LLMs perform well in selecting appropriate neuroradiologic imaging procedures when prompted with specific clinical scenarios. ChatGPT performed the same as Glass AI, suggesting that with medical text training, ChatGPT could significantly improve its function in this application. LLMs did not outperform an experienced neuroradiologist, indicating the need for continued improvement in the medical context.

Keywords: Artificial intelligence; ChatGPT; clinical decision making.

PubMed Disclaimer

LinkOut - more resources