World Neurosurg. 2024 Jul:187:e1083-e1088.
doi: 10.1016/j.wneu.2024.05.052. Epub 2024 May 16.

Can Artificial Intelligence Mitigate Missed Diagnoses by Generating Differential Diagnoses for Neurosurgeons?

Rohit Prem Kumar et al. World Neurosurg. 2024 Jul.

Abstract

Background/objective: Accurate differential diagnosis is critical in neurosurgery, and diagnostic delays pose significant health and economic challenges. As large language models (LLMs) emerge as transformative tools in healthcare, this study seeks to elucidate their role in assisting neurosurgeons with the differential diagnosis process, especially during preliminary consultations.

Methods: This study employed 3 chat-based LLMs, ChatGPT (versions 3.5 and 4.0), Perplexity AI, and Bard AI, to evaluate their diagnostic accuracy. Each LLM was prompted with clinical vignettes to generate differential diagnoses for 20 common and uncommon neurosurgical disorders, and its responses were recorded. Disease-specific prompts were crafted using Dynamed, a clinical reference tool. The accuracy of each LLM was determined by its ability to correctly identify the target disease within its top differential diagnoses.
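
The accuracy measure described in the Methods can be read as a top-k accuracy: a response counts as correct if the target disease appears among the first k differentials. The following is a minimal sketch of that calculation, not the authors' scoring procedure. The function name `top_k_accuracy`, the case-insensitive exact string match, and the example responses are all assumptions made for illustration; the abstract does not state how matches were judged.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_lists: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose target appears in the first k differentials.

    Matching is a case-insensitive exact string comparison, which is an
    assumption of this sketch rather than the study's stated method.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if len(ranked_lists) != len(targets):
        raise ValueError("need exactly one target per ranked list")
    if not targets:
        raise ValueError("need at least one case")

    hits = 0
    for ranked, target in zip(ranked_lists, targets):
        top_k = [diagnosis.lower() for diagnosis in ranked[:k]]
        if target.lower() in top_k:
            hits += 1
    return hits / len(targets)


# Hypothetical model outputs, invented for illustration only.
responses = [
    ["migraine", "epilepsy", "syncope"],
    ["cervical spine stenosis", "multiple sclerosis"],
    ["stroke", "vasculitis", "moyamoya disease"],
    ["myasthenia gravis", "cervical myelopathy"],
]
targets = [
    "epilepsy",
    "cervical spine stenosis",
    "moyamoya disease",
    "amyotrophic lateral sclerosis",
]

for k in (1, 2, 3, 5):
    print(f"top-{k} accuracy: {top_k_accuracy(responses, targets, k):.2%}")
```

Because each larger k includes every hit counted at smaller k, the accuracy under this definition can only stay the same or rise as more differentials are considered.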

Results: For the initial differential, ChatGPT 3.5 achieved an accuracy of 52.63%, while ChatGPT 4.0 performed slightly better at 53.68%. Perplexity AI and Bard AI demonstrated 40.00% and 29.47% accuracy, respectively. As the number of considered differentials increased from 2 to 5, ChatGPT 3.5 reached its peak accuracy of 77.89% for the top 5 differentials. Bard AI and Perplexity AI had varied performances, with Bard AI improving in the top 5 differentials at 62.11%. On a disease-specific note, the LLMs excelled in diagnosing conditions like epilepsy and cervical spine stenosis but faced challenges with more complex diseases such as Moyamoya disease and amyotrophic lateral sclerosis.

Conclusions: LLMs showcase the potential to enhance diagnostic accuracy and decrease the incidence of missed diagnoses in neurosurgery.

Keywords: Artificial intelligence; Differential diagnoses; Large language models; Missed diagnoses; Neurosurgery.
