J Neurosurg. 2025 Apr 18;143(2):560-567.
doi: 10.3171/2024.12.JNS241607. Print 2025 Aug 1.

AtlasGPT: a language model grounded in neurosurgery with domain-specific data and document retrieval

Rohaid Ali et al. J Neurosurg.

Abstract

Objective: Large language models (LLMs) have shown promising performance on medical licensing examinations, but their ability to excel in subspecialty domains and their robustness under adversarial conditions remain unclear. Herein, the authors present AtlasGPT, a subspecialty-focused LLM for neurosurgery, and evaluate its performance on a benchmark multiple-choice question bank and under adversarial testing, as well as its ability to generate high-quality explanations.

Methods: AtlasGPT was built by fine-tuning the GPT-4 architecture and applying retrieval-augmented generation over neurosurgical knowledge sources. Its performance was compared with that of GPT-4 and Gemini Advanced on a 149-question neurosurgery examination. Adversarial testing assessed robustness to misinformation. AtlasGPT's answer explanations were rated by 15 independent neurosurgeons and compared with the question bank's explanations.
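The abstract names retrieval-augmented generation but does not describe how AtlasGPT retrieves or ranks its sources. As a generic illustration of the idea only, the sketch below ranks candidate passages by bag-of-words cosine similarity to a question and prepends the top matches, numbered for citation, to the prompt that would be sent to a language model. The function names (`retrieve`, `build_prompt`), the scoring method, and the prompt wording are hypothetical and are not the authors' implementation; real systems typically use learned embeddings, and the model call itself is omitted.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase the text and split it into alphanumeric tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors (0.0 if either is empty).
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    # Hypothetical retriever: rank passages by similarity to the query, keep the top k.
    q = Counter(tokenize(query))
    ranked = sorted(passages, key=lambda p: cosine(q, Counter(tokenize(p))), reverse=True)
    return ranked[:k]


def build_prompt(question: str, passages: list[str], k: int = 2) -> str:
    # Prepend the retrieved passages as numbered sources so the model can cite them.
    context = retrieve(question, passages, k)
    sources = [f"[{i + 1}] {p}" for i, p in enumerate(context)]
    return (
        "Answer using only the sources below and cite them by number.\n"
        + "\n".join(sources)
        + f"\n\nQuestion: {question}"
    )
```

For example, with the placeholder passages `["toy passage about aneurysm clipping", "toy passage about spinal fusion", "toy passage about glioma grading"]`, calling `build_prompt("What is glioma grading?", passages, k=1)` places only the glioma passage in the context. Grounding the answer in numbered sources is one way such a system can produce referenced explanations, though the abstract does not state that AtlasGPT works this way.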

Results: Across all 149 questions and on text-only questions, AtlasGPT (96%) outperformed Gemini Advanced (93%) and GPT-4 (88%) in accuracy. In adversarial testing, in which the models were tasked with identifying medical misinformation, AtlasGPT was fooled 14% of the time, compared with 44% for GPT-4 and 68% for Gemini Advanced. Neurosurgeons rated AtlasGPT's answer explanations as significantly more comprehensive, relevant, and better referenced than the question bank's own explanations (p < 0.001). AtlasGPT showed no evidence of hallucination or of other content that could harm patient care or the surgeon's clinical decision-making.

Conclusions: AtlasGPT demonstrates the potential of subspecialty-focused LLMs to outperform general models, exhibit robustness to misinformation, and generate high-quality explanations. Domain-specific LLMs may improve medical knowledge, decision-making, and educational materials in complex fields like neurosurgery.

Keywords: large language models; machine learning; medical education; neurosurgery.
