J Clin Med. 2025 Jul 15;14(14):4996.
doi: 10.3390/jcm14144996.

Assessing LLMs on IDSA Practice Guidelines for the Diagnosis and Treatment of Native Vertebral Osteomyelitis: A Comparison Study

Filip Milicevic et al. J Clin Med. 2025.

Abstract

Background: Native vertebral osteomyelitis (NVO) presents diagnostic and therapeutic challenges that require adherence to complex clinical guidelines. Large language models (LLMs) offer new avenues for real-time clinical decision support, yet their utility in managing NVO has not been formally assessed.

Methods: This study evaluated four LLMs (Consensus, Gemini, ChatGPT-4o Mini, and ChatGPT-4o) using 13 standardized questions derived from the 2015 Infectious Diseases Society of America (IDSA) guidelines. Each model answered all 13 questions (n = 52 responses in total), and three orthopedic surgeons independently rated each response for accuracy (4-point scale) and comprehensiveness (5-point scale).

Results: ChatGPT-4o produced the longest responses (428.0 ± 45.4 words), followed by ChatGPT-4o Mini (392.2 ± 97.4), Gemini (358.2 ± 60.5), and Consensus (213.2 ± 68.8). ChatGPT-4o and Gemini achieved the highest proportions of "Excellent" accuracy ratings (54% and 51%, respectively), while Consensus received only 20%. Comprehensiveness scores mirrored this trend, with ChatGPT-4o (3.95 ± 0.79) and Gemini (3.82 ± 0.68) significantly outperforming Consensus (2.87 ± 0.66). In the domain-specific analysis, ChatGPT-4o achieved a 100% "Excellent" accuracy rating on therapy-related questions. Statistical analysis confirmed significant inter-model differences (p < 0.001).

Conclusions: Advanced LLMs, especially ChatGPT-4o and Gemini, demonstrated high accuracy and depth in interpreting clinical guidelines for NVO. These findings highlight their potential as tools for augmenting evidence-based decision-making and improving consistency in clinical care.

Keywords: ChatGPT; Gemini; IDSA guidelines; artificial intelligence; clinical decision support; large language models; native vertebral osteomyelitis; spine infection.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
Flowchart of the overall study design.
Figure 2
Average accuracy scores of LLM-chatbot replies to IDSA-related queries, as determined by the three orthopaedic graders. (A) Heatmap of individual accuracy scores, (B) average accuracy scores, and (C) average comprehensiveness scores. Asterisks indicate statistical significance.
Figure 3
Consensus-based accuracy ratings of LLM-chatbot responses to questions related to the IDSA practice guidelines. LLM = large language model. IDSA = Infectious Diseases Society of America.
