An analysis of ChatGPT recommendations for the diagnosis and treatment of cervical radiculopathy

Affiliations

¹ Department of Orthopaedics, Icahn School of Medicine at Mount Sinai, New York, New York.
² Chicago Medical School, Rosalind Franklin University, North Chicago, Illinois.

PMID: 38941643
DOI: 10.3171/2024.4.SPINE231148

Timothy Hoang et al. J Neurosurg Spine. 2024 Jun 28;41(3):385-395. doi: 10.3171/2024.4.SPINE231148. Print 2024 Sep 1.

Abstract

Objective: The objective of this study was to assess the safety and accuracy of ChatGPT recommendations in comparison to the evidence-based guidelines from the North American Spine Society (NASS) for the diagnosis and treatment of cervical radiculopathy.

Methods: ChatGPT was prompted with questions from the 2011 NASS clinical guidelines for cervical radiculopathy and evaluated for concordance. Selected key phrases within the NASS guidelines were identified. Completeness was measured as the number of overlapping key phrases between ChatGPT responses and NASS guidelines divided by the total number of key phrases. A senior spine surgeon evaluated the ChatGPT responses for safety and accuracy. ChatGPT responses were further evaluated on their readability, similarity, and consistency. Flesch Reading Ease scores and Flesch-Kincaid reading levels were measured to assess readability. The Jaccard Similarity Index was used to assess agreement between ChatGPT responses and NASS clinical guidelines.
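The completeness and similarity measures described in the Methods can be illustrated with a minimal Python sketch. The abstract does not specify how key phrases were matched or how text was tokenized, so the case-insensitive substring matching and the lowercase word-set tokenization below are assumptions, and the function names `completeness`, `word_set`, and `jaccard_index` are hypothetical.

```python
import re


def completeness(response: str, key_phrases: list[str]) -> float:
    """Overlapping key phrases divided by the total number of key phrases.

    Assumption: a key phrase "overlaps" when it appears in the response
    as a case-insensitive substring.
    """
    if not key_phrases:
        raise ValueError("key_phrases must not be empty")
    text = response.lower()
    found = sum(1 for phrase in key_phrases if phrase.lower() in text)
    return found / len(key_phrases)


def word_set(text: str) -> set[str]:
    """Assumed tokenization: the set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_index(a: str, b: str) -> float:
    """Jaccard Similarity Index: |A intersect B| / |A union B| over word sets."""
    set_a, set_b = word_set(a), word_set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # two empty texts: define similarity as 0
    return len(set_a & set_b) / len(union)


# Toy inputs for illustration only; these are not taken from the study.
toy_response = "MRI is suggested to confirm the diagnosis"
toy_phrases = ["MRI", "CT myelography"]
toy_completeness = completeness(toy_response, toy_phrases)
toy_jaccard = jaccard_index("a b c", "b c d")
```

With these toy inputs, one of the two phrases is matched, and the two word sets share two of four distinct words.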

Results: A total of 100 key phrases were identified across 14 NASS clinical guidelines. The mean completeness of ChatGPT-4 was 46%, compared with 34% for ChatGPT-3.5, a difference of 12 percentage points. ChatGPT-4 outputs had a mean Flesch Reading Ease score of 15.24, which is very difficult to read and requires a college graduate education to understand. ChatGPT-3.5 outputs had a lower mean Flesch Reading Ease score of 8.73, indicating that they are even more difficult to read and require a professional-level education to understand. However, both versions of ChatGPT were more accessible than the NASS guidelines, which had a mean Flesch Reading Ease score of 4.58. Furthermore, with the NASS guidelines as a reference, ChatGPT-3.5 registered a mean ± SD Jaccard Similarity Index score of 0.20 ± 0.078, while ChatGPT-4 had a score of 0.18 ± 0.068. Based on physician evaluation, outputs from ChatGPT-3.5 and ChatGPT-4 were safe 100% of the time. Thirteen of 14 (92.9%) ChatGPT-3.5 responses and 14 of 14 (100%) ChatGPT-4 responses were in agreement with current best clinical practices for cervical radiculopathy according to a senior spine surgeon.
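The readability figures can be put in context with the standard Flesch formulas and the conventional score bands. The sketch below takes word, sentence, and syllable counts as inputs, because the abstract does not say which tool produced the counts; the function names are hypothetical, and the band labels follow the commonly cited Flesch interpretation table, not anything stated by the authors.

```python
def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula (higher is easier to read)."""
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid grade level formula (US school grade)."""
    return 0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59


def flesch_band(score: float) -> str:
    """Map a Flesch Reading Ease score to its conventional difficulty band."""
    bands = [
        (90.0, "very easy (5th grade)"),
        (80.0, "easy (6th grade)"),
        (70.0, "fairly easy (7th grade)"),
        (60.0, "plain English (8th-9th grade)"),
        (50.0, "fairly difficult (10th-12th grade)"),
        (30.0, "difficult (college)"),
        (10.0, "very difficult (college graduate)"),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return "extremely difficult (professional)"


# Mean scores reported in the abstract.
reported_means = {"ChatGPT-4": 15.24, "ChatGPT-3.5": 8.73, "NASS guidelines": 4.58}
reported_bands = {name: flesch_band(score) for name, score in reported_means.items()}
```

Under these bands, the ChatGPT-4 mean falls in the college graduate range, while the ChatGPT-3.5 and NASS means both fall in the professional range, which matches the interpretation given in the Results.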

Conclusions: ChatGPT models were able to provide safe and accurate but incomplete responses to NASS clinical guideline questions about cervical radiculopathy. Although the authors' results suggest that improvements are required before ChatGPT can be reliably deployed in a clinical setting, future versions of the LLM hold promise as an updated reference for guidelines on cervical radiculopathy. Future versions must prioritize accessibility and comprehensibility for a diverse audience.

Keywords: ChatGPT; cervical radiculopathy; clinical guidelines; degenerative.
