Assessing the Accuracy of Artificial Intelligence Models in Scoliosis Classification and Suggested Therapeutic Approaches

Artur Fabijan¹, Agnieszka Zawadzka-Fabijan², Robert Fabijan³, Krzysztof Zakrzewski¹, Emilia Nowosławska¹, Bartosz Polis¹

Affiliations

¹ Department of Neurosurgery, Polish-Mother's Memorial Hospital Research Institute, 93-338 Lodz, Poland.
² Department of Rehabilitation Medicine, Faculty of Health Sciences, Medical University of Lodz, 90-419 Lodz, Poland.
³ Independent Researcher, Luton LU2 0GS, UK.

PMID: 39064053
PMCID: PMC11278075
DOI: 10.3390/jcm13144013


Artur Fabijan et al. J Clin Med. 2024 Jul 9;13(14):4013. doi: 10.3390/jcm13144013.


Abstract

Background: Open-source artificial intelligence models (OSAIMs) are increasingly applied in fields such as IT and medicine, where they offer promising solutions for diagnostic and therapeutic interventions. In response to the growing interest in AI for clinical diagnostics, we evaluated several OSAIMs, such as ChatGPT 4, Microsoft Copilot, Gemini, PopAi, You Chat, Claude, and the specialized PMC-LLaMA 13B, assessing their ability to classify scoliosis severity and recommend treatments based on radiological descriptions from anteroposterior (AP) radiographs.

Methods: Our study employed a two-stage methodology: descriptions of single-curve scoliosis were first evaluated by two independent neurosurgeons and then analyzed by the AI models. Statistical analysis involved the Shapiro-Wilk test for normality, with non-normal distributions described using medians and interquartile ranges. Inter-rater reliability was assessed using Fleiss' kappa, and performance metrics (accuracy, sensitivity, specificity, and F1 score) were used to evaluate the AI systems' classifications.

Results: Some AI systems, namely ChatGPT 4, Copilot, and PopAi, accurately reflected the recommended Cobb angle ranges for disease severity and treatment, whereas others, such as Gemini and Claude, required further calibration. In particular, PMC-LLaMA 13B expanded the classification range for moderate scoliosis, which could influence clinical decisions and delay interventions.

Conclusions: These findings highlight the need for continuous refinement of AI models to enhance their clinical applicability.
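The performance metrics and Fleiss' kappa named in the Methods can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names `one_vs_rest_metrics` and `fleiss_kappa` are hypothetical, the one-vs-rest treatment of the three severity classes is an assumption, and the labels in the usage lines are invented for demonstration only.

```python
from collections import Counter


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity, specificity and F1 with one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


def fleiss_kappa(ratings, categories):
    """Fleiss' kappa; ratings[i] holds the labels every rater gave to subject i.

    Assumes each subject was rated by the same number of raters (at least 2).
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    counts = [Counter(row) for row in ratings]
    # Mean observed agreement across subjects.
    p_bar = sum(
        (sum(c[k] ** 2 for k in categories) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ) / n_subjects
    # Agreement expected by chance, from the overall category proportions.
    p_e = sum(
        (sum(c[k] for c in counts) / (n_subjects * n_raters)) ** 2
        for k in categories
    )
    if p_e == 1.0:
        return float("nan")  # every rating fell in a single category
    return (p_bar - p_e) / (1.0 - p_e)


# Invented example labels (NOT data from the study).
reference = ["mild", "mild", "moderate", "moderate", "severe", "severe"]
model_out = ["mild", "moderate", "moderate", "moderate", "severe", "moderate"]
print(one_vs_rest_metrics(reference, model_out, positive="moderate"))
```

Reporting the metrics per class keeps an over-wide "moderate" range visible: it shows up as lower specificity for that class even when sensitivity stays high.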

Keywords: ChatGPT 4; PMC-LLaMA; artificial intelligence; clinical decision support systems; scoliosis.


Conflict of interest statement

The authors declare no conflicts of interest.

Figures

**Figure 1**
Illustration showing the performance of the PMC-LLaMA model compared to ChatGPT and LLaMA-2 in response to medical questions. This figure has been reconstructed based on the results from the work of Wu et al. [22]. The PubMedQA dataset consists of biomedical questions and answers collected from PubMed abstracts. The MedMCQA dataset comprises multiple-choice questions sourced from mock and past exams of two Indian medical school entrance tests, AIIMS and NEET-PG. The MedQA dataset features questions derived from medical exams. The y-axis represents the question answering (QA) benchmark score. The pie chart on the right illustrates the size of the language models, with ChatGPT (light gray) trained on the largest dataset, followed by LLaMA-2 (dark gray), and PMC-LLaMA (light blue) on the smallest.

**Figure 2**
Three stages of progression in single-curve scoliosis, with therapeutic approaches based on the AO Spine classification. Mild scoliosis, with a Cobb angle of approximately 12 degrees measured between vertebrae L1/L2 and Th8/Th9: monitoring and physiotherapy are recommended. Moderate scoliosis, with a Cobb angle of about 32 degrees measured between vertebrae L1/L2 and Th6/Th7: bracing is indicated. Severe single-curve scoliosis, with a Cobb angle of about 56 degrees measured between vertebrae L3/L4 and Th6/Th7: the patient qualifies for surgical intervention.

**Figure 3**
Comparative visualization of AI systems predicting scoliosis severity based on Cobb angle measurements.

**Figure 4**
Comparative efficacies of AI systems in predicting scoliosis treatment modalities from Cobb angle measurements.
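The severity-to-treatment mapping described in the Figure 2 caption can be expressed as a simple threshold lookup. The sketch below is illustrative only: the function names and the cut-off values `(25.0, 45.0)` are placeholder assumptions, not ranges stated in this record, and should be replaced with whatever ranges the applicable clinical guideline specifies. The treatment labels are taken from the Figure 2 caption.

```python
from bisect import bisect_right

SEVERITY = ("mild", "moderate", "severe")

# Treatment wording follows the Figure 2 caption.
TREATMENT = {
    "mild": "monitoring and physiotherapy",
    "moderate": "bracing",
    "severe": "surgical intervention",
}


def classify_severity(cobb_angle_deg, cutoffs=(25.0, 45.0)):
    """Map a Cobb angle (degrees) to a severity label.

    `cutoffs` are ASSUMED placeholder boundaries between mild/moderate and
    moderate/severe; an angle equal to a cut-off falls in the higher class.
    """
    if cobb_angle_deg < 0:
        raise ValueError("Cobb angle must be non-negative")
    return SEVERITY[bisect_right(cutoffs, cobb_angle_deg)]


def suggested_treatment(cobb_angle_deg, cutoffs=(25.0, 45.0)):
    """Look up the treatment label for the severity class of the given angle."""
    return TREATMENT[classify_severity(cobb_angle_deg, cutoffs)]


# The three example angles from the Figure 2 caption.
for angle in (12, 32, 56):
    print(angle, classify_severity(angle), suggested_treatment(angle))
```

Keeping the cut-offs as an explicit parameter makes it easy to compare how a shifted boundary, such as a widened moderate range, changes the recommended treatment for borderline angles.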


References

1. Uppalapati V.K., Nag D.S. A Comparative Analysis of AI Models in Complex Medical Decision-Making Scenarios: Evaluating ChatGPT, Claude AI, Bard, and Perplexity. Cureus. 2024;16:e52485. doi: 10.7759/cureus.52485.
2. Zhang H., Huang C., Wang D., Li K., Han X., Chen X., Li Z. Artificial Intelligence in Scoliosis: Current Applications and Future Directions. J. Clin. Med. 2023;12:7382. doi: 10.3390/jcm12237382.
3. Zong H., Li J., Wu E., Wu R., Lu J., Shen B. Performance of ChatGPT on Chinese national medical licensing examinations: A five-year examination evaluation study for physicians, pharmacists and nurses. BMC Med. Educ. 2024;24:143. doi: 10.1186/s12909-024-05125-7.
4. Saravia-Rojas M.Á., Camarena-Fonseca A.R., León-Manco R., Geng-Vivanco R. Artificial intelligence: ChatGPT as a disruptive didactic strategy in dental education. J. Dent. Educ. 2024;88:872–876. doi: 10.1002/jdd.13485.
5. Pradhan F., Fiedler A., Samson K., Olivera-Martinez M., Manatsathit W., Peeraphatdit T. Artificial intelligence compared with human-derived patient educational materials on cirrhosis. Hepatol. Commun. 2024;8:e0367. doi: 10.1097/HC9.0000000000000367.

