J Clin Med. 2024 Jul 9;13(14):4013.
doi: 10.3390/jcm13144013.

Assessing the Accuracy of Artificial Intelligence Models in Scoliosis Classification and Suggested Therapeutic Approaches


Artur Fabijan et al. J Clin Med.

Abstract

Background: Open-source artificial intelligence models (OSAIMs) are increasingly being applied in various fields, including IT and medicine, offering promising solutions for diagnostic and therapeutic interventions. In response to the growing interest in AI for clinical diagnostics, we evaluated several OSAIMs (ChatGPT 4, Microsoft Copilot, Gemini, PopAi, You Chat, Claude, and the specialized PMC-LLaMA 13B), assessing their ability to classify scoliosis severity and recommend treatments based on radiological descriptions from AP radiographs.

Methods: Our study employed a two-stage methodology: descriptions of single-curve scoliosis were first evaluated by two independent neurosurgeons and then analyzed by the AI models. Statistical analysis involved the Shapiro-Wilk test for normality, with non-normal distributions described using medians and interquartile ranges. Inter-rater reliability was assessed using Fleiss' kappa, and performance metrics, including accuracy, sensitivity, specificity, and F1 score, were used to evaluate the AI systems' classification performance.

Results: The analysis indicated that some AI systems, such as ChatGPT 4, Copilot, and PopAi, accurately reflected the recommended Cobb angle ranges for disease severity and treatment, whereas others, such as Gemini and Claude, required further calibration. In particular, PMC-LLaMA 13B expanded the classification range for moderate scoliosis, potentially influencing clinical decisions and delaying interventions.

Conclusions: These findings highlight the need for the continuous refinement of AI models to enhance their clinical applicability.
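The performance metrics named in the Methods can be illustrated with a minimal sketch. This is not the authors' analysis code: the function name `one_vs_rest_metrics` and the six example labels are hypothetical, and the sketch assumes each severity class is scored one-vs-rest against the neurosurgeons' reference label.

```python
def one_vs_rest_metrics(reference, predicted, positive):
    """Score one class against all others (one-vs-rest).

    `reference` holds the expert labels, `predicted` the model labels.
    All names and data in this sketch are illustrative assumptions.
    """
    pairs = list(zip(reference, predicted))
    tp = sum(r == positive and p == positive for r, p in pairs)
    fp = sum(r != positive and p == positive for r, p in pairs)
    fn = sum(r == positive and p != positive for r, p in pairs)
    tn = sum(r != positive and p != positive for r, p in pairs)

    def ratio(num, den):
        # Return 0.0 instead of raising when a class never occurs.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = ratio(tn, tn + fp)   # true negative rate
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    accuracy = ratio(tp + tn, len(pairs))
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


# Hypothetical labels for six radiograph descriptions (not study data).
reference = ["mild", "moderate", "severe", "moderate", "mild", "severe"]
predicted = ["mild", "moderate", "moderate", "moderate", "mild", "severe"]

severe_scores = one_vs_rest_metrics(reference, predicted, positive="severe")
print(severe_scores)
```

In this made-up example one severe case is labeled moderate, so sensitivity for the severe class drops to 0.5 while specificity stays at 1.0, which is why the two metrics are worth reporting alongside accuracy.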

Keywords: ChatGPT 4; PMC-LLaMA; artificial intelligence; clinical decision support systems; scoliosis.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
Illustration showing the performance of the PMC-LLaMA model compared to ChatGPT and LLaMA-2 in response to medical questions. This figure has been reconstructed based on the results from the work of Wu et al. [22]. The PubMedQA dataset consists of biomedical questions and answers collected from PubMed abstracts. The MedMCQA dataset comprises multiple-choice questions sourced from mock and past exams of two Indian medical school entrance tests, AIIMS and NEET-PG. The MedQA dataset features questions derived from medical exams. The y-axis represents the question answering (QA) benchmark score. The pie chart on the right illustrates the size of the language models, with ChatGPT (light gray) trained on the largest dataset, followed by LLaMA-2 (dark gray), and PMC-LLaMA (light blue) on the smallest.
Figure 2
Three stages of progression in single-curve scoliosis, with therapeutic approaches based on the AO Spine classification. A mild form of scoliosis, with a Cobb angle of approximately 12 degrees as measured between vertebrae L1/L2 and Th8/Th9, for which monitoring and physiotherapy are recommended. A moderate form of scoliosis, with a Cobb angle of about 32 degrees as measured between vertebrae L1/L2 and Th6/Th7, for which bracing is indicated. A severe form of single-curve scoliosis, with a Cobb angle of about 56 degrees as measured between vertebrae L3/L4 and Th6/Th7, qualifying for surgical intervention.
Figure 3
Comparative visualization of AI systems predicting scoliosis severity based on Cobb angle measurements.
Figure 4
Comparative efficacies of AI systems in predicting scoliosis treatment modalities from Cobb angle measurements.
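
The severity-to-treatment mapping described in the Figure 2 caption can be sketched as a small threshold function. The cut-off values below (`moderate_from=20.0`, `severe_from=40.0`) are illustrative assumptions, not values stated in this record, and the function name `classify_cobb` is hypothetical; only the three treatment labels come from the caption. The sketch does not model the lower diagnostic bound for scoliosis.

```python
# Treatment per severity class, as worded in the Figure 2 caption.
TREATMENT = {
    "mild": "monitoring and physiotherapy",
    "moderate": "bracing",
    "severe": "surgical intervention",
}


def classify_cobb(angle_deg, moderate_from=20.0, severe_from=40.0):
    """Map a Cobb angle in degrees to (severity, treatment).

    ASSUMPTION: both thresholds are placeholder defaults chosen for
    illustration; substitute the ranges of the classification in use.
    """
    if angle_deg < 0:
        raise ValueError("Cobb angle must be non-negative")
    if angle_deg >= severe_from:
        severity = "severe"
    elif angle_deg >= moderate_from:
        severity = "moderate"
    else:
        severity = "mild"
    return severity, TREATMENT[severity]


# The three example angles from the Figure 2 caption.
for angle in (12, 32, 56):
    print(angle, classify_cobb(angle))

# Widening the moderate range (hypothetical severe_from=50) changes the
# suggested treatment for a 45-degree curve from surgery to bracing.
print(classify_cobb(45))
print(classify_cobb(45, severe_from=50.0))
```

The last two calls show why a shifted boundary matters: the same measurement yields a different treatment suggestion once the moderate range is expanded, which is the kind of effect the abstract attributes to PMC-LLaMA 13B.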

References

    1. Uppalapati V.K., Nag D.S. A Comparative Analysis of AI Models in Complex Medical Decision-Making Scenarios: Evaluating ChatGPT, Claude AI, Bard, and Perplexity. Cureus. 2024;16:e52485. doi: 10.7759/cureus.52485.
    2. Zhang H., Huang C., Wang D., Li K., Han X., Chen X., Li Z. Artificial Intelligence in Scoliosis: Current Applications and Future Directions. J. Clin. Med. 2023;12:7382. doi: 10.3390/jcm12237382.
    3. Zong H., Li J., Wu E., Wu R., Lu J., Shen B. Performance of ChatGPT on Chinese national medical licensing examinations: A five-year examination evaluation study for physicians, pharmacists and nurses. BMC Med. Educ. 2024;24:143. doi: 10.1186/s12909-024-05125-7.
    4. Saravia-Rojas M.Á., Camarena-Fonseca A.R., León-Manco R., Geng-Vivanco R. Artificial intelligence: ChatGPT as a disruptive didactic strategy in dental education. J. Dent. Educ. 2024;88:872–876. doi: 10.1002/jdd.13485.
    5. Pradhan F., Fiedler A., Samson K., Olivera-Martinez M., Manatsathit W., Peeraphatdit T. Artificial intelligence compared with human-derived patient educational materials on cirrhosis. Hepatol. Commun. 2024;8:e0367. doi: 10.1097/HC9.0000000000000367.