Sci Rep. 2025 Mar 20;15(1):9675.
doi: 10.1038/s41598-025-93106-1.

Integrating multidimensional data analytics for precision diagnosis of chronic low back pain

Sam Vickery et al. Sci Rep. 2025.

Abstract

Low back pain (LBP) is a leading cause of disability worldwide, with up to 25% of cases becoming chronic (cLBP). Whilst cLBP is multi-factorial, the relative importance of its contributors remains unclear. We leveraged a comprehensive multi-dimensional dataset and machine learning-based variable importance selection to identify the most effective modalities for differentiating whether a person has cLBP. The dataset included questionnaire data, clinical and functional assessments, and spino-pelvic magnetic resonance imaging (MRI), encompassing a total of 144 parameters from 1,161 adults with (n = 512) and without cLBP (n = 649). Boruta and random forest were utilised for variable importance selection and cLBP classification, respectively. A multimodal model including questionnaire, clinical, and MRI data was the most effective in differentiating people with and without cLBP. From this model, the most robust variables (n = 9) were psychosocial factors, neck and hip mobility, and lower lumbar disc herniation and degeneration. This finding persisted in an unseen holdout dataset. Beyond demonstrating the importance of a multi-dimensional approach to cLBP, our findings will guide the development of targeted diagnostics and personalized treatment strategies for cLBP patients.

Keywords: Chronic low back pain; Classification; Data-driven; Feature selection; MRI; Multi-modality; Psychosocial.
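
The abstract names Boruta for variable importance selection. As a rough illustration of the idea behind it, and not the authors' pipeline, the sketch below compares each real variable against permuted "shadow" copies and keeps the variables that beat the best shadow in most rounds. Several things here are assumptions: absolute Pearson correlation with the label is swapped in for random forest importance, a simple hit-rate threshold replaces the statistical test Boruta applies, and the function names, data layout, and toy data are all hypothetical.

```python
import random
from statistics import mean


def abs_corr(xs, ys):
    """Absolute Pearson correlation; a stand-in for random forest importance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / (vx * vy) ** 0.5


def shadow_select(columns, labels, n_rounds=50, min_hit_rate=0.9, seed=0):
    """Keep variables whose importance beats the best shadow in most rounds.

    columns: dict mapping variable name -> list of values (hypothetical layout).
    labels:  list of 0/1 class labels (e.g. control vs. cLBP).
    """
    rng = random.Random(seed)
    # The stand-in importance is deterministic, so it is computed once.
    real = {name: abs_corr(vals, labels) for name, vals in columns.items()}
    hits = {name: 0 for name in columns}
    for _ in range(n_rounds):
        # Shadow variables: each column shuffled, which breaks any link to the label.
        best_shadow = 0.0
        for vals in columns.values():
            shuffled = vals[:]
            rng.shuffle(shuffled)
            best_shadow = max(best_shadow, abs_corr(shuffled, labels))
        for name in columns:
            if real[name] > best_shadow:
                hits[name] += 1
    return [name for name in columns if hits[name] / n_rounds >= min_hit_rate]


# Synthetic toy data (not from the study): one informative and one noise variable.
toy_rng = random.Random(42)
labels = [i % 2 for i in range(200)]
columns = {
    "informative": [y + toy_rng.gauss(0, 0.5) for y in labels],
    "noise": [toy_rng.gauss(0, 1) for _ in labels],
}
selected = shadow_select(columns, labels)
print(selected)
```

The shadow comparison gives a data-driven cutoff: a variable is kept only if it carries more signal than the best variable known to be pure noise, rather than exceeding an arbitrary importance threshold.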

Conflict of interest statement

Declarations. Competing interests: All authors declare no conflict of interests.

Figures

Fig. 1
Modality dataset distributions and machine learning workflow. A Top: the chronic low back pain (cLBP) sample size distribution across all 15 dataset modalities. Bottom: the number of variables used for cLBP classification and variable importance selection across the 15 dataset modalities. B The machine learning workflow implemented to compare the different modalities and determine the most important variables for cLBP patient delineation, using a random forest binary classification algorithm for training and testing.
Fig. 2
Modality datasets age, sex, and cLBP distributions. Violin plots for the non-imputed datasets: A questionnaire; B clinical assessment; C back shape and function; D MRI; E questionnaire + clinic; F questionnaire + back shape and function; G questionnaire + MRI; H clinic + back shape and function; I clinic + MRI; J back shape and function + MRI; K questionnaire + clinic + back shape and function; L questionnaire + clinic + MRI; M questionnaire + back shape and function + MRI; N clinic + back shape and function + MRI; O all datasets.
Fig. 3
Boruta variable reduction performance. A–C show random forest (RF) classification model performance (AUC) with the Boruta-reduced variables and with all variables in the single, dual, and multi data modalities, respectively. This shows the change in performance following variable reduction. Error bars represent the 95% CI in AUC over tenfold train-test splits. D shows the amount of variable reduction achieved by Boruta as a percentage of the total number of variables within each modality dataset.
Fig. 4
Most robust and important variables for chronic low back pain classification performance. A Left: bar plot of the nine most robust variables in order of average Boruta importance score. Right: bar plot of the absolute effect size (Cohen’s r or ω depending on data type) comparing controls and cLBP patients for the nine robust variables. B Column plot showing RF classification performance as the mean of five-fold train-test iterations in the hold-out set using Boruta-selected and all variables. Column plot error bars represent 95% CI. IVD: intervertebral disc; SF-36: short form 36 health status questionnaire; Acc: accuracy; AUC: area under the receiver operating characteristic curve; Sens: sensitivity; Spec: specificity.
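
The figure captions report accuracy, sensitivity, specificity, and AUC, with 95% CIs across train-test splits. The sketch below shows one way these quantities can be computed; it is not the authors' code. The toy labels, scores, and per-fold AUC values are invented for illustration, the 0.5 decision threshold is an assumption, and the t-based interval is only one common choice, since the page does not say how the CIs were derived.

```python
from statistics import mean, stdev


def confusion_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = cLBP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "acc": (tp + tn) / len(y_true),
        "sens": tp / (tp + fn),  # share of true cases that are detected
        "spec": tn / (tn + fp),  # share of controls correctly ruled out
    }


def auc(y_true, scores):
    """AUC as the chance a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def mean_ci95(values, t_crit):
    """Mean and t-based 95% CI; t_crit must match len(values) - 1 degrees of freedom."""
    m = mean(values)
    half = t_crit * stdev(values) / len(values) ** 0.5
    return m, m - half, m + half


# Invented toy predictions, thresholded at 0.5 (assumed cutoff).
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
metrics = confusion_metrics(y_true, y_pred)
toy_auc = auc(y_true, scores)

# Invented per-fold AUCs for five folds; 2.776 is the two-sided 95% t value for 4 df.
fold_aucs = [0.80, 0.82, 0.78, 0.81, 0.79]
ci_mean, ci_low, ci_high = mean_ci95(fold_aucs, t_crit=2.776)
print(metrics, toy_auc, (ci_mean, ci_low, ci_high))
```

Unlike accuracy, sensitivity, and specificity, AUC does not depend on the decision threshold, which may be why it serves as the summary measure when comparing modalities.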
