Optimizing Rare Disease Gait Classification through Data Balancing and Generative AI: Insights from Hereditary Cerebellar Ataxia
- PMID: 38894404
- PMCID: PMC11175240
- DOI: 10.3390/s24113613
Abstract
The interpretability of gait analysis studies in people with rare diseases, such as people with primary hereditary cerebellar ataxia (pwCA), is frequently limited by small sample sizes and unbalanced datasets. The purpose of this study was to assess the effectiveness of data balancing and generative artificial intelligence (AI) algorithms in generating synthetic data reflecting the actual gait abnormalities of pwCA. Gait data of 30 pwCA (age: 51.6 ± 12.2 years; 13 females, 17 males) and 100 healthy subjects (age: 57.1 ± 10.4 years; 60 females, 40 males) were collected at the lumbar level with an inertial measurement unit. Subsampling, oversampling, synthetic minority oversampling, generative adversarial networks, and conditional tabular generative adversarial networks (ctGAN) were applied to generate datasets that were then used as input to a random forest classifier. Consistency and explainability metrics were also calculated to assess the coherence of the generated datasets with known gait abnormalities of pwCA. ctGAN significantly improved classification performance compared with the original dataset and traditional data augmentation methods. ctGANs are an effective method for balancing tabular datasets from populations with rare diseases, owing to their ability to improve diagnostic models with consistent explainability.
Keywords: cerebellar ataxia; conditional tabular generative artificial network; data augmentation; data balancing; gait analysis; generative artificial intelligence; generative artificial network; inertial measurement unit; rare diseases; synthetic minority oversampling technique.
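The three traditional balancing strategies named in the abstract (subsampling, oversampling, and synthetic minority oversampling) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation: the function names, the two placeholder features, the neighbour count k = 5, and the random seed are all assumptions made for illustration, and the rows are random numbers rather than real gait data. Only the class sizes (30 pwCA, 100 healthy subjects) come from the abstract. The GAN and ctGAN generators and the random forest classifier are outside the scope of this sketch.

```python
import random


def random_undersample(majority, n, rng):
    """Subsampling: keep n majority rows, drawn without replacement."""
    return rng.sample(majority, n)


def random_oversample(minority, n, rng):
    """Oversampling: duplicate randomly chosen minority rows until n rows exist."""
    extra = [rng.choice(minority) for _ in range(n - len(minority))]
    return minority + extra


def squared_distance(a, b):
    """Squared Euclidean distance between two feature rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like(minority, n, k, rng):
    """SMOTE-style oversampling: each synthetic row lies on the segment
    between a minority row and one of its k nearest minority neighbours."""
    synthetic = []
    while len(minority) + len(synthetic) < n:
        base = rng.choice(minority)
        neighbours = sorted(
            (row for row in minority if row is not base),
            key=lambda row: squared_distance(base, row),
        )[:k]
        mate = rng.choice(neighbours)
        gap = rng.random()  # interpolation weight in [0, 1)
        synthetic.append([b + gap * (m - b) for b, m in zip(base, mate)])
    return minority + synthetic


rng = random.Random(0)  # fixed seed (assumption) so the sketch is repeatable

# Placeholder rows with two hypothetical features; NOT real gait measurements.
patients = [[rng.gauss(1.0, 0.3), rng.gauss(0.5, 0.1)] for _ in range(30)]
controls = [[rng.gauss(0.0, 0.3), rng.gauss(0.8, 0.1)] for _ in range(100)]

controls_down = random_undersample(controls, len(patients), rng)
patients_up = random_oversample(patients, len(controls), rng)
patients_smote = smote_like(patients, len(controls), 5, rng)

print(len(controls_down), len(patients_up), len(patients_smote))
```

Each call yields a class of matching size: subsampling discards healthy-subject rows, oversampling repeats existing patient rows, and the SMOTE-style variant adds interpolated rows that stay within the range of the observed minority values. A generative model such as ctGAN instead samples new rows from a learned distribution, which is the comparison the abstract reports.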
Conflict of interest statement
The authors declare no conflicts of interest.
Similar articles
- Tabular transformer generative adversarial network for heterogeneous distribution in healthcare. Sci Rep. 2025 Mar 25;15(1):10254. doi: 10.1038/s41598-025-93077-3. PMID: 40133347.
- Data Augmentation of a Corrosion Dataset for Defect Growth Prediction of Pipelines Using Conditional Tabular Generative Adversarial Networks. Materials (Basel). 2024 Mar 1;17(5):1142. doi: 10.3390/ma17051142. PMID: 38473613.
- Synthetic Lung Ultrasound Data Generation Using Autoencoder With Generative Adversarial Network. IEEE Trans Ultrason Ferroelectr Freq Control. 2025 May;72(5):624-635. doi: 10.1109/TUFFC.2025.3555447. Epub 2025 May 7. PMID: 40146656.
- Quantitative Gait and Balance Outcomes for Ataxia Trials: Consensus Recommendations by the Ataxia Global Initiative Working Group on Digital-Motor Biomarkers. Cerebellum. 2024 Aug;23(4):1566-1592. doi: 10.1007/s12311-023-01625-2. Epub 2023 Nov 13. PMID: 37955812. Review.
- Generative artificial intelligence: synthetic datasets in dentistry. BDJ Open. 2024 Mar 1;10(1):13. doi: 10.1038/s41405-024-00198-4. PMID: 38429258. Review.
Cited by
- Influence of Main Thoracic and Thoracic Kyphosis Morphology on Gait Characteristics in Adolescents with Idiopathic Scoliosis: Gait Analysis Using an Inertial Measurement Unit. Sensors (Basel). 2025 Jul 9;25(14):4265. doi: 10.3390/s25144265. PMID: 40732393.
- Development of machine learning models for gait-based classification of incomplete spinal cord injuries and cauda equina syndrome. Sci Rep. 2025 Jun 6;15(1):20012. doi: 10.1038/s41598-025-04065-6. PMID: 40481015.
- Identification and quantification of muscular cocontraction for ankle rehabilitation through variational mode decomposition in surface electromyography. Sci Rep. 2025 Apr 28;15(1):14847. doi: 10.1038/s41598-025-96334-7. PMID: 40295627.
- Gait stability prediction through synthetic time-series and vision-based data. Front Sports Act Living. 2025 Aug 13;7:1646146. doi: 10.3389/fspor.2025.1646146. eCollection 2025. PMID: 40881479.
- Gait-based Parkinson's disease diagnosis and severity classification using force sensors and machine learning. Sci Rep. 2025 Jan 2;15(1):328. doi: 10.1038/s41598-024-83357-9. PMID: 39747956.