NPJ Digit Med. 2020 Oct 16;3:135.
doi: 10.1038/s41746-020-00338-8. eCollection 2020.

Ensemble learning predicts multiple sclerosis disease course in the SUMMIT study


Yijun Zhao et al. NPJ Digit Med.

Erratum in

Abstract

The rate of disability accumulation varies across multiple sclerosis (MS) patients. Machine learning techniques may offer more powerful means to predict disease course in MS patients. In our study, 724 patients from the Comprehensive Longitudinal Investigation in MS at Brigham and Women's Hospital (CLIMB study) and 400 patients from the EPIC dataset, University of California, San Francisco, were included in the analysis. The primary outcome was whether the Expanded Disability Status Scale (EDSS) score increased by ≥ 1.5 (worsening) or not (non-worsening) at up to 5 years after the baseline visit. Classification models were built using the CLIMB dataset with patients' clinical and MRI longitudinal observations in the first 2 years, and further validated using the EPIC dataset. We compared the performance of three popular machine learning algorithms (SVM, Logistic Regression, and Random Forest) and three ensemble learning approaches (XGBoost, LightGBM, and a Meta-learner L). A decision threshold was established to trade off performance between the two classes. Predictive features were identified and compared among different models. Machine learning models achieved AUC scores of 0.79 and 0.83 for the CLIMB and EPIC datasets, respectively, shortly after disease onset. Ensemble learning methods were more effective and robust than standalone algorithms. Two ensemble models, XGBoost and LightGBM, were superior to the other four models evaluated in our study. Of the variables evaluated, EDSS, Pyramidal Function, and Ambulatory Index were the top predictors common across models in forecasting the MS disease course. Machine learning techniques, in particular ensemble methods, offer increased accuracy for the prediction of MS disease course.
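The outcome labeling, the AUC metric, and the class trade-off threshold described in the abstract can be illustrated with a minimal sketch. It assumes the worsening rule is simply "any follow-up EDSS at least 1.5 above baseline"; the study's exact handling (confirmation of worsening, visit windows) is not described here. All function names are hypothetical, and choosing the threshold by Youden's J (sensitivity + specificity − 1) is one common rule, not necessarily the one the authors used.

```python
from typing import Sequence, Tuple

# Assumption: simplified version of the abstract's outcome rule
# (EDSS increase >= 1.5 within up to 5 years of the baseline visit).
WORSENING_DELTA = 1.5


def label_worsening(baseline_edss: float, followup_edss: Sequence[float]) -> int:
    """Return 1 (worsening) if any follow-up EDSS rises >= 1.5 above baseline, else 0."""
    return int(any(score - baseline_edss >= WORSENING_DELTA for score in followup_edss))


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC: fraction of (worsening, non-worsening) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores: Sequence[float], labels: Sequence[int],
                            threshold: float) -> Tuple[float, float]:
    """Per-class recall when 'worsening' is predicted for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    return tp / n_pos, tn / n_neg


def youden_threshold(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Pick the candidate threshold maximising sensitivity + specificity - 1."""
    best_t, best_j = scores[0], -1.0
    for t in sorted(set(scores)):
        sens, spec = sensitivity_specificity(scores, labels, t)
        if sens + spec - 1.0 > best_j:
            best_t, best_j = t, sens + spec - 1.0
    return best_t


if __name__ == "__main__":
    # Hypothetical predicted probabilities of worsening for six patients.
    probs = [0.9, 0.8, 0.4, 0.35, 0.2, 0.1]
    truth = [1, 1, 1, 0, 0, 0]
    print(label_worsening(2.0, [2.5, 3.5]))  # 1, since 3.5 - 2.0 = 1.5
    print(auc(probs, truth))
    print(youden_threshold(probs, truth))
```

Raising the threshold favors specificity (fewer patients flagged as worsening) and lowering it favors sensitivity, which is the kind of two-class trade-off the abstract refers to.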

Keywords: Multiple sclerosis.


Conflict of interest statement

Competing interests: The authors declare no competing interests. Complete disclosures are listed on ICMJE forms.

Figures

Fig. 1. Illustration of three baseline machine learning models.
a Support Vector Machine: red squares and blue circles represent data from different classes. The optimal decision plane achieves the largest separation, or margin, between the two classes. b A Random Forest with n decision trees. Each tree is trained on a randomly sampled subset of the training data. Predictions from all trees are combined by majority voting to produce a final decision. c Logistic Regression with one independent variable. The blue line is the linear regression model of the observed data. The sigmoid function transforms the linear model’s predictions into values between 0 and 1, which indicate the observations’ probability of belonging to the positive class.
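The three baseline models in Fig. 1 reduce to a few simple decision rules, sketched below in plain Python. This is an illustration only, not the study's implementation: the function names and example weights are hypothetical, the margin formula assumes the canonical hard-margin scaling of a linear SVM, and the random-forest part shows only bootstrap sampling (with replacement) and majority voting, omitting the per-split feature subsampling that standard random forests also use.

```python
import math
import random
from collections import Counter
from typing import List, Sequence


def svm_decision(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Linear SVM rule (panel a): class +1 if w.x + b >= 0, else -1."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def svm_margin_width(w: Sequence[float]) -> float:
    """Margin width 2 / ||w|| under the canonical hard-margin scaling."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


def bootstrap_indices(n: int, rng: random.Random) -> List[int]:
    """Sample n row indices with replacement: the training subset for one tree (panel b)."""
    return [rng.randrange(n) for _ in range(n)]


def majority_vote(tree_predictions: Sequence[int]) -> int:
    """Combine per-tree class predictions; the most frequent label wins."""
    return Counter(tree_predictions).most_common(1)[0][0]


def logistic_probability(x: float, intercept: float, coef: float) -> float:
    """One-predictor logistic regression (panel c): sigmoid of the linear model."""
    z = intercept + coef * x
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    print(svm_decision([1.0, -1.0], 0.0, [2.0, 1.0]))  # 1
    print(svm_margin_width([3.0, 4.0]))  # 0.4
    print(majority_vote([1, 0, 1, 1, 0]))  # 1
    print(logistic_probability(0.0, 0.0, 2.0))  # 0.5
```

With an odd number of trees, binary majority voting can never tie; with an even number, `Counter.most_common` breaks ties in favor of the label seen first.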
Fig. 2. Illustration of ensemble learning and adaptive boosting.
a Ensemble learning: L1, L2, …, Ln are independent learners trained on the entire training data D. The stacked generalizer is a logistic regression model trained to produce a final prediction P from the decisions of the individual classifiers. Model performance is measured using the final predictions. b Adaptive boosting: checkmarks and crosses indicate correctly and incorrectly classified instances, respectively. The heights of the rectangles are proportional to the weights of the training instances. A sequence of learners, L1, L2, …, Ln, is generated, with each new model trained on a re-weighted dataset that boosts the weights of the training instances misclassified by the previous model.
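Both panels of Fig. 2 can be made concrete with a short sketch. The reweighting step follows the standard discrete AdaBoost update for a binary task; it illustrates panel b and is not a description of XGBoost or LightGBM, which are gradient-boosting libraries that fit each new tree to gradients of the loss rather than reweighting instances this way. The stacking function assumes the meta-learner's intercept and coefficients have already been fitted (fitting on out-of-fold base-learner predictions is common practice, but the caption does not say how it was done). All names and numbers are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def adaboost_reweight(weights: Sequence[float],
                      misclassified: Sequence[bool]) -> Tuple[List[float], float]:
    """One discrete-AdaBoost round: return normalised new weights and learner weight alpha."""
    total = sum(weights)
    err = sum(w for w, miss in zip(weights, misclassified) if miss) / total
    if not 0.0 < err < 0.5:
        raise ValueError("weighted error must be in (0, 0.5) for this update")
    alpha = 0.5 * math.log((1.0 - err) / err)
    # Misclassified instances are scaled up by e^alpha, correct ones down by e^-alpha.
    updated = [w * math.exp(alpha if miss else -alpha)
               for w, miss in zip(weights, misclassified)]
    z = sum(updated)
    return [w / z for w in updated], alpha


def stacked_prediction(base_probs: Sequence[float], meta_intercept: float,
                       meta_coefs: Sequence[float]) -> float:
    """Logistic-regression meta-learner combining base learners' outputs into P."""
    z = meta_intercept + sum(c * p for c, p in zip(meta_coefs, base_probs))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Four equally weighted instances; the first learner misclassifies one of them.
    new_weights, alpha = adaboost_reweight([0.25] * 4, [True, False, False, False])
    print(new_weights, alpha)
    # Hypothetical base-learner probabilities and meta-learner coefficients.
    print(stacked_prediction([0.7, 0.6, 0.8], -1.0, [0.9, 0.5, 1.2]))
```

A useful property of this update is that, after normalization, the misclassified instances together carry exactly half of the total weight, so the next learner cannot do well by repeating the previous learner's mistakes.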
