Lancet Digit Health. 2022 May;4(5):e359-e369.

doi: 10.1016/S2589-7500(21)00274-0. Epub 2022 Mar 24.

Identifying and predicting amyotrophic lateral sclerosis clinical subgroups: a population-based machine-learning study

Collaborators

Adriano Chiò, Andrea Calvo, Cristina Moglia, Antonio Canosa, Umberto Manera, Rosario Vasta, Francesca Palumbo, Alessandro Bombaci, Maurizio Grassano, Maura Brunetti, Federico Casale, Giuseppe Fuda, Paolina Salamone, Barbara Iazzolino, Laura Peotta, Paolo Cugnasco, Giovanni De Marco, Maria Claudia Torrieri, Salvatore Gallone, Marco Barberis, Luca Sbaiz, Salvatore Gentile, Alessandro Mauro, Letizia Mazzini, Fabiola De Marchi, Lucia Corrado, Sandra D'Alfonso, Antonio Bertolotto, Daniele Imperiale, Marco De Mattei, Salvatore Amarù, Cristoforo Comi, Carmelo Labate, Fabio Poglio, Luigi Ruiz, Lucia Testa, Eugenia Rota, Paolo Ghiglione, Nicola Launaro, Alessia Di Sapio, Jessica Mandrioli, Nicola Fini, Ilaria Martinelli, Elisabetta Zucchi, Giulia Gianferrari, Cecilia Simonini, Stefano Meletti, Rocco Liguori, Veria Vacchiano, Fabrizio Salvi, Ilaria Bartolomei, Roberto Michelucci, Pietro Cortelli, Rita Rinaldi, Anna Maria Borghi, Andrea Zini, Elisabetta Sette, Valeria Tugnoli, Maura Pugliatti, Elena Canali, Luca Codeluppi, Franco Valzania, Lucia Zinno, Giovanni Pavesi, Doriana Medici, Giovanna Pilurzi, Emilio Terlizzi, Donata Guidetti, Silvia De Pasqua, Mario Santangelo, Patrizia De Massis, Martina Bracaglia, Mario Casmiro, Pietro Querzani, Simonetta Morresi, Marco Longoni, Alberto Patuelli, Susanna Malagù, Marco Currò Dossi, Simone Vidale, Salvatore Ferro

Affiliations

¹ Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, US National Institute on Aging, Bethesda, MD, USA; Center for Alzheimer's and Related Dementias, US National Institute on Aging, Bethesda, MD, USA; Data Tecnica International, Glen Echo, MD, USA; Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, USA.
² Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, USA.
³ Center for Alzheimer's and Related Dementias, US National Institute on Aging, Bethesda, MD, USA; Data Tecnica International, Glen Echo, MD, USA; Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL, USA.
⁴ Department of Biomedical, Metabolic and Neural Sciences, University of Modena and Reggio Emilia, Modena, Italy.
⁵ Neurology Unit, Department of Neurosciences, Azienda Ospedaliero Universitaria di Modena, Modena, Italy.
⁶ ALS Centre, Department of Neurology, Maggiore della Carità University Hospital, Novara, Italy.
⁷ Rita Levi Montalcini, Department of Neuroscience, University of Turin, Turin, Italy.
⁸ Center for Alzheimer's and Related Dementias, US National Institute on Aging, Bethesda, MD, USA; Data Tecnica International, Glen Echo, MD, USA.
⁹ Department of Biomedical, Metabolic and Neural Sciences, University of Modena and Reggio Emilia, Modena, Italy; Neurology Unit, Department of Neurosciences, Azienda Ospedaliero Universitaria di Modena, Modena, Italy.
¹⁰ Neuromuscular Diseases Research Section, Laboratory of Neurogenetics, US National Institute on Aging, Bethesda, MD, USA; Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA; Reta Lila Weston Institute, UCL Queen Square Institute of Neurology, University College London, London, UK. Electronic address: traynorb@mail.nih.gov.
¹¹ Rita Levi Montalcini, Department of Neuroscience, University of Turin, Turin, Italy; Institute of Cognitive Sciences and Technologies, CNR, Rome, Italy; Neurology 1 and ALS Centre, Azienda Ospedaliero Universitaria Città della Salute e della Scienza, Turin, Italy.

PMID: 35341712
PMCID: PMC9038712
DOI: 10.1016/S2589-7500(21)00274-0

Faraz Faghri et al. Lancet Digit Health. 2022 May.

Abstract

Background: Amyotrophic lateral sclerosis (ALS) is known to represent a collection of overlapping syndromes. Various classification systems based on empirical observations have been proposed, but it is unclear to what extent they reflect ALS population substructures. We aimed to use machine-learning techniques to identify the number and nature of ALS subtypes to obtain a better understanding of this heterogeneity, enhance our understanding of the disease, and improve clinical care.

Methods: In this retrospective study, we applied unsupervised (Uniform Manifold Approximation and Projection [UMAP]) modelling, semi-supervised (neural network UMAP) modelling, and supervised (ensemble learning based on LightGBM) modelling to a population-based discovery cohort of patients who were diagnosed with ALS while living in the Piedmont and Valle d'Aosta regions of Italy, for whom detailed clinical data, such as age at symptom onset, were available. We excluded patients with missing Revised ALS Functional Rating Scale (ALSFRS-R) feature values from the unsupervised and semi-supervised steps. We replicated our findings in an independent population-based cohort of patients who were diagnosed with ALS while living in the Emilia Romagna region of Italy.
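The exclusion rule described in the Methods can be illustrated with a minimal sketch. It assumes a hypothetical record layout in which each patient is a dictionary and a missing ALSFRS-R item is stored as `None`; the item names and the three-item list are placeholders, not the registry's actual schema. Complete cases go to the unsupervised and semi-supervised steps, while every patient is kept for the supervised step, as the Findings report.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for the ALSFRS-R item columns; the real
# registry schema is not given in the abstract.
ALSFRS_R_ITEMS = ["speech", "swallowing", "walking"]

Patient = Dict[str, Optional[float]]


def has_complete_alsfrs_r(patient: Patient) -> bool:
    """Return True when no ALSFRS-R item is missing for this patient."""
    return all(patient.get(item) is not None for item in ALSFRS_R_ITEMS)


def split_for_analysis(cohort: List[Patient]) -> Tuple[List[Patient], List[Patient]]:
    """Return (complete cases for clustering, all patients for supervised learning)."""
    complete_cases = [p for p in cohort if has_complete_alsfrs_r(p)]
    return complete_cases, list(cohort)


# Toy cohort with invented values, for illustration only.
toy_cohort: List[Patient] = [
    {"speech": 4, "swallowing": 3, "walking": 2},
    {"speech": 4, "swallowing": None, "walking": 1},  # missing item: excluded from clustering
    {"speech": 2, "swallowing": 2, "walking": 4},
]

clustering_set, supervised_set = split_for_analysis(toy_cohort)
```

In this toy cohort, two of the three patients would be passed to the clustering steps and all three to the supervised step.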

Findings: Between Jan 1, 1995, and Dec 31, 2015, 2858 patients were entered in the discovery cohort. After excluding 497 (17%) patients with missing ALSFRS-R feature values, data for 42 clinical features across 2361 (83%) patients were available for the unsupervised and semi-supervised analysis. We found that semi-supervised machine learning produced the optimum clustering of the patients with ALS. These clusters roughly corresponded to the six clinical subtypes defined by the Chiò classification system (ie, bulbar, respiratory, flail arm, classical, pyramidal, and flail leg ALS). Between Jan 1, 2009, and March 1, 2018, 1097 patients were entered in the replication cohort. After excluding 108 (10%) patients with missing ALSFRS-R feature values, data for 42 clinical features across 989 patients were available for the unsupervised and semi-supervised analysis. All 1097 patients were included in the supervised analysis. The same clusters were identified in the replication cohort. By contrast, other ALS classification schemes, such as the El Escorial categories, Milano-Torino clinical staging, and King's clinical stages, did not adequately label the clusters. Supervised learning identified 11 clinical parameters that predicted ALS clinical subtypes with high accuracy (area under the curve 0·982 [95% CI 0·980-0·983]).
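The cohort sizes reported above can be checked with simple arithmetic. The sketch below uses only the counts stated in the Findings; the function name is illustrative.

```python
from typing import Tuple


def exclusion_summary(enrolled: int, excluded: int) -> Tuple[int, int, int]:
    """Return (retained count, % excluded, % retained), percentages rounded to integers."""
    retained = enrolled - excluded
    pct_excluded = round(100 * excluded / enrolled)
    pct_retained = round(100 * retained / enrolled)
    return retained, pct_excluded, pct_retained


# Counts taken from the Findings paragraph.
discovery = exclusion_summary(enrolled=2858, excluded=497)
replication = exclusion_summary(enrolled=1097, excluded=108)

print(discovery)    # (2361, 17, 83)
print(replication)  # (989, 10, 90)
```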

Interpretation: Our data-driven study provides insight into the ALS population substructure and confirms that the Chiò classification system successfully identifies ALS subtypes. Additional validation is required to determine the accuracy and clinical use of these algorithms in assigning clinical subtypes. Nevertheless, our algorithms offer a broad insight into the clinical heterogeneity of ALS and help to determine the actual subtypes of disease that exist within this fatal neurodegenerative syndrome. The systematic identification of ALS subtypes will improve clinical care and clinical trial design.

Funding: US National Institute on Aging, US National Institutes of Health, Italian Ministry of Health, European Commission, University of Torino Rita Levi Montalcini Department of Neurosciences, Emilia Romagna Regional Health Authority, and Italian Ministry of Education, University, and Research.

Translations: For the Italian and German translations of the abstract see Supplementary Materials section.

Conflict of interest statement

Declaration of interests: BJT holds patents on the clinical testing and therapeutic intervention for the hexanucleotide repeat expansion of C9orf72 (patent numbers EP2751284A1, CA2846307A, and 20180187262); received research grants from the Myasthenia Gravis Foundation, ALS Association, US Center for Disease Control and Prevention, US Department of Veterans Affairs, MSD, and Cerevel Therapeutics; receives funding through the Intramural Research Program at the US National Institutes of Health (NIH); is on the scientific advisory committee of the American Neurological Association; is an associate editor of Brain; and is on the editorial boards of Journal of Neurology, Neurosurgery, and Psychiatry, Neurobiology of Aging, and eClinicalMedicine. JM received research grants from the Fondazione Italiana di Ricerca per la Sclerosi Laterale Amiotrofica, Agenzia Italiana del Farmaco, Italian Ministry of Health, Emilia Romagna Regional Health Authority, and Pfizer. ACh received research funding and honoraria for lectures from Biogen; sits on advisory boards for Mitsubishi Tanabe Pharma, Roche, Denali Therapeutics, Cytokinetics, Biogen, Amylyx Pharmaceuticals, and Sanofi; and participates in data safety monitoring boards for Lilly and AB Science. RV received research scholarship funding from the Rotary Club (global grant GG2094854). FF is employed by Data Tecnica International. MAN is employed by Data Tecnica International and is an adviser for Clover Therapeutics and Neuron23. AD is employed by Data Tecnica International. All other authors declare no competing interests.

Figures

**Figure 1. Workflow followed in this study.**
Unsupervised and semi-supervised machine learning was applied to clinical data collected from two population-based ALS registries (n = 2,858 cases and 1,097 cases) to identify clinical subtypes. Supervised machine learning was used to predict subtypes based on clinical parameters, and a web-based tool was built for clinical researchers to apply to their own data.

**Figure 2. The ALS subtypes identified by machine learning in the discovery and replication cohorts.**
The top row (A) shows the three-dimensional projections of the discovery cohort (n = 2,361) defined by the semi-supervised machine learning algorithm consisting of a UMAP algorithm applied to the output of a five-layer neural network. The same three-dimensional projections (left panel = 100 degrees azimuthal rotation, center panel = 135 degrees, and right panel = 170 degrees) of the replication cohort (n = 989) are shown in the bottom row (B). The projections are symbolic representations of ALS subtypes. Each patient (dot) was color-coded after machine learning cluster generation according to the Chiò classification system. Interactive three-dimensional graphs are available on https://share.streamlit.io/anant-dadu/machinelearningforals/main.

**Figure 3. Different classification schema applied to the semi-supervised 3D projection of the ALS discovery cohort (n = 2,361).**
(A) The El Escorial classification system assigns patients to definite (def.), probable (prob.), probable - laboratory supported (prob. - lab.), possible (poss.), and suspected (susp.) categories based on their disability. (B) Patients with a family history of ALS are represented by red dots, and blue dots show patients with sporadic disease. (C) Patients carrying the pathogenic repeat expansion are represented by red dots. (D) The MITOS classification system assigns patients to clinical stages 0 to 4 based on their disability. (E) The ALSFRS-R score rates the severity of disability ranging from 0 to 48 (no disability). (F) The King’s clinical staging system classifies patients into four stages according to their disability level.

**Figure 4. Clinical parameters used in the supervised machine learning model to predict ALS clinical subtype.**
(A) Graphical representation of the overlap between the eleven parameters with the most significant impact on the classification model. The dark circles in the dot plot indicate the parameters that are part of an intersection, and the vertical bar plot reports the number of patients with that parameter combination. The horizontal bar plot reports the set sizes. Analysis was confined to 699 ALS patients with no missing data. (B) Distribution of the parameters in each patient. On average, a patient had five of these clinical features. (C - E) The distribution of the age at onset, weight at diagnosis, and forced vital capacity percent at diagnosis in the analyzed patients.
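The intersection counts and set sizes shown in an overlap plot of this kind can be computed as in the following sketch. The parameter names and the four toy patients are hypothetical and are not drawn from the study data.

```python
from collections import Counter

# Each toy patient is the set of parameters recorded as present.
# Names and values are invented for illustration only.
patients = [
    {"bulbar_onset", "smoker"},
    {"bulbar_onset"},
    {"bulbar_onset", "smoker"},
    {"smoker", "family_history"},
]

# Vertical bars: number of patients sharing exactly the same combination.
combination_counts = Counter(frozenset(p) for p in patients)

# Horizontal bars: number of patients with each parameter (set sizes).
set_sizes = Counter(param for p in patients for param in p)

# Average number of parameters per patient (cf. panel B).
mean_parameters = sum(len(p) for p in patients) / len(patients)

print(combination_counts[frozenset({"bulbar_onset", "smoker"})])  # 2
print(set_sizes["bulbar_onset"])                                  # 3
print(mean_parameters)                                            # 1.75
```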

**Figure 5. The eleven features used in the supervised machine learning model to predict ALS clinical subtype.**
(A) Distribution of the SHAP values for the eleven features with the most significant impact on the classification model. Each point represents a subject and may have a positive or negative impact depending on its SHAP value. For instance, high values of the rate of BMI decline in red contribute strongly to the positive class, while low values in blue contribute to a lesser extent to the negative class. (B and C) The aggregate of the SHAP values is shown for the top eleven features (ranked from most to least important). (D) Model output trajectory for a single subject with the bulbar subtype of ALS. The predicted probability that the patient had the bulbar subtype of ALS was 0.91, predominantly driven by the patient’s bulbar site of symptom onset and only minorly driven by their smoking status and El Escorial category at diagnosis. See https://share.streamlit.io/anant-dadu/machinelearningforals/main for more examples.
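SHAP values are additive: a baseline plus the per-feature contributions gives the model's raw output for a subject. The sketch below illustrates this for a single subject, assuming a one-vs-rest log-odds output that is mapped to a probability with a logistic function. The baseline and contribution values are invented so that one feature dominates; they are not taken from the paper or its model.

```python
import math

# Hypothetical log-odds baseline and per-feature SHAP contributions.
# These numbers are illustrative only, not values from the study.
base_value = -1.0
contributions = {
    "site_of_onset_bulbar": 3.0,   # dominant driver in this toy example
    "smoking_status": 0.2,
    "el_escorial_category": 0.1,
}


def logistic(x: float) -> float:
    """Map a log-odds score to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Additivity: raw score = baseline + sum of per-feature contributions.
raw_score = base_value + sum(contributions.values())
probability = logistic(raw_score)

# Rank features by absolute contribution, largest first.
ranked = sorted(contributions, key=lambda name: abs(contributions[name]), reverse=True)

print(round(probability, 2))  # 0.91
print(ranked[0])              # site_of_onset_bulbar
```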
