Sci Rep. 2018 Nov 5;8(1):16334.
doi: 10.1038/s41598-018-34642-x.

Machine Learning Reveals Protein Signatures in CSF and Plasma Fluids of Clinical Value for ALS

Michael S Bereman et al. Sci Rep. 2018.

Abstract

We use shotgun proteomics to identify biomarkers of diagnostic and prognostic value in individuals diagnosed with amyotrophic lateral sclerosis (ALS). Matched cerebrospinal fluid (CSF) and plasma samples were subjected to abundant protein depletion and analyzed by nano-flow liquid chromatography high-resolution tandem mass spectrometry. Label-free quantitation was used to identify differential proteins between individuals with ALS (n = 33) and healthy controls (n = 30) in both fluids. In CSF, 118 proteins (p-value < 0.05) and 27 proteins (q-value < 0.05) were identified as significantly altered between ALS and controls. In plasma, 20 proteins (p-value < 0.05) and 0 proteins (q-value < 0.05) were identified as significantly altered between ALS and controls. Proteins involved in complement activation, acute phase response, and retinoid signaling pathways were significantly enriched in the CSF from ALS patients. Subsequently, various machine learning methods were evaluated for disease classification using a repeated Monte Carlo cross-validation approach. A linear discriminant analysis (LDA) model achieved a median area under the receiver operating characteristic curve of 0.94 with an interquartile range of 0.88-1.0. Three proteins composed a prognostic model (p = 5e-4) that explained 49% of the variation in the ALS-FRS scores. Finally, we investigated the specificity of two promising proteins from our discovery data set, chitinase-3-like protein 1 and alpha-1-antichymotrypsin, using targeted proteomics in a separate set of CSF samples derived from individuals diagnosed with ALS (n = 11) and other neurological diseases (n = 15). These results demonstrate the potential of a panel of targeted proteins for objective measurements of clinical value in ALS.
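
The gap between the p-value and q-value counts above reflects correction for multiple testing. The abstract does not say how the q-values were computed, so the following is only a minimal sketch that assumes a Benjamini-Hochberg adjustment; the function name `benjamini_hochberg` and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (often reported as q-values).

    Assumption: the paper's q-values may have been computed differently;
    this is one common false discovery rate adjustment, not the authors' code.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for four proteins.
q_values = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [q < 0.05 for q in q_values]
```

Because each adjusted value is at least as large as the raw p-value, a protein can pass p < 0.05 and still fail q < 0.05, which is consistent with the smaller q-value counts reported for both fluids.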

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
An overview of the sample set and experimental design. (A) Matched plasma and CSF samples derived from individuals diagnosed with ALS and healthy controls were obtained from the NEALS biorepository. (B) Pie charts of the number of males and females in the ALS and healthy sample set. (C) Mosaic plots describing the characteristics of the three cycles used for sample preparation. (D) Box plots of the age distribution in each cycle. (E) Samples were depleted of abundant proteins, digested using standard laboratory procedures, and analyzed by LC-MS/MS followed by protein identification and label-free quantitation. (F) A combination of univariate and multivariate techniques was used to identify biomarkers, investigate perturbed pathways, and develop diagnostic and prognostic models. (G) Set of samples used for targeted proteomic experiments. *While the ALS and disease controls were unique to the targeted experiment, the majority of the healthy samples were the same in both experiments. Figure was partially created using images purchased in the PPT Drawing Toolkits-BIOLOGY Bundle from Motifolio, Inc.
Figure 2
Volcano plots of the −log10 (p-value) versus the log2 fold change of proteins in ALS versus control for (A) CSF and (B) plasma fluids. Points colored gray, red, and black indicate proteins with a p-value > 0.05, p-value < 0.05, and q-value < 0.05, respectively. GO analysis of biological processes and molecular function in CSF (C and E) and plasma fluids (D and F).
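
The coordinates and coloring described in this caption can be sketched as follows. This is an illustration only: the function name `volcano_point`, its parameters, and the choice to compute fold change as a ratio of mean abundances are assumptions, not details taken from the paper.

```python
import math


def volcano_point(mean_als, mean_control, p_value, q_value, alpha=0.05):
    """Return (log2 fold change, -log10 p-value, color) for one protein.

    Assumption: fold change is the ratio of mean abundance in ALS to the
    mean abundance in controls; the paper may define it differently.
    """
    log2_fold_change = math.log2(mean_als / mean_control)
    neg_log10_p = -math.log10(p_value)
    # Color rule from the caption: black for q < 0.05, red for p < 0.05,
    # gray otherwise.
    if q_value < alpha:
        color = "black"
    elif p_value < alpha:
        color = "red"
    else:
        color = "gray"
    return log2_fold_change, neg_log10_p, color


# Hypothetical protein: four-fold higher in ALS, p = 0.001, q = 0.01.
x, y, color = volcano_point(8.0, 2.0, 0.001, 0.01)
```

A protein that is more abundant in ALS lands to the right of zero on the x-axis, and smaller p-values plot higher on the y-axis.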
Figure 3
(A) A stacked bar chart of significantly enriched pathways derived from the differential proteins in the CSF data. Gray solid bars represent proteins that were not detected as differentially abundant. Horizontal and diagonal dashed bars represent proteins that are up- and downregulated, respectively. Left axis is the percentage of proteins detected in that pathway (top number) as unchanged or different. Right axis displays the significance of the enrichment. (B) Interaction network analysis of differentially abundant proteins. The size of the circle is proportional to the significance (i.e., p-value), while the shade is indicative of the fold change. Clusters of proteins were isolated and subjected to GO analysis to determine biological function.
Figure 4
(A) An outline of the procedure used to develop and evaluate different algorithms for disease status prediction. (B) Comparison of the performance of 4 different machine learning algorithms on the resampled data using a repeated (n = 50) 5-fold cross-validation approach. The coefficients of the LDA model and area under the curve rank the most important features for classification.
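
The repeated 5-fold evaluation described in this caption can be sketched in plain Python. This is a simplified stand-in, not the authors' pipeline: it uses a single hypothetical feature, and instead of a full multivariate LDA it scores each test sample with a one-feature linear discriminant whose direction is the difference of the training class means (for one feature, this ranks samples the same way as univariate LDA). The names `auc` and `repeated_kfold_auc` are hypothetical.

```python
import random
import statistics


def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def repeated_kfold_auc(x, y, k=5, repeats=50, seed=0):
    """Return one test-fold AUC per fold per repeat (k * repeats values).

    Folds are stratified so every test fold contains both classes;
    each class therefore needs at least k samples.
    """
    rng = random.Random(seed)
    idx_pos = [i for i, lab in enumerate(y) if lab == 1]
    idx_neg = [i for i, lab in enumerate(y) if lab == 0]
    results = []
    for _ in range(repeats):
        rng.shuffle(idx_pos)
        rng.shuffle(idx_neg)
        for fold in range(k):
            test = idx_pos[fold::k] + idx_neg[fold::k]
            held_out = set(test)
            train = [i for i in range(len(y)) if i not in held_out]
            # "Fit": direction of the discriminant from training data only.
            mu1 = statistics.mean(x[i] for i in train if y[i] == 1)
            mu0 = statistics.mean(x[i] for i in train if y[i] == 0)
            direction = mu1 - mu0
            scores = [direction * x[i] for i in test]
            results.append(auc(scores, [y[i] for i in test]))
    return results


# Hypothetical, perfectly separated toy data: 10 controls, 10 cases.
x = [float(v) for v in range(20)]
y = [0] * 10 + [1] * 10
fold_aucs = repeated_kfold_auc(x, y)
median_auc = statistics.median(fold_aucs)
```

Summarizing the per-fold values with `statistics.median` and `statistics.quantiles` gives the kind of median and interquartile range that the abstract reports for the LDA model.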
Figure 5
(A) Plots of the residual sum of squares, the adjusted R-squared value, Mallows' Cp statistic, and the Bayesian Information Criterion (BIC) as a function of the number of proteins in the model. A three-protein model was chosen. (B) Results from the regression analysis. (C) A plot of the ALS-FRS scores as a function of the fitted values. Inset displays a density plot of the residuals.
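
The model-size criteria named in panel (A) can be computed from a fitted model's residual sum of squares. The sketch below uses common textbook forms of adjusted R-squared, Mallows' Cp, and a Gaussian BIC (up to an additive constant); the paper does not state which variants it used, and the function name `selection_criteria` and all input numbers are hypothetical.

```python
import math


def selection_criteria(rss, tss, n, p, sigma2_full):
    """Model selection statistics for a linear model with p predictors.

    rss: residual sum of squares of the candidate model
    tss: total sum of squares of the response
    n: number of samples
    p: number of predictors (intercept not counted)
    sigma2_full: error variance estimated from the largest model considered
    """
    r2 = 1.0 - rss / tss
    adjusted_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    mallows_cp = rss / sigma2_full - n + 2 * (p + 1)
    # Gaussian BIC up to a constant; lower is better.
    bic = n * math.log(rss / n) + (p + 1) * math.log(n)
    return {"r2": r2, "adjusted_r2": adjusted_r2,
            "mallows_cp": mallows_cp, "bic": bic}


# Hypothetical inputs, not values from the study.
stats = selection_criteria(rss=40.0, tss=100.0, n=30, p=3, sigma2_full=1.6)
```

Plotting each statistic against the number of predictors, as in panel (A), shows where adding another protein stops paying for the extra parameter.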
Figure 6
(A) The mean area under the curve of the resampled data sets is plotted as a function of the number of proteins used to create the model. (B) Comparison of the performance of the machine learning algorithms on the resampled data using the optimal number of proteins determined in (A).
Figure 7
Boxplots of the abundance of the two peptide surrogates for alpha-1-antichymotrypsin (A and B) and chitinase-3-like protein 1 (C and D) across the groups. The 3 healthy samples highlighted in red (star) were new and had not been run in the discovery experiment.
