J Proteome Res. 2023 Sep 1;22(9):3068-3080.
doi: 10.1021/acs.jproteome.3c00366. Epub 2023 Aug 22.

Deciphering Protein Secretion from the Brain to Cerebrospinal Fluid for Biomarker Discovery

Katharina Waury et al. J Proteome Res.

Abstract

Cerebrospinal fluid (CSF) is an essential matrix for the discovery of neurological disease biomarkers. However, the high dynamic range of protein concentrations in CSF hinders the detection of the least abundant protein biomarkers by untargeted mass spectrometry. It is thus beneficial to gain a deeper understanding of the secretion processes within the brain. Here, we aim to explore if and how the secretion of brain proteins to the CSF can be predicted. By combining a curated CSF proteome and the brain elevated proteome of the Human Protein Atlas, brain proteins were classified as CSF or non-CSF secreted. A machine learning model was trained on a range of sequence-based features to differentiate between CSF and non-CSF groups and effectively predict the brain origin of proteins. The classification model achieves an area under the curve of 0.89 if using high confidence CSF proteins. The most important prediction features include the subcellular localization, signal peptides, and transmembrane regions. The classifier generalized well to the larger brain detected proteome and is able to correctly predict novel CSF proteins identified by affinity proteomics. In addition to elucidating the underlying mechanisms of protein secretion, the trained classification model can support biomarker candidate selection.

Keywords: brain proteome; cerebrospinal fluid; fluid biomarker; machine learning; protein secretion.

Conflict of interest statement

The authors declare the following competing financial interest(s): C.E.T. has a collaboration contract with ADx Neurosciences, Quanterix and Eli Lilly, performed contract research or received grants from AC-Immune, Axon Neurosciences, Biogen, Brainstorm Therapeutics, Celgene, EIP Pharma, Eisai, PeopleBio Inc., Roche, Toyama, Vivoryon, and has a speaker contract with Roche. S.A. reports grants and nonfinancial support from Cergentis BV and a patent pending outside the submitted work. The MIRIADE project includes the following commercial beneficiaries and partners: ADx Neuroscience, ENPICOM, LGC Limited, PeopleBio Inc., Olink, Quanterix, and Roche.

Figures

Figure 1
Workflow highlighting the curated data sets and trained classification models. The brain elevated HPA proteome was annotated regarding protein presence in CSF. The resulting data set of CSF and non-CSF brain proteins was used to train the full CSF classification model. By following the same data curation but only including CSF proteins detected in at least half the studies, a high confidence CSF model was trained. The models were applied to two data sets: the brain detected HPA proteome and a set of novel CSF proteins identified by PEA. HPA – Human Protein Atlas; PEA – proximity extension assay.
Figure 2
CSF brain proteome. (A) Integration of six different mass spectrometry studies leads to a CSF proteome composed of 5344 unique proteins. Increasing the minimum number of CSF studies that a protein has to be found in leads to smaller but higher confidence CSF proteomes. (B) The average expression of brain elevated proteins according to the HPA is significantly lower in the non-CSF protein group (red) compared with the CSF proteins (CSF1+, light blue). Average expression is even higher in the CSF proteins present in all six studies (CSF6+, light green). HPA – Human Protein Atlas.
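The study-count filtering described in panel A can be illustrated with a minimal sketch. It assumes each study is available as a set of protein identifiers; the function name `proteins_by_min_studies` and the toy identifiers are hypothetical and are not taken from the paper.

```python
from collections import Counter


def proteins_by_min_studies(studies, min_studies):
    """Return the proteins detected in at least `min_studies` of the studies."""
    counts = Counter()
    for detected in studies:
        # Count each protein at most once per study.
        counts.update(set(detected))
    return {protein for protein, n in counts.items() if n >= min_studies}


# Hypothetical toy data: three studies with made-up protein identifiers.
toy_studies = [
    {"P1", "P2", "P3"},
    {"P2", "P3"},
    {"P3", "P4"},
]

csf1 = proteins_by_min_studies(toy_studies, 1)  # found in any study
csf2 = proteins_by_min_studies(toy_studies, 2)  # found in two or more studies
csf3 = proteins_by_min_studies(toy_studies, 3)  # found in all three studies
```

Raising `min_studies` can only shrink the resulting set, which matches the caption's statement that stricter thresholds give smaller but higher confidence proteomes.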
Figure 3
Model performance on the test set. The ROC-AUC plot illustrates how well the two trained prediction models perform on their respective held-back test sets. The high confidence CSF model performs better, indicating that ambiguous proteins were filtered out. AUC – area under the curve; ROC – receiver operating characteristics.
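For readers unfamiliar with the metric, the area under the ROC curve equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it by direct pairwise comparison. The function `roc_auc` and the toy labels and scores are illustrative assumptions and do not reproduce the paper's evaluation code or results.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 1 = CSF protein, 0 = non-CSF protein.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.2, 0.6, 0.3, 0.1]
auc = roc_auc(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of examples, so it suits small illustrations; rank-based implementations give the same value more efficiently on large test sets.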
Figure 4
Features most important for classification. (A) The highest absolute feature coefficients of the high confidence CSF model indicate which features are relevant for the model’s decision-making. (B) Features associated with the conventional secretion pathway of the cell, e.g., the presence of a signal peptide and glycosylation sites in the protein sequence, are more common in CSF secreted proteins. (C) The proportions of predicted subcellular localizations show clear differences between the CSF and non-CSF group. (D) While proteins with one predicted transmembrane region are much more likely to be found in the CSF, the opposite is true for proteins with a high number of transmembrane regions. HC-CSF – high confidence CSF; TM – transmembrane.
Figure 5
Model performance on the brain detected HPA proteome. Predicted probability of the full CSF and high confidence CSF model on proteins detected in the human brain that have not been utilized for previous training and testing. Proteins with a probability score of >0.5 are predicted as CSF secreted. Both models are most confident about proteins that have been detected in a higher number of CSF studies. Proteins not identified in CSF are consistently predicted as brain confined. The high confidence model predicts a large fraction of ambiguous proteins (marked in gray) as brain-confined. HPA – Human Protein Atlas.
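The decision rule stated in this caption, that a probability score above 0.5 is read as CSF secreted, can be sketched as follows. The function name `predict_csf`, the label strings, and the example probabilities are hypothetical; only the 0.5 cutoff comes from the caption.

```python
def predict_csf(probabilities, threshold=0.5):
    """Label each protein as CSF secreted if its score exceeds the threshold."""
    return {
        protein: "CSF secreted" if prob > threshold else "brain confined"
        for protein, prob in probabilities.items()
    }


# Hypothetical predicted probabilities for made-up protein identifiers.
toy_probabilities = {"P1": 0.92, "P2": 0.50, "P3": 0.13}
toy_predictions = predict_csf(toy_probabilities)
```

Because the comparison is strict, a score of exactly 0.5 is labeled brain confined in this sketch.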
Figure 6
Model performance on proteins identified by affinity proteomics. (A) Proteins detected solely by PEA and not mass spectrometry have a lower average brain abundance according to PaxDB. (B) The classification models perform well on the CSF proteins identified by affinity proteomics, identifying the majority of them. Importantly, the models are able to correctly predict low abundance proteins that are potentially only identifiable by targeted approaches. PEA – proximity extension assay; PPM – parts per million.
Figure 7
Predicted CSF secretion probability of established biomarkers of Alzheimer’s Disease. A model was trained on the high confidence CSF data set but with a list of 17 AD biomarkers removed. The model correctly predicts 12 out of 15 CSF biomarkers as being secreted to the CSF, many with a very high probability. Colors indicate the process the biomarker is associated with, illustrating that the model struggles to identify biomarkers of neuronal injury as CSF secreted. Two known PET biomarkers are predicted as non-CSF proteins. This prediction is consistent with the fact that no assay for measurement in CSF has been established for these two imaging biomarkers. AD – Alzheimer’s Disease; NfL – neurofilament light chain; Ng – neurogranin; PET – positron emission tomography.
