J Extracell Biol. 2024 Jan 23;3(1):e120.
doi: 10.1002/jex2.120. eCollection 2024 Jan.

Proteome encoded determinants of protein sorting into extracellular vesicles

Katharina Waury et al. J Extracell Biol.

Abstract

Extracellular vesicles (EVs) are membranous structures released by cells into the extracellular space and are thought to be involved in cell-to-cell communication. While EVs and their cargo are promising biomarker candidates, the mechanisms that sort proteins into EVs remain unclear. In this study, we ask whether it is possible to determine EV association based on the protein sequence. Additionally, we ask what the most important determinants of EV association are. We answer these questions with explainable AI models, using human proteome data from EV databases to train and validate the models. It is essential to correct the datasets for contaminants introduced by coarse EV isolation workflows and for experimental bias caused by mass spectrometry. We show that it is indeed possible to predict EV association from the protein sequence: a simple sequence-based model for predicting EV proteins achieved an area under the curve of 0.77 ± 0.01, which increased further to 0.84 ± 0.00 when incorporating curated post-translational modification (PTM) annotations. Feature analysis shows that EV-associated proteins are stable, polar, and structured, with a low isoelectric point compared to non-EV proteins. PTM annotations emerged as the most important features for correct classification; specifically, palmitoylation is one of the most prevalent EV sorting mechanisms for unique proteins. Palmitoylation and nitrosylation sites are especially prevalent in EV proteins that are determined by very strict isolation protocols, indicating they could potentially serve as quality control criteria for future studies. This computational study offers an effective sequence-based predictor of EV-associated proteins, together with an extensive characterisation of the human EV proteome, and can explain for individual proteins which factors contribute to their EV association.
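For readers unfamiliar with the metric reported above, the area under the ROC curve can be read as the probability that a randomly chosen EV protein receives a higher predicted score than a randomly chosen non-EV protein. The following is a minimal sketch of that rank-based definition; the function name `roc_auc` and the toy labels and scores are illustrative assumptions and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Returns the fraction of (positive, negative) pairs in which the positive
    example has the higher score, counting ties as one half.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = EV-associated, 0 = non-EV.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
```

In this toy case, 8 of the 9 positive-negative pairs are ranked correctly, so `auc` is 8/9. A value of 0.5 would correspond to random ranking and 1.0 to perfect separation.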

Keywords: biomarkers; extracellular vesicles; human proteome; machine learning; post‐translational modification.

Conflict of interest statement

Research of Katharina Waury, Dea Gogishvili, Charlotte E. Teunissen, and Sanne Abeln is supported by the European Commission Marie Curie International Training Network, grant agreement No 860197, the MIRIADE project. Charlotte E. Teunissen is supported by JPND (bPRIDE), Health Holland, the Dutch Research Council (ZonMW), Alzheimer Drug Discovery Foundation, The Selfridges Group Foundation, Alzheimer Netherlands, Alzheimer Association. Charlotte E. Teunissen is recipient of ABOARD, which is a public‐private partnership receiving funding from ZonMW (#73305095007) and HealthHolland, Topsector Life Sciences & Health (PPP‐allowance; #LSHM20106). More than 30 partners participate in ABOARD. ABOARD also receives funding from Edwin Bouw Fonds and Gieskes‐Strijbisfonds. Charlotte E. Teunissen has a collaboration contract with ADx Neurosciences, Quanterix and Eli Lilly, and has performed contract research for or received grants from AC‐Immune, Axon Neurosciences, Biogen, Brainstorm Therapeutics, Celgene, EIP Pharma, Eisai, PeopleBio, Roche, Toyama, Vivoryon. She serves on the editorial boards of Medidact Neurologie/Springer, Alzheimer Research and Therapy, Neurology: Neuroimmunology & Neuroinflammation, and is editor of a Neuromethods book (Springer).

Figures

FIGURE 1
Data curation workflow. Squares represent datasets with remaining entries. Three datasets from databases Vesiclepedia, ExoCarta and UniProt are coloured in khaki. Unique proteins from Vesiclepedia and ExoCarta were merged to construct a General EV dataset, and proteins identified by unreliable isolation workflows were removed to obtain the Stringent EV dataset. Sequence‐based features as well as annotations were generated for each protein in the human proteome. Human proteins not detectable by MS were removed by the MS filter. All unique MS‐detectable human proteins were annotated regarding their EV association using the stringent EV dataset. Lastly, rarely detected EV proteins (count ≤ 2) were removed from the dataset entirely resulting in the EV‐annotated discovery set (blue). EV, extracellular vesicle; MS, mass spectrometry.
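The curation steps in this caption can be sketched with plain set operations. This is only an illustration under stated assumptions: the function name `curate`, the record layout (protein, study), the toy identifiers, and the reading of "count" as the number of distinct reliable studies reporting a protein are all hypothetical and are not confirmed by the caption.

```python
from collections import Counter


def curate(vesiclepedia, exocarta, unreliable_studies,
           human_proteome, ms_detectable, min_count=3):
    """Return {protein: label} with 1 = EV and 0 = non-EV.

    Assumption: each database record is a (protein, study) pair, and a
    protein's count is the number of distinct reliable studies reporting it.
    """
    # General EV dataset: unique records merged from both databases.
    general_ev = set(vesiclepedia) | set(exocarta)
    # Stringent EV dataset: drop records from unreliable isolation workflows.
    stringent_ev = {(p, s) for p, s in general_ev if s not in unreliable_studies}
    counts = Counter(p for p, _ in stringent_ev)

    labels = {}
    # MS filter: keep only human proteins considered detectable by MS.
    for protein in sorted(human_proteome & ms_detectable):
        n = counts[protein]
        if n == 0:
            labels[protein] = 0      # never reported by a reliable study
        elif n >= min_count:
            labels[protein] = 1      # repeatedly reported EV protein
        # Proteins with 1 <= n <= 2 are removed from the dataset entirely.
    return labels


# Hypothetical toy inputs; protein and study names are placeholders.
vesiclepedia = [("P1", "studyA"), ("P2", "studyA"), ("P1", "studyB"), ("P3", "studyC")]
exocarta = [("P1", "studyD"), ("P2", "studyB"), ("P4", "studyC")]
discovery = curate(
    vesiclepedia, exocarta,
    unreliable_studies={"studyC"},
    human_proteome={"P1", "P2", "P3", "P4", "P5", "P6"},
    ms_detectable={"P1", "P2", "P3", "P5"},
)
```

Here P1 is labelled EV (three reliable studies), P2 is dropped as rarely detected, P3 and P5 are labelled non-EV, and P4 and P6 are removed by the MS filter.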
FIGURE 2
Density plots of log2‐transformed molecular weight across the human proteome and EV‐annotated datasets. (a) Distribution of log2‐transformed molecular weight of the MS‐detectable human proteome compared to the full human proteome. MS struggles to detect low molecular weight proteins of the human proteome. (b) The molecular weight densities of EV and non‐EV proteins in the unfiltered EV‐annotated dataset highlight the discrepancy in molecular weight between the EV and non‐EV proteomes. (c) The much more similar molecular weight distributions of the EV and non‐EV groups show how the MS filter step diminishes the experimental bias introduced by MS. EV, extracellular vesicle; MS, mass spectrometry.
FIGURE 3
Performance of RF model and feature importance analysis. (a) ROC curves and AUC display the performance of the RF classifiers using sequence‐based features (light blue) and sequence‐based features plus curated annotations (dark blue). (b) Bar plots show the Gini importance of the top 20 features for EV prediction, and (c) the correlation of these features with the EV class. Note that (b) and (c) share the same labels. PTMs, stability, structure, and polarity differentiate EV and non‐EV proteins. Dark blue features are curated annotations. AUC, area under the curve; C, cysteine; D, aspartic acid; EV, extracellular vesicle; H, histidine; I, isoleucine; P, proline; PTM, post‐translational modification; RF, random forest; ROC, receiver operating characteristic; S, serine; V, valine.
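As background for the Gini importance shown in panel (b): in a random forest, a feature's Gini importance accumulates the impurity decrease of every split that uses that feature. The sketch below computes that decrease for a single binary split only; it is not the study's pipeline, and the function names and toy data (a hypothetical 0/1 annotation versus an uninformative feature) are assumptions made for illustration.

```python
def gini(labels):
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2.0 * p * (1.0 - p)  # equals 1 - p^2 - (1 - p)^2 for two classes


def impurity_decrease(values, labels, threshold):
    """Weighted Gini decrease obtained by splitting at `values <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    return (gini(labels)
            - len(left) / n * gini(left)
            - len(right) / n * gini(right))


# Hypothetical toy data: 1 = EV, 0 = non-EV.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
annotation = [1, 1, 1, 0, 0, 0, 1, 0]   # mostly agrees with the label
noise = [1, 0, 1, 0, 1, 0, 1, 0]        # unrelated to the label

informative = impurity_decrease(annotation, labels, threshold=0.5)
uninformative = impurity_decrease(noise, labels, threshold=0.5)
```

The informative feature reduces impurity while the unrelated one does not, which is the property that places a feature high in a Gini importance ranking.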
FIGURE 4
Features in the high and low confidence EV sets. Proteins in the high confidence EV dataset, which was constructed from three recent studies, show a similar distribution of physicochemical and structural properties to the EV protein set of the discovery dataset. For many features, the discrepancy with the non‐EV group becomes more distinct. Furthermore, the low confidence dataset (orange), which contains EV proteins identified in older studies, dilutes the observed signal compared to the EV protein set, probably because it includes many contaminants. p‐values are displayed in the plots. EV, extracellular vesicle; HC, high confidence; LC, low confidence.
FIGURE 5
Shapley value analysis of case examples. SHAP plots for local interpretability are shown for correctly predicted proteins (i.e., true positive, true negative), as well as for proteins from our test set for which the model prediction and the annotation from our data curation workflow disagree (i.e., false positive, false negative). We chose examples in which the predictor is very certain whether the individual protein is EV associated or not. Each SHAP plot displays a set of SHAP values that explain, for an individual protein, which features contributed to the model's prediction. Features in red increase the predicted score (i.e., EV associated), and features in blue decrease it (i.e., non‐EV). Protein structures shown here are predicted by AlphaFold (Jumper et al., 2021).
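To make the red and blue contributions in such plots concrete, the sketch below computes exact Shapley values for a tiny hand-written scoring function by averaging each feature's marginal contribution over all feature orderings. It is an illustration only: the toy model, the feature names, and the use of a single all-zero baseline are assumptions. The SHAP method used in practice estimates these values relative to a background dataset and is not reproduced here.

```python
from itertools import permutations


def shapley_values(model, instance, baseline):
    """Exact Shapley values of `instance` relative to `baseline`.

    Averages, over every ordering of the features, the change in model
    output when a feature is switched from its baseline to its actual value.
    """
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = model(current)
        for name in order:
            current[name] = instance[name]
            updated = model(current)
            totals[name] += updated - previous
            previous = updated
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(f):
    # Hypothetical score, not the study's random forest.
    return (0.3
            + 0.4 * f["palmitoylation"]
            - 0.2 * f["high_isoelectric_point"]
            + 0.1 * f["palmitoylation"] * f["structured"])


instance = {"palmitoylation": 1, "high_isoelectric_point": 1, "structured": 1}
baseline = {"palmitoylation": 0, "high_isoelectric_point": 0, "structured": 0}
phi = shapley_values(toy_model, instance, baseline)
```

Positive entries of `phi` would be drawn in red (pushing the score towards EV) and negative entries in blue. The values sum to the difference between the instance score and the baseline score, which is the additivity property that SHAP plots rely on.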

References

    1. Ageta, H., & Tsuchida, K. (2019). Post‐translational modification and protein sorting to small extracellular vesicles including exosomes by ubiquitin and UBLs. Cellular and Molecular Life Sciences, 76, 4829–4848., 12.
    2. Anand, S., Samuel, M., Kumar, S., & Mathivanan, S. (2019). Ticket to a bubble ride: Cargo sorting into exosomes and extracellular vesicles. Biochimica Et Biophysica Acta (BBA)‐Proteins and Proteomics, 1867(12), 140203.
    3. Anderson, M. R., Kashanchi, F., & Jacobson, S. (2016). Exosomes in viral disease. Neurotherapeutics, 13, 535–546.
    4. Ban, J.‐J., Lee, M., Im, W., & Kim, M. (2015). Low pH increases the yield of exosome isolation. Biochemical and Biophysical Research Communications, 461, 76–79.
    5. Bhandari, B. K., Gardner, P. P., & Lim, C. S. (2020). Solubility‐weighted index: Fast and accurate prediction of protein solubility. Bioinformatics, 36(18), 4691–4698.