Bioinformatics. 2023 Sep 2;39(9):btad486.
doi: 10.1093/bioinformatics/btad486.

Ionmob: a Python package for prediction of peptide collisional cross-section values

David Teschner et al. Bioinformatics.

Abstract

Motivation: Including ion mobility separation (IMS) in mass spectrometry proteomics experiments helps to improve coverage and throughput. Many IMS devices make it possible to link the experimentally derived mobility of an ion to its collisional cross-section (CCS), a highly reproducible physicochemical property that depends on the ion's mass, charge and conformation in the gas phase. Thus, known peptide ion mobilities can be used to tailor acquisition methods or to refine database search results. The large space of potential peptide sequences, further expanded by posttranslational modifications of amino acids, motivates an in silico predictor for peptide CCS. Recent studies have explored the general performance of various machine-learning techniques; however, workflow engineering was of secondary importance. For the sake of applicability, such a tool should be generic and data-driven, and it should be easy to adapt to individual workflows for experimental design and data processing.

Results: We created ionmob, a Python-based framework for data preparation, training, and prediction of collisional cross-section values of peptides. It is easily customizable and includes a set of pretrained, ready-to-use models as well as preprocessing routines for training and inference. Using ≈21 000 unique phosphorylated peptide and ≈17 000 MHC ligand sequence/charge state pairs, we expand the space of peptides that can be integrated into CCS prediction. Finally, we investigate whether in silico predicted CCS can increase confidence in identified peptides through re-scoring, and demonstrate that predicted CCS values complement existing predictors for that task.
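The preprocessing routines operate on sequence/charge state pairs, and one typical step is collapsing repeated observations of the same pair into a single training value. The sketch below illustrates that idea under stated assumptions: the function name `deduplicate`, the choice of the median as aggregate, and the CCS numbers are all hypothetical and not taken from ionmob.

```python
from collections import defaultdict
from statistics import median


def deduplicate(rows):
    """Collapse repeated (sequence, charge) observations into one CCS value.

    rows: iterable of (sequence, charge, ccs) tuples.
    Aggregating with the median is an assumption of this sketch.
    """
    groups = defaultdict(list)
    for sequence, charge, ccs in rows:
        groups[(sequence, charge)].append(ccs)
    return {key: median(values) for key, values in groups.items()}


# Hypothetical replicate measurements of one peptide at two charge states.
observations = [
    ("PEPTIDEK", 2, 401.2),
    ("PEPTIDEK", 2, 403.0),
    ("PEPTIDEK", 2, 402.1),
    ("PEPTIDEK", 3, 510.4),
]
table = deduplicate(observations)
print(table)  # {('PEPTIDEK', 2): 402.1, ('PEPTIDEK', 3): 510.4}
```

The median is used here because it is robust to a single outlying replicate; a mean or an intensity-weighted average would be equally plausible choices.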

Availability and implementation: The Python package is available on GitHub: https://github.com/theGreatHerrLebert/ionmob.


Conflict of interest statement

None declared.

Figures

Figure 1.
(A) General workflow of ionmob. (a) Data are generated from different samples, devices and laboratories. A sample of interest (S1, S2) is analyzed through multiple replicates (R1, R2) and combined into an identification table during raw data analysis (E1, E2). (b) For a representative set of training values, peptide/charge state pairs are pre-processed, e.g. deduplicated. Raw data are then translated into sets of features for machine learning, yielding data ready for training. (c) Training is an iterative process in which the internal state of a predictor is adjusted so that its output better matches the desired output according to an objective measure. This results in a trained model that can be used for prediction. (d) Before trained model outputs can be compared with data from a new source, a dataset-specific shift needs to be calculated. After that, the model's predictions are ready for use, e.g. for rescoring. (B) Proposed model architecture. (a) Simple initial projection fitting a square-root function and a bias, with the mass and charge of a peptide as inputs. (b) Recurrent neural network using GRUs to predict higher-order interactions that contribute to the observed CCS, based on peptide sequences. Deeper dense layers also receive the charge state of the ion as additional input. AAs stands for amino acids. (c) Final CCS values are calculated as the sum of the initial projection and the deep residues.
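Step (d) of the workflow requires a dataset-specific shift before model outputs can be compared with a new data source. A minimal sketch, assuming the shift is a single constant offset estimated as the median difference between observed and predicted CCS over the sequence/charge pairs both sources share; the helper names and all numbers are hypothetical, and ionmob's own procedure may differ.

```python
from statistics import median


def dataset_shift(observed, predicted):
    """Estimate a constant CCS offset between a new dataset and a model.

    observed, predicted: dicts mapping (sequence, charge) -> CCS.
    Only pairs present in both dicts are used; the median difference
    is an assumption of this sketch.
    """
    shared = observed.keys() & predicted.keys()
    if not shared:
        raise ValueError("no overlapping (sequence, charge) pairs")
    return median(observed[key] - predicted[key] for key in shared)


def apply_shift(predicted, shift):
    """Add the estimated offset to every model prediction."""
    return {key: value + shift for key, value in predicted.items()}


# Hypothetical values: three shared pairs, one pair only predicted.
observed = {("AAK", 1): 300.0, ("CCK", 2): 410.0, ("DDK", 2): 420.0}
predicted = {("AAK", 1): 297.0, ("CCK", 2): 406.0,
             ("DDK", 2): 418.0, ("EEK", 3): 500.0}
shift = dataset_shift(observed, predicted)
calibrated = apply_shift(predicted, shift)
print(shift, calibrated[("EEK", 3)])  # 3.0 503.0
```

Once shifted, the predictions for pairs not measured in the new dataset (here `("EEK", 3)`) are on the same scale as its observations, which is what downstream uses such as rescoring need.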
Figure 2.
m/z versus CCS for observed (blue) and predicted (orange) CCS values of MHC peptides, illustrating model performance. (A) Ground truth versus predicted CCS after the initial projection with a simple square-root function; see Equation (1) and Fig. 1Ba. (B) Final CCS prediction as the sum of the initial projection and deep residues; see Equation (2) and Fig. 1Bc. (C) Boxplots of relative errors per charge state, comparing the accuracy of both predictions. (D) Total relative error distributions for both models after training.
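Equations (1) and (2) are not reproduced on this page. Assuming Equation (1) has the form CCS ≈ s_z·√(m/z) + b_z, with one slope s_z and one bias b_z per charge state z, the projection of panel (A) and the per-charge relative errors of panel (C) can be sketched with ordinary least squares in plain Python. The function names are hypothetical, relative error is taken here as |predicted − observed| / observed (also an assumption), and this is not ionmob's implementation.

```python
import math
from collections import defaultdict


def fit_sqrt_projection(mz, charge, ccs):
    """Fit CCS ~ slope_z * sqrt(m/z) + bias_z by least squares, per charge z."""
    groups = defaultdict(list)
    for m, z, c in zip(mz, charge, ccs):
        groups[z].append((math.sqrt(m), c))
    params = {}
    for z, points in groups.items():
        n = len(points)
        mean_x = sum(x for x, _ in points) / n
        mean_y = sum(y for _, y in points) / n
        sxx = sum((x - mean_x) ** 2 for x, _ in points)
        if sxx == 0:
            raise ValueError(f"charge {z} needs at least two distinct m/z values")
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
        slope = sxy / sxx
        params[z] = (slope, mean_y - slope * mean_x)
    return params


def project(params, m, z):
    """Initial CCS estimate from m/z and charge (assumed Equation (1) form)."""
    slope, bias = params[z]
    return slope * math.sqrt(m) + bias


def relative_errors_by_charge(params, mz, charge, ccs):
    """Absolute relative error |predicted - observed| / observed, grouped by charge."""
    errors = defaultdict(list)
    for m, z, c in zip(mz, charge, ccs):
        errors[z].append(abs(project(params, m, z) - c) / c)
    return dict(errors)


# Toy data generated from known lines, so the fit recovers them exactly.
mz = [400.0, 625.0, 900.0, 400.0, 900.0]
charge = [2, 2, 2, 3, 3]
ccs = [410.0, 510.0, 610.0, 505.0, 755.0]
params = fit_sqrt_projection(mz, charge, ccs)
print(params)  # {2: (20.0, 10.0), 3: (25.0, 5.0)}
```

With real measurements the fit is not exact: the remainder CCS − projection is what the GRU branch would have to learn, so that the final value is the projection plus the deep residue, as described for Equation (2).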
Figure 3.
A performance comparison between the ionmob GRU predictor and freely available deep predictors. (A–C) Performance per charge state for different test datasets. The gru model performs slightly better than the others on the Feola et al. (2022) dataset, likely because, unlike the others, it was explicitly trained on MHC peptides. Surprisingly, for charge state 4, the prediction error on the Chang et al. (2021) dataset is relatively high for all models. (D) Boxplots of relative error distributions for all models. The overall performance of conv (Samukhina et al. 2021), lstm (Meier et al. 2021a), and gru is roughly on par, while the apd model (Zeng et al. 2022) appears to perform slightly worse. The ensemble prediction is calculated as the average predicted CCS value over all four models.
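The ensemble in panel (D) is described as the average predicted CCS over all four models. A sketch of that averaging, assuming each model's output is a mapping from (sequence, charge) to CCS and that only pairs predicted by every model are averaged; the model keys mirror the caption's labels, but the numbers are placeholders, not results from the paper.

```python
from statistics import fmean


def ensemble_ccs(predictions):
    """Average CCS per (sequence, charge) over all models.

    predictions: dict model_name -> dict[(sequence, charge)] -> CCS.
    Pairs missing from any model are skipped (an assumption of this sketch).
    """
    model_outputs = list(predictions.values())
    shared = set(model_outputs[0])
    for output in model_outputs[1:]:
        shared.intersection_update(output)
    return {key: fmean(output[key] for output in model_outputs) for key in shared}


# Placeholder predictions; "gru" has no value for ("SAMPLER", 2).
predictions = {
    "conv": {("PEPTIDEK", 2): 400.0, ("SAMPLER", 2): 390.0},
    "lstm": {("PEPTIDEK", 2): 402.0, ("SAMPLER", 2): 392.0},
    "gru": {("PEPTIDEK", 2): 404.0},
    "apd": {("PEPTIDEK", 2): 406.0, ("SAMPLER", 2): 394.0},
}
print(ensemble_ccs(predictions))  # {('PEPTIDEK', 2): 403.0}
```

An unweighted mean treats all four models equally; weighting by each model's validation error would be a natural variation.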
Figure 4.
Agglomerative clustering of amino acid and modification embedding vectors. Outgroups are formed by phosphorylated, acetylated and positively charged amino acids. Inner groups are roughly divided into aliphatic and aromatic as well as hydrophilic and hydrophobic amino acids.
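The caption does not state the linkage criterion or distance metric used. As an assumption, the sketch below shows naive average-linkage agglomerative clustering with Euclidean distance on toy two-dimensional vectors labelled like residues; real embeddings would be higher-dimensional and read from a trained model, and a library routine such as SciPy's hierarchical clustering would normally be used instead.

```python
import math
from itertools import combinations


def average_linkage(vectors):
    """Naive agglomerative clustering with average linkage.

    vectors: dict label -> list of floats.
    Returns merges in order as (cluster_a, cluster_b, distance),
    where each cluster is a frozenset of labels.
    """
    clusters = [frozenset([label]) for label in vectors]

    def linkage(c1, c2):
        # Mean pairwise Euclidean distance between members of two clusters.
        total = sum(math.dist(vectors[a], vectors[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    merges = []
    while len(clusters) > 1:
        c1, c2 = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((c1, c2, linkage(c1, c2)))
        clusters = [c for c in clusters if c != c1 and c != c2] + [c1 | c2]
    return merges


# Hypothetical 2-D vectors, not learned embeddings.
toy_embeddings = {
    "L": [0.0, 0.0], "I": [0.1, 0.0],
    "K": [5.0, 5.0], "R": [5.0, 5.1],
    "pS": [10.0, -10.0],
}
for a, b, d in average_linkage(toy_embeddings):
    print(sorted(a), "+", sorted(b), f"at {d:.2f}")
```

Reading the merge order from the bottom up gives the dendrogram: the last cluster to join (here the isolated `pS` vector) plays the role of an outgroup.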
Figure 5.
Marginal distributions of intensity along the ion-mobility dimension of peptide features, recorded with a timsTOF instrument. Left: intensity distribution along the scan dimension (blue) for a unimodal peptide feature; the reported CCS is calculated from the apex value (black). Right: intensity distribution for a multimodal peptide. MaxQuant reported this peptide twice at the same retention time with differing scan indices (orange, red). Raw data were extracted using opentims (Łącki et al. 2021). 1/K0 was converted to CCS using the Mason–Schamp equation.
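The Mason–Schamp equation relates the reduced mobility K0 to CCS as CCS = (3ze / 16N0) · √(2π / (μ·k_B·T)) · 1/K0, where z is the charge, e the elementary charge, N0 the Loschmidt number (gas number density at 273.15 K and 101.325 kPa), μ the reduced mass of ion and drift gas, and T the gas temperature. A sketch of the conversion from the 1/K0 values (V·s/cm²) reported by timsTOF instruments, assuming nitrogen as drift gas (28.013 Da) and T = 305 K; both defaults and the function name are illustrative assumptions, not values taken from the paper.

```python
import math

# Physical constants (CODATA 2018 values).
ELEMENTARY_CHARGE = 1.602176634e-19  # C
BOLTZMANN = 1.380649e-23             # J/K
AMU = 1.66053906660e-27              # kg per dalton
LOSCHMIDT = 2.686780111e25           # m^-3 at 273.15 K and 101.325 kPa


def one_over_k0_to_ccs(one_over_k0, mz, charge, gas_mass=28.013, temperature=305.0):
    """Convert inverse reduced mobility 1/K0 (V*s/cm^2) to CCS in Angstrom^2.

    gas_mass (Da) and temperature (K) are assumed defaults for N2 drift gas.
    """
    ion_mass = mz * charge                                   # ion mass in Da
    mu = ion_mass * gas_mass / (ion_mass + gas_mass) * AMU   # reduced mass in kg
    k0 = 1e-4 / one_over_k0                                  # cm^2/(V*s) -> m^2/(V*s)
    ccs_m2 = (3 * charge * ELEMENTARY_CHARGE / (16 * LOSCHMIDT * k0)
              * math.sqrt(2 * math.pi / (mu * BOLTZMANN * temperature)))
    return ccs_m2 * 1e20                                     # m^2 -> Angstrom^2


ccs = one_over_k0_to_ccs(0.9, mz=500.0, charge=2)
print(f"{ccs:.1f} A^2")
```

Because CCS is proportional to 1/K0 at fixed mass, charge, gas and temperature, the apex position along the mobility axis maps directly onto the reported CCS, which is why a multimodal feature yields more than one CCS value.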

References

    1. Abadi M, Agarwal A, Barham P. et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. 2016.
    1. Brunner A, Thielert M, Vasilopoulou C. et al. Ultra-high sensitivity mass spectrometry quantifies single-cell proteome changes upon perturbation. Mol Syst Biol 2022;18:e10798.
    1. Bush MF, Campuzano IDG, Robinson CV. Ion mobility mass spectrometry of peptide ions: effects of drift gas and calibration strategies. Anal Chem 2012;84:7124–30.
    1. Chang CH, Yeung D, Spicer V. et al. Sequence-specific model for predicting peptide collision cross section values in proteomic ion mobility spectrometry. J Proteome Res 2021;20:3600–10.
    1. Chang Y-W, Lin C-J. Feature ranking using linear SVM. In: Guyon I, Aliferis C, Cooper G, Elisseeff A, Pellet J-P, Spirtes P, Statnikov A (eds.), Proceedings of the Workshop on the Causation and Prediction Challenge at WCCI 2008, volume 3 of Proceedings of Machine Learning Research, 53–64. Hong Kong: PMLR, 2008.
