Bioinformatics. 2023 Sep 2;39(9):btad486.

doi: 10.1093/bioinformatics/btad486.

Ionmob: a Python package for prediction of peptide collisional cross-section values

David Teschner¹, David Gomez-Zepeda²˒³, Arthur Declercq⁴˒⁵, Mateusz K Łącki², Seymen Avci¹, Konstantin Bob¹, Ute Distler², Thomas Michna²˒³, Lennart Martens⁴˒⁵, Stefan Tenzer²˒³, Andreas Hildebrandt¹

Affiliations

¹ Institute of Computer Science, Johannes Gutenberg University, 55128 Mainz, Germany.
² Institute for Immunology, University Medical Center of the Johannes Gutenberg University, 55128 Mainz, Germany.
³ Immunoproteomics Unit, Helmholtz-Institute for Translational Oncology (HI-TRON), 55131 Mainz, Germany.
⁴ VIB-UGent Center for Medical Biotechnology, VIB, 9052 Gent, Belgium.
⁵ Department of Biomolecular Medicine, Ghent University, 9000 Ghent, Belgium.

PMID: 37540201
PMCID: PMC10521631
DOI: 10.1093/bioinformatics/btad486

David Teschner et al. Bioinformatics. 2023.

Abstract

Motivation: Including ion mobility separation (IMS) in mass spectrometry proteomics experiments is useful for improving coverage and throughput. Many IMS devices allow the experimentally derived mobility of an ion to be linked to its collisional cross-section (CCS), a highly reproducible physicochemical property that depends on the ion's mass, charge and conformation in the gas phase. Thus, known peptide ion mobilities can be used to tailor acquisition methods or to refine database search results. The large space of potential peptide sequences, further expanded by posttranslational modifications of amino acids, motivates an in silico predictor for peptide CCS. Recent studies explored the general performance of various machine-learning techniques; however, workflow engineering was of secondary importance. For the sake of applicability, such a tool should be generic and data driven, and it should be easy to adapt to individual workflows for experimental design and data processing.

Results: We created ionmob, a Python-based framework for data preparation, training, and prediction of collisional cross-section values of peptides. It is easily customizable and includes a set of pretrained, ready-to-use models and preprocessing routines for training and inference. Using a set of ≈21 000 unique phosphorylated peptides and ≈17 000 MHC ligand sequence and charge state pairs, we expand the space of peptides that can be integrated into CCS prediction. Lastly, we investigate whether in silico predicted CCS can increase confidence in identified peptides by applying re-scoring methods, and demonstrate that predicted CCS values complement existing predictors for that task.

Availability and implementation: The Python package is available on GitHub: https://github.com/theGreatHerrLebert/ionmob.

Conflict of interest statement

None declared.

Figures

**Figure 1.**
(A) General workflow of ionmob. (a) Data are generated from different samples, devices and laboratories. A sample of interest ( $S_{1}$ , $S_{2}$ ) is analyzed through multiple replicates ( $R_{1}$ , $R_{2}$ ) and combined into an identification table during raw data analysis ( $E_{1}$ , $E_{2}$ ). (b) To obtain a representative set of training values, peptide charge state pairs are pre-processed, e.g. deduplicated. Raw data are then translated into sets of features for machine learning. This results in data ready for training. (c) Training is then an iterative process in which the internal state of a predictor is changed so that its output better resembles the desired output based on some objective measure. This results in a trained model that can be used for prediction. (d) Before trained model outputs can be compared with data derived from a new source, a dataset-specific shift needs to be calculated. After that, predictions of a model are ready for use, e.g. for rescoring. (B) Proposed model architecture. (a) Simple initial projection fitting a square-root function and a bias with mass and charge of a peptide as inputs. (b) Recurrent neural network using GRUs to predict higher-order interactions that contribute to observed CCS based on peptide sequences. Deeper dense layers are also provided with the charge state of the ion as additional input. AAs stands for amino acids. (c) Final CCS values are then calculated as the sum of initial projection and deep residues
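
The initial projection of panel B(a) and the additive decomposition of panel B(c) can be illustrated with a small least-squares fit. This is a minimal sketch and not ionmob's implementation: it assumes the projection has the form CCS ≈ slope · sqrt(m/z) + bias with one slope and one bias per charge state, and the function names `fit_sqrt_projection` and `predict_sqrt_projection` as well as the synthetic data are hypothetical.

```python
import numpy as np


def fit_sqrt_projection(mz, charge, ccs):
    """Fit ccs ~ slope * sqrt(mz) + bias separately for each charge state."""
    mz, charge, ccs = map(np.asarray, (mz, charge, ccs))
    params = {}
    for z in np.unique(charge):
        mask = charge == z
        # Design matrix with columns [sqrt(m/z), 1] for ordinary least squares.
        design = np.column_stack([np.sqrt(mz[mask]), np.ones(mask.sum())])
        (slope, bias), *_ = np.linalg.lstsq(design, ccs[mask], rcond=None)
        params[int(z)] = (float(slope), float(bias))
    return params


def predict_sqrt_projection(params, mz, charge):
    """Evaluate the fitted per-charge square-root projection."""
    mz, charge = np.asarray(mz, dtype=float), np.asarray(charge)
    slopes = np.array([params[int(z)][0] for z in charge])
    biases = np.array([params[int(z)][1] for z in charge])
    return slopes * np.sqrt(mz) + biases


# Synthetic, noise-free data (illustrative values only, not measurements).
rng = np.random.default_rng(0)
charge = np.repeat([2, 3], 50)
mz = rng.uniform(400.0, 1200.0, size=charge.size)
true_params = {2: (12.0, 90.0), 3: (15.0, 150.0)}
ccs = np.array(
    [true_params[int(z)][0] * np.sqrt(m) + true_params[int(z)][1]
     for m, z in zip(mz, charge)]
)

params = fit_sqrt_projection(mz, charge, ccs)
initial = predict_sqrt_projection(params, mz, charge)

# In panel B(c) a sequence model would predict this residual term; here it is
# computed directly from the data only to show the additive decomposition.
residual = ccs - initial
final = initial + residual
```

Because the synthetic data contain no sequence-dependent effects, the residual is essentially zero here. On real data, the residual is the part a sequence-based network would be trained to predict.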

**Figure 2.**
m/z versus CCS for observed (blue) and predicted (orange) CCS values of MHC peptides, model performance. (A) Ground truth versus predicted CCS after initial projection with a simple square-root function, see Equation (1) and Fig. 1Ba. (B) Final CCS prediction as the sum of initial projection and deep residues, see Equation (2) and Fig. 1Bc. (C) Boxplots of relative errors per charge state, comparing the accuracy of both predictions. (D) Total relative error distributions for both models after training

**Figure 3.**
A performance comparison between the ionmob GRU predictor and freely available deep predictors. (A–C) Performance per charge state for different test datasets. The gru model has a slight performance boost over the others for the Feola *et al.* (2022) dataset, likely because, in contrast to the others, it was explicitly trained on MHC peptides. Surprisingly, for charge state 4, prediction error for the Chang *et al.* (2021) dataset is relatively high for all models. (D) Boxplots of relative error distributions for all models. Overall performance of the conv (Samukhina *et al.* 2021), lstm (Meier *et al.* 2021a), and gru models is relatively on par, while the apd (Zeng *et al.* 2022) model seems to perform a little worse. The ensemble prediction is calculated as the average predicted CCS value over all four models
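
The ensemble averaging described in this caption can be sketched in a few lines. The prediction values below are hypothetical and serve only to show the arithmetic. The relative error is assumed here to be the absolute deviation as a percentage of the observed value, which may differ from the exact definition used in the paper.

```python
import numpy as np

# Hypothetical CCS predictions (in Å²) from four models for three peptide ions.
predictions = {
    "conv": np.array([350.0, 420.0, 510.0]),
    "lstm": np.array([352.0, 418.0, 505.0]),
    "apd": np.array([345.0, 425.0, 515.0]),
    "gru": np.array([353.0, 417.0, 510.0]),
}
observed = np.array([350.0, 420.0, 500.0])  # hypothetical measured CCS values

# Ensemble prediction: unweighted mean over the four models, per peptide ion.
ensemble = np.mean(np.stack(list(predictions.values())), axis=0)


def relative_error_percent(predicted, reference):
    """Absolute relative error in percent (assumed definition)."""
    return 100.0 * np.abs(predicted - reference) / reference


errors = {name: relative_error_percent(p, observed) for name, p in predictions.items()}
errors["ensemble"] = relative_error_percent(ensemble, observed)
```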

**Figure 4.**
Agglomerative clustering of amino acid and modification embedding vectors. Outgroups are formed by phosphorylated, acetylated and positively charged amino acids. Inner groups are roughly divided between aliphatic and aromatic as well as hydrophilic and hydrophobic amino acids

**Figure 5.**
Marginal distributions of intensity along the ion-mobility dimension of peptide features, recorded with a timsTOF instrument. Left: Intensity distribution along the scan dimension (blue) for a uni-modal peptide feature; the reported CCS is calculated from the apex value (black). Right: Intensity distribution for a multi-modal peptide. MaxQuant reported this peptide twice at the same retention time with differing scan indices (orange, red). Raw data were extracted using opentims (Łącki *et al.* 2021). $1/K_0$ was converted to CCS using the Mason–Schamp equation
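
The conversion from $1/K_0$ to CCS mentioned in this caption can be sketched from the Mason–Schamp equation, CCS = (3/16) · sqrt(2π / (μ k_B T)) · z e / (N_0 K_0). This is a minimal sketch under stated assumptions, not the code used by the authors: it assumes nitrogen as drift gas, a gas temperature of 305 K, and $1/K_0$ given in V·s/cm². The function name `one_over_k0_to_ccs` and its default parameters are hypothetical.

```python
import math

# CODATA 2018 constants in SI units.
K_B = 1.380649e-23          # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C
DALTON = 1.66053906660e-27  # atomic mass constant, kg
LOSCHMIDT = 2.686780111e25  # gas number density at 273.15 K, 101.325 kPa, 1/m^3


def one_over_k0_to_ccs(one_over_k0, mz, charge, gas_mass=28.0134, temperature_k=305.0):
    """Convert reduced inverse mobility (V*s/cm^2) to CCS (square angstrom).

    gas_mass is the drift gas mass in Da (N2 assumed) and temperature_k the
    gas temperature in kelvin (305 K assumed); both are illustrative defaults.
    """
    ion_mass = mz * charge  # ion mass in Da
    reduced_mass_kg = ion_mass * gas_mass / (ion_mass + gas_mass) * DALTON
    inv_k0_si = one_over_k0 * 1e4  # V*s/cm^2 -> V*s/m^2
    ccs_m2 = (
        (3.0 / 16.0)
        * math.sqrt(2.0 * math.pi / (reduced_mass_kg * K_B * temperature_k))
        * charge * E_CHARGE / LOSCHMIDT
        * inv_k0_si
    )
    return ccs_m2 * 1e20  # m^2 -> square angstrom


# Example call with illustrative inputs: a doubly charged ion at m/z 500.
ccs_example = one_over_k0_to_ccs(one_over_k0=1.0, mz=500.0, charge=2)
```

With these assumptions the result scales linearly with $1/K_0$ and with charge, and depends only weakly on ion mass through the reduced mass.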

References

1. Abadi M, Agarwal A, Barham P. et al. TensorFlow: large-scale machine learning on heterogeneous distributed systems. 2016.
2. Brunner A, Thielert M, Vasilopoulou C. et al. Ultra‐high sensitivity mass spectrometry quantifies single‐cell proteome changes upon perturbation. Mol Syst Biol 2022;18:e10798.
3. Bush MF, Campuzano IDG, Robinson CV. Ion mobility mass spectrometry of peptide ions: effects of drift gas and calibration strategies. Anal Chem 2012;84:7124–30.
4. Chang CH, Yeung D, Spicer V. et al. Sequence-specific model for predicting peptide collision cross section values in proteomic ion mobility spectrometry. J Proteome Res 2021;20:3600–10.
5. Chang Y-W, Lin C-J. Feature ranking using linear SVM. In: Guyon I, Aliferis C, Cooper G, Elisseeff A, Pellet J-P, Spirtes P, and Statnikov A (eds.), Proceedings of the Workshop on the Causation and Prediction Challenge at WCCI 2008, volume 3 of Proceedings of Machine Learning Research, 53–64. Hong Kong: PMLR, 2008.
