Review
Proteomics. 2025 May;25(9-10):e202400398.
doi: 10.1002/pmic.202400398. Epub 2025 Apr 10.

Peptide Property Prediction for Mass Spectrometry Using AI: An Introduction to State of the Art Models

Jesse Angelis et al. Proteomics. 2025 May.

Abstract

This review explores state-of-the-art machine learning and deep learning models for peptide property prediction in mass spectrometry-based proteomics, including, but not limited to, models for predicting digestibility, retention time, charge state distribution, collisional cross section, fragmentation ion intensities, and detectability. The combination of these models not only enables the in silico generation of spectral libraries but also finds many additional use cases in the design of targeted assays and in data-driven rescoring. This review serves as both an introduction for newcomers and an update for experienced researchers aiming to develop accessible and reproducible models for peptide property prediction. Key limitations of the current models, including difficulties in handling diverse post-translational modifications and instrument variability, highlight the need for large-scale, harmonized datasets and standardized evaluation metrics for benchmarking.

Keywords: deep learning; machine learning; mass spectrometry; peptide property prediction; proteomics.

Conflict of interest statement

M.W. is a founder and shareholder of OmicScouts GmbH and MSAID GmbH with no operational role in either company. The other authors do not have any conflicts of interest.

Figures

FIGURE 1
Citations over time for different digestibility models. As of February 18, 2025. The cutoff years are 2010 and 2024. References to all models can be found in Table S1. Models using deep learning (DL) are blue. Models using only classical machine learning (ML) are gray. The exact values of the citation counts per tool and year can be found in Table S2. Created with https://github.com/jesseangelis/Citation_vis/ and OpenAlex [32].
FIGURE 2
Schematic architecture of DeepDigest [25].
FIGURE 3
Citations over time for different retention time models. As of February 18, 2025. The cutoff years are 2010 and 2024. Chronologer is a preprint. Prosit includes publications from 2019, 2022, and 2024 (preprint). Elude includes publications from 2010 and 2012. References to all models can be found in Table S1. Models using deep learning (DL) are blue. Models using only classical machine learning (ML) are gray. The exact values of the citation counts per tool and year can be found in Table S2. Created with https://github.com/jesseangelis/Citation_vis/ and OpenAlex [32].
FIGURE 4
Schematic architecture of DeepLC [48].
FIGURE 5
Citations over time for different charge state and charge state distribution models. As of February 18, 2025. The cutoff years are 2010 and 2024. AlphaPeptDeep is not included as its predictor has not been part of a publication yet. CPM is an abbreviation for Charge Prediction Machine. References to all models can be found in Table S1. Models using deep learning (DL) are blue. Models using only classical machine learning (ML) are gray. The exact values of the citation counts per tool and year can be found in Table S2. Created with: https://github.com/jesseangelis/Citation_vis/ and OpenAlex [32].
FIGURE 6
Schematic architecture of CPred [63].
FIGURE 7
Citations over time for different ion mobility models. As of February 18, 2025. The cutoff years are 2010 and 2024. McKetney et al. is a preprint. DeepCCS is an abbreviation for DeepCollisionalCrossSection. References to all models can be found in Table S1. Models using deep learning (DL) are blue. Models using only classical machine learning (ML) are gray. The exact values of the citation counts per tool and year can be found in Table S2. Created with: https://github.com/jesseangelis/Citation_vis/ and OpenAlex [32].
FIGURE 8
Schematic architecture of ionmob [86].
FIGURE 9
Citations over time for different fragmentation ion intensity models. As of February 18, 2025. The cutoff years are 2010 and 2024. MS2PIP Original includes publications from 2013 to 2015. Prosit includes publications from 2019, 2021, 2022, 2024, and 2024 (preprint). pDeep includes publications from 2017 and 2019. References to all models can be found in Table S1. Models using deep learning (DL) are blue. Models using only classical machine learning (ML) are gray. The exact values of the citation counts per tool and year can be found in Table S2. Created with: https://github.com/jesseangelis/Citation_vis/ and OpenAlex [32].
FIGURE 10
Schematic architecture of Prosit [51].
FIGURE 11
Citations over time for different detectability (flyability) models. As of February 18, 2025. The cutoff years are 2010 and 2024. Pepper is the only flyability model. References to all models can be found in Table S1. Models using deep learning (DL) are blue. Models using only classical machine learning (ML) are gray. The exact values of the citation counts per tool and year can be found in Table S2. Created with: https://github.com/jesseangelis/Citation_vis/ and OpenAlex [32].
FIGURE 12
Schematic architecture of DeepDetect [102]. *Not clearly defined in the publication but assumed to be word embedding.
FIGURE 13
Schematic architecture of PepFormer [101].
FIGURE 14
Schematic architecture of Pepper [103].
FIGURE 15
Citations over time for different hydrophobicity models. As of February 18, 2025. The cutoff years are 2010 and 2024. ALOGPS includes publications from 2001, 2001, 2002, 2004, and 2004. References to all models can be found in Table S1. Models using deep learning (DL) are blue. Models using only classical machine learning (ML) are gray. The exact values of the citation counts per tool and year can be found in Table S2. Created with: https://github.com/jesseangelis/Citation_vis/ and OpenAlex [32].
FIGURE 16
Citations over time for different 3D structure models. As of February 18, 2025. The cutoff years are 2010 and 2024. tFold includes publications from 2021, 2022 (preprint), and 2024 (preprint). Chai‐1 is a preprint. RoseTTAFold includes two publications from 2021. Omega‐Fold is a preprint. References to all models can be found in Table S1. Models using deep learning (DL) are blue. Models using only classical machine learning (ML) are gray. The exact values of the citation counts per tool and year can be found in Table S2. Created with: https://github.com/jesseangelis/Citation_vis/ and OpenAlex [32].
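
Several of the citation figures above pool more than one publication per tool (for example, Prosit or Elude) and restrict the plot to the cutoff years 2010 and 2024. The following is a minimal sketch of that kind of aggregation, not the code from the Citation_vis repository. It assumes each publication is a dictionary shaped like an OpenAlex work record with a `counts_by_year` list, and that both cutoff years are inclusive. The function name `citations_per_year` and the example records are hypothetical and carry no real citation data.

```python
from collections import defaultdict


def citations_per_year(works, first_year=2010, last_year=2024):
    """Sum yearly citation counts over all publications of one tool.

    `works` is a list of dicts, each with a `counts_by_year` list of
    {"year": int, "cited_by_count": int} entries (assumed layout,
    modelled on OpenAlex work records). Years outside the inclusive
    range [first_year, last_year] are ignored.
    """
    totals = defaultdict(int)
    for work in works:
        for entry in work.get("counts_by_year", []):
            year = entry["year"]
            if first_year <= year <= last_year:
                totals[year] += entry["cited_by_count"]
    # Return a plain dict ordered by year for plotting or tabulation.
    return dict(sorted(totals.items()))


# Hypothetical records for one tool with two publications (made-up numbers).
example_tool = [
    {"counts_by_year": [
        {"year": 2023, "cited_by_count": 5},
        {"year": 2024, "cited_by_count": 7},
        {"year": 2025, "cited_by_count": 2},   # after the cutoff, dropped
    ]},
    {"counts_by_year": [
        {"year": 2024, "cited_by_count": 3},
        {"year": 2009, "cited_by_count": 1},   # before the cutoff, dropped
    ]},
]

print(citations_per_year(example_tool))  # {2023: 5, 2024: 10}
```

Summing per tool rather than per publication matches how the captions describe entries such as "Prosit includes publications from 2019, 2022, and 2024 (preprint)". Whether preprints and later versions should be pooled this way is a design choice that the captions state but do not justify.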

References

    1. OpenAI, “ChatGPT,” (2024), https://chatgpt.com/.
    2. stability.ai, “Image-Models,” (2024), https://stability.ai/stable-image.
    3. Google, “Gemini,” (2024), https://gemini.google.com/.
    4. Marr B., “Why Hybrid AI Is The Next Big Thing In Tech,” Forbes (2024), https://www.forbes.com/sites/bernardmarr/2024/10/02/why-hybrid-ai-is-the....
    5. Kleinman Z., “Microsoft: 'ever present' AI Assistants Are Coming,” BBC (2024), https://www.bbc.com/news/articles/czj9vmnlv9zo.