Review
J Proteome Res. 2023 Mar 3;22(3):681-696.
doi: 10.1021/acs.jproteome.2c00711. Epub 2023 Feb 6.

Toward an Integrated Machine Learning Model of a Proteomics Experiment

Benjamin A Neely et al. J Proteome Res. 2023.

Abstract

In recent years, machine learning has made extensive progress in modeling many aspects of mass spectrometry data. We brought together proteomics data generators, repository managers, and machine learning experts in a workshop with the goal of evaluating and exploring machine learning applications for realistic modeling of data from multidimensional mass spectrometry-based proteomics analysis of any sample or organism. Following this sample-to-data roadmap helped identify knowledge gaps and define needs. Being able to generate bespoke and realistic synthetic data has legitimate and important uses in system suitability, method development, and algorithm benchmarking, while also posing critical ethical questions. The interdisciplinary nature of the workshop informed discussions of what is currently possible and of future opportunities and challenges. In this perspective, we summarize these discussions in the hope of conveying our excitement about the potential of machine learning in proteomics and inspiring future research.

Keywords: artificial intelligence; deep learning; enzymatic digestion; ion mobility; liquid chromatography; machine learning; research integrity; synthetic data; tandem mass spectrometry.

Conflict of interest statement

The authors declare the following competing financial interest(s): S.G. and T.S. are co-founders, shareholders, and employees of MSAID GmbH, a company that develops software for proteomics. M.W. is founder and shareholder of MSAID GmbH and OmicScouts GmbH, with no operational role in either company.

Figures

Figure 1
Some common steps in proteomics workflows corresponding to the workshop discussion topics and sections herein. Some icons made using BioRender.com.
