J Proteome Res. 2023 Jul 7;22(7):2246-2255.

doi: 10.1021/acs.jproteome.2c00807. Epub 2023 May 26.

AIomics: Exploring More of the Proteome Using Mass Spectral Libraries Extended by Artificial Intelligence

Lewis Y Geer¹, Joel Lapin¹,², Douglas J Slotta¹, Tytus D Mak¹, Stephen E Stein¹

Affiliations

¹ Mass Spectrometry Data Center, National Institute of Standards and Technology, Biomolecular Measurement Division, 100 Bureau Dr., Gaithersburg, Maryland 20899, United States.
² Department of Physics, Georgetown University, Washington, D.C. 20057, United States.

PMID: 37232537
PMCID: PMC10542943
DOI: 10.1021/acs.jproteome.2c00807

Lewis Y Geer et al. J Proteome Res. 2023.

Abstract

The unbounded permutations of biological molecules, including proteins and their constituent peptides, present a dilemma in identifying the components of complex biosamples. Sequence search algorithms used to identify peptide spectra can be expanded to cover larger classes of molecules, including more modifications, isoforms, and atypical cleavage, but at the cost of false positives or false negatives due to the simplified spectra they compute from sequence records. Spectral library searching can help solve this issue by precisely matching experimental spectra to library spectra with excellent sensitivity and specificity. However, compiling spectral libraries that span entire proteomes is pragmatically difficult. Neural networks that predict complete spectra containing a full range of annotated and unannotated ions can be used to replace these simplified spectra with libraries of fully predicted spectra, including modified peptides. Using such a network, we created predicted spectral libraries that were used to rescore matches from a sequence search done over a large search space, including a large number of modifications. Rescoring improved the separation of true and false hits by 82%, yielding an 8% increase in peptide identifications, including a 21% increase in nonspecifically cleaved peptides and a 17% increase in phosphopeptides.
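
The rescoring idea described above can be illustrated with a minimal sketch, assuming each spectrum is a list of (m/z, intensity) peaks. It bins peaks by m/z, applies a square-root intensity weighting, computes a normalized dot product (cosine similarity) between an experimental spectrum and each predicted spectrum, and reranks the candidate matches by that similarity. The function names, the bin width, and the weighting are illustrative assumptions; this is not the corrected similarity score S used in the paper, and no neural network prediction is performed here.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.02):
    """Sum square-root-weighted intensities into fixed-width m/z bins.

    The bin width and sqrt weighting are assumptions made for illustration.
    """
    binned = defaultdict(float)
    for mz, intensity in peaks:
        if intensity > 0:
            binned[int(mz / bin_width)] += math.sqrt(intensity)
    return binned


def cosine_similarity(experimental, predicted, bin_width=0.02):
    """Normalized dot product between two binned spectra, in [0, 1]."""
    a = bin_spectrum(experimental, bin_width)
    b = bin_spectrum(predicted, bin_width)
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty spectrum matches nothing
    return dot / (norm_a * norm_b)


def rescore(experimental, candidates, bin_width=0.02):
    """Rank candidate matches by similarity to the experimental spectrum.

    candidates: list of (label, predicted_peaks) pairs, e.g. the matches
    returned by a sequence search, each with a predicted spectrum.
    Returns (label, score) pairs sorted from best to worst.
    """
    scored = [
        (label, cosine_similarity(experimental, peaks, bin_width))
        for label, peaks in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

In this sketch, a candidate whose predicted spectrum shares no bins with the experimental spectrum scores 0, while an identical spectrum scores 1, so well-predicted true matches move to the top of the ranking.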

Keywords: algorithms; machine learning; peptides; proteome analysis; search engine methods; tandem mass spectrometry.

Conflict of interest statement

The authors declare no competing financial interests. All commercial instruments, software, and materials used in the study are for experimental purposes only. Such identification does not intend recommendation or endorsement by the National Institute of Standards and Technology, nor does it intend that the materials, software, or instruments used are necessarily the best available for the purpose.

Figures

**Figure 1.**
Mirror plots of (A) a non-tryptic spectrum with score near the median prediction score and (B) a phosphopeptide spectrum with score near the bottom 10th percentile score. The blue spectrum at the top of each mirror plot is the experimental spectrum from the test set, annotated by ion series, including immonium ions (IQA, etc.), parent ions (p), and ions containing carbon-13 (+i). The red spectrum below is the matching predicted spectrum. The predicted phosphopeptide spectrum contains neutral loss ions that are useful for identifying and localizing phosphosites, as well as internal ions (Int/), immonium ions, and unannotated ions.

**Figure 2.**
(A) Histogram of the similarity score S calculated between the experimental spectra and predicted spectra in the test set. (B) Histogram of S for the subset of the test set that contains the spectra of phosphopeptides. (C) Histogram of S for TMT derivatized peptides. (D) Histogram of S for non-tryptic peptides.

**Figure 3.**
(A) Histogram of the Mascot ions score for both true and false matches to the test set spectra, searching against the human, mouse, and Chinese hamster proteome. (B) Histogram of the corrected S score as applied to predicted spectra for the same search results as (A). The separation between true and false matches is improved by 82%.

**Figure 4.**
FDR analysis done before and after rescoring using test spectra as the queries. True matches are from the human, mouse, and Chinese hamster proteome and defined as those that match the peptide sequence, charge, and modifications of the query. Spectra were predicted for each Mascot sequence search match and a corrected Stein-Scott dot product S calculated to rescore the matches.
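
As a rough illustration of an FDR comparison of this kind, the sketch below assumes each match carries a score and a known true/false label, as is possible when the queries are labeled test spectra. It walks down the score-sorted list and reports how many true matches are accepted at the deepest cutoff where the false discovery rate stays within a chosen limit. The function name, the input layout, and the handling of tied scores (none) are assumptions for illustration, not the procedure used in the paper.

```python
def true_matches_at_fdr(matches, max_fdr=0.01):
    """Count true matches accepted at the deepest cutoff with FDR <= max_fdr.

    matches: iterable of (score, is_true) pairs, higher score = better.
    FDR at a cutoff is false / (true + false) among accepted matches.
    """
    ordered = sorted(matches, key=lambda m: m[0], reverse=True)
    true_count = 0
    false_count = 0
    best = 0
    for _score, is_true in ordered:
        if is_true:
            true_count += 1
        else:
            false_count += 1
        fdr = false_count / (true_count + false_count)
        if fdr <= max_fdr:
            best = true_count  # this cutoff still satisfies the FDR limit
    return best


# Toy example with made-up scores: the same four matches before and
# after a hypothetical rescoring that pushes the false match down.
before = [(0.9, True), (0.8, False), (0.7, True), (0.6, True)]
after = [(0.9, True), (0.8, True), (0.7, True), (0.2, False)]
print(true_matches_at_fdr(before, max_fdr=0.0))
print(true_matches_at_fdr(after, max_fdr=0.0))
```

Running the function on the same matches before and after rescoring shows whether the new score admits more true matches at a fixed FDR limit.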

Grants and funding

9999-NIST/ImNIST/Intramural NIST DOC/United States
