Review
Mol Cell Proteomics. 2024 Jul;23(7):100798.
doi: 10.1016/j.mcpro.2024.100798. Epub 2024 Jun 11.

Rescoring Peptide Spectrum Matches: Boosting Proteomics Performance by Integrating Peptide Property Predictors Into Peptide Identification

Mostafa Kalhor et al. Mol Cell Proteomics. 2024 Jul.

Abstract

Rescoring of peptide spectrum matches originating from database search engines enabled by peptide property predictors is exceeding the performance of peptide identification from traditional database search engines. In contrast to the peptide spectrum match scores calculated by traditional database search engines, rescoring peptide spectrum matches generates scores based on comparing observed and predicted peptide properties, such as fragment ion intensities and retention times. These newly generated scores enable a more efficient discrimination between correct and incorrect peptide spectrum matches. This approach was shown to lead to substantial improvements in the number of confidently identified peptides, facilitating the analysis of challenging datasets in various fields such as immunopeptidomics, metaproteomics, proteogenomics, and single-cell proteomics. In this review, we summarize the key elements leading up to the recent introduction of multiple data-driven rescoring pipelines. We provide an overview of relevant post-processing rescoring tools, introduce prominent data-driven rescoring pipelines for various applications, and highlight limitations, opportunities, and future perspectives of this approach and its impact on mass spectrometry-based proteomics.

Keywords: artificial intelligence; computational proteomics; data-driven rescoring; machine learning; peptide identification; peptide property prediction; rescoring.

Conflict of interest statement

The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: M. W. is a co-founder and shareholder of MSAID GmbH and OmicScouts GmbH, with no operational role in either company.

Figures

Fig. 1
Post-processor rescoring (PPR). A, tandem mass spectra (MS/MS) from data-dependent acquisition (DDA) data are most commonly processed using database search engines (DBSEs), such as MaxQuant, MSFragger, or PEAKS. The primary result of this process is a list of peptide spectrum matches (PSMs), where each PSM has features attached, such as the precursor charge, the precursor mass, and the DBSE score. Today, this (unfiltered) list is most commonly handed over to post-processing rescoring (PPR) tools, such as PeptideProphet or Percolator, which train machine learning-based pipelines to classify PSMs as correct or incorrect and to estimate the false discovery rate (FDR). B, an example of the DBSE score distribution of all PSMs before PPR. The dashed black vertical line indicates the cutoff required to achieve 1% PSM FDR. Note the poor separation between targets (red) and decoys (blue), indicating poor separation between correct and incorrect PSMs. C, example distribution of PSM scores after PPR using all generated DBSE features. Note that the PPR step increases the separation between correct and incorrect PSMs, leading to better differentiation between targets and decoys.
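The 1% PSM FDR cutoff described in panels B and C can be illustrated with a minimal sketch of target-decoy FDR estimation. This is not the procedure of any specific tool named above: the function name, the (score, is_decoy) input format, and the simple decoys/targets estimator are assumptions made for illustration, and tied scores are not treated specially.

```python
from typing import List, Tuple


def accepted_targets_at_fdr(
    psms: List[Tuple[float, bool]], fdr: float = 0.01
) -> Tuple[float, int]:
    """Find the lowest target score at which the estimated FDR is <= fdr.

    psms: hypothetical input, one (score, is_decoy) pair per PSM,
          where a higher score means a better match.
    Returns (score cutoff, number of target PSMs at or above the cutoff).
    The FDR is estimated as (#decoys / #targets) above the cutoff,
    which is an assumed, simplified estimator.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = float("inf")  # no PSM accepted until a cutoff is found
    accepted = 0
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
            continue
        targets += 1
        # Only place the cutoff on a target PSM that satisfies the FDR limit.
        if decoys / targets <= fdr:
            cutoff = score
            accepted = targets
    return cutoff, accepted


example = [(10.0, False), (9.0, False), (8.0, False),
           (7.0, True), (6.0, False), (5.0, True)]
cutoff, n_targets = accepted_targets_at_fdr(example, fdr=0.01)
```

A better separation between target and decoy score distributions, as in panel C, moves this cutoff so that more target PSMs are accepted at the same FDR.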
Fig. 2
Data-driven rescoring (DDR) approach. A, in data-driven rescoring (DDR) pipelines, such as Oktoberfest, InSPIRE, and MS2Rescore, unfiltered DBSE results are augmented with additional features, such as a similarity metric between predicted (e.g. by Prosit, DeepLC, and MS2PIP) and observed spectra or retention times of PSMs. PPR is applied to the resulting list of features to classify the provided PSMs and estimate their FDR. In contrast to filtering correct matches solely based on the DBSE score (blue bar in the bar chart) and PPR applied to DBSE features (blue + dark green), DDR (blue + dark green + light green) typically achieves the highest number of identified PSMs. B, example distribution of PSM scores after PPR using all generated DBSE features. The dashed black vertical line indicates the cutoff required to achieve 1% PSM FDR, estimated by the target (red)-decoy (blue) approach. C, example distribution of PSM scores after DDR. Note the further increased separation between correct and incorrect PSMs and the increase in confidently identified targets.
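As a rough illustration of the additional features described in panel A, the sketch below computes two per-PSM values: a normalized spectral angle between observed and predicted fragment ion intensities, and the absolute retention time error. The function names and input layout are hypothetical, the intensities are assumed to be already aligned ion by ion, and the pipelines named above may compute their features differently.

```python
import math
from typing import Dict, Sequence


def spectral_angle(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Normalized spectral angle: 1 for identical direction, 0 for orthogonal
    non-negative intensity vectors. Inputs must be aligned ion by ion."""
    if len(observed) != len(predicted):
        raise ValueError("intensity vectors must have the same length")
    dot = sum(o * p for o, p in zip(observed, predicted))
    norm_o = math.sqrt(sum(o * o for o in observed))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    if norm_o == 0.0 or norm_p == 0.0:
        return 0.0  # assumed convention: no signal means no similarity
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    cosine = max(-1.0, min(1.0, dot / (norm_o * norm_p)))
    return 1.0 - 2.0 * math.acos(cosine) / math.pi


def rescoring_features(
    observed: Sequence[float],
    predicted: Sequence[float],
    rt_observed: float,
    rt_predicted: float,
) -> Dict[str, float]:
    """Build a small, hypothetical feature dictionary for one PSM."""
    return {
        "spectral_angle": spectral_angle(observed, predicted),
        "abs_rt_error": abs(rt_observed - rt_predicted),
    }


features = rescoring_features([1.0, 2.0, 3.0], [1.0, 2.0, 3.0], 10.5, 10.0)
```

Features of this kind would be appended to the DBSE features of each PSM before the PPR step; a correct PSM is expected to show a high spectral angle and a small retention time error.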