J Proteome Res. 2009 May;8(5):2241-52.
doi: 10.1021/pr800678b.

A ranking-based scoring function for peptide-spectrum matches

Ari M Frank. J Proteome Res. 2009 May.

Abstract

The analysis of the large volume of tandem mass spectrometry (MS/MS) proteomics data now being generated relies on automated algorithms that identify peptides from their mass spectra. An essential component of these algorithms is the scoring function used to evaluate the quality of peptide-spectrum matches (PSMs). In this paper, we present a new approach to scoring PSMs. We argue that since this problem is at its core a ranking task (especially in the case of de novo sequencing), it can be solved effectively using machine learning ranking algorithms. We developed a new discriminative boosting-based approach to scoring. Our scoring models draw upon a large set of diverse feature functions that measure different qualities of PSMs. Our method improves the performance of our de novo sequencing algorithm beyond the current state of the art, and also greatly enhances the performance of database search programs. Furthermore, by increasing the efficiency of tag filtration and improving the sensitivity of PSM scoring, we make it practical to perform large-scale MS/MS analyses, such as a proteogenomic search of a six-frame translation of the human genome (in which we achieve a 15-fold reduction in running time and a 60% increase in the number of identified peptides, compared to the InsPecT database search tool). Our scoring function is incorporated into PepNovo+, which is available for download or can be run online at http://bix.ucsd.edu.
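To illustrate what treating PSM scoring as a ranking task can mean in practice, here is a minimal Python sketch of a pairwise ranking error: the fraction of (correct, incorrect) candidate pairs for one spectrum in which the incorrect candidate scores at least as high as the correct one. The linear score, the function names, the feature values, and the weights are all assumptions made for illustration; this is not the feature set or the boosting procedure used in PepNovo+.

```python
from itertools import product


def score(features, weights):
    """Hypothetical linear score: weighted sum of a candidate's feature values."""
    return sum(w * f for w, f in zip(weights, features))


def pairwise_ranking_error(correct, incorrect, weights):
    """Fraction of (correct, incorrect) candidate pairs that are misordered,
    i.e. the incorrect candidate scores at least as high as the correct one."""
    pairs = list(product(correct, incorrect))
    if not pairs:
        return 0.0
    misordered = sum(
        1 for good, bad in pairs
        if score(good, weights) <= score(bad, weights)
    )
    return misordered / len(pairs)


# Toy feature vectors for candidate peptides of one spectrum (made-up numbers).
correct_candidates = [[0.9, 0.2], [0.7, 0.4]]
incorrect_candidates = [[0.3, 0.8], [0.8, 0.1], [0.1, 0.1]]

# Two hypothetical weight vectors: the first orders every pair correctly,
# the second relies only on the second feature and misorders some pairs.
error_good = pairwise_ranking_error(correct_candidates, incorrect_candidates, [1.0, 0.5])
error_poor = pairwise_ranking_error(correct_candidates, incorrect_candidates, [0.0, 1.0])
print(error_good, error_poor)
```

A ranking learner would adjust the weights to drive this kind of error down on training pairs; only the relative order of candidates matters, not the absolute score values.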

Figures

Figure 1
The peak rank prediction problem.
Figure 2
Training of a de novo PSM scoring model. The graph on the left displays the training and validation error rates after running the RankBoost algorithm for various numbers of rounds. The graph on the right displays the number of active features in the model (i.e., features that have a nonzero weight). The x-axis displays the number of boosting rounds on a logarithmic scale. The figures were generated for a training set of doubly charged peptides of mass 1100-1300.
Figure 3
Benchmark results for OPD280 and ISB769. The plots show results for PepNovo (with and without reranking), MS-Dictionary (with tryptic only and non-restricted predictions), and Peaks. In each plot the x-axis shows the size of the set of highest scoring predicted sequences (1-2000), and the y-axis shows the proportion of spectra for which the set of de novo predictions contained a correct sequence.
Figure 4
Benchmark results for sets HEK8, HEK10, and HEK12. The plots show results for PepNovo (with and without reranking), MS-Dictionary (with tryptic only and non-restricted predictions), and Peaks. In each plot the x-axis shows the size of the set of highest scoring predicted sequences (1-2000), and the y-axis shows the proportion of spectra for which the set of de novo predictions contained a correct sequence.
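The y-axis quantity in the benchmark plots (the proportion of spectra whose k highest scoring predictions contain a correct sequence) can be sketched as follows. The function name, the data layout, and the toy peptides are assumptions for illustration, and exact string equality stands in for whatever criterion the paper uses to decide that a prediction is correct.

```python
def top_k_recall(predictions, truths, k):
    """Proportion of spectra whose k highest scoring predictions
    contain the correct sequence.

    predictions: dict mapping spectrum id -> list of predicted sequences,
                 ordered from highest to lowest score.
    truths:      dict mapping spectrum id -> correct sequence.
    """
    if not predictions:
        return 0.0
    hits = sum(
        1 for spectrum_id, ranked in predictions.items()
        if truths[spectrum_id] in ranked[:k]
    )
    return hits / len(predictions)


# Toy data (made-up spectra and sequences).
predictions = {
    "s1": ["PEPTIDE", "PEPTLDE"],
    "s2": ["AAAK", "GGGK", "SAMPLER"],
    "s3": ["ELVISK"],
}
truths = {"s1": "PEPTIDE", "s2": "SAMPLER", "s3": "LIVESK"}

recall_at_1 = top_k_recall(predictions, truths, 1)
recall_at_3 = top_k_recall(predictions, truths, 3)
print(recall_at_1, recall_at_3)
```

Evaluating this function for k from 1 to 2000 would trace a curve of the kind the captions describe, one curve per tool.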
