Rapid and accurate peptide identification from tandem mass spectra

Christopher Y Park¹, Aaron A Klammer, Lukas Käll, Michael J MacCoss, William S Noble

PMID: 18505281
PMCID: PMC2667385
DOI: 10.1021/pr800127y

Christopher Y Park et al. J Proteome Res. 2008 Jul.

J Proteome Res. 2008 Jul;7(7):3022-7.

doi: 10.1021/pr800127y. Epub 2008 May 28.

Affiliation

¹ Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.

Abstract

Mass spectrometry, the core technology in the field of proteomics, promises to enable scientists to identify and quantify the entire complement of proteins in a complex biological sample. Currently, the primary bottleneck in this type of experiment is computational. Existing algorithms for interpreting mass spectra are slow and fail to identify a large proportion of the given spectra. We describe a database search program called Crux that reimplements and extends the widely used database search program Sequest. For speed, Crux uses a peptide indexing scheme to rapidly retrieve candidate peptides for a given spectrum. For each peptide in the target database, Crux generates shuffled decoy peptides on the fly, providing a good null model and, hence, accurate false discovery rate estimates. Crux also implements two recently described postprocessing methods: a p value calculation based upon fitting a Weibull distribution to the observed scores, and a semisupervised method that learns to discriminate between target and decoy matches. Both methods significantly improve the overall rate of peptide identification. Crux is implemented in C and is distributed with source code freely to noncommercial users.
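The on-the-fly decoy generation mentioned in the abstract can be illustrated with a minimal sketch. The function name `shuffled_decoy` is hypothetical, and keeping the first and last residues fixed is an assumption made here for illustration; the abstract does not say which residues Crux holds in place.

```python
import random


def shuffled_decoy(peptide: str, rng: random.Random) -> str:
    """Return a decoy with the same residues as `peptide` in shuffled order.

    Assumption (not stated in the abstract): the first and last residues
    stay in place and only the interior is permuted. The decoy keeps the
    same amino acid composition, and therefore the same mass, as the target.
    """
    if len(peptide) <= 3:
        # Zero or one interior residue: nothing to permute.
        return peptide
    interior = list(peptide[1:-1])
    rng.shuffle(interior)
    return peptide[0] + "".join(interior) + peptide[-1]


rng = random.Random(0)           # seeded only to make the sketch repeatable
target = "LVNELTEFAK"            # illustrative peptide sequence
decoy = shuffled_decoy(target, rng)
```

Because the decoy has the same composition as its target, it falls into the same precursor mass window, so it competes against the same spectra as the target peptide does.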

Figures

**Figure 1. The Crux algorithm**
Crux takes as input a collection of fragmentation spectra and a target protein sequence database, and produces a list of peptide-spectrum matches, each with an associated q value, a measure of false discovery rate.
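As a rough illustration of how a q value can be attached to each peptide-spectrum match, the sketch below estimates the false discovery rate at each score threshold as the number of decoy matches divided by the number of target matches, then takes the minimum over all more permissive thresholds. This is a generic target-decoy estimate under stated assumptions, not necessarily the exact procedure used by Crux; the function name `q_values` and the example scores are hypothetical.

```python
def q_values(target_scores, decoy_scores):
    """Return (score, q value) pairs for target matches, best score first.

    Assumes higher scores are better and that targets and decoys were
    scored against databases of equal size.
    """
    labeled = [(s, True) for s in target_scores]
    labeled += [(s, False) for s in decoy_scores]
    # Sort by descending score; on ties decoys come first (conservative).
    labeled.sort(key=lambda pair: (-pair[0], pair[1]))

    fdrs = []
    targets = decoys = 0
    for score, is_target in labeled:
        if is_target:
            targets += 1
            fdrs.append((score, min(1.0, decoys / targets)))
        else:
            decoys += 1

    # q value: the smallest FDR at this threshold or any lower threshold.
    result = []
    running_min = 1.0
    for score, fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        result.append((score, running_min))
    result.reverse()
    return result


example = q_values([5.0, 4.0, 3.0, 1.0], [3.5, 2.0, 0.5])
```

In this example the two best target matches outscore every decoy and receive a q value of 0, while the target scoring 3.0 has one decoy above it among three targets, giving an estimate of 1/3.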

**Figure 2. Rapid retrieval of candidate peptides**
Panels (A) and (B) plot the average running time required to search 100 tandem mass spectra against the human protein databases on computers running the Linux (A) or Windows (B) operating systems, using Sequest and Crux with and without indices. Running time is plotted as a function of the mass tolerance used to define candidate peptides. The Linux plot includes only three series because we do not have a Linux implementation of TurboSequest. Panel (C) plots the ratio of running times for indexed versus non-indexed searches for Crux on Windows and Linux and for Sequest on Windows.
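The idea of retrieving candidates by mass tolerance can be sketched with a sorted in-memory list and binary search. This is only an illustration of why an index helps: the record does not describe the actual index layout used by Crux, and the names `build_index` and `candidates` are hypothetical. Residue masses are standard monoisotopic values.

```python
import bisect

# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # added once per peptide for the terminal H and OH


def peptide_mass(sequence: str) -> float:
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def build_index(peptides):
    """Sort peptides by mass once, so each later query is a binary search."""
    entries = sorted((peptide_mass(p), p) for p in peptides)
    masses = [m for m, _ in entries]
    sequences = [p for _, p in entries]
    return masses, sequences


def candidates(index, precursor_mass: float, tolerance: float):
    """Return peptides whose mass lies within +/- tolerance of the precursor."""
    masses, sequences = index
    lo = bisect.bisect_left(masses, precursor_mass - tolerance)
    hi = bisect.bisect_right(masses, precursor_mass + tolerance)
    return sequences[lo:hi]


index = build_index(["GG", "AA", "GA", "WW"])  # toy peptide list
narrow = candidates(index, 146.07, 0.01)
wide = candidates(index, 150.0, 15.0)
```

With a sorted index, locating the window costs time logarithmic in the number of peptides, and the work that remains is proportional to the number of candidates inside the window, which grows with the mass tolerance.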

**Figure 3. Re-implementation of Sp and *XCorr* scoring functions**
The figure plots, for a collection of peptide-spectrum matches, the Sp (A) and *XCorr* (B) scores as computed by Crux as a function of the same scores as computed by Sequest.

**Figure 4. Improved peptide identification**
The figure plots, for a variety of database search algorithms, the number of PSMs as a function of the estimated q value. The five series correspond to Sequest, Crux’s implementation of Sequest using a static shuffled decoy database, Crux using decoys generated on-the-fly, Crux with the p value calculation enabled, and Crux with the Percolator post-processor enabled.
