Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2008 Jul;7(7):3022-7.
doi: 10.1021/pr800127y. Epub 2008 May 28.

Rapid and accurate peptide identification from tandem mass spectra

Affiliations

Rapid and accurate peptide identification from tandem mass spectra

Christopher Y Park et al. J Proteome Res. 2008 Jul.

Abstract

Mass spectrometry, the core technology in the field of proteomics, promises to enable scientists to identify and quantify the entire complement of proteins in a complex biological sample. Currently, the primary bottleneck in this type of experiment is computational. Existing algorithms for interpreting mass spectra are slow and fail to identify a large proportion of the given spectra. We describe a database search program called Crux that reimplements and extends the widely used database search program Sequest. For speed, Crux uses a peptide indexing scheme to rapidly retrieve candidate peptides for a given spectrum. For each peptide in the target database, Crux generates shuffled decoy peptides on the fly, providing a good null model and, hence, accurate false discovery rate estimates. Crux also implements two recently described postprocessing methods: a p value calculation based upon fitting a Weibull distribution to the observed scores, and a semisupervised method that learns to discriminate between target and decoy matches. Both methods significantly improve the overall rate of peptide identification. Crux is implemented in C and is distributed with source code freely to noncommercial users.

PubMed Disclaimer

Figures

Figure 1
Figure 1. The Crux algorithm
Crux takes as input a collection of fragmentation spectra and a target protein sequence database, and produces a list of peptide-spectrum matches, each with an associated q value, a measure of false discovery rate.
Figure 2
Figure 2. Rapid retrieval of candidate peptides
Panels (A) and (B) plot the average running time required to search 100 tandem mass spectra against the human protein databases on computers running the Linux (A) or Windows (B) operating systems, using Sequest and Crux with and without indices. Running time is plotted as a function of the mass tolerance used to define candidate peptides. The Linux plot includes only three series because we do not have a Linux implementation of TurboSequest. (C) The figure plots ratio of running times for indexed versus non-indexed searches for Crux on Windows and Linux and for Sequest on Windows.
Figure 3
Figure 3. Re-implementation ofSp and Xcorr scoring functions
The figure plots, for a collection of peptide-spectrum matches, the Sp (A) and XCorr (B) scores as computed by Crux as a function of the same scores as computed by Sequest.
Figure 4
Figure 4. Improved peptide identification
The figure plots, for a variety of database search algorithms, the number of PSMs as a function of the estimated q value. The five series correspond to Sequest, Crux’s implementation of Sequest using a static shuffled decoy database, Crux using decoys generated on-the-fly, Crux with the p value calculation enabled, and Crux with the Percolator post-processor enabled.

References

    1. Eng JK, McCormack AL, Yates III., JR An approach to correlate tandem mass spectral data of peptides with amino acid sequences in a protein database. Journal of the American Society for Mass Spectrometry. 1994;5:976–989. - PubMed
    1. Perkins DN, Pappin DJC, Creasy DM, Cottrell JS. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis. 1999;20:3551–3567. - PubMed
    1. Craig R, Beavis RC. Tandem: matching proteins with tandem mass spectra. Bioinformatics. 2004;20:1466–1467. - PubMed
    1. Tanner S, Shu H, Frank A, Wang LC, Zandi E, et al. InsPecT: Identification of posttranslationally modified peptides from tandem mass spectra. Analytical Chemistry. 2005;77:4626–4639. - PubMed
    1. Bern M, Goldberg D, Cai Y. Lookup peaks: A hybrid de novo sequencing and database search for protein identification by tandem mass spectrometry. Analytical Chemistry. 2007;79:1393–400. - PubMed

Publication types