J Proteome Res. 2008 Oct;7(10):4422-34.
doi: 10.1021/pr800400q. Epub 2008 Sep 12.

Improved sequence tag generation method for peptide identification in tandem mass spectrometry

Xia Cao et al. J Proteome Res. 2008 Oct.

Abstract

Sequence tag-based peptide identification methods are a promising alternative to the traditional database search approach. However, more comprehensive analysis, optimization, and comparison with established methods are necessary before these methods can gain widespread use in the proteomics community. Using the InsPecT open source code base (Tanner et al., Anal. Chem. 2005, 77, 4626-39), we present an improved sequence tag generation method that directly incorporates the multicharged fragment ion peaks present in many tandem mass spectra of higher charge states. We also investigate the performance of sequence tagging under different settings using control data sets generated on five different types of mass spectrometers, as well as a complex phosphopeptide-enriched sample. In addition, we demonstrate that modeling InsPecT search scores with a semiparametric approach that incorporates the accuracy of the precursor ion mass measurement further improves the ability to discriminate between correct and incorrect peptide identifications. The overall superior performance of the sequence tag-based peptide identification method is demonstrated by comparison with a commonly used SEQUEST/PeptideProphet approach.
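
As an illustration of what incorporating multicharged fragment ion peaks can involve, the sketch below converts a peak observed at an assumed fragment charge z to its singly charged equivalent m/z, using the standard relation m/z(1+) = z * m/z(z+) - (z - 1) * proton mass. It is not taken from the InsPecT code base or from the paper; the function names and the idea of enumerating every charge hypothesis per peak are assumptions made only for this example.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def to_singly_charged(mz, z):
    """Return the 1+ equivalent m/z of a peak observed at m/z `mz` with charge `z`."""
    if z < 1:
        raise ValueError("charge must be a positive integer")
    return mz * z - (z - 1) * PROTON_MASS


def expand_peaks(peaks, max_fragment_charge):
    """Hypothetical helper: for each observed m/z, list every charge hypothesis.

    Returns (singly_charged_mz, assumed_charge, observed_mz) tuples sorted by
    the singly charged m/z, so that peaks from different charge states can be
    compared on one mass scale.
    """
    candidates = []
    for mz in peaks:
        for z in range(1, max_fragment_charge + 1):
            candidates.append((to_singly_charged(mz, z), z, mz))
    return sorted(candidates)


if __name__ == "__main__":
    # A 3+ precursor can produce fragments with charge 1 or 2.
    for candidate in expand_peaks([500.0, 650.3], max_fragment_charge=2):
        print(candidate)
```

Enumerating all charge hypotheses increases the number of candidate peaks, which is consistent with the trade-off between the number of extracted tags and sensitivity described in the figure captions.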


Figures

Figure 1
Overview of the method and different tag generation settings.
Figure 2
The frequency of observing one of the two highest scoring correct tags or one of the 100 randomly selected incorrect tags extracted from 3+ spectra in the phosphopeptide-enriched data set having 0, 1, or 2 mismatches of node type.
Figure 3
Optimal sensitivity as a trade-off between the total number of sequence tags that can be extracted from the spectrum graph and the coverage of the peptide sequence by generated correct tags. (a) Average number of sequence tags (correct or incorrect) that can be extracted from 2+ (dashes) or 3+ (solid curve) spectra in the phosphopeptide data set using different tag generation options. (b) Coverage of the peptide sequence by generated correct tags. (c) Sensitivity, i.e., the percentage of spectra for which one of the top 100 scoring tags is correct.
Figure 4
Sensitivity of different tag extraction methods as a function of the number of top scoring tags considered for each spectrum (up to 100, the default value) in the protein mix LTQ-FT 4+ charge state spectra data set.
Figure 5
The number of correct identifications as a function of false discovery rate for spectra from the protein mix LTQ-FT data set. (a) 4+ spectra. Shown are the results of SEQUEST/PeptideProphet analysis (dashed green curve); sequence tagging using singly charged (SC) fragment ions only, [01] method, followed by p-value filtering (dotted purple curve); sequence tagging using multi-charged (MC) fragment ions, (03)(12)1.0 method, followed by p-value filtering (dash dot magenta curve); sequence tagging, (03)(12)1.0 method, followed by semi-parametric modeling with inclusion of the mass accuracy parameter dM (solid blue curve). (b) 3+ spectra, same as above. (c) 2+ spectra. The tags were generated using [01] method only.
Figure 6
Histogram of InsPecT (a) FScore and (b) mass accuracy score dM plotted separately for correct (solid blue) and incorrect (solid green) peptide identifications for 3+ spectra in the protein mix LTQ-FT data set. Also shown are distributions fitted by the semi-parametric model (dashed curves).
Figure 7
The estimated number of correct identifications as a function of false discovery rate in the phosphopeptide data set. (a) 3+ spectra. Shown are the results of SEQUEST/PeptideProphet analysis (green dashed curve); sequence tagging using singly charged (SC) fragment ions only, [01] method, followed by p-value filtering (dotted purple curve); sequence tagging using multi-charged (MC) fragment ions, (03)(12)1.0 method, followed by p-value filtering (dash dot magenta curve); sequence tagging, (03)(12)1.0 method, followed by semi-parametric modeling with inclusion of the mass accuracy parameter dM (solid blue curve). (b) 2+ spectra. The tags were generated using [01] method only.
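
The sensitivity measure used in the captions for Figures 3 and 4 (the percentage of spectra for which at least one of the top N scoring tags is correct) can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the input layout (one list of (score, is_correct) pairs per spectrum) and the assumption that a higher score is better are both choices made only for this example.

```python
def sensitivity_at_top_n(tag_lists, n):
    """Percentage of spectra with at least one correct tag among the top n.

    tag_lists: one list per spectrum of (score, is_correct) tuples.
    Assumes higher scores rank first (an assumption of this sketch).
    """
    if not tag_lists:
        return 0.0
    hits = 0
    for tags in tag_lists:
        # Rank this spectrum's tags by score and keep the n best.
        top = sorted(tags, key=lambda t: t[0], reverse=True)[:n]
        if any(is_correct for _, is_correct in top):
            hits += 1
    return 100.0 * hits / len(tag_lists)


if __name__ == "__main__":
    # Toy data: three spectra with made-up scores and labels.
    spectra = [
        [(0.9, False), (0.8, True)],
        [(0.7, False), (0.1, True)],
        [(0.5, False)],
    ]
    for n in (1, 2):
        print(n, sensitivity_at_top_n(spectra, n))
```

Evaluating this function over a range of n values gives a curve of the kind described for Figure 4, where sensitivity is plotted against the number of top scoring tags considered.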

