J Proteome Res. 2008 Oct;7(10):4422-34.

doi: 10.1021/pr800400q. Epub 2008 Sep 12.

Improved sequence tag generation method for peptide identification in tandem mass spectrometry

Xia Cao¹, Alexey I Nesvizhskii

PMID: 18785767
PMCID: PMC3744226
DOI: 10.1021/pr800400q

Xia Cao et al. J Proteome Res. 2008 Oct.

Affiliation

¹ Department of Pathology, University of Michigan, Ann Arbor, Michigan 48109, USA.

Abstract

The sequence tag-based peptide identification methods are a promising alternative to the traditional database search approach. However, a more comprehensive analysis, optimization, and comparison with established methods are necessary before these methods can gain widespread use in the proteomics community. Using the InsPecT open source code base (Tanner et al., Anal. Chem. 2005, 77, 4626-39), we present an improved sequence tag generation method that directly incorporates multicharged fragment ion peaks present in many tandem mass spectra of higher charge states. We also investigate the performance of sequence tagging under different settings using control data sets generated on five different types of mass spectrometers, as well as using a complex phosphopeptide-enriched sample. We also demonstrate that additional modeling of InsPecT search scores using a semiparametric approach incorporating the accuracy of the precursor ion mass measurement provides additional improvement in the ability to discriminate between correct and incorrect peptide identifications. The overall superior performance of the sequence tag-based peptide identification method is demonstrated by comparison with a commonly used SEQUEST/PeptideProphet approach.
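As a rough illustration of what incorporating multicharged fragment ion peaks can involve, the sketch below converts an observed fragment m/z at an assumed charge into its singly charged equivalent, so that peaks from different charge states can be placed on a common mass scale before tags are extracted. This is not the InsPecT implementation or the authors' algorithm; the function names `to_singly_charged` and `expand_peaks` and the peak-list layout are hypothetical, and only the standard m/z arithmetic is relied on.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def to_singly_charged(mz: float, charge: int) -> float:
    """Convert an observed m/z at the given charge to the equivalent 1+ m/z.

    For a fragment of neutral mass M, m/z at charge z is (M + z*p) / z,
    so the 1+ value M + p equals z * mz - (z - 1) * p.
    """
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return mz * charge - (charge - 1) * PROTON_MASS


def expand_peaks(peaks, max_fragment_charge):
    """List candidate 1+ equivalents for every peak and every assumed charge.

    `peaks` is a list of (mz, intensity) pairs (a hypothetical layout).
    Returns (singly_charged_mz, assumed_charge, intensity) tuples sorted by
    m/z. Deciding which charge assumption is right is left to later scoring.
    """
    candidates = []
    for mz, intensity in peaks:
        for z in range(1, max_fragment_charge + 1):
            candidates.append((to_singly_charged(mz, z), z, intensity))
    candidates.sort()
    return candidates
```

For a 3+ precursor one might call `expand_peaks(peaks, 2)`, since fragment ions are typically assumed to carry less charge than the precursor. Each observed peak then contributes one candidate per assumed charge state.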

Figures

**Figure 1**
Overview of the method and different tag generation settings.

**Figure 2**
The frequency of observing one of the two highest scoring correct tags or one of the 100 randomly selected incorrect tags extracted from 3+ spectra in the phosphopeptide-enriched data set having 0, 1, or 2 mismatches of node type.

**Figure 3**
Optimal sensitivity as a trade-off between the total number of sequence tags that can be extracted from the spectrum graph and the coverage of the peptide sequence by generated correct tags. **(a)** Average number of sequence tags (correct or incorrect) that can be extracted from 2+ (dashes) or 3+ (solid curve) spectra in the phosphopeptide data set using different tag generation options. **(b)** Coverage of the peptide sequence by generated correct tags. **(c)** Sensitivity, i.e. the percentage of spectra for which one of the top 100 scoring tags is correct.

**Figure 4**
Sensitivity of different tag extraction methods as a function of the number of top scoring tags considered for each spectrum (up to 100, the default value) in the protein mix LTQ-FT 4+ charge state spectra data set.

**Figure 5**
The number of correct identifications as a function of false discovery rate for spectra from the protein mix LTQ-FT data set. **(a)** 4+ spectra. Shown are the results of SEQUEST/PeptideProphet analysis (dashed green curve); sequence tagging using singly charged (SC) fragment ions only, [01] method, followed by p-value filtering (dotted purple curve); sequence tagging using multi-charged (MC) fragment ions, (03)(12)1.0 method, followed by p-value filtering (dash dot magenta curve); sequence tagging, (03)(12)1.0 method, followed by semi-parametric modeling with inclusion of the mass accuracy parameter dM (solid blue curve). **(b)** 3+ spectra, same as above. **(c)** 2+ spectra. The tags were generated using [01] method only.

**Figure 6**
Histogram of InsPecT **(a)** FScore and **(b)** mass accuracy score dM plotted separately for correct (solid blue) and incorrect (solid green) peptide identifications for 3+ spectra in the protein mix LTQ-FT data set. Also shown are distributions fitted by the semi-parametric model (dashed curves).

**Figure 7**
The estimated number of correct identifications as a function of false discovery rate in the phosphopeptide data set. **(a)** 3+ spectra: SEQUEST/PeptideProphet analysis (green dashed curve); sequence tagging using singly charged (SC) fragment ions only, [01] method, followed by p-value filtering (dotted purple curve); sequence tagging using multi-charged (MC) fragment ions, (03)(12)1.0 method, followed by p-value filtering (dash dot magenta curve); sequence tagging, (03)(12)1.0 method, followed by semi-parametric modeling with inclusion of the mass accuracy parameter dM (solid blue curve). **(b)** 2+ spectra. The tags were generated using [01] method only.

Cited by

Speeding up tandem mass spectral identification using indexes.
Liu X, Mammana A, Bafna V. Bioinformatics. 2012 Jul 1;28(13):1692-7. doi: 10.1093/bioinformatics/bts244. Epub 2012 Apr 27. PMID: 22543365. Free PMC article.
A survey of computational methods and error rate estimation procedures for peptide and protein identification in shotgun proteomics.
Nesvizhskii AI. J Proteomics. 2010 Oct 10;73(11):2092-123. doi: 10.1016/j.jprot.2010.08.009. Epub 2010 Sep 8. PMID: 20816881. Free PMC article. Review.
Validation of De Novo Peptide Sequences with Bottom-Up Tag Convolution.
Vyatkina K. Proteomes. 2021 Dec 29;10(1):1. doi: 10.3390/proteomes10010001. PMID: 35076636. Free PMC article.
Towards an understanding of wheat chloroplasts: a methodical investigation of thylakoid proteome.
Kamal AH, Cho K, Komatsu S, Uozumi N, Choi JS, Woo SH. Mol Biol Rep. 2012 May;39(5):5069-83. doi: 10.1007/s11033-011-1302-4. Epub 2011 Dec 11. PMID: 22160430.
Systematic Evaluation of Protein Sequence Filtering Algorithms for Proteoform Identification Using Top-Down Mass Spectrometry.
Kou Q, Wu S, Liu X. Proteomics. 2018 Feb;18(3-4):10.1002/pmic.201700306. doi: 10.1002/pmic.201700306. Epub 2018 Feb 6. PMID: 29327814. Free PMC article.
