BMC Bioinformatics. 2007 Aug 31:8:323.

doi: 10.1186/1471-2105-8-323.

Optimization of filtering criterion for SEQUEST database searching to improve proteome coverage in shotgun proteomics

Xinning Jiang¹, Xiaogang Jiang, Guanghui Han, Mingliang Ye, Hanfa Zou

PMID: 17761002
PMCID: PMC2040164
DOI: 10.1186/1471-2105-8-323

Xinning Jiang et al. BMC Bioinformatics. 2007.

Affiliation

¹ National Chromatographic R&A Center, Dalian Institute of Chemical Physics, The Chinese Academy of Sciences, Dalian 116023, China. vext@dicp.ac.cn

Abstract

Background: In proteomic analysis, MS/MS spectra acquired by a mass spectrometer are assigned to peptides by database searching algorithms such as SEQUEST. SEQUEST characterizes each peptide-spectrum assignment with several scores, including Xcorr, Delta Cn, Sp, Rsp and matched ion count. Filtering criteria built from several of these scores are used to separate correct identifications from random assignments. However, these filtering criteria have not been well optimized to date.

Results: In this study, we implemented a machine learning approach, a predictive genetic algorithm (GA), to optimize filtering criteria so as to maximize the number of identified peptides at a fixed false-discovery rate (FDR) for SEQUEST database searching. Because the FDR was determined directly by a decoy database search scheme, the GA-based optimization approach did not require any prior knowledge of the characteristics of the data set, a significant advantage over statistical approaches such as PeptideProphet. Compared with PeptideProphet, the GA-based approach achieved similar performance in distinguishing true from false assignments with only 1/10 of the processing time. Moreover, the GA-based approach can be easily extended to process other database search results, as it does not rely on any assumptions about the data.

Conclusion: Our results indicate that filtering criteria should be optimized individually for different samples. The newly developed GA-based software provides a convenient and fast way to create tailored optimal criteria for different proteome samples to improve proteome coverage.
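The decoy-based FDR estimate that the filtering relies on can be illustrated with a minimal Python sketch. It is not the authors' software: the `PSM` record, its field names and the helper functions are hypothetical, and the formula FDR = 2 × n_reversed / (n_forward + n_reversed) is assumed here as a commonly used estimate for searches against a composite forward/reversed database, since the abstract does not state the exact equation.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PSM:
    """One peptide-spectrum match; field names are illustrative only."""
    xcorr: float
    delta_cn: float
    is_decoy: bool  # True if the match came from a reversed sequence


def apply_criterion(psms: Iterable[PSM], xcorr_cut: float,
                    delta_cn_cut: float) -> List[PSM]:
    """Keep matches whose Xcorr and Delta Cn both reach the cutoffs."""
    return [p for p in psms
            if p.xcorr >= xcorr_cut and p.delta_cn >= delta_cn_cut]


def estimate_fdr(psms: Iterable[PSM], xcorr_cut: float,
                 delta_cn_cut: float) -> float:
    """Estimate FDR as 2 * decoys / all retained matches (assumed formula)."""
    kept = apply_criterion(psms, xcorr_cut, delta_cn_cut)
    if not kept:
        return 0.0  # nothing retained, so no false discoveries to report
    n_decoy = sum(p.is_decoy for p in kept)
    return 2.0 * n_decoy / len(kept)
```

A criterion is then acceptable when `estimate_fdr` stays at or below the chosen threshold (for example 0.01), and among acceptable criteria the one retaining the most forward matches is preferred.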

Figures

**Figure 1**
**Distribution of peptides identified from human liver tissue lysate by SEQUEST**. A) Singly charged peptides; B) Doubly charged peptides; C) Triply charged peptides. Each data point represents a peptide identification from the composite database: crosses represent identifications from reversed sequences, while squares indicate identifications from forward sequences. The cumulative curves drawn in each graph are 1% false-discovery curves. Each point on a curve corresponds to a filtering criterion that yields peptide identifications with an FDR of 1%, and the peptides identified by that criterion lie in the region where the Xcorr and ΔCn scores exceed the criterion's Xcorr and ΔCn cutoffs. Graphs were drawn using the Speed Mode of Origin 7.5 with a maximum of 5000 points per curve; the three raw graphs with all data points are also provided [see Additional file 1].

**Figure 2**
**Relationship between the number of peptide identifications and the Xcorr cutoffs of the criteria that led to these identifications for human liver tissue lysate at the same FDR (<1%)**. To keep the FDR below 1%, the ΔCn cutoff of each criterion changes with the Xcorr cutoff. Curves for the three charge states were drawn separately: A) for singly charged peptides, B) for triply charged peptides and C) for doubly charged peptides. D) is the zoomed curve for singly charged peptides.

**Figure 3**
**Dependence of fitness on generations for doubly charged peptides**. The fitness of each individual (criterion) is the number of peptide identifications retained by this criterion. The fitness of the fittest individual in each generation is shown as black dots.

**Figure 4**
**Overlap of peptides identified by SFOER and PeptideProphet for human liver tissue lysate**. The numbers of peptide identifications by one or both algorithms are indicated, e.g., 27,272 peptides are identified by both algorithms (intersection).

**Figure 5**
**Evaluation of the classification performances of SFOER and PeptideProphet with standard protein mixture**. A) Number of correct and incorrect peptide identifications by SFOER and PeptideProphet at different FDRs, where an incorrect peptide identification is a peptide assignment from the forward yeast database, while a correct one is from the known standard proteins and trypsin. B) Predicted and observed FDRs. The observed FDR is calculated as the number of peptide identifications not from standard proteins divided by the total number of peptide identifications, while the predicted FDR is calculated using equation (1). Observed FDRs for SFOER are represented by open circles, while observed FDRs for PeptideProphet are represented by filled circles.

**Figure 6**
**Flowchart of the optimization procedure using genetic algorithm**. It starts with the initialization phase, which randomly generates the initial population P₀. The population in the next generation, P_{i+1}, is obtained by applying genetic operators to the current population P_i. The fitness of each individual (criterion) is evaluated as the number of filtered peptides. Evolution continues until a terminating condition is reached. The selection, mutation and crossover operators are used in the genetic algorithm.
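The loop described in the Figure 6 caption (random initial population, fitness evaluation, selection, crossover and mutation, repeated until termination) can be sketched in Python as follows. This is a simplified illustration under several assumptions, not the published implementation: a criterion is reduced to an (Xcorr cutoff, ΔCn cutoff) pair, fitness is the number of forward matches retained and is set to zero when the decoy-estimated FDR exceeds the limit, selection is plain truncation, and termination is a fixed number of generations. All function names, parameters and default values are hypothetical.

```python
import random
from typing import List, Tuple

PSM = Tuple[float, float, bool]   # (xcorr, delta_cn, is_decoy)
Criterion = Tuple[float, float]   # (xcorr cutoff, delta_cn cutoff)


def fitness(crit: Criterion, psms: List[PSM], max_fdr: float) -> int:
    """Forward matches kept by crit, or 0 if the estimated FDR is too high."""
    kept = [p for p in psms if p[0] >= crit[0] and p[1] >= crit[1]]
    n_decoy = sum(1 for p in kept if p[2])
    # Assumed FDR estimate for a composite database: 2 * decoys / kept.
    if not kept or 2.0 * n_decoy / len(kept) > max_fdr:
        return 0
    return len(kept) - n_decoy


def optimize(psms: List[PSM], max_fdr: float = 0.01, pop_size: int = 30,
             generations: int = 40, seed: int = 0) -> Criterion:
    """Evolve cutoff pairs and return the fittest one found."""
    rng = random.Random(seed)  # local generator keeps runs reproducible
    lo = (min(p[0] for p in psms), min(p[1] for p in psms))
    hi = (max(p[0] for p in psms), max(p[1] for p in psms))

    def mutate(c: Criterion) -> Criterion:
        # Gaussian jitter on both cutoffs, clipped to the observed score range.
        return tuple(
            min(hi[i], max(lo[i], c[i] + rng.gauss(0.0, 0.1 * (hi[i] - lo[i]))))
            for i in range(2)
        )

    # Initialization: random criteria within the observed score ranges.
    population = [(rng.uniform(lo[0], hi[0]), rng.uniform(lo[1], hi[1]))
                  for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=lambda c: fitness(c, psms, max_fdr),
                        reverse=True)
        parents = ranked[: pop_size // 2]   # truncation selection, keeps elite
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = (a[0], b[1])            # crossover: mix the two cutoffs
            if rng.random() < 0.3:          # mutation rate is an arbitrary choice
                child = mutate(child)
            children.append(child)
        population = parents + children
    return max(population, key=lambda c: fitness(c, psms, max_fdr))
```

Because the fitter half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next, which matches the kind of monotone fitness curve plotted against generations in Figure 3.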

Cited by

Ubiquitinated proteome: ready for global?
Shi Y, Xu P, Qin J. Mol Cell Proteomics. 2011 May;10(5):R110.006882. doi: 10.1074/mcp.R110.006882. Epub 2011 Feb 21. PMID: 21339389. Review.
Dynamics of the lipid droplet proteome of the oleaginous yeast Rhodosporidium toruloides.
Zhu Z, Ding Y, Gong Z, Yang L, Zhang S, Zhang C, Lin X, Shen H, Zou H, Xie Z, Yang F, Zhao X, Liu P, Zhao ZK. Eukaryot Cell. 2015 Mar;14(3):252-64. doi: 10.1128/EC.00141-14. Epub 2015 Jan 9. PMID: 25576482.
Identification of outer membrane proteins from an Antarctic bacterium Pseudomonas syringae Lz4W.
Jagannadham MV, Abou-Eladab EF, Kulkarni HM. Mol Cell Proteomics. 2011 Jun;10(6):M110.004549. doi: 10.1074/mcp.M110.004549. Epub 2011 Mar 29. PMID: 21447709.
Target-decoy search strategy for mass spectrometry-based proteomics.
Elias JE, Gygi SP. Methods Mol Biol. 2010;604:55-71. doi: 10.1007/978-1-60761-444-9_5. PMID: 20013364.
A novel algorithm for validating peptide identification from a shotgun proteomics search engine.
Jian L, Niu X, Xia Z, Samir P, Sumanasekera C, Mu Z, Jennings JL, Hoek KL, Allos T, Howard LM, Edwards KM, Weil PA, Link AJ. J Proteome Res. 2013 Mar 1;12(3):1108-19. doi: 10.1021/pr300631t. Epub 2013 Feb 12. PMID: 23402659.
