J Proteomics Bioinform. 2010 Apr 1;3:121-129.
doi: 10.4172/jpb.1000130.

Two-phase Filtering Strategy for Efficient Peptide Identification from Mass Spectrometry

Hoong Kee Ng et al. J Proteomics Bioinform. 2010.

Abstract

Peptide identification by tandem mass spectrometry (MS/MS) is one of the most important problems in proteomics. Recent advances in high-throughput MS/MS experiments produce huge numbers of spectra, and the peptide identification process must keep pace. In this paper, we strive to achieve high accuracy and efficiency for peptide identification in the presence of noise by using a two-phase filtering strategy. Our algorithm transforms spectra into high-dimensional vectors, and then uses a self-organizing map (SOM) and multi-point range query (MPRQ) as very efficient coarse filters to select a number of candidate peptides from the database. These candidate peptides are subsequently scored and ranked by an accurate tag-based scoring function S(λ). Experiments showed that our approach is both fast and accurate for peptide identification.
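
The two-phase idea in the abstract (a cheap coarse filter followed by a more careful ranking of the survivors) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The coarse filter here is a plain linear-scan Euclidean distance threshold standing in for SOM and MPRQ, and the fine score is a shared peak count standing in for the tag-based S(λ). The function names, the 1 Da bin width, the partial residue mass table, and the toy database are all assumptions made for illustration.

```python
from collections import Counter

# Monoisotopic residue masses (Da) for a few amino acids only (illustrative subset).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "L": 113.08406, "K": 128.09496, "E": 129.04259,
    "R": 156.10111,
}
PROTON = 1.00728
WATER = 18.01056


def theoretical_peaks(peptide):
    """Return singly charged b- and y-ion m/z values for a peptide."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    total = sum(masses)
    peaks, prefix = [], 0.0
    for m in masses[:-1]:
        prefix += m
        peaks.append(prefix + PROTON)                  # b-ion
        peaks.append(total - prefix + WATER + PROTON)  # complementary y-ion
    return peaks


def to_vector(peaks, bin_width=1.0):
    """Turn a peak list into a sparse vector: bin index -> peak count."""
    return Counter(int(mz / bin_width) for mz in peaks)


def distance(u, v):
    """Euclidean distance between two sparse vectors."""
    return sum((u[k] - v[k]) ** 2 for k in set(u) | set(v)) ** 0.5


def shared_peak_count(u, v):
    """Number of bins occupied in both vectors (stand-in for S(lambda))."""
    return sum(min(u[k], v[k]) for k in u.keys() & v.keys())


def identify(query_peaks, database, d):
    """Phase 1: keep peptides within distance d. Phase 2: rank them by score."""
    q = to_vector(query_peaks)
    vectors = ((p, to_vector(theoretical_peaks(p))) for p in database)
    candidates = [(p, v) for p, v in vectors if distance(q, v) <= d]
    candidates.sort(key=lambda pv: shared_peak_count(q, pv[1]), reverse=True)
    return [p for p, _ in candidates]


if __name__ == "__main__":
    database = ["GASP", "PEAK", "VLKR", "SAGE"]      # hypothetical toy database
    query = theoretical_peaks("PEAK") + [500.0]      # one spurious noise peak
    print(identify(query, database, d=3.0))
```

In this sketch, raising the threshold `d` lets more peptides through the coarse filter, which costs more scoring work in the second phase but lowers the risk of discarding the correct peptide early.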

Figures

Figure 1
Clustering of theoretical spectra by SOM on the (a) OPD and (b) ISB datasets. Areas with red dots indicate clustered vectors mapped onto the plane. The x-axis and y-axis represent coordinates in the 2D plane.
Figure 2
The effects of increasing distance d (x-axis) on recall and precision (y-axis) on (a) PeptideAtlas, (b) OPD and (c) ISB datasets. All results are computed based on SPC scores.
Figure 3
Distributions of the number of sequence pairs (y-axis) against 2D distance range (x-axis) for peptides with edit distance (a) 1, (b) 2 and (c) 3 on the ISB dataset.
Figure 4
Distribution of the number of peptides (y-axis) by S(λ) scores (x-axis) for peptides from the forward (solid black) and decoy (dashed red) databases. Results are based on the PeptideAtlas dataset.
