J Proteome Res. 2013 Mar 1;12(3):1108-19.
doi: 10.1021/pr300631t. Epub 2013 Feb 12.

A novel algorithm for validating peptide identification from a shotgun proteomics search engine

Ling Jian et al. J Proteome Res. 2013.

Abstract

Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) has revolutionized the proteomic analysis of protein complexes, cells, and tissues. In a typical proteomic analysis, each tandem mass spectrum from an LC-MS/MS experiment is assigned to a peptide by a search engine that compares the experimental MS/MS data with theoretical spectra derived from peptide sequences in a protein database. The resulting peptide-spectrum matches (PSMs) are then used to infer the list of proteins identified in the original sample. However, search engines often fail to distinguish correct from incorrect peptide assignments. In this study, we designed and implemented a novel algorithm, De-Noise, that reduces the number of incorrect peptide matches and maximizes the number of correct ones at a fixed false discovery rate (FDR), using only a minimal number of the scoring outputs from the SEQUEST search engine. The algorithm uses a three-step process: data cleaning, data refining through a support vector machine (SVM)-based decision function, and a final refining step based on proteolytic peptide patterns. Using proteomics data generated on different types of mass spectrometers, we optimized De-Noise for the resolution and mass accuracy of the instrument used in the LC-MS/MS experiment. Our results demonstrate that De-Noise improves peptide identification compared with other methods used to process the PSMs assigned by SEQUEST. Because De-Noise uses a limited number of scoring attributes, it can be readily adapted to other search engines.
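The criterion above, the number of correct peptides retained at a fixed FDR, depends on how the FDR is estimated. The abstract does not say which estimator De-Noise uses. Assuming the widely used target-decoy strategy, in which spectra are also searched against reversed (decoy) sequences, two common estimates at a score cutoff s are shown below; N_T(s) and N_D(s) are the numbers of target and decoy PSMs scoring at or above s, and α is the chosen FDR.

```latex
\widehat{\mathrm{FDR}}_{\mathrm{sep}}(s) = \frac{N_D(s)}{N_T(s)}
\quad \text{(separate target and decoy searches)}

\widehat{\mathrm{FDR}}_{\mathrm{concat}}(s) = \frac{2\,N_D(s)}{N_T(s) + N_D(s)}
\quad \text{(concatenated target-decoy search)}

s^{*}(\alpha) = \min\{\, s : \widehat{\mathrm{FDR}}(s) \le \alpha \,\}
```

As a hypothetical worked example, with N_T = 950 and N_D = 25 above some cutoff, the separate-search estimate is 25/950 ≈ 0.026, while the concatenated estimate is 2·25/975 = 50/975 ≈ 0.051, just above α = 0.05. At a fixed α, a validator performs better when its score lets more target PSMs pass s*.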

Figures

Figure 1
Pseudocode for the De-Noise algorithm.
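Figure 1's pseudocode is not reproduced in this text. For orientation only, the Python sketch below mirrors the three stages named in the abstract: cleaning, an SVM-style decision function, and a proteolytic-pattern filter. Everything concrete here is an assumption rather than the authors' method. The PSM fields and charge filter are hypothetical. SEQUEST's XCorr and deltaCn serve as example attributes, since the abstract does not list the ones De-Noise uses. The weights and bias of the linear decision function are invented rather than trained; a linear SVM's decision function has the form w·x + b. The number of tryptic termini stands in for the "proteolytic peptide pattern".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PSM:
    """Hypothetical PSM record: two SEQUEST scores plus the peptide's flanking residues."""
    peptide: str                # bare peptide sequence
    prev_aa: str                # residue before the peptide ("-" = protein N-terminus)
    next_aa: str                # residue after the peptide ("-" = protein C-terminus)
    xcorr: Optional[float]      # SEQUEST cross-correlation score
    delta_cn: Optional[float]   # SEQUEST deltaCn
    charge: int


def is_tryptic_site(before: str, after: str) -> bool:
    """Trypsin rule: cleave after K or R but not before P; protein termini always count."""
    if before == "-" or after == "-":
        return True
    return before in "KR" and after != "P"


def tryptic_termini(p: PSM) -> int:
    """Number of tryptic termini (0, 1 or 2) of the peptide."""
    n_term = is_tryptic_site(p.prev_aa, p.peptide[0])
    c_term = is_tryptic_site(p.peptide[-1], p.next_aa)
    return int(n_term) + int(c_term)


def clean(psms: List[PSM]) -> List[PSM]:
    """Step 1 (hypothetical): drop PSMs with missing scores or charge outside 1-3."""
    return [p for p in psms
            if p.xcorr is not None and p.delta_cn is not None and 1 <= p.charge <= 3]


def decision_value(p: PSM, w=(1.0, 5.0), b=-2.5) -> float:
    """Step 2 (hypothetical): linear decision function w.x + b.
    The weights and bias are invented for illustration, not trained."""
    return w[0] * p.xcorr + w[1] * p.delta_cn + b


def svm_refine(psms: List[PSM]) -> List[PSM]:
    """Keep PSMs on the positive side of the decision boundary."""
    return [p for p in psms if decision_value(p) > 0]


def pattern_refine(psms: List[PSM], min_termini: int = 1) -> List[PSM]:
    """Step 3 (hypothetical): keep PSMs with at least `min_termini` tryptic termini."""
    return [p for p in psms if tryptic_termini(p) >= min_termini]


def denoise_sketch(psms: List[PSM]) -> List[PSM]:
    """Chain the three hypothetical stages in the order given in the abstract."""
    return pattern_refine(svm_refine(clean(psms)))


if __name__ == "__main__":
    demo = [
        PSM("LVNELTEFAK", "K", "T", 3.1, 0.35, 2),  # passes all three stages
        PSM("VNELTEFAKT", "L", "A", 1.5, 0.10, 2),  # negative decision value
        PSM("AEFVEVTK", "R", "L", None, 0.20, 2),   # removed by cleaning
        PSM("ELTEFAKTV", "N", "L", 3.0, 0.40, 2),   # no tryptic termini
    ]
    print([p.peptide for p in denoise_sketch(demo)])  # ['LVNELTEFAK']
```

Keeping each stage in its own function means that swapping the invented weights for a trained model, or the trypsin rule for another protease's specificity, touches only `decision_value` or `is_tryptic_site`.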
Figure 2
Venn diagrams for the seven datasets showing the number of overlapping validated PSMs from De-Noise, PeptideProphet, and Percolator. An FDR of 0.05 was used for all three approaches.
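The region counts behind a three-way Venn diagram like Figure 2 follow directly from set operations on the PSMs each validator accepts. The sketch below assumes each accepted PSM is identified by a hashable key. A hypothetical spectrum ID is used here; (spectrum, peptide) pairs would work the same way. The toy sets are invented, not taken from the paper.

```python
from typing import Dict, Hashable, Set


def venn3_regions(a: Set[Hashable], b: Set[Hashable], c: Set[Hashable],
                  names=("A", "B", "C")) -> Dict[str, int]:
    """Sizes of the seven disjoint regions of a three-set Venn diagram."""
    na, nb, nc = names
    return {
        f"{na} only": len(a - b - c),
        f"{nb} only": len(b - a - c),
        f"{nc} only": len(c - a - b),
        f"{na} & {nb} only": len((a & b) - c),
        f"{na} & {nc} only": len((a & c) - b),
        f"{nb} & {nc} only": len((b & c) - a),
        "all three": len(a & b & c),
    }


if __name__ == "__main__":
    # Hypothetical spectrum IDs accepted by each validator at the same FDR.
    denoise = {1, 2, 3, 4, 5}
    peptideprophet = {2, 3, 4, 6}
    percolator = {3, 4, 5, 6, 7}
    regions = venn3_regions(denoise, peptideprophet, percolator,
                            names=("De-Noise", "PeptideProphet", "Percolator"))
    for region, n in regions.items():
        print(f"{region}: {n}")
```

Because the seven regions are disjoint, their sizes always sum to the size of the union of the three sets, which is a quick sanity check on any overlap table.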
Figure 3
Plots of target PSM hits for the seven datasets validated at a series of FDRs by De-Noise, PeptideProphet, and Percolator. The number of target peptide hits is plotted over an FDR range from 0.01 to 0.1. (A) Gcn4 LCQ (B) UPS1 LTQ (C) Tal08 LTQ-Orbitrap XL MiPS (D) PBMC LTQ-Orbitrap XL MiPS (E) PBMC LTQ-Orbitrap XL MiPS-off (F) PBMC LTQ-Orbitrap Velos MiPS (G) PBMC LTQ-Orbitrap Velos MiPS-off.
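Curves like those in Figure 3 count accepted target hits as the FDR cutoff is swept from 0.01 to 0.1. One standard way to get every point of such a curve from a single ranked list is to convert target-decoy FDR estimates into q-values. The authors' procedure is not described in this text. The Python sketch below assumes separate target and decoy searches (FDR estimated as decoys/targets) and uses toy data.

```python
from typing import List, Sequence, Tuple


def q_values(scores: Sequence[float],
             is_decoy: Sequence[bool]) -> List[Tuple[float, float, bool]]:
    """Return (score, q_value, is_decoy) tuples sorted by descending score.

    Assumes separate target/decoy searches, so the FDR at a cutoff is
    estimated as decoys / targets among PSMs scoring at or above it.
    Tied scores are ranked in input order, a common simplification.
    """
    ranked = sorted(zip(scores, is_decoy), key=lambda t: t[0], reverse=True)
    fdrs = []
    n_target = n_decoy = 0
    for _, decoy in ranked:
        if decoy:
            n_decoy += 1
        else:
            n_target += 1
        fdrs.append(n_decoy / n_target if n_target else 1.0)
    # q-value: the smallest FDR at which this PSM would still be accepted,
    # i.e. a running minimum taken from the lowest score upward.
    qs = [0.0] * len(fdrs)
    running = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running = min(running, fdrs[i])
        qs[i] = running
    return [(s, q, d) for (s, d), q in zip(ranked, qs)]


def target_hits(qvals: List[Tuple[float, float, bool]], fdr: float) -> int:
    """Count target PSMs whose q-value is at or below the requested FDR."""
    return sum(1 for _, q, decoy in qvals if not decoy and q <= fdr)


if __name__ == "__main__":
    # Toy scores; True marks a decoy PSM.
    scores = [10, 9, 8, 7, 6, 5, 4, 3]
    decoy = [False, False, False, True, False, False, True, False]
    qv = q_values(scores, decoy)
    fdr_grid = [round(0.01 * k, 2) for k in range(1, 11)]  # 0.01, 0.02, ..., 0.1
    for t in fdr_grid:
        print(t, target_hits(qv, t))
```

Taking the running minimum makes the q-value non-increasing as the score rises, so the number of accepted targets can only grow as the FDR cutoff is relaxed.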
Figure 4
ROC curves for the seven datasets showing the validation performance of De-Noise, PeptideProphet, and Percolator. (A) Gcn4 LCQ (B) UPS1 LTQ (C) Tal08 LTQ-Orbitrap XL MiPS-off (D) PBMC LTQ-Orbitrap XL MiPS (E) PBMC LTQ-Orbitrap XL MiPS-off (F) PBMC LTQ-Orbitrap Velos MiPS (G) PBMC LTQ-Orbitrap Velos MiPS-off.
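An ROC curve such as those in Figure 4 traces the true-positive rate against the false-positive rate as the acceptance threshold on a validator's score is lowered. This text does not describe how the authors decided which PSMs were correct. The Python sketch below simply assumes a boolean correctness label for each PSM and computes the curve points and the area under the curve (AUC) with the trapezoidal rule.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float],
               correct: Sequence[bool]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points obtained by lowering the threshold from the top score.

    `correct` is an assumed ground-truth label per PSM (True = correct match).
    Tied scores are added together, giving one diagonal step per tie group.
    """
    pairs = sorted(zip(scores, correct), key=lambda t: t[0], reverse=True)
    n_pos = sum(1 for _, y in pairs if y)
    n_neg = len(pairs) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one correct and one incorrect PSM")
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


if __name__ == "__main__":
    # Toy validator scores and hypothetical correctness labels.
    pts = roc_points([0.9, 0.8, 0.7, 0.6], [True, False, True, False])
    print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
    print(auc(pts))  # 0.75
```

The AUC equals the probability that a randomly chosen correct PSM outscores a randomly chosen incorrect one (ties counting one half). In the toy example, 3 of the 4 correct/incorrect pairs are ordered correctly, giving 0.75.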
