J Proteome Res. 2009 Sep;8(9):4173-81.
doi: 10.1021/pr9004794.

False discovery rates of protein identifications: a strike against the two-peptide rule


Nitin Gupta et al. J Proteome Res. 2009 Sep.

Abstract

Most proteomics studies attempt to maximize the number of peptide identifications and subsequently accept only proteins containing two or more identified peptides as reliable protein identifications. In this study, we evaluate the effect of this "two-peptide" rule on protein identifications, using multiple search tools and data sets. Contrary to intuition, the "two-peptide" rule reduces the number of protein identifications in the target database more significantly than in the decoy database and therefore results in increased false discovery rates, compared to the case when single-hit proteins are not discarded. We therefore recommend that the "two-peptide" rule be abandoned and that protein identifications instead be subject to the estimation of error rates, as is already the case with peptide identifications. We further extend the generating function approach (originally proposed for evaluating matches between a peptide and a single spectrum) to evaluating matches between a protein and an entire spectral data set.
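The comparison described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `protein_fdr`, the input layout, and the estimator (decoy proteins divided by target proteins, one common target-decoy convention) are assumptions, and the toy hits are invented only to exercise the code.

```python
from collections import defaultdict


def protein_fdr(peptide_hits, min_peptides):
    """Count target/decoy proteins with at least `min_peptides` distinct peptides.

    peptide_hits: iterable of (protein_id, peptide, is_decoy) tuples (assumed layout).
    Returns (n_target, n_decoy, estimated_fdr), with FDR estimated as
    n_decoy / n_target (an assumed convention, not taken from the paper).
    """
    peptides = defaultdict(set)
    is_decoy_protein = {}
    for protein_id, peptide, is_decoy in peptide_hits:
        peptides[protein_id].add(peptide)
        is_decoy_protein[protein_id] = is_decoy

    # Keep only proteins supported by enough distinct peptides.
    kept = [p for p, peps in peptides.items() if len(peps) >= min_peptides]
    n_decoy = sum(1 for p in kept if is_decoy_protein[p])
    n_target = len(kept) - n_decoy
    fdr = n_decoy / n_target if n_target else float("nan")
    return n_target, n_decoy, fdr


# Invented toy hits: four target proteins (T*) and two decoy proteins (D*).
hits = [
    ("T1", "PEPA", False), ("T1", "PEPB", False),
    ("T2", "PEPC", False),
    ("T3", "PEPD", False),
    ("T4", "PEPE", False), ("T4", "PEPF", False), ("T4", "PEPG", False),
    ("D1", "XEPA", True), ("D1", "XEPB", True),
    ("D2", "XEPC", True),
]

all_proteins = protein_fdr(hits, min_peptides=1)   # single-hit proteins kept
two_peptide = protein_fdr(hits, min_peptides=2)    # "two-peptide" rule applied
```

Running both settings on real search results and comparing the two estimated rates is the kind of check the abstract argues for; the toy data above carry no evidential weight in either direction.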


Figures

Figure 1
Identification of peptides in the Shewanella data set using different approaches and scoring functions. Each point in the curves is generated by varying the scoring threshold and computing the number of hits in the target and the decoy database exceeding the threshold.
Figure 2
(a) Identification of proteins in the human data set using different approaches and scoring functions. (b) Same plot as in panel (a) for the Shewanella data set.
Figure 3
(a) Protein identification in the human data set using X!Tandem search results with different scoring approaches at the protein level. (b) Same plot as in panel (a) for an arbitrarily selected subset of the Shewanella data set containing 1.25 million spectra.
Figure 4
Identification of proteins, using the unique peptides only (peptides that are not shared between multiple proteins), in the human data set using InsPecT search results with different approaches.
Figure 5
(a) Identification of proteins in the human data set using MS-GF scores, without and with length correction. (b) Same plot as in panel (a) for the Shewanella data set.
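
The Figure 1 caption describes each curve as a sweep over the scoring threshold, counting target and decoy hits that pass it. A minimal sketch of that construction is shown here; the function name `target_decoy_curve` is hypothetical, higher scores are assumed to be better, and a hit is counted when its score is at or above the threshold (the caption's "exceeding" could equally be read as strictly greater).

```python
def target_decoy_curve(target_scores, decoy_scores):
    """Return (threshold, n_target, n_decoy) for each observed score, high to low.

    Assumes higher scores are better; a hit passes when score >= threshold.
    """
    thresholds = sorted(set(target_scores) | set(decoy_scores), reverse=True)
    curve = []
    for t in thresholds:
        n_target = sum(s >= t for s in target_scores)
        n_decoy = sum(s >= t for s in decoy_scores)
        curve.append((t, n_target, n_decoy))
    return curve


# Invented scores, used only to show the shape of the output.
curve = target_decoy_curve([9, 7, 5], [6, 2])
```

Plotting `n_target` against `n_decoy` for each point gives a curve of the kind the captions describe; this quadratic-time loop favors clarity over speed and would need a sorted single pass for millions of spectra.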


