Confidence assignment for mass spectrometry based peptide identifications via the extreme value distribution

Gelio Alves¹, Yi-Kuo Yu¹

PMID: 27153659
PMCID: PMC5939896
DOI: 10.1093/bioinformatics/btw225

Bioinformatics. 2016 Sep 1;32(17):2642-9. doi: 10.1093/bioinformatics/btw225. Epub 2016 Apr 29.

Affiliation

¹ National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Abstract

Motivation: There is a growing trend for biomedical researchers to extract evidence and draw conclusions from mass spectrometry based proteomics experiments, the cornerstone of which is peptide identification. Inaccurate assignments of peptide identification confidence thus may have far-reaching and adverse consequences. Although some peptide identification methods report accurate statistics, they have been limited to certain types of scoring function. The extreme value statistics based method, while more general in the scoring functions it allows, demands accurate parameter estimates and requires, at least in its original design, excessive computational resources. Improving the parameter estimate accuracy and reducing the computational cost for this method has two advantages: it provides another feasible route to accurate significance assessment, and it could provide reliable statistics for scoring functions yet to be developed.

Results: We have formulated and implemented an efficient algorithm that calculates the extreme value statistics for peptide identification, is applicable to various scoring functions, and bypasses the need to search large random databases.
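To make the notion of extreme value statistics concrete, the following is a minimal sketch of how a best-match score can be converted into a P-value under a Gumbel (type I extreme value) model. It is not the authors' algorithm: the method-of-moments fit, the function names `fit_gumbel_moments`, `gumbel_p_value` and `sample_gumbel`, and the synthetic scores are all assumptions made for illustration, and the sketch still relies on a sample of best random scores rather than avoiding one.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(best_scores):
    """Estimate Gumbel (maximum) location mu and scale beta by the method of moments.

    Uses mean = mu + gamma * beta and variance = (pi * beta)^2 / 6.
    This estimator is an illustrative assumption, not the paper's procedure.
    """
    n = len(best_scores)
    mean = sum(best_scores) / n
    var = sum((s - mean) ** 2 for s in best_scores) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def gumbel_p_value(score, mu, beta):
    """Probability that the best random score is at least `score`."""
    z = (score - mu) / beta
    if z < -700.0:  # exp(-z) would overflow; the P-value is 1 to machine precision
        return 1.0
    # 1 - exp(-exp(-z)), written with expm1 to keep precision for tiny P-values
    return -math.expm1(-math.exp(-z))


def sample_gumbel(mu, beta, n, rng):
    """Hypothetical helper: draw n Gumbel variates by inverse-CDF sampling."""
    out = []
    for _ in range(n):
        u = min(max(rng.random(), 1e-300), 1.0 - 1e-16)  # keep u strictly in (0, 1)
        out.append(mu - beta * math.log(-math.log(u)))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for best scores against random peptides
    scores = sample_gumbel(5.0, 2.0, 20000, rng)
    mu, beta = fit_gumbel_moments(scores)
    print("fitted mu, beta:", mu, beta)
    print("P-value of score 20:", gumbel_p_value(20.0, mu, beta))
```

In this sketch the accuracy of the reported P-value depends entirely on how well `mu` and `beta` are estimated, which reflects the abstract's point that the extreme value approach demands accurate parameter estimates.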

Availability and implementation: The source code, implemented in C++ on a Linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit

Contact: yyu@ncbi.nlm.nih.gov

Supplementary information: Supplementary data are available at Bioinformatics online.

Published by Oxford University Press 2016. This work is written by US Government employees and is in the public domain in the US.

Figures

**Fig. 1.**
DPV accuracy and assessment of the EVD model. In panels A–C, two dashed lines, x = 3y and x = y/3, are plotted to show how far the measured curves deviate from the theoretical y = x curve. All spectra in DG-1 (*E. coli*) were queried against the random database (Section 2.7). With δ = 0.01 Da, panel (A) displays the observed DPVs versus the reported DPVs for NSRP = 10³, 10⁴, 10⁵ and 10⁶. With NSRP = 10⁵, panel (B) displays the accuracy of the reported DPV for different values of δ: 0.1 Da, 0.01 Da and 0.001 Da. With NSRP = 10⁵ and δ = 0.01 Da, panel (C) displays the accuracy of the reported database DPVs under different internal mass spacings (ϵ): 0.1 Da, 0.01 Da and 0.001 Da. Panel (D) (with NSRP = 10⁵, δ = 0.01 Da and ϵ = 0.1 Da) shows the cumulative frequency histogram of the model GOF.

**Fig. 2.**
Agreement between Sorić’s PFD and the target-decoy PFD when peptides are ranked by DPVs. The PFD curves obtained by both approaches are displayed in panels A, B and C for DG-1, DG-2 and DG-3, respectively. Each DG is analyzed using the parameters mentioned in Section 2.6 together with the following additional parameters: δ = 0.015 Da, ϵ = 0.15 Da, and target and decoy databases as described in Section 2.7.

**Fig. 3.**
Peptide retrieval comparison via the target-decoy approach. DGs 1-3, analyzed using the same parameters as in the caption of Figure 2, yield the retrieval curves in panels A–C (and in panels D–F), respectively. Panels A–C display the retrieval PFD curves when peptides are ranked by the per-spectrum EVD statistics and by the native SEQUEST program, both of which use XCorr as the scoring function. Panels D–F display, for various scoring functions, the retrieval PFD curves when peptides are ranked by the EVD statistics.
