Bioinformatics. 2016 Sep 1;32(17):2642-9.
doi: 10.1093/bioinformatics/btw225. Epub 2016 Apr 29.

Confidence assignment for mass spectrometry based peptide identifications via the extreme value distribution

Gelio Alves et al. Bioinformatics. 2016.

Abstract

Motivation: There is a growing trend for biomedical researchers to extract evidence and draw conclusions from mass spectrometry based proteomics experiments, the cornerstone of which is peptide identification. Inaccurate assignments of peptide identification confidence thus may have far-reaching and adverse consequences. Although some peptide identification methods report accurate statistics, they have been limited to certain types of scoring function. The extreme value statistics based method, while more general in the scoring functions it allows, demands accurate parameter estimates and requires, at least in its original design, excessive computational resources. Improving the parameter estimate accuracy and reducing the computational cost for this method has two advantages: it provides another feasible route to accurate significance assessment, and it could provide reliable statistics for scoring functions yet to be developed.

Results: We have formulated and implemented an efficient algorithm for calculating the extreme value statistics for peptide identification applicable to various scoring functions, bypassing the need for searching large random databases.
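
To make the idea of extreme value statistics concrete, here is a minimal sketch of how a best-match score can be turned into a P-value under a Gumbel-type extreme value distribution (EVD). It is not the authors' algorithm: the function names, the method-of-moments fit, and the input list of per-set maximum scores are all assumptions made for illustration, and the abstract does not say how the EVD parameters are actually estimated.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(maxima):
    """Estimate Gumbel location (mu) and scale (beta) by the method of moments.

    `maxima` is a hypothetical list of best scores, one per set of random
    peptides. Uses mean = mu + gamma * beta and std = beta * pi / sqrt(6).
    """
    mean = statistics.fmean(maxima)
    std = statistics.stdev(maxima)
    beta = std * math.sqrt(6) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta


def gumbel_p_value(score, mu, beta):
    """P(best random score >= score) = 1 - exp(-exp(-(score - mu) / beta))."""
    z = (score - mu) / beta
    if z < -700:  # exp(-z) would overflow; the P-value is 1 to machine precision
        return 1.0
    # -expm1(-t) equals 1 - exp(-t) and stays accurate when t is tiny
    return -math.expm1(-math.exp(-z))


if __name__ == "__main__":
    # Toy maxima, purely illustrative (not data from the paper)
    mu, beta = fit_gumbel_moments([1.0, 2.0, 3.0, 4.0, 5.0])
    print(mu, beta, gumbel_p_value(6.0, mu, beta))
```

The accuracy of such a P-value depends directly on how well `mu` and `beta` are estimated, which is the sensitivity the Motivation paragraph refers to.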

Availability and implementation: The source code, implemented in C++ on a Linux system, is available for download at ftp://ftp.ncbi.nlm.nih.gov/pub/qmbp/qmbp_ms/RAId/RAId_Linux_64Bit

Contact: yyu@ncbi.nlm.nih.gov

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
DPV accuracy and assessment of the EVD model. In panels A–C, two dashed lines, x = 3y and x = y/3, are plotted to show how close the measured curves are to the theoretical y = x curve. All spectra in DG-1 (E. coli) were queried against the random database (Section 2.7). With δ = 0.01 Da, panel (A) displays the observed DPVs versus the reported DPVs as NSRP varies from 10^3, 10^4, 10^5 to 10^6. With NSRP = 10^5, panel (B) displays the accuracy of the reported DPV for different δ values: 0.1 Da, 0.01 Da and 0.001 Da. With NSRP = 10^5 and δ set to 0.01 Da, panel (C) displays the accuracy of the reported database DPVs under different internal mass spacings (ϵ values): 0.1 Da, 0.01 Da and 0.001 Da. In panel (D) (with NSRP = 10^5, δ = 0.01 Da, and ϵ = 0.1 Da), the cumulative frequency histogram of the model GOF is shown.
Fig. 2.
Agreement between Sorić’s PFD and the target-decoy PFD when peptides are ranked by DPVs. The PFD curves from both approaches are displayed in panels A, B and C for DG-1, DG-2 and DG-3, respectively; each DG is analyzed using the parameters mentioned in Section 2.6 and with the following additional parameters: δ = 0.015 Da, ϵ = 0.15 Da, target and decoy databases as described in Section 2.7.
Fig. 3.
Peptide retrieval comparison via target-decoy approach. DGs 1-3, analyzed using the same parameters as mentioned in the caption of Figure 2, yield the retrieval curves in panels A–C (and in panels D–F), respectively. Panels A–C display the retrieval PFD curves when peptides are ranked by the per-spectrum EVD statistics and by the native SEQUEST program, both of which use XCorr as the scoring function. Panels D–F display, for various scoring functions, the retrieval PFD curves when peptides are ranked by the EVD statistics.
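
The target-decoy PFD curves referred to in Figs. 2 and 3 can be illustrated with a short sketch. This excerpt does not state which estimator the authors used, so the code below assumes one common convention: at each rank cutoff, the PFD is taken as the number of decoy hits divided by the number of target hits. The function name and the `(p_value, is_decoy)` input format are hypothetical.

```python
def target_decoy_pfd_curve(hits):
    """Return [(targets_retrieved, estimated_pfd), ...] along a ranked hit list.

    `hits` is an iterable of (p_value, is_decoy) pairs. Hits are ranked by
    ascending P-value, and the PFD at each cutoff is estimated as
    decoys / targets (an assumed convention, capped at 1.0).
    """
    ranked = sorted(hits, key=lambda hit: hit[0])
    targets = 0
    decoys = 0
    curve = []
    for _p_value, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        pfd = decoys / targets if targets else 1.0
        curve.append((targets, min(pfd, 1.0)))
    return curve


if __name__ == "__main__":
    # Toy hits, purely illustrative (not data from the paper)
    toy_hits = [(0.001, False), (0.002, False), (0.01, True), (0.02, False)]
    for targets_retrieved, pfd in target_decoy_pfd_curve(toy_hits):
        print(targets_retrieved, round(pfd, 3))
```

Plotting the number of targets retrieved against the estimated PFD gives a retrieval curve of the kind the captions describe; a method whose curve reaches more targets at the same PFD retrieves more peptides.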
