Proteomics. 2019 Jul;19(14):e1800367.

doi: 10.1002/pmic.201800367.

RAId: Knowledge-Integrated Proteomics Web Service with Accurate Statistical Significance Assignment

Aleksey Y Ogurtsov¹, Gelio Alves¹, Yi-Kuo Yu¹

PMID: 30908818
PMCID: PMC6635056
DOI: 10.1002/pmic.201800367

Affiliation

¹ National Center for Biotechnology Information, National Library of Medicine, NIH, Bethesda, MD, 20894, USA.

Abstract

Mass spectrometry-based proteomics starts with the identification of peptides and proteins, which provides the basis for forming next-level hypotheses, whose "validations" are often used in turn to form even higher-level hypotheses, and so forth. Scientifically meaningful conclusions are thus attainable only if the number of falsely identified peptides/proteins is accurately controlled. For this reason, RAId has been under continued development over the past decade. RAId employs rigorous statistics for peptide/protein identification, assigning accurate P-values/E-values that can be used confidently to control the number of falsely identified peptides and proteins. The RAId web service is a versatile tool built to identify peptides and proteins from tandem mass spectrometry data. In addition to recognizing various spectrum file formats, the web service offers four peptide scoring functions and a choice of three statistical methods for assigning P-values/E-values to identified peptides. Users may upload their own protein database or use one of the available knowledge-integrated organismal databases, which contain annotated information such as single amino acid polymorphisms, post-translational modifications, and their disease associations. The web service also provides a friendly interface to display, sort by different criteria, and download the identified peptides and proteins. The RAId web service is freely available at https://www.ncbi.nlm.nih.gov/CBBresearch/Yu/raid.

Keywords: MS/MS data analyses; knowledge-integrated database; peptide/protein identifications; proportion of false discoveries; statistical significance assignment.
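
As a rough illustration of how E-values can be used to control false identifications, the sketch below accepts the best hit of each spectrum when its E-value is at or below a cutoff, then estimates the proportion of false discoveries as (cutoff × number of spectra) / (number of accepted hits). The estimator, the function name `estimate_false_discoveries`, and the sample E-values are assumptions made for illustration; the abstract does not state how RAId or its users compute this quantity.

```python
def estimate_false_discoveries(best_hit_evalues, cutoff):
    """Return (accepted_count, expected_false, estimated_proportion).

    best_hit_evalues: one E-value per spectrum (its best-scoring peptide).
    An E-value is read here as the expected number of random hits scoring
    at least as well, so each spectrum contributes roughly `cutoff`
    expected false hits at that threshold.
    """
    accepted = [e for e in best_hit_evalues if e <= cutoff]
    expected_false = cutoff * len(best_hit_evalues)
    if not accepted:
        return 0, expected_false, 0.0
    # Cap at 1.0 because a proportion cannot exceed one.
    proportion = min(1.0, expected_false / len(accepted))
    return len(accepted), expected_false, proportion


# Hypothetical E-values for five spectra (made-up numbers, not RAId output).
evalues = [1e-6, 3e-4, 0.02, 0.5, 4.0]
count, expected_false, proportion = estimate_false_discoveries(evalues, cutoff=0.05)
```

Such an estimate is only as good as the E-values themselves, which is why the accuracy of the assigned statistics matters.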

Published 2019. This article is a U.S. Government work and is in the public domain in the USA. Proteomics Published by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

Conflict of interest statement

The authors have declared no conflict of interest.

Figures

**Figure 1:**
Panel (A) shows the main dialog window of RAId when the *Generate histogram* tab is chosen. To the right of the scoring function heading, users may select a scoring function. Note that the symbol R(•) represents RAId’s implementation of scoring function • minus the undescribed/unjustified heuristics. The user may select desired PTMs along with the 20 regular amino acids to generate the score distribution (normalized histogram) for a given MS/MS spectrum. Under the “Amino acids and PTMs” expansion tab, one can click on “Change” next to PTMs; a pop-up window, part of which is shown in panel (B), then appears and allows the user to select the desired PTMs. Panel (C) displays the score distribution obtained by scoring all possible peptides made of only regular amino acids. Panel (D) displays the score distribution obtained by scoring all possible peptides made of the 20 regular amino acids together with 31 PTMs, some of which are shown in panel (B).

**Figure 2:**
Panel (A) shows the main dialog window of RAId when the *Database search* tab is chosen. Under the “Amino acids and PTMs” expansion tab, one can enable the search program to consider annotated SAPs (single amino acid polymorphisms), novel PTMs, and annotated PTMs. All of these can be accessed via pop-up windows by clicking on the corresponding “Change” buttons on the right. In the example shown in the lower right corner of panel (A), two annotated SAPs are selected. This means that when encountering peptides whose residue A (or S) has been documented to have SAPs, the search program will automatically consider all such variant peptides during the search. Panel (B) shows the pop-up window associated with annotated PTMs. In this example, 31 annotated PTMs were checked. Once all search parameters are entered and the “Submit job” button is clicked, the main dialog window is replaced by the result window. An example of the result window, obtained with a *Homo sapiens* dataset, is displayed in panel (C). The four job statuses (pending, running, retrieving, and complete) are self-explanatory. Once the job is complete, one may sort the results using a different criterion by clicking on any one of the remaining five open circles.

**Figure 3:**
Panel (A) shows the results window when the user sorts the search results according to “protein E-value”. With this choice, proteins are sorted in ascending order of their E-values. That is, the protein with the highest identification significance is shown first, and so on. If one clicks on the “plus” sign in front of a protein row, that row expands and all peptides mappable to that protein are displayed in ascending order of their E-values. Panel (B) shows such an expansion when the fifth protein in panel (A) is clicked. The user may also click on one of the peptides in the expanded list, which opens a pop-up window displaying the peptide-spectrum match. Panel (C) shows such an example when clicking on the peptide AVFQANQENLPILKR belonging to the fifth protein. Finally, the user may access a protein’s corresponding RefSeq page by clicking on that protein’s GI. Panel (D) shows part of the RefSeq page corresponding to the fifth protein when its accession number NP_000692 is clicked.
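
The ordering described in this caption can be sketched as follows: proteins are listed by ascending E-value, and the peptides mapped to each protein are listed by ascending E-value as well. The `Protein` and `Peptide` classes, the `sort_results` function, and all sample names and E-values are hypothetical; this is not RAId's data model or code, only a minimal sketch of the sort order.

```python
from dataclasses import dataclass, field, replace


@dataclass
class Peptide:
    sequence: str
    e_value: float


@dataclass
class Protein:
    accession: str
    e_value: float
    peptides: list = field(default_factory=list)


def sort_results(proteins):
    """Return new Protein records, most significant (lowest E-value) first.

    Each returned protein carries its peptides sorted the same way;
    the input objects are left unmodified.
    """
    return [
        replace(p, peptides=sorted(p.peptides, key=lambda pep: pep.e_value))
        for p in sorted(proteins, key=lambda p: p.e_value)
    ]


# Made-up records for illustration only.
sample = [
    Protein("PROT_A", 2e-3, [Peptide("PEPTIDEK", 0.4), Peptide("SAMPLER", 1e-3)]),
    Protein("PROT_B", 5e-9, [Peptide("ANOTHERK", 1e-7)]),
]
results = sort_results(sample)
```

A lower E-value means a more significant identification, so ascending order puts the most confident proteins and peptides at the top.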

Grants and funding

ZIA LM092404/ImNIH/Intramural NIH HHS/United States
