SQID: an intensity-incorporated protein identification algorithm for tandem mass spectrometry

Wenzhou Li¹, Li Ji, Jonathan Goya, Guanhong Tan, Vicki H Wysocki

PMID: 21204564
PMCID: PMC3477243
DOI: 10.1021/pr100959y

Wenzhou Li et al. J Proteome Res. 2011 Apr 1;10(4):1593-602. doi: 10.1021/pr100959y. Epub 2011 Feb 23.

Affiliation

¹ Department of Chemistry and Biochemistry, University of Arizona, Tucson, Arizona 85721, United States.

Abstract

To interpret LC-MS/MS data in proteomics, most popular protein identification algorithms primarily use predicted fragment m/z values to assign peptide sequences to fragmentation spectra. Intensity information is often undervalued because it is harder to predict and to incorporate into algorithms. Nevertheless, using intensity to assist peptide identification is an attractive prospect: it can potentially improve the confidence of matches and generate more identifications. On the basis of our previously reported study of fragmentation intensity patterns, we developed a protein identification algorithm, SeQuence IDentification (SQID), that makes use of coarse intensity information derived from a statistical analysis. The scoring scheme was validated by comparison with Sequest and X!Tandem on three data sets, and the results indicate an improvement in the number of identified peptides, including unique peptides that are not identified by Sequest or X!Tandem. The software and source code are available under the GNU GPL at http://quiz2.chem.arizona.edu/wysocki/bioinformatics.htm.
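The m/z-based matching that the abstract refers to can be illustrated with a minimal Python sketch. This is not SQID's code or scoring scheme: the function names `fragment_mz` and `count_matches`, the 0.5 m/z tolerance, and the restriction to unmodified b and y ions are all assumptions made for illustration. It only shows how predicted fragment m/z values are obtained from a candidate sequence and compared with observed peaks, using standard monoisotopic masses.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # monoisotopic mass of H2O (Da)


def fragment_mz(peptide, charge=1):
    """Return (b_ions, y_ions): m/z of b1..b(n-1) and y1..y(n-1).

    Hypothetical helper for illustration; ignores modifications and
    neutral losses.
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    prefix = 0.0
    for m in masses[:-1]:            # N-terminal fragments
        prefix += m
        b_ions.append((prefix + charge * PROTON) / charge)
    suffix = 0.0
    for m in reversed(masses[1:]):   # C-terminal fragments
        suffix += m
        y_ions.append((suffix + WATER + charge * PROTON) / charge)
    return b_ions, y_ions


def count_matches(predicted, observed, tol=0.5):
    """Count predicted m/z values with an observed peak within tol."""
    return sum(any(abs(p - o) <= tol for o in observed) for p in predicted)


if __name__ == "__main__":
    b, y = fragment_mz("TKIPAVFK")   # peptide named in Figure 6
    print([round(x, 3) for x in b])
    print([round(x, 3) for x in y])
```

A score built only on `count_matches` ignores peak heights entirely, which is the gap that an intensity-aware scoring function such as the one described here is meant to address.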

Figures

**Figure 1**
The calculation of the intensity score in SQID. The bottom shows a labeled experimental spectrum matched to the candidate sequence YEFGIFNQK 2+. The most abundant peaks used for the intensity score calculation are circled. The numbers above the b ions and below the y ions are the probabilities of observing strong peaks, with Pr values extracted from the intensity table.

**Figure 2**
Plot of q-value versus number of identified peptides showing the effect of individual components in the SQID score function for (a) singly, (b) doubly, and (c) triply charged peptides. More peptides were identified when consecutive ion pairs and the intensity-related terms were added to the scoring function.

**Figure 3**
A comparison of SQID, Sequest and X!Tandem by plotting q-value (a measure of FDR) versus identified peptide-spectrum matches for the PNNL dataset. (a) Singly charged peptides. (b) Doubly charged peptides. (c) Triply charged peptides. (d) A combination of all charge states.

**Figure 4**
A comparison of SQID, Sequest and X!Tandem by plotting q-value (a measure of FDR) versus identified peptide-spectrum matches for the 18 protein mixture dataset. (a) Singly charged peptides. (b) Doubly charged peptides. (c) Triply charged peptides. (d) A combination of all charge states.

**Figure 5**
A comparison of SQID, Sequest and X!Tandem by plotting q-value (a measure of FDR) versus identified peptide-spectrum matches for the yeast dataset. (a) Singly charged peptides. (b) Doubly charged peptides. (c) Triply charged peptides. (d) A combination of all charge states.

**Figure 6**
Example spectra that are a) identified by SQID but missed by Sequest and X!Tandem (TKIPAVFK 2+), b) identified by Sequest and X!Tandem but missed by SQID (AAANFFSASCVPCADQSSFPK 2+).

**Figure 7**
A plot of Xcorr versus a) m+n (numbers of matched peaks and numbers of consecutive pairs) and b) SQID score for 2571 peptide-spectrum matches extracted from the 18 protein mixture dataset. Every data point is scored by Sequest and SQID using the same experimental spectrum and the same peptide sequence. The blue spots are true identifications and red spots are false identifications.
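
The q-value axis used in Figures 2 to 5 can be made concrete with a short sketch. The excerpt does not say how the q-values were estimated, so the following assumes a target-decoy search in which each peptide-spectrum match carries a score (higher is better) and a decoy flag. The names `q_values` and `count_identified` are hypothetical, and the FDR estimate (decoys divided by targets) is one common convention, not necessarily the one used in the paper.

```python
def q_values(psms):
    """Assign a q-value to each PSM.

    psms: iterable of (score, is_decoy) pairs, higher score = better.
    Returns a list of (score, is_decoy, q) sorted by descending score.
    Assumes a target-decoy search; FDR is estimated as decoys / targets.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # q-value = lowest FDR at this score threshold or any looser one,
    # computed as a running minimum from the worst score upward.
    qs = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()
    return [(s, d, q) for (s, d), q in zip(ranked, qs)]


def count_identified(psms, q_cutoff):
    """Number of target PSMs accepted at the given q-value cutoff."""
    return sum(1 for _, is_decoy, q in q_values(psms)
               if not is_decoy and q <= q_cutoff)


if __name__ == "__main__":
    toy = [(10, False), (9, False), (8, True),
           (7, False), (6, False), (5, True)]
    for cutoff in (0.01, 0.3):
        print(cutoff, count_identified(toy, cutoff))
```

Evaluating `count_identified` over a range of cutoffs gives one curve of the kind plotted in these figures; comparing search engines then amounts to comparing their curves on the same spectra.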
