J Proteome Res. 2011 Apr 1;10(4):1593-602.
doi: 10.1021/pr100959y. Epub 2011 Feb 23.

SQID: an intensity-incorporated protein identification algorithm for tandem mass spectrometry

Wenzhou Li et al. J Proteome Res. 2011.

Abstract

To interpret LC-MS/MS data in proteomics, most popular protein identification algorithms primarily use predicted fragment m/z values to assign peptide sequences to fragmentation spectra. The intensity information is often undervalued, because it is not as easy to predict and incorporate into algorithms. Nevertheless, the use of intensity to assist peptide identification is an attractive prospect and can potentially improve the confidence of matches and generate more identifications. On the basis of our previously reported study of fragmentation intensity patterns, we developed a protein identification algorithm, SeQuence IDentification (SQID), that makes use of the coarse intensity from a statistical analysis. The scoring scheme was validated by comparison with Sequest and X!Tandem using three data sets, and the results indicate an improvement in the number of identified peptides, including unique peptides that are not identified by Sequest or X!Tandem. The software and source code are available under the GNU GPL license at http://quiz2.chem.arizona.edu/wysocki/bioinformatics.htm.
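
As an illustration of the "predicted fragment m/z values" mentioned in the abstract, the sketch below computes b- and y-ion m/z values for a peptide from standard monoisotopic residue masses. It is a generic textbook calculation, not code taken from SQID; the function name `fragment_mz` and the restriction to unmodified peptides are assumptions made for this example.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276   # mass of a proton (Da)
WATER = 18.010565   # mass of H2O (Da)


def fragment_mz(peptide, charge=1):
    """Return (b_ions, y_ions) as lists of m/z values.

    Hypothetical helper for illustration: b_ions[i-1] is b_i and
    y_ions[i-1] is y_i, for i = 1 .. len(peptide) - 1.
    Assumes an unmodified peptide.
    """
    masses = [MONO[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(peptide)):
        b_neutral = sum(masses[:i])            # N-terminal fragment
        y_neutral = sum(masses[-i:]) + WATER   # C-terminal fragment
        b_ions.append((b_neutral + charge * PROTON) / charge)
        y_ions.append((y_neutral + charge * PROTON) / charge)
    return b_ions, y_ions


b_ions, y_ions = fragment_mz("YEFGIFNQK")
```

A search engine that scores on m/z alone compares lists like these against the observed peak positions; the abstract's point is that the observed peak intensities can be used as additional evidence.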

Figures

Figure 1
The calculation of the intensity score in SQID. The bottom panel shows an experimental spectrum labeled by matching it to the candidate sequence YEFGIFNQK 2+. The most abundant peaks used for the intensity score calculation are circled. The numbers above the b ions and below the y ions are the probabilities of observing strong peaks, with Pr values extracted from the intensity table.
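
The idea in this caption can be sketched roughly as follows: pick the most abundant peaks, check which predicted fragments they match, and accumulate the table probability Pr for each match. This is only a loose illustration under stated assumptions and is not SQID's actual scoring function: the function names, the lookup of Pr by a two-residue key, the summation, the tolerance, and all numeric values in the usage lines are hypothetical.

```python
def top_peaks(spectrum, n):
    """Return the n most intense peaks from a list of (mz, intensity)."""
    return sorted(spectrum, key=lambda peak: peak[1], reverse=True)[:n]


def intensity_score(spectrum, fragments, pr_table, n_top=5, tol=0.5):
    """Sum table probabilities for fragments that match a strong peak.

    spectrum:  list of (mz, intensity) tuples
    fragments: list of (label, mz, key) for the candidate peptide, where
               key indexes pr_table (assumed here to be a residue pair)
    pr_table:  dict mapping key -> probability of a strong peak
    All of these structures are assumptions made for this sketch.
    """
    strong = top_peaks(spectrum, n_top)
    score = 0.0
    for _label, mz, key in fragments:
        if any(abs(mz - peak_mz) <= tol for peak_mz, _ in strong):
            score += pr_table.get(key, 0.0)
    return score


# Placeholder numbers, not values from the paper's intensity table.
spectrum = [(100.0, 10.0), (200.0, 50.0), (300.0, 5.0)]
fragments = [("b1", 200.1, "YE"), ("y1", 300.0, "QK")]
pr_table = {"YE": 0.4, "QK": 0.2}
score_top1 = intensity_score(spectrum, fragments, pr_table, n_top=1)
```

With `n_top=1` only the peak at 200.0 counts as strong, so only the first fragment contributes to the score.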
Figure 2
Plot of q-value versus number of identified peptides, showing the effect of individual components in the SQID score function for (a) singly, (b) doubly, and (c) triply charged peptides. More peptides were identified when consecutive ion pairs and the intensity-related terms were added to the scoring function.
Figure 3
A comparison of SQID, Sequest and X!Tandem by plotting q-value (a measure of FDR) versus the number of identified peptide-spectrum matches for the PNNL dataset. (a) Singly charged peptides. (b) Doubly charged peptides. (c) Triply charged peptides. (d) A combination of all charge states.
Figure 4
A comparison of SQID, Sequest and X!Tandem by plotting q-value (a measure of FDR) versus the number of identified peptide-spectrum matches for the 18 protein mixture dataset. (a) Singly charged peptides. (b) Doubly charged peptides. (c) Triply charged peptides. (d) A combination of all charge states.
Figure 5
A comparison of SQID, Sequest and X!Tandem by plotting q-value (a measure of FDR) versus the number of identified peptide-spectrum matches for the yeast dataset. (a) Singly charged peptides. (b) Doubly charged peptides. (c) Triply charged peptides. (d) A combination of all charge states.
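
The q-value curves in Figures 2 to 5 can be reproduced in outline with a standard target-decoy calculation, sketched below. The page does not state how the q-values were obtained, so the target-decoy setup, the decoys/targets FDR estimate, and the function name `q_values` are assumptions for this example.

```python
def q_values(psms):
    """Compute q-values for target PSMs from a target-decoy search.

    psms: list of (score, is_decoy) tuples, higher score = better match.
    Returns (score, q_value) for target PSMs in descending score order.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)

    # FDR estimate at each score threshold: decoys / targets accepted so far.
    fdrs = []
    targets = decoys = 0
    for _score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: the minimum FDR at this threshold or any more permissive one.
    qs = []
    running_min = float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()

    return [(score, q) for (score, is_decoy), q in zip(ranked, qs)
            if not is_decoy]


# Toy input: four target PSMs and one decoy (placeholder scores).
psms = [(10, False), (9, False), (8, True), (7, False), (6, False)]
result = q_values(psms)
n_identified = sum(1 for _, q in result if q <= 0.01)
```

Counting the target PSMs with a q-value at or below each threshold gives one point on a curve like those plotted in the figures.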
Figure 6
Example spectra that are a) identified by SQID but missed by Sequest and X!Tandem (TKIPAVFK 2+), b) identified by Sequest and X!Tandem but missed by SQID (AAANFFSASCVPCADQSSFPK 2+).
Figure 7
A plot of Xcorr versus a) m+n (the number of matched peaks plus the number of consecutive ion pairs) and b) SQID score for 2571 peptide-spectrum matches extracted from the 18 protein mixture dataset. Every data point is scored by Sequest and SQID using the same experimental spectrum and the same peptide sequence. The blue spots are true identifications and the red spots are false identifications.
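
The quantity m+n in panel a) can be illustrated with a small counting sketch. The caption does not spell out the exact definitions, so the following are assumptions: m counts predicted b and y ions that have an observed peak within a tolerance, and n counts adjacent ions in the same series (for example b2 and b3) that are both matched. The function names and the tolerance are hypothetical.

```python
def matched_flags(theoretical, observed, tol=0.5):
    """Mark each predicted m/z as matched if any observed peak is within tol."""
    return [any(abs(t - o) <= tol for o in observed) for t in theoretical]


def m_and_n(b_ions, y_ions, observed, tol=0.5):
    """Return (m, n): matched-ion count and consecutive-pair count.

    b_ions and y_ions are lists of predicted m/z values in series order;
    observed is a list of observed peak m/z values.
    """
    m = n = 0
    for series in (b_ions, y_ions):
        flags = matched_flags(series, observed, tol)
        m += sum(flags)
        # A consecutive pair is two neighbouring ions that are both matched.
        n += sum(1 for left, right in zip(flags, flags[1:]) if left and right)
    return m, n


# Placeholder m/z values chosen only to exercise the counting logic.
m, n = m_and_n([100.0, 200.0, 300.0, 400.0],
               [150.0, 250.0, 350.0],
               [100.1, 200.0, 300.2, 350.0])
```

Here three b ions and one y ion are matched (m = 4), and the b series contributes two consecutive pairs (n = 2), so m+n is 6.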

