J Proteome Res. 2011 Jul 1;10(7):2896-904.
doi: 10.1021/pr200118r. Epub 2011 Apr 26.

ScanRanker: Quality assessment of tandem mass spectra via sequence tagging

Ze-Qiang Ma et al. J Proteome Res. 2011.

Abstract

In shotgun proteomics, protein identification by tandem mass spectrometry relies on bioinformatics tools. Despite recent improvements in identification algorithms, a significant number of high quality spectra remain unidentified for various reasons. Here we present ScanRanker, an open-source tool that evaluates the quality of tandem mass spectra via sequence tagging with reliable performance in data from different instruments. The superior performance of ScanRanker enables it not only to find unassigned high quality spectra that evade identification through database search but also to select spectra for de novo sequencing and cross-linking analysis. In addition, we demonstrate that the distribution of ScanRanker scores predicts the richness of identifiable spectra among multiple LC-MS/MS runs in an experiment, and ScanRanker scores assist the process of peptide assignment validation to increase confident spectrum identifications. The source code and executable versions of ScanRanker are available from http://fenchurch.mc.vanderbilt.edu.

Figures

Figure 1
Combining three subscores improves the discriminating power of ScanRanker. Tests on the “DLD1 LTQ” data set revealed different discrimination in ScanRanker’s subscores. The ROC curves display true positive rate (a.k.a. sensitivity) and false positive rate (a.k.a. 1-specificity) of ScanRanker’s subscores and the combined score. The AUC values show that combining three subscores yields better discrimination than using any single subscore.
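As a rough illustration of the comparison in this caption, the sketch below computes the area under the ROC curve (AUC) for individual subscores and for a combined score. Everything in it is an assumption made for illustration: the toy subscore values, the equal-weight sum used to combine them, and the function names `roc_auc` and `combine`. It does not reproduce ScanRanker's actual subscores or weighting.

```python
def roc_auc(scores, labels):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def combine(rows, weights):
    """Weighted sum of subscores per spectrum (the weights are hypothetical)."""
    return [sum(w * s for w, s in zip(weights, row)) for row in rows]


# Hypothetical toy data: three subscores per spectrum, 1 = identified spectrum.
rows = [(3, 1, 1), (1, 3, 1), (2, 2, 1), (2, 0, 1), (0, 2, 1), (1, 1, 1)]
labels = [1, 1, 1, 0, 0, 0]

auc_first = roc_auc([r[0] for r in rows], labels)
auc_third = roc_auc([r[2] for r in rows], labels)
auc_combined = roc_auc(combine(rows, (1.0, 1.0, 1.0)), labels)
```

In this toy data each subscore alone separates the two classes only partially, while their sum separates them completely, which is the kind of effect the ROC comparison is meant to expose.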
Figure 2
Evaluation of ScanRanker to recover unidentified high quality spectra. Three data sets were reanalyzed by additional search methods to find high quality spectra that were unidentified in initial database searches. Each test represents a typical reason that high quality spectra may be left unidentified in an initial search. (A) The “DLD1 LTQ” data set was initially identified by Sequest search. New identifications (IDs) were added by MyriMatch and X!Tandem searches. (B) The “Serum Orbi” data was searched by MyriMatch in either tryptic or semi-tryptic mode. (C) The “Histone Orbi” data was searched by MyriMatch. A subsequent TagRecon search was performed to identify spectra of mutated or modified peptides. These graphs plot the distributions of initial identifications, new identifications by additional searches and unidentified spectra in deciles by ScanRanker scores. In each panel, the left side represents spectra assigned high ScanRanker quality scores and the right side is low quality spectra. Newly identified spectra tend to associate with better ScanRanker scores in all data sets.
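The decile plots described in this caption can be approximated as follows. This is a minimal sketch, assuming each spectrum is a `(score, category)` pair and that a higher score means higher quality; the category names and the helper `decile_counts` are hypothetical and not part of ScanRanker.

```python
from collections import Counter


def decile_counts(spectra, n_bins=10):
    """Rank spectra by score (best first) and count categories per equal-size bin."""
    ranked = sorted(spectra, key=lambda item: item[0], reverse=True)
    n = len(ranked)
    bins = [Counter() for _ in range(n_bins)]
    for rank, (_, category) in enumerate(ranked):
        # Integer arithmetic maps rank 0..n-1 onto bin 0..n_bins-1.
        bins[rank * n_bins // n][category] += 1
    return bins


# Hypothetical toy data: 20 spectra with scores 20 down to 1.
spectra = [
    (20 - i, "initial" if i < 6 else "new" if i < 10 else "unidentified")
    for i in range(20)
]
bins = decile_counts(spectra)
```

Here `bins[0]` holds the counts for the highest-scoring decile and `bins[9]` the lowest, matching the left-to-right layout described in the caption.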
Figure 3
Comparison of ScanRanker to QualScore. Spectra in three data sets were separately processed by ScanRanker and QualScore to generate quality scores. ScanRanker performs as well as QualScore in all data sets but does not require Sequest/PeptideProphet analysis for spectral quality assessment.
Figure 4
ScanRanker scores predict the richness of identifiable spectra. Each point in the figure represents a single LC-MS/MS run and the dotted lines show the least squares fit of the data. Three ScanRanker thresholds were used to count retained spectra. Only 9 of the 10 LC-MS/MS runs in the MudPIT data set are plotted because the first fraction of the MudPIT experiment generated only 21 spectrum identifications. Each LC-MS/MS run in all three data sets includes about 10,000 MS/MS spectra, while the number of identified spectra varies dramatically. The number of spectra assigned high ScanRanker scores correlates with the number of identified spectra, providing a relative quality assessment of LC-MS/MS runs in an experiment.
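The per-run analysis in this caption (count the spectra retained at a score threshold, then fit a line against the number of identified spectra) can be sketched as below. The threshold value, the toy scores and identification counts, and the helper names `count_above` and `least_squares_line` are assumptions for illustration only; the thresholds used in the paper are not given here.

```python
def count_above(scores, threshold):
    """Number of spectra in one run whose score is at or above the threshold."""
    return sum(1 for s in scores if s >= threshold)


def least_squares_line(xs, ys):
    """Ordinary least squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical toy data: per-run quality scores and identified-spectrum counts.
runs = [[0.9, 0.1, 0.2], [0.9, 0.8, 0.1], [0.9, 0.8, 0.7]]
identified = [2, 4, 6]

retained = [count_above(run, threshold=0.5) for run in runs]
slope, intercept = least_squares_line(retained, identified)
```

A run whose retained count falls well below the others would then be flagged as relatively poor before any database search is performed.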
Figure 5
Adding ScanRanker scores in peptide validation increases the number of confident spectrum identifications. The “DLD1 LTQ” data set was searched separately by Mascot, Sequest, and X!Tandem. ScanRanker scores were added to pepXML files to allow score combination in IDPicker. Mascot scores were combined using either static weights as “IonScore-IdentityScore” or optimized weights as “IonScore + ScanRanker”. Sequest and X!Tandem results were combined by enabling score weight optimization in IDPicker. The Venn diagrams show the percent overlap of identified spectra when using either a single score or a combination of two scores. The latter method yielded more spectrum identifications for all searches.
Figure 6
ScanRanker scores can be used to predict de novo sequencing success. Spectra in three data sets were separately processed by ScanRanker and PepNovo. Identifications were generated by searching the spectra using MyriMatch. For clarity, only 1000 spectra were randomly sampled and displayed. When PepNovo reported no peptide for a spectrum, it was visualized as matching the minimum score reported by the software for that data set. Panel C highlights five published key spectra from the Asara group publication. In all three tests, spectra with high ScanRanker scores tend to be assigned high PepNovo scores, implying that ScanRanker can be used to select high quality spectra for de novo sequencing.
Figure 7
ScanRanker helps to prioritize spectra for manual inspection in cross-linking analysis. The “Crosslink Orbi” data set was processed using Protein Prospector to identify crosslinked and non-crosslinked spectra. The figure plots the distribution of these spectra in deciles by ScanRanker scores. The identified spectra, either crosslinked or non-crosslinked, were associated with high ScanRanker scores, implying that ScanRanker can be used to facilitate cross-linking analysis by ranking spectra for manual inspection.

