BMC Bioinformatics. 2010 Aug 23:11:436.
doi: 10.1186/1471-2105-11-436.

A dynamic noise level algorithm for spectral screening of peptide MS/MS spectra

Hua Xu et al. BMC Bioinformatics. 2010.

Abstract

Background: High-throughput shotgun proteomics data contain a significant number of spectra that come from non-peptide ions or whose quality is too poor to yield highly confident peptide identifications. Some database search programs return no peptide match for these spectra, while others return false positives. Removing these spectra can improve the database search results and lower computational expense.

Results: A new algorithm has been developed to filter tandem mass spectra of poor quality from shotgun proteomic experiments. The algorithm determines the noise level dynamically and independently for each spectrum in a tandem mass spectrometric data set. Spectra are filtered based on a minimum number of required signal peaks with a signal-to-noise ratio of 2. The algorithm was tested with 23 sample data sets containing 62,117 total spectra.
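The screening rule described above (estimate a noise level per spectrum, count the peaks that exceed it by the required signal-to-noise ratio, and keep the spectrum only if enough such peaks exist) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the DNL noise level is computed, so the sketch substitutes the median peak intensity as a stand-in noise estimate. The function names and the `min_signal_peaks` parameter are hypothetical, and no default is given for the minimum peak count because the abstract does not state one.

```python
from statistics import median


def estimate_noise_level(intensities):
    """Stand-in noise estimate: the median peak intensity of one spectrum.

    ASSUMPTION: this is NOT the paper's DNL estimator, which the abstract
    does not describe. It is only computed per spectrum, as DNL is.
    """
    return median(intensities)


def count_signal_peaks(intensities, snr_threshold=2.0):
    """Count peaks whose intensity is at least snr_threshold times the noise."""
    if not intensities:
        return 0
    noise = estimate_noise_level(intensities)
    # Multiply instead of dividing so a zero noise level cannot raise an error.
    return sum(1 for i in intensities if i > 0 and i >= snr_threshold * noise)


def passes_screening(intensities, min_signal_peaks, snr_threshold=2.0):
    """Keep a spectrum only if it has enough signal peaks.

    min_signal_peaks is a hypothetical parameter; choose it for your data.
    """
    return count_signal_peaks(intensities, snr_threshold) >= min_signal_peaks


# Hypothetical peak intensities, for illustration only.
blank_run = [10, 11, 9, 10, 12, 10, 11]
peptide = [10, 11, 9, 10, 12, 50, 80, 120, 10, 11, 200]

print(passes_screening(blank_run, min_signal_peaks=3))  # False
print(passes_screening(peptide, min_signal_peaks=3))    # True
```

Because the noise level is recomputed for every spectrum, no global intensity cutoff is needed; only the SNR threshold and the minimum peak count are shared across the data set.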

Conclusions: The spectral screening removed 89.0% of the tandem mass spectra that did not yield a peptide match when searched with the MassMatrix database search software. Only 6.0% of tandem mass spectra that yielded peptide matches considered to be true positive matches were lost after spectral screening. The algorithm was found to be very effective at removal of unidentified spectra in other database search programs including Mascot, OMSSA, and X!Tandem (75.93%-91.00%) with a small loss (3.59%-9.40%) of true positive matches.
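The two percentages reported above (unidentified spectra removed and true positive matches lost) can be computed from per-spectrum labels as in the sketch below. The data layout, a list of `(label, kept)` pairs with labels `"TP"`, `"FP"`, and `"unidentified"`, is an assumption made for illustration, and the example values are invented rather than taken from the paper.

```python
def percent(part, whole):
    """Return part/whole as a percentage, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def screening_rates(spectra):
    """Summarize the effect of screening.

    spectra: iterable of (label, kept) pairs, where label is "TP", "FP" or
    "unidentified" (taken from a database search) and kept is True if the
    spectrum survived screening. This layout is hypothetical.

    Returns (percent of unidentified spectra removed,
             percent of true positive spectra lost).
    """
    spectra = list(spectra)
    unidentified = [kept for label, kept in spectra if label == "unidentified"]
    true_pos = [kept for label, kept in spectra if label == "TP"]
    removed = sum(1 for kept in unidentified if not kept)
    lost = sum(1 for kept in true_pos if not kept)
    return percent(removed, len(unidentified)), percent(lost, len(true_pos))


# Invented example: 4 unidentified spectra (3 removed), 5 TPs (1 lost), 1 FP.
example = (
    [("unidentified", False)] * 3
    + [("unidentified", True)]
    + [("TP", True)] * 4
    + [("TP", False)]
    + [("FP", False)]
)
print(screening_rates(example))  # (75.0, 20.0)
```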

Figures

Figure 1
Simulated noise spectrum. (a) A simulated spectrum with 100 Gaussian noise peaks, and (b) the estimated signal-to-noise ratio (SNR) for all the noise peaks in it.
Figure 2
Example noise and peptide tandem mass spectra before and after noise reduction. An example tandem mass spectrum from a blank run (a) with 39 peaks before DNL noise reduction and (b) with 3 signal peaks after DNL noise reduction, and an example tandem mass spectrum for a peptide (c) with 279 peaks in it before DNL noise reduction and (d) with 46 signal peaks in it after DNL noise reduction.
Figure 3
ROC curves of the DNL spectral screening algorithm. ROC curves of the DNL spectral screening algorithm with different SNR settings for the spectra with all charges, singly charged spectra, and doubly/triply charged spectra.
Figure 4
Distributions of noise level determined by DNL algorithm. Distributions of noise level determined by DNL algorithm for the spectra with all charges, singly charged spectra, and doubly/triply charged spectra.
Figure 5
Distributions of number of signal peaks determined by DNL spectral screening algorithm. Distributions of number of signal peaks determined by DNL spectral screening algorithm for the spectra with TPs, FPs and the unidentified spectra. Peptide identifications were obtained from MassMatrix database search engine.
Figure 6
Effect of DNL spectral screening on MassMatrix search results. Histograms showing the effect of DNL spectral screening on tandem mass spectra for the spectra with TPs, FPs and the unidentified spectra. Peptide identifications were obtained from MassMatrix database search engine.
Figure 7
Effect of DNL spectral screening on Mascot, OMSSA, and X!Tandem results. Histograms showing the effect of DNL spectral screening on tandem mass spectra for the spectra with TPs, FPs and the unidentified spectra based on peptide identifications from Mascot, OMSSA, and X!Tandem.
Figure 8
Database search times of the data set before and after spectral screening. Database search times of the data set against a protein database containing both the bovine histone database (117 proteins) and a decoy reversed human database (96,997 proteins) before and after spectral screening for MassMatrix, Mascot, OMSSA, X!Tandem. All searches were performed on a PC with Intel quad core CPU (2.4 GHz) and Linux operating system.
