Mol Syst Biol. 2005;1:2005.0017.
doi: 10.1038/msb4100024. Epub 2005 Aug 2.

A uniform proteomics MS/MS analysis platform utilizing open XML file formats

Andrew Keller et al. Mol Syst Biol. 2005.

Abstract

The analysis of tandem mass spectrometry (MS/MS) data to identify and quantify proteins is hampered by the heterogeneity of file formats at the raw spectral data, peptide identification, and protein identification levels. Different mass spectrometers output their raw spectral data in a variety of proprietary formats, and the alternative methods that assign peptides to MS/MS spectra and infer protein identifications from those peptide assignments each write their results in a different format. Here we describe an MS/MS analysis platform, the Trans-Proteomic Pipeline, which uses open XML file formats to store data at the raw spectral data, peptide, and protein levels. This platform enables uniform analysis and exchange of MS/MS data generated by a variety of instruments and assigned peptides by a variety of database search programs. We demonstrate this by applying the pipeline to data sets generated by ThermoFinnigan LCQ, ABI 4700 MALDI-TOF/TOF, and Waters Q-TOF instruments, and searched in turn using SEQUEST, Mascot, and COMET.

Figures

Figure 1
Trans-Proteomic Pipeline using open XML file formats at three steps: (1) raw spectral data generated by different mass spectrometers; (2) peptide assignments using different search engines; and (3) protein identifications using different methods of inference. Asterisk indicates that PeptideProphet must be specialized for each search engine.
Figure 2
Trans-Proteomic Pipeline analysis of LC-MS/MS data sets. (A) Accuracy of PeptideProphet-computed peptide probabilities for HaloICAT LCQ data set in sliding window of 50 search results. (B) Numbers of search results for HaloICAT LCQ data set filtered at a minimum PeptideProphet probability to achieve a predicted 2.5% error rate. The inset shows the numbers using Mascot results with probabilities adjusted by SearchCombiner to take into account the results of SEQUEST and COMET applied to the same data set. (C) Numbers of ProteinProphet identifications for HaloICAT LCQ data set filtered at a minimum ProteinProphet probability to achieve a predicted 2.5% error rate. Each asterisk indicates an incorrect protein identification. (D) Numbers of ProteinProphet identifications for Serum MALDI-TOF/TOF data set filtered at a minimum ProteinProphet probability to achieve a predicted 2.5% error rate. (E) Numbers of ProteinProphet identifications for Yeast Q-TOF data set filtered at a minimum ProteinProphet probability to achieve a predicted 2.5% error rate.
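
The Figure 2 caption refers to two computations that a short sketch may make concrete: filtering at a minimum probability to reach a predicted 2.5% error rate (panels B to E), and checking probability accuracy in a sliding window of 50 results (panel A). The Python below is illustrative only and is not the pipeline's code. It assumes that the predicted error rate of an accepted set is the mean of (1 - p) over that set, and that accuracy is judged by comparing each window's mean computed probability with its observed fraction of correct results. All function names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def predicted_error_rate(probabilities: Sequence[float], min_prob: float) -> Optional[float]:
    """Predicted fraction of incorrect results among those with p >= min_prob.

    Assumption: each probability is the chance the result is correct, so the
    expected number of incorrect results in a set is the sum of (1 - p).
    """
    kept = [p for p in probabilities if p >= min_prob]
    if not kept:
        return None
    return sum(1.0 - p for p in kept) / len(kept)


def min_probability_for_error_rate(
    probabilities: Sequence[float], max_error: float = 0.025
) -> Optional[Tuple[float, int]]:
    """Loosest threshold whose predicted error rate stays within max_error.

    Returns (threshold, number_of_results_kept), or None if no threshold works.
    """
    ranked = sorted(probabilities, reverse=True)
    best = None
    expected_wrong = 0.0
    for i, p in enumerate(ranked, start=1):
        expected_wrong += 1.0 - p
        # A threshold keeps every tied result, so evaluate only at the end of a tie.
        if i < len(ranked) and ranked[i] == p:
            continue
        if expected_wrong / i <= max_error:
            best = (p, i)
    return best


def sliding_window_accuracy(
    results: Sequence[Tuple[float, bool]], window: int = 50
) -> List[Tuple[float, float]]:
    """For each window of consecutive results (sorted by probability), return
    (mean computed probability, observed fraction correct)."""
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    points = []
    for start in range(len(ranked) - window + 1):
        chunk = ranked[start:start + window]
        mean_prob = sum(p for p, _ in chunk) / window
        frac_correct = sum(1 for _, ok in chunk if ok) / window
        points.append((mean_prob, frac_correct))
    return points
```

If the computed probabilities are accurate, the two values in each window should be close to each other, and the threshold returned by `min_probability_for_error_rate` should yield an observed error rate near the requested one on data with known answers.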
