Mol Cell Proteomics. 2012 Jul;11(7):M111.014381.
doi: 10.1074/mcp.M111.014381. Epub 2012 Feb 27.

The mzIdentML data standard for mass spectrometry-based proteomics results

Andrew R Jones et al. Mol Cell Proteomics. 2012 Jul.

Abstract

We report the release of mzIdentML, an exchange standard for peptide and protein identification data, developed by the Proteomics Standards Initiative in collaboration with instrument and software vendors and the developers of the major open-source projects in proteomics. Software implementations have been developed to enable conversion from most popular proprietary and open-source formats, and mzIdentML will soon be supported by the major public repositories. These developments enable proteomics scientists to start using the standard to exchange and publish data sets in support of publications, and they provide a stable platform for bioinformatics groups and commercial software vendors to work with a single file format for identification data.

Figures

Fig. 1.
The overall structure of a typical mzIdentML file. Each file must contain one or more instances of SpectrumIdentificationList (the set of peptide identifications made by a search) and may contain at most one ProteinDetectionList (the set of protein identities inferred from the peptide identifications).
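As a rough illustration of the cardinality rules in Fig. 1, the following Python sketch (standard library only) checks a file for at least one SpectrumIdentificationList and at most one ProteinDetectionList. It assumes, following the mzIdentML 1.1 layout, that both lists sit directly under a single AnalysisData element. The function name `check_analysis_data` and the trimmed XML fragments are hypothetical and not schema-valid (most required content is omitted), and a check like this is no substitute for validating against the official XML schema.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def check_analysis_data(xml_text: str) -> list[str]:
    """Report violations of the Fig. 1 cardinality rules (empty list = OK)."""
    root = ET.fromstring(xml_text)
    # Assumption: the identification lists live under one AnalysisData element.
    blocks = [e for e in root.iter() if local_name(e.tag) == "AnalysisData"]
    if len(blocks) != 1:
        return [f"expected one AnalysisData element, found {len(blocks)}"]
    names = [local_name(child.tag) for child in blocks[0]]
    problems = []
    if names.count("SpectrumIdentificationList") < 1:
        problems.append("no SpectrumIdentificationList (one or more required)")
    if names.count("ProteinDetectionList") > 1:
        problems.append("more than one ProteinDetectionList (zero or one allowed)")
    return problems


# Hypothetical, trimmed fragments (not schema-valid: most required content omitted).
NS = "http://psidev.info/psi/pi/mzIdentML/1.1"
VALID = f"""<MzIdentML xmlns="{NS}"><DataCollection><AnalysisData>
  <SpectrumIdentificationList id="SIL_1"/>
  <SpectrumIdentificationList id="SIL_2"/>
  <ProteinDetectionList id="PDL_1"/>
</AnalysisData></DataCollection></MzIdentML>"""
INVALID = f"""<MzIdentML xmlns="{NS}"><DataCollection><AnalysisData>
  <ProteinDetectionList id="PDL_1"/>
  <ProteinDetectionList id="PDL_2"/>
</AnalysisData></DataCollection></MzIdentML>"""

if __name__ == "__main__":
    print(check_analysis_data(VALID))    # []
    print(check_analysis_data(INVALID))  # two problems reported
```

Matching on local names rather than fully qualified tags keeps the sketch indifferent to which namespace URI a file declares.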
Fig. 2.
Peptide identification from MS/MS represented in mzIdentML: (i) DBSequence stores database entries, such as complete protein sequences and the accessions used to retrieve them from external databases; (ii) Peptide holds the individual peptide sequences and modifications that have been identified; (iii) PeptideEvidence instances map a peptide sequence to every protein sequence from which it could have arisen; (iv) the association between a SpectrumIdentificationItem and its PeptideEvidence instances is the core result of a single peptide-spectrum match (PSM); and (v) SpectrumIdentificationResult captures all ranked identifications (SpectrumIdentificationItem) made from one spectrum and maps them back to the source spectrum in an external format, such as mzML. Note that the representation of some attributes and elements has been shortened to simplify the figure; for example, scores and metrics are represented in mzIdentML using controlled vocabulary (CV) terms, which makes the schema flexible and extensible.
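The reference chain in Fig. 2 can be followed with a short Python sketch: index DBSequence, Peptide and PeptideEvidence by id, then resolve each SpectrumIdentificationItem into one PSM row. The attribute names used (`peptide_ref`, `peptideEvidence_ref`, `dBSequence_ref`, `spectrumID`, `rank`) are assumed from the mzIdentML 1.1 schema; the helper names, accessions, sequence and the heavily trimmed XML fragment are hypothetical, and scores (CV terms) are ignored.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed fragment: one spectrum, one peptide that
# maps to two protein entries. Not schema-valid (required elements omitted).
EXAMPLE = """<MzIdentML xmlns="http://psidev.info/psi/pi/mzIdentML/1.1">
  <SequenceCollection>
    <DBSequence id="DBSeq_1" accession="PROT_A"/>
    <DBSequence id="DBSeq_2" accession="PROT_B"/>
    <Peptide id="Pep_1"><PeptideSequence>PEPTIDEK</PeptideSequence></Peptide>
    <PeptideEvidence id="PE_1" peptide_ref="Pep_1" dBSequence_ref="DBSeq_1"/>
    <PeptideEvidence id="PE_2" peptide_ref="Pep_1" dBSequence_ref="DBSeq_2"/>
  </SequenceCollection>
  <DataCollection>
    <AnalysisData>
      <SpectrumIdentificationList id="SIL_1">
        <SpectrumIdentificationResult id="SIR_1" spectrumID="index=0">
          <SpectrumIdentificationItem id="SII_1" rank="1" peptide_ref="Pep_1">
            <PeptideEvidenceRef peptideEvidence_ref="PE_1"/>
            <PeptideEvidenceRef peptideEvidence_ref="PE_2"/>
          </SpectrumIdentificationItem>
        </SpectrumIdentificationResult>
      </SpectrumIdentificationList>
    </AnalysisData>
  </DataCollection>
</MzIdentML>"""


def local_name(tag: str) -> str:
    """Drop the '{namespace}' prefix that ElementTree adds to tag names."""
    return tag.rsplit("}", 1)[-1]


def find_all(root: ET.Element, name: str) -> list[ET.Element]:
    """All descendants (and root) whose local tag name equals `name`."""
    return [e for e in root.iter() if local_name(e.tag) == name]


def children(elem: ET.Element, name: str) -> list[ET.Element]:
    """Direct children whose local tag name equals `name`."""
    return [c for c in elem if local_name(c.tag) == name]


def extract_psms(xml_text: str) -> list[tuple[str, int, str, list[str]]]:
    """Return (spectrumID, rank, peptide sequence, protein accessions) per PSM."""
    root = ET.fromstring(xml_text)
    # (i)-(iii): index the sequence collection by id so *_ref attributes resolve.
    accession = {e.get("id"): e.get("accession") for e in find_all(root, "DBSequence")}
    sequence = {
        e.get("id"): children(e, "PeptideSequence")[0].text
        for e in find_all(root, "Peptide")
    }
    evidence = {e.get("id"): e for e in find_all(root, "PeptideEvidence")}

    psms = []
    # (v): one SpectrumIdentificationResult per spectrum, holding ranked items.
    for sir in find_all(root, "SpectrumIdentificationResult"):
        for sii in children(sir, "SpectrumIdentificationItem"):
            # (iv): each PeptideEvidenceRef links the PSM to one protein entry.
            proteins = sorted(
                accession[evidence[ref.get("peptideEvidence_ref")].get("dBSequence_ref")]
                for ref in children(sii, "PeptideEvidenceRef")
            )
            psms.append((sir.get("spectrumID"), int(sii.get("rank")),
                         sequence[sii.get("peptide_ref")], proteins))
    return psms


if __name__ == "__main__":
    print(extract_psms(EXAMPLE))
    # [('index=0', 1, 'PEPTIDEK', ['PROT_A', 'PROT_B'])]
```

Because the peptide is stored once and linked to two PeptideEvidence instances, the single PSM row carries both accessions; this peptide-to-protein ambiguity is what the protein-level model in Fig. 3 has to resolve.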
Fig. 3.
Protein identifications represented in mzIdentML. If the same set of peptide sequences provides supporting evidence for more than one protein, those proteins appear within a ProteinAmbiguityGroup. (i) Each ProteinDetectionHypothesis contains references back to the PeptideEvidence instances on which it is based (onward references to Peptide are not shown). (ii) Each ProteinDetectionHypothesis is associated with all SpectrumIdentificationItem elements that were used for protein inference. (iii) Each ProteinDetectionHypothesis references the protein sequence (DBSequence) that has been identified.
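The grouping rule in the first sentence of the Fig. 3 caption can be sketched in Python as bucketing proteins by their exact set of supporting peptide sequences. This is only the simplest reading of that sentence, not an algorithm taken from the standard: mzIdentML records the groups that a search engine or inference tool reports, and real tools often apply more elaborate rules (for example, for proteins whose peptides are a subset of another protein's). The function name, accessions and sequences below are hypothetical.

```python
from collections import defaultdict


def group_by_peptide_set(protein_to_peptides: dict[str, set[str]]) -> list[list[str]]:
    """Group proteins whose supporting peptide-sequence sets are identical.

    Each returned list is one candidate ProteinAmbiguityGroup; every member
    would be reported as a ProteinDetectionHypothesis within it.
    """
    groups: dict[frozenset[str], list[str]] = defaultdict(list)
    for protein, peptides in protein_to_peptides.items():
        # A frozenset is hashable, so identical peptide sets share one key
        # regardless of the order in which the peptides were listed.
        groups[frozenset(peptides)].append(protein)
    # Sort members, then groups, so the output is deterministic.
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    evidence = {
        "PROT_A": {"PEPTIDEK", "SAMPLER"},
        "PROT_B": {"SAMPLER", "PEPTIDEK"},
        "PROT_C": {"PEPTIDEK"},
    }
    print(group_by_peptide_set(evidence))
    # [['PROT_A', 'PROT_B'], ['PROT_C']]
```

Here PROT_C ends up alone even though its only peptide also supports PROT_A and PROT_B; deciding how to report such subset proteins is exactly where inference tools differ.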
