Proteomics. 2014 Nov;14(21-22):2389-99.
doi: 10.1002/pmic.201400080. Epub 2014 Sep 23.

A standardized framing for reporting protein identifications in mzIdentML 1.2

Sean L Seymour et al. Proteomics. 2014 Nov.

Abstract

Inferring which protein species have been detected in bottom-up proteomics experiments has been a challenging problem for which solutions have been maturing over the past decade. While many inference approaches now function well in isolation, comparing and reconciling the results generated across different tools remains difficult. It presently stands as one of the greatest barriers in collaborative efforts such as the Human Proteome Project and public repositories such as the PRoteomics IDEntifications (PRIDE) database. Here we present a framework for reporting protein identifications that seeks to improve capabilities for comparing results generated by different inference tools. This framework standardizes the terminology for describing protein identification results, associated with the HUPO-Proteomics Standards Initiative (PSI) mzIdentML standard, while still allowing for differing methodologies to reach that final state. It is proposed that developers of software for reporting identification results will adopt this terminology in their outputs. While the new terminology does not require any changes to the core mzIdentML model, it represents a significant change in practice, and, as such, the rules will be released via a new version of the mzIdentML specification (version 1.2) so that consumers of files are able to determine whether the new guidelines have been adopted by export software.

Keywords: Bioinformatics; Data standards; Protein identification; Proteomics Standards Initiative; Software; XML.

Conflict of interest statement

The authors have declared no conflict of interest.

Figures

Figure 1
Terminology defined by the iPRG2008 working group, for a protein accession (A), protein group (B) and protein cluster (C), with a multiple sequence alignment displaying the peptides shared between the different proteins.
Figure 2
A and B. Graphical representation of how the concepts defined in Figure 1 map onto an mzIdentML file, following the recommendations presented in this manuscript. Each ProteinDetectionHypothesis has references back to all peptide spectrum matches (PSMs) on which the protein identifications are based (not shown; consult [19] for more details).
Figure 3
A snippet of mzIdentML showing a ProteinAmbiguityGroup (PAG; lines 5038 to 5095) containing four ProteinDetectionHypothesis (PDH) elements (two minimised on lines 5068 and 5080). In this example, the first PDH (lines 5039-5055) has been flagged as both a “leading protein” and “group representative” (lines 5051 and 5052). The second PDH (lines 5056-5067) has been assigned as a “non-leading protein” (line 5066) and a “sequence sub-set protein” (line 5065). CV terms assigned at the PAG level are on lines 5092-5094, including the mandatory term “protein group passes threshold” (line 5092).
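
The structure described in the Figure 3 caption can be illustrated with a small, simplified sketch. The XML below is not the snippet from the figure: the IDs, the `MS:0000000` accession values, and the helper name `summarize_group` are placeholders chosen for illustration, and the mzIdentML namespace is omitted for brevity. Only the element names and the CV term names are taken from the caption.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free sketch of one protein ambiguity group (PAG).
# IDs and accession values are placeholders, not copied from the paper's figure.
SNIPPET = """
<ProteinAmbiguityGroup id="PAG_1">
  <ProteinDetectionHypothesis id="PDH_1" dBSequence_ref="protein_A" passThreshold="true">
    <cvParam cvRef="PSI-MS" accession="MS:0000000" name="leading protein"/>
    <cvParam cvRef="PSI-MS" accession="MS:0000000" name="group representative"/>
  </ProteinDetectionHypothesis>
  <ProteinDetectionHypothesis id="PDH_2" dBSequence_ref="protein_B" passThreshold="true">
    <cvParam cvRef="PSI-MS" accession="MS:0000000" name="sequence sub-set protein"/>
    <cvParam cvRef="PSI-MS" accession="MS:0000000" name="non-leading protein"/>
  </ProteinDetectionHypothesis>
  <cvParam cvRef="PSI-MS" accession="MS:0000000" name="protein group passes threshold" value="true"/>
</ProteinAmbiguityGroup>
"""


def summarize_group(xml_text):
    """Return (CV term names per PDH, CV terms attached to the PAG itself)."""
    group = ET.fromstring(xml_text)
    hypotheses = {}
    for pdh in group.findall("ProteinDetectionHypothesis"):
        # Only cvParam elements that are direct children of this PDH.
        hypotheses[pdh.get("id")] = [cv.get("name") for cv in pdh.findall("cvParam")]
    # cvParam elements that are direct children of the group (PAG level).
    group_terms = {cv.get("name"): cv.get("value") for cv in group.findall("cvParam")}
    return hypotheses, group_terms


if __name__ == "__main__":
    per_pdh, pag_terms = summarize_group(SNIPPET)
    for pdh_id, terms in per_pdh.items():
        print(pdh_id, terms)
    print("PAG-level terms:", pag_terms)
```

Real mzIdentML files declare an XML namespace and carry real PSI-MS accessions, so a reader adapting this sketch would need namespace-qualified tag names and would probably want to match on accession rather than on term name.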

References

    1. Smith LM, Kelleher NL. Proteoform: a single term describing protein complexity. Nat Meth. 2013;10:186–187.
    2. Koskinen VR, Emery PA, Creasy DM, Cottrell JS. Hierarchical Clustering of Shotgun Proteomics Data. Molecular & Cellular Proteomics. 2011;10
    3. Nesvizhskii AI, Keller A, Kolker E, Aebersold R. A Statistical Model for Identifying Proteins by Tandem Mass Spectrometry. Anal Chem. 2003;75:4646–4658.
    4. Searle BC. Scaffold: A bioinformatic tool for validating MS/MS-based proteomic studies. Proteomics. 2010;10:1265–1269.
    5. Slotta DJ, McFarland MA, Markey SP. MassSieve: Panning MS/MS peptide data for proteins. Proteomics. 2010;10:3035–3039.
