J Proteome Res. 2019 Dec 6;18(12):4108-4116.
doi: 10.1021/acs.jproteome.9b00542. Epub 2019 Oct 21.

Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 3.0

Eric W Deutsch et al. J Proteome Res.

Abstract

The Human Proteome Organization's (HUPO) Human Proteome Project (HPP) developed Mass Spectrometry (MS) Data Interpretation Guidelines that have been applied since 2016. These guidelines have helped ensure that the emerging draft of the complete human proteome is highly accurate, with few false-positive protein identifications. Here, we describe an update to these guidelines based on consensus-reaching discussions with the wider HPP community over the past year. The revised 3.0 guidelines address several major and minor gaps that were identified. We have added guidelines for emerging data-independent acquisition (DIA) MS workflows and for use of the new Universal Spectrum Identifier (USI) system being developed by the HUPO Proteomics Standards Initiative (PSI). In addition, we discuss updates to the standard HPP pipeline for collecting MS evidence for all proteins in the HPP, including refinements to the minimum evidence required. We present a new plan for incorporating MassIVE-KB into the HPP pipeline for the next (HPP 2020) cycle in order to obtain more comprehensive coverage of public MS data sets. The main checklist has been reorganized under headings and subitems, and related guidelines have been grouped. In sum, Version 2.1 of the HPP MS Data Interpretation Guidelines has served well, and this timely update to version 3.0 will aid the HPP as it approaches its goal of collecting and curating MS evidence of translation and expression for all of the ∼20 000 predicted proteins encoded by the human genome.

Keywords: B/D-HPP; C-HPP; HPP; Human Proteome Project; Universal Spectrum Identifier (USI); false-discovery rates; guidelines; mass spectrometry; standards; unicity checker.

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1.
Overview of the 2019 HPP pipeline for data integration. HPP investigators publish their results constrained by the HPP guidelines. The data sets from these publications, as well as other data sets from the community, flow into the ProteomeXchange repositories. Currently, a subset of the data sets from PRIDE, MassIVE, and JPOST is reprocessed by PeptideAtlas, and the results are transferred to neXtProt, constrained by the HPP guidelines. Information from PeptideAtlas, neXtProt, and Human Protein Atlas is summarized yearly in the HPP Metrics summary (this issue). Data from the Human Protein Atlas are also transferred to and reprocessed by neXtProt as part of the HPP data cycle, although they are not yet used to change PE status.
Figure 2.
Depiction of the current status of Q8N688 Beta-defensin 123. The protein is only 47 AAs long after cleavage of the 20 AA signal peptide. Three distinct peptide sequences are detected (plus a fully nested peptide), but only one of the three meets guideline length requirements. Yet, all of the expected tryptic peptides (plus one missed cleavage product) are detected with excellent spectra. Should this be sufficient?
Figure 3.
Depiction of the current status of C9JFL3, currently annotated as “Proline, histidine and glycine-rich protein 1”. The protein is 83 amino acids long, but produces no useful fully tryptic peptides—only one that is too short and one that is too long. Yet, due to its high abundance in some samples, many miscleaved peptides are detected, easily providing the minimum evidence. The red bars indicate well detected peptides in PeptideAtlas. Multiple semitryptic peptides originate from the only cleavage site after the sixth amino acid. Multiple nontryptic peptides originate from the C terminus.
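The situation described in the Figure 3 caption, a short protein whose fully tryptic peptides are all too short or too long, can be illustrated with a minimal in-silico digestion sketch. It assumes the common rule that trypsin cleaves after K or R unless the next residue is P. The function names, the toy sequence, and the length limits `min_len` and `max_len` are hypothetical placeholders chosen for illustration; they are not the C9JFL3 sequence and not the thresholds stated in the HPP guidelines.

```python
def tryptic_digest(sequence, missed_cleavages=0):
    """Return fully tryptic peptides of `sequence`.

    Assumed rule: cleave after K or R, except when the next residue is P.
    Peptides spanning up to `missed_cleavages` uncleaved sites are included.
    """
    pieces, start = [], 0
    for i, aa in enumerate(sequence):
        next_aa = sequence[i + 1] if i + 1 < len(sequence) else ""
        if aa in "KR" and next_aa != "P":
            pieces.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        pieces.append(sequence[start:])  # C-terminal remainder

    peptides = []
    for i in range(len(pieces)):
        last = min(i + missed_cleavages, len(pieces) - 1)
        for j in range(i, last + 1):
            peptides.append("".join(pieces[i:j + 1]))
    return peptides


def usable_peptides(peptides, min_len, max_len):
    """Keep peptides whose length lies within the (hypothetical) limits."""
    return [p for p in peptides if min_len <= len(p) <= max_len]


# Toy sequence (hypothetical): a single cleavage site after the sixth residue.
TOY = "MPHGSK" + "G" * 40

peptides = tryptic_digest(TOY)
# Placeholder limits, not the guideline values.
kept = usable_peptides(peptides, min_len=7, max_len=30)
print([len(p) for p in peptides], kept)
```

With these placeholder limits, the toy protein yields one peptide that is too short and one that is too long, so no fully tryptic peptide survives the filter; evidence for such a protein would have to come from semitryptic or nontryptic peptides, as the caption notes.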
