Review
Nat Commun. 2020 Oct 16;11(1):5301.
doi: 10.1038/s41467-020-19045-9.

A high-stringency blueprint of the human proteome

Subash Adhikari et al. Nat Commun.

Abstract

The Human Proteome Organization (HUPO) launched the Human Proteome Project (HPP) in 2010, creating an international framework for global collaboration, data sharing, quality assurance and enhancing accurate annotation of the genome-encoded proteome. During the subsequent decade, the HPP established collaborations, developed guidelines and metrics, and undertook reanalysis of previously deposited community data, continuously increasing the coverage of the human proteome. On the occasion of the HPP's tenth anniversary, we here report a 90.4% complete high-stringency human proteome blueprint. This knowledge is essential for discerning molecular processes in health and disease, as we demonstrate by highlighting potential roles the human proteome plays in our understanding, diagnosis and treatment of cancers, cardiovascular and infectious diseases.

Conflict of interest statement

S.R.P. is founder and chief scientific officer of Atturos, a clinical diagnostics company. M.K. is an employee of Bruker Daltonics, a manufacturer of MS systems. All other authors declare no competing interests.

Figures

Fig. 1. Structure of HUPO’s Human Proteome Project.
a The HPP matrix formed by creating two major initiatives (C-HPP and B/D-HPP). The initiatives and their teams are underpinned by 4 Resource Pillars (AB, MS, KB and pathology). b The HPP KB pipeline demonstrates how MS, AB and other biological data are collected, processed, re-analysed and presented annually for FAIR (see below) use by the scientific community. MS datasets are deposited, tagged with a PXD identifier, and stored by PX repositories (PRIDE, PeptideAtlas, MassIVE, Panorama, iProX, JPOST). Data selection, extraction and re-analysis by PeptideAtlas and MassIVE result in processed data that are transmitted to neXtProt. Subsequently, neXtProt annotates and curates other biological data (such as Sanger sequencing, protein–protein interaction and other structural/crystallographic data), which are aggregated, integrated and then disseminated to the community. The HUPO HPP KB uses reverse date versions (e.g., the latest 2020 neXtProt HPP reference release 17-01-2020).
Fig. 2. Completing >90% of the high-stringency human proteome.
a Annual neXtProt HPP evidence of protein existence (PE1,2,3,4,5) metrics from 2010 to 2020. These data demonstrate a strong and progressive increase in PE1 identifications across the decade (13,588 in 2011 to 17,874 in 2020), a correlative, equivalent decrease in PE2 (5,696 to 1,596), a post-2015 rise in PE3 coincident with revised guideline implementation (239 to 253) and a decrease in PE4 identifications (90 down to 50). The PMS Pantone colours used in the figure match those of all past annual neXtProt HPP KB reference PE1,2,3,4,5 data releases (PE1: light green, PE2: teal, PE3: yellow, PE4: orange and PE5: red). b Decadal Sankey diagram of changes in PE1,2,3,4,5 status of neXtProt entries between 2011 and 2020, where arrow widths are proportional to the number of decadal PE entries that change category. This Sankey diagram displays fluidity in PE status of neXtProt entries. PMS Pantone colours match those used for all past annual neXtProt HPP KB reference PE1,2,3,4,5 data releases https://www.nextprot.org/about/protein-existence (i.e., PE1: light green, PE2: teal, PE3: yellow, PE4: orange and PE5: cerise). All neXtProt protein entries that were deleted or newly introduced during the decade are represented in black, noting that 432 neXtProt entries were deleted and 676 introduced. Sankey analysis demonstrates that 2011 PE2 entries were the main (but not the exclusive) source of the additional 2020 PE1s. Year-by-year transition data can be found in metrics publications associated with annual (2013–2019) HPP special issues and refs therein, guided by high-stringency HPP Guidelines.
Fig. 3. HPP decadal impact.
a The top 15 neXtProt protein descriptor groups/families with the highest number of 2020 PE2,3,4 missing protein members (i.e., lacking high-stringency PE1 evidence of protein existence data; magenta, left) and the top 15 protein descriptor groups that have been upgraded to PE1 since 2011 (green, right). The data illustrate that the olfactory receptor (OR) family has the highest number of 2020 missing PE2,3,4 proteins (magenta bars) and the Zn finger protein family has the highest number of PE2,3,4 entries upgraded to PE1 since 2011. b Human chromosomal distribution of the OR and Zn finger neXtProt protein descriptor groups/families. These example data illustrate that the positioning of multiple OR (magenta vertical bars) or Zn finger (green vertical bars) protein-coding genes on certain chromosomes explains why Chr 11 appears more resistant and Chr 19 more susceptible to PE1 discovery over the decade.
Fig. 4. Progress in reducing the fraction of missing proteins for all human chromosomes.
a The percentage of missing proteins (PE2,3,4) relative to all protein-coding genes (PE1,2,3,4) plotted annually according to human Chrs 1–22, X and Y location from the first neXtProt release (23-08-2011) to the latest HPP reference release (17-01-2020). b The relative percentage (magenta bars) and absolute number (green dots) of all neXtProt PE2,3,4 missing protein entries specifically upgraded to PE1 since 2011 across Chrs 1–22, X and Y.
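The missing-protein percentage defined in panel a can be checked against the 2020 counts quoted in the Fig. 2a caption. The sketch below is an illustrative calculation only: the dictionary and function names are hypothetical, and it assumes the 90.4% figure in the abstract is PE1 as a share of PE1,2,3,4 entries (PE5 excluded, following the denominator given in panel a).

```python
# 2020 PE1-PE4 entry counts as quoted in the Fig. 2a caption.
# PE5 is left out because the Fig. 4a definition uses PE1,2,3,4 as the denominator.
PE_COUNTS_2020 = {"PE1": 17874, "PE2": 1596, "PE3": 253, "PE4": 50}


def missing_fraction(counts):
    """Return the share of PE2,3,4 entries among all PE1,2,3,4 entries."""
    total = sum(counts[level] for level in ("PE1", "PE2", "PE3", "PE4"))
    missing = total - counts["PE1"]
    return missing / total


missing = missing_fraction(PE_COUNTS_2020)
complete = 1 - missing  # assumed to correspond to the abstract's completeness figure

print(f"missing proteins: {missing:.1%}, PE1 coverage: {complete:.1%}")
# prints "missing proteins: 9.6%, PE1 coverage: 90.4%"
```

With these counts the denominator is 19,773 entries, of which 1,899 are PE2,3,4, which is consistent with the 90.4% stated in the abstract.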
Fig. 5. Assembly of the Human Proteome Reference Library (HPRL).
Data show cumulative PubMed search references emanating since HPP launch in 2010 up until 2019. a PubMed search for the terms HPP, C-HPP and B/D-HPP (including unabbreviated version). b PubMed community-at-large bibliometric impacts that parallel the research disciplines (e.g., ‘human’ AND ‘cancer proteomics’) addressed and undertaken by key B/D-HPP teams. All B/D-HPP PubMed bibliometric searches are listed as full searches and as hyperlinked current PubMed searches on the HUPO website at https://hupo.org/HPP-HPRL/. All NCBI PubMed filters and tools are fully accessible to users and searches can be selected and modified in a user-friendly manner, allowing decadal (from 2010 to 2019) and other bibliometric analyses to be undertaken routinely. c VOSviewer HPP collaborations analysis. All co-author geographical affiliations for PubMed publications emanating from Fig. 5a were transposed onto a world map.
