Review

Nat Commun. 2020 Oct 16;11(1):5301.

doi: 10.1038/s41467-020-19045-9.

A high-stringency blueprint of the human proteome

Subash Adhikari¹, Edouard C Nice¹,², Eric W Deutsch³, Lydie Lane⁴, Gilbert S Omenn⁵, Stephen R Pennington⁶, Young-Ki Paik⁷, Christopher M Overall⁸, Fernando J Corrales⁹, Ileana M Cristea¹⁰, Jennifer E Van Eyk¹¹, Mathias Uhlén¹², Cecilia Lindskog¹³, Daniel W Chan¹⁴, Amos Bairoch⁴, James C Waddington⁶, Joshua L Justice¹⁰, Joshua LaBaer¹⁵, Henry Rodriguez¹⁶, Fuchu He¹⁷, Markus Kostrzewa¹⁸, Peipei Ping¹⁹, Rebekah L Gundry²⁰, Peter Stewart²¹, Sanjeeva Srivastava²², Sudhir Srivastava²³, Fabio C S Nogueira²⁴, Gilberto B Domont²⁴, Yves Vandenbrouck²⁵, Maggie P Y Lam²⁶,²⁷, Sara Wennersten²⁸, Juan Antonio Vizcaino²⁹, Marc Wilkins³⁰, Jochen M Schwenk¹², Emma Lundberg¹², Nuno Bandeira³¹, Gyorgy Marko-Varga³², Susan T Weintraub³³, Charles Pineau³⁴, Ulrike Kusebauch³, Robert L Moritz³, Seong Beom Ahn¹, Magnus Palmblad³⁵, Michael P Snyder³⁶, Ruedi Aebersold³,³⁷, Mark S Baker³⁸,³⁹

Affiliations

¹ Faculty of Medicine, Health and Human Sciences, Department of Biomedical Sciences, Macquarie University, North Ryde, NSW, 2109, Australia.
² Faculty of Medicine, Nursing and Health Sciences, Department of Biochemistry and Molecular Biology, Monash University, Melbourne, VIC, 3800, Australia.
³ Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA, 98109, USA.
⁴ Faculty of Medicine, SIB-Swiss Institute of Bioinformatics and Department of Microbiology and Molecular Medicine, University of Geneva, CMU, Michel-Servet 1, 1211, Geneva, Switzerland.
⁵ Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, 48109-2218, USA.
⁶ UCD Conway Institute of Biomolecular and Biomedical Research, School of Medicine, University College Dublin, Dublin, Ireland.
⁷ Yonsei Proteome Research Center, 50 Yonsei-ro, Sudaemoon-ku, Seoul, 120-749, South Korea.
⁸ Faculty of Dentistry, University of British Columbia, Vancouver, BC, Canada.
⁹ Functional Proteomics Laboratory, Centro Nacional de Biotecnología-CSIC, Proteored-ISCIII, 28049, Madrid, Spain.
¹⁰ Department of Molecular Biology, Princeton University, Princeton, NJ, 08544, USA.
¹¹ Cedars Sinai Medical Center, Advanced Clinical Biosystems Research Institute, The Smidt Heart Institute, Los Angeles, CA, 90048, USA.
¹² Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH Royal Institute of Technology, 17121, Solna, Sweden.
¹³ Rudbeck Laboratory, Department of Immunology, Genetics and Pathology, Uppsala University, 75185, Uppsala, Sweden.
¹⁴ Department of Pathology and Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, 21224, USA.
¹⁵ Biodesign Institute, Arizona State University, Tempe, AZ, USA.
¹⁶ Office of Cancer Clinical Proteomics Research, National Cancer Institute, NIH, Bethesda, MD, 20892, USA.
¹⁷ State Key Laboratory of Proteomics, Beijing Proteome Research Center, National Center for Protein Sciences (Beijing), Beijing Institute of Lifeomics, Beijing, 102206, China.
¹⁸ Bruker Daltonik GmbH, Microbiology and Diagnostics, Fahrenheitstrasse 4, 28359, Bremen, Germany.
¹⁹ Cardiac Proteomics and Signaling Laboratory, Department of Physiology, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA, USA.
²⁰ CardiOmics Program, Center for Heart and Vascular Research, Division of Cardiovascular Medicine and Department of Cellular and Integrative Physiology, University of Nebraska Medical Center, Omaha, NE, 68198, USA.
²¹ Department of Chemical Pathology, Royal Prince Alfred Hospital, Camperdown, NSW, Australia.
²² Indian Institute of Technology Bombay, Powai, Mumbai, 400076, India.
²³ Cancer Biomarkers Research Branch, National Cancer Institute, National Institutes of Health, 9609 Medical Center Drive, Suite 5E136, Rockville, MD, 20852, USA.
²⁴ Proteomics Unit and Laboratory of Proteomics, Institute of Chemistry, Federal University of Rio de Janeiro, Av Athos da Silveira Ramos, 149, 21941-909, Rio de Janeiro, RJ, Brazil.
²⁵ University of Grenoble Alpes, Inserm, CEA, IRIG-BGE, U1038, 38000, Grenoble, France.
²⁶ Departments of Medicine-Cardiology and Biochemistry and Molecular Genetics, University of Colorado, Anschutz Medical Campus, Aurora, CO, USA.
²⁷ Consortium for Fibrosis Research and Translation, University of Colorado, Anschutz Medical Campus, Aurora, CO, USA.
²⁸ Division of Cardiology, Department of Medicine, University of Colorado, Anschutz Medical Campus, Aurora, CO, USA.
²⁹ European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SD, UK.
³⁰ School of Biotechnology and Biomolecular Sciences, University of New South Wales, Sydney, NSW, Australia.
³¹ Department of Computer Science and Engineering, University of California, San Diego, 9500 Gilman Drive, Mail Code 0404, La Jolla, CA, 92093-0404, USA.
³² Department of Biomedical Engineering, Lund University, Lund, Sweden.
³³ Department of Biochemistry and Structural Biology, University of Texas Health Science Center San Antonio, UT Health, 7703 Floyd Curl Drive, San Antonio, TX, 78229-3900, USA.
³⁴ University of Rennes, Inserm, EHESP, IREST, UMR_S 1085, F-35042, Rennes, France.
³⁵ Leiden University Medical Center, Leiden, 2333, The Netherlands.
³⁶ Department of Genetics, Stanford School of Medicine, Stanford, CA, 94305, USA.
³⁷ Faculty of Science, University of Zurich, Zurich, Switzerland.
³⁸ Faculty of Medicine, Health and Human Sciences, Department of Biomedical Sciences, Macquarie University, North Ryde, NSW, 2109, Australia. mark.baker@mq.edu.au.
³⁹ Department of Genetics, Stanford School of Medicine, Stanford, CA, 94305, USA. mark.baker@mq.edu.au.

PMID: 33067450
PMCID: PMC7568584
DOI: 10.1038/s41467-020-19045-9

Subash Adhikari et al. Nat Commun. 2020.

Abstract

The Human Proteome Organization (HUPO) launched the Human Proteome Project (HPP) in 2010, creating an international framework for global collaboration, data sharing, quality assurance and accurate annotation of the genome-encoded proteome. During the subsequent decade, the HPP established collaborations, developed guidelines and metrics, and reanalysed previously deposited community data, continuously increasing coverage of the human proteome. On the occasion of the HPP's tenth anniversary, we report here a 90.4% complete, high-stringency blueprint of the human proteome. This knowledge is essential for discerning molecular processes in health and disease, as we demonstrate by highlighting the potential roles of the human proteome in the understanding, diagnosis and treatment of cancers and of cardiovascular and infectious diseases.
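The 90.4% figure can be checked against the 2020 neXtProt counts given in the Fig. 2 legend (PE1 17,874; PE2 1,596; PE3 253; PE4 50). A minimal sketch, assuming, as in the Fig. 4 legend, that the denominator is all PE1–PE4 protein-coding entries and that PE5 (uncertain) entries are excluded; the function name is illustrative only.

```python
# 2020 neXtProt HPP reference release (17-01-2020) counts, taken from the Fig. 2 legend.
pe_counts_2020 = {"PE1": 17874, "PE2": 1596, "PE3": 253, "PE4": 50}


def proteome_completeness(counts: dict[str, int]) -> float:
    """Fraction of PE1-PE4 entries that have high-stringency PE1 evidence.

    Assumption: PE5 (uncertain) entries are left out of the denominator.
    """
    total = sum(counts[level] for level in ("PE1", "PE2", "PE3", "PE4"))
    return counts["PE1"] / total


completeness = proteome_completeness(pe_counts_2020)
missing = 1 - completeness  # share of PE2,3,4 "missing proteins"

print(f"PE1 coverage: {completeness:.1%}")
print(f"missing (PE2-4): {missing:.1%}")
```

With these counts the denominator is 19,773 entries, so 17,874 PE1 entries correspond to the reported 90.4%, leaving roughly 9.6% as missing proteins.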


Conflict of interest statement

S.R.P. is founder and chief scientific officer of Atturos, a clinical diagnostics company. M.K. is an employee of Bruker Daltonics, a manufacturer of MS systems. All other authors declare no competing interests.

Figures

**Fig. 1. Structure of HUPO’s Human Proteome Project.**
a The HPP matrix, formed by two major initiatives (C-HPP and B/D-HPP). The initiatives and their teams are underpinned by four Resource Pillars (AB, MS, KB and pathology). b The HPP KB pipeline, showing how MS, AB and other biological data are collected, processed, re-analysed and presented annually for FAIR (see below) use by the scientific community. MS datasets are deposited, tagged with a PXD identifier and stored by PX repositories (PRIDE, PeptideAtlas, MassIVE, Panorama, iProX, JPOST). Data selection, extraction and re-analysis by PeptideAtlas and MassIVE yield processed data that are transmitted to neXtProt. neXtProt then annotates and curates other biological data (such as Sanger sequencing, protein:protein interaction and other structural/crystallographic data), which are aggregated, integrated and disseminated to the community. The HUPO HPP KB uses reverse-date version labels (e.g., the latest 2020 neXtProt HPP reference release, 17-01-2020).

**Fig. 2. Completing >90% of the high-stringency human proteome.**
a Annual neXtProt HPP metrics for evidence of protein existence (PE1,2,3,4,5) from 2010 to 2020. These data demonstrate a strong and progressive increase in PE1 identifications across the decade (13,588 in 2011 to 17,874 in 2020), a correlated and roughly equivalent decrease in PE2 (5,696 to 1,596), a post-2015 rise in PE3 coincident with implementation of the revised guidelines (239 to 253) and a decrease in PE4 identifications (90 to 50). The PMS Pantone colours used in the figure match those of all past annual neXtProt HPP KB reference PE1,2,3,4,5 data releases, namely PE1: light green, PE2: teal, PE3: yellow, PE4: orange and PE5: red. b Decadal Sankey diagram of changes in the PE1,2,3,4,5 status of neXtProt entries between 2011 and 2020, where arrow widths are proportional to the number of entries that changed category over the decade. The diagram displays the fluidity of PE status among neXtProt entries. PMS Pantone colours match those used for all past annual neXtProt HPP KB reference PE1,2,3,4,5 data releases https://www.nextprot.org/about/protein-existence (i.e., PE1: light green, PE2: teal, PE3: yellow, PE4: orange and PE5: cerise). All neXtProt protein entries that were deleted or newly introduced during the decade are shown in black; 432 neXtProt entries were deleted and 676 introduced. Sankey analysis demonstrates that 2011 PE2 entries were the most significant (but not the exclusive) source of the majority of additional 2020 PE1s. Year-by-year transition data can be found in the metrics publications associated with the annual (2013–2019) HPP special issues and refs therein, guided by the high-stringency HPP Guidelines.
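The Sankey diagram in panel b can be thought of as a tally that pairs each neXtProt entry's PE status in the 2011 release with its status in the 2020 release, with deleted and newly introduced entries as extra categories (the black flows). A minimal sketch, assuming each release is available as a mapping from entry accession to PE level; the accessions and statuses below are invented toy data, not neXtProt content, and the function name is hypothetical.

```python
from collections import Counter

# Hypothetical toy releases: entry accession -> PE level (invented for illustration).
release_2011 = {"NX_A": "PE1", "NX_B": "PE2", "NX_C": "PE2", "NX_D": "PE4", "NX_E": "PE5"}
release_2020 = {"NX_A": "PE1", "NX_B": "PE1", "NX_C": "PE1", "NX_D": "PE2", "NX_F": "PE2"}


def pe_transitions(old: dict[str, str], new: dict[str, str]) -> Counter:
    """Count (old_status, new_status) pairs across two releases.

    Entries present only in the old release flow to "deleted";
    entries present only in the new release flow from "new".
    """
    flows = Counter()
    for accession in old.keys() | new.keys():
        flows[(old.get(accession, "new"), new.get(accession, "deleted"))] += 1
    return flows


flows = pe_transitions(release_2011, release_2020)
for (source, target), n in sorted(flows.items()):
    print(f"{source:>7} -> {target:<7} {n}")
```

Each counter value corresponds to the width of one arrow in such a diagram; this illustrates the idea behind the panel rather than reproducing the authors' own pipeline.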

**Fig. 3. HPP decadal impact.**
a The top 15 neXtProt protein descriptor groups/families with the highest number of 2020 PE2,3,4 missing protein members (i.e., lacking high-stringency PE1 evidence of protein existence; magenta, left) and the top 15 protein descriptor groups with the most members upgraded to PE1 since 2011 (green, right). The data illustrate that the OR family has the highest number of 2020 missing PE2,3,4 proteins (magenta bars) and the Zn finger protein family has the highest number of PE2,3,4 entries upgraded to PE1 since 2011. b Human chromosomal distribution of the OR and Zn finger neXtProt protein descriptor groups/families. This example illustrates that the positioning of multiple OR (magenta vertical bars) or Zn finger (green vertical bars) protein-coding genes on certain chromosomes explains why Chr 11 appears more resistant, and Chr 19 more susceptible, to PE1 discovery over the decade.

**Fig. 4. Progress in reducing the fraction of missing proteins for all human chromosomes.**
a The percentage of missing proteins (PE2,3,4) relative to all protein-coding genes (PE1,2,3,4) plotted annually according to human Chrs 1–22, X and Y location from the first neXtProt release (23-08-2011) to the latest HPP reference release (17-01-2020). b The relative percentage (magenta bars) and absolute number (green dots) of all neXtProt PE2,3,4 missing protein entries specifically upgraded to PE1 since 2011 across Chrs 1–22, X and Y.

**Fig. 5. Assembly of the Human Proteome Reference Library (HPRL).**
Data show cumulative PubMed references from the HPP launch in 2010 until 2019. a PubMed search for the terms HPP, C-HPP and B/D-HPP (including their unabbreviated versions). b PubMed community-at-large bibliometric impacts that parallel the research disciplines (e.g., ‘human’ AND ‘cancer proteomics’) addressed by key B/D-HPP teams. All B/D-HPP PubMed bibliometric searches are listed as full searches and as hyperlinked current PubMed searches on the HUPO website at https://hupo.org/HPP-HPRL/. All NCBI PubMed filters and tools remain available to users, and searches can be selected and modified easily, allowing decadal (2010 to 2019) and other bibliometric analyses to be undertaken routinely. c VOSviewer analysis of HPP collaborations. All co-author geographical affiliations for the PubMed publications retrieved in Fig. 5a were mapped onto a world map.
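PubMed counts of the kind plotted in panel a can be retrieved programmatically through NCBI E-utilities. The sketch below only builds an esearch URL (using the documented `db`, `term`, `datetype`, `mindate`, `maxdate` and `rettype=count` parameters) and makes no network call; the term list and function name are illustrative assumptions based on the abbreviations named in the legend, not the curated HPRL searches hosted at https://hupo.org/HPP-HPRL/.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Illustrative (hypothetical) term list: abbreviations plus unabbreviated names.
HPP_TERMS = [
    '"Human Proteome Project"', '"HPP"',
    '"Chromosome-Centric Human Proteome Project"', '"C-HPP"',
    '"Biology/Disease-driven Human Proteome Project"', '"B/D-HPP"',
]


def esearch_count_url(terms: list[str], start_year: int, end_year: int) -> str:
    """Build an esearch URL returning only the hit count for papers published in [start_year, end_year]."""
    params = {
        "db": "pubmed",
        "term": " OR ".join(terms),
        "datetype": "pdat",  # filter on publication date
        "mindate": str(start_year),
        "maxdate": str(end_year),
        "rettype": "count",  # ask esearch for the count only
    }
    return f"{ESEARCH}?{urlencode(params)}"


url = esearch_count_url(HPP_TERMS, 2010, 2019)
print(url)
```

Building one such URL per end year (2010, 2011, … 2019) with a fixed start year of 2010 would give cumulative totals. Bare abbreviations such as HPP may also match unrelated uses of the same letters, so a real search would need refinement along the lines of the curated HUPO term lists.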

