Proteomes. 2020 May 23;8(2):12.
doi: 10.3390/proteomes8020012.

Is It Possible to Find Needles in a Haystack? Meta-Analysis of 1000+ MS/MS Files Provided by the Russian Proteomic Consortium for Mining Missing Proteins

Ekaterina Poverennaya et al. Proteomes. 2020.

Abstract

Despite the direct and indirect efforts of the proteomic community, a significant fraction of the protein map remains a blind spot. Almost 11% of human genes encode missing proteins, that is, proteins whose existence is still in doubt. Proteomics appears to have reached a stage at which the identification of every novel protein demands more attention and curiosity, including the study of a broader range of unusual biomaterials and/or conditions. The current conventional approaches to discovering missing proteins seem to be exhausted, and alternatives may need to be investigated. Here, we present an approach to deciphering missing proteins that relies on non-standard methodological solutions and encompasses diverse MS/MS data obtained for rare types of biological samples by members of the Russian proteomic community over the last five years. These data were re-analyzed in a uniform manner by three search engines that are part of the SearchGUI package. The study identified two missing and five uncertain proteins, each detected with two peptides. In addition, 149 proteins were detected with a single proteotypic peptide. Finally, we analyzed gene expression levels to suggest feasible targets for further validation of the missing and uncertain protein observations, a validation that will fully meet the requirements of the international consortium. The MS data are available on the ProteomeXchange platform (PXD014300).

Keywords: Chromosome-Centric Human Proteome Project (C-HPP); human proteome; mass spectrometry; missing proteins; neXtProt; proteotypic peptide; uncertain proteins.
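
The abstract separates proteins detected with two peptides from those detected with a single proteotypic peptide, after re-analysis by three search engines. A minimal sketch of that kind of tiering is given below. It is not the authors' pipeline: the function name `tier_proteins`, the `min_engines` agreement threshold, the engine labels, and the protein and peptide identifiers are all hypothetical placeholders chosen for illustration.

```python
from collections import defaultdict


def tier_proteins(engine_hits, min_engines=1):
    """Group proteins by how many distinct peptides support them.

    engine_hits maps an engine label to an iterable of
    (protein_id, peptide_sequence) pairs. A peptide counts only if at
    least `min_engines` engines reported it (hypothetical rule).
    """
    # Record which engines reported each (protein, peptide) pair.
    engines_per_pair = defaultdict(set)
    for engine, hits in engine_hits.items():
        for protein, peptide in hits:
            engines_per_pair[(protein, peptide)].add(engine)

    # Collect the distinct peptides that pass the agreement threshold.
    peptides = defaultdict(set)
    for (protein, peptide), engines in engines_per_pair.items():
        if len(engines) >= min_engines:
            peptides[protein].add(peptide)

    # Split proteins into the two evidence tiers named in the abstract.
    tiers = {"two_or_more": [], "single": []}
    for protein, peps in sorted(peptides.items()):
        key = "two_or_more" if len(peps) >= 2 else "single"
        tiers[key].append(protein)
    return tiers


# Toy input with placeholder identifiers (not data from the study).
example_hits = {
    "engine_a": [("PROT1", "PEPTIDEA"), ("PROT1", "PEPTIDEB"), ("PROT2", "PEPTIDEC")],
    "engine_b": [("PROT1", "PEPTIDEA"), ("PROT2", "PEPTIDEC")],
    "engine_c": [("PROT3", "PEPTIDED")],
}

print(tier_proteins(example_hits))
print(tier_proteins(example_hits, min_engines=2))
```

Raising `min_engines` trades sensitivity for confidence: a peptide reported by only one engine no longer counts toward a protein's tier.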

Conflict of interest statement

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Figures

Figure 1
Changes in the number of neXtProt entries (2011–2019): (a) blue indicates the total number of entries (number of protein-coding genes); (b) red indicates the number of missing-protein entries (PE2+PE3+PE4); and (c) green, purple and blue indicate the number of uncertain (PE5), new and deleted entries, respectively.
Figure 2
Venn diagrams: intersection of proteins cleaved by different proteases with (a) no proteotypic peptides at all, and (b) one unique peptide. (c) Histograms of the detection frequencies of proteotypic peptides, according to GPMdb. The "no peptides" group corresponds to proteins that have no unique peptides even in theory; "0" means that there is no experimental evidence for the theoretical proteotypic peptides; and the other numbers (1, 5, 10, etc.) mean that this number of proteotypic peptides was detected, with the height of the corresponding column giving the number of cases.
Figure 3
Mass spectra of the proteotypic peptide characteristic of the Q8NG97 protein, detected for the first time in four biosamples.
Figure 4
Mass spectra of the detected proteotypic peptides for (a) Q96HZ4-2 and (b) Q96HZ4-3.