This is a preprint.
What can Ribo-seq and proteomics tell us about the non-canonical proteome?
- PMID: 37292611
- PMCID: PMC10245706
- DOI: 10.1101/2023.05.16.541049
Update in
- What Can Ribo-Seq, Immunopeptidomics, and Proteomics Tell Us About the Noncanonical Proteome? Mol Cell Proteomics. 2023 Sep;22(9):100631. doi: 10.1016/j.mcpro.2023.100631. Epub 2023 Aug 11. PMID: 37572790. Free PMC article.
Abstract
Ribosome profiling (Ribo-seq) has proven transformative for our understanding of the human genome and proteome by illuminating thousands of non-canonical sites of ribosome translation outside of the currently annotated coding sequences (CDSs). A conservative estimate suggests that at least 7,000 non-canonical open reading frames (ORFs) are translated, which, at first glance, has the potential to expand the number of human protein-coding sequences by 30%, from ∼19,500 annotated CDSs to over 26,000. Yet additional scrutiny of these ORFs has raised numerous questions about what fraction of them truly produce a protein product, and what fraction of those products can be understood as proteins according to the conventional understanding of the term. A further complication is that published estimates of the number of non-canonical ORFs vary widely, by around 30-fold, from several thousand to several hundred thousand. Together, this research has left the genomics and proteomics communities excited by the prospect of new coding regions in the human genome, yet searching for guidance on how to proceed. Here, we discuss the current state of non-canonical ORF research, databases, and interpretation, focusing on how to assess whether a given ORF can be said to be "protein-coding".
In brief: The human genome encodes thousands of non-canonical open reading frames (ORFs) in addition to protein-coding genes. Because the field is nascent, many questions about non-canonical ORFs remain. How many exist? Do they encode proteins? What level of evidence is needed for their verification? Central to these debates has been the advent of ribosome profiling (Ribo-seq) as a method to discern genome-wide ribosome occupancy, and of immunopeptidomics as a method to detect peptides that are processed and presented by MHC molecules and not observed in traditional proteomics experiments. This article provides a synthesis of the current state of non-canonical ORF research and proposes standards for their future investigation and reporting.
Highlights:
- Combined use of Ribo-seq and proteomics-based methods enables optimal confidence in detecting non-canonical ORFs and their protein products.
- Ribo-seq can provide more sensitive detection of non-canonical ORFs, but data quality and analytical pipelines will impact results.
- Non-canonical ORF catalogs are diverse and span both high-stringency and low-stringency ORF nominations.
- A framework for standardized non-canonical ORF evidence will advance the research field.
Conflict of interest statement
Declaration of interests
The authors declare no competing interests.