Review. Trends Cell Biol. 2022 Mar;32(3):243-258.
doi: 10.1016/j.tcb.2021.10.010. Epub 2021 Nov 26.

The dark proteome: translation from noncanonical open reading frames

Bradley W Wright et al. Trends Cell Biol. 2022 Mar.

Abstract

Omics-based technologies have revolutionized our understanding of the coding potential of the genome. In particular, these studies revealed widespread unannotated open reading frames (ORFs) throughout genomes and that these regions have the potential to encode novel functional (micro-)proteins and/or hold regulatory roles. However, despite their genomic prevalence, relatively few of these noncanonical ORFs have been functionally characterized, likely in part due to their under-recognition by the broader scientific community. The few that have been investigated in detail have demonstrated their essentiality in critical and divergent biological processes. As such, here we aim to discuss recent advances in understanding the diversity of noncanonical ORFs and their roles, as well as detail biologically important examples within the context of the mammalian genome.

Keywords: CRISPR; microproteins; noncanonical ORFs; ribosome profiling; short ORFs; translation.

Conflict of interest statement

Declaration of Interests

J.S.W. declares outside interest in 5 AM Venture, Amgen, Chroma Medicine, KSQ Therapeutics, Maze Therapeutics, Tenaya Therapeutics, Tessera Therapeutics, Third Rock Ventures, and Velia Therapeutics. J.C. consults for Velia Therapeutics.

Figures

Figure 1. Sources and topologies of non-canonical open reading frames (ORFs).
Diverse forms of non-canonical ORFs are shown. Orange bars depict the canonical coding sequence (CDS). All other coloured bars depict non-canonical ORFs. The figure shows non-canonical ORFs at both the transcript level and the genomic level. (a) Long non-coding RNAs (lncRNAs) are increasingly recognized as a source of microproteins encoded by small ORFs (sORFs). (b) Circular RNAs (circRNAs), an unusual form of RNA, have been reported to possibly possess functional ORFs that can be translated via cap-independent initiation processes. (c) sORFs can be encoded on coding transcripts. Three types of non-canonical ORF are depicted in this transcript-level example. An ORF may be contained entirely within an untranslated region (UTR) of a transcript, either in the 5’-UTR (upstream ORF, uORF) or the 3’-UTR (downstream ORF, dORF), or alternatively be entirely nested within an existing CDS (internal ORF or “nested” ORF). (d) Variants of annotated proteins. Two types of non-canonical ORF are depicted in this transcript-level example. Alternate, in-frame ORFs have been found that extend the annotated CDS in either the 5’ or 3’ direction, encoding extended variants. Alternate, in-frame ORFs may also result from downstream initiation, producing a truncated variant of the annotated CDS. (e) At the genomic level, alternate splicing of a pre-mRNA transcript may produce distinct mRNA species, including species that are “intron nested” in relation to the predominant mRNA isoform. (f) Alternate transcripts harboring different ORFs can be produced as a result of different promoters.
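The transcript-level categories in panels (c) and (d) can be read as rules on where an ORF sits relative to the annotated CDS. The following is a minimal sketch of those rules, not a method from the review: the names `Interval` and `classify_orf` are hypothetical, coordinates are assumed to be 0-based and half-open on a single spliced transcript, and any ORF that fits none of the captioned categories is simply labelled as other.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Half-open [start, end) span in 0-based transcript coordinates (assumed convention)."""
    start: int
    end: int


def classify_orf(orf: Interval, cds: Interval) -> str:
    """Label an ORF by its position relative to the annotated CDS (hypothetical helper)."""
    if orf == cds:
        return "canonical CDS"
    if orf.end <= cds.start:
        return "uORF"  # lies entirely within the 5'-UTR
    if orf.start >= cds.end:
        return "dORF"  # lies entirely within the 3'-UTR

    # Python's % returns a non-negative result, so upstream starts are handled too.
    in_frame = (orf.start - cds.start) % 3 == 0
    if in_frame:
        if orf.end == cds.end and orf.start < cds.start:
            return "5' extension"  # upstream start, shared stop codon
        if orf.end == cds.end and orf.start > cds.start:
            return "truncation"  # downstream start, shared stop codon
        if orf.start == cds.start and orf.end > cds.end:
            return "3' extension"  # shared start, stop further downstream
    elif cds.start <= orf.start and orf.end <= cds.end:
        return "internal ORF"  # out-of-frame and nested within the CDS
    return "other overlapping ORF"


cds = Interval(100, 400)
print(classify_orf(Interval(10, 40), cds))    # ORF upstream of the CDS
print(classify_orf(Interval(152, 182), cds))  # out-of-frame ORF nested in the CDS
```

Genomic-level cases such as alternative splicing or alternative promoters (panels e and f) would need exon structures rather than a single transcript interval, so they are outside this sketch.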
Figure 2. Functions of sORF-derived microproteins.
(a) Due to their small size, small open reading frame (sORF)-derived microproteins may function widely as signalling molecules. An example is the hormone microprotein Elabela, which is largely secreted by stem cells and aids in functions such as early cell differentiation [34]. (b) Microproteins may have roles in allosteric regulation by binding regions of enzymes to mediate their activation or repression. The microprotein ASRPS represses the transcription factor STAT3 by binding and masking a phosphorylation site, thereby inhibiting STAT3 activity [78]. (c) Microproteins have been shown to participate in multiprotein assemblies. The ASDURF microprotein is one of 12 subunits of the PAQosome chaperone complex. It binds to the PFDL module, completing a hexameric complex and promoting its assembly into the PAQosome complex [79]. (d) The STORM microprotein shares sequence identity with the binding region of the SRP19 protein, resulting in competitive binding for the ribonucleic acid substrate [80]. This process, which may be called molecular mimicry, is expected to constitute a major category of microprotein function, given that their small size makes them less amenable to independent enzymatic activity. (e) Microproteins may function as epitope markers [9]. During the integrated stress response, an upstream ORF (uORF)-encoded peptide of the BiP gene is expressed, recognised by the class I major histocompatibility complex, and transported to the cell surface, enabling recognition by specific T cells [35]. (f) Microproteins have been shown to be integral components of membranes [32, 81-84]. For instance, the microproteins DWORF and phospholamban (PLN) localise to the sarcoplasmic reticulum membrane, where they interact with the calcium uptake protein SERCA, thereby modulating its calcium transport capacity in myocytes [33, 85].
Figure 3. Primary regulatory roles of upstream and downstream open reading frames (uORFs and dORFs).
Proposed uORF and dORF regulatory activities imparted on the canonical coding sequence (CDS) (orange bars) are shown. (a) uORF-mediated regulation. (i) Cap-dependent translation initiation proceeds through the 43S pre-initiation complex (PIC) binding to the terminal 5’ end of an mRNA that harbours a 7-methylguanosine residue and pre-bound initiation factors. The bound PIC then scans along the transcript in search of an optimal sequence context in which to initiate translation. (ii) uORFs upstream of a canonical ORF are predicted to down-regulate the expression of the canonical CDS by blocking the PIC. Some scanning PICs are able to proceed past the uORF because of its weaker sequence context, a process known as leaky scanning, which allows the PIC to initiate translation at the canonical start site. (iii) Some uORFs also encode functional proteins. (b) dORF-mediated regulation. (i) Translation of the canonical CDS is expected to proceed through cap-dependent initiation, but the mechanism of dORF translation initiation is currently unknown. Either the 80S translation machinery that actively translates and dissociates from the canonical CDS re-initiates at the dORF (through a currently unknown mechanism of ribosome recycling), or cap-independent translation initiation facilitates the recruitment of the PIC to an internal ribosome entry site at the dORF. (ii) In either case, the presence of translationally active dORF(s) facilitates the up-regulation of canonical CDS translation. The mechanism for this is currently unresolved, but the translational activity of dORFs possibly facilitates the recruitment of translation factors to the canonical CDS, mediating its up-regulation.
Figure 4. Alternative open reading frames (ORFs) as a source of novel coding sequences (CDS).
(a) Variant ORFs that present as either truncations or extensions of the defined CDS can encode proteins with functions related to those of the canonical CDS. The MRPL18 transcript has a downstream near-cognate CUG start codon, which under stress conditions becomes the start of the primary translated ORF. The truncated variant loses the mitochondrial targeting signal and instead localizes to the cytosol, where it engages in translational regulation [75]. (b) ORFs that share the same locus as a defined CDS but are out-of-frame are sources of proteins with functions likely different from those of the defined CDS they overlap. (i) The transcript POLG, which encodes a DNA polymerase subunit, has an out-of-frame ORF that has been demonstrated to have lower translational activity when ribosomes engage with the upstream ORF (uORF). Unusually, POLG translation is positively regulated by the uORF, likely as a result of limiting interaction with the CUG alternative translational start site. (ii) When uORF ribosome engagement is perturbed, the out-of-frame overlapping ORF (CUG) is translated, producing a protein with a potential role in extracellular signaling [67, 86].

References

    1. International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature 431, 931–945
    2. Adhikari S, et al. (2020) A high-stringency blueprint of the human proteome. Nat Commun 11, 5301
    3. Chen J, et al. (2020) Pervasive functional translation of noncanonical human open reading frames. Science 367, 1140–1146
    4. van Heesch S, et al. (2019) The Translational Landscape of the Human Heart. Cell 178, 242–260 e229
    5. Lu S, et al. (2019) A hidden human proteome encoded by ‘non-coding’ genes. Nucleic Acids Res 47, 8111–8125
