Science. 2020 Mar 6;367(6482):1140-1146.
doi: 10.1126/science.aay0262.

Pervasive functional translation of noncanonical human open reading frames

Jin Chen et al. Science.

Abstract

Ribosome profiling has revealed pervasive but largely uncharacterized translation outside of canonical coding sequences (CDSs). In this work, we exploit a systematic CRISPR-based screening strategy to identify hundreds of noncanonical CDSs that are essential for cellular growth and whose disruption elicits specific, robust transcriptomic and phenotypic changes in human cells. Functional characterization of the encoded microproteins reveals distinct cellular localizations, specific protein binding partners, and hundreds of microproteins that are presented by the human leukocyte antigen system. We find multiple microproteins encoded in upstream open reading frames, which form stable complexes with the main, canonical protein encoded on the same messenger RNA, thereby revealing the use of functional bicistronic operons in mammals. Together, our results point to a family of functional human microproteins that play critical and diverse cellular roles.

Figures

Fig. 1. Ribosome profiling reveals translation of unannotated CDSs.
(A) ORF-RATER analysis of ribosome profiling data: 62% are previously annotated coding sequences, while 16% are variants of canonical coding sequences that share portions of the coding sequence, and 22% are distinct from annotated coding sequences. The naming convention of the identified ORFs is shown on the right. (B) Start-codon usage of the identified CDSs. (C) Cumulative distribution of CDS length. For distinct CDSs, 96% are smaller than 100 amino acids. (D) Example ribosome profiling traces of a lncRNA peptide from LINC00998 and a uORF peptide from ARL5A displaying the hallmarks of translation, including peaks of density around the start codon following harringtonine treatment and three-nucleotide periodicity along the coding region. (E) Metagene analysis shows that the signatures of translation, including three-nucleotide periodicity in the expected reading frame, for uORFs and lncRNA CDSs are similar to annotated coding regions. (F) Identification of more than 200 non-canonical CDS peptides from HLA-I peptidomics, cross-validating their existence across the whole abundance range, with a mean Andromeda score of 141 compared to a total mean Andromeda score of 144. See Methods.
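The three-nucleotide periodicity mentioned in panels D and E can be illustrated with a minimal sketch. It is not the ORF-RATER method or the authors' analysis; the function name `frame_fractions`, the coordinate conventions, and the toy positions are all assumptions made for illustration. The idea is only that footprints from a translated CDS should fall mostly in one reading frame relative to the start codon.

```python
from collections import Counter


def frame_fractions(psite_positions, cds_start, cds_end):
    """Fraction of footprint P-sites falling in each reading frame (0, 1, 2).

    psite_positions: hypothetical 0-based transcript coordinates, one per
        ribosome footprint, assumed to be already offset to the P-site.
    cds_start: coordinate of the first nucleotide of the start codon.
    cds_end: coordinate one past the last nucleotide of the CDS.
    """
    # Frame is the position relative to the start codon, modulo 3.
    counts = Counter(
        (p - cds_start) % 3
        for p in psite_positions
        if cds_start <= p < cds_end
    )
    total = sum(counts.values())
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [counts[frame] / total for frame in range(3)]


# Toy data (made up): most footprints sit in frame 0 of a CDS starting at 10.
positions = [10, 13, 16, 19, 22, 25, 11, 28, 31, 15]
fractions = frame_fractions(positions, cds_start=10, cds_end=40)
print(fractions)
```

A strong excess in one frame, as in the toy data, is the kind of signal the caption describes; real analyses also need read-length-specific P-site offsets, which this sketch takes as given.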
Fig. 2. Genome-scale CRISPR screens to identify functional, non-canonical CDSs.
(A) Schematic of CRISPR library design and screening strategies, either by growth screens or Perturb-Seq. For growth screens, frequencies of cells expressing a given sgRNA are determined by next-generation sequencing, and phenotype scores are quantified with the formula shown. For Perturb-Seq, single-cell transcriptomes and sgRNA identities were obtained by single-cell RNA-Seq. (B) Volcano plot summarizing knockout phenotypes and statistical significance (Mann-Whitney U test) for ORFs targeted in the pooled screen in iPSCs. Each dot represents a targeted ORF, and ORF hits are labeled in purple, with a more negative phenotype score indicating a stronger growth defect. See Methods. (C) Plot of the sgRNA phenotypes and distance from the start codon, across all ORF hits. sgRNAs targeting the genome immediately upstream of the ORF (shown in red) have significantly lower phenotype than sgRNAs targeting within the ORF (shown in blue). Note the axis is increasingly negative (stronger) phenotype. The difference is not due to differences in sgRNA on-target efficiencies, as quantified by the Doench v2 score. (D) The PhyloCSF Score per codon (higher is more conserved across the Euarchontoglires) is generally higher for ORF hits (P = 10^-20, Kolmogorov–Smirnov test) and ORFs with a stronger phenotype. Note that lack of a growth phenotype does not necessarily imply a low PhyloCSF score.
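The growth-screen scoring in panels A and B can be sketched roughly as follows. The formula the caption refers to appears only in the figure and is not reproduced in this text, so the log2 fold change used here is a common stand-in and an assumption, not the authors' definition. The Mann-Whitney U test is written out in plain Python with a normal approximation (no tie or continuity correction); the function names and all counts are hypothetical.

```python
import math


def log2_fold_change(count_initial, count_final, pseudocount=1.0):
    """Assumed per-sgRNA phenotype: log2 change in read count over the screen."""
    return math.log2((count_final + pseudocount) / (count_initial + pseudocount))


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using a normal approximation.

    Returns (U statistic for x, approximate p-value). Ties receive average
    ranks; the variance is not tie-corrected, which is a simplification.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_x, p_value


# Toy (initial, final) read counts, made up for illustration.
orf_counts = [(1000, 250), (800, 100), (1200, 300), (900, 150)]
control_counts = [(1000, 1100), (900, 850), (1100, 1000),
                  (950, 1000), (1000, 900), (1050, 1100)]

orf_scores = [log2_fold_change(a, b) for a, b in orf_counts]
control_scores = [log2_fold_change(a, b) for a, b in control_counts]

phenotype = sum(orf_scores) / len(orf_scores)  # more negative = stronger defect
u_stat, p_value = mann_whitney_u(orf_scores, control_scores)
print(phenotype, u_stat, p_value)
```

Each ORF would contribute one point to a volcano plot of this kind, with the mean score on one axis and the p-value on the other. With only a handful of sgRNAs per ORF, an exact test would be preferable to the normal approximation used here.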
Fig. 3. Short lncRNA CDSs encode functional microproteins.
(A) Rescue of lncRNA CDS knockout growth phenotypes by the ectopic expression of the transcript encoding the peptide, as well as controls where the initiating start codon is removed (Δstart codon). Error bars represent standard deviation of triplicates. P < 0.05 for all comparisons between knockout (KO) and KO + rescue. (B-D) Microscopy images and volcano plots of the co-IP MS of three example lncRNA-encoded microproteins tagged with mNG11, expressed ectopically (in the native transcript context) in a HEK293T cell line expressing mNG1–10. Green is mNG, red is the indicated organelle localization, and blue is Hoechst 33342, which stains the nucleus. Scale bar dimensions are labeled. Significant interactors are shown in the top right corner of the volcano plots. The thick threshold line is 1% FDR (false discovery rate), and the thin threshold line is 5% FDR. The bait (the tagged peptide) is labeled in blue. The interactors are colored according to their functional groups. (E) lncRNA-encoded microproteins are uncharacterized proteins that may play important regulatory roles in cells.
Fig. 4. Bicistronic mRNAs can encode uORF peptides that function in trans.
(A) Rescue of uORF knockout growth phenotypes by the ectopic expression of a transcript encoding the uORF peptide alone, as well as controls where the initiating start codon is removed (Δstart codon). Error bars represent standard deviation of triplicates. P < 0.05 for all comparisons between KO and KO + rescue. (B) Summary of co-IP MS interactions, showing five uORF peptides that interact with their downstream-encoded protein (shown in red). Other significant interactors are shown in blue. (C-E) Examples of uORF peptides tagged with mNG11, expressed alone ectopically (in the native transcript context) in a HEK293T cell line expressing mNG1–10. Volcano plot of co-IP MS reveals significant interactors with uORF peptides. Threshold line is 1% FDR. The bait (the tagged peptide) is labeled in blue. For microscopy in C and D, the main, canonical protein tagged with mCherry (red) is co-expressed. For E, the mNG11-tagged MIEF1 uORF peptide (green) localizes to the mitochondria (red). (F) Volcano plot of co-IP MS from endogenously mNG11-tagged HAUS6 uORF. For microscopy, the mNG11-tagged uORF is expressed alone ectopically (green), and the canonical HAUS6 tagged with mCherry (red) is co-expressed. (G) Percent change for each cell cycle state for HAUS6 knockout (KO) and HAUS6 uORF KO, compared to control cells. (H) Transcriptome response of the MIEF1 uORF KO compared with the main CDS KO from Perturb-Seq. (I) Quantification of mitochondria morphology upon MIEF1 uORF peptide overexpression and knockout, as well as rescue of knockout phenotype. Representative microscopy images of the different mitochondria morphologies are shown to the right. (J) Possible model of uORF peptide functions and regulatory roles in cells.
