Nat Commun. 2022 Oct 31;13(1):6515.
doi: 10.1038/s41467-022-34094-y.

Translation and natural selection of micropeptides from long non-canonical RNAs

Pedro Patraquim et al. Nat Commun. 2022.

Abstract

Long noncoding RNAs (lncRNAs) are transcripts longer than 200 nucleotides but lacking canonical coding sequences. Apparently unable to produce peptides, lncRNA function seems to rely only on RNA expression, sequence and structure. Here, we exhaustively detect in vivo translation of small open reading frames (small ORFs) within lncRNAs using ribosome profiling during Drosophila melanogaster embryogenesis. We show that around 30% of lncRNAs contain small ORFs engaged by ribosomes, leading to regulated translation of 100 to 300 micropeptides. We identify lncRNA features that favour translation, such as cistronicity, Kozak sequences, and conservation. For the latter, we develop a bioinformatics pipeline to detect small ORF homologues, and reveal evidence of natural selection favouring the conservation of micropeptide sequence and function across evolution. Our results expand the repertoire of lncRNA biochemical functions, and suggest that lncRNAs give rise to novel coding genes throughout evolution. Since most lncRNAs contain small ORFs with as yet unknown translation potential, we propose to rename them "long non-canonical RNAs".

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Detecting lncORF translation.
a Correlation plot and Spearman’s correlations for lncORF (green) and canonical ORF (purple) RPKMFP values across replicates B (y axis) and T (x axis) of early embryogenesis. b Average ribosomal binding (RPKMFP) across embryogenesis for different types of coding sequences. Median value of the distributions shown on top. c Top, framing deconvolution model for Drosophila melanogaster Poly-Ribo-Seq. 31-nt long RPFs contain a mix of reads mapping to adjacent frames 0 and 2, reflecting a 1-nt loss in either 5′ or 3′ positions, respectively, when compared to 32-nt reads (or longer), which mostly map to frame 2 (blue). 30-nt reads (and shorter) map predominantly to frame 0, explained by a 1-nt loss in both 5′ and 3′ positions in this population of reads. Bottom, canonical ORF framing across ribosomal footprint lengths 26–36 nt: frame 0 (red), frame 1 (green) and frame 2 (blue). Red dotted curve: frame 1 overrepresentation across shorter RPF lengths; blue dotted curve: frame 2 overrepresentation across longer RPF lengths. d Redundancy in lncORF translation signal. Left plot: number of reads per frame for all RPF lengths (26–36 nt) mapping to lncORFs in one biological Poly-Ribo-Seq sample (0–8 h, Replicate B). Heatmap: detected framing events (yellow) per RPF length (y axis) per lncORF (x axis), sorted from higher (7) to lower (1) number of detections; 348 framing events support the translation of 191 lncORFs in this sample. Total number of events per RPF length is indicated on the right. e Number of lncORFs according to their translation signal. f FLAG-tagged lncORFs with translation signal are translated in S2 cells (left) whereas lncORFs lacking translation signal are not (right). Diagrams represent each construct used; 5′ UTRs appear white, and lncORFs are colour-coded according to their Riboseq translation status (top). g The immunity-related lncRNA IBIN is robustly translated. ORF1 within the IBIN transcript shows accumulation of RPF reads, but ORF2 does not. Expression of FLAG-tagged IBIN ORF1 in S2 cells (see f) confirms its translated status. Source data are provided as a Source data file.
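The per-length framing described in panels c and d can be illustrated with a minimal sketch. It assumes each ribosome-protected fragment (RPF) is reduced to the transcript coordinate of its 5′ end plus its length, and that the frame is taken relative to the first nucleotide of the start codon. The function names and the toy reads are hypothetical and are not taken from the authors' pipeline.

```python
from collections import Counter, defaultdict

def frame_counts_by_length(reads, orf_start):
    """Count reads per reading frame (0, 1, 2) for each RPF length.

    reads: iterable of (five_prime_pos, length) tuples in transcript coordinates.
    orf_start: coordinate of the first nucleotide of the start codon.
    """
    counts = defaultdict(Counter)
    for pos, length in reads:
        frame = (pos - orf_start) % 3  # frame of the read's 5' end
        counts[length][frame] += 1
    return counts

def dominant_frame(counter):
    """Return the frame (0, 1 or 2) holding the most reads."""
    return max(range(3), key=lambda f: counter[f])

# Toy data (hypothetical): 30-nt reads mostly in frame 0, 32-nt reads mostly in frame 2.
toy_reads = [(100, 30), (103, 30), (106, 30), (101, 30),
             (102, 32), (105, 32), (108, 32), (100, 32)]
counts = frame_counts_by_length(toy_reads, orf_start=100)
for length in sorted(counts):
    print(length, dict(counts[length]), "dominant frame:", dominant_frame(counts[length]))
```

Treating each RPF length separately is what lets a length-dependent frame shift, such as the one described in panel c, show up as distinct dominant frames instead of cancelling out in a pooled count.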
Fig. 2. Developmental regulation of lncORF translation.
a Number of transcribed (white), ribo-bound (red), and translated (green) lncORFs per embryonic developmental window. b Venn diagram showing lncORFs with translation signal per stage, from all lncRNAs. c Constitutively transcribed lncORFs are also translated in a stage-specific manner. d Quantitative fluctuations of lncORF translation across stages. TE (translational efficiency) Z-ratios across embryogenesis for constitutively transcribed lncORFs. Inset plot shows proportion of lncORFs with significant shifts in TE (17.6%) versus those with no significant TE changes. The majority (82.4%) of translated lncORFs with expression across embryogenesis show no significant quantitative modulation in translational efficiency across stages (−1.5 ≤ Z-ratio ≤ 1.5), for both analysed developmental window transitions (Early-to-Mid and Mid-to-Late). lncORFs with significant TE changes are highlighted in red. Arrows denote upregulation or downregulation from early-to-mid (light blue) or mid-to-late (dark blue) transitions. e lncORFs with constitutive transcription across different cellular contexts show high percentages of context-specific translation: Embryos (blue, top), 36% (34 lncORFs); S2 cells (orange, right) 44% (42); and Eggs (green, left) 40% (18).
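The thresholding in panel d can be sketched as follows, reading the 1.5 cutoff in the caption as "a Z-ratio above 1.5 or below −1.5 counts as a significant shift". How the Z-ratios themselves are computed is not given here, so they are taken as inputs; the identifiers and values below are hypothetical.

```python
def classify_te_shift(z_ratio, threshold=1.5):
    """Label a TE Z-ratio as 'up', 'down' or 'stable' using a symmetric cutoff."""
    if z_ratio > threshold:
        return "up"
    if z_ratio < -threshold:
        return "down"
    return "stable"

def fraction_with_shift(z_by_orf, threshold=1.5):
    """Fraction of lncORFs with a significant shift in at least one transition."""
    shifted = sum(
        any(classify_te_shift(z, threshold) != "stable" for z in zs)
        for zs in z_by_orf.values()
    )
    return shifted / len(z_by_orf)

# Hypothetical Z-ratios per lncORF: (early-to-mid, mid-to-late).
toy_z = {
    "orf_A": (0.3, -0.2),
    "orf_B": (2.1, 0.4),
    "orf_C": (-0.5, -1.9),
    "orf_D": (1.5, -1.5),  # exactly on the cutoff: treated as stable
}
for orf, zs in toy_z.items():
    print(orf, [classify_te_shift(z) for z in zs])
print("fraction with a significant shift:", fraction_with_shift(toy_z))
```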
Fig. 3. Clustering of translation in lncRNAs.
a A subset of lncRNA transcripts shows accumulation of translation events in cis. Top, proportional Venn diagram representing lncRNAs according to translation signal detected within their lncORFs (parentheses = lncORF numbers). Bottom, pairwise overlap comparisons, showing corresponding representation factors, and their significance (****p < 0.0001, representation factor analysis, see ‘Methods’). b Number of lncORFs in the same lncRNA showing ribosomal binding (RPKMFP > 1, orange), compared with expectations as given by a Poisson model (blue). Values to the right of their intersection (dotted line) show the enrichment of cis-related binding. c Correlation between ribosome-bound-only and translated lncORFs in the same lncRNA. Pearson’s r = 0.5998. d lncRNA length does not explain clustering of ribosomal binding and translation events to particular lncRNAs. Violin plots of annotated lncRNA transcript lengths (nt) as a function of the translation signal detected within their lncORFs. N = 866 transcribed lncRNAs (see panel 3a). “*” denotes p-values < 0.05. p = 0.0133 for “reproducible-variable” comparison; p = 0.0183 for “reproducible-ribo-only” comparison; p = 0.0198 for “reproducible-transcribed” comparison. Mann–Whitney tests, two-tailed. e CR30055 is an example of a lncRNA with multiple ORFs: ORF2 appears as robustly translated by Riboseq, and ORF2-FLAG shows expression in S2 cells, whereas ORF4 appears as ribo-bound-only and shows no expression in S2 cells, despite sharing the same transcript as ORF2. f Polysomal RNA RPKM values of lncRNAs from low polysomes in S2 cells (2–4 ribosomes per lncRNA, top) and Eggs (2–6, bottom) are higher for embryo-translated lncRNAs, suggesting that translated lncRNAs have an intrinsically higher affinity for ribosomes.
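The two statistical comparisons in panels a and b can be sketched under stated assumptions. The representation factor is taken here in its common form, observed overlap divided by the overlap expected for two independent sets, with a hypergeometric upper tail as the p-value; the paper's Methods may define it differently. The Poisson expectation uses a single rate for all transcripts. All numbers are toy values.

```python
from math import comb, exp, factorial

def representation_factor(overlap, n_a, n_b, n_total):
    """Observed overlap divided by the overlap expected by chance (n_a * n_b / n_total)."""
    return overlap / (n_a * n_b / n_total)

def hypergeom_upper_tail(overlap, n_a, n_b, n_total):
    """P(X >= overlap) when n_b items are drawn from n_total, of which n_a belong to set A."""
    total = comb(n_total, n_b)
    tail = sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k)
        for k in range(overlap, min(n_a, n_b) + 1)
    )
    return tail / total

def poisson_expected_counts(lam, n_transcripts, max_k):
    """Expected number of transcripts carrying k = 0..max_k events under Poisson(lam)."""
    return [n_transcripts * exp(-lam) * lam**k / factorial(k) for k in range(max_k + 1)]

# Toy numbers (hypothetical): 100 lncRNAs, sets of 20 and 10, sharing 6 members.
rf = representation_factor(6, 20, 10, 100)
p = hypergeom_upper_tail(6, 20, 10, 100)
print("representation factor:", rf)
print("upper-tail p-value:", p)
print("Poisson expectation:", poisson_expected_counts(1.0, 1000, 4))
```

A representation factor above 1 indicates more overlap than expected by chance. In a comparison like panel b, observed counts that exceed the Poisson expectation at higher k are what would indicate clustering of events on the same transcript.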
Fig. 4. Expression levels and cis-factors affect lncRNA translation.
a Translational efficiency across lncORF categories. Violin plots for translational efficiency values (TE, average across embryogenesis) for all lncORFs in each translation class, and for canonical annotated ORFs and uORFs (data from Patraquim et al.). Thick dotted lines denote median, thin dotted lines denote lower and upper quartiles (note the more modest changes in RNA levels, Supplementary Fig. 3a). b Kozak sequences, scored against canonical ORF consensus sequence, for lncORFs with different translation status (0 = canonical average; colours as in a). Robustly translated lncORFs have Kozak sequences significantly closer to canonical ORFs. N (transcribed-only) = 2701; N (ribo-bound-only) = 310; N (variable translation) = 185; N (robust) = 107. “*” denotes p < 0.05 (exact p = 0.0364; Mann–Whitney tests, two-tailed). c Frequency distribution of relative lncORF positions in their lncRNA (cistronicity) (1 = closest to 5′ end), per translation class (colours as in a). Dashed lines and numbers denote median values. d Similar cistronic position of translated lncORFs in monosomes (dark orange) and polysomes (light orange). e Translation of ORF 2A within the polycistronic tal transcript is lost upon removal of the stop codon of the upstream ORF 1A (tal1A-ns), which extends ORF 1A beyond the stop codon of 2A (yellow), leaving no chance for 5′-to-3′ re-initiation to occur. ATG: start codon. TAA: actual stop codon. AAA: mutated stop codon. Source data are provided as a Source data file.
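The cistronicity measure in panel c (1 = closest to the 5′ end) amounts to ranking the ORFs of a transcript by start coordinate. A minimal sketch follows, with hypothetical ORF identifiers and coordinates; it assumes each ORF is described only by the transcript coordinate of its start codon.

```python
from statistics import median

def cistronic_positions(orf_starts):
    """Map each ORF id to its rank along the transcript (1 = closest to the 5' end).

    orf_starts: dict of ORF id -> start coordinate on the transcript.
    """
    ordered = sorted(orf_starts, key=orf_starts.get)
    return {orf_id: rank for rank, orf_id in enumerate(ordered, start=1)}

# Hypothetical lncRNA carrying three small ORFs.
toy_lncrna = {"orf_a": 350, "orf_b": 40, "orf_c": 910}
ranks = cistronic_positions(toy_lncrna)
print(ranks)

# Median cistronic position for a hypothetical class of lncORFs pooled across transcripts.
toy_class_ranks = [1, 1, 2, 3, 1]
print("median position:", median(toy_class_ranks))
```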
Fig. 5. lncORF sequence evolution across Drosophilids and emergence of novel coding genes.
a GENOR pipeline for the detection of smORF evolutionary conservation across the Drosophila genus. Each smORF is used to query, via jackhmmer, smORF databases obtained from available RNA expression data for each target species, ensuring that putative homologues come from transcribed genes. Matching reciprocal hits are aligned using MAFFT, with a smORF-calibrated threshold score deciding on the conservation status of the top hit per ORF (see ‘Methods’). b Number of sCDSs with homologues identified by GENOR, plotted against the number of Drosophila species in which those homologues were identified. c AA sequence alignment of CG1307, a sCDS lacking annotated homologues, and its GENOR-identified Dsim and Dvir homologues. d Average conservation score across robustly translated or ribo-bound-only lncORFs, and translated sCDSs, for 6 species across the Drosophila phylogeny. Pale green rectangle: phylogenetic distance with substantial conservation signal for translated lncORFs as detected by GENOR. Yellow: million years ago (MYA). e Robustly translated lncORFs show evidence of purifying selection. Distributions of dN/dS values measuring natural selection (fraction of non-conservative nucleotide changes vs. nucleotide changes conserving the AA sequence) acting on robustly translated (green) or ribo-bound-only (red) lncORFs, in pairwise ORF alignments between Dmel lncORFs and syntenic ORFs in either Dsim or Dsec. “**” denotes p < 0.01 (exact p for Dmel-Dsim = 0.0074; exact p for Dmel-Dsec = 0.0029). Mann–Whitney tests, two-tailed. f AA and nucleotide sequence alignments of lncORF FBtr300230_2, within the uhg4 transcript, and its Dsim homologue identified by GENOR, showing extensive AA conservation, and a pattern of nucleotide changes (dN/dS score: 0.33) consistent with a coding function for this lncORF. Substitutions are highlighted by yellow squares. Blue: synonymous substitutions; red: non-synonymous substitutions. g GENOR-detected homologues of translated Dmel lncORFs in Dvir or beyond are loaded into polysomes in Dvir, suggesting that their translation might also be conserved (full gel with molecular weights available in the “Source data” file of this manuscript). Dmel: D. melanogaster. Dsim: D. simulans. Dsec: D. sechellia. Dere: D. erecta. Dpse: D. pseudoobscura. Dmoj: D. mojavensis. Dvir: D. virilis. Source data are provided as a Source data file.
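The "matching reciprocal hits" step in panel a can be illustrated with a generic reciprocal-best-hit sketch. This is not the GENOR implementation: it assumes search scores have already been collected into dictionaries (for example from a sequence search run in both directions), and it leaves out the alignment and threshold steps. All identifiers and scores are hypothetical.

```python
def best_hits(scores):
    """Return query -> top-scoring target, given scores as dict[query][target] = score."""
    return {
        query: max(targets, key=targets.get)
        for query, targets in scores.items()
        if targets
    }

def reciprocal_best_hits(forward, reverse):
    """Pairs (query, target) where each sequence is the other's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((q, t) for q, t in fwd.items() if rev.get(t) == q)

# Hypothetical scores: Dmel smORFs searched against Dsim smORFs, and the reverse search.
forward = {
    "dmel_orf1": {"dsim_orfA": 52.0, "dsim_orfB": 31.5},
    "dmel_orf2": {"dsim_orfB": 40.0},
}
reverse = {
    "dsim_orfA": {"dmel_orf1": 50.1},
    "dsim_orfB": {"dmel_orf1": 33.0, "dmel_orf2": 29.0},
}
pairs = reciprocal_best_hits(forward, reverse)
print(pairs)
```

In this toy case dmel_orf2 is dropped because its best target, dsim_orfB, scores higher against dmel_orf1 in the reverse search. Requiring agreement in both directions is a common way to reduce spurious matches between short sequences.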
Fig. 6. Model for the activation of lncRNA translation, and the evolution of novel coding genes.
a Molecular features directly linked to lncRNA translation (red) and those linked to related processes (blue). b Evolutionary acquisition of lncORF translation by accretion of molecular features from a, leading to the emergence of novel coding genes. Data from this and other works.

References

    1. Ladoukakis E, Pereira V, Magny EG, Eyre-Walker A, Couso JP. Hundreds of putatively functional small open reading frames in Drosophila. Genome Biol. 2011;12:R118. doi: 10.1186/gb-2011-12-11-r118.
    2. Kastenmayer JP, et al. Functional genomics of genes with small open reading frames (sORFs) in S. cerevisiae. Genome Res. 2006;16:365–373. doi: 10.1101/gr.4355406.
    3. Aspden JL, et al. Extensive translation of small Open Reading Frames revealed by Poly-Ribo-Seq. eLife. 2014;3:e03528. doi: 10.7554/eLife.03528.
    4. Ruiz-Orera J, Messeguer X, Subirana JA, Alba MM. Long non-coding RNAs as a source of new peptides. eLife. 2014;3:e03523. doi: 10.7554/eLife.03523.
    5. Patraquim P, Mumtaz MAS, Pueyo JI, Aspden JL, Couso JP. Developmental regulation of canonical and small ORF translation from mRNAs. Genome Biol. 2020;21:128. doi: 10.1186/s13059-020-02011-5.
