Review
Trends Genet. 2014 Oct;30(10):439-52.
doi: 10.1016/j.tig.2014.08.004. Epub 2014 Sep 11.

Volatile evolution of long noncoding RNA repertoires: mechanisms and biological implications

Aurélie Kapusta et al. Trends Genet. 2014 Oct.

Abstract

Thousands of genes encoding long noncoding RNAs (lncRNAs) have been identified in all vertebrate genomes thus far examined. The list of lncRNAs partaking in arguably important biochemical, cellular, and developmental activities is steadily growing. However, it is increasingly clear that lncRNA repertoires are subject to weak functional constraint and rapid turnover during vertebrate evolution. We discuss here some of the factors that may explain this apparent paradox, including relaxed constraint on sequence to maintain lncRNA structure/function, extensive redundancy in the regulatory circuits in which lncRNAs act, as well as adaptive and non-adaptive forces such as genetic drift. We explore the molecular mechanisms promoting the birth and rapid evolution of lncRNA genes, with an emphasis on the influence of bidirectional transcription and transposable elements, two pervasive features of vertebrate genomes. Together these properties reveal a remarkably dynamic and malleable noncoding transcriptome which may represent an important source of robustness and evolvability.

Figures

Figure 1. Rapid turnover of lncRNA repertoires
A. Evolution of lncRNA and coding gene content. The numbers of lncRNA genes (blue circles; see below for references) and protein-coding genes (red circles) are superimposed to facilitate their comparison. Transposable element (TE) content and genome size are represented for each species (0% for Plasmodium [160]) as a grey circle next to the species name. The light grey fraction represents TE content, and the size of the circle reflects the size of the genome. The number of conserved orthologous genes is shown at each tree node when estimates are available or can be inferred from the literature (see below for references). Shared lncRNA numbers in tetrapods are from [3] and the pan-vertebrate lncRNA count (n=29) is from [12]. In eutherians (placental mammals), shared numbers are also extrapolated from [60, 63], and variations between studies are shown using a darker blue circle. The number of lncRNA genes shared between Drosophila and mosquito is extrapolated from [67], and the 42 syntenic lncRNAs between Drosophila and vertebrates are from [5]. Beyond ribosomal RNA genes, we are only aware of a single lncRNA conserved across nearly all eukaryotes, the telomeric RNA TERRA [161-163]. References for lncRNA gene numbers are as follows: human, Gencode v19, Dec 2013, GRCh37 - Ensembl 74 [2] and [3, 164]; chimpanzee, macaque [3]; mouse, Gencode v2, Dec 2013, GRCm38 - Ensembl 74 [2] and [3, 164]; rat and cow, lncRNA content was estimated to be similar to that of related organisms, based on consistent numbers from single-tissue analyses (liver for rat [63], skin [165] and muscle [166] for cow [see also 167]) and data for the organs of other mammals [3]; opossum [3]; chicken [3, 167]; frog [3]; zebrafish [12, 164, 167, 168]; nematode [167, 169]; Drosophila [5, 6]; mosquito, >1000 lncRNA genes are shown on the figure, given the first estimates of lncRNA content in Drosophila and because the 633 lncRNAs identified so far were obtained with very strict cut-offs [199]; yeast [167]; Ganoderma lucidum [170]; Plasmodium [171]; Arabidopsis [7]; maize [8, 9]. Estimates from [3] include projected annotations (see Extended Table 2 and Supp. Methods in ref. [3]). See also [4] for more details about most lncRNA datasets. Protein-coding gene numbers for each species are from the corresponding genome papers, updated using release 75 of Ensembl [172]. References for the estimation of shared protein-coding genes are as follows: eutherians [173-175]; amniotes to vertebrates [176-179]; Drosophila-mosquito [180]; yeast to G. lucidum [181]; 237 P. falciparum proteins show strong matches to proteins in eukaryotic genomes [160].
B. Limited overlap between lncRNA catalogs obtained from different sources. The Venn diagrams show the amount of overlap between different lncRNA gene catalogs obtained for the same species. References: Drosophila melanogaster: [5, 6]. Human: [2, 27] [see 49].
Figure 2. lncRNA classification
LncRNA annotation is a challenging task under active development [reviewed in 182]. Here we illustrate a subset of the many non-mutually exclusive criteria that may be used to classify lncRNAs. (A) Genomic context. lncRNAs may be divided based on their position and orientation relative to protein-coding genes: for instance, overlapping (genic) or not overlapping (intergenic: lincRNA) protein-coding genes [see 1, 11, 27]. (B) Chromatin context. Different populations can be defined by distinct chromatin marks around their transcription start site. For instance, enhancer-associated (elncRNA) and promoter-associated (plncRNA) lncRNAs are characterized by mono- vs tri-methylation of lysine 4 of histone H3, respectively (K4me1 and K4me3) [29, 51]. This information can be combined with genomic context to further classify lncRNAs. For example, some intragenic lncRNAs, named meRNAs (multiexonic polyA+ RNAs), originate from active enhancers lying within protein-coding genes [110]. (C) Subcellular localization. Cellular fractionation and hybridization techniques can reveal whether lncRNAs are differentially located or accumulate in the nucleus or the cytoplasm [1] or in other sub-organellar compartments such as nuclear paraspeckles [e.g. 183] or cytosolic ribosomal complexes [e.g. 26]. (D) RNA structure and motifs. Some lncRNAs may be grouped according to shared structural features and motifs. For instance, several lncRNAs, typified by MALAT1, are characterized by the formation of triple-helical structures at their 3’ end [184]. These structures and motifs are important for the stabilization, subcellular localization, and function of these lncRNAs. For example, a small motif involved in restricting lncRNA localization to the nucleus was identified [185]. (E) Processing. Some lncRNAs can be precursors of smaller RNA species such as piRNAs, miRNAs or snoRNAs [186-188]. For example, the BORDERLINE lncRNA is a precursor to small RNAs involved in demarcating an epigenetically distinct chromosomal domain in S. pombe [189]. It has also been shown that in yeast distinct lncRNA classes are sorted during 3’ end formation [190]. (F) Function. Reminiscent of Gene Ontology classification, lncRNAs may be grouped according to (i) their molecular activities (e.g. chromatin modification, competitive endogenous loci [see for review], architectural, etc.) or (ii) the cellular/biological processes they are involved in, such as cell differentiation [e.g. 192], senescence [e.g. 193], circadian clock [e.g. 194], cell cycle regulation [reviewed in 195], pluripotency [e.g. 17, 31, 123], and innate immunity [196]. lncRNAs may also be classified based on their association with certain disease groups or states, such as neurological disorders [reviewed in 197] or cancer [198].
Figure 3. Stabilization of newly born transcripts
A to C. Models for lncRNA birth. Grey line: DNA. Purple: noncoding transcripts. The arrow on the left denotes progression in time. A. Transcription of unstable and short noncoding RNAs (e.g. PROMPT) from a bidirectional promoter (divergent transcription in the antisense direction from a protein-coding gene, brown ellipse) or from a newly inserted TE (orange box). B. Both transcripts represented in A may elongate through gain of 5’ splice sites and/or loss of polyadenylation sites [91]. C. Acquisition of splicing signals further stabilizes the transcript.
Figure 4. TE involvement in lncRNA turnover
The figure represents the “TE first” and “lncRNA first” models. On the left, phylogenetic relationships between four hypothetical species are represented along with four independent waves of TE invasion (filled and numbered triangles, as follows: 1, brown; 2, orange; 3, pink; 4, yellow). In the three other panels, filled boxes of the same colors represent TEs after insertion. At locus A, the “TE first” model is schematized by a transcript born after TE invasions. The orange TE provides the TSS, and some TE material corresponding to a more ancient invasion (brown) could be coopted as well. At locus B, the “lncRNA first” model (the origin of the lncRNA predates TE incorporation) is schematized by transposons integrating into or close to lncRNAs. This can lead to transcript alterations: birth of an alternative lncRNA that may or may not replace the originally shared lncRNA (pink), or death of the lncRNA by disruption of its cis-regulatory sequences (yellow). The two models are non-exclusive and, because of continuous turnover, can produce a complicated evolutionary picture; for example, lineage-specific TEs could insert close to the lncRNA represented at locus A and alter it. LncRNA exons are represented as boxes filled in light grey, and an arrow marks the TSS. Grey lines represent genomic DNA.

References

    1. Derrien T, et al. The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Research. 2012;22(9):1775–1789.
    2. Harrow J, et al. GENCODE: The reference human genome annotation for The ENCODE Project. Genome Research. 2012.
    3. Necsulea A, et al. The evolution of lncRNA repertoires and expression patterns in tetrapods. Nature. 2014;505(7485):635–640.
    4. Ulitsky I, Bartel DP. lincRNAs: genomics, evolution, and mechanisms. Cell. 2013;154(1):26–46.
    5. Young RS, et al. Identification and properties of 1,119 candidate lincRNA loci in the Drosophila melanogaster genome. Genome Biology and Evolution. 2012;4(4):427–442.
