G3 (Bethesda). 2016 Sep 8;6(9):2881-91.
doi: 10.1534/g3.116.030338.

A Genomic Analysis of Factors Driving lincRNA Diversification: Lessons from Plants

Andrew D L Nelson et al. G3 (Bethesda).

Abstract

Transcriptomic analyses from across eukaryotes indicate that most of the genome is transcribed at some point in the developmental trajectory of an organism. One class of these transcripts is termed long intergenic noncoding RNAs (lincRNAs). Recently, attention has focused on understanding the evolutionary dynamics of lincRNAs, particularly their conservation within genomes. Here, we take a comparative genomic and phylogenetic approach to uncover factors influencing lincRNA emergence and persistence in the plant family Brassicaceae, to which Arabidopsis thaliana belongs. We searched 10 genomes across the family for evidence of > 5000 lincRNA loci from A. thaliana. From loci conserved in the genomes of multiple species, we built alignments and inferred phylogenies. We then used gene tree/species tree reconciliation to examine the duplication history and timing of emergence of these loci. Emergence of lincRNA loci appears to be linked to local duplication events but, surprisingly, not to whole genome duplication (WGD) events or transposable elements. Interestingly, WGD events are associated with the loss of loci in species that have undergone relatively recent polyploidy. Lastly, we identify 1180 of the 6480 previously annotated A. thaliana lincRNA loci (18%) as having elevated levels of conservation. These conserved lincRNAs show higher expression and are enriched for stress responsiveness and for cis-regulatory motifs known as conserved noncoding sequences (CNSs). These data highlight potential functional pathways and suggest that CNSs may regulate neighboring genes at both the genomic and transcriptomic level. In sum, we provide insight into processes that may influence lincRNA diversification by providing an evolutionary context for previously annotated lincRNAs.

Keywords: Brassicaceae; comparative genomics; evolution; lincRNA; transcriptomics.
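The conservation rule stated in the Figure 3 caption (sequence homologs in ≥ four species, with at least one of them in Lineage II) can be written as a small predicate. This is an illustrative sketch, not the authors' code: the function names and species labels are hypothetical, and it assumes the count of four excludes A. thaliana itself, which the caption does not state explicitly.

```python
from typing import Dict, Iterable, Set


def is_conserved(species_with_homologs: Iterable[str],
                 lineage_ii: Set[str],
                 min_species: int = 4) -> bool:
    """Apply the Figure 3 rule to one AtlincRNA locus.

    Assumption: `species_with_homologs` lists species other than
    A. thaliana in which a homolog was found.
    """
    found = set(species_with_homologs)
    # Rule 1: homologs in at least `min_species` species.
    enough_species = len(found) >= min_species
    # Rule 2: at least one homolog in the opposite lineage (Lineage II).
    crosses_lineage = not found.isdisjoint(lineage_ii)
    return enough_species and crosses_lineage


def fraction_conserved(loci: Dict[str, Iterable[str]],
                       lineage_ii: Set[str]) -> float:
    """Fraction of loci (locus id -> species with homologs) passing the rule."""
    if not loci:
        return 0.0
    passing = sum(is_conserved(species, lineage_ii) for species in loci.values())
    return passing / len(loci)


# Hypothetical labels: "I_*" for Lineage I species, "II_*" for Lineage II.
lineage_ii = {"II_a", "II_b"}
loci = {
    "lnc1": ["I_a", "I_b", "I_c", "II_a"],  # passes both rules
    "lnc2": ["I_a", "I_b", "I_c", "I_d"],   # no Lineage II homolog
    "lnc3": ["I_a", "II_b"],                # too few species
}
print(fraction_conserved(loci, lineage_ii))
```

Note that a locus like `lnc2`, with four homologs but none in Lineage II, fails the predicate, yet by the caption's wording ("< four sequence homologs") it is not strictly "nonconserved" either; a full pipeline would likely need a third category for such loci.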

Figures

Figure 1
Schematic representation for identification, clustering, and phylogenetic analysis of AtlincRNAs and their homologous loci. (A) Species analyzed within Brassicaceae: a chronogram of the Brassicaceae species, and the outgroup T. hassleriana, used in this study. Lineages I and II are indicated in red. The number of homologous AtlincRNA loci detected in each species is shown. (B) General scheme for identifying AtlincRNA sequence homologs in other species. The Liu et al. (2012) lincRNA dataset (dark green box denoted by “Q”) was used as the query in a reciprocal BLAST of genomes (round colored circles). Overlap between identified sequence homologs and known gene datasets (colored triangles) was determined for annotation purposes. Homologous sequences, along with available annotations, were extracted and aligned. Colored lines represent sequences and are color coded to match the genomes from which they were extracted. (C) Phylogenetic inferences of conserved AtlincRNA families. Currently accepted species relationships are shown in the center, with the two lineages indicated. A red asterisk marks the last common ancestor of lineage I and II species. Aligned AtlincRNA families were filtered according to the conservation criteria shown. The number of conserved AtlincRNA families (all species combined) passing each phylogenetic analysis step is listed.
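Panel B describes a reciprocal BLAST search. One common way to implement the reciprocal filter is a reciprocal-best-hit check on BLAST+ tabular output (`-outfmt 6`, whose 12 default columns end with `evalue` and `bitscore`). This is a sketch under stated assumptions, not the authors' pipeline: it assumes the forward hits' subject regions were extracted and searched back against A. thaliana under the same identifiers, and the `max_evalue` default of 1e-10 is a hypothetical placeholder, since this section only reports the relaxed 1e-5 cutoff used for Figure 6.

```python
from typing import Dict, Iterable, Set, Tuple


def best_hits(tab_lines: Iterable[str], max_evalue: float = 1e-10) -> Dict[str, str]:
    """Best-scoring subject per query from BLAST `-outfmt 6` lines.

    Columns: qseqid sseqid pident length mismatch gapopen
             qstart qend sstart send evalue bitscore
    """
    best: Dict[str, Tuple[float, str]] = {}
    for line in tab_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # hypothetical significance cutoff
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject)
    return {query: subject for query, (_, subject) in best.items()}


def reciprocal_best_hits(forward: Iterable[str],
                         reverse: Iterable[str]) -> Set[Tuple[str, str]]:
    """Pairs (lincRNA, region) in which each is the other's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {(q, s) for q, s in fwd.items() if rev.get(s) == q}
```

Choosing the best hit by bit score rather than e-value avoids ties among hits whose e-values all underflow to zero.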
Figure 2
Comparison of conserved homologous lincRNA loci in select mammals and Brassicaceae. (A) Percent of human (H. sap) lncRNA homologs identified in close relatives. The percentage of recovered loci is shown next to each bar. The accepted organismal phylogeny and estimated divergence times for these three species were derived from Arnason et al. (2008) and are shown at left to allow direct comparison. (B) Percentage of homologous loci recovered for AtlincRNAs (green), genic (yellow), and intergenic (blue) loci using a search protocol similar to that used for human lncRNAs. Species names are abbreviations of those shown in Figure 1A. The percentage is shown next to each bar. Divergence times and species phylogeny were obtained from Beilstein et al. (2010).
Figure 3
Features enriched in conserved AtlincRNAs. (A) Percent of AtlincRNA loci overlapping with conserved noncoding sequences (CNSs) as defined by Haudry et al. (2013). Conserved AtlincRNA loci are defined as having sequence homologs in ≥ four species, with at least one species in the opposite lineage (i.e., Lineage II). Nonconserved AtlincRNAs are those with < four sequence homologs. ** P-value < 0.001. (B) Box-and-whisker plot of expression values for AtlincRNA families with homologous loci identified in increasingly divergent species. Expression is given as the average FPKM (fragments per kilobase of exon per million fragments mapped) across four tissues [flowers, leaves, siliques, and roots; values from Liu et al. (2012)], plotted on a logarithmic scale. Transcription data were available for 2666 AtlincRNAs. The number of families with representatives at each divergence time point is listed. Divergence times correspond to those shown in Figure 1A. Pearson's correlation coefficient (CC, top left) was calculated, and a linear regression analysis was performed to determine the statistical significance of this coefficient. (C) Percent of all nonconserved (orange) and conserved (blue) AtlincRNA families with miRNA binding motifs. (D) Percent of stress-responsive AtlincRNAs out of the total number of AtlincRNAs conserved to each node (nodes are indicated by the divergence dates along the x-axis). The actual number of stress-responsive AtlincRNAs is shown above each bar. Where shown, *** indicates P-value < 0.0001 relative to the A. thaliana-specific lincRNAs (node 1).
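The FPKM values in panel B were taken from Liu et al. (2012), but the unit itself is simple to compute: fragments divided by exon length in kilobases and by library size in millions of mapped fragments. The sketch below assumes the four-tissue figure is an arithmetic mean of per-tissue FPKM values; the caption does not specify the averaging scheme, and the function names and counts are hypothetical.

```python
from statistics import mean
from typing import Dict


def fpkm(fragments: int, exon_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of exon per million fragments mapped."""
    if exon_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("exon length and library size must be positive")
    kilobases = exon_length_bp / 1_000
    million_fragments = total_mapped_fragments / 1_000_000
    return fragments / kilobases / million_fragments


def mean_fpkm(fragments_by_tissue: Dict[str, int],
              exon_length_bp: int,
              library_size_by_tissue: Dict[str, int]) -> float:
    """Arithmetic mean of per-tissue FPKM (assumed averaging scheme)."""
    return mean(
        fpkm(count, exon_length_bp, library_size_by_tissue[tissue])
        for tissue, count in fragments_by_tissue.items()
    )


# Hypothetical counts for a 2 kb lincRNA, 10 million mapped fragments per tissue.
tissues = ("flowers", "leaves", "siliques", "roots")
counts = {"flowers": 500, "leaves": 100, "siliques": 300, "roots": 100}
libraries = {tissue: 10_000_000 for tissue in tissues}
print(mean_fpkm(counts, 2_000, libraries))
```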
Figure 4
Transposable element (TE) content in AtlincRNAs. (A) Percent of lincRNAs and coding sequences from A. thaliana that overlap ≥ 10 bp of a TE, as determined by RepeatMasker. The actual percentage is shown above each bar. (B) Connection between TEs and AtlincRNA emergence. AtlincRNAs were binned by their inferred time of emergence [shown on the x-axis in millions of years ago (Mya)]. TE content, either within or adjacent to the lincRNA, was determined for the AtlincRNAs in each bin.
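The ≥ 10 bp overlap criterion in panel A reduces to an interval intersection test. This sketch assumes 1-based, inclusive coordinates (as in RepeatMasker `.out` and GFF files) and that the 10 bp must come from a single TE rather than being summed across several; the function names are hypothetical. A genome-wide scan would sort the intervals or use an interval tree rather than the linear search shown here.

```python
from typing import List, Tuple

# (chromosome, start, end) with 1-based, inclusive coordinates (assumption)
Interval = Tuple[str, int, int]


def overlap_bp(a: Interval, b: Interval) -> int:
    """Number of base pairs shared by two intervals (0 if disjoint)."""
    if a[0] != b[0]:
        return 0  # intervals on different chromosomes never overlap
    return max(0, min(a[2], b[2]) - max(a[1], b[1]) + 1)


def overlaps_te(locus: Interval, tes: List[Interval], min_bp: int = 10) -> bool:
    """True if any single TE covers at least `min_bp` of the locus."""
    return any(overlap_bp(locus, te) >= min_bp for te in tes)


def percent_with_te(loci: List[Interval], tes: List[Interval], min_bp: int = 10) -> float:
    """Percent of loci that overlap a TE by at least `min_bp`."""
    if not loci:
        return 0.0
    hits = sum(overlaps_te(locus, tes, min_bp) for locus in loci)
    return 100.0 * hits / len(loci)
```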
Figure 5
Inferred timing of duplication and duplication dependence in the conserved AtlincRNA families. Timing of duplications within A. thaliana conserved lincRNA families. The data represent duplications that occur along the backbone leading to the AtlincRNA (blue bar). Duplications are shown per node, with approximate divergence times (Mya) shown in red. The number of duplication events per node is shown in green circles. The number of AtlincRNA families with duplications per node is shown in purple circles. Some families contained a duplication at multiple nodes and were therefore counted multiple times. Overall, 296 AtlincRNA loci showed evidence of at least one duplication event, some of them more than once. The total number of duplications is shown below.
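Duplication timing of this kind is commonly inferred with LCA (lowest common ancestor) reconciliation: each gene-tree node is mapped to the LCA of its children's species-tree mappings, and a node that maps to the same species-tree node as one of its children is called a duplication. The sketch below implements that textbook rule; the reconciliation software used in the study is not named in this section, and the tree encoding and species labels (`spA`, `n1`, and so on) are hypothetical. It tallies both quantities shown in the figure: duplication events per node and families with at least one duplication per node.

```python
from collections import Counter
from typing import Dict, List, Tuple, Union

# A rooted binary gene tree: a leaf is "species:gene_id", an internal node a pair.
GeneTree = Union[str, Tuple["GeneTree", "GeneTree"]]


def path_to_root(node: str, parent: Dict[str, str]) -> List[str]:
    """Species-tree nodes from `node` up to the root (inclusive)."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(a: str, b: str, parent: Dict[str, str]) -> str:
    """Lowest common ancestor of two species-tree nodes."""
    ancestors_of_b = set(path_to_root(b, parent))
    for node in path_to_root(a, parent):
        if node in ancestors_of_b:
            return node
    raise ValueError(f"{a} and {b} are not in the same species tree")


def reconcile(tree: GeneTree, parent: Dict[str, str], duplications: List[str]) -> str:
    """LCA-map `tree` onto the species tree, appending the species-tree
    node of every inferred duplication to `duplications`."""
    if isinstance(tree, str):
        return tree.split(":", 1)[0]  # a leaf maps to its own species
    left, right = tree
    m_left = reconcile(left, parent, duplications)
    m_right = reconcile(right, parent, duplications)
    m = lca(m_left, m_right, parent)
    # Mapping to the same species-tree node as a child marks a duplication.
    if m in (m_left, m_right):
        duplications.append(m)
    return m


def tally(families: Dict[str, GeneTree],
          parent: Dict[str, str]) -> Tuple[Counter, Counter]:
    """Duplication events per node, and families with >= 1 duplication per node."""
    events: Counter = Counter()
    families_per_node: Counter = Counter()
    for tree in families.values():
        dups: List[str] = []
        reconcile(tree, parent, dups)
        events.update(dups)
        families_per_node.update(set(dups))  # a family counts once per node
    return events, families_per_node


# Hypothetical species tree ((spA, spB) n1, spC) n2, as child -> parent links.
parent = {"spA": "n1", "spB": "n1", "n1": "n2", "spC": "n2"}
```

Unlike the figure, the sketch does not restrict counts to the backbone leading to A. thaliana; keeping only `events` entries whose node appears in `path_to_root` for the A. thaliana species label would approximate that. It also treats each gene tree as correct, with no handling of topological uncertainty.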
Figure 6
Sequence loss and decay events in the conserved AtlincRNA families. (A) Strategy for inferring sequence decay or loss for the absent loci in the conserved AtlincRNA families, using a less stringent BLASTN cutoff (1e-5) and synteny. (B) Bar graph of the percent (out of a total of 1023) of lincRNA loci experiencing sequence decay in the species listed. Pairwise comparisons of the proportion of lost or decayed loci were performed between all species using a score test for a difference of binomial proportions. Species that, after a Bonferroni correction, were not significantly different from one another were grouped. (C) Bar graph of the percent (out of a total of 1023) of lincRNA loci experiencing loss in the species listed. Raw numbers are shown in File S2. Light blue bars depict the level of loss and decay observed for protein-coding loci.
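Panels B and C compare proportions with a score test for a difference of binomial proportions followed by a Bonferroni correction. For testing equal proportions (a difference of zero), the score statistic without continuity correction reduces to the pooled-variance two-proportion z-test, which the sketch below uses. The family-wise α of 0.05, the function names, and the example counts are assumptions; each proportion uses the 1023 loci from the caption as its denominator, and the subsequent grouping of non-different species is left out.

```python
import math
from itertools import combinations
from typing import Dict, Tuple


def score_test(x1: int, n1: int, x2: int, n2: int) -> Tuple[float, float]:
    """Score (pooled z) test of H0: p1 == p2; returns (z, two-sided p)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        return 0.0, 1.0  # all successes or all failures: nothing to test
    z = (x1 / n1 - x2 / n2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value


def pairwise_bonferroni(counts: Dict[str, int], total: int,
                        alpha: float = 0.05) -> Dict[Tuple[str, str], Tuple[float, bool]]:
    """For every species pair: (Bonferroni-adjusted p, significant?)."""
    pairs = list(combinations(sorted(counts), 2))
    m = len(pairs)  # number of pairwise comparisons
    results = {}
    for a, b in pairs:
        _, p = score_test(counts[a], total, counts[b], total)
        p_adj = min(1.0, p * m)
        results[(a, b)] = (p_adj, p_adj < alpha)
    return results


# Hypothetical numbers of decayed loci per species, out of 1023.
print(pairwise_bonferroni({"a": 100, "b": 102, "c": 300}, 1023))
```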

References

    1. Adrian J., Farrona S., Reimer J. J., Albani M. C., Coupland G., et al., 2010. cis-regulatory elements and chromatin state coordinately control temporal and spatial expression of FLOWERING LOCUS T in Arabidopsis. Plant Cell 22(5): 1425–1440.
    2. Altschul S. F., Gish W., Miller W., Myers E. W., Lipman D. J., 1990. Basic local alignment search tool. J. Mol. Biol. 215(3): 403–410.
    3. Arnason U., Adegoke J. A., Gullberg A., Harley E. H., Janke A., et al., 2008. Mitogenomic relationships of placental mammals and molecular estimates of their divergences. Gene 421(1–2): 37–51.
    4. Beilstein M. A., Al-Shehbaz I. A., Kellogg E. A., 2006. Brassicaceae phylogeny and trichome evolution. Am. J. Bot. 93(4): 607–619.
    5. Beilstein M. A., Nagalingum N. S., Clements M. D., Manchester S. R., Mathews S., 2010. Dated molecular phylogenies indicate a Miocene origin for Arabidopsis thaliana. Proc. Natl. Acad. Sci. USA 107(43): 18724–18728.