Comparative Study
Genet Sel Evol. 2018 Apr 24;50(1):20.
doi: 10.1186/s12711-018-0391-0.

Cross-species inference of long non-coding RNAs greatly expands the ruminant transcriptome

Stephen J Bush et al. Genet Sel Evol.

Abstract

Background: mRNA-like long non-coding RNAs (lncRNAs) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue specificity and/or at specific developmental stages. Thus, in many cases lncRNA detection by RNA-sequencing (RNA-seq) is compromised by stochastic sampling. To account for this and create a catalogue of ruminant lncRNAs, we compared de novo assembled lncRNAs derived from large RNA-seq datasets in transcriptional atlas projects for sheep and goats with lncRNAs previously assembled in cattle and human. We then combined the novel lncRNAs with the sheep transcriptional atlas to identify co-regulated sets of protein-coding and non-coding loci.
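The effect of stochastic sampling on low-abundance transcripts can be illustrated with a simple model that is an assumption here, not the authors' method: the number of reads a library yields for a transcript is treated as Poisson-distributed with mean λ, and the transcript is assumed to be reconstructable only if it receives at least k reads, so P(detect) = 1 − Σ_{i<k} e^(−λ) λ^i / i!. Because independent Poisson counts add, pooling libraries simply increases λ. The function name, the read threshold and the read counts below are hypothetical.

```python
import math

def detection_probability(expected_reads: float, min_reads: int) -> float:
    """P(X >= min_reads) for X ~ Poisson(expected_reads)."""
    term = math.exp(-expected_reads)  # P(X = 0)
    below = 0.0                       # running total of P(X < min_reads)
    for i in range(min_reads):
        below += term
        term *= expected_reads / (i + 1)  # P(X = i + 1) derived from P(X = i)
    return 1.0 - below

# Hypothetical numbers: about 3 reads per library for a low-abundance lncRNA,
# and an assumed minimum of 5 reads for a full reconstruction.
single = detection_probability(3.0, 5)
pooled = detection_probability(4 * 3.0, 5)  # four libraries pooled: the means add
print(f"one library: {single:.3f}; four pooled: {pooled:.3f}")
# → one library: 0.185; four pooled: 0.992
```

Under these assumed numbers, a single library recovers the transcript less than one time in five, whereas pooling four comparable libraries recovers it almost every time, which mirrors the qualitative argument for combining datasets.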

Results: Few lncRNAs could be reproducibly assembled from a single dataset, even with deep sequencing of the same tissues from multiple animals. Furthermore, there was little sequence overlap between lncRNAs that were assembled from pooled RNA-seq data. We combined positional conservation (synteny) with cross-species mapping of candidate lncRNAs to identify a consensus set of ruminant lncRNAs and then used the RNA-seq data to demonstrate detectable and reproducible expression in each species. In sheep, 20 to 30% of lncRNAs were located close to protein-coding genes with which they are strongly co-expressed, which is consistent with the evolutionary origin of some ncRNAs in enhancer sequences. Nevertheless, most of the lncRNAs are not co-expressed with neighbouring protein-coding genes.
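One way to test whether an lncRNA is co-expressed with a neighbouring protein-coding gene is sketched below. It assumes a hypothetical `Locus` record holding coordinates and a per-sample expression profile, pairs each lncRNA with the nearest protein-coding gene on the same chromosome, and reports the pair when the Pearson correlation reaches a cut-off. The 10 kb distance limit and the r ≥ 0.8 cut-off are illustrative placeholders, not values taken from the study.

```python
import math
from dataclasses import dataclass

@dataclass
class Locus:
    name: str
    chrom: str
    start: int          # 1-based, inclusive
    end: int            # 1-based, inclusive
    is_coding: bool     # True for protein-coding genes, False for lncRNAs
    expression: list    # one expression estimate per sample

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant profile; treated here as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def gap(a, b):
    """Base pairs separating two intervals on one chromosome (0 if they overlap)."""
    return max(0, max(a.start, b.start) - min(a.end, b.end))

def coexpressed_neighbours(loci, max_distance=10_000, min_r=0.8):
    """(lncRNA, nearest coding gene, r) for pairs within max_distance with r >= min_r."""
    coding = [g for g in loci if g.is_coding]
    hits = []
    for lnc in (l for l in loci if not l.is_coding):
        candidates = [g for g in coding if g.chrom == lnc.chrom]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda g: gap(lnc, g))
        if gap(lnc, nearest) > max_distance:
            continue
        r = pearson(lnc.expression, nearest.expression)
        if r >= min_r:
            hits.append((lnc.name, nearest.name, round(r, 3)))
    return hits

loci = [
    Locus("lnc1", "1", 1_000, 2_000, False, [1, 2, 3, 4]),
    Locus("geneA", "1", 2_500, 5_000, True, [2, 4, 6, 8.5]),
    Locus("lnc2", "1", 100_000, 101_000, False, [5, 1, 4, 2]),  # no coding gene nearby
]
print(coexpressed_neighbours(loci))  # → [('lnc1', 'geneA', 0.998)]
```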

Conclusions: Alongside substantially expanding the ruminant lncRNA repertoire, the outcomes of our analysis demonstrate that stochastic sampling can be partly overcome by combining RNA-seq datasets from related species. This has practical implications for the future discovery of lncRNAs in other species.


Figures

Fig. 1
Minimal overlap of lncRNAs at the sequence level. Venn diagrams show the number of sheep (a and c) or goat (b and d) lncRNAs that can be aligned, either with an alignment of any length or quality (a and b) or with ≥ 50% identity over ≥ 50% of the length of the target sequence (c and d), to the shortlist of goat (a and c) or sheep (b and d) lncRNAs and to sets of cattle and human lncRNAs from previous studies. A large proportion (58% of sheep lncRNAs and 49% of goat lncRNAs) have no associated alignment. Alignments are detailed in Additional file 1: Table S10 (sheep) and Additional file 1: Table S11 (goat).
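The stringent criterion in panels c and d (≥ 50% identity over ≥ 50% of the target length) can be expressed as a filter over alignment records. The sketch below assumes a hypothetical `Alignment` record whose fields loosely mirror BLAST-style tabular output (percent identity plus target start, end and length); the aligner the authors used is not stated here. Coordinates are treated as 1-based and inclusive, and a reversed start/end pair (a minus-strand hit) is handled with `abs()`.

```python
from dataclasses import dataclass

@dataclass
class Alignment:
    query: str            # e.g. a sheep lncRNA ID (hypothetical)
    target: str           # e.g. a goat, cattle or human lncRNA ID (hypothetical)
    identity: float       # percent identity over the aligned region, 0-100
    target_start: int     # 1-based, inclusive
    target_end: int       # 1-based, inclusive; may be < target_start on the minus strand
    target_length: int    # full length of the target sequence

def target_coverage(aln: Alignment) -> float:
    """Fraction of the target sequence spanned by the alignment."""
    return (abs(aln.target_end - aln.target_start) + 1) / aln.target_length

def passes_stringent(aln: Alignment, min_identity=50.0, min_coverage=0.5) -> bool:
    """True if the alignment meets both the identity and the target-coverage cut-off."""
    return aln.identity >= min_identity and target_coverage(aln) >= min_coverage

def queries_with_hits(alignments, stringent=False):
    """Queries with at least one alignment (any quality, or stringent hits only)."""
    return {a.query for a in alignments if not stringent or passes_stringent(a)}

alignments = [
    Alignment("sheep_lnc1", "goat_lnc7", 82.0, 1, 600, 1000),    # passes both cut-offs
    Alignment("sheep_lnc2", "goat_lnc9", 95.0, 1, 200, 1000),    # covers too little of the target
    Alignment("sheep_lnc3", "cattle_lnc4", 40.0, 900, 1, 1000),  # identity too low
]
print(sorted(queries_with_hits(alignments)))                  # any-quality hits (cf. panels a, b)
print(sorted(queries_with_hits(alignments, stringent=True)))  # stringent hits (cf. panels c, d)
```

Counting queries absent from both sets would then give the "no associated alignment" fraction reported in the caption.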
Fig. 2
Stochastic detection and assembly of lncRNAs by RNA-seq libraries. These results, a consequence of limited sequencing breadth and depth, suggest that only a subset of a species' total lncRNA transcriptome is likely to be captured. Nevertheless, the number of candidate lncRNAs for that species can be increased by directly mapping lncRNAs from a related (sheep, goat, cattle) or more distant (human) species to positionally conserved regions of its genome. Many of these mapped lncRNAs, although they could not be fully reconstructed from that species' own RNA-seq libraries, are still detectably expressed.
Fig. 3
Proportion of sheep expression atlas samples from which a candidate lncRNA cannot be fully reconstructed. The sheep expression atlas comprises 429 RNA-seq libraries representing 110 distinct samples, where each sample is a tissue or cell type at a given developmental stage, with up to six replicates per sample. Twenty-two candidate lncRNAs cannot be reconstructed from any individual sample (i.e., the proportion of samples is 100%); these lncRNAs could be assembled only after pooling data from multiple samples. Data for this figure are in Additional file 1: Table S22.
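The quantity plotted here reduces to simple counting once libraries are grouped into samples. This sketch assumes that an lncRNA counts as reconstructed in a sample if any one of that sample's replicate libraries reconstructs it in full; it then computes, per lncRNA, the fraction of samples lacking a full reconstruction, and lists the lncRNAs at 100% (those recoverable only by pooling). All library and sample identifiers are hypothetical.

```python
from collections import defaultdict

def group_by_sample(library_to_sample):
    """Map each sample (tissue/cell type at a developmental stage) to its replicate libraries."""
    groups = defaultdict(set)
    for library, sample in library_to_sample.items():
        groups[sample].add(library)
    return groups

def unreconstructed_fraction(reconstructed_libraries, library_to_sample):
    """Fraction of samples in which no replicate library fully reconstructs the lncRNA."""
    groups = group_by_sample(library_to_sample)
    missing = sum(1 for libs in groups.values() if not libs & reconstructed_libraries)
    return missing / len(groups)

def pooling_only(lnc_to_libraries, library_to_sample):
    """lncRNAs not reconstructed in any single sample (fraction == 1.0)."""
    return sorted(lnc for lnc, libs in lnc_to_libraries.items()
                  if unreconstructed_fraction(libs, library_to_sample) == 1.0)

library_to_sample = {"L1": "liver_adult", "L2": "liver_adult",
                     "L3": "brain_adult", "L4": "brain_adult", "L5": "ileum_lamb"}
lnc_to_libraries = {"lncX": {"L2"}, "lncY": set()}  # libraries giving a full reconstruction
print(unreconstructed_fraction(lnc_to_libraries["lncX"], library_to_sample))  # → 0.6666666666666666
print(pooling_only(lnc_to_libraries, library_to_sample))                      # → ['lncY']
```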
Fig. 4
3D visualisation of a gene-to-gene correlation graph. Each node (sphere) represents a gene. Two nodes are connected by an edge (line) when the Pearson correlation coefficient between their expression level estimates is ≥ 0.95. The graph comprises 11,841 nodes and 2,214,099 edges. Genes cluster together according to the similarity of their expression profiles (i.e., their degree of co-expression), with clusters (coloured sets of nodes) determined using the Markov clustering (MCL) algorithm. Expression level estimates for the lncRNAs in this graph are in Additional file 1: Table S19, and the genes comprising each co-expression cluster are in Additional file 1: Table S23. lncRNAs that are co-regulated with protein-coding genes fall within the same co-expression cluster as those genes.
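A graph of this kind can be approximated in miniature: connect two genes when their expression profiles have Pearson r ≥ 0.95, then partition the graph with Markov clustering, which alternates expansion (squaring a column-stochastic matrix) and inflation (raising each entry to a power greater than 1 and renormalising the columns). The sketch below is a minimal textbook MCL in pure Python, with self-loops added and an assumed inflation of 2.0; it is not the software or parameter set used to build the published graph, and it is only practical for small matrices.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant profile; treated here as uncorrelated
    return sxy / math.sqrt(sxx * syy)

def correlation_edges(profiles, threshold=0.95):
    """Gene names plus index pairs whose profiles correlate at r >= threshold."""
    names = list(profiles)
    edges = [(i, j) for i, j in combinations(range(len(names)), 2)
             if pearson(profiles[names[i]], profiles[names[j]]) >= threshold]
    return names, edges

def normalize_columns(m):
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def mcl(n, edges, inflation=2.0, max_iter=100, tol=1e-9, cutoff=1e-6):
    """Minimal Markov clustering: alternate expansion and inflation until stable."""
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # self-loops
    for i, j in edges:
        m[i][j] = m[j][i] = 1.0
    m = normalize_columns(m)
    for _ in range(max_iter):
        expanded = matmul(m, m)                                                        # expansion
        inflated = normalize_columns([[v ** inflation for v in row] for row in expanded])  # inflation
        delta = max(abs(inflated[i][j] - m[i][j]) for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep mass are attractors; the columns they hold form one cluster.
    clusters = {frozenset(j for j in range(n) if m[i][j] > cutoff) for i in range(n)}
    return [sorted(c) for c in clusters if c]

def cluster_genes(profiles, threshold=0.95, inflation=2.0):
    names, edges = correlation_edges(profiles, threshold)
    return sorted(sorted(names[k] for k in c) for c in mcl(len(names), edges, inflation))

profiles = {  # toy profiles across five samples (hypothetical)
    "geneA": [1, 2, 3, 4, 5], "geneB": [2, 4, 6, 8, 10], "lncA": [1, 2, 3, 4, 6],
    "geneC": [5, 4, 3, 2, 1], "geneD": [10, 8, 6, 4, 2], "lncB": [5, 4, 3, 2, 0],
}
print(cluster_genes(profiles))
# → [['geneA', 'geneB', 'lncA'], ['geneC', 'geneD', 'lncB']]
```

With these toy profiles, the three rising profiles and the three falling profiles form two separate clusters, each grouping protein-coding genes with an lncRNA, which is the kind of co-regulation the caption describes.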
