PLoS One. 2010 Apr 23;5(4):e10316.
doi: 10.1371/journal.pone.0010316.

Mining mammalian transcript data for functional long non-coding RNAs

Amit N Khachane et al. PLoS One.

Abstract

Background: The role of long non-coding RNAs (lncRNAs) in controlling gene expression has garnered increased interest in recent years. Sequencing projects, such as FANTOM3 for mouse and H-InvDB for human, have generated abundant data on transcribed components of mammalian cells, the majority of which appear not to be protein-coding. However, much of the non-protein-coding transcriptome could merely be a consequence of 'transcription noise'. It is therefore essential to use bioinformatic approaches to identify the likely functional candidates in a high-throughput manner.

Principal findings: We derived a scheme for classifying and annotating likely functional lncRNAs in mammals. Using the available experimental full-length cDNA data sets for human and mouse, we identified 78 lncRNAs that are either syntenically conserved between human and mouse, or that originate from the same protein-coding genes. Of these, 11 have significant sequence homology. We found that these lncRNAs exhibit: (i) patterns of codon substitution typical of non-coding transcripts; (ii) preservation of sequences in distant mammals such as dog and cow; (iii) significant sequence conservation relative to their corresponding flanking regions (in 50% of cases, the flanking regions have no homology at all, and in the remaining cases the degree of conservation is significantly lower); (iv) existence mostly as single-exon forms (8/11); and (v) presence of conserved and stable secondary structure motifs within them. We further identified orthologous protein-coding genes that contribute to the pool of lncRNAs; among these, genes implicated in carcinogenesis are significantly over-represented.

Conclusion: Our comparative mammalian genomics approach coupled with evolutionary analysis identified a small population of conserved long non-protein-coding RNAs (lncRNAs) that are potentially functional across Mammalia. Additionally, our analysis indicates that amongst the orthologous protein-coding genes that produce lncRNAs, those implicated in cancer pathogenesis are significantly over-represented, suggesting that these lncRNAs could play an important role in cancer pathomechanisms.
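The over-representation claim can be illustrated with a one-sided hypergeometric test. The abstract does not state which statistical test the authors used, so the sketch below is only an assumption about one common way to assess such enrichment. The function name `hypergeom_enrichment_p` and all gene counts in the usage line are hypothetical and are not taken from the paper.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """One-sided hypergeometric p-value, P(X >= hits).

    population: total number of genes considered (hypothetical background)
    annotated:  genes in the background that carry the annotation
                (for example, implicated in cancer)
    sample:     genes in the set of interest (for example, lncRNA-producing)
    hits:       annotated genes observed in the set of interest
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the probability mass for every outcome at least as extreme as `hits`.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Toy numbers only: 10 genes, 4 annotated, 3 sampled, 2 of them annotated.
p_value = hypergeom_enrichment_p(10, 4, 3, 2)
```

A small p-value would indicate that the annotated category occurs in the gene set more often than expected by chance under this model; a real analysis would also need a multiple-testing correction across categories.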

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. A schematic representation of the discovery pipeline for conserved expressed long non-protein-coding RNAs (lncRNAs).
Figure 2. A schematic representation of the different genomic regions from which lncRNAs originate relative to the structure of a protein-coding gene.
Figure 3. A model for antisense regulation of target mRNA transcripts by lncRNAs.
The lncRNA sequences HIT000079026.8 and HIT000091723.8 are complementary to the UTRs of the protein-coding transcripts ENST00000393449 and ENST00000383790, respectively.
Figure 4. Assessment for protein-coding ability.
Comparison between Ka/Ks values of long ORFs derived from six-frame conceptual translation for human-mouse lncRNAs and orthologous neighboring protein-coding genes.
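The first step behind this comparison, extracting a long ORF from a six-frame scan, can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes an ORF runs from an ATG to the first in-frame stop codon, and the helper names `reverse_complement` and `longest_orf` are hypothetical. The Ka/Ks calculation itself requires a codon alignment between orthologous sequences and is not shown.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def longest_orf(seq):
    """Return the longest ATG-to-stop ORF found in any of the six frames.

    The returned string is the nucleotide sequence, stop codon included.
    An empty string means no complete ORF was found.
    """
    seq = seq.upper()
    best = ""
    # Three frames on the given strand, three on its reverse complement.
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            start = None
            for i in range(frame, len(strand) - 2, 3):
                codon = strand[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    orf = strand[start:i + 3]
                    if len(orf) > len(best):
                        best = orf
                    start = None
    return best


# Toy sequence with a single complete ORF in the third forward frame.
orf = longest_orf("CCATGAAATTTTAGGG")
```

Because both strands are scanned, passing the reverse complement of a sequence yields the same ORF as passing the sequence itself.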

