Comparative Study
BMC Bioinformatics. 2018 Nov 6;19(1):407.
doi: 10.1186/s12859-018-2441-6.

Detection of long non-coding RNA homology, a comparative study on alignment and alignment-free metrics

Teresa M R Noviello et al. BMC Bioinformatics. 2018.

Abstract

Background: Long non-coding RNAs (lncRNAs) are a novel class of non-coding RNAs that play a crucial role in many biological processes. Identifying lncRNA homologs across species is essential for investigating these roles in model organisms, because homologous genes tend to retain similar molecular and biological functions. Alignment-based metrics effectively capture the conservation of transcribed coding sequences, and hence the homology of protein-coding genes. However, unlike protein-coding genes, long non-coding genes show poor sequence conservation, which makes identifying their homologs a challenging task.
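To make concrete what an alignment-based metric measures, here is a minimal sketch of a Smith-Waterman local alignment score with a linear gap penalty. The scoring values (match = 2, mismatch = -1, gap = -2) and the function name `smith_waterman_score` are illustrative assumptions; this is not necessarily one of the specific alignment-based metrics evaluated in the study.

```python
def smith_waterman_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between sequences a and b (linear gap penalty).

    Scoring parameters are illustrative assumptions, not values from the study.
    """
    # prev holds the previous row of the dynamic-programming matrix.
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap          # a[i-1] aligned to a gap in b
            left = curr[j - 1] + gap    # b[j-1] aligned to a gap in a
            # Clamping at zero lets a new local alignment start anywhere.
            curr[j] = max(0, diag, up, left)
            best = max(best, curr[j])
        prev = curr
    return best
```

For instance, `smith_waterman_score("ACGTTT", "GGACGT")` returns 8, the score of the shared substring `ACGT`. Because the score depends on ordered, position-by-position matches, it tends to be low for sequences whose similarity is scattered rather than collinear.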

Results: In this study we compare alignment-based and alignment-free string similarity metrics and examine promoter regions as a possible source of conserved information. We show that promoter regions encode relevant information on the conservation of long non-coding genes across species, and that this information is better captured by alignment-free metrics. We test this hypothesis genome-wide in human, mouse, and zebrafish.
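One family of alignment-free metrics compares the sets of n-grams (substrings of n consecutive nucleotides) occurring in two sequences; the figure captions refer to a Jaccard similarity with n = 12. The sketch below assumes the Jaccard index is taken over sets of distinct n-grams, ignoring their counts and positions. The function names are hypothetical, and the sketch may differ in detail from the implementation in the authors' repository.

```python
def ngrams(seq: str, n: int) -> set[str]:
    """Set of distinct substrings of length n (n-grams) in seq, case-insensitive."""
    seq = seq.upper()
    # range() is empty when len(seq) < n, so short sequences yield no n-grams.
    return {seq[i:i + n] for i in range(len(seq) - n + 1)}


def jaccard_similarity(a: str, b: str, n: int = 12) -> float:
    """Jaccard index |A & B| / |A | B| between the n-gram sets of two sequences."""
    set_a, set_b = ngrams(a, n), ngrams(b, n)
    union = set_a | set_b
    if not union:
        # Both sequences are shorter than n: nothing to compare.
        return 0.0
    return len(set_a & set_b) / len(union)
```

With n = 3, `ACGTAC` and `ACGTTT` share the n-grams `ACG` and `CGT` out of six distinct n-grams in total, so `jaccard_similarity("ACGTAC", "ACGTTT", n=3)` returns 1/3. Since only n-gram membership matters, conserved short motifs contribute to the score even when their order or spacing differs between the two sequences.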

Conclusions: These results led us to postulate a new hypothesis: unlike protein-coding genes, long non-coding genes tend to preserve their regulatory machinery rather than their transcribed sequence. All datasets, scripts, and prediction tools used in this study are available at https://github.com/bioinformatics-sannio/lncrna-homologs.

Keywords: Homology; Long ncRNA; String similarity.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1
P-value barplot for the permutation test in Human-Mouse. -log10(p-values) estimated by a permutation test against a null distribution of random non-homologous pairs in Human-Mouse, on promoter (blue bars) and transcript sequences (red bars), for each considered metric. Homologous lncRNA pairs are ranked by the best prediction among metrics computed on promoter sequences. The x-axis reports the true homologous pairs for the two species.
Fig. 2
P-value barplot for the permutation test in Mouse-Zebrafish. -log10(p-values) estimated by a permutation test against a null distribution of random non-homologous pairs in Mouse-Zebrafish, on promoter (blue bars) and transcript sequences (red bars), for each considered metric. Homologous lncRNA pairs are ranked by the best prediction among metrics computed on promoter sequences. The x-axis reports the true homologous pairs for the two species.
Fig. 3
P-value barplot for the permutation test in Human-Zebrafish. -log10(p-values) estimated by a permutation test against a null distribution of random non-homologous pairs in Human-Zebrafish, on promoter (blue bars) and transcript sequences (red bars), for each considered metric. Homologous lncRNA pairs are ranked by the best prediction among metrics computed on promoter sequences. The x-axis reports the true homologous pairs for the two species.
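Figs. 1 to 3 report -log10 p-values from a permutation test against a null distribution of random non-homologous pairs. A minimal sketch of such an empirical p-value follows, assuming that higher metric scores mean greater similarity and that randomly drawn cross-species pairs are non-homologous. The +1 correction, the helper names, and the uniform sampling scheme are assumptions, not details taken from the study.

```python
import math
import random
from typing import Callable, Sequence


def permutation_p_value(observed: float, null_scores: Sequence[float]) -> float:
    """Empirical one-sided p-value of an observed similarity score.

    Counts null scores at least as large as the observed score; the +1 in the
    numerator and denominator (an assumed convention) keeps p strictly positive.
    """
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)


def random_pair_null(
    seqs_a: Sequence[str],
    seqs_b: Sequence[str],
    metric: Callable[[str, str], float],
    n_pairs: int = 1000,
    seed: int = 0,
) -> list[float]:
    """Hypothetical helper: score randomly drawn cross-species pairs,
    assumed to be non-homologous, to build a null distribution."""
    rng = random.Random(seed)
    return [metric(rng.choice(seqs_a), rng.choice(seqs_b)) for _ in range(n_pairs)]


def neg_log10_p(p: float) -> float:
    """Transform a p-value to the -log10 scale used in the bar plots."""
    return -math.log10(p)
```

Here `permutation_p_value(5.0, [1.0, 2.0, 3.0, 4.0])` gives 0.2: no null score reaches the observed one, and the +1 correction keeps the estimate above zero, so the smallest attainable p-value is 1/(N + 1) for N null pairs.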
Fig. 4
NONCODE AUPR plots. Prediction performance of each metric on promoter and transcript sequences for NONCODE lncRNA homologs (AUPR on the y-axis; n, the number of consecutive nucleotides in n-gram metrics, on the x-axis).
Fig. 5
ZFLNC AUPR plots. Prediction performance of each metric on promoter and transcript sequences for ZFLNC lncRNA homologs (AUPR on the y-axis; n, the number of consecutive nucleotides in n-gram metrics, on the x-axis).
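Figs. 4 and 5 summarize prediction performance as the area under the precision-recall curve (AUPR). Below is a minimal sketch of one common AUPR estimator, step-wise average precision, assuming candidate pairs are labelled 1 (homologous) or 0 (non-homologous) and ranked by metric score. The function name is hypothetical, and the study may have used a different estimator or interpolation.

```python
from typing import Sequence


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Step-wise AUPR: sum over thresholds of (R_k - R_{k-1}) * P_k.

    labels: 1 for a homologous pair, 0 otherwise (assumed encoding).
    Tied scores are processed together as a single threshold.
    """
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive (homologous) pair")
    tp = fp = 0
    prev_recall = 0.0
    ap = 0.0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Consume every pair tied at this threshold before computing P and R.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / total_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

For scores `[0.9, 0.8, 0.7, 0.6]` with labels `[1, 0, 1, 0]`, the function returns 5/6 (about 0.833). Grouping tied scores into one threshold keeps the result independent of the arbitrary order of ties; this is the same step-wise definition used by scikit-learn's `average_precision_score`.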
Fig. 6
Functional concordance plots. Enrichment of GO Biological Process (BP) terms among the flanking protein-coding genes of lncRNAs overlapping the conserved elements in Zebrafish (green bars) and of lncRNAs predicted to be homologs according to Jaccard similarity with n=12 (red bars), in Human and Mouse. Blue bars indicate the percentages of the BP terms across the entire transcriptome of each species.
Fig. 7
Distribution of conserved and non-conserved flanking genes.
