RNA. 2009 Dec;15(12):2075-82.
doi: 10.1261/rna.1556009. Epub 2009 Oct 27.

The tedious task of finding homologous noncoding RNA genes

Peter Menzel et al. RNA. 2009 Dec.

Abstract

User-driven in silico RNA homology search is still a nontrivial task. In part, this is a consequence of the limited precision of the computational tools, in spite of recent exciting progress in this area; to a certain extent, computational costs also remain problematic in practice. An important and, as we argue here, dominating issue is the dependence on well-curated (secondary) structural alignments of the RNAs. These are often hard to obtain, not so much because of an inherent limitation in the available data, but because they require substantial manual curation, an effort that is rarely acknowledged. Here, we qualitatively describe a realistic scenario for what a "regular user" (i.e., a nonexpert in a particular RNA family) can do in practice, and what kind of results are likely to be achieved. Despite the indisputable advances in computational RNA biology, the conclusion is discouraging: BLAST still works as well as or better than other methods unless extensive expert knowledge of the RNA family is included. However, when well-curated data are available, recent developments yield further improvements in finding remote homologs. Homology search beyond the reach of BLAST is hence not at all a routine task.

Figures

FIGURE 1.
Homology search results. Members of the training set are indicated by boxes. Except for Y and vault RNAs, only mammalian sequences were used to construct the search patterns. For E2 and let-7, Rfam 8.1 provided only multiple human paralogs as seed sequences. For SRP, RNase MRP RNA, U3, and vault RNAs, we also ran RaveNnA on the small teleost and invertebrate genomes, where ERPIN did not find the already annotated sequences. (Arrow) The range of the RaveNnA screens. (×) False-negative results, i.e., cases in which a homolog is known to exist but was not detected by any method. Complete sequences and detailed result tables are available at the Supplemental website.
FIGURE 2.
Vertebrate telomerase structures. (A) Secondary structures of medaka (Oryzias latipes, n = 312), human (n = 451), and dogfish shark (Squalus acanthias, n = 559). Data adapted from Xie et al. (2008). (B) Sequence conservation. The panel includes data exported from the UCSC Genome Browser (Karolchik et al. 2008), showing the PhastCons (Siepel et al. 2005) conservation track based on the 28 vertebrate MULTIZ alignments (Blanchette et al. 2004), as well as a selection of pairwise alignments with the human locus. Note that outside the mammals only partial alignments are available in the automatic comparative genomics tracks. In particular, the homologs in Xenopus and teleost fishes are known in the literature but not identified in the genome-wide alignments.

References

Andersen ES, Rosenblad MA, Larsen N, Westergaard JC, Burks J, Wower IK, Wower J, Gorodkin J, Samuelsson T, Zwieb C. The tmRDB and SRPDB resources. Nucleic Acids Res. 2006;33:D163–D168.
Andersen E, Lind-Thomsen A, Knudsen B, Kristensen S, Havgaard J, Torarinsson E, Larsen N, Zwieb C, Sestoft P, Kjems J, et al. Semiautomated improvement of RNA alignments. RNA. 2007;13:1850–1859.
Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Wheeler DL. GenBank. Nucleic Acids Res. 2005;33:D34–D38.
Billoud B, Kontic M, Viari A. Palingol: A declarative programming language to describe nucleic acids' secondary structures and to scan sequence database. Nucleic Acids Res. 1996;24:1395–1403.
Blanchette M, Kent WJ, Riemer C, Elnitski L, Smit AFA, Roskin KM, Baertsch R, Rosenbloom K, Clawson H, Green ED, et al. Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res. 2004;14:708–715.
