Algorithms Mol Biol. 2013 Apr 20;8:14.
doi: 10.1186/1748-7188-8-14. eCollection 2013.

LocARNAscan: Incorporating thermodynamic stability in sequence and structure-based RNA homology search


Sebastian Will et al. Algorithms Mol Biol.

Abstract

Background: The search for distant homologs has become an important issue in genome annotation. A particular difficulty is posed by divergent homologs that have lost recognizable sequence similarity. The same problem arises in the recognition of novel members of large classes of RNAs, such as snoRNAs or microRNAs, that consist of families unrelated by common descent. Current homology search tools for structured RNAs are either based entirely on sequence similarity (such as blast or hmmer) or combine sequence and secondary structure. The most prominent example of the latter class of tools is Infernal. Alternatives are descriptor-based methods. In most practical applications published to date, however, the information contained in covariance models or manually prescribed search patterns is dominated by sequence information. Here we ask two related questions: (1) Is secondary structure alone informative for homology search and the detection of novel members of RNA classes? (2) To what extent is the thermodynamic propensity of the target sequence to fold into the correct secondary structure helpful for this task?

Results: Sequence-structure alignment can be used as an alternative search strategy. In this scenario, the query consists of a base pairing probability matrix, which can be derived either from a single sequence or from a multiple alignment representing a set of known representatives. Sequence information can be optionally added to the query. The target sequence is pre-processed to obtain local base pairing probabilities. As a search engine we devised a semi-global scanning variant of LocARNA's algorithm for sequence-structure alignment. The LocARNAscan tool is optimized for speed and low memory consumption. In benchmarking experiments on artificial data we observe that the inclusion of thermodynamic stability is helpful, albeit only in a regime of extremely low sequence information in the query. We observe, furthermore, that the sensitivity is bounded in particular by the limited accuracy of the predicted local structures of the target sequence.
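
The semi-global scanning idea mentioned above can be illustrated with a deliberately simplified sketch. It is not LocARNAscan's algorithm: it scores sequence only and ignores base pairing probabilities entirely. The function name `semi_global_scan` and the scoring parameters `match`, `mismatch`, and `gap` are assumptions made for this illustration. What it shows is the "semi-global" boundary condition: the whole query must be aligned, but the alignment may start and end anywhere in the target, and only one column of the dynamic programming matrix is kept in memory at a time.

```python
def semi_global_scan(query, target, match=2, mismatch=-1, gap=-2):
    """For each target end position, return the best score of aligning the
    WHOLE query to some target substring ending there.

    Illustrative assumption: linear gap costs and sequence-only scoring;
    no structure (base pairing) information is used.
    """
    m = len(query)
    # prev[i]: best score for query[:i] against a substring ending at the
    # previous target position. Before any target character is read, the
    # query prefix can only be aligned to gaps.
    prev = [i * gap for i in range(m + 1)]
    scores = []
    for t in target:
        # cur[0] = 0: the alignment may start at any target position for free.
        cur = [0] * (m + 1)
        for i in range(1, m + 1):
            sub = match if query[i - 1] == t else mismatch
            cur[i] = max(
                prev[i - 1] + sub,  # align query[i-1] with t
                prev[i] + gap,      # t is unmatched (gap in the query)
                cur[i - 1] + gap,   # query[i-1] is unmatched (gap in the target)
            )
        scores.append(cur[m])  # full query aligned, hit ends at this position
        prev = cur
    return scores


if __name__ == "__main__":
    hits = semi_global_scan("GCAU", "AAGCAUAA")
    best_end = max(range(len(hits)), key=hits.__getitem__)
    print(best_end, hits[best_end])
```

Keeping only two columns makes memory linear in the query length, which is the kind of saving a scanning variant relies on when the target is a long genomic sequence.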

Conclusions: Although we demonstrate that a purely structure-based homology search is feasible in principle, it is unlikely to outperform tools such as Infernal in most application scenarios, where a substantial amount of sequence information is typically available. The LocARNAscan approach will profit, however, from high throughput methods to determine RNA secondary structure. In transcriptome-wide applications, such methods will provide accurate structure annotations on the target side.

Availability: Source code of the free software LocARNAscan 1.0 and supplementary data are available at http://www.bioinf.uni-leipzig.de/Software/LocARNAscan.


Figures

Figure 1
Schematic view of the computation of S by LocARNAscan. The dark green dots in Sj,l are sufficient to calculate the current entry Sj,l, given the set of arcs indicated by red lines.
Figure 2
Fitting of several commonly used probability distributions to the histogram of LocARNAscan scores. Scores correspond to LocARNAscan alignments using the profile of the RFAM family RF00504 (glycine riboswitch) as input query. The first row shows the fits of log-normal, Gumbel, and generalized extreme value (GEV) distributions (red curves) to the alignment scores, shown as a histogram. The scores have been shifted to positive values. In the lower panel, we compare the distributions by Q-Q plots. These plots, which show the quantiles of the observed scores against the expected quantiles of the theoretical distributions, visualize how far two distributions differ in location, scale, and skew. None of the tested probability distributions (including the normal and gamma distributions; data not shown) represents the LocARNAscan alignment score distribution well, as is visible in the Q-Q plots: none of them follows a straight line.
Figure 3
Classification of thermodynamically stabilized vs. non-stabilized occurrences.
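
The Q-Q comparison described in the caption of Figure 2 can be sketched as follows. This is a minimal illustration, not the procedure used by the authors: the fitting method (method of moments), the plotting positions `(k - 0.5) / n`, and the function names `fit_gumbel_moments` and `gumbel_qq` are all assumptions. The sketch fits a Gumbel distribution to a list of scores and returns (theoretical quantile, observed quantile) pairs; if the scores followed the fitted distribution, these pairs would lie close to the line y = x.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Estimate Gumbel location (mu) and scale (beta) by the method of moments.

    Uses mean = mu + gamma * beta and variance = (pi^2 / 6) * beta^2.
    """
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.mean(scores) - EULER_GAMMA * beta
    return mu, beta


def gumbel_qq(scores):
    """Return a list of (theoretical, observed) quantile pairs for a Q-Q plot."""
    mu, beta = fit_gumbel_moments(scores)
    n = len(scores)
    pairs = []
    for k, observed in enumerate(sorted(scores), start=1):
        p = (k - 0.5) / n  # plotting position, strictly between 0 and 1
        theoretical = mu - beta * math.log(-math.log(p))  # Gumbel quantile function
        pairs.append((theoretical, observed))
    return pairs
```

Plotting the returned pairs as a scatter plot against the diagonal gives the kind of diagnostic the caption refers to; systematic curvature away from the diagonal indicates that the fitted family does not describe the score distribution.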

