Comparative Study
BMC Bioinformatics. 2013;14 Suppl 9(Suppl 9):S4.
doi: 10.1186/1471-2105-14-S9-S4. Epub 2013 Jun 28.

Classification and assessment tools for structural motif discovery algorithms

Ghada Badr et al. BMC Bioinformatics. 2013.

Abstract

Background: Motif discovery is the problem of finding recurring patterns in biological data. Patterns can be sequential, mainly when discovered in DNA sequences. They can also be structural (e.g. when discovering RNA motifs). Finding common structural patterns helps to gain a better understanding of the mechanism of action (e.g. post-transcriptional regulation). Unlike DNA motifs, which are sequentially conserved, RNA motifs exhibit conservation in structure, which may be common even if the sequences are different. Over the past few years, hundreds of algorithms have been developed to solve the sequential motif discovery problem, while less work has been done for the structural case.

Methods: In this paper, we survey, classify, and compare different algorithms that solve the structural motif discovery problem, where the underlying sequences may be different. We highlight their strengths and weaknesses. We start by proposing a benchmark dataset and a measurement tool that can be used to evaluate different motif discovery approaches. Then, we proceed by proposing our experimental setup. Finally, results are obtained using the proposed benchmark to compare available tools. To the best of our knowledge, this is the first attempt to compare tools solely designed for structural motif discovery.

Results: Results show that the accuracy of discovered motifs is relatively low. The results also suggest a complementary behavior among tools where some tools perform well on simple structures, while other tools are better for complex structures.

Conclusions: We have classified and evaluated the performance of available structural motif discovery tools. In addition, we have proposed a benchmark dataset with tools that can be used to evaluate newly developed tools.
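
The accuracy measures reported for the benchmark (Sensitivity, Positive Predictive Value, and Specificity) can be illustrated with a minimal sketch. It assumes the standard definitions Sn = TP/(TP+FN), PPV = TP/(TP+FP), and Sp = TN/(TN+FP), counted over nucleotide positions; the abstract does not state the level at which the authors' measurement tool counts, and the function names `confusion_counts` and `motif_metrics` are hypothetical.

```python
def confusion_counts(known, predicted, length):
    """Compare known and predicted motif positions (1-based) in one sequence.

    Assumption: agreement is counted per nucleotide position.
    Returns (tp, fp, tn, fn).
    """
    all_positions = set(range(1, length + 1))
    tp = len(known & predicted)                   # in both motifs
    fp = len(predicted - known)                   # predicted only
    fn = len(known - predicted)                   # known only
    tn = len(all_positions - known - predicted)   # in neither motif
    return tp, fp, tn, fn


def motif_metrics(tp, fp, tn, fn):
    """Return Sn, PPV, and Sp; a ratio with a zero denominator is reported as 0.0."""
    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "Sn": ratio(tp, tp + fn),    # sensitivity
        "PPV": ratio(tp, tp + fp),   # positive predictive value
        "Sp": ratio(tn, tn + fp),    # specificity
    }


if __name__ == "__main__":
    counts = confusion_counts({1, 2, 3, 4}, {3, 4, 5}, length=10)
    print(motif_metrics(*counts))
```

Reporting the three measures together matters because a tool that predicts very long motifs can reach high sensitivity while its positive predictive value stays low.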

Figures

Figure 1
RNA secondary structures. Left: non-interacting RNA (only intra-molecular base pairs). Right: interacting RNA (with inter-molecular base pairs).
Figure 2
Bracket notation for the structures in Figure 1.
Figure 3
Planar graph representation of the various RNA loop types.
Figure 4
RNA expression for the structures in Figure 1.
Figure 5
Component-based representation.
Figure 6
Covariance model: ordered binary tree (right) and the internal states (left) for parts of the non-interacting structure in Figure 1.
Figure 7
Connectivity table for the non-interacting structure in Figure 1.
Figure 8
Arc representation.
Figure 9
Benchmark generation.
Figure 10
Measurement tool.
Figure 11
Sensitivity (Sn), Positive Predictive Value (PPV), and Specificity (Sp) averaged over all benchmark datasets.
Figure 12
Sensitivity (Sn), Positive Predictive Value (PPV), and Specificity (Sp) averaged over all simple datasets, Ref.1.
Figure 13
Sensitivity (Sn), Positive Predictive Value (PPV), and Specificity (Sp) averaged over all more complex datasets, Ref.2.
Figure 14
Sensitivity (Sn), Positive Predictive Value (PPV), and Specificity (Sp) averaged over the most complex datasets, Ref.3.
Figure 15
Average running time (Linux real time, converted to minutes).
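
The bracket notation and connectivity table named in the captions for Figures 2 and 7 are two encodings of the same base-pair information. The sketch below shows how one can be derived from the other. It assumes the common dot-bracket convention ("(" and ")" for paired bases, "." for unpaired) and a single, non-interacting, pseudoknot-free structure; the example string is illustrative and is not the structure drawn in Figure 1, and the function name `partners_from_dot_bracket` is hypothetical.

```python
def partners_from_dot_bracket(structure):
    """Convert a dot-bracket string into a connectivity-style partner list.

    The result holds, for each 1-based position, the position of its
    pairing partner, or 0 if the base is unpaired.
    """
    partner = [0] * len(structure)
    open_positions = []  # stack of indices of unmatched "("

    for i, symbol in enumerate(structure):
        if symbol == "(":
            open_positions.append(i)
        elif symbol == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i + 1}")
            j = open_positions.pop()
            partner[i] = j + 1
            partner[j] = i + 1
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {i + 1}")

    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1] + 1}")
    return partner


if __name__ == "__main__":
    # A hairpin: positions 1-2 pair with positions 6-5; 3-4 form the loop.
    print(partners_from_dot_bracket("((..))"))
```

A stack suffices here because nested base pairs close in the reverse order in which they open; interacting structures or pseudoknots would need additional bracket types.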
