RNAscClust: clustering RNA sequences using structure conservation and graph based motifs
- PMID: 28334186
- PMCID: PMC5870858
- DOI: 10.1093/bioinformatics/btx114
Abstract
Motivation: Clustering RNA sequences with common secondary structure is an essential step towards studying RNA function. Whereas structural RNA alignment strategies typically identify common structure for orthologous structured RNAs, clustering seeks to group paralogous RNAs based on structural similarities. However, existing approaches for clustering paralogous RNAs do not take into account the compensatory base pair changes obtained from structure conservation in orthologous sequences.
Results: Here, we present RNAscClust, the implementation of a new algorithm to cluster a set of structured RNAs taking their respective structural conservation into account. The input is a set of multiple structural alignments of RNA sequences, each containing a paralog sequence aligned with its orthologs. RNAscClust computes a minimum free-energy structure for each sequence, using conserved base pairs as prior information for the folding. The paralogs are then clustered using a graph kernel-based strategy, which identifies common structural features. We show that the clustering accuracy clearly benefits from an increasing degree of compensatory base pair changes in the alignments.
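As a rough illustration of how conserved base pairs could act as prior information for folding, the sketch below projects an alignment-level consensus structure onto a single gapped sequence and keeps only the pairs that sequence can actually form. This is not the RNAscClust implementation: the function name `project_constraint`, the dot-bracket input format, and the rule of dropping non-canonical or gapped pairs are all assumptions made for this example.

```python
# Hypothetical helper, not part of RNAscClust: turns a consensus structure
# of an alignment into a per-sequence folding constraint.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"),
             ("C", "G"), ("G", "U"), ("U", "G")}


def project_constraint(aligned_seq: str, consensus: str) -> str:
    """Return a dot-bracket constraint for the ungapped sequence.

    aligned_seq: one row of the alignment, gaps written as '-' or '.'.
    consensus:   dot-bracket consensus structure over alignment columns.
    """
    if len(aligned_seq) != len(consensus):
        raise ValueError("sequence and structure must have equal length")
    seq = aligned_seq.upper().replace("T", "U")

    # Match opening and closing brackets with a stack.
    stack, pairs = [], []
    for col, symbol in enumerate(consensus):
        if symbol == "(":
            stack.append(col)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced consensus structure")
            pairs.append((stack.pop(), col))
    if stack:
        raise ValueError("unbalanced consensus structure")

    # Keep a pair only if both columns hold nucleotides that can pair
    # (assumption: Watson-Crick and G-U wobble pairs only).
    kept = ["."] * len(seq)
    for i, j in pairs:
        if (seq[i], seq[j]) in CANONICAL:
            kept[i], kept[j] = "(", ")"

    # Remove gap columns so the constraint matches the ungapped sequence.
    return "".join(k for k, s in zip(kept, seq) if s not in "-.")


print(project_constraint("GG-CAAAAG-CC", "(((......)))"))
```

The resulting string has the same length as the ungapped sequence, so it could be handed to a folding program that accepts structure constraints; which program and which constraint syntax to use is left open here.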
Availability and implementation: RNAscClust is available at http://www.bioinf.uni-freiburg.de/Software/RNAscClust.
Contact: gorodkin@rth.dk or backofen@informatik.uni-freiburg.de.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2017. Published by Oxford University Press.