Dynalign II: common secondary structure prediction for RNA homologs with domain insertions

Yinghan Fu, Gaurav Sharma, David H Mathews

PMID: 25416799
PMCID: PMC4267632
DOI: 10.1093/nar/gku1172

Nucleic Acids Res. 2014 Dec 16;42(22):13939-48.

Abstract

Homologous non-coding RNAs frequently exhibit domain insertions, where a branch of secondary structure is inserted in a sequence with respect to its homologs. Dynamic programming algorithms for common secondary structure prediction of multiple RNA homologs, however, do not account for these domain insertions. This paper introduces a novel dynamic programming algorithm methodology that explicitly accounts for the possibility of inserted domains when predicting common RNA secondary structures. The algorithm is implemented as Dynalign II, an update to the Dynalign software package for predicting the common secondary structure of two RNA homologs. This update is accomplished with negligible increase in computational cost. Benchmarks on ncRNA families with domain insertions validate the method. Over base pairs occurring in inserted domains, Dynalign II improves accuracy over Dynalign, attaining 80.8% sensitivity (compared with 14.4% for Dynalign) and 91.4% positive predictive value (PPV) for tRNA; 66.5% sensitivity (compared with 38.9% for Dynalign) and 57.0% PPV for RNase P RNA; and 50.1% sensitivity (compared with 24.3% for Dynalign) and 58.5% PPV for SRP RNA. Compared with Dynalign, Dynalign II also exhibits statistically significant improvements in overall sensitivity and PPV. Dynalign II is available as a component of RNAstructure, which can be downloaded from http://rna.urmc.rochester.edu/RNAstructure.html.
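
The two accuracy metrics reported above can be illustrated with a minimal sketch. It assumes that a structure is represented as a set of `(i, j)` base-pair index tuples and that a predicted pair counts as correct only when it matches a known pair exactly; the scoring used in the paper's benchmarks may be more lenient, so treat this as an illustration rather than the authors' procedure. The function name `sensitivity_ppv` and the example pairs are hypothetical.

```python
from typing import Set, Tuple

Pair = Tuple[int, int]


def sensitivity_ppv(known: Set[Pair], predicted: Set[Pair]) -> Tuple[float, float]:
    """Return (sensitivity, PPV) using exact base-pair matches.

    sensitivity = correctly predicted pairs / pairs in the known structure
    PPV         = correctly predicted pairs / pairs in the predicted structure
    Exact matching is an assumption made for this sketch.
    """
    true_positives = len(known & predicted)
    sensitivity = true_positives / len(known) if known else 0.0
    ppv = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Hypothetical toy structures, not data from the paper.
known = {(1, 20), (2, 19), (3, 18), (5, 15)}
predicted = {(1, 20), (2, 19), (4, 17)}

sens, ppv = sensitivity_ppv(known, predicted)
print(f"sensitivity={sens:.3f}, PPV={ppv:.3f}")
```

To score only the base pairs in inserted domains, as in the per-family figures quoted above, the same function could be applied after restricting both sets to pairs that fall inside the inserted region.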

Figures

**Figure 1.**
Expansion of W(i, j, k, l) to allow domain insertions. (A) and (B) represent two of the original filling steps of W(i, j, k, l) that are for conserved domains. (C)–(F) are expanded steps that allow consideration of inserted domains in four different positions: (C) 3′ side of sequence 2, (D) 3′ side of sequence 1, (E) 5′ side of sequence 2 and (F) 5′ side of sequence 1. The black solid lines represent sequences, black dashed lines represent gaps, black arcs represent base pairs and colored brackets are the substructures represented by the array members.

**Figure 2.**
Expansion of V(i, j, k, l) to allow domain insertions. (A) represents the step in the original Dynalign algorithm where two conserved domains form inside a conserved base pair. (B)–(E) illustrate how the modifications in Dynalign II account for potential inserted domains within the conserved base pair of V(i, j, k, l) at four positions: (B) 3′ side of sequence 2, (C) 3′ side of sequence 1, (D) 5′ side of sequence 2 and (E) 5′ side of sequence 1.

**Figure 3.**
Expansion of W5(i, k) to account for domain insertions. (A) represents the recursion in the original Dynalign algorithm where W5(i, k) considers a conserved domain. (B) and (C) represent the consideration of an inserted domain in W5(i, k) at two positions: (B) 3′ side of sequence 1 and (C) 3′ side of sequence 2.

**Figure 4.**
Expansion of V(i, j, k, l) to allow stem extension and alignment of an internal loop with consecutive stacking base pairs. (A) and (B) represent an internal loop in one sequence aligned with consecutive stacking base pairs in the other, where in (A) the internal loop is in sequence 1 and in (B) it is in sequence 2. (C) and (D) represent the extension of a conserved stem, where in (C) the internal loop, stacking base pair or bulge loop is inserted in sequence 2 and in (D) it is inserted in sequence 1.

**Figure 5.**
Overall accuracy of secondary structure prediction. (A) shows the sensitivity of the four prediction methods over homologous pairs from the tRNA, 5S rRNA, RNase P RNA and SRP RNA data sets. (B) shows the PPV of the four prediction methods on the four families. Colors represent the program used, as identified by the legends. The numerical values are indicated on the bars. The improvements in performance of Dynalign II over Dynalign and of Dynalign II over Fold are statistically significant for each RNA family. Supplementary Tables S9 and S10 in the Supplementary Materials provide the P-values for the tests.

**Figure 6.**
Structure prediction accuracy over base pairs in inserted domains. (A) shows the sensitivity of Dynalign II and Dynalign on the tRNA, RNase P and SRP data sets. (B) shows the PPV of Dynalign II and Dynalign on the tRNA, RNase P and SRP data sets. Colors represent the program used and are identified by the legends. The numerical values of the sensitivities and PPVs are indicated on the bars.

**Figure 7.**
Known structures for two SRP homologs with a domain insertion in one homolog. (A) *Bacillus amyloliquefaciens* D11416 (SRP database ID: Baci.amyl._D11416) and (B) *Pyrococcus horikoshii* BA000001 (SRP database ID: Pyro.hori._BA000001), both from the SRP database (37). The nucleotides are numbered from 5′ to 3′. The inserted domain in (B) is marked by a blue rectangle.

**Figure 8.**
Structure predictions for the homologs in Figure 7 obtained with the original Dynalign algorithm. (A) and (B) are the Dynalign predictions for the structures of *Bacillus amyloliquefaciens* D11416 and *Pyrococcus horikoshii* BA000001, respectively. Correctly predicted base pairs are colored green and drawn with heavier lines. Incorrectly predicted base pairs are colored gray and drawn with lighter lines.

**Figure 9.**
Structure prediction results for Dynalign II. (A) and (B) are the Dynalign II predictions for the structures of *Bacillus amyloliquefaciens* D11416 and *Pyrococcus horikoshii* BA000001, respectively. Correctly predicted base pairs are colored green and drawn with heavier lines. Incorrectly predicted base pairs are colored gray and drawn with lighter lines. The correctly identified inserted domain is marked by a blue rectangle.
