Comparative Study

. 2007 Apr 19:8:130.

doi: 10.1186/1471-2105-8-130.

Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign

Arif Ozgun Harmanci¹, Gaurav Sharma, David H Mathews

Affiliations

Affiliation

¹ Department of Electrical and Computer Engineering, University of Rochester, Hopeman 204, Rochester, NY 14627, USA. arharman@ece.rochester.edu <arharman@ece.rochester.edu>

PMID: 17445273
PMCID: PMC1868766
DOI: 10.1186/1471-2105-8-130

Comparative Study

Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign

Arif Ozgun Harmanci et al. BMC Bioinformatics. 2007.

. 2007 Apr 19:8:130.

doi: 10.1186/1471-2105-8-130.

Authors

Arif Ozgun Harmanci¹, Gaurav Sharma, David H Mathews

Affiliation

¹ Department of Electrical and Computer Engineering, University of Rochester, Hopeman 204, Rochester, NY 14627, USA. arharman@ece.rochester.edu <arharman@ece.rochester.edu>

PMID: 17445273
PMCID: PMC1868766
DOI: 10.1186/1471-2105-8-130

Abstract

Background: Joint alignment and secondary structure prediction of two RNA sequences can significantly improve the accuracy of the structural predictions. Methods addressing this problem, however, are forced to employ constraints that reduce computation by restricting the alignments and/or structures (i.e. folds) that are permissible. In this paper, a new methodology is presented for the purpose of establishing alignment constraints based on nucleotide alignment and insertion posterior probabilities. Using a hidden Markov model, posterior probabilities of alignment and insertion are computed for all possible pairings of nucleotide positions from the two sequences. These alignment and insertion posterior probabilities are additively combined to obtain probabilities of co-incidence for nucleotide position pairs. A suitable alignment constraint is obtained by thresholding the co-incidence probabilities. The constraint is integrated with Dynalign, a free energy minimization algorithm for joint alignment and secondary structure prediction. The resulting method is benchmarked against the previous version of Dynalign and against other programs for pairwise RNA structure prediction.

Results: The proposed technique eliminates manual parameter selection in Dynalign and provides significant computational time savings in comparison to prior constraints in Dynalign while simultaneously providing a small improvement in the structural prediction accuracy. Savings are also realized in memory. In experiments over a 5S RNA dataset with average sequence length of approximately 120 nucleotides, the method reduces computation by a factor of 2. The method performs favorably in comparison to other programs for pairwise RNA structure prediction: yielding better accuracy, on average, and requiring significantly lesser computational resources.

Conclusion: Probabilistic analysis can be utilized in order to automate the determination of alignment constraints for pairwise RNA structure prediction methods in a principled fashion. These constraints can reduce the computational and memory requirements of these methods while maintaining or improving their accuracy of structural prediction. This extends the practical reach of these methods to longer length sequences. The revised Dynalign code is freely available for download.

PubMed Disclaimer

Figures

**Figure 1**
Example illustrating *co-incidence*. (a) A sample alignment of two sequences, where inserted locations in a sequence are shown with a gap ⊓ in the other sequence in the corresponding location. (b) The set of *co-incident* position pairs is depicted. Coordinates corresponding to the co-incident position pairs are indicated by black (aligned) or cross-hatched (insertion) blocks.

**Figure 2**
Illustration of alignment of nucleotide at n₁in 1^stsequence and nucleotide at n₂in 2^ndsequence and how forward and backward variables are related to alignment of n₁and n₂. Forward variable keeps track of events before alignment position (n₁, n₂) and backward variable keeps track of events after alignment position (n₁, n₂).

**Figure 3**
Logarithm of posterior probabilities for co-incidences of nucleotide positions for a pair of sequences in a surface plot representation.

**Figure 4**
Illustration of the probabilistic alignment constraint set and comparison against the true alignment and the M parameter constraint set for tRNA sequences: X57045 and X57046. The abscissa and ordinates of the plots indicate nucleotide positions n₁and n₂along the sequences X57045 and X57046, respectively, (a) Probabilistic alignment constraint set with permitted alignments (shown in dark gray). (b) Alignment constraint set with true alignment super-imposed (in black). (c) Alignment constraint set for the prior M parameter with M = 7 (shown in light gray). (d) Difference between the M parameter and the probabilistic alignment constraint sets. Light gray regions indicate nucleotide position alignments permitted by the M constraint and not by the probabilistic constraint and the situation is vice-versa for dark gray regions.

**Figure 5**
A sample sequence alignment and corresponding state sequence.

**Figure 6**
State transition diagram for the (hidden) Markov Process determining alignment between the sequences. The three states ALN, INS1, and INS2 represent alignment, insertion in sequence 1 and insertion in sequence 2, respectively.

**Figure 7**
Trellis illustrating an alignment path.

**Figure 8**
Segment of the HMM trellis illustrating the feasible edges arriving at and originating from one trellis node. The red solid circle represents the trellis node for which edges are depicted (it corresponds to an ALN state). Feasible edges arriving at the node are shown as blue arrows that converge at the trellis node and feasible edges originating from the node are shown as green arrows diverging outward from the trellis node.

**Figure 9**
Fraction of aligned base pairs missed as a function of probability threshold for restricting alignments.

**Figure 10**
Fraction of sequence alignments missed as a function of probability threshold for restricting alignments.

See this image and copyright information in PMC

References

1. Durbin R, Eddy SR, Krogh A, Mitchison G. Biological Sequence Analysis : Probabilistic Models of Proteins and Nucleic Acids. Cambridge, UK: Cambridge University Press; 1999.
1. Eddy SR. Non-coding RNA Genes and the modern RNA World. Nat Rev. 2001;8:919–929. doi: 10.1038/35103511. - DOI - PubMed
1. Pace NR, Thomas BC, Woese CR. The RNA World. second. Cold Spring Harbor Laboratory Press; 1999. Probing RNA structure, function and history by comparative analysis; pp. 113–141.
1. Sankoff D. Simultaneous Solution of RNA Folding, Alignment and Protosequence Problems. SIAM J App Math. 1985;8(5):810–825. doi: 10.1137/0145048. - DOI
1. Mathews DH, Turner DH. Dynalign: An Algorithm for Finding the Secondary Structure Common to two RNA Sequences. J Mol Biol. 2002;8:191–203. doi: 10.1006/jmbi.2001.5351. - DOI - PubMed

Publication types

Actions
Actions
Actions

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions
Actions

Substances

Actions

Grants and funding

GM076485/GM/NIGMS NIH HHS/United States

LinkOut - more resources

Full Text Sources

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign

Affiliation

Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign

Authors

Affiliation

Abstract

Figures

References

Publication types

MeSH terms

Substances

Grants and funding

LinkOut - more resources

Full Text Sources