Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Comparative Study
. 2007 Apr 19:8:130.
doi: 10.1186/1471-2105-8-130.

Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign

Affiliations
Comparative Study

Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign

Arif Ozgun Harmanci et al. BMC Bioinformatics. .

Abstract

Background: Joint alignment and secondary structure prediction of two RNA sequences can significantly improve the accuracy of the structural predictions. Methods addressing this problem, however, are forced to employ constraints that reduce computation by restricting the alignments and/or structures (i.e. folds) that are permissible. In this paper, a new methodology is presented for the purpose of establishing alignment constraints based on nucleotide alignment and insertion posterior probabilities. Using a hidden Markov model, posterior probabilities of alignment and insertion are computed for all possible pairings of nucleotide positions from the two sequences. These alignment and insertion posterior probabilities are additively combined to obtain probabilities of co-incidence for nucleotide position pairs. A suitable alignment constraint is obtained by thresholding the co-incidence probabilities. The constraint is integrated with Dynalign, a free energy minimization algorithm for joint alignment and secondary structure prediction. The resulting method is benchmarked against the previous version of Dynalign and against other programs for pairwise RNA structure prediction.

Results: The proposed technique eliminates manual parameter selection in Dynalign and provides significant computational time savings in comparison to prior constraints in Dynalign while simultaneously providing a small improvement in the structural prediction accuracy. Savings are also realized in memory. In experiments over a 5S RNA dataset with average sequence length of approximately 120 nucleotides, the method reduces computation by a factor of 2. The method performs favorably in comparison to other programs for pairwise RNA structure prediction: yielding better accuracy, on average, and requiring significantly lesser computational resources.

Conclusion: Probabilistic analysis can be utilized in order to automate the determination of alignment constraints for pairwise RNA structure prediction methods in a principled fashion. These constraints can reduce the computational and memory requirements of these methods while maintaining or improving their accuracy of structural prediction. This extends the practical reach of these methods to longer length sequences. The revised Dynalign code is freely available for download.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Example illustrating co-incidence. (a) A sample alignment of two sequences, where inserted locations in a sequence are shown with a gap ⊓ in the other sequence in the corresponding location. (b) The set of co-incident position pairs is depicted. Coordinates corresponding to the co-incident position pairs are indicated by black (aligned) or cross-hatched (insertion) blocks.
Figure 2
Figure 2
Illustration of alignment of nucleotide at n1 in 1st sequence and nucleotide at n2 in 2nd sequence and how forward and backward variables are related to alignment of n1 and n2. Forward variable keeps track of events before alignment position (n1, n2) and backward variable keeps track of events after alignment position (n1, n2).
Figure 3
Figure 3
Logarithm of posterior probabilities for co-incidences of nucleotide positions for a pair of sequences in a surface plot representation.
Figure 4
Figure 4
Illustration of the probabilistic alignment constraint set and comparison against the true alignment and the M parameter constraint set for tRNA sequences: X57045 and X57046. The abscissa and ordinates of the plots indicate nucleotide positions n1 and n2 along the sequences X57045 and X57046, respectively, (a) Probabilistic alignment constraint set with permitted alignments (shown in dark gray). (b) Alignment constraint set with true alignment super-imposed (in black). (c) Alignment constraint set for the prior M parameter with M = 7 (shown in light gray). (d) Difference between the M parameter and the probabilistic alignment constraint sets. Light gray regions indicate nucleotide position alignments permitted by the M constraint and not by the probabilistic constraint and the situation is vice-versa for dark gray regions.
Figure 5
Figure 5
A sample sequence alignment and corresponding state sequence.
Figure 6
Figure 6
State transition diagram for the (hidden) Markov Process determining alignment between the sequences. The three states ALN, INS1, and INS2 represent alignment, insertion in sequence 1 and insertion in sequence 2, respectively.
Figure 7
Figure 7
Trellis illustrating an alignment path.
Figure 8
Figure 8
Segment of the HMM trellis illustrating the feasible edges arriving at and originating from one trellis node. The red solid circle represents the trellis node for which edges are depicted (it corresponds to an ALN state). Feasible edges arriving at the node are shown as blue arrows that converge at the trellis node and feasible edges originating from the node are shown as green arrows diverging outward from the trellis node.
Figure 9
Figure 9
Fraction of aligned base pairs missed as a function of probability threshold for restricting alignments.
Figure 10
Figure 10
Fraction of sequence alignments missed as a function of probability threshold for restricting alignments.

Similar articles

Cited by

References

    1. Durbin R, Eddy SR, Krogh A, Mitchison G. Biological Sequence Analysis : Probabilistic Models of Proteins and Nucleic Acids. Cambridge, UK: Cambridge University Press; 1999.
    1. Eddy SR. Non-coding RNA Genes and the modern RNA World. Nat Rev. 2001;8:919–929. doi: 10.1038/35103511. - DOI - PubMed
    1. Pace NR, Thomas BC, Woese CR. The RNA World. second. Cold Spring Harbor Laboratory Press; 1999. Probing RNA structure, function and history by comparative analysis; pp. 113–141.
    1. Sankoff D. Simultaneous Solution of RNA Folding, Alignment and Protosequence Problems. SIAM J App Math. 1985;8(5):810–825. doi: 10.1137/0145048. - DOI
    1. Mathews DH, Turner DH. Dynalign: An Algorithm for Finding the Secondary Structure Common to two RNA Sequences. J Mol Biol. 2002;8:191–203. doi: 10.1006/jmbi.2001.5351. - DOI - PubMed

Publication types

LinkOut - more resources