BMC Bioinformatics. 2007 Jun 8;8:190.
doi: 10.1186/1471-2105-8-190.

Can Clustal-style progressive pairwise alignment of multiple sequences be used in RNA secondary structure prediction?

Amelia B Bellamy-Royds et al. BMC Bioinformatics.

Abstract

Background: In ribonucleic acid (RNA) molecules whose function depends on their final, folded three-dimensional shape (such as those in ribosomes or spliceosome complexes), the secondary structure, defined by the set of internal basepair interactions, is more consistently conserved than the primary structure, defined by the sequence of nucleotides.

Results: The research presented here investigates the possibility of applying a progressive, pairwise approach to the alignment of multiple RNA sequences by simultaneously predicting an energy-optimized consensus secondary structure. We take an existing algorithm for finding the secondary structure common to two RNA sequences, Dynalign, and alter it to align profiles of multiple sequences. We then explore the relative successes of different approaches to designing the tree that will guide progressive alignments of sequence profiles to create a multiple alignment and prediction of conserved structure.
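In the progressive strategy described above, the guide tree fixes the order of the merges. Each internal node stands for one pairwise profile alignment, and that alignment can run only after both of its subtrees have been aligned. The minimal Python sketch below computes only that merge order as a post-order walk. It does not reproduce the profile alignment step itself (the modified Dynalign in this work). The nested-tuple tree representation and the names `merge_schedule` and `leaves` are hypothetical.

```python
from typing import List, Tuple, Union

# Hypothetical representation: a guide tree is either a leaf (sequence
# name) or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

def leaves(tree: Tree) -> List[str]:
    """Return the sequence names under a (sub)tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves(left) + leaves(right)

def merge_schedule(tree: Tree) -> List[Tuple[List[str], List[str]]]:
    """Post-order walk: each internal node becomes one pairwise profile
    alignment, performed only after both of its children are finished."""
    steps: List[Tuple[List[str], List[str]]] = []

    def visit(node: Tree) -> None:
        if isinstance(node, str):
            return
        left, right = node
        visit(left)
        visit(right)
        steps.append((leaves(left), leaves(right)))

    visit(tree)
    return steps

if __name__ == "__main__":
    tree = (("tRNA1", "tRNA2"), ("tRNA3", ("tRNA4", "tRNA5")))
    for i, (a, b) in enumerate(merge_schedule(tree), 1):
        print(f"step {i}: align profile {a} with profile {b}")
```

A tree with n leaves always yields n - 1 merge steps. The last step combines the two profiles that hang from the root.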

Conclusion: We have found that applying a progressive, pairwise approach to the alignment of multiple ribonucleic acid sequences produces highly reliable predictions of conserved basepairs, and we have shown how these predictions can be used as constraints to improve the results of a single-sequence structure prediction algorithm. However, we have also discovered that the amount of detail included in a consensus structure prediction is highly dependent on the order in which sequences are added to the alignment (the guide tree), and that if a consensus structure does not have sufficient detail, it is less likely to provide useful constraints for the single-sequence method.
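Using consensus basepairs as constraints requires projecting pairs defined on alignment columns back onto the positions of each individual sequence. The sketch below works under stated assumptions:

- the consensus is given in dot-bracket notation over alignment columns;
- gaps are written as `-`;
- a consensus pair is dropped for any sequence that has a gap in either of its columns.

The abstract does not say how gaps were handled. The helper names are hypothetical. Converting the resulting pairs into the constraint format of a particular folding program, such as mfold, is left out.

```python
from typing import List, Tuple

def parse_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Return 0-based (i, j) column pairs from a dot-bracket string."""
    stack: List[int] = []
    pairs: List[Tuple[int, int]] = []
    for col, ch in enumerate(structure):
        if ch == "(":
            stack.append(col)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at column {col}")
            pairs.append((stack.pop(), col))
    if stack:
        raise ValueError(f"unmatched '(' at column {stack[-1]}")
    return sorted(pairs)

def column_to_position(aligned: str, gap: str = "-") -> List[int]:
    """Map each alignment column to a 1-based position in the ungapped
    sequence, or 0 where this sequence has a gap in that column."""
    mapping: List[int] = []
    pos = 0
    for ch in aligned:
        if ch == gap:
            mapping.append(0)
        else:
            pos += 1
            mapping.append(pos)
    return mapping

def consensus_constraints(aligned: str, consensus: str) -> List[Tuple[int, int]]:
    """Project consensus basepairs onto one aligned sequence. Pairs touching
    a gap in this sequence are dropped (an assumption, not the paper's rule)."""
    if len(aligned) != len(consensus):
        raise ValueError("alignment row and consensus must have equal length")
    cols = column_to_position(aligned)
    return [(cols[i], cols[j]) for i, j in parse_dot_bracket(consensus)
            if cols[i] and cols[j]]

if __name__ == "__main__":
    # Column 2 is a gap in this row, so the consensus pair (col 2, col 7) is lost.
    print(consensus_constraints("GG-GAAACCC", "(((....)))"))  # [(1, 9), (2, 8)]
```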

Figures

Figure 1
Boxplots showing the range of sensitivity and positive predictive value statistics for mfold optimum predictions on the two test data sets.
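Sensitivity and positive predictive value compare a predicted set of basepairs with a reference set:

- **Sensitivity** is the fraction of reference pairs that were predicted.
- **PPV** is the fraction of predicted pairs that appear in the reference.

The minimal sketch below assumes that a pair counts as correct only when both positions match exactly. Some studies allow small slippage, and these captions do not state the scoring rule. The function names are hypothetical.

```python
from typing import Iterable, Optional, Set, Tuple

Pair = Tuple[int, int]

def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Store each basepair as (smaller, larger) so (i, j) and (j, i) match."""
    return {(min(i, j), max(i, j)) for i, j in pairs}

def sensitivity(predicted: Iterable[Pair], reference: Iterable[Pair]) -> Optional[float]:
    """TP / (TP + FN): fraction of reference basepairs that were predicted.
    Returns None if the reference contains no pairs."""
    ref = normalize(reference)
    if not ref:
        return None
    return len(normalize(predicted) & ref) / len(ref)

def ppv(predicted: Iterable[Pair], reference: Iterable[Pair]) -> Optional[float]:
    """TP / (TP + FP): fraction of predicted basepairs present in the
    reference. Returns None (undefined) if nothing was predicted."""
    pred = normalize(predicted)
    if not pred:
        return None
    return len(pred & normalize(reference)) / len(pred)

if __name__ == "__main__":
    reference = [(1, 20), (2, 19), (3, 18), (4, 17)]
    predicted = [(1, 20), (19, 2), (5, 15)]  # (19, 2) is the same pair as (2, 19)
    print(sensitivity(predicted, reference))  # 0.5
    print(ppv(predicted, reference))          # 0.6666666666666666
```

PPV is returned as `None` when no pairs are predicted. This corresponds to the undefined PPV values that Figures 6 and 7 plot as zero.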
Figure 2
Boxplots showing the range of sensitivity and positive predictive value statistics for all pairwise alignments of the tRNA sequences, at each gap penalty value tested.
Figure 3
Boxplots showing the range of sensitivity and positive predictive value statistics for all pairwise alignments of the 5S rRNA sequences, at each gap penalty value tested.
Figure 4
Sensitivity versus positive predictive value (PPV) for all predicted tRNA structures from the pairwise alignments and all stages of the randomized progressive alignments, with the gap penalty used and the number of sequences in the alignment indicated by the colour and size of the points, respectively.
Figure 5
Sensitivity versus positive predictive value (PPV) for all predicted 5S rRNA structures from the pairwise alignments and all stages of the randomized progressive alignments, with the gap penalty used and the number of sequences in the alignment indicated by the colour and size of the points, respectively.
Figure 6
Boxplots showing the range in sensitivity and positive predictive value (PPV) for the twelve predicted tRNA structures from the final alignment and consensus predictions of all runs with gap penalty ≥ 2.0 kcal/mol, grouped by the algorithm used to build the guide tree. NJ is neighbour-joining, CW is Clustal W, B1, B2 and B3 represent the 3 runs with balanced guide trees, while L1, L2 and L3 represent the 3 runs with linear guide trees. Undefined PPV values are plotted as zero.
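The linear and balanced guide trees behind runs L1 to L3 and B1 to B3 differ only in shape:

- **Linear tree:** sequences are added one at a time to a single growing profile.
- **Balanced tree:** the sequences are split recursively into halves of nearly equal size.

The sketch below builds both shapes from a list of sequence names. It assumes the sequence order is shuffled independently for each run. The captions do not say how the three orders were chosen, and the function names are hypothetical.

```python
import random
from typing import List, Optional, Tuple, Union

# Hypothetical representation: a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]

def linear_tree(names: List[str]) -> Tree:
    """Add sequences one at a time to one growing profile:
    ((((s1, s2), s3), s4), ...)."""
    if not names:
        raise ValueError("need at least one sequence")
    tree: Tree = names[0]
    for name in names[1:]:
        tree = (tree, name)
    return tree

def balanced_tree(names: List[str]) -> Tree:
    """Recursively split the list in half so sibling subtrees differ in
    size by at most one leaf, giving a depth of about log2(n)."""
    if not names:
        raise ValueError("need at least one sequence")
    if len(names) == 1:
        return names[0]
    mid = len(names) // 2
    return (balanced_tree(names[:mid]), balanced_tree(names[mid:]))

def random_order(names: List[str], seed: Optional[int] = None) -> List[str]:
    """Return a shuffled copy of the names (one order per run, by assumption)."""
    shuffled = list(names)
    random.Random(seed).shuffle(shuffled)
    return shuffled

if __name__ == "__main__":
    seqs = ["s1", "s2", "s3", "s4", "s5"]
    print(linear_tree(seqs))    # (((('s1', 's2'), 's3'), 's4'), 's5')
    print(balanced_tree(seqs))  # (('s1', 's2'), ('s3', ('s4', 's5')))
```

With n sequences, a linear tree needs n - 1 sequential merges, each onto an ever larger profile. A balanced tree instead aligns many small, independent profiles before combining them.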
Figure 7
Boxplots showing the range in sensitivity and positive predictive value (PPV) for the twelve predicted 5S rRNA structures from the final alignment and consensus predictions of all runs with gap penalty ≥ 2.0 kcal/mol, grouped by the algorithm used to build the guide tree. NJ is neighbour-joining, CW is Clustal W, B1, B2 and B3 represent the 3 runs with balanced guide trees, while L1, L2 and L3 represent the 3 runs with linear guide trees. Undefined PPV values are plotted as zero.
Figure 8
The phylogeny predicted by Clustal W for the tRNA sequences (a), and the guide tree created by neighbour-joining the scores from the pairwise alignments with gap penalty of 4 kcal/mol (b). Images produced by drawgram, from the Phylip package of programs [30].
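The neighbour-joining guide tree in panel (b) is built from the pairwise alignment scores. The sketch below follows the textbook neighbour-joining procedure of Saitou and Nei:

1. Compute Q(i, j) = (n - 2) d(i, j) - Σ_k d(i, k) - Σ_k d(j, k) for every pair of current nodes.
2. Join the pair with the smallest Q.
3. Set the distance from the new node u to every other node k as d(u, k) = (d(i, k) + d(j, k) - d(i, j)) / 2.
4. Repeat until two nodes remain.

The sketch assumes the scores have already been turned into a symmetric distance matrix. How that conversion was done is not described here. It returns only the tree topology as nested tuples, and the function name is hypothetical.

```python
from typing import List, Optional, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]

def neighbour_joining(names: List[str], dist: List[List[float]]) -> Tree:
    """Return a neighbour-joining topology as nested tuples (no branch
    lengths). dist must be symmetric with zeros on the diagonal."""
    nodes: List[Tree] = list(names)
    d = [list(map(float, row)) for row in dist]  # work on a copy
    while len(nodes) > 2:
        n = len(nodes)
        totals = [sum(row) for row in d]
        # Q(i, j) = (n - 2) d(i, j) - sum_k d(i, k) - sum_k d(j, k);
        # ties go to the first pair found.
        best: Optional[float] = None
        bi, bj = -1, -1
        for i in range(n):
            for j in range(i + 1, n):
                q = (n - 2) * d[i][j] - totals[i] - totals[j]
                if best is None or q < best:
                    best, bi, bj = q, i, j
        keep = [k for k in range(n) if k not in (bi, bj)]
        # Distance from the new internal node u to each remaining node k.
        new_row = [(d[bi][k] + d[bj][k] - d[bi][bj]) / 2 for k in keep]
        d = [[d[a][b] for b in keep] for a in keep]
        for row, du in zip(d, new_row):
            row.append(du)
        d.append(new_row + [0.0])
        nodes = [nodes[k] for k in keep] + [(nodes[bi], nodes[bj])]
    return (nodes[0], nodes[1]) if len(nodes) == 2 else nodes[0]

if __name__ == "__main__":
    names = ["A", "B", "C", "D"]
    # Additive distances from the tree ((A:1, B:2):1, (C:3, D:4)).
    dist = [[0, 3, 5, 6],
            [3, 0, 6, 7],
            [5, 6, 0, 7],
            [6, 7, 7, 0]]
    print(neighbour_joining(names, dist))  # (('A', 'B'), ('C', 'D'))
```

On additive distances, neighbour-joining recovers the generating topology. With real alignment scores, the resulting tree depends on how scores were mapped to distances.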
Figure 9
The phylogeny predicted by Clustal W for the 5S rRNA sequences (a), and the guide tree created by neighbour-joining the scores from the pairwise alignments with gap penalty of 4 kcal/mol (b). Images produced by drawgram, from the Phylip package of programs [30].
Figure 10
The sensitivity, positive predictive value (PPV), and Matthews correlation coefficient (MCC) statistics for the unconstrained mfold optimum structure, selected consensus structures, and the mfold optimum prediction when the consensus structure is used as a forced constraint ('refolded'), for each tRNA sequence. (a) Gap penalty of 4.0 kcal/mol used to generate consensus alignment. (b) Gap penalty of 6.0 kcal/mol used to generate consensus alignment.
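The Matthews correlation coefficient combines correct and incorrect basepair predictions into one score. The captions do not give the exact formula used. The sketch below therefore assumes that every position pair i < j in a sequence of length n is one binary classification, so TN = n(n - 1)/2 - TP - FP - FN. For RNA-sized sequences TN dwarfs the other counts, so the result is close to the geometric mean of sensitivity and PPV. That approximation, often used in RNA structure studies, is shown alongside. Both function names are hypothetical.

```python
import math

def mcc_from_counts(tp: int, fp: int, fn: int, seq_len: int) -> float:
    """Full MCC, treating every position pair i < j as one binary
    classification (assumption). Returns 0.0 when undefined."""
    total = seq_len * (seq_len - 1) // 2
    tn = total - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def mcc_approx(tp: int, fp: int, fn: int) -> float:
    """Geometric mean of sensitivity and PPV, a common approximation."""
    if tp == 0:
        return 0.0
    return math.sqrt((tp / (tp + fn)) * (tp / (tp + fp)))

if __name__ == "__main__":
    # Hypothetical counts for a 76-nt sequence: 15 correct, 5 false, 5 missed.
    print(round(mcc_from_counts(15, 5, 5, 76), 4))  # 0.7482
    print(round(mcc_approx(15, 5, 5), 4))           # 0.75
```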
Figure 11
The sensitivity, positive predictive value (PPV), and Matthews correlation coefficient (MCC) statistics for the unconstrained mfold optimum structure, selected consensus structure, and the mfold optimum prediction when the consensus structure is used as a forced constraint ('refolded'), for each 5S rRNA sequence.
Figure 12
The reference secondary structure for 5S rRNA sequence V00336 (a), the structure predicted by mfold as the unconstrained optimum (b), the conserved structure predicted by the nearest-neighbour consensus algorithm with gap penalty 4 or 6 kcal/mol (c), and the structure predicted by mfold as the optimum when it was forced to include the consensus basepairs (d). Images produced by the sir_graph utility of the mfold program [24].

References

    - Sankoff D. Simultaneous solution of RNA folding, alignment and protosequence problems. SIAM J Appl Math. 1985;45:810–825. doi: 10.1137/0145048.
    - Gorodkin J, Stricklin SL, Stormo GD. Discovering common stem-loop motifs in unaligned RNA sequences. Nucl Acids Res. 2001;29:2135–2144. doi: 10.1093/nar/29.10.2135.
    - Hofacker IL, Bernhart SHF, Stadler PF. Alignment of RNA base pairing probability matrices. Bioinformatics. 2004;20:2222–2227. doi: 10.1093/bioinformatics/bth229.
    - Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl Acids Res. 1994;22:4673–4680. doi: 10.1093/nar/22.22.4673.
    - Mathews D, Turner D. Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. J Mol Biol. 2002;317:191–203. doi: 10.1006/jmbi.2001.5351.
