BMC Bioinformatics. 2005 Jun 28;6:160.

doi: 10.1186/1471-2105-6-160.

Multiple sequence alignments of partially coding nucleic acid sequences

Roman R Stocsits¹, Ivo L Hofacker, Claudia Fried, Peter F Stadler

PMID: 15985156
PMCID: PMC1182351
DOI: 10.1186/1471-2105-6-160

Roman R Stocsits et al. BMC Bioinformatics. 2005.

Affiliation

¹ Interdisciplinary Centre for Bioinformatics, University of Leipzig, Haertelstrasse 16-18, D-04107 Leipzig, Germany. roman@bioinf.uni-leipzig.de

Abstract

Background: High-quality alignments of RNA and DNA sequences are an important prerequisite for the comparative analysis of genomic sequence data. Because of the redundancy of the genetic code, however, nucleic acid sequences are much more heterogeneous than the protein sequences they encode. It is therefore desirable to make use of the amino acid sequence when aligning coding nucleic acid sequences. In many cases, however, only part of the sequence of interest is translated. In other cases, overlapping reading frames encode multiple alternative proteins, possibly with intervening non-coding parts. RNA virus genomes are prominent examples.

Results: The standard scoring scheme for nucleic acid alignments can be extended to simultaneously incorporate information on translation products in one or more reading frames. Here we present codaln, a multiple alignment tool that implements a combined nucleic acid plus amino acid scoring model for pairwise and progressive multiple alignments and allows arbitrary weighting of almost all scoring parameters. The resource requirements of codaln are comparable to those of standard tools such as ClustalW.

Conclusion: We demonstrate the applicability of codaln to various biologically relevant types of sequences (bacteriophage Levivirus genomes and vertebrate Hox clusters) and show that combining nucleic acid and amino acid sequence information leads to improved alignments. These, in turn, increase the performance of analysis tools that depend strictly on good input alignments, such as methods for detecting conserved RNA secondary structure elements.

Figures

**Figure 1**
Example of the higher sequence heterogeneity at the nucleic acid level. The hypothetical amino acid alignment at the top shows a high degree of similarity; the same sequences, shown below at the nucleic acid level, have very low sequence similarity. The pairwise identity is only 33%, just slightly above the 25% identity expected for two random nucleic acid sequences.
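
The effect described in this caption can be reproduced with a minimal Python sketch. The two nine-nucleotide sequences below are hypothetical and were chosen for illustration only (they are not the sequences shown in Figure 1); the helper names `translate` and `identity` are likewise assumptions of this sketch. Both sequences encode the same peptide under the standard genetic code, yet they share few nucleotides.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a DNA sequence in frame 0, ignoring a trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical example: both sequences encode Leu-Ser-Arg with different codons.
seq_a = "TTATCACGT"
seq_b = "CTGAGCAGA"

protein_identity = identity(translate(seq_a), translate(seq_b))
nucleotide_identity = identity(seq_a, seq_b)
print(protein_identity, nucleotide_identity)
```

Leucine, serine and arginine each have six synonymous codons, which is why this toy pair can diverge so strongly at the nucleotide level while remaining identical at the protein level.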

**Figure 2**
Application of the scoring model to a hypothetical alignment. Note that there are no amino acid contributions in the right-hand part of the example because the single indel causes a frameshift. For illustration, we show BLOSUM62 scores and simple scores for nucleic acids and gaps rather than the rescaled default values (His/Gln has score 0).
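
The idea behind this caption can be sketched in Python as follows. This is not codaln's implementation: the function names, the parameter values, the simple match/mismatch amino acid score (used here in place of BLOSUM62), and the assumption that both sequences are coding from their first nucleotide in frame 0 are all choices made for this illustration. The sketch adds an amino acid term only where gap-free codons are aligned in the same phase, so a single-nucleotide indel removes all amino acid contributions to its right.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def nucleotide_score(a, b, match=1, mismatch=-1, gap=-2):
    """Column-wise nucleotide score of two aligned (gapped) sequences."""
    score = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            score += gap
        else:
            score += match if x == y else mismatch
    return score


def amino_acid_score(a, b, aa_match=2, aa_mismatch=0):
    """Score codon pairs that are gap-free and in phase 0 in both sequences."""
    score = 0
    pos_a = pos_b = 0  # nucleotides consumed so far in each sequence
    i = 0
    while i < len(a):
        win_a, win_b = a[i:i + 3], b[i:i + 3]
        in_frame = pos_a % 3 == 0 and pos_b % 3 == 0
        gap_free = "-" not in win_a and "-" not in win_b
        if len(win_a) == 3 and in_frame and gap_free:
            same = CODON_TABLE[win_a] == CODON_TABLE[win_b]
            score += aa_match if same else aa_mismatch
            pos_a += 3
            pos_b += 3
            i += 3
        else:
            # Advance one column; only non-gap characters consume a nucleotide.
            pos_a += a[i] != "-"
            pos_b += b[i] != "-"
            i += 1
    return score


def combined_score(a, b, aa_weight=1.0):
    """Nucleotide score plus weighted amino acid score (illustrative only)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return nucleotide_score(a, b) + aa_weight * amino_acid_score(a, b)


ref = "ATGCATAAACCC"            # Met His Lys Pro
print(amino_acid_score(ref, "ATGCAAAAACCC"))  # His/Gln codon pair mismatches
print(amino_acid_score(ref, "ATGCAT-AACCC"))  # 1-nt indel: frameshift
print(amino_acid_score(ref, "ATG---AAACCC"))  # 3-nt indel: frame preserved
```

Under these assumptions, a gap whose length is a multiple of three keeps the downstream codons in phase and therefore keeps their amino acid contributions, whereas a gap of any other length forfeits them. This is the mechanism that favors in-frame gaps in coding regions.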

**Figure 3**
Reports on the annotated and inferred structure of the input sequences are automatically generated by codaln, respecting all user intervention.

**Figure 4**
Relative distribution of gaps in an alignment of genomic *Hox4* sequences. The alignment is essentially gap-less in *exon 2*. ClustalW (above) returns a very poor alignment of *exon 1* in which gaps occur with a broad distribution. In contrast, codaln respects the coding region so that almost all gap lengths in this area are divisible by 3.
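
A gap-length distribution of the kind summarized in this caption can be computed from any gapped alignment. The sketch below is an assumption-laden illustration rather than the procedure used for Figure 4: the function names and the toy alignment rows are hypothetical, and every run of `-` characters in every row (including leading and trailing runs) is counted as one gap.

```python
import re
from collections import Counter


def gap_length_counts(rows):
    """Count gap runs by length over all rows of a gapped alignment."""
    counts = Counter()
    for row in rows:
        for run in re.finditer(r"-+", row):
            counts[len(run.group())] += 1
    return counts


def fraction_in_frame(counts):
    """Fraction of gaps whose length is divisible by 3."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    in_frame = sum(n for length, n in counts.items() if length % 3 == 0)
    return in_frame / total


# Hypothetical toy alignment, not taken from the Hox4 data.
rows = ["ATG---CCCAAA", "ATGAAACC-AAA", "ATG------AAA"]
counts = gap_length_counts(rows)
print(sorted(counts.items()), fraction_in_frame(counts))
```

Restricting `rows` to the alignment columns of a coding region would give the per-region statistic that the caption contrasts between ClustalW and codaln.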

**Figure 5**
Hogeweg mountain plots of conserved RNA structures in Levivirus genomes. Above: ClustalW; below: codaln. Colors indicate the number of consistent mutations: red 1, ochre 2, green 3, turquoise 4, blue 5. Saturated colors indicate that all sequences are compatible with the structure prediction; decreasing saturation indicates 1 or 2 non-compatible sequences. The thickness of the slabs is proportional to the average frequency of the base pair in the thermodynamic equilibrium. For further details see [3].
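
For readers unfamiliar with mountain plots, the underlying height profile can be sketched in a few lines of Python. The sketch assumes a pseudoknot-free structure given in dot-bracket notation and uses one common convention (the height rises by one at each opening bracket and falls by one at each closing bracket); the function name is hypothetical, and the coloring and slab thickness shown in Figure 5 are not reproduced.

```python
def mountain_heights(structure):
    """Return the mountain-plot height after each position of a dot-bracket string."""
    heights = []
    height = 0
    for ch in structure:
        if ch == "(":
            height += 1
        elif ch == ")":
            height -= 1
            if height < 0:
                raise ValueError("unbalanced structure: unmatched ')'")
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
        heights.append(height)
    if height != 0:
        raise ValueError("unbalanced structure: unmatched '('")
    return heights


# A hairpin with a two-base stem and a two-base loop.
print(mountain_heights("((..))"))
```

In this representation stems appear as slopes, loops as plateaus, and separate hairpins as separate peaks.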

**Figure 6**
The 5'-terminal hairpin in Levivirus (left) is probably the analog of the recognition site for the RNA replicase in Alloleviviruses, which is well characterized in Qβ (right). In Qβ the replicase amplifies RNA templates autocatalytically with high efficiency. This recognition element in Levivirus likely has a similar function.

Cited by

MACSE: Multiple Alignment of Coding SEquences accounting for frameshifts and stop codons.
Ranwez V, Harispe S, Delsuc F, Douzery EJ. PLoS One. 2011;6(9):e22594. doi: 10.1371/journal.pone.0022594. Epub 2011 Sep 16. PMID: 21949676.
An upstream protein-coding region in enteroviruses modulates virus infection in gut epithelial cells.
Lulla V, Dinan AM, Hosmillo M, Chaudhry Y, Sherry L, Irigoyen N, Nayak KM, Stonehouse NJ, Zilbauer M, Goodfellow I, Firth AE. Nat Microbiol. 2019 Feb;4(2):280-292. doi: 10.1038/s41564-018-0297-1. Epub 2018 Nov 26. PMID: 30478287.
A novel ilarvirus protein CP-RT is expressed via stop codon readthrough and suppresses RDR6-dependent RNA silencing.
Lukhovitskaya N, Brown K, Hua L, Pate AE, Carr JP, Firth AE. PLoS Pathog. 2024 May 30;20(5):e1012034. doi: 10.1371/journal.ppat.1012034. eCollection 2024 May. PMID: 38814986.
MAGNOLIA: multiple alignment of protein-coding and structural RNA sequences.
Fontaine A, de Monte A, Touzet H. Nucleic Acids Res. 2008 Jul 1;36(Web Server issue):W14-8. doi: 10.1093/nar/gkn321. Epub 2008 May 30. PMID: 18515348.
HBVRegDB: annotation, comparison, detection and visualization of regulatory elements in hepatitis B virus sequences.
Panjaworayan N, Roessner SK, Firth AE, Brown CM. Virol J. 2007 Dec 17;4:136. doi: 10.1186/1743-422X-4-136. PMID: 18086305.

References

1. Rivas E, Eddy SR. Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics. 2001;2:19.
2. Hofacker IL, Fekete M, Flamm C, Huynen MA, Rauscher S, Stolorz PE, Stadler PF. Automatic Detection of Conserved RNA Structure Elements in Complete RNA Virus Genomes. Nucl Acids Res. 1998;26:3825–3836. doi: 10.1093/nar/26.16.3825.
3. Hofacker IL, Stadler PF. Automatic Detection of Conserved Base Pairing Patterns in RNA Virus Genomes. Comp & Chem. 1999;23:401–414. doi: 10.1016/S0097-8485(99)00013-3.
4. Thurner C, Hofacker IL, Stadler PF. Conserved RNA Pseudoknots. In: Giegerich R, Stoye J, editors. Proceedings of the GCB 2004 (Bielefeld), Volume P-53 of GI-Edition: Lecture Notes in Informatics. 2004. pp. 207–216.
5. Washietl S, Hofacker IL, Stadler PF. Fast and reliable prediction of noncoding RNAs. Proc Natl Acad Sci USA. 2005;102:2454–2459. doi: 10.1073/pnas.0409169102.
