The CHAOS/DIALIGN WWW server for multiple alignment of genomic sequences

Michael Brudno¹, Rasmus Steinkamp, Burkhard Morgenstern

PMID: 15215346
PMCID: PMC441499
DOI: 10.1093/nar/gkh361

Nucleic Acids Res. 2004 Jul 1;32(Web Server issue):W41-4.

Affiliation

¹ Department of Computer Science, Stanford University, Stanford, CA 94305, USA.

Abstract

Cross-species sequence comparison is a powerful approach to analyze functional sites in genomic sequences and many discoveries have been made based on genomic alignments. Herein, we present a WWW-based software system for multiple alignment of large genomic sequences. Our server utilizes the previously developed combination of CHAOS and DIALIGN to achieve both speed and alignment accuracy. CHAOS is a fast database search tool that creates a list of local sequence similarities. These are used by DIALIGN as anchor points to speed up the final alignment procedure. The resulting alignment is returned to the user in different formats together with a list of anchor points found by CHAOS. The CHAOS/DIALIGN software is freely available at http://dialign.gobics.de/chaos-dialign-submission.
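The anchoring idea in the abstract can be illustrated with a small sketch. It is not the CHAOS or DIALIGN implementation: the tuple layout `(start_a, start_b, length, score)` and the function name `chain_anchors` are assumptions made for this example, and the selection rule (a highest-scoring set of hits that appear in the same order in both sequences) is a generic chaining technique standing in for whatever the server actually does.

```python
from typing import List, Tuple

# Hypothetical record layout for one local similarity:
# (start in sequence A, start in sequence B, length, score)
Anchor = Tuple[int, int, int, float]


def chain_anchors(hits: List[Anchor]) -> List[Anchor]:
    """Pick the highest-scoring subset of hits that is collinear in both sequences."""
    if not hits:
        return []
    hits = sorted(hits)  # order by start in A, then start in B
    best = [h[3] for h in hits]   # best chain score ending at hit i
    prev = [-1] * len(hits)       # predecessor of hit i in that chain
    for i, (a_i, b_i, _, s_i) in enumerate(hits):
        for j in range(i):
            a_j, b_j, len_j, _ = hits[j]
            # Hit j must end before hit i starts in both sequences.
            fits = a_j + len_j <= a_i and b_j + len_j <= b_i
            if fits and best[j] + s_i > best[i]:
                best[i] = best[j] + s_i
                prev[i] = j
    # Trace back from the best-scoring chain end.
    i = max(range(len(hits)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(hits[i])
        i = prev[i]
    return chain[::-1]


hits = [(10, 12, 5, 8.0), (30, 5, 6, 9.0), (20, 25, 4, 3.0), (40, 50, 10, 7.0)]
anchors = chain_anchors(hits)
```

In this toy input the hit starting at position 30 in the first sequence maps to position 5 in the second, so it conflicts with the order of the other three hits and is left out of the chain. Restricting the final alignment to the regions between consecutive anchors is what reduces the search space.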

Figures

**Figure 1**
The CHAOS/DIALIGN WWW server for multiple alignment of genomic sequences. Input sequences are uploaded as a single multi-sequence file in FASTA format.
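Because the server expects one multi-sequence FASTA file, separately stored sequences have to be combined before upload. The following is a minimal sketch of that step; the function name `to_multi_fasta`, the 60-character line width and the example sequence names are assumptions for illustration and are not prescribed by the server.

```python
from typing import Iterable, Tuple


def to_multi_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Join (name, sequence) pairs into one multi-sequence FASTA string."""
    blocks = []
    for name, seq in records:
        lines = [f">{name}"]  # header line for this sequence
        # Wrap the sequence into lines of at most `width` characters.
        lines += [seq[i:i + width] for i in range(0, len(seq), width)]
        blocks.append("\n".join(lines))
    return "\n".join(blocks) + "\n"


text = to_multi_fasta([("human", "ACGTAC"), ("mouse", "ACG")], width=4)
```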

**Figure 2**
Output alignment in DIALIGN format. Names of the aligned sequences are shown on the left. Numbers between names and sequences denote the position of the first residue in a line within the respective sequence. Capital letters denote aligned residues, i.e. residues involved in at least one of the fragments that make up the alignment. Lower-case letters denote residues that do not belong to any of these selected fragments and are not considered to be aligned. Thus, if a lower-case letter is in the same column as other letters, this is pure chance; these residues are not considered to be homologous. Numbers below the alignment roughly reflect the degree of local similarity among the sequences. More precisely, they represent the sum of weight scores for those fragments that connect residues at the respective column. The numbers are normalized so that every position gets a value between 0 and 9 and, in *every* alignment, the region of maximum similarity is scored 9. Thus, these scores indicate *relative* rather than *absolute* similarity.
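The normalization described in this caption can be sketched as follows. The caption only states that values lie between 0 and 9 and that the maximum is scored 9; the linear scaling and the rounding to the nearest integer used here are assumptions, as is the function name `normalize_column_scores`.

```python
from typing import List, Sequence


def normalize_column_scores(weights: Sequence[float]) -> List[int]:
    """Scale non-negative per-column weight sums so that the largest becomes 9."""
    peak = max(weights, default=0.0)
    if peak <= 0:
        # No fragment covers any column: every position scores 0.
        return [0] * len(weights)
    # Assumed rule: linear scaling, rounded to the nearest integer.
    return [round(9 * w / peak) for w in weights]


scores = normalize_column_scores([0.0, 14.0, 42.0, 10.0])
```

Since every alignment is rescaled by its own maximum, a 9 in one alignment and a 9 in another need not correspond to the same underlying weight, which is why the scores are only relative.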

**Figure 3**
List of fragments (= aligned segment pairs) returned by the program. The list contains those fragments that are part of the respective optimal pair-wise alignments in order of decreasing overlap weights. The list contains coordinates, weight scores and consistency information. For example, the first fragment involves sequences 2 and 3, starts at positions 185,955 and 178,118, respectively, within these sequences, is 90 nucleotides in length, has a weight score of 42.00, an overlap weight score of 107.68 and was found in the first iteration step of the alignment procedure; see (16). The fragment was consistent (‘cons’) in the multiple alignment procedure; i.e. it is included in the final multiple alignment.
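The fields listed in this caption can be held in a small record type. The sketch below uses the values of the first fragment quoted above; the class name, field names and the assumption that positions are 1-based and inclusive are illustrative choices, and no attempt is made to parse the server's actual output file.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Fragment:
    seq_a: int             # index of the first sequence
    seq_b: int             # index of the second sequence
    start_a: int           # start position in the first sequence
    start_b: int           # start position in the second sequence
    length: int            # fragment length in nucleotides
    weight: float          # weight score
    overlap_weight: float  # overlap weight score
    iteration: int         # iteration step in which the fragment was found
    consistent: bool       # True if marked 'cons'

    def end_a(self) -> int:
        # Assumes 1-based, inclusive coordinates.
        return self.start_a + self.length - 1

    def end_b(self) -> int:
        return self.start_b + self.length - 1


def consistent_by_overlap(fragments: Iterable[Fragment]) -> List[Fragment]:
    """Keep consistent fragments, sorted by decreasing overlap weight."""
    kept = [f for f in fragments if f.consistent]
    return sorted(kept, key=lambda f: f.overlap_weight, reverse=True)


first = Fragment(2, 3, 185_955, 178_118, 90, 42.00, 107.68, 1, True)
```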

References

1. Bafna,V. and Huson,D.H. (2000) The conserved exon method for gene finding. Bioinformatics, 16, 190–202.
2. Batzoglou,S., Pachter,L., Mesirov,J.P., Berger,B. and Lander,E.S. (2000) Human and mouse gene structure: comparative analysis and application to exon prediction. Genome Res., 10, 950–958.
3. Korf,I., Flicek,P., Duan,D. and Brent,M.R. (2001) Integrating genomic homology into gene structure prediction. Bioinformatics, 17, S140–S148.
4. Wiehe,T., Gebauer-Jung,S., Mitchell-Olds,T. and Guigó,R. (2001) SGP-1: Prediction and validation of homologous genes based on sequence alignments. Genome Res., 11, 1574–1583.
5. Taher,L., Rinner,O., Gargh,S., Sczyrba,A., Brudno,M., Batzoglou,S. and Morgenstern,B. (2003) AGenDA: homology-based gene prediction. Bioinformatics, 19, 1575–1577.
