Comparative Study
Genome Res. 2004 Apr;14(4):693-9.
doi: 10.1101/gr.1960404.

MAVID: constrained ancestral alignment of multiple sequences


Nicolas Bray et al. Genome Res. 2004 Apr.

Abstract

We describe a new global multiple-alignment program capable of aligning a large number of genomic regions. Our progressive-alignment approach incorporates the following ideas: maximum-likelihood inference of ancestral sequences, automatic guide-tree construction, protein-based anchoring of ab-initio gene predictions, and constraints derived from a global homology map of the sequences. We have implemented these ideas in the MAVID program, which is able to accurately align multiple genomic regions up to megabases long. MAVID is able to effectively align divergent sequences, as well as incomplete unfinished sequences. We demonstrate the capabilities of the program on the benchmark CFTR region, which consists of 1.8 Mb of human sequence and 20 orthologous regions in marsupials, birds, fish, and mammals. Finally, we describe two large MAVID alignments, an alignment of all the available HIV genomes and a multiple alignment of the entire human, mouse, and rat genomes.

Figures

Figure 1
MAVID architecture overview. (A) Sequences are aligned upward along a guide tree and (B) alignments of alignments are performed at internal nodes. To align two alignments (C), maximum likelihood ancestor sequences are inferred from each of the separate alignments, and (D) the ancestor sequences are aligned with MAVID. The resulting multiple alignment (E) (corresponding to a subset of leaves of the tree) is then recorded at the internal node.
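The progressive scheme in this caption can be illustrated with a minimal Python sketch. It is not the MAVID implementation, and it makes two loud substitutions: a per-column majority vote stands in for maximum-likelihood ancestor inference, and a plain Needleman-Wunsch alignment stands in for the anchored alignment of the ancestor sequences. The function names, the scoring values, and the nested-tuple guide tree are all hypothetical.

```python
from collections import Counter

GAP = "-"

def consensus(alignment):
    """Stand-in for ML ancestor inference: majority non-gap symbol per column."""
    ancestor = []
    for column in zip(*alignment):
        residues = [c for c in column if c != GAP]
        ancestor.append(Counter(residues).most_common(1)[0][0] if residues else GAP)
    return "".join(ancestor)

def align_pair(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch; returns (i, j) index pairs, None marking a gap."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover the column pairing.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            pairs.append((i - 1, j - 1)); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            pairs.append((i - 1, None)); i -= 1
        else:
            pairs.append((None, j - 1)); j -= 1
    return pairs[::-1]

def merge(left, right, pairs):
    """Combine two alignments column by column, following the ancestor pairing."""
    rows = [[] for _ in range(len(left) + len(right))]
    for i, j in pairs:
        for k, row in enumerate(left):
            rows[k].append(row[i] if i is not None else GAP)
        for k, row in enumerate(right):
            rows[len(left) + k].append(row[j] if j is not None else GAP)
    return ["".join(r) for r in rows]

def progressive_align(tree):
    """Align upward along a guide tree given as nested 2-tuples of strings."""
    if isinstance(tree, str):
        return [tree]  # a leaf is an alignment of one sequence
    left = progressive_align(tree[0])
    right = progressive_align(tree[1])
    # Each ancestor has one symbol per column, so its indices are column indices.
    pairs = align_pair(consensus(left), consensus(right))
    return merge(left, right, pairs)

alignment = progressive_align((("ACGT", "ACT"), "AGT"))
```

Each internal node stores an alignment of the leaves below it, so the root holds the full multiple alignment. Removing the gap characters from any row recovers the corresponding input sequence.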
Figure 2
The top half of the figure shows two exon matches determined from the homology map. In particular, exon r in sequence 1 is aligned to an exon in sequence 3, and exon s in sequence 2 is aligned to another exon in sequence 3 (double arrows). At this stage, none of the sequences have been aligned; the matches are based on the pairwise protein alignments of the predicted genes. During a MAVID alignment of ancestral sequences in the progressive multiple alignment, position r from sequence 1 maps to position u in the ancestral sequence A, and position s maps to position v in the ancestral sequence B (solid lines). Even though sequence 3 is not in the multiple alignment yet, the constraint forces position u to be aligned before position v in the final multiple alignment (broken line). The constraint is enforced by removing all the matches violating the constraint from consideration during the anchoring of the alignment.
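The filtering step described in this caption might look roughly like the following Python sketch. The `Match` and `OrderConstraint` types and the `violates` rule are hypothetical, not MAVID's data structures. The sketch assumes a candidate match pairing position x in ancestor A with position y in ancestor B violates the constraint "u before v" exactly when x ≤ u and y ≥ v, because such a match would place v at or before u in the alignment.

```python
from typing import NamedTuple

class Match(NamedTuple):
    a_pos: int  # position in ancestral sequence A
    b_pos: int  # position in ancestral sequence B

class OrderConstraint(NamedTuple):
    u: int  # position in A that must be aligned before ...
    v: int  # ... this position in B

def violates(match, constraint):
    """Assumed rule: pairing A[x] with B[y], x <= u and y >= v, puts v no later than u."""
    return match.a_pos <= constraint.u and match.b_pos >= constraint.v

def filter_matches(matches, constraints):
    """Drop every candidate anchor that contradicts at least one order constraint."""
    return [m for m in matches
            if not any(violates(m, c) for c in constraints)]

constraints = [OrderConstraint(u=100, v=50)]
candidates = [Match(90, 60), Match(120, 40), Match(80, 30), Match(150, 70)]
kept = filter_matches(candidates, constraints)
```

In this example only `Match(90, 60)` is discarded: it lies at or before u in A and at or after v in B, so accepting it as an anchor would make it impossible to align u before v.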
Figure 3
Coverage of human chromosome 20 RefSeq exons by the MAVID alignments. Of a total of 3927 exons, only six were not in the homology map. A total of 53.5% of the exons were covered by precomputed exon anchors in either mouse or rat. The remaining exons are mostly aligned by MAVID, resulting in 93.6% of the exons covered by alignment in either mouse or rat.
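As a rough check on the scale of these figures, the percentages can be converted back to approximate exon counts. The counts below are reconstructed from rounded percentages and are therefore estimates, not values reported in the source; the helper name `approx_count` is hypothetical.

```python
TOTAL_EXONS = 3927  # RefSeq exons on human chromosome 20, from the caption

def approx_count(percent, total=TOTAL_EXONS):
    """Approximate exon count implied by a rounded percentage."""
    return round(total * percent / 100)

anchored = approx_count(53.5)        # covered by precomputed exon anchors
covered = approx_count(93.6)         # covered by alignment in mouse or rat
added_by_alignment = covered - anchored
uncovered = TOTAL_EXONS - covered
```

Under these assumptions, roughly 2101 exons are covered by anchors and roughly 3676 by the final alignments, so the alignment step accounts for about 40 percentage points of coverage beyond the anchors.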
Figure 4
The HIV tree as inferred from 242 sequences obtained from Los Alamos National Laboratory. The sequences are labeled by strains, and the different strains have been grouped together.
