Bioinformatics. 2013 May 15;29(10):1341-2.
doi: 10.1093/bioinformatics/btt128. Epub 2013 Mar 16.

HAL: a hierarchical format for storing and analyzing multiple genome alignments

Glenn Hickey et al. Bioinformatics.

Abstract

Motivation: Large multiple genome alignments and inferred ancestral genomes are ideal resources for comparative studies of molecular evolution, and advances in sequencing and computing technology are making them increasingly obtainable. These structures can provide a rich understanding of the genetic relationships between all subsets of species they contain. However, current formats for storing genomic alignments, such as XMFA and MAF, are all indexed or ordered using a single reference genome, which limits the information that can be queried with respect to other species and clades. This loss of information grows with the number of species under comparison, as well as with their phylogenetic distance.

Results: We present HAL, a compressed, graph-based hierarchical alignment format for storing multiple genome alignments and ancestral reconstructions. HAL graphs are indexed on all genomes they contain. Furthermore, they are organized phylogenetically, which allows for modular and parallel access to arbitrary subclades without fragmentation because of rearrangements that have occurred in other lineages. HAL graphs can be created or read with a comprehensive C++ API. A set of tools is also provided to perform basic operations, such as importing and exporting data, identifying mutations and coordinate mapping (liftover).
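Coordinate mapping (liftover) between a genome and its parent can be illustrated with a minimal sketch. It assumes the alignment between a child and its parent has already been reduced to a sorted list of non-overlapping aligned blocks. The names `Block` and `lift_to_parent` are hypothetical and are not part of the HAL API or tools.

```python
from bisect import bisect_right
from typing import List, NamedTuple, Optional


class Block(NamedTuple):
    """Hypothetical aligned block between a child genome and its parent."""
    child_start: int   # first child coordinate covered by the block
    parent_start: int  # first parent coordinate covered by the block
    length: int        # number of aligned bases
    inverted: bool     # True if the child aligns to the parent's reverse strand


def lift_to_parent(blocks: List[Block], pos: int) -> Optional[int]:
    """Map a child coordinate to a parent coordinate, or None if unaligned.

    Assumes `blocks` is sorted by child_start and the blocks do not overlap.
    """
    starts = [b.child_start for b in blocks]
    i = bisect_right(starts, pos) - 1  # last block starting at or before pos
    if i < 0:
        return None
    block = blocks[i]
    offset = pos - block.child_start
    if offset >= block.length:
        return None  # pos falls in a gap between blocks
    if block.inverted:
        # An inverted block is read right to left in the parent.
        return block.parent_start + block.length - 1 - offset
    return block.parent_start + offset


# Illustrative data only: three blocks, the second one inverted.
blocks = [
    Block(child_start=0, parent_start=100, length=10, inverted=False),
    Block(child_start=10, parent_start=200, length=5, inverted=True),
    Block(child_start=20, parent_start=300, length=5, inverted=False),
]
for position in (3, 10, 17):
    print(position, lift_to_parent(blocks, position))
```

Mapping between two arbitrary genomes in a tree could then be composed from repeated parent and child steps along the path connecting them, which is one reason a phylogenetically organized index is convenient.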

Availability: All documentation and source code for the HAL API and tools are freely available at http://github.com/glennhickey/hal.

Contact: hickey@soe.ucsc.edu or haussler@soe.ucsc.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
(A) A single genome as represented in HAL. Two sequences are stored in an array of DNA characters, which is segmented with respect to the genome's parent (top segments) and its children (bottom segments). (B) The same genome in the context of a HAL graph of five genomes. The dashed edge corresponds to an inversion event.
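The layout described in panel (A) can be sketched as a small data structure. This is an illustrative model only, not the HAL on-disk format or its C++ API: the class names, fields and the `tiles` check are hypothetical, and the sketch assumes that each segmentation covers the DNA array without gaps and that the genome has a single child.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Segment:
    """Hypothetical segment: a half-open interval on the genome's DNA array."""
    start: int
    length: int
    partner: Optional[int] = None  # index of the aligned segment in the other genome
    inverted: bool = False         # True if the alignment is to the reverse strand


@dataclass
class Genome:
    """One genome: a DNA array segmented twice, once per direction in the tree."""
    name: str
    dna: str               # all sequences of the genome, concatenated
    top: List[Segment]     # segmentation with respect to the parent
    bottom: List[Segment]  # segmentation with respect to the child (assumed single)


def tiles(segments: List[Segment], total: int) -> bool:
    """Return True if the segments cover [0, total) in order, with no gaps or overlaps."""
    pos = 0
    for seg in segments:
        if seg.start != pos or seg.length <= 0:
            return False
        pos += seg.length
    return pos == total


# Illustrative data: two sequences (10 and 6 bases) stored in one array.
genome = Genome(
    name="anc1",
    dna="ACGTACGTAC" + "GGCCTT",
    top=[Segment(0, 6, partner=0), Segment(6, 10, partner=1, inverted=True)],
    bottom=[Segment(0, 4, partner=0), Segment(4, 8, partner=2), Segment(12, 4)],
)
print(tiles(genome.top, len(genome.dna)), tiles(genome.bottom, len(genome.dna)))
```

Keeping the two segmentations separate means that a rearrangement relative to the parent only changes the top segments, leaving the bottom segments untouched.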
