Am J Hum Genet. 2003 Aug;73(2):336-54.
doi: 10.1086/377106. Epub 2003 Jul 11.

Finding haplotype block boundaries by using the minimum-description-length principle

Eric C Anderson et al. Am J Hum Genet. 2003 Aug.

Abstract

We present a method for detecting haplotype blocks that simultaneously uses information about linkage-disequilibrium decay between the blocks and the diversity of haplotypes within the blocks. By use of phased single-nucleotide polymorphism data, our method partitions a chromosome into a series of adjacent, nonoverlapping blocks. The partition is made by choosing among a family of Markov models for block structure in a chromosomal region. Specifically, in the model, the occurrence of haplotypes within blocks follows a time-inhomogeneous Markov process along the chromosome, and we choose among possible partitions by using the two-stage minimum-description-length criterion. When applied to data simulated from the coalescent with recombination hotspots, our method reliably situates block boundaries at the hotspots and infrequently places block boundaries at sites with background levels of recombination. We apply three previously published block-finding methods to the same data, showing that they either are relatively insensitive to recombination hotspots or fail to discriminate between background sites of recombination and hotspots. When applied to the 5q31 data of Daly et al., our method identifies more block boundaries in agreement with those found by Daly et al. than do other methods. These results suggest that our method may be useful for designing association-based mapping studies that exploit haplotype blocks.
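The idea of choosing a partition by a two-stage description length can be illustrated with a small sketch. This is not the authors' MDBlocks model: it assumes blocks are independent of one another (no Markov transitions between adjacent blocks), uses a simple multinomial over the distinct haplotypes in each block, and searches partitions by dynamic programming. The function names `block_cost` and `mdl_partition`, the cost terms, and the toy data are all illustrative assumptions.

```python
from collections import Counter
from math import log2


def block_cost(haps, start, end):
    """Two-part description length (in bits) of SNPs [start, end) as one block.

    Assumption: haplotypes in a block are i.i.d. draws from a multinomial;
    this ignores the between-block Markov dependence used in the paper.
    """
    n = len(haps)
    counts = Counter(h[start:end] for h in haps)
    k = len(counts)
    # Stage 1 (model): spell out each distinct haplotype, one bit per SNP,
    # plus (1/2) log2(n) bits for each of the k - 1 free frequencies.
    model_bits = k * (end - start) + 0.5 * (k - 1) * log2(n)
    # Stage 2 (data): negative log-likelihood at the ML frequencies.
    data_bits = -sum(c * log2(c / n) for c in counts.values())
    return model_bits + data_bits


def mdl_partition(haps):
    """Return (blocks, total_bits) minimizing the summed block costs."""
    m = len(haps[0])
    best = [0.0] + [float("inf")] * m  # best[j]: min cost of SNPs [0, j)
    back = [0] * (m + 1)               # back[j]: start of the last block
    for end in range(1, m + 1):
        for start in range(end):
            cost = best[start] + block_cost(haps, start, end)
            if cost < best[end]:
                best[end] = cost
                back[end] = start
    # Trace the chosen boundaries back from the right end.
    blocks = []
    end = m
    while end > 0:
        blocks.append((back[end], end))
        end = back[end]
    return blocks[::-1], best[m]


# Toy phased data: two 4-SNP blocks, each with two haplotypes, freely combined.
toy = ["00000000", "00001111", "11110000", "11111111"] * 2
print(mdl_partition(toy))  # ([(0, 4), (4, 8)], 35.0)
```

On the toy data, describing the region as one block costs 52.5 bits, whereas two blocks cost 35 bits, so the boundary is placed where the haplotypes recombine freely. Low within-block diversity keeps the model cost down, and a break in linkage disequilibrium makes a single spanning block expensive, which is the trade-off the abstract describes.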

Figures

Figure 1
Distribution of the number of recombination events between adjacent marker pairs. In 1,000 ancestries simulated from the coalescent with 10 recombination hotspots, we recorded the number of recombinations occurring in intervals between adjacent marker pairs. The plot shows a histogram of the number of intermarker intervals within which a given number of recombination events occurred. This histogram has two modes: the right mode, which represents recombinations occurring at simulated hotspots, accounts for only ∼9% of the intermarker intervals but ∼95% of all recombinations; the left mode represents recombinations that did not occur at hotspots. The distribution was obtained for the parameter settings of Rh=200, ρ=200, and N=18. (Note the Y-axis break between 2,000 and 8,000.)
Figure 2
Distribution of the number of recombinations within marker intervals identified as block boundaries, for the MDB (A), FGT (B), and htSNP (C) methods. Plus symbols (+) show the proportion of intermarker intervals, identified as block boundaries by each method, within which a given number of recombinations occurred in 1,000 simulated ancestries. Open circles (○) show the proportion of all intermarker intervals (irrespective of whether they were inferred to contain block boundaries) within which a given number of recombinations occurred (as in fig. 1). If block boundaries were inferred completely at random, the two distributions within each panel would be nearly identical. The relative height of the right mode of the plus symbols reflects the extent to which block-finding methods selectively infer intervals containing hotspots to be block boundaries. Simulation parameters were Rh=200, ρ=200, and N=18 for 10 hotspots.

References

Electronic-Database Information

    1. MDBlocks Home, http://ib.berkeley.edu/labs/slatkin/eriq/software/mdb_web/index.htm (for the MDBlocks program, as well as for information on the preparation of the 5q31 data of Daly et al. and the mtDNA data used in the analyses presented here)

References

    1. Cover TM, Thomas JA (1991) Elements of information theory. John Wiley & Sons, New York
    2. Cullen M, Perfetto SP, Klitz W, Nelson G, Carrington M (2002) High-resolution patterns of meiotic recombination across the human major histocompatibility complex. Am J Hum Genet 71:759–776
    3. Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES (2001) High-resolution haplotype structure in the human genome. Nat Genet 29:229–232
    4. Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, Higgins J, DeFelice M, Lochner A, Faggart M, Liu-Cordero SN, Rotimi C, Adeyemo A, Cooper R, Ward R, Lander ES, Daly MJ, Altshuler D (2002) The structure of haplotype blocks in the human genome. Science 296:2225–2229
    5. Goldstein DB (2001) Islands of linkage disequilibrium. Nat Genet 29:109–111
