Finding haplotype block boundaries by using the minimum-description-length principle
- PMID: 12858289
- PMCID: PMC1182137
- DOI: 10.1086/377106
Abstract
We present a method for detecting haplotype blocks that simultaneously uses information about linkage-disequilibrium decay between the blocks and the diversity of haplotypes within the blocks. By use of phased single-nucleotide polymorphism data, our method partitions a chromosome into a series of adjacent, nonoverlapping blocks. The partition is made by choosing among a family of Markov models for block structure in a chromosomal region. Specifically, in the model, the occurrence of haplotypes within blocks follows a time-inhomogeneous Markov process along the chromosome, and we choose among possible partitions by using the two-stage minimum-description-length criterion. When applied to data simulated from the coalescent with recombination hotspots, our method reliably situates block boundaries at the hotspots and infrequently places block boundaries at sites with background levels of recombination. We apply three previously published block-finding methods to the same data, showing that they either are relatively insensitive to recombination hotspots or fail to discriminate between background sites of recombination and hotspots. When applied to the 5q31 data of Daly et al., our method identifies more block boundaries in agreement with those found by Daly et al. than do other methods. These results suggest that our method may be useful for designing association-based mapping studies that exploit haplotype blocks.
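The idea of choosing a block partition by description length can be illustrated with a small sketch. It is not the MDBlocks implementation. It assumes phased haplotypes given as equal-length strings of '0'/'1', treats blocks as independent (so it omits the Markov dependence between adjacent blocks that the model above includes), and uses an illustrative two-part cost: bits to spell out each distinct haplotype and its frequency, plus bits to encode the data given those frequencies. The function names `block_cost` and `best_partition` and the exact cost terms are hypothetical choices made for this example.

```python
import math
from collections import Counter


def block_cost(haps, start, end):
    """Two-part description length (in bits) of sites start..end-1 as one block.

    Assumed cost, for illustration only:
      model bits = one bit per site for each distinct haplotype
                   + 0.5 * log2(n) bits for each of the k - 1 free frequencies
      data bits  = negative log2-likelihood under empirical haplotype frequencies
    """
    n = len(haps)
    counts = Counter(h[start:end] for h in haps)
    k = len(counts)
    data_bits = -sum(c * math.log2(c / n) for c in counts.values())
    model_bits = k * (end - start) + 0.5 * (k - 1) * math.log2(n)
    return data_bits + model_bits


def best_partition(haps):
    """Dynamic program over all partitions into adjacent, nonoverlapping blocks.

    Returns (total_bits, blocks), where blocks is a list of (start, end)
    half-open site intervals. Blocks are scored independently here, which is
    a simplification relative to a Markov model across blocks.
    """
    n_sites = len(haps[0])
    best = [0.0] + [math.inf] * n_sites   # best[j]: min cost of sites 0..j-1
    back = [0] * (n_sites + 1)            # back[j]: start of the last block
    for end in range(1, n_sites + 1):
        for start in range(end):
            cost = best[start] + block_cost(haps, start, end)
            if cost < best[end]:
                best[end] = cost
                back[end] = start
    blocks = []
    pos = n_sites
    while pos > 0:
        blocks.append((back[pos], pos))
        pos = back[pos]
    return best[n_sites], blocks[::-1]


# Toy data: two 4-site regions, each with two haplotypes, combined freely,
# as if a recombination hotspot sat between site 3 and site 4.
demo_haps = [a + b for a in ("0000", "1111") for b in ("0000", "1111")] * 10

if __name__ == "__main__":
    total_bits, blocks = best_partition(demo_haps)
    print(blocks, round(total_bits, 2))
```

On the toy data, two blocks split at the simulated hotspot cost fewer bits than one 8-site block, because the single block has to describe four distinct haplotypes. Splitting inside a region costs more, because each extra block pays again to encode which haplotype every chromosome carries.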
References
Electronic-Database Information
- MDBlocks Home, http://ib.berkeley.edu/labs/slatkin/eriq/software/mdb_web/index.htm (for the MDBlocks program, as well as for information on the preparation of the 5q31 data of Daly et al. and the mtDNA data used in the analyses presented here)
References
- Cover TM, Thomas JA (1991) Elements of information theory. John Wiley & Sons, New York
- Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES (2001) High-resolution haplotype structure in the human genome. Nat Genet 29:229–232
- Gabriel SB, Schaffner SF, Nguyen H, Moore JM, Roy J, Blumenstiel B, Higgins J, DeFelice M, Lochner A, Faggart M, Liu-Cordero SN, Rotimi C, Adeyemo A, Cooper R, Ward R, Lander ES, Daly MJ, Altshuler D (2002) The structure of haplotype blocks in the human genome. Science 296:2225–2229
- Goldstein DB (2001) Islands of linkage disequilibrium. Nat Genet 29:109–111