Rapid and accurate haplotype phasing and missing-data inference for whole-genome association studies by use of localized haplotype clustering
- PMID: 17924348
- PMCID: PMC2265661
- DOI: 10.1086/521987
Abstract
Whole-genome association studies present many new statistical and computational challenges due to the large quantity of data obtained. One of these challenges is haplotype inference; methods for haplotype inference designed for small data sets from candidate-gene studies do not scale well to the large number of individuals genotyped in whole-genome association studies. We present a new method and software for inference of haplotype phase and missing data that can accurately phase data from whole-genome association studies, and we present the first comparison of haplotype-inference methods for real and simulated data sets with thousands of genotyped individuals. We find that our method outperforms existing methods in terms of both speed and accuracy for large data sets with thousands of individuals and densely spaced genetic markers, and we use our method to phase a real data set of 3,002 individuals genotyped for 490,032 markers in 3.1 days of computing time, with 99% of masked alleles imputed correctly. Our method is implemented in the Beagle software package, which is freely available.
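The figure of 99% of masked alleles imputed correctly comes from a masking experiment: known alleles are hidden, imputed, and compared against the truth. The sketch below illustrates only that evaluation step, under stated assumptions. The data layout (a list of per-individual allele lists coded 0/1) and the function names `mask_alleles`, `impute_major_allele`, and `masked_accuracy` are hypothetical. The major-allele imputer is a deliberately naive stand-in and is not the localized haplotype clustering method used by Beagle.

```python
import random


def mask_alleles(alleles, fraction, rng):
    """Hide a random subset of alleles (set to None); return the masked copy and the hidden positions."""
    masked = [row[:] for row in alleles]
    positions = [(i, j) for i, row in enumerate(alleles) for j in range(len(row))]
    chosen = rng.sample(positions, int(len(positions) * fraction))
    for i, j in chosen:
        masked[i][j] = None
    return masked, chosen


def impute_major_allele(masked):
    """Naive placeholder imputer (an assumption, not Beagle's model):
    fill each missing allele with the most common observed allele at that marker."""
    filled = [row[:] for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        # sorted() makes tie-breaking deterministic; default to 0 if nothing was observed
        major = max(sorted(set(observed)), key=observed.count) if observed else 0
        for row in filled:
            if row[j] is None:
                row[j] = major
    return filled


def masked_accuracy(truth, imputed, positions):
    """Fraction of masked positions where the imputed allele equals the true allele."""
    if not positions:
        raise ValueError("no masked positions to evaluate")
    correct = sum(truth[i][j] == imputed[i][j] for i, j in positions)
    return correct / len(positions)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: 6 haplotypes x 4 markers, alleles coded 0/1
    truth = [[rng.randint(0, 1) for _ in range(4)] for _ in range(6)]
    masked, hidden = mask_alleles(truth, 0.25, rng)
    imputed = impute_major_allele(masked)
    print(f"masked-allele accuracy: {masked_accuracy(truth, imputed, hidden):.2f}")
```

Any real imputation method can be substituted for `impute_major_allele`; the masking and scoring steps stay the same.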
Web Resources
- Beagle genetic analysis software package, http://www.stat.auckland.ac.nz/~browning/beagle/beagle.html
- WTCCC, http://www.wtccc.org.uk/