A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals
- PMID: 19200528
- PMCID: PMC2668004
- DOI: 10.1016/j.ajhg.2009.01.005
Abstract
We present methods for imputing data for ungenotyped markers and for inferring haplotype phase in large data sets of unrelated individuals and parent-offspring trios. Our methods make use of known haplotype phase when it is available, and our methods are computationally efficient so that the full information in large reference panels with thousands of individuals is utilized. We demonstrate that substantial gains in imputation accuracy accrue with increasingly large reference panel sizes, particularly when imputing low-frequency variants, and that unphased reference panels can provide highly accurate genotype imputation. We place our methodology in a unified framework that enables the simultaneous use of unphased and phased data from trios and unrelated individuals in a single analysis. For unrelated individuals, our imputation methods produce well-calibrated posterior genotype probabilities and highly accurate allele-frequency estimates. For trios, our haplotype-inference method is four orders of magnitude faster than the gold-standard PHASE program and has excellent accuracy. Our methods enable genotype imputation to be performed with unphased trio or unrelated reference panels, thus accounting for haplotype-phase uncertainty in the reference panel. We present a useful measure of imputation accuracy, allelic R², and show that this measure can be estimated accurately from posterior genotype probabilities. Our methods are implemented in version 3.0 of the BEAGLE software package.
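As an illustration of how an accuracy measure of this kind might be estimated from posterior genotype probabilities alone, the following is a minimal sketch. It assumes that allelic R² is the squared correlation between the allele dosage of the most likely imputed genotype and the true allele dosage, and it replaces the unknown true dosage with its posterior mean and posterior mean square. The function name, the input layout, and the exact form of the estimator are assumptions made for this example; they have not been checked against the BEAGLE 3.0 implementation.

```python
from typing import Sequence, Tuple

# Posterior probabilities of carrying 0, 1, or 2 copies of the minor allele.
Probs = Tuple[float, float, float]


def estimated_allelic_r2(posteriors: Sequence[Probs]) -> float:
    """Estimate allelic R^2 for one marker from posterior genotype probabilities.

    Hypothetical helper: the true dosage is unknown, so its moments are
    replaced by their posterior expectations (an assumption of this sketch).
    """
    n = len(posteriors)
    if n == 0:
        return 0.0

    sum_z = sum_zz = sum_u = sum_w = sum_zu = 0.0
    for probs in posteriors:
        _, p1, p2 = probs
        z = probs.index(max(probs))  # dosage of the most likely genotype
        u = p1 + 2.0 * p2            # posterior mean of the true dosage
        w = p1 + 4.0 * p2            # posterior mean of the squared true dosage
        sum_z += z
        sum_zz += z * z
        sum_u += u
        sum_w += w
        sum_zu += z * u

    cov = sum_zu - sum_z * sum_u / n       # n * estimated covariance
    var_z = sum_zz - sum_z * sum_z / n     # n * variance of the called dosage
    var_x = sum_w - sum_u * sum_u / n      # n * estimated variance of the true dosage
    if var_z <= 0.0 or var_x <= 0.0:
        return 0.0                         # no variation: report 0 by convention
    return cov * cov / (var_z * var_x)
```

Under these assumptions, a marker whose posterior probabilities are all 0 or 1 (with some variation across individuals) gives an estimate of 1, and a marker at which every individual receives the same most likely genotype gives 0. The returned value of 0 for a monomorphic call set is a convention chosen for this sketch.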