A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals
- PMID: 19200528
- PMCID: PMC2668004
- DOI: 10.1016/j.ajhg.2009.01.005
Abstract
We present methods for imputing data for ungenotyped markers and for inferring haplotype phase in large data sets of unrelated individuals and parent-offspring trios. Our methods make use of known haplotype phase when it is available, and our methods are computationally efficient so that the full information in large reference panels with thousands of individuals is utilized. We demonstrate that substantial gains in imputation accuracy accrue with increasingly large reference panel sizes, particularly when imputing low-frequency variants, and that unphased reference panels can provide highly accurate genotype imputation. We place our methodology in a unified framework that enables the simultaneous use of unphased and phased data from trios and unrelated individuals in a single analysis. For unrelated individuals, our imputation methods produce well-calibrated posterior genotype probabilities and highly accurate allele-frequency estimates. For trios, our haplotype-inference method is four orders of magnitude faster than the gold-standard PHASE program and has excellent accuracy. Our methods enable genotype imputation to be performed with unphased trio or unrelated reference panels, thus accounting for haplotype-phase uncertainty in the reference panel. We present a useful measure of imputation accuracy, allelic R², and show that this measure can be estimated accurately from posterior genotype probabilities. Our methods are implemented in version 3.0 of the BEAGLE software package.
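The abstract names the allelic R² measure but does not give its formula. The sketch below illustrates one plausible reading, which is an assumption rather than something confirmed by the text above: allelic R² is taken to be the squared correlation between the true allele dosage (0, 1, or 2 copies of an allele) and the dosage of the most likely imputed genotype. The estimate from posterior probabilities replaces the unknown true dosage with its expectation under each individual's posterior. The function names `allelic_r2` and `estimated_allelic_r2` and the tuple layout `(p0, p1, p2)` are hypothetical and are not part of BEAGLE.

```python
import math


def allelic_r2(true_dosage, imputed_dosage):
    """Squared Pearson correlation between true and imputed allele dosages."""
    n = len(true_dosage)
    mean_t = sum(true_dosage) / n
    mean_i = sum(imputed_dosage) / n
    cov = sum((t - mean_t) * (i - mean_i)
              for t, i in zip(true_dosage, imputed_dosage))
    var_t = sum((t - mean_t) ** 2 for t in true_dosage)
    var_i = sum((i - mean_i) ** 2 for i in imputed_dosage)
    if var_t == 0 or var_i == 0:
        return math.nan  # undefined when either dosage vector is constant
    return cov * cov / (var_t * var_i)


def estimated_allelic_r2(posteriors):
    """Estimate allelic R^2 without true genotypes (illustrative assumption).

    posteriors: one (p0, p1, p2) tuple per individual, giving the posterior
    probability of carrying 0, 1, or 2 copies of the allele.
    """
    n = len(posteriors)
    # z: dosage of the most likely genotype for each individual
    z = [max(range(3), key=lambda k: p[k]) for p in posteriors]
    # u: posterior expected dosage; w: posterior expected squared dosage
    u = [p[1] + 2 * p[2] for p in posteriors]
    w = [p[1] + 4 * p[2] for p in posteriors]
    cov = sum(zi * ui for zi, ui in zip(z, u)) - sum(z) * sum(u) / n
    var_true = sum(w) - sum(u) ** 2 / n
    var_z = sum(zi * zi for zi in z) - sum(z) ** 2 / n
    if var_true == 0 or var_z == 0:
        return math.nan
    return cov * cov / (var_true * var_z)


if __name__ == "__main__":
    # Toy posteriors for three individuals (made-up numbers for illustration)
    toy = [(0.9, 0.1, 0.0), (0.2, 0.7, 0.1), (0.0, 0.2, 0.8)]
    print(estimated_allelic_r2(toy))
```

Under this reading, posteriors that put all their mass on a single genotype give an estimate of 1 whenever the called genotypes vary across individuals, and increasing posterior uncertainty pulls the estimate toward 0.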