Genotype imputation with thousands of genomes
- PMID: 22384356
- PMCID: PMC3276165
- DOI: 10.1534/g3.111.001198
Abstract
Genotype imputation is a statistical technique that is often used to increase the power and resolution of genetic association studies. Imputation methods work by using haplotype patterns in a reference panel to predict unobserved genotypes in a study dataset, and a number of approaches have been proposed for choosing subsets of reference haplotypes that will maximize accuracy in a given study population. These panel selection strategies become harder to apply and interpret as sequencing efforts like the 1000 Genomes Project produce larger and more diverse reference sets, which led us to develop an alternative framework. Our approach is built around a new approximation that uses local sequence similarity to choose a custom reference panel for each study haplotype in each region of the genome. This approximation makes it computationally efficient to use all available reference haplotypes, which allows us to bypass the panel selection step and to improve accuracy at low-frequency variants by capturing unexpected allele sharing among populations. Using data from HapMap 3, we show that our framework produces accurate results in a wide range of human populations. We also use data from the Malaria Genetic Epidemiology Network (MalariaGEN) to provide recommendations for imputation-based studies in Africa. We demonstrate that our approximation improves efficiency in large, sequence-based reference panels, and we discuss general computational strategies for modern reference datasets. Genome-wide association studies will soon be able to harness the power of thousands of reference genomes, and our work provides a practical way for investigators to use this rich information. New methodology from this study is implemented in the IMPUTE2 software package.
Keywords: GWAS; haplotype; human; linkage disequilibrium; reference panel.
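The core idea in the abstract, choosing a custom reference panel for each study haplotype in each region by local sequence similarity, can be illustrated with a small sketch. This is not the IMPUTE2 implementation. It assumes that haplotypes are coded as 0/1 alleles at shared SNPs, that similarity is measured by Hamming distance (number of mismatching alleles), and that regions are fixed-size windows of SNPs. The function names and the parameters `k` and `window_size` are hypothetical and chosen only for illustration.

```python
from typing import List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count positions where two equal-length haplotypes carry different alleles."""
    return sum(x != y for x, y in zip(a, b))


def select_custom_panel(
    study_hap: Sequence[int], ref_haps: Sequence[Sequence[int]], k: int
) -> List[int]:
    """Return indices of the k reference haplotypes most similar to study_hap.

    Similarity is Hamming distance here, which is an assumption of this sketch.
    Ties are broken by original panel order because sorted() is stable.
    """
    distances = [hamming(study_hap, ref) for ref in ref_haps]
    order = sorted(range(len(ref_haps)), key=lambda i: distances[i])
    return order[:k]


def select_panels_by_window(
    study_hap: Sequence[int],
    ref_haps: Sequence[Sequence[int]],
    window_size: int,
    k: int,
) -> List[Tuple[int, int, List[int]]]:
    """Pick a separate custom panel for each window of SNPs along the haplotype.

    Returns (start, stop, panel_indices) for every window; stop is exclusive.
    """
    panels = []
    n_snps = len(study_hap)
    for start in range(0, n_snps, window_size):
        stop = min(start + window_size, n_snps)
        local_refs = [ref[start:stop] for ref in ref_haps]
        panel = select_custom_panel(study_hap[start:stop], local_refs, k)
        panels.append((start, stop, panel))
    return panels


# Toy data: four reference haplotypes and one study haplotype at six SNPs.
ref_haps = [
    [0, 0, 1, 1, 0, 1],
    [0, 1, 1, 0, 0, 1],
    [1, 1, 0, 0, 1, 0],
    [0, 0, 1, 1, 1, 1],
]
study_hap = [0, 0, 1, 1, 0, 0]

whole_region_panel = select_custom_panel(study_hap, ref_haps, k=2)
per_window_panels = select_panels_by_window(study_hap, ref_haps, window_size=3, k=2)
print(whole_region_panel)  # [0, 3]
print(per_window_panels)   # [(0, 3, [0, 3]), (3, 6, [0, 1])]
```

In the toy data the chosen panel changes between the two windows, which is the point of selecting by local rather than genome-wide similarity: the full reference set stays available, but only the locally closest haplotypes would be passed on to the more expensive imputation step.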