Nat Genet. 2012 Jul 22;44(8):955-9.
doi: 10.1038/ng.2354.

Fast and accurate genotype imputation in genome-wide association studies through pre-phasing

Bryan Howie et al. Nat Genet. 2012.

Abstract

The 1000 Genomes Project and disease-specific sequencing efforts are producing large collections of haplotypes that can be used as reference panels for genotype imputation in genome-wide association studies (GWAS). However, imputing from large reference panels with existing methods imposes a high computational burden. We introduce a strategy called 'pre-phasing' that maintains the accuracy of leading methods while reducing computational costs. We first statistically estimate the haplotypes for each individual within the GWAS sample (pre-phasing) and then impute missing genotypes into these estimated haplotypes. This reduces the computational cost because (i) the GWAS samples must be phased only once, whereas standard methods would implicitly repeat phasing with each reference panel update, and (ii) it is much faster to match a phased GWAS haplotype to one reference haplotype than to match two unphased GWAS genotypes to a pair of reference haplotypes. We implemented our approach in the MaCH and IMPUTE2 frameworks, and we tested it on data sets from the Wellcome Trust Case Control Consortium 2 (WTCCC2), the Genetic Association Information Network (GAIN), the Women's Health Initiative (WHI) and the 1000 Genomes Project. This strategy will be particularly valuable for repeated imputation as reference panels evolve.
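Point (ii) comes down to the size of the hidden state space. The following is a minimal sketch, assuming a Li and Stephens-style haploid copying model; it is not the MaCH or IMPUTE2 code. Once a GWAS haplotype has been phased, imputation becomes a forward-backward pass over K hidden states, one per reference haplotype, so each site costs O(K) work. The function name `impute_haplotype`, the uniform-jump transition with switch probabilities `rho`, and the fixed mismatch rate `eps` are hypothetical simplifications chosen for illustration.

```python
import numpy as np


def impute_haplotype(obs, ref, rho, eps=0.01):
    """Posterior P(allele == 1) at every site of one pre-phased GWAS haplotype.

    Hypothetical simplified haploid copying model, for illustration only.
    obs: length-M array of alleles (0/1), with -1 marking sites not typed in the GWAS.
    ref: K x M array of reference haplotypes (0/1).
    rho: length-M array of switch probabilities; rho[m] applies between sites m-1 and m.
    eps: assumed per-site mismatch probability between copied and observed allele.
    """
    obs, ref, rho = np.asarray(obs), np.asarray(ref), np.asarray(rho, dtype=float)
    K, M = ref.shape

    def emit(m):
        # Untyped sites carry no information, so every state emits with probability 1.
        if obs[m] < 0:
            return np.ones(K)
        return np.where(ref[:, m] == obs[m], 1.0 - eps, eps)

    # Forward pass over K haploid states: O(K) work per site.
    fwd = np.empty((M, K))
    f = emit(0) / K
    fwd[0] = f / f.sum()
    for m in range(1, M):
        # Stay on the same reference haplotype, or jump uniformly to any of the K
        # (fwd[m - 1] is normalized, so the jump term is simply rho / K).
        f = ((1.0 - rho[m]) * fwd[m - 1] + rho[m] / K) * emit(m)
        fwd[m] = f / f.sum()

    # Backward pass with the same transition structure.
    bwd = np.ones((M, K))
    for m in range(M - 2, -1, -1):
        b = bwd[m + 1] * emit(m + 1)
        b = (1.0 - rho[m + 1]) * b + rho[m + 1] / K * b.sum()
        bwd[m] = b / b.sum()

    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    # Imputed dosage: posterior mass on reference haplotypes carrying allele 1.
    return (post * ref.T).sum(axis=1)


ref = np.array([[0, 0, 0, 1], [1, 1, 1, 0], [0, 1, 0, 1]])
obs = np.array([1, -1, 1, -1])  # two typed sites, two untyped sites
print(impute_haplotype(obs, ref, np.full(4, 0.05)).round(3))
```

A diploid version of the same model would have to track ordered pairs of reference haplotypes for each unphased genotype, which is where the quadratic state space in traditional imputation comes from.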

Figure 1
Imputation schematic. Each box represents a genetic dataset and each arrow represents an analysis step. The sizes of the boxes reflect the relative numbers of genotypes they contain, and the widths of the arrows reflect the relative computational costs of the analyses. Given a single GWAS dataset (red box), successively larger reference panels (blue boxes) lead to larger and more accurate imputed datasets (orange boxes). The computational cost of imputation is much lower when using pre-phased GWAS haplotypes (green box, right-hand side) than when using traditional imputation approaches (left-hand side).
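The cost comparison in the schematic can be made concrete with a rough operation count. This sketch assumes two things. Traditional imputation runs a diploid HMM over ordered pairs of reference haplotypes, giving K^2 states per site. Pre-phased imputation runs a haploid HMM with K states on each of a sample's two haplotypes. The function `imputation_ops` and its `phasing_cost` argument are hypothetical placeholders, with `phasing_cost` standing in for the one-off pre-phasing step. Real implementations add further approximations, so the counts indicate scaling behavior only.

```python
def imputation_ops(n_samples, n_sites, n_ref_haps, n_panels, prephased, phasing_cost=0):
    """Rough count of HMM state updates needed to impute one GWAS sample set.

    Hypothetical cost model for illustration only.
    n_panels: number of reference panels (or panel updates) imputed against.
    phasing_cost: hypothetical one-off cost of pre-phasing, paid only once.
    """
    K = n_ref_haps
    if prephased:
        # Two phased haplotypes per individual, K haploid states per site.
        per_panel = n_samples * 2 * n_sites * K
        return phasing_cost + n_panels * per_panel
    # One diploid HMM per individual, K * K ordered-pair states per site;
    # phasing is implicitly redone for every panel.
    per_panel = n_samples * n_sites * K * K
    return n_panels * per_panel


# Illustrative sizes only, not figures from the study.
for K in (1000, 2000, 4000):
    old = imputation_ops(1000, 10_000, K, n_panels=3, prephased=False)
    new = imputation_ops(1000, 10_000, K, n_panels=3, prephased=True)
    print(K, old // new)
```

With `phasing_cost` left at zero, the ratio is exactly K/2, so it doubles each time the reference panel doubles. A nonzero phasing cost is amortized over every subsequent panel update.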
