Front Genet. 2012 Jun 27;3:117.
doi: 10.3389/fgene.2012.00117. eCollection 2012.

Genotype Imputation for Latinos Using the HapMap and 1000 Genomes Project Reference Panels

Xiaoyi Gao et al. Front Genet. 2012.

Abstract

Genotype imputation is a vital tool in genome-wide association studies (GWAS) and meta-analyses of multiple GWAS results. Imputation enables researchers to increase genomic coverage and to pool data generated using different genotyping platforms. HapMap samples are often employed as the reference panel. More recently, the 1000 Genomes Project resource is becoming the primary source for reference panels. Multiple GWAS and meta-analyses are targeting Latinos, the most populous and fastest-growing minority group in the US. However, genotype imputation resources for Latinos are at present rather limited compared with those for individuals of European ancestry, largely because of the lack of good reference data. One choice of reference panel for Latinos is one derived from the population of Mexican individuals in Los Angeles contained in the HapMap Phase 3 project and the 1000 Genomes Project. However, a detailed evaluation of the quality of the imputed genotypes derived from the public reference panels has not yet been reported. Using simulation studies, the Illumina OmniExpress GWAS data from the Los Angeles Latino Eye Study, and the MACH software package, we evaluated the accuracy of genotype imputation in Latinos. Our results show that the 1000 Genomes Project AMR + CEU + YRI reference panel provides the highest imputation accuracy for Latinos, and that also including Asian samples in the panel can reduce imputation accuracy. We also provide the imputation accuracy for each autosomal chromosome using the 1000 Genomes Project panel for Latinos. Our results serve as a guide to future imputation-based analyses in Latinos.

Keywords: 1000 Genomes Project; HapMap Project; Latino; genotype imputation.

Figures

Figure 1
Distributions of per individual and per SNP errors and the imputed MACH Rsq. Pink and blue denote the 200-individual and 52-individual reference panels, respectively. Genotype imputation accuracy is tested in an additional 500 simulated individuals. The true (simulated) genotypes of 11,825 SNPs on chromosome 22 (those HapMap Phase 3 SNPs not present on the Illumina OmniExpress Beadchip) are compared with the imputed genotypes. (A) Distribution of per individual errors. (B) Distribution of per SNP errors. (C) Boxplots of the MACH Rsq for the imputed SNPs.
Figure 2
Pairwise plot of the dosage r2 by the MACH Rsq. The diagonal line (red) marks a perfect match between the MACH Rsq and the dosage r2. Points further from the diagonal indicate a poorer estimate. The correlation coefficient between Rsq and r2 is 0.96.
Figure 3
Boxplot of the MACH Rsq for the imputed SNPs stratified by the minor allele frequency. Boxplot of the MACH Rsq for 485,313 imputed SNPs on chromosome 22 (with all typed SNPs by the Illumina OmniExpress excluded) based on the 1000 Genomes Project AMR + CEU + YRI reference panel. Abbreviations: MAF, minor allele frequency.
Figure 4
Genotype imputation accuracy by chromosome. Genotype imputation accuracy is measured as the per genotype error rate obtained by randomly masking 2% of genome-wide SNPs.
Figure A1
Principal components analysis of the simulated individuals and the HapMap Mexican–American individuals.
Figure A2
Pairwise plot of the dosage r2 by the MACH Rsq for chromosome 9. The diagonal line (red) marks a perfect match between the MACH Rsq and the dosage r2. Points further from the diagonal indicate a poorer estimate.
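
The accuracy measures named in the figure captions (dosage r2 and the per genotype error rate from masked SNPs) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the 0/1/2 allele-count coding, the use of squared Pearson correlation for dosage r2, and the treatment of any mismatch between the true and best-guess imputed genotype as one error are all assumptions made for illustration. The paper may define these quantities differently.

```python
import random
from statistics import mean


def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between true allele counts (0/1/2)
    and imputed dosages (real values in [0, 2]) for one SNP.
    Assumed definition, for illustration only."""
    if len(true_genotypes) != len(imputed_dosages) or len(true_genotypes) < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mx, my = mean(true_genotypes), mean(imputed_dosages)
    sxx = sum((x - mx) ** 2 for x in true_genotypes)
    syy = sum((y - my) ** 2 for y in imputed_dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_genotypes, imputed_dosages))
    if sxx == 0 or syy == 0:
        return float("nan")  # monomorphic SNP: correlation is undefined
    return (sxy * sxy) / (sxx * syy)


def choose_masked_snps(n_snps, fraction=0.02, seed=0):
    """Pick a random subset of SNP indices to hide before imputation
    (hypothetical helper; 2% mirrors the masking fraction in Figure 4)."""
    rng = random.Random(seed)
    k = max(1, round(n_snps * fraction))
    return sorted(rng.sample(range(n_snps), k))


def per_genotype_error_rate(true_calls, imputed_calls):
    """Fraction of masked genotype calls whose best-guess imputed
    genotype differs from the true genotype (assumed error definition)."""
    if len(true_calls) != len(imputed_calls) or not true_calls:
        raise ValueError("need two equal-length, non-empty sequences")
    mismatches = sum(t != i for t, i in zip(true_calls, imputed_calls))
    return mismatches / len(true_calls)


if __name__ == "__main__":
    masked = choose_masked_snps(1000)           # indices of SNPs to hide
    print(len(masked), "SNPs masked")
    print(dosage_r2([0, 1, 2, 1], [0.1, 0.9, 1.8, 1.2]))
    print(per_genotype_error_rate([0, 1, 2, 2], [0, 1, 1, 2]))
```

In practice, the masked SNPs would be removed from the study genotypes, imputed from the reference panel, and then compared with the held-out calls. The MACH Rsq, by contrast, is the software's own estimate and does not require the true genotypes.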
