Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2019 Apr 3:10:239.
doi: 10.3389/fgene.2019.00239. eCollection 2019.

Rare Variants Imputation in Admixed Populations: Comparison Across Reference Panels and Bioinformatics Tools

Affiliations

Rare Variants Imputation in Admixed Populations: Comparison Across Reference Panels and Bioinformatics Tools

Sanjeev Sariya et al. Front Genet. .

Abstract

Background: Imputation has become a standard approach in genome-wide association studies (GWAS) to infer in silico untyped markers. Although feasibility for common variants imputation is well established, we aimed to assess rare and ultra-rare variants' imputation in an admixed Caribbean Hispanic population (CH).

Methods: We evaluated imputation accuracy in CH (N = 1,000), focusing on rare (0.1% ≤ minor allele frequency (MAF) ≤ 1%) and ultra-rare (MAF < 0.1%) variants. We used two reference panels, the Haplotype Reference Consortium (HRC; N = 27,165) and 1000 Genome Project (1000G phase 3; N = 2,504) and multiple phasing (SHAPEIT, Eagle2) and imputation algorithms (IMPUTE2, MACH-Admix). To assess imputation quality, we reported: (a) high-quality variant counts according to imputation tools' internal indexes (e.g., IMPUTE2 "Info" ≥ 80%). (b) Wilcoxon Signed-Rank Test comparing imputation quality for genotyped variants that were masked and imputed; (c) Cohen's kappa coefficient to test agreement between imputed and whole-exome sequencing (WES) variants; (d) imputation of G206A mutation in the PSEN1 (ultra-rare in the general population an more frequent in CH) followed by confirmation genotyping. We also tested ancestry proportion (European, African and Native American) against WES-imputation mismatches in a Poisson regression fashion.

Results: SHAPEIT2 retrieved higher percentage of imputed high-quality variants than Eagle2 (rare: 51.02% vs. 48.60%; ultra-rare 0.66% vs. 0.65%, Wilcoxon p-value < 0.001). SHAPEIT-IMPUTE2 employing HRC outperformed 1000G (64.50% vs. 59.17%; 1.69% vs. 0.75% for high-quality rare and ultra-rare variants, respectively, Wilcoxon p-value < 0.001). SHAPEIT-IMPUTE2 outperformed MaCH-Admix. Compared to 1000G, HRC-imputation retrieved a higher number of high-quality rare and ultra-rare variants, despite showing lower agreement between imputed and WES variants (e.g., rare: 98.86% for HRC vs. 99.02% for 1000G). High Kappa (K = 0.99) was observed for both reference panels. Twelve G206A mutation carriers were imputed and all validated by confirmation genotyping. African ancestry was associated with higher imputation errors for uncommon and rare variants (p-value < 1e-05).

Conclusion: Reference panels with larger numbers of haplotypes can improve imputation quality for rare and ultra-rare variants in admixed populations such as CH. Ethnic composition is an important predictor of imputation accuracy, with higher African ancestry associated with poorer imputation accuracy.

Keywords: 1000G; GWAS; admixed population; imputation; rare variants.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Comparison of average Info quality between HRC and 1000G reference panel for all autosomal chromosomes.
Figure 2
Figure 2
Comparison of average Info on CHR14: 70–75 MB (5 MB) vs. 73–74 MB (1 MB) region.

Similar articles

Cited by

References

    1. Alexander D. H., Novembre J., Lange K. (2009). Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19 1655–1664. 10.1101/gr.094052.109 - DOI - PMC - PubMed
    1. Arnold S. E., Vega I. E., Karlawish J. H., Wolk D. A., Nunez J., Negron M., et al. (2013). Frequency and clinicopathological characteristics of presenilin 1 Gly206Ala mutation in Puerto Rican Hispanics with dementia. J. Alzheimers Dis. 33 1089–1095. 10.3233/JAD-2012-121570 - DOI - PMC - PubMed
    1. Athan E. S., Williamson J., Ciappa A., Santana V., Romas S. N., Lee J. H., et al. (2001). A founder mutation in presenilin 1 causing early-onset Alzheimer disease in unrelated Caribbean Hispanic families. JAMA 286 2257–2263. 10.1001/jama.286.18.2257 - DOI - PubMed
    1. Browning B. L., Browning S. R. (2009). A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am. J. Hum. Genet. 84 210–223. 10.1016/j.ajhg.2009.01.005 - DOI - PMC - PubMed
    1. Danecek P., Auton A., Abecasis G., Albers C. A., Banks E., DePristo M. A., et al. (2011). The variant call format and VCFtools. Bioinformatics 27 2156–2158. 10.1093/bioinformatics/btr330 - DOI - PMC - PubMed

LinkOut - more resources