BMC Genomics. 2015 Dec 29:16:1109.
doi: 10.1186/s12864-015-2192-y.

A new strategy for enhancing imputation quality of rare variants from next-generation sequencing data via combining SNP and exome chip data

Young Jin Kim et al. BMC Genomics.

Abstract

Background: Rare variants have attracted increasing attention as a possible alternative source of missing heritability. Because next-generation sequencing is not yet cost-effective for large-scale genomic studies, imputation is a widely used alternative. However, imputation may be limited by the low accuracy of imputed rare variants. Various approaches have been suggested to improve the imputation accuracy of rare variants, including increasing the sample size of the reference panel, using sequencing data from study-specific samples (i.e., specific populations), and building local reference panels by genotyping or sequencing a subset of study samples. While these approaches mainly rely on reference panels, the imputation accuracy of rare variants can also be increased by using exome chips, which contain rare variants. The exome chip contains 250 K rare variants selected from the variants discovered in about 12,000 sequenced samples. If exome chip data are available for previously genotyped samples, a combined approach that uses a genotype panel of merged data, including exome chips and SNP chips, should increase the imputation accuracy of rare variants.

Results: In this study, we describe a combined imputation approach that uses both exome chip and SNP chip data simultaneously as a genotype panel. The effectiveness and performance of the combined approach were demonstrated using a reference panel of 848 samples constructed from exome sequencing data from the T2D-GENES consortium and genotype panels of 5,349 samples consisting of exome chip and SNP chip data. The combined approach increased imputation quality by up to 11 % and genomic coverage for rare variants (MAF < 1 %) by up to 117.7 %, compared with imputation using the SNP chip alone. We also investigated the systematic effect of reference panels on imputation quality using five reference panels and three genotype panels. The best-performing approach was the combination of the study-specific reference panel and the genotype panel of combined data.

Conclusions: Our study demonstrates that combined datasets, including SNP chips and exome chips, enhance both the imputation quality and genomic coverage of rare variants.
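
The idea of a genotype panel built from merged exome chip and SNP chip data can be illustrated with a minimal sketch. The abstract does not describe the actual merging procedure, so everything here is an assumption made for illustration: the dictionary layout (variant ID mapped to per-sample genotype calls), the function names, the restriction to samples typed on both chips, and the rule that an exome chip call only fills a missing SNP chip call for a variant present on both.

```python
# Hypothetical panel layout: {variant_id: {sample_id: genotype}},
# where genotype is 0, 1, or 2 alternate-allele copies, or None if missing.

def samples_of(panel):
    """Return the set of sample IDs that appear anywhere in a panel."""
    samples = set()
    for genotypes in panel.values():
        samples.update(genotypes)
    return samples


def merge_genotype_panels(snp_chip, exome_chip):
    """Build one genotype panel from two chips (illustrative rule set).

    Only samples typed on both chips are kept. For a variant present on
    both chips, the SNP chip call is kept and the exome chip call is used
    only where the SNP chip call is missing.
    """
    shared = samples_of(snp_chip) & samples_of(exome_chip)
    merged = {}
    for panel in (snp_chip, exome_chip):  # SNP chip has priority
        for variant, genotypes in panel.items():
            target = merged.setdefault(variant, {})
            for sample in shared:
                if target.get(sample) is None:
                    target[sample] = genotypes.get(sample)
    return merged


# Toy data: s3 is typed only on the SNP chip and is therefore dropped.
snp_chip = {
    "rs1": {"s1": 0, "s2": 1, "s3": 2},
    "rs2": {"s1": 1, "s2": None, "s3": 0},
}
exome_chip = {
    "rs2": {"s1": 1, "s2": 2},
    "exm1": {"s1": 0, "s2": 1},
}
merged = merge_genotype_panels(snp_chip, exome_chip)
```

In this toy case the merged panel carries the rare-variant marker `exm1` alongside the SNP chip markers, which is the property the combined approach relies on before imputation against a reference panel.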


Figures

Fig. 1
Scatter plot of estimated r² against dosage r² by MAF bins. Estimated r² was plotted against dosage r² by MAF bins (a) MAF ≥ 5 %, (b) MAF = 1–5 %, (c) MAF = 0.5–1 %, (d) MAF < 0.5 %, (e) MAF = 0.3–0.5 %, and (f) MAF < 0.3 %. The red dotted line represents the diagonal.
Fig. 2
Mean estimated r² of genotype panels by MAF bins.
Fig. 3
Mean estimated r² of various combinations of reference panels and genotype panels. Reference panels are the 1000 Genomes phase 1 dataset (1KG) and various combinations of whole exome sequencing data (WES), SNP chip data (GWAS), and exome chip data (EXOME).
Fig. 4
Mean estimated r² varied by sample size of reference panel.
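
The figure captions refer to dosage r² and to MAF bins without defining them here. As a rough sketch, and assuming that dosage r² means the squared Pearson correlation between directly observed genotypes and imputed allele dosages (a common convention, not confirmed by this record), the quantities could be computed as below. The function names are hypothetical, and the bin boundaries follow panels (a) to (d) of Fig. 1.

```python
def dosage_r2(true_genotypes, imputed_dosages):
    """Squared Pearson correlation between genotypes (0/1/2) and dosages.

    Returns None when either vector has zero variance, since the
    correlation is undefined in that case (e.g. a monomorphic variant).
    """
    n = len(true_genotypes)
    mean_x = sum(true_genotypes) / n
    mean_y = sum(imputed_dosages) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(true_genotypes, imputed_dosages))
    var_x = sum((x - mean_x) ** 2 for x in true_genotypes)
    var_y = sum((y - mean_y) ** 2 for y in imputed_dosages)
    if var_x == 0 or var_y == 0:
        return None
    return cov * cov / (var_x * var_y)


def minor_allele_frequency(genotypes):
    """MAF from diploid genotypes coded as alternate-allele counts."""
    alt_freq = sum(genotypes) / (2 * len(genotypes))
    return min(alt_freq, 1 - alt_freq)


def maf_bin(maf):
    """Assign a MAF to one of the bins used in Fig. 1 (a) to (d)."""
    if maf >= 0.05:
        return "MAF >= 5%"
    if maf >= 0.01:
        return "MAF 1-5%"
    if maf >= 0.005:
        return "MAF 0.5-1%"
    return "MAF < 0.5%"
```

Averaging `dosage_r2` over the variants in each bin would give a per-bin summary comparable in spirit to the mean r² values plotted in the figures, although the figures report the estimated r² produced by the imputation software rather than this directly computed quantity.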
