Sci Rep. 2016 Jun 22:6:28323.
doi: 10.1038/srep28323.

The impact of genotype calling errors on family-based studies

Qi Yan et al. Sci Rep.

Abstract

Family-based sequencing studies have unique advantages in enriching rare variants, controlling population stratification, and improving genotype calling. Standard genotype calling algorithms are less likely to call rare variants correctly, often mistakenly calling heterozygotes as reference homozygotes. The consequences of such non-random errors on association tests for rare variants are unclear, particularly in transmission-based tests. In this study, we investigated the impact of genotyping errors on rare variant association tests of family-based sequence data. We performed a comprehensive analysis to study how genotype calling errors affect type I error and statistical power of transmission-based association tests using a variety of realistic parameters in family-based sequencing studies. In simulation studies, we found that biased genotype calling errors yielded not only an inflation of type I error but also a power loss of association tests. We further confirmed our observation using exome sequence data from an autism project. We concluded that non-symmetric genotype calling errors need careful consideration in the analysis of family-based sequence data and we provided practical guidance on ameliorating the test bias.

Figures

Figure 1. QQ plots for type I error rate simulation studies (gTDT results) with different scenarios of error patterns.
We considered four scenarios to mimic this error pattern, where r1 is the error rate of calling heterozygote 0/1 as homozygote 0/0 and r2 is the error rate of calling homozygote 0/0 as heterozygote 0/1: (1) r2 = 0 and r1 = 1%, 5% or 10% in parents; (2) r2 = 0 and r1 = 1%, 5% or 10% in offspring; (3) r1 = 0 and r2 = 0.1%, 0.5% or 1% in parents; (4) r1 = 0 and r2 = 0.1%, 0.5% or 1% in offspring. The 95% point-wise confidence band (gray area) is computed under the assumption of the p-values being drawn independently from a uniform [0, 1] distribution.
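The asymmetric error model in this caption can be illustrated with a minimal sketch. It is not the gTDT used in the paper: it assumes a single rare variant, trios with one 0/1 parent and one 0/0 parent whose genotypes are called without error, and calling errors in the offspring only. The function names `call_genotype` and `simulate_transmissions` are hypothetical.

```python
import random

def call_genotype(true_gt, r1, r2, rng):
    """Return the called genotype (0 = 0/0, 1 = 0/1) under the asymmetric error model."""
    if true_gt == 1 and rng.random() < r1:
        return 0  # heterozygote miscalled as reference homozygote (rate r1)
    if true_gt == 0 and rng.random() < r2:
        return 1  # reference homozygote miscalled as heterozygote (rate r2)
    return true_gt

def simulate_transmissions(n_trios, r1_child=0.0, r2_child=0.0, seed=1):
    """Count apparent transmissions from a heterozygous parent under the null.

    Assumption: each trio has one 0/1 parent and one 0/0 parent, both called
    correctly; only the offspring genotype is subject to calling error.
    """
    rng = random.Random(seed)
    transmitted = untransmitted = 0
    for _ in range(n_trios):
        # Under the null, the minor allele is transmitted with probability 1/2.
        true_child = 1 if rng.random() < 0.5 else 0
        called_child = call_genotype(true_child, r1_child, r2_child, rng)
        if called_child == 1:
            transmitted += 1
        else:
            untransmitted += 1
    return transmitted, untransmitted

if __name__ == "__main__":
    for r1 in (0.0, 0.01, 0.05, 0.10):
        print(r1, simulate_transmissions(100_000, r1_child=r1))
```

In this simplified setting, with no errors the two counts are roughly equal, while r1 > 0 in offspring turns some true transmissions into apparent non-transmissions, so the counts become unbalanced even though no association exists.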
Figure 2. QQ plots for genes (gTDT results) on chromosome 1 from 116 parent-offspring trios in the autism study, using only genotypes with GQ > 5.
The 95% point-wise confidence band (gray area) is computed under the assumption of the p-values being drawn independently from a uniform [0, 1] distribution. (A) Variant calling with the GATK best-practice pipeline at different depths; (B) variant calling with the GATK best-practice pipeline, Beagle4 and Polymutt at the same depth of 6x.
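One way to obtain a point-wise band of this kind is sketched below. It is an illustration only: the source does not say how the band was computed. The sketch simulates independent Uniform[0, 1] p-values, sorts each simulated set, and takes empirical quantiles of each order statistic. (The k-th smallest of n independent uniform values follows a Beta(k, n - k + 1) distribution, so exact Beta quantiles are an alternative to simulation.) The function name `qq_null_band` and its parameters are hypothetical.

```python
import random

def qq_null_band(n_tests, n_sims=2000, level=0.95, seed=1):
    """Monte Carlo point-wise band for sorted p-values under Uniform[0, 1].

    Returns (lower, upper), each a list of length n_tests; entry k holds the
    empirical band limits for the (k+1)-th smallest p-value.
    """
    rng = random.Random(seed)
    # Each simulation: n_tests independent uniform p-values, sorted ascending.
    sims = [sorted(rng.random() for _ in range(n_tests)) for _ in range(n_sims)]
    lo_idx = int((1 - level) / 2 * n_sims)
    hi_idx = int((1 + level) / 2 * n_sims) - 1
    lower, upper = [], []
    for k in range(n_tests):
        # Distribution of the k-th order statistic across simulations.
        column = sorted(sim[k] for sim in sims)
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return lower, upper

if __name__ == "__main__":
    lower, upper = qq_null_band(100)
    print(lower[0], upper[0])    # band for the smallest p-value
    print(lower[-1], upper[-1])  # band for the largest p-value
```

Observed sorted p-values that fall outside the band at many ranks suggest a departure from the uniform null, which is how inflation appears in a QQ plot.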
Figure 3. The impact of genotyping bias on different lengths of genes (gTDT results).
(A) QQ plots for genes with more than 100 variants at different depths; (B) QQ plots for genes with fewer than 50 variants at different depths.
