PLoS One. 2013;8(3):e56626.
doi: 10.1371/journal.pone.0056626. Epub 2013 Mar 5.

Assessing the impact of differential genotyping errors on rare variant tests of association

Morgan Mayer-Jochimsen et al. PLoS One. 2013.

Abstract

Genotyping errors are well known to affect the power and type I error rate of single marker tests of association. Genotyping errors that arise from the same process in cases and controls are known as non-differential genotyping errors, whereas those that arise from different processes in cases and controls are known as differential genotyping errors. For single marker tests, non-differential genotyping errors reduce power, while differential genotyping errors increase the type I error rate. However, little is known about how the new generation of rare variant tests of association behaves in the presence of genotyping errors. In this manuscript we use a comprehensive simulation study to explore the effects of numerous factors on the type I error rate of rare variant tests of association in the presence of differential genotyping error. We find that increased sample size, decreased minor allele frequency, and an increased number of single nucleotide variants (SNVs) included in the test all increase the type I error rate in the presence of differential genotyping errors. We also find that the greater the relative difference in case-control genotyping error rates, the larger the type I error rate. Lastly, as is the case for single marker tests, genotyping errors that classify the common homozygote as the heterozygote inflate the type I error rate significantly more than errors that classify the heterozygote as the common homozygote. In general, our findings are in line with results from single marker tests. To ensure that type I error inflation does not occur when analyzing next-generation sequencing data, careful consideration of study design (e.g. use of randomization), caution in meta-analysis and in using publicly available controls, and the use of standard quality control metrics are critical.
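The mechanism described in the abstract can be illustrated with a small Monte Carlo sketch. It is not the authors' simulation and does not implement any of the five tests they study (CMC, WS, PR, CMAT, SKAT). It uses a plain collapsing test, a two-proportion z-test on "observed carrier of any minor allele", as a stand-in. The assumptions are: independent SNVs in Hardy-Weinberg proportions, identical true allele frequencies in cases and controls (so the null holds), `eps01` as the probability that a true common homozygote is called a carrier, `eps10` as the probability that a true carrier is called a common homozygote, and a hypothetical case error rate of 1%. All function names are illustrative.

```python
import math
import random


def observed_carrier_prob(mafs, eps01, eps10):
    """P(individual is *called* a minor-allele carrier at one or more SNVs).

    Assumes independent SNVs in Hardy-Weinberg proportions.
    eps01 = P(called carrier | true common homozygote)
    eps10 = P(called common homozygote | true carrier)
    """
    p_never_called = 1.0
    for q in mafs:
        true_carrier = 1.0 - (1.0 - q) ** 2
        called = true_carrier * (1.0 - eps10) + (1.0 - true_carrier) * eps01
        p_never_called *= 1.0 - called
    return 1.0 - p_never_called


def draw_count(n, p, rng):
    """Binomial(n, p) draw as a sum of Bernoulli trials (stdlib only)."""
    return sum(rng.random() < p for _ in range(n))


def rejects(case_carriers, control_carriers, n, z_crit=1.96):
    """Two-sided pooled two-proportion z-test at alpha = 0.05, n per group."""
    pooled = (case_carriers + control_carriers) / (2 * n)
    if pooled in (0.0, 1.0):
        return False  # no variation observed, nothing to test
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    z = (case_carriers - control_carriers) / n / se
    return abs(z) > z_crit


def type1_error(mafs, n, case_err, control_err, reps=500, seed=1):
    """Fraction of null replicates in which the collapsing test rejects."""
    rng = random.Random(seed)
    p_case = observed_carrier_prob(mafs, *case_err)
    p_ctrl = observed_carrier_prob(mafs, *control_err)
    hits = sum(
        rejects(draw_count(n, p_case, rng), draw_count(n, p_ctrl, rng), n)
        for _ in range(reps)
    )
    return hits / reps


# 8 SNVs: 6 at MAF 0.1% and 2 at MAF 1%; control error 0.1% with eps01 = eps10.
mafs = [0.001] * 6 + [0.01] * 2
control_err = (0.001, 0.001)
case_err = (0.01, 0.01)  # hypothetical case error rate, chosen for illustration

null_rate = type1_error(mafs, 1000, control_err, control_err)  # non-differential
diff_rate = type1_error(mafs, 1000, case_err, control_err)     # differential
print(f"non-differential: {null_rate:.3f}  differential: {diff_rate:.3f}")
```

With equal error rates in both groups, the rejection fraction should stay near the nominal 5%. With a higher error rate in cases, the extra false carrier calls shift the case carrier frequency and the test rejects far more often. Re-running `type1_error` with a different `n` or a longer `mafs` list gives a rough feel for the sample-size and SNV-count trends the abstract reports, within the limits of this simplified test.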

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Type I error rate when error rate in controls is 0.10%.
The observed type I error rate, averaged across all simulation settings, for each of the five rare variant tests (CMC, WS, PR, CMAT and SKAT). The type I error inflation caused by differential genotyping error can be substantial, even at low error rates.
Figure 2. Type I error rate when error rate in controls is 1%.
The observed type I error rate, averaged across all simulation settings, for each of the five rare variant tests (CMC, WS, PR, CMAT and SKAT). Modest levels of differential genotyping error rates can substantially increase the type I error rate.
Figure 3. Type I error rate when error rate in controls is 5%.
The observed type I error rate, averaged across all simulation settings, for each of the five rare variant tests (CMC, WS, PR, CMAT and SKAT). High levels of differential genotyping errors can substantially increase the type I error rate.
Figure 4. Type I error rate by case genotyping error rate and sample size.
An example of how the type I error rate changes with sample size and the amount of differential genotyping error. Notably, the type I error rate increases as the amount of differential genotyping error increases and as the sample size increases. Here we show results from the PR test with a control genotyping error rate of 0.1%, ε01 = ε10, and 8 SNVs (6 SNVs at MAF = 0.1% and 2 SNVs at MAF = 1%). The case error rate varies along the x-axis.
Figure 5. Type I error rate variability by error model for a gene with 8 SNVs.
Figure 5 considers a gene containing 8 rare variants. All error models have control error rates fixed at ε10 = 1% and ε01 = 0.1%. The case error rates are ε10 = 1.1%, ε01 = 0.2% for error model A; ε10 = 1.3%, ε01 = 0.4% for error model B; ε10 = 1.5%, ε01 = 0.6% for error model C; and ε10 = 2.0%, ε01 = 1.1% for error model D. Type I error increases for all error models as the genotyping error rate increases.
Figure 6. Type I error rate variability by error model for a gene with 32 SNVs.
Figure 6 considers a gene containing 32 rare variants and considers the same error models as are in Figure 5.
Figure 7. Type I error rate variability across additional error models: a gene with 8 SNVs.
Figure 7 considers loci with 8 rare variants. All error models have control error rates fixed at ε10 = 10% and ε01 = 1%. The case error rates are ε10 = 10.1%, ε01 = 1.1% for error model E; ε10 = 10.3%, ε01 = 1.3% for error model F; ε10 = 10.5%, ε01 = 1.5% for error model G; and ε10 = 11.0%, ε01 = 2% for error model H.
Figure 8. Type I error rate variability across additional error models: a gene with 32 SNVs.
Figure 8 considers loci with 32 rare variants and considers the same error models as are in Figure 7.
Figure 9. Type I error rate variability across additional error models: a gene with 8 SNVs.
Figure 9 considers loci with 8 rare variants. All error models have control error rates fixed at ε10 = 50% and ε01 = 5%. The case error rates are ε10 = 50.1%, ε01 = 5.1% for error model I; ε10 = 50.3%, ε01 = 5.3% for error model J; ε10 = 50.5%, ε01 = 5.5% for error model K; and ε10 = 51.0%, ε01 = 6% for error model L.
Figure 10. Type I error rate variability across additional error models: a gene with 32 SNVs.
Figure 10 considers loci with 32 rare variants and considers the same error models as are in Figure 9.
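
The captions for Figures 5 through 10 list twelve case error models (A to L) against three control baselines. As a reading aid, the sketch below tabulates the absolute and relative case-control differences implied by those captions. The error rates are copied from the captions. The definition of relative difference used here, (case minus control) divided by control, is an assumption, since the captions do not define it, and the dictionary and function names are illustrative.

```python
# Error rates in percent, copied from the captions of Figures 5, 7 and 9.
CONTROLS = {
    "A-D": {"eps10": 1.0, "eps01": 0.1},
    "E-H": {"eps10": 10.0, "eps01": 1.0},
    "I-L": {"eps10": 50.0, "eps01": 5.0},
}

# model -> (control group, case eps10, case eps01)
CASES = {
    "A": ("A-D", 1.1, 0.2), "B": ("A-D", 1.3, 0.4),
    "C": ("A-D", 1.5, 0.6), "D": ("A-D", 2.0, 1.1),
    "E": ("E-H", 10.1, 1.1), "F": ("E-H", 10.3, 1.3),
    "G": ("E-H", 10.5, 1.5), "H": ("E-H", 11.0, 2.0),
    "I": ("I-L", 50.1, 5.1), "J": ("I-L", 50.3, 5.3),
    "K": ("I-L", 50.5, 5.5), "L": ("I-L", 51.0, 6.0),
}


def differences(model):
    """Absolute (percentage points) and relative case-control differences."""
    group, case10, case01 = CASES[model]
    ctrl = CONTROLS[group]
    return {
        "abs_eps10": case10 - ctrl["eps10"],
        "abs_eps01": case01 - ctrl["eps01"],
        # Assumed definition: (case - control) / control.
        "rel_eps10": (case10 - ctrl["eps10"]) / ctrl["eps10"],
        "rel_eps01": (case01 - ctrl["eps01"]) / ctrl["eps01"],
    }


for model in CASES:
    d = differences(model)
    print(
        f"{model}: abs eps10 {d['abs_eps10']:+.1f} pp, abs eps01 {d['abs_eps01']:+.1f} pp, "
        f"rel eps10 {d['rel_eps10']:.1%}, rel eps01 {d['rel_eps01']:.1%}"
    )
```

Read this way, the three families of models apply the same absolute increments to the case error rates (0.1, 0.3, 0.5 and 1.0 percentage points) on top of increasingly large control error rates, so the relative difference shrinks from A-D to I-L.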

