Genetics. 2008 Nov;180(3):1609-16.
doi: 10.1534/genetics.108.088005. Epub 2008 Sep 14.

Distributions of Hardy-Weinberg equilibrium test statistics

R V Rohlfs et al. Genetics. 2008 Nov.

Erratum in

  • Genetics. 2009 Oct;183(2):756

Abstract

It is well established that test statistics and P-values derived from discrete data, such as genetic markers, are also discrete. In most genetic applications, the null distribution for a discrete test statistic is approximated with a continuous distribution, but this approximation may not be reasonable. In some cases using the continuous approximation for the expected null distribution may cause truly null test statistics to appear nonnull. We explore the implications of using continuous distributions to approximate the discrete distributions of Hardy-Weinberg equilibrium test statistics and P-values. We derive exact P-value distributions under the null and alternative hypotheses, enabling a more accurate analysis than is possible with continuous approximations. We apply these methods to biological data and find that using continuous distribution theory with exact tests may underestimate the extent of Hardy-Weinberg disequilibrium in a sample. The implications may be most important for the widespread use of whole-genome case-control association studies and Hardy-Weinberg equilibrium (HWE) testing for data quality control.
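The discreteness described above can be made concrete with a small sketch. It assumes the standard exact test of HWE that conditions on the allele counts and sums the probabilities of all heterozygote counts no more likely than the observed one; the abstract does not spell out the procedure, so this is an illustration rather than the authors' implementation. The function names (`het_count_probs`, `exact_p_values`, `null_p_value_distribution`) are hypothetical.

```python
from math import exp, lgamma, log


def het_count_probs(n, n_a):
    """Null (HWE) probability of each feasible heterozygote count,
    given n diploid individuals and n_a copies of allele A."""
    n_b = 2 * n - n_a
    probs = {}
    # The heterozygote count must have the same parity as n_a.
    for n_ab in range(n_a % 2, min(n_a, n_b) + 1, 2):
        n_aa = (n_a - n_ab) // 2
        n_bb = (n_b - n_ab) // 2
        log_p = (
            lgamma(n + 1) - lgamma(n_aa + 1) - lgamma(n_ab + 1) - lgamma(n_bb + 1)
            + n_ab * log(2.0)
            + lgamma(n_a + 1) + lgamma(n_b + 1) - lgamma(2 * n + 1)
        )
        probs[n_ab] = exp(log_p)
    return probs


def exact_p_values(n, n_a):
    """Exact P-value for each heterozygote count: total probability of
    all outcomes that are no more probable than the observed one."""
    probs = het_count_probs(n, n_a)
    eps = 1e-12  # relative tolerance so floating-point ties are treated as ties
    return {
        h: min(1.0, sum(q for q in probs.values() if q <= p * (1 + eps)))
        for h, p in probs.items()
    }


def null_p_value_distribution(n, n_a):
    """Sorted (p_value, probability) pairs: the discrete null
    distribution of the exact P-value for fixed n and n_a."""
    probs = het_count_probs(n, n_a)
    dist = {}
    for h, p in exact_p_values(n, n_a).items():
        key = round(p, 12)  # merge P-values that differ only by rounding error
        dist[key] = dist.get(key, 0.0) + probs[h]
    return sorted(dist.items())
```

For n = 2 individuals carrying two copies of each allele, only two heterozygote counts are possible (0 and 2), so the P-value can take only two values, 1/3 and 1. A continuous uniform null cannot describe such a distribution, which is the mismatch the abstract refers to.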

Figures

Figure 1.
Exact P-value distributions conditional on MAF. The top five plots show distribution functions for exact P-values for 1000 individuals with the indicated MAF. The bottom plot is the normalized sum of the plots above.
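Read literally, the "normalized sum" in the bottom plot is a mixture of the conditional distributions. The notation below is an assumption introduced for illustration, not the paper's: f_k is the exact P-value distribution conditional on the k-th MAF, and w_k is the weight given to that MAF (w_k = 1 for a plain average of the five plots).

```latex
f(p) \;=\; \frac{\sum_{k} w_k \, f_k(p)}{\sum_{k} w_k}
```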
Figure 2.
Total exact P-value distribution. The HWE exact P-value probability density function is under the null hypothesis where n = 100 and minor allele frequencies are uniformly distributed.
Figure 3.
HWE test power. Power is shown over all pA for n = 100 and 1000 and for θ = 4, 2, and 0.1. Note that θ = 4 implies HWE, so the power is equivalent to the type I error rate. Red lines show χ²-test power, and blue lines show exact test power.
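The statement that θ = 4 implies HWE follows if θ is the usual disequilibrium parameter built from genotype frequencies. That definition is not given in this excerpt, so it is assumed here, with P_AA, P_AB, P_BB the genotype frequencies and p_A, p_B the allele frequencies.

```latex
\theta = \frac{P_{AB}^{2}}{P_{AA}\,P_{BB}},
\qquad
\text{under HWE: } P_{AA} = p_A^{2},\; P_{AB} = 2 p_A p_B,\; P_{BB} = p_B^{2}
\;\Longrightarrow\;
\theta = \frac{(2 p_A p_B)^{2}}{p_A^{2}\, p_B^{2}} = 4 .
```

Under this definition, θ < 4 corresponds to a deficit of heterozygotes and θ > 4 to an excess, so θ = 2 and θ = 0.1 would both be heterozygote-deficient alternatives.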
Figure 4.
Observed MAF distribution. This plot shows the distribution of observed minor allele counts among SNPs with no missing data in the WTCCC 1958 birth cohort.
Figure 5.
Observed vs. expected exact P-values for the 1958 birth cohort data set. The top Q-Q plots compare P-values observed in 34,625 SNPs over 1504 individuals in the WTCCC 1958 birth cohort to P-values sampled from the calculated expected null distribution for n = 1504 and observed MAF weights. The bottom Q-Q plots compare the same observed P-values to a uniform distribution. SNPs with P-values <1e-6 are represented as triangles at the top of the −log(p) plots.
Figure 6.
Observed vs. expected exact P-values for simulated SNPs. The top Q-Q plots compare P-values calculated for 100,000 simulated SNPs over 100 individuals with uniform MAFs to P-values sampled from the calculated expected null distribution for n = 100 with uniformly distributed MAFs. The bottom Q-Q plots compare the same calculated P-values to a uniform distribution.
Figure 7.
MAF-bound effects. The plots show total (a) type I error rate, (b) power, and (c) FDR for every possible lower bound on the MAF, where n = 100, θ = 2, and the rate of truly alternative SNPs is 0.1. Dashed lines indicate the chi-square test and solid lines Fisher's exact test. The horizontal dotted line in the type I error plot shows the significance threshold of 0.05.
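The three quantities in this caption are linked: given a type I error rate, a power, and the fraction of truly alternative SNPs, an expected false discovery rate follows from the ratio of expected false positives to expected total rejections. The sketch below uses that common formulation; the paper may compute FDR differently, and the function name `expected_fdr` and the power value in the usage line are hypothetical.

```python
def expected_fdr(type1_rate, power, alt_rate):
    """Expected fraction of rejections that are false, assuming a fraction
    alt_rate of SNPs is truly alternative and the rest are truly null."""
    false_pos = (1.0 - alt_rate) * type1_rate  # null SNPs that are rejected
    true_pos = alt_rate * power                # alternative SNPs that are rejected
    total = false_pos + true_pos
    return false_pos / total if total > 0 else 0.0


# Illustrative values only: the 0.05 threshold and the 0.1 alternative rate
# come from the caption; the power of 0.5 is made up for the example.
fdr = expected_fdr(type1_rate=0.05, power=0.5, alt_rate=0.1)
```

With only 10% of SNPs truly alternative, even a test that holds its nominal 0.05 level produces many false rejections relative to true ones unless power is high, which is why the choice of MAF bound matters for FDR.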
