Brief Bioinform. 2024 Jul 25;25(5):bbae416.
doi: 10.1093/bib/bbae416.

Efficient test for deviation from Hardy-Weinberg equilibrium with known or ambiguous typing in highly polymorphic loci

Or Shkuri et al. Brief Bioinform. 2024.

Abstract

The Hardy-Weinberg equilibrium (HWE) assumption is essential to many population genetics models. Multiple tests have been developed to assess whether observed genotypes are consistent with it. Current methods fall into two groups: exact tests, applicable to small populations and a small number of alleles, and approximate goodness-of-fit tests. Existing tests cannot handle ambiguous typing in multi-allelic loci. Here we present a novel exact test, the Unambiguous Multi Allelic Test (UMAT), which is not limited by the number of alleles or the population size and is based on a perturbative approach around the current observations. We show its accuracy in detecting deviations from HWE. We then propose an additional model to handle ambiguous typing, using either sampling into UMAT or a goodness-of-fit test, named the Asymptotic Statistical Test with Ambiguity (ASTA), whose variance estimate takes ambiguity into account. We show the accuracy of ASTA and the possibility of detecting the source of deviation from HWE. We apply these tests to the HLA loci, reproducing multiple previously reported deviations from HWE and detecting a large number of new ones.
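The abstract contrasts exact tests with approximate goodness-of-fit tests. As a point of reference, here is a minimal sketch of the classical Pearson goodness-of-fit statistic for HWE at one multi-allelic locus with unambiguous typing. It is not UMAT or ASTA. It assumes genotypes are given as pairs of sortable allele labels and that allele frequencies are estimated from the sample; the function name `hwe_chi_square` is hypothetical. For k alleles there are k(k+1)/2 genotype classes, and subtracting 1 and the k - 1 estimated frequencies leaves k(k-1)/2 degrees of freedom.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hwe_chi_square(genotypes):
    """Pearson chi-square statistic for HWE at a single multi-allelic locus.

    genotypes: list of (allele, allele) tuples, one per individual
    (hypothetical input format). Returns (statistic, degrees_of_freedom).
    """
    n = len(genotypes)
    # Observed genotype counts, each pair stored in sorted order.
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    # Allele frequencies estimated from the 2n observed allele copies.
    allele_counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    alleles = sorted(freqs)

    stat = 0.0
    for a, b in combinations_with_replacement(alleles, 2):
        # HWE expectation: n*p_a^2 for homozygotes, 2*n*p_a*p_b for heterozygotes.
        expected = n * freqs[a] ** 2 if a == b else 2 * n * freqs[a] * freqs[b]
        stat += (observed.get((a, b), 0) - expected) ** 2 / expected

    k = len(alleles)
    dof = k * (k - 1) // 2
    return stat, dof
```

A P-value can then be read from the chi-square survival function, for example `scipy.stats.chi2.sf(stat, dof)`. With many rare alleles most expected counts are small and this asymptotic approximation becomes unreliable, which is one reason exact tests are preferred for highly polymorphic loci.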

Keywords: Gibbs sampling; Hardy–Weinberg equilibrium; imputation algorithms.

Figures

Figure 1
DC case results. (a) Visual representation of the swap of two alleles in the UMAT test: first, sample an allele formula image according to the allele distribution; then sample an allele formula image according to the allele-pair distribution given allele formula image, and subtract one individual with this pair (such an individual must exist); then sample a pair of alleles formula image according to the HWE allele distribution and add one individual to it. Then, in each iteration, sample an allele formula image according to the allele-pair distribution given allele formula image and subtract one individual from the pair formula image; sample an allele formula image according to the allele marginal distribution and add one individual to the pair formula image. In this way, only the marginal observations of the alleles formula image are changed: formula image and formula image have one excess marginal observation, as formula image and formula image had at the beginning, and therefore take their place (their order can be randomly inverted); similarly, formula image and formula image are missing one marginal observation, as formula image and formula image had at the beginning, and therefore take their place in the next iteration. (b) 49 runs of UMAT on the same observations in HWE, with formula image (formula image represents the fit of the simulation to HWE, as described in Materials and methods), formula image and formula image alleles. (c) 49 runs of UMAT on the same observations with a slight deviation from HWE, with formula image, formula image and formula image alleles. (d) P-value results of UMAT for different numbers of alleles and alpha values, with formula image. (e) P-value results of UMAT for different population sizes and alpha values, with 100 alleles. (f) Elapsed time in seconds for running UMAT with different numbers of alleles and population sizes, with formula image; the population size has no effect on the run time. (g) Scatter plot of the P-values obtained with Chi-Square and UMAT for 100 randomly chosen SNPs.
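Panel (a) describes UMAT's swap move, whose full acceptance rules are not recoverable from this caption. For comparison, a different and simpler Monte Carlo test is sketched below: it shuffles all 2n allele copies and re-pairs them, which keeps every allele count fixed and samples genotype tables under HWE conditional on those counts. This is a standard permutation approach, not the perturbative UMAT procedure (which, per the caption, changes allele marginals step by step). The function names and the default number of permutations are illustrative assumptions.

```python
import random
from collections import Counter
from itertools import combinations_with_replacement


def _chi_square(genotypes, freqs, n):
    """Pearson HWE statistic with fixed allele frequencies (hypothetical helper)."""
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    stat = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        expected = n * freqs[a] ** 2 if a == b else 2 * n * freqs[a] * freqs[b]
        stat += (observed.get((a, b), 0) - expected) ** 2 / expected
    return stat


def permutation_hwe_test(genotypes, n_perm=2000, seed=0):
    """Monte Carlo HWE test: shuffle the 2n allele copies and re-pair them.

    Shuffling keeps every allele count fixed, so each permuted sample is a
    draw of genotypes under HWE conditional on the observed allele counts.
    """
    rng = random.Random(seed)
    n = len(genotypes)
    pool = [a for g in genotypes for a in g]
    freqs = {a: c / (2 * n) for a, c in Counter(pool).items()}
    observed_stat = _chi_square(genotypes, freqs, n)

    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pool)
        # Consecutive allele copies form the permuted individuals.
        permuted = list(zip(pool[0::2], pool[1::2]))
        # Small tolerance guards against floating-point ties.
        if _chi_square(permuted, freqs, n) >= observed_stat - 1e-12:
            as_extreme += 1
    # Add-one correction keeps the Monte Carlo P-value strictly positive.
    return (as_extreme + 1) / (n_perm + 1)
```

Because every permutation re-pairs all alleles at once, the cost grows with both the population size and the number of permutations, whereas panel (f) reports that UMAT's run time does not depend on population size.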
Figure 2
AC case results. (a) P-value results of UMAT with ambiguity for different formula image values (formula image represents the probability that a sample has ambiguous typing, as described in Materials and methods) and formula image values, with 100 alleles and formula image. (b)–(d) Scatter plots of real versus estimated variance for simulated data with 50 alleles and formula image (respectively); each dot is an allele pair formula image; the x-axis represents the sampled variance over formula image, and the y-axis represents either the value of formula image or the corrected denominator used in ASTA: formula image. (e) Fraction of positive results for formula image, formula image and each test: traditional Chi-Squared, ASTA, and Chi-Squared with sampling (i.e. for each individual, sampling one allele pair from all of their possible allele-pair observations and then applying a traditional Chi-Squared test); the results are out of 300 simulations with five alleles and formula image. (f) Simulated data (see Materials and methods) using formula image alleles and formula image; all the data are in HWE except for the pairs containing the formula image allele (top bar). Each bar corresponds to a specific allele and shows the normalized statistic (the sum of the corrected Chi-Square terms over only the pairs containing this allele, divided by the degrees of freedom (DOF): the number of pairs containing this allele minus 1), as well as the formula image for this allele, calculated from the statistic and DOF of that allele.
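Panel (e) compares ASTA with Chi-Squared with sampling. Below is a minimal sketch of that sampling baseline, under the assumption that each individual's ambiguous typing is available as a list of candidate allele pairs with non-negative weights (the input format and the allele names in the docstring are hypothetical). Each draw picks one pair per individual in proportion to its weight and applies the ordinary chi-square statistic. The ASTA variance correction is not reproduced here, since its formula is not recoverable from the caption.

```python
import random
from collections import Counter
from itertools import combinations_with_replacement


def chi_square_hwe(genotypes):
    """Plain Pearson HWE statistic for unambiguous genotypes."""
    n = len(genotypes)
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in counts.items()}
    stat = 0.0
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        expected = n * freqs[a] ** 2 if a == b else 2 * n * freqs[a] * freqs[b]
        stat += (observed.get((a, b), 0) - expected) ** 2 / expected
    return stat


def sampled_chi_square(ambiguous_typings, n_draws=20, seed=0):
    """Chi-square with sampling for ambiguous typing (illustrative sketch).

    ambiguous_typings: one list per individual of ((allele, allele), weight)
    candidates, e.g. [(("A*01", "A*02"), 0.7), (("A*01", "A*03"), 0.3)].
    Returns one chi-square statistic per draw.
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(n_draws):
        draw = []
        for candidates in ambiguous_typings:
            pairs = [pair for pair, _ in candidates]
            weights = [w for _, w in candidates]
            # Resolve this individual's ambiguity by a weighted random choice.
            draw.append(rng.choices(pairs, weights=weights, k=1)[0])
        stats.append(chi_square_hwe(draw))
    return stats
```

Repeating the draw shows how strongly the statistic depends on the particular resolution of the ambiguity.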
Figure 3
Deviation of alleles from HWE. Each column is one of the five broad US populations, and each row is a locus (A, B, C, DQB1, DRB1). Within each subplot, each bar is the ASTA score divided by the DOF, and its color represents the log P-value (deeper colors are more significant); the bars are ordered by decreasing significance. Note that the scales of the subplots differ greatly, but the bar coloring is consistent across all subplots.
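Figure 3 reports a per-allele score divided by the DOF, and Figure 2(f) defines that DOF as the number of pairs containing the allele minus 1. A minimal sketch of this per-allele decomposition using plain, uncorrected chi-square terms follows. It assumes unambiguous typing and counts every possible pair containing the allele (k pairs for k alleles, so k - 1 DOF), which may differ from the paper's exact bookkeeping; the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations_with_replacement


def per_allele_scores(genotypes):
    """Per-allele chi-square contribution divided by its DOF.

    Assumption: all k possible pairs containing an allele are counted, so
    each allele has k - 1 DOF. No ambiguity correction is applied.
    """
    n = len(genotypes)
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    counts = Counter(a for g in genotypes for a in g)
    freqs = {a: c / (2 * n) for a, c in counts.items()}
    alleles = sorted(freqs)
    if len(alleles) < 2:
        raise ValueError("need at least two alleles")

    totals = dict.fromkeys(alleles, 0.0)
    n_pairs = dict.fromkeys(alleles, 0)
    for a, b in combinations_with_replacement(alleles, 2):
        expected = n * freqs[a] ** 2 if a == b else 2 * n * freqs[a] * freqs[b]
        term = (observed.get((a, b), 0) - expected) ** 2 / expected
        # A homozygous pair (a, a) is credited to its allele only once.
        for allele in {a, b}:
            totals[allele] += term
            n_pairs[allele] += 1
    return {a: totals[a] / (n_pairs[a] - 1) for a in alleles}
```

Sorting the result, e.g. `sorted(scores.items(), key=lambda kv: kv[1], reverse=True)`, ranks alleles by their normalized contribution, in the spirit of the bar ordering in Figure 3.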
Figure 4
Each subplot is a different locus (A, B, C, DQB1, DRB1); the colors show the log base 10 of the scores divided by the DOF. The ASTA score (orange) is slightly higher than the classical Chi-Square score (blue) and the sampling-based score (green); because the ambiguity is limited in this case, the differences are not very large. The populations at the top are the sub-populations, followed by the broad populations and then the entire donor registry; each detailed population is colored according to its broad population.
Figure 5
Statistics of Chi-Squared, ASTA, and the inverse CDF of Chi-Squared using a 0.5 significance level; each statistic is calculated for different degrees of freedom (DOF) values, ranging from 1 to the number of allele pairs minus 1; the statistics are first calculated for the first two allele pairs (1 DOF), and subsequently, one additional pair is included to recalculate the statistics (increasing the DOF by 1) until all pairs are included; observations for each plot are generated with the following parameters: (a). formula image alleles, formula image population size, formula image, formula image; (b). formula image alleles, formula image population size, formula image, formula image; (c). formula image alleles, formula image population size, formula image, formula image; (d). formula image alleles, formula image population size, formula image, formula image.
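Figure 5 tracks the statistic as allele pairs are added one at a time and compares it with the chi-square inverse CDF at the matching DOF. The sketch below reproduces that bookkeeping for a given list of per-pair terms. To stay within the standard library it approximates the chi-square quantile with the Wilson-Hilferty transformation rather than an exact inverse CDF such as `scipy.stats.chi2.ppf`. The pair order, the function names and the use of this approximation are assumptions for illustration.

```python
from itertools import accumulate
from statistics import NormalDist


def chi2_quantile_wh(q, dof):
    """Approximate chi-square quantile (inverse CDF) via Wilson-Hilferty."""
    z = NormalDist().inv_cdf(q)
    h = 2.0 / (9.0 * dof)
    return dof * (1.0 - h + z * h ** 0.5) ** 3


def cumulative_curve(pair_terms, alpha):
    """Cumulative statistic and critical value as pairs are added.

    pair_terms: per-pair chi-square terms, in the order pairs are included.
    Starts with two pairs (1 DOF) and adds one pair (one DOF) per step.
    Returns (dof, cumulative_statistic, critical_value) tuples, where the
    critical value is the (1 - alpha) quantile.
    """
    sums = list(accumulate(pair_terms))
    rows = []
    for m in range(2, len(pair_terms) + 1):
        dof = m - 1
        rows.append((dof, sums[m - 1], chi2_quantile_wh(1.0 - alpha, dof)))
    return rows
```

For example, `cumulative_curve(terms, alpha=0.5)` uses the significance level stated in the caption.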
