Hum Genet. 2017 Jun;136(6):727-741.
doi: 10.1007/s00439-017-1786-7. Epub 2017 Apr 3.

A genome-wide study of Hardy-Weinberg equilibrium with next generation sequence data

Jan Graffelman et al. Hum Genet. 2017 Jun.

Abstract

Statistical tests for Hardy-Weinberg equilibrium have been an important tool for detecting genotyping errors in the past, and remain important in the quality control of next generation sequence data. In this paper, we analyze complete chromosomes of the 1000 genomes project by using exact test procedures for autosomal and X-chromosomal variants. We find that the rate of disequilibrium largely exceeds what might be expected by chance alone for all chromosomes. Observed disequilibrium is, in about 60% of the cases, due to heterozygote excess. We suggest that most excess disequilibrium can be explained by sequencing problems, and hypothesize mechanisms that can explain exceptional heterozygosities. We report higher rates of disequilibrium for the MHC region on chromosome 6, regions flanking centromeres and p-arms of acrocentric chromosomes. We also detected long-range haplotypes and areas with incidental high disequilibrium. We report disequilibrium to be related to read depth, with variants having extreme read depths being more likely to be out of equilibrium. Disequilibrium rates were found to be 11 times higher in segmental duplications and simple tandem repeat regions. The variants with significant disequilibrium are seen to be concentrated in these areas. For next generation sequence data, Hardy-Weinberg disequilibrium seems to be a major indicator for copy number variation.

Conflict of interest statement

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Figures

Fig. 1
Percentage of significant HW tests for polymorphic autosomal variants as a function of the percentage of missing values at α=0.001. The horizontal dashed line corresponds to the HapMap exclusion threshold
Fig. 2
Manhattan plot of exact mid p values for Hardy–Weinberg equilibrium of the JPT sample. The horizontal line corresponds to the Bonferroni significance threshold (-log10(0.05/12133408)=8.4, using only polymorphic autosomal variants)
Fig. 3
a Hardy–Weinberg track showing the exact mid p values of tests for disequilibrium for each variant in the MHC region on chromosome 6. b–d Plots of the exact p values for the observed spikes colored according to the sign of the inbreeding coefficient (green f>0, red f<0), annotated with HLA class I and II genes. Plotting symbols indicate if a variant is inside a segmental duplication, inside a tandem repeat, inside both, or outside such regions
Fig. 4
Plots of exact p values on the p-arm of chromosome 21 (green f>0, red f<0)
Fig. 5
Plots of exact p values around the centromeres of chromosomes 1–4 (green f>0, red f<0). Vertical lines indicate limits and center of the centromere
Fig. 6
Plots of exact p values at the p tip of chromosomes 4, 10, 12 and 14 (green f>0, red f<0)
Fig. 7
Plots of exact mid p values of chromosome X. a Testing females only. b Testing males and females (green f>0, red f<0). Dashed vertical lines indicate the limits of the centromere region
Fig. 8
Plots of exact p values at 145.5–146.5 Mb on chromosome 6 (green f>0, red f<0). Monomorphic markers not shown
Fig. 9
Spikes of HWE exact p values on four different chromosomes (green f>0, red f<0)
Fig. 10
Percentage of significant HWD as a function of read depth (green f>0, red f<0)
Fig. 11
a Percentage of significant HWD in and outside segmental duplications for each chromosome. b Percentage of significant HWD in and outside simple tandem repeats for each chromosome. Dotted horizontal lines represent the overall autosomal rate
Fig. 12
Percentage of significant HWD in and outside segmental duplications or tandem repeat regions for each chromosome as a function of the significance threshold (α)
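
The Bonferroni threshold quoted in the caption of Fig. 2 can be checked with a few lines of arithmetic. This is only a sketch of that calculation; the variable names are illustrative, while the significance level (0.05) and the number of polymorphic autosomal variants (12133408) are taken from the caption.

```python
from math import log10

alpha = 0.05          # family-wise significance level, from the Fig. 2 caption
n_tests = 12133408    # polymorphic autosomal variants, from the Fig. 2 caption

# Bonferroni: divide alpha by the number of tests, then move to the -log10 scale
# used on the vertical axis of a Manhattan plot.
per_test_alpha = alpha / n_tests
threshold = -log10(per_test_alpha)

print(f"per-test alpha = {per_test_alpha:.3g}")
print(f"-log10 threshold = {threshold:.1f}")
```

A variant then exceeds the horizontal line in the Manhattan plot when the -log10 of its mid p value is larger than this threshold.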
