Estimating genome-wide significance for whole-genome sequencing studies

ChangJiang Xu¹, Ioanna Tachmazidou, Klaudia Walter, Antonio Ciampi, Eleftheria Zeggini, Celia M T Greenwood; UK10K Consortium

Affiliation

¹ Department of Epidemiology, Biostatistics and Occupational Health, McGill University, Montreal, Canada; Lady Davis Institute for Medical Research, Jewish General Hospital, Montreal, Canada.

PMID: 24676807
PMCID: PMC4489336
DOI: 10.1002/gepi.21797

ChangJiang Xu et al. Genet Epidemiol. 2014 May;38(4):281-90. doi: 10.1002/gepi.21797. Epub 2014 Feb 14.

Abstract

Although a standard genome-wide significance level has been accepted for testing the association between common genetic variants and disease, the era of whole-genome sequencing (WGS) requires a new threshold. The allele frequency spectrum of sequence-identified variants is very different from that of common variants, and the identified rare genetic variation is usually analyzed jointly within a series of genomic windows or regions. The resulting test statistics will be correlated across nearby or overlapping windows, and the degree of correlation is likely to depend on the choice of window size, overlap, and test statistic. Furthermore, multiple analyses may be performed using different windows or test statistics. Here we propose an empirical approach for estimating genome-wide significance thresholds for data arising from WGS studies, and we demonstrate that the empirical threshold can be efficiently estimated by extrapolating from calculations performed on a small genomic region. Because analysis of WGS data may need to be repeated with different choices of test statistics or windows, this prediction approach makes it computationally feasible to estimate genome-wide significance thresholds for different analysis choices. Based on UK10K whole-genome sequence data, we derive genome-wide significance thresholds ranging from 2.5 × 10⁻⁸ to 8 × 10⁻⁸ for our analytic choices in window-based testing, and thresholds of 0.6 × 10⁻⁸ to 1.5 × 10⁻⁸ for a combined analytic strategy that tests common variants with single-SNP tests and rare variants with our sliding-window test strategy.

Keywords: effective number of independent tests; genome-wide significance; multiple testing; rare-variant analysis; region-based tests; sliding windows; whole-genome sequencing.
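The extrapolation described in the abstract can be sketched as a straight-line fit: for sections of increasing size, record x = −log₁₀(0.05/m) and the empirically estimated y = −log₁₀(threshold), fit y = a + b·x by least squares, and evaluate the line at the genome-wide number of tests. This is a minimal sketch assuming NumPy; the function name `extrapolate_threshold` and the toy inputs are hypothetical illustrations, not the authors' code or UK10K data.

```python
import numpy as np


def extrapolate_threshold(m_small, thr_small, m_genome, alpha=0.05):
    """Fit -log10(threshold) against -log10(alpha / m) over small genomic
    sections, then evaluate the fitted line at the genome-wide number of tests.

    m_small   : numbers of tests in each small section (hypothetical inputs)
    thr_small : empirical FWER-controlling p-value thresholds for those sections
    m_genome  : genome-wide number of tests to extrapolate to
    """
    x = -np.log10(alpha / np.asarray(m_small, dtype=float))
    y = -np.log10(np.asarray(thr_small, dtype=float))
    slope, intercept = np.polyfit(x, y, deg=1)  # least-squares line y = intercept + slope * x
    x_genome = -np.log10(alpha / m_genome)
    return 10.0 ** -(intercept + slope * x_genome)


if __name__ == "__main__":
    # Toy input, not UK10K results: each section behaves like m / 2 independent
    # tests, so its threshold is 0.05 / (m / 2).
    m = np.array([250, 500, 1000, 2000])
    print(extrapolate_threshold(m, 0.05 / (m / 2), m_genome=1_000_000))  # approximately 1e-07
```

On the Bonferroni scale used in the figures, a threshold T corresponds to 0.05/T effective tests, so 2.5 × 10⁻⁸ and 8 × 10⁻⁸ correspond to 2 × 10⁶ and 6.25 × 10⁵ effective tests respectively.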

Figures

**Figure 1**
Correlations between 2,000 adjacent window-based test statistics for SKAT and burden tests at two different MAF thresholds. Because windows were defined to contain the same number of rare variants, the window boundaries vary with the MAF threshold. Regions have been aligned so that the genomic region captured in the bottom row is contained within the genomic region in the top row. The axes are window numbers, counted from the 5' end of chromosome 3. Left column: SKAT tests. Right column: burden tests. Top row: MAF threshold = 0.01; bottom row: MAF threshold = 0.05. Gray: correlation > 0.1; yellow: correlation > 0.35; blue: correlation > 0.5; red: correlation > 0.75.
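Figure 1 notes that each window holds the same number of rare variants, so the base-pair boundaries shift when the MAF threshold changes. The sketch below illustrates that idea. It is hypothetical: the function name, the strict "MAF < threshold" rule, and the window `size` and `step` values are assumptions, since the record does not give the paper's window parameters.

```python
from typing import List, Sequence, Tuple


def rare_variant_windows(positions: Sequence[int], mafs: Sequence[float],
                         maf_threshold: float, size: int,
                         step: int) -> List[Tuple[int, int]]:
    """Return (first_bp, last_bp) for sliding windows that each hold exactly
    `size` rare variants, moving `step` rare variants at a time.

    Assumptions (not from the paper): positions are sorted, a variant is rare
    when MAF < maf_threshold, and windows overlap whenever step < size.
    """
    rare = [pos for pos, maf in zip(positions, mafs) if maf < maf_threshold]
    windows = []
    for start in range(0, len(rare) - size + 1, step):
        block = rare[start:start + size]
        windows.append((block[0], block[-1]))
    return windows


if __name__ == "__main__":
    pos = [100, 200, 300, 400, 500, 600]
    maf = [0.001, 0.2, 0.003, 0.004, 0.03, 0.002]
    print(rare_variant_windows(pos, maf, 0.01, size=2, step=1))
    # [(100, 300), (300, 400), (400, 600)]
    print(rare_variant_windows(pos, maf, 0.05, size=2, step=1))
    # [(100, 300), (300, 400), (400, 500), (500, 600)]
```

Raising the MAF threshold admits more variants, so a fixed count of rare variants spans less sequence, which is why the 0.05 windows in Figure 1 cover a smaller genomic region than the 0.01 windows.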

**Figure 2**
Estimates of significance thresholds as a function of the number of window tests, comparing estimates derived from the correlation matrices (the methods of Li et al. and of Li and Ji) with estimates from simulations. Results are shown for MAF thresholds of 0.05 and 0.01 and for SKAT and burden tests. The horizontal axis is −log₁₀(0.05/m) for m tests; its maximum value corresponds to −log₁₀(0.05/2,000), because the largest correlation matrices used were 2,000 × 2,000. Dots are the means of the estimated values of −log₁₀(0.05/*m_e*), where *m_e* is the estimated effective number of independent tests, across all sections of chromosome 3 of the same size, and a linear regression has been fit to each series of points. The gray line is the line of equality, y = x.
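One of the correlation-matrix methods compared in Figure 2 is the eigenvalue rule of Li and Ji (2005), which turns the eigenvalues λᵢ of the test-statistic correlation matrix into an effective number of tests m_e = Σᵢ f(|λᵢ|), with f(x) = I(x ≥ 1) + (x − ⌊x⌋). The sketch below implements only that rule (the Li et al. method is not reproduced), assumes NumPy, and uses a hypothetical AR(1)-style correlation matrix in place of real window-test correlations.

```python
import numpy as np


def li_ji_effective_tests(corr: np.ndarray) -> float:
    """Effective number of independent tests via the eigenvalue rule of
    Li and Ji (2005): M_eff = sum_i f(|lambda_i|), with
    f(x) = I(x >= 1) + (x - floor(x))."""
    lam = np.abs(np.linalg.eigvalsh(corr))  # eigenvalues of a symmetric matrix
    return float(np.sum((lam >= 1.0) + (lam - np.floor(lam))))


if __name__ == "__main__":
    m = 200
    idx = np.arange(m)
    # Hypothetical correlation among adjacent window tests (not UK10K data).
    corr = 0.8 ** np.abs(idx[:, None] - idx[None, :])
    m_e = li_ji_effective_tests(corr)
    # x and y coordinates in the style of Figure 2.
    print(m_e, -np.log10(0.05 / m), -np.log10(0.05 / m_e))
```

Because f(x) ≤ x, m_e never exceeds m, so points from this rule fall on or below the y = x line in Figure 2; the 0.05/m_e threshold here follows the figure's Bonferroni-style axis.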

**Figure 3**
Estimates of genome-wide significance thresholds for window-based tests of rare variants, derived from simulations, for three MAF thresholds and three test statistics. The horizontal axis is −log₁₀(0.05/m) for m tests on chromosome 3. Each point is the mean, across disjoint sections of chromosome 3 of the same size, of −log₁₀ of the estimated threshold that controls the FWER at 5%, with ±1.96 × SD shown at each point. A linear regression was fitted to the points in each panel, and the gray line is the line of equality, y = x.
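A simulation-based threshold of the kind plotted in Figure 3 can be sketched as follows: generate many null replicates, record the most extreme statistic in each, and take the per-test p-value exceeded in only 5% of replicates. This is a swapped-in simplification rather than the authors' simulation: it assumes the null statistics are jointly Gaussian z-scores with a known correlation matrix (SKAT and burden statistics are not exactly Gaussian), and the function name and correlation matrix are hypothetical.

```python
import math

import numpy as np


def simulated_fwer_threshold(corr: np.ndarray, n_rep: int = 20_000,
                             alpha: float = 0.05, seed: int = 1) -> float:
    """Per-test p-value threshold keeping the family-wise error rate at alpha.

    Simplifying assumption: the null test statistics are jointly Gaussian
    z-scores with correlation matrix `corr`.
    """
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(len(corr)), corr, size=n_rep)
    max_abs_z = np.abs(z).max(axis=1)        # most extreme statistic in each null replicate
    c = np.quantile(max_abs_z, 1.0 - alpha)  # exceeded in about alpha of the replicates
    return math.erfc(c / math.sqrt(2.0))     # two-sided normal p-value at |z| = c


if __name__ == "__main__":
    m = 20
    idx = np.arange(m)
    corr = 0.9 ** np.abs(idx[:, None] - idx[None, :])  # hypothetical AR(1)-style correlation
    t = simulated_fwer_threshold(corr)
    print(t, -np.log10(0.05 / m), -np.log10(t))  # x and y coordinates as in Figure 3
```

Stronger correlation between windows yields a less stringent threshold than 0.05/m, which is what pushes the fitted lines in Figure 3 below the line of equality.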

**Figure 4**
Estimates of genome-wide significance thresholds for a combined strategy including window-based tests of rare variants and single-marker tests of common variants. Results are derived from simulations, for three MAF thresholds and three test statistics. The horizontal axis is −log₁₀(0.05/m) for m tests. Each dot is a single estimate of −log₁₀ of the threshold that controls the FWER at 5%, for a section of chromosome 3 of a given size. A single linear regression was fit through all the data. The gray line is the line of equality, y = x.
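For the combined strategy in Figure 4, one shared threshold must control the FWER over the union of rare-variant window tests and common-variant single-SNP tests. A common way to estimate this, used here as an assumption rather than as the paper's documented procedure, is to take the 5% quantile of the smallest p-value per null replicate across both families. The function name is hypothetical, the sketch assumes NumPy, and the toy p-values are independent uniforms, not simulated WGS data.

```python
import numpy as np


def combined_fwer_threshold(p_windows: np.ndarray, p_single: np.ndarray,
                            alpha: float = 0.05) -> float:
    """Shared per-test threshold for a combined analysis.

    Rows are null replicates and columns are tests: p_windows holds rare-variant
    window p-values and p_single holds common-variant single-SNP p-values. Both
    must come from the same replicates so that correlation between the two
    families is preserved. The FWER over the union of tests is controlled by
    taking the alpha-quantile of the smallest p-value in each replicate.
    """
    if p_windows.shape[0] != p_single.shape[0]:
        raise ValueError("both families must share the same null replicates")
    min_p = np.minimum(p_windows.min(axis=1), p_single.min(axis=1))
    return float(np.quantile(min_p, alpha))


if __name__ == "__main__":
    # Toy null p-values: 30 window tests and 70 single-SNP tests, all independent.
    rng = np.random.default_rng(7)
    pw = rng.uniform(size=(20_000, 30))
    ps = rng.uniform(size=(20_000, 70))
    print(combined_fwer_threshold(pw, ps))  # close to 1 - 0.95 ** (1 / 100), about 5.1e-4
```

Adding the single-SNP family can only lower each replicate's minimum p-value, so the combined threshold is never less stringent than the window-only one, consistent with the smaller combined thresholds reported in the abstract.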

References

1. Chen Z, Liu Q. A new approach to account for the correlations among single nucleotide polymorphisms in genome-wide association studies. Hum Hered. 2011;72(1):1–9.
2. Cheverud JM. A simple correction for multiple comparisons in interval mapping genome scans. Heredity (Edinb). 2001;87(Pt 1):52–58.
3. Dudbridge F, Gusnanto A. Estimation of significance thresholds for genomewide association scans. Genet Epidemiol. 2008;32(3):227–234.
4. Gao X, Starmer J, Martin ER. A multiple testing correction method for genetic association studies using correlated single nucleotide polymorphisms. Genet Epidemiol. 2008;32(4):361–369.
5. Lee S, Emond MJ, Bamshad MJ, Barnes KC, Rieder MJ, Nickerson DA, Team NGESP-ELP, Christiani DC, Wurfel MM, Lin X. Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am J Hum Genet. 2012;91(2):224–237.
