Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies

Seunggeun Lee¹, Mary J Emond, Michael J Bamshad, Kathleen C Barnes, Mark J Rieder, Deborah A Nickerson; NHLBI GO Exome Sequencing Project—ESP Lung Project Team; David C Christiani, Mark M Wurfel, Xihong Lin

PMID: 22863193
PMCID: PMC3415556
DOI: 10.1016/j.ajhg.2012.06.007

Am J Hum Genet. 2012 Aug 10;91(2):224-37. doi: 10.1016/j.ajhg.2012.06.007. Epub 2012 Aug 2.

Affiliation

¹ Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA.

Abstract

We propose in this paper a unified approach for testing the association between rare variants and phenotypes in sequencing association studies. This approach maximizes power by adaptively using the data to optimally combine the burden test and the nonburden sequence kernel association test (SKAT). Burden tests are more powerful when most variants in a region are causal and the effects are in the same direction, whereas SKAT is more powerful when a large fraction of the variants in a region are noncausal or the effects of causal variants are in different directions. The proposed unified test maintains the power in both scenarios. We show that the unified test corresponds to the optimal test in an extended family of SKAT tests, which we refer to as SKAT-O. The second goal of this paper is to develop a small-sample adjustment procedure for the proposed methods for the correction of conservative type I error rates of SKAT family tests when the trait of interest is dichotomous and the sample size is small. Both small-sample-adjusted SKAT and the optimal unified test (SKAT-O) are computationally efficient and can easily be applied to genome-wide sequencing association studies. We evaluate the finite sample performance of the proposed methods using extensive simulation studies and illustrate their application using the acute-lung-injury exome-sequencing data of the National Heart, Lung, and Blood Institute Exome Sequencing Project.
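The abstract does not spell out the test statistics, so the following is only a minimal sketch of how a burden-type and a SKAT-type score statistic can be blended into one family. It assumes the commonly described form Q_ρ = (1 − ρ)·Q_SKAT + ρ·Q_burden with ρ in [0, 1], residuals from an intercept-only null model, and unit variant weights. The function names (`score_vector`, `q_rho`) and the toy data are hypothetical, and both the p-value calculation and the data-adaptive choice of ρ are omitted.

```python
from typing import List, Sequence


def score_vector(genotypes: Sequence[Sequence[float]],
                 residuals: Sequence[float]) -> List[float]:
    """Per-variant score S_j = sum_i g_ij * r_i (r_i = null-model residual)."""
    n_variants = len(genotypes[0])
    return [
        sum(row[j] * r for row, r in zip(genotypes, residuals))
        for j in range(n_variants)
    ]


def q_rho(genotypes: Sequence[Sequence[float]],
          residuals: Sequence[float],
          weights: Sequence[float],
          rho: float) -> float:
    """Assumed blended statistic: (1 - rho) * Q_SKAT + rho * Q_burden."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    scores = score_vector(genotypes, residuals)
    # SKAT-type statistic: weighted sum of squared per-variant scores.
    q_skat = sum((w * s) ** 2 for w, s in zip(weights, scores))
    # Burden-type statistic: square of the weighted sum of scores.
    q_burden = sum(w * s for w, s in zip(weights, scores)) ** 2
    return (1.0 - rho) * q_skat + rho * q_burden


# Toy data (hypothetical): 4 subjects, 3 rare variants, 2 cases and 2 controls.
phenotype = [1, 1, 0, 0]
genotypes = [
    [1, 0, 0],  # case carrying variant 1
    [0, 1, 0],  # case carrying variant 2
    [0, 0, 1],  # control carrying variant 3
    [0, 0, 0],  # control carrying no rare variant
]
# Intercept-only null model: the fitted mean is the case fraction.
case_fraction = sum(phenotype) / len(phenotype)
residuals = [y - case_fraction for y in phenotype]
weights = [1.0, 1.0, 1.0]

for rho in (0.0, 0.5, 1.0):
    print(rho, q_rho(genotypes, residuals, weights, rho))
```

In this toy example the third variant's score has the opposite sign to the other two, so the burden-type statistic (ρ = 1) partially cancels while the SKAT-type statistic (ρ = 0) does not, which mirrors the contrast drawn in the abstract.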

Figures

**Figure 1**
Power Estimates for the Six Competing Methods when All Causal Variants Were Deleterious. Empirical power of the six methods for randomly selected 3 kb regions wherein all causal variants were deleterious. From top to bottom, the plots consider the significance levels 0.01, 10^{−3}, and 2.5×10^{−6}, respectively. From left to right, the plots consider settings in which 10%, 20%, and 50% of rare variants were causal, respectively. For causal variants, we assumed |β_j| = c|log₁₀(p_j)| / 2, where p_j was the MAF of the j^th variant. A different c was used for the three panels from left to right: c = log(7), log(5), and log(2.5) for the percentage of causal variants being 10%, 20%, and 50%, respectively. Hence, the powers between the three panels from left to right are not comparable. Total sample sizes considered were 200, 500, and 1,000, and half were cases in case-control studies.

**Figure 2**
Power Estimates for the Six Competing Methods when 20%/80% of Causal Variants Were Protective/Deleterious. Empirical power of the six methods for randomly selected 3 kb regions wherein 20%/80% of causal variants were protective/deleterious. From top to bottom, the plots consider the significance levels 0.01, 10^{−3}, and 2.5×10^{−6}, respectively. From left to right, the plots consider settings in which 10%, 20%, and 50% of rare variants were causal, respectively. For causal variants, we assumed |β_j| = c|log₁₀(p_j)| / 2, where p_j was the MAF of the j^th variant. A different c was used for the three panels from left to right: c = log(7), log(5), and log(2.5) for the percentage of causal variants being 10%, 20%, and 50%, respectively. Hence, the powers between the three panels from left to right are not comparable. Total sample sizes considered were 200, 500, and 1,000, and half were cases in case-control studies.

**Figure 3**
Power Estimates for the Six Competing Methods when 50%/50% of Causal Variants Were Protective/Deleterious. Empirical power of the six methods for randomly selected 3 kb regions wherein 50%/50% of causal variants were protective/deleterious. From top to bottom, the plots consider the significance levels 0.01, 10^{−3}, and 2.5×10^{−6}, respectively. From left to right, the plots consider settings in which 10%, 20%, and 50% of rare variants were causal, respectively. For causal variants, we assumed |β_j| = c|log₁₀(p_j)| / 2, where p_j was the MAF of the j^th variant. A different c was used for the three panels from left to right: c = log(7), log(5), and log(2.5) for the percentage of causal variants being 10%, 20%, and 50%, respectively. Hence, the powers between the three panels from left to right are not comparable. Total sample sizes considered were 200, 500, and 1,000, and half were cases in case-control studies.
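The effect-size rule quoted in the captions of Figures 1 to 3, |β_j| = c|log₁₀(p_j)| / 2, can be evaluated directly. The sketch below is illustrative only: it assumes that "log" in c = log(7), log(5), log(2.5) denotes the natural logarithm (the captions do not say), and the function name `causal_effect_size` and the example MAF values are hypothetical.

```python
import math


def causal_effect_size(maf: float, c: float) -> float:
    """Magnitude |beta_j| = c * |log10(maf)| / 2 for a causal variant."""
    if not 0.0 < maf < 1.0:
        raise ValueError("maf must lie strictly between 0 and 1")
    return c * abs(math.log10(maf)) / 2.0


# Scale constants from the captions, keyed by percentage of causal variants.
# ASSUMPTION: natural logarithm for log(7), log(5), log(2.5).
scale_by_percent_causal = {10: math.log(7), 20: math.log(5), 50: math.log(2.5)}

for percent, c in scale_by_percent_causal.items():
    for maf in (0.01, 0.001):  # example MAFs, chosen only for illustration
        beta = causal_effect_size(maf, c)
        # exp(beta) is the implied odds ratio under a logistic model.
        print(percent, maf, round(beta, 3), round(math.exp(beta), 2))
```

Under this rule the effect magnitude grows as the MAF shrinks, and at MAF = 0.01 it equals c itself, because |log₁₀(0.01)| / 2 = 1.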

**Figure 4**
Analysis of the ALI Exome-Sequence Data. −log₁₀ Q-Q plots of observed versus expected p values for the ALI exome-sequence data for the six methods: burden tests (N and W), SKAT, SKAT-O, adjusted SKAT, and adjusted SKAT-O. The x axis represents −log₁₀ expected p values, and the y axis represents −log₁₀ observed p values. A total of 6,488 genes with at least four rare variants were tested for associations with ALI severity.
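As a rough illustration of how the coordinates of such a −log₁₀ Q-Q plot can be computed, the sketch below pairs sorted observed p values with expected uniform quantiles. The plotting position (rank − 0.5) / n is an assumption, since the caption does not state which convention was used, and the function name `qq_points` and the input values are hypothetical rather than taken from the ALI data.

```python
import math
from typing import List, Sequence, Tuple


def qq_points(p_values: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (x, y) = (-log10 expected p, -log10 observed p) pairs."""
    n = len(p_values)
    points = []
    # The smallest observed p value is matched with the smallest expected one.
    for rank, observed in enumerate(sorted(p_values), start=1):
        expected = (rank - 0.5) / n  # ASSUMED plotting position
        points.append((-math.log10(expected), -math.log10(observed)))
    return points


# Hypothetical gene-level p values, for illustration only.
example_p_values = [0.62, 0.004, 0.31, 0.88, 0.15]
for x, y in qq_points(example_p_values):
    print(round(x, 3), round(y, 3))
```

Points lying close to the diagonal y = x indicate p values consistent with the null distribution, whereas points below the diagonal suggest a conservative test.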
