Am J Hum Genet. 2012 Aug 10;91(2):224-37.
doi: 10.1016/j.ajhg.2012.06.007. Epub 2012 Aug 2.

Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies

Seunggeun Lee et al. Am J Hum Genet.

Abstract

We propose in this paper a unified approach for testing the association between rare variants and phenotypes in sequencing association studies. This approach maximizes power by adaptively using the data to optimally combine the burden test and the nonburden sequence kernel association test (SKAT). Burden tests are more powerful when most variants in a region are causal and the effects are in the same direction, whereas SKAT is more powerful when a large fraction of the variants in a region are noncausal or the effects of causal variants are in different directions. The proposed unified test maintains the power in both scenarios. We show that the unified test corresponds to the optimal test in an extended family of SKAT tests, which we refer to as SKAT-O. The second goal of this paper is to develop a small-sample adjustment procedure for the proposed methods for the correction of conservative type I error rates of SKAT family tests when the trait of interest is dichotomous and the sample size is small. Both small-sample-adjusted SKAT and the optimal unified test (SKAT-O) are computationally efficient and can easily be applied to genome-wide sequencing association studies. We evaluate the finite sample performance of the proposed methods using extensive simulation studies and illustrate their application using the acute-lung-injury exome-sequencing data of the National Heart, Lung, and Blood Institute Exome Sequencing Project.
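The combination described in the abstract can be illustrated with a small sketch. The abstract itself gives no formula, so the form used here, Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden, is an assumption based on how the SKAT-O family is commonly written; the function name, argument names, and the grid of rho values are hypothetical. The sketch computes only the test statistics and leaves out the p-value calculation and the small-sample adjustment.

```python
import numpy as np

def skat_o_statistics(G, resid, weights, rhos=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return {rho: Q_rho} with Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden.

    G       : (n_samples, n_variants) genotype matrix (assumed layout)
    resid   : (n_samples,) residuals from the null model, y - mu_hat
    weights : (n_variants,) per-variant weights
    """
    G = np.asarray(G, dtype=float)
    resid = np.asarray(resid, dtype=float)
    w = np.asarray(weights, dtype=float)

    # Weighted score for each variant: w_j * sum_i G_ij * resid_i
    scores = w * (G.T @ resid)

    q_skat = float(np.sum(scores ** 2))    # sum of squared scores (SKAT-like)
    q_burden = float(np.sum(scores) ** 2)  # square of summed scores (burden-like)

    return {rho: (1 - rho) * q_skat + rho * q_burden for rho in rhos}

# Toy data: two variants whose scores have opposite signs.
G = [[1, 0], [0, 1], [1, 1]]
resid = [0.5, -0.5, 0.0]
q = skat_o_statistics(G, resid, weights=[1.0, 1.0])
```

In this toy input the two variant scores are +0.5 and -0.5, so they cancel in the burden-like term (rho = 1) while the SKAT-like term (rho = 0) stays positive, which mirrors the abstract's point that SKAT is more powerful when effects are in different directions.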

Figures

Figure 1
Power Estimates for the Six Competing Methods when All Causal Variants Were Deleterious. Empirical power of the six methods for randomly selected 3 kb regions wherein all causal variants were deleterious. From top to bottom, the plots consider the significance levels 0.01, 10⁻³, and 2.5 × 10⁻⁶, respectively. From left to right, the plots consider settings in which 10%, 20%, and 50% of rare variants were causal, respectively. For causal variants, we assumed |β_j| = c|log10(p_j)| / 2, where p_j was the MAF of the jth variant. A different c was used for the three panels from left to right: c = log(7), log(5), and log(2.5) for the percentage of causal variants being 10%, 20%, and 50%, respectively. Hence, the powers between the three panels from left to right are not comparable. Total sample sizes considered were 200, 500, and 1,000, and half were cases in case-control studies.
Figure 2
Power Estimates for the Six Competing Methods when 20%/80% of Causal Variants Were Protective/Deleterious. Empirical power of the six methods for randomly selected 3 kb regions wherein 20%/80% of causal variants were protective/deleterious. From top to bottom, the plots consider the significance levels 0.01, 10⁻³, and 2.5 × 10⁻⁶, respectively. From left to right, the plots consider settings in which 10%, 20%, and 50% of rare variants were causal, respectively. For causal variants, we assumed |β_j| = c|log10(p_j)| / 2, where p_j was the MAF of the jth variant. A different c was used for the three panels from left to right: c = log(7), log(5), and log(2.5) for the percentage of causal variants being 10%, 20%, and 50%, respectively. Hence, the powers between the three panels from left to right are not comparable. Total sample sizes considered were 200, 500, and 1,000, and half were cases in case-control studies.
Figure 3
Power Estimates for the Six Competing Methods when 50%/50% of Causal Variants Were Protective/Deleterious. Empirical power of the six methods for randomly selected 3 kb regions wherein 50%/50% of causal variants were protective/deleterious. From top to bottom, the plots consider the significance levels 0.01, 10⁻³, and 2.5 × 10⁻⁶, respectively. From left to right, the plots consider settings in which 10%, 20%, and 50% of rare variants were causal, respectively. For causal variants, we assumed |β_j| = c|log10(p_j)| / 2, where p_j was the MAF of the jth variant. A different c was used for the three panels from left to right: c = log(7), log(5), and log(2.5) for the percentage of causal variants being 10%, 20%, and 50%, respectively. Hence, the powers between the three panels from left to right are not comparable. Total sample sizes considered were 200, 500, and 1,000, and half were cases in case-control studies.
Figure 4
Analysis of the ALI Exome-Sequence Data. −log10 Q-Q plots of observed versus expected p values for the ALI exome-sequence data for the six methods: burden tests (N and W), SKAT, SKAT-O, adjusted SKAT, and adjusted SKAT-O. The x axis represents −log10 expected p values, and the y axis represents −log10 observed p values. A total of 6,488 genes with at least four rare variants were tested for associations with ALI severity.
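
The effect-size rule quoted in the captions of Figures 1 to 3, |β_j| = c|log10(p_j)| / 2, can be worked through numerically. The sketch below assumes that log in c = log(7) denotes the natural logarithm and that β_j is a log odds ratio; neither is stated in the captions, and the function name is hypothetical.

```python
import math

def effect_size(maf, c):
    """Magnitude of beta_j under the caption's rule: c * |log10(maf)| / 2."""
    return c * abs(math.log10(maf)) / 2

# Assumption: c = log(7) is a natural log, as in the 10%-causal panels.
c = math.log(7)

beta_001 = effect_size(0.01, c)    # |log10(0.01)| / 2 = 1, so beta equals c
beta_0001 = effect_size(0.001, c)  # |log10(0.001)| / 2 = 1.5, so beta = 1.5 * c
odds_ratio_001 = math.exp(beta_001)
```

Under these assumptions a variant with MAF 0.01 would have an odds ratio of about 7, and rarer variants get proportionally larger effects, which is one way to read why a smaller c was paired with a larger fraction of causal variants.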
