Comparative Study
Biostatistics. 2012 Sep;13(4):762-75.
doi: 10.1093/biostatistics/kxs014. Epub 2012 Jun 14.

Optimal tests for rare variant effects in sequencing association studies

Seunggeun Lee et al. Biostatistics. 2012 Sep.

Abstract

With the development of massively parallel sequencing technologies, there is a substantial need for powerful rare variant association tests. Common approaches include burden and non-burden tests. Burden tests assume that all rare variants in the target region affect the phenotype in the same direction and with similar magnitude. The recently proposed sequence kernel association test (SKAT) (Wu, M. C., and others, 2011. Rare-variant association testing for sequencing data with the SKAT. The American Journal of Human Genetics 89, 82-93), an extension of the C-alpha test (Neale, B. M., and others, 2011. Testing for an unusual distribution of rare variants. PLoS Genetics 7, 161-165), provides a robust test that is particularly powerful when a region contains a mix of protective, deleterious, and null variants, but is less powerful than burden tests when a large number of variants in a region are causal and act in the same direction. As the underlying biological mechanisms are unknown in practice and vary from one gene to another across the genome, it is of substantial practical interest to develop a test that is optimal for both scenarios. In this paper, we propose a class of tests that includes burden tests and SKAT as special cases, and derive an optimal test within this class that maximizes power. We show that this optimal test outperforms burden tests and SKAT in a wide range of scenarios. The results are illustrated using simulation studies and triglyceride data from the Dallas Heart Study. In addition, we have derived sample size and power calculation formulas for SKAT with a new family of kernels to facilitate the design of new sequence association studies.
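
The abstract does not spell out the class of tests. As a minimal sketch, assume it is the one-parameter family Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden built from weighted per-variant score statistics, so that rho = 0 gives SKAT and rho = 1 gives a burden test. The function names, the toy genotype matrix, and the flat weights below are illustrative assumptions, not the authors' code, and the p-value step for the optimal test is not shown.

```python
def score_statistics(genotypes, residuals, weights):
    """Weighted per-variant scores w_j * S_j, where S_j = sum_i g_ij * r_i.

    genotypes: one row per subject, one column per variant (minor allele counts)
    residuals: phenotype minus its fitted mean under the null model
    weights:   per-variant weights (hypothetical; flat weights in the toy example)
    """
    scores = []
    for j, w_j in enumerate(weights):
        s_j = sum(row[j] * r for row, r in zip(genotypes, residuals))
        scores.append(w_j * s_j)
    return scores


def q_burden(weighted_scores):
    """Burden-type statistic: scores are summed first, then squared."""
    return sum(weighted_scores) ** 2


def q_skat(weighted_scores):
    """SKAT-type statistic: scores are squared first, then summed."""
    return sum(s * s for s in weighted_scores)


def q_rho(weighted_scores, rho):
    """Assumed combined statistic: rho = 0 is SKAT, rho = 1 is burden."""
    return (1.0 - rho) * q_skat(weighted_scores) + rho * q_burden(weighted_scores)


if __name__ == "__main__":
    # Toy data: variant 1 is carried by subjects with positive residuals,
    # variant 2 by subjects with negative residuals (opposite directions).
    genotypes = [[1, 0], [1, 0], [0, 1], [0, 1]]
    residuals = [1.0, 1.0, -1.0, -1.0]
    weights = [1.0, 1.0]
    ws = score_statistics(genotypes, residuals, weights)
    for rho in (0.0, 0.5, 1.0):
        print(rho, q_rho(ws, rho))
```

In this toy example the two variants have scores of equal size and opposite sign, so they cancel in the burden statistic but still contribute to the SKAT statistic. This is the behavior the abstract describes when protective and deleterious variants are both present.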

Figures

Fig. 1.
Empirical power of SKAT-O and competing methods at α = 2.5×10⁻⁶ using simulation studies when region size = 3 kb and β± = 100/0. Top panel considers continuous phenotypes and bottom panel considers dichotomous phenotypes. From left to right, the plots consider the settings in which 10%, 20%, and 50% of rare variants were causal. The detailed simulation setups are described in “Simulations and Real Data Analysis”.
Fig. 2.
Empirical power of SKAT-O and competing methods at α = 2.5×10⁻⁶ using simulation studies when region size = 3 kb and β± = 80/20. Top panel considers continuous phenotypes and bottom panel considers dichotomous phenotypes. From left to right, the plots consider the settings in which 10%, 20%, and 50% of rare variants were causal. The detailed simulation setups are described in “Simulations and Real Data Analysis”.
Fig. 3.
Single variant analysis results of Dallas Heart Study data. (a) Histogram of minor allele frequencies of 92 variants with MAF<0.03. (b)–(d) Plots of formula image versus t-statistic values of each variant of ANGPTL3, 4, and 5 genes. The dashed line represents the 95% confidence interval of no-association.

