Optimal tests for rare variant effects in sequencing association studies

Seunggeun Lee¹, Michael C Wu, Xihong Lin

PMID: 22699862
PMCID: PMC3440237
DOI: 10.1093/biostatistics/kxs014

Comparative Study

Seunggeun Lee et al. Biostatistics. 2012 Sep;13(4):762-75. doi: 10.1093/biostatistics/kxs014. Epub 2012 Jun 14.

Affiliation

¹ Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA.

Abstract

With the development of massively parallel sequencing technologies, there is a substantial need for powerful rare variant association tests. Common approaches include burden and non-burden tests. Burden tests assume that all rare variants in the target region affect the phenotype in the same direction and with similar magnitude. The recently proposed sequence kernel association test (SKAT) (Wu, M. C., and others, 2011. Rare-variant association testing for sequencing data with the SKAT. The American Journal of Human Genetics 89, 82-93), an extension of the C-alpha test (Neale, B. M., and others, 2011. Testing for an unusual distribution of rare variants. PLoS Genetics 7, 161-165), provides a robust test that is particularly powerful when a region contains protective, deleterious, and null variants, but it is less powerful than burden tests when a large number of variants in a region are causal and act in the same direction. Because the underlying biological mechanisms are unknown in practice and vary from gene to gene across the genome, it is of substantial practical interest to develop a test that is optimal in both scenarios. In this paper, we propose a class of tests that includes burden tests and SKAT as special cases, and we derive an optimal test within this class that maximizes power. We show that this optimal test outperforms burden tests and SKAT in a wide range of scenarios. The results are illustrated using simulation studies and triglyceride data from the Dallas Heart Study. In addition, we derive sample size and power calculation formulas for SKAT with a new family of kernels to facilitate the design of new sequencing association studies.
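A minimal sketch of the class of tests described above, assuming the SKAT-O form in which each member combines the two statistics as Q_ρ = (1 − ρ)Q_SKAT + ρQ_burden with ρ in [0, 1], so that ρ = 0 gives SKAT and ρ = 1 gives a burden test, and the optimal test takes the smallest p-value over a grid of ρ values. This is illustrative only and rests on several assumptions not stated in the abstract: a continuous trait with an intercept-only null model (no covariates), genotypes coded as minor-allele counts, Beta(1, 25) variant weights, an evenly spaced ρ grid, and permutation p-values used in place of an analytic p-value calculation. The function names `beta_weights`, `q_rho_stats`, and `skat_o_perm_pvalue` are hypothetical.

```python
import numpy as np


def beta_weights(maf):
    # Assumed weights: Beta(1, 25) density at each MAF, i.e. 25 * (1 - maf)**24,
    # so rarer variants receive larger weights.
    return 25.0 * (1.0 - maf) ** 24


def q_rho_stats(G, resid, w, rhos):
    """Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden for every rho in `rhos`."""
    S = G.T @ resid                  # per-variant score S_j = sum_i G_ij * r_i
    q_skat = np.sum((w * S) ** 2)    # SKAT: sum_j w_j^2 S_j^2
    q_burden = np.sum(w * S) ** 2    # burden: (sum_j w_j S_j)^2
    return (1.0 - rhos) * q_skat + rhos * q_burden


def skat_o_perm_pvalue(G, y, rhos, n_perm=999, seed=0):
    """Min-p over `rhos`, calibrated by permutation (a stand-in, not an analytic method)."""
    rng = np.random.default_rng(seed)
    w = beta_weights(G.mean(axis=0) / 2.0)  # G holds minor-allele counts 0/1/2
    resid = y - y.mean()                    # assumed intercept-only null model
    # Row 0: observed statistics; rows 1..n_perm: statistics under permuted residuals.
    Q = np.empty((n_perm + 1, len(rhos)))
    Q[0] = q_rho_stats(G, resid, w, rhos)
    for b in range(1, n_perm + 1):
        Q[b] = q_rho_stats(G, rng.permutation(resid), w, rhos)
    # p[i, k]: fraction of rows whose Q for rho_k is >= row i's Q for rho_k.
    p = (Q[None, :, :] >= Q[:, None, :]).mean(axis=1)
    min_p = p.min(axis=1)                   # min over the rho grid, row by row
    # Final p-value: share of rows whose min-p is as small as the observed one.
    return rhos[p[0].argmin()], (min_p <= min_p[0]).mean()


# Hypothetical data: 500 subjects, 20 rare variants, first 5 causal in the same direction.
rng = np.random.default_rng(1)
maf = rng.uniform(0.001, 0.03, size=20)
G = rng.binomial(2, maf, size=(500, 20)).astype(float)
y = G[:, :5] @ np.full(5, 0.5) + rng.normal(size=500)
rho_hat, pval = skat_o_perm_pvalue(G, y, np.linspace(0.0, 1.0, 11), n_perm=499)
print(rho_hat, pval)
```

Because the minimum over ρ is itself a random quantity, the sketch calibrates it by comparing the observed min-p with the min-p of each permuted data set rather than reporting the smallest per-ρ p-value directly, which would be anti-conservative.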

Figures

**Fig. 1.**
Empirical power of SKAT-O and competing methods at α = 2.5×10⁻⁶ in simulation studies with region size = 3 kb and β± = 100/0. The top row shows continuous phenotypes and the bottom row shows dichotomous phenotypes. From left to right, the plots correspond to settings in which 10%, 20%, and 50% of rare variants are causal. The detailed simulation setups are described in “Simulations and Real Data Analysis”.

**Fig. 2.**
Empirical power of SKAT-O and competing methods at α = 2.5×10⁻⁶ in simulation studies with region size = 3 kb and β± = 80/20. The top row shows continuous phenotypes and the bottom row shows dichotomous phenotypes. From left to right, the plots correspond to settings in which 10%, 20%, and 50% of rare variants are causal. The detailed simulation setups are described in “Simulations and Real Data Analysis”.

**Fig. 3.**
Single-variant analysis results for the Dallas Heart Study data. (a) Histogram of the minor allele frequencies of 92 variants with *MAF* < 0.03. (b)–(d) Plots of versus t-statistic values for each variant in the ANGPTL3, ANGPTL4, and ANGPTL5 genes. The dashed line marks the 95% interval under the hypothesis of no association.
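Panels (b)–(d) plot one t-statistic per variant. A minimal sketch of how such per-variant statistics can be computed, assuming a separate simple linear regression of the trait on each variant's allele count with no covariates (the model behind the figure may differ) and assuming a ±1.96 normal-approximation cutoff for the dashed 95% band. The function `single_variant_t` and the toy data are hypothetical.

```python
import numpy as np


def single_variant_t(G, y):
    """t-statistic of the slope in y ~ 1 + G[:, j], fitted separately for each variant j."""
    n = len(y)
    yc = y - y.mean()
    t = np.full(G.shape[1], np.nan)     # monomorphic variants stay NaN
    for j in range(G.shape[1]):
        g = G[:, j] - G[:, j].mean()
        sxx = g @ g
        if sxx == 0:
            continue
        b = (g @ yc) / sxx              # least-squares slope
        resid = yc - b * g
        s2 = (resid @ resid) / (n - 2)  # residual variance, n - 2 degrees of freedom
        t[j] = b / np.sqrt(s2 / sxx)
    return t


# Toy example (hypothetical values): one polymorphic and one monomorphic variant.
G = np.array([[0., 0.], [0., 0.], [1., 0.], [1., 0.], [2., 0.], [2., 0.]])
y = np.array([1., 2., 2., 4., 3., 5.])
t = single_variant_t(G, y)
outside = np.abs(t) > 1.96  # assumed normal-approximation 95% band under no association
print(t)  # first entry is about 2.34, second is nan
```

With many rare variants, most carriers are few, so individual t-statistics are noisy; this is the limitation that motivates aggregating variants into region-level tests.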
