PLoS One. 2012;7(2):e30238.

doi: 10.1371/journal.pone.0030238. Epub 2012 Feb 17.

Reconsidering association testing methods using single-variant test statistics as alternatives to pooling tests for sequence data with rare variants

Daniel D Kinnamon¹, Ray E Hershberger, Eden R Martin

PMID: 22363423
PMCID: PMC3281828
DOI: 10.1371/journal.pone.0030238

Daniel D Kinnamon et al. PLoS One. 2012.

Affiliation

¹ Dr. John T. Macdonald Foundation Department of Human Genetics, Miller School of Medicine, University of Miami, Miami, Florida, United States of America.

Abstract

Association tests that pool minor alleles into a measure of burden at a locus have been proposed for case-control studies using sequence data containing rare variants. However, such pooling tests are not robust to the inclusion of neutral and protective variants, which can mask the association signal from risk variants. Early studies proposing pooling tests dismissed methods for locus-wide inference using nonnegative single-variant test statistics based on unrealistic comparisons. However, such methods are robust to the inclusion of neutral and protective variants and therefore may be more useful than previously appreciated. In fact, some recently proposed methods derived within different frameworks are equivalent to performing inference on weighted sums of squared single-variant score statistics. In this study, we compared two existing methods for locus-wide inference using nonnegative single-variant test statistics to two widely cited pooling tests under more realistic conditions. We established analytic results for a simple model with one rare risk and one rare neutral variant, which demonstrated that pooling tests were less powerful than even Bonferroni-corrected single-variant tests in most realistic situations. We also performed simulations using variants with realistic minor allele frequency and linkage disequilibrium spectra, disease models with multiple rare risk variants and extensive neutral variation, and varying rates of missing genotypes. In all scenarios considered, existing methods using nonnegative single-variant test statistics had power comparable to or greater than two widely cited pooling tests. Moreover, in disease models with only rare risk variants, an existing method based on the maximum single-variant Cochran-Armitage trend chi-square statistic in the locus had power comparable to or greater than another existing method closely related to some recently proposed methods. 
We conclude that efficient locus-wide inference using single-variant test statistics should be reconsidered as a useful framework for devising powerful association tests in sequence data with rare variants.
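The two single-variant approaches named in the abstract (Bonferroni-corrected single-variant tests and the maximum Cochran-Armitage trend chi-square statistic) can be illustrated with a minimal sketch. It assumes additive genotype scores 0/1/2 and uses the standard identity that the trend chi-square equals N times the squared Pearson correlation between genotype and case status. The permutation step for the maximum statistic is an assumption made here for illustration, since the abstract does not say how its null distribution was obtained, and all function names are hypothetical.

```python
import math
import random

def ca_trend_chisq(genotypes, status):
    """Cochran-Armitage trend chi-square (1 df), additive scores 0/1/2.

    Computed as N * r^2, where r is the Pearson correlation between
    minor-allele count and case status (1 = case, 0 = control).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_y = sum(status) / n
    sgy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, status))
    sgg = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((y - mean_y) ** 2 for y in status)
    if sgg == 0 or syy == 0:
        return 0.0  # monomorphic variant or one-class sample: no information
    return n * sgy * sgy / (sgg * syy)

def chisq1_pvalue(x):
    """Upper-tail p-value of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))

def bonferroni_locus_pvalue(variants, status):
    """Locus-wide p-value: smallest single-variant p times the variant count."""
    pvals = [chisq1_pvalue(ca_trend_chisq(v, status)) for v in variants]
    return min(1.0, len(pvals) * min(pvals))

def max_stat_permutation_pvalue(variants, status, n_perm=999, seed=0):
    """Permutation p-value for the maximum single-variant statistic.

    Assumption: case/control labels are exchangeable under the null.
    """
    rng = random.Random(seed)
    observed = max(ca_trend_chisq(v, status) for v in variants)
    labels = list(status)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if max(ca_trend_chisq(v, labels) for v in variants) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)
```

Here `variants` is a list with one genotype list per variant, each aligned with `status`. Because each single-variant statistic is nonnegative, a protective or neutral variant cannot cancel the contribution of a risk variant, which is the robustness property the abstract emphasizes.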

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

**Figure 1. MAF and within-gene pairwise LD distributions in actual sequence data.**
Distributions of MAFs (Panel A) and within-gene pairwise LD (Panel B) for biallelic variants in six candidate genes for dilated cardiomyopathy. Pairwise LD was measured by the correlation coefficient (r) between major/minor alleles for variants within the same gene. These distributions were estimated from 184 Coriell samples of European descent. The vertical dashed line in Panel B indicates r = 0.
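As a rough illustration of the LD measure in Panel B, the correlation coefficient between alleles at two biallelic variants can be computed from haplotypes coded 1 for the minor allele and 0 for the major allele. The sketch below assumes phased haplotypes are available; the caption does not say how r was estimated from the Coriell samples, and the function name is hypothetical.

```python
import math

def allelic_correlation(hap_a, hap_b):
    """Pairwise LD r between two biallelic variants.

    hap_a and hap_b are equal-length lists of 0/1 allele codes
    (1 = minor allele), one entry per phased haplotype.
    r = D / sqrt(pA * (1 - pA) * pB * (1 - pB)), with D = pAB - pA * pB.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n
    denom = math.sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))
    if denom == 0:
        raise ValueError("r is undefined when either variant is monomorphic")
    return (p_ab - p_a * p_b) / denom
```

A positive r means the minor alleles tend to occur on the same haplotype, and a negative r means they tend to occur on different haplotypes, which is why the r = 0 reference line in Panel B is informative.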

**Figure 2. Analytic power comparisons in a small sample (N = 500).**
Analytic locus-wide power at α = 0.05 of the BC-CA (lower bound), collapsing, and summing tests at a locus comprising one neutral and one risk variant as a function of the pairwise correlation coefficient between major/minor alleles (r). The variants had the same MAF = 0.005 (Panel A) or MAF = 0.01 (Panel B), and the relative risk was 3 (Panel A) or 2 (Panel B) for each additional minor allele at the risk variant. Both panels assume penetrance of 0.05 for the major allele homozygote at the risk variant and a balanced case-control sample with N = 500 total subjects.
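The disease model in this caption can be made concrete with a short sketch: a baseline penetrance for the major allele homozygote is multiplied by the relative risk once per additional minor allele, and Bayes' rule then gives genotype frequencies among cases and controls. Hardy-Weinberg proportions in the population are an assumption added here for illustration, and the function names are hypothetical.

```python
def penetrances(baseline, relative_risk):
    """Penetrance for 0, 1, and 2 minor alleles at the risk variant."""
    return [baseline * relative_risk ** k for k in range(3)]

def hwe_genotype_freqs(maf):
    """Population genotype frequencies, assuming Hardy-Weinberg proportions."""
    q = maf
    return [(1 - q) ** 2, 2 * q * (1 - q), q ** 2]

def case_control_genotype_freqs(maf, baseline, relative_risk):
    """Genotype frequencies in cases and in controls via Bayes' rule."""
    f = penetrances(baseline, relative_risk)
    p = hwe_genotype_freqs(maf)
    prevalence = sum(fi * pi for fi, pi in zip(f, p))
    cases = [fi * pi / prevalence for fi, pi in zip(f, p)]
    controls = [(1 - fi) * pi / (1 - prevalence) for fi, pi in zip(f, p)]
    return cases, controls

# Panel A settings from the caption: MAF = 0.005, baseline 0.05, RR = 3.
cases_a, controls_a = case_control_genotype_freqs(0.005, 0.05, 3)
```

The gap between `cases_a` and `controls_a` at such a low MAF is what any test, pooled or single-variant, has to detect with only 250 cases and 250 controls.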

**Figure 3. Simulated type I error rate comparison.**
Monte Carlo estimates of rejection rates for each association testing procedure based on 1,000 samples from a null disease model with no risk variants. Estimates are reported by call rate, nominal α level, and sample size (N). Error bars represent exact binomial 95% confidence intervals for the rejection rate, and dashed horizontal lines are included at the nominal α level. The CMC could not be performed at a call rate of 95% because no individual had complete genotype data in any sample; at a call rate of 99.5%, CMC results with F ddf>4 were available in 23, 619, and 992 samples for N = 500, 1,000, and 2,000, respectively.
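The error bars described here are exact binomial intervals for a rejection rate estimated from Monte Carlo replicates. A minimal standard-library sketch of the Clopper-Pearson construction is shown below; it solves for the interval endpoints by bisection on the binomial CDF, which is a choice made here for self-containment rather than the authors' implementation, and the function names are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k < 0:
        return 0.0
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1)
                   - math.lgamma(n - i + 1)
                   + i * math.log(p) + (n - i) * math.log1p(-p))
        total += math.exp(log_pmf)
    return min(total, 1.0)

def _bisect_root(f, lo, hi, iters=60):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def exact_binomial_ci(rejections, n_samples, alpha=0.05):
    """Clopper-Pearson interval for a rejection rate."""
    x, n = rejections, n_samples
    if x == 0:
        lower = 0.0
    else:
        # Lower bound p solves P(X >= x | p) = alpha / 2.
        lower = _bisect_root(
            lambda p: 1.0 - binom_cdf(x - 1, n, p) - alpha / 2, 0.0, x / n)
    if x == n:
        upper = 1.0
    else:
        # Upper bound p solves P(X <= x | p) = alpha / 2.
        upper = _bisect_root(
            lambda p: binom_cdf(x, n, p) - alpha / 2, x / n, 1.0)
    return lower, upper
```

For example, `exact_binomial_ci(50, 1000)` gives the interval for a test that rejected in 50 of 1,000 null samples; a procedure controls type I error at the nominal level if such an interval covers α.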

**Figure 4. Simulated power comparison for rare risk variants (MAF<0.005; OR = 3).**
Monte Carlo estimates of rejection rates for each association testing procedure based on 1,000 samples from a disease model with 50 rare risk variants (MAF<0.005; OR = 3), which represent ∼5% of all variants in the locus in the average population. Estimates are reported by call rate, nominal α level, and sample size (N). Error bars represent exact binomial 95% confidence intervals for the rejection rate. The CMC could not be performed at a call rate of 95% because no individual had complete genotype data in any sample; at a call rate of 99.5%, CMC results with F ddf>4 were available in 22, 596, and 991 samples for N = 500, 1,000, and 2,000, respectively.

**Figure 5. Simulated power comparison for rare risk variants (MAF<0.01; OR = 2).**
Monte Carlo estimates of rejection rates for each association testing procedure based on 1,000 samples from a disease model with 50 rare risk variants (MAF<0.01; OR = 2), which represent ∼5% of all variants in the locus in the average population. Estimates are reported by call rate, nominal α level, and sample size (N). Error bars represent exact binomial 95% confidence intervals for the rejection rate. The CMC could not be performed at a call rate of 95% because no individual had complete genotype data in any sample; at a call rate of 99.5%, CMC results with F ddf>4 were available in 18, 573, and 986 samples for N = 500, 1,000, and 2,000, respectively.

**Figure 6. Simulated power comparison for a mixture of rare and common risk variants.**
Monte Carlo estimates of rejection rates for each association testing procedure based on 1,000 samples from a disease model with 50 total risk variants, which represent ∼5% of all variants in the locus in the average population, randomly allocated between rare variants (MAF<0.01; OR = 2), low-frequency variants (0.01≤MAF<0.05; OR = 1.5), and common variants (0.05≤MAF<0.10; OR = 1.2). Estimates are reported by call rate, nominal α level, and sample size (N). Error bars represent exact binomial 95% confidence intervals for the rejection rate. The CMC could not be performed at a call rate of 95% because no individual had complete genotype data in any sample; at a call rate of 99.5%, CMC results with F ddf>4 were available in 15, 564, and 989 samples for N = 500, 1,000, and 2,000, respectively.

Cited by

A hybrid likelihood model for sequence-based disease association studies.
Chen YC, Carter H, Parla J, Kramer M, Goes FS, Pirooznia M, Zandi PP, McCombie WR, Potash JB, Karchin R. PLoS Genet. 2013;9(1):e1003224. doi: 10.1371/journal.pgen.1003224. Epub 2013 Jan 24. PMID: 23358228.
Linear regression in genetic association studies.
Bůžková P. PLoS One. 2013;8(2):e56976. doi: 10.1371/journal.pone.0056976. Epub 2013 Feb 21. PMID: 23437286.
Single-variant and multi-variant trend tests for genetic association with next-generation sequencing that are robust to sequencing error.
Kim W, Londono D, Zhou L, Xing J, Nato AQ, Musolf A, Matise TC, Finch SJ, Gordon D. Hum Hered. 2012;74(3-4):172-83. doi: 10.1159/000346824. Epub 2013 Apr 11. PMID: 23594495.
The Complex and Diverse Genetic Architecture of Dilated Cardiomyopathy.
Hershberger RE, Cowan J, Jordan E, Kinnamon DD. Circ Res. 2021 May 14;128(10):1514-1532. doi: 10.1161/CIRCRESAHA.121.318157. Epub 2021 May 13. PMID: 33983834. Review.
The impact of population demography and selection on the genetic architecture of complex traits.
Lohmueller KE. PLoS Genet. 2014 May 29;10(5):e1004379. doi: 10.1371/journal.pgen.1004379. eCollection 2014. PMID: 24875776.
