PLoS Comput Biol. 2010 Oct 14;6(10):e1000954.

doi: 10.1371/journal.pcbi.1000954.

A covering method for detecting genetic associations between rare variants and common phenotypes

Gaurav Bhatia¹, Vikas Bansal, Olivier Harismendy, Nicholas J Schork, Eric J Topol, Kelly Frazer, Vineet Bafna

PMID: 20976246
PMCID: PMC2954823
DOI: 10.1371/journal.pcbi.1000954

Gaurav Bhatia et al. PLoS Comput Biol. 2010.

Affiliation

¹ Department of Computer Science and Engineering, University of California San Diego, La Jolla, California, USA. gbhatia@mit.edu

Abstract

Genome-wide association (GWA) studies, which test for association between common genetic markers and a disease phenotype, have shown varying degrees of success. While many factors could confound GWA studies, we focus on the possibility that multiple rare variants (RVs) may act in concert to influence disease etiology. Here, we describe RareCover, an algorithm for RV analysis. The algorithm combines a disparate collection of RVs with low effect and modest penetrance, and it does not require that the rare variants be adjacent in location. Extensive simulations over a range of assumed penetrance and population attributable risk (PAR) values illustrate the power of our approach over other published methods, including the collapsing and weighted-collapsing strategies. To showcase the method, we apply RareCover to re-sequencing data from a cohort of 289 individuals at the extremes of the body mass index (BMI) distribution (NCT00263042). Individual samples were re-sequenced at two genes, FAAH and MGLL, known to be involved in endocannabinoid metabolism (187 kbp for 148 obese individuals and 150 controls). The RareCover analysis identifies exactly one significantly associated region in each gene, each about 5 kbp long and located in the gene's upstream regulatory region. The data suggest that the RVs help disrupt the expression of the two genes, leading to lowered metabolism of the corresponding cannabinoids. Overall, our results point to the power of including RVs in measuring genetic associations.
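The abstract describes RareCover only at a high level. Below is a minimal sketch of one way a greedy covering search over non-adjacent RVs could work. It assumes (i) genotypes coded as minor-allele counts per individual and site, (ii) a Pearson chi-square statistic on a 2x2 carrier-by-status table for the "union" of selected sites, and (iii) a stopping rule of "stop when no remaining site raises the statistic". The function names `chi2_2x2`, `union_statistic`, and `greedy_cover` are hypothetical, and this is an illustration, not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def chi2_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square for the 2x2 table [[a, b], [c, d]]
    (rows: cases, controls; columns: carriers, non-carriers)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def union_statistic(genotypes: Sequence[Sequence[int]],
                    is_case: Sequence[bool],
                    selected: Sequence[int]) -> float:
    """Chi-square of the hypothetical 'union variant': an individual is a
    carrier if it has a minor allele at ANY selected site, adjacent or not."""
    a = b = c = d = 0
    for row, case in zip(genotypes, is_case):
        carrier = any(row[j] > 0 for j in selected)
        if case and carrier:
            a += 1
        elif case:
            b += 1
        elif carrier:
            c += 1
        else:
            d += 1
    return chi2_2x2(a, b, c, d)


def greedy_cover(genotypes: Sequence[Sequence[int]],
                 is_case: Sequence[bool],
                 min_gain: float = 1e-9) -> Tuple[List[int], float]:
    """Repeatedly add the site giving the largest union statistic; stop when
    no remaining site improves the current best by more than min_gain."""
    n_sites = len(genotypes[0]) if genotypes else 0
    selected: List[int] = []
    best = 0.0
    while True:
        choice, choice_score = None, best
        for j in range(n_sites):
            if j in selected:
                continue
            score = union_statistic(genotypes, is_case, selected + [j])
            if score > choice_score + min_gain:
                choice, choice_score = j, score
        if choice is None:
            return selected, best
        selected.append(choice)
        best = choice_score


# Toy data: sites 0 and 2 are each carried by two cases; site 1 by one control.
genotypes = [
    [1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 0, 1],  # cases
    [0, 1, 0], [0, 0, 0], [0, 0, 0], [0, 0, 0],  # controls
]
is_case = [True] * 4 + [False] * 4
sites, score = greedy_cover(genotypes, is_case)
```

On this toy input the search selects sites 0 and 2, two non-adjacent variants that are weak individually but together cover every case. The chi-square used here is two-sided, so a real analysis would also need a rule for variants enriched in controls.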

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Figure 1. Permutation p-values versus the statistic value on the union variant.**
The mean empirical p-values (obtained by permuting cases and controls) are plotted against each value of the statistic, collected over many tests spanning the entire range of simulation parameters (sample size, locus PAR, and penetrance). Because the union variant is the most significant of many possible subsets, the theoretical p-value suggested by the statistic's null distribution cannot be used directly. However, the plot shows that the locus statistic correlates tightly with the empirical p-value, implying that the union statistic can be used to filter significant windows with no loss of power. The saturation at the ends is due to the limited number of permutation trials.
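Because the union statistic is the best of many subsets, its null distribution has to come from permutation. A generic sketch of an empirical permutation p-value is below; the `stat` argument is assumed to recompute the whole statistic (including any subset search) from the labels it receives, so the selection step is accounted for. The `(hits + 1) / (n_perm + 1)` convention is one common choice, assumed here; it also explains the saturation, since no p-value below `1 / (n_perm + 1)` can be reported. All names are hypothetical.

```python
import random
from typing import Callable, List, Sequence


def permutation_p_value(stat: Callable[[Sequence[bool]], float],
                        is_case: Sequence[bool],
                        n_perm: int = 1000,
                        seed: int = 0) -> float:
    """Empirical p-value of stat(is_case) against case/control label permutations."""
    rng = random.Random(seed)
    observed = stat(is_case)
    labels: List[bool] = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # a fresh random permutation of the labels
        if stat(labels) >= observed:
            hits += 1
    # Counting the observed labeling as one permutation means the smallest
    # reportable value is 1 / (n_perm + 1), i.e. the saturation floor.
    return (hits + 1) / (n_perm + 1)


# Toy usage: the first 10 of 20 individuals carry a rare allele, and exactly
# those 10 are cases. The statistic counts cases among carriers.
carrier = [True] * 10 + [False] * 10
is_case = [True] * 10 + [False] * 10


def cases_among_carriers(labels: Sequence[bool]) -> float:
    return float(sum(1 for c, y in zip(carrier, labels) if c and y))


p = permutation_p_value(cases_among_carriers, is_case, n_perm=200)
```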

**Figure 2. Power of RV analyses, tested over different values of penetrance, PAR, and number of individuals (cases+controls).**
For each choice of parameters, a set of test cases was simulated. Each test case was analyzed with each method, and the p-value was computed from permutations of the case and control labels. A score is considered significant only if it is higher than all permuted values. The power of a test is the fraction of test cases with a significant score. RareCover dominates the other methods, showing greater power for every choice of parameters. For all methods, power increases with increasing penetrance, PAR, or sample size.
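The significance rule and power definition above reduce to simple counting. A sketch, assuming each simulated test case is summarized as its observed score plus the list of scores from its permuted labelings (this data layout and the function names are hypothetical):

```python
from typing import Sequence, Tuple


def is_significant(observed: float, permuted: Sequence[float]) -> bool:
    """Significant only if strictly higher than every permuted score."""
    return all(observed > s for s in permuted)


def empirical_power(results: Sequence[Tuple[float, Sequence[float]]]) -> float:
    """Fraction of simulated test cases whose observed score is significant."""
    if not results:
        raise ValueError("need at least one simulated test case")
    n_sig = sum(is_significant(obs, perms) for obs, perms in results)
    return n_sig / len(results)
```

With N permutations, beating every permuted score is equivalent to the empirical p-value sitting at its floor of 1/(N+1) under the "+1" convention, so this is a stringent criterion.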

**Figure 3. Comparisons between causal RVs and RVs recovered by RareCover.**
The plotted quantities are the raw number of causal RVs, the number of RVs recovered, their intersection, and the fraction of causal RVs recovered (scaled for exposition). The fraction of causal RVs recovered remains stable over a wide range of sample populations.
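The four quantities in Figure 3 follow directly from two sets of variant identifiers. A sketch, assuming causal and recovered RVs are given as site IDs (a hypothetical representation; `recovery_summary` is an illustrative name):

```python
from collections.abc import Hashable, Iterable
from typing import Dict


def recovery_summary(causal: Iterable[Hashable],
                     recovered: Iterable[Hashable]) -> Dict[str, float]:
    """Counts of causal, recovered, and shared RVs plus the recovered fraction."""
    causal_set, recovered_set = set(causal), set(recovered)
    overlap = causal_set & recovered_set
    return {
        "causal": len(causal_set),
        "recovered": len(recovered_set),
        "intersection": len(overlap),
        # |causal & recovered| / |causal|; defined as 0 when there are no causal RVs
        "fraction_recovered": len(overlap) / len(causal_set) if causal_set else 0.0,
    }
```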

**Figure 4. Power calculations on populations with a bottleneck followed by recent expansion.**
Simulated population data with quantitative trait (QT) values were provided by Kryukov et al. The QT values are normally distributed; individuals carrying any causal mutation have QT values drawn from a normal distribution with a shifted mean. The shift is characterized as Low, Medium, or High. Because the locus PAR values are low, power is computed as the fraction of simulations that reached significance at a fixed p-value threshold. Individuals were chosen from the lower (control) and upper (case) tails of the QT distribution, and the power of all methods is compared using two different tail sizes. RareCover has the highest power, comparable to the power obtained from the causal mutations themselves.
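The extreme-tail design can be sketched as follows. This assumes non-carriers have QT values drawn from a standard normal distribution and carriers from a normal distribution with the same variance and a mean moved by a hypothetical `shift` parameter; the actual shift values and tail sizes behind the figure are not given here.

```python
import random
from typing import List, Sequence, Tuple


def simulate_qt(carrier: Sequence[bool], shift: float,
                seed: int = 0) -> List[float]:
    """QT ~ N(0, 1) for non-carriers and N(shift, 1) for carriers (assumed model)."""
    rng = random.Random(seed)
    return [rng.gauss(shift if c else 0.0, 1.0) for c in carrier]


def extreme_tails(qt: Sequence[float],
                  tail_fraction: float) -> Tuple[List[int], List[int]]:
    """Return (case indices, control indices) from the upper and lower QT tails."""
    k = int(len(qt) * tail_fraction)
    if k == 0 or 2 * k > len(qt):
        raise ValueError("tail_fraction must select 1..n/2 individuals per tail")
    order = sorted(range(len(qt)), key=lambda i: qt[i])  # ascending QT
    controls = order[:k]  # lowest QT values
    cases = order[-k:]    # highest QT values
    return cases, controls
```

Sampling only the tails enriches cases for carriers of the mean-shifting mutations, which is why the tail size affects power.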

**Figure 5. Allele frequency spectra in various demographic models.**
BRE refers to the simulation of a population under a bottleneck followed by recent expansion, from Kryukov et al.; CP refers to the simulation under a constant population size. In CP, the allele frequencies in cases are biased toward rare variants, while BRE shows little bias. The performance of RareCover is robust across data sets with different allele frequency spectra.
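A sketch of how per-group allele frequency spectra could be tabulated for a comparison like this one. It assumes diploid genotypes coded as minor-allele counts (0, 1, 2) and uses hypothetical frequency-bin edges; the binning used for the figure is not specified here.

```python
import bisect
from typing import List, Sequence


def allele_frequencies(genotypes: Sequence[Sequence[int]],
                       rows: Sequence[int]) -> List[float]:
    """Minor-allele frequency at each site within the given individuals."""
    n_sites = len(genotypes[0])
    n_chrom = 2 * len(rows)  # diploid: two chromosomes per individual
    return [sum(genotypes[i][j] for i in rows) / n_chrom
            for j in range(n_sites)]


def frequency_spectrum(freqs: Sequence[float],
                       edges: Sequence[float] = (0.01, 0.05, 0.5)) -> List[int]:
    """Count polymorphic sites in bins (0, e0], (e0, e1], ..., (e_last, 1]."""
    counts = [0] * (len(edges) + 1)
    for f in freqs:
        if f == 0.0:
            continue  # monomorphic within this group
        counts[bisect.bisect_left(edges, f)] += 1
    return counts
```

Comparing `frequency_spectrum` for the case rows against the control rows shows whether cases are enriched for rare variants.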

**Figure 6. Running time of RareCover as a function of sample size and number of SNPs.**
Because RareCover is a greedy approach, its running time increases linearly with the number of SNPs and the number of individuals. The running time shown here excludes disk input and output of the data, which adds a fixed cost to each run. The total running time is about twice that of single-marker tests.

**Figure 7. FAAH locus association.**
RareCover was used to analyze overlapping windows in the re-sequenced region around FAAH. A p-value was computed for each window using permutations of cases and controls; each point corresponds to the p-value of a single window starting at that location. The most significant window (marked by the box) lies upstream of the FAAH transcription start site. The region is part of an LTR element (such elements are known to carry regulatory signals) and is enriched in transcription factor binding sites, suggesting a regulatory role for the rare variants.
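The per-window scan can be sketched as a loop over overlapping windows. The window length, step, and per-window test below are hypothetical placeholders; in practice the test would be the RareCover statistic with a permutation p-value.

```python
from typing import Callable, Iterator, List, Sequence, Tuple


def overlapping_windows(positions: Sequence[int], window: int,
                        step: int) -> Iterator[Tuple[int, List[int]]]:
    """Yield (start, site indices) for half-open windows [start, start + window)."""
    if not positions or window <= 0 or step <= 0:
        return
    start, last = min(positions), max(positions)
    while start <= last:
        sites = [i for i, p in enumerate(positions)
                 if start <= p < start + window]
        yield start, sites
        start += step  # step < window makes consecutive windows overlap


def scan(positions: Sequence[int], window: int, step: int,
         test: Callable[[List[int]], float]
         ) -> Tuple[List[Tuple[int, float]], Tuple[int, float]]:
    """Return (start, p-value) for every non-empty window and the most significant one."""
    results = [(start, test(sites))
               for start, sites in overlapping_windows(positions, window, step)
               if sites]
    if not results:
        raise ValueError("no window contains any site")
    best = min(results, key=lambda r: r[1])
    return results, best
```

Reporting the minimum p-value over many overlapping windows is itself a multiple-testing step, so the best window's p-value should be interpreted with that in mind.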
