PLoS Comput Biol. 2010 Oct 14;6(10):e1000954.
doi: 10.1371/journal.pcbi.1000954.

A covering method for detecting genetic associations between rare variants and common phenotypes

Gaurav Bhatia et al. PLoS Comput Biol. 2010.

Abstract

Genome-wide association (GWA) studies, which test for association between common genetic markers and a disease phenotype, have shown varying degrees of success. While many factors could potentially confound GWA studies, we focus on the possibility that multiple rare variants (RVs) may act in concert to influence disease etiology. Here, we describe an algorithm for RV analysis, RareCover. The algorithm combines a disparate collection of RVs with low effect and modest penetrance. Further, it does not require that the rare variants be adjacent in location. Extensive simulations over a range of assumed penetrance and population attributable risk (PAR) values illustrate the power of our approach over other published methods, including the collapsing and weighted-collapsing strategies. To showcase the method, we apply RareCover to re-sequencing data from a cohort of 289 individuals at the extremes of the Body Mass Index distribution (NCT00263042). Individual samples were re-sequenced at two genes, FAAH and MGLL, known to be involved in endocannabinoid metabolism (187 Kbp for 148 obese and 150 controls). The RareCover analysis identifies exactly one significantly associated region in each gene, each about 5 Kbp in the upstream regulatory regions. The data suggest that the RVs help disrupt the expression of the two genes, leading to lowered metabolism of the corresponding cannabinoids. Overall, our results point to the power of including RVs in measuring genetic associations.
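
The abstract does not spell out how RareCover chooses which RVs to combine. As a minimal sketch only, assuming a greedy search that repeatedly adds the variant giving the largest gain in a Pearson chi-square statistic computed on the union of carriers, the idea might look like the following. The function names, the chi-square scoring, and the `min_gain` stopping threshold are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence, Set, Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence either way
    return n * (a * d - b * c) ** 2 / denom


def union_statistic(carriers: Set[int], is_case: Sequence[bool]) -> float:
    """Score carriers of the union variant against case/control status."""
    n_cases = sum(is_case)
    n_controls = len(is_case) - n_cases
    case_carriers = sum(1 for i in carriers if is_case[i])
    control_carriers = len(carriers) - case_carriers
    return chi_square_2x2(case_carriers, n_cases - case_carriers,
                          control_carriers, n_controls - control_carriers)


def greedy_cover(variants: List[Set[int]], is_case: Sequence[bool],
                 min_gain: float = 0.5) -> Tuple[List[int], float]:
    """Greedily add the variant that most increases the union statistic.

    variants[j] is the set of individual indices carrying variant j.
    min_gain is a hypothetical stopping threshold (an assumption).
    """
    chosen: List[int] = []
    carriers: Set[int] = set()
    best = 0.0
    while True:
        candidates = [(union_statistic(carriers | v, is_case), j)
                      for j, v in enumerate(variants) if j not in chosen]
        if not candidates:
            break
        score, j = max(candidates)
        if score - best < min_gain:
            break  # no remaining variant improves the score enough
        chosen.append(j)
        carriers |= variants[j]
        best = score
    return chosen, best
```

Because the selected variants are identified only by their carrier sets, nothing in this sketch requires them to be adjacent in location, which matches the property stated in the abstract.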

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Permutation p-values versus the [formula] statistic value on the union-variant [formula].
The mean of the empirical p-values (obtained by permuting cases and controls) was plotted against each value of the [formula] statistic obtained from many tests over the entire range of simulation parameters, varying sample size [formula], locus PAR, and penetrance. As [formula] is the most significant subset among many possible subsets, the theoretical p-value suggested by the [formula] distribution cannot be used directly. However, the plot shows that the locus [formula] value correlates tightly with the p-value, implying that the union [formula] statistic can be used to filter the significant windows with no loss of power. The saturation at the ends is due to the number of trials being limited to [formula].
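
The empirical p-value described in this caption can be illustrated with a short sketch. It assumes a generic `score_fn` that maps a list of case/control labels to a statistic, and it uses the common "(hits + 1) / (permutations + 1)" estimator. Both the function name and that estimator are assumptions for illustration, since the caption does not state the exact formula used.

```python
import random
from typing import Callable, Sequence


def permutation_p_value(score_fn: Callable[[Sequence[bool]], float],
                        is_case: Sequence[bool],
                        n_perm: int = 1000,
                        seed: int = 0) -> float:
    """Empirical p-value obtained by shuffling case/control labels."""
    rng = random.Random(seed)
    observed = score_fn(is_case)
    labels = list(is_case)  # copy so the caller's labels are untouched
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if score_fn(labels) >= observed:
            hits += 1
    # Add-one correction (an assumption): the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

With this estimator the smallest reportable p-value is 1 / (n_perm + 1), so very strong signals all receive the same value. That is one way to read the saturation the caption attributes to the limited number of trials.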
Figure 2. Power of RV analyses, tested over different values of penetrance [formula], PAR [formula], and [formula] individuals (cases + controls).
For each choice of parameters, [formula] test cases were simulated. Each test case was analyzed using [formula] methods, and the p-value was computed using [formula] permutations of cases and controls. The score is considered significant only if it is higher than all permuted values. The power of the test is the fraction of test cases that had a significant score. RareCover dominates the other methods, implying greater power over all choices of parameters. For all methods, power increases with an increase in [formula] or sample size.
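
The power calculation in this caption (a score counts as significant only if it beats every permuted value, and power is the fraction of significant test cases) can be sketched as below. The `simulate` callback, which returns a scoring function and the case/control labels for one simulated test case, is a hypothetical interface introduced for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

ScoreFn = Callable[[Sequence[bool]], float]


def exceeds_all_permutations(score_fn: ScoreFn, is_case: Sequence[bool],
                             n_perm: int, rng: random.Random) -> bool:
    """True only if the observed score is strictly higher than every permuted score."""
    observed = score_fn(is_case)
    labels = list(is_case)
    for _ in range(n_perm):
        rng.shuffle(labels)
        if score_fn(labels) >= observed:
            return False  # one permuted score matched or beat the observed one
    return True


def estimate_power(simulate: Callable[[random.Random], Tuple[ScoreFn, List[bool]]],
                   n_tests: int, n_perm: int, seed: int = 0) -> float:
    """Fraction of simulated test cases whose score is significant."""
    rng = random.Random(seed)
    significant = 0
    for _ in range(n_tests):
        score_fn, is_case = simulate(rng)
        if exceeds_all_permutations(score_fn, is_case, n_perm, rng):
            significant += 1
    return significant / n_tests
```

Stopping at the first permuted score that ties or beats the observed one saves work, since a single such permutation already rules out significance under this criterion.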
Figure 3. Comparisons between causal RVs and RVs recovered by RareCover.
The [formula]-axis describes the raw number of causal RVs ([formula]), RVs recovered ([formula]), their intersection, and the fraction recovered ([formula], scaled for exposition). Close to [formula] of the causal RVs are recovered over a wide range of sample populations.
Figure 4. Power calculations on populations with a bottleneck and recent expansion.
Simulated population data with quantitative trait (QT) values were provided by Kryukov et al. The QT values are normally distributed. Individuals carrying any causal mutation have QT values drawn from a normal distribution with a shifted mean. The shift is characterized as Low ([formula]), Medium ([formula]), and High ([formula]). As the locus PAR values are low, power is computed as the fraction of [formula] simulations that showed significance at p-value [formula]. Individuals were chosen from the lower (control) and upper (case) tails of the QT distribution. The power of all methods is compared using the [formula]% extremes ([formula] cases, [formula] controls) and the [formula]% extremes ([formula] cases and [formula] controls). RareCover is shown to have the highest power, comparable to the power of the causal mutations.
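
The extreme-phenotype sampling described in this caption can be sketched as follows. This is a toy stand-in, not the Kryukov et al. simulation: the unit variance, the carrier probability, and every numeric value used with these functions are hypothetical, because the actual shift sizes and tail percentages are not recoverable from the caption.

```python
import random
from typing import List, Tuple

Person = Tuple[float, bool]  # (QT value, carries a causal mutation)


def simulate_qt(n: int, carrier_prob: float, shift: float,
                rng: random.Random) -> List[Person]:
    """Draw QT values from N(0, 1); carriers are drawn from N(shift, 1)."""
    people = []
    for _ in range(n):
        carrier = rng.random() < carrier_prob
        mean = shift if carrier else 0.0
        people.append((rng.gauss(mean, 1.0), carrier))
    return people


def select_extremes(people: List[Person],
                    tail_fraction: float) -> Tuple[List[Person], List[Person]]:
    """Take the upper tail as cases and the lower tail as controls."""
    ranked = sorted(people, key=lambda p: p[0])
    k = int(len(ranked) * tail_fraction)
    if k == 0:
        return [], []  # guard: ranked[-0:] would return the whole list
    return ranked[-k:], ranked[:k]
```

For example, `select_extremes(simulate_qt(1000, 0.1, 2.0, random.Random(1)), 0.1)` returns 100 cases and 100 controls, with carriers concentrated among the cases because their QT values are shifted upward.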
Figure 5. Allele frequency spectra in various demographic models.
BRE refers to the simulation of a population under a bottleneck followed by recent expansion, from Kryukov et al.; CP refers to the simulation under a constant population size. The allele frequencies in CP are biased toward rare variants in cases, while there is little bias in BRE. The performance of RareCover is robust to data sets with different allele frequency spectra.
Figure 6. Running time of RareCover as a function of sample size and number of SNPs.
Because RareCover is a greedy approach, the running time increases linearly with the number of SNPs and individuals. The running time shown here does not include the time for disk input and output of the data, which adds a fixed cost of [formula] ms to each run. The total running time is about twice that of single-marker tests.
Figure 7. FAAH locus association.
RareCover was used to analyze overlapping windows of [formula] Kbp in the re-sequenced region around FAAH. A p-value was computed for each window using [formula] permutations of cases and controls. Each point corresponds to the p-value of a single window starting at that location. The most significant window (marked by the box) is [formula] Kbp upstream of the FAAH transcription start site. The region is part of an LTR element, a class of elements known to carry regulatory signals, and is enriched in transcription factor binding sites, suggesting a regulatory role for the rare variants.
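
The overlapping-window scan described in this caption can be sketched as below. The window size and step passed to these functions are hypothetical, since the values used for FAAH are not recoverable from the caption, and the helper names are introduced only for illustration.

```python
from typing import List, Sequence, Tuple


def overlapping_windows(start: int, end: int,
                        size: int, step: int) -> List[Tuple[int, int]]:
    """Half-open windows [lo, hi) of `size` bp, advancing by `step` bp."""
    windows = []
    pos = start
    while pos < end:
        windows.append((pos, min(pos + size, end)))
        if pos + size >= end:
            break  # the last window already reaches the end of the region
        pos += step
    return windows


def variants_in_window(positions: Sequence[int],
                       window: Tuple[int, int]) -> List[int]:
    """Indices of variants whose position falls inside the window."""
    lo, hi = window
    return [i for i, p in enumerate(positions) if lo <= p < hi]
```

Each window's variant indices would then be scored and given a permutation p-value, producing one point per window start as in the plot.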
