Review
Nat Rev Genet. 2010 Nov;11(11):773-85.
doi: 10.1038/nrg2867. Epub 2010 Oct 13.

Statistical analysis strategies for association studies involving rare variants

Vikas Bansal et al. Nat Rev Genet. 2010 Nov.

Abstract

The limitations of genome-wide association (GWA) studies that focus on the phenotypic influence of common genetic variants have motivated human geneticists to consider the contribution of rare variants to phenotypic expression. The increasing availability of high-throughput sequencing technologies has enabled studies of rare variants, but these technologies alone will not be sufficient for such studies to succeed, as appropriate analytical methods are also needed. We consider data analysis approaches to testing associations between a phenotype and collections of rare variants in a defined genomic region or set of regions. Ultimately, although a wide variety of analytical approaches exist, more work is needed to refine them and determine their properties and power in different contexts.

Figures

Figure 1. Sample size requirements and statistical power for variants of different frequencies
(A) Sample sizes necessary to detect an association between an allele with a specific effect size and a binary trait. The plots assume a standard z-test for the difference in the frequency of the allele between the two phenotypic categories. A genome-wide type I error rate of 10^-9 was assumed, under the assumption that one may perform 2 orders of magnitude more tests in a complete sequence-based GWAS than a standard GWAS.
(B) Similar setting to that provided in Figure 1A except the effect size depicted on the x axis gives the ratio of the frequency of the allele in the case vs. control groups. These curves give insight into the power gains associated with the collapsing strategy. Consider the black line in Figure 1B and testing a single rare variant with a frequency of 0.01 in the controls and 0.02 in the cases. This difference would require approximately 250,000 cases and controls to detect with 80% power at a super genome-wide level of significance. However, if one were to test 5 such variants with the same frequencies after collapsing them (assuming they are independent and no individual has more than one such variant), then one would effectively be testing a 0.05 frequency among the controls and a 0.10 frequency among the cases. From the red line in Figure 1B this difference would require only 3000 cases and controls.
(C) Power to detect a quantitative trait locus with a sample of 1000 individuals as a function of fraction of phenotypic variation explained by the locus via standard linear regression analysis. A genome-wide type I error rate of 10^-9 was assumed.
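The collapsing argument in panel B can be sketched numerically. The snippet below is a minimal illustration, not the authors' calculation: it uses a common textbook approximation for the per-group sample size of a two-sided two-proportion z-test, and the function names `n_per_group` and `collapse` are hypothetical. The caption does not state the figure's exact test variant or counting unit, so the output should not be expected to reproduce the figure's values; the sketch only shows that collapsing lowers the required sample size.

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_per_group(p_control, p_case, alpha=1e-9, power=0.80):
    """Approximate per-group sample size for a two-sided two-proportion z-test.

    Textbook normal approximation; an assumption, not the figure's exact method.
    """
    std_normal = NormalDist()
    z_alpha = -std_normal.inv_cdf(alpha / 2)  # critical value for the type I error rate
    z_beta = std_normal.inv_cdf(power)        # quantile matching the target power
    p_bar = (p_control + p_case) / 2
    null_sd = sqrt(2 * p_bar * (1 - p_bar))
    alt_sd = sqrt(p_control * (1 - p_control) + p_case * (1 - p_case))
    effect = p_case - p_control
    return ceil((z_alpha * null_sd + z_beta * alt_sd) ** 2 / effect ** 2)


def collapse(freq, k):
    """Collapsed frequency of k variants that each have frequency `freq`.

    Assumes, as the caption does, that the variants are independent and that
    no individual has more than one of them, so the frequencies simply add.
    """
    return k * freq


# One rare variant: 0.01 in controls vs. 0.02 in cases.
single = n_per_group(0.01, 0.02)

# Five such variants collapsed: 0.05 in controls vs. 0.10 in cases.
collapsed = n_per_group(collapse(0.01, 5), collapse(0.02, 5))

print(f"single variant:     {single} per group")
print(f"collapsed variants: {collapsed} per group")
```

Under this approximation the frequency ratio stays at 2 after collapsing, but the absolute frequency difference grows fivefold, which is what drives the reduction in required sample size.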
Figure 2. Scenarios in which DNA sequence variants distinguish cases and controls
Blue lines indicate genomic regions; red boxes indicate variants. (A) Variants at a single locus with common alleles are more frequent in cases than controls. (B) Multiple rare variations contribute to the phenotype such that the collective frequency of these variations is greater in cases. This would create a greater diversity of haplotypes or DNA sequences among the cases. (C) Multiple rare variations contribute to the phenotype, but act in a synergistic fashion such that cases are likely to have more similar DNA sequences compared to controls. (D) Multiple rare variations contribute to a phenotype, but the variations contributing to the phenotype reside in specific genomic regions. This situation would create greater sequence diversity among the cases, as in setting B, but only within the genomic regions of relevance.
