Nat Genet. 2012 May 29;44(6):623-30.
doi: 10.1038/ng.2303.

Exome sequencing and the genetic basis of complex traits

Adam Kiezun et al. Nat Genet.

Abstract

Exome sequencing is emerging as a popular approach to study the effect of rare coding variants on complex phenotypes. The promise of exome sequencing is grounded in theoretical population genetics and in empirical successes of candidate gene sequencing studies. Many projects aimed at common diseases are underway, and their results are eagerly anticipated. In this Perspective, using exome sequencing data from 438 individuals, we discuss several aspects of exome sequencing studies that we view as particularly important. We review processing and quality control of raw sequence data, evaluate the statistical properties of exome sequencing studies, discuss rare variant burden tests to detect association to phenotypes, and demonstrate the importance of accounting for population stratification in the analysis of rare variants. We conclude that enthusiasm for exome sequencing studies of complex traits should be combined with the caution that thousands of samples may be required to reach sufficient statistical power.
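The rare variant burden tests mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a generic fixed-threshold collapsing test, in which the 5% frequency cutoff, the function names `burden_scores` and `permutation_pvalue`, and the 0/1/2 genotype coding (counting the minor allele) are all assumptions made for illustration.

```python
import random

def burden_scores(genotypes, maf_cutoff=0.05):
    """Per-individual count of rare alleles in one gene.

    genotypes: one list per individual holding 0/1/2 allele counts,
    one entry per variant (assumed to count the minor allele).
    """
    n = len(genotypes)
    n_variants = len(genotypes[0])
    rare = []
    for j in range(n_variants):
        freq = sum(g[j] for g in genotypes) / (2 * n)
        if 0 < freq < maf_cutoff:  # keep only variants below the cutoff
            rare.append(j)
    return [sum(g[j] for j in rare) for g in genotypes]

def permutation_pvalue(scores, is_case, n_perm=1000, seed=0):
    """One-sided p-value for excess rare-allele burden in cases,
    obtained by shuffling case/control labels."""
    rng = random.Random(seed)

    def stat(labels):
        return sum(s for s, c in zip(scores, labels) if c)

    observed = stat(is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if stat(labels) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Toy data (fabricated for illustration only): 40 individuals, 20 cases.
# Two rare variants are carried by six cases; a third variant is common.
genotypes = [[0, 0, 1] for _ in range(40)]
for i in (0, 1, 2):
    genotypes[i][0] = 1
for i in (3, 4, 5):
    genotypes[i][1] = 1
is_case = [True] * 20 + [False] * 20

scores = burden_scores(genotypes)
p_value = permutation_pvalue(scores, is_case)
```

Collapsing variants within a gene pools signal that no single rare variant could carry alone, and label permutation avoids relying on asymptotic distributions that may be unreliable at low allele counts.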

Conflict of interest statement

Competing Interests: The authors declare that they have no competing financial interests.

Figures

Figure 1
Discovery of novel variants for increasing numbers of samples. For each functional class, the fold-increase over the number of variants in one sample for that class is plotted as a function of the number of samples in a sequencing experiment. For example, the number of nonsense variants discovered in 300 samples is 40 times greater than the average number discovered in a single sample while the number of synonymous variants is only 10 times greater (although the absolute number of nonsense variants is a relatively minor proportion of the total variation discovered); this effect is due to purifying selection. All classes of variants are discovered at rates exceeding what would be predicted under a neutral model of evolution in a population of constant size, an effect of population growth. The crossing between curves for synonymous variants and the theoretical prediction most likely is a signature of the out-of-Africa bottleneck. See Methods for additional details.
Figure 2
Association analysis. (a) Q-Q plot of association p-values under the null hypothesis. (b) Distributions of lowest p-values under whole-exome permutations. The histograms show the distributions of the lowest p-values across permutations for the T5 test. The red vertical line indicates the 0.05 exome-wide significance level for the most significant gene (i.e., the most significant gene is exome-wide significant if its p-value is lower than the level indicated by the red line).
Figure 3
Extrapolation of gene burden results. Horizontal solid red line shows Bonferroni genome-wide significance threshold of P = 2.5 × 10⁻⁶. Horizontal dashed line shows the threshold derived from whole-exome permutations (Figure 2b). For larger sample sizes, the permutation threshold would be closer to the Bonferroni threshold, asymptotically approaching it as the sample sizes increase.
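The two significance thresholds described in the captions of Figures 2 and 3 can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the count of 20,000 tests is inferred only from the arithmetic 0.05 / 20,000 = 2.5 × 10⁻⁶ and is not stated in the captions, the function names are hypothetical, and the simulated null assumes independent genes with continuous uniform p-values, which corresponds to the large-sample limit rather than to real exome data.

```python
import math
import random

def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests

def permutation_threshold(min_pvalues, alpha=0.05):
    """Empirical alpha-quantile of the lowest p-value seen in each permutation."""
    ordered = sorted(min_pvalues)
    # Number of permutation minima allowed at or below the threshold.
    k = math.floor(alpha * len(ordered) + 1e-9)
    return ordered[max(k - 1, 0)]

def simulate_null_minima(n_genes, n_perm, seed=0):
    """Draw the minimum of n_genes independent uniform p-values, n_perm times.

    Uses the inverse CDF of the minimum, 1 - (1 - u)**(1 / n_genes),
    written with log1p/expm1 for numerical stability.
    """
    rng = random.Random(seed)
    return [-math.expm1(math.log1p(-rng.random()) / n_genes)
            for _ in range(n_perm)]

n_genes = 20_000  # assumption: implied by 0.05 / 2.5e-6, not stated in the source
bonf = bonferroni_threshold(0.05, n_genes)
perm = permutation_threshold(simulate_null_minima(n_genes, n_perm=2000))
```

Under these idealized assumptions the two thresholds nearly coincide, which matches the asymptotic behavior described in the Figure 3 caption. With real data at small sample sizes, the permutation threshold can differ from the Bonferroni value.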
