PLoS Genet. 2013 Apr;9(4):e1003443.
doi: 10.1371/journal.pgen.1003443. Epub 2013 Apr 11.

Analysis of rare, exonic variation amongst subjects with autism spectrum disorders and population controls

Li Liu et al. PLoS Genet. 2013 Apr.

Abstract

We report on results from whole-exome sequencing (WES) of 1,039 subjects diagnosed with autism spectrum disorders (ASD) and 870 controls selected from the NIMH repository to be of similar ancestry to cases. The WES data came from two centers that used different methods to produce sequence and to call variants from it. Therefore, an initial goal was to ensure that the distribution of rare variation was similar for data from the two centers. This proved straightforward by filtering called variants by fraction of missing data, read depth, and balance of alternative to reference reads. The filtering was evaluated using seven samples sequenced at both centers and using results from the association study. Next, we addressed how the data and/or results from the centers should be combined. Gene-based analyses of association were an obvious choice, but should statistics for association be combined across centers (meta-analysis), or should data be combined and then analyzed (mega-analysis)? Because of the nature of many gene-based tests, we showed by theory and simulations that mega-analysis has better power than meta-analysis. Finally, before analyzing the data for association, we explored the impact of population structure on rare variant analysis in these data. As in other recent studies, we found evidence that population structure can confound case-control studies through the clustering of rare variants in ancestry space; yet, unlike some recent studies, for these data we found that principal component-based analyses were sufficient to control for ancestry and produce test statistics with appropriate distributions. After using a variety of gene-based tests and both meta- and mega-analysis, we found no new risk genes for ASD in this sample. Our results suggest that standard gene-based tests will require much larger samples of cases and controls before being effective for gene discovery, even for a disorder like ASD.
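The abstract names three variant-level filters (fraction of missing data, read depth, and balance of alternative to reference reads) but does not give thresholds here. The following is a minimal Python sketch of how such a filter could be written. The `Call` record, the function name, and every threshold (at most 10% missing calls, mean depth of at least 10 reads, pooled heterozygous alt-read fraction between 0.25 and 0.75) are hypothetical placeholders, not the values used in the study.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Call:
    """One genotype call at a variant (hypothetical record layout)."""
    genotype: Optional[int]   # number of alt alleles (0, 1, 2); None if missing
    ref_reads: int = 0
    alt_reads: int = 0


def passes_qc(
    calls: Sequence[Call],
    max_missing: float = 0.10,                        # hypothetical threshold
    min_mean_depth: float = 10.0,                     # hypothetical threshold
    het_balance: Tuple[float, float] = (0.25, 0.75),  # hypothetical range
) -> bool:
    """Keep a variant only if it passes missingness, depth and allele-balance filters."""
    observed = [c for c in calls if c.genotype is not None]
    if not observed or (len(calls) - len(observed)) / len(calls) > max_missing:
        return False

    # Mean read depth over the non-missing calls.
    depths = [c.ref_reads + c.alt_reads for c in observed]
    if sum(depths) / len(depths) < min_mean_depth:
        return False

    # Pooled alt / (ref + alt) read fraction among heterozygous calls; ~0.5 is expected.
    hets = [c for c in observed if c.genotype == 1]
    het_reads = sum(c.ref_reads + c.alt_reads for c in hets)
    if het_reads > 0:
        balance = sum(c.alt_reads for c in hets) / het_reads
        low, high = het_balance
        if not low <= balance <= high:
            return False
    return True
```

Applying the same function, with the same thresholds, to both centers' call sets is what makes the filtered distributions comparable; in practice the thresholds would be tuned against samples sequenced at both centers.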

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Distribution of rare variants per gene in the Baylor and Broad data sets after filtering.
Minor allele counts (MAC) are restricted to variants with minor allele frequency formula image. Panel (A): distribution of the mean MAC per sample, averaged over all genes. Panel (B): in the Baylor samples, genes were binned by their counts of rare variants (which range from 1 to 30); for each bin, the vertical axis shows the distribution (boxplot) of counts for the same genes in the Broad samples. The red line indicates equal counts in Broad and Baylor.
Figure 2. Theoretical power comparison: meta- versus mega-analysis.
Theoretical power functions of meta- (red) and mega-analysis (blue) at a significance level of formula image. formula image is the strength of signal per variant and formula image is the number of rare variants. (A) formula image; (B) formula image; (C) formula image; and (D) formula image.
Figure 3. Simulation of power.
The empirical power comparisons of SKAT applied to Broad (blue), Baylor (green), and the data combined via mega- (red) and meta-analysis (orange). We use causal variants to generate the phenotype based on the model in Eqn. 1 with formula image. The causal rate is the fraction of variants with formula image; it varied from formula image20% to 50%. We choose weights formula image and use SKAT to calculate the p-values for the Baylor, Broad, and merged data sets, collapsing all singleton variants into one super-variant. For meta-analysis, the weighted Z-score method combines the two p-values from Baylor and Broad for each gene. In panel (A), formula image and the significance level is 0.001; in panel (B), formula image and the significance level is 0.01.
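For the meta-analysis arm, the caption combines each gene's Baylor and Broad p-values with the weighted Z-score method. Below is a minimal Python sketch of that combination (Stouffer's weighted Z). The caption does not state the weights, so weighting each center by the square root of its sample size is an assumption, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import Sequence

_STD_NORMAL = NormalDist()


def weighted_z_combine(p_values: Sequence[float], sample_sizes: Sequence[int]) -> float:
    """Stouffer's weighted Z: combine one p-value per center into a single p-value.

    Weights are sqrt(sample size), an assumption; the caption does not state them.
    """
    if not p_values or len(p_values) != len(sample_sizes):
        raise ValueError("need one sample size per p-value")
    weights = [sqrt(n) for n in sample_sizes]
    # Z_i = Phi^{-1}(1 - p_i), written as -Phi^{-1}(p_i) to keep precision for tiny p.
    # p is clipped away from 0 and 1 because inv_cdf requires 0 < p < 1.
    zs = [-_STD_NORMAL.inv_cdf(min(max(p, 1e-300), 1 - 1e-12)) for p in p_values]
    z = sum(w * zi for w, zi in zip(weights, zs)) / sqrt(sum(w * w for w in weights))
    return _STD_NORMAL.cdf(-z)   # upper-tail p-value of the combined Z
```

For example, two centers of equal size that each report p = 0.05 combine to roughly p = 0.01. Mega-analysis instead pools the genotypes and runs SKAT once on the merged data, so it works on the data rather than on per-center p-values.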
Figure 4. Q–Q plot of simulation tests under the assumption that linkage disequilibrium among rare variants has little impact on the distribution of the test statistic.
We select 144 genes from the Broad data set. Each gene has exactly formula image rare variants, formula image. For each gene, we first randomly assign phenotypes to the 913 samples by a coin toss, then calculate the test statistic formula image and its p-value under the assumption that formula image. Repeating this 100 times per gene yields more than 10,000 p-values.
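The Q-Q comparison in this figure (and in Figure 10) plots -log10 observed p-values against -log10 expected p-values. Below is a small Python sketch of how such plot coordinates are commonly computed. It assumes the plotting position (i - 0.5)/m for the i-th smallest of m expected uniform p-values; other conventions such as i/(m + 1) are also used, and the function name is hypothetical.

```python
from math import log10
from typing import List, Sequence, Tuple


def qq_points(p_values: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (expected, observed) -log10 p-value pairs for a Q-Q plot.

    Expected p-values use the plotting position (i - 0.5) / m for i = 1..m,
    one common convention (an assumption here). All p-values must be > 0.
    """
    m = len(p_values)
    observed = sorted(p_values)                     # smallest p first
    # enumerate is 0-based, so (i + 0.5) / m equals ((i + 1) - 0.5) / m.
    return [(-log10((i + 0.5) / m), -log10(p)) for i, p in enumerate(observed)]
```

Under the null, as with the coin-toss phenotypes here, the points should fall close to the diagonal; systematic departure above it indicates an inflated test statistic.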
Figure 5. PCA from common variants, low frequency variants, and both types of variants.
The first eigen-vector is plotted against the second eigen-vector for the Broad samples. Eigen-vectors are obtained by applying PCA to all common variants that have no missingness (56,607 variants) (A), all low frequency variants that have no missingness (29,509 variants) (B), and both types of variants (C). The colors are obtained by clustering individuals on their coordinates in panel (A) using model-based clustering.
Figure 6. PCA for case (orange) and control (blue) samples.
Panels (A) and (B) plot the top two eigen-vectors for Baylor and Broad, respectively. Eigen-vectors are obtained by applying PCA to all common variants (CVs) that have no missingness (14,702 CVs used in Baylor and 56,607 CVs used in Broad).
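Figures 5 and 6 rest on PCA of variants with no missing genotypes. The following NumPy sketch computes the leading eigen-vectors (one coordinate per individual) of a standardized individuals-by-variants genotype matrix. The standardization by allele frequency, (g - 2p)/sqrt(2p(1 - p)), is a common choice assumed here, since the captions do not specify the normalization; the function name is hypothetical.

```python
import numpy as np


def top_eigenvectors(genotypes: np.ndarray, k: int = 2) -> np.ndarray:
    """Leading k eigen-vectors (one row per individual) of a genotype matrix.

    genotypes: (n_individuals, n_variants) array of alt-allele counts 0/1/2,
    with np.nan marking missing calls.
    """
    g = np.asarray(genotypes, dtype=float)
    g = g[:, ~np.isnan(g).any(axis=0)]                   # keep variants with no missingness
    p = g.mean(axis=0) / 2.0                             # alt-allele frequency per variant
    poly = (p > 0) & (p < 1)                             # drop monomorphic variants
    g, p = g[:, poly], p[poly]
    x = (g - 2.0 * p) / np.sqrt(2.0 * p * (1.0 - p))     # assumed standardization
    # Left singular vectors of x are the eigen-vectors of x @ x.T (individual by individual).
    u, _, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :k]
```

Plotting column 0 against column 1, colored by case/control status, gives a view like Figure 6; passing a matrix restricted to common variants, low frequency variants, or both corresponds to the three panels of Figure 5.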
Figure 7. Distribution of doubletons as a function of the eigen-map.
The first eigen-vector versus the second eigen-vector for (A) Baylor and (B) Broad samples. Eigen-vectors are obtained by applying PCA to all common variants. For each individual, we count the number of doubletons. To indicate the relative number of doubletons per individual, points are color-coded within the Baylor and Broad samples, respectively, as follows: black (bottom 25%: fewest doubletons), blue (next 25%), green (next 25%), and orange (top 25%: most doubletons).
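The color coding above can be reproduced by counting each individual's doubletons and splitting individuals into quartiles. A Python sketch follows, assuming complete genotype data coded as alt-allele counts and defining a doubleton as a variant whose minor allele appears exactly twice in the sample (so a single homozygous carrier counts once). The function names are hypothetical, and breaking ties at quartile boundaries by input order is an arbitrary choice.

```python
from typing import List, Sequence


def doubleton_counts(genotypes: Sequence[Sequence[int]]) -> List[int]:
    """Per-individual number of doubletons carried.

    genotypes[i][j] is the alt-allele count (0, 1, 2) of individual i at variant j,
    with no missing calls.
    """
    n = len(genotypes)
    counts = [0] * n
    for j in range(len(genotypes[0]) if n else 0):
        column = [genotypes[i][j] for i in range(n)]
        alt = sum(column)
        if alt == 0 or alt == 2 * n:       # monomorphic in this sample
            continue
        if alt == 2:                       # alt is the minor allele, seen twice
            carriers = [i for i, g in enumerate(column) if g > 0]
        elif 2 * n - alt == 2:             # ref is the minor allele, seen twice
            carriers = [i for i, g in enumerate(column) if g < 2]
        else:
            continue
        for i in carriers:
            counts[i] += 1
    return counts


def quartile_colors(counts: Sequence[int]) -> List[str]:
    """Color individuals by doubleton-count quartile, following the legend above."""
    colors = ["black", "blue", "green", "orange"]    # fewest -> most doubletons
    order = sorted(range(len(counts)), key=lambda i: counts[i])  # ties keep input order
    labels = [""] * len(counts)
    for rank, i in enumerate(order):
        labels[i] = colors[4 * rank // len(counts)]
    return labels
```

The quartiles are computed separately for the Baylor and Broad samples, matching the caption's "within the Baylor and Broad samples, respectively".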
Figure 8. Doubletons counts versus minor allele counts (MAC_c) in common variants (CVs).
MAC_c is computed for all variants with minor allele frequency formula image. Panel (A) plots doubleton counts of Baylor cases versus the MAC_c of CVs in the exome. Panel (B) is a zoomed-in version of panel (A). Panel (C) plots doubleton counts of Broad cases versus the MAC_c of CVs in the exome.
Figure 9. Distribution of the genomic control factor λ.
By permuting case/control status 100 times, the distribution of formula image is obtained from the 1000 largest genes. The red line shows the mean of the permutation distribution, and the green line shows formula image obtained from the data using (A) Broad SKAT p-values without eigen-vectors; (B) Broad SKAT p-values with common variant (CV) eigen-vectors; (C) Broad SKAT p-values with low frequency variant (LFV) eigen-vectors; and (D) Broad SKAT p-values with CV plus LFV eigen-vectors.
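The genomic control factor is conventionally the median of the 1-df chi-square statistics implied by a set of p-values, divided by the median of the chi-square(1) distribution (about 0.455). The Python sketch below implements that definition together with a case/control label-permutation loop like the one described above. The `gene_test` callable and all function names are hypothetical stand-ins for a gene-based test such as SKAT; converting its p-values to 1-df chi-square values is the assumption that makes them fit this definition.

```python
import random
from statistics import NormalDist, median
from typing import Callable, List, Sequence, TypeVar

_STD_NORMAL = NormalDist()
Gene = TypeVar("Gene")


def chi2_1df(p: float) -> float:
    """1-df chi-square statistic whose upper-tail p-value is p (requires 0 < p <= 1)."""
    z = _STD_NORMAL.inv_cdf(p / 2.0)    # two-sided p -> z with |z| as the test statistic
    return z * z


def genomic_control_lambda(p_values: Sequence[float]) -> float:
    """lambda = median observed chi-square / median of chi-square(1) (about 0.455)."""
    expected_median = _STD_NORMAL.inv_cdf(0.75) ** 2
    return median(chi2_1df(p) for p in p_values) / expected_median


def permutation_lambdas(
    is_case: Sequence[int],
    genes: Sequence[Gene],
    gene_test: Callable[[Gene, Sequence[int]], float],   # hypothetical: returns a p-value
    n_perm: int = 100,
    seed: int = 0,
) -> List[float]:
    """Null distribution of lambda obtained by shuffling case/control labels."""
    rng = random.Random(seed)
    lambdas = []
    for _ in range(n_perm):
        labels = list(is_case)
        rng.shuffle(labels)
        lambdas.append(genomic_control_lambda([gene_test(g, labels) for g in genes]))
    return lambdas
```

Comparing the observed λ with the mean of `permutation_lambdas(...)` mirrors the green and red lines in the figure; a value near 1 for uniformly distributed p-values is the sanity check.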
Figure 10. -log10(observed p-values) versus -log10(expected p-values) for SKAT and burden tests in the mega-analysis.
Panel (A) shows SKAT p-values and panel (B) shows burden test p-values; formula image and 1.047 for the mega-analysis SKAT and burden tests, respectively.
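The captions do not specify the burden model, so the following is only a simplified Python sketch of the general idea behind a burden test. It collapses each subject's rare minor alleles in a gene into one burden score, then tests that score with a logistic-regression score test that has no covariates. The study controlled for ancestry with principal components, which this sketch omits, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist
from typing import List, Sequence


def burden_scores(genotypes: Sequence[Sequence[int]]) -> List[int]:
    """Collapse a gene: per-subject total of minor-allele counts over its rare variants.

    genotypes[i][j] is subject i's minor-allele count at rare variant j
    (rare-variant filtering is assumed to have happened already).
    """
    return [sum(row) for row in genotypes]


def burden_score_test(burden: Sequence[float], is_case: Sequence[int]) -> float:
    """Score test of beta = 0 in logit P(case) = a + beta * burden, with no covariates."""
    n = len(burden)
    y_bar = sum(is_case) / n
    x_bar = sum(burden) / n
    u = sum((y - y_bar) * x for x, y in zip(burden, is_case))        # score statistic
    v = y_bar * (1 - y_bar) * sum((x - x_bar) ** 2 for x in burden)   # its null variance
    if v == 0:
        return 1.0          # no variation in burden or phenotype: nothing to test
    t = u * u / v           # approximately chi-square with 1 df under the null
    return 2.0 * NormalDist().cdf(-sqrt(t))   # chi2(1) upper tail via |Z| = sqrt(t)
```

Unlike SKAT, which tests a variance component and tolerates variants with opposite effects, this collapsing approach gains power only when most rare variants in the gene push risk in the same direction.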
