Nat Genet. 2018 Apr 26;50(5):727-736.

doi: 10.1038/s41588-018-0107-y.

An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder

Donna M Werling^#¹, Harrison Brand^#^{2,3,4}, Joon-Yong An^#¹, Matthew R Stone^#², Lingxue Zhu^#⁵, Joseph T Glessner^{2,3,4}, Ryan L Collins^{2,3,6}, Shan Dong¹, Ryan M Layer^{7,8}, Eirene Markenscoff-Papadimitriou¹, Andrew Farrell^{7,8}, Grace B Schwartz¹, Harold Z Wang², Benjamin B Currall^{2,3,4}, Xuefang Zhao^{2,3,4}, Jeanselle Dea¹, Clif Duhn¹, Carolyn A Erdman¹, Michael C Gilson¹, Rachita Yadav^{2,3,4}, Robert E Handsaker^{4,9}, Seva Kashin^{4,9}, Lambertus Klei¹⁰, Jeffrey D Mandell¹, Tomasz J Nowakowski^{1,11,12}, Yuwen Liu¹³, Sirisha Pochareddy¹⁴, Louw Smith¹, Michael F Walker¹, Matthew J Waterman¹⁵, Xin He¹³, Arnold R Kriegstein¹⁶, John L Rubenstein¹, Nenad Sestan¹⁴, Steven A McCarroll^{4,9}, Benjamin M Neale^{4,17,18}, Hilary Coon^{19,20}, A Jeremy Willsey^{1,21}, Joseph D Buxbaum^{22,23,24,25}, Mark J Daly^{4,17,18}, Matthew W State¹, Aaron R Quinlan^{7,8,20}, Gabor T Marth^{7,8}, Kathryn Roeder^{5,26}, Bernie Devlin²⁷, Michael E Talkowski^{28,29,30,31}, Stephan J Sanders³²

Affiliations

¹ Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA.
² Center for Genomic Medicine and Department of Neurology, Massachusetts General Hospital, Boston, MA, USA.
³ Department of Neurology, Harvard Medical School, Boston, MA, USA.
⁴ Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA, USA.
⁵ Department of Statistics, Carnegie Mellon University, Pittsburgh, PA, USA.
⁶ Program in Bioinformatics and Integrative Genomics, Division of Medical Sciences, Harvard Medical School, Boston, MA, USA.
⁷ Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, UT, USA.
⁸ USTAR Center for Genetic Discovery, University of Utah School of Medicine, Salt Lake City, UT, USA.
⁹ Department of Genetics, Harvard Medical School, Boston, MA, USA.
¹⁰ Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA.
¹¹ Department of Anatomy, University of California, San Francisco, San Francisco, CA, USA.
¹² Eli and Edythe Broad Center of Regeneration Medicine and Stem Cell Research, University of California, San Francisco, San Francisco, CA, USA.
¹³ Department of Human Genetics, University of Chicago, Chicago, IL, USA.
¹⁴ Department of Neuroscience and Kavli Institute for Neuroscience, Yale School of Medicine, New Haven, CT, USA.
¹⁵ Department of Biology, Eastern Nazarene College, Quincy, MA, USA.
¹⁶ Department of Neurology, University of California, San Francisco, San Francisco, CA, USA.
¹⁷ Analytical and Translational Genetics Unit and Center for Genomic Medicine, Massachusetts General Hospital, Boston, MA, USA.
¹⁸ Department of Medicine, Harvard Medical School, Boston, MA, USA.
¹⁹ Department of Psychiatry, University of Utah School of Medicine, Salt Lake City, UT, USA.
²⁰ Department of Biomedical Informatics, University of Utah School of Medicine, Salt Lake City, UT, USA.
²¹ Institute for Neurodegenerative Diseases, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA.
²² Seaver Autism Center for Research and Treatment, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
²³ Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
²⁴ Friedman Brain Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
²⁵ Mindich Child Health and Development Institute, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
²⁶ Department of Computational Biology, Carnegie Mellon University, Pittsburgh, PA, USA.
²⁷ Department of Psychiatry, University of Pittsburgh School of Medicine, Pittsburgh, PA, USA. devlinbj@upmc.edu.
²⁸ Center for Genomic Medicine and Department of Neurology, Massachusetts General Hospital, Boston, MA, USA. talkowski@chgr.mgh.harvard.edu.
²⁹ Department of Neurology, Harvard Medical School, Boston, MA, USA. talkowski@chgr.mgh.harvard.edu.
³⁰ Program in Medical and Population Genetics and Stanley Center for Psychiatric Research, Broad Institute, Cambridge, MA, USA. talkowski@chgr.mgh.harvard.edu.
³¹ Departments of Pathology and Psychiatry, Massachusetts General Hospital, Boston, MA, USA. talkowski@chgr.mgh.harvard.edu.
³² Department of Psychiatry, UCSF Weill Institute for Neurosciences, University of California, San Francisco, San Francisco, CA, USA. stephan.sanders@ucsf.edu.

^# Contributed equally.

PMID: 29700473
PMCID: PMC5961723
DOI: 10.1038/s41588-018-0107-y

Donna M Werling et al. Nat Genet. 2018.


Abstract

Genomic association studies of common or rare protein-coding variation have established robust statistical approaches to account for multiple testing. Here we present a comparable framework to evaluate rare and de novo noncoding single-nucleotide variants, insertions/deletions (indels), and all classes of structural variation from whole-genome sequencing (WGS). Integrating genomic annotations at the level of nucleotides, genes, and regulatory regions, we define 51,801 annotation categories. Analyses of 519 autism spectrum disorder families did not identify association with any category after correction for 4,123 effective tests. Without appropriate correction, biologically plausible associations are observed in both cases and controls. Despite excluding previously identified gene-disrupting mutations, coding regions still exhibited the strongest associations. Thus, in autism, the contribution of de novo noncoding variation is probably modest in comparison to that of de novo coding variants. Robust results from future WGS studies will require large cohorts and comprehensive analytical strategies that consider the substantial multiple-testing burden.
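The correction for 4,123 effective tests amounts to a per-test p-value threshold. As a minimal sketch, assuming a family-wise alpha of 0.05 (the conventional level also used in the figure legends below), the arithmetic contrasts that threshold with a naive Bonferroni correction over every tested category. The constant and function names are illustrative and are not taken from the study's code.

```python
# Minimal sketch: Bonferroni-style per-test threshold using an effective
# number of tests instead of the raw number of tested categories.
# FAMILY_WISE_ALPHA = 0.05 is an assumed, conventional choice.

FAMILY_WISE_ALPHA = 0.05
EFFECTIVE_TESTS = 4_123      # effective tests reported for the category-wide study
TESTED_CATEGORIES = 13_704   # categories with >= 7 observed de novo variants


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that controls the family-wise error rate."""
    if n_tests < 1:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


effective = bonferroni_threshold(FAMILY_WISE_ALPHA, EFFECTIVE_TESTS)
naive = bonferroni_threshold(FAMILY_WISE_ALPHA, TESTED_CATEGORIES)

print(f"threshold with effective tests: {effective:.3g}")  # 1.21e-05
print(f"threshold with every category:  {naive:.3g}")      # 3.65e-06
```

Because many of the 13,704 categories overlap, correcting for each of them separately would be stricter than necessary; the effective-test count is what sets the significance line in Figure 3a.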


Figures

**Figure 1. Burden analyses for gene-defined annotation categories**
a) The observed relative risk of *de novo* mutations in cases vs. controls is shown by the red line against grey violin plots representing the kernel density estimation of relative risk from 10,000 label-swapping permutations of case-control status for 11 gene-defined annotation categories. Box plots further illustrate the relative risk from permutations, including the median (center line), first and third quartiles (box), 1.5x interquartile range or the most extreme value (whiskers), and permuted relative risk observations beyond 1.5x interquartile range (outlier points). P-values from a case-control label-swapping permutation analysis and Bonferroni-corrected p-values (10 tests) ≤0.05 are shown. Loss-of-function variants were not analyzed as cases with such mutations were excluded from the cohort. b) The analysis in ‘a’ is repeated considering only *de novo* mutations in or near 179 ASD genes. Permutation p-values are Bonferroni-corrected for 7 tests. Considering SNVs and indels separately does not alter these findings (Supplementary Fig. 5).
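The label-swapping permutation described in this legend can be sketched as follows. This is a minimal illustration, assuming one case and one sibling control per family, relative risk defined as total case count over total control count, and a one-sided test with an add-one correction; the legend does not state these details, so the function names and choices here are hypothetical rather than the authors' implementation.

```python
import random
from typing import List, Sequence, Tuple


def relative_risk(case_counts: Sequence[int], control_counts: Sequence[int]) -> float:
    """Total de novo count in cases divided by the total in controls."""
    cases, controls = sum(case_counts), sum(control_counts)
    return cases / controls if controls > 0 else float("inf")


def label_swap_permutation(
    case_counts: Sequence[int],
    control_counts: Sequence[int],
    n_perm: int = 10_000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Return (observed relative risk, one-sided permutation p-value).

    Index i of both sequences refers to the same family (one case, one
    sibling control). Each permutation swaps the two labels within a family
    with probability 0.5, which preserves each family's total count.
    """
    if len(case_counts) != len(control_counts):
        raise ValueError("need one case and one control count per family")
    if sum(case_counts) + sum(control_counts) == 0:
        raise ValueError("category contains no variants")
    rng = random.Random(seed)
    observed = relative_risk(case_counts, control_counts)
    hits = 0
    for _ in range(n_perm):
        perm_cases: List[int] = []
        perm_controls: List[int] = []
        for case_n, control_n in zip(case_counts, control_counts):
            if rng.random() < 0.5:
                case_n, control_n = control_n, case_n  # swap labels in this family
            perm_cases.append(case_n)
            perm_controls.append(control_n)
        if relative_risk(perm_cases, perm_controls) >= observed:
            hits += 1
    # Add-one correction so a finite number of permutations never yields p = 0.
    return observed, (hits + 1) / (n_perm + 1)


# Toy counts for 8 hypothetical families (not data from the study).
cases = [1, 0, 2, 1, 0, 1, 1, 0]
controls = [0, 0, 1, 0, 1, 0, 0, 0]
rr, p = label_swap_permutation(cases, controls, n_perm=2_000)
p_bonferroni = min(1.0, p * 10)  # e.g. Bonferroni correction for 10 tests
print(f"RR = {rr:.2f}, permutation p = {p:.3f}, Bonferroni p = {p_bonferroni:.3f}")
```

Swapping within families, rather than shuffling labels across the whole cohort, keeps each family's total fixed, which respects the paired case/sibling design.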

**Figure 2. Defining annotation categories**
Five groups of annotations were defined: 1) Conservation across species; 2) Variant type; 3) GENCODE gene definitions; 4) Gene lists; and 5) Functional annotations. Picking one annotation from each group resulted in 66,402 possible combinations, of which 51,801 were non-redundant (Supplementary Table 7). The 13,704 annotation categories that included at least seven observed mutations were considered in the category-wide association test.
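Combining one annotation from each group is a Cartesian product, followed by removal of redundant combinations and a minimum-count filter. The sketch below illustrates that flow on toy data. The variant fields, the annotation labels, and the redundancy rule (two combinations are treated as redundant when they select exactly the same variants) are hypothetical stand-ins; the study's own definitions of groups and redundancy may differ.

```python
from itertools import product
from typing import Callable, Dict, FrozenSet, List, Tuple

# Toy variant records; every field and value here is hypothetical.
VARIANTS: List[dict] = [
    {"id": i,
     "conserved": i % 2 == 0,
     "type": "SNV" if i % 3 else "indel",
     "coding": i % 4 == 0,
     "asd_gene": i % 5 == 0}
    for i in range(60)
]

Annotation = Tuple[str, Callable[[dict], bool]]

# One list per annotation group; an "All..." entry means "do not filter".
GROUPS: List[List[Annotation]] = [
    [("AllConservation", lambda v: True), ("Conserved", lambda v: v["conserved"])],
    [("AllVariants", lambda v: True), ("SNV", lambda v: v["type"] == "SNV"),
     ("Indel", lambda v: v["type"] == "indel")],
    [("AllGencode", lambda v: True), ("Coding", lambda v: v["coding"]),
     ("Noncoding", lambda v: not v["coding"])],
    [("AllGenes", lambda v: True), ("ASDGenes", lambda v: v["asd_gene"])],
    [("AllFunctional", lambda v: True)],
]


def build_categories(variants, groups, min_variants=7):
    """Combine one annotation per group and keep distinct, testable categories.

    Returns (number of combinations, number of distinct variant sets,
    {category name: variant count} for categories with >= min_variants).
    """
    distinct: Dict[FrozenSet[int], str] = {}
    n_combinations = 0
    for combo in product(*groups):
        n_combinations += 1
        name = "_".join(label for label, _ in combo)
        members = frozenset(v["id"] for v in variants
                            if all(keep(v) for _, keep in combo))
        distinct.setdefault(members, name)  # first name wins for duplicate sets
    testable = {name: len(ids) for ids, name in distinct.items()
                if len(ids) >= min_variants}
    return n_combinations, len(distinct), testable


n_combos, n_distinct, testable = build_categories(VARIANTS, GROUPS)
print(n_combos, n_distinct, len(testable))
```

In this toy data every coding variant is also conserved, so, for example, the "Conserved" and "AllConservation" versions of a coding category select the same variants and collapse to one category.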

**Figure 3. Category-wide association study**
a) The burden of *de novo* SNVs and indels in n=519 cases vs. n=519 controls for the 13,704 annotation categories with ≥7 observed variants is shown as points in the volcano plot (Supplementary Table 7). Permutation p-values were calculated from 10,000 label-swapping permutations of case-control status in each annotation category. No test survives Bonferroni correction for 4,123 effective tests (top horizontal red line). b) Correlations of p-values between annotation categories (small dots) in simulated data are shown by proximity in the first two t-SNE dimensions. The large circles show 200 independent clusters of annotation categories defined by k-means clustering. Circle size represents the degrees of freedom accounted for by each cluster, estimated by eigenvalue decomposition. In total, 4,123 effective tests explain 99% of the variability in p-values (Supplementary Fig. 6). c–h) Six clusters from (b) are shown in greater detail, with the cluster number in bold. Edges represent p-value correlations ≥0.4. i–k) The number of nominally significant annotation categories (p≤0.05, two-sided binomial test) was calculated for cases, controls, and 10,000 permutations to assess whether more annotation categories are enriched for *de novo* variants in cases than expected in (a). Cases have a greater-than-expected number of nominally significant categories relating to coding mutations and noncoding indels, but not for all noncoding mutations, nor for noncoding mutations nearest to ASD genes. P-values were calculated as the proportion of permutations in which the number of categories with a two-sided binomial test p-value ≤0.05 was equal to or greater than that in the observed data.
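Panel (b) reports that 4,123 effective tests explain 99% of the variability in p-values. One common eigenvalue-based way to obtain such a number is to take the category-by-category correlation matrix of simulated null p-values and count how many leading eigenvalues are needed to reach 99% of their sum. The sketch below implements that idea with NumPy on uniform toy p-values; it is an illustration under those assumptions, not the study's pipeline, and the t-SNE embedding and k-means clustering are omitted.

```python
import numpy as np


def effective_number_of_tests(pvalues: np.ndarray, variance_explained: float = 0.99) -> int:
    """Eigenvalue-based effective number of tests.

    pvalues has shape (n_simulations, n_categories): one row per simulated
    null dataset, one column per annotation category.
    """
    corr = np.corrcoef(pvalues, rowvar=False)
    corr = np.nan_to_num(corr)       # constant columns would give NaN correlations
    np.fill_diagonal(corr, 1.0)
    # eigvalsh is for symmetric matrices and returns eigenvalues in ascending order
    eigenvalues = np.clip(np.linalg.eigvalsh(corr)[::-1], 0.0, None)
    cumulative = np.cumsum(eigenvalues) / eigenvalues.sum()
    k = int(np.searchsorted(cumulative, variance_explained)) + 1
    return min(k, corr.shape[0])


rng = np.random.default_rng(0)
# Toy null p-values: 3 independent signals, each copied into 10 categories.
base = rng.uniform(size=(5_000, 3))
redundant = np.repeat(base, 10, axis=1)     # 30 categories, 3 distinct patterns
independent = rng.uniform(size=(5_000, 20))
print(effective_number_of_tests(redundant))    # 3
print(effective_number_of_tests(independent))  # close to 20
```

Perfectly redundant categories collapse to a single effective test, whereas independent categories each count fully; real annotation categories fall in between, which is why 13,704 tested categories yield 4,123 effective tests.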

**Figure 4. Structural variation in 519 ASD families**
Structural variation (SV) analyses identified an average of 5,863 SVs per genome and 171 *de novo* SVs. a) We observed no difference in the distribution of SV sizes between cases (n=519) and sibling controls (n=519) for any class of SV (cxSV = complex SV) at an unadjusted nominal significance threshold (two-tailed Wilcoxon rank-sum test; alpha = 0.05). b) We observed no differences in maternal/paternal transmission rates between cases and sibling controls for any class of SV or any range of variant frequencies (VF) (two-tailed binomial test). The plot shows the mean paternal transmission rate (dots) and 95% binomial confidence intervals (error bars). c) We observed no significant enrichment of either *de novo* or rare inherited SVs (VF < 0.1%) in genic or noncoding annotations after correcting for multiple comparisons in a two-sided sign test between case and control counts. Error bars represent 95% confidence intervals. d) Analysis of balanced SVs discovered a *de novo* reciprocal translocation in a case, predicted to disrupt *GRIN2B* (t(12q21.2;13p11.2)), a constrained gene previously implicated in ASD. e) WGS revealed small CNVs undetected by previous analyses, including a 4,391 bp *de novo* deletion of exons 8–10 of *CHD2* (GRCh37.63:chr15:g.93484245_93488636del), a gene previously implicated in ASD through *de novo* coding mutations. f) Analysis of breakpoint sequences also classified 23 *de novo* SVs as predicted germline mosaic in the parents, including this 242.8 kb paternally transmitted mosaic duplication at 8q24.23 that was previously characterized as *de novo* in the child (GRCh37.63:chr8:g.136681615_136924426dup). Bar plots represent the means and 95% confidence intervals of estimated copy number at the duplicated locus. All p-values were calculated with a two-tailed t-test of estimated copy numbers in sequential 36.4 kb bins.
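Panels (b) and (c) rely on two-tailed binomial and sign tests against an even split. A minimal sketch of an exact two-sided binomial test at p = 0.5 (which is also the sign test) is below, together with a Wilson score interval. The legend does not say which binomial interval was used, so Wilson is a swapped-in, commonly used choice; all counts in the example are hypothetical.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_two_sided_p(k: int, n: int) -> float:
    """Exact two-sided binomial test of k 'successes' out of n against p = 0.5.

    With p = 0.5 the distribution is symmetric, so the two-sided p-value is
    twice the smaller tail, capped at 1. This is also the sign test.
    """
    if not 0 <= k <= n:
        raise ValueError("require 0 <= k <= n")
    if n == 0:
        return 1.0
    smaller = min(k, n - k)
    one_tail = sum(comb(n, i) for i in range(smaller + 1)) / 2 ** n
    return min(1.0, 2 * one_tail)


def wilson_interval(k: int, n: int, confidence: float = 0.95) -> tuple:
    """Wilson score interval for a binomial proportion (one common choice)."""
    if n <= 0:
        raise ValueError("n must be positive")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p_hat = k / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Hypothetical counts: 112 of 200 transmitted rare SVs came from the father.
paternal, transmitted = 112, 200
print(binomial_two_sided_p(paternal, transmitted))
print(wilson_interval(paternal, transmitted))

# The same exact test serves as a sign test on case vs control counts.
case_count, control_count = 30, 18
print(binomial_two_sided_p(case_count, case_count + control_count))
```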

**Figure 5. Effective number of tests in CWAS and power calculation**
a) The green line shows the threshold to achieve 80% power at nominal significance across the range of relative risks of a category (log₁₀-scaled x-axis) and numbers of *de novo* mutations per individual within the category (log₁₀-scaled y-axis). The purple line shows the 80% power threshold after correcting for 4,123 effective tests. The grey dots represent the observed *de novo* mutation burden in 519 families for the 13,704 annotation categories with ≥7 mutations. b) The lines show the threshold for 80% power across the range of relative risks and category sizes as sample size increases (correcting for correspondingly more effective tests; see Supplementary Information). For reference, the relative locations of six classes of variation are shown.
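The trade-off in panel (a), where a stricter threshold pushes the 80%-power line toward larger categories, can be illustrated with a simple normal approximation. This sketch is not necessarily the power model in the Supplementary Information. It assumes controls carry `rate` *de novo* variants per individual in a category, cases carry `rate * relative_risk`, and the test compares the case share of all variants with 0.5 (two-sided). The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def approx_power(relative_risk: float, rate: float, n_pairs: int,
                 alpha: float = 0.05) -> float:
    """Normal-approximation power to detect a case excess in one category.

    Assumes controls carry `rate` variants per individual, cases carry
    rate * relative_risk, and the case share is tested against 0.5.
    """
    expected_total = n_pairs * rate * (1 + relative_risk)
    if expected_total <= 0:
        return 0.0
    p1 = relative_risk / (1 + relative_risk)     # expected case share
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = abs(p1 - 0.5) * sqrt(expected_total) - 0.5 * z_crit
    return _STD_NORMAL.cdf(shift / sqrt(p1 * (1 - p1)))


def rate_for_power(relative_risk: float, n_pairs: int, alpha: float = 0.05,
                   power: float = 0.8) -> float:
    """Smallest per-individual rate giving the target power (closed form)."""
    if relative_risk == 1:
        return float("inf")  # no effect: no rate reaches the target power
    p1 = relative_risk / (1 + relative_risk)
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    root_total = (z_power * sqrt(p1 * (1 - p1)) + 0.5 * z_crit) / abs(p1 - 0.5)
    return root_total ** 2 / (n_pairs * (1 + relative_risk))


for label, alpha in [("nominal", 0.05), ("4,123 effective tests", 0.05 / 4123)]:
    needed = rate_for_power(relative_risk=2.0, n_pairs=519, alpha=alpha)
    print(f"{label}: ~{needed:.3f} de novo variants per individual for 80% power")
```

Under these assumptions the required rate scales inversely with the number of families, which is consistent with the legend's point that larger cohorts move the power threshold toward smaller categories and weaker effects.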


Comment in

Straws in a haystack.
[No authors listed] Nat Genet. 2018 May;50(5):631. doi: 10.1038/s41588-018-0125-9. PMID: 29700466. No abstract available.
Sizing up whole-genome sequencing studies of common diseases.
Wray NR, Gratten J. Nat Genet. 2018 May;50(5):635-637. doi: 10.1038/s41588-018-0113-0. PMID: 29700468. No abstract available.

References

1. Schizophrenia Working Group of the Psychiatric Genomics Consortium. Biological insights from 108 schizophrenia-associated genetic loci. Nature. 2014;511:421–427.
2. Astle WJ, et al. The Allelic Landscape of Human Blood Cell Trait Variation and Links to Common Complex Disease. Cell. 2016;167:1415–1429.e19.
3. de Lange KM, et al. Genome-wide association study implicates immune activation of multiple integrin genes in inflammatory bowel disease. Nat Genet. 2017;49:256–261.
4. Sanders SJ, et al. Insights into Autism Spectrum Disorder Genomic Architecture and Biology from 71 Risk Loci. Neuron. 2015;87:1215–1233.
5. Deciphering Developmental Disorders Study. Prevalence and architecture of de novo mutations in developmental disorders. Nature. 2017;542:433–438.
