Review
Neuropsychopharmacology. 2021 Jan;46(1):55-69.
doi: 10.1038/s41386-020-0768-y. Epub 2020 Jul 15.

Leveraging large genomic datasets to illuminate the pathobiology of autism spectrum disorders

Veronica B Searles Quick et al. Neuropsychopharmacology. 2021 Jan.

Abstract

"Big data" approaches in the form of large-scale human genomic studies have led to striking advances in autism spectrum disorder (ASD) genetics. As in many other psychiatric syndromes, advances in genotyping technology allowing for inexpensive genome-wide assays have confirmed the contribution of polygenic inheritance involving common alleles of small effect, a handful of which have now been definitively identified. However, the past decade of gene discovery in ASD has been most notable for the application, in large family-based cohorts, of high-density microarray studies of submicroscopic chromosomal structure as well as high-throughput DNA sequencing, leading to the identification of an increasingly long list of risk regions and genes disrupted by rare, de novo germline mutations of large effect. This genomic architecture offers particular advantages for illuminating biological mechanisms but also presents distinctive challenges. While the tremendous locus heterogeneity and functional pleiotropy associated with the more than 100 identified ASD-risk genes and regions are daunting, a growing armamentarium of comprehensive, large, foundational -omics databases, spanning species and capturing developmental trajectories, is increasingly contributing to a deeper understanding of ASD pathology.


Figures

Fig. 1. Types of genetic variants.
a The majority of genetic variation in the human genome is common (population frequency ≥1%, blue). These variants are transmitted from parents to offspring via Mendelian inheritance. A smaller proportion is rare (<1%, purple) and is also transmitted from parents. Approximately 70 variants per genome are de novo (red): they are observed in the child but in neither parent. b The impact of single-nucleotide variants (SNVs) and small (≤50 bp) insertions/deletions (indels) depends on their location in the genome. In the 1.5% of the genome that encodes proteins (the exome), these variants can be synonymous (no change to the resulting protein), missense (a single amino acid is changed, with variable functional impact), or protein-truncating (leading to nonsense-mediated decay and no protein). Variants and their consequences (red stars) are shown on the father's allele but can also arise on the maternal allele. c Copy number variants (CNVs) are large (≥50 bp to millions of nucleotides) deletions (resulting in no protein) or duplications (potentially resulting in excess protein). Figure adapted from Sanders [81] with author permission.
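The categories in Fig. 1 amount to two simple decision rules: one on inheritance and population frequency (panel a), one on variant size (panels b and c). A minimal sketch follows, assuming a hypothetical `Variant` record with a length, a population frequency, and parental presence flags; only the 1% and 50 bp thresholds come from the caption, and treating exactly 50 bp as "small" is an assumption, since the caption uses 50 bp as the boundary for both classes.

```python
from dataclasses import dataclass

# Thresholds from the Fig. 1 caption. Counting exactly 50 bp as "small"
# is an assumption: the caption uses 50 bp as the boundary for both classes.
COMMON_FREQ = 0.01   # population frequency at or above which a variant is "common"
SMALL_MAX_BP = 50    # SNVs and indels up to this size; larger events are CNVs


@dataclass
class Variant:
    """Hypothetical minimal record for one variant observed in a child."""
    length_bp: int     # size of the affected sequence (1 for an SNV)
    pop_freq: float    # population allele frequency
    in_mother: bool    # allele also observed in the mother
    in_father: bool    # allele also observed in the father


def size_class(v: Variant) -> str:
    """Separate small variants (panel b) from copy number variants (panel c)."""
    return "SNV/indel" if v.length_bp <= SMALL_MAX_BP else "CNV"


def origin_class(v: Variant) -> str:
    """Assign the frequency/inheritance categories of panel a."""
    if not (v.in_mother or v.in_father):
        return "de novo"            # present in the child only
    if v.pop_freq >= COMMON_FREQ:
        return "common inherited"
    return "rare inherited"


def classify(v: Variant) -> tuple[str, str]:
    """Return (size class, origin class) for one variant."""
    return size_class(v), origin_class(v)


if __name__ == "__main__":
    print(classify(Variant(1, 0.20, True, False)))        # common SNV from one parent
    print(classify(Variant(250_000, 0.0, False, False)))  # large event absent in both parents
```

Checking parental presence before frequency mirrors the caption's definition: a de novo variant is defined by its absence in both parents, not by how common it is in the population.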
Fig. 2. A model of rare large-effect de novo mutations acting in combination with common risk alleles.
a An idealized distribution of common polygenic risk, which is normally distributed in the general population. The red vertical dotted line represents an arbitrary cutoff for the diagnosis of ASD. For a highly heritable disorder such as ASD, individuals at the low end of the risk distribution (left) are less likely to meet diagnostic criteria than those at the far right end. Superimposing the upper panel on the lower panel (b), which represents the distribution of ASD symptoms in the population, models the observation that the vast majority of common-allele population risk is carried by individuals without a clinical diagnosis. The lower panel (b) shows the same red dotted vertical line marking an arbitrary cutoff for the categorical diagnosis of ASD. The abbreviations in parentheses (epi, epilepsy; ADHD, attention deficit hyperactivity disorder; SCZ, schizophrenia; SLI, specific language impairment) reflect the observation that highly penetrant ASD risk variants may also confer risk for diagnoses other than ASD. The arrows at the bottom of the diagram represent large-effect rare de novo mutations. The purple arrow shows how a large-effect de novo mutation can move an individual with intermediate risk, who would likely have no symptoms, across the diagnostic threshold. The gray arrow reflects the observation that these risks, while large, are not Mendelian: rare large-effect mutations sometimes produce no phenotype at all, which may reflect their acting in the context of very low polygenic risk. The purple box on the right side of (b) reflects the finding that, while de novo mutations account for a very small proportion of population risk, they are carried by a substantial fraction of individuals who exceed clinical thresholds.
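The model in Fig. 2 can be made concrete with a toy liability-threshold simulation. The sketch below assumes that polygenic liability is standard normal, that a hypothetical 1% of individuals carry a de novo variant adding a fixed shift of 2 standard deviations, and that the diagnostic cutoff sits at 2 standard deviations; none of these numbers come from the review, and the function name is illustrative.

```python
import random


def simulate_liability(n=200_000, carrier_rate=0.01, effect=2.0,
                       threshold=2.0, seed=0):
    """Toy liability-threshold model of Fig. 2 (all parameters hypothetical).

    Returns a tuple of:
      fraction of the population carrying a de novo variant,
      fraction of diagnosed individuals who are carriers,
      fraction of carriers who stay below the diagnostic threshold.
    """
    rng = random.Random(seed)
    carriers = cases = carrier_cases = 0
    for _ in range(n):
        liability = rng.gauss(0.0, 1.0)      # common polygenic risk ~ N(0, 1)
        is_carrier = rng.random() < carrier_rate
        if is_carrier:
            carriers += 1
            liability += effect              # large-effect de novo shift (purple arrow)
        if liability > threshold:            # diagnostic cutoff (red dotted line)
            cases += 1
            if is_carrier:
                carrier_cases += 1
    carrier_share_of_cases = carrier_cases / cases if cases else 0.0
    unaffected_carriers = (carriers - carrier_cases) / carriers if carriers else 0.0
    return carriers / n, carrier_share_of_cases, unaffected_carriers


if __name__ == "__main__":
    pop, among_cases, unaffected = simulate_liability()
    print(f"carriers in population: {pop:.3f}")
    print(f"carriers among cases:   {among_cases:.3f}")
    print(f"carriers below cutoff:  {unaffected:.3f}")
```

Under these assumptions the arithmetic can be checked by hand: about 2.3% of non-carriers and 50% of carriers exceed the cutoff, so carriers make up about 1% of the population but roughly 18% of diagnosed individuals, while about half of carriers remain undiagnosed (the gray arrow).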
Fig. 3. Levels of pathogenesis and convergent analysis.
a ASD can manifest, and be investigated, at multiple levels, from a genetic variant (marked by a red star) up to behavioral phenotypes. b A conceptual illustration of convergent analysis from risk genes to behavior in ASD, in which multiple independent risk genes are studied in parallel to triangulate on specific protein complexes, functional networks, cell types, and/or circuits that show overlap among functionally diverse risk genes. Figures adapted from Willsey et al. [13] and Sestan and State 2018 [156] with author permission.
Fig. 4. A strategy for combining human brain expression data and high-confidence risk genes to identify spatiotemporal convergence.
Willsey et al. [112] established co-expression networks for the nine highest-confidence ASD-risk genes known at the time of publication. These networks were built by setting a high threshold on gene expression correlation irrespective of sign, based on the hypothesis that coordinated gene activity, whether in the same or opposite directions, is a useful proxy for shared biological function. Networks were created for each spatiotemporal period defined in the BrainSpan database [113], using its time windows. The co-expression networks built around the highest-confidence genes were then tested, against the null expectation, for enrichment of an independent list of probable ASD-risk genes, looking for any predefined network in which genes with evidence for ASD risk were overrepresented. Statistically significant enrichment was found in prefrontal cortex (PFC) networks during mid-fetal development (approximately 18–24 weeks), and additional signal was identified in the medial dorsal thalamus and cerebellum later in development (in early infancy).

References

    1. American Psychiatric Association. Diagnostic and statistical manual of mental disorders. 5th edn. Arlington, VA: American Psychiatric Association; 2013.
    2. Baio J, Wiggins L, Christensen DL, Maenner M, Daniels J, Warren Z, et al. Prevalence of autism spectrum disorder among children aged 8 years—autism and developmental disabilities monitoring network, 11 sites, United States, 2014. MMWR Surveill Summ. 2018;67(SS-6):1–23.
    3. Folstein S, Rutter M. Infantile autism: a genetic study of twin pairs. Vol 18. Pergamon Press; 1977.
    4. Satterstrom FK, Kosmicki JA, Wang J, Breen M, De Rubeis S, An J, et al. Large-scale exome sequencing study implicates both developmental and functional changes in the neurobiology of autism. Cell. 2020;180:568–584.e23.
    5. Sanders SJ, He X, Willsey AJ, Ercan-Sencicek A, Samocha K, Cicek A, et al. Insights into autism spectrum disorder genomic architecture and biology from 71 risk loci. Neuron. 2015;87:1215–33.
