Nat Genet. 2021 Aug;53(8):1125-1134.
doi: 10.1038/s41588-021-00899-8. Epub 2021 Jul 26.

Recent ultra-rare inherited variants implicate new autism candidate risk genes


Amy B Wilfert et al. Nat Genet. 2021 Aug.

Abstract

Autism is a highly heritable complex disorder in which de novo mutation (DNM) variation contributes significantly to risk. Using whole-genome sequencing data from 3,474 families, we investigate another source of large-effect risk variation, ultra-rare variants. We report and replicate a transmission disequilibrium of private, likely gene-disruptive (LGD) variants in probands but find that 95% of this burden resides outside of known DNM-enriched genes. This variant class more strongly affects multiplex family probands and supports a multi-hit model for autism. Candidate genes with private LGD variants preferentially transmitted to probands converge on the E3 ubiquitin-protein ligase complex, intracellular transport and Erb signaling protein networks. We estimate that these variants are approximately 2.5 generations old and significantly younger than other variants of similar type and frequency in siblings. Overall, private LGD variants are under strong purifying selection and appear to act on a distinct set of genes not yet associated with autism.


Figures

Figure 1. Overview of private variants in discovery cohort.
Private variants are defined as variants observed in one and only one parent in the cohort. a, Distribution of likely gene-disruptive (LGD), missense (MIS), and synonymous (SYN) private variants per child (probands and unaffected siblings). b, The cumulative number of each variant class by assigned population group (EUR, European (n = 5,685); AFR, African (n = 290); EAS, East Asian (n = 252); AMR, Amerindian (n = 193); SAS, South Asian (n = 103)), excluding SAS. c, Private, transmitted variant counts per child grouped by ancestry before (All) and after (No dbSNP) filtering with dbSNPv150. Excess of private variants is partially but not fully resolved after excluding sites observed in dbSNP. We were unable to assign ancestry to one of these five population groups for 74 of the children in this study. The y-axis was truncated at 20,000 variants per child; however, both the AFR and EUR populations had a small number of children with variant counts above this threshold (see Supplementary Tables 6 and 7 for details). Black lines indicate the average variant count per population in b and c.
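The definition of a private variant given in this caption (observed in one and only one parent in the cohort) can be sketched as a simple filter. This is an illustrative sketch only, not the authors' pipeline: the input layout, the variant key format, and the toy cohort are all assumptions made for the example.

```python
from collections import defaultdict


def private_variants(parent_genotypes):
    """Return variants carried by exactly one parent in the cohort.

    parent_genotypes: dict mapping a parent ID to the set of variant keys
    that parent carries. The layout is hypothetical; here a variant key is
    a (chrom, pos, ref, alt) tuple.
    """
    carriers = defaultdict(set)
    for parent_id, variants in parent_genotypes.items():
        for variant in variants:
            carriers[variant].add(parent_id)
    # Keep only variants with a single carrier, mapped to that carrier.
    return {v: next(iter(p)) for v, p in carriers.items() if len(p) == 1}


# Toy cohort (made-up data, for illustration only).
cohort = {
    "fam1_father": {("chr1", 1000, "A", "T"), ("chr2", 500, "G", "C")},
    "fam1_mother": {("chr2", 500, "G", "C")},
    "fam2_father": {("chr3", 42, "C", "CT")},
}
private = private_variants(cohort)
```

In this toy cohort the chr2 variant is shared by two parents and is therefore not private, while the other two variants are.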
Figure 2. Burden of private LGD variants in affected children.
a, Burden of private LGD variants in probands as compared to siblings was quantified (odds ratio (OR)) at increasing thresholds of gene constraint (pLI) in our discovery (n = 4,201 affected and 2,191 unaffected children), replication (n = 6,453 affected and 3,007 unaffected children), and combined discovery and replication (n = 10,657 affected and 5,199 unaffected children) cohorts. Filled circles indicate Bonferroni-corrected P < 0.05 (42 tests per cohort), unfilled circles indicate nominal P < 0.05, and shaded areas indicate 95% confidence intervals around the OR estimate. OR and confidence intervals were calculated using logistic regression (see Supplementary Table 11 for details). b, Enrichment of private, LGD variant transmission to probands for five autism risk gene sets (FWER, COE, ASC, SANDERS, SFARI). With the exception of SFARI, most gene sets were identified based on an excess of de novo mutations (DNMs) in parent–child trios (see Online Methods). OR was based on a comparison of the proportion of carriers between probands and siblings in our discovery (n = 4,201 affected and 2,191 unaffected children), replication (n = 6,453 affected and 3,007 unaffected children), and combined (n = 10,657 affected and 5,199 unaffected children) cohorts using a two-sided Fisher’s exact test (see Supplementary Table 5 for details). Dashed black line indicates OR = 1, which represents no difference between probands and siblings. Families with monozygotic twins (n = 75 in discovery, n = 63 in replication, and n = 138 in combined) were removed from analysis. For the combined set, variants were restricted to regions with at least 20x average coverage in the exomes. Reported P-values are nominal, points indicate the OR estimate, and error bars indicate 95% confidence intervals around the OR estimate.
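As a rough illustration of the carrier comparison in panel b, the sketch below computes a sample odds ratio from a 2x2 carrier table and a two-sided Fisher's exact P-value using only the Python standard library. Several things here are assumptions rather than the authors' method: the confidence interval is a Wald (log-OR) approximation, not the logistic-regression interval of panel a; the two-sided P sums all tables no more probable than the observed one, which is one common convention; and the counts are invented. The Bonferroni threshold uses the 42 tests per cohort stated for panel a.

```python
from math import comb, exp, log, sqrt


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Sample OR with an approximate 95% Wald interval.

    a, b: proband carriers / non-carriers
    c, d: sibling carriers / non-carriers (all counts must be > 0)
    """
    odds_ratio = (a * d) / (b * c)
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return odds_ratio, exp(log(odds_ratio) - z * se), exp(log(odds_ratio) + z * se)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P: sum of hypergeometric probabilities
    of all tables (same margins) no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Hypothetical counts, not taken from the paper.
or_est, ci_low, ci_high = odds_ratio_wald(30, 70, 15, 85)
p_nominal = fisher_two_sided(30, 70, 15, 85)
bonferroni_alpha = 0.05 / 42  # 42 tests per cohort, as in panel a
passes_bonferroni = p_nominal < bonferroni_alpha
```

An OR above 1 with a confidence interval excluding 1 corresponds to a point plotted to the right of the dashed line in the figure.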
Figure 3. Genetic properties of inherited LGD variant burden.
a, At least 95.4% of private, transmitted LGD variant burden resides outside of genes identified with an excess of DNMs in ASD/NDD cases (321 genes considered and 154 genes with transmissions) based on analysis of CCDG autism genomes (n = 4,201 affected and 2,191 unaffected children). We observe 141 DNM-enriched genes with transmissions to probands and 85 genes with transmissions to siblings (Supplementary Table 12). ORs for five cumulative pLI bins were compared before and after excluding DNM-enriched genes in ASD/NDD cases. The percentage of remaining burden is calculated as the quotient of the OR for the pLI bin after removing genes enriched for DNMs in ASD/NDD cases and the OR for all genes in that pLI bin. Families with monozygotic twins (n = 75) were excluded from this analysis. OR and associated P-values were calculated using a two-sided Fisher’s exact test. Points indicate the OR estimate, and error bars indicate the 95% confidence interval around the OR estimate. b, Multiplex families (n = 1,268 families, 2,691 probands, 533 siblings) show a higher burden of private, transmitted LGD variants in probands as compared to siblings across three pLI thresholds compared to simplex families (n = 7,962 families, 7,962 probands, 4,666 siblings). c, We observe a significant enrichment of probands carrying two private, transmitted LGD variants (2 LGD) when compared to unaffected siblings at various levels of gene constraint (3 cumulative pLI bins considered) based on CCDG genomes sequenced from autism families (n = 4,201 probands, 2,191 siblings). Families with monozygotic twins (n = 75) were excluded from this analysis. OR was calculated using a two-sided Fisher’s exact test, and reported P-values are Bonferroni corrected for nine (b) and three (c) tests (see Supplementary Tables 7 and 8 for details).
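The "percentage of remaining burden" in panel a is described as a quotient of two odds ratios for the same pLI bin. A minimal sketch of that arithmetic follows; the function name and the two OR values are hypothetical and chosen only to show the calculation, not to reproduce the reported 95.4%.

```python
def remaining_burden_pct(or_all_genes, or_without_dnm_genes):
    """Percentage of burden remaining after excluding DNM-enriched genes,
    taken as (OR after exclusion) / (OR for all genes) for one pLI bin."""
    return 100.0 * or_without_dnm_genes / or_all_genes


# Hypothetical ORs for a single pLI bin (not values from the paper).
pct = remaining_burden_pct(or_all_genes=1.20, or_without_dnm_genes=1.15)
```

With these made-up inputs the quotient is roughly 95.8%, meaning most of the excess in probands would persist after removing the DNM-enriched genes.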
Figure 4. PPI network for autism candidate genes.
We identified 163 constrained genes (pLI ≥ 0.99) that carry private LGD variants transmitted only to autism probands in the combined dataset and that were not previously identified as DNM-enriched ASD genes (Supplementary Table 9). The STRING network shows a significant excess of protein–protein interactions (PPI; P = 0.00164). Gene names are colored if observed in two (blue) or three or more (red) probands and labeled if observed in two independent families (*) or more (**). Families with monozygotic twins (n = 138) were removed from analysis. Analyses were restricted to regions with at least 20x average coverage in the exomes.
Figure 5. Estimate of allele age.
The software Relate was used to estimate the coalescent age (in generations) for private LGD (red) and SYN (blue) variants in 163 candidate genes, private LGD variants (green) in 83 sibling-only genes, and ~500 sites from all remaining genes for European probands (yellow, n = 3,776) and siblings (pink, n = 1,909). P-values were calculated using a two-sided t-test and Bonferroni corrected for six tests. Plot was truncated at 20 generations. Data points older than this are included in calculating represented statistics (e.g., boxplots, medians, P-values) but are not visualized. To view all data points, see Supplementary Figure 14. Boxplot whiskers extend to 1.5 times the interquartile range beyond the lower and upper hinges. Lower and upper hinges correspond to the 25th and 75th percentiles, and the middle line represents the median. Mean values are noted on the plot.
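The boxplot conventions in this caption (hinges at the 25th and 75th percentiles, whiskers limited to 1.5 times the interquartile range) can be sketched as follows. This is an assumption-laden illustration: the ages are invented, the quartile method ("inclusive" interpolation from Python's statistics module) may differ from the one used by the authors' plotting software, and whiskers are drawn to the most extreme data point inside the fences, which is a common convention but is not stated in the caption.

```python
from statistics import quantiles


def boxplot_stats(values):
    """Hinges, median, whisker ends, and outliers for one boxplot."""
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "lower_hinge": q1,          # 25th percentile
        "median": median,
        "upper_hinge": q3,          # 75th percentile
        "whisker_low": inside[0],   # most extreme point within the fences
        "whisker_high": inside[-1],
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


# Hypothetical allele ages in generations (not data from the paper).
ages = [1, 2, 3, 4, 5, 6, 7, 8, 30]
stats = boxplot_stats(ages)
```

Note that, as in the figure, a point such as the age of 30 still contributes to the quartiles even though a plot truncated at 20 generations would not display it.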

