[Preprint]. 2024 Aug 16:2024.08.15.24312078.
doi: 10.1101/2024.08.15.24312078.

Decomposition of phenotypic heterogeneity in autism reveals distinct and coherent genetic programs

Aviya Litman et al. medRxiv.

Abstract

Unraveling the phenotypic and genetic complexity of autism is extremely challenging yet critical for understanding the biology, inheritance, trajectory, and clinical manifestations of the many forms of the condition. Here, we leveraged broad phenotypic data from a large cohort with matched genetics to characterize classes of autism and their patterns of core, associated, and co-occurring traits, ultimately demonstrating that phenotypic patterns are associated with distinct genetic and molecular programs. We used a generative mixture modeling approach to identify robust, clinically relevant classes of autism, which we validate and replicate in a large independent cohort. We link the phenotypic findings to distinct patterns of de novo and inherited variation that emerge from the deconvolution of these genetic signals, and demonstrate that class-specific common variant scores strongly align with clinical outcomes. We further provide insights into the distinct biological pathways and processes disrupted by the sets of mutations in each class. Remarkably, we discover class-specific differences in the developmental timing of genes that are dysregulated, and these temporal patterns correspond to clinical milestone and outcome differences between the classes. These analyses embrace the phenotypic complexity of children with autism, unraveling genetic and molecular programs underlying their heterogeneity and suggesting specific biological dysregulation patterns and mechanistic hypotheses.

Figures

Figure 1.
a, Study design for parsing the phenotypic heterogeneity of autism and deciphering the genetic factors contributing to individual presentations. A matrix of probands by phenotypes was constructed, with 239 features describing item-level and composite (summary score) phenotypic measure data (developmental milestones, repetitive behavior, social communication, and emotional and behavioral traits) across 5,392 individuals with complete feature data. The 5392 × 239 matrix was used as input to a general finite mixture model to learn the distributions of the latent classes in the input data. We describe four data-driven classes of autism which exhibit distinct phenotypic presentations and trait patterns. We externally validated the classes by showing that their profiles align with clinical data not included in the model training, and performed downstream genetic analyses to associate genetic factors and patterns with each phenotype class. Created with BioRender. b, To demonstrate differences in phenotype patterns across the four classes, we assess the propensity towards each of 7 phenotype domains with clinical significance for autism. Features were assigned to domains based on published factor analyses of the included questionnaires. For each class, we computed the significance of enrichment in both directions (enriched, depleted) for each feature. Values close to 1 indicate that the majority or all phenotypes within the category are significantly and positively enriched for a given phenotype domain (indicating higher difficulties for that class), and values close to −1 indicate significant negative enrichment or depletion for a given phenotype domain (indicating lower difficulties). The resulting classes are: Broadly Impacted (magenta, n=554), Social/Behavioral (green, n=1860), Mixed ASD with DD (blue, n=1002), and Moderate Challenges (orange, n=1976). 
c, Distributions of two key developmental milestones: age when first walked and age when first used words (both in months) across the four classes, with non-autistic siblings as a control. One-sided independent t-test with multiple hypothesis correction was used to determine significance of enrichment for each class compared to siblings (* indicates FDR < 0.1, ** indicates FDR < 0.05, and *** indicates FDR < 0.01 in all figures). d, Individual total scores from the SCQ by class, with non-autistic siblings as a control. The final SCQ score is quantified on a 0–39 scale, with higher scores indicating greater impairment. One-sided independent t-tests with multiple hypothesis correction were used to determine significance of enrichment. Center lines in all boxplots represent the median, box limits represent the upper and lower quartiles, whiskers extend to show the rest of the distribution, and outliers are shown separately.
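The star annotations used throughout the figures can be illustrated with a minimal sketch. It assumes the multiple hypothesis correction is the Benjamini-Hochberg procedure named in the Figure 2 caption; the function names and the input p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values (FDR) in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def stars(fdr):
    """Map an FDR value to the star convention stated in the caption."""
    if fdr < 0.01:
        return "***"
    if fdr < 0.05:
        return "**"
    if fdr < 0.1:
        return "*"
    return ""


# Hypothetical one-sided p-values, one per class-versus-sibling comparison.
raw_p = [0.001, 0.02, 0.03, 0.5]
fdr = benjamini_hochberg(raw_p)
labels = [stars(q) for q in fdr]
```

The t-test itself is left out so that the sketch depends only on the standard library; any routine that yields one-sided p-values could feed `benjamini_hochberg`.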
Figure 2
a, Clinical validation of classes with external medical diagnoses across three categories: neurodevelopmental, mental health, and co-occurring conditions. We computed the fold enrichment (FE, x-axis) and Benjamini-Hochberg corrected statistical significance (FDR) for a selection of available diagnoses in the Basic Medical Questionnaire for each class. Open circles indicate lack of statistical significance (FDR > 0.05), while closed circles indicate significant enrichments for a diagnosis, and are colored by the corresponding phenotype class. All statistical comparisons were computed with non-autistic siblings as background. The dotted line indicates FE = 1. b, External validation of classes with additional parent-reported data from background history and medical history questionnaires. Four variables of interest are displayed: language level at enrollment (parent report with 4 levels reflecting language abilities: 0 = Nonverbal, 1 = Single words, 2 = Phrases, 3 = Sentences), total number of interventions probands in each class have had (including options like medication, physical therapy, social skills groups, speech therapy, recreational therapy, and counseling, among others), cognitive impairment at enrollment (binary indicator of a diagnosis of intellectual disability or cognitive impairment), and diagnosis age in months. Box plot and raw data were plotted for continuous variables, while bar plots showing mean and standard errors were plotted for the binary and categorical variables. c, Replication of phenotype classes in an independent cohort – the Simons Simplex Collection. An independent model was trained on the SPARK dataset for 108 features which matched across the measures available for the two cohorts, and was then applied to the SSC dataset. Class labels were obtained for all SSC individuals who had complete data across the 108 features (n = 861). 
Enrichment and depletion of features within each class were computed, and the proportion and direction of enrichment for each of the seven phenotype categories were obtained. The resulting proportions for each category across the 4 classes from SPARK and SSC were correlated (Pearson r, x-axis).
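As a rough illustration of the replication comparison, the sketch below computes a signed proportion per phenotype category and correlates the values between two cohorts. The caption does not give the exact formula, so the signed proportion used here (significantly enriched features minus significantly depleted features, divided by all features in the category) is an assumption chosen to match the stated range of −1 to 1. All names and numbers are hypothetical.

```python
import math


def signed_proportion(directions):
    """Summarize one phenotype category for one class.

    `directions` holds one entry per feature: +1 if significantly enriched,
    -1 if significantly depleted, 0 if not significant (assumed encoding).
    """
    return sum(directions) / len(directions)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Hypothetical signed proportions for one category across the four classes.
spark = [0.9, 0.4, -0.2, -0.8]
ssc = [0.8, 0.5, -0.1, -0.9]
r = pearson_r(spark, ssc)
```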
Figure 3
a, Polygenic scores (PGS) for ASD GWAS and related phenotypes and conditions. PGS were normalized by the mean of sibling scores within each condition. b, Count per offspring of high-impact de novo variants (left) and high-impact rare inherited variants (right) across all protein-coding genes. High-impact variants are defined as variants predicted to be either high-confidence loss of function (LoF) or likely pathogenic missense. c, Analysis of evolutionarily constrained genes across autism classes and non-autistic siblings. Using the gene-centric measure of evolutionary constraint, pLI, we assigned genes with pLI > 0.5 to one of two classes: pLI > 0.995 (higher constraint genes), or 0.5 <= pLI < 0.995 (lower constraint genes). Count burdens (de novo LoF) per offspring were then computed for each class. In all figures, circles indicate the mean and lines show the standard error for each class. Statistical significance is computed with a one-sided independent t-test to compare each class to non-autistic siblings. One star (*) indicates FDR < 0.1, two stars (**) indicate FDR < 0.05, and three stars (***) indicate FDR < 0.01.
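The pLI binning in panel c can be sketched as follows. The thresholds are taken from the caption as written; the function names, gene identifiers, and pLI values are hypothetical, and treating a gene without a pLI value as unassigned is an assumption.

```python
from collections import Counter


def constraint_class(pli):
    """Bin a gene by pLI using the thresholds stated in the caption."""
    if pli > 0.995:
        return "higher"
    if 0.5 <= pli < 0.995:
        return "lower"
    # Below 0.5, or exactly 0.995, is unassigned under the bounds as written.
    return None


def dnlof_burden_by_constraint(hit_genes, pli_by_gene):
    """Count one offspring's de novo LoF hits in each constraint class."""
    counts = Counter({"higher": 0, "lower": 0})
    for gene in hit_genes:
        # Assumption: a gene with no pLI value is treated as unassigned.
        label = constraint_class(pli_by_gene.get(gene, 0.0))
        if label is not None:
            counts[label] += 1
    return dict(counts)


# Hypothetical pLI table and hypothetical de novo LoF hits for one offspring.
pli_table = {"GENE_A": 0.999, "GENE_B": 0.8, "GENE_C": 0.1}
burden = dnlof_burden_by_constraint(["GENE_A", "GENE_B", "GENE_C"], pli_table)
```

Averaging these per-offspring counts within each class would give the means plotted as circles.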
Figure 4
a, Scatter plot displaying enrichment versus significance of de novo LoF (dnLoF) burden for each class and gene set. We extracted 7 relevant gene sets and computed the aggregated dnLoF burden for each individual across every gene set. Log-transformed fold change (x-axis) and q-values (y-axis) were computed relative to non-autistic siblings using a one-sided independent t-test. Gene sets in classes with FDR > 0.05 are shown below the dotted line (FDR = 0.05) and are indicated by open shapes. b, Odds ratios (y-axis) across classes for ASD risk genes (left) and FMRP target genes (right). We show similar trends in other autism-specific gene sets (Supplementary Fig. 5). Odds ratios for de novo synonymous (dnSyn) variation are also displayed for ASD risk genes. c, Top cell biological processes and molecular functions (GO) enriched for dnLoF in each autism class. Gene sets for GO enrichment analyses include all protein-coding genes impacted by high-confidence de novo loss of function or pathogenic missense variants present in individuals from each class. The plots display fold enrichment values (x-axis) and log-transformed FDR values (bubble size). Terms were selected by FDR and sorted by fold enrichment. For the Moderate Challenges, Social/Behavioral, and Mixed ASD with DD classes, an FDR cutoff of 0.05 was used, while a cutoff of 0.1 was used for the Broadly Impacted class due to its smaller sample size. Shaded boxes represent GO biological processes, and unshaded boxes represent GO molecular functions.
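The x-axis quantity in panel a can be illustrated with a small sketch. The caption says only that the fold change is log-transformed, so both the base (2) and the definition of fold change as a ratio of mean per-individual burdens are assumptions. The function name and counts are hypothetical.

```python
import math


def log2_fold_change(class_burdens, sibling_burdens):
    """Log2 ratio of mean per-individual dnLoF burden, class versus siblings."""
    class_mean = sum(class_burdens) / len(class_burdens)
    sibling_mean = sum(sibling_burdens) / len(sibling_burdens)
    if class_mean == 0 or sibling_mean == 0:
        # The log ratio is undefined when either group has no variants.
        raise ValueError("mean burden must be non-zero in both groups")
    return math.log2(class_mean / sibling_mean)


# Hypothetical per-individual dnLoF counts within one gene set.
probands = [1, 0, 1, 0]
siblings = [0, 0, 1, 0]
lfc = log2_fold_change(probands, siblings)
```

A positive value indicates a higher mean burden in the class than in siblings; whether it is significant is determined separately by the q-values on the y-axis.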
Figure 5
a, Trends from Herring et al. representing the gene expression trajectories of brain development genes differentially expressed across developmental stages. Gene expression trajectories follow one of four general patterns: “Up” (first), “Trans Up” (second), “Trans Down” (third), “Down” (fourth). Trends are measured across the 6 stages of development (x-axis): fetal, neonatal, infancy, childhood, adolescence, and adulthood. b, Patterns of de novo LoF variant enrichment across classes (x-axis), major cell types of the prefrontal cortex (y-axis), and gene expression trends (y-axis). For each class, we computed the fold enrichment (bubble size) and corrected p-values (FDR) of variant burden compared to non-autistic siblings. Open circles indicate FDR > 0.05 (not significant), and closed circles indicate significant enrichment (FDR ≤ 0.05). Each column is colored by the corresponding phenotypic class color, with purple representing the combined pool of all probands. Cell type and trend combinations with no significant enrichment in any class are not shown.
