PLoS Comput Biol. 2024 Nov 8;20(11):e1012468.
doi: 10.1371/journal.pcbi.1012468. eCollection 2024 Nov.

A deep learning model for prediction of autism status using whole-exome sequencing data

Qing Wu et al. PLoS Comput Biol.

Abstract

Autism is a developmental disability. Research has demonstrated that children with autism benefit from early diagnosis and early intervention. Genetic factors are considered major contributors to the development of autism. Machine learning (ML), including deep learning (DL), has been evaluated for phenotype prediction, but its application to autism has been limited. We developed a DL model, the Separate Translated Autism Research Neural Network (STAR-NN), to predict autism status. The model was trained and tested using whole exome sequencing data from 43,203 individuals (16,809 individuals with autism and 26,394 non-autistic controls). Polygenic scores from common variants and the aggregated counts of rare variants on each gene were used as input. In STAR-NN, protein truncating variants, possibly damaging missense variants and mild effect missense variants on the same gene were separated at the input level and then merged into one gene node. In this way, rare variants with different levels of pathogenic effect were treated separately. We further validated the performance of STAR-NN using an independent dataset of 13,827 individuals with autism and 14,052 non-autistic controls. STAR-NN achieved a modest ROC-AUC of 0.7319 on the testing dataset and 0.7302 on the independent dataset, and it outperformed traditional ML models. Gene Ontology analysis of the selected gene features showed enrichment for potentially informative pathways, including calcium ion transport.
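The three-to-one idea (three variant classes on one gene feeding a single gene node) can be illustrated with a toy forward pass. This is a minimal sketch under stated assumptions, not the published STAR-NN implementation: the dictionary-based weights, the ReLU gene nodes, the sigmoid output, the way the polygenic score (PGS) is added, and the gene name `GENE_X` are all hypothetical, and the optional gene-set layer is omitted.

```python
import math
from typing import Dict

# Hypothetical sketch of a "three-to-one" gene node: each gene receives three
# separate inputs (rare PTV, MisAB and MisC counts), each with its own weight,
# merged into ONE gene-node activation. Activations and weights are assumptions.
VARIANT_CLASSES = ("PTV", "MisAB", "MisC")


def gene_node(counts: Dict[str, int], weights: Dict[str, float], bias: float) -> float:
    """Merge one gene's three per-class counts into a single ReLU activation."""
    z = bias + sum(weights[c] * counts.get(c, 0) for c in VARIANT_CLASSES)
    return max(0.0, z)


def predict(
    gene_counts: Dict[str, Dict[str, int]],
    gene_weights: Dict[str, Dict[str, float]],
    gene_bias: Dict[str, float],
    output_weights: Dict[str, float],
    pgs: float,
    pgs_weight: float,
    output_bias: float = 0.0,
) -> float:
    """Combine gene-node activations with the PGS input into a score in (0, 1)."""
    z = output_bias + pgs_weight * pgs
    for gene, w_out in output_weights.items():
        node = gene_node(gene_counts.get(gene, {}), gene_weights[gene], gene_bias[gene])
        z += w_out * node
    return 1.0 / (1.0 + math.exp(-z))


# Toy usage with made-up weights for one hypothetical gene "GENE_X".
w = {"PTV": 2.0, "MisAB": 0.5, "MisC": 0.1}
score = predict(
    gene_counts={"GENE_X": {"PTV": 1, "MisC": 2}},
    gene_weights={"GENE_X": w},
    gene_bias={"GENE_X": -0.5},
    output_weights={"GENE_X": 0.8},
    pgs=0.3,
    pgs_weight=0.6,
)
```

Giving each variant class its own weight lets one truncating variant count for more than one mild missense variant on the same gene, whereas a model fed a single pooled count per gene would have to share one weight across all three classes. In a real model these weights would be learned during training rather than set by hand.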

Conflict of interest statement

No competing interests.

Figures

Fig 1. The workflow and framework of STAR-NN.
After quality control, rare variants (minor allele frequency, MAF < 1%) identified from whole exome sequencing data were separated into four categories based on their functional effect: protein truncating variants (PTVs), MisA (missense variants with MPC > 2), MisB (missense variants with 1 < MPC < 2) and MisC (missense variants with 0 < MPC < 1). MisA and MisB were then combined as MisAB. The resulting three types of rare exonic variants (PTV, MisAB and MisC) were used as input for the STAR-NN model. In addition, a polygenic score (PGS) generated from common variants (MAF > 1%) in microarray data was used as input for STAR-NN. STAR-NN uses a three-to-one mapping strategy to learn the different types of variants on the same gene separately. G represents a gene node; the grey S (shaded circle) represents the optional gene-set node before the final output. *, Quality control on WES1 and WES2 used the same standards; further details are provided in Materials and Methods. #, numbers in brackets give (the number of variants, in the number of individuals) in the dataset.
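The variant categorization in Fig 1 can be sketched as a small filter-and-count step. This is a minimal Python illustration, not the authors' pipeline: the `Variant` record and its field names are hypothetical, and because the caption uses strict inequalities, the handling of MPC values of exactly 1 or 2 is an assumption (here MPC = 2 goes to MisAB and MPC = 1 to MisC).

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Variant:
    """Hypothetical per-variant record for one individual."""
    gene: str
    maf: float            # minor allele frequency
    is_ptv: bool          # protein truncating variant?
    mpc: Optional[float]  # MPC score for missense variants, None otherwise


def categorize(v: Variant) -> Optional[str]:
    """Return 'PTV', 'MisAB' or 'MisC' for a rare variant, else None."""
    if v.maf >= 0.01:
        return None  # common variant: represented by the PGS, not counted here
    if v.is_ptv:
        return "PTV"
    if v.mpc is None:
        return None  # neither PTV nor scored missense: not used as input
    if v.mpc > 1:
        # MisA (MPC > 2) and MisB (1 < MPC < 2) merged; MPC == 2 lands here (assumption)
        return "MisAB"
    if v.mpc > 0:
        # MisC (0 < MPC < 1); MPC == 1 lands here (assumption)
        return "MisC"
    return None


def count_per_gene(variants: Iterable[Variant]) -> Counter:
    """Aggregate rare-variant counts per (gene, category) for one individual."""
    counts: Counter = Counter()
    for v in variants:
        category = categorize(v)
        if category is not None:
            counts[(v.gene, category)] += 1
    return counts
```

The resulting per-gene counts for the three categories are the kind of input the three-to-one gene nodes would consume.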
Fig 2. Performance of STAR-NN.
A. ROC-AUC plot showing that STAR-NN outperformed six traditional machine learning models and a basic deep neural network (DNN) model. Variants of different types were not separated in the traditional machine learning models or the basic DNN. B. ROC-AUC plot showing that STAR-NN with selected gene features outperformed models using other gene sets as input. C. Density plot of PGS for individuals with autism and non-autistic controls. D. Distribution of the scores generated by STAR-NN for individuals with autism and non-autistic controls.
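ROC-AUC, the metric in panels A and B, equals the probability that a randomly chosen individual with autism receives a higher score than a randomly chosen control, with ties counted as one half. The minimal Python sketch below computes that definition directly; it is quadratic in sample size, so for datasets of the size described here a rank-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as P(score of a random case > score of a random control).

    labels: 1 for individuals with autism, 0 for controls. Ties count 0.5.
    Runs in O(n_cases * n_controls), fine for a small illustration.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    controls = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in cases:
        for n in controls:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(cases) * len(controls))


# Three of the four case/control pairs are ranked correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

A score of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation, which is why values around 0.73 are described as modest.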
Fig 3. Score from STAR-NN in male and female population.
Density plots of PGS for individuals with autism and non-autistic controls in females (A) and males (B). Density plots of the autism score generated by STAR-NN in females (C) and males (D). The dashed line shows the mean of each distribution.
