JMIR Med Inform. 2021 Apr 7;9(4):e24754.
doi: 10.2196/24754.

Retracted: Diagnostic Classification and Prognostic Prediction Using Common Genetic Variants in Autism Spectrum Disorder: Genotype-Based Deep Learning

Haishuai Wang et al. JMIR Med Inform.

Abstract

Background: In the United States, about 3 million people have autism spectrum disorder (ASD), and around 1 out of 59 children is diagnosed with ASD. People with ASD have characteristic social communication deficits and repetitive behaviors. The causes of this disorder remain unknown; however, in up to 25% of cases, a genetic cause can be identified. Detecting ASD as early as possible is desirable because it enables timely intervention in affected children. Identification of ASD based on objective pathogenic mutation screening is a major first step toward early intervention and effective treatment.

Objective: Recent investigations have interrogated genomic data for detecting and treating autism disorders, in addition to using the conventional clinical interview as a diagnostic test. Because deep neural networks often perform better than shallow machine learning models on complex, high-dimensional data, we sought to apply deep learning to genetic data obtained across thousands of simplex families at risk for ASD to identify contributory mutations and to create an advanced diagnostic classifier for autism screening.

Methods: After preprocessing the genomics data from the Simons Simplex Collection, we extracted top-ranking common variants that may be protective or pathogenic for autism based on a chi-square test. A convolutional neural network-based diagnostic classifier was then designed using the identified significant common variants to predict autism. Its performance was then compared with that of shallow machine learning-based classifiers and with that obtained using randomly selected common variants.
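The variant-selection step can be illustrated with a minimal sketch. It assumes the 0/1/2 genotype coding described in the Figure 1 caption and a plain Pearson chi-square statistic on a diagnosis-by-genotype contingency table. The function names, the toy variant IDs, and the toy genotype calls are hypothetical and are not taken from the paper or from the Simons Simplex Collection.

```python
from collections import Counter


def encode_genotype(gt: str) -> int:
    """Code a VCF GT string: 0 = both reference, 2 = both alternate, 1 = otherwise."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        # Assumption: missing calls are rejected rather than imputed.
        raise ValueError(f"missing genotype call: {gt!r}")
    if all(a == "0" for a in alleles):
        return 0
    if all(a != "0" for a in alleles):
        return 2
    return 1


def chi_square(genotypes, labels):
    """Pearson chi-square statistic for a (label x genotype) contingency table."""
    observed = Counter(zip(labels, genotypes))
    row_totals = Counter(labels)
    col_totals = Counter(genotypes)
    n = len(labels)
    stat = 0.0
    for row, row_total in row_totals.items():
        for col, col_total in col_totals.items():
            expected = row_total * col_total / n
            stat += (observed[(row, col)] - expected) ** 2 / expected
    return stat


def rank_variants(coded, labels, top_k):
    """Return the top_k variant IDs ordered by descending chi-square score."""
    scores = {vid: chi_square(g, labels) for vid, g in coded.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]


# Toy data only: 4 cases (1) and 4 controls (0), two hypothetical variants.
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
toy_calls = {
    "rs_demo_a": ["1/1", "1/1", "0/1", "1/1", "0/0", "0/0", "0/1", "0/0"],
    "rs_demo_b": ["0/0", "0/1", "0/0", "0/1", "0/0", "0/1", "0/0", "0/1"],
}
toy_coded = {vid: [encode_genotype(g) for g in calls] for vid, calls in toy_calls.items()}
top_variants = rank_variants(toy_coded, toy_labels, top_k=1)
```

In this toy example the first variant differs between cases and controls while the second does not, so only the first would be retained as a candidate feature for the classifier.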

Results: The selected contributory common variants were significantly enriched in chromosome X, and chromosome Y was also discriminatory in distinguishing autistic from nonautistic individuals. The ARSD, MAGEB16, and MXRA5 genes had the largest effect among the contributory variants. Thus, screening algorithms were adapted to include these common variants. The deep learning model yielded an area under the receiver operating characteristic curve of 0.955 and an accuracy of 88% for distinguishing autistic from nonautistic individuals. Our classifier demonstrated a considerable improvement of ~13% in classification accuracy compared to standard autism screening tools.
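For readers unfamiliar with the area under the receiver operating characteristic curve, the following sketch shows one standard way to compute it: as the probability that a randomly chosen case receives a higher classifier score than a randomly chosen control, with ties counted as one half. The function name and the toy scores are hypothetical; this is not the evaluation code used in the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly; ties count 0.5."""
    case_scores = [s for y, s in zip(labels, scores) if y == 1]
    control_scores = [s for y, s in zip(labels, scores) if y == 0]
    if not case_scores or not control_scores:
        raise ValueError("both classes must be present")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Toy scores only: 3 cases (1) and 3 controls (0).
toy_auc = roc_auc([1, 1, 1, 0, 0, 0], [0.9, 0.8, 0.4, 0.5, 0.3, 0.2])
```

Here 8 of the 9 case-control pairs are ordered correctly, so the toy AUC is 8/9.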

Conclusions: Common variants are informative for autism identification. Our findings also suggest that the deep learning process is a reliable method for distinguishing the diseased group from the control group based on the common variants of autism.

Keywords: autism spectrum disorder; common genetic variants; diagnostic classification; deep learning.

Conflict of interest statement

Conflicts of Interest: None declared.

Figures

Figure 1
Overall framework for deciphering contributory common variants and predicting autism spectrum disorder diagnosis. A. Data preprocessing. VCF_GT recoding encodes VCF_GT values as dummy variables. If both alleles are reference alleles, the genotype is encoded as 0; if both alleles are alternate alleles, it is encoded as 2; otherwise, it is encoded as 1. B. Data split and significant variant selection. The data set was split into a training set and a test set. Variants were ranked based on their chi-score and P value, and only top-ranked variants (high chi-score and low P value) were selected as contributory common variants for autism spectrum disorder. C. Convolutional neural network classifier. The selected significant common variants in the training data were fed into a convolutional neural network to train a classifier. Thereafter, the trained model was applied to the test data for autism spectrum disorder diagnosis prediction. ASD: autism spectrum disorder; CNN: convolutional neural network; SSC: Simons Simplex Collection; VCF: variant call format; VCF_CQ: variant call format-conditional genotype quality; VCF_DP: variant call format-read depth; VCF_GT: variant call format-genotype.
Figure 2
A. Variants with high relative importance scores in the chi-square test. The Y-axis shows the variant IDs, and the X-axis shows the relative importance values of the corresponding variants. B. Visualization of the top 100 selected significant common variants using t-distributed stochastic neighbor embedding. Different colors represent different classes (ie, case and control). This visualization indicates that the 2 groups are differentiable using the selected top common variants. t-SNE: t-distributed stochastic neighbor embedding.
Figure 3
A. The area under the receiver operating characteristic curve of DeepAutism, random forest, logistic regression, and Naive Bayes for predicting autism spectrum disorder diagnosis based on the selected top 100 significant common variants on the test data. B. Visualization table describing the performance of the DeepAutism classifier on the test data. DeepAutism correctly predicted 697 out of 787 total samples and correctly predicted autism spectrum disorder in 423 out of 456 samples with autism spectrum disorder. AUC: area under the receiver operating characteristic curve; ASD: autism spectrum disorder; NB: Naive Bayes; LR: logistic regression; RF: random forest.
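The counts in the Figure 3 caption are enough to reconstruct a full confusion matrix, assuming the 787 test samples consist only of the 456 samples with autism spectrum disorder plus controls. The sketch below does that arithmetic; the function name is hypothetical, and the specificity and precision values are derived here from the caption's counts rather than reported in the source.

```python
def confusion_metrics(total, correct, pos_total, pos_correct):
    """Derive confusion-matrix cells and summary metrics from four reported counts."""
    tp = pos_correct                   # cases correctly predicted
    fn = pos_total - tp                # cases missed
    tn = correct - tp                  # controls correctly predicted
    fp = (total - pos_total) - tn      # controls predicted as cases
    return {
        "tp": tp, "fn": fn, "tn": tn, "fp": fp,
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
    }


# Counts taken from the Figure 3 caption.
fig3 = confusion_metrics(total=787, correct=697, pos_total=456, pos_correct=423)
```

With these inputs the derived cells are 423 true positives, 33 false negatives, 274 true negatives, and 57 false positives, giving an accuracy of about 0.886 (consistent with the 88% stated in the abstract), a sensitivity of about 0.928, and a specificity of about 0.828.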
