iScience. 2024 Aug 13;27(9):110709.
doi: 10.1016/j.isci.2024.110709. eCollection 2024 Sep 20.

Personalized identification of autism-related bacteria in the gut microbiome using explainable artificial intelligence

Pierfrancesco Novielli et al. iScience.

Abstract

Autism spectrum disorder (ASD) affects social interaction and communication. Emerging evidence links ASD to gut microbiome alterations, suggesting that microbial composition may play a role in the disorder. This study employs explainable artificial intelligence (XAI) to examine the contributions of individual microbial species to ASD. By using local explanation embeddings and unsupervised clustering, the research identifies distinct ASD subgroups, underscoring the disorder's heterogeneity. Specific microbial biomarkers associated with ASD are revealed, and the best classifiers achieved an AU-ROC of 0.965 ± 0.005 and an AU-PRC of 0.967 ± 0.008. The findings support the notion that gut microbiome composition varies significantly among individuals with ASD. This work's broader significance lies in its potential to inform personalized interventions, enhancing precision in ASD management and classification. These insights highlight the importance of individualized microbiome profiles for developing tailored therapeutic strategies for ASD.

Keywords: Developmental neuroscience; Microbiology; Microbiome; Neuroscience.

Conflict of interest statement

The authors declare no competing interests.

Figures

Graphical abstract
Figure 1
Comparative performance metrics of ML models across 50 runs with statistical significance annotations (Boxplots of each metric for the three models, Decision Tree, Random Forest, and XGB, across 50 runs of the algorithms. Panel (a) shows Accuracy, (b) Sensitivity, (c) Specificity, (d) Precision, (e) AU-ROC, and (f) AU-PRC. Asterisks (∗) mark the significance level of the comparison between two distributions, obtained using the Mann-Whitney U test: ∗ for p-values between 0.01 and 0.05, ∗∗ for p-values between 0.001 and 0.01, ∗∗∗ for p-values between 0.0001 and 0.001, and ∗∗∗∗ for p-values less than 0.0001. The Decision Tree consistently performs worse than the other two models. The Random Forest outperforms the others in Accuracy and Sensitivity, while XGB outperforms the others in Specificity, Precision, and AU-PRC).
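The significance annotation described in this legend can be sketched in a few lines. The following is an illustrative sketch only, not the authors' code: it computes a two-sided Mann-Whitney p-value with the normal approximation (no tie or continuity correction, which is a simplification) and maps it to the asterisk levels listed above. The function names are hypothetical, and the handling of p-values that fall exactly on a threshold is an assumption, since the legend does not specify it. In practice a library routine such as `scipy.stats.mannwhitneyu` would normally be used.

```python
import math
from itertools import product


def mann_whitney_p(a, b):
    """Two-sided p-value from the normal approximation of the U statistic.

    Simplification: no tie correction and no continuity correction.
    """
    n1, n2 = len(a), len(b)
    # U counts the pairs in which the value from `a` exceeds the value
    # from `b`; tied pairs contribute one half.
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x, y in product(a, b))
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u
    # Two-sided tail probability of a standard normal variable.
    return math.erfc(abs(z) / math.sqrt(2))


def significance_stars(p):
    """Map a p-value to the asterisk levels used in the legend.

    Assumption: the upper bound of each interval is treated as inclusive.
    An empty string means "not significant at 0.05".
    """
    if p < 0.0001:
        return "****"
    if p <= 0.001:
        return "***"
    if p <= 0.01:
        return "**"
    if p <= 0.05:
        return "*"
    return ""


# Hypothetical accuracy values for two models over 50 runs each.
model_a = [0.90 + 0.001 * i for i in range(50)]
model_b = [0.80 + 0.001 * i for i in range(50)]
print(significance_stars(mann_whitney_p(model_a, model_b)))
```

With 50 runs per model, the normal approximation is usually adequate; for very small samples an exact test would be the safer choice.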
Figure 2
XGB feature importance plot (Feature importance plot obtained using XGBoost. The image displays the top 20 features ranked by importance, based on cross-validation repeated 50 times. For each feature, a boxplot shows the distribution of its importance values across the folds of the cross-validation process, indicating the relative significance of these features in the predictive model).
Figure 3
SHAP summary plot with violin plots (SHAP summary plot showing, for each feature, a violin plot of its SHAP values. Each point represents the Shapley value of one subject for one feature, with the y axis indicating the feature and the x axis the Shapley value itself. The color gradient reflects the feature value, ranging from low to high. The features are ordered by mean importance, with more important features positioned toward the top. The violin shapes illustrate the distribution and density of the SHAP values for each feature, including their range, variability, skewness, symmetry, and multimodality).
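To make the notion of a per-subject Shapley value concrete, here is a minimal sketch that computes exact Shapley values for one subject by enumerating all feature coalitions. It is not the authors' pipeline: SHAP libraries use far more efficient, model-specific algorithms for tree ensembles. The toy scoring function, the three-feature abundance profile, and the all-zero baseline are hypothetical and chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features absent from a coalition are set to their baseline value.
    The cost grows as 2**n, so this only works for a handful of features.
    """
    n = len(x)

    def value(coalition):
        point = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(point)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                # Marginal contribution of feature i to this coalition.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_score(v):
    """Hypothetical model: two linear terms plus one interaction."""
    return 2 * v[0] - v[1] + v[0] * v[2]


subject = [0.3, 0.1, 0.2]   # hypothetical relative abundances of three OTUs
baseline = [0.0, 0.0, 0.0]  # hypothetical reference profile
phi = shapley_values(toy_score, subject, baseline)
print(phi)
```

The values satisfy the efficiency property: they sum to the difference between the model output for the subject and for the baseline, which is what lets each one be read as that feature's share of the prediction.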
Figure 4
SHAP values bar plots (SHAP values bar plots for a child with autism spectrum disorder (ASD) (A) and a typically developing (TD) child (B). Each plot identifies the most influential features and their impact on the ASD or TD classification. The SHAP values shown for these subjects come from a single iteration of the pipeline workflow, but the average SHAP values per subject are similar to these results).
Figure 5
PCA biplot of relative abundance data for TD and ASD subjects (PCA biplot displaying relative abundance data, where blue dots correspond to TD subjects and orange dots represent individuals with ASD. The OTUs with the largest PC loadings can be seen in the figure, showing a strong correlation with the principal components. The x and y axes represent the first two principal components (PC1 and PC2), with their explained variance ratio).
Figure 6
PCA biplot of SHAP values in TD and ASD subjects (PCA biplot of SHAP values, where blue dots correspond to TD subjects and orange dots represent individuals with ASD. Bacteria linked to autism spectrum disorder (ASD), including OTU625, OTU976, OTU1301, OTU390, and OTU1225, exhibit high PC loadings, indicating robust associations with the principal components. The x and y axes represent the first two principal components (PC1 and PC2), with their explained variance ratio).
Figure 7
PCA plots of SHAP values with ASD probability overlay (PCA plots illustrating SHAP values overlaid with ASD probability, revealing a distinct pattern of escalating ASD likelihood from the right to the left side of PC1. The x and y axes represent the first two principal components (PC1 and PC2), with their explained variance ratio).
Figure 8
K-means clustering outcome on local explanation embeddings for ASD individuals (Outcome of K-means clustering applied to the local explanation embeddings of individuals with ASD. The x and y axes represent the first two principal components (PC1 and PC2), with their relative explained variance).
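As a rough illustration of clustering on local explanation embeddings, the sketch below runs a plain Lloyd-style K-means on a few made-up vectors of per-subject SHAP values. Everything specific here is an assumption: the embedding values are invented, the number of clusters (2) is arbitrary, and the deterministic farthest-first initialisation is used only to keep the example reproducible (library implementations typically use k-means++ with random restarts). It is not the authors' implementation.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with a deterministic farthest-first initialisation."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        # Next centroid: the point farthest from its nearest chosen centroid.
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(farthest))

    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical SHAP vectors (two features) for six subjects with ASD.
embeddings = [
    [0.9, 0.1], [1.0, 0.0], [0.8, 0.2],
    [-0.1, 0.7], [0.0, 0.9], [0.1, 0.8],
]
labels, centroids = kmeans(embeddings, k=2)
print(labels)
```

Clustering in explanation space groups subjects by which features drive their prediction rather than by raw abundance, which is the motivation for using SHAP vectors as the embedding.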
Figure 9
Boxplots of ASD probability distributions across different clusters (Boxplots of the distributions of ASD probabilities among different clusters).
Figure 10
Schematic workflow of the performed analyses
