[Preprint]. 2025 Mar 28:2025.03.24.644676.
doi: 10.1101/2025.03.24.644676.

Integrating Interpretable Machine Learning and Multi-omics Systems Biology for Personalized Biomarker Discovery and Drug Repurposing in Alzheimer's Disease


Mohammadsadeq Mottaqi et al. bioRxiv.

Abstract

Background: Alzheimer's disease (AD) is a complex neurodegenerative disorder whose substantial molecular variability across brain regions and individuals hinders therapeutic development. This study introduces PRISM-ML, an interpretable machine learning (ML) framework that integrates multi-omics data to uncover patient-specific biomarkers, subtissue-level pathology, and drug repurposing opportunities.

Methods: We harmonized transcriptomic and genomic data from three independent brain studies comprising 2,105 post-mortem brain samples (1,363 AD, 742 controls) across nine tissues. A Random Forest classifier with SHapley Additive exPlanations (SHAP) identified patient-level biomarkers. Clustering then partitioned each tissue into subtissues, and network analysis revealed critical "bottleneck" (hub) genes. Finally, knowledge graph-based screening identified multi-target drug candidates, and a real-world pharmacoepidemiologic study evaluated their clinical relevance.
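To make the patient-level biomarker step concrete, the sketch below turns a matrix of per-sample SHAP values into one biomarker set per sample. It assumes the SHAP values have already been computed (in practice, for example, with the `shap` package's `TreeExplainer` on a fitted Random Forest) and uses a hypothetical rule, |SHAP| at or above a fixed `cutoff`, to call a gene a biomarker for that sample; the abstract does not state the selection criterion actually used. The function names and toy gene identifiers are illustrative only.

```python
from typing import Dict, Set

# Hypothetical selection rule: a gene is a biomarker for a sample when the
# absolute SHAP value of that gene, for that sample, meets `cutoff`.
# This is an assumption for illustration, not the paper's stated criterion.


def patient_biomarkers(
    shap_values: Dict[str, Dict[str, float]],  # sample_id -> {gene: SHAP value}
    cutoff: float = 0.01,                      # hypothetical threshold
) -> Dict[str, Set[str]]:
    """Return, for each sample, the genes whose |SHAP| >= cutoff."""
    return {
        sample: {gene for gene, value in genes.items() if abs(value) >= cutoff}
        for sample, genes in shap_values.items()
    }


def mean_biomarker_count(biomarkers: Dict[str, Set[str]]) -> float:
    """Average number of biomarkers per sample (0.0 for an empty input)."""
    if not biomarkers:
        return 0.0
    return sum(len(genes) for genes in biomarkers.values()) / len(biomarkers)


if __name__ == "__main__":
    # Toy SHAP values; sign indicates push toward AD (+) or control (-).
    toy = {
        "s1": {"G1": 0.05, "G2": -0.02, "G3": 0.001},
        "s2": {"G1": 0.03, "G2": 0.004, "G3": -0.015},
    }
    sets = patient_biomarkers(toy, cutoff=0.01)
    print(sets)                        # s1 -> {G1, G2}, s2 -> {G1, G3}
    print(mean_biomarker_count(sets))  # 2.0
```

Using the absolute value keeps genes that push a prediction toward either class; a per-sample rank cutoff would be an equally plausible alternative.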

Results: We uncovered 36 molecularly distinct subtissues, each defined by its own set of unique biomarkers and genetic drivers. Analysis of subtissue-specific gene-gene interaction networks highlighted 262 bottleneck genes enriched in synaptic, cytoskeletal, and membrane-associated processes. Knowledge graph queries identified six FDA-approved drugs predicted to target multiple bottleneck genes and AD-relevant pathways simultaneously. One candidate, promethazine, was associated with reduced AD incidence in a large healthcare dataset of over 364,000 individuals (hazard ratios ≤ 0.43; p < 0.001). These findings underscore the potential of multi-target approaches, reveal connections between AD and cardiovascular pathways, and offer new insights into the heterogeneous biology of AD.
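As a reading aid for the reported effect size, assuming the hazard ratios come from Cox proportional-hazards models (the abstract does not name the model), each hazard ratio is the exponentiated coefficient of the exposure indicator $x$ (1 for promethazine users, 0 for the comparison group). A hazard ratio of at most 0.43 therefore means the estimated instantaneous AD hazard among users is at most 43% of the comparison group's, a relative reduction of at least 57%:

```latex
\begin{aligned}
h(t \mid x) &= h_0(t)\, e^{\beta x}, \qquad
\mathrm{HR} = \frac{h(t \mid x = 1)}{h(t \mid x = 0)} = e^{\beta}, \\
\mathrm{HR} \le 0.43 \;&\Longleftrightarrow\; \beta \le \ln 0.43 \approx -0.84,
\qquad 1 - 0.43 = 0.57 .
\end{aligned}
```

This is an association under the model's assumptions (proportional hazards, adequate adjustment), not a causal estimate.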

Conclusions: PRISM-ML bridges interpretable ML with multi-omics and systems biology to decode AD heterogeneity, revealing region-specific mechanisms and repurposable therapeutics. The validation of promethazine in real-world data underscores the clinical relevance of multi-target strategies, paving the way for more personalized treatments in AD and other complex disorders.

Keywords: Biological Network; Computational Biology; Drug Repurposing; GWAS; Personalized Medicine; Transcriptomics.


Conflict of interest statement

Competing interests: The authors declare that they have no competing interests.

Figures

Figure 1. PRISM-ML: integrating systems biology, multi-omics, and machine learning for Alzheimer’s disease drug repurposing.
(a) Multi-omics integration and subtissue identification: Bulk RNA-seq and GWAS data from 2,105 post-mortem brain samples (1,363 AD patients, 742 controls) across nine tissues were harmonized. A Random Forest classifier with SHAP analysis identified an average of 175 patient-specific biomarkers per sample. Unsupervised clustering stratified each tissue into four molecularly distinct subtissues (36 in total). Subtissue-specific biomarkers, i.e., high-impact genes shared across samples within each subtissue cluster, were derived by intersecting patient-level biomarker sets. Mutation rate analysis of AD-associated SNPs revealed subtissue-specific genetic drivers. (b) Subtissue-specific networks and bottleneck genes: Subtissue-specific gene-gene interaction networks connect biomarkers and genetic drivers via critical intermediate “message-passing” genes. Topological metrics (e.g., betweenness centrality) prioritized 262 high-centrality bottleneck genes enriched in synaptic transmission, ion transport, and extracellular matrix organization. (c) Drug repurposing and real-world validation: Knowledge-graph screening flagged six FDA-approved drugs (e.g., promethazine, disopyramide) that target multiple bottleneck genes. In a cohort of 364,733 individuals, promethazine use was associated with a significantly reduced AD risk (HRs ≤ 0.43; p < 0.001).
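The derivation of subtissue-specific biomarkers described in panel (a), intersecting the patient-level biomarker sets of all samples assigned to the same subtissue cluster, can be sketched as follows. The inputs (`patient_sets` and `cluster_of`) and the cluster labels such as `TCX_1` are hypothetical; cluster assignments are assumed to come from the unsupervised clustering step, which is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Set


def subtissue_biomarkers(
    patient_sets: Dict[str, Set[str]],  # sample_id -> patient-level biomarkers
    cluster_of: Dict[str, str],         # sample_id -> subtissue (cluster) label
) -> Dict[str, Set[str]]:
    """Intersect the biomarker sets of all samples in each subtissue cluster."""
    members: Dict[str, List[Set[str]]] = defaultdict(list)
    for sample, genes in patient_sets.items():
        members[cluster_of[sample]].append(genes)
    # Every cluster key has at least one set, so set.intersection(*sets) is safe.
    return {label: set.intersection(*sets) for label, sets in members.items()}


if __name__ == "__main__":
    patient_sets = {
        "a": {"G1", "G2", "G3"},
        "b": {"G2", "G3"},
        "c": {"G1", "G4"},
        "d": {"G1"},
    }
    # Hypothetical labels: two clusters within one tissue.
    cluster_of = {"a": "TCX_1", "b": "TCX_1", "c": "TCX_2", "d": "TCX_2"}
    print(subtissue_biomarkers(patient_sets, cluster_of))
    # TCX_1 -> {G2, G3}, TCX_2 -> {G1}
```

A strict intersection shrinks quickly as clusters grow, which is consistent with the modest per-subtissue counts reported; a looser "shared by at least a fraction of samples" rule would be a different, hypothetical variant.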
Figure 2. Sample-level and tissue-specific biomarkers reveal regionally conserved molecular dysregulation in Alzheimer’s disease.
(a) t-SNE visualization of all 2,105 post-mortem brain samples (AD and controls), colored by tissue of origin, illustrating the overall distribution before filtering. (b) Scaled SHAP value distributions for representative genes, ranked by their contribution to the Random Forest model’s AD vs. control classification. Each point corresponds to one sample, showing how individual genes differentially influence the model’s output. (c) t-SNE projection of the 720 AD samples whose predictive score exceeded the average (i.e., “confidently classified” AD cases), the final subset used for tissue-specific biomarker analysis. (d) Bar plot of the number of “common biomarkers” found in each brain region (i.e., high-impact genes shared by all confidently classified AD samples within that tissue). TCX and PCC exhibit the largest sets. (e) Heatmap of the overlap of common biomarkers among tissues; darker cells denote more shared genes. (f–h) GSEA results for the union of all tissue-specific biomarkers (56 genes): (f) enrichment in membrane-binding activities (integrin, heme, channel, heparin, symporter); (g) associations with secreted proteins and extracellular matrix organization; and (h) processes involved in transport, including ion and potassium transport. Collectively, these results support the biological relevance of the machine learning-derived biomarkers, pinpointing pathways central to AD pathogenesis.
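Two steps from this figure lend themselves to a short sketch: keeping only “confidently classified” AD samples (panel c) and counting the genes shared by each pair of tissues (panel e). It assumes the predictive score is the model’s AD probability and that the average is taken over the AD samples supplied; both are assumptions, as is the toy label `TissueX`. The function names are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Set, Tuple


def confident_ad_samples(ad_scores: Dict[str, float]) -> List[str]:
    """Keep AD samples whose predicted AD score is strictly above the mean.

    Assumption: the mean is computed over the AD samples passed in.
    """
    threshold = mean(ad_scores.values())
    return sorted(s for s, score in ad_scores.items() if score > threshold)


def pairwise_overlap(
    tissue_sets: Dict[str, Set[str]],  # tissue -> common biomarkers
) -> Dict[Tuple[str, str], int]:
    """Number of common biomarkers shared by each pair of tissues."""
    return {
        (a, b): len(tissue_sets[a] & tissue_sets[b])
        for a, b in combinations(sorted(tissue_sets), 2)
    }


if __name__ == "__main__":
    scores = {"s1": 0.9, "s2": 0.6, "s3": 0.8, "s4": 0.5}
    print(confident_ad_samples(scores))  # mean ~0.7 -> ['s1', 's3']
    common = {"TCX": {"G1", "G2", "G3"}, "PCC": {"G2", "G3"}, "TissueX": {"G4"}}
    print(pairwise_overlap(common))
    # {('PCC', 'TCX'): 2, ('PCC', 'TissueX'): 0, ('TCX', 'TissueX'): 0}
```

The overlap dictionary maps directly onto the cells of a symmetric heatmap; the diagonal would simply be each tissue’s set size.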
Figure 3. Subtissue-level biomarker discovery, functional enrichment, and genetic driver analysis in AD.
(a) Bar plot of the number of shared biomarkers (y-axis) identified by intersecting patient-level gene sets within each of the 36 subtissues (x-axis), ranging from 4 to 80 per subtissue. (b) Heatmap of the overlap of biomarker sets across subtissues; darker cells indicate more shared genes. (c) Disease enrichment analysis (top six diseases) based on the union of subtissue-specific biomarkers, highlighting comorbid or mechanistically related conditions (e.g., type 2 diabetes, chronic renal failure, and breast cancer). (d–f) Representative Gene Ontology (GO) and Reactome enrichment results, showing that the aggregated biomarkers are enriched in processes related to cell transport, differentiation, angiogenesis, extracellular matrix organization, and carbohydrate metabolism. (g) Volcano plot from differential gene expression analysis comparing AD vs. control samples (|log2 fold-change| > 0.5 and Bonferroni-adjusted p < 0.05). Biomarkers previously identified by the machine learning approach (colored points) show varied expression patterns, indicating that statistical and ML-driven methods are complementary. (h) Bar plot of subtissue-specific genetic drivers from an analysis of 96 AD-associated genes in GWAS data. Chi-square and permutation tests detected significant differences in mutation rates relative to background frequencies, underscoring the heterogeneous genetic architecture underlying AD pathophysiology across distinct brain regions.
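Panel (h) compares mutation rates in a subtissue against background frequencies with chi-square and permutation tests. The sketch below shows one way to do both for a single SNP, assuming the “mutation rate” is the fraction of samples carrying the variant and the background is all remaining samples; these definitions, the function names, and the toy counts are hypothetical. The chi-square version is a Pearson 2×2 test without continuity correction.

```python
import math
import random
from typing import Tuple


def chi2_2x2(carriers_sub: int, n_sub: int,
             carriers_bg: int, n_bg: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) on a 2x2 carrier table.

    Returns (statistic, p_value) with 1 degree of freedom.
    """
    a, b = carriers_sub, n_sub - carriers_sub
    c, d = carriers_bg, n_bg - carriers_bg
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:  # degenerate table: no evidence either way
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # Chi-square(1) survival function: P(X >= x) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))


def permutation_p(carriers_sub: int, n_sub: int, carriers_bg: int, n_bg: int,
                  n_perm: int = 1000, seed: int = 0) -> float:
    """Two-sided permutation p-value for the difference in carrier rates."""
    observed = abs(carriers_sub / n_sub - carriers_bg / n_bg)
    total = carriers_sub + carriers_bg
    pooled = [1] * total + [0] * (n_sub + n_bg - total)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)              # randomly relabel samples
        k = sum(pooled[:n_sub])          # carriers labeled "subtissue"
        diff = abs(k / n_sub - (total - k) / n_bg)
        if diff >= observed - 1e-12:     # tolerance for float ties
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy counts: 30/100 carriers in a subtissue vs 100/1000 in background.
    stat, p = chi2_2x2(30, 100, 100, 1000)
    print(round(stat, 2), p)
    print(permutation_p(30, 100, 100, 1000))
```

Across many SNPs and subtissues, the resulting p-values would still need a multiple-testing correction, which this sketch leaves out.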
Figure 4. Subtissue-specific gene-gene interaction networks, identification of critical bottleneck genes, and their functional relevance in AD.
(a) Weighted gene co-expression network (magenta) from a representative neocortex subtissue, capturing all genes expressed in that cluster. (b) The same network filtered to include only biomarkers (orange nodes), genetic drivers (green nodes), and intermediate “message-passing” genes (blue nodes). (c) Hub genes (red nodes) identified by multiple centrality measures (e.g., degree, betweenness, PageRank). (d) Bar chart of the number of intermediate bridging genes for each of the 36 subtissues; one subtissue was excluded for insufficient samples. (e) Variation in the counts of critical bottleneck genes (i.e., highly ranked novel hub genes) across subtissues, reflecting regional heterogeneity. (f) STRING-based protein–protein interaction map of the final 262 bottleneck genes, illustrating their interconnectivity within the broader human proteome. (g–j) Functional enrichment analyses of these bottleneck genes: cellular component (g), molecular function (h), Reactome pathway (i), and disease associations (j). The enriched terms highlight roles in membrane localization, synapse organization, ion channel activity, signal transduction, and cytoskeletal regulation, as well as comorbid conditions such as type 2 diabetes. (k) Forest plot of hazard ratios for AD in promethazine versus cyproheptadine users under different adjustment methods, supporting the therapeutic relevance of one identified drug candidate in real-world patient data. (l) Molecular structures of the six repurposed candidate drugs. (m) Design of the real-world pharmacoepidemiologic study.
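Of the centrality measures named in panel (c), betweenness is the one most directly tied to the “bottleneck” idea: it counts how many shortest paths between other genes pass through a node. The sketch below implements Brandes’ algorithm for an unweighted, undirected graph and ranks genes by score. It simplifies the weighted co-expression networks described above to unweighted edges, and the helper names and toy edge list are hypothetical; it is not the paper’s scoring pipeline, which combines several measures.

```python
from collections import deque
from typing import Dict, Iterable, List, Set, Tuple


def build_graph(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from an edge list."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def betweenness(adj: Dict[str, Set[str]]) -> Dict[str, float]:
    """Brandes' betweenness centrality (unweighted, undirected, unnormalized)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest s-v paths
        sigma[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:                    # BFS from source s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in bc.items()}


def top_bottlenecks(adj: Dict[str, Set[str]], k: int) -> List[str]:
    """The k genes with highest betweenness (ties broken by name)."""
    bc = betweenness(adj)
    return sorted(bc, key=lambda g: (-bc[g], g))[:k]


if __name__ == "__main__":
    g = build_graph([("G1", "G2"), ("G2", "G3"), ("G3", "G4"), ("G3", "G5")])
    print(betweenness(g))         # G3: 5.0, G2: 3.0, others 0.0
    print(top_bottlenecks(g, 2))  # ['G3', 'G2']
```

For weighted networks, the BFS would be replaced by Dijkstra’s algorithm over edge distances (for example, an inverse of co-expression strength), which is a further modeling choice.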
