[Preprint]. 2024 Aug 26:2024.08.25.609610.
doi: 10.1101/2024.08.25.609610.

Predicting Alzheimer's Cognitive Resilience Score: A Comparative Study of Machine Learning Models Using RNA-seq Data

Akihiro Kitani et al. bioRxiv.

Abstract

Alzheimer's disease (AD) is an important research topic. While amyloid plaques and neurofibrillary tangles are hallmark pathological features of AD, cognitive resilience (CR) is a phenomenon where cognitive function remains preserved despite the presence of these pathological features. This study aimed to construct and compare predictive machine learning models for CR scores using RNA-seq data from the Religious Orders Study and Memory and Aging Project (ROSMAP) and Mount Sinai Brain Bank (MSBB) cohorts. We evaluated support vector regression (SVR), random forest, XGBoost, linear, and transformer-based models. The SVR model exhibited the best performance, with contributing genes identified using Shapley additive explanations (SHAP) scores, providing insights into biological pathways associated with CR. Finally, we developed a tool called the resilience gene analyzer (REGA), which visualizes SHAP scores to interpret the contributions of individual genes to CR. REGA is available at https://igcore.cloud/GerOmics/REsilienceGeneAnalyzer/.

Keywords: Alzheimer’s disease; Shapley additive explanations; machine learning; resilience gene analyzer; transcriptomics.

Figures

Fig. 1.
Overview of the data analysis performed in this study. RNA-seq datasets from the ROSMAP [39] and MSBB [40] cohorts were analyzed independently. The target variable was the resilience score, and the features were the expression levels of genes that were highly correlated with the resilience score. The models used in the present work included linear models, SVR [43], random forest [44], XGBoost [45], and transformer-based models [46,47]. Learning was performed using 5-fold cross-validation. The evaluation metrics were RMSE and R², and gene contributions were calculated using the SHAP scores [41] in the best-performing model.
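The workflow in Fig. 1 (correlation-based feature selection, 5-fold cross-validation, RMSE and R² as metrics) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: it uses NumPy only, fits an ordinary least-squares linear model rather than SVR, and runs on synthetic data. The function names, the choice of 10 features, and the decision to select features inside each training fold are all assumptions made for the example.

```python
import numpy as np

def select_top_correlated(X, y, k):
    """Indices of the k columns of X with the largest |Pearson r| against y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    r = (Xc * yc[:, None]).sum(axis=0) / np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.argsort(-np.abs(r))[:k]

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def cross_validate_linear(X, y, k_features=10, n_folds=5, seed=0):
    """Return a list of (RMSE, R2) pairs, one per held-out fold."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    scores = []
    for i, test_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        # Assumption: features are chosen on the training fold only, to avoid leakage.
        cols = select_top_correlated(X[train_idx], y[train_idx], k_features)
        A_train = np.column_stack([np.ones(len(train_idx)), X[train_idx][:, cols]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx][:, cols]])
        coef, *_ = np.linalg.lstsq(A_train, y[train_idx], rcond=None)
        pred = A_test @ coef
        scores.append((rmse(y[test_idx], pred), r2(y[test_idx], pred)))
    return scores

# Synthetic stand-in for expression data: 200 samples, 50 "genes",
# with the target driven by the first three columns plus noise.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 50))
y = X[:, :3] @ np.array([1.0, -0.5, 0.25]) + 0.1 * rng.normal(size=200)
scores = cross_validate_linear(X, y)
```

Each entry of `scores` holds the held-out RMSE and R² for one fold, which is the kind of per-fold value that the n = 5 comparisons in Figs. 2 and 3 are built from.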
Fig. 2.
Prediction performance of different machine learning models using various feature sets in the MSBB study data. RMSE for each model is shown for the test data, with error bars representing standard error. Statistical significance was assessed using the Kruskal–Wallis test followed by Dunn’s multiple comparison test, with p-values adjusted by Bonferroni correction (*p < 0.05, **p < 0.01, ***p < 0.001; n = 5 for 5-fold cross-validation).
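The Bonferroni adjustment and star notation used in this caption amount to simple arithmetic: each raw p-value is multiplied by the number of comparisons (capped at 1) and then compared with the three thresholds. The short sketch below shows only that step; it does not implement the Kruskal–Wallis or Dunn's tests, and the raw p-values are hypothetical numbers chosen for illustration, not results from the study.

```python
def bonferroni(p_values):
    """Bonferroni-adjust p-values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def stars(p_adj):
    """Map an adjusted p-value to the caption's significance stars."""
    if p_adj < 0.001:
        return "***"
    if p_adj < 0.01:
        return "**"
    if p_adj < 0.05:
        return "*"
    return ""

# Hypothetical raw p-values from four pairwise comparisons (illustrative only).
raw = [0.0002, 0.004, 0.03, 0.5]
adjusted = bonferroni(raw)
labels = [stars(p) for p in adjusted]
```

With four comparisons, a raw p-value of 0.03 is adjusted to 0.12 and no longer receives a star, which illustrates how conservative the correction can be when only five folds per model are available.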
Fig. 3.
Prediction performance of different machine learning models using various feature sets in the ROSMAP study data. RMSE for each model is shown for the test data, with error bars representing standard error. Statistical significance was assessed using the Kruskal–Wallis test followed by Dunn’s multiple comparison test, with p-values adjusted by Bonferroni correction (*p < 0.05, **p < 0.01, ***p < 0.001; n = 5 for 5-fold cross-validation).
Fig. 4.
SHAP values of the best models for predicting resilience scores in the (A) MSBB and (B) ROSMAP cohorts. SHAP values were calculated for 100 data points, with each dot representing an individual patient’s data.
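To make concrete what a per-sample SHAP value represents, the sketch below computes exact Shapley values by brute force for a toy three-feature model. It is an illustration under stated assumptions, not the procedure used in the study: the model, the sample, and the background data are invented, the `shapley_values` helper is hypothetical, and features outside a coalition are replaced by their background mean, which is a simplification of how absent features are usually handled. Brute-force enumeration is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, background):
    """Exact Shapley values for one sample x.

    Features outside a coalition are set to the background mean
    (a simplifying assumption made for this example).
    """
    n = len(x)
    base = [sum(row[j] for row in background) / len(background) for j in range(n)]

    def value(subset):
        z = [x[j] if j in subset else base[j] for j in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size that exclude feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = set(coalition)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(z):
    """Hypothetical linear 'resilience score' model over three genes."""
    return 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[2]

background = [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]]  # background mean is [1, 1, 1]
sample = [3.0, 0.0, 1.0]
phi = shapley_values(toy_model, sample, background)
baseline = toy_model([1.0, 1.0, 1.0])
```

For this linear toy model each contribution equals the coefficient times the feature's deviation from the background mean, and the contributions sum to the difference between the sample's prediction and the baseline prediction.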
Fig. 5.
Visualization of gene contribution to the prediction of resilience scores. The REGA tool interface for visualizing the contribution of individual genes to resilience score predictions is shown. This tool allows users to select a dataset (MSBB or ROSMAP) and specific genes for the analysis. The left panel shows the selection options for datasets and genes. The top-right panel shows the distribution of SHAP scores for all genes, with a dashed line indicating the importance score for the selected gene. The bottom-right panel shows the SHAP scores for individual samples, indicating the contribution of the selected genes to the resilience score predictions.
