Brain Commun. 2025 Mar 3;7(2):fcaf097.
doi: 10.1093/braincomms/fcaf097. eCollection 2025.

Unraveling the role of proteins in dementia: insights from two UK cohorts with causal evidence

Jessica Gong et al. Brain Commun.

Abstract

Population-based proteomics offers a groundbreaking avenue to predict future disease risks, enhance our understanding of disease mechanisms, and discover novel therapeutic targets and biomarkers. The role of plasma proteins in dementia, however, requires further exploration. This study investigated 276 protein-dementia associations in 229 incident all-cause dementia, 89 Alzheimer's disease, and 41 vascular dementia cases among 3249 participants (55% women, 97.2% White ethnicity) from the English Longitudinal Study of Ageing (ELSA) over a median 9.8-year follow-up. We used Cox proportional hazard regression for the analysis. Receiver operating characteristic analyses, summarized by the area under the curve, were conducted to assess how accurately the proteins identified in the fully adjusted Cox regression models predicted incident all-cause dementia, both individually and in combination with demographic predictors, APOE genotype, and memory score. Additionally, the eXtreme Gradient Boosting machine learning algorithm was used to identify the most important features predictive of future all-cause dementia onset. These associations were then validated in 1506 incident all-cause dementia, 732 Alzheimer's disease, 281 vascular dementia, and 111 frontotemporal dementia cases among 52 745 individuals (53.9% women, 93.3% White ethnicity) from the UK Biobank over a median 13.7-year follow-up. Two-sample bi-directional Mendelian randomization and drug target Mendelian randomization were further employed to determine the causal direction between protein concentration and dementia. NEFL (hazard ratio [HR] [95% confidence interval (CI)]: 1.54 [1.29, 1.84]) and RPS6KB1 (HR [95% CI]: 1.33 [1.16, 1.52]) were robustly associated with incident all-cause dementia, and MMP12 (HR [95% CI]: 2.06 [1.41, 2.99]) was associated with vascular dementia in ELSA, after correcting for multiple testing. Additional markers EDA2R and KIM1 were identified from subgroup and sensitivity analyses.
Combining NEFL and RPS6KB1 with other predictors yielded high predictive accuracy (area under the curve = 0.871) for incident all-cause dementia. The eXtreme Gradient Boosting machine learning algorithm also identified RPS6KB1, NEFL, and KIM1 as the most important protein features for predicting future all-cause dementia. A sex difference was evident in the association between RPS6KB1 and all-cause dementia, with a stronger association in men (P for interaction = 0.037). Replication in the UK Biobank confirmed the associations between the identified proteins and various dementia subtypes. The results from Mendelian randomization in the reverse direction indicated that several proteins serve as early markers for dementia, rather than being direct causes of the disease. These findings provide insights into putative mechanisms for dementia. Future studies are needed to validate the findings on RPS6KB1 in relation to dementia risk.
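As an aid to reading the hazard ratios reported above, the following minimal sketch recovers the log hazard ratio, an approximate standard error, and a Wald z statistic from a hazard ratio and its 95% CI. It assumes the interval is a symmetric Wald interval on the log scale (estimate ± 1.96 standard errors), which the abstract does not state. The function name `wald_summary` is hypothetical and is not taken from the study's analysis code.

```python
import math

Z_95 = 1.959964  # two-sided 95% normal quantile (assumes a Wald interval)


def wald_summary(hr, ci_low, ci_high):
    """Return (log_hr, standard_error, z) implied by an HR and its 95% CI.

    Assumption: the CI was constructed as exp(log_hr +/- Z_95 * se).
    """
    if not (0 < ci_low < ci_high and ci_low <= hr <= ci_high):
        raise ValueError("expected 0 < ci_low < ci_high with hr inside the interval")
    log_hr = math.log(hr)
    # The CI width on the log scale spans 2 * Z_95 standard errors.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_95)
    return log_hr, se, log_hr / se


# NEFL and incident all-cause dementia in ELSA, as reported in the abstract.
log_hr, se, z = wald_summary(1.54, 1.29, 1.84)
print(f"log HR = {log_hr:.3f}, SE = {se:.3f}, z = {z:.2f}")
```

Because the published bounds are rounded to two decimals, the recovered standard error is only approximate.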

Keywords: ELSA; Mendelian randomization; UK Biobank; dementia; proteomics.

Conflict of interest statement

Olink had no part in designing the study or analyzing the data. The authors declare no conflicts of interest.

Figures

Graphical Abstract
Figure 1
Volcano plot showing hazard ratios (x-axis) and two-sided P values (y-axis) for the association of protein concentration with incident all-cause dementia using imputed data. The x-axis displays the hazard ratios from Cox proportional hazard regression models adjusted for age, sex, education, ethnicity, smoking status, depression, cardiovascular disease, body mass index, systolic blood pressure, and low-density lipoprotein (LDL) cholesterol, in a sample of 3249 participants. The y-axis displays the nominal, uncorrected P value (−log10). Proteins above the horizontal dotted red line were significantly associated with incident all-cause dementia (FDR-corrected P value < 0.05).
Figure 2
Predictive accuracy of NEFL and RPS6KB1, alone or in combination with demographic variables, apolipoprotein E ε4 (APOE ε4) status, and memory score, for all-cause dementia. The area under the curve (AUC) of each receiver operating characteristic (ROC) curve illustrates the performance of models with different sets of variables in predicting incident all-cause dementia in a sample of 3249 participants. Demographic variables were sex, age, education, and ethnicity. Memory score was a combined test score of immediate and delayed recall.
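For readers less familiar with the AUC, the sketch below computes it from its rank interpretation: the probability that a randomly chosen case receives a higher predicted risk than a randomly chosen non-case, with ties counted as one half. The function name `roc_auc` and the toy scores are hypothetical. This is a generic illustration, not the software or model used in the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (case, non-case) pairs ranked correctly.

    scores: predicted risks; labels: 1 for incident dementia, 0 otherwise.
    Ties between a case and a non-case count as half a correct pair.
    """
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    correct = 0.0
    for c in cases:
        for n in controls:
            if c > n:
                correct += 1.0
            elif c == n:
                correct += 0.5
    return correct / (len(cases) * len(controls))


# Toy data (hypothetical): two non-cases followed by two cases.
print(roc_auc([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]))
```

The pairwise loop is quadratic in the sample size, which is fine for illustration. Library implementations use a sort-based calculation instead.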
Figure 3
Protein importance ranking using the XGBoost decision tree-based machine learning algorithm and SHAP visualization for selected features on all-cause dementia. (A) SHapley Additive exPlanations (SHAP) values from the eXtreme Gradient Boosting (XGBoost) model for the top 20 selected features in a sample of 3249 participants. The y-axis lists the features in order of importance, ranked from top to bottom. The x-axis represents the SHAP value, which indicates the degree of change in log odds. The width of each horizontal bar shows the extent of the feature's contribution to the prediction of all-cause dementia. The colour of each point represents the value of the corresponding feature. Points towards the right of the x-axis indicate a higher likelihood of developing all-cause dementia, and points towards the left indicate a higher likelihood of remaining free of dementia. (B) Mean absolute SHAP values for the top 20 selected features derived from the XGBoost model in a sample of 3249 participants.
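To make the log-odds scale of the SHAP axis concrete, the sketch below shows how additive contributions combine into a predicted probability. It assumes a binary logistic model whose SHAP values are expressed in log odds, as the caption describes. The base value and per-feature contributions are hypothetical numbers chosen for illustration, not values from the study.

```python
import math


def predicted_probability(base_value, shap_values):
    """Add SHAP contributions on the log-odds scale, then apply the logistic function."""
    log_odds = base_value + sum(shap_values)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical participant: positive values push the prediction towards dementia.
contributions = {"NEFL": 0.6, "RPS6KB1": 0.3, "KIM1": -0.1}
base_value = -2.0  # hypothetical average log odds of the model output
p = predicted_probability(base_value, contributions.values())
print(f"predicted probability = {p:.3f}")
```

A contribution of a given size therefore changes the predicted probability by different amounts depending on where the other features have already placed the participant on the log-odds scale.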
Figure 4
Forest plots for the associations between proteins identified in ELSA and dementia and dementia subtypes, validated in the UK Biobank. Multivariable-adjusted hazard ratios and 95% confidence intervals (95% CIs) from Cox proportional hazard regression models for NEFL, KIM1 (HAVCR1), MMP12, and EDA2R and their associations with: (A) all-cause dementia; (B) Alzheimer’s disease; (C) vascular dementia; (D) frontotemporal dementia. All models were adjusted for age, sex, education, ethnicity, smoking status, depression, cardiovascular disease, body mass index, systolic blood pressure, and low-density lipoprotein (LDL) cholesterol, in a sample of 52 745 participants. P values were FDR corrected.
Figure 5
Enrichment analysis of the identified proteins in Genotype-Tissue Expression (GTEx) 2023, Illuminating the Druggable Genome (IDG) drug target 2022, and Proteomics Drug Atlas (PDA) 2023. Proteins that were significant after FDR correction (denoted PFDR) in minimally and fully adjusted Cox proportional hazard regression models were submitted to Enrichr (https://maayanlab.cloud/enrichr/) for enrichment analysis. The full list of proteins from ELSA was used as the background gene set. Terms above the horizontal dotted line were enriched (FDR-corrected P value < 0.05), and their labels are highlighted in red.
Figure 6
Enrichment analysis of the functional annotations of the identified proteins in Gene Ontology (GO) 2023 (GO_MF: Gene Ontology Molecular Function), Kyoto Encyclopaedia of Genes and Genomes (KEGG) 2021, and Reactome pathways 2022. Proteins that were significant after FDR correction (denoted PFDR) in minimally and fully adjusted Cox proportional hazard regression models were submitted to Enrichr (https://maayanlab.cloud/enrichr/) for enrichment analysis. The full list of proteins from ELSA was used as the background gene set. Terms above the horizontal dotted line were enriched (FDR-corrected P value < 0.05), and their labels are highlighted in red.
