J Prev Alzheimers Dis. 2025 Aug;12(7):100169.
doi: 10.1016/j.tjpad.2025.100169. Epub 2025 Apr 16.

Using machine learning and electronic health record (EHR) data for the early prediction of Alzheimer's Disease and Related Dementias

Sonia Akter et al. J Prev Alzheimers Dis. 2025 Aug.

Abstract

Background: Over 6 million patients in the United States are affected by Alzheimer's Disease and Related Dementias (ADRD). Early detection of ADRD can significantly improve patient outcomes through timely treatment.

Objective: To develop and validate machine learning (ML) models for early ADRD diagnosis and prediction using de-identified EHR data from the University of Missouri (MU) Healthcare.

Design: Retrospective case-control study.

Setting: The study used de-identified EHR data provided by the MU NextGen Biomedical Informatics, modeled with the PCORnet Common Data Model (CDM).

Participants: An initial cohort of 380,269 patients aged 40 or older with at least two healthcare encounters was narrowed to a final dataset of 4,012 ADRD cases and 119,723 controls.

Methods: Six ML classifiers, Gradient-Boosted Trees (GBT), Light Gradient-Boosting Machine (LightGBM), Random Forest (RF), eXtreme Gradient-Boosting (XGBoost), Logistic Regression (LR), and Adaptive Boosting (AdaBoost), were evaluated using Area Under the Receiver Operating Characteristic Curve (AUC-ROC), accuracy, sensitivity, specificity, and F1 score. SHAP (SHapley Additive exPlanations) analysis was applied to interpret predictions.
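
For readers unfamiliar with the evaluation metrics named in the Methods, the following is a minimal sketch of how they can be computed for a binary classifier. It is not the authors' code: the function names, the 0.5 decision threshold, and the eight toy labels and scores are assumptions made purely for illustration, and AUC-ROC is computed here by the pairwise-ranking definition rather than by any particular library.

```python
from typing import Dict, Sequence


def _ratio(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity, specificity, and F1 from binary labels (1 = case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = _ratio(tp, tp + fn)   # true positive rate (recall)
    specificity = _ratio(tn, tn + fp)   # true negative rate
    precision = _ratio(tp, tp + fp)
    return {
        "accuracy": _ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": _ratio(2 * precision * sensitivity, precision + sensitivity),
    }


def auc_roc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC-ROC as the probability that a random case outscores a random control.

    Ties count as half a win. This O(n_pos * n_neg) form is only meant for
    small illustrative inputs.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return _ratio(wins, len(pos) * len(neg))


# Toy data (made up, not from the study): 3 cases and 5 controls.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = auc_roc(y_true, scores)
print(metrics, auc)
```

On this toy input the threshold-based metrics depend on the assumed cut-off, whereas AUC-ROC depends only on how the scores rank cases against controls. That is one reason AUC-ROC is a common headline metric when controls heavily outnumber cases, as in this cohort.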

Results: The GBT model achieved the best AUC-ROC scores, ranging from 0.809 to 0.833 across the 1- to 5-year prediction windows. SHAP analysis identified depressive disorder, the 80-90 and 70-80 year age groups, heart disease, and anxiety as key predictors, along with the novel risk factors of sleep apnea and headache.
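
To illustrate what a SHAP-style attribution represents, the sketch below computes exact Shapley values by brute force for a hypothetical three-feature risk score. Everything in it is an assumption for illustration only: the `toy_risk` function, its coefficients, and the feature ordering are invented and are not taken from the study, and practical SHAP tooling for tree models relies on faster specialized algorithms rather than enumerating all feature orderings.

```python
from itertools import permutations
from typing import Callable, List, Sequence


def shapley_values(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values of predict at x relative to a baseline input.

    For every ordering of the features, each feature is switched from its
    baseline value to its value in x, and the resulting change in the
    prediction is credited to that feature. Averaging over all orderings
    gives the Shapley value. Cost grows factorially with the feature count.
    """
    n = len(x)
    totals = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        previous = predict(current)
        for i in order:
            current[i] = x[i]
            updated = predict(current)
            totals[i] += updated - previous
            previous = updated
    return [t / len(orderings) for t in totals]


def toy_risk(features: Sequence[float]) -> float:
    """Hypothetical risk score with made-up coefficients (not from the study)."""
    depression, age_80_90, sleep_apnea = features
    return (
        0.05
        + 0.10 * depression
        + 0.20 * age_80_90
        + 0.05 * depression * age_80_90  # interaction term, shared between the two
        + 0.03 * sleep_apnea
    )


patient = [1, 1, 1]    # all three hypothetical risk factors present
reference = [0, 0, 0]  # baseline with none present

phi = shapley_values(toy_risk, patient, reference)
print(phi)
```

The attributions sum to the difference between the patient's score and the baseline score, and the interaction term is split evenly between the two features involved. The same additive property underlies SHAP feature rankings such as the top-12 features the authors report.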

Conclusion: This study underscores the potential of ML models for leveraging EHR data to enable early ADRD prediction, supporting timely interventions, and improving patient outcomes. By identifying both established and novel risk factors, these findings offer new opportunities for personalized screening and management strategies, advancing both clinical and informatics science.

Keywords: Alzheimer's disease; Dementias; Early prediction; Electronic health record data; Machine learning (ML).

Conflict of interest statement

Declaration of competing interest: The authors have no conflicts of interest to disclose.

Figures

Fig. 1
Flowchart for preparing the case-control study.
Fig. 2
Performance assessment of ML models in ADRD prediction. (Left) ROC curve analysis for the 5-year prediction window using six different ML models. (Right) ROC curve analysis for predictions across 1-, 2-, 3-, 4-, and 5-year windows. The GBT model, being the best performer, was used for the ROC plot.
Fig. 3
SHAP plots of the top-12 features for the GBT models (1-year to 5-year prediction windows).
