Mini-mental status examination phenotyping for Alzheimer's disease patients using both structured and narrative electronic health record features
- PMID: 39520712
- PMCID: PMC11648712
- DOI: 10.1093/jamia/ocae274
Abstract
Objective: This study aims to automate prediction of scores on the Mini-Mental State Examination (MMSE), a widely adopted standard for cognitive assessment in patients with Alzheimer's disease, by applying natural language processing (NLP) and machine learning (ML) to structured and unstructured electronic health record (EHR) data.
Materials and methods: We extracted demographic data, diagnoses, medications, and unstructured clinical visit notes from the EHRs. We used Latent Dirichlet Allocation (LDA) for topic modeling and Term Frequency-Inverse Document Frequency (TF-IDF) weighting to represent n-grams. In addition, we extracted meta-features such as age, ethnicity, and race. Models were trained and evaluated using eXtreme Gradient Boosting (XGBoost), a Stochastic Gradient Descent Regressor (SGDRegressor), and a Multi-Layer Perceptron (MLP).
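The TF-IDF n-gram representation mentioned above can be sketched in a few lines. This is a minimal, dependency-free illustration under stated assumptions, not the authors' pipeline: the abstract does not say how notes were tokenized, which n-gram range was used, or which IDF variant was applied. The sketch assumes lowercase alphanumeric tokens, unigrams plus bigrams, and the smoothed IDF formula that scikit-learn's `TfidfVectorizer` uses by default, idf(t) = ln((1 + N) / (1 + df(t))) + 1, followed by L2 normalization. The function names `ngrams` and `tfidf` are hypothetical.

```python
import math
import re
from collections import Counter


def ngrams(text: str, n_min: int = 1, n_max: int = 2) -> list[str]:
    """Lowercase, split into alphanumeric tokens, and emit word n-grams (n_min..n_max).

    Assumption: this tokenization is illustrative; the study's is not specified.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    grams = []
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i : i + n]))
    return grams


def tfidf(docs: list[str], n_min: int = 1, n_max: int = 2) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (n-gram -> weight) per document.

    TF is the raw n-gram count; IDF is the smoothed variant
    ln((1 + N) / (1 + df)) + 1; each vector is then scaled to unit L2 norm.
    """
    counts = [Counter(ngrams(d, n_min, n_max)) for d in docs]
    n_docs = len(docs)

    # Document frequency: number of documents containing each n-gram.
    df = Counter()
    for c in counts:
        df.update(c.keys())
    idf = {t: math.log((1 + n_docs) / (1 + k)) + 1.0 for t, k in df.items()}

    vectors = []
    for c in counts:
        vec = {t: tf * idf[t] for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values()))
        # An empty note has norm 0; leave its (empty) vector unnormalized.
        vectors.append({t: w / norm for t, w in vec.items()} if norm else vec)
    return vectors


# Hypothetical usage on two toy notes (not study data):
notes = ["Patient scored poorly on recall.", "Recall intact; patient oriented."]
vecs = tfidf(notes)
```

Here "patient" and "recall" occur in both toy notes, so they receive a lower IDF than note-specific n-grams such as "scored poorly"; the resulting sparse vectors, one per note, could then serve as input features for any of the regressors named above.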
Results: We analyzed 1654 clinical visit notes collected between September 2019 and June 2023 from 1000 patients with Alzheimer's disease. The mean MMSE score was 20; patients had a mean age of 76.4 years, 54.7% were female, and 54.7% identified as White. The best-performing model, defined as the one with the lowest root mean squared error (RMSE), was the MLP, which achieved an RMSE of 5.53 on the validation set using n-gram features, outperforming the other model and feature-set combinations. Its RMSE on the test set was 5.85.
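The selection rule described above (choose the model and feature set with the lowest validation RMSE, then report its RMSE on the held-out test set) can be illustrated with a small sketch. RMSE is the square root of the mean squared difference between observed and predicted scores, so it is expressed in MMSE points on the instrument's 0 to 30 scale. The helper names `rmse` and `select_best`, the labels, and the toy scores below are hypothetical and do not come from the study.

```python
import math


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error, in the same units as the target (MMSE points)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def select_best(val_rmse: dict[str, float]) -> str:
    """Return the key (a model/feature-set label) with the lowest validation RMSE."""
    return min(val_rmse, key=val_rmse.get)


# Hypothetical usage with toy numbers (not study data):
val_rmse = {"model_a + features_x": 6.2, "model_b + features_y": 5.7}
best = select_best(val_rmse)                  # -> "model_b + features_y"
toy_error = rmse([20, 25, 18], [22, 24, 18])  # sqrt(5/3), about 1.291
```

Selecting on the validation set and reporting the test-set RMSE only for the chosen configuration keeps the test estimate from being tuned on, which is why the reported test RMSE can be somewhat higher than the validation RMSE.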
Discussion: This study developed an ML method to predict MMSE scores from unstructured clinical notes, demonstrating the feasibility of using NLP to support cognitive assessment. Future work should focus on refining the model and evaluating its clinical relevance across diverse settings.
Conclusion: We contributed a model for automating MMSE estimation using EHR features, potentially transforming cognitive assessment for Alzheimer's patients and paving the way for more informed clinical decisions and cohort identification.
Keywords: Alzheimer’s disease; electronic health records; machine learning; natural language processing; phenotyping.
© The Author(s) 2024. Published by Oxford University Press on behalf of the American Medical Informatics Association. All rights reserved. For permissions, please email: journals.permissions@oup.com.
Conflict of interest statement
None declared.
Similar articles
- Comparison of Two Modern Survival Prediction Tools, SORG-MLA and METSSS, in Patients With Symptomatic Long-bone Metastases Who Underwent Local Treatment With Surgery Followed by Radiotherapy and With Radiotherapy Alone. Clin Orthop Relat Res. 2024 Dec 1;482(12):2193-2208. doi: 10.1097/CORR.0000000000003185. Epub 2024 Jul 23. PMID: 39051924
- Assess the documentation of cognitive tests and biomarkers in electronic health records via natural language processing for Alzheimer's disease and related dementias. Int J Med Inform. 2023 Feb;170:104973. doi: 10.1016/j.ijmedinf.2022.104973. Epub 2022 Dec 21. PMID: 36577203
- Selegiline for Alzheimer's disease. Cochrane Database Syst Rev. 2003;(1):CD000442. doi: 10.1002/14651858.CD000442. PMID: 12535396
- Mini-Cog for the diagnosis of Alzheimer's disease dementia and other dementias within a primary care setting. Cochrane Database Syst Rev. 2018 Feb 22;2(2):CD011415. doi: 10.1002/14651858.CD011415.pub2. Update in: Cochrane Database Syst Rev. 2021 Jul 14;7:CD011415. doi: 10.1002/14651858.CD011415.pub3. PMID: 29470861
- Galantamine for dementia due to Alzheimer's disease and mild cognitive impairment. Cochrane Database Syst Rev. 2024 Nov 5;11(11):CD001747. doi: 10.1002/14651858.CD001747.pub4. PMID: 39498781
Cited by
- Improving Phenotyping of Patients With Immune-Mediated Inflammatory Diseases Through Automated Processing of Discharge Summaries: Multicenter Cohort Study. JMIR Med Inform. 2025 Apr 9;13:e68704. doi: 10.2196/68704. PMID: 40203304
- Leveraging long context in retrieval augmented language models for medical question answering. NPJ Digit Med. 2025 May 2;8(1):239. doi: 10.1038/s41746-025-01651-w. PMID: 40316710