Development of a machine learning model to predict mild cognitive impairment using natural language processing in the absence of screening

Affiliations

¹ Kaiser Permanente Washington Health Research Institute, 1730 Minor Ave., Suite 1600, Seattle, WA, 98101, USA. robert.b.penfold@kp.org.
² Kaiser Permanente Washington Health Research Institute, 1730 Minor Ave., Suite 1600, Seattle, WA, 98101, USA.
³ Janssen Research and Development, LLC, Raritan, USA.

PMID: 35549702
PMCID: PMC9097352
DOI: 10.1186/s12911-022-01864-z

Robert B Penfold et al. BMC Med Inform Decis Mak. 2022 May 12;22(1):129. doi: 10.1186/s12911-022-01864-z.

Abstract

Background: Patients and their loved ones often report symptoms or complaints of cognitive decline that clinicians record in free-text clinical notes, but no structured screening or diagnostic data are recorded. These symptoms and complaints may be signals that predict who will go on to be diagnosed with mild cognitive impairment (MCI) and ultimately develop Alzheimer's disease or related dementias. Our objective was to develop a natural language processing (NLP) system and prediction model for identifying MCI from clinical text in the absence of screening or other structured diagnostic information.

Methods: There were two populations of patients: 1794 participants in the Adult Changes in Thought (ACT) study and 2391 patients in the general population of Kaiser Permanente Washington. All individuals had standardized cognitive assessment scores. We excluded patients with a diagnosis of Alzheimer's disease or dementia, or with use of donepezil. We manually annotated 10,391 clinic notes to train the NLP model. Standard Python code was used to extract phrases from notes and map each phrase to a cognitive functioning concept. Concepts derived from the NLP system were used to predict future MCI. The prediction model was trained on the ACT cohort plus 60% of the general population cohort, with the remaining 40% withheld for validation. We used least absolute shrinkage and selection operator (LASSO) logistic regression to fit a prediction model with MCI as the prediction target. Using the predictions from the LASSO model and MCI status known from standardized scores, we constructed receiver operating characteristic (ROC) curves to measure model performance.
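The abstract does not say which software or penalty-selection procedure was used for the LASSO model, so the following is only a minimal, dependency-free sketch of L1-penalized logistic regression fit by proximal gradient descent (a gradient step on the log-loss followed by soft-thresholding), plus a hypothetical `split_60_40` helper illustrating the 60/40 split of the general population cohort. All function names, the learning rate, the iteration count and the penalty `lam` are assumptions for illustration; in practice the penalty is often chosen by cross-validation.

```python
import math
import random


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of t * |w|: shrinks z toward zero by t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_lasso_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """Hypothetical L1-penalized logistic regression via proximal gradient.

    Minimizes mean log-loss + lam * sum(|w_j|); the intercept b is not penalized.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    b = 0.0
    for _ in range(n_iter):
        grad_w = [0.0] * p
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j in range(p):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        b -= lr * grad_b
        # Gradient step on the smooth loss, then soft-threshold (the L1 prox),
        # which sets weak coefficients exactly to zero.
        w = [soft_threshold(w[j] - lr * grad_w[j], lr * lam) for j in range(p)]
    return w, b


def predict_proba(w, b, x):
    """Predicted probability of the positive class (e.g. MCI) for one row."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def split_60_40(rows, seed=0):
    """Hypothetical helper: random 60% training / 40% validation split."""
    rng = random.Random(seed)
    idx = list(range(len(rows)))
    rng.shuffle(idx)
    cut = int(0.6 * len(rows))
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]
```

Under this sketch, the ACT rows would simply be appended to the 60% training portion before calling `fit_lasso_logistic`; each column of `X` would be one NLP-derived concept indicator. A large `lam` drives every coefficient to zero, which is how LASSO performs variable selection.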

Results: Chart abstraction identified 42 MCI concepts. Prediction model performance in the validation data set was modest, with an area under the curve (AUC) of 0.67. At a classification cutoff of 0.60, the classifier yielded a sensitivity of 1.7%, specificity of 99.7%, positive predictive value (PPV) of 70% and negative predictive value (NPV) of 70.5% in the validation cohort.
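As an illustration of how the four reported metrics follow from a single cutoff, here is a minimal sketch that tallies the confusion matrix for a list of predicted scores and true labels. The function name is hypothetical, and treating a score exactly equal to the cutoff as positive (`>=`) is an assumption; the abstract does not say whether the boundary is inclusive.

```python
def classification_metrics(scores, labels, cutoff=0.60):
    """Sensitivity, specificity, PPV and NPV at a fixed score cutoff.

    A case is predicted positive when its score is >= cutoff (an assumption).
    """
    tp = fp = tn = fn = 0
    for score, label in zip(scores, labels):
        predicted = score >= cutoff
        if predicted and label:
            tp += 1  # predicted MCI, truly MCI
        elif predicted:
            fp += 1  # predicted MCI, truly not MCI
        elif label:
            fn += 1  # missed MCI case
        else:
            tn += 1  # correctly predicted non-case

    def ratio(num, den):
        # Undefined metrics (e.g. PPV with no predicted positives) return NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }
```

Raising the cutoff moves cases from predicted-positive to predicted-negative, which increases specificity at the cost of sensitivity; this is the trade-off behind the very low sensitivity and very high specificity reported at 0.60.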

Discussion and conclusion: Although the sensitivity of the machine learning model was poor, negative predictive value was high, an important characteristic of models used for population-based screening. While an AUC of 0.67 is generally considered moderate performance, it is also comparable to several tests that are widely used in clinical practice.

Keywords: Dementia; Early identification; MCI; NLP.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Fig. 1**
ROC curves for the training and validation cohorts. Green dotted line: ACT + general population 60% training sample. Light green dotted line: ACT training. Orange dotted line: general population 60% training sample. Blue dotted line: general population 40% validation sample. Gray dotted line: demographic variables only. AUC (interval in parentheses): ACT + general population 60% training, 0.716 (0.695, 0.736); ACT alone, 0.700 (0.673, 0.726); general population 60% training, 0.698 (0.663, 0.731); general population 40% validation, 0.670 (0.638, 0.702); demographics only (no NLP variables), 0.598 (0.576, 0.621).
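The AUC values in the figure summarize each ROC curve as a single number. A minimal sketch of the point estimate, using the standard Mann-Whitney interpretation (the probability that a randomly chosen MCI case receives a higher score than a randomly chosen non-case, ties counting one half), is shown below. The function name is hypothetical, and the intervals reported in the caption are not reproduced here because the abstract does not describe how they were computed.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Compares every (case, non-case) pair; O(n_pos * n_neg), fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # case ranked above non-case
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance ranking (the diagonal of an ROC plot) and 1.0 to perfect separation, which places the validation AUC of 0.670 and the demographics-only AUC of 0.598 on a common scale.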

Grants and funding

U01 AG006781/AG/NIA NIH HHS/United States