J Glob Health. 2024 Apr 19:14:04088.
doi: 10.7189/jogh.14.04088.

Data-driven clustering approach to identify novel clusters of high cognitive impairment risk among Chinese community-dwelling elderly people with normal cognition: A national cohort study

Wang Ran et al. J Glob Health. 2024.

Abstract

Background: Cognitive impairment is a highly heterogeneous disorder that necessitates further investigation into the distinct characteristics of populations at varying risk levels of cognitive impairment. Using a large-scale registry cohort of elderly individuals, we applied a data-driven approach to identify novel clusters based on diverse sociodemographic features.

Methods: A prospective cohort of 6398 elderly people from the Chinese Longitudinal Healthy Longevity Survey, followed between 2008 and 2014, was used to develop and validate the model. Participants were included if they were aged ≥60 years, were community-dwelling, and had a score ≥18 on the Chinese version of the Mini-Mental State Examination (MMSE). Sixty-nine sociodemographic features were included in the analysis. The total population was divided into a derivation cohort (two-thirds; n = 4265) and a validation cohort (one-third; n = 2133). In the derivation cohort, an unsupervised Gaussian mixture model was applied to categorise participants into distinct clusters. A classifier was developed based on the 10 most important factors and was applied to assign participants in the validation cohort to their corresponding clusters. The three-year risk of cognitive impairment was compared across the clusters.
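
The split and the clustering step described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function names, the random seeds, and the synthetic one-dimensional feature are assumptions made for illustration, and the mixture here has two components fitted by plain expectation-maximisation, whereas the study fitted a Gaussian mixture model on many features and obtained four clusters.

```python
import math
import random


def split_cohort(ids, seed=0):
    """Randomly assign two-thirds of ids to derivation and the rest to validation."""
    rng = random.Random(seed)
    shuffled = list(ids)
    rng.shuffle(shuffled)
    n_derivation = len(shuffled) * 2 // 3
    return shuffled[:n_derivation], shuffled[n_derivation:]


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm_1d(xs, n_iter=30):
    """Fit a two-component 1-D Gaussian mixture by expectation-maximisation."""
    n = len(xs)
    grand_mean = sum(xs) / n
    grand_var = sum((x - grand_mean) ** 2 for x in xs) / n
    weights = [0.5, 0.5]
    means = [min(xs), max(xs)]          # crude but deterministic initialisation
    variances = [grand_var, grand_var]
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in xs:
            dens = [w * normal_pdf(x, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances from the posteriors.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk
            variances[k] = max(var_k, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances


# Hypothetical participant IDs; only the cohort size comes from the abstract.
participant_ids = list(range(6398))
derivation_ids, validation_ids = split_cohort(participant_ids)

# Synthetic feature drawn from two well-separated groups (made-up data).
data_rng = random.Random(1)
feature = {pid: data_rng.gauss(0.0 if pid % 2 == 0 else 6.0, 1.0)
           for pid in participant_ids}

weights, means, variances = fit_gmm_1d([feature[pid] for pid in derivation_ids])
print(len(derivation_ids), len(validation_ids))
```

With 6398 participants, integer division reproduces the cohort sizes reported in the abstract (4265 and 2133). The mixture is fitted on the derivation subset only, mirroring the order of steps in the Methods.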

Results: We identified four clusters with distinct features in the derivation cohort. Cluster 1 was associated with the worst life independence, longest sleep duration, and the oldest age. Cluster 2 demonstrated the highest loneliness and was characterised by non-marital status and living alone. Cluster 3 was characterised by the lowest sense of loneliness and the highest proportions of married participants and family co-residence. Cluster 4 demonstrated heightened engagement in exercise and leisure activity, along with independent decision-making, hygiene, and a diverse diet. In comparison to Cluster 4, Cluster 1 exhibited the highest three-year cognitive impairment risk (adjusted odds ratio (aOR) = 3.31; 95% confidence interval (CI) = 1.81-6.05), followed by Cluster 2 and Cluster 3. These estimates were adjusted for baseline MMSE, residence, sex, age, years of education, drinking, smoking, hypertension, diabetes, heart disease and stroke or cardiovascular diseases.
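
As a reading aid for the reported estimate, the sketch below checks that the interval for Cluster 1 is symmetric on the log-odds scale. It assumes a Wald-type interval from a logistic regression, which the abstract does not state explicitly; the function name and the derived quantities are illustrative and are not results reported by the authors.

```python
import math

Z_95 = 1.959964  # two-sided 95% quantile of the standard normal distribution


def log_scale_summary(odds_ratio, ci_low, ci_high, z=Z_95):
    """Back out log-scale quantities from an odds ratio and its Wald-type CI."""
    # Under the Wald assumption the CI is log(OR) +/- z * SE, so its width gives SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    # The geometric mean of the bounds should reproduce the point estimate.
    midpoint = math.sqrt(ci_low * ci_high)
    # Wald statistic for the null hypothesis OR = 1.
    wald_z = math.log(odds_ratio) / se
    return se, midpoint, wald_z


# Values for Cluster 1 versus Cluster 4, taken from the Results.
se, midpoint, wald_z = log_scale_summary(3.31, 1.81, 6.05)
print(round(se, 3), round(midpoint, 2), round(wald_z, 2))
```

The geometric mean of the bounds comes out close to the reported aOR of 3.31, which is what a symmetric log-scale interval would give.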

Conclusions: A data-driven approach can be instrumental in identifying individuals at high risk of cognitive impairment among cognitively normal elderly populations. Because they are based on various sociodemographic features, these clusters can inform individualised intervention plans.

Conflict of interest statement

Disclosure of interest: The authors completed the ICMJE Disclosure of Interest Form (available upon request from the corresponding author) and disclose no relevant interests.

Figures

Figure 1
Study flowchart. Panel A. Patient selection. Two-thirds of the 6398 participants included in the analysis (n = 4265) were randomly assigned to the derivation cohort, while one-third (n = 2133) were randomly assigned to the validation cohort. Panel B. Study design. We included 69 features and identified the 20 most important using LightGBM. Using these 20 features, a GMM was fitted in the derivation cohort to categorise participants into four clusters. The 10 most important features from the LightGBM model were then used to build a separate prediction model in the derivation cohort, which was applied to classify participants in the validation cohort. GMM – Gaussian mixture model, LightGBM – light gradient boosted machine.
Figure 2
Importance ranking of features. Panel A. Importance ranking of features according to the light gradient boosted machine model. The 25 most important features are depicted. Panel B. SHAP values for 20 features. ADL – activities of daily living, BMI – body mass index, IADL – instrumental activities of daily living, SHAP – SHapley Additive exPlanations.
Figure 3
Dendrogram and heat map for unsupervised hierarchical clustering into four clusters based on all the features. Panel A. Derivation cohort. Panel B. Validation cohort.
Figure 4
The predictive model accurately classifies the participants into their inherent clusters. Panel A. Receiver operating characteristic curves showing the performance of the prediction models in assigning every participant to one of the four clusters. Panels B–C. Radar plots of the profiles of the four clusters identified in the derivation cohort (B) and validation cohort (C) based on 10 key features; the 10 axes represent z-values for these features. Panels D–E. Bar charts of the proportion of participants with cognitive impairment at the three-year follow-up for each cluster in the derivation cohort (D) and validation cohort (E).
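
The z-value profiles behind the radar plots in Figure 4 can be sketched as follows. This is a hedged illustration rather than the authors' procedure: it assumes each feature is standardised across all participants using the population standard deviation and then averaged within each cluster. The function name, feature names, and the four toy rows are made up for the example.

```python
from statistics import mean, pstdev


def cluster_profiles(rows, features):
    """Mean z-score of each feature within each cluster.

    rows: list of dicts with a 'cluster' key and one numeric value per feature.
    Assumes no feature is constant (a zero standard deviation would fail).
    """
    collected = {}
    for feature in features:
        values = [row[feature] for row in rows]
        mu, sigma = mean(values), pstdev(values)
        for row in rows:
            z = (row[feature] - mu) / sigma  # standardise across all participants
            collected.setdefault(row["cluster"], {}).setdefault(feature, []).append(z)
    return {
        cluster: {feature: mean(zs) for feature, zs in by_feature.items()}
        for cluster, by_feature in collected.items()
    }


# Toy, made-up records; cluster labels 1 and 4 only echo the paper's naming.
rows = [
    {"cluster": 1, "age": 92, "sleep_hours": 10},
    {"cluster": 1, "age": 88, "sleep_hours": 9},
    {"cluster": 4, "age": 70, "sleep_hours": 7},
    {"cluster": 4, "age": 66, "sleep_hours": 6},
]
profiles = cluster_profiles(rows, ["age", "sleep_hours"])
print(profiles)
```

Each inner dictionary corresponds to one polygon on a radar plot, with one axis per feature. Standardising before averaging keeps features measured in different units on a comparable scale.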
