Arthritis Care Res (Hoboken). 2023 Feb;75(2):210-219.
doi: 10.1002/acr.24973. Epub 2022 Sep 15.

Using Unsupervised Machine Learning Methods to Cluster Comorbidities in a Population-Based Cohort of Patients With Rheumatoid Arthritis

Cynthia S Crowson et al. Arthritis Care Res (Hoboken). 2023 Feb.

Abstract

Objective: To identify clusters of comorbidities in patients with rheumatoid arthritis (RA) using 4 methods and to compare them with clusters in patients without RA.

Methods: In this retrospective, population-based study, residents of 8 Minnesota counties with prevalent RA as of January 1, 2015 were identified. Age-, sex-, and county-matched non-RA comparators were selected from the same underlying population. Diagnostic codes were retrieved for 5 years before January 1, 2015. Using 2 codes ≥30 days apart, 44 previously defined morbidities and 11 nonoverlapping chronic disease categories based on Clinical Classifications Software were defined. Unsupervised machine learning methods of interest included hierarchical clustering, factor analysis, K-means clustering, and network analysis.
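The rule of 2 codes ≥30 days apart within the 5-year lookback can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the example dates, and the treatment of the window boundaries (start inclusive, index date exclusive) are assumptions made for the example.

```python
from datetime import date

INDEX_DATE = date(2015, 1, 1)
# Assumption: the 5-year lookback starts on 2010-01-01 (inclusive)
# and ends just before the index date (exclusive).
WINDOW_START = date(2010, 1, 1)


def has_morbidity(code_dates, min_gap_days=30):
    """Return True if two in-window diagnosis dates are >= min_gap_days apart."""
    in_window = [d for d in code_dates if WINDOW_START <= d < INDEX_DATE]
    if len(in_window) < 2:
        return False
    # A pair at least min_gap_days apart exists exactly when the
    # earliest and latest in-window dates are that far apart.
    return (max(in_window) - min(in_window)).days >= min_gap_days


# Hypothetical diagnosis dates for one patient, keyed by morbidity label.
records = {
    "htn": [date(2012, 3, 1), date(2012, 3, 20), date(2013, 6, 5)],
    "gerd": [date(2014, 1, 10), date(2014, 1, 25)],  # only 15 days apart
    "oa": [date(2009, 5, 1), date(2011, 7, 4)],      # one code precedes the window
}
flags = {name: has_morbidity(dates) for name, dates in records.items()}
print(flags)
```

Applying such a rule to each patient and each morbidity would yield the binary patient-by-comorbidity matrix that the clustering methods take as input.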

Results: Two groups of 1,643 patients with and without RA (72% female; mean age 63.1 years in both groups) were studied. Clustering of comorbidities revealed strong associations among mental/behavioral comorbidities and among cardiovascular risk factors and diseases. The clusters were associated with age and sex. Differences between the 4 clustering methods were driven by comorbidities that were rare and those that were weakly associated with other comorbidities. Common comorbidities tended to group together consistently across approaches. The instability of clusters when using different random seeds or bootstrap sampling calls into question the usefulness and reliability of these methods. Clusters of common comorbidities were similar between the RA and non-RA cohorts.
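One way to probe the bootstrap instability described above is to resample patients with replacement, regroup the comorbidities in each resample, and record how often each pair lands in the same group. The sketch below is an assumption-laden illustration, not the study's procedure: it groups comorbidities by connected components of a thresholded association graph as a simple stand-in for the clustering methods in the paper, and the data are simulated.

```python
import math
import random
from itertools import combinations


def abs_phi(x, y):
    """Absolute phi coefficient for two 0/1 vectors (equals Cramer's V for a 2x2 table)."""
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if not i and j)
    d = len(x) - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else abs(a * d - b * c) / math.sqrt(denom)


def threshold_groups(data, names, threshold):
    """Label each variable by its connected component in the thresholded graph."""
    parent = {n: n for n in names}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path compression
            n = parent[n]
        return n

    for u, v in combinations(names, 2):
        if abs_phi(data[u], data[v]) > threshold:
            parent[find(u)] = find(v)
    return {n: find(n) for n in names}


def co_clustering_frequency(data, names, threshold=0.22, n_boot=100, seed=0):
    """Fraction of bootstrap resamples in which each pair shares a group."""
    rng = random.Random(seed)
    n = len(data[names[0]])
    counts = {pair: 0 for pair in combinations(names, 2)}
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample patients
        sample = {name: [data[name][i] for i in idx] for name in names}
        labels = threshold_groups(sample, names, threshold)
        for u, v in counts:
            if labels[u] == labels[v]:
                counts[(u, v)] += 1
    return {pair: c / n_boot for pair, c in counts.items()}


# Simulated cohort (hypothetical): two correlated pairs, independent across pairs.
sim = random.Random(1)
n_patients = 500
htn = [int(sim.random() < 0.4) for _ in range(n_patients)]
dm = [h if sim.random() < 0.8 else int(sim.random() < 0.4) for h in htn]
dep = [int(sim.random() < 0.3) for _ in range(n_patients)]
anx = [d if sim.random() < 0.8 else int(sim.random() < 0.3) for d in dep]
data = {"htn": htn, "dm": dm, "dep": dep, "anx": anx}
names = ["htn", "dm", "dep", "anx"]

freq = co_clustering_frequency(data, names)
for pair, f in freq.items():
    print(pair, f)
```

Pairs with frequencies near 0 or 1 are stable across resamples; intermediate values flag pairings that depend on which patients happened to be sampled.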

Conclusion: Despite the higher comorbidity burden in patients with RA compared to the general population, clustering comorbidities did not identify substantial differences in comorbidity patterns between the RA and non-RA cohorts. The instability of clustering methods suggests caution when interpreting clusters derived from a single method.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1.
Dendrogram based on hierarchical clustering of variables applied to patients with rheumatoid arthritis divided into eight clusters. Abbreviations: BPH=benign prostatic hyperplasia, circ=circulatory, COPD=chronic obstructive pulmonary disease, GERD=gastroesophageal reflux disease, PTSD=post-traumatic stress disorder.
Figure 2.
Comorbidities according to the mean age and percent female for those who have that comorbidity within the rheumatoid arthritis cohort. The names of the comorbidities are color coded based on the hierarchical clustering. Abbreviations: BPH=benign prostatic hyperplasia, circ=circulatory, COPD=chronic obstructive pulmonary disease, GERD=gastroesophageal reflux disease, PTSD=post-traumatic stress disorder.
Figure 3.
Loadings for three factors identified using factor analysis for the rheumatoid arthritis cohort. The colors of the bars correspond to the hierarchical clustering analysis to facilitate comparisons between methods. Loadings less than an absolute value of 0.3 were not displayed. Note some comorbidities are included in multiple factors and others are not included in any factor. Abbreviations: BPH=benign prostatic hyperplasia, circ=circulatory, COPD=chronic obstructive pulmonary disease, GERD=gastroesophageal reflux disease, PTSD=post-traumatic stress disorder.
Figure 4.
Mapping between the k-means clusters and the prior hierarchical clusters identified using the rheumatoid arthritis cohort. For both approaches, eight clusters were chosen. Abbreviations: BPH=benign prostatic hyperplasia, circ=circulatory, COPD=chronic obstructive pulmonary disease, GERD=gastroesophageal reflux disease, PTSD=post-traumatic stress disorder.
Figure 5.
Clusters identified using network analysis among subjects with rheumatoid arthritis. Panel A includes comorbidities that have pairwise Cramér’s V > 0.22, while panel B only includes Cramér’s V > 0.32. Thicker lines correspond to stronger relationships. Cramér’s V is a measure of the relative strength of an association between two variables and ranges from 0 to 1. Abbreviations: alcohol=Alcohol abuse, aller_rhin=Allergic rhinitis, arrhythm=Cardiac arrhythmia, back=Chronic back pain, bipolar=Bipolar disorder, cad=Coronary artery disease, copd=Chronic obstructive pulmonary disease, dm=Diabetes mellitus, drug=Drug abuse, hemat_ca=Hematologic cancer, hf=Heart failure, htn=Hypertension, oa=Osteoarthritis, ptsd=Post-traumatic stress disorder, pvd=Peripheral vascular disease, rls=Restless leg syndrome, sinusitis=Chronic sinusitis, skin_ulcer=Chronic skin ulcers, sleep=Sleep disorders, solid_ca=Solid cancer, stroke=Cerebrovascular disease, valvular=Valvular heart disease.
Figure 6.
Heat maps from 100 bootstrap samples for hierarchical clusters of comorbidities with at least 5% prevalence from the cohorts with rheumatoid arthritis (RA; left panel) and without RA (non-RA; right panel). Abbreviations: BPH=benign prostatic hyperplasia, circ=circulatory, COPD=chronic obstructive pulmonary disease, GERD=gastroesophageal reflux disease, PTSD=post-traumatic stress disorder.
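For readers unfamiliar with the statistic in Figure 5, Cramér's V for an r × c contingency table is sqrt(χ² / (n · min(r − 1, c − 1))); for two binary comorbidity indicators this reduces to the absolute phi coefficient. The sketch below computes it without a continuity correction and lists the pairs that would be kept as network edges at the two thresholds named in the caption. The toy data and function names are hypothetical, and whether the authors applied a continuity correction is not stated in this text.

```python
import math
from itertools import combinations


def cramers_v_binary(x, y):
    """Cramer's V for two 0/1 vectors of equal length (2x2 table, no correction)."""
    a = sum(1 for i, j in zip(x, y) if i and j)          # both present
    b = sum(1 for i, j in zip(x, y) if i and not j)      # only x present
    c = sum(1 for i, j in zip(x, y) if not i and j)      # only y present
    d = len(x) - a - b - c                               # neither present
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a constant variable carries no association
    return abs(a * d - b * c) / math.sqrt(denom)


def network_edges(data, threshold):
    """Return (u, v, V) for every pair whose Cramer's V exceeds the threshold."""
    edges = []
    for u, v in combinations(sorted(data), 2):
        w = cramers_v_binary(data[u], data[v])
        if w > threshold:
            edges.append((u, v, w))
    return edges


# Hypothetical indicators for 8 patients (1 = comorbidity present).
data = {
    "htn": [1, 1, 1, 0, 0, 0, 1, 0],
    "dm": [1, 1, 0, 0, 0, 0, 1, 0],
    "ptsd": [0, 1, 0, 1, 0, 1, 0, 0],
}

for threshold in (0.22, 0.32):
    print(threshold, [(u, v, round(w, 3)) for u, v, w in network_edges(data, threshold)])
```

Raising the threshold prunes weaker edges, which is why panel B of Figure 5 shows a sparser network than panel A.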


