Exp Biol Med (Maywood). 2023 Dec;248(24):2547-2559.

doi: 10.1177/15353702231214253. Epub 2023 Dec 15.

Explainable hierarchical clustering for patient subtyping and risk prediction

Enrico Werner¹, Jeffrey N Clark¹, Alexander Hepburn¹, Ranjeet S Bhamber¹, Michael Ambler², Christopher P Bourdeaux³, Christopher J McWilliams⁴, Raul Santos-Rodriguez⁵

Affiliations

¹ University of Bristol, Bristol BS1 5DD, UK.
² University of Bristol, Bristol BS8 1TD, UK.
³ University Hospitals Bristol NHS Foundation Trust, Bristol BS2 8HW, UK.
⁴ University of Bristol, Bristol BS8 1TW, UK.
⁵ University of Bristol, Bristol BS8 1QU, UK.

PMID: 38102763
PMCID: PMC10854470
DOI: 10.1177/15353702231214253

Enrico Werner et al. Exp Biol Med (Maywood). 2023 Dec.

Abstract

We present a pipeline in which machine learning techniques are used to automatically identify and evaluate subtypes of hospital patients admitted between 2017 and 2021 in a large UK teaching hospital. Patient clusters are determined using routinely collected hospital data, such as those used in the UK's National Early Warning Score 2 (NEWS2). An iterative, hierarchical clustering process was used to identify the minimum set of relevant features for cluster separation. With the use of state-of-the-art explainability techniques, the identified subtypes are interpreted and assigned clinical meaning, illustrating their robustness. In parallel, clinicians assessed intracluster similarities and intercluster differences of the identified patient subtypes within the context of their clinical knowledge. For each cluster, outcome prediction models were trained and their forecasting ability was illustrated against the NEWS2 of the unclustered patient cohort. These preliminary results suggest that subtype models can outperform the established NEWS2 method, providing improved prediction of patient deterioration. By considering both the computational outputs and clinician-based explanations in patient subtyping, we aim to highlight the mutual benefit of combining machine learning techniques with clinical expertise.
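The hierarchical part of the process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: this record does not name the clustering algorithm, so `median_split` is a toy stand-in, and `hierarchical_labels`, `NOISE`, and `max_depth` are hypothetical names introduced here. The only detail taken from the source is the size threshold from the Figure 2 caption, where clusters with fewer than 1000 patients were not subclustered. The feature-selection loop mentioned in the abstract is not covered.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Sequence[float]
# A clustering function maps points to integer labels (hypothetical interface).
ClusterFn = Callable[[List[Point]], List[int]]
NOISE = -1  # assumed label for "unclusterable" patients


def median_split(points: List[Point]) -> List[int]:
    """Toy stand-in for a real clustering algorithm: split on the median of feature 0."""
    values = sorted(p[0] for p in points)
    median = values[len(values) // 2]
    return [0 if p[0] < median else 1 for p in points]


def hierarchical_labels(
    points: List[Point],
    cluster_fn: ClusterFn,
    min_size: int = 1000,  # threshold taken from the Figure 2 caption
    max_depth: int = 2,    # assumption: one round of subclustering
) -> List[Tuple[int, ...]]:
    """Return one label path per point, e.g. (1, 0) = subcluster 0 of cluster 1."""
    if not points:
        return []
    paths: List[Tuple[int, ...]] = [()] * len(points)

    def recurse(indices: List[int], prefix: Tuple[int, ...]) -> None:
        labels = cluster_fn([points[i] for i in indices])
        groups: Dict[int, List[int]] = {}
        for i, label in zip(indices, labels):
            groups.setdefault(label, []).append(i)
        for label, members in groups.items():
            path = prefix + (label,)
            # Subcluster only real clusters that are large enough, not yet at
            # the depth limit, and that came out of an actual split.
            splittable = (
                label != NOISE
                and len(members) >= min_size
                and len(path) < max_depth
                and len(groups) > 1
            )
            if splittable:
                recurse(members, path)
            else:
                for i in members:
                    paths[i] = path

    recurse(list(range(len(points))), ())
    return paths


if __name__ == "__main__":
    toy = [(float(i),) for i in range(8)]
    # A small threshold is used here so the toy data is split twice.
    print(hierarchical_labels(toy, median_split, min_size=4))
```

Passing the clustering step in as a function keeps the recursion independent of the algorithm, so a density-based or agglomerative method could be substituted without changing the driver.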

Keywords: Hierarchical clustering; clinical evaluation; early warning score; explainability; mortality prediction; patient subtypes.

Conflict of interest statement

Declaration of conflicting interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

Figures

**Figure 1.**
Pipeline overview, from data set import to generation of explainable clusters and clinical outcome predictions. The blue box denotes the iterative clustering process.

**Figure 2.**
Patients mapped onto the two-dimensional embedding space after dimensionality reduction and clustering. Clusters inside black boxes depict the subclustering results. Subclustering was not performed for clusters 4 and 6, as both contained fewer than 1000 patients. Unclusterable patients are shown in dark blue, often at the edges of clusters.

**Figure 3.**
NEWS, vitals, age, gender, ICD-10 code count, and length of stay for individual clusters from clustering the entire population. The mean value of each cluster is compared with the whole-population mean or median (black line), depending on the feature.

**Figure 4.**
NEWS, vitals, age, gender, ICD-10 code count, and length of stay for the subclusters of cluster 0. The mean value of each subcluster is compared with the mean or median value (black line) of the parent cluster.

**Figure 5.**
NEWS, vitals, age, gender, ICD-10 code count, and length of stay for the subclusters of cluster 1. The mean value of each subcluster is compared with the mean or median value (black line) of the parent cluster.

**Figure 6.**
Heatmap of primary ICD-10 codes of full-population clustering and collated by top-level grouping. For display purposes, only ICD-10 codes with ⩾ 2% incidence for at least one cluster are displayed. Since only a subset of ICD-10 codes are visualized, each row does not add up to 100. MSK: musculoskeletal.

**Figure 7.**
Heatmap of primary ICD-10 codes of the subclusters of cluster 0 as recorded by clinicians at the time of patient admission and collated by top-level grouping. For display purposes, only ICD-10 codes with ⩾ 2% incidence for at least one cluster are displayed. Since only a subset of ICD-10 codes are visualized, each row does not add up to 100. MSK: musculoskeletal.

**Figure 8.**
Heatmap of primary ICD-10 codes of different subclusters as recorded by clinicians at the time of patient admission and collated by top-level grouping. For display purposes, only ICD-10 codes with ⩾ 2% incidence for at least one cluster are displayed. Since only a subset of ICD-10 codes are visualized, each row does not add up to 100. MSK: musculoskeletal.

**Figure 9.**
Surrogate explanations for the contribution of each vital in determining the assignment of patients into each cluster from clustering of the entire population. SATS: hemoglobin saturation with oxygen.

**Figure 10.**
Surrogate explanations for the contribution of each vital in determining the assignment of patients into each subcluster of cluster 0. SATS: hemoglobin saturation with oxygen.

**Figure 11.**
Surrogate explanations for the contribution of each vital in determining the assignment of patients into each subcluster of cluster 1. SATS: hemoglobin saturation with oxygen.

**Figure 12.**
Predictive performance of the classification models compared with the existing NEWS2 risk scoring system for the two predicted outcomes: in-hospital mortality and admission to higher care (general ICU, cardiac ICU, and critical care unit). Panels: (a) mortality ROC, (b) mortality PRC, (c) admission to higher care ROC, and (d) admission to higher care PRC. “All” refers to the entire unclustered patient cohort. Mortality was not predicted for cluster 0 since only one positive case occurred. Figures in brackets are the area under the curve. ROC: receiver operating characteristic curve; PRC: precision-recall curve.
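As a reminder of what the bracketed area-under-the-curve values for the ROC panels measure, the sketch below computes ROC AUC as the probability that a randomly chosen positive case receives a higher risk score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the example scores are illustrative assumptions, not data or code from the study.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison of positive and negative scores."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        # With no positive (or no negative) cases the curve cannot be drawn.
        raise ValueError("AUC is undefined without both positive and negative cases")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Made-up outcomes (1 = event occurred) and made-up risk scores.
    outcomes = [0, 0, 1, 1]
    risk = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(outcomes, risk))  # 0.75
```

The pairwise form is quadratic in the number of cases, so it suits small illustrations; a rank-based implementation would be preferable for a full cohort. It also shows why a cluster with very few positive cases gives an unstable estimate: every comparison then depends on the scores of those few patients.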

Cited by

A new distal radius fracture classification depending on the specific fragments through machine learning clustering method.
Gao Y, Zhao Y, Liu Y, Lei S, Wang H, Lizhu Y, Lu T, Cheng Z, Wang D, Zhao B, Li Z, Zhou J. BMC Musculoskelet Disord. 2024 Dec 30;25(1):1085. doi: 10.1186/s12891-024-08215-1. PMID: 39736589. Free PMC article.
Improving explainability of post-separation suicide attempt prediction models for transitioning service members: insights from the Army Study to Assess Risk and Resilience in Servicemembers - Longitudinal Study.
Edwards ER, Geraci JC, Gildea SM, Houtsma C, Holdcraft JA, Kennedy CJ, King AJ, Luedtke A, Marx BP, Naifeh JA, Sampson NA, Stein MB, Ursano RJ, Kessler RC. Transl Psychiatry. 2025 Jan 30;15(1):37. doi: 10.1038/s41398-025-03248-z. PMID: 39885116. Free PMC article.
Unravelling lumbar disc herniation severity beyond MRI: integrated transcriptomic and metabolomic analyses highlight glycerophospholipid metabolism and inform a machine-learning diagnostic model: a pilot study.
Deng Q, Ren S, Zhang N, Li G, Yu Z, Li X, Cui H, Zhang Y, Zhang Y, Chen J. Bone Joint Res. 2025 May 12;14(5):434-447. doi: 10.1302/2046-3758.145.BJR-2024-0071.R1. PMID: 40350161. Free PMC article.
Creative and generative artificial intelligence for personalized medicine and healthcare: Hype, reality, or hyperreality?
Shaban-Nejad A, Michalowski M, Bianco S. Exp Biol Med (Maywood). 2023 Dec;248(24):2497-2499. doi: 10.1177/15353702241226801. Epub 2024 Feb 4. PMID: 38311873. Free PMC article. No abstract available.
