Explainable hierarchical clustering for patient subtyping and risk prediction
- PMID: 38102763
- PMCID: PMC10854470
- DOI: 10.1177/15353702231214253
Abstract
We present a pipeline in which machine learning techniques are used to automatically identify and evaluate subtypes of patients admitted to a large UK teaching hospital between 2017 and 2021. Patient clusters are determined using routinely collected hospital data, such as those used in the UK's National Early Warning Score 2 (NEWS2). An iterative, hierarchical clustering process is used to identify the minimum set of features relevant for cluster separation. Using state-of-the-art explainability techniques, the identified subtypes are interpreted and assigned clinical meaning, illustrating their robustness. In parallel, clinicians assessed intracluster similarities and intercluster differences of the identified patient subtypes in the context of their clinical knowledge. For each cluster, outcome prediction models were trained and their forecasting ability was compared against the NEWS2 of the unclustered patient cohort. These preliminary results suggest that subtype models can outperform the established NEWS2 method, providing improved prediction of patient deterioration. By considering both the computational outputs and clinician-based explanations in patient subtyping, we aim to highlight the mutual benefit of combining machine learning techniques with clinical expertise.
Keywords: Hierarchical clustering; clinical evaluation; early warning score; explainability; mortality prediction; patient subtypes.
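The clustering step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the linkage criterion, distance metric, feature set, or number of clusters, so everything below is an assumption made for illustration: average linkage, Euclidean distance on z-scored features, a hypothetical toy table of three vital signs, and the helper names `standardize` and `agglomerative`. It is not the authors' pipeline and omits the iterative feature selection, explainability, and outcome prediction stages.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def standardize(rows):
    """Z-score each column so that no single vital sign dominates the distance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 for constant columns to avoid division by zero.
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def agglomerative(rows, n_clusters):
    """Average-linkage agglomerative clustering (assumed choice); one label per row."""
    clusters = [[i] for i in range(len(rows))]  # start with singleton clusters
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                d = sum(euclidean(rows[a], rows[b])
                        for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge the closest pair
        del clusters[j]
    labels = [0] * len(rows)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


# Hypothetical toy data: (respiratory rate, heart rate, systolic blood pressure).
patients = [
    [14, 72, 120], [16, 80, 118], [15, 76, 125],  # unremarkable vitals
    [28, 125, 88], [30, 130, 85], [26, 118, 92],  # raised rates, low blood pressure
]
labels = agglomerative(standardize(patients), n_clusters=2)
print(labels)
```

In practice a library implementation would replace the quadratic search above; the hand-written loop is used here only to keep the merge rule visible.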
© 2023 by the Society for Experimental Biology and Medicine
Conflict of interest statement
Declaration of conflicting interests: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
