Automated identification of patient subgroups: A case-study on mortality of COVID-19 patients admitted to the ICU
- PMID: 37356293
- PMCID: PMC10266884
- DOI: 10.1016/j.compbiomed.2023.107146
Abstract
Background: Subgroup discovery (SGD) is the automated splitting of data into complex subgroups. Various SGD methods have been applied in the medical domain, but none have been extensively evaluated. We assess the numerical and clinical quality of SGD methods.
Method: We applied the improved Subgroup Set Discovery (SSD++), Patient Rule Induction Method (PRIM), and APRIORI-Subgroup Discovery (APRIORI-SD) algorithms to obtain patient subgroups from observational data on 14,548 COVID-19 patients admitted to 73 Dutch intensive care units. Hospital mortality was the clinical outcome. Numerical significance of the subgroups was assessed with information-theoretic measures. Clinical significance of the subgroups was assessed by comparing variable importance at the population and subgroup levels and by expert evaluation.
Results: The tested algorithms varied widely in the total number of discovered subgroups (5-62), the number of selected variables, and the predictive value of the subgroups. Qualitative assessment showed that the discovered subgroups make clinical sense. SSD++ found the most subgroups (n = 62), which added predictive value and generally showed high potential for clinical use. APRIORI-SD and PRIM found fewer subgroups (n = 5 and 6), which did not add predictive value and were clinically less relevant.
Conclusion: Automated SGD methods find clinical subgroups that are relevant when assessed both quantitatively (they yield added predictive value) and qualitatively (intensivists consider the subgroups significant). Different methods yield different subgroups with varying degrees of predictive performance and clinical quality. External validation is needed to generalize the results to other populations, and future research should explore which algorithm performs best in other settings.
Keywords: COVID-19; Data registry; In-hospital mortality; Intensive care; Machine learning; Subgroup discovery.
Copyright © 2023 The Author(s). Published by Elsevier Ltd. All rights reserved.
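To make the idea of scoring a subgroup with an information-theoretic measure concrete, the following is a minimal sketch and not the study's method. It assumes a subgroup is a simple rule over patient records and scores it with a coverage-weighted Kullback-Leibler divergence between the subgroup's mortality rate and the population's, one common choice in the SGD literature. The abstract does not say which measures the authors used. The records, the `evaluate_subgroup` helper, and the age threshold are hypothetical and purely illustrative.

```python
import math

# Synthetic toy records, NOT data from the study: (age, on_ventilator, died)
RECORDS = [
    (82, True, 1), (75, True, 1), (79, False, 1), (68, True, 0),
    (55, False, 0), (47, True, 0), (61, False, 0), (39, False, 0),
]


def bernoulli_kl(p, q):
    """KL(Bernoulli(p) || Bernoulli(q)) in bits; terms with zero mass are skipped."""
    total = 0.0
    for a, b in ((p, q), (1 - p, 1 - q)):
        if a > 0:
            total += a * math.log2(a / b)
    return total


def evaluate_subgroup(records, rule):
    """Score a rule-defined subgroup against the whole population (hypothetical helper)."""
    covered = [r for r in records if rule(r)]
    n, m = len(records), len(covered)
    population_rate = sum(r[2] for r in records) / n
    subgroup_rate = sum(r[2] for r in covered) / m if m else 0.0
    # Weight the divergence by subgroup size so tiny subgroups do not dominate.
    weighted_kl = m * bernoulli_kl(subgroup_rate, population_rate) if m else 0.0
    return {
        "coverage": m / n,
        "subgroup_rate": subgroup_rate,
        "population_rate": population_rate,
        "weighted_kl": weighted_kl,
    }


# Illustrative rule: patients aged 70 or older (threshold chosen arbitrarily).
result = evaluate_subgroup(RECORDS, lambda r: r[0] >= 70)
print(result)
```

A higher score means the subgroup's outcome distribution departs more from the population's and the subgroup covers more patients. A real SGD algorithm would search over many candidate rules instead of evaluating a single hand-written one.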
Conflict of interest statement
The authors declare that they have no conflicts of interest.