Front Big Data. 2022 Oct 31;5:603429.
doi: 10.3389/fdata.2022.603429. eCollection 2022.

Application of convex hull analysis for the evaluation of data heterogeneity between patient populations of different origin and implications of hospital bias in downstream machine-learning-based data processing: A comparison of 4 critical-care patient datasets


Konstantin Sharafutdinov et al. Front Big Data.

Abstract

Machine learning (ML) models are developed on a learning dataset that covers only a small part of the data of interest. If model predictions are accurate for the learning dataset but fail for unseen data, the generalization error is considered high. This problem manifests itself within all major sub-fields of ML but is especially relevant in medical applications. Clinical data structures, patient cohorts, and clinical protocols may be highly biased among hospitals, such that sampling representative learning datasets for ML models remains a challenge. Because ML models exhibit poor predictive performance over data ranges sparsely covered or not covered by the learning dataset, in this study we propose a novel method to assess their generalization capability across hospitals based on the convex hull (CH) overlap between multivariate datasets. To reduce dimensionality effects, we used a two-step approach. First, CH analysis was applied to find the mean CH coverage between each pair of datasets, yielding an upper bound on the prediction range. Second, four types of ML models were trained to classify the origin of a dataset (i.e., from which hospital it came) and to estimate differences between datasets with respect to their underlying distributions. To demonstrate the applicability of our method, we used four critical-care patient datasets from different hospitals in Germany and the USA. We estimated the similarity of these populations and investigated whether ML models developed on one dataset can be reliably applied to another. We show that the strongest drop in performance was associated with poor intersection of the convex hulls of the corresponding hospitals' datasets and with high performance of ML methods in discriminating between datasets. Hence, we suggest our pipeline as a first tool to assess the transferability of trained models.
We emphasize that datasets from different hospitals represent heterogeneous data sources, and transfer from one database to another should be performed with utmost care to avoid failures during real-world application of the developed models. Further research is needed on methods for adapting ML models to new hospitals. In addition, more work should be aimed at creating gold-standard datasets that are large and diverse, with data from varied application sites.
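The first step of the pipeline described above, pairwise convex hull coverage between two datasets, can be sketched as follows. This is a minimal illustration on synthetic 2D data: the hospital names, sample sizes, and distributions are invented for the example and do not reproduce the paper's actual feature sets or preprocessing.

```python
import numpy as np
from scipy.spatial import Delaunay

def ch_coverage(source, target):
    """Fraction of `target` points lying inside the convex hull of `source`.

    Both arrays have shape (n_samples, 2) -- one pair of features,
    mirroring the pairwise CH analysis described in the abstract.
    """
    hull = Delaunay(source)                   # triangulation of the source CH
    inside = hull.find_simplex(target) >= 0   # find_simplex returns -1 outside the hull
    return inside.mean()

rng = np.random.default_rng(0)
hosp_a = rng.normal(loc=0.0, scale=1.0, size=(500, 2))  # hypothetical Hosp A
hosp_b = rng.normal(loc=0.5, scale=1.5, size=(500, 2))  # hypothetical Hosp B (broader)

print(ch_coverage(hosp_a, hosp_b))  # how much of B falls inside A's hull
print(ch_coverage(hosp_b, hosp_a))  # how much of A falls inside B's hull
```

Note that coverage is asymmetric: a broad population can cover a narrow one without the reverse holding, which is why the paper reports a full coverage matrix rather than a single overlap score.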

Keywords: ARDS; convex hull (CH); data pooling; dataset-bias; generalization error.


Conflict of interest statement

HM is an employee of, and holds stock in, Bayer AG, Germany. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Example of CH intersection for the pair of hospitals (Hosp A, Hosp B) and the pair of features: SaO2 and bicarbonate. Some data points are filtered out by the DBSCAN method prior to the construction of the CH.
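As the Figure 1 caption notes, DBSCAN filters outlying points before the hull is constructed, so that a few extreme measurements do not inflate the CH. A minimal sketch of that filtering step on synthetic data (the eps/min_samples values and the outlier coordinates are illustrative assumptions, not the paper's settings):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from scipy.spatial import ConvexHull

rng = np.random.default_rng(1)
cloud = rng.normal(size=(300, 2))                        # dense synthetic population
outliers = np.array([[8.0, 8.0], [-7.5, 6.0],
                     [7.0, -8.0], [-8.0, -7.5]])         # hypothetical stray measurements
data = np.vstack([cloud, outliers])

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(data)
core = data[labels != -1]        # DBSCAN assigns label -1 to noise points

hull_raw = ConvexHull(data)      # hull distorted by the outliers
hull_filtered = ConvexHull(core) # hull of the dense population only
# in 2D, ConvexHull.volume is the enclosed area
print(hull_raw.volume, hull_filtered.volume)
```

Filtering first keeps the hull representative of where the population is actually dense, which matters because coverage is later computed against that hull.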
Figure 2
CH analysis results for data from four hospitals. Mean CH coverage over all features is shown. Rows: initial population; columns: population whose CH is covered by the CH of the initial population.
Figure 3
Random forest classifier classification results (cross-prediction matrix) for ARDS on the first day in the ICU. An RF was trained in each of the four hospitals (row name) and applied in each of the four hospitals (column name). Diagonal cells represent the performance of specialized models that were trained and tested in the same hospital. Off-diagonal cells represent the performance of such models when applied in other hospitals and reflect the ability of a model to generalize to the unseen population of another hospital. Twenty-one features common to all four hospitals were used to build the corresponding RF models. Performance is depicted in terms of ROC AUC.
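The cross-prediction evaluation shown in Figure 3 (train a random forest on one hospital, evaluate it on every hospital) can be sketched on synthetic stand-in data. The hospital names, feature count, and label rule below are invented for illustration, and the diagonal here is scored on the training data itself, whereas the paper reports proper within-hospital test performance.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(2)

def make_hospital(shift, n=400, n_features=21):
    """Synthetic stand-in for one hospital's feature table and ARDS labels."""
    X = rng.normal(loc=shift, size=(n, n_features))
    # labels depend on the first feature; `shift` biases the population
    y = (X[:, 0] + rng.normal(scale=0.5, size=n) > shift).astype(int)
    return X, y

hospitals = {"Hosp A": make_hospital(0.0), "Hosp B": make_hospital(1.0)}

def cross_auc(hospitals):
    """Cross-prediction matrix: train on the row hospital, test on the column."""
    results = {}
    for train_name, (X_tr, y_tr) in hospitals.items():
        model = RandomForestClassifier(n_estimators=100, random_state=0)
        model.fit(X_tr, y_tr)
        for test_name, (X_te, y_te) in hospitals.items():
            auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
            results[(train_name, test_name)] = auc
    return results

results = cross_auc(hospitals)
for (tr, te), auc in results.items():
    print(f"{tr} -> {te}: AUC {auc:.2f}")
```

Reading the matrix row by row shows how far each hospital's model travels: a large gap between a diagonal cell and its row neighbors is the generalization failure the paper associates with poor CH intersection.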
