Review

A Unified Framework on Generalizability of Clinical Prediction Models

Bohua Wan et al. Front Artif Intell. 2022 Apr 29;5:872720. doi: 10.3389/frai.2022.872720. eCollection 2022.

Abstract

To be useful, clinical prediction models (CPMs) must be generalizable to patients in new settings. Evaluating the generalizability of CPMs helps identify spurious relationships in data, provides insight into when they fail, and thus improves the explainability of CPMs. There are discontinuities between the concepts related to generalizability of CPMs in the clinical research and machine learning domains. Specifically, the conventional statistical reasons used to explain poor generalizability, such as inadequate model development for the purposes of generalizability, differences in the coding of predictors and outcome between development and external datasets, measurement error, inability to measure some predictors, and missing data, all have differing and often complementary treatments in the two domains. Much of the current machine learning literature on generalizability of CPMs is framed in terms of dataset shift, of which several types have been described. However, little research exists to synthesize concepts across the two domains. Bridging this conceptual discontinuity in the context of CPMs can facilitate the systematic development of CPMs and the evaluation of their sensitivity to factors that affect generalizability. We survey generalizability and dataset shift in CPMs from both the clinical research and machine learning perspectives, and describe a unifying framework to analyze the generalizability of CPMs and to explain their sensitivity to the factors affecting it. Our framework leads to a set of signaling statements that can be used to characterize differences between datasets in terms of factors that affect generalizability of the CPMs.
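To make the idea of dataset shift concrete, the following minimal sketch (an illustration, not material from the article; the synthetic data, the logistic-regression CPM, and the use of scikit-learn are all assumptions) fits a model in a development setting and evaluates it on an external dataset whose predictor distribution has shifted, i.e., a covariate shift:

    # Minimal sketch: covariate shift between a development and an external dataset.
    # The outcome model is identical in both settings; only the predictor
    # distribution (here, a single standardized "age" variable) shifts.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)

    def simulate(n, age_mean):
        age = rng.normal(age_mean, 1.0, n)
        p = 1 / (1 + np.exp(-(0.8 * age - 0.5)))   # true outcome model
        return age.reshape(-1, 1), rng.binomial(1, p)

    X_dev, y_dev = simulate(2000, age_mean=0.0)    # development setting
    X_ext, y_ext = simulate(2000, age_mean=1.5)    # external setting, shifted predictor

    cpm = LogisticRegression().fit(X_dev, y_dev)
    print("development AUC:", roc_auc_score(y_dev, cpm.predict_proba(X_dev)[:, 1]))
    print("external AUC:   ", roc_auc_score(y_ext, cpm.predict_proba(X_ext)[:, 1]))

Even when the outcome model itself is unchanged, the shifted predictor distribution changes the case mix, so the discrimination observed in the external dataset can differ from that seen at development.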

Keywords: clinical prediction models; dataset shift; diagnosis; explainability; external validity; generalizability; prognosis.


Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1. Selection diagrams for dataset shifts. Solid circles denote observable variables, hollow circles denote unobservable variables, and rectangles denote selection variables.
Figure 2. A framework to unify concepts related to generalizability of clinical prediction models. *This criterion is satisfied when there are no missing data in the development and external datasets, or, when there are missing data, when the assumptions about missingness do not differ between the datasets (e.g., missing completely at random in both) or the process that introduced the missingness does not differ between them.
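One signaling check implied by this criterion can be sketched as follows (an illustration only, not the authors' implementation; pandas, the tolerance, and the column names are assumptions): compare, per predictor, the proportion of missing values in the development and external datasets before treating their missingness as comparable.

    # Illustrative sketch: flag predictors whose missingness rate differs markedly
    # between the development and external datasets (column names are made up).
    import pandas as pd

    def missingness_report(dev: pd.DataFrame, ext: pd.DataFrame, tol: float = 0.05) -> pd.DataFrame:
        cols = dev.columns.intersection(ext.columns)    # predictors present in both datasets
        report = pd.DataFrame({
            "dev_missing": dev[cols].isna().mean(),     # proportion missing in development data
            "ext_missing": ext[cols].isna().mean(),     # proportion missing in external data
        })
        report["flag"] = (report["dev_missing"] - report["ext_missing"]).abs() > tol
        return report

    dev = pd.DataFrame({"age": [70, None, 65], "bmi": [24.0, 31.5, None]})
    ext = pd.DataFrame({"age": [58, 62, 71], "bmi": [None, None, 27.2]})
    print(missingness_report(dev, ext))

Equal missingness rates do not establish that the missingness mechanisms are the same, so a check like this can only signal a potential difference between the datasets, not rule one out.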
Figure 3. Simulation to illustrate model performance in external datasets with no dataset shifts. (1) The expectation of the estimate of algorithm performance in a test dataset is the mean of a distribution of estimates obtained by evaluating the algorithm on multiple test datasets; a difference in the magnitude of the error between the test and development datasets therefore does not by itself indicate better or worse algorithm performance, and 95% confidence intervals of the estimate in a test dataset, which indicate the width of the true distribution of estimates, are necessary. (2) Test datasets of sufficient sample size, which depends on model complexity, are necessary to minimize bias in the estimate of algorithm performance.
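The two points in the Figure 3 caption can be sketched in a few lines (a hedged illustration, not the authors' simulation code): when test datasets are repeatedly drawn from the same population as the development data, a fitted model's performance estimate varies from test set to test set, so a single estimate should be read against that distribution rather than as a fixed property of the model.

    # Hedged illustration: variability of a performance estimate across test
    # datasets drawn from the same population, with no dataset shift.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(1)

    def draw(n):
        X = rng.normal(size=(n, 3))
        p = 1 / (1 + np.exp(-(X @ np.array([1.0, -0.5, 0.25]))))
        return X, rng.binomial(1, p)

    X_dev, y_dev = draw(1000)
    model = LogisticRegression().fit(X_dev, y_dev)

    # Evaluate the same fitted model on many independent test datasets.
    aucs = np.array([
        roc_auc_score(y_t, model.predict_proba(X_t)[:, 1])
        for X_t, y_t in (draw(500) for _ in range(200))
    ])
    print(f"mean AUC {aucs.mean():.3f}, "
          f"2.5th-97.5th percentiles {np.percentile(aucs, 2.5):.3f}-{np.percentile(aucs, 97.5):.3f}")

A single test-set estimate that differs from the development estimate may simply be a draw from this distribution; smaller test sets widen it, which is why the caption stresses both confidence intervals and adequate sample size.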
