PLOS Digit Health. 2023 Jan 12;2(1):e0000179.
doi: 10.1371/journal.pdig.0000179. eCollection 2023 Jan.

External validity of machine learning-based prognostic scores for cystic fibrosis: A retrospective study using the UK and Canadian registries

Yuchao Qin et al. PLOS Digit Health.

Abstract

Precise and timely referral for lung transplantation is critical for the survival of cystic fibrosis patients with terminal illness. While machine learning (ML) models have been shown to improve prognostic accuracy significantly over current referral guidelines, the external validity of these models and their resulting referral policies has not been fully investigated. Here, we studied the external validity of machine learning-based prognostic models using annual follow-up data from the UK and Canadian Cystic Fibrosis Registries. Using a state-of-the-art automated ML framework, we derived a model for predicting poor clinical outcomes in patients enrolled in the UK registry, and conducted external validation of the derived model using the Canadian Cystic Fibrosis Registry. In particular, we studied the effect of (1) natural variations in patient characteristics across populations and (2) differences in clinical practice on the external validity of ML-based prognostic scores. Overall, a decrease in prognostic accuracy was observed on the external validation set (AUCROC: 0.88, 95% CI 0.88-0.88) compared with the internal validation accuracy (AUCROC: 0.91, 95% CI 0.90-0.92). Analysis of feature contributions and risk strata based on our ML model revealed that, while the model exhibited high precision on average in external validation, both factors (1) and (2) can undermine the external validity of ML models in patient subgroups with moderate risk for poor outcomes. A significant boost in prognostic power (F1 score) from 0.33 (95% CI 0.31-0.35) to 0.45 (95% CI 0.45-0.45) was observed in external validation when variations in these subgroups were accounted for in our model. Our study highlights the significance of external validation of ML models for cystic fibrosis prognostication. The insights uncovered on key risk factors and patient subgroups can be used to guide the cross-population adaptation of ML-based models and inspire new research on applying transfer learning methods for fine-tuning ML models to cope with regional variations in clinical care.
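
The abstract reports F1 scores with 95% confidence intervals. The abstract does not say how those intervals were obtained, so the following is only a minimal sketch that assumes a percentile bootstrap over patients; the function names and the toy labels are hypothetical and are not taken from the study.

```python
import random

def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN); 0.0 if there are no positives at all."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def bootstrap_ci(y_true, y_pred, metric, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI (an assumed method, not the paper's stated one)."""
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    for _ in range(n_boot):
        # Resample patients with replacement and recompute the metric.
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi

# Toy data (hypothetical): 1 = poor outcome, 0 = no event.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
point = f1_score(y_true, y_pred)
ci_low, ci_high = bootstrap_ci(y_true, y_pred, f1_score)
```

With a registry-sized cohort the interval narrows, which is one plausible reason an interval can round to a single value such as 0.45-0.45.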

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. An overview of the AutoPrognosis framework.
AutoPrognosis is a highly extensible AutoML framework built upon a plugin system. Based on the configured plugins for data imputation, preprocessing and classification, AutoPrognosis constructs an ML pipeline ensemble from the most performant pipelines developed with base classification plugins. (a) An example ensemble composed of three ML pipelines is shown to illustrate the AutoML workflow of AutoPrognosis. All pipelines include four major procedures: imputation, preprocessing, classification, and calibration. In pipeline 1, the multivariate imputation by chained equations (MICE) plugin is applied for missing data imputation. The imputed data are then passed to fast ICA to create a compact, low-dimensional data representation. The random forest classifier is used for the prediction task and its outputs are calibrated with a sigmoid function. Pipelines 2 and 3 are constructed in the same manner for end-to-end prediction. AutoPrognosis first searches for the most performant ML pipelines among all possible combinations of configured plugins. The selected pipelines are then combined into an ensemble model to achieve the best prediction performance. Two types of ensemble structure, i.e., stacked and weighted ensembles, are considered in AutoPrognosis, and Bayesian optimization is used to tune the ensemble parameters for each structure. The optimal ensemble is selected based on the configured performance metric. Various explainer plugins of AutoPrognosis can be enabled for the ensemble to provide explanations along with the classification outputs. A detailed description of the algorithm can be found in [–12]. (b) In this study, the UK CF dataset was provided as input to AutoPrognosis to search for the optimal ML model for the composite endpoint prognostic task. The constructed ML model was a weighted ensemble of three ML pipelines. As illustrated in the calibration curves, the random forest pipelines tended to underestimate the risk level of high-risk patients (curve above the dashed line). While the logistic regression pipeline was able to identify high-risk CF patients, its prognostic output was significantly higher than the observed risks and would lead to many false alarms. AutoPrognosis was able to take advantage of all ML pipelines and create an optimal ensemble with the best prognostic accuracy.
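
The weighted ensemble and the calibration curves described in this caption can be sketched in a few lines. This is not the AutoPrognosis API: the function names, the fixed weights, and the toy risk estimates below are all hypothetical, and in the framework the weights are tuned by Bayesian optimization rather than set by hand.

```python
def weighted_ensemble(pipeline_probs, weights):
    """Combine per-pipeline risk estimates as a convex combination."""
    total = sum(weights)
    norm = [w / total for w in weights]  # normalise so the weights sum to 1
    n = len(pipeline_probs[0])
    return [sum(w * probs[i] for w, probs in zip(norm, pipeline_probs))
            for i in range(n)]

def calibration_curve(y_true, y_prob, n_bins=5):
    """Return (mean predicted risk, observed event rate) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for t, p in zip(y_true, y_prob):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[k].append((t, p))
    curve = []
    for b in bins:
        if b:
            mean_pred = sum(p for _, p in b) / len(b)
            observed = sum(t for t, _ in b) / len(b)
            curve.append((mean_pred, observed))
    return curve

# Hypothetical outputs of two random forest pipelines and one logistic
# regression pipeline for four patients.
rf_1 = [0.1, 0.2, 0.5, 0.6]
rf_2 = [0.1, 0.3, 0.4, 0.6]
logreg = [0.2, 0.4, 0.9, 1.0]
ensemble = weighted_ensemble([rf_1, rf_2, logreg], weights=[0.4, 0.4, 0.2])
curve = calibration_curve([0, 0, 1, 1], ensemble, n_bins=2)
```

A point whose observed rate exceeds its mean predicted risk lies above the diagonal, which corresponds to the underestimation the caption attributes to the random forest pipelines.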
Fig 2. Diagnostic accuracy of individual variables in the UK and Canadian CF cohorts.
The ML model constructed by AutoPrognosis was trained on the UK cohort repeatedly, each time with a single feature variable as input. The AUCROC score was used as a proxy for diagnostic accuracy and was measured with ten-fold cross-validation. The Canadian CF cohort was used as the external validation set in each fold. Feature variables were colored by category, and their locations were determined by the average AUCROC scores achieved by their associated models on the two populations. Feature variables with an AUCROC score above 0.6 [25] were considered predictive of high-risk patients and were annotated with their variable names.
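
The single-variable screening in this caption can be illustrated with a simplified sketch. Instead of retraining the ensemble per feature with cross-validation, it scores each raw feature value directly, takes a direction-agnostic AUCROC, and flags a feature only when it exceeds 0.6 in both cohorts; all three choices are assumptions made for illustration, and the feature names and values are hypothetical.

```python
def auc_roc(y_true, scores):
    """AUCROC as the probability that a positive outranks a negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def directionless_auc(y_true, values):
    """Treat 'lower is riskier' and 'higher is riskier' features alike."""
    auc = auc_roc(y_true, values)
    return max(auc, 1.0 - auc)

def predictive_features(uk, ca, threshold=0.6):
    """Names of features whose AUCROC exceeds the threshold in both cohorts."""
    flagged = []
    for name in uk["features"]:
        auc_uk = directionless_auc(uk["y"], uk["features"][name])
        auc_ca = directionless_auc(ca["y"], ca["features"][name])
        if auc_uk > threshold and auc_ca > threshold:
            flagged.append(name)
    return flagged

# Toy cohorts (hypothetical values): y = 1 marks a poor outcome.
uk = {"y": [1, 1, 0, 0, 0],
      "features": {"fev1": [25, 35, 60, 80, 70], "noise": [1, 3, 2, 2, 2]}}
ca = {"y": [1, 0, 0, 1],
      "features": {"fev1": [30, 75, 65, 40], "noise": [2, 1, 3, 2]}}
flagged = predictive_features(uk, ca)
```

In this toy data the lung-function variable separates the classes perfectly in both cohorts, while the noise variable sits at 0.5 and is not flagged.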
Fig 3. Mismatches in risk stratification between the UK and Canadian CF cohorts.
Two prognostic models were constructed by AutoPrognosis separately on the UK and Canadian CF populations. Canadian CF patients with future outcomes of survival, death and LTx were annotated with circles, crosses, and triangles, respectively. Their locations were determined by the outputs of the two AutoPrognosis models. Very high-risk patients with FEV1 below 30% predicted and supplemental oxygenation were highlighted with cyan circles. The two AutoPrognosis models agreed on the risk stratum for most of these very high-risk patients. Mismatches occurred in two subgroups of patients whose risk levels were underestimated. The first subgroup, referred to as mismatch 1, consisted of moderate-risk patients who were identified as low-risk by the AutoPrognosis model developed on the UK population. The second subgroup consisted of high-risk patients who fell into the moderate-risk stratum under the model developed on the UK population. We referred to the corresponding area as mismatch 2.
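
The two mismatch regions can be expressed as a simple rule over the outputs of the two models. The sketch below assumes that the Canadian-trained model supplies the reference stratum and that strata are defined by fixed risk cut-offs; the cut-off values 0.2 and 0.5 and the function names are hypothetical placeholders, not values from the study.

```python
def stratum(risk, cutoffs=(0.2, 0.5)):
    """Map a predicted risk to a stratum using hypothetical cut-offs."""
    low_cut, high_cut = cutoffs
    if risk < low_cut:
        return "low"
    if risk < high_cut:
        return "moderate"
    return "high"

def classify_mismatch(risk_uk_model, risk_ca_model, cutoffs=(0.2, 0.5)):
    """Label a Canadian patient by how the UK-trained model's stratum
    compares with the Canadian-trained model's stratum."""
    s_uk = stratum(risk_uk_model, cutoffs)
    s_ca = stratum(risk_ca_model, cutoffs)
    if s_uk == "low" and s_ca == "moderate":
        return "mismatch 1"   # moderate risk seen as low by the UK model
    if s_uk == "moderate" and s_ca == "high":
        return "mismatch 2"   # high risk seen as moderate by the UK model
    return "agree" if s_uk == s_ca else "other"

# Hypothetical (UK-model risk, Canadian-model risk) pairs for four patients.
patients = [(0.10, 0.30), (0.35, 0.70), (0.80, 0.85), (0.05, 0.04)]
labels = [classify_mismatch(uk, ca) for uk, ca in patients]
```

Both mismatch labels describe underestimation by the UK-trained model, matching the direction of disagreement the caption highlights.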

References

    1. Kapnadak SG, Dimango E, Hadjiliadis D, Hempstead SE, Tallarico E, Pilewski JM, et al. Cystic Fibrosis Foundation consensus guidelines for the care of individuals with advanced cystic fibrosis lung disease. Journal of Cystic Fibrosis. 2020;19(3):344–354. doi: 10.1016/j.jcf.2020.02.015
    2. Saunders T, Burgner D, Ranganathan S. Identifying and preventing cardiovascular disease in patients with cystic fibrosis. Nature Cardiovascular Research. 2022;1(3):187–188. doi: 10.1038/s44161-022-00030-y
    3. Yeung JC, Machuca TN, Chaparro C, Cypel M, Stephenson AL, Solomon M, et al. Lung transplantation for cystic fibrosis. The Journal of Heart and Lung Transplantation. 2020;39(6):553–560. doi: 10.1016/j.healun.2020.02.010
    4. Kerem E, Reisman J, Corey M, Canny GJ, Levison H. Prediction of mortality in patients with cystic fibrosis. New England Journal of Medicine. 1992;326(18):1187–1191. doi: 10.1056/NEJM199204303261804
    5. Yoon J, Davtyan C, van der Schaar M. Discovery and clinical decision support for personalized healthcare. IEEE Journal of Biomedical and Health Informatics. 2016;21(4):1133–1145. doi: 10.1109/JBHI.2016.2574857