J Am Med Inform Assoc. 2024 Apr 19;31(5):1051-1061. doi: 10.1093/jamia/ocae028.

Towards global model generalizability: independent cross-site feature evaluation for patient-level risk prediction models using the OHDSI network


Behzad Naderalvojoud et al.

Abstract

Background: Predictive models show promise in healthcare, but limited generalizability makes successful deployment challenging. Current external validation typically assesses model performance using only the restricted feature set from the original training data, offering little insight into whether those features are suitable at external sites. Our study introduces a methodology for evaluating features during both the development and validation phases, focusing on creating and validating predictive models for post-surgery patient outcomes with improved generalizability.

Methods: Electronic health records (EHRs) from 4 countries (United States, United Kingdom, Finland, and Korea), covering 2008-2019, were mapped to the OMOP Common Data Model (CDM). Machine learning (ML) models were developed to predict the risk of post-surgery prolonged opioid use (POU) using data collected in the 6 months before surgery. Both local and cross-site feature selection methods were applied to the development and external validation datasets. Models were developed using Observational Health Data Sciences and Informatics (OHDSI) tools and validated on separate patient cohorts.
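The abstract does not include implementation details. As a minimal sketch of per-site chi-square feature screening followed by a cross-site combination, where "cross-site" is taken here to mean significant at every site (an assumption, not the authors' OHDSI pipeline; all data below are synthetic stand-ins):

    import numpy as np
    from sklearn.feature_selection import chi2

    def select_features(X, y, alpha=0.05):
        # Keep features whose chi-square p-value against the POU label is < alpha.
        # X must be non-negative (e.g., binary presence/absence indicators).
        _, p_values = chi2(X, y)
        return {i for i, p in enumerate(p_values) if p < alpha}

    def cross_site_features(site_datasets, alpha=0.05):
        # Assumption: cross-site features are those selected at every site;
        # features selected at only one site would be the site-specific set.
        per_site = [select_features(X, y, alpha) for X, y in site_datasets]
        return set.intersection(*per_site)

    # Hypothetical usage with two synthetic sites:
    rng = np.random.default_rng(0)
    sites = [(rng.integers(0, 2, (500, 30)), rng.integers(0, 2, 500)) for _ in range(2)]
    shared = cross_site_features(sites)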

Results: Model development included 41 929 patients, 14.6% with POU. The external validation cohorts included 31 932 (UK), 23 100 (US), 7295 (Korea), and 3934 (Finland) patients, with POU prevalences of 44.2%, 22.0%, 15.8%, and 21.8%, respectively. The top-performing model, Lasso logistic regression, achieved an area under the receiver operating characteristic curve (AUROC) of 0.75 in local validation and an average of 0.69 (SD = 0.02) in external validation. Models trained with cross-site feature selection significantly outperformed those using only features from the development site in external validation (P < .05).
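For orientation only, a self-contained sketch of the top-performing model class and the reported metric, using synthetic stand-ins for the development and external cohorts (the study's actual pipeline uses OHDSI tools on OMOP CDM data):

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(1)
    X_dev, y_dev = rng.normal(size=(1000, 20)), rng.integers(0, 2, 1000)  # development site
    X_ext, y_ext = rng.normal(size=(500, 20)), rng.integers(0, 2, 500)    # one external site

    # An L1 penalty gives Lasso-style logistic regression; liblinear supports L1.
    model = LogisticRegression(penalty="l1", solver="liblinear")
    model.fit(X_dev, y_dev)
    auroc = roc_auc_score(y_ext, model.predict_proba(X_ext)[:, 1])
    print(f"External AUROC: {auroc:.2f}")  # ~0.5 here, since the stand-in data are random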

Conclusions: Using EHRs from 4 countries mapped to the OMOP CDM, we developed generalizable predictive models for POU. Our approach demonstrates the significant impact of cross-site feature selection on model performance, underscoring the importance of incorporating diverse feature sets from multiple clinical settings to enhance the generalizability and utility of predictive healthcare models.

Keywords: feature selection; machine learning; model generalizability; opioid risk; prolonged opioid use; surgery.


Conflict of interest statement

The authors have no competing interests to declare.

Figures

Figure 1. Cross-site feature selection results achieved from the 4 CDM databases: (A) site-specific and cross-site features selected by the chi-square metric; (B) site-specific and cross-site features selected by the PNF metric; (C) cross-site features selected by the combination of the chi-square and PNF metrics, excluding site-specific features.

Figure 2. Top 20 features associated with higher POU risk, from Lasso logistic regression trained with superset-union features.

Figure 3. Calibration plots and Brier scores of the 3 models trained with superset-union features on the 4 external validation databases, over the target cohort and 3 risk subgroups.
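As a pointer to how the calibration diagnostics in Figure 3 are typically computed (the arrays below are hypothetical predicted probabilities and outcomes, not the study's data):

    import numpy as np
    from sklearn.calibration import calibration_curve
    from sklearn.metrics import brier_score_loss

    rng = np.random.default_rng(2)
    y_prob = rng.uniform(size=1000)   # stand-in predicted POU probabilities
    y_true = rng.binomial(1, y_prob)  # outcomes drawn so predictions are roughly calibrated

    print("Brier score:", brier_score_loss(y_true, y_prob))  # mean squared error of probabilities
    frac_pos, mean_pred = calibration_curve(y_true, y_prob, n_bins=10)  # points for a calibration plot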

