Sci Rep. 2022 May 19;12(1):8380. doi: 10.1038/s41598-022-12497-7.

Leveraging clinical data across healthcare institutions for continual learning of predictive risk models


Fatemeh Amrollahi et al. Sci Rep.

Abstract

The inherent flexibility of machine learning-based clinical predictive models to learn from episodes of patient care at a new institution (site-specific training) comes at the cost of performance degradation when applied to external patient cohorts. To exploit the full potential of cross-institutional clinical big data, machine learning systems must gain the ability to transfer their knowledge across institutional boundaries and learn from new episodes of patient care without forgetting previously learned patterns. In this work, we developed a privacy-preserving learning algorithm named WUPERR (Weight Uncertainty Propagation and Episodic Representation Replay) and validated the algorithm in the context of early prediction of sepsis using data from over 104,000 patients across four distinct healthcare systems. We tested the hypothesis that the proposed continual learning algorithm can maintain higher predictive performance than competing methods on previous cohorts once it has been trained on a new patient cohort. In the sepsis prediction task, after incremental training of a deep learning model across four hospital systems (namely hospitals H-A, H-B, H-C, and H-D), WUPERR maintained the highest positive predictive value across the first three hospitals compared to a baseline transfer learning approach (H-A: 39.27% vs. 31.27%, H-B: 25.34% vs. 22.34%, H-C: 30.33% vs. 28.33%). The proposed approach has the potential to construct more generalizable models that can learn from cross-institutional clinical big data in a privacy-preserving manner.
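The "weight uncertainty propagation" idea can be illustrated with a minimal sketch: when fine-tuning on a new hospital's data, changes to parameters that were learned with high confidence (low variance) on earlier tasks are penalized more strongly than changes to uncertain parameters. The penalty below is an EWC-style quadratic anchor; the paper's exact formulation may differ, and all values here are made-up toy numbers.

```python
import numpy as np

def uncertainty_penalty(theta, theta_prev, sigma2, lam=1.0):
    """Quadratic anchor to the previous task's weights, scaled by inverse
    per-weight variance: confident (low-variance) weights are held in place,
    uncertain weights remain free to adapt to the new hospital's data."""
    return lam * np.sum((theta - theta_prev) ** 2 / sigma2)

theta_prev = np.array([0.5, -1.2, 2.0])     # weights after training on Hospital-A
sigma2     = np.array([0.01, 1.0, 0.1])     # per-weight uncertainty from Hospital-A
theta      = np.array([0.6, -0.2, 2.0])     # candidate weights during Hospital-B training

# Moving the low-variance weight by 0.1 costs as much as moving the
# high-variance weight by a full 1.0.
cost = uncertainty_penalty(theta, theta_prev, sigma2)
```

In training, this penalty would simply be added to the task loss before back-propagation, so the gradient pulls each weight toward its previous value in proportion to how certain the model was about it.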


Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Schematic diagram of the WUPERR algorithm. Training starts from a randomly initialized set of weights, which are trained on the first task (e.g., prediction on Hospital-A data). In all subsequent learning tasks, the input-layer weights (W1A) are kept frozen. The optimal network parameters, the parameter uncertainties under task A, and the set of representations from the training cohort of Hospital-A ({h1A}) are then transferred to Hospital-B. The deeper layers of the model are fine-tuned on the second task (e.g., prediction on Hospital-B data) by replaying the representations of the Hospital-A and Hospital-B data. Similarly, the optimal parameters and their uncertainty levels, along with the Hospital-A and Hospital-B representations, are transferred to Hospital-C to fine-tune the model on the third task. Note that at no time does protected health information (PHI) leave the institutional boundaries of a given hospital. Finally, at evaluation time (on testing data) for a given task, the model is evaluated on all the hospital cohorts.
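The representation-replay mechanism described above can be sketched in a few lines: a frozen first layer maps each site's raw records to hidden representations, only those representations (not raw PHI) are cached and shared, and the deeper, trainable part of the model is fine-tuned on a mix of cached and new-site representations. This is a toy illustration with a single logistic head standing in for the deeper layers; the network sizes, data, and training details are assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Frozen input-layer weights (a stand-in for W1A learned on the Hospital-A task)
W1 = rng.normal(size=(5, 16))

def representations(X):
    """Shared frozen encoder: only these de-identified activations leave a site."""
    return relu(X @ W1)

def train_head(H, y, steps=500, lr=0.1):
    """Trainable deeper layer: a single logistic head fit by gradient descent."""
    w = np.zeros(H.shape[1])
    for _ in range(steps):
        p = sigmoid(H @ w)
        w -= lr * H.T @ (p - y) / len(y)
    return w

# Hospital-A: compute and cache representations; raw records never leave the site
XA = rng.normal(size=(300, 5))
yA = (XA[:, 0] > 0).astype(float)
HA = representations(XA)

# Hospital-B: fine-tune the head on a replay mix of cached A- and new B-representations
XB = rng.normal(size=(300, 5))
yB = (XB[:, 0] > 0).astype(float)
HB = representations(XB)
w_head = train_head(np.vstack([HA, HB]), np.concatenate([yA, yB]))

# Because Hospital-A representations were replayed, performance on A is retained
acc_A = np.mean((sigmoid(HA @ w_head) > 0.5) == yA)
```

The key design point is that freezing the encoder makes the cached representations reusable: they remain valid inputs to the deeper layers at every subsequent site, which is what allows replay without re-transmitting patient-level data.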
Figure 2
Evaluation of continual learning models for early prediction of sepsis onset, measured using the area under the curve (AUC). (a) AUC of a model (median [IQR]) trained using transfer learning. Model performance is reported (using different markers; see legend) across all cohorts after sequential training on data from the hospital given on the x-axis. (b) AUC of the proposed WUPERR model under the same experimental set-up as (a). At evaluation time (on testing data) at a given site, the model is evaluated on all the hospital cohorts. A solid line style indicates that, at the time of evaluation (on testing data) at a given site, the model had already seen the training data from that site. For instance, since the model is first trained on Hospital-A data, its performance on this dataset after continual learning on all subsequent hospitals is shown in a solid line style to signify that the model had already seen this patient cohort. (c) Model performance (median [IQR]) on Hospitals A–C after continual learning on all four hospitals with transfer learning (red) and WUPERR (blue).
Figure 3
Evaluation of continual learning models for early prediction of sepsis onset, measured using positive predictive value (PPV) and sensitivity. (a) PPV of a model (median [IQR]) trained using transfer learning, measured at a fixed threshold of 0.41 corresponding to 80% sensitivity at Hospital-A after Task 1, for all folds and across all tasks. Model performance is reported (using different markers; see legend) across all cohorts after sequential training on data from the hospital given on the x-axis. (b) PPV of the proposed WUPERR model under the same experimental set-up as (a). (c) Model performance (median [IQR]) on Hospitals A–C after continual learning on all four hospitals with transfer learning (red) and WUPERR (blue). (d–f) Model sensitivity results under the same experimental protocol.
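The fixed-threshold protocol in the caption above can be sketched as follows: calibrate a decision threshold once, at the target sensitivity on the first task's data, then hold it fixed and report PPV at that same threshold across all later tasks. The synthetic scores, prevalence, and helper names below are illustrative assumptions, not the paper's data or code.

```python
import numpy as np

def threshold_for_sensitivity(scores, labels, target=0.80):
    """Highest threshold keeping sensitivity at or above the target:
    the (1 - target) quantile of the positive-class scores."""
    return np.quantile(scores[labels == 1], 1.0 - target)

def sensitivity(scores, labels, thr):
    return np.mean(scores[labels == 1] >= thr)

def ppv(scores, labels, thr):
    pred = scores >= thr
    return np.sum(pred & (labels == 1)) / max(np.sum(pred), 1)

# Synthetic risk scores with ~20% prevalence (toy stand-in for Hospital-A, Task 1)
rng = np.random.default_rng(1)
labels = (rng.random(2000) < 0.2).astype(int)
scores = np.clip(rng.normal(0.3 + 0.4 * labels, 0.15), 0.0, 1.0)

thr = threshold_for_sensitivity(scores, labels)  # calibrated once, reused for later tasks
ppv_at_thr = ppv(scores, labels, thr)
```

Fixing the threshold this way is what makes PPV comparisons across tasks meaningful: any drop reflects drift in the model's score distribution on old cohorts rather than a re-tuned operating point.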

