A machine learning-based phenotype for long COVID in children: An EHR-based study from the RECOVER program
- PMID: 37561683
- PMCID: PMC10414557
- DOI: 10.1371/journal.pone.0289774
Abstract
As clinical understanding of pediatric post-acute sequelae of SARS-CoV-2 (PASC) develops, and the clinical definition evolves with it, it is desirable to have a method to reliably identify patients who are likely to have PASC in health systems data. In this study, we developed and validated a machine learning algorithm to classify which patients have PASC (distinguishing between Multisystem Inflammatory Syndrome in Children (MIS-C) and non-MIS-C variants) from a cohort of patients with positive SARS-CoV-2 test results in pediatric health systems within the PEDSnet EHR network. Patient features included in the model were selected from conditions, procedures, performance of diagnostic testing, and medications using a tree-based scan statistic approach. We used an XGBoost model, with hyperparameters selected through cross-validated grid search, and assessed model performance using 5-fold cross-validation. Model predictions and feature importance were evaluated using SHapley Additive exPlanations (SHAP) values. The model provides a tool for identifying patients with PASC and an approach to characterizing PASC using diagnosis, medication, laboratory, and procedure features in health systems data. With appropriate threshold settings, the model can identify PASC patients in health systems data at higher precision for inclusion in studies or at higher recall in screening for clinical trials, especially in settings where PASC diagnosis codes are used less frequently or less reliably. Analysis of how specific features contribute to the classification may help clarify which features are associated with PASC diagnoses.
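The precision/recall trade-off described above can be illustrated with a minimal sketch. It is not the study's code: the scores and labels are invented toy values standing in for a classifier's predicted probabilities, and the helper names `precision_recall_at` and `lowest_threshold_with_precision` are hypothetical. It only shows how moving the decision threshold shifts a phenotype between a high-precision and a high-recall operating point.

```python
from typing import List, Optional, Tuple


def precision_recall_at(scores: List[float], labels: List[int],
                        threshold: float) -> Tuple[float, float]:
    """Precision and recall when scores >= threshold are called positive."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    # Convention assumed here: precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def lowest_threshold_with_precision(scores: List[float], labels: List[int],
                                    min_precision: float) -> Optional[float]:
    """Lowest observed score that, used as a cutoff, meets min_precision.

    Recall never increases as the cutoff rises, so the lowest qualifying
    cutoff is also the one with the highest recall among those that qualify.
    """
    for t in sorted(set(scores)):
        precision, _ = precision_recall_at(scores, labels, t)
        if precision >= min_precision:
            return t
    return None


# Toy data (assumption): model scores and true PASC labels for ten patients.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]

for t in (0.25, 0.50, 0.85):
    p, r = precision_recall_at(scores, labels, t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

On this toy data, a low cutoff captures every true case at the cost of more false positives (a screening-style setting), while a high cutoff keeps only confident predictions (a study-inclusion setting).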
Copyright: © 2023 Lorman et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Conflict of interest statement
“Dr. Rao reports prior grant support from GSK and Biofire and is a consultant for Seqirus. Dr. Jhaveri is a consultant for AstraZeneca, Seqirus, Dynavax, receives an editorial stipend from Elsevier and Pediatric Infectious Diseases Society and royalties from Up To Date/Wolters Kluwer. Dr. Lee serves on the PASC Advisory Board for United Health Group. Dr. Bailey has received grants from Patient-Centered Outcomes Research Institute. All other authors have nothing to disclose. This does not alter our adherence to PLOS ONE policies on sharing data and materials.”
Update of
- A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program. medRxiv [Preprint]. 2022 Dec 26:2022.12.22.22283791. doi: 10.1101/2022.12.22.22283791. Update in: PLoS One. 2023 Aug 10;18(8):e0289774. doi: 10.1371/journal.pone.0289774. PMID: 36597534.