This is a preprint.

It has not yet been peer reviewed by a journal.

The National Library of Medicine is running a pilot to include preprints that result from research funded by NIH in PMC and PubMed.

[Preprint]. 2022 Dec 26:2022.12.22.22283791.

doi: 10.1101/2022.12.22.22283791.

A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program

Vitaly Lorman¹, Hanieh Razzaghi¹, Xing Song², Keith Morse³, Levon Utidjian¹, Andrea J Allen¹, Suchitra Rao⁴, Colin Rogerson⁵, Tellen D Bennett⁶, Hiroki Morizono⁷, Daniel Eckrich⁸, Ravi Jhaveri⁹, Yungui Huang¹⁰, Daksha Ranade¹¹, Nathan Pajor¹², Grace M Lee¹³, Christopher B Forrest¹, L Charles Bailey¹

Affiliations

¹ Applied Clinical Research Center, Children's Hospital of Philadelphia, Philadelphia, PA, United States.
² Department of Health Management and Informatics, University of Missouri School of Medicine, Columbia, MO, United States.
³ Division of Pediatric Hospital Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States.
⁴ Department of Pediatrics, University of Colorado School of Medicine and Children's Hospital of Colorado, Aurora, CO, United States.
⁵ Division of Critical Care, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, IN, United States.
⁶ Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine and Children's Hospital Colorado, Aurora, CO, United States.
⁷ Center for Genetic Medicine Research, Children's National Hospital, Washington DC, United States.
⁸ Biomedical Research Informatics Center, Nemours Children's Health, Wilmington, DE, United States.
⁹ Division of Infectious Diseases, Ann & Robert H. Lurie Children's Hospital of Chicago, Chicago, IL, United States.
¹⁰ IT Research and Innovation, The Research Institute at Nationwide Children's Hospital, Columbus, OH, United States.
¹¹ Research Informatics Department, Seattle Children's Hospital, Seattle, WA, United States.
¹² Division of Pulmonary Medicine, Cincinnati Children's Hospital Medical Center and University of Cincinnati College of Medicine, Cincinnati, OH, United States.
¹³ Division of Infectious Diseases, Department of Pediatrics, Stanford University School of Medicine, Stanford, CA, United States.

PMID: 36597534
PMCID: PMC9810222
DOI: 10.1101/2022.12.22.22283791

Vitaly Lorman et al. medRxiv. 2022.

Update in

A machine learning-based phenotype for long COVID in children: An EHR-based study from the RECOVER program.
Lorman V, Razzaghi H, Song X, Morse K, Utidjian L, Allen AJ, Rao S, Rogerson C, Bennett TD, Morizono H, Eckrich D, Jhaveri R, Huang Y, Ranade D, Pajor N, Lee GM, Forrest CB, Bailey LC. PLoS One. 2023 Aug 10;18(8):e0289774. doi: 10.1371/journal.pone.0289774. eCollection 2023. PMID: 37561683. Free PMC article.

Abstract

Background: As clinical understanding of pediatric post-acute sequelae of SARS-CoV-2 (PASC) develops and the clinical definition evolves, it is desirable to have a method that reliably identifies patients in health systems data who are likely to have PASC.

Methods and findings: In this study, we developed and validated a machine learning algorithm to classify which patients have PASC (distinguishing between Multisystem Inflammatory Syndrome in Children (MIS-C) and non-MIS-C variants) from a cohort of patients with positive SARS-CoV-2 test results in pediatric health systems within the PEDSnet EHR network. Patient features included in the model were selected from conditions, procedures, performance of diagnostic testing, and medications using a tree-based scan statistic approach. We used an XGBoost model with hyperparameters selected through cross-validated grid search, and assessed model performance using 5-fold cross-validation. Model predictions and feature importance were evaluated using SHapley Additive exPlanation (SHAP) values.
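To illustrate the general shape of a cross-validated grid search over stratified folds, the following is a minimal, standard-library sketch. It is not the authors' pipeline: the toy data `X` and `y`, the `shift` hyperparameter, and the `fit_and_score` stand-in model are all hypothetical, and a simple mean-midpoint threshold rule takes the place of XGBoost so the example stays self-contained.

```python
import itertools
import random
from statistics import mean


def stratified_folds(labels, k=5, seed=0):
    """Split sample indices into k folds, spreading each class evenly."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, label in enumerate(labels) if label == cls]
        rng.shuffle(members)
        for position, i in enumerate(members):
            folds[position % k].append(i)
    return folds


def grid_search_cv(param_grid, labels, fit_and_score, k=5, seed=0):
    """Return (best_params, best_mean_score) over every grid combination."""
    folds = stratified_folds(labels, k, seed)
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        fold_scores = []
        for held_out in range(k):
            test_idx = folds[held_out]
            train_idx = [i for f in range(k) if f != held_out for i in folds[f]]
            fold_scores.append(fit_and_score(params, train_idx, test_idx))
        score = mean(fold_scores)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical toy data: one numeric feature, 10 negatives then 10 positives.
X = list(range(20))
y = [0] * 10 + [1] * 10


def fit_and_score(params, train_idx, test_idx):
    """Stand-in model (assumption): threshold at the midpoint of the
    training class means plus a tunable shift; returns test accuracy."""
    pos_mean = mean(X[i] for i in train_idx if y[i] == 1)
    neg_mean = mean(X[i] for i in train_idx if y[i] == 0)
    threshold = (pos_mean + neg_mean) / 2 + params["shift"]
    hits = sum((X[i] >= threshold) == (y[i] == 1) for i in test_idx)
    return hits / len(test_idx)


best_params, best_score = grid_search_cv({"shift": [-4, 0, 4]}, y, fit_and_score)
print(best_params, round(best_score, 3))
```

In a real setting, `fit_and_score` would train the gradient-boosted model on the training indices and return a held-out metric; the surrounding loop would stay the same.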

Conclusions: The model provides a tool for identifying patients with PASC and an approach to characterizing PASC using diagnosis, medication, laboratory, and procedure features in health systems data. Using appropriate threshold settings, the model can be used to identify PASC patients in health systems data at higher precision for inclusion in studies or at higher recall in screening for clinical trials, especially in settings where PASC diagnosis codes are used less frequently or less reliably. Analysis of how specific features contribute to the classification process may assist in gaining a better understanding of features that are associated with PASC diagnoses.
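The trade-off between a higher-precision threshold (study inclusion) and a higher-recall threshold (trial screening) can be sketched as follows. This is an illustrative example only: the scores and labels are made up, the targets of 0.9 precision and 0.95 recall are arbitrary assumptions, and the helper names are hypothetical rather than taken from the study.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def operating_points(scores, labels):
    """(threshold, precision, recall) for every distinct score used as cutoff."""
    return [(t, *precision_recall_at(scores, labels, t)) for t in sorted(set(scores))]


def threshold_for_precision(scores, labels, min_precision):
    """Among cutoffs meeting the precision target, keep the best recall."""
    ok = [pt for pt in operating_points(scores, labels) if pt[1] >= min_precision]
    return max(ok, key=lambda pt: pt[2]) if ok else None


def threshold_for_recall(scores, labels, min_recall):
    """Among cutoffs meeting the recall target, keep the best precision."""
    ok = [pt for pt in operating_points(scores, labels) if pt[2] >= min_recall]
    return max(ok, key=lambda pt: pt[1]) if ok else None


# Hypothetical predicted probabilities and true labels (1 = PASC).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
labels = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]

study_point = threshold_for_precision(scores, labels, min_precision=0.9)
screening_point = threshold_for_recall(scores, labels, min_recall=0.95)
print("study inclusion:", study_point)
print("trial screening:", screening_point)
```

On this toy data the precision-oriented cutoff is much stricter than the recall-oriented one, which is the behavior the two use cases call for.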

Funding source: This research was funded by the National Institutes of Health (NIH) Agreement OT2HL161847-01 as part of the Researching COVID to Enhance Recovery (RECOVER) program of research.

Disclaimer: The content is solely the responsibility of the authors and does not necessarily represent the official views of the RECOVER Program, the NIH or other funders.

Figures

**Figure 1. Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves describing model performance in identifying PASC (MIS-C or non-MIS-C variants)**
For each of the three outcomes (any PASC, non-MIS-C PASC, and MIS-C), the Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves are estimated and plotted 5 times, once for each cross-validation fold.
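As a rough illustration of per-fold evaluation, the sketch below computes the area under the ROC curve and the average precision (a common summary of the PR curve) separately for each fold of one binary outcome. The fold data are invented, only two folds are shown for brevity, and the function names are hypothetical; the code also assumes every fold contains both classes and has no tied scores.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve as the probability that a random positive
    outranks a random negative (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """Mean of the precision values at the rank of each true positive.
    Assumes at least one positive and no tied scores."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    true_positives, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            true_positives += 1
            total += true_positives / rank
    return total / true_positives


# Hypothetical held-out predictions for two folds of one outcome.
folds = [
    ([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]),
    ([0.7, 0.6, 0.4, 0.1], [1, 1, 0, 0]),
]

fold_metrics = [(roc_auc(s, y), average_precision(s, y)) for s, y in folds]
for number, (auc, ap) in enumerate(fold_metrics, start=1):
    print(f"fold {number}: ROC AUC = {auc:.3f}, average precision = {ap:.3f}")
```

Reporting one value per fold, rather than a single pooled value, shows how much performance varies across held-out subsets.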

**Figure 2. SHapley Additive exPlanation (SHAP) values for top administrative and clinical model features by class in predicting non-MIS-C PASC**
The plots show the most significant features, as determined by the sum of SHAP value magnitudes over all samples. For each feature, SHAP values for each patient are plotted, with color representing the feature value (e.g., for a binary variable, red if the feature was present and blue if absent). The x-axis is interpreted as change in log odds; in particular, SHAP values are not confined to lie between −1 and 1.
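Because the SHAP values here are on the log-odds scale, they add to a baseline log-odds value, and the sum is mapped to a probability through the logistic function. A small worked sketch, with an invented baseline and invented feature names and values (none are from the study):

```python
import math


def sigmoid(log_odds):
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def probability_from_shap(base_log_odds, shap_values):
    """Add per-feature SHAP contributions (log-odds units) to the baseline,
    then convert the total to a probability."""
    return sigmoid(base_log_odds + sum(shap_values.values()))


# Hypothetical baseline and contributions for a single patient.
base_log_odds = -3.0
shap_values = {"feature_a": 2.0, "feature_b": 1.5, "feature_c": -0.5}

baseline_probability = sigmoid(base_log_odds)
patient_probability = probability_from_shap(base_log_odds, shap_values)

# A SHAP value of 2.0 multiplies the odds by exp(2.0), regardless of baseline.
odds_multiplier = math.exp(shap_values["feature_a"])

print(round(baseline_probability, 4), round(patient_probability, 4))
print(round(odds_multiplier, 2))
```

This is why a single SHAP value can exceed 1 in magnitude: it shifts the log odds, not the probability itself.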

Grants and funding

OT2 HL161847/HL/NHLBI NIH HHS/United States
