PLoS One. 2023 Aug 10;18(8):e0289774.

doi: 10.1371/journal.pone.0289774. eCollection 2023.

A machine learning-based phenotype for long COVID in children: An EHR-based study from the RECOVER program

Vitaly Lorman¹, Hanieh Razzaghi¹, Xing Song², Keith Morse³, Levon Utidjian¹, Andrea J Allen¹, Suchitra Rao⁴, Colin Rogerson⁵, Tellen D Bennett⁶, Hiroki Morizono⁷, Daniel Eckrich⁸, Ravi Jhaveri⁹, Yungui Huang¹⁰, Daksha Ranade¹¹, Nathan Pajor¹², Grace M Lee¹³, Christopher B Forrest¹, L Charles Bailey¹

Affiliations

¹ Applied Clinical Research Center, Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, United States of America.
² Department of Health Management and Informatics, University of Missouri School of Medicine, Columbia, Missouri, United States of America.
³ Division of Pediatric Hospital Medicine, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America.
⁴ Department of Pediatrics, University of Colorado School of Medicine and Children's Hospital of Colorado, Aurora, Colorado, United States of America.
⁵ Division of Critical Care, Department of Pediatrics, Indiana University School of Medicine, Indianapolis, Indiana, United States of America.
⁶ Departments of Biomedical Informatics and Pediatrics, University of Colorado School of Medicine and Children's Hospital Colorado, Aurora, Colorado, United States of America.
⁷ Center for Genetic Medicine Research, Children's National Hospital, Washington, DC, United States of America.
⁸ Biomedical Research Informatics Center, Nemours Children's Health, Wilmington, Delaware, United States of America.
⁹ Division of Infectious Diseases, Ann & Robert H. Lurie Children's Hospital of Chicago, Chicago, Illinois, United States of America.
¹⁰ IT Research and Innovation, The Research Institute at Nationwide Children's Hospital, Columbus, Ohio, United States of America.
¹¹ Research Informatics Department, Seattle Children's Hospital, Seattle, Washington, United States of America.
¹² Division of Pulmonary Medicine, Cincinnati Children's Hospital Medical Center and University of Cincinnati College of Medicine, Cincinnati, Ohio, United States of America.
¹³ Division of Infectious Diseases, Department of Pediatrics, Stanford University School of Medicine, Stanford, California, United States of America.

PMID: 37561683
PMCID: PMC10414557
DOI: 10.1371/journal.pone.0289774

Vitaly Lorman et al. PLoS One. 2023.

Abstract

As clinical understanding of pediatric post-acute sequelae of SARS-CoV-2 (PASC) develops, and the clinical definition evolves with it, it is desirable to have a method that reliably identifies patients who are likely to have PASC in health systems data. In this study, we developed and validated a machine learning algorithm to classify which patients have PASC (distinguishing between Multisystem Inflammatory Syndrome in Children (MIS-C) and non-MIS-C variants) from a cohort of patients with positive SARS-CoV-2 test results in pediatric health systems within the PEDSnet EHR network. Patient features included in the model were selected from conditions, procedures, performance of diagnostic testing, and medications using a tree-based scan statistic approach. We used an XGBoost model, with hyperparameters selected through cross-validated grid search, and model performance was assessed using 5-fold cross-validation. Model predictions and feature importance were evaluated using SHapley Additive exPlanation (SHAP) values. The model provides a tool for identifying patients with PASC and an approach to characterizing PASC using diagnosis, medication, laboratory, and procedure features in health systems data. Using appropriate threshold settings, the model can be used to identify PASC patients in health systems data at higher precision for inclusion in studies or at higher recall in screening for clinical trials, especially in settings where PASC diagnosis codes are used less frequently or less reliably. Analysis of how specific features contribute to the classification process may assist in gaining a better understanding of features that are associated with PASC diagnoses.
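
The trade-off between a high-precision and a high-recall operating point can be illustrated with a minimal sketch. It is not the authors' code: the scores and labels below are invented toy values, and the function names `precision_recall_at` and `highest_recall_threshold` are hypothetical. The sketch assumes each patient already has a model score between 0 and 1 and a known PASC label.

```python
from typing import List, Optional, Tuple


def precision_recall_at(scores: List[float], labels: List[int],
                        threshold: float) -> Tuple[float, float]:
    """Precision and recall when every patient with score >= threshold is flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def highest_recall_threshold(scores: List[float], labels: List[int],
                             min_precision: float
                             ) -> Optional[Tuple[float, float, float]]:
    """Lowest observed score that still meets the precision floor.

    Recall never increases as the threshold rises, so the lowest
    qualifying threshold also gives the highest recall.
    Returns (threshold, precision, recall), or None if no threshold qualifies.
    """
    for t in sorted(set(scores)):
        p, r = precision_recall_at(scores, labels, t)
        if p >= min_precision:
            return t, p, r
    return None


# Toy data (assumed for illustration only): 1 = PASC, 0 = no PASC.
scores = [0.95, 0.90, 0.80, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

study_cohort = highest_recall_threshold(scores, labels, min_precision=0.90)
trial_screen = highest_recall_threshold(scores, labels, min_precision=0.50)
print("study inclusion (precision >= 0.90):", study_cohort)
print("trial screening (precision >= 0.50):", trial_screen)
```

On this toy data, the stricter precision floor selects a higher threshold and recovers fewer true cases, while the looser floor flags every true case at the cost of more false positives.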

Copyright: © 2023 Lorman et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

PubMed Disclaimer

Conflict of interest statement

“Dr. Rao reports prior grant support from GSK and Biofire and is a consultant for Seqirus. Dr. Jhaveri is a consultant for AstraZeneca, Seqirus, and Dynavax, and receives an editorial stipend from Elsevier and the Pediatric Infectious Diseases Society and royalties from Up To Date/Wolters Kluwer. Dr. Lee serves on the PASC Advisory Board for United Health Group. Dr. Bailey has received grants from the Patient-Centered Outcomes Research Institute. All other authors have nothing to disclose. This does not alter our adherence to PLOS ONE policies on sharing data and materials.”

Figures

**Fig 2. Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves describing model performance in identifying PASC (with MIS-C or non-MIS-C variants).**
For each of the three outcomes (PASC (any), non-MIS-C PASC, and MIS-C), the ROC and PR curves are estimated and plotted 5 times, once for each cross-validation fold.
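
As a rough illustration of per-fold curve estimation, the sketch below computes ROC points and the area under the curve separately for each of five held-out folds. It is an assumption-laden toy, not the study's pipeline: the `held_out` scores and labels are invented, and each fold is assumed to contain the predictions of a model trained on the other four folds. A PR curve could be traced with the same threshold sweep.

```python
from typing import List, Tuple


def roc_curve_points(scores: List[float],
                     labels: List[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) points from sweeping the threshold from high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every patient tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def area_under_curve(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under (x, y) points ordered by increasing x."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical held-out (scores, labels) for 5 folds; values are invented.
held_out = [
    ([0.90, 0.70, 0.40, 0.20], [1, 0, 1, 0]),
    ([0.80, 0.60, 0.30, 0.10], [1, 1, 0, 0]),
    ([0.90, 0.50, 0.50, 0.10], [1, 1, 0, 0]),
    ([0.70, 0.60, 0.40, 0.30], [0, 1, 1, 0]),
    ([0.95, 0.85, 0.20, 0.10], [1, 0, 0, 1]),
]

fold_aucs = [area_under_curve(roc_curve_points(s, y)) for s, y in held_out]
for fold, auc in enumerate(fold_aucs, start=1):
    print(f"fold {fold}: ROC AUC = {auc:.3f}")
```

Plotting one curve per fold, rather than a single pooled curve, makes the fold-to-fold variability in performance visible.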

**Fig 3. SHapley Additive exPlanation (SHAP) values for top administrative and clinical model features by class in predicting non-MIS-C PASC.**
The plots show the most significant features as determined by the sum of SHAP value magnitudes over all samples. For each feature, SHAP values for each patient are plotted, with color representing the feature value (e.g. red if feature was present and blue if absent in case of a binary variable). For the SHAP values pictured, the x axis is interpreted as change in log odds (in particular, SHAP values are not confined to be between –1 and 1).
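
To make the log-odds interpretation concrete, here is a small sketch under stated assumptions. The feature names (`feature_a`, etc.), the SHAP values, and the baseline log odds are all invented, and no SHAP library is used. The sketch relies only on the additive property of SHAP values: the baseline plus a patient's per-feature contributions gives that patient's model output in log odds, which a sigmoid converts to a probability. It also ranks features by the sum of SHAP magnitudes over all samples, as the caption describes.

```python
import math
from typing import Dict, List


def sigmoid(z: float) -> float:
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_probability(base_log_odds: float,
                          shap_values: Dict[str, float]) -> float:
    """Baseline log odds plus per-feature contributions, mapped to a probability."""
    return sigmoid(base_log_odds + sum(shap_values.values()))


def rank_features(shap_rows: List[Dict[str, float]]) -> List[str]:
    """Order features by the sum of |SHAP value| over all patients."""
    totals: Dict[str, float] = {}
    for row in shap_rows:
        for feature, value in row.items():
            totals[feature] = totals.get(feature, 0.0) + abs(value)
    return sorted(totals, key=totals.get, reverse=True)


# Invented values for three hypothetical patients. Note that 1.5 and 2.0
# exceed 1: contributions on the log-odds scale are not bounded by [-1, 1].
shap_rows = [
    {"feature_a": 1.5, "feature_b": -0.2, "feature_c": 0.1},
    {"feature_a": -0.3, "feature_b": 0.9, "feature_c": 0.0},
    {"feature_a": 2.0, "feature_b": -0.4, "feature_c": -0.1},
]
base_log_odds = -3.0  # assumed average model output, in log odds

for i, row in enumerate(shap_rows):
    print(f"patient {i}: p = {predicted_probability(base_log_odds, row):.3f}")
print("features by total |SHAP|:", rank_features(shap_rows))
```

Because the contributions add on the log-odds scale, the same SHAP value shifts the predicted probability by different amounts depending on where the patient starts.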

Update of

A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program.
Lorman V, Razzaghi H, Song X, Morse K, Utidjian L, Allen AJ, Rao S, Rogerson C, Bennett TD, Morizono H, Eckrich D, Jhaveri R, Huang Y, Ranade D, Pajor N, Lee GM, Forrest CB, Bailey LC. medRxiv [Preprint]. 2022 Dec 26:2022.12.22.22283791. doi: 10.1101/2022.12.22.22283791. Update in: PLoS One. 2023 Aug 10;18(8):e0289774. doi: 10.1371/journal.pone.0289774. PMID: 36597534.

Grants and funding

OT2 HL161847/HL/NHLBI NIH HHS/United States
