PLoS One. 2023 Aug 10;18(8):e0289774.
doi: 10.1371/journal.pone.0289774. eCollection 2023.

A machine learning-based phenotype for long COVID in children: An EHR-based study from the RECOVER program

Vitaly Lorman et al. PLoS One.

Abstract

As clinical understanding of pediatric post-acute sequelae of SARS-CoV-2 (PASC) develops, and the clinical definition evolves with it, it is desirable to have a method that reliably identifies patients who are likely to have PASC in health systems data. In this study, we developed and validated a machine learning algorithm that classifies which patients have PASC (distinguishing between Multisystem Inflammatory Syndrome in Children (MIS-C) and non-MIS-C variants) in a cohort of patients with positive SARS-CoV-2 test results in pediatric health systems within the PEDSnet EHR network. Patient features included in the model were selected from conditions, procedures, performance of diagnostic testing, and medications using a tree-based scan statistic approach. We used an XGBoost model, with hyperparameters selected through cross-validated grid search, and assessed model performance using 5-fold cross-validation. Model predictions and feature importance were evaluated using SHapley Additive exPlanation (SHAP) values. The model provides a tool for identifying patients with PASC and an approach to characterizing PASC using diagnosis, medication, laboratory, and procedure features in health systems data. With appropriate threshold settings, the model can identify PASC patients in health systems data at higher precision for inclusion in studies, or at higher recall in screening for clinical trials, especially in settings where PASC diagnosis codes are used less frequently or less reliably. Analysis of how specific features contribute to the classification may help build a better understanding of the features associated with PASC diagnoses.
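The trade-off between higher precision (study inclusion) and higher recall (trial screening) can be illustrated with a minimal sketch. It assumes a classifier has already produced a score per patient; the scores, labels, and function names below are hypothetical toy values for illustration, not the study's data or code.

```python
from typing import List, Optional, Tuple


def precision_recall_at(scores: List[float], labels: List[int],
                        threshold: float) -> Tuple[float, float]:
    """Precision and recall when patients with score >= threshold are flagged."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def lowest_threshold_for_precision(scores: List[float], labels: List[int],
                                   target: float) -> Optional[Tuple[float, float, float]]:
    """Smallest observed score whose precision meets the target.

    Recall never increases as the threshold rises, so the smallest qualifying
    threshold keeps recall as high as possible for that precision target.
    """
    for t in sorted(set(scores)):
        p, r = precision_recall_at(scores, labels, t)
        if p >= target:
            return t, p, r
    return None


def highest_threshold_for_recall(scores: List[float], labels: List[int],
                                 target: float) -> Optional[Tuple[float, float, float]]:
    """Strictest (largest) observed score whose recall still meets the target."""
    for t in sorted(set(scores), reverse=True):
        p, r = precision_recall_at(scores, labels, t)
        if r >= target:
            return t, p, r
    return None


# Hypothetical model scores and true labels (1 = PASC, 0 = not PASC).
toy_scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
toy_labels = [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]

study_cutoff = lowest_threshold_for_precision(toy_scores, toy_labels, 0.75)
screening_cutoff = highest_threshold_for_recall(toy_scores, toy_labels, 1.0)
print("study inclusion (threshold, precision, recall):", study_cutoff)
print("trial screening (threshold, precision, recall):", screening_cutoff)
```

On this toy data the precision-oriented cutoff is stricter than the recall-oriented one, which mirrors the two use cases described in the abstract.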

Conflict of interest statement

“Dr. Rao reports prior grant support from GSK and Biofire and is a consultant for Seqirus. Dr. Jhaveri is a consultant for AstraZeneca, Seqirus, and Dynavax, and receives an editorial stipend from Elsevier and the Pediatric Infectious Diseases Society and royalties from UpToDate/Wolters Kluwer. Dr. Lee serves on the PASC Advisory Board for United Health Group. Dr. Bailey has received grants from the Patient-Centered Outcomes Research Institute. All other authors have nothing to disclose. This does not alter our adherence to PLOS ONE policies on sharing data and materials.”

Figures

Fig 1. Cohort attrition diagram.
Fig 2. Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves describing model performance in identifying PASC (MIS-C or non-MIS-C variants).
For each of the three outcomes (PASC (any), non-MIS-C PASC, and MIS-C), the Receiver Operating Characteristic (ROC) curves and Precision-Recall (PR) curves are estimated and plotted 5 times, once for each cross-validation fold.
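Estimating one ROC curve per cross-validation fold can be sketched as follows. This is a minimal illustration that assumes held-out scores are already available for each fold; the two toy folds and all names below are hypothetical (the study used five folds), and the code is not the authors' implementation.

```python
from typing import Dict, List, Tuple


def roc_points(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """(FPR, TPR) pairs obtained by sweeping the threshold from high to low.

    Assumes the fold contains at least one positive and one negative label.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        # Handle tied scores together so each distinct score adds one point.
        j = i
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points


def trapezoid_auc(points: List[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve given as (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical held-out predictions: fold id -> (scores, true labels).
held_out: Dict[int, Tuple[List[float], List[int]]] = {
    0: ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]),
    1: ([0.7, 0.6, 0.4, 0.2], [1, 1, 0, 0]),
}

# One curve (and one AUC) per fold, as in the per-fold plots.
fold_aucs = {fold: trapezoid_auc(roc_points(s, y)) for fold, (s, y) in held_out.items()}
for fold, auc in fold_aucs.items():
    print(f"fold {fold}: ROC AUC = {auc:.2f}")
```

Plotting the curves separately per fold, rather than pooling predictions, makes fold-to-fold variability in performance visible.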
Fig 3. SHapley Additive exPlanation (SHAP) values for top administrative and clinical model features by class in predicting non-MIS-C PASC.
The plots show the most important features, ranked by the sum of SHAP value magnitudes over all samples. For each feature, SHAP values for each patient are plotted, with color representing the feature value (e.g., red if the feature was present and blue if absent, in the case of a binary variable). For the SHAP values pictured, the x axis is interpreted as change in log odds (in particular, SHAP values are not confined to be between –1 and 1).
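Reading SHAP values as changes in log odds can be made concrete with a small worked example. This sketch assumes the usual additive property of SHAP for a log-odds model output (baseline log odds plus the sum of a patient's SHAP values gives that patient's predicted log odds); the baseline, feature names, and values are hypothetical.

```python
import math
from typing import Dict


def sigmoid(z: float) -> float:
    """Convert log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def predicted_probability(base_log_odds: float, shap_values: Dict[str, float]) -> float:
    """Probability implied by a baseline plus per-feature log-odds contributions."""
    return sigmoid(base_log_odds + sum(shap_values.values()))


# Hypothetical baseline (average model output in log odds) and one patient's
# per-feature SHAP values; the feature names are placeholders.
base_log_odds = -3.0
patient_shap = {"feature_a": 2.5, "feature_b": 1.0, "feature_c": -0.5}

baseline_probability = sigmoid(base_log_odds)
patient_probability = predicted_probability(base_log_odds, patient_shap)

print(f"baseline probability: {baseline_probability:.3f}")
print(f"patient probability:  {patient_probability:.3f}")
```

Here a single feature contributes 2.5 on the log-odds scale, which is why values on the x axis can fall outside the interval from –1 to 1 while the resulting probability still lies between 0 and 1.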
