[Preprint]. 2022 Dec 26:2022.12.22.22283791.
doi: 10.1101/2022.12.22.22283791.

A machine learning-based phenotype for long COVID in children: an EHR-based study from the RECOVER program

Vitaly Lorman et al. medRxiv.

Abstract

Background: As clinical understanding of pediatric post-acute sequelae of SARS-CoV-2 (PASC) develops and the clinical definition evolves, it is desirable to have a method that reliably identifies patients who are likely to have PASC in health systems data.

Methods and findings: In this study, we developed and validated a machine learning algorithm to classify which patients have PASC (distinguishing between Multisystem Inflammatory Syndrome in Children (MIS-C) and non-MIS-C variants) from a cohort of patients with positive SARS-CoV-2 test results in pediatric health systems within the PEDSnet EHR network. Patient features included in the model were selected from conditions, procedures, performance of diagnostic testing, and medications using a tree-based scan statistic approach. We used an XGBoost model, with hyperparameters selected through cross-validated grid search, and assessed model performance using 5-fold cross-validation. Model predictions and feature importance were evaluated using SHapley Additive exPlanation (SHAP) values.
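To illustrate what a SHAP value measures, the following is a minimal sketch that computes exact Shapley values for a single instance by enumerating feature subsets. It is not the authors' pipeline: SHAP for tree models is normally computed with a specialized algorithm, and the function `toy_log_odds`, its coefficients, and the all-zero baseline are hypothetical stand-ins chosen only for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance, by enumerating feature subsets.

    Features outside a subset are replaced by their baseline value.
    """
    n = len(x)
    features = range(n)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in features]
        return predict(z)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical toy model on the log-odds scale with one interaction term
def toy_log_odds(z):
    return -2.0 + 1.5 * z[0] + 0.5 * z[1] + 1.0 * z[0] * z[2]


x = [1, 1, 1]          # three binary features, all present
baseline = [0, 0, 0]   # assumed reference: all features absent
phi = shapley_values(toy_log_odds, x, baseline)
```

In this toy case the contributions sum to the difference between the prediction for `x` and the prediction for the baseline, and the interaction term is split evenly between the two features involved.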

Conclusions: The model provides a tool for identifying patients with PASC and an approach to characterizing PASC using diagnosis, medication, laboratory, and procedure features in health systems data. Using appropriate threshold settings, the model can be used to identify PASC patients in health systems data at higher precision for inclusion in studies or at higher recall in screening for clinical trials, especially in settings where PASC diagnosis codes are used less frequently or less reliably. Analysis of how specific features contribute to the classification process may assist in gaining a better understanding of features that are associated with PASC diagnoses.
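The trade-off between threshold settings can be sketched as follows. The scores and labels are made-up values for illustration, not results from the study, and `precision_recall_at` is a hypothetical helper rather than part of any library.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when scores >= threshold are called positive."""
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
    fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y == 1)
    # Conventions for empty denominators are a choice made for this sketch
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Made-up predicted probabilities and true labels (1 = PASC, 0 = not PASC)
scores = [0.95, 0.80, 0.65, 0.40, 0.30, 0.10]
labels = [1, 1, 0, 1, 0, 0]

# Stricter threshold: fewer false positives, suited to study inclusion
high_precision = precision_recall_at(scores, labels, 0.70)

# Looser threshold: fewer missed cases, suited to screening
high_recall = precision_recall_at(scores, labels, 0.35)
```

On this toy data the stricter threshold gives precision 1.0 with recall 2/3, while the looser one gives precision 0.75 with recall 1.0.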

Funding source: This research was funded by the National Institutes of Health (NIH) Agreement OT2HL161847-01 as part of the Researching COVID to Enhance Recovery (RECOVER) program of research.

Disclaimer: The content is solely the responsibility of the authors and does not necessarily represent the official views of the RECOVER Program, the NIH or other funders.

Figures

Figure 1. Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves describing model performance in identifying PASC (MIS-C or non-MIS-C variants)
For each of the three outcomes (PASC (any), non MIS-C PASC, and MIS-C) the Receiver Operating Characteristic (ROC) curves and Precision-Recall (PR) curves are estimated and plotted 5 times, once for each cross-validation fold.
Figure 2. SHapley Additive exPlanation (SHAP) values for top administrative and clinical model features by class in predicting non-MIS-C PASC
The plots show the most significant features as determined by the sum of SHAP value magnitudes over all samples. For each feature, SHAP values for each patient are plotted, with color representing the feature value (e.g. red if the feature was present and blue if absent, in the case of a binary variable). For the SHAP values pictured, the x axis is interpreted as change in log odds (in particular, SHAP values are not confined to the range between −1 and 1).
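The log-odds reading of the x axis can be written out as below. This assumes the SHAP values were computed on the model's raw log-odds output, as the caption indicates; the symbols are introduced here for illustration, with \phi_0 the base value, \phi_i(x) the SHAP value of feature i for patient x, and M the number of features.

```latex
\log\frac{p(x)}{1 - p(x)} = \phi_0 + \sum_{i=1}^{M} \phi_i(x),
\qquad
p(x) = \frac{1}{1 + \exp\!\Bigl(-\phi_0 - \sum_{i=1}^{M} \phi_i(x)\Bigr)}
```

Because the contributions add on the log-odds scale, a single \phi_i(x) can be larger than 1 in magnitude, and the same SHAP value corresponds to a different change in probability depending on the other terms in the sum.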
