Clinical Trial
J Clin Invest. 2025 Sep 9;135(21):e193698.
doi: 10.1172/JCI193698. eCollection 2025 Nov 3.

A multiomics recovery factor predicts long COVID in the IMPACC study

Gisela Gabernet et al.

Abstract

BACKGROUND: Following SARS-CoV-2 infection, approximately 10%-35% of patients with COVID-19 experience long COVID (LC), in which debilitating symptoms persist for at least 3 months. Elucidating the biologic underpinnings of LC could identify therapeutic opportunities.

METHODS: We utilized machine learning methods on biologic analytes provided over 12 months after hospital discharge from more than 500 patients with COVID-19 in the IMPACC cohort to identify a multiomics "recovery factor," trained on patient-reported physical function survey scores. Immune profiling data included PBMC transcriptomics, serum O-link and plasma proteomics, plasma metabolomics, and blood mass cytometry by time of flight (CyTOF) protein levels. Recovery factor scores were tested for association with LC, disease severity, clinical parameters, and immune subset frequencies. Enrichment analyses identified biologic pathways associated with recovery factor scores.

RESULTS: Participants with LC had lower recovery factor scores compared with recovered participants. Recovery factor scores predicted LC as early as hospital admission, irrespective of acute COVID-19 severity. Biologic characterization revealed increased inflammatory mediators, elevated signatures of heme metabolism, and decreased androgenic steroids as predictive and ongoing biomarkers of LC. Lower recovery factor scores were associated with reduced lymphocyte and increased myeloid cell frequencies. The observed signatures are consistent with persistent inflammation driving anemia and stress erythropoiesis as major biologic underpinnings of LC.

CONCLUSION: The multiomics recovery factor identifies patients at risk of LC early after SARS-CoV-2 infection and reveals LC biomarkers and potential treatment targets.

TRIAL REGISTRATION: ClinicalTrials.gov NCT04378777.

FUNDING: National Institute of Allergy and Infectious Diseases (NIAID), NIH (3U01AI167892-03S2, 3U01AI167892-01S2, 5R01AI135803-03, 5U19AI118608-04, 5U19AI128910-04, 4U19AI090023-11, 4U19AI118610-06, R01AI145835-01A1S1, 5U19AI062629-17, 5U19AI057229-17, 5U19AI057229-18, 5U19AI125357-05, 5U19AI128913-03, 3U19AI077439-13, 5U54AI142766-03, 5R01AI104870-07S1, 3U19AI089992-09, 3U19AI128913-03, and 5T32DA018926-1, 3U19AI1289130, U19AI128913-04S1, R01AI122220); NIH (UM1TR004528); and National Science Foundation (NSF) (DMS2310836).

Keywords: Biomarkers; COVID-19; Immunology; Infectious disease; Machine learning.

Figures

Figure 1. Multiomics data overview and generation of a predictive LC factor.
(A) Number of samples used in the multiomics data integration strategy by assay (rows) and scheduled time of collection (columns). Shading indicates the frequency of samples with data availability at the indicated visit. (B) Patient classification in PRO clusters according to the PRO survey scores (18). (C) Individual assay data were preprocessed and split into train and test cohorts by participant in an 80/20 split, maintaining the proportion of PRO cluster participants in each partition. (D) Preprocessed assay data and LC response outcomes for the train cohort were used to identify multiomics predictive factors with SPEAR. Factor scores were then calculated for the test cohort. (E) The performance of the multiomics predictive factors to classify patients according to the presence or absence of LC was assessed on the train cohort via cross-validation and then validated on the test cohort. The predictive factor scores were confirmed to be associated with LC after correcting for possible confounding variables. In-depth analysis of enriched biological pathways and significant analytes relevant for the prediction was performed. Factor scores were computed for the acute infection immune profiles, and association analysis with LC at these early time points was performed. See also Supplemental Figures 1 and 2.
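The participant-level 80/20 split described in panel C can be illustrated with a minimal sketch. It assumes a simple mapping from participant ID to PRO cluster label; the function name `stratified_participant_split`, the cluster labels, and the toy data are hypothetical and do not reproduce the IMPACC preprocessing.

```python
import random
from collections import defaultdict


def stratified_participant_split(clusters, test_fraction=0.2, seed=0):
    """Split participant IDs into train and test sets, cluster by cluster.

    clusters: dict mapping participant ID -> cluster label (assumed input format).
    Splitting by participant keeps all visits of one person in a single partition.
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for participant_id, label in clusters.items():
        by_cluster[label].append(participant_id)

    train, test = set(), set()
    for label in sorted(by_cluster):
        ids = sorted(by_cluster[label])  # sort first so the shuffle is reproducible
        rng.shuffle(ids)
        n_test = round(len(ids) * test_fraction)
        test.update(ids[:n_test])
        train.update(ids[n_test:])
    return train, test


# Toy data: 100 hypothetical participants spread evenly over 4 hypothetical clusters.
labels = ["cluster_a", "cluster_b", "cluster_c", "cluster_d"]
clusters = {f"P{i:03d}": labels[i % 4] for i in range(100)}
train_ids, test_ids = stratified_participant_split(clusters)
```

Drawing the test fraction within each cluster, rather than from the pooled cohort, is what keeps the cluster proportions similar in both partitions.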
Figure 2. Identification of a convalescent multiomics recovery factor that discriminates LC.
(A) Predictive performance of a lasso model trained on the MOFA and SPEAR factors to discriminate LC versus MIN at the event level. The mean AUROC of a 10-fold cross-validation on the train cohort for 100 bootstrapped model training repetitions is shown. Significance was calculated by standard normal approximation of bootstrapped differences between models (t test, adj. ****P ≤ 0.0001). CV, cross validation. (B) Predictive performance of the SPEAR Physical model to discriminate LC versus MIN on the test cohort. The ROC curve of model (solid line), random classifier (dashed line), and AUROC value are shown. TPR, true-positive rate; FPR, false-positive rate. (C) Recovery factor scores for the test cohort of the MIN and LC groups at 3 months (visit 7), 6 months (visit 8), 9 months (visit 9), and 12 months (visit 10) after hospital discharge. (D) Recovery factor scores of the individual PRO clusters by visit for the test cohort. P values in C and D show the significance of the recovery factor score association with MIN versus LC and pairwise PRO cluster combinations using a goodness-of-fit χ² test. See also Supplemental Figures 3–5.
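As a reminder of what the AUROC values in panels A and B measure, the sketch below computes AUROC as the probability that a randomly chosen LC event receives a higher predictor value than a randomly chosen MIN event. It is an illustration only, not the lasso pipeline used in the study; the function name `auroc` and the toy scores are hypothetical. Because LC is associated with lower recovery factor scores, the negated score is used as the predictor here.

```python
def auroc(labels, scores):
    """AUROC via pairwise comparison of positives and negatives (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): 1 = LC, 0 = MIN; lower factor score suggests LC.
lc_labels = [1, 1, 1, 0, 0, 0]
factor_scores = [-1.2, -0.4, 0.3, 0.1, 0.8, 1.5]
predictor = [-s for s in factor_scores]  # negate so that higher means "more LC-like"
auc = auroc(lc_labels, predictor)
```

An AUROC of 0.5 corresponds to the random classifier drawn as the dashed line in panel B, and 1.0 to perfect separation of the two groups.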
Figure 3. Heme metabolism and androgenic steroid pathways, inflammation-associated serum factors, and altered immune cell composition are associated with the recovery factor during convalescence.
(A) GSEA identified heme metabolism and androgenic steroid pathways as being significantly associated with the recovery factor, with significance shown per assay, as well as across assays (joint adj. P < 0.05). (B) Twenty-six significant analytes (SPEAR Bayesian posterior selection probability ≥0.95) in the recovery factor across different assays (left) were identified using SPEAR factor loadings (middle; coefficient in the factor), and each was tested for association in the test cohort with MIN versus LC groups (right; adj. intercept P value). (C) Geometric means of analytes from the significantly enriched gene and metabolite sets and/or significant SPEAR analytes are shown per sample at each convalescent visit in the test cohort. The combined geometric mean score includes leading edge analytes from the hallmark heme metabolism and androgenic steroids pathways, and the significant SPEAR analytes. The P values indicate significance of the association with MIN versus LC. (D) Association in the full cohort of whole blood cell counts determined by CyTOF with the recovery factor for parent and child immune cell types. Mono, monocytes; B, B lymphocytes; CD4, CD4+ T lymphocytes; CD8, CD8+ T lymphocytes; CD27+ non-sM, CD27+ nonswitched memory; CD27+ sM, CD27+ switched memory. For a full list of the child populations, see Supplemental Table 3. (adj. *P < 0.05, adj. **P < 0.01, adj. ***P < 0.001). See also Supplemental Figure 6.
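The per-sample geometric mean score in panel C can be sketched as follows. This assumes strictly positive analyte abundances and equal weighting; how the study scaled analytes or handled analytes with opposite loading signs is not stated here, so the function name `geometric_mean_score` and the example values are hypothetical.

```python
import math


def geometric_mean_score(values):
    """Geometric mean of strictly positive analyte values for one sample."""
    if not values:
        raise ValueError("at least one analyte value is required")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean is only defined for positive values")
    # Averaging logs avoids overflow when multiplying many values together.
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Toy sample (hypothetical abundances for three analytes of one signature).
sample_score = geometric_mean_score([1.0, 4.0, 16.0])
```

Compared with an arithmetic mean, the geometric mean is less dominated by a single analyte with a very large abundance, which suits signatures that combine analytes measured on different scales.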
Figure 4. Associations of clinical measurements with recovery factor scores.
(A) Association of recovery factor scores with clinical features (demographics, comorbidities, complications, and baseline laboratory measurements). Dot plot shows the signed adjusted P values indicating the clinical feature term significance from a linear mixed-effects model, with enrollment site and participant as random effects to explain the convalescent phase recovery factor scores. Sex and discretized age were further adjusted as fixed effects for clinical features other than sex and age. Only significant associations (adj. P < 0.05) are shown. (B) Associations of recovery factor scores with individual PRO survey scores (PROMIS scale scores, EQ-5D-5L and health score) in the test cohort. Raw and adjusted P values indicate the PRO score term significance in linear mixed-effects models. (C) Associations of recovery factor scores with each indicated symptom group in the test cohort: neurological, cardiopulmonary, upper respiratory, systemic, gastrointestinal. Numbers are the uncorrected significance (P values) of the symptom group term in linear mixed-effects models. (D) Recovery factor scores per participant in the test cohort, separated into MIN and LC groups by acute phase trajectory groups, stratified by visit. P values for B–D show the endpoint term of a linear mixed-effects model with sex, discretized admission age, and trajectory group as fixed effects and enrollment site as a random effect. No individual MIN versus LC comparisons were significant after P value correction.
Figure 5. Recovery factor scores in acute phase data associate with eventual LC status.
(A) Recovery factor scores during the acute disease phase for participants in the LC and MIN groups within 72 hours of hospital admission (visit 1) and at day 4 (visit 2), day 7 (visit 3), day 14 (visit 4), day 21 (visit 5), and day 28 (visit 6) after admission. (B) Recovery factor scores during the acute disease phase for participants in individual PRO clusters. (C) Geometric mean scores of analytes in enriched gene and metabolic sets and/or significant SPEAR analytes during the acute phase. No individual per-visit comparisons were significant after P value correction. P values in the top-right box in A–C show the significance of the recovery factor score or the geometric mean signature association with MIN versus LC or pairwise PRO cluster combinations. Bars above the box plots show the pairwise significance across groups in a per-visit comparison (*P < 0.05 and **P < 0.01). (D) Recovery factor score association with whole blood CyTOF immune cell populations during the acute phase (adj. *P < 0.05, adj. **P < 0.01, adj. ***P < 0.001). See also Supplemental Figure 9.
Figure 6. Recovery factor scores associate with acute disease phase trajectory groups, but identify LC irrespective of acute severity.
(A) Longitudinal analysis of acute recovery factor scores for the full IMPACC cohort stratified by trajectory group (n = 1,148 participants). The P value shows the significance of the trajectory group term in a longitudinal model, correcting for age and sex as fixed effects and enrollment site and participant ID as random effects. (B) Recovery factor scores in the acute phase by convalescent MIN/LC label, stratified by acute trajectory group and visit number. P values show significance in distinguishing MIN versus LC labels in linear mixed models with sex, discretized admission age, and trajectory group as fixed effects and enrollment site and participant ID as random effects, performed separately for each acute visit and corrected across all visits.
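Panel B reports per-visit P values that were corrected across all visits. The caption does not name the correction method, so the sketch below uses the Benjamini-Hochberg procedure purely as an assumed example of such a correction; the function name `benjamini_hochberg` and the six toy P values (one per acute visit) are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order.

    Assumption: BH is used here only for illustration; the source does not
    state which multiple-testing correction was applied.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data (hypothetical): one raw P value for each of six acute visits.
raw_p = [0.01, 0.04, 0.03, 0.20, 0.50, 0.002]
adj_p = benjamini_hochberg(raw_p)
```

Correcting across visits means a single small raw P value at one visit is judged against the number of visits tested, which is why per-visit comparisons can lose significance after correction.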

Update of

  • Identification of a multi-omics factor predictive of long COVID in the IMPACC study.
    Gabernet G, Maciuch J, Gygi JP, Moore JF, Hoch A, Syphurs C, Chu T, Jayavelu ND, Corry DB, Kheradmand F, Baden LR, Sekaly RP, McComsey GA, Haddad EK, Cairns CB, Rouphael N, Fernandez-Sesma A, Simon V, Metcalf JP, Agudelo Higuita NI, Hough CL, Messer WB, Davis MM, Nadeau KC, Pulendran B, Kraft M, Bime C, Reed EF, Schaenman J, Erle DJ, Calfee CS, Atkinson MA, Brackenridge SC, Melamed E, Shaw AC, Hafler DA, Ozonoff A, Bosinger SE, Eckalbar W, Maecker HT, Kim-Schulze S, Steen H, Krammer F, Westendorf K; IMPACC Network; Peters B, Fourati S, Altman MC, Levy O, Smolen KK, Montgomery RR, Diray-Arce J, Kleinstein SH, Guan L, Ehrlich LIR. bioRxiv [Preprint]. 2025 Feb 14:2025.02.12.637926. doi: 10.1101/2025.02.12.637926. Update in: J Clin Invest. 2025 Sep 9;135(21):e193698. doi: 10.1172/JCI193698. PMID: 39990442. Free PMC article. Preprint.

References

    1. Ely EW, et al. Long Covid defined. N Engl J Med. 2024;391(18):1746–1753. doi: 10.1056/NEJMsb2408466.
    2. National Academies of Sciences, Engineering, and Medicine. A Long COVID Definition: A Chronic, Systemic Disease State with Profound Consequences. The National Academies Press; 2024.
    3. Groff D, et al. Short-term and long-term rates of postacute sequelae of SARS-CoV-2 infection: a systematic review. JAMA Netw Open. 2021;4(10):e2128568. doi: 10.1001/jamanetworkopen.2021.28568.
    4. Davis HE, et al. Characterizing long COVID in an international cohort: 7 months of symptoms and their impact. eClinicalMedicine. 2021;38:101019. doi: 10.1016/j.eclinm.2021.101019.
    5. Melamed E, et al. Multidisciplinary collaborative consensus guidance statement on the assessment and treatment of neurologic sequelae in patients with post-acute sequelae of SARS-CoV-2 infection (PASC). PM R. 2023;15(5):640–662. doi: 10.1002/pmrj.12976.
