[Preprint]. 2025 Feb 14:2025.02.12.637926.
doi: 10.1101/2025.02.12.637926.

Identification of a multi-omics factor predictive of long COVID in the IMPACC study

Gisela Gabernet et al. bioRxiv.

Update in

  • A multiomics recovery factor predicts long COVID in the IMPACC study.
    Gabernet G, Maciuch J, Gygi JP, Moore JF, Hoch A, Syphurs C, Chu T, Doni Jayavelu N, Corry DB, Kheradmand F, Baden LR, Sekaly RP, McComsey GA, Haddad EK, Cairns CB, Rouphael N, Fernandez-Sesma A, Simon V, Metcalf JP, Agudelo Higuita NI, Hough CL, Messer WB, Davis MM, Nadeau KC, Pulendran B, Kraft M, Bime C, Reed EF, Schaenman J, Erle DJ, Calfee CS, Atkinson MA, Brakenridge SC, Melamed E, Shaw AC, Hafler DA, Augustine AD, Becker PM, Ozonoff A, Bosinger SE, Eckalbar W, Maecker HT, Kim-Schulze S, Steen H, Krammer F, Westendorf K; IMPACC Network; Peters B, Fourati S, Altman MC, Levy O, Smolen KK, Montgomery RR, Diray-Arce J, Kleinstein SH, Guan L, Ehrlich LI. J Clin Invest. 2025 Sep 9;135(21):e193698. doi: 10.1172/JCI193698. eCollection 2025 Nov 3. PMID: 40924481. Free PMC article. Clinical Trial.

Abstract

Following SARS-CoV-2 infection, ~10-35% of COVID-19 patients experience long COVID (LC), in which often debilitating symptoms persist for at least three months. Elucidating the biologic underpinnings of LC could identify therapeutic opportunities. We utilized machine learning methods on biologic analytes and patient reported outcome surveys provided over 12 months after hospital discharge from >500 hospitalized COVID-19 patients in the IMPACC cohort to identify a multi-omics "recovery factor". IMPACC participants who experienced LC had lower recovery factor scores compared to participants without LC. Biologic characterization revealed increased levels of plasma proteins associated with inflammation, elevated transcriptional signatures of heme metabolism, and decreased androgenic steroids in LC patients. The recovery factor was also associated with altered circulating immune cell frequencies. Notably, recovery factor scores were predictive of LC occurrence in patients as early as hospital admission, irrespective of acute disease severity. Thus, the recovery factor identifies patients at risk of LC early after SARS-CoV-2 infection and reveals LC biomarkers and potential treatment targets.

Keywords: COVID-19; Heme metabolism; Long COVID; Machine Learning; PASC; SARS-CoV-2; androgenic steroids; inflammation; multi-omics; patient reported outcomes.

Conflict of interest statement

The Icahn School of Medicine at Mount Sinai has filed patent applications relating to SARS-CoV-2 serological assays, NDV-based SARS-CoV-2 vaccines, influenza virus vaccines and influenza virus therapeutics, which list Florian Krammer as co-inventor. Mount Sinai has spun out a company, Kantaro, to market serological tests for SARS-CoV-2 and another company, Castlevax, to develop SARS-CoV-2 vaccines. Florian Krammer is co-founder and scientific advisory board member of Castlevax. Florian Krammer has consulted for Merck, Curevac, Seqirus, GSK and Pfizer and is currently consulting for 3rd Rock Ventures, Sanofi, Gritstone and Avimex. The Krammer laboratory is also collaborating with Dynavax on influenza vaccine development and with VIR on influenza virus therapeutics development. Viviana Simon is a co-inventor on a patent filed relating to SARS-CoV-2 serological assays (the “Serology Assays”). Ofer Levy is a named inventor on patents held by Boston Children’s Hospital relating to vaccine adjuvants and human in vitro platforms that model vaccine action. His laboratory has received research support from GlaxoSmithKline (GSK) and Pfizer, and he is a co-founder of and advisor to Ovax, Inc, which develops opioid vaccines. Charles Cairns serves as a consultant to bioMerieux and is funded by a grant from the Bill & Melinda Gates Foundation. James A Overton is a consultant at Knocean Inc. Jessica Lasky-Su serves as a scientific advisor of Precion Inc. Scott R. Hutton, Greg Michelloti and Kari Wong are employees of Metabolon Inc. Vicki Seyfer-Margolis is a current employee of MyOwnMed. Nadine Rouphael reports grants or contracts with Merck, Sanofi, Pfizer, Vaccine Company and Immorna, and has participated on data safety monitoring boards for Moderna, Sanofi, Seqirus, Pfizer, EMMES, ICON, BARDA, and CyanVan, Imunon Micron. N.R. has also received support for meetings/travel from Sanofi and Moderna and honoraria from Virology Education and Krog consulting.
Chris Cotsapas is a current employee of Vesalius Therapeutics. Adeeb Rahman is a current employee of Immunai Inc. Steven Kleinstein is a consultant related to ImmPort data repository for Peraton. Nathan Grabaugh is a consultant for Tempus Labs and the National Basketball Association. Akiko Iwasaki is a consultant for 4BIO, Blue Willow Biologics, Revelar Biotherapeutics, RIGImmune, Xanadu Bio, Paratus Sciences. Monika Kraft receives research funds paid to her institution from NIH, ALA; Sanofi, Astra-Zeneca for work in asthma, serves as a consultant for Astra-Zeneca, Sanofi, Chiesi, GSK for severe asthma; is a co-founder and CMO for RaeSedo, Inc, a company created to develop peptidomimetics for treatment of inflammatory lung disease. Esther Melamed received research funding from Babson Diagnostics and honorarium from Multiple Sclerosis Association of America and has served on the advisory boards of Genentech, Horizon, Teva, and Viela Bio. Carolyn Calfee receives research funding from NIH, FDA, DOD, Roche-Genentech and Quantum Leap Healthcare Collaborative as well as consulting services for Janssen, Vasomune, Gen1e Life Sciences, NGMBio, and Cellenkos. Wade Schulz was an investigator for a research agreement, through Yale University, from the Shenzhen Center for Health Information for work to advance intelligent disease prevention and health promotion; collaborates with the National Center for Cardiovascular Diseases in Beijing; is a technical consultant to Hugo Health, a personal health information platform; cofounder of Refactor Health, an AI-augmented data management platform for health care; and has received grants from Merck and Regeneron Pharmaceutical for research related to COVID-19. Grace A McComsey received research grants from Rehdhill, Cognivue, Pfizer, and Genentech, and served as a research consultant for Gilead, Merck, Viiv/GSK, and Janssen. Linda N. Geng received research funding paid to her institution from Pfizer, Inc. 
Catherine Hough receives support through her institution from the NIH and CDC. David Hafler has received research funding from BristolMyers Squibb, Novartis, Sanofi, and Genentech. He has been a consultant for Bayer Pharmaceuticals, Repertoire Inc, Bristol Myers Squibb, Compass Therapeutics, EMD Serono, Genentech, Novartis Pharmaceuticals, and Sanofi Genzyme.

Figures

Figure 1. Multi-omics data overview and generation of a predictive LC factor.
(A) Number of samples used in the multi-omics data integration strategy by assay (rows) and scheduled time of collection (columns). Shading indicates the frequency of samples with data availability at the indicated visit. (B) Patient classification in Patient Reported Outcome (PRO) clusters according to the PRO measure survey scores. (C) Individual assay data were preprocessed and split into Train and Test cohorts by participant in an 80/20 split, maintaining the proportion of PRO cluster participants in each partition. (D) Preprocessed assay data and LC response outcomes for the Train cohort were used to identify multi-omics predictive factors with SPEAR. Factor scores were then calculated for the Test cohort. (E) The performance of the multi-omics predictive factors to classify patients into presence and absence of LC was assessed on the Train cohort via cross-validation and then validated on the Test cohort. The predictive factor scores were confirmed to be associated with LC after correcting for possible confounding variables. In depth analysis of enriched biological pathways and significant analytes relevant for the prediction was performed. Factor scores were computed for the acute infection immune profiles, and association analysis with LC at these early time points was performed. See also Figure S1.
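The participant-level 80/20 split described in panel C can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the function name `stratified_participant_split`, the cluster labels `cluster_A`/`cluster_B`, and the toy participant IDs are all hypothetical. The point it shows is that whole participants, rather than individual samples, are assigned to a partition, and that each PRO cluster contributes the same fraction to the Test cohort.

```python
import random
from collections import defaultdict


def stratified_participant_split(cluster_by_participant, test_fraction=0.2, seed=0):
    """Split participant IDs into train/test sets, preserving cluster proportions.

    cluster_by_participant: dict mapping participant ID -> cluster label.
    Returns (train_ids, test_ids) as two disjoint sets.
    """
    rng = random.Random(seed)

    # Group participants by their cluster label.
    by_cluster = defaultdict(list)
    for pid, cluster in cluster_by_participant.items():
        by_cluster[cluster].append(pid)

    train_ids, test_ids = set(), set()
    for cluster in sorted(by_cluster):
        pids = sorted(by_cluster[cluster])  # sort first so the shuffle is reproducible
        rng.shuffle(pids)
        n_test = round(len(pids) * test_fraction)
        test_ids.update(pids[:n_test])
        train_ids.update(pids[n_test:])
    return train_ids, test_ids


# Hypothetical toy cohort: 100 participants, 25 in cluster_A and 75 in cluster_B.
clusters = {
    f"P{i:03d}": ("cluster_A" if i % 4 == 0 else "cluster_B") for i in range(100)
}
train_ids, test_ids = stratified_participant_split(clusters)
print(len(train_ids), len(test_ids))
```

Every sample from a given participant would then follow that participant into the Train or Test cohort, which keeps repeated visits from the same person out of both partitions at once.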
Figure 2. Identification of a convalescent multi-omics recovery factor that discriminates long COVID.
(A) Predictive performance of a lasso model trained on the MOFA and SPEAR factors to discriminate LC vs MIN at the event level. The mean AUROC of 10-fold cross-validation on the Train cohort across 100 bootstrapped model training repetitions is shown. Significance was calculated by standard normal approximation of bootstrapped differences between models (t-test, ****adj. p-value ≤ 0.0001). (B) Predictive performance of the SPEAR Physical model to discriminate LC vs MIN on the Test cohort. ROC curve of model (solid line), random classifier (dashed line), and AUROC value are shown. TPR: true positive rate, FPR: false positive rate. (C) Recovery factor scores for the Test cohort of the MIN and LC groups at 3 months (Visit 7), 6 months (Visit 8), 9 months (Visit 9) and 12 months (Visit 10) after hospital discharge. (D) Recovery factor scores of the individual PRO clusters by visit for the Test cohort. P-values in C and D show the significance of the recovery factor score association with MIN vs LC and pairwise PRO cluster combinations, respectively (see methods for association details). See also Figure S2, S3, and S4.
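For readers less familiar with the AUROC reported in panels A and B, the sketch below computes it from its pairwise definition: the probability that a randomly chosen positive case receives a higher classifier score than a randomly chosen negative case, with ties counted as one half. The labels and scores are made-up toy values, and the function name `auroc` is illustrative; this is not the study's evaluation code.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher meaning "more likely positive".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative case")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


# Toy example: 3 positives and 3 negatives with hypothetical scores.
toy_labels = [1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]
toy_auc = auroc(toy_labels, toy_scores)
print(round(toy_auc, 3))  # 8 of the 9 positive/negative pairs are ordered correctly
```

A random classifier, the dashed line in panel B, corresponds to a value of 0.5 under this definition.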
Figure 3. Heme metabolism and androgenic steroid pathways, inflammation-associated serum factors, and altered immune cell composition are associated with the recovery factor during convalescence.
(A) GSEA identifies heme metabolism and androgenic steroid pathways as significantly associated with the recovery factor, with significance shown per assay, as well as across assays (joint adj. p < 0.05). (B) 26 significant analytes (SPEAR Bayesian posterior selection probability ≥ 0.95) in the recovery factor across different assays (left) were identified using SPEAR factor loadings (middle; coefficient in the factor), and each was tested for association in the test cohort with MIN vs LC groups (right; adj. intercept p-value). (C) Geometric means of analytes from the significantly enriched gene and metabolite sets and/or significant SPEAR analytes are shown per sample at each convalescent visit in the test cohort. The p-values indicate significance of the association with MIN vs LC. (D) Association in the full cohort of whole blood cell counts determined by CyTOF with the recovery factor for parent and child immune cell types. NK: Natural Killer cells, Mono: Monocytes, PMN: polymorphonuclear neutrophils, B: B lymphocytes, CD4: CD4+ T lymphocytes, CD8: CD8+ T lymphocytes. For a full list of the child populations see Table S3. (* adj. p-value < 0.05, ** adj. p-value < 0.01, *** adj. p-value < 0.001). See also Figure S5.
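The per-sample geometric mean summarized in panel C can be illustrated with a short sketch. It assumes strictly positive analyte values on a linear scale; the values below are invented for illustration, and the caption does not specify any further preprocessing.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive values, computed as exp(mean(log(x)))."""
    values = list(values)
    if not values:
        raise ValueError("geometric mean of an empty sequence is undefined")
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires strictly positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical abundances of three analytes from one pathway in a single sample.
signature_score = geometric_mean([1.0, 10.0, 100.0])
print(signature_score)
```

Averaging on the log scale keeps a single highly abundant analyte from dominating the pathway-level summary, which is a common reason to prefer the geometric over the arithmetic mean for this kind of data.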
Figure 4. Associations of clinical measurements with recovery factor scores.
(A) Association of recovery factor scores with clinical features (demographics, comorbidities, complications and baseline lab measurements). Dot plot shows the signed adjusted p-values indicating the clinical feature term significance from a linear mixed-effect model with enrollment site and participant as random effects to explain the convalescent phase recovery factor scores. Sex and discretized age were further adjusted as fixed effects for clinical features other than sex and age. Only significant associations (adj. p-value < 0.05) are shown. (B) Associations of recovery factor scores with individual PRO survey scores (PROMIS scale scores, EQ-5D-5L and health score) in the test cohort. Raw and adjusted p-values indicate the PRO score term significance in linear mixed effect models. (C) Associations of recovery factor scores with each indicated symptom group in the test cohort. Numbers are the uncorrected significance (p-values) of the symptom group term in linear mixed effect models. (D) Recovery factor scores per participant in the test cohort, separated into MIN and LC groups by acute phase trajectory groups, stratified by visit. P-values for panels B-D show the endpoint term of a linear mixed effect model with sex, discretized admit age, and trajectory group as fixed effects and enrollment site as random effect. No individual MIN vs. LC comparisons were significant after p-value correction. (* adj. p-value < 0.05, ** adj. p-value < 0.01, *** adj. p-value < 0.001)
Figure 5. Recovery factor scores in acute phase data associate with eventual LC status.
(A) Recovery factor scores during the acute disease phase for participants in the LC and MIN groups within 72h of hospital admission (Visit 1) and at day 4 (Visit 2), day 7 (Visit 3), day 14 (Visit 4), day 21 (Visit 5), and day 28 (Visit 6) after admission. (B) Recovery factor scores during the acute disease phase for participants in individual PRO clusters. (C) Geometric mean of analytes in enriched gene and metabolic sets and/or significant SPEAR analytes during the acute phase. No individual per-visit comparisons were significant after p-value correction. P-values in top-right box in A-C show the significance of the recovery factor score or geometric mean signature association with MIN vs LC or pairwise PRO cluster combinations. Bars above the boxplots show the pairwise significance across groups in a per-visit comparison (** p<0.01, *p<0.05). (D) Recovery factor scores association with whole blood CyTOF immune cell populations during the acute phase (* adj. p-value < 0.05, ** adj. p-value < 0.01, *** adj. p-value < 0.001). See also Figure S8.
Figure 6. Recovery factor scores associate with acute disease phase trajectory groups, but identify LC irrespective of acute severity.
(A) Longitudinal analysis of acute recovery factor scores for the full IMPACC cohort stratified by trajectory group (N=1,148 participants). P-value shows the significance of the trajectory group term in a longitudinal model correcting for age and sex as fixed effects and enrollment site and participant ID as random effects. (B) Recovery factor scores in the acute phase by convalescent MIN/LC label, stratified by acute trajectory group and visit number. P-values show significance in distinguishing MIN vs. LC labels in linear mixed models with sex, discretized admit age, and trajectory group as fixed effects and enrollment site and participant ID as random effects, performed separately for each acute visit and corrected across all visits.
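The caption for panel B states that per-visit p-values were corrected across all visits but does not name the correction method. As an illustration only, the sketch below applies the Benjamini-Hochberg procedure, which is an assumption here and may differ from what the authors used; the six p-values are invented, one per acute visit.

```python
def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order.

    NOTE: the choice of this procedure is an assumption for illustration;
    the source caption does not state which correction was applied.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for Visits 1 through 6.
raw_p = [0.01, 0.04, 0.03, 0.20, 0.005, 0.5]
adj_p = benjamini_hochberg(raw_p)
print([round(p, 3) for p in adj_p])
```

With these toy inputs, only the comparisons whose adjusted value stays below the chosen threshold would be reported as significant after correction.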
