Machine learning models predict long COVID outcomes based on baseline clinical and immunologic factors

Naresh Doni Jayavelu^#¹, Hady Samaha^#², Sonia Tandon Wimalasena², Annmarie Hoch³, Jeremy P Gygi⁴, Gisela Gabernet⁴, Al Ozonoff³, Shanshan Liu³, Carly E Milliren³, Ofer Levy⁵, Lindsey R Baden⁶, Esther Melamed⁷, Lauren I R Ehrlich⁷, Grace A McComsey⁸, Rafick P Sekaly⁸, Charles B Cairns⁹, Elias K Haddad⁹, Joanna Schaenman¹⁰, Albert C Shaw⁴, David A Hafler⁴, Ruth R Montgomery⁴, David B Corry¹¹, Farrah Kheradmand¹¹, Mark A Atkinson¹², Scott C Brakenridge¹², Nelson I Agudelo Higuit¹³, Jordan P Metcalf¹³, Catherine L Hough¹⁴, William B Messer¹⁴, Bali Pulendran¹⁵, Kari C Nadeau¹⁵, Mark M Davis¹⁵, Linda N Geng¹⁵, Ana Fernandez Sesma¹⁶, Viviana Simon¹⁶, Florian Krammer¹⁶, Monica Kraft¹⁷, Chris Bime¹⁷, Carolyn S Calfee¹⁸, David J Erle¹⁸, Charles R Langelier¹⁸; IMPACC Network; Leying Guan⁴, Holden T Maecker¹⁵, Bjoern Peters¹⁹, Steven H Kleinstein⁴, Elaine F Reed¹⁰, Alison D Augustine²⁰, Joann Diray-Arce³, Patrice M Becker²⁰, Nadine Rouphael², Matthew C Altman²¹


PMID: 41484172
PMCID: PMC12764860
DOI: 10.1038/s43856-025-01230-w

Naresh Doni Jayavelu et al. Commun Med (Lond). 2026 Jan 3;6(1):1. doi: 10.1038/s43856-025-01230-w.

Erratum in

Author Correction: Machine learning models predict long COVID outcomes based on baseline clinical and immunologic factors.
Doni Jayavelu N, Samaha H, Wimalasena ST, Hoch A, Gygi JP, Gabernet G, Ozonoff A, Liu S, Milliren CE, Levy O, Baden LR, Melamed E, Ehrlich LIR, McComsey GA, Sekaly RP, Cairns CB, Haddad EK, Schaenman J, Shaw AC, Hafler DA, Montgomery RR, Corry DB, Kheradmand F, Atkinson MA, Brakenridge SC, Agudelo Higuit NI, Metcalf JP, Hough CL, Messer WB, Pulendran B, Nadeau KC, Davis MM, Geng LN, Fernandez Sesma A, Simon V, Krammer F, Kraft M, Bime C, Calfee CS, Erle DJ, Langelier CR; IMPACC Network; Guan L, Maecker HT, Peters B, Kleinstein SH, Reed EF, Augustine AD, Diray-Arce J, Becker PM, Rouphael N, Altman MC. Commun Med (Lond). 2026 Feb 23;6(1):125. doi: 10.1038/s43856-026-01425-9. PMID: 41730996.

Abstract

Background: The post-acute sequelae of SARS-CoV-2 (PASC), also known as long COVID, remain a significant health issue that is incompletely understood. Predicting which acutely infected individuals will develop long COVID is challenging due to the absence of established biomarkers, clear disease mechanisms, or well-defined sub-phenotypes. Machine learning (ML) models may address this gap by leveraging clinical data to enhance diagnostic precision.

Methods: Clinical data, including antibody titers and viral load measurements collected at the time of hospital admission, are used to predict the likelihood of acute COVID-19 progressing to long COVID. Machine learning models are trained and evaluated for predictive performance. Feature importance analysis is performed to identify the most influential predictors.
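
The abstract does not say which feature importance measure was used. As an illustration only, the sketch below shows permutation importance, one common model-agnostic option: each feature column is shuffled in turn, and the average drop in AUROC (the probability that a randomly chosen case is scored above a randomly chosen non-case) is taken as that feature's importance. The function names, the toy data, and the stand-in `model` are all hypothetical and are not the authors' pipeline.

```python
import random
from typing import Callable, List, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for yv, s in zip(labels, scores) if yv == 1]
    neg = [s for yv, s in zip(labels, scores) if yv == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(
    predict: Callable[[Sequence[float]], float],
    X: List[List[float]],
    y: List[int],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Mean drop in AUROC when one feature column at a time is shuffled."""
    rng = random.Random(seed)
    baseline = auroc(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            drops.append(baseline - auroc(y, [predict(row) for row in X_perm]))
        importances.append(sum(drops) / n_repeats)
    return importances


if __name__ == "__main__":
    # Hypothetical data: feature 0 separates the classes, feature 1 is noise.
    X = [[i / 39, (i * 7 % 40) / 39] for i in range(40)]
    y = [1 if row[0] >= 0.5 else 0 for row in X]

    # Hypothetical fitted model that only uses feature 0.
    def model(row: Sequence[float]) -> float:
        return 2.0 * row[0]

    print(permutation_importance(model, X, y))
```

A feature the model ignores gets an importance of exactly zero, because shuffling it leaves every prediction unchanged; an informative feature gets a positive score.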

Results: The machine learning models achieve median AUROC values ranging from 0.64 to 0.66 and AUPRC values between 0.51 and 0.54, demonstrating moderate predictive capability. Low antibody titers and high viral loads at hospital admission emerge as the strongest predictors of long COVID outcomes. Comorbidities (such as chronic respiratory, cardiac, and neurologic diseases) and female sex are also identified as significant risk factors.
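
The two metrics reported above can be computed from predicted risk scores and observed outcomes. The stdlib-only sketch below is a generic illustration, not the authors' evaluation code: AUROC is computed as the probability that a randomly chosen long COVID case is scored above a randomly chosen non-case (ties count one half), and AUPRC is approximated by average precision. The scores and labels are made up. As a reference point, a non-informative model has an AUROC of 0.5 and an expected AUPRC roughly equal to the fraction of positive cases, so AUPRC values are best read against outcome prevalence.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of the precision at the rank of each positive.

    Examples are ranked by descending score; ties keep input order, which is
    a simplification compared with interpolated or tie-aware variants.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision among the top `rank` examples
    return total / n_pos


if __name__ == "__main__":
    # Hypothetical outcomes (1 = long COVID) and model risk scores.
    y = [1, 0, 1, 1, 0, 0, 1, 0]
    s = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]
    print(f"AUROC = {auroc(y, s):.4f}")  # 11 of 16 case/non-case pairs ranked correctly
    print(f"AUPRC (average precision) = {average_precision(y, s):.4f}")
    print(f"prevalence (no-skill AUPRC baseline) = {sum(y) / len(y):.4f}")
```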

Conclusions: Machine learning models identify patients at risk for developing long COVID based on baseline clinical characteristics. These models could help guide early interventions, improve patient outcomes, and mitigate the long-term public health impacts of SARS-CoV-2.

Plain language summary

Long COVID, or post-acute sequelae of SARS-CoV-2, is a prolonged health condition that can occur after acute COVID-19 infection. However, the ability to predict who will develop long COVID remains limited due to the absence of clear tests or biomarkers. We looked at patients’ medical information, including the amount of virus in their body at hospital admission and how strong their immune response was. Using computer programs that can find hidden patterns in large sets of data, we discovered that people with a weaker immune response, higher amounts of virus, or certain long-term health problems, as well as women, are more likely to develop long COVID. This study highlights that computer-based tools could help doctors identify high-risk patients early and provide care that may prevent long-term complications.


Conflict of interest statement

Competing interests: The authors declare the following competing interests: The Icahn School of Medicine at Mount Sinai has filed patent applications relating to SARS-CoV-2 serological assays and NDV-based SARS-CoV-2 vaccines which list F.K. as co-inventor. Mount Sinai has spun out a company, Kantaro, to market serological tests for SARS-CoV-2. F.K. has consulted for Merck and Pfizer (before 2020), and is currently consulting for Pfizer, Seqirus, 3rd Rock Ventures, Merck and Avimex. The Krammer laboratory is also collaborating with Pfizer on animal models of SARS-CoV-2. V.S. is a co-inventor on a patent filed relating to SARS-CoV-2 serological assays (the “Serology Assays”). O.L. is a named inventor on patents held by Boston Children’s Hospital relating to vaccine adjuvants and human in vitro platforms that model vaccine action. His laboratory has received research support from GlaxoSmithKline (GSK). C.B.C. serves as a consultant to bioMerieux and is funded by a grant from the Bill & Melinda Gates Foundation. J.A.O. is a consultant at Knocean Inc. Jessica Lasky-Su serves as a scientific advisor to Precion Inc. S.R.H., G.M. and K.W. are employees of Metabolon Inc. V.S.M. is a current employee of MyOwnMed. N.R. reports contracts with Lilly, Immorna, Vaccine Company and Sanofi for COVID-19 clinical trials and serves as a consultant to ICON, EMMES, Imunon and CyanVac on safety for COVID-19 clinical trials. A.R. is a current employee of Immunai Inc. Steven Kleinstein is a consultant to Peraton related to the ImmPort data repository. Nathan Grubaugh is a consultant for Tempus Labs and the National Basketball Association. Akiko Iwasaki is a consultant for 4BIO, Blue Willow Biologics, Revelar Biotherapeutics, RIGImmune, Xanadu Bio and Paratus Sciences. M.K. receives research funds paid to her institution from NIH and ALA, and from Sanofi and AstraZeneca for work in asthma; serves as a consultant for AstraZeneca, Sanofi, Chiesi and GSK for severe asthma; and is a co-founder and CMO of RaeSedo, Inc., a company created to develop peptidomimetics for treatment of inflammatory lung disease. E.M. received research funding from Babson Diagnostics and an honorarium from the Multiple Sclerosis Association of America, and has served on advisory boards of Genentech, Horizon, Teva and Viela Bio. C.C. receives research funding from NIH, FDA, DOD, Roche-Genentech and Quantum Leap Healthcare Collaborative, and provides consulting services to Janssen, Vasomune, Gen1e Life Sciences, NGMBio and Cellenkos. Wade Schulz was an investigator for a research agreement, through Yale University, from the Shenzhen Center for Health Information for work to advance intelligent disease prevention and health promotion; collaborates with the National Center for Cardiovascular Diseases in Beijing; is a technical consultant to Hugo Health, a personal health information platform; is a cofounder of Refactor Health, an AI-augmented data management platform for health care; and has received grants from Merck and Regeneron Pharmaceutical for research related to COVID-19. G.A.M. received research grants from RedHill, Cognivue, Pfizer and Genentech, and served as a research consultant for Gilead, Merck, ViiV/GSK and Janssen. L.N.G. received research funding paid to her institution from Pfizer, Inc. E.M. is an Editorial Board Member for Communications Medicine and Guest Editor for the Post COVID-19 condition/Long COVID Collection, but was not involved in the editorial review or peer review, nor in the decision to publish this article. L.N.G. is a Guest Editor for the Post COVID-19 condition/Long COVID Collection, but was not involved in the editorial review or peer review, nor in the decision to publish this article.

Figures

**Fig. 1. Schematic of machine learning model development for predicting long COVID.**

**Fig. 2. Evaluation of the machine learning models’ predictive performance on independent test data for identifying patients at risk of developing a long COVID phenotype.**
Distribution of (a) AUROC and (b) AUPRC values for all the models.

**Fig. 3. Relative importance of features included in the machine learning models predicting a long COVID phenotype.**
a Dot plot showing the scaled importance of all features included in the models predictive of long COVID; circle size indicates relative feature importance. b Forest plot showing univariate model odds ratios for the same features. c Boxplots showing SARS-CoV-2 viral levels (N1 CT) and antibody titers (Spike IgG) measured at hospital admission, stratified by long COVID outcome (Minimal vs Deficit clusters). Horizontal lines show medians, boxes show interquartile ranges (IQR), and whiskers extend up to 1.5 × IQR beyond the box; individual points are also shown. Because lower CT values indicate higher viral loads, the y-axis is reversed.
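
Panel b reports univariate odds ratios. For a single binary feature (for example, sex or the presence of a comorbidity), the odds ratio from a univariate logistic regression equals the cross-product ratio of the 2 × 2 table of feature by outcome, and a Wald 95% confidence interval can be computed on the log scale. The stdlib-only sketch below illustrates that calculation with hypothetical counts; it is not the authors' code, and continuous features such as N1 CT or Spike IgG would instead require fitting a logistic regression.

```python
import math
from typing import Tuple


def odds_ratio_wald_ci(
    a: int, b: int, c: int, d: int, z: float = 1.959964
) -> Tuple[float, float, float]:
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a = feature present, outcome present   b = feature present, outcome absent
    c = feature absent,  outcome present   d = feature absent,  outcome absent
    Returns (odds_ratio, lower, upper). Zero cells are rejected rather than
    corrected, to keep the sketch simple.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive (no continuity correction applied)")
    or_ = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    log_or = math.log(or_)
    return or_, math.exp(log_or - z * se_log_or), math.exp(log_or + z * se_log_or)


if __name__ == "__main__":
    # Hypothetical counts: feature present/absent by long COVID yes/no.
    or_, lo, hi = odds_ratio_wald_ci(a=40, b=30, c=35, d=45)
    print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

An interval that excludes 1 corresponds to the kind of association a forest plot displays as not crossing the vertical reference line.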


Update of

Machine learning models predict long COVID outcomes based on baseline clinical and immunologic factors.
Jayavelu ND, Samaha H, Wimalasena ST, Hoch A, Gygi JP, Gabernet G, Ozonoff A, Liu S, Milliren CE, Levy O, Baden LR, Melamed E, Ehrlich LIR, McComsey GA, Sekaly RP, Cairns CB, Haddad EK, Schaenman J, Shaw AC, Hafler DA, Montgomery RR, Corry DB, Kheradmand F, Atkinson MA, Brakenridge SC, Higuita NIA, Metcalf JP, Hough CL, Messer WB, Pulendran B, Nadeau KC, Davis MM, Geng LN, Sesma AF, Simon V, Krammer F, Kraft M, Bime C, Calfee CS, Erle DJ, Langelier CR; IMPACC Network; Guan L, Maecker HT, Peters B, Kleinstein SH, Reed EF, Diray-Arce J, Rouphael N, Altman MC. medRxiv [Preprint]. 2025 Feb 13:2025.02.12.25322164. doi: 10.1101/2025.02.12.25322164. Update in: Commun Med (Lond). 2026 Jan 3;6(1):1. doi: 10.1038/s43856-025-01230-w. PMID: 39990570.
