[Preprint]. 2025 Feb 13:2025.02.12.25322164.
doi: 10.1101/2025.02.12.25322164.

Machine learning models predict long COVID outcomes based on baseline clinical and immunologic factors

Naresh Doni Jayavelu et al. medRxiv.

Abstract

The post-acute sequelae of SARS-CoV-2 (PASC), also known as long COVID, remain a significant health issue that is incompletely understood. Predicting which acutely infected individuals will go on to develop long COVID is challenging due to the lack of established biomarkers, clear disease mechanisms, or well-defined sub-phenotypes. Machine learning (ML) models offer the potential to address this by leveraging clinical data to enhance diagnostic precision. We utilized clinical data, including antibody titers and viral load measurements collected at the time of hospital admission, to predict the likelihood of acute COVID-19 progressing to long COVID. Our machine learning models achieved median AUROC values ranging from 0.64 to 0.66 and AUPRC values between 0.51 and 0.54, demonstrating their predictive capabilities. Feature importance analysis revealed that low antibody titers and high viral loads at hospital admission were the strongest predictors of long COVID outcomes. Comorbidities, including chronic respiratory, cardiac, and neurologic diseases, as well as female sex, were also identified as significant risk factors for long COVID. Our findings suggest that ML models have the potential to identify patients at risk for developing long COVID based on baseline clinical characteristics. These models can help guide early interventions, improving patient outcomes and mitigating the long-term public health impacts of SARS-CoV-2.
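As a rough illustration of the two metrics reported in the abstract, the sketch below computes AUROC and AUPRC from binary labels and predicted scores, then takes the median over several test splits. It is not the authors' pipeline: the function names, the toy labels and scores, and the use of average precision as the AUPRC estimate are all assumptions made for this example.

```python
from statistics import median


def auroc(y_true, scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Simplification: tied scores are ranked in input order.
    """
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos


def median_metrics(splits):
    """Median AUROC and AUPRC over a list of (y_true, scores) test splits."""
    aurocs = [auroc(y, s) for y, s in splits]
    auprcs = [average_precision(y, s) for y, s in splits]
    return median(aurocs), median(auprcs)


# Toy data (hypothetical, not from the study): 1 = developed long COVID.
toy_labels = [1, 0, 1, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
print(median_metrics([(toy_labels, toy_scores)]))
```

When reading such numbers, note that a scorer no better than chance gives an AUROC of about 0.5 and an AUPRC of about the fraction of positive cases, so AUPRC values are best interpreted against the cohort's long COVID prevalence, which the abstract does not state.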

Keywords: COVID-19; Long COVID; PASC; Patient Reported Outcomes; SARS-CoV-2; modeling.

Conflict of interest statement

IMPACC Network Competing Interests: The Icahn School of Medicine at Mount Sinai has filed patent applications relating to SARS-CoV-2 serological assays and NDV-based SARS-CoV-2 vaccines which list Florian Krammer as co-inventor. Mount Sinai has spun out a company, Kantaro, to market serological tests for SARS-CoV-2. Florian Krammer has consulted for Merck and Pfizer (before 2020), and is currently consulting for Pfizer, Seqirus, 3rd Rock Ventures, Merck and Avimex. The Krammer laboratory is also collaborating with Pfizer on animal models of SARS-CoV-2. Viviana Simon is a co-inventor on a patent filed relating to SARS-CoV-2 serological assays (the “Serology Assays”). Ofer Levy is a named inventor on patents held by Boston Children’s Hospital relating to vaccine adjuvants and human in vitro platforms that model vaccine action. His laboratory has received research support from GlaxoSmithKline (GSK). Charles Cairns serves as a consultant to bioMerieux and is funded by a grant from the Bill & Melinda Gates Foundation. James A Overton is a consultant at Knocean Inc. Jessica Lasky-Su serves as a scientific advisor to Precion Inc. Scott R. Hutton, Greg Michelloti and Kari Wong are employees of Metabolon Inc. Vicki Seyfer-Margolis is a current employee of MyOwnMed. Nadine Rouphael reports contracts with Lilly, Immorna, Vaccine Company and Sanofi for COVID-19 clinical trials and serves as a consultant for ICON, EMMES, Imunon, and CyanVac on safety for COVID-19 clinical trials. Adeeb Rahman is a current employee of Immunai Inc. Steven Kleinstein is a consultant related to ImmPort data repository for Peraton. Nathan Grabaugh is a consultant for Tempus Labs and the National Basketball Association. Akiko Iwasaki is a consultant for 4BIO, Blue Willow Biologics, Revelar Biotherapeutics, RIGImmune, Xanadu Bio, and Paratus Sciences.
Monika Kraft receives research funds paid to her institution from NIH, ALA, Sanofi, and Astra-Zeneca for work in asthma; serves as a consultant for Astra-Zeneca, Sanofi, Chiesi, and GSK for severe asthma; and is a co-founder and CMO of RaeSedo, Inc., a company created to develop peptidomimetics for treatment of inflammatory lung disease. Esther Melamed received research funding from Babson Diagnostics and an honorarium from the Multiple Sclerosis Association of America, and has served on advisory boards of Genentech, Horizon, Teva and Viela Bio. Carolyn Calfee receives research funding from NIH, FDA, DOD, Roche-Genentech and Quantum Leap Healthcare Collaborative and provides consulting services for Janssen, Vasomune, Gen1e Life Sciences, NGMBio, and Cellenkos. Wade Schulz was an investigator for a research agreement, through Yale University, from the Shenzhen Center for Health Information for work to advance intelligent disease prevention and health promotion; collaborates with the National Center for Cardiovascular Diseases in Beijing; is a technical consultant to Hugo Health, a personal health information platform; is a cofounder of Refactor Health, an AI-augmented data management platform for health care; and has received grants from Merck and Regeneron Pharmaceutical for research related to COVID-19. Grace A McComsey received research grants from RedHill, Cognivue, Pfizer, and Genentech, and served as a research consultant for Gilead, Merck, Viiv/GSK, and Janssen. Linda N. Geng received research funding paid to her institution from Pfizer, Inc.

Figures

Figure 1:
Schematic of machine learning model development for predicting long COVID.
Figure 2:
Evaluation of the machine learning models’ predictive performance on independent test data for identifying patients at risk of developing a long COVID phenotype. Distribution of (A) AUROC and (B) AUPRC values for all the models.
Figure 3:
Relative importance of features included in the machine learning models predicting a long COVID phenotype. (A) Dot plot showing scaled importance of all features included in the models predictive of long COVID. The size of the circle shows the relative importance of features. (B) Forest plot showing the univariate model odds ratios for the same features.
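For readers unfamiliar with the quantity plotted in panel B, the sketch below shows how a univariate odds ratio and a 95% Wald confidence interval can be computed for a single binary feature from a 2x2 table. For one binary predictor, this cross-product ratio matches the odds ratio from an unadjusted logistic regression. The function name and the counts are hypothetical and are not taken from the study; how the authors fitted their univariate models is not described on this page.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: feature present, outcome present
    b: feature present, outcome absent
    c: feature absent,  outcome present
    d: feature absent,  outcome absent
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this formula")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only.
print(odds_ratio_ci(20, 10, 10, 20))
```

In a forest plot, each feature is drawn as its odds ratio with this interval on a log scale; an interval that excludes 1 indicates an association at roughly the 5% level in the unadjusted analysis.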
