Clin Transl Med. 2024 Oct;14(10):e70042.
doi: 10.1002/ctm2.70042.

Prediction of COVID-19 severity using machine learning

Kanita Karaduzovic-Hadziabdic et al. Clin Transl Med. 2024 Oct.
No abstract available

Conflict of interest statement

YD holds patents and licensing agreements related to the use of RNAs for diagnostic and therapeutic purposes (WO2018229046, licensed to Firalis SA, which protects the use of lncRNAs in the FIMICS panel used for RNA‐seq in the present paper; other patents and licenses are not related to the present work). YD is a Scientific Advisory Board member of Firalis SA.

PF is the founder and CEO of Pharmahungary Group, a group of R&D companies.

LB declares having acted as an SAB member for Sanofi, Ionnis, MSD and NovoNordisk, having received speaker fees from Sanofi, Bayer and AB‐Biotics SA, and having founded the spin‐off Ivastatin Therapeutics S.L. (all unrelated to this work).

TP declares having received speaker fees from AB‐Biotics SA and being a co‐founder of the spin‐off Ivastatin Therapeutics SL (all unrelated to this work).

MS received funding from Pfizer Inc. and from Owkin for projects not related to this research.

HF is the founder and owner of Firalis SA, a company commercialising the FIMICS panel. He holds patents and licenses for the use of RNAs as biomarkers and therapeutic targets.

All other authors declare no competing interests.

Figures

FIGURE 1
Study workflow and data available for the analysis. (A) Study workflow. Blood samples were collected from 564 patients with COVID‐19 and stored at −80°C in a central NF S96‐900‐certified biobank at Firalis SA. RNA was then extracted and quality‐checked, and libraries were prepared and analysed by targeted sequencing using the FIMICS panel. The RNA‐seq data were merged with the patients' clinical data and stored in a central database, then curated and made available for machine learning (ML) analysis. (B) Baseline datasets available for analysis from four European cohorts: PrediCOVID from Luxembourg (n = 162), MiRCOVID from Germany (n = 69), COVID19_OMICS‐COVIRNA from Italy (n = 100), and TOCOVID from Spain (n = 233). Patient numbers for each cohort after data curation and preprocessing: PrediCOVID (n = 133), MiRCOVID (n = 65), COVID19_OMICS‐COVIRNA (n = 75), and TOCOVID (n = 195). A total of 463 datasets were available for the analysis.
FIGURE 2
Machine learning workflow using (A) a balanced dataset and (B) an imbalanced dataset.
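The caption does not say how the balanced dataset was built. One common approach, assumed here purely for illustration and not confirmed as the authors' method, is random undersampling of the majority class down to the size of the minority class. The function name, record keys and seed are hypothetical. With the class sizes reported for the merged cohort (362 stable, 101 critical), this would leave 101 patients per class.

```python
import random


def undersample(records: list[dict], label_key: str, seed: int = 0) -> list[dict]:
    """Hypothetical helper: randomly drop records from larger classes
    until every class has as many records as the smallest class."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    by_class: dict[object, list[dict]] = {}
    for rec in records:
        by_class.setdefault(rec[label_key], []).append(rec)
    n_min = min(len(group) for group in by_class.values())
    balanced: list[dict] = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, n_min))  # sampling without replacement
    rng.shuffle(balanced)  # avoid blocks of one class in the output order
    return balanced


# Usage with the merged-cohort class sizes; the record keys are hypothetical.
records = ([{"id": i, "severity": "stable"} for i in range(362)]
           + [{"id": 362 + i, "severity": "critical"} for i in range(101)])
balanced = undersample(records, "severity")
print(len(balanced))  # → 202
```

Undersampling discards data from the larger class; oversampling the minority class or class-weighted training are alternatives, and the imbalanced workflow in panel (B) keeps all records.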
FIGURE 3
Feature selection. (A) Six features were selected as the best predictors of COVID‐19 severity in more than 90 of 100 iterations: age, SEQ0548 (LINC01088‐201), SEQ0817 (FGD5‐AS1), SEQ1056 (LINC01088‐209), SEQ3051 (lncCOVIRNA1), and SEQ1321 (AKAP13‐SI). The line plot shows the top 10 selected features. X‐axis: feature names, where SEQXXXX is the code of the FIMICS panel probe. The SEQ0548 and SEQ1056 probes recognise two different isoforms of the same gene, LINC01088 (LINC01088‐201 and LINC01088‐209, respectively); SEQ0817 recognises FGD5‐AS1, SEQ3051 recognises an unannotated lncRNA (i.e. lncCOVIRNA1), and SEQ1321 recognises AKAP13‐SI. Y‐axis: the number of times a feature was selected in the 100 iterations of the feature selection process. (B) GLMNet and stability selection (SS) were used to cross‐validate the selected features. The selection probability of each predictor is plotted against its regression coefficient (β) in the leave‐one‐out cross‐validated GLMNet model; each point represents a unique predictor. The X‐axis shows the regression coefficients of the predictors, where nonzero values indicate selection by the GLMNet model. The Y‐axis shows the frequentist probability of predictor selection in the SS model. The selection probabilities of the features selected by the Boruta method are: age (.95), LINC01088‐201 (.93), lncCOVIRNA1 (.71), LINC01088‐209 (.47), AKAP13‐SI (.29) and FGD5‐AS1 (.01).
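The panel (A) criterion (a feature is kept when selected in more than 90 of 100 iterations) and the panel (B) selection probabilities both come from tallying how often a feature is chosen across repeated runs. The sketch below illustrates only that tallying step, not the authors' pipeline. It assumes each run's output is available as a set of feature names. The function names and the `noise_feature` entry are hypothetical, and the selector used inside each run (Boruta, SS, etc.) is not reproduced.

```python
from collections import Counter
from typing import Iterable


def selection_frequencies(runs: Iterable[set[str]]) -> tuple[Counter, int]:
    """Count in how many runs each feature was selected; also return the run count."""
    counts: Counter = Counter()
    n_runs = 0
    for selected in runs:
        counts.update(selected)  # sets, so each feature counts at most once per run
        n_runs += 1
    return counts, n_runs


def stable_features(runs: Iterable[set[str]], min_count: int) -> list[str]:
    """Features selected in strictly more than `min_count` runs, most frequent first."""
    counts, _ = selection_frequencies(runs)
    return [feat for feat, c in counts.most_common() if c > min_count]


# Toy runs (hypothetical): "age" always chosen, "SEQ0548" in 95 runs,
# a hypothetical weak predictor in 40 runs.
runs = []
for i in range(100):
    chosen = {"age"}
    if i < 95:
        chosen.add("SEQ0548")
    if i < 40:
        chosen.add("noise_feature")
    runs.append(chosen)

counts, n = selection_frequencies(runs)
proportions = {feat: c / n for feat, c in counts.items()}  # selection probabilities
print(stable_features(runs, min_count=90))  # → ['age', 'SEQ0548']
```

Dividing each count by the number of runs gives a selection probability of the kind shown on the panel (B) Y‐axis; in stability selection proper, each run would fit the model on a random subsample of patients.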
FIGURE 4
Comparison of selected features between stable and critical patients. Box/violin plots of (A) age and the expression of (B) LINC01088‐201, (C) FGD5‐AS1, (D) LINC01088‐209, (E) lncCOVIRNA1, and (F) AKAP13‐SI, comparing the critical group of the merged cohort (n = 101) with the stable group (n = 362). p values are from Student's t test. Boxes span Q1 (25th percentile) to Q3 (75th percentile), with a horizontal line marking the median. Whiskers extend up to 1.5 times the interquartile range (IQR = Q3 − Q1) beyond the box.
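The box-plot and test conventions in this caption can be made concrete with a small sketch. It assumes Tukey's convention, where each whisker stops at the most extreme observation lying within 1.5 × IQR of the box. It also assumes the `inclusive` quartile method of Python's `statistics.quantiles`, because the quartile method used for the figure is not stated, and a pooled-variance two-sample Student's t statistic. The function names and toy values are hypothetical and are not study data.

```python
import math
import statistics


def box_stats(values: list[float]) -> dict[str, float]:
    """Quartiles, median and Tukey-style whisker ends (assumed convention)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences;
    # points beyond them would be drawn as outliers.
    inside = [v for v in values if lo_fence <= v <= hi_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "whisker_low": float(min(inside)), "whisker_high": float(max(inside))}


def student_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Two-sample Student's t statistic (pooled variance) and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se, n1 + n2 - 2


# Hypothetical toy values: 100 lies above the upper fence (13.0).
print(box_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
t, df = student_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(round(t, 3), df)  # → -3.674 4
```

The p value would then come from the t distribution with n1 + n2 − 2 degrees of freedom. For example, `scipy.stats.ttest_ind` with its default `equal_var=True` performs this same Student's test.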

