J Clin Med. 2023 Nov 18;12(22):7164.
doi: 10.3390/jcm12227164.

Multivariable Risk Modelling and Survival Analysis with Machine Learning in SARS-CoV-2 Infection

Andrea Ciarmiello et al. J Clin Med.

Abstract

Aim: To evaluate the performance of a machine learning model based on demographic variables, blood tests, pre-existing comorbidities, and computed tomography (CT)-based radiomic features in predicting critical outcome in patients infected with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

Methods: We retrospectively enrolled 694 SARS-CoV-2-positive patients. Clinical and demographic data were extracted from clinical records. Radiomic data were extracted from CT. Patients were randomized to the training (80%, n = 556) or test (20%, n = 138) dataset. The training set was used to define the association between disease severity and comorbidities, laboratory tests, demographic variables, and CT-based radiomic variables, and to implement a risk-prediction model. The model was evaluated using the C statistic and the Brier score. The test set was used to assess model prediction performance.
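
The two evaluation metrics named above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names and the toy values are hypothetical, the C statistic is computed in the style of Harrell's concordance index, and the Brier score is the simple uncensored form, which assumes each patient's status at the chosen time horizon is known.

```python
def concordance_index(times, events, risk_scores):
    """Harrell-style C statistic: share of comparable patient pairs in which
    the patient who died earlier was assigned the higher risk score."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is a death
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def brier_score(predicted_death_prob, died_by_horizon):
    """Mean squared difference between predicted death probability and the
    observed 0/1 outcome at a fixed horizon (no censoring correction)."""
    pairs = list(zip(predicted_death_prob, died_by_horizon))
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)


# Hypothetical toy data, not taken from the study.
times = [5, 8, 12, 20, 30]          # follow-up in days
events = [1, 1, 0, 1, 0]            # 1 = died, 0 = censored
risk = [0.9, 0.3, 0.4, 0.5, 0.1]    # model risk scores

c_stat = concordance_index(times, events, risk)
brier = brier_score([0.9, 0.8, 0.2, 0.1], [1, 1, 0, 0])
print(c_stat, brier)
```

A C statistic of 0.5 corresponds to random ranking and 1.0 to perfect ranking; a lower Brier score indicates better-calibrated probabilities.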

Results: Patients who died (n = 157) were predominantly male (66%) and over the age of 50, with median [range] C-reactive protein (CRP) = 5 [1, 37] mg/dL, lactate dehydrogenase (LDH) = 494 [141, 3631] U/L, and D-dimer = 6006 [168, 152015] ng/mL. Surviving patients (n = 537) had median [range] CRP = 3 [0, 27] mg/dL, LDH = 484 [78, 3745] U/L, and D-dimer = 1133 [96, 55660] ng/mL. The strongest risk factors were D-dimer, age, and cardiovascular disease. The model built from the variables selected by LASSO Cox regression classified 90% of non-survivors in the test dataset as high-risk. In this sample, the estimated median survival in the high-risk group was 9 days (95% CI: 9–37), while the low-risk group did not reach median survival, that is, its estimated survival remained above 50% (p < 0.001).
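
To make the phrase "did not reach median survival" concrete, the following Python sketch computes a Kaplan-Meier curve and reads off the median for two groups. It is an illustration only: the helper names and the follow-up times are hypothetical and are not the study data, and confidence intervals are omitted.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)  # deaths plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time at which estimated survival falls to 50% or below,
    or None when the curve never gets that low (median not reached)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up in days; event 1 = died, 0 = censored.
high_risk_times, high_risk_events = [3, 6, 9, 9, 15, 40], [1, 1, 1, 1, 0, 1]
low_risk_times, low_risk_events = [10, 25, 50, 50, 50, 50], [1, 0, 0, 0, 0, 0]

print(median_survival(kaplan_meier(high_risk_times, high_risk_events)))  # 9
print(median_survival(kaplan_meier(low_risk_times, low_risk_events)))    # None
```

In the hypothetical low-risk group only one of six patients dies, so the survival estimate never drops to 50% and no median can be reported, which is the situation the abstract describes for the low-risk patients.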

Conclusions: A machine learning model based on combined data available in the first days of hospitalization (demographics, CT radiomics, comorbidities, and blood biomarkers) can identify SARS-CoV-2 patients at risk of serious illness and death.

Keywords: CT; SARS-CoV-2; machine learning; radiomics; survival.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Flowchart for machine learning model development.
Figure 2
Predictors of outcome. (A) Coefficient profile plotted versus log(λ). Each colored line represents the coefficient of one feature. (B) The C-index plotted versus log(λ). The green circle and line mark the λ with minimum cross-validation error. The blue circle and line mark the point with minimum cross-validation error plus one standard deviation. (C) Variables that survived the LASSO regression, including age, D-dimer, LDH, three comorbidities, and four radiomic variables.
Figure 3
Survival curves. Training dataset (A): the survival time of SARS-CoV-2 patients in the high-risk group differed significantly from that of the low-risk group, with a median of 12 days (95% CI: 10–14). The low-risk group did not reach median survival. Test dataset (B): the median survival of the high-risk group was 9 days (95% CI: 6–37), and the low-risk group did not reach median survival.
