A machine learning model to identify early stage symptoms of SARS-Cov-2 infected patients
- PMID: 32834556
- PMCID: PMC7305929
- DOI: 10.1016/j.eswa.2020.113661
Abstract
The recent outbreak of the respiratory ailment COVID-19, caused by the novel coronavirus SARS-Cov-2, is a severe and urgent global concern. In the absence of effective treatments, the main containment strategy is to reduce contagion by isolating infected individuals; however, isolating unaffected individuals is highly undesirable. To help make rapid decisions on treatment and isolation needs, it would be useful to determine which features presented by suspected infection cases are the best predictors of a positive diagnosis. This can be done by analyzing patient characteristics, case trajectory, comorbidities, symptoms, diagnosis, and outcomes. We developed a model that employs supervised machine learning algorithms to identify the presentation features that predict a COVID-19 diagnosis with high accuracy. Features examined included details of the individuals concerned, e.g., age, gender, observation of fever, and history of travel, as well as clinical details such as the severity of cough and incidence of lung infection. We implemented several machine learning algorithms, applied them to our collected data, and found that the XGBoost algorithm achieved the highest accuracy (>85%) in predicting and selecting features that correctly indicate COVID-19 status for all age groups. Statistical analyses revealed that the most frequent and significant predictive symptoms are fever (41.1%), cough (30.3%), lung infection (13.1%) and runny nose (8.43%). While 54.4% of people examined did not develop any symptoms that could be used for diagnosis, our work indicates that for the remainder, our predictive model could significantly improve the prediction of COVID-19 status, including at early stages of infection.
Keywords: COVID-19; Coronavirus; Early stage symptom; Machine learning; SARS-Cov-2.
© 2020 Elsevier Ltd. All rights reserved.