Expert Syst Appl. 2020 Dec 1:160:113661.
doi: 10.1016/j.eswa.2020.113661. Epub 2020 Jun 20.

A machine learning model to identify early stage symptoms of SARS-Cov-2 infected patients

Md Martuza Ahamad et al. Expert Syst Appl.

Abstract

The recent outbreak of the respiratory ailment COVID-19, caused by the novel coronavirus SARS-Cov-2, is a severe and urgent global concern. In the absence of effective treatments, the main containment strategy is to reduce contagion by isolating infected individuals; however, isolating unaffected individuals is highly undesirable. To help make rapid decisions on treatment and isolation needs, it would be useful to determine which features presented by suspected infection cases are the best predictors of a positive diagnosis. This can be done by analyzing patient characteristics, case trajectory, comorbidities, symptoms, diagnosis, and outcomes. We developed a model that employs supervised machine learning algorithms to identify the presentation features that predict COVID-19 diagnoses with high accuracy. The features examined included details of the individuals concerned (e.g., age, gender, observation of fever, and history of travel) and clinical details such as the severity of cough and the incidence of lung infection. We implemented several machine learning algorithms, applied them to our collected data, and found that the XGBoost algorithm achieved the highest accuracy (>85%) in predicting COVID-19 status and selecting the features that correctly indicate it, for all age groups. Statistical analyses revealed that the most frequent and significant predictive symptoms are fever (41.1%), cough (30.3%), lung infection (13.1%) and runny nose (8.43%). While 54.4% of the people examined did not develop any symptoms that could be used for diagnosis, our work indicates that, for the remainder, our predictive model could significantly improve the prediction of COVID-19 status, including at early stages of infection.

Keywords: COVID-19; Coronavirus; Early stage symptom; Machine learning; SARS-Cov-2.
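
The idea of ranking presentation features with a boosted tree model can be illustrated with a small, self-contained sketch. It is not the authors' pipeline and it is not XGBoost: it is plain least-squares gradient boosting with one-split trees (stumps) on binary features, written in pure Python. The feature names, the eight patient records, and the helper functions fit_stump, fit_boosted_stumps and predict_score are all hypothetical and were invented for this example. Feature importance is assumed here to mean the total reduction in squared error credited to each feature.

```python
# Toy least-squares gradient boosting with stumps on binary features.
# All records are invented for illustration; they are NOT the study's data.

FEATURES = ["fever", "cough", "travel_history"]  # hypothetical feature names

# Each row is one made-up patient: [fever, cough, travel_history]
X = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 0, 0],
     [0, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 0]]
y = [1, 1, 1, 1, 0, 0, 1, 0]  # 1 = positive diagnosis (made up)


def fit_stump(X, residuals):
    """Return (gain, feature_index, mean_if_0, mean_if_1) for the best split."""
    mean_r = sum(residuals) / len(residuals)
    total_sse = sum((r - mean_r) ** 2 for r in residuals)
    best = None
    for j in range(len(X[0])):
        zeros = [r for x, r in zip(X, residuals) if x[j] == 0]
        ones = [r for x, r in zip(X, residuals) if x[j] == 1]
        if not zeros or not ones:
            continue  # constant feature, nothing to split on
        m0 = sum(zeros) / len(zeros)
        m1 = sum(ones) / len(ones)
        sse = (sum((r - m0) ** 2 for r in zeros)
               + sum((r - m1) ** 2 for r in ones))
        gain = total_sse - sse  # squared-error reduction from this split
        if best is None or gain > best[0]:
            best = (gain, j, m0, m1)
    return best


def fit_boosted_stumps(X, y, rounds=20, learning_rate=0.5):
    """Fit stumps one after another, each on the current residuals."""
    base = sum(y) / len(y)
    preds = [base] * len(y)
    stumps = []
    importance = [0.0] * len(X[0])
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(X, residuals)
        if stump is None:
            break
        gain, j, m0, m1 = stump
        importance[j] += gain  # credit the gain to the chosen feature
        stumps.append((j, m0, m1))
        preds = [p + learning_rate * (m1 if x[j] else m0)
                 for p, x in zip(preds, X)]
    return {"base": base, "stumps": stumps,
            "importance": importance, "learning_rate": learning_rate}


def predict_score(model, x):
    """Additive score; values above 0.5 are read as a positive prediction."""
    score = model["base"]
    for j, m0, m1 in model["stumps"]:
        score += model["learning_rate"] * (m1 if x[j] else m0)
    return score


model = fit_boosted_stumps(X, y)
ranking = sorted(zip(FEATURES, model["importance"]),
                 key=lambda pair: pair[1], reverse=True)
for name, gain in ranking:
    print(f"{name}: total gain {gain:.3f}")
```

In this toy data the label mostly follows fever, so fever collects most of the gain. A real analysis would use a library implementation, held-out evaluation, and the actual clinical features.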


Figures

Fig. 1. Proposed methodology.
Fig. 2. Impact of age for COVID-19 outbreak.
Fig. 3. Illustration of symptoms frequency.
Fig. 4. Feature importance for COVID-19 patients.
