Classification Models for COVID-19 Test Prioritization in Brazil: Machine Learning Approach

J Med Internet Res. 2021 Apr 8;23(4):e27293. doi: 10.2196/27293.

Íris Viana Dos Santos Santana¹, Andressa Cm da Silveira², Álvaro Sobrinho¹ ³, Lenardo Chaves E Silva⁴, Leandro Dias da Silva³, Danilo F S Santos², Edmar C Gurjão², Angelo Perkusich²

Affiliations

¹ Federal University of the Agreste of Pernambuco, Garanhuns, Brazil.
² Federal University of Campina Grande, Campina Grande, Brazil.
³ Federal University of Alagoas, Maceió, Brazil.
⁴ Federal Rural University of the Semi-Arid, Pau dos Ferros, Brazil.

PMID: 33750734
PMCID: PMC8034680
DOI: 10.2196/27293


Abstract

Background: Controlling the COVID-19 outbreak in Brazil is a challenge due to the population's size and urban density, inefficient maintenance of social distancing and testing strategies, and limited availability of testing resources.

Objective: This study aims to prioritize symptomatic patients for testing in order to assist early COVID-19 detection in Brazil, addressing problems related to inefficient testing and control strategies.

Methods: Raw data from 55,676 Brazilians were preprocessed, and the chi-square test was used to confirm the relevance of the following features: gender, health professional, fever, sore throat, dyspnea, olfactory disorders, cough, coryza, taste disorders, and headache. Classification models were built from the preprocessed data sets using supervised learning with the following algorithms: multilayer perceptron (MLP), gradient boosting machine (GBM), decision tree (DT), random forest (RF), extreme gradient boosting (XGBoost), k-nearest neighbors (KNN), support vector machine (SVM), and logistic regression (LR). The models' performances were analyzed using 10-fold cross-validation, classification metrics, and the Friedman and Nemenyi statistical tests. The permutation feature importance method was applied to rank the features used by the highest-performing classification models.
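As an illustration of the feature relevance check mentioned in the Methods, the following is a minimal sketch of a Pearson chi-square test of independence between one binary feature (for example, fever present or absent) and the test result. The function name `chi_square_2x2` and the counts are hypothetical and are not taken from the study; the study's own implementation is not described in this abstract.

```python
from math import erfc, sqrt


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 contingency table.

    table[i][j] is the number of patients with feature value i
    (0 = absent, 1 = present) and test result j (0 = negative, 1 = positive).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of feature and test result.
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected

    # For 1 degree of freedom, P(chi2 > x) equals erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts, for illustration only (not data from the study).
toy_table = [[30, 10],   # feature absent: 30 negative, 10 positive
             [10, 30]]   # feature present: 10 negative, 30 positive
statistic, p_value = chi_square_2x2(toy_table)
print(f"chi-square = {statistic:.2f}, p = {p_value:.2e}")
```

A small p-value suggests that the feature and the test result are not independent, which is the sense in which a feature would be considered relevant here. No continuity correction is applied in this sketch.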

Results: Gender, fever, and dyspnea were among the highest-ranked features used by the classification models. In the comparative analysis, MLP, GBM, DT, RF, XGBoost, and SVM were the highest-performing models, with similar results, whereas KNN and LR were outperformed by the other algorithms. Using ease of interpretation as an additional comparison criterion, the DT was considered the most suitable model.
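The feature ranking reported in the Results relies on permutation feature importance. The idea can be sketched as follows: shuffle one feature column at a time and measure how much the model's accuracy drops. The toy model, the data, and the helper names (`accuracy`, `permutation_importance`, `toy_model`) are assumptions made for illustration and do not reproduce the study's models or data.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows for which the model's prediction matches the label."""
    correct = sum(1 for row, label in zip(X, y) if model(row) == label)
    return correct / len(y)


def permutation_importance(model, X, y, n_repeats=10, seed=0):
    """Mean drop in accuracy when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the data with only this column permuted.
            X_perm = [row[:col] + [v] + row[col + 1:]
                      for row, v in zip(X, shuffled)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


def toy_model(row):
    """Hypothetical classifier: predicts positive when feature 0 is present."""
    return row[0]


# Hypothetical binary features (column 0 drives the label, column 1 is noise).
X = [[1, 0], [1, 1], [0, 0], [0, 1], [1, 0], [0, 1], [1, 1], [0, 0]]
y = [1, 1, 0, 0, 1, 0, 1, 0]

importances = permutation_importance(toy_model, X, y)
print(importances)
```

In this toy setting, shuffling the ignored column leaves accuracy unchanged, so its importance is zero, while shuffling the column the model depends on lowers accuracy.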

Conclusions: The DT classification model can effectively (with a mean accuracy ≥89.12%) assist COVID-19 test prioritization in Brazil. The model can be applied to recommend prioritizing a symptomatic patient for COVID-19 testing.
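A mean accuracy of this kind is typically an average over cross-validation folds. The sketch below shows how a 10-fold mean accuracy can be computed in plain Python. The majority-class baseline `fit_majority`, the helper names, and the toy data are assumptions for illustration; they stand in for the study's classifiers and data, which are not reproduced here.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_val_accuracy(fit, X, y, k=10):
    """Train on k-1 folds, test on the held-out fold, and average accuracy."""
    scores = []
    for test_idx in k_fold_indices(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(1 for i in test_idx if model(X[i]) == y[i])
        scores.append(correct / len(test_idx))
    return sum(scores) / k, scores


def fit_majority(X_train, y_train):
    """Hypothetical baseline: always predict the majority training label."""
    majority = 1 if sum(y_train) * 2 >= len(y_train) else 0
    return lambda row: majority


# Toy data for illustration only; real data would usually be shuffled
# (and often stratified by class) before being split into folds.
X = [[i] for i in range(20)]
y = [1, 1, 1, 0] * 5

mean_accuracy, fold_scores = cross_val_accuracy(fit_majority, X, y, k=10)
print(f"mean accuracy over 10 folds: {mean_accuracy:.2f}")
```

Any training routine that returns a callable classifier can be passed in place of `fit_majority`, which makes it straightforward to compare several algorithms on the same folds.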

Keywords: COVID-19; classification models; medical diagnosis; test prioritization.

©Íris Viana dos Santos Santana, Andressa CM da Silveira, Álvaro Sobrinho, Lenardo Chaves e Silva, Leandro Dias da Silva, Danilo F S Santos, Edmar C Gurjão, Angelo Perkusich. Originally published in the Journal of Medical Internet Research (http://www.jmir.org), 08.04.2021.


Conflict of interest statement

Conflicts of Interest: None declared.

Figures

**Figure 1**
Overview of the research methodology applied for the study. The methodological steps consist of data preprocessing, the definition of new data sets, English translation, feature selection, 10-fold cross-validation, statistical comparisons, and feature ranking. AUPR: area under the precision-recall curve; AUROC: area under the receiver operating characteristic curve; DT: decision tree; GBM: gradient boosting machine; KNN: k-nearest neighbors; LR: logistic regression (weak regularization); LRR: logistic regression (strong regularization); MLP: multilayer perceptron; RF: random forest; RT-PCR: reverse transcription polymerase chain reaction; SVM: support vector machine; XGBoost: extreme gradient boosting.

**Figure 2**
Correlation matrix for (A) RT-PCR unbalanced data set, (B) RT-PCR balanced data set, (C) rapid unbalanced data set, (D) rapid balanced data set, (E) both unbalanced data set, and (F) both balanced data set. RT-PCR: reverse transcription polymerase chain reaction.

**Figure 3**
(A) The frequency of symptoms for the 20,021 symptomatic patients in the "both" unbalanced data set and the number of CCs. Top values are frequencies; numbers on the geometric forms are the CC for frequency. (B) The frequency of symptoms for the 3128 symptomatic patients in the "both" balanced data set and the number of CCs.

**Figure 4**
The models' ROC curves for the (A) RT-PCR unbalanced, (B) RT-PCR balanced, (C) rapid unbalanced, (D) rapid balanced, (E) both unbalanced, and (F) both balanced data sets. AUC: area under the receiver operating characteristic curve; GBM: gradient boosting machine; KNN: k-nearest neighbors; LR: logistic regression (weak regularization); LRR: logistic regression (strong regularization); MLP: multilayer perceptron; ROC: receiver operating characteristic; RT-PCR: reverse transcription polymerase chain reaction; SVM: support vector machine; XGBoost: extreme gradient boosting.

**Figure 5**
The models' precision-recall curves for the (A) RT-PCR unbalanced data set, (B) rapid unbalanced data set, and (C) both unbalanced data set. AP: average precision; GBM: gradient boosting machine; KNN: k-nearest neighbors; LR: logistic regression (weak regularization); LRR: logistic regression (strong regularization); MLP: multilayer perceptron; PR: precision-recall; RT-PCR: reverse transcription polymerase chain reaction; SVM: support vector machine; XGBoost: extreme gradient boosting.

**Figure 6**
(A) The mean recall for the MLP, GBM, RF, DT, XGBoost, KNN, SVM, LRR, and LR classification models using the unbalanced data sets for RT-PCR, rapid, and both types. (B) The mean recall for the MLP, GBM, RF, DT, XGBoost, KNN, SVM, LRR, and LR classification models using the balanced data sets for RT-PCR, rapid, and both types. DT: decision tree; GBM: gradient boosting machine; KNN: k-nearest neighbors; LR: logistic regression (weak regularization); LRR: logistic regression (strong regularization); MLP: multilayer perceptron; RF: random forest; RT-PCR: reverse transcription polymerase chain reaction; SVM: support vector machine; XGBoost: extreme gradient boosting.

**Figure 7**
An application scenario connecting the decision tree classification model with a clinical workflow. The model guides test prioritization for symptomatic patients suspected of having COVID-19. RT-PCR: reverse transcription polymerase chain reaction.


Cited by

Yang TY, Chien TW, Lai FJ. Web-Based Skin Cancer Assessment and Classification Using Machine Learning and Mobile Computerized Adaptive Testing in a Rasch Model: Development Study. JMIR Med Inform. 2022 Mar 9;10(3):e33006. doi: 10.2196/33006. PMID: 35262505.
Monday HN, Li J, Nneji GU, Nahar S, Hossin MA, Jackson J, Ejiyi CJ. COVID-19 Diagnosis from Chest X-ray Images Using a Robust Multi-Resolution Analysis Siamese Neural Network with Super-Resolution Convolutional Neural Network. Diagnostics (Basel). 2022 Mar 18;12(3):741. doi: 10.3390/diagnostics12030741. PMID: 35328294.
Gürsoy E, Kaya Y. An overview of deep learning techniques for COVID-19 detection: methods, challenges, and future works. Multimed Syst. 2023;29(3):1603-1627. doi: 10.1007/s00530-023-01083-0. Epub 2023 Mar 25. PMID: 37261262.
Struyf T, Deeks JJ, Dinnes J, Takwoingi Y, Davenport C, Leeflang MM, Spijker R, Hooft L, Emperador D, Domen J, Tans A, Janssens S, Wickramasinghe D, Lannoy V, Horn SRA, Van den Bruel A; Cochrane COVID-19 Diagnostic Test Accuracy Group. Signs and symptoms to determine if a patient presenting in primary care or hospital outpatient settings has COVID-19. Cochrane Database Syst Rev. 2022 May 20;5(5):CD013665. doi: 10.1002/14651858.CD013665.pub3. PMID: 35593186.
Chen X, Chen Q, Liu Y, Qiu Y, Lv L, Zhang Z, Yin X, Shu F. Radiomics models to predict bone marrow metastasis of neuroblastoma using CT. Cancer Innov. 2024 Jun 28;3(5):e135. doi: 10.1002/cai2.135. eCollection 2024 Oct. PMID: 38948899.

