A machine learning study of COVID-19 serology and molecular tests and predictions

Smart Health (Amst). 2022 Dec;26:100331. doi: 10.1016/j.smhl.2022.100331. Epub 2022 Oct 20.

Authors

Magdalyn E Elkin¹, Xingquan Zhu¹

Affiliation

¹ Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL 33431, USA.

PMID: 36281350
PMCID: PMC9583626
DOI: 10.1016/j.smhl.2022.100331

Abstract

Serology and molecular tests are the two most commonly used methods for rapid COVID-19 infection testing. The two types of tests detect infection through different mechanisms: molecular tests measure the presence of viral SARS-CoV-2 RNA, and serology tests detect antibodies triggered by the SARS-CoV-2 virus. A handful of studies have shown that symptoms, combined with demographic and/or diagnosis features, can help predict COVID-19 test outcomes. However, because of the nature of the tests, serology and molecular tests differ significantly. No existing study has examined the correlation between serology and molecular tests, or which symptoms are the key factors indicating a positive COVID-19 test. In this study, we propose a machine learning based approach to study serology and molecular tests, and use donor features to predict test outcomes. A total of 2,467 donors, each tested using one or more types of COVID-19 test, were collected as our testbed. By cross-checking test types and results, we study the correlation between serology and molecular tests. For test outcome prediction, we label the 2,467 donors as positive or negative using their serology or molecular test results, and create symptom features to represent each donor for learning. Because COVID-19 produces a wide range of symptoms and the data collection process is inherently error-prone, we group similar symptoms into bins, which reduces the size and sparsity of the feature space. Using binned symptoms combined with demographic features, we train five classification algorithms to predict COVID-19 test results. Experiments show that XGBoost achieves the best performance, with 76.85% accuracy and an AUC of 81.4%, demonstrating that symptoms are indeed helpful for predicting COVID-19 test outcomes.
Our study investigates the relationship between serology and molecular tests, identifies meaningful symptom features associated with COVID-19 infection, and offers a route to rapid screening and cost-effective detection of COVID-19 infection.

Keywords: 68T05; 68T50; 92C50; 92C55; 92C60; COVID-19; Classification; Machine Learning; Molecular test; Serology test; Symptoms.
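
The symptom-binning and labeling steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the symptom strings, the bin assignments in the hypothetical `SYMPTOM_BINS` table, and the rule in the hypothetical `label_donor` (a donor is positive if any serology or molecular result is positive) are all assumptions made for the example.

```python
# Minimal sketch of symptom binning and donor labeling.
# SYMPTOM_BINS and label_donor's rule are hypothetical, not taken from the paper.

# Hypothetical mapping from raw (lower-cased) symptom strings to coarser bins.
SYMPTOM_BINS: dict[str, str] = {
    "fever": "fever",
    "chills": "fever",
    "cough": "respiratory",
    "dry cough": "respiratory",
    "shortness of breath": "respiratory",
    "loss of smell": "sensory",
    "loss of taste": "sensory",
    "nausea": "gastrointestinal",
    "diarrhea": "gastrointestinal",
}
BIN_NAMES: list[str] = sorted(set(SYMPTOM_BINS.values()))


def bin_symptoms(reported: list[str]) -> dict[str, int]:
    """Turn a donor's free-text symptoms into one binary indicator per bin.

    Symptoms missing from SYMPTOM_BINS are ignored in this sketch.
    """
    features = {name: 0 for name in BIN_NAMES}
    for symptom in reported:
        bin_name = SYMPTOM_BINS.get(symptom.strip().lower())
        if bin_name is not None:
            features[bin_name] = 1
    return features


def label_donor(results: dict[str, str]) -> int:
    """Hypothetical rule: 1 if any serology/molecular result is 'positive', else 0."""
    return int(any(r.strip().lower() == "positive" for r in results.values()))


if __name__ == "__main__":
    print(bin_symptoms(["Dry cough", "Chills", "headache"]))
    # -> {'fever': 1, 'gastrointestinal': 0, 'respiratory': 1, 'sensory': 0}
    print(label_donor({"IgG": "Negative", "molecular": "Positive"}))
    # -> 1
```

Here nine raw symptom strings collapse into four binary features, which is the kind of feature-space and sparsity reduction the abstract describes. An "other" bin could be added instead of silently dropping unmapped symptoms such as "headache".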


Conflict of interest statement

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

**Fig. 1**
A conceptual view of a random forest and its prediction mechanism. The forest contains 200 trees; each tree is built from a randomly selected subset of features (each node color/intensity denotes a unique feature). For each test input, the predictions of all trees are combined to generate the final prediction.
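
The voting mechanism in Fig. 1 can be sketched in plain Python. To stay short, this is a simplified random forest and not the authors' model: each tree is a depth-one decision stump over binary features, trained on a bootstrap sample and a random feature subset. As in the caption, the subset is drawn per tree; standard random forests redraw it at every split, which is equivalent here only because a stump has a single split. The function names and `n_features=2` are illustrative assumptions; only the 200-tree count comes from the caption. In practice a library implementation such as scikit-learn's `RandomForestClassifier` would be used.

```python
import random
from collections import Counter

# A "stump" is (feature_index, {feature_value: predicted_label}).
Stump = tuple[int, dict[int, int]]


def majority(labels: list[int], default: int) -> int:
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X: list[list[int]], y: list[int], features: list[int]) -> Stump:
    """Pick, among `features`, the binary feature whose split misclassifies the fewest samples."""
    default = majority(y, 0)
    best = None
    for f in features:
        # Each leaf predicts the majority label of the samples that reach it.
        leaf = {v: majority([yi for xi, yi in zip(X, y) if xi[f] == v], default) for v in (0, 1)}
        errors = sum(leaf[xi[f]] != yi for xi, yi in zip(X, y))
        if best is None or errors < best[0]:
            best = (errors, f, leaf)
    return best[1], best[2]


def fit_forest(X: list[list[int]], y: list[int], n_trees: int = 200,
               n_features: int = 2, seed: int = 0) -> list[Stump]:
    """Train n_trees stumps, each on a bootstrap sample and a random feature subset."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]   # bootstrap sample (with replacement)
        feats = rng.sample(range(d), n_features)      # random feature subset for this tree
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest


def predict(forest: list[Stump], x: list[int]) -> int:
    """Combine the individual trees' predictions by majority vote."""
    votes = Counter(leaf[x[f]] for f, leaf in forest)
    return votes.most_common(1)[0][0]
```

Trees that happen to see an uninformative feature subset vote close to randomly, but averaging over many trees lets the informative ones dominate. This is the intuition behind combining all tree predictions.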

**Fig. 2**
Venn diagrams showing (a) samples that received one or more of IgG, IgM, IgA or molecular testing; (b) samples that tested positive on IgG, IgM, IgA or molecular testing. Venn diagrams are constructed using Venny (Oliveros, 2021).

**Fig. 3**
Kernel density estimation (KDE) plots of days post symptom onset (PSO) for (a) samples with molecular testing and (b) samples with serology testing.
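
A KDE curve such as those in Fig. 3 places a smooth kernel on every observation and averages them. The sketch below assumes a Gaussian kernel and Silverman's normal-reference bandwidth, h = 1.06 σ n^(-1/5). The authors' plotting tool and bandwidth choice are not stated, and the `days` values in the usage are hypothetical.

```python
import math
import statistics
from typing import Callable, Optional


def gaussian_kde(samples: list[float],
                 bandwidth: Optional[float] = None) -> Callable[[float], float]:
    """Return a 1-D Gaussian kernel density estimate f(x) built from `samples`."""
    n = len(samples)
    if bandwidth is None:
        # Silverman's normal-reference rule of thumb: h = 1.06 * sigma * n^(-1/5)
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x: float) -> float:
        # Average of Gaussian bumps centred on each sample
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


if __name__ == "__main__":
    days = [3, 5, 7, 10, 14, 14, 21, 28]  # hypothetical days-PSO values
    f = gaussian_kde(days)
    grid = [i * 0.5 for i in range(81)]    # 0 to 40 days in half-day steps
    curve = [f(x) for x in grid]           # values one would plot
```

Because every sample contributes a normalized kernel, the estimate integrates to 1, so panels (a) and (b) remain comparable even when the two groups differ in size.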

**Fig. 4**
Statistical tests of the different test mechanisms. (a) Chi-squared tests of statistical significance between pairs of COVID-19 diagnostic tests. The symbol ** indicates a statistically significant result with $p < 0.001$. No sample in the dataset has both IgA and molecular test results, so the corresponding cell is empty. (b) Pearson correlation matrix of optical density values from the IgG, IgM and IgA tests.
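
Panel (a) can be outlined with a Pearson chi-squared test of independence on the 2×2 table of paired positive/negative results, and panel (b) with the ordinary Pearson correlation coefficient. The sketch assumes results coded as 1 (positive) / 0 (negative), applies no Yates continuity correction, and uses the identity that for one degree of freedom the upper-tail p-value equals erfc(√(χ²/2)). Whether the authors used exactly this variant is not stated.

```python
import math


def chi2_2x2(a: list[int], b: list[int]) -> tuple[float, float]:
    """Pearson chi-squared test of independence for two paired binary tests.

    a[i], b[i] are 1 (positive) or 0 (negative) for the same sample.
    No continuity correction; assumes no empty row or column in the table.
    """
    table = [[0, 0], [0, 0]]
    for x, y in zip(a, b):
        table[x][y] += 1
    n = len(a)
    rows = [table[i][0] + table[i][1] for i in range(2)]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(chi2 > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient, e.g. between optical densities of two assays."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    return cov / (sx * sy)
```

For a hypothetical set of 60 paired samples with 25 concordant positives, 25 concordant negatives and 10 discordant results, `chi2_2x2` gives χ² = 400/15 ≈ 26.67 and p < 0.001, which would be marked ** under the caption's convention.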

**Fig. 5**
Pairwise Kernel Density Estimation of optical density values from (a) samples tested with EuroImmun COVID-19 IgG vs EDI COVID-19 IgM; and (b) samples tested with EuroImmun COVID-19 IgG vs EuroImmun COVID-19 IgA. The three tests are all ELISA tests. The samples are color coded to indicate their result interpretation. Darker colored densities indicate more samples in the area. G+ indicates IgG positive, G- indicates IgG negative; M+ indicates IgM positive, M- indicates IgM negative; A+ indicates IgA positive, A- indicates IgA negative.

**Fig. 6**
Boxplots of days PSO for samples tested with IgA, IgM and IgG.

**Fig. 7**
Top 15 most informative features from Random Forest model.

**Fig. 8**
Kernel Density Estimation plot of fever temperature with respect to (a) all samples; (b) samples with fever only.

**Fig. 9**
Receiver Operating Characteristic (ROC) curves for the five classification models.
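
Fig. 9 summarizes each model by its ROC curve, and the abstract reports accuracy and AUC for XGBoost. The sketch below shows how these metrics follow from a model's predicted probabilities. The ROC curve sweeps the decision threshold over every distinct score, and the AUC is computed through its rank (Mann–Whitney) interpretation: the probability that a randomly chosen positive donor scores higher than a randomly chosen negative one, with ties counted as one half. The 0.5 accuracy threshold and all function names are assumptions; scikit-learn's `roc_curve` and `roc_auc_score` provide the same computations.

```python
def roc_points(y_true: list[int], scores: list[float]) -> list[tuple[float, float]]:
    """(FPR, TPR) pairs obtained by sweeping the threshold over every distinct score."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, y_true) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, y_true) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points: list[tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve given as sorted (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """AUC as P(score of random positive > score of random negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true: list[int], scores: list[float], threshold: float = 0.5) -> float:
    """Fraction of donors whose thresholded prediction matches the label."""
    correct = sum((s >= threshold) == (y == 1) for s, y in zip(scores, y_true))
    return correct / len(y_true)


if __name__ == "__main__":
    y = [0, 0, 1, 1]
    s = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(y, s), trapezoid_area(roc_points(y, s)), accuracy(y, s))
    # -> 0.75 0.75 0.75
```

The rank form and the trapezoidal area under the ROC points give the same value, which is why AUC can be read as a threshold-free measure while accuracy depends on the chosen cut-off.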

**Fig. 10**
A COVID-19 prediction decision tree learned from Random Forest.


