Med. 2021 Feb 12;2(2):196-208.e4.

doi: 10.1016/j.medj.2020.10.002. Epub 2020 Oct 10.

A Prediction Model to Prioritize Individuals for a SARS-CoV-2 Test Built from National Symptom Surveys

Saar Shoer¹,², Tal Karady¹,², Ayya Keshet¹,², Smadar Shilo¹,²,³, Hagai Rossman¹,², Amir Gavrieli¹,², Tomer Meir¹,², Amit Lavon¹,², Dmitry Kolobkov¹,², Iris Kalka¹,², Anastasia Godneva¹,², Ori Cohen¹,², Adam Kariv⁴, Ori Hoch⁴, Mushon Zer-Aviv⁴, Noam Castel⁴, Carole Sudre⁵, Anat Ekka Zohar⁶, Angela Irony⁶, Tim Spector⁵, Benjamin Geiger², Dorit Hizi⁴, Varda Shalev⁶,⁷, Ran Balicer⁸, Eran Segal¹,²

Affiliations

¹ Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel.
² Department of Molecular Cell Biology, Weizmann Institute of Science, Rehovot, Israel.
³ Pediatric Diabetes Unit, Ruth Rappaport Children's Hospital, Rambam Healthcare Campus, Haifa, Israel.
⁴ The public knowledge workshop, Tel Aviv, Israel.
⁵ Department of Twin Research, King's College London, London, UK.
⁶ Epidemiology and Database Research Unit, Maccabi Healthcare Services, Tel Aviv, Israel.
⁷ School of Public Health, Sackler Faculty of Medicine, Tel Aviv University, Tel Aviv, Israel.
⁸ Clalit Research Institute, Clalit Health Services, Tel Aviv, Israel.

PMID: 33073258
PMCID: PMC7547576
DOI: 10.1016/j.medj.2020.10.002

Saar Shoer et al. Med. 2021.

Abstract

Background: The gold standard for COVID-19 diagnosis is detection of viral RNA through PCR. Due to global limitations in testing capacity, effective prioritization of individuals for testing is essential.

Methods: We devised a model that estimates the probability that an individual will test positive for COVID-19, based on answers to 9 simple questions that have been associated with SARS-CoV-2 infection. The model was built from a subsample of a national symptom survey that was answered over 2 million times in Israel in its first 2 months and from a targeted survey distributed to all residents of several cities in Israel. Overall, 43,752 adults were included, of whom 498 self-reported as being COVID-19 positive.
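As a rough illustration of how answers to a short questionnaire can be turned into a probability and then into a testing priority, the sketch below scores responses with a logistic function. It assumes a logistic form for simplicity. The four feature names are symptoms mentioned in the figure captions, but the weights and intercept are made-up placeholders, not the coefficients of the published model, and the function names are hypothetical.

```python
import math

# Hypothetical weights in log-odds units; NOT the published coefficients.
WEIGHTS = {
    "loss_of_taste_or_smell": 1.5,
    "cough": 0.4,
    "shortness_of_breath": 0.3,
    "sore_throat": -0.2,
}
INTERCEPT = -4.5  # hypothetical baseline log-odds


def predict_probability(answers):
    """Return a logistic probability from binary (0/1) survey answers.

    Questions missing from `answers` are treated as 0 (an assumption).
    """
    log_odds = INTERCEPT + sum(
        weight * answers.get(name, 0) for name, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


def prioritize(responses):
    """Sort respondent IDs by predicted probability, highest first."""
    scored = {rid: predict_probability(ans) for rid, ans in responses.items()}
    return sorted(scored, key=scored.get, reverse=True)
```

Calling `prioritize` on a dictionary of respondent IDs mapped to their answers returns the IDs in the order in which they would be offered a test under this toy scoring.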

Findings: The model was validated on a held-out set of individuals from Israel, where it achieved an auROC of 0.737 (CI: 0.712-0.759) and an auPR of 0.144 (CI: 0.119-0.177). It also demonstrated applicability outside of Israel on an independently collected symptom survey dataset from the US, UK, and Sweden. Our analyses revealed interactions between several symptoms and age, suggesting variation in the clinical manifestation of the disease in different age groups.
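To make the two reported metrics concrete, here is a minimal sketch of how an area under the ROC curve and an area under the precision-recall curve can be computed from labels and scores. The auPR is approximated here by average precision, which is one common estimator and an assumption on our part; the abstract does not say which estimator or which confidence-interval procedure was used. For context, 498 of 43,752 is roughly 1.1%, and a random ranking would be expected to score an auPR near the positive rate of the evaluated set.

```python
def au_roc(labels, scores):
    """Probability that a random positive outranks a random negative.

    Ties count as half a win. Assumes at least one positive and one negative.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean of the precision values taken at the rank of each positive.

    Tied scores are not handled specially. Assumes at least one positive.
    """
    ranked = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(au_roc(labels, scores))             # 0.75
print(average_precision(labels, scores))  # 0.8333...
```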

Conclusions: Our tool can be used online and without exposure to suspected patients, suggesting worldwide utility in combating COVID-19: prioritizing individuals for testing directs limited testing resources more effectively and increases the rate at which positive individuals can be identified. Moreover, individuals at high risk for a positive test result can be isolated prior to testing.

Funding: E.S. is supported by the Crown Human Genome Center, Larson Charitable Foundation New Scientist Fund, Else Kroener Fresenius Foundation, White Rose International Foundation, Ben B. and Joyce E. Eisenberg Foundation, Nissenbaum Family, Marcos Pinheiro de Andrade and Vanessa Buchheim, Lady Michelle Michels, and Aliza Moussaieff, and by grants funded by the Minerva foundation with funding from the Federal German Ministry for Education and Research, by the European Research Council, and by the Israel Science Foundation. H.R. is supported by the Israeli Council for Higher Education (CHE) via the Weizmann Data Science Research Center and by a research grant from Madame Olga Klein-Astrachan.

Keywords: Artificial Intelligence; COVID-19; Diagnosis; Health Policies; Machine Learning; SARS-CoV-2.

Conflict of interest statement

The study protocol was approved by the Weizmann Institute of Science review board (IRB). Informed consent was waived by the IRB, as all identifying details of the participants were removed before the computational analysis. Participants were made fully aware of the way in which the data will be stored, handled, and shared, which was provided to them and is in accordance with the privacy and data-protection policy of the Weizmann Institute of Science (https://www.weizmann.ac.il/pages/privacy-policy).

Figures

**Figure 1**
Study Population Flow Chart. Numbers represent recorded responses. Blue boxes show responses that were used to construct the extended features model (top) and the primary model (bottom).

**Figure 2**
Primary Model Performance. (A–C) Logistic Regression. (D–F) Gradient Boosting Decision Trees. auROC/auPR, area under the ROC/PR curve; ROC, receiver operating characteristic; PR, precision-recall. Confidence intervals are in parentheses. (A and D) ROC curve of our model consisting of 9 simple questions. (B and E) Precision-recall curve of our model. (C and F) Calibration curve. Top: blue dots represent deciles of predicted probabilities. The dotted diagonal line represents an ideal calibration. Bottom: log-scaled histogram of predicted probabilities of COVID-19 undiagnosed (green) and diagnosed (red). See also Figure S1 and Tables S3–S5.
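The calibration panels plot one point per decile of predicted probability. A minimal sketch of that kind of binning is shown below, assuming equal-count bins ordered by predicted probability; the function name and the exact binning rule are our assumptions, not details taken from the paper.

```python
def calibration_by_decile(labels, probs, n_bins=10):
    """Return (mean predicted probability, observed positive rate) per bin.

    Responses are sorted by predicted probability and split into `n_bins`
    groups of near-equal size (an assumed binning rule).
    """
    ranked = sorted(zip(probs, labels))
    n = len(ranked)
    points = []
    for b in range(n_bins):
        chunk = ranked[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # fewer responses than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        points.append((mean_pred, observed))
    return points
```

For a well-calibrated model the returned points lie close to the diagonal, where the mean predicted probability equals the observed positive rate.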

**Figure 3**
Comparison of Primary Model Predictions to New COVID-19 Cases in Israel over Time. (A) Primary model predictions, averaged across all individuals on a 3-day running average (solid blue) and shifted 4 days forward (dotted blue), compared to the number of newly confirmed COVID-19 cases in Israel reported by the Ministry of Health, based on a 3-day running average. (B) Number of survey responses per day.
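The smoothing and shifting described in panel A can be sketched as follows. The caption does not say whether the 3-day window is trailing or centered; a trailing window is assumed here, and both function names are hypothetical.

```python
def running_average(values, window=3):
    """Trailing mean over `window` days; the window is shorter at the start."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def shift_forward(values, days=4):
    """Delay a daily series by `days`, padding the start with None."""
    if days <= 0:
        return list(values)
    kept = max(len(values) - days, 0)
    return [None] * min(days, len(values)) + list(values[:kept])


daily_mean_prediction = [1, 2, 3, 4, 5]  # toy values, not study data
print(running_average(daily_mean_prediction))  # [1.0, 1.5, 2.0, 3.0, 4.0]
print(shift_forward([1, 2, 3, 4, 5, 6]))       # [None, None, None, None, 1, 2]
```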

**Figure 4**
Primary Model Performance on an Independently Collected Dataset from the US, UK, and Sweden. (A) Area under the receiver operating characteristic curve (auROC) (purple). (B) Area under the precision-recall curve (auPR) (orange). (C) Number of survey responses per day. (D) Receiver operating characteristic curve of our model consisting of 9 simple questions. (E) Precision-recall curve of our model. (F) Calibration curve. Top: blue dots represent deciles of predicted probabilities. The dotted diagonal line represents an ideal calibration. Bottom: log-scaled histogram of predicted probabilities of COVID-19 undiagnosed (green) and diagnosed (red). Error bars represent CI. See also Table S4.

**Figure 5**
Feature Contribution Analysis. Mean absolute Shapley value (in units of log-odds) of (A) the primary model, including all features used in the model, and (B) the extended features model, for the 13 highest contributing features. See also Figure S2 and Table S6.

**Figure 6**
Feature Interpretation Analysis. (A) SHAP values (in units of log-odds) for positive report of a feature colored in red, negative report of a feature colored in blue, and missing answers in gray. (B) SHAP values for age with number of responses as a histogram at the bottom. (C–F) SHAP dependence plot of age versus its SHAP value in the model, stratified by positive (red) and negative (blue) responses of loss of taste or smell (C), cough (D), shortness of breath (E), and sore throat (F). (G–J) SHAP interaction values of age with positive (red) and negative (blue) responses of loss of taste or smell (G), cough (H), shortness of breath (I), and sore throat (J). Error bars represent SD.
