Med. 2021 Feb 12;2(2):196-208.e4.
doi: 10.1016/j.medj.2020.10.002. Epub 2020 Oct 10.

A Prediction Model to Prioritize Individuals for a SARS-CoV-2 Test Built from National Symptom Surveys

Saar Shoer et al. Med.

Abstract

Background: The gold standard for COVID-19 diagnosis is detection of viral RNA through PCR. Due to global limitations in testing capacity, effective prioritization of individuals for testing is essential.

Methods: We devised a model that estimates the probability that an individual will test positive for COVID-19, based on answers to 9 simple questions that have been associated with SARS-CoV-2 infection. The model was built from a subsample of a national symptom survey that was answered over 2 million times in Israel in its first 2 months, together with a targeted survey distributed to all residents of several cities in Israel. Overall, 43,752 adults were included, of whom 498 self-reported as being COVID-19 positive.

Findings: On a held-out set of individuals from Israel, our model achieved an auROC of 0.737 (CI: 0.712-0.759) and an auPR of 0.144 (CI: 0.119-0.177). It also proved applicable outside of Israel when evaluated on an independently collected symptom survey dataset from the US, UK, and Sweden. Our analyses revealed interactions between several symptoms and age, suggesting that the clinical manifestation of the disease varies across age groups.
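As a rough illustration of the two metrics reported above, the following sketch computes an auROC and a step-wise auPR (average precision) from labels and predicted scores, with a percentile bootstrap for the interval. The abstract does not state how the confidence intervals were obtained, so the bootstrap, the function names, and the toy data are all assumptions made for illustration. The last lines compute the overall cohort prevalence from the Methods (498 of 43,752), which is the auPR a random ranking would be expected to reach on data with that prevalence; the held-out set may differ.

```python
import random

def au_roc(labels, scores):
    """auROC via the rank-sum (Mann-Whitney U) identity; tied scores share an average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve; ties keep input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos

def bootstrap_ci(metric, labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (an assumed method, not taken from the source)."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample needs both classes
            stats.append(metric(ys, [scores[i] for i in idx]))
    stats.sort()
    return stats[int(alpha / 2 * n_boot)], stats[int((1 - alpha / 2) * n_boot) - 1]

# Toy data, purely illustrative.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
toy_auroc = au_roc(toy_labels, toy_scores)            # 0.75
toy_aupr = average_precision(toy_labels, toy_scores)  # about 0.833

# Overall cohort prevalence from the Methods: 498 positives among 43,752 adults.
cohort_prevalence = 498 / 43752  # about 0.0114
```

Read against that prevalence, an auPR of 0.144 is roughly an order of magnitude above what a random ranking would give, provided the held-out prevalence is similar to the cohort's.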

Conclusions: Our tool can be used online and without exposure to suspected patients, suggesting worldwide utility in combating COVID-19. Prioritizing individuals for testing directs limited testing resources more effectively and thereby increases the rate at which positive individuals can be identified. Moreover, individuals at high risk for a positive test result can be isolated prior to testing.
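The prioritization idea in the Conclusions can be sketched as a simple ranking: given predicted probabilities and a fixed number of available tests, select the individuals with the highest predicted probability. This is a minimal sketch, not the authors' deployment. The function names, the capacity, and the probabilities below are hypothetical, and the expected yield is only meaningful if the predicted probabilities are well calibrated.

```python
def prioritize(predicted_probs, capacity):
    """Return the indices of the `capacity` individuals with the highest predicted probability."""
    ranked = sorted(range(len(predicted_probs)),
                    key=lambda i: predicted_probs[i], reverse=True)
    return ranked[:capacity]

def expected_yield(predicted_probs, selected):
    """Expected number of positives among the selected individuals,
    assuming the predicted probabilities are well calibrated."""
    return sum(predicted_probs[i] for i in selected)

# Hypothetical predicted probabilities for five survey respondents.
probs = [0.02, 0.30, 0.01, 0.15, 0.05]
capacity = 2  # hypothetical number of available tests

selected = prioritize(probs, capacity)             # respondents 1 and 3
prioritized_yield = expected_yield(probs, selected)

# Expected positives if the same number of tests were assigned at random.
random_yield = capacity * sum(probs) / len(probs)
```

In this toy case the prioritized selection expects about 0.45 positives from two tests, compared with about 0.21 under random assignment.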

Funding: E.S. is supported by the Crown Human Genome Center, Larson Charitable Foundation New Scientist Fund, Else Kroener Fresenius Foundation, White Rose International Foundation, Ben B. and Joyce E. Eisenberg Foundation, Nissenbaum Family, Marcos Pinheiro de Andrade and Vanessa Buchheim, Lady Michelle Michels, and Aliza Moussaieff and grants funded by the Minerva foundation with funding from the Federal German Ministry for Education and Research and by the European Research Council and the Israel Science Foundation. H.R. is supported by the Israeli Council for Higher Education (CHE) via the Weizmann Data Science Research Center and by a research grant from Madame Olga Klein - Astrachan.

Keywords: Artificial Intelligence; COVID-19; Diagnosis; Health Policies; Machine Learning; SARS-CoV-2.

Conflict of interest statement

The study protocol was approved by the Weizmann Institute of Science review board (IRB). Informed consent was waived by the IRB, as all identifying details of the participants were removed before the computational analysis. Participants were made fully aware of how the data would be stored, handled, and shared; this information was provided to them and is in accordance with the privacy and data-protection policy of the Weizmann Institute of Science (https://www.weizmann.ac.il/pages/privacy-policy).

Figures

Graphical abstract
Figure 1. Study Population Flow Chart. Numbers represent recorded responses. Blue colored boxes show responses that were used in extended features model (top) and primary model (bottom) constructions.
Figure 2. Primary Model Performance. (A–C) Logistic Regression. (D–F) Gradient Boosting Decision Trees. auROC/auPR, area under the ROC/PR curve; ROC, receiver operator characteristic; PR, precision recall. Confidence intervals are in parentheses. (A and D) ROC curve of our model consisting of 9 simple questions. (B and E) Precision-recall curve of our model. (C and F) Calibration curve. Top: blue dots represent deciles of predicted probabilities. The dotted diagonal line represents an ideal calibration. Bottom: log-scaled histogram of predicted probabilities of COVID-19 undiagnosed (green) and diagnosed (red). See also Figure S1 and Tables S3–S5.
Figure 3. Comparison of Primary Model Predictions to New COVID-19 Cases in Israel over Time. (A) Primary model predictions, averaged across all individuals on a 3-day running average (solid blue) and shifted 4 days forward (dotted blue), compared to the number of newly confirmed COVID-19 cases in Israel reported by the Ministry of Health, based on a 3-day running average. (B) Number of survey responses per day.
Figure 4. Primary Model Performance on an Independently Collected Dataset from the US, UK, and Sweden. (A) Area under the receiver operator characteristic curve (auROC) (purple). (B) Area under the precision-recall curve (auPR) (orange). (C) Number of survey responses per day. (D) Receiver operator characteristic curve of our model consisting of 9 simple questions. (E) Precision-recall curve of our model. (F) Calibration curve. Top: blue dots represent deciles of predicted probabilities. Dotted diagonal line represents an ideal calibration. Bottom: log-scaled histogram of predicted probabilities of COVID-19 undiagnosed (green) and diagnosed (red). Error bars represent CI. See also Table S4.
Figure 5. Feature Contribution Analysis. Mean absolute Shapley value (in units of log-odds) of (A) the primary model, including all features used in the model, and (B) the extended features model, for the 13 highest contributing features. See also Figure S2 and Table S6.
Figure 6. Feature Interpretation Analysis. (A) SHAP values (in units of log-odds) for positive report of a feature colored in red, negative report of a feature colored in blue, and missing answers in gray. (B) SHAP values for age with number of responses as a histogram at the bottom. (C–F) SHAP dependence plot of age versus its SHAP value in the model, stratified by positive (red) and negative (blue) responses of loss of taste or smell (C), cough (D), shortness of breath (E), and sore throat (F). (G–J) SHAP interaction values of age with positive (red) and negative (blue) responses of loss of taste or smell (G), cough (H), shortness of breath (I), and sore throat (J). Error bars represent SD.
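
The smoothing and time shift described in the Figure 3 caption (a 3-day running average, and the same curve shifted 4 days forward) can be sketched as follows. The caption does not say whether the window is trailing or centered, so a trailing window is assumed here. The function names and the daily values are hypothetical.

```python
def running_average(values, window=3):
    """Trailing running average; early days average over the days available so far.
    A trailing (rather than centered) window is an assumption."""
    smoothed = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def shift_forward(values, days=4):
    """Move each value `days` later in time; the first `days` positions have no data (None)."""
    days = min(days, len(values))
    return [None] * days + values[:len(values) - days]

# Hypothetical daily mean model predictions, one value per day.
daily_mean_prediction = [0.010, 0.012, 0.011, 0.015, 0.018, 0.016, 0.014]

smoothed = running_average(daily_mean_prediction, window=3)
shifted = shift_forward(smoothed, days=4)  # aligned against later case counts
```

Shifting the prediction curve forward lets a value computed from surveys on one day be compared against confirmed cases reported 4 days later.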
