J Med Syst. 2020 Jul 1;44(8):135.
doi: 10.1007/s10916-020-01597-4.

Detection of COVID-19 Infection from Routine Blood Exams with Machine Learning: A Feasibility Study

Davide Brinati et al. J Med Syst. 2020.

Abstract

The COVID-19 pandemic caused by the SARS-CoV-2 coronavirus has, in the first 4 months since its outbreak, reached more than 200 countries worldwide, with more than 2 million confirmed cases (the number of infected is probably much higher) and almost 200,000 deaths. Amplification of viral RNA by (real-time) reverse transcription polymerase chain reaction (rRT-PCR) is the current gold standard test for confirmation of infection, although it has known shortcomings: long turnaround times (3-4 hours to generate results), potential shortage of reagents, false-negative rates as large as 15-20%, and the need for certified laboratories, expensive equipment and trained personnel. Thus there is a need for alternative, faster, less expensive and more accessible tests. We developed two machine learning classification models using hematochemical values from routine blood exams (namely: white blood cell counts, and the platelet, CRP, AST, ALT, GGT, ALP and LDH plasma levels) drawn from 279 patients who, after being admitted to the San Raffaele Hospital (Milan, Italy) emergency room with COVID-19 symptoms, were screened with the rRT-PCR test performed on respiratory tract specimens. Of these patients, 177 tested positive, whereas 102 tested negative. The two models discriminate between patients who are positive or negative for SARS-CoV-2: their accuracy ranges between 82% and 86%, and their sensitivity between 92% and 95%, which compares well with the gold standard. We also developed an interpretable Decision Tree model as a simple decision aid for clinicians interpreting blood tests (even off-line) for suspected COVID-19 cases. This study demonstrated the feasibility and clinical soundness of using blood test analysis and machine learning as an alternative to rRT-PCR for identifying COVID-19 positive patients. This is especially useful in countries, such as developing ones, that suffer from shortages of rRT-PCR reagents and specialized laboratories. We made available a Web-based tool for clinical reference and evaluation (available at https://covid19-blood-ml.herokuapp.com/ ).
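As a reading aid for the accuracy and sensitivity figures quoted in the abstract, the following minimal sketch shows how these metrics are derived from the four cells of a binary confusion matrix. The function name `binary_metrics` is hypothetical, and the example counts are invented purely for illustration: only the class totals (177 positive, 102 negative) come from the abstract, while the split into true/false positives and negatives is an assumption and does not reproduce the study's results.

```python
def binary_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: rRT-PCR-positive patients predicted positive/negative.
    tn/fp: rRT-PCR-negative patients predicted negative/positive.
    """
    total = tp + fn + tn + fp
    accuracy = (tp + tn) / total      # share of all patients classified correctly
    sensitivity = tp / (tp + fn)      # share of true positives that are detected
    specificity = tn / (tn + fp)      # share of true negatives that are cleared
    return accuracy, sensitivity, specificity


# Hypothetical counts (NOT from the paper): 177 positives and 102 negatives,
# split arbitrarily into correct and incorrect predictions.
acc, sens, spec = binary_metrics(tp=165, fn=12, tn=70, fp=32)
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

Because sensitivity ignores the negative class entirely, a model can pair high sensitivity with a lower overall accuracy when it produces many false positives, which is why both figures are usually read together.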

Keywords: Blood tests; COVID-19; Machine learning; RT-PCR test; Random forest; Three-way.

Conflict of interest statement

The authors have declared no conflict of interest.

Figures

Fig. 1
Violin plots for selected features in the training dataset (chosen for their predictive importance)
Fig. 2
Pairwise Pearson correlation of the features taken into account for this case study
Fig. 3
Distribution plots and pairwise scatter plots of selected features. Red points and red distributions represent patients positive for COVID-19, while blue points represent negative patients
Fig. 4
Violin plots of the accuracy distributions reached by each model on five folds (on dataset B)
Fig. 5
The ROC curve (true positive rate, i.e., sensitivity, plotted against false positive rate, i.e., 1 − specificity, as depicted in the Figure) of the evaluated models. The best performing algorithm, Random Forest, is highlighted
Fig. 6
The precision/recall (i.e., positive predictive value / sensitivity) curve, and the area under this curve
Fig. 7
Feature importance scores for the best performing model
Fig. 8
An interpretable Decision Tree, developed in order to support the interpretation of the predictions from the other models. Color gradients denote predictivity for either class (shades of blue correspond to COVID-19 negativity, shades of orange to positivity)
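
The true positive rate / false positive rate curve described in the caption of Fig. 5 can be traced from any model's predicted scores. The sketch below is a plain-Python illustration of that construction and of the trapezoidal area under it; the helper names `roc_points` and `auc` and the six-patient toy data are assumptions made for this example, not the authors' code or data.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) points of the ROC curve.

    y_true: list of 0/1 labels (1 = positive); scores: predicted scores,
    where a higher score means "more likely positive".
    """
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    for i, (score, label) in enumerate(pairs):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only once all samples sharing this score are counted.
        if i + 1 == len(pairs) or pairs[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve, using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data (hypothetical): three positive and three negative patients.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc(roc_points(labels, scores)))
```

The area under this curve equals the fraction of positive/negative patient pairs in which the positive patient receives the higher score, which gives a threshold-independent way to compare models.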
