Comparative Study
J Biomed Inform. 2015 Dec;58:60-69.
doi: 10.1016/j.jbi.2015.08.019. Epub 2015 Sep 16.

Comparison of machine learning classifiers for influenza detection from emergency department free-text reports

Arturo López Pineda et al. J Biomed Inform. 2015 Dec.

Abstract

Influenza is a disease that recurs every year and has the potential to become a pandemic, so an effective biosurveillance system is required for its early detection. In our previous studies, we showed that free-text Emergency Department (ED) reports can improve real-time influenza detection. This paper studies seven machine learning (ML) classifiers for influenza detection, compares their diagnostic performance against an expert-built influenza Bayesian classifier, and evaluates different ways of handling clinical information that is missing from the free-text reports. We identified 31,268 ED reports from 4 hospitals between 2008 and 2011 and formed two datasets: training (468 cases and 29,004 controls) and test (176 cases and 1,620 controls). We used Topaz, a natural language processing (NLP) tool, to extract influenza-related findings and encode each one as one of three values: Acute, Non-acute, or Missing. All ML classifiers achieved areas under the ROC curve (AUC) between 0.88 and 0.93 and performed significantly better than the expert-built Bayesian model. Assigning missing clinical information an explicit value of Missing (treating it as not missing at random) consistently improved performance for 3 of the 4 ML classifiers tested in this comparison, relative to the configuration that assigned no value (treating it as missing completely at random). Given the large number of training cases, the case/control ratios did not affect classification performance. Our study demonstrates that ED reports, combined with ML, NLP, and explicit handling of missing values, have great potential for the detection of infectious diseases.
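The abstract contrasts two ways of treating findings that Topaz does not detect in a report: encoding them as an explicit Missing value, or leaving them unassigned so that they are ignored. The sketch below illustrates that contrast with a categorical naive Bayes classifier with Laplace smoothing, scored by AUC computed from the Mann-Whitney statistic. Naive Bayes, the hypothetical helpers `train_nb`, `score`, and `auc`, and the four-case, four-control toy dataset are all assumptions made for illustration; this is not the authors' code, nor necessarily one of their seven classifiers.

```python
import math
from collections import Counter

# Values assumed to come from an NLP encoder such as Topaz (hypothetical encoding).
VALUES = ("acute", "non-acute", "missing")


def train_nb(rows, labels, missing_as_value=True, alpha=1.0):
    """Fit a categorical naive Bayes model with Laplace smoothing (illustrative).

    rows: list of dicts mapping a finding name to one of VALUES.
    labels: list of 0/1 labels (1 = influenza case, 0 = control).
    missing_as_value=True treats "missing" as a third category (informative);
    False drops missing observations in training and scoring (uninformative).
    """
    allowed = VALUES if missing_as_value else VALUES[:2]
    features = sorted({f for r in rows for f in r})
    class_counts = Counter(labels)
    counts = {(y, f): Counter() for y in (0, 1) for f in features}
    for r, y in zip(rows, labels):
        for f in features:
            v = r.get(f, "missing")  # an absent key is treated as "missing"
            if v in allowed:
                counts[(y, f)][v] += 1
    # Per-value log likelihood ratio log P(v | case) - log P(v | control).
    log_lr = {}
    for f in features:
        for v in allowed:
            p = []
            for y in (0, 1):
                n = sum(counts[(y, f)].values())
                p.append((counts[(y, f)][v] + alpha) / (n + alpha * len(allowed)))
            log_lr[(f, v)] = math.log(p[1] / p[0])
    prior = math.log(class_counts[1] / class_counts[0])
    return {"features": features, "allowed": allowed, "log_lr": log_lr, "prior": prior}


def score(model, row):
    """Posterior log-odds of influenza for one encoded report."""
    s = model["prior"]
    for f in model["features"]:
        v = row.get(f, "missing")
        if v in model["allowed"]:  # ignored values contribute nothing
            s += model["log_lr"][(f, v)]
    return s


def auc(labels, scores):
    """AUC as P(case score > control score), counting ties as 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data in which missingness itself is more common in controls.
rows = [{"fever": v} for v in
        ["acute", "acute", "acute", "non-acute",          # cases
         "missing", "missing", "missing", "non-acute"]]   # controls
labels = [1, 1, 1, 1, 0, 0, 0, 0]

for flag in (False, True):
    m = train_nb(rows, labels, missing_as_value=flag)
    print(flag, auc(labels, [score(m, r) for r in rows]))
# False 0.78125
# True 0.96875
```

In this toy example, ignoring missing values scores an all-missing report at the prior odds, which ranks it above the non-acute case. Encoding Missing as its own value lets the model learn that missingness is more common among controls. Whether this helps on real data depends on whether missingness is actually informative, which is the question the study evaluates empirically.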

Keywords: Bayesian; Case detection; Emergency department reports; Influenza; Machine learning.


Figures

Figure 1. Study process flow.
Figure 2. Expert-constructed Bayesian model.
Figure 3. Experiments tree.
