JMIR AI. 2024 Oct 30:3:e55059.
doi: 10.2196/55059.

Targeting COVID-19 and Human Resources for Health News Information Extraction: Algorithm Development and Validation


Mathieu Ravaut et al. JMIR AI.

Abstract

Background: Global pandemics like COVID-19 place considerable strain on health care systems and health workers worldwide. These crises generate a vast amount of news published online across the globe. This extensive corpus of articles has the potential to provide valuable insights into the nature of ongoing events and to guide interventions and policies. However, the sheer volume of information is beyond the capacity of human experts to process and analyze effectively.

Objective: The aim of this study was to explore how natural language processing (NLP) can be leveraged to build a system that allows for quick analysis of a high volume of news articles. Along with this, the objective was to create a workflow comprising human-computer symbiosis to derive valuable insights to support health workforce strategic policy dialogue, advocacy, and decision-making.

Methods: We conducted a review of open-source news coverage from January 2020 to June 2022 on COVID-19 and its impacts on the health workforce from the World Health Organization (WHO) Epidemic Intelligence from Open Sources (EIOS) by synergizing NLP models, including classification and extractive summarization, and human-generated analyses. Our DeepCovid system was trained on 2.8 million news articles in English from more than 3000 internet sources across hundreds of jurisdictions.
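To illustrate what extractive summarization means in the Methods above, here is a minimal sketch that selects whole sentences from an article rather than generating new text. It is not the DeepCovid model: it swaps in a simple word-frequency scorer for the trained neural summarizer, and the function names (`split_sentences`, `extractive_summary`) and the sample text are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text):
    # Naive splitter: break after ., ! or ? when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    # Lowercased alphanumeric tokens; punctuation is dropped.
    return re.findall(r"[a-z0-9]+", sentence.lower())


def extractive_summary(text, k=1):
    """Return the k highest-scoring sentences, kept in document order.

    Assumption: a sentence's score is the mean corpus frequency of its
    words. This is a stand-in heuristic, not the paper's method.
    """
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in tokenize(s))

    def score(sentence):
        tokens = tokenize(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    top = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))[:k]
    return [sentences[i] for i in sorted(top)]


# Hypothetical three-sentence "article" for demonstration only.
article = (
    "Nurses went on strike over staffing. "
    "The weather was mild. "
    "Hospitals reported that nurses and staffing shortages worsened."
)
print(extractive_summary(article, k=2))
```

Because the output consists only of sentences copied from the source, every summary line can be traced back to the article it came from, which suits a workflow where human experts review the summaries.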

Results: Rule-based classification with hand-designed rules narrowed the data set to 8508 articles, whose high relevance was confirmed in the human-led evaluation. DeepCovid's automated information targeting component reached a very strong binary classification performance of 98.98 for the area under the receiver operating characteristic curve (ROC-AUC) and 47.21 for the area under the precision-recall curve (PR-AUC). Its information extraction component attained good performance in automatic extractive summarization, with a mean Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score of 47.76. DeepCovid's final summaries were used by human experts to write reports on the COVID-19 pandemic.
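The two classification metrics in the Results can differ widely on the same predictions, in particular when positive examples are rare. The sketch below computes both on toy data. It assumes PR-AUC is estimated by average precision, which is one common estimator; the abstract does not state which one the authors used. The labels and scores are made up for illustration and are not from the study.

```python
def roc_auc(labels, scores):
    # Probability that a random positive is scored above a random negative
    # (ties count as half), which equals the area under the ROC curve.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    # Mean of the precision values measured at the rank of each positive,
    # walking down the list from the highest score to the lowest.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits


# Toy data: 20 items, only 2 positives, ranked 3rd and 6th by the model.
labels = [1 if i in (2, 5) else 0 for i in range(20)]
scores = [1 - i / 20 for i in range(20)]

print(f"ROC-AUC: {roc_auc(labels, scores):.3f}")
print(f"Average precision: {average_precision(labels, scores):.3f}")
```

In this toy case the positives outrank most of the 18 negatives, so ROC-AUC is high, but the few negatives ranked above them are enough to pull precision down. A high ROC-AUC together with a much lower PR-AUC is therefore not contradictory.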

Conclusions: It is feasible to synergize high-performing NLP models and human-generated analyses to benefit open-source health workforce intelligence. The DeepCovid approach can contribute to an agile and timely global view, providing complementary information to scientific literature.

Keywords: COVID-19; NLP; SARS-CoV-2; classification; deep learning; extract; extraction; machine learning; media; natural language processing; news; news articles; summarization; summarize; summary.

Conflict of interest statement

Conflicts of Interest: MR and RZ are PhD candidates at Nanyang Technological University (NTU). IB, DP, VMQ, and AP are full-time employees of NTU. DM was a full-time employee at the World Health Organization (WHO) Health Emergency Intelligence and Surveillance Systems (WSE) division, in the Intelligence and Surveillance Systems (ISY) department, in the Intelligence Innovation and Integration (III) unit during the time of the project. JC was a full-time employee at NTU during the time of the project. SJ was a full-time employee at NTU and part-time employee at Salesforce during the time of the project.

Figures

Figure 1
DeepCovid model architecture overview (read from bottom to top), with colored blocks corresponding to machine learning models and gold arrows indicating actions required from human experts. BERT: Bidirectional Encoder Representations from Transformers; EIOS: Epidemic Intelligence from Open Sources; RoBERTa: Robustly Optimized BERT Pretraining Approach.
Figure 2
Deep learning classifier architecture. ReLU: rectified linear unit; RoBERTa: Robustly Optimized BERT Pretraining Approach.
Figure 3
Flowchart for DeepCovid showing the step-by-step process transforming a raw data set of 2.8 million news articles (top left) into high-level reports (bottom right). Boxes with an orange top-right ring indicate the need for human annotation, while boxes with a blue ring correspond to training a deep learning model. EIOS: Epidemic Intelligence from Open Sources; Val: validation; WHO: World Health Organization.

