Targeting COVID-19 and Human Resources for Health News Information Extraction: Algorithm Development and Validation

Mathieu Ravaut¹, Ruochen Zhao¹, Duy Phung¹, Vicky Mengqi Qin¹, Dusan Milovanovic², Anita Pienkowska¹, Iva Bojic¹, Josip Car³, Shafiq Joty¹,⁴

Affiliations

¹ Nanyang Technological University, Singapore, Singapore.
² Episteme Systems, Geneva, Switzerland.
³ King's College London, London, United Kingdom.
⁴ Salesforce Research, San Francisco, CA, United States.

PMID: 39475833
PMCID: PMC11561429
DOI: 10.2196/55059

Mathieu Ravaut et al. JMIR AI. 2024 Oct 30;3:e55059. doi: 10.2196/55059.

Abstract

Background: Global pandemics such as COVID-19 place considerable strain on health care systems and health workers worldwide. These crises generate a vast amount of news published online across the globe. This extensive corpus of articles could provide valuable insights into the nature of ongoing events and help guide interventions and policies. However, its sheer volume exceeds the capacity of human experts to process and analyze it effectively.

Objective: This study aimed to explore how natural language processing (NLP) can be leveraged to build a system that allows for quick analysis of a high volume of news articles. A further objective was to create a workflow based on human-computer symbiosis to derive insights that support strategic policy dialogue, advocacy, and decision-making on the health workforce.

Methods: We reviewed open-source news coverage of COVID-19 and its impacts on the health workforce, published from January 2020 to June 2022 and collected through the World Health Organization (WHO) Epidemic Intelligence from Open Sources (EIOS) platform. To do so, we combined NLP models, including classification and extractive summarization, with human-generated analyses. Our DeepCovid system was trained on 2.8 million news articles in English from more than 3000 internet sources across hundreds of jurisdictions.
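The first classification step in this pipeline uses hand-designed rules (the Results report that they narrowed the data set to 8508 articles). This abstract does not give the actual rules, so the following is only a minimal Python sketch of what such a rule could look like. It assumes a hypothetical requirement that an article mention both a COVID-19 term and a health workforce term; the keyword lists and the function names `is_relevant` and `filter_articles` are illustrative, not taken from the paper.

```python
import re

# Hypothetical keyword lists: illustrative only, not the authors' hand-designed rules.
COVID_TERMS = ("covid", "coronavirus", "sars-cov-2")
WORKFORCE_TERMS = ("nurse", "doctor", "physician", "health worker", "health workforce")


def _contains_any(text: str, terms) -> bool:
    """True if any term occurs as a whole word or phrase (optional plural 's'), case-insensitive."""
    for term in terms:
        pattern = r"\b" + re.escape(term) + r"s?\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            return True
    return False


def is_relevant(article: str) -> bool:
    """Hypothetical rule: keep articles mentioning both COVID-19 and the health workforce."""
    return _contains_any(article, COVID_TERMS) and _contains_any(article, WORKFORCE_TERMS)


def filter_articles(articles):
    """Apply the rule to a list of article texts and keep the matches."""
    return [a for a in articles if is_relevant(a)]
```

Keyword rules like this are cheap enough to run over millions of articles, which is why they make sense as a coarse first pass before a trained classifier sees the much smaller remaining set.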

Results: Rule-based classification with hand-designed rules narrowed the data set to 8508 articles, whose high relevance was confirmed in a human-led evaluation. DeepCovid's automated information targeting component achieved very strong binary classification performance, with an area under the receiver operating characteristic curve (ROC-AUC) of 98.98 and an area under the precision-recall curve (PR-AUC) of 47.21. Its information extraction component performed well in automatic extractive summarization, with a mean Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score of 47.76. Human experts used DeepCovid's final summaries to write reports on the COVID-19 pandemic.
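The Results report classifier quality as ROC-AUC and PR-AUC and summary quality as a mean ROUGE score. As a hedged illustration of how such numbers are computed, the Python sketch below implements ROC-AUC as a pairwise ranking probability, PR-AUC estimated as average precision (one common estimator; the abstract does not say which one the authors used), and ROUGE-1 F1 with whitespace tokenization. The abstract does not specify which ROUGE variants its mean covers, so ROUGE-1 stands in here. These functions return values between 0 and 1, whereas the abstract's figures appear to be on a 0 to 100 scale.

```python
from collections import Counter


def roc_auc(labels, scores):
    """ROC-AUC: probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """PR-AUC estimated as average precision: mean precision at each positive's rank (ties not handled)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive example")
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this cut-off
    return total / n_pos


def rouge1_f1(candidate: str, reference: str) -> float:
    """ROUGE-1 F1 using clipped unigram overlap and lowercase whitespace tokens."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())  # & keeps the minimum count per token
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

When the positive class is rare, PR-AUC is usually far lower than ROC-AUC: a random classifier's PR-AUC is roughly the positive-class prevalence, while its ROC-AUC is 0.5. This is consistent with a very high ROC-AUC sitting alongside a PR-AUC below 50. For real evaluations, scikit-learn's `roc_auc_score` and `average_precision_score` handle ties and scale better than the quadratic pairwise loop used here.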

Conclusions: It is feasible to combine high-performing NLP models with human-generated analyses to benefit open-source health workforce intelligence. The DeepCovid approach can contribute to an agile and timely global view, complementing the scientific literature.

Keywords: COVID-19; NLP; SARS-CoV-2; classification; deep learning; extract; extraction; machine learning; media; natural language processing; news; news articles; summarization; summarize; summary.

©Mathieu Ravaut, Ruochen Zhao, Duy Phung, Vicky Mengqi Qin, Dusan Milovanovic, Anita Pienkowska, Iva Bojic, Josip Car, Shafiq Joty. Originally published in JMIR AI (https://ai.jmir.org), 30.10.2024.

Conflict of interest statement

Conflicts of Interest: MR and RZ are PhD candidates at Nanyang Technological University (NTU). IB, DP, VMQ, and AP are full-time employees of NTU. DM was a full-time employee of the World Health Organization (WHO) Health Emergency Intelligence and Surveillance Systems (WSE) division, in the Intelligence and Surveillance Systems (ISY) department, in the Intelligence Innovation and Integration (III) unit at the time of the project. JC was a full-time employee of NTU at the time of the project. SJ was a full-time employee of NTU and a part-time employee of Salesforce at the time of the project.

Figures

**Figure 1**
DeepCovid model architecture overview (read from bottom to top), with colored blocks corresponding to machine learning models and gold arrows indicating actions that require human experts. BERT: Bidirectional Encoder Representations from Transformers; EIOS: Epidemic Intelligence from Open Sources; RoBERTa: Robustly Optimized BERT Pretraining Approach.

**Figure 2**
Deep learning classifier architecture. ReLU: rectified linear unit; RoBERTa: Robustly Optimized BERT Pretraining Approach.

**Figure 3**
Flowchart for DeepCovid showing the step-by-step process transforming a raw data set of 2.8 million news articles (top left) into high-level reports (bottom right). Boxes with an orange top-right ring indicate the need for human annotation, while boxes with a blue ring correspond to training a deep learning model. EIOS: Epidemic Intelligence from Open Sources; Val: validation; WHO: World Health Organization.
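Figure 2 names a RoBERTa encoder and a ReLU activation in the deep learning classifier. One common way to build such a classifier is a small feed-forward head on top of a pooled encoder embedding. The dependency-free Python sketch below shows only that head's forward pass. It assumes the RoBERTa embedding has already been computed as a list of floats; the layer layout (linear, ReLU, linear, sigmoid), the weights, and the function names are hypothetical and are not taken from the paper.

```python
import math


def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def linear(v, weights, bias):
    """Compute W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def sigmoid(z):
    """Map a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def classifier_head(embedding, w1, b1, w2, b2):
    """Hypothetical head: pooled embedding -> linear -> ReLU -> linear -> sigmoid relevance score."""
    hidden = relu(linear(embedding, w1, b1))
    logit = linear(hidden, w2, b2)[0]  # single output unit for binary relevance
    return sigmoid(logit)
```

In practice the encoder and head would be trained jointly in a deep learning framework, and the resulting scores are what threshold-free metrics such as ROC-AUC and PR-AUC evaluate.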
