Targeting COVID-19 and Human Resources for Health News Information Extraction: Algorithm Development and Validation
- PMID: 39475833
- PMCID: PMC11561429
- DOI: 10.2196/55059
Abstract
Background: Global pandemics like COVID-19 place considerable strain on health care systems and health workers worldwide. These crises generate a vast amount of news published online across the globe. This extensive corpus of articles has the potential to provide valuable insights into the nature of ongoing events and to guide interventions and policies. However, the sheer volume of information is beyond the capacity of human experts to process and analyze effectively.
Objective: The aim of this study was to explore how natural language processing (NLP) can be leveraged to build a system for the rapid analysis of a high volume of news articles. A second objective was to create a workflow built on human-computer symbiosis that derives valuable insights to support health workforce strategic policy dialogue, advocacy, and decision-making.
Methods: We reviewed open-source news coverage of COVID-19 and its impacts on the health workforce, published from January 2020 to June 2022 and drawn from the World Health Organization (WHO) Epidemic Intelligence from Open Sources (EIOS), by combining NLP models, including classification and extractive summarization, with human-generated analyses. Our DeepCovid system was trained on 2.8 million news articles in English from more than 3000 internet sources across hundreds of jurisdictions.
Results: Rules-based classification narrowed the data set to 8508 articles, whose high relevancy was confirmed in the human-led evaluation. DeepCovid's automated information targeting component reached very strong binary classification performance, with an area under the receiver operating characteristic curve (ROC-AUC) of 98.98 and an area under the precision-recall curve (PR-AUC) of 47.21. Its information extraction component attained good performance in automatic extractive summarization, with a mean Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score of 47.76. DeepCovid's final summaries were used by human experts to write reports on the COVID-19 pandemic.
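The two classification metrics reported above can diverge sharply when relevant articles are rare, which is one plausible reading of a ROC-AUC near 99 alongside a PR-AUC near 47. The following is a minimal sketch, not the authors' evaluation code: the function names and the toy data are assumptions made for illustration, PR-AUC is approximated by average precision, and scores are assumed to be distinct.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve.

    Computed as the mean of the precision measured at the rank of each positive.
    Assumes distinct scores (no special tie handling).
    """
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank
    if hits == 0:
        raise ValueError("need at least one positive example")
    return precision_sum / hits


# Hypothetical toy data: 1 relevant article among 10, ranked second by the model.
TOY_LABELS = [0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
TOY_SCORES = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10]

if __name__ == "__main__":
    print(f"ROC-AUC: {roc_auc(TOY_LABELS, TOY_SCORES):.3f}")
    print(f"Average precision: {average_precision(TOY_LABELS, TOY_SCORES):.3f}")
```

In this toy case a single irrelevant article ranked above the one relevant article halves the average precision, while ROC-AUC stays high because the relevant article still outranks eight of the nine irrelevant ones.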
Conclusions: It is feasible to synergize high-performing NLP models and human-generated analyses to benefit open-source health workforce intelligence. The DeepCovid approach can contribute to an agile and timely global view, providing complementary information to scientific literature.
Keywords: COVID-19; NLP; SARS-CoV-2; classification; deep learning; extract; extraction; machine learning; media; natural language processing; news; news articles; summarization; summarize; summary.
©Mathieu Ravaut, Ruochen Zhao, Duy Phung, Vicky Mengqi Qin, Dusan Milovanovic, Anita Pienkowska, Iva Bojic, Josip Car, Shafiq Joty. Originally published in JMIR AI (https://ai.jmir.org), 30.10.2024.
Conflict of interest statement
Conflicts of Interest: MR and RZ are PhD candidates at Nanyang Technological University (NTU). IB, DP, VMQ, and AP are full-time employees of NTU. DM was a full-time employee at the World Health Organization (WHO) Health Emergency Intelligence and Surveillance Systems (WSE) division, in the Intelligence and Surveillance Systems (ISY) department, in the Intelligence Innovation and Integration (III) unit during the time of the project. JC was a full-time employee at NTU during the time of the project. SJ was a full-time employee at NTU and part-time employee at Salesforce during the time of the project.
