CoQUAD: a COVID-19 question answering dataset system, facilitating research, benchmarking, and practice
- PMID: 35655148
- PMCID: PMC9160513
- DOI: 10.1186/s12859-022-04751-6
Abstract
Background: Due to the growing volume of COVID-19 research literature, medical experts, clinical scientists, and researchers frequently struggle to stay up to date on the most recent findings. There is a pressing need to help researchers and practitioners mine this literature and answer COVID-19-related questions in a timely manner.
Methods: This paper introduces CoQUAD, a question-answering system that efficiently extracts answers to COVID-19-related questions. Two datasets are provided in this work: a reference-standard dataset built from the CORD-19 and LitCOVID initiatives, and a gold-standard dataset prepared by experts from the public health domain. CoQUAD has a Retriever component, based on the BM25 algorithm, that searches the reference-standard dataset for documents relevant to a COVID-19-related question. It also has a Reader component, a Transformer-based model (MPNet), that reads the paragraphs of the retrieved documents and finds the answer to the question. In comparison to previous works, the proposed CoQUAD system can answer questions related to early, mid, and post-COVID-19 topics.
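The Retriever step described above can be illustrated with a minimal sketch of BM25 ranking. This is not the authors' implementation: the whitespace tokenizer, the parameter values `k1=1.5` and `b=0.75`, the class and method names, and the three toy documents are all assumptions chosen for illustration, and the Reader (MPNet) step is omitted.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization.
    return text.lower().split()


class BM25:
    """Minimal Okapi BM25 ranker over an in-memory list of documents."""

    def __init__(self, docs, k1=1.5, b=0.75):
        self.k1 = k1  # term-frequency saturation (assumed default)
        self.b = b    # document-length normalization (assumed default)
        self.tokens = [tokenize(d) for d in docs]
        self.tf = [Counter(t) for t in self.tokens]
        self.lengths = [len(t) for t in self.tokens]
        self.avgdl = sum(self.lengths) / len(self.lengths)
        n = len(docs)
        df = Counter()
        for t in self.tokens:
            df.update(set(t))  # document frequency: count each term once per doc
        # Smoothed IDF variant that is always non-negative.
        self.idf = {
            term: math.log(1 + (n - f + 0.5) / (f + 0.5)) for term, f in df.items()
        }

    def score(self, query, i):
        # Sum per-term BM25 contributions of the query for document i.
        total = 0.0
        for term in tokenize(query):
            f = self.tf[i].get(term, 0)
            if f == 0:
                continue
            norm = f + self.k1 * (1 - self.b + self.b * self.lengths[i] / self.avgdl)
            total += self.idf[term] * f * (self.k1 + 1) / norm
        return total

    def top_k(self, query, k=2):
        # Return indices of the k highest-scoring documents.
        order = sorted(
            range(len(self.tokens)), key=lambda i: self.score(query, i), reverse=True
        )
        return order[:k]


# Toy placeholder documents (hypothetical, not taken from CORD-19 or LitCOVID).
docs = [
    "fatigue and brain fog are reported symptoms of long covid",
    "bm25 ranks documents by term frequency and inverse document frequency",
    "vaccines reduce the risk of severe covid illness",
]
retriever = BM25(docs)
query = "what are symptoms of long covid"
best = retriever.top_k(query, k=1)
```

In a Retriever-Reader pipeline of the kind described, the top-ranked documents would then be passed to the Reader model, which extracts an answer span from their paragraphs.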
Results: Extensive experiments on the CoQUAD Retriever and Reader modules show that CoQUAD can provide effective and relevant answers to COVID-19-related questions posed in natural language. Compared with state-of-the-art baselines, CoQUAD outperforms the previous models, achieving an exact match ratio of 77.50% and an F1 score of 77.10%.
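For readers unfamiliar with the two reported metrics, the following sketch shows how exact match and token-level F1 are commonly computed for extractive question answering. The abstract does not state the authors' exact definitions, so the SQuAD-style normalization used here (lowercasing, removing punctuation and English articles) and the function names are assumptions.

```python
import re
import string
from collections import Counter


def normalize(text):
    # Assumed SQuAD-style normalization: lowercase, drop punctuation and articles.
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    # 1.0 if the normalized strings are identical, else 0.0.
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction, gold):
    # Harmonic mean of token-level precision and recall.
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


# Hypothetical prediction/answer pairs, for illustration only.
em = exact_match("The fatigue.", "fatigue")
f1 = f1_score("persistent fatigue", "fatigue")
```

A dataset-level score is then usually the mean of these per-question values, expressed as a percentage.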
Conclusion: CoQUAD is a question-answering system that mines COVID-19 literature using natural language processing techniques to help the research community find the most recent findings and answer any related questions.
Keywords: CORD-19; COVID-19; LitCOVID; Long-COVID; Pipeline; Post-COVID-19; Question answering system; Transformer model.
© 2022. The Author(s).
Conflict of interest statement
The authors declare that they have no competing interests.
Similar articles
- Revealing Opinions for COVID-19 Questions Using a Context Retriever, Opinion Aggregator, and Question-Answering Model: Model Development Study. J Med Internet Res. 2021 Mar 19;23(3):e22860. doi: 10.2196/22860. PMID: 33739287 Free PMC article.
- SemBioNLQA: A semantic biomedical question answering system for retrieving exact and ideal answers to natural language questions. Artif Intell Med. 2020 Jan;102:101767. doi: 10.1016/j.artmed.2019.101767. Epub 2019 Nov 28. PMID: 31980104
- A COVID-19 Search Engine (CO-SE) with Transformer-based architecture. Healthc Anal (N Y). 2022 Nov;2:100068. doi: 10.1016/j.health.2022.100068. Epub 2022 Jun 6. PMID: 37520616 Free PMC article.
- Question answering for biology. Methods. 2015 Mar;74:36-46. doi: 10.1016/j.ymeth.2014.10.023. Epub 2014 Oct 28. PMID: 25448292 Review.
- Biomedical Research: Formulating a Well-Built and Worth-Answering Research Question. Addict Health. 2025 Jan;17:1564. doi: 10.34172/ahj.1564. Epub 2025 Feb 22. PMID: 40458274 Free PMC article. Review.
Cited by
- ChatGPT and the rise of large language models: the new AI-driven infodemic threat in public health. Front Public Health. 2023 Apr 25;11:1166120. doi: 10.3389/fpubh.2023.1166120. eCollection 2023. PMID: 37181697 Free PMC article.
- Clinical Application of Detecting COVID-19 Risks: A Natural Language Processing Approach. Viruses. 2022 Dec 11;14(12):2761. doi: 10.3390/v14122761. PMID: 36560764 Free PMC article.
- Question answering systems for health professionals at the point of care-a systematic review. J Am Med Inform Assoc. 2024 Apr 3;31(4):1009-1024. doi: 10.1093/jamia/ocae015. PMID: 38366879 Free PMC article.
- Constructing a disease database and using natural language processing to capture and standardize free text clinical information. Sci Rep. 2023 May 26;13(1):8591. doi: 10.1038/s41598-023-35482-0. PMID: 37237101 Free PMC article.
- A review of the explainability and safety of conversational agents for mental health to identify avenues for improvement. Front Artif Intell. 2023 Oct 12;6:1229805. doi: 10.3389/frai.2023.1229805. eCollection 2023. PMID: 37899961 Free PMC article. Review.