Review
Brief Bioinform. 2021 Mar 22;22(2):781-799.
doi: 10.1093/bib/bbaa296.

Text mining approaches for dealing with the rapidly expanding literature on COVID-19

Lucy Lu Wang et al. Brief Bioinform.

Abstract

More than 50 000 papers have been published about COVID-19 since the beginning of 2020 and several hundred new papers continue to be published every day. This incredible rate of scientific productivity leads to information overload, making it difficult for researchers, clinicians and public health officials to keep up with the latest findings. Automated text mining techniques for searching, reading and summarizing papers are helpful for addressing information overload. In this review, we describe the many resources that have been introduced to support text mining applications over the COVID-19 literature; specifically, we discuss the corpora, modeling resources, systems and shared tasks that have been introduced for COVID-19. We compile a list of 39 systems that provide functionality such as search, discovery, visualization and summarization over the COVID-19 literature. For each system, we provide a qualitative description and assessment of the system's performance, unique data or user interface features and modeling decisions. Many systems focus on search and discovery, though several systems provide novel features, such as the ability to summarize findings over multiple documents or linking between scientific articles and clinical trials. We also describe the public corpora, models and shared tasks that have been introduced to help reduce repeated effort among community members; some of these resources (especially shared tasks) can provide a basis for comparing the performance of different systems. Finally, we summarize promising results and open challenges for text mining the COVID-19 literature.

Keywords: CORD-19; COVID-19; information extraction; information retrieval; natural language processing; question answering; shared tasks; summarization; text mining.

Figures

Fig. 1.
A typical workflow for creating a literature text mining system may consist of corpus construction, data enrichment, model development and evaluation. A text mining practitioner (e.g. an engineer, researcher or enthusiast) may be responsible for each of these steps in the gray box, whether by identifying and adapting existing datasets and models or by creating their own. For COVID-19, centralization of parts of this workflow has helped to reduce the burden of some of these steps.
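
The four stages named in the Fig. 1 caption can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three toy documents, the function names (`enrich`, `build_index`, `search`, `precision_at_1`) and the relevance labels are hypothetical, and the plain TF-IDF scorer stands in for whatever model a real system would use; it is not the method of any system covered by the review.

```python
import math
from collections import Counter

# Step 1: corpus construction (toy, made-up documents for illustration only).
CORPUS = {
    "d1": "masks reduce transmission of respiratory viruses",
    "d2": "vaccine trials report immune response in adults",
    "d3": "text mining helps researchers search the literature",
}


# Step 2: data enrichment (here only lowercased tokens; a real system
# might add entity links or other annotations at this point).
def enrich(text):
    return text.lower().split()


# Step 3: model development (a plain TF-IDF scorer, chosen for brevity).
def build_index(corpus):
    tokens = {doc_id: Counter(enrich(text)) for doc_id, text in corpus.items()}
    doc_freq = Counter(term for counts in tokens.values() for term in counts)
    idf = {term: math.log(len(corpus) / n) + 1.0 for term, n in doc_freq.items()}
    return tokens, idf


def search(query, tokens, idf):
    # Score each document by summing term frequency times IDF over query terms.
    scores = {
        doc_id: sum(counts[t] * idf.get(t, 0.0) for t in enrich(query))
        for doc_id, counts in tokens.items()
    }
    # Highest score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Step 4: evaluation (precision at rank 1 against hypothetical labels).
def precision_at_1(labelled_queries, tokens, idf):
    hits = sum(search(q, tokens, idf)[0] == rel for q, rel in labelled_queries)
    return hits / len(labelled_queries)


TOKENS, IDF = build_index(CORPUS)
QUERIES = [("vaccine immune response", "d2"), ("search the literature", "d3")]
```

Calling `precision_at_1(QUERIES, TOKENS, IDF)` scores the toy retriever against the hypothetical labels. Keeping each stage as a separate function mirrors the caption's point that a practitioner can swap in an existing dataset or model at any one step without rebuilding the others.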
Fig. 2.
The process of systematic review construction (left) and example systems that assist with several steps (right).
