Viruses. 2022 Dec 11;14(12):2761.
doi: 10.3390/v14122761.

Clinical Application of Detecting COVID-19 Risks: A Natural Language Processing Approach

Syed Raza Bashir et al. Viruses.

Abstract

Detecting COVID-19 risk factors for clinical application is a challenging task. Existing named entity recognition models are usually trained on a limited set of named entities. Besides clinical factors, non-clinical factors such as social determinants of health (SDoH) are also important for studying an infectious disease. In this paper, we propose a generalizable machine learning approach that improves on previous efforts by recognizing a large number of clinical risk factors and SDoH. The novelty of the proposed method lies in its combination of several deep neural network components, including the BiLSTM-CNN-CRF method and a transformer-based embedding layer. Experimental results on a cohort of COVID-19 data prepared from PubMed articles show that the proposed approach outperforms the methods it was compared against, with a performance gain of about 1-5% in macro- and micro-average F1 scores. Clinical practitioners and researchers can use this approach to obtain accurate information regarding clinical risks and SDoH factors, and can use the pipeline as a tool to help end the pandemic or to prepare for future pandemics.
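
The difference between the macro- and micro-average F1 scores mentioned above can be illustrated with a minimal sketch. It assumes a simplified token-level setting in which every token has exactly one gold label and one predicted label; the function names and the example labels (SYMPTOM, RACE) are hypothetical and are not taken from the paper, whose own evaluation code is not shown here.

```python
from collections import Counter


def f1(tp, fp, fn):
    """F1 from raw counts; returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_micro_f1(gold, pred):
    """Return (macro_f1, micro_f1) for two equal-length label lists.

    Assumption: one label per token. Real NER evaluation is often done
    at the entity-span level and usually excludes the "outside" label.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p where it was wrong
            fn[g] += 1  # missed the gold label g
    labels = sorted(set(gold) | set(pred))
    # Macro: average the per-label F1 scores, so rare labels count equally.
    macro = sum(f1(tp[l], fp[l], fn[l]) for l in labels) / len(labels)
    # Micro: pool the counts over all labels, so frequent labels dominate.
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return macro, micro


# Hypothetical labels, for illustration only.
gold = ["SYMPTOM", "SYMPTOM", "SYMPTOM", "RACE"]
pred = ["SYMPTOM", "SYMPTOM", "RACE", "RACE"]
macro, micro = macro_micro_f1(gold, pred)
print(f"macro={macro:.3f} micro={micro:.3f}")
```

Because the macro average weights every entity type equally, it is sensitive to performance on rare types such as individual SDoH categories, while the micro average mostly reflects the frequent types.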

Keywords: COVID-19; clinical; de-identification; named entities; non-clinical; pipeline; social determinants of health.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. Biomedical Pipeline.
Figure 2. Named Entity Recognition algorithm.
Figure 3. Most common symptoms of COVID-19 patients. The number at the top of each bar represents the number of times the symptom was mentioned in the test set.
Figure 4. COVID-19 hospitalization by race and ethnicity.
Figure 5. Biomedical entities recognized by the proposed pipeline.
