Comput Biol Chem. 2023 Feb;102:107808.
doi: 10.1016/j.compbiolchem.2022.107808. Epub 2023 Jan 2.

Extraction of knowledge graph of Covid-19 through mining of unstructured biomedical corpora

Sudhakaran Gajendran et al. Comput Biol Chem. 2023 Feb.

Abstract

The number of published biomedical articles has increased rapidly over the years. There are currently about 30 million articles in PubMed and over 25 million mentions in Medline. Among the fundamental text-mining tasks, Biomedical Named Entity Recognition (BioNER) and Biomedical Relation Extraction (BioRE) are the most essential for analysing this literature. In the biomedical domain, knowledge graphs are used to visualize the relationships between entities such as proteins, chemicals, and diseases. The search for treatments and potential cures for the novel coronavirus has led to a dramatic increase in scientific publications, but efficiently analysing, integrating, and using the related sources of information remains difficult. To combat a disease effectively during a pandemic such as COVID-19, the literature must be exploited quickly and efficiently. In this paper, we introduce a fully automated framework, consisting of BERT-BiLSTM models, a knowledge graph, and a representation learning model, that extracts the top diseases, chemicals, and proteins related to COVID-19 from the literature. The framework uses Named Entity Recognition (NER) models to recognise diseases, chemicals, and proteins, followed by Chemical-Disease and Chemical-Protein Relation Extraction models. Applying these models to the CORD-19 dataset, the system extracts entities and relations and builds a knowledge graph (KG) from them. It then performs representation learning on the KG to obtain embeddings of all entities, from which it identifies the diseases, chemicals, and proteins most closely related to COVID-19.
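The NER-then-RE stage described above can be illustrated with a minimal sketch. It assumes a token tagger (such as a BERT-BiLSTM model) has already produced one BIO tag per word; the tag names (`B-Chemical`, `I-Disease`, ...), the helpers `bio_to_spans` and `candidate_pairs`, and the example sentence are hypothetical, not taken from the paper or CORD-19. The sketch groups tags into entity mentions and pairs every chemical with every disease in the same sentence, which is one common way to form candidates for a relation-extraction classifier; the abstract does not say how the authors generate candidate pairs.

```python
# Hypothetical sketch: turn word-level BIO tags (as a BERT-BiLSTM NER model
# might emit) into entity mentions, then form candidate pairs for relation
# extraction. Tag names, helpers, and the sentence are illustrative assumptions.


def bio_to_spans(tokens, tags):
    """Group BIO-tagged tokens into (mention_text, entity_type) tuples."""
    spans = []
    current_tokens, current_type = [], None
    for token, tag in zip(tokens, tags):
        # "B-X" always opens a new mention; an "I-X" that does not continue
        # the current type is treated leniently as the start of a new mention.
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            current_tokens.append(token)  # continue the open mention
        else:  # "O" closes any open mention
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None
    if current_tokens:  # a mention that runs to the end of the sentence
        spans.append((" ".join(current_tokens), current_type))
    return spans


def candidate_pairs(spans, head_type, tail_type):
    """Pair every head_type mention with every tail_type mention in the sentence."""
    heads = [text for text, kind in spans if kind == head_type]
    tails = [text for text, kind in spans if kind == tail_type]
    return [(h, t) for h in heads for t in tails]


if __name__ == "__main__":
    tokens = ["Chloroquine", "was", "tested", "against", "COVID-19", "pneumonia", "."]
    tags = ["B-Chemical", "O", "O", "O", "B-Disease", "I-Disease", "O"]
    spans = bio_to_spans(tokens, tags)
    print(spans)  # [('Chloroquine', 'Chemical'), ('COVID-19 pneumonia', 'Disease')]
    print(candidate_pairs(spans, "Chemical", "Disease"))  # [('Chloroquine', 'COVID-19 pneumonia')]
```

BERT tokenizers split words into WordPiece subwords, so in a real pipeline the subword predictions would first have to be mapped back to whole words (for example, by keeping the tag of each word's first subword); this sketch assumes that alignment has already been done.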
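The knowledge-graph and representation-learning stages can be sketched in the same spirit. The abstract does not name the embedding model, so this sketch swaps in TransE, a standard translation-based KG embedding that learns vectors so that head + relation lies close to tail, trained with a margin ranking loss against triples whose tail has been randomly corrupted. The triples, relation labels (`chem_disease`, `chem_protein`), entity types, and helper names are hypothetical toy data, not extractions reported in the paper. Ranking "related" entities by cosine similarity to the COVID-19 embedding is likewise an assumption about how the top diseases, chemicals, and proteins could be selected.

```python
import math
import random

# Hypothetical toy triples standing in for relation-extraction output.
# They are illustrative only and are NOT results reported in the paper.
TRIPLES = [
    ("chloroquine", "chem_disease", "COVID-19"),
    ("remdesivir", "chem_disease", "COVID-19"),
    ("dexamethasone", "chem_disease", "COVID-19"),
    ("chloroquine", "chem_disease", "malaria"),
    ("remdesivir", "chem_protein", "RdRp"),
    ("chloroquine", "chem_protein", "ACE2"),
]
# Entity types as the NER step would assign them (assumed).
ENTITY_TYPE = {
    "chloroquine": "Chemical", "remdesivir": "Chemical", "dexamethasone": "Chemical",
    "COVID-19": "Disease", "malaria": "Disease",
    "RdRp": "Protein", "ACE2": "Protein",
}


def _unit(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def train_transe(triples, dim=16, epochs=300, lr=0.01, margin=1.0, seed=0):
    """Learn TransE embeddings so that head + relation is close to tail."""
    rng = random.Random(seed)
    entities = sorted({h for h, _, _ in triples} | {t for _, _, t in triples})
    relations = sorted({r for _, r, _ in triples})
    E = {e: _unit([rng.uniform(-1, 1) for _ in range(dim)]) for e in entities}
    R = {r: _unit([rng.uniform(-1, 1) for _ in range(dim)]) for r in relations}
    for _ in range(epochs):
        for h, r, t in triples:
            t_neg = rng.choice(entities)  # corrupt the tail to get a negative triple
            if t_neg == t:
                continue
            pos = [E[h][i] + R[r][i] - E[t][i] for i in range(dim)]
            neg = [E[h][i] + R[r][i] - E[t_neg][i] for i in range(dim)]
            # Margin ranking loss on squared L2 distances:
            # max(0, margin + ||h + r - t||^2 - ||h + r - t_neg||^2)
            if margin + sum(x * x for x in pos) - sum(x * x for x in neg) > 0:
                for i in range(dim):  # one SGD step on the active loss
                    g = 2 * (pos[i] - neg[i])
                    E[h][i] -= lr * g
                    R[r][i] -= lr * g
                    E[t][i] += lr * 2 * pos[i]
                    E[t_neg][i] -= lr * 2 * neg[i]
        E = {e: _unit(v) for e, v in E.items()}  # keep entities on the unit sphere
    return E, R


def top_related(E, query, entity_type, wanted_type, k=3):
    """Rank entities of wanted_type by cosine similarity to the query entity."""
    q = E[query]
    scored = []
    for name, vec in E.items():
        if name == query or entity_type.get(name) != wanted_type:
            continue
        dot = sum(a * b for a, b in zip(q, vec))
        norms = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in vec))
        scored.append((name, dot / norms if norms else 0.0))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    E, _ = train_transe(TRIPLES)
    for kind in ("Disease", "Chemical", "Protein"):
        print(kind, top_related(E, "COVID-19", ENTITY_TYPE, kind))
```

With real data, the triples would come from the relation-extraction models applied to CORD-19; with six toy triples the ranking itself is not meaningful and only demonstrates the mechanics. For anything beyond a sketch, a maintained library such as PyKEEN provides tested TransE implementations with proper negative sampling and evaluation.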

Keywords: BERT; BiLSTM; Biomedical Named Entity Recognition (BioNER); Knowledge graph; Relation Extraction (RE); Representation learning.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Figures

Graphical abstract.
Fig. 1. Overall architecture diagram.
Fig. 2. Relation extraction module design diagram.
Fig. 3. BERT layer output.
Fig. 4. BiLSTM layer output.
Fig. 5. Chemicals and diseases from a sample BC5CDR row.
Fig. 6. Input IDs of a sample BC5CDR row.
Fig. 7. Fine-tuning SciBERT for BC5CDR.
Fig. 8. Portion of the knowledge graph focused on COVID-19.
Fig. 9. Top diseases related to COVID-19.
Fig. 10. Top chemicals related to COVID-19.
Fig. 11. Top proteins related to COVID-19.