BMC Bioinformatics. 2017 Mar 31;18(1):198.
doi: 10.1186/s12859-017-1609-9.

A neural joint model for entity and relation extraction from biomedical text

Fei Li et al. BMC Bioinformatics. 2017.

Abstract

Background: Extracting biomedical entities and their relations from text has important applications in biomedical research. Previous work has primarily used feature-based pipeline models for this task. Feature-based models require substantial feature engineering. Moreover, pipeline models may suffer from error propagation and cannot exploit the interactions between subtasks. We therefore propose a neural joint model that extracts biomedical entities and their relations simultaneously and alleviates these problems.

Results: Our model was evaluated on two tasks: extracting adverse drug events between drug and disease entities, and extracting resident relations between bacteria and location entities. Compared with the state-of-the-art systems for these tasks, our model improved the F1 scores on the first task by 5.1% in entity recognition and 8.0% in relation extraction, and the F1 score on the second task by 9.2% in relation extraction.

Conclusions: The proposed model achieves competitive performance with less feature engineering. We demonstrate that a model based on neural networks is effective for biomedical entity and relation extraction. In addition, parameter sharing offers neural models an alternative way to process the two subtasks jointly. Our work can facilitate research on biomedical text mining.

Keywords: Biomedical text; Entity recognition; Joint model; Neural network; Relation extraction.

Figures

Fig. 1
The CNN for extracting character-level representations. A rectangular grid indicates a vector and a square indicates one dimension of this vector, so character embeddings or representations can be denoted as n-dimensional vectors. Shaded rectangular grids indicate special padding vectors.
Fig. 2
The Bi-LSTM-RNN for biomedical entity recognition. Rectangular grids indicate vectors of feature embeddings or representations. At the bottom, three kinds of vectors are concatenated and fed into LSTMs. Dashed arrow lines denote bottom-up computations along the network framework and solid arrow lines denote left-to-right computations along the sentence
Fig. 3
The Bi-LSTM-RNN for relation classification. The input sentence is tokenized before it is analyzed by a dependency parser. Tokens are indexed by Arabic numerals. The basic (a.k.a. projective) dependency style is used to build the tree. The bold lines in the tree denote the shortest dependency path (SDP) between “gliclazide” and “hepatitis”, whose lowest common ancestor is “induced”. x_i indicates the input vector of an LSTM unit as shown in Eq. 6, where i corresponds to the index of a token. In the Bi-LSTM-RNN layer, solid arrow lines denote bottom-up and top-down computations along the SDP in the dependency tree. The bottom-up and top-down hidden states h_a and h_b are listed in Eq. 8.
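
The shortest dependency path described in the Fig. 3 caption can be illustrated with a minimal sketch. It assumes the dependency tree is given as a list of head indices, with -1 marking the root. The four-token sentence and its hand-written parse are hypothetical and are not the sentence shown in the figure. The function names are illustrative and do not come from the paper.

```python
def path_to_root(heads, i):
    """Return the token indices from token i up to the root, inclusive."""
    path = [i]
    while heads[path[-1]] != -1:
        path.append(heads[path[-1]])
    return path


def shortest_dependency_path(heads, a, b):
    """Return (lca, path), where path runs from a up to the lowest
    common ancestor and then down to b."""
    up_a = path_to_root(heads, a)
    up_b = path_to_root(heads, b)
    ancestors_a = set(up_a)
    # The first ancestor of b that is also an ancestor of a is the LCA.
    lca = next(n for n in up_b if n in ancestors_a)
    left = up_a[: up_a.index(lca) + 1]   # a ... lca
    right = up_b[: up_b.index(lca)]      # b ... child of lca
    return lca, left + right[::-1]


# Hypothetical toy sentence with a hand-written parse (assumption):
# "induced" is the root; "gliclazide" and "hepatitis" attach to it;
# "acute" modifies "hepatitis".
tokens = ["gliclazide", "induced", "acute", "hepatitis"]
heads = [1, -1, 3, 1]

lca, sdp = shortest_dependency_path(heads, 0, 3)
print(tokens[lca], [tokens[i] for i in sdp])
# prints: induced ['gliclazide', 'induced', 'hepatitis']
```

In this toy parse the modifier "acute" lies off the path, so a model that reads only the SDP would see just the two entities and the word linking them.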
