J Biomed Inform. 2020 Mar:103:103381.
doi: 10.1016/j.jbi.2020.103381. Epub 2020 Jan 28.

Combinatorial feature embedding based on CNN and LSTM for biomedical named entity recognition

Free article

Minsoo Cho et al. J Biomed Inform. 2020 Mar.

Abstract

With the rapid advancement of technology and the need to process large amounts of data, biomedical Named Entity Recognition (NER) has become an essential technique for information extraction in the biomedical field. NER is a sequence-labeling task that has been approached with dictionary-based, rule-based, machine learning-based, and deep learning-based methods. However, because existing biomedical NER models are insufficient to handle new and unseen entity types in the growing body of biomedical data, more effective and accurate models are being widely researched. Among biomedical NER models that use deep learning, only a few studies have addressed the design of high-level features in the embedding layer. We therefore propose a deep learning NER model that represents biomedical word tokens effectively through a combinatorial feature embedding. The proposed model is based on a Bidirectional Long Short-Term Memory (bi-LSTM) network with a Conditional Random Field (CRF), and is enhanced by integrating two different character-level representations, one extracted by a Convolutional Neural Network (CNN) and one by a bi-LSTM. Additionally, an attention mechanism is applied so that the model focuses on the relevant tokens in the sentence, which alleviates the long-term dependency problem of the LSTM and allows effective recognition of entities. The proposed model was evaluated on two benchmark datasets, JNLPBA and NCBI-Disease, and compared with existing models. In this comparison, the proposed model achieved relatively higher performance on NCBI-Disease, with an F1-score of 86.93%, and competitive performance on JNLPBA, with an F1-score of 75.31%.
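
To make the reported metric concrete, the following is a minimal sketch of how an entity-level F1-score can be computed for a sequence-labeling task such as NER. It assumes BIO-style tags (`B-Disease`, `I-Disease`, `O`) and exact-match scoring of entity spans; the abstract does not state the tagging scheme or the evaluation script used, so both are assumptions, and the helper names `bio_to_spans` and `entity_f1` are hypothetical.

```python
from typing import List, Set, Tuple

# (start index, end index exclusive, entity type); an illustrative convention
Span = Tuple[int, int, str]


def bio_to_spans(tags: List[str]) -> Set[Span]:
    """Collect entity spans from one BIO-tagged sentence (assumed scheme)."""
    spans: Set[Span] = set()
    start, etype = None, None
    # A trailing "O" sentinel closes an entity that ends the sentence.
    for i, tag in enumerate(tags + ["O"]):
        continues = tag.startswith("I-") and etype == tag[2:]
        if start is not None and not continues:
            spans.add((start, i, etype))
            start, etype = None, None
        if tag.startswith("B-"):
            start, etype = i, tag[2:]
        # An "I-" tag with no open entity is ignored in this sketch.
    return spans


def entity_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Exact-match entity-level F1 over a list of sentences."""
    gold_spans, pred_spans = set(), set()
    for idx, (g, p) in enumerate(zip(gold, pred)):
        # Prefix each span with its sentence index so spans stay distinct.
        gold_spans |= {(idx,) + s for s in bio_to_spans(g)}
        pred_spans |= {(idx,) + s for s in bio_to_spans(p)}
    tp = len(gold_spans & pred_spans)
    precision = tp / len(pred_spans) if pred_spans else 0.0
    recall = tp / len(gold_spans) if gold_spans else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy example: two gold entities, one of which is predicted correctly.
gold = [["O", "B-Disease", "I-Disease", "O", "B-Disease"]]
pred = [["O", "B-Disease", "I-Disease", "O", "O"]]
score = entity_f1(gold, pred)  # precision 1.0, recall 0.5
```

Under exact-match scoring a prediction counts only if its boundaries and type both agree with the gold span, which is why a partially recognized entity would lower both precision and recall.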

Keywords: Attention mechanism; Biomedical named entity recognition; Feature embedding; Information retrieval.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
