Learning adaptive representations for entity recognition in the biomedical domain

Ivano Lauriola et al. J Biomed Semantics. 2021 May 17;12(1):10. doi: 10.1186/s13326-021-00238-0.

Abstract

Background: Named Entity Recognition is a common task in Natural Language Processing applications, whose purpose is to recognize named entities in textual documents. Several systems exist to solve this task in the biomedical domain, based on Natural Language Processing techniques and Machine Learning algorithms. A crucial step in these applications is the choice of the representation that describes the data. Several representations have been proposed in the literature, some of which rely on strong domain knowledge and consist of features manually defined by domain experts. These representations usually describe the problem well, but they require substantial human effort and annotated data. On the other hand, general-purpose representations such as word embeddings require no human domain knowledge, but they may be too general for a specific task.

Results: This paper investigates methods to learn the best representation directly from data, by combining several knowledge-based representations and word embeddings. Two mechanisms are considered to perform the combination: neural networks and Multiple Kernel Learning. To this end, we use a hybrid architecture for biomedical entity recognition which integrates dictionary look-up (also known as gazetteers) with machine learning techniques. Results on the CRAFT corpus clearly show the benefits of the proposed algorithm in terms of F1 score.
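The Multiple Kernel Learning route mentioned above amounts to learning a weighted sum of base kernel matrices, one kernel per representation. A minimal numpy sketch of the combination step follows; the toy matrices and hand-fixed weights are illustrative (an actual MKL algorithm learns the weights from data), not the paper's setup:

```python
import numpy as np

def combine_kernels(kernels, weights):
    """Combine base kernel matrices as a convex sum K = sum_p mu_p * K_p,
    the standard form used in Multiple Kernel Learning."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize onto the simplex
    return sum(w * K for w, K in zip(weights, kernels))

# Two toy base kernels over 3 examples, e.g. one from a word-level
# representation and one from a character-level representation.
K_word = np.array([[1.0, 0.2, 0.1],
                   [0.2, 1.0, 0.3],
                   [0.1, 0.3, 1.0]])
K_char = np.eye(3)

K = combine_kernels([K_word, K_char], weights=[2.0, 1.0])
```

Because each base kernel is positive semi-definite and the weights are non-negative, the combined matrix is again a valid kernel and can be fed to any kernel machine (e.g. an SVM).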

Conclusions: Our experiments show that the principled combination of general, domain-specific, word-level, and character-level representations improves the performance of entity recognition. We also discuss the contribution of each representation to the final solution.

Keywords: Ensemble; Kernel methods; Named entity recognition; Neural networks.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
A depiction of a simple neural network. Arrows define the non-linear transformations from the input layer (green) to the output layer (red). Hidden layers are shown in blue. Circles denote atomic features (or neurons)
Fig. 2
A Neural Network architecture to combine and integrate different sources and representations. Each base network is trained and validated on a single base representation. At the top of the network, a shared layer combines the outputs of the base networks. Green circles denote the input features. In the example, the network combines the outputs of P different neural networks, i.e. P different representations
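The architecture in Fig. 2 can be sketched as a forward pass: each of P base networks maps its own representation to a hidden vector, and a shared top layer combines the P outputs into a final prediction. The dimensions, random weights, and single hidden layer below are illustrative assumptions (the abstract does not specify the real model's sizes or training procedure):

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# P = 3 base representations with different dimensionalities,
# e.g. word2vec, affix features, character spectrum (sizes are toys).
dims = [50, 20, 100]
inputs = [rng.normal(size=d) for d in dims]

hidden = 16
# One base network per representation (a single hidden layer each).
base_W = [rng.normal(size=(hidden, d)) * 0.1 for d in dims]
base_out = [relu(W @ x) for W, x in zip(base_W, inputs)]

# The shared layer sees the concatenated base outputs and
# scores the two classes: entity vs. not-entity.
shared_W = rng.normal(size=(2, hidden * len(dims))) * 0.1
logits = shared_W @ np.concatenate(base_out)
probs = np.exp(logits) / np.exp(logits).sum()  # softmax over the two classes
```

In the paper's scheme each base network is first trained and validated on its own representation; only the combination happens in the shared layer, which is what the concatenation-plus-linear-map above mimics.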
Fig. 3
A depiction of the hybrid BNER system. A term is retrieved only if it is selected by the dictionary look-up and accepted by the ML algorithm
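The two-stage decision in Fig. 3 can be sketched in a few lines: the dictionary look-up proposes candidate terms, and the ML stage filters them, so a term is retrieved only when both agree. The gazetteer and the stand-in classifier below are illustrative toys, not the paper's actual OGER dictionary or trained model:

```python
# Toy gazetteer of known biomedical terms (illustrative only).
GAZETTEER = {"deoxyribonucleic acid", "p53", "ribosome"}

def ml_accepts(term: str, context: str) -> bool:
    """Stand-in for the trained ML classifier: here, a trivial
    keyword check on the surrounding context."""
    return any(cue in context for cue in ("gene", "cell", "molecule"))

def recognize(term: str, context: str) -> bool:
    """A term is retrieved only if the dictionary look-up selects it
    AND the ML stage accepts it (the hybrid scheme of Fig. 3)."""
    candidate = term.lower()
    return candidate in GAZETTEER and ml_accepts(candidate, context)
```

For example, `recognize("Deoxyribonucleic acid", "the molecule that stores genetic information")` passes both stages, while a term missing from the gazetteer is rejected before the classifier ever sees it.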
Fig. 4
Depiction of the proposed system and other feature-aggregation schemes. The OGER annotator retrieves candidate entities from input texts (Deoxyribonucleic acid in the example). Then, different sets of features associated with the candidate entity are computed (e.g. affixes and spectrum) or extracted (word2vec), producing multiple feature vectors. The feature-aggregation scheme then defines the final representation as (i) a single base representation, (ii) the concatenation of the base feature vectors, or (iii) the principled combination obtained through an MKL algorithm or a NN (shown in Fig. 2). The resulting representation is fed to a classifier that selects the final class (entity or not)
Fig. 5
Contribution of the base representations in the BioNER system

