J Biomed Semantics. 2021 Aug 18;12(1):16.
doi: 10.1186/s13326-021-00248-y.

Syntax-based transfer learning for the task of biomedical relation extraction

Joël Legrand et al. J Biomed Semantics.

Abstract

Background: Transfer learning aims to improve machine learning performance on a problem by reusing labeled data originally designed for a related but distinct problem. In particular, domain adaptation consists, for a specific task, in reusing training data developed for the same task but in a distinct domain. This is particularly relevant to applications of deep learning in Natural Language Processing, because they usually require large annotated corpora that may not exist for the targeted domain but do exist for related domains.

Results: In this paper, we experiment with transfer learning for the task of relation extraction from biomedical texts, using the TreeLSTM model. We empirically show the impact of TreeLSTM, alone and with domain adaptation, by obtaining better performance than the state of the art on two biomedical relation extraction tasks and equal performance on two others, for which little annotated data is available. Furthermore, we propose an analysis of the role that syntactic features may play in transfer learning for relation extraction.

Conclusion: Given the difficulty of manually annotating corpora in the biomedical domain, the proposed transfer learning method offers a promising alternative for achieving good relation extraction performance in domains with scarce resources. Our analysis also illustrates the role that syntax plays in transfer learning, underlining the importance in this domain of favoring approaches that embed syntactic features.

Keywords: Biomedical relation extraction; Deep learning; Transfer learning.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Example of a relationship typed as Weak Confidence Association between two named entities, a SNP (single nucleotide polymorphism) and a Phenotype, from the SNPPhenA corpus
Fig. 2
The MCCNN model with three channels and two CNN kernels of size 2 (CNN2) and 3 (CNN3). Red words correspond to the entities
Fig. 3
The TreeLSTM model. Each node takes as input the representation of its children. Red words correspond to the entities
Fig. 4
Dependency parse tree of a sentence from SNPPhenA expressing a relation between the entities rs429358 and dementia. The shortest dependency path between the two entities is shown in bold
Fig. 5
Examples of patterns and of their instantiation in corpora. Red words correspond to the entities
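
The shortest dependency path mentioned in the caption of Fig. 4 can be illustrated with a minimal sketch. It assumes the parse tree is given as a list of head indices (one per token, -1 for the root), which is a common representation but not necessarily the one used by the authors. The function name `shortest_dependency_path`, the toy sentence, and its head indices are all hypothetical and are not taken from the SNPPhenA corpus.

```python
from collections import deque


def shortest_dependency_path(heads, source, target):
    """Return the token indices on the path between two tokens of a parse tree.

    heads[i] is the index of the syntactic head of token i, or -1 for the root.
    The tree is treated as an undirected graph and searched breadth-first.
    """
    # Build an undirected adjacency list from the head pointers.
    neighbors = {i: [] for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            neighbors[child].append(head)
            neighbors[head].append(child)

    # Breadth-first search, remembering from which token each token was reached.
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break
        for nxt in neighbors[node]:
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)

    if target not in previous:
        return []  # the two tokens are not connected

    # Walk back from the target to the source, then reverse.
    path = []
    node = target
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


# Hypothetical toy sentence with placeholder entities (illustration only).
tokens = ["GENE_A", "is", "associated", "with", "DISEASE_B"]
heads = [2, 2, -1, 4, 2]  # assumed head index of each token; -1 marks the root
path = shortest_dependency_path(heads, 0, 4)
print([tokens[i] for i in path])
```

Because a dependency parse is a tree, the path between two tokens is unique, so breadth-first search is used here only as a simple way to recover it.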
