BMC Bioinformatics. 2015;16 Suppl 10(Suppl 10):S6.

doi: 10.1186/1471-2105-16-S10-S6. Epub 2015 Jul 13.

The contribution of co-reference resolution to supervised relation detection between bacteria and biotopes entities

Thomas Lavergne, Cyril Grouin, Pierre Zweigenbaum

PMID: 26201352
PMCID: PMC4511182
DOI: 10.1186/1471-2105-16-S10-S6

Thomas Lavergne et al. BMC Bioinformatics. 2015.

Abstract

Background: The acquisition of knowledge about relations between bacteria and their locations (habitats and geographical locations) in short texts about bacteria, as defined in the BioNLP-ST 2013 Bacteria Biotope task, depends on the detection of co-reference links between mentions of entities of each of these three types. To our knowledge, no participant in this task has investigated this aspect of the situation. The present work specifically addresses issues raised by this situation: (i) how to detect these co-reference links and associated co-reference chains; (ii) how to use them to prepare positive and negative examples to train a supervised system for the detection of relations between entity mentions; (iii) what context around which entity mentions contributes to relation detection when co-reference chains are provided.
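
Question (ii) can be pictured with a minimal sketch. It is not the authors' implementation: the `Mention` record, the chain ids, the encoding of gold relations as pairs of chain ids, and the `max_dist` parameter are all assumptions made for illustration, and the toy mentions (including the habitat "soil") are invented. A bacterium mention and a location mention form a candidate pair when they are at most `max_dist` sentences apart, and the pair is labeled positive when the gold standard relates their co-reference chains.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Mention:
    text: str
    etype: str      # "Bacteria", "Habitat" or "Geographical"
    sentence: int   # index of the sentence containing the mention
    chain: int      # id of the co-reference chain (hypothetical encoding)


def make_examples(mentions, gold_relations, max_dist=1):
    """Return (bacterium, location, label) triples for mention pairs that are
    at most `max_dist` sentences apart. A pair is positive when the gold
    relations contain the pair of chain ids of the two mentions."""
    bacteria = [m for m in mentions if m.etype == "Bacteria"]
    locations = [m for m in mentions if m.etype != "Bacteria"]
    examples = []
    for b, loc in product(bacteria, locations):
        if abs(b.sentence - loc.sentence) <= max_dist:
            label = (b.chain, loc.chain) in gold_relations
            examples.append((b, loc, label))
    return examples


# Toy data: "C. coli" shares a chain with "Campylobacter coli".
mentions = [
    Mention("Campylobacter coli", "Bacteria", 0, chain=1),
    Mention("C. coli", "Bacteria", 3, chain=1),
    Mention("pigs", "Habitat", 3, chain=2),
    Mention("soil", "Habitat", 7, chain=3),
]
examples = make_examples(mentions, gold_relations={(1, 2)}, max_dist=0)
```

With `max_dist=0`, only the same-sentence pair ("C. coli", "pigs") is kept, and it is positive because chain 1 is related to chain 2. Because the label is looked up at the chain level, the relation also holds for the full name "Campylobacter coli", which is the kind of transfer that co-reference chains make possible.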

Results: We present experiments and results obtained both with gold entity mentions (task 2 of BioNLP-ST 2013) and with automatically detected entity mentions (end-to-end system, in task 3 of BioNLP-ST 2013). Our supervised mention detection system uses a linear chain Conditional Random Fields classifier, and our relation detection system relies on a Logistic Regression (aka Maximum Entropy) classifier. They use a set of morphological, morphosyntactic and semantic features. To minimize false inferences, co-reference resolution applies a set of heuristic rules designed to optimize precision. They take into account the types of the detected entity mentions, and take advantage of the didactic nature of the texts of the corpus, where a large proportion of bacteria naming is fairly explicit (although natural referring expressions such as "the bacteria" are common). The resulting system achieved a 0.495 F-measure on the official test set when taking as input the gold entity mentions, and a 0.351 F-measure when taking as input entity mentions predicted by our CRF system, both of which are above the best BioNLP-ST 2013 participant system.
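
The abstract does not list the heuristic rules themselves, so the following is only a sketch of what one precision-oriented rule could look like, modeled on the ***Campylobacter coli*** / ***C. coli*** case shown in Figure 3. The function names `is_abbreviation_of` and `link_chains` and the greedy chaining strategy are assumptions, not the paper's method.

```python
import re


def is_abbreviation_of(short, full):
    """True when `short` looks like `full` with the genus reduced to an
    initial, e.g. "C. coli" for "Campylobacter coli"."""
    m = re.fullmatch(r"([A-Z])\.\s*(.+)", short.strip())
    if m is None:
        return False
    parts = full.strip().split(None, 1)
    if len(parts) != 2:
        return False
    genus, rest = parts
    return genus[0] == m.group(1) and rest.lower() == m.group(2).lower()


def link_chains(mentions):
    """Greedy, precision-first chaining: a mention joins the first chain that
    holds an identical string or a full form it abbreviates; otherwise it
    starts a new chain."""
    chains = []
    for text in mentions:
        for chain in chains:
            if any(text == other or is_abbreviation_of(text, other)
                   for other in chain):
                chain.append(text)
                break
        else:
            chains.append([text])
    return chains


chains = link_chains(["Campylobacter coli", "Yersinia pestis", "C. coli"])
```

A rule of this kind links only when the genus initial and the rest of the name both match, which favors precision over recall. It would not resolve definite noun phrases such as "the bacteria", which need separate handling.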

Conclusions: We show that co-reference resolution substantially improves over a baseline system which does not use co-reference information: about 3.5 F-measure points on the test corpus for the end-to-end system (5.5 points on the development corpus) and 7 F-measure points on both development and test corpora when gold mentions are used. While this outperforms the best published system on the BioNLP-ST 2013 Bacteria Biotope dataset, we consider that it provides mostly a stronger baseline from which more work can be started. We also emphasize the importance and difficulty of designing a comprehensive gold standard co-reference annotation, which we explain is a key point to further progress on the task.
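
For reference, the F-measure figures above combine precision and recall into a single score. The sketch below assumes the balanced variant (`beta=1`, the harmonic mean of precision and recall); the abstract does not state the weighting, and the function name and signature are illustrative.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.
    beta=1 gives the balanced F1 score; returns 0.0 when both are zero."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

Under this definition, a gain of "3.5 F-measure points" corresponds to an increase of 0.035 in the returned value.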

Figures

**Figure 1**
**Global view of the process**.

**Figure 2**
**Annotated corpus excerpt**. Excerpt from the annotated corpus (BTID-10087 file, training corpus) using the BRAT Rapid Annotation Tool.

**Figure 3**
**Co-reference between similar Bacteria mentions**. Co-reference relation between graphically similar Bacteria entity mentions ***Campylobacter coli*** and ***C. coli***: these mentions and this instance of co-reference relation were given in the gold standard annotations provided with the training corpus of Task 2. This co-reference relation is instrumental in the detection of the Localization relations between ***Campylobacter coli*** and its habitats ***pigs***, ***birds***, and ***surface water***. Ellipsis ***(...)*** shows skipped material.

**Figure 4**
**Co-reference through definite noun phrase anaphora to a Bacteria-type mention**. Co-reference relation through definite noun phrase anaphora to a Bacteria-type mention: ***The organism*** refers to Bacteria-type mention ***Yersinia pestis***. Since ***The organism*** is not the name of a bacterium, it was not annotated in the gold standard annotations provided with the training corpus of Task 2, nor was the corresponding co-reference relation. Since sentence ***The organism ... infection*** asserts the locations where ***The organism*** can be found, it is important to capture the co-reference between ***The organism*** and ***Yersinia pestis*** to detect the Localization relations between this bacterium and its habitats. Ellipsis ***(...)*** shows skipped material.

**Figure 5**
**Impact on relation detection of the maximum distance (in sentences) between entity mentions when training (training corpus)**. Variation of performance with the maximum distance s between entity mentions at training time.

**Figure 6**
**Impact on relation detection of the maximum distance (in sentences) between entity mentions when decoding (development corpus)**. Variation of performance with the maximum distance s between entity mentions at decoding time.

**Figure 7**
**Impact on relation detection of the size of the context surrounding each mention**. Variation of performance with the size n of the left and right contexts used to collect features for each entity mention on the development corpus.

Cited by

Extracting medications and associated adverse drug events using a natural language processing system combining knowledge base and deep learning.
Chen L, Gu Y, Ji X, Sun Z, Li H, Gao Y, Huang Y. J Am Med Inform Assoc. 2020 Jan 1;27(1):56-64. doi: 10.1093/jamia/ocz141. PMID: 31591641. Free PMC article.
Extraction of Information Related to Adverse Drug Events from Electronic Health Record Notes: Design of an End-to-End Model Based on Deep Learning.
Li F, Liu W, Yu H. JMIR Med Inform. 2018 Nov 26;6(4):e12159. doi: 10.2196/12159. PMID: 30478023. Free PMC article.
Relation Extraction from Clinical Narratives Using Pre-trained Language Models.
Wei Q, Ji Z, Si Y, Du J, Wang J, Tiryaki F, Wu S, Tao C, Roberts K, Xu H. AMIA Annu Symp Proc. 2020 Mar 4;2019:1236-1245. eCollection 2019. PMID: 32308921. Free PMC article.
Bridging semantics and syntax with graph algorithms-state-of-the-art of extracting biomedical relations.
Luo Y, Uzuner Ö, Szolovits P. Brief Bioinform. 2017 Jan;18(1):160-178. doi: 10.1093/bib/bbw001. Epub 2016 Feb 5. PMID: 26851224. Free PMC article.
A neural joint model for entity and relation extraction from biomedical text.
Li F, Zhang M, Fu G, Ji D. BMC Bioinformatics. 2017 Mar 31;18(1):198. doi: 10.1186/s12859-017-1609-9. PMID: 28359255. Free PMC article.
