Automatic discourse connective detection in biomedical text
- PMID: 22744958
- PMCID: PMC3422833
- DOI: 10.1136/amiajnl-2011-000775
Abstract
Objective: Relation extraction in biomedical text mining systems has largely focused on identifying clause-level relations, but increasingly sophisticated applications require the recognition of relations at the discourse level. A first step in identifying discourse relations is the detection of discourse connectives: words or phrases used in text to express discourse relations. In this study, supervised machine-learning approaches were developed and evaluated for automatically identifying discourse connectives in biomedical text.
Materials and methods: Two supervised machine-learning models (support vector machines and conditional random fields) were explored for identifying discourse connectives in biomedical literature. In-domain supervised machine-learning classifiers were trained on the Biomedical Discourse Relation Bank, an annotated corpus of discourse relations over 24 full-text biomedical articles (~112,000 word tokens), a subset of the GENIA corpus. Novel domain adaptation techniques were also explored to leverage the larger open-domain Penn Discourse Treebank (~1 million word tokens). The models were evaluated using the standard evaluation metrics of precision, recall and F1 scores.
Results and conclusion: Supervised machine-learning approaches can automatically identify discourse connectives in biomedical text, and the novel domain adaptation techniques yielded the best performance, an F1 score of 0.761. A demonstration version of the fully implemented classifier, BioConn, is available at: http://bioconn.askhermes.org.
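The abstract reports precision, recall and F1 but does not say how predicted connectives are matched against the gold annotations. The following is a minimal sketch of those three metrics, assuming exact-match scoring over token spans. The span representation, the function name precision_recall_f1 and the toy data are hypothetical and are not taken from the study.

```python
from typing import Set, Tuple

# Hypothetical representation: a connective is a (start_token, end_token) span.
Span = Tuple[int, int]


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Compute exact-match precision, recall and F1 over connective spans."""
    true_positives = len(gold & predicted)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (illustrative only): 3 gold connectives, 4 predictions, 2 correct.
gold = {(0, 1), (5, 6), (10, 12)}
predicted = {(0, 1), (5, 6), (8, 9), (14, 15)}
p, r, f = precision_recall_f1(gold, predicted)
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

With this toy data, two of the four predictions are correct and two of the three gold connectives are found, so precision is 1/2, recall is 2/3 and F1 is 4/7. A partial-match or token-level scheme would give different numbers. The exact-match choice here is only an assumption.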