Automatic discourse connective detection in biomedical text

Balaji Polepalli Ramesh¹, Rashmi Prasad, Tim Miller, Brian Harrington, Hong Yu

PMID: 22744958
PMCID: PMC3422833
DOI: 10.1136/amiajnl-2011-000775

Balaji Polepalli Ramesh et al. J Am Med Inform Assoc. 2012 Sep-Oct;19(5):800-8. doi: 10.1136/amiajnl-2011-000775. Epub 2012 Jun 28.

Affiliation

¹ Department of Electrical Engineering and Computer Science, University of Wisconsin-Milwaukee, Milwaukee, Wisconsin 53211, USA.

Abstract

Objective: Relation extraction in biomedical text mining systems has largely focused on identifying clause-level relations, but increasing sophistication demands the recognition of relations at the discourse level. A first step in identifying discourse relations is the detection of discourse connectives: words or phrases used in text to express discourse relations. In this study, supervised machine-learning approaches were developed and evaluated for automatically identifying discourse connectives in biomedical text.
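As a rough illustration of why detection needs a classifier rather than a word list, the sketch below matches a sentence against a tiny candidate lexicon. The lexicon, the function name `find_candidates`, and the example sentence are all hypothetical and are not taken from the study; the point is only that a string match yields candidates, some of which (such as "and" joining two noun phrases) do not express a discourse relation.

```python
import re
from typing import List, Tuple

# Hypothetical mini-lexicon of candidate connective strings (not from the paper).
CANDIDATE_CONNECTIVES = ["as a result", "in contrast", "because", "however", "and", "but"]


def find_candidates(sentence: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, text) for every lexicon match in the sentence."""
    # Longest entries first so multiword candidates win over their substrings.
    alternatives = sorted(CANDIDATE_CONNECTIVES, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(c) for c in alternatives) + r")\b",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), m.group(0).lower()) for m in pattern.finditer(sentence)]


# Invented example sentence: "and" links two noun phrases here, so it is a
# candidate by string match but not a discourse connective.
sentence = "IL-2 and IL-4 were induced, but expression decreased because the receptor was blocked."
candidates = find_candidates(sentence)
```

Each candidate would then be passed to a classifier that decides whether it functions as a connective in context.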

Materials and methods: Two supervised machine-learning models (support vector machines and conditional random fields) were explored for identifying discourse connectives in biomedical literature. In-domain supervised machine-learning classifiers were trained on the Biomedical Discourse Relation Bank, an annotated corpus of discourse relations over 24 full-text biomedical articles (~112,000 word tokens), a subset of the GENIA corpus. Novel domain adaptation techniques were also explored to leverage the larger open-domain Penn Discourse Treebank (~1 million word tokens). The models were evaluated using the standard evaluation metrics of precision, recall and F1 scores.
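The evaluation metrics named above can be sketched as follows. This is a minimal illustration that assumes connectives are scored as exact-match token spans; the abstract does not state the matching criterion, and the `Span` representation and the example spans are hypothetical.

```python
from typing import Set, Tuple

# Hypothetical representation: (first_token_index, last_token_index) of a connective.
Span = Tuple[int, int]


def precision_recall_f1(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 under exact span matching (an assumption)."""
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denominator = precision + recall
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Invented example: 4 gold connectives, 3 predictions, 2 of them correct.
gold = {(0, 0), (5, 6), (12, 12), (20, 20)}
predicted = {(0, 0), (5, 6), (15, 15)}
p, r, f = precision_recall_f1(gold, predicted)
```

With these invented spans, precision is 2/3, recall is 1/2, and F1 is 4/7.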

Results and conclusion: Supervised machine-learning approaches can automatically identify discourse connectives in biomedical text, and the novel domain adaptation techniques yielded the best performance, an F1 score of 0.761. A demonstration version of the fully implemented classifier BioConn is available at: http://bioconn.askhermes.org.

Conflict of interest statement

Competing interests: None.

Figures

**Figure 1**
Frequency of the tokens in the Biomedical Discourse Relation Bank (BioDRB) corpus and their frequency as connectives.

**Figure 3**
Performance of the *Hybrid* classifier over different distributions of the connectives. BioDRB, Biomedical Discourse Relation Bank; PDTB, Penn Discourse Treebank.
