A controlled greedy supervised approach for co-reference resolution on clinical text
- PMID: 23562650
- DOI: 10.1016/j.jbi.2013.03.007
Abstract
Identifying co-referent entity mentions in text is important for other natural language processing (NLP) tasks (e.g. event linking). However, this task, known as co-reference resolution, remains a complex problem, partly because of confusion over the different evaluation metrics and partly because well-researched existing methodologies do not perform well on new domains such as clinical records. This paper presents a variant of the influential mention-pair model for co-reference resolution. Using a series of linguistically and semantically motivated constraints, the proposed approach controls the generation of less-informative or sub-optimal training and test instances. The approach also introduces some aggressive greedy strategies in chain clustering. It has been tested on the official test corpus of the i2b2/VA 2011 challenge, where it achieves an unweighted average F1 score of 0.895, calculated from multiple evaluation metrics (MUC, B³ and CEAF scores). These results are comparable to the best systems of the challenge. What makes the proposed system distinct is that it also achieves high average F1 scores for each individual chain type (Test: 0.897, Person: 0.852, Problem: 0.855, Treatment: 0.884). Unlike other works, it obtains good scores on each of the individual metrics rather than being biased towards a particular metric.
Copyright © 2013 Elsevier Inc. All rights reserved.
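As an illustration of how an unweighted average F1 over several co-reference metrics can be computed, the following is a minimal sketch and not the authors' evaluation code. It implements only the B³ metric in its standard form, under the assumption that gold and predicted chains cover the same set of mentions (singletons included), and then takes the plain mean of three F1 values. The function names `b_cubed` and `unweighted_average_f1` are hypothetical, and the MUC and CEAF scores are assumed to be computed elsewhere.

```python
from statistics import mean


def b_cubed(gold_chains, pred_chains):
    """Return (precision, recall, f1) for the B-cubed metric.

    Assumption: both clusterings cover the same mentions, with
    singleton mentions given as one-element chains.
    """
    # Map every mention to the chain that contains it.
    gold_of = {m: frozenset(c) for c in gold_chains for m in c}
    pred_of = {m: frozenset(c) for c in pred_chains for m in c}
    mentions = gold_of.keys() & pred_of.keys()

    # Per-mention overlap between its gold chain and its predicted chain,
    # normalised by the predicted chain (precision) or gold chain (recall).
    precision = mean(len(gold_of[m] & pred_of[m]) / len(pred_of[m]) for m in mentions)
    recall = mean(len(gold_of[m] & pred_of[m]) / len(gold_of[m]) for m in mentions)

    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def unweighted_average_f1(muc_f1, b3_f1, ceaf_f1):
    """Plain (unweighted) mean of the three metric-level F1 scores."""
    return (muc_f1 + b3_f1 + ceaf_f1) / 3


# Toy example with made-up mention identifiers (not data from the paper).
gold = [{"m1", "m2", "m3"}, {"m4"}]
pred = [{"m1", "m2"}, {"m3", "m4"}]
b3_precision, b3_recall, b3_f1 = b_cubed(gold, pred)
```

Averaging without weights means that no single metric dominates the summary figure, which fits the abstract's point about not being biased towards a particular metric.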