Evaluation of two dependency parsers on biomedical corpus targeted at protein-protein interactions
- PMID: 16099201
- DOI: 10.1016/j.ijmedinf.2005.06.009
Abstract
We present an evaluation of Link Grammar and Connexor Machinese Syntax, two major broad-coverage dependency parsers, on a custom hand-annotated corpus consisting of sentences regarding protein-protein interactions. In the evaluation, we apply the notion of an interaction subgraph, which is the subgraph of a dependency graph expressing a protein-protein interaction. We measure the performance of the parsers for recovery of individual dependencies, fully correct parses, and interaction subgraphs. For Link Grammar, an open system that can be inspected in detail, we further perform a comprehensive failure analysis, report specific causes of error, and suggest potential modifications to the grammar. We find that both parsers perform worse on biomedical English than previously reported on general English. While Connexor Machinese Syntax significantly outperforms Link Grammar, the failure analysis suggests specific ways in which the latter could be modified for better performance in the domain.
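The three measures named in the abstract (individual dependencies, fully correct parses, and interaction subgraphs) can be illustrated with a minimal sketch. It is not the authors' evaluation code, and everything in it is an assumption made for illustration: dependencies are treated as undirected pairs of token indices, a parse counts as fully correct when every gold dependency is recovered, and an interaction subgraph counts as recovered when all of its dependencies appear in the parser output. The function names and the toy sentence are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Assumption: a dependency is an undirected link between two token indices.
Edge = FrozenSet[int]


def edges(pairs: Iterable[Tuple[int, int]]) -> Set[Edge]:
    """Build an undirected edge set from (token_i, token_j) pairs."""
    return {frozenset(p) for p in pairs}


def dependency_recall(gold: Set[Edge], parsed: Set[Edge]) -> float:
    """Fraction of gold dependencies that also appear in the parser output."""
    return len(gold & parsed) / len(gold) if gold else 1.0


def fully_correct(gold: Set[Edge], parsed: Set[Edge]) -> bool:
    """True when every gold dependency of the sentence was recovered."""
    return gold <= parsed


def subgraph_recovered(subgraph: Set[Edge], parsed: Set[Edge]) -> bool:
    """True when every dependency of the interaction subgraph was recovered."""
    return subgraph <= parsed


# Hypothetical toy sentence: "ProtA binds ProtB in vitro" (tokens 0..4).
gold = edges([(0, 1), (1, 2), (1, 3), (3, 4)])
parsed = edges([(0, 1), (1, 2), (2, 3), (3, 4)])  # one attachment error
interaction = edges([(0, 1), (1, 2)])  # ProtA - binds - ProtB

recall = dependency_recall(gold, parsed)
whole_parse_ok = fully_correct(gold, parsed)
interaction_ok = subgraph_recovered(interaction, parsed)
```

In this toy case the parser misattaches one token, so the parse is not fully correct, yet the dependencies that express the interaction are all present. This is why a subgraph-level measure can differ from a whole-sentence measure.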