Lexical adaptation of link grammar to the biomedical sublanguage: a comparative evaluation of three approaches
- PMID: 17134475
- PMCID: PMC1764446
- DOI: 10.1186/1471-2105-7-S3-S2
Abstract
Background: We study the adaptation of the Link Grammar Parser to the biomedical sublanguage, with a focus on domain terms not found in a general parser lexicon. Using two biomedical corpora, we implement and evaluate three approaches to addressing unknown words: automatic lexicon expansion, the use of morphological clues, and disambiguation using a part-of-speech tagger. We evaluate each approach separately for its effect on parsing performance and also consider combinations of these approaches.
Results: In addition to a 45% increase in parsing efficiency, we find that the best approach, incorporating information from a domain part-of-speech tagger, offers a statistically significant 10% relative decrease in error.
Conclusion: When available, a high-quality domain part-of-speech tagger is the best solution to unknown word issues in the domain adaptation of a general parser. In the absence of such a resource, surface clues can provide remarkably good coverage and performance when tuned to the domain. The adapted parser is available under an open-source license.
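As an illustration of the "surface clues" idea mentioned above, the following minimal Python sketch guesses a coarse word class for a word missing from a lexicon by looking at its suffix. It is not the authors' implementation and does not use the Link Grammar Parser's dictionary format; the lexicon, the suffix list, the minimum-stem rule, and the name `guess_word_class` are all hypothetical choices made for this example.

```python
# Hypothetical toy lexicon standing in for a general-English parser lexicon.
KNOWN_LEXICON = {"the": "determiner", "binds": "verb", "protein": "noun"}

# Hypothetical suffix rules, checked in order; illustrative only, not taken
# from the paper or from the Link Grammar Parser.
SUFFIX_RULES = [
    ("ly", "adverb"),
    ("ation", "noun"),
    ("ase", "noun"),
    ("ated", "verb"),
    ("izes", "verb"),
    ("ic", "adjective"),
    ("ous", "adjective"),
    ("al", "adjective"),
]


def guess_word_class(word, lexicon=KNOWN_LEXICON, default="noun"):
    """Return the lexicon class if the word is known, else a suffix-based guess."""
    token = word.lower()
    if token in lexicon:
        return lexicon[token]
    for suffix, word_class in SUFFIX_RULES:
        # Require a stem of at least three characters before the suffix
        # so that very short tokens are not matched by accident.
        if len(token) > len(suffix) + 2 and token.endswith(suffix):
            return word_class
    # Assumption: unknown domain terms without a matching suffix are
    # treated as nouns.
    return default


if __name__ == "__main__":
    for w in ["kinase", "cytoplasmic", "constitutively", "phosphorylated", "p53"]:
        print(w, guess_word_class(w))
```

In a setup like this, tuning to the domain would amount to revising the suffix list against domain text; a part-of-speech tagger would replace the suffix lookup entirely.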