Using UMLS Concept Unique Identifiers (CUIs) for word sense disambiguation in the biomedical domain
- PMID: 18693893
- PMCID: PMC2655788
Abstract
This paper explores the use of Concept Unique Identifiers (CUIs), as assigned by MetaMap, as features for a supervised learning approach to word sense disambiguation of biomedical text. We compare using the CUIs that occur in abstracts containing an instance of the target word with using the CUIs that occur in sentences containing an instance of the target word. We also experiment with frequency cutoffs for determining which CUIs to include as features. We find that a Naive Bayesian classifier whose features represent CUIs that occur two or more times in abstracts containing the target word attains accuracy 9% greater than Leroy and Rindflesch's approach, which includes features based on semantic types assigned by MetaMap. Our results are comparable to those of Joshi et al. and Liu et al., who use feature sets that do not contain biomedical information.
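The pipeline described in the abstract (collect the CUIs in each context, keep those at or above a frequency cutoff, train a Naive Bayes classifier) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The abstract does not specify the event model or smoothing, so the binary (present/absent) features and add-one smoothing here are assumptions, as is counting every occurrence of a CUI across all training contexts toward the cutoff. The CUI strings, sense labels, and function names are hypothetical placeholders, and the step of running MetaMap to obtain CUIs is assumed to have happened already.

```python
import math
from collections import Counter, defaultdict


def select_features(instances, min_count=2):
    """Keep CUIs that occur at least `min_count` times across all training contexts."""
    counts = Counter(cui for cuis, _ in instances for cui in cuis)
    return {cui for cui, n in counts.items() if n >= min_count}


def train(instances, features):
    """Estimate sense priors and per-sense feature probabilities (add-one smoothing)."""
    sense_counts = Counter(sense for _, sense in instances)
    present = defaultdict(Counter)  # sense -> CUI -> number of contexts containing it
    for cuis, sense in instances:
        for cui in set(cuis) & features:
            present[sense][cui] += 1
    total = len(instances)
    model = {}
    for sense, n in sense_counts.items():
        probs = {cui: (present[sense][cui] + 1) / (n + 2) for cui in features}
        model[sense] = (math.log(n / total), probs)
    return model


def classify(model, cuis, features):
    """Return the sense with the highest log-posterior for one context's CUIs."""
    observed = set(cuis) & features
    best_sense, best_score = None, float("-inf")
    for sense, (log_prior, probs) in model.items():
        score = log_prior
        for cui in features:
            p = probs[cui]
            score += math.log(p if cui in observed else 1.0 - p)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense


# Hypothetical training data for the ambiguous word "cold": each instance is
# (CUIs found in the abstract containing the target word, annotated sense).
training = [
    (["CUI_VIRUS", "CUI_COUGH", "CUI_FEVER"], "common_cold"),
    (["CUI_VIRUS", "CUI_COUGH"], "common_cold"),
    (["CUI_TEMPERATURE", "CUI_STORAGE"], "cold_temperature"),
    (["CUI_TEMPERATURE", "CUI_EXPOSURE"], "cold_temperature"),
]

features = select_features(training, min_count=2)  # cutoff of two, as in the abstract
model = train(training, features)
prediction = classify(model, ["CUI_COUGH", "CUI_VIRUS"], features)
print(prediction)
```

Comparing abstract-level with sentence-level context, as the abstract describes, would only change which text span supplies the CUI list for each instance; the feature selection and classification steps in this sketch stay the same.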
References
- Joshi M, Pedersen T, Maclin R. A Comparative Study of Support Vector Machines Applied to the Supervised Word Sense Disambiguation Problem in the Medical Domain. Proceedings of the Second Indian International Conference on Artificial Intelligence; 2005. pp. 3449–3468.
- Leroy G, Rindflesch TC. Effects of information and machine learning algorithms on word sense disambiguation with small datasets. International Journal of Medical Informatics. 2005;74(7–8):573–585.
- Witten IH, Frank E. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Morgan Kaufmann; 1999.