Proc IEEE Int Symp Bioinformatics Bioeng. 2017 Oct:2017:163-170.
doi: 10.1109/BIBE.2017.00-61. Epub 2018 Jan 11.

Knowledge-Based Biomedical Word Sense Disambiguation with Neural Concept Embeddings

Akm Sabbir et al. Proc IEEE Int Symp Bioinformatics Bioeng. 2017 Oct.

Abstract

Biomedical word sense disambiguation (WSD) is an important intermediate task in many natural language processing applications such as named entity recognition, syntactic parsing, and relation extraction. In this paper, we employ knowledge-based approaches that also exploit recent advances in neural word/concept embeddings to improve over the state of the art in biomedical WSD, using the public MSH WSD dataset [1] as the test set. Our methods involve weak supervision: we do not use any hand-labeled examples for WSD to build our prediction models; however, we employ an existing concept mapping program, MetaMap, to obtain our concept vectors. Over the MSH WSD dataset, our linear-time method (linear in the number of senses and the number of words in the test instance) achieves an accuracy of 92.24%, which is a 3% improvement over the best known results [2] obtained via unsupervised means. A more expensive approach that we developed relies on a nearest neighbor framework and achieves an accuracy of 94.34%, essentially cutting the error rate in half. Dense vector representations learned from unlabeled free text have recently been shown to benefit many language processing tasks, and our efforts show that biomedical WSD is no exception to this trend. For a complex and rapidly evolving domain such as biomedicine, building labeled datasets for larger sets of ambiguous terms may be impractical. Here, we show that weak supervision that leverages recent advances in representation learning can rival supervised approaches in biomedical WSD. However, external knowledge bases (here, sense inventories) play a key role in the improvements achieved.
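The abstract does not spell out how a sense is scored, so the following is only a minimal sketch of one common way a linear-time, embedding-based disambiguator can work. It assumes that word vectors and concept vectors live in the same space, that the context is summarized by averaging its word vectors, and that senses are ranked by cosine similarity. The function names, the toy two-dimensional vectors, and the sense labels are all hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def average(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / n for i in range(dim)]


def disambiguate(context_words, candidate_senses, word_vecs, concept_vecs):
    """Pick the candidate sense whose concept vector is closest to the context.

    Assumption (not from the paper): the context is the plain average of the
    word vectors of its in-vocabulary words. One pass over the words plus one
    pass over the senses keeps the cost linear in both counts.
    """
    known = [word_vecs[w] for w in context_words if w in word_vecs]
    if not known:
        return None  # no usable context words
    context = average(known)
    return max(candidate_senses, key=lambda s: cosine(context, concept_vecs[s]))


# Toy, made-up embeddings for the ambiguous term "cold".
word_vecs = {
    "virus": [0.9, 0.1],
    "fever": [0.8, 0.2],
    "ice": [0.1, 0.9],
    "winter": [0.2, 0.8],
}
concept_vecs = {
    "SENSE_COMMON_COLD": [1.0, 0.0],       # hypothetical sense label
    "SENSE_LOW_TEMPERATURE": [0.0, 1.0],   # hypothetical sense label
}
senses = list(concept_vecs)

illness = disambiguate(["the", "virus", "fever"], senses, word_vecs, concept_vecs)
weather = disambiguate(["ice", "winter"], senses, word_vecs, concept_vecs)
```

In this toy setup the sense inventory is just the list of keys in `concept_vecs`; in a setting like the one the abstract describes, the candidate senses for each ambiguous term would come from an external knowledge base.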

Figures

Fig. 1. Architecture for WSD approaches from Sections III-A and III-B
Fig. 2. Accuracy of the k-NN approach with varying k

References

    1. Jimeno-Yepes A, McInnes BT, Aronson AR. Exploiting MeSH indexing in MEDLINE to generate a data set for word sense disambiguation. BMC bioinformatics. 2011;12(223).
    2. Jimeno-Yepes A, Berlanga R. Knowledge based word-concept model estimation and refinement for biomedical text mining. Journal of biomedical informatics. 2015;53:300–307.
    3. Luo Y, Uzuner Ö, Szolovits P. Bridging semantics and syntax with graph algorithm – state-of-the-art of extracting biomedical relations. Briefings in bioinformatics. 2016 bbw001.
    4. Cameron D, Kavuluru R, Rindflesch TC, Sheth AP, Thirunarayan K, Bodenreider O. Context-driven automatic subgraph creation for literature-based discovery. Journal of biomedical informatics. 2015;54:141–157.
    5. Kavuluru R, Thomas C, Sheth AP, Chan V, Wang W, Smith A, Soto A, Walters A. An up-to-date knowledge-based literature search and exploration framework for focused bioscience domains. Proc of the 2nd ACM SIGHIT Health Informatics Symposium ACM. 2012:275–284.
