A knowledge-driven approach to biomedical document conceptualization
- PMID: 20371168
- DOI: 10.1016/j.artmed.2010.02.005
Abstract
Objective: Biomedical document conceptualization is the process of clustering biomedical documents based on ontology-represented domain knowledge. The result of this process is a representation of the biomedical documents as a set of key concepts and their relationships. Most clustering methods cluster documents based on invariant domain knowledge. The objective of this work is to develop an effective method to cluster biomedical documents based on various user-specified ontologies, so that users can exploit the concept structures of documents more effectively.
Methods: We develop a flexible framework to allow users to specify the knowledge bases, in the form of ontologies. Based on the user-specified ontologies, we develop a key concept induction algorithm, which uses latent semantic analysis to identify key concepts and cluster documents. A corpus-related ontology generation algorithm is developed to generate the concept structures of documents.
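The general idea of using latent semantic analysis over ontology concepts can be sketched as follows. This is a minimal illustration, not the authors' key concept induction algorithm: the concept labels, the toy documents, the substring-count weighting, and the rule of picking one key concept per latent dimension are all assumptions made for the example.

```python
import numpy as np

# Hypothetical ontology concept labels and toy documents (not from the paper).
CONCEPTS = ["apoptosis", "cell cycle", "dna repair", "insulin", "glucose"]
DOCS = [
    "apoptosis and cell cycle arrest after dna repair failure",
    "apoptosis markers show that cell cycle checkpoints regulate apoptosis",
    "insulin signalling controls glucose uptake and insulin secretion",
    "glucose tolerance and insulin resistance",
]


def concept_document_matrix(docs, concepts):
    """Build a concept-by-document matrix of raw substring counts."""
    return np.array(
        [[doc.count(concept) for doc in docs] for concept in concepts],
        dtype=float,
    )


def lsa_key_concepts(matrix, concepts, k):
    """Run a rank-k truncated SVD (the core of LSA).

    Assumed selection rule: the key concept of a latent dimension is the
    concept with the largest absolute loading on it, and each document is
    assigned to the dimension on which it loads most strongly.
    """
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    u, s, vt = u[:, :k], s[:k], vt[:k, :]
    key = [concepts[int(np.argmax(np.abs(u[:, i])))] for i in range(k)]
    doc_loadings = np.abs(s[:, None] * vt)  # shape: (k, number of documents)
    labels = np.argmax(doc_loadings, axis=0).tolist()
    return key, labels


if __name__ == "__main__":
    m = concept_document_matrix(DOCS, CONCEPTS)
    key_concepts, cluster_labels = lsa_key_concepts(m, CONCEPTS, k=2)
    print(key_concepts, cluster_labels)
```

Swapping `CONCEPTS` for labels drawn from a different ontology changes which concepts can surface as cluster labels, which is the kind of flexibility the framework is described as offering.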
Results: We evaluate the proposed method and five other clustering algorithms on two biomedical datasets. The proposed method outperforms the five other algorithms in terms of key concept identification. On the first biomedical dataset, our method achieves F-measure values of 0.7294 and 0.5294 based on the MeSH ontology and the Gene Ontology (GO), respectively. On the second biomedical dataset, our method achieves F-measure values of 0.6751 and 0.6746 based on the MeSH ontology and GO, respectively. Both results outperform the five other algorithms in terms of F-measure. Based on the MeSH ontology and GO, the generated corpus-related ontologies show informative conceptual structures.
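The abstract does not state which F-measure variant is used. As a rough illustration only, the sketch below assumes the standard balanced F-measure (harmonic mean of precision and recall) computed over a set of identified key concepts against a reference set; the function name and the toy sets are hypothetical.

```python
def f_measure(identified, reference):
    """Balanced F-measure between identified and reference concept sets.

    Assumption: precision = |identified ∩ reference| / |identified| and
    recall = |identified ∩ reference| / |reference|.
    """
    identified, reference = set(identified), set(reference)
    true_positives = len(identified & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(identified)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy data, not from the paper.
    found = {"apoptosis", "insulin", "glucose"}
    gold = {"insulin", "glucose", "cell cycle", "dna repair"}
    print(round(f_measure(found, gold), 4))
```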
Conclusions: The proposed method enables users to specify the domain knowledge to exploit the conceptual structures of biomedical document collections. In addition, the proposed method is able to extract the key concepts and cluster the documents with a relatively high precision.
Copyright 2010 Elsevier B.V. All rights reserved.
Similar articles
- GOClonto: an ontological clustering approach for conceptualizing PubMed abstracts. J Biomed Inform. 2010 Feb;43(1):31-40. doi: 10.1016/j.jbi.2009.07.006. Epub 2009 Jul 25. PMID: 19635585
- A concept-driven biomedical knowledge extraction and visualization framework for conceptualization of text corpora. J Biomed Inform. 2010 Dec;43(6):1020-35. doi: 10.1016/j.jbi.2010.09.008. Epub 2010 Sep 24. PMID: 20870033
- Enhancing MEDLINE document clustering by incorporating MeSH semantic similarity. Bioinformatics. 2009 Aug 1;25(15):1944-51. doi: 10.1093/bioinformatics/btp338. Epub 2009 Jun 3. PMID: 19497938
- Interpreting microarray results with gene ontology and MeSH. Methods Mol Biol. 2007;377:223-42. doi: 10.1007/978-1-59745-390-5_14. PMID: 17634620 Review.
- Natural Language Processing methods and systems for biomedical ontology learning. J Biomed Inform. 2011 Feb;44(1):163-79. doi: 10.1016/j.jbi.2010.07.006. Epub 2010 Jul 18. PMID: 20647054 Free PMC article. Review.
Cited by
- The role of a multicentre data repository in ocular inflammation: The Ocular Autoimmune Systemic Inflammatory Infectious Study (OASIS). Eye (Lond). 2023 Oct;37(15):3084-3096. doi: 10.1038/s41433-023-02472-5. Epub 2023 Mar 14. PMID: 36918629 Free PMC article. Review.
- Mapping biological entities using the longest approximately common prefix method. BMC Bioinformatics. 2014 Jun 14;15:187. doi: 10.1186/1471-2105-15-187. PMID: 24928653 Free PMC article.
- Towards semantic search and inference in electronic medical records: An approach using concept-based information retrieval. Australas Med J. 2012;5(9):482-8. doi: 10.4066/AMJ.2012.1362. Epub 2012 Sep 30. PMID: 23115582 Free PMC article.