GOClonto: an ontological clustering approach for conceptualizing PubMed abstracts
- PMID: 19635585
- DOI: 10.1016/j.jbi.2009.07.006
Abstract
Concurrent with progress in the biomedical sciences, an overwhelming amount of textual knowledge is accumulating in the biomedical literature. PubMed is the most comprehensive database collecting and managing biomedical literature. To help researchers easily understand collections of PubMed abstracts, numerous clustering methods have been proposed to group similar abstracts based on their shared features. However, most of these methods do not explore the semantic relationships among groupings of documents, which could help better illuminate the groupings of PubMed abstracts. To address this issue, we propose an ontological clustering method called GOClonto for conceptualizing PubMed abstracts. GOClonto uses latent semantic analysis (LSA) and the Gene Ontology (GO) to identify key gene-related concepts and their relationships, and to allocate PubMed abstracts based on these key gene-related concepts. Based on two PubMed abstract collections, the experimental results show that GOClonto is able to identify key gene-related concepts and outperforms the STC (suffix tree clustering) algorithm, the Lingo algorithm, the Fuzzy Ants algorithm, and the clustering-based TRS (tolerance rough set) algorithm. Moreover, the two ontologies generated by GOClonto show significant and informative conceptual structures.
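The abstract does not give implementation details, so the following is only a minimal sketch of the general idea (LSA to find latent concepts, a controlled vocabulary to label them, then allocation of abstracts to labels), not the authors' algorithm. The toy abstracts, the two-term `TOY_GO_TERMS` vocabulary, and all function names are assumptions made for illustration. The truncated SVD is approximated with plain power iteration so the example needs only the standard library, and the relationships among concepts that GO would supply are not modeled here.

```python
import math
from collections import Counter

# Toy abstracts invented for illustration (not from the paper's collections).
ABSTRACTS = [
    "apoptosis caspase cell death apoptosis",
    "caspase activation apoptosis signaling",
    "dna repair damage response",
    "dna damage repair pathway repair",
]
# Hypothetical stand-in for a vocabulary of GO-derived label terms.
TOY_GO_TERMS = {"apoptosis", "repair"}


def term_doc_matrix(docs):
    """Return (vocab, A) where A[i][j] counts term i in document j."""
    vocab = sorted({w for d in docs for w in d.split()})
    counts = [Counter(d.split()) for d in docs]
    return vocab, [[float(c[t]) for c in counts] for t in vocab]


def top_singular_triplet(A, iters=200):
    """Power iteration on A^T A; returns (sigma, u, v) for the largest singular value."""
    n_terms, n_docs = len(A), len(A[0])
    v = [1.0 + 0.1 * j for j in range(n_docs)]  # deterministic start vector
    for _ in range(iters):
        u = [sum(a * x for a, x in zip(row, v)) for row in A]  # u = A v
        w = [sum(A[i][j] * u[i] for i in range(n_terms)) for j in range(n_docs)]  # w = A^T u
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    u = [sum(a * x for a, x in zip(row, v)) for row in A]
    sigma = math.sqrt(sum(x * x for x in u))
    return sigma, [x / sigma for x in u], v


def lsa(A, k):
    """Approximate rank-k SVD: repeat power iteration, deflating each found component."""
    A = [row[:] for row in A]
    components = []
    for _ in range(k):
        sigma, u, v = top_singular_triplet(A)
        components.append((sigma, u, v))
        for i in range(len(A)):
            for j in range(len(A[0])):
                A[i][j] -= sigma * u[i] * v[j]
    return components


def conceptualize(docs, label_terms, k):
    """Label each latent concept with a vocabulary term, then allocate documents."""
    vocab, A = term_doc_matrix(docs)
    components = lsa(A, k)
    candidates = [t for t in vocab if t in label_terms]
    # Label = candidate term with the largest weight in the concept's term vector.
    labels = [max(candidates, key=lambda t: abs(u[vocab.index(t)]))
              for _, u, _ in components]
    clusters = {}
    for j in range(len(docs)):
        # A document's coordinate on concept c is sigma_c * v_c[j].
        best = max(range(k), key=lambda c: abs(components[c][0] * components[c][2][j]))
        clusters.setdefault(labels[best], []).append(j)
    return clusters


clusters = conceptualize(ABSTRACTS, TOY_GO_TERMS, k=2)
print(clusters)
```

In this sketch each abstract is allocated to exactly one concept. A real system would more likely use a library SVD, weight terms (for example with TF-IDF), and draw labels and parent-child links from the actual GO term hierarchy.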
Similar articles
- A knowledge-driven approach to biomedical document conceptualization. Artif Intell Med. 2010 Jun;49(2):67-78. doi: 10.1016/j.artmed.2010.02.005. Epub 2010 Apr 3. PMID: 20371168
- PuReD-MCL: a graph-based PubMed document clustering methodology. Bioinformatics. 2008 Sep 1;24(17):1935-41. doi: 10.1093/bioinformatics/btn318. Epub 2008 Jul 1. PMID: 18593717
- Biomedical knowledge navigation by literature clustering. J Biomed Inform. 2007 Apr;40(2):114-30. doi: 10.1016/j.jbi.2006.07.004. Epub 2006 Aug 5. PMID: 16996316
- Recognizing names in biomedical texts: a machine learning approach. Bioinformatics. 2004 May 1;20(7):1178-90. doi: 10.1093/bioinformatics/bth060. Epub 2004 Feb 10. PMID: 14871877
- Concept-based annotation of enzyme classes. Bioinformatics. 2005 May 1;21(9):2059-66. doi: 10.1093/bioinformatics/bti284. Epub 2005 Jan 20. PMID: 15661799