Bioinformatics. 2009 Jul 15;25(14):1789-95.
doi: 10.1093/bioinformatics/btp327. Epub 2009 Jun 3.

Seeing the forest for the trees: using the Gene Ontology to restructure hierarchical clustering

Dikla Dotan-Cohen et al. Bioinformatics.

Abstract

Motivation: There is growing interest in improving the cluster analysis of expression data by incorporating prior knowledge, such as the Gene Ontology (GO) annotations of genes, so that the clusters subjected to subsequent scrutiny are more biologically relevant. The structure of the GO is another source of background knowledge, one that can be exploited through semantic similarity.
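
To make the notion of semantic similarity concrete, here is a minimal sketch in Python. It is not the measure used in the paper, which the abstract does not specify. It assumes a simple structure-based score (Jaccard overlap of ancestor sets) on a tiny hypothetical ontology, and a best-match average to compare genes that carry several annotations. The term names, the `PARENTS` table, and the choice to score unannotated genes as 0.0 are all illustrative assumptions.

```python
# Toy ontology: term -> set of parent terms.
# These are hypothetical labels, not real GO identifiers.
PARENTS = {
    "root": set(),
    "metabolism": {"root"},
    "immune": {"root"},
    "glucose_metabolism": {"metabolism"},
    "lipid_metabolism": {"metabolism"},
}


def ancestors(term):
    """Return the term itself plus every ancestor reachable in the DAG."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def term_similarity(a, b):
    """Jaccard overlap of the two ancestor sets (assumed measure)."""
    anc_a, anc_b = ancestors(a), ancestors(b)
    return len(anc_a & anc_b) / len(anc_a | anc_b)


def gene_similarity(terms_a, terms_b):
    """Best-match average over two annotation sets.

    Assumption: a gene with no annotations scores 0.0 against everything.
    """
    if not terms_a or not terms_b:
        return 0.0
    best_a = sum(max(term_similarity(a, b) for b in terms_b) for a in terms_a)
    best_b = sum(max(term_similarity(a, b) for a in terms_a) for b in terms_b)
    return (best_a + best_b) / (len(terms_a) + len(terms_b))
```

In this toy ontology, two sibling terms under `metabolism` share two of four ancestors and score 0.5, while a metabolism term and `immune` share only the root and score 0.25. Terms annotated at different depths of the hierarchy can be compared the same way, since the score depends only on ancestor sets.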

Results: We propose here a novel algorithm that integrates semantic similarities (derived from the ontology structure) into the procedure of deriving clusters from the dendrogram constructed during expression-based hierarchical clustering. Our approach can handle the multiple annotations, from different levels of the GO hierarchy, which most genes have. Moreover, it treats annotated and unannotated genes in a uniform manner. Consequently, the clusters obtained by our algorithm are characterized by significantly enriched annotations. In both cross-validation tests and when using an external index such as protein-protein interactions, our algorithm performs better than previous approaches. When applied to human cancer expression data, our algorithm identifies, among others, clusters of genes related to immune response and glucose metabolism. These clusters are also supported by protein-protein interaction data.
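
The general idea of deriving clusters from a dendrogram with the help of semantic similarity can be sketched as follows. This is an illustrative simplification, not the authors' algorithm, whose details are not given in the abstract. It assumes a dendrogram stored as nested pairs, a hypothetical table `SIM` of pairwise semantic similarities, and a hypothetical rule: descend from the root and stop at the first subtree whose mean pairwise similarity reaches a threshold. Pairs missing from `SIM` (for example, pairs involving unannotated genes) are assumed to count as 0.

```python
from itertools import combinations

# Hypothetical dendrogram from expression-based hierarchical clustering:
# internal nodes are (left, right) pairs, leaves are gene names.
DENDROGRAM = ((("g1", "g2"), "g3"), ("g4", "g5"))

# Hypothetical pairwise semantic similarities in [0, 1].
# Pairs not listed are treated as 0.0 (an assumption).
SIM = {
    frozenset(("g1", "g2")): 0.9,
    frozenset(("g4", "g5")): 0.8,
    frozenset(("g1", "g3")): 0.2,
    frozenset(("g2", "g3")): 0.1,
}


def leaves(node):
    """List the genes under a dendrogram node, left to right."""
    if isinstance(node, str):
        return [node]
    left, right = node
    return leaves(left) + leaves(right)


def coherence(genes):
    """Mean pairwise semantic similarity; a single gene counts as 1.0."""
    pairs = list(combinations(genes, 2))
    if not pairs:
        return 1.0
    return sum(SIM.get(frozenset(p), 0.0) for p in pairs) / len(pairs)


def cut(node, threshold):
    """Return clusters: the topmost subtrees that are coherent enough."""
    genes = leaves(node)
    if isinstance(node, str) or coherence(genes) >= threshold:
        return [genes]
    left, right = node
    return cut(left, threshold) + cut(right, threshold)


clusters = cut(DENDROGRAM, threshold=0.7)
```

With a threshold of 0.7, the root and the `((g1, g2), g3)` subtree are split because their genes are semantically heterogeneous, while `(g1, g2)` and `(g4, g5)` are kept whole. Unlike a cut at a single fixed height, this kind of rule can return clusters taken from different depths of the tree.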

Figures

Fig. 1.
Clustering performances as a function of the number of clusters. (A) Overall PPI percentage in the obtained clusters. (B) The 5-fold cross-validation tests, strict accuracy. (C) The 5-fold cross-validation tests, sw-accuracy.
Fig. 2.
Clustering performances as a function of the number of clusters: average accuracy over 10 experiments. (A) Overall PPI percentage in the obtained clusters. (B) The 5-fold cross-validation tests, sw-accuracy.
