PeerJ Comput Sci. 2018;4:e155.
doi: 10.7717/peerj-cs.155. Epub 2018 May 21.

ClusterEnG: an interactive educational web resource for clustering and visualizing high-dimensional data


Mohith Manjunath et al. PeerJ Comput Sci. 2018.

Abstract

Summary: Clustering is one of the most common techniques used in data analysis to discover hidden structure by grouping data points that are similar under some measure into clusters. Although many programs are available for performing clustering, a single web resource that provides both state-of-the-art clustering methods and interactive visualizations is lacking. ClusterEnG (Clustering Engine for Genomics) provides an interface for clustering big data, together with interactive visualizations that include 3D views, cluster selection and zoom features. ClusterEnG also aims to educate users about the similarities and differences between clustering algorithms, and provides tutorials that demonstrate the potential pitfalls of each algorithm. The web resource will be particularly useful to scientists who are not conversant with computing but want to understand the structure of their data intuitively.
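To make "grouping data points that are similar under some measure" concrete, here is a minimal sketch of k-means (Lloyd's algorithm), one widely used clustering method, with Euclidean distance as the similarity measure. The function name `kmeans`, its parameters and the toy data are illustrative assumptions; this is not ClusterEnG's implementation, and the abstract does not say which distance measures ClusterEnG uses.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means: alternate nearest-centroid assignment and centroid updates.

    Illustrative sketch only (hypothetical helper, not ClusterEnG code).
    """
    rng = random.Random(seed)
    # Initialize centroids with k input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break  # assignments stopped changing: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(xs) / len(members) for xs in zip(*members)]
    return labels, centroids
```

Running `kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` places the first three and the last three points in different clusters. Because k-means results can depend on the initial centroids, the `seed` parameter makes runs reproducible.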

Availability: ClusterEnG is part of a bigger project called KnowEnG (Knowledge Engine for Genomics) and is available at http://education.knoweng.org/clustereng.

Contact: songi@illinois.edu.


Conflict of interest statement

Conflict of Interest: none declared.

Figures

Figure 1. Typical workflow of ClusterEnG encompassing educational and visualization components.
Figure 2. A partial snapshot of the ClusterEnG user interface showing a choice of clustering algorithms and related options.
Figure 3. Clustering of samples from the NCI60 gene expression sample data using the k-medoids algorithm.
The snapshots show visualizations of the first three principal components from PCA and of the vectors from t-SNE in (A) 2D and (B) 3D, with perspective and orthogonal projections of the principal components.
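k-medoids, used in Figure 3, differs from k-means in that each cluster center (the medoid) must be an actual data point. The sketch below uses the simple alternating (Voronoi-iteration) heuristic with a deterministic farthest-point initialization. ClusterEnG may use a different variant, such as PAM, and the function name and parameters here are hypothetical.

```python
import math


def k_medoids(points, k, iters=100):
    """Alternating k-medoids: every cluster center is one of the input points.

    Hypothetical illustration; returns (labels, medoid indices).
    """
    n = len(points)
    # Precompute pairwise Euclidean distances (any dissimilarity would work here).
    dist = [[math.dist(a, b) for b in points] for a in points]

    def assign(medoids):
        # Each point joins the cluster of its closest medoid.
        return [min(range(k), key=lambda c: dist[i][medoids[c]]) for i in range(n)]

    # Farthest-point initialization: start at point 0, then repeatedly add the
    # point farthest from all medoids chosen so far.
    medoids = [0]
    while len(medoids) < k:
        medoids.append(max(range(n), key=lambda i: min(dist[i][m] for m in medoids)))

    for _ in range(iters):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                new_medoids.append(medoids[c])  # keep an empty cluster's medoid
                continue
            # The new medoid minimizes the total distance to the cluster's members.
            new_medoids.append(min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break  # medoids stopped changing: converged
        medoids = new_medoids
    return assign(medoids), medoids
```

For example, on `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` with `k=2`, it returns labels `[0, 0, 0, 1, 1, 1]` and medoid indices `[0, 3]`. Restricting centers to data points makes k-medoids usable with any pairwise dissimilarity, at the cost of the O(n²) distance matrix.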
Figure 4. Dynamic clustering application for affinity propagation, running on an R Shiny server, displaying a heatmap of the similarity matrix of selected data points.
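Affinity propagation works directly on a similarity matrix like the one shown in the Figure 4 heatmap. The following is a hedged sketch of the standard responsibility/availability message-passing updates (Frey and Dueck, 2007). It assumes negative squared Euclidean distance as the similarity and the median off-diagonal similarity as the default preference. These choices, the damping factor, the fixed iteration count and the function name are assumptions, since the caption does not say which similarity ClusterEnG displays.

```python
import statistics


def affinity_propagation(points, iters=200, damping=0.5, preference=None):
    """Affinity propagation by message passing; returns (exemplars, labels).

    Sketch under stated assumptions; needs at least two distinct points.
    """
    n = len(points)
    # Similarity s(i, k) = -||x_i - x_k||^2 (assumed choice of similarity).
    S = [[-sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    if preference is None:
        # Median off-diagonal similarity: a common default for the preference.
        preference = statistics.median(S[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        S[k][k] = preference  # self-similarity controls how many exemplars emerge
    R = [[0.0] * n for _ in range(n)]  # responsibilities r(i, k)
    A = [[0.0] * n for _ in range(n)]  # availabilities a(i, k)
    for _ in range(iters):
        # r(i, k) = s(i, k) - max_{k' != k} [a(i, k') + s(i, k')]
        for i in range(n):
            vals = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=vals.__getitem__)
            second = max(vals[k] for k in range(n) if k != first)
            for k in range(n):
                best_other = second if k == first else vals[first]
                R[i][k] = damping * R[i][k] + (1 - damping) * (S[i][k] - best_other)
        # a(k, k) = sum_{i' != k} max(0, r(i', k))
        # a(i, k) = min(0, r(k, k) + sum_{i' not in {i, k}} max(0, r(i', k)))
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    exemplars = [k for k in range(n) if R[k][k] + A[k][k] > 0]
    if not exemplars:
        raise ValueError("no exemplars found; try more iterations or another preference")
    # Each exemplar labels itself; other points join their most similar exemplar.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
    return exemplars, labels
```

With `preference=0.0`, which exceeds every off-diagonal similarity, each distinct point becomes its own exemplar. Lowering the preference yields fewer clusters. This sketch runs a fixed number of iterations and does not test for convergence, which production implementations usually do.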
Figure 5. Benchmarking results illustrating the run time of each clustering algorithm in ClusterEnG.
“PCA time” indicates the time taken to compute the principal components used for visualization, a step common to all the algorithms.
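Figure 5 reports PCA time separately because computing principal components for visualization is shared by every algorithm. The sketch below computes the leading principal components in plain Python by power iteration on the sample covariance matrix with deflation, and times that step with `time.perf_counter`. ClusterEnG's actual PCA routine and benchmarking setup are not described in this abstract, so the function name, parameters and synthetic data are hypothetical.

```python
import math
import random
import time


def principal_components(data, n_components=3, iters=500, seed=0):
    """Leading principal components via power iteration with deflation.

    Hypothetical helper; returns (unit component vectors, variances along them).
    """
    n, d = len(data), len(data[0])
    means = [sum(col) / n for col in zip(*data)]
    centered = [[x - m for x, m in zip(row, means)] for row in data]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    rng = random.Random(seed)
    components, variances = [], []
    for _ in range(min(n_components, d)):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        exhausted = False
        for _ in range(iters):
            # Power iteration step: v <- C v / ||C v||.
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                exhausted = True  # no variance left to explain
                break
            v = [x / norm for x in w]
        if exhausted:
            break
        # Rayleigh quotient v^T C v: the variance along the unit vector v.
        lam = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        variances.append(lam)
        # Deflation: subtract lam * v v^T so the next pass finds the next component.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components, variances


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.gauss(0.0, 1.0) for _ in range(20)] for _ in range(200)]
    start = time.perf_counter()
    comps, variances = principal_components(data, n_components=3)
    elapsed = time.perf_counter() - start
    print(f"PCA time: {elapsed:.3f} s for {len(comps)} components")
```

Power iteration keeps the example dependency-free. For real data, a library SVD such as NumPy's `numpy.linalg.svd` would be the usual choice and is far faster.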
