Comput Oper Res. 2012 Dec;39(12):3046-3061.
doi: 10.1016/j.cor.2012.03.008.

Clustering of High Throughput Gene Expression Data

Harun Pirim et al. Comput Oper Res. 2012 Dec.

Abstract

High-throughput biological data need to be processed, analyzed, and interpreted to address problems in the life sciences. Bioinformatics, computational biology, and systems biology address biological problems using computational methods. Clustering is one of the methods used to gain insight into biological processes, particularly at the genomics level. Although clustering can be applied in many areas of biological data analysis, this paper reviews current clustering algorithms designed specifically for analyzing gene expression data. It is also intended to introduce one of the main problems in bioinformatics, clustering gene expression data, to the operations research community.

Figures

Fig. 1
A microarray chip produced by Affymetrix (courtesy of Affymetrix; source: http://www.affymetrix.com/about_affymetrix/media/image-library.affx)
Fig. 2
Reverse engineering to draw inferences from the extracted data
Fig. 3
Biological experiment and validation workflow
Fig. 4
Image plot of expression values
Fig. 5
Transitive distance: the distance between genes G1 and G5 is 8, not 9, because the shortest path between the genes is used rather than their direct pairwise distance.
Fig. 6
Seventy data points generated from two different normal distributions. Stars are the cluster centers to be used by the K-means algorithm; circles mark the two clusters that the algorithm finds.
Fig. 7
Dendrogram of the simulated data generated for Fig. 6
Fig. 8
Hierarchical and spring-embedded layouts for protein-protein and protein-DNA interactions in yeast galactose metabolism
Fig. 9
Priority queue
