Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2009 Apr 29;10 Suppl 4(Suppl 4):S9.
doi: 10.1186/1471-2105-10-S4-S9.

Biclustering of microarray data with MOSPO based on crowding distance

Affiliations

Biclustering of microarray data with MOSPO based on crowding distance

Junwan Liu et al. BMC Bioinformatics. .

Abstract

Background: High-throughput microarray technologies have generated and accumulated massive amounts of gene expression datasets that contain expression levels of thousands of genes under hundreds of different experimental conditions. The microarray datasets are usually presented in 2D matrices, where rows represent genes and columns represent experimental conditions. The analysis of such datasets can discover local structures composed by sets of genes that show coherent expression patterns under subsets of experimental conditions. It leads to the development of sophisticated algorithms capable of extracting novel and useful knowledge from a biomedical point of view. In the medical domain, these patterns are useful for understanding various diseases, and aid in more accurate diagnosis, prognosis, treatment planning, as well as drug discovery.

Results: In this work we present the CMOPSOB (Crowding distance based Multi-objective Particle Swarm Optimization Biclustering), a novel clustering approach for microarray datasets to cluster genes and conditions highly related in sub-portions of the microarray data. The objective of biclustering is to find sub-matrices, i.e. maximal subgroups of genes and subgroups of conditions where the genes exhibit highly correlated activities over a subset of conditions. Since these objectives are mutually conflicting, they become suitable candidates for multi-objective modelling. Our approach CMOPSOB is based on a heuristic search technique, multi-objective particle swarm optimization, which simulates the movements of a flock of birds which aim to find food. In the meantime, the nearest neighbour search strategies based on crowding distance and -dominance can rapidly converge to the Pareto front and guarantee diversity of solutions. We compare the potential of this methodology with other biclustering algorithms by analyzing two common and public datasets of gene expression profiles. In all cases our method can find localized structures related to sets of genes that show consistent expression patterns across subsets of experimental conditions. The mined patterns present a significant biological relevance in terms of related biological processes, components and molecular functions in a species-independent manner.

Conclusion: The proposed CMOPSOB algorithm is successfully applied to biclustering of microarray dataset. It achieves a good diversity in the obtained Pareto front, and rapid convergence. Therefore, it is a useful tool to analyze large microarray datasets.

PubMed Disclaimer

Figures

Figure 1
Figure 1
Small biclusters of size 24 × 15 on the yeast dataset. This figure shows the expression value of 24 genes under 15 conditions from the found biclusters.

References

    1. Su A, Cooke M, Ching K, Hakak Y, Walker J, Wiltshire T, Orth A, Vega R, Sapinoso L, Moqrich A. Large-scale analysis of the human and mouse transcriptomes. Proc Natl Acad Sci U S A. 2002;99:4465–4470. - PMC - PubMed
    1. De Smet F, Mathys J, Marchal K, Thijs G, De Moor B, Moreau Y. Adaptive quality-based clustering of gene expression profiles. Bioinformatics. 2002;18:735–746. - PubMed
    1. Eisen MB, Spellman PT, Brown PO, Botstein D. Cluster analysis and display of genome-wide expression patterns. Proc Natl Acad Sci U S A. 1998;95:14863–14868. - PMC - PubMed
    1. Tamayo P, Slonim D, Mesirov J, Zhu Q, Kitareewan S, Dmitrovsky E, Lander ES, Golub TR. Interpreting patterns of gene expression with self-organizing maps: methods and application to hematopoietic differentiation. Proc Natl Acad Sci USA. 1999;96:2907–2912. - PMC - PubMed
    1. Cheng Y, Church GM. Biclustering of expression data. Proc Int Conf Intell Syst Mol Biol. 2000;8:93–103. - PubMed

Publication types

MeSH terms

LinkOut - more resources