Review
Brief Bioinform. 2019 Jul 19;20(4):1449-1464.
doi: 10.1093/bib/bby014.

It is time to apply biclustering: a comprehensive review of biclustering applications in biological and biomedical data

Juan Xie et al. Brief Bioinform. 2019.

Abstract

Biclustering is a powerful data mining technique that clusters the rows and columns of a matrix-format data set simultaneously. It was first applied to gene expression data in 2000, with the aim of identifying genes that are co-expressed under a subset of the conditions/samples. During the past 17 years, dozens of biclustering algorithms and tools have been developed to help make sense of the large data sets generated by high-throughput omics technologies. These algorithms and tools have been applied to a wide variety of data types, including, but not limited to, genomes, transcriptomes, exomes, epigenomes, phenomes and pharmacogenomes. However, there is still a considerable gap between biclustering methodology development and comprehensive data interpretation, mainly because of limited knowledge about how to select appropriate biclustering tools and supporting computational techniques for specific studies. Here, we first give a brief introduction to the existing biclustering algorithms and tools in the public domain, and then systematically summarize the basic applications of biclustering to biological data and the more advanced applications of biclustering to biomedical data. This review will help researchers analyze their big data effectively and generate valuable biological knowledge and novel insights more efficiently.
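
To make the idea of a bicluster concrete, the following minimal sketch scores how coherent a chosen subset of rows (genes) and columns (conditions) is. The abstract does not name a scoring method; the mean squared residue used in the 2000 gene expression work of Cheng and Church is taken here as one illustrative choice. The function name, the toy matrix and the selected indices are all hypothetical.

```python
def mean_squared_residue(matrix, rows, cols):
    """Mean squared residue of the submatrix given by rows x cols.

    A score of 0 means every selected row follows the same additive
    pattern across the selected columns (a perfectly coherent bicluster).
    """
    sub = [[matrix[i][j] for j in cols] for i in rows]
    n_rows, n_cols = len(rows), len(cols)

    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows

    total = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            # Residue: what is left after removing row, column and overall effects.
            residue = sub[i][j] - row_means[i] - col_means[j] + overall_mean
            total += residue ** 2
    return total / (n_rows * n_cols)


# Hypothetical toy expression matrix: 4 genes (rows) x 4 conditions (columns).
expression = [
    [1.0, 5.0, 2.0, 9.0],
    [3.0, 0.0, 4.0, 1.0],
    [7.0, 2.0, 1.0, 4.0],
    [5.0, 8.0, 6.0, 3.0],
]

# Genes 0, 1 and 3 shift together across conditions 0 and 2 only.
coherent = mean_squared_residue(expression, rows=[0, 1, 3], cols=[0, 2])
whole = mean_squared_residue(expression, rows=[0, 1, 2, 3], cols=[0, 1, 2, 3])
print(coherent, whole)
```

The selected genes are coherent only under a subset of the conditions, which is the kind of local pattern that clustering rows alone over all columns would tend to miss. Practical biclustering tools search for such subsets automatically rather than scoring a hand-picked one.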

Keywords: biclustering; biomarker and gene signatures detection; disease subtype identification; functional annotation; gene–drug association; modularity analysis; network elucidation.

Figures

Figure 1.
Yearly comparison of biclustering algorithm development and algorithm application related studies. The references in 2017 were collected as of 26 March 2017.
Figure 2.
The overall workflow of biclustering applications, covering the upstream and downstream processes. Three layers show the path from raw data, through appropriate analytical methods and tools, to various kinds of results. The power of biclustering is illustrated by its ability to generate (from left to right in the figure) co-expressed gene modules, subtypes or biomarkers, regulatory networks, clinical entities and estimated disease-free survival (DFS) distributions.

