. 2017 Aug 15;33(16):2523-2531.

doi: 10.1093/bioinformatics/btx199.

Gracob: a novel graph-based constant-column biclustering method for mining growth phenotype data

Majed Alzahrani¹, Hiroyuki Kuwahara¹, Wei Wang², Xin Gao¹

Affiliations

¹ King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMCE) Division, Thuwal, 23955-6900, Saudi Arabia.
² Department of Computer Science, University of California, Los Angeles, CA 90095, USA.

PMID: 28379298
PMCID: PMC5870648
DOI: 10.1093/bioinformatics/btx199

Gracob: a novel graph-based constant-column biclustering method for mining growth phenotype data

Majed Alzahrani et al. Bioinformatics. 2017.

. 2017 Aug 15;33(16):2523-2531.

doi: 10.1093/bioinformatics/btx199.

Authors

Majed Alzahrani¹, Hiroyuki Kuwahara¹, Wei Wang², Xin Gao¹

Affiliations

¹ King Abdullah University of Science and Technology (KAUST), Computational Bioscience Research Center (CBRC), Computer, Electrical and Mathematical Sciences and Engineering (CEMCE) Division, Thuwal, 23955-6900, Saudi Arabia.
² Department of Computer Science, University of California, Los Angeles, CA 90095, USA.

PMID: 28379298
PMCID: PMC5870648
DOI: 10.1093/bioinformatics/btx199

Abstract

Motivation: Growth phenotype profiling of genome-wide gene-deletion strains over stress conditions can offer a clear picture that the essentiality of genes depends on environmental conditions. Systematically identifying groups of genes from such high-throughput data that share similar patterns of conditional essentiality and dispensability under various environmental conditions can elucidate how genetic interactions of the growth phenotype are regulated in response to the environment.

Results: We first demonstrate that detecting such 'co-fit' gene groups can be cast as a less well-studied problem in biclustering, i.e. constant-column biclustering. Despite significant advances in biclustering techniques, very few were designed for mining in growth phenotype data. Here, we propose Gracob, a novel, efficient graph-based method that casts and solves the constant-column biclustering problem as a maximal clique finding problem in a multipartite graph. We compared Gracob with a large collection of widely used biclustering methods that cover different types of algorithms designed to detect different types of biclusters. Gracob showed superior performance on finding co-fit genes over all the existing methods on both a variety of synthetic data sets with a wide range of settings, and three real growth phenotype datasets for E. coli, proteobacteria and yeast.

Availability and implementation: Our program is freely available for download at http://sfb.kaust.edu.sa/Pages/Software.aspx.

Contact: xin.gao@kaust.edu.sa.

Supplementary information: Supplementary data are available at Bioinformatics online.

PubMed Disclaimer

Figures

**Fig. 1**
A minimalist example to illustrate environment-dependent genetic interactions. (a): Conditionally essential and dispensable genes. The circle, triangle and square symbols illustrate environmental inputs to the cell, for example input metabolites and ligands. Red, gray and black arrows denote active paths in wild type, inactive paths and active paths in each condition, respectively. The wild type grows normally under each condition, while the deletion of each gene has different effects on fitness under different conditions. $Δ X$ denotes the strain of deleting gene X ( $X \in {A, B, C}$ ). ‘GR’ and ‘NG’ stand for normal growth and no growth, respectively. (b): The corresponding growth phenotype data. Blue and red denote low and high fitness, respectively. The constant-column bicluster in the green box captures co-fit genes, A and B, which cannot be captured by any other constant biclusters

**Fig. 2**
Workflow of Gracob. (a): The data in each column are transformed using a cumulative distribution function, independently. (b): Data values in each column are sorted independently from other columns while keeping track of the original row indexes. (c): Nodes are created for each consecutive row subset such that the range of their values is at most δ (user defined value for how ‘constant’ each column of desired biclusters should be). A row subset can overlap with other row subsets but cannot be contained by others. (d): An edge is created between any pair of nodes if the nodes are from different columns and share at least r (user defined threshold for the smallest number of strains in desired biclusters) rows (i.e. strains). (e): Nodes with degree less than c (user defined threshold for the smallest number of conditions in desired biclusters) are deleted from the graph. (f): Each node is used to grow a clique with its connected nodes (orange circles) while thresholds, r and c, are repeatedly checked to detect future failures as early as possible. (g): Row and column index information from each clique is used to extract biclusters from the original data matrix

**Fig. 3**
Heatmap visualization of the *E. coli* growth phenotype data and the representative biclusters detected by the 11 methods. (1): The capped data matrix for the *E. coli* growth phenotype dataset with 3979 strains and 324 stress conditions. All the values bigger than 3.0 are capped as 3.0 and all the values smaller than -3.0 are capped as -3.0, for visualization purposes. (2)–(12): The representative biclusters detected by BicPAM, Bimax, CC, CPB, iBBiG, ISA, QUBIC, SAMBA, Spectral, xMOTIFs and Gracob, respectively. For each method, the predicted biclusters that have consistent patterns which appear many times in the results of this method are selected. For visualization purposes, rows and columns of each bicluster are organized by hierarchical clustering (Eisen *et al.*, 1998). That is, genes with similar values are clustered on the Y-axis and conditions with similar values are clustered on the X-axis

**Fig. 4**
Performance comparison of the 11 methods on the *E. coli*, proteobacteria and yeast growth phenotype datasets. (1a)–(3a): The average column-wise standard deviation on the three datasets, respectively. (1b)–(3b): The average size of the returned biclusters on the three datasets, respectively. (1c)–(3c): The KEGG pathway-level precision under five significance levels on the three datasets, respectively. (1d)–(3d): The GO term-level precision under five significance levels on the three datasets, respectively

See this image and copyright information in PMC

References

1. Baba T. et al. (2006) Construction of Escherichia coli K-12 in-frame, single-gene knockout mutants: the Keio collection. Mol. Syst. Biol., 2, 2006.0008. - PMC - PubMed
1. Ben-Dor A. et al. (2003) Discovering local structure in gene expression data: the order-preserving submatrix problem. J. Comput. Biol., 10, 373–384. - PubMed
1. Bergmann S. et al. (2003) Iterative signature algorithm for the analysis of large-scale gene expression data. Phys. Rev. E, 67, 031902. - PubMed
1. Blattner F.R. et al. (1997) The complete genome sequence of Escherichia coli K-12. Science, 277, 1453–1462. - PubMed
1. Bochner B.R. (2009) Global phenotypic characterization of bacteria. FEMS Microbiol. Rev., 33, 191–205. - PMC - PubMed

MeSH terms

Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions
Actions

Grants and funding

R01 GM115833/GM/NIGMS NIH HHS/United States

LinkOut - more resources

Full Text Sources
Other Literature Sources
- The Lens - Patent Citations Database
- scite Smart Citations
Miscellaneous
- NCI CPTAC Assay Portal

Save citation to file

Email citation

Add to Collections

Add to My Bibliography

Your saved search

Create a file for external citation management software

Your RSS Feed

Gracob: a novel graph-based constant-column biclustering method for mining growth phenotype data

Affiliations

Gracob: a novel graph-based constant-column biclustering method for mining growth phenotype data

Authors

Affiliations

Abstract

Figures

References

MeSH terms

Grants and funding

LinkOut - more resources

Full Text Sources

Other Literature Sources

Miscellaneous