cblaster: a remote search tool for rapid identification and visualization of homologous gene clusters

Cameron L M Gilchrist¹, Thomas J Booth¹, Bram van Wersch², Liana van Grieken², Marnix H Medema², Yit-Heng Chooi¹

Affiliations

¹ School of Molecular Sciences, The University of Western Australia, Crawley, WA 6009, Australia.
² Bioinformatics Group, Wageningen University, Wageningen 6708PB, The Netherlands.

PMID: 36700093
PMCID: PMC9710679
DOI: 10.1093/bioadv/vbab016

Cameron L M Gilchrist et al. Bioinform Adv. 2021 Aug 5;1(1):vbab016. doi: 10.1093/bioadv/vbab016. eCollection 2021.

Abstract

Motivation: Genes involved in coordinated biological pathways, including metabolism, drug resistance and virulence, are often colocalized as gene clusters. Identifying homologous gene clusters aids in the study of their function and evolution; however, existing tools are limited to searching local sequence databases. Tools for remotely searching public databases are necessary to keep pace with the rapid growth of online genomic data.

Results: Here, we present cblaster, a Python-based tool to rapidly detect co-located genes in local and remote databases. cblaster is easy to use, offering both a command-line interface and a user-friendly graphical user interface. It generates outputs that enable intuitive visualization of large datasets and can be readily incorporated into larger bioinformatic pipelines. cblaster is a significant update to the comparative genomics toolbox.

Availability and implementation: cblaster source code and documentation are freely available from GitHub under the MIT license (github.com/gamcil/cblaster).

Supplementary information: Supplementary data are available at Bioinformatics Advances online.

Figures

**Fig. 1.**
The cblaster search workflow. Input sequences are given either as a FASTA file or as a text file containing NCBI sequence accessions. They are then searched against NCBI's BLAST API or a local DIAMOND database, in remote (blue background) and local (green background) modes, respectively. Search hits are filtered according to user-defined quality thresholds. Genomic coordinates for each hit are retrieved from the NCBI Identical Protein Groups (IPG) resource. Hits are grouped by their corresponding organism, scaffold and subject. Finally, hit clusters are detected in each scaffold, and results are summarized in output tables and visualizations.
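
The filtering, grouping and cluster-detection steps of this workflow can be sketched in a few lines of Python. This is a minimal, hypothetical sketch and not cblaster's own code: the `Hit` record, the function names and the threshold values (30% identity, 50% coverage, E-value 0.01, a 20 kb gap and at least three distinct queries per cluster) are illustrative assumptions, and the gap is assumed to be measured from the end of the current block of hits to the start of the next hit.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical record for one search hit plus its genomic location."""
    query: str        # query sequence ID
    subject: str      # subject protein accession
    identity: float   # percent identity
    coverage: float   # percent query coverage
    evalue: float
    organism: str     # location fields, e.g. as looked up from IPG
    scaffold: str
    start: int
    end: int


def filter_hits(hits, min_identity=30.0, min_coverage=50.0, max_evalue=0.01):
    """Keep hits that pass every quality threshold (illustrative values)."""
    return [
        h for h in hits
        if h.identity >= min_identity
        and h.coverage >= min_coverage
        and h.evalue <= max_evalue
    ]


def group_by_scaffold(hits):
    """Group hits by (organism, scaffold) so clusters are searched per scaffold."""
    groups = defaultdict(list)
    for h in hits:
        groups[(h.organism, h.scaffold)].append(h)
    return groups


def find_clusters(hits, gap=20000, min_unique=3):
    """Split position-sorted hits wherever the next hit starts more than `gap` bp
    after the current block ends; keep blocks with >= `min_unique` distinct queries."""
    ordered = sorted(hits, key=lambda h: h.start)
    blocks, current, block_end = [], [], 0
    for h in ordered:
        if current and h.start - block_end > gap:
            blocks.append(current)
            current = []
        if not current:
            block_end = h.end
        current.append(h)
        block_end = max(block_end, h.end)
    if current:
        blocks.append(current)
    return [b for b in blocks if len({h.query for h in b}) >= min_unique]


# Made-up example: three co-located hits, one low-identity hit and one distant hit.
hits = [
    Hit("q1", "P1", 80.0, 90.0, 1e-50, "Org", "scafA", 0, 1000),
    Hit("q2", "P2", 75.0, 95.0, 1e-40, "Org", "scafA", 1500, 2500),
    Hit("q3", "P3", 60.0, 85.0, 1e-30, "Org", "scafA", 3000, 4000),
    Hit("q1", "P4", 20.0, 90.0, 1e-05, "Org", "scafA", 5000, 6000),
    Hit("q2", "P5", 70.0, 90.0, 1e-20, "Org", "scafA", 50000, 51000),
]
groups = group_by_scaffold(filter_hits(hits))
clusters = {key: find_clusters(group) for key, group in groups.items()}
```

In this toy input, the low-identity hit is dropped by the filter and the distant hit ends up in its own block, which is then discarded because it covers only one query; the remaining three hits form a single cluster.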

**Fig. 2.**
The cblaster GUI. Each panel represents a single cblaster module: cblaster search (top left) for performing searches against remote and local databases; cblaster gne (top right) for performing gene neighbourhood estimation; cblaster makedb (bottom left) for building local databases from GenBank files; and cblaster extract (bottom right) for extracting FASTA files of specific groups of homologues.

**Fig. 3.**
Interactive visualizations generated by cblaster. (a) Cluster heatmap visualization of cblaster search results: (1) heatmap colour bar indicating 0% (white) to 100% (blue) identity; (2) names of query sequences; (3) organism names and scaffold locations of hit clusters; (4) dendrogram of hit clusters generated from their identity to query sequences; and (5) cell hover tooltip with detailed hit information, including hyperlinks to the genomic position on NCBI. (b) Gene neighbourhood estimation (GNE) visualization: (1) plot of mean and median hit cluster sizes (bp) at different gap sizes; (2) plot of total clusters at different gap sizes; (3) hover tooltip showing the mean and median cluster size (bp) and total clusters at a given gap size. (c) Visualization of the gene clusters identified in (a), including intermediate genes (grey), generated with the clinker tool via the plot_clusters module in cblaster.
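
The heatmap in panel (a) can be read as a matrix with one row per hit cluster and one column per query sequence. The sketch below builds such a matrix under the assumption that each cell holds the best percent identity of any hit to that query within the cluster, and 0 where the query has no hit (white on the colour bar). The function name, input layout and example labels are hypothetical, and the dendrogram step is not reproduced.

```python
def identity_matrix(clusters, queries):
    """Build heatmap rows: one row per hit cluster, one column per query.

    `clusters` maps a hypothetical row label to a list of (query_id, identity)
    pairs. Each cell is the best identity for that query in the cluster, or
    0.0 when the query has no hit there.
    """
    matrix = {}
    for label, hits in clusters.items():
        best = {}
        for query, identity in hits:
            best[query] = max(identity, best.get(query, 0.0))
        matrix[label] = [best.get(q, 0.0) for q in queries]
    return matrix


# Made-up example data.
queries = ["q1", "q2", "q3"]
clusters = {
    "Organism A / scaffold_1": [("q1", 72.5), ("q2", 65.0), ("q1", 40.1)],
    "Organism B / contig_7": [("q3", 55.2)],
}
matrix = identity_matrix(clusters, queries)
for label, row in matrix.items():
    print(label, row)
```

Hierarchical clustering of these rows (for example with SciPy's `scipy.cluster.hierarchy.linkage`) would yield a dendrogram of the kind shown in (4); that step is omitted here to keep the sketch dependency-free.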

**Fig. 4.**
Application of cblaster to case studies in bacteria, plants and fungi. (a) A representative subset of the cblaster output in Case Study 1, highlighting the evolutionary relationships between rebeccamycin (Reb) biosynthetic proteins and those of other natural product pathways. Structural features are highlighted according to conserved biosynthetic proteins: halogenases RebH and RebF in red; proteins involved in chromopyrrolic acid biosynthesis (RebR, RebP, RebC, RebD and RebO) in green; and the glycosyltransferase RebG and methyltransferase RebM in yellow. The transporters RebT and RebU are shown in grey. (b) Gene neighbourhood estimation of plant triterpene BGCs. The ‘liminal’ region of the plot is shown in red, the ‘stable’ region in green and upper limits in yellow. The accompanying total clusters plot (as in Fig. 3b) is omitted from this figure but follows the same pattern. (c) Using cblaster to piece together the burnettramic acid BGC in a fragmented *A. burnettii* genome.
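
The GNE plots in Fig. 3b and Fig. 4b summarize how hit clusters change as the allowed gap between hits grows. A minimal sketch of that idea, assuming hits are supplied as (start, end) coordinates per scaffold, re-runs a simple gap-based clustering at each gap size and reports the total number of clusters together with their mean and median size in bp. The function names, the `min_hits` filter and the example coordinates are hypothetical and do not describe cblaster's actual implementation.

```python
from statistics import mean, median


def clusters_at_gap(intervals, gap, min_hits=2):
    """Merge (start, end) hit intervals on one scaffold whenever the next hit
    starts at most `gap` bp after the current cluster ends; keep clusters
    with at least `min_hits` hits. Returns (start, end) spans."""
    spans = []
    cur_start = cur_end = None
    count = 0
    for start, end in sorted(intervals):
        if cur_start is not None and start - cur_end > gap:
            spans.append((cur_start, cur_end, count))
            cur_start = None
        if cur_start is None:
            cur_start, cur_end, count = start, end, 0
        cur_end = max(cur_end, end)
        count += 1
    if cur_start is not None:
        spans.append((cur_start, cur_end, count))
    return [(s, e) for s, e, n in spans if n >= min_hits]


def estimate_neighbourhood(scaffolds, gaps, min_hits=2):
    """For each gap size, report total clusters and mean/median size (bp)."""
    rows = []
    for gap in gaps:
        sizes = [
            end - start
            for intervals in scaffolds.values()
            for start, end in clusters_at_gap(intervals, gap, min_hits)
        ]
        rows.append({
            "gap": gap,
            "total_clusters": len(sizes),
            "mean_bp": mean(sizes) if sizes else 0,
            "median_bp": median(sizes) if sizes else 0,
        })
    return rows


# Made-up hit coordinates on two scaffolds.
scaffolds = {
    "s1": [(0, 1000), (1500, 2500), (10000, 11000), (40000, 41000)],
    "s2": [(0, 500), (3000, 3500)],
}
for row in estimate_neighbourhood(scaffolds, gaps=[1000, 10000, 50000]):
    print(row)
```

Plotting `mean_bp` and `median_bp`, and separately `total_clusters`, against `gap` would give curves analogous to plots (1) and (2) of Fig. 3b.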

**Fig. 5.**
Time taken (s) to search query clusters from MIBiG containing different numbers of genes against a database of *Aspergillus* genomes using MultiGeneBlast (blue circles) and cblaster (red triangles).

Cited by

Marine bacteroidetes use a conserved enzymatic cascade to digest diatom β-mannan.
Beidler I, Robb CS, Vidal-Melgosa S, Zühlke MK, Bartosik D, Solanki V, Markert S, Becher D, Schweder T, Hehemann JH. ISME J. 2023 Feb;17(2):276-285. doi: 10.1038/s41396-022-01342-4. Epub 2022 Nov 21. PMID: 36411326.
Trichoderma reesei Contains a Biosynthetic Gene Cluster That Encodes the Antifungal Agent Ilicicolin H.
Shenouda ML, Ambilika M, Cox RJ. J Fungi (Basel). 2021 Dec 1;7(12):1034. doi: 10.3390/jof7121034. PMID: 34947016.
Relation of pest insect-killing and soilborne pathogen-inhibition abilities to species diversification in environmental Pseudomonas protegens.
Garrido-Sanz D, Vesga P, Heiman CM, Altenried A, Keel C, Vacheron J. ISME J. 2023 Sep;17(9):1369-1381. doi: 10.1038/s41396-023-01451-8. Epub 2023 Jun 13. PMID: 37311938.
Mining for a New Class of Fungal Natural Products: The Evolution, Diversity, and Distribution of Isocyanide Synthase Biosynthetic Gene Clusters.
Nickles GR, Oestereicher B, Keller NP, Drott MT. bioRxiv [Preprint]. 2023 Apr 18:2023.04.17.537281. doi: 10.1101/2023.04.17.537281. Update in: Nucleic Acids Res. 2023 Aug 11;51(14):7220-7235. doi: 10.1093/nar/gkad573. PMID: 37131656.
CASCADE-Cas3 enables highly efficient genome engineering in Streptomyces species.
Whitford CM, Gockel P, Faurdal D, Gren T, Sigrist R, Weber T. Nucleic Acids Res. 2025 Mar 20;53(6):gkaf214. doi: 10.1093/nar/gkaf214. PMID: 40138716.
