Bioinform Adv. 2021 Aug 5;1(1):vbab016.
doi: 10.1093/bioadv/vbab016. eCollection 2021.

cblaster: a remote search tool for rapid identification and visualization of homologous gene clusters

Cameron L M Gilchrist et al. Bioinform Adv. 2021.

Abstract

Motivation: Genes involved in coordinated biological pathways, including metabolism, drug resistance and virulence, are often colocalized as gene clusters. Identifying homologous gene clusters aids in the study of their function and evolution; however, existing tools are limited to searching local sequence databases. Tools for remotely searching public databases are necessary to keep pace with the rapid growth of online genomic data.

Results: Here, we present cblaster, a Python-based tool to rapidly detect collocated genes in local and remote databases. cblaster is easy to use, offering both a command line and a user-friendly graphical user interface. It generates outputs that enable intuitive visualizations of large datasets and can be readily incorporated into larger bioinformatic pipelines. cblaster is a significant update to the comparative genomics toolbox.

Availability and implementation: cblaster source code and documentation are freely available from GitHub under the MIT license (github.com/gamcil/cblaster).

Supplementary information: Supplementary data are available at Bioinformatics Advances online.

Figures

Fig. 1.
The cblaster search workflow. Input sequences are given either as a FASTA file or as a text file containing NCBI sequence accessions. They are then searched against the NCBI’s BLAST API or a local DIAMOND database, in remote (blue background) and local (green background) modes, respectively. BLAST hits are filtered according to user-defined quality thresholds. Genomic coordinates for each hit are retrieved from the IPG resource. Hits are grouped by their corresponding organism, scaffold and subjects. Finally, hit clusters are detected in each scaffold and results are summarized in output tables and visualizations
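The filtering, grouping and cluster detection steps in this caption can be illustrated with a minimal Python sketch. It is not cblaster's actual implementation: the `Hit` record, the `find_clusters` function and the `min_identity`, `max_gap` and `min_hits` parameters (with their default values) are hypothetical names chosen for illustration, and the sketch assumes that a cluster is a run of hits on one scaffold in which neighbouring hits lie no further apart than a maximum gap.

```python
from collections import defaultdict
from typing import NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one filtered BLAST/DIAMOND hit."""
    query: str       # name of the query sequence that produced the hit
    organism: str    # organism the subject gene belongs to
    scaffold: str    # scaffold the subject gene sits on
    start: int       # genomic start coordinate of the subject gene
    end: int         # genomic end coordinate of the subject gene
    identity: float  # percent identity between query and subject


def find_clusters(hits, min_identity=30.0, max_gap=20000, min_hits=2):
    """Filter hits, group them by (organism, scaffold), and split each
    group into clusters wherever the gap between neighbours exceeds max_gap."""
    by_scaffold = defaultdict(list)
    for hit in hits:
        if hit.identity >= min_identity:  # user-defined quality threshold
            by_scaffold[(hit.organism, hit.scaffold)].append(hit)

    clusters = []
    for key, group in by_scaffold.items():
        group.sort(key=lambda h: h.start)
        current = [group[0]]
        for hit in group[1:]:
            if hit.start - current[-1].end <= max_gap:
                current.append(hit)  # close enough: extend the cluster
            else:
                if len(current) >= min_hits:
                    clusters.append((key, current))
                current = [hit]      # too far away: start a new cluster
        if len(current) >= min_hits:
            clusters.append((key, current))
    return clusters


example_hits = [
    Hit("A", "org1", "s1", 100, 900, 80.0),
    Hit("B", "org1", "s1", 1500, 2500, 75.0),
    Hit("C", "org1", "s1", 90000, 91000, 60.0),  # too far from A and B
    Hit("A", "org2", "s9", 10, 500, 20.0),       # fails the identity filter
    Hit("B", "org2", "s9", 600, 1200, 55.0),
]
example_clusters = find_clusters(example_hits)
```

With these example hits, only the two neighbouring hits on scaffold `s1` of `org1` form a cluster; the distant hit and the lone hit on `org2` fall below the assumed two-hit minimum.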
Fig. 2.
The cblaster GUI. Each panel represents a single cblaster module: cblaster search (top left) for performing searches against remote and local databases; cblaster gne (top right) for performing genomic neighbourhood estimation; cblaster makedb (bottom left) for building local databases from GenBank files; and cblaster extract (bottom right) for extracting FASTA files of specific groups of homologues
Fig. 3.
Interactive visualizations generated by cblaster. (a) Cluster heatmap visualization of cblaster search results: (1) heatmap colour bar indicating 0% (white) to 100% (blue) identity; (2) names of query sequences; (3) names of organism and scaffold locations of hit clusters; (4) dendrogram of hit clusters generated from their identity to query sequences; and (5) cell hover tooltip with detailed hit information including hyperlinks to genomic position on NCBI. (b) Gene neighbourhood estimation (GNE) visualization: (1) Plot of mean and median hit cluster sizes (bp) at different gap sizes; (2) Plot of total clusters at different gap sizes; (3) Hover tooltip showing values of mean and median cluster size (bp) and total clusters at a given gap size. (c) Visualization of gene clusters, inclusive of intermediate genes (grey colour), identified in (a) generated using the clinker tool via the plot_clusters module in cblaster
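The quantities plotted in panel (b) can be sketched as follows: for each candidate gap size, genes on a scaffold are re-clustered and the mean cluster size, median cluster size and total number of clusters are recorded. This is an illustrative sketch under stated assumptions, not cblaster's GNE code. The function names `cluster_sizes` and `estimate_neighbourhood` are hypothetical, genes are represented as plain `(start, end)` tuples, and single genes are counted as clusters of their own, which may differ from how the tool counts them.

```python
from statistics import mean, median


def cluster_sizes(genes, gap):
    """Return the size in bp of each cluster formed from (start, end)
    gene coordinates on one scaffold, merging neighbours within `gap` bp."""
    sizes = []
    if not genes:
        return sizes
    genes = sorted(genes)
    start, end = genes[0]
    for g_start, g_end in genes[1:]:
        if g_start - end <= gap:
            end = max(end, g_end)     # extend the current cluster
        else:
            sizes.append(end - start)  # close the cluster and start anew
            start, end = g_start, g_end
    sizes.append(end - start)
    return sizes


def estimate_neighbourhood(scaffolds, gap_sizes):
    """For each gap size, summarise cluster sizes across all scaffolds."""
    rows = []
    for gap in gap_sizes:
        sizes = [s for genes in scaffolds for s in cluster_sizes(genes, gap)]
        rows.append({
            "gap": gap,
            "mean": mean(sizes) if sizes else 0,
            "median": median(sizes) if sizes else 0,
            "total": len(sizes),
        })
    return rows


example_scaffolds = [[(0, 1000), (1500, 2500), (10000, 11000)]]
gne_rows = estimate_neighbourhood(example_scaffolds, [0, 500, 10000])
```

In this toy input, increasing the gap size merges clusters, so the total number of clusters falls while the mean cluster size rises.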
Fig. 4.
Application of cblaster to case studies in bacteria, plants and fungi. (a) A representative subset of the cblaster output in Case Study 1, highlighting the evolutionary relationships between rebeccamycin biosynthetic proteins (Reb) and other natural products. Structural features are highlighted according to conserved biosynthetic proteins (halogenases RebH and RebF in red; proteins involved in chromopyrrolic acid biosynthesis, RebR, RebP, RebC, RebD and RebO, in green; glycosyltransferase, RebG, and methyltransferase, RebM, in yellow). The transporters RebT and RebU are shown in grey. (b) Genome neighbourhood estimation of plant triterpene BGCs. ‘Liminal’ region of plot shown in red, ‘stable’ region in green and upper limits in yellow. The accompanying total clusters plot (as in Figure 3b) is omitted from this figure, but follows the same pattern. (c) Using cblaster to piece together the burnettramic acids BGC in a fragmented A. burnettii genome
Fig. 5.
Time taken (s) to search query clusters from MIBiG containing differing numbers of genes against a database of Aspergillus genomes using MultiGeneBlast (blue circles) and cblaster (red triangles)
