BMC Bioinformatics. 2023 May 3;24(1):181.
doi: 10.1186/s12859-023-05311-2.

CAGECAT: The CompArative GEne Cluster Analysis Toolbox for rapid search and visualisation of homologous gene clusters

Matthias van den Belt et al. BMC Bioinformatics.

Abstract

Background: Co-localized sets of genes that encode specialized functions are common across microbial genomes and occur in genomes of larger eukaryotes as well. Important examples include Biosynthetic Gene Clusters (BGCs) that produce specialized metabolites with medicinal, agricultural, and industrial value (e.g. antimicrobials). Comparative analysis of BGCs can aid in the discovery of novel metabolites by highlighting distribution and identifying variants in public genomes. Unfortunately, gene-cluster-level homology detection remains inaccessible, time-consuming and difficult to interpret.

Results: The comparative gene cluster analysis toolbox (CAGECAT) is a rapid and user-friendly platform to mitigate difficulties in comparative analysis of whole gene clusters. The software provides homology searches and downstream analyses without the need for command-line or programming expertise. By leveraging remote BLAST databases, which always provide up-to-date results, CAGECAT can yield relevant matches that aid in the comparison, taxonomic distribution, or evolution of an unknown query. The service is extensible and interoperable and implements the cblaster and clinker pipelines to perform homology search, filtering, gene neighbourhood estimation, and dynamic visualisation of resulting variant BGCs. With the visualisation module, publication-quality figures can be customized directly from a web-browser, which greatly accelerates their interpretation via informative overlays to identify conserved genes in a BGC query.

Conclusion: Overall, CAGECAT is extensible software that can be accessed via a standard web browser for whole-region homology searches and comparison against continually updated genomes from NCBI. The public web server and installable Docker image are open source and freely available without registration at: https://cagecat.bioinformatics.nl

Keywords: Biosynthetic; Colocalized; Comparative analysis; Gene cluster; Homology search; Secondary metabolite.

Conflict of interest statement

MHM is a co-founder of Design Pharmaceuticals and a member of the scientific advisory board of Hexagon Bio. All other authors have no conflict of interest.

Figures

Fig. 1
Interoperability scheme of the functionality implemented in CAGECAT. Blue outlined rectangles indicate entry points. Arrows indicate the downstream analyses available from a module. Currently, a cblaster search/recompute job can be used for every downstream module, except that a recompute job cannot itself be recomputed. The clinker tool has no downstream analyses. For example, a possible workflow could be: cblaster search to cblaster recompute to cblaster plot clusters to selective clinker visualisation. This allows for fine-grained control of relevant matches for final visualisation and greatly reduces user processing time.
Fig. 2
Example output of CAGECAT's entry points. Both modules create an interactive HTML visualisation, which is displayed on each output page. (A) cblaster search: hit clusters are shown in a dendrogram (based on identity to query sequences). A darker tint of blue represents a higher percentage identity of the query in the output cluster. (B) clinker visualisation: genes within a gene cluster are color-coordinated. Similar genes found in multiple clusters have links drawn between them, and the links are shaded based on sequence identity.
Fig. 3
Post-job execution screen for selective downstream analysis. 1: buttons to download results and save the current webpage to the browser's bookmarks; 2: available downstream analyses for the current analysis. Selected clusters and/or queries are temporarily saved when navigating to a downstream module; 3: manual selection of clusters for downstream analyses. Clusters/queries can be selected by moving them to the selected field using the buttons shown. Available for the cblaster search, recompute and plot clusters modules.
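
The module chaining described in the Fig. 1 caption can be pictured as a small directed graph. The sketch below is only an illustration of that idea, not CAGECAT's implementation: the module labels, the `DOWNSTREAM` table and the `is_valid_workflow` helper are hypothetical names, and the table covers only the modules named in the caption. The edge from plot clusters to clinker is inferred from the caption's example workflow.

```python
# Hypothetical model of the module graph from the Fig. 1 caption.
# Labels are illustrative; they are not CAGECAT or cblaster identifiers.
DOWNSTREAM = {
    # a search job can feed every downstream module named in the caption
    "cblaster_search": {"cblaster_recompute", "cblaster_plot_clusters", "clinker"},
    # a recompute job cannot be recomputed again
    "cblaster_recompute": {"cblaster_plot_clusters", "clinker"},
    # assumption: inferred from the caption's example workflow
    "cblaster_plot_clusters": {"clinker"},
    # clinker has no downstream analyses
    "clinker": set(),
}


def is_valid_workflow(steps):
    """Return True if every step is an allowed downstream of the one before it."""
    if not steps or steps[0] not in DOWNSTREAM:
        return False
    return all(nxt in DOWNSTREAM.get(cur, set()) for cur, nxt in zip(steps, steps[1:]))


example = [
    "cblaster_search",
    "cblaster_recompute",
    "cblaster_plot_clusters",
    "clinker",
]
print(is_valid_workflow(example))
```

Under these assumptions the caption's example workflow is accepted, while a chain that recomputes a recompute job, or that continues after clinker, is rejected.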
