2017 Jul 3;45(W1):W55-W63.
doi: 10.1093/nar/gkx305.

plantiSMASH: automated identification, annotation and expression analysis of plant biosynthetic gene clusters

Satria A Kautsar et al. Nucleic Acids Res.

Abstract

Plant specialized metabolites are chemically highly diverse, play key roles in host-microbe interactions, have important nutritional value in crops and are frequently applied as medicines. It has recently become clear that plant biosynthetic pathway-encoding genes are sometimes densely clustered in specific genomic loci: biosynthetic gene clusters (BGCs). Here, we introduce plantiSMASH, a versatile online analysis platform that automates the identification of candidate plant BGCs. Moreover, it allows integration of transcriptomic data to prioritize candidate BGCs based on the coexpression patterns of predicted biosynthetic enzyme-coding genes, and facilitates comparative genomic analysis to study the evolutionary conservation of each cluster. Applied to 48 high-quality plant genomes, plantiSMASH identifies a rich diversity of candidate plant BGCs. These results will guide further experimental exploration of the nature and dynamics of gene clustering in plant metabolism. Moreover, spurred by the continuing decrease in costs of plant genome sequencing, they will allow genome mining technologies to be applied to plant natural product discovery. The plantiSMASH web server, precalculated results and source code are freely available from http://plantismash.secondarymetabolites.org.

Figures

Figure 1.
General strategy followed by plantiSMASH for the identification of plant BGCs. First, plantiSMASH identifies biosynthetic genes (having a hit on one of the 62 pHMMs) that are located in close proximity to each other. Subsequently, it will look for the co-occurrence of at least three biosynthetic enzyme-coding genes, comprising at least two different enzyme types. (Based on the results of the CD-HIT clustering of encoded protein sequences, closely related duplicate genes will only be counted once). Afterward, identified clusters are extended to incorporate any flanking genes. Finally, each cluster is classified based on the presence of core enzymes (see Supplementary Table S1). In this example, the detected cluster is assigned to the ‘Terpene’ class due to the presence of a terpene synthase-encoding gene.
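The detection strategy in this caption can be illustrated with a minimal sketch. It is not the plantiSMASH implementation: the `Gene` record, the `MAX_GAP` and `FLANK` distances, the enzyme-type labels and the `CORE_ENZYME_CLASS` mapping are all hypothetical stand-ins (the real class rules are in Supplementary Table S1, which is not reproduced here). Only the rule of at least three biosynthetic genes of at least two enzyme types, with CD-HIT duplicates counted once, comes from the caption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Gene:
    name: str
    start: int                          # genomic start coordinate (bp)
    end: int                            # genomic end coordinate (bp)
    enzyme_type: Optional[str] = None   # enzyme type of the pHMM hit, None if no hit
    cdhit_group: Optional[str] = None   # CD-HIT group; close duplicates share one id


MAX_GAP = 50_000   # hypothetical: largest gap (bp) between neighbouring biosynthetic genes
FLANK = 10_000     # hypothetical: extension (bp) used to pull in flanking genes
CORE_ENZYME_CLASS = {"terpene_synthase": "Terpene"}  # hypothetical excerpt of the class rules


def find_candidate_clusters(genes, max_gap=MAX_GAP, flank=FLANK):
    ordered = sorted(genes, key=lambda g: g.start)
    hits = [g for g in ordered if g.enzyme_type is not None]

    # Step 1: chain biosynthetic genes that lie close to each other.
    runs, current = [], []
    for gene in hits:
        if current and gene.start - current[-1].end > max_gap:
            runs.append(current)
            current = []
        current.append(gene)
    if current:
        runs.append(current)

    clusters = []
    for run in runs:
        # Step 2: count closely related duplicates (same CD-HIT group) once,
        # then require >= 3 biosynthetic genes of >= 2 enzyme types.
        groups = {g.cdhit_group or g.name for g in run}
        types = {g.enzyme_type for g in run}
        if len(groups) < 3 or len(types) < 2:
            continue
        # Step 3: extend the locus to incorporate flanking genes.
        lo = run[0].start - flank
        hi = max(g.end for g in run) + flank
        clusters.append([g for g in ordered if g.end >= lo and g.start <= hi])
    return clusters


def classify(cluster):
    # Step 4: assign a class from the first core enzyme found.
    for gene in cluster:
        if gene.enzyme_type in CORE_ENZYME_CLASS:
            return CORE_ENZYME_CLASS[gene.enzyme_type]
    return "Unclassified"  # hypothetical fallback label
```

In this sketch, a locus holding a terpene synthase, two near-identical cytochrome P450s and an acyltransferase passes the filter (three CD-HIT groups, three enzyme types), whereas the same locus without the acyltransferase does not, because the duplicated P450s count only once.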
Figure 2.
Outputs generated by the plantiSMASH pipeline. The figure illustrates several visualized outputs generated by plantiSMASH, as they appear for various biosynthetic gene clusters of known natural products. (A) Visual overview generated for each gene cluster; in this case, the tirucalladienol cluster from Arabidopsis thaliana (47) is shown. Gene annotations and pHMM hit details appear on mouse click. ClusterBlast output is also provided, showing the alignment of homologous genomic loci across the genomes of related species. (B) Example of a gene expression heat map, showing coexpression among the core genes of the marneral BGC from A. thaliana (48) (and not with the flanking genes). (C) Hive plot on the overview page, which highlights pairs of candidate BGCs that show many coexpression correlations between their genes; in this example view, the coexpression links between the two loci encoding α-tomatine biosynthesis in Solanum lycopersicum (20) are highlighted (clusters 31 and 44). (D) Example ego network that summarizes coexpression correlations between members of the α-tomatine gene cluster (cluster 44), as well as with genes in other gene clusters (including the other α-tomatine biosynthetic locus, cluster 31) and with genes elsewhere on the genome.
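The coexpression views in panels B to D rest on pairwise correlations between gene expression profiles. The sketch below shows one way such pairs could be computed. It assumes Pearson correlation and a cutoff `MIN_R` of 0.8, neither of which is stated in this text, and the function names and the input layout (gene name mapped to a list of expression values across samples) are hypothetical.

```python
from itertools import combinations
from math import sqrt

MIN_R = 0.8  # hypothetical cutoff; the text does not state a threshold


def pearson(x, y):
    # Pearson correlation of two equally long expression profiles.
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat profile carries no correlation signal
    return cov / sqrt(var_x * var_y)


def coexpressed_pairs(expression, min_r=MIN_R):
    # expression: dict mapping gene name -> expression values across samples.
    pairs = []
    for a, b in combinations(sorted(expression), 2):
        r = pearson(expression[a], expression[b])
        if r >= min_r:
            pairs.append((a, b, r))
    return pairs
```

Applied to the genes of one candidate cluster, the returned pairs correspond to the coexpressed blocks of a heat map like panel B; applied across clusters, they give the kind of links drawn in panels C and D.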
Figure 3.
Numbers of candidate BGCs identified across the Plant Kingdom. (A) PlantiSMASH BGC predictions plotted onto a phylogenetic tree of plant species for which chromosome-level genome assemblies are available. The blue bars indicate the number of candidate BGCs per genome, the red bars indicate the most complex candidate BGC identified in each species (in terms of the number of unique enzymes encoded, as defined by CD-HIT groups). (B) Number of candidate BGCs plotted versus the total number of genes; as expected, more BGCs are found in larger genomes. Outliers represent genomes that have recently undergone whole-genome duplication, and the moss Physcomitrella patens, in the genome of which only a very low number of candidate BGCs is found. (C) Number of candidate BGCs plotted versus the number of genes with pHMM hits to biosynthetic domains. (D) Number of genes with biosynthetic domains plotted against the total number of genes; a linear correspondence is largely observed.
Figure 4.
Example candidate BGCs identified by plantiSMASH. Five example candidate BGCs are shown, which cover a diverse range of enzymatic classes. Dozens of candidate BGCs of comparable complexity can be found across the precomputed plantiSMASH results that are available online.

References

    1. Jensen P.R. Natural products and the gene cluster revolution. Trends Microbiol. 2016; 24:968–977.
    2. Medema M.H., Fischbach M.A. Computational approaches to natural product discovery. Nat. Chem. Biol. 2015; 11:639–648.
    3. Rutledge P.J., Challis G.L. Discovery of microbial natural products by activation of silent biosynthetic gene clusters. Nat. Rev. Microbiol. 2015; 13:509–523.
    4. Ziemert N., Alanjary M., Weber T. The evolution of genome mining in microbes - a review. Nat. Prod. Rep. 2016; 33:988–1005.
    5. Medema M.H., Blin K., Cimermancic P., de Jager V., Zakrzewski P., Fischbach M.A., Weber T., Takano E., Breitling R. antiSMASH: rapid identification, annotation and analysis of secondary metabolite biosynthesis gene clusters in bacterial and fungal genome sequences. Nucleic Acids Res. 2011; 39:W339–W346.