Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2023 Dec;49(11-12):681-695.
doi: 10.1007/s10886-023-01452-z. Epub 2023 Oct 2.

A Multi-Label Learning Framework for Predicting Chemical Classes and Biological Activities of Natural Products from Biosynthetic Gene Clusters

Affiliations

A Multi-Label Learning Framework for Predicting Chemical Classes and Biological Activities of Natural Products from Biosynthetic Gene Clusters

Suyu Mei. J Chem Ecol. 2023 Dec.

Abstract

Natural products (NP) or secondary metabolites, as a class of small chemical molecules that are naturally synthesized by chromosomally clustered biosynthesis genes (also called biosynthetic gene clusters, BGCs) encoded enzymes or enzyme complexes, mediates the bioecological interactions between host and microbiota and provides a natural reservoir for screening drug-like therapeutic pharmaceuticals. In this work, we propose a multi-label learning framework to functionally annotate natural products or secondary metabolites solely from their catalytical biosynthetic gene clusters without experimentally conducting NP structural resolutions. All chemical classes and bioactivities constitute the label space, and the sequence domains of biosynthetic gene clusters that catalyse the biosynthesis of natural products constitute the feature space. In this multi-label learning framework, a joint representation of features (BGCs domains) and labels (natural products annotations) is efficiently learnt in an integral and low-dimensional space to accurately define the inter-class boundaries and scale to the learning problem of many imbalanced labels. Computational results on experimental data show that the proposed framework achieves satisfactory multi-label learning performance, and the learnt patterns of BGCs domains are transferrable across bacteria, or even across kingdom, for instance, from bacteria to Arabidopsis thaliana. Lastly, take Arabidopsis thaliana and its rhizosphere microbiome for example, we propose a pipeline combining existing BGCs identification tools and this proposed framework to find and functionally annotate novel natural products for downstream bioecological studies in terms of plant-microbiota-soil interactions and plant environmental adaption.

Keywords: Biosynthetic gene clusters; Machine learning; Multi-label learning; Natural products; Plant and soil microbiome; Synthetic biology; Transfer learning.

PubMed Disclaimer

Similar articles

Cited by

References

    1. Aghdam SA, Brown AMV (2021) Deep learning approaches for natural product discovery from plant endophytic microbiomes. Environ Microbiome 16:6 - PubMed - PMC - DOI
    1. Alam K, Hao J, Zhang Y, Li A (2021) Synthetic biology-inspired strategies and tools for engineering of microbial natural product biosynthetic pathways. Biotechnol Adv 49:07759 - DOI
    1. Atanasov AG, Zotchev SB, Dirsch VM (2021) Natural products in drug discovery: advances and opportunities. Nat Rev Drug Discov 28:1–17
    1. Begani J, Lakhani J, Harwani D (2018) Current strategies to induce secondary metabolites from microbial biosynthetic cryptic gene clusters. Annals Microbiol 68:419–432 - DOI
    1. Blin K, Medema MH, Kazempour D, Fischbach MA, Breitling R, Takano E, Weber T (2013) antiSMASH 2.0–a versatile platform for genome mining of secondary metabolite producers. Nucleic Acids Res 41(Web Server issue):W204-12 - PubMed - PMC - DOI

Substances

LinkOut - more resources