J Nat Prod. 2021 Nov 26;84(11):2795-2807.
doi: 10.1021/acs.jnatprod.1c00399. Epub 2021 Oct 18.

NPClassifier: A Deep Neural Network-Based Structural Classification Tool for Natural Products

Hyun Woo Kim et al. J Nat Prod.

Abstract

Computational approaches such as genome and metabolome mining are becoming essential to natural products (NPs) research. Consequently, a need exists for an automated structure-type classification system to handle the massive amounts of data appearing for NP structures. An ideal semantic ontology for the classification of NPs should go beyond the simple presence/absence of chemical substructures and also include the taxonomy of the producing organism, the nature of the biosynthetic pathway, and/or the biological properties of the compounds. Thus, a holistic and automatic NP classification framework could have considerable value for comprehensively navigating the relatedness of NPs, especially when analyzing large numbers of NPs. Here, we introduce NPClassifier, a deep-learning tool for the automated structural classification of NPs from their counted Morgan fingerprints. NPClassifier is expected to accelerate and enhance NP discovery by linking NP structures to their underlying properties.

Conflict of interest statement

The authors declare the following competing financial interest(s): Garrison W. Cottrell and William H. Gerwick are the cofounders of NMR Finder LLC. Mingxun Wang is the founder of Ometa Laboratories LLC.

Figures

Figure 1
Structures of typical (cedrone) and highly modified (cipadonoid B and quivisianone) limonoids.
Figure 2
Overview of NPClassifier. (A) In the data preparation stage, compound names and their class information were collected from the literature. The compound names were converted to chemical fingerprints, and class information was assigned based on the NPClassifier ontology. During the training phase, molecular fingerprints were input to a deep neural network. Binary cross-entropy loss was calculated by comparison between the prediction result from the sigmoid outputs and the ground truth and back-propagated to adjust the model parameters. In classification, a submitted chemical structure is classified by NPClassifier at three levels, including Pathway, Superclass, and Class. (B) Classification result of a highly modified limonoid, cipadonoid B, by NPClassifier and ClassyFire. NPClassifier returns the classification result with three category levels including Pathway, Superclass, and Class, which are based on the semantic knowledge of natural product research.
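The training step in the Figure 2 caption (sigmoid outputs compared with the ground truth through a binary cross-entropy loss) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the logits, and the labels below are made up for illustration, and only the Python standard library is used. Because each sigmoid output is scored independently, a setup like this can in principle assign more than one label to a compound.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function mapping a raw score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def binary_cross_entropy(logits, targets, eps=1e-12):
    """Mean binary cross-entropy over independent sigmoid outputs.

    Each output unit is scored on its own, so several labels can be
    true for the same input.
    """
    total = 0.0
    for z, y in zip(logits, targets):
        # Clamp the probability so that log(0) can never occur.
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(logits)


# Hypothetical raw scores for three labels; the first and third are true.
logits = [2.0, -1.0, 0.5]
targets = [1, 0, 1]
loss = binary_cross_entropy(logits, targets)
```

In an actual network, the gradient of this loss with respect to the weights would be back-propagated to update the parameters; the sketch covers only the loss calculation.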
Figure 3
Example of the classification ontology of NPClassifier. (A) Amino acids–peptides Pathway and its Superclasses and Classes in the NPClassifier classification system. This Pathway contains 12 Superclasses and 51 Classes. (B) The macrolides Superclass is involved in both polyketides and amino acids–peptides Pathways. (C) The peptide alkaloids Superclass and its Classes belong to both alkaloids and amino acids–peptides Pathways.
Figure 4
Chemical descriptor and the deep learning architecture of NPClassifier. (A) Illustration of the difference between Morgan fingerprints and counted Morgan fingerprints; the latter was used in this application. Morgan fingerprints are generally presented in a binary data format over all radii. Alternatively, the counted Morgan fingerprints have an integer format reflecting the count of atomic substructures. (B) Illustration of the structure of the neural network used for NPClassifier. Three different networks were trained: one for each level of classification in NPClassifier. The same structure was used for all three networks with just the top layers differing as a result of the number of alternatives for each level, as indicated in the legend.
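The difference between binary and counted fingerprints described in the Figure 4 caption can be shown with a toy sketch. It assumes the substructure identifiers have already been computed; the integers below are invented, whereas real Morgan identifiers come from hashing atom environments with a cheminformatics toolkit. The helper `fold` and the vector length of 16 are hypothetical and chosen only for readability.

```python
from collections import Counter


def fold(identifiers, n_bits=16, counted=False):
    """Fold substructure identifiers into a fixed-length vector.

    counted=False records presence only (0 or 1 per position);
    counted=True records how many times each position was hit.
    """
    hits = Counter(i % n_bits for i in identifiers)
    if counted:
        return [hits.get(pos, 0) for pos in range(n_bits)]
    return [1 if pos in hits else 0 for pos in range(n_bits)]


# Hypothetical identifiers: substructure 3 occurs three times,
# substructure 7 once, and substructure 12 twice.
ids = [3, 3, 3, 7, 12, 12]
binary_fp = fold(ids)
counted_fp = fold(ids, counted=True)
```

Here `binary_fp` has a 1 at positions 3, 7, and 12, while `counted_fp` holds 3, 1, and 2 at those positions, so the counted form keeps the information about repeated substructures that the binary form discards.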
Figure 5
Comparison of the classification results from NPClassifier (blue) and ClassyFire (orange); overlap is shown in brown. Chemical entities (n = 6200, 100 chemical entities for each of 62 classes) were analyzed by NPClassifier and ClassyFire, and the classification accuracy was measured. Classes are numbered around the circumference of the circle, while the ratio of correct predictions to total predictions ranging from 0 to 100 is denoted by the scale across the radius. NPClassifier showed better results for 47 classes and equal or slightly worse results for 15 classes compared with ClassyFire.
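The per-class measure plotted in Figure 5, the ratio of correct predictions to total predictions on a 0 to 100 scale, can be computed along the following lines. This is a sketch under assumptions: the function name and the toy records are invented here and are not data from the study.

```python
from collections import defaultdict


def per_class_accuracy(records):
    """Return {class: percent correct} from (true_class, is_correct) pairs."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_class, is_correct in records:
        total[true_class] += 1
        if is_correct:
            correct[true_class] += 1
    return {c: 100.0 * correct[c] / total[c] for c in total}


# Made-up records: three entities of one class, one entity of another.
records = [
    ("flavones", True),
    ("flavones", True),
    ("flavones", False),
    ("limonoids", True),
]
acc = per_class_accuracy(records)
```

Running the same function on the outputs of two classifiers over the same entities would give the two per-class series that a comparison such as Figure 5 places side by side.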
Figure 6
Examples of the correlations between structural modifications and classification results. (A) Ester bonds of a cyclic depsipeptide were sequentially replaced with amide bonds, and the classification result changed from cyclic peptide and depsipeptides to cyclic peptides. (B) Correlations between the modification of the C-ring substituents in flavonoids and the resulting classifications.
Figure 7
Incorrectly classified structures and five categories with low F1 scores in the test set.
Figure 8
Application of NPClassifier to natural products research and drug discovery. (A) NPClassifier analysis of the diversity of metabolites and BGCs from bacteria and fungi (see text for more details). (B) Distribution of PKS-derived metabolites from bacteria and fungi. (C) The results of in silico antimalarial screening of NP Atlas using the MAIP tool (upper) and the analysis of these results using NPClassifier (lower). The level of predicted antimalarial activity is colored red for active and blue for inactive. (D) Spirotetronate macrolides with high (decalin containing) and low (non-decalin containing) MAIP scores present in the NP Atlas database.
