J Nat Prod. 2021 Nov 26;84(11):2795-2807.

doi: 10.1021/acs.jnatprod.1c00399. Epub 2021 Oct 18.

NPClassifier: A Deep Neural Network-Based Structural Classification Tool for Natural Products

Affiliations

¹ Center for Marine Biotechnology and Biomedicine, Scripps Institution of Oceanography, University of California San Diego, La Jolla, California 92093, United States.
² Skaggs School of Pharmacy and Pharmaceutical Sciences, University of California San Diego, La Jolla, California 92093, United States.
³ Ometa Laboratories LLC, San Diego, California 92121, United States.
⁴ Institute of Pharmacy Martin-Luther-University Halle-Wittenberg, Universitätsplatz 10, 06108 Halle (Saale), Germany.
⁵ Research Institute of Pharmaceutical Sciences, College of Pharmacy, Sookmyung Women's University, Seoul 04310, Korea.
⁶ Bioinformatics Group, Wageningen University, Wageningen 6700, The Netherlands.
⁷ Department of Computer Science and Engineering, University of California, San Diego, La Jolla, California 92093, United States.

PMID: 34662515
PMCID: PMC8631337
DOI: 10.1021/acs.jnatprod.1c00399

Hyun Woo Kim et al. J Nat Prod. 2021.

Abstract

Computational approaches such as genome and metabolome mining are becoming essential to natural products (NPs) research. Consequently, a need exists for an automated structure-type classification system to handle the massive amounts of data appearing for NP structures. An ideal semantic ontology for the classification of NPs should go beyond the simple presence/absence of chemical substructures and also include the taxonomy of the producing organism, the nature of the biosynthetic pathway, and/or their biological properties. Thus, a holistic and automatic NP classification framework could have considerable value for comprehensively navigating the relatedness of NPs, especially when analyzing large numbers of NPs. Here, we introduce NPClassifier, a deep-learning tool for the automated structural classification of NPs from their counted Morgan fingerprints. NPClassifier is expected to accelerate and enhance NP discovery by linking NP structures to their underlying properties.

Conflict of interest statement

The authors declare the following competing financial interest(s): Garrison W. Cottrell and William H. Gerwick are cofounders of NMR Finder LLC. Mingxun Wang is the founder of Ometa Laboratories LLC.

Figures

**Figure 1**
Structures of typical (cedrone) and highly modified (cipadonoid B and quivisianone) limonoids.

**Figure 2**
Overview of NPClassifier. (A) In the data preparation stage, compound names and their class information were collected from the literature. The compound names were converted to chemical fingerprints, and class information was assigned based on the NPClassifier ontology. During the training phase, molecular fingerprints were input to a deep neural network. Binary cross-entropy loss was calculated by comparison between the prediction result from the sigmoid outputs and the ground truth and back-propagated to adjust the model parameters. In classification, a submitted chemical structure is classified by NPClassifier at three levels, including Pathway, Superclass, and Class. (B) Classification result of a highly modified limonoid, cipadonoid B, by NPClassifier and ClassyFire. NPClassifier returns the classification result with three category levels including Pathway, Superclass, and Class, which are based on the semantic knowledge of natural product research.
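
The training loss described in panel A can be illustrated with a small pure-Python sketch. It is not the authors' code: the function names, the logits, and the target vector are hypothetical, and the loss is averaged over output units as one common convention. Each sigmoid output is scored independently against its ground-truth label, which is what allows a single compound to carry more than one label at a given level.

```python
import math

def sigmoid(z):
    """Map a raw network output (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def binary_cross_entropy(logits, targets, eps=1e-12):
    """Mean binary cross-entropy over independent sigmoid outputs.

    logits  : raw scores, one per candidate label (hypothetical values)
    targets : 0/1 ground-truth indicators, one per candidate label
    """
    total = 0.0
    for z, y in zip(logits, targets):
        # Clamp the probability so log() never receives exactly 0.
        p = min(max(sigmoid(z), eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(logits)

# Toy example: three candidate labels, only the first is correct.
logits = [2.0, -1.0, 0.0]
targets = [1, 0, 0]
loss = binary_cross_entropy(logits, targets)
print(round(loss, 4))
```

In a real training loop this scalar would be back-propagated through the network by a deep-learning framework; the sketch only shows how the loss value itself is formed.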

**Figure 3**
Example of the classification ontology of NPClassifier. (A) Amino acids–peptides Pathway and its Superclasses and Classes in the NPClassifier classification system. This Pathway contains 12 Superclasses and 51 Classes. (B) The macrolides Superclass is involved in both polyketides and amino acids–peptides Pathways. (C) The peptide alkaloids Superclass and its Classes belong to both alkaloids and amino acids–peptides Pathways.

**Figure 4**
Chemical descriptor and the deep learning architecture of NPClassifier. (A) Illustration of the difference between Morgan fingerprints and counted Morgan fingerprints; the latter was used in this application. Morgan fingerprints are generally presented in a binary data format over all radii. Alternatively, the counted Morgan fingerprints have an integer format reflecting the count of atomic substructures. (B) Illustration of the structure of the neural network used for NPClassifier. Three different networks were trained: one for each level of classification in NPClassifier. The same structure was used for all three networks with just the top layers differing as a result of the number of alternatives for each level, as indicated in the legend.
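
The distinction drawn in panel A can be shown with a toy folding routine. This is a minimal sketch under stated assumptions, not the fingerprinting code used for NPClassifier: the substructure identifiers are made-up integers standing in for the hashed atom environments a Morgan algorithm would produce, and `fold_binary`, `fold_counted`, and the vector length of 8 are hypothetical choices made for readability.

```python
from collections import Counter

def fold_binary(substructure_ids, n_bits=8):
    """Binary fingerprint: a position is 1 if any substructure maps to it."""
    vec = [0] * n_bits
    for sid in substructure_ids:
        vec[sid % n_bits] = 1
    return vec

def fold_counted(substructure_ids, n_bits=8):
    """Counted fingerprint: a position holds how many substructures map to it."""
    vec = [0] * n_bits
    for sid, count in Counter(substructure_ids).items():
        vec[sid % n_bits] += count
    return vec

# Hypothetical identifiers: substructure 3 occurs three times in the molecule.
ids = [3, 3, 3, 10, 21]
binary_fp = fold_binary(ids)
counted_fp = fold_counted(ids)
print(binary_fp)   # [0, 0, 1, 1, 0, 1, 0, 0]
print(counted_fp)  # [0, 0, 1, 3, 0, 1, 0, 0]
```

The binary vector records only that substructure 3 is present, whereas the counted vector preserves that it occurs three times, which is the extra information the integer format carries.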

**Figure 5**
Comparison of the classification results from NPClassifier (blue) and ClassyFire (orange); overlap is shown in brown. Chemical entities (n = 6200, 100 chemical entities for each of 62 classes) were analyzed by NPClassifier and ClassyFire, and the classification accuracy was measured. Classes are numbered around the circumference of the circle, while the ratio of correct predictions to total predictions ranging from 0 to 100 is denoted by the scale across the radius. NPClassifier showed better results for 47 classes and equal or slightly worse results for 15 classes compared with ClassyFire.
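
The per-class comparison summarized in this figure can be sketched as follows. The records are made-up toy data and `per_class_accuracy` is a hypothetical helper; nothing here reproduces the actual benchmark, its class labels, or either tool's output format.

```python
def per_class_accuracy(records):
    """Percentage of correct predictions for each true class.

    records: iterable of (true_class, predicted_class) pairs.
    """
    totals, correct = {}, {}
    for true_cls, pred_cls in records:
        totals[true_cls] = totals.get(true_cls, 0) + 1
        if pred_cls == true_cls:
            correct[true_cls] = correct.get(true_cls, 0) + 1
    return {c: 100.0 * correct.get(c, 0) / n for c, n in totals.items()}

# Made-up predictions from two hypothetical tools on the same four compounds.
tool_a = [("flavones", "flavones"), ("flavones", "flavones"),
          ("limonoids", "limonoids"), ("limonoids", "flavones")]
tool_b = [("flavones", "flavones"), ("flavones", "chalcones"),
          ("limonoids", "limonoids"), ("limonoids", "flavones")]

acc_a = per_class_accuracy(tool_a)
acc_b = per_class_accuracy(tool_b)

# Split classes into those where tool A is strictly better and the rest.
better = sorted(c for c in acc_a if acc_a[c] > acc_b[c])
equal_or_worse = sorted(c for c in acc_a if acc_a[c] <= acc_b[c])
print(acc_a, acc_b, better, equal_or_worse)
```

Applied to 62 classes of 100 entities each, the same tally would yield the two groups of classes reported in the caption.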

**Figure 6**
Examples of the correlations between structural modifications and classification results. (A) Ester bonds of a cyclic depsipeptide were sequentially replaced with amide bonds, and the classification result changed from cyclic peptide and depsipeptides to cyclic peptides. (B) Correlations between the modification of the C-ring substituents in flavonoids and the resulting classifications.

**Figure 7**
Incorrectly classified structures and five categories with low F1 scores in the test set.
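
For readers unfamiliar with the metric, the F1 score of a category is the harmonic mean of precision and recall. The sketch below uses the standard definition with made-up counts; `f1_score` is a hypothetical helper, and the numbers are not taken from the test set.

```python
def f1_score(tp, fp, fn):
    """F1 from true-positive, false-positive, and false-negative counts."""
    if tp == 0:
        return 0.0  # no correct predictions: precision and recall are both 0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up counts for one category: 8 correct, 2 spurious, 4 missed.
score = f1_score(tp=8, fp=2, fn=4)
print(round(score, 3))
```

A category scores low when the model either assigns its label too freely (low precision) or misses many true members (low recall).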

**Figure 8**
Application of NPClassifier to natural products research and drug discovery. (A) NPClassifier analysis of the diversity of metabolites and BGCs from bacteria and fungi (see text for more details). (B) Distribution of PKS-derived metabolites from bacteria and fungi. (C) The results of *in silico* antimalarial screening of NP Atlas using the MAIP tool (upper) and the analysis of these results using NPClassifier (lower). The level of predicted antimalarial activity is colored red for active and blue for inactive. (D) Spirotetronate macrolides with high (decalin containing) and low (non-decalin containing) MAIP scores present in the NP Atlas database.

Grants and funding

R01 GM107550/GM/NIGMS NIH HHS/United States
