J Am Chem Soc. 2020 Mar 4;142(9):4114-4120.
doi: 10.1021/jacs.9b13786. Epub 2020 Feb 21.

A Convolutional Neural Network-Based Approach for the Rapid Annotation of Molecularly Diverse Natural Products

Raphael Reher et al. J Am Chem Soc.

Abstract

This report describes the first application of the novel NMR-based machine learning tool "Small Molecule Accurate Recognition Technology" (SMART 2.0) to mixture analysis and the subsequent accelerated discovery and characterization of new natural products. The concept was applied to the extract of a filamentous marine cyanobacterium known to be a prolific producer of cytotoxic natural products. This environmental Symploca extract was coarsely fractionated, and the fractions were then prioritized using cancer cell cytotoxicity, NMR-based SMART 2.0, and MS2-based molecular networking to guide isolation. This led to the isolation and rapid identification of a new chimeric swinholide-like macrolide, symplocolide A, as well as the annotation of swinholide A, samholides A-I, and several new derivatives. The planar structure of symplocolide A was confirmed by 1D/2D NMR and LC-MS2 analysis to be a structural hybrid between swinholide A and luminaolide B. A second example applies SMART 2.0 to the characterization of structurally novel cyclic peptides and compares this approach to the recently reported "atomic sort" method. This study exemplifies the potential of combining traditional and deep learning-assisted analytical approaches to overcome longstanding challenges in natural products drug discovery.

Conflict of interest statement

The authors declare no conflict of interest. Chen Zhang, Garrison W. Cottrell, and William H. Gerwick are the cofounders of NMR Finder LLC. Mingxun Wang is the founder of Ometa Labs LLC.

Figures

Figure 1.
(a,h) Cytotoxicity assay against H460 lung cancer cells reveals VLC fraction H (a) and subfractions H4–H6 (h), respectively, as the most potent at 1 and 10 μg/mL (see also Figure S3). (b) Digitized HSQC spectrum of the most cytotoxic fraction, H. (c) t-SNE embedding of the HSQC spectrum into the 180-dimensional cluster space of 53,076 nodes (reduced here to 10,000 for clarity) representing natural products trained on the basis of their 1H-13C HSQC spectra. Swinholide A and its closest neighbors are highlighted (purple nodes, black rectangle). (d) SMART 2.0 results (top 12 structures based on cosine similarity score) for fraction H suggest that it contains macrolides from the swinholide class. (e) Molecular networking analysis of cyanobacterial subfractions H3–H7. The highlighted cluster contains a putatively new m/z feature at 1395.9 (H4, green nodes). (f) MS2 analysis of swinholide A and comparison to (g) the new feature detected at m/z 1395.9 (symplocolide A) show fragmentation patterns that reveal structural insights for both compounds (shared fragment highlighted in green, distinctive fragment highlighted in blue).
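The ranking step mentioned in panel (d), in which candidate structures are ordered by cosine similarity to the query spectrum's embedding, can be illustrated with a minimal sketch. This is not the SMART 2.0 implementation. The function names, the toy three-dimensional vectors, and the compound labels are all hypothetical, and the convention of returning 0.0 for an all-zero vector is an assumption made here for convenience.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumed convention for an all-zero vector
    return dot / (norm_u * norm_v)


def top_k_matches(
    query: Sequence[float],
    library: Dict[str, Sequence[float]],
    k: int = 12,
) -> List[Tuple[str, float]]:
    """Return the k library entries most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, emb)) for name, emb in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional embeddings with made-up values (hypothetical, for illustration only).
toy_library = {
    "compound_A": [1.0, 0.0, 0.0],
    "compound_B": [0.9, 0.1, 0.0],
    "compound_C": [0.0, 1.0, 0.0],
    "compound_D": [-1.0, 0.0, 0.0],
}
toy_query = [1.0, 0.05, 0.0]

ranking = top_k_matches(toy_query, toy_library, k=2)
for name, score in ranking:
    print(f"{name}: {score:.4f}")
```

With these toy vectors the two nearest entries are compound_A and compound_B, in that order. In a setting like the one the caption describes, the vectors would be 180-dimensional and k would be 12.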
Figure 2.
Comparison of the ‘atomic sort’ method and SMART 2.0 to detect novel/rare structural elements. (a) HSQC correlations suggesting structural novelty by ‘atomic sort’ method (highlighted in blue). (b) Top two results of SMART 2.0 analysis querying cyclomarin A (experimental HSQC data from reference) to detect related compounds that include rare structural moieties (highlighted in red, purple).
Scheme 1.
Structures of symplocolide A (1), luminaolide (2), luminaolide B (3), swinholide A (4), and samholides A-I (5–13). All compounds except 2 and 3 were detected in this study.