Review
Nat Prod Rep. 2021 Nov 17;38(11):2066-2082.
doi: 10.1039/d1np00040c.

Advancements in capturing and mining mass spectrometry data are transforming natural products research

Scott A Jarmusch et al. Nat Prod Rep.

Abstract

Covering: 2016 up to 2021

Mass spectrometry (MS) is an essential technology in natural products research with MS fragmentation (MS/MS) approaches becoming a key tool. Recent advancements in MS yield dense metabolomics datasets which have been, conventionally, used by individual labs for individual projects; however, a shift is brewing. The movement towards open MS data (and other structural characterization data) and accessible data mining tools is emerging in natural products research. Over the past 5 years, this movement has rapidly expanded and evolved with no slowdown in sight; the capabilities of today vastly exceed those of 5 years ago. Herein, we address the analysis of individual datasets, a situation we are calling the '2021 status quo', and the emergent framework to systematically capture sample information (metadata) and perform repository-scale analyses. We evaluate public data deposition, discuss the challenges of working in the repository scale, highlight the challenges of metadata capture and provide illustrative examples of the power of utilizing repository data and the tools that enable it. We conclude that the advancements in MS data collection must be met with advancements in how we utilize data; therefore, we argue that open data and data mining is the next evolution in obtaining the maximum potential in natural products research.

Figures

Fig. 1.
GNPS molecular networking connected deoxyphorbol ester derivatives. Bioactivity scores for inhibition of chikungunya viral replication are plotted (larger nodes indicate a greater score). Node coloration indicates relative abundances in different fractions. Additional information is available in the manuscript. Reused with permission (https://pubs.acs.org/doi/10.1021/acs.jnatprod.7b00737); permissions related to the use of this material should be directed to the American Chemical Society.
Fig. 2.
(Left) Example knowledge and data repositories which can be searched using text or data as well as being used to annotate chemicals for dereplication. (Right) Illustrative MS data repositories categorized by their primary type of MS data stored. Data and metadata deposition is increasing; however, the reuse of repository data is less common.
Fig. 3.
(a) Left: Two-dimensional emperor plots displaying the principal component analysis (based on MS/MS data) of files in ReDU (n = 40,919) colored by SampleType. Middle: Highlighting via filtering of bacterial files in ReDU, n = 2,246 files. Right: Highlighting specific bacterial files based on taxonomy in ReDU, with gut-associated bacteria in dark grey and all other bacterial files in light grey. (b) Left: Illustration displaying the gut-associated bacterial genera selected for Group Comparator analysis (created with BioRender.com). Right: Dot plots displaying the Group Comparator results (percentage of files in which an annotated spectrum was observed) for three of the most abundant diketopiperazines: cyclo(Pro-Leu), cyclo(Leu-4-hydroxy-Pro) and cyclo(Phe-Leu).
Fig. 4.
(a) Repository-scale molecular networking via ReDU with Euphorbia spp., with highlighted molecular families 1–3 containing milliamines, terracinolides and diterpenes, respectively. An illustrative connection between Milliamine M and a putative Milliamine analog differing in mass by ΔCH2 via molecular networking is displayed. (b) Number of nodes in the repository-scale molecular network observed as occurring from samples native to the continent.
