Gigascience. 2022 Dec 28:12:giac124.
doi: 10.1093/gigascience/giac124. Epub 2023 Jan 18.

Open and reusable annotated mass spectrometry dataset of a chemodiverse collection of 1,600 plant extracts

Pierre-Marie Allard et al. Gigascience.

Abstract

As privileged structures, natural products often display potent biological activities. However, the discovery of novel bioactive scaffolds is often hampered by the chemical complexity of the biological matrices they are found in. Large natural extract collections are thus extremely valuable for their chemical novelty potential but also complicated to exploit in the frame of drug-discovery projects. In the end, it is the pure chemical substances that are desired for structural determination purposes and bioactivity evaluation. Researchers interested in the exploration of large and chemodiverse extract collections should thus establish strategies aiming to efficiently tackle such chemical complexity and access these structures. Establishing carefully crafted digital layers documenting the spectral and chemical complexity as well as bioactivity results of natural extract collections can help prioritize time-consuming but mandatory isolation efforts. In this note, we report the results of our initial exploration of a collection of 1,600 plant extracts in the frame of a drug-discovery effort. After describing the taxonomic coverage of this collection, we present the results of its liquid chromatography high-resolution mass spectrometric profiling and the exploitation of these profiles using computational solutions. The resulting annotated mass spectral dataset and associated chemical and taxonomic metadata are made available to the community, and data reuse cases are proposed. We are currently continuing our exploration of this plant extract collection for drug-discovery purposes (notably looking for novel antitrypanosomatid, anti-infective, and prometabolic compounds) and ecometabolomics insights. We believe that such a dataset can be exploited and reused by researchers interested in computational natural products exploration.

Keywords: LC-MS; biodiversity digitization; chemodiversity; drug discovery; mass spectrometry; metabolomics; natural products; open science; plant extracts collection.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1:
Taxonomical coverage of the profiled collection (1,600 extracts). On the left, the barplot represents the overall coverage at main taxa level and up to the phylum Streptophyta. On the right, the taxonomical coverage is represented using a taxonomical tree of all families within the Streptophyta phylum. The families present within the current collection are highlighted in red. The Open Tree of Life (ott3.3) was used for taxonomy resolving.
Figure 2:
Spectral diversity of the profiled plant collection (1,600 extracts). The TMAP approach is employed to display the >100,000 spectra resulting from the alignment of the 1,600 samples' untargeted MS/MS profiles. In this TMAP, each dot represents a feature's spectrum, and dots are linked together according to their similarity. In A, blue dots (36% of the total number of spectra) correspond to annotated spectra while gray dots (64%) correspond to unannotated spectra. In B, dots are colored according to the botanical family of the sample where the highest MS1 peak area for the corresponding feature was recorded. In C and D, dots are colored according to the NPClassifier superclass and class, respectively, of their annotation (for annotated dots). In B, it is possible to spot spectral regions (1, 2, and 3) specific to given botanical families. Region 1 is specific to the Meliaceae family, and its spectra are mainly annotated as limonoid derivatives; region 2 is specific to the Annonaceae family, and its spectra are mainly annotated as acetogenin derivatives; and region 3 is specific to the Apocynaceae family, and its spectra are annotated as tryptophan alkaloid derivatives. Note that the structural annotation results are reweighted according to the taxonomical proximity of the biological source of the candidate structure and the biological source of the annotated spectra. A bias favoring taxa-specific structures can thus be observed. This interactive structural TMAP can be browsed online [34].
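The coloring rule used in panel B (each feature takes the botanical family of the sample in which its highest MS1 peak area was recorded) can be illustrated with a minimal sketch. This is not the authors' code: the function name `family_of_max_peak`, the dictionary layout, and the toy sample and feature identifiers are all assumptions made for illustration; only the family names come from the caption.

```python
def family_of_max_peak(peak_areas, sample_family):
    """Assign each feature the family of the sample with its largest MS1 peak area.

    peak_areas: {feature_id: {sample_id: MS1 peak area}}  (hypothetical layout)
    sample_family: {sample_id: botanical family}           (hypothetical layout)
    """
    colors = {}
    for feature_id, areas in peak_areas.items():
        if not areas:
            # A feature with no recorded area cannot be assigned a family.
            continue
        # Sample in which this feature is most intense.
        top_sample = max(areas, key=areas.get)
        colors[feature_id] = sample_family[top_sample]
    return colors


# Toy data, invented for illustration only.
peak_areas = {
    "feature_1": {"sample_A": 1.2e6, "sample_B": 3.4e5},
    "feature_2": {"sample_A": 0.0, "sample_B": 8.9e5},
}
sample_family = {"sample_A": "Meliaceae", "sample_B": "Annonaceae"}

feature_colors = family_of_max_peak(peak_areas, sample_family)
```

With this rule, a feature detected in many families is still drawn in a single color, so family-specific regions stand out while widely shared features are attributed to whichever sample happens to be most intense.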
Figure 3:
Chemical diversity of the profiled plant collection (1,600 extracts) and coverage against reported natural products. Visualization of reported natural products structures (LOTUS v1 and Dictionary of Natural Products v29.1) as a TMAP with plotting of the producing organism (A), the annotation's status (i.e., whether the 2-dimensional structure was annotated in the dataset) (B), and selected NPClassifier classes (C). In the insert of C, the barplot represents the number of compounds reported for each of the selected chemical classes with the opaque part of the bar representing the annotated compounds. The zoom on the limonoid and oleanane terpenoid clusters of the TMAP allows visualizing a well-covered chemical class such as limonoids and a less-covered one such as oleanane triterpenoids. A specific member of each class (limonin [Q2398745] for the limonoids and β-amyrin [Q27108621] for the oleanane triterpenoids) is represented in their planar structure form for illustration purposes. This interactive structural TMAP can be browsed online [51].

References

    1. Duflos A, Kruczynski A, Barret J-M. Novel aspects of natural and modified Vinca alkaloids. Curr Med Chem Anticancer Agents. 2002;2:55–70.
    2. Fiorini-Puybaret C, Joulia P. Dye composition comprising a combination of two plant extracts of Lawsonia inermis. World Patent WO2020249748A1.
    3. Nguyen T, Cousy A, Steward N. Method for producing celastrol and pentacyclic triterpene derivatives. World Patent WO2017194757A1.
    4. Vandenberghe I, Créancier L, Vispé S, et al. Physalin B, a novel inhibitor of the ubiquitin-proteasome pathway, triggers NOXA-associated apoptosis. Biochem Pharmacol. 2008;76:453–62.
    5. Pouny I, Long C, Batut M, et al. Quinolizidine alkaloids from Cylicomorpha solmsii. J Nat Prod. 2021;84:1198–202.
