Nat Commun. 2023 Dec 20;14(1):8488.
doi: 10.1038/s41467-023-44035-y.

Open access repository-scale propagated nearest neighbor suspect spectral library for untargeted metabolomics

Wout Bittremieux et al. Nat Commun. 2023.

Abstract

Despite the increasing availability of tandem mass spectrometry (MS/MS) community spectral libraries for untargeted metabolomics over the past decade, the majority of acquired MS/MS spectra remain uninterpreted. To further aid in interpreting unannotated spectra, we created a nearest neighbor suspect spectral library, consisting of 87,916 annotated MS/MS spectra derived from hundreds of millions of MS/MS spectra originating from published untargeted metabolomics experiments. Entries in this library, or "suspects," were derived from unannotated spectra that could be linked in a molecular network to an annotated spectrum. Annotations were propagated to unknowns based on structural relationships to reference molecules using MS/MS-based spectrum alignment. We demonstrate the broad relevance of the nearest neighbor suspect spectral library through representative examples of propagation-based annotation of acylcarnitines, bacterial and plant natural products, and drug metabolism. Our results also highlight how the library can help to better understand an Alzheimer's brain phenotype. The nearest neighbor suspect spectral library is openly available for download or for data analysis through the GNPS platform to help investigators hypothesize candidate structures for unknown MS/MS spectra in untargeted metabolomics data.

Conflict of interest statement

P.C.D. consulted for DSM animal health in 2023, is an advisor and holds equity in Cybele, and is co-founder and scientific advisor and holds equity in Ometa, Arome, and Enveda, with prior approval by UC San Diego. M.W. is a co-founder of Ometa Labs LLC. A.A.A. and A.V.M. are founders of Arome Science Inc. C.M.A. is a consultant for Nuanced Health. J.J.J.vd.H. is a member of the Scientific Advisory Board of NAICONS Srl., Milano, Italy and consults for Corteva Agriscience, Indianapolis, IN, USA. R.F.K.D. is an inventor on several patents in the metabolomics field and holds founder equity in Metabolon, Chymia, and PsyProtix. The remaining authors declare no competing interests.

Figures

Fig. 1. Creation of the nearest neighbor suspect spectral library.
Overview of how the suspect library was created. Step 1: molecular networking of individual datasets. Step 2: co-networking of the 1335 datasets to create a global molecular network. Step 3: extract nearest neighbor suspects through annotation propagation to create the library.
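
Step 3 of the overview above can be illustrated with a minimal sketch. It assumes the molecular network is already available as a list of edges between spectra, and that a suspect is any unannotated spectrum directly linked to an annotated one. The `Node` class, the `propagate_annotations` function, and all m/z values and scores are hypothetical illustrations, not the authors' implementation or data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    node_id: str
    precursor_mz: float
    annotation: Optional[str] = None  # None means no reference library match


def propagate_annotations(nodes, edges):
    """Return one suspect per unannotated node that neighbors an annotated node.

    edges is an iterable of (node_id_a, node_id_b, cosine) tuples.
    If several annotated neighbors exist, the one with the highest cosine wins.
    """
    by_id = {n.node_id: n for n in nodes}
    best = {}
    for a, b, cosine in edges:
        # Each edge is undirected, so try both orientations.
        for src, dst in ((a, b), (b, a)):
            ref, unk = by_id[src], by_id[dst]
            if ref.annotation is None or unk.annotation is not None:
                continue
            if unk.node_id not in best or cosine > best[unk.node_id][1]:
                best[unk.node_id] = (ref, cosine)
    suspects = {}
    for unk_id, (ref, cosine) in best.items():
        # Precursor mass offset between the suspect and its reference.
        delta = round(by_id[unk_id].precursor_mz - ref.precursor_mz, 3)
        suspects[unk_id] = (ref.annotation, delta, cosine)
    return suspects


# Toy network (assumed values): one annotated node and two unknowns.
nodes = [
    Node("ref", 260.186, "hexanoylcarnitine"),
    Node("u1", 258.170),
    Node("u2", 300.000),
]
edges = [("ref", "u1", 0.91), ("u1", "u2", 0.85)]
suspects = propagate_annotations(nodes, edges)
print(suspects)
```

In this toy network only `u1` becomes a suspect, because `u2` is linked to another unknown rather than to an annotated spectrum.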
Fig. 2. Composition of the nearest neighbor suspect spectral library.
a The composition of suspects that consist exclusively of CH, CHO, or CHNO, or that contain P or S, compared to the reference libraries. b Repeated occurrences of the suspects across datasets and files (i.e., individual LC-MS runs). c Frequently observed mass offsets (delta masses between pairs of spectra) associated with the suspect library. d Frequently observed mass offsets around a nominal mass of −80 Da. Source data are provided as a Source Data file.
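
The mass offsets in panels c and d are differences in precursor mass between paired spectra. A minimal sketch of how such offsets could be tallied is given below, assuming each pair is a (suspect m/z, reference m/z) tuple. The function name `frequent_mass_offsets`, the bin width, and the m/z values are illustrative assumptions, not taken from the source data.

```python
from collections import Counter


def frequent_mass_offsets(pairs, decimals=2, top=3):
    """Count precursor mass offsets, binned by rounding to `decimals` places.

    pairs is an iterable of (suspect_mz, reference_mz) tuples.
    Returns the `top` most common (offset, count) entries.
    """
    counts = Counter(round(suspect - ref, decimals) for suspect, ref in pairs)
    return counts.most_common(top)


# Toy pairs (assumed values): two offsets near +14.02 Da, one near +15.99 Da,
# and one near -79.97 Da.
pairs = [
    (274.202, 260.186),
    (302.233, 288.217),
    (276.180, 260.186),
    (180.063, 260.030),
]
offsets = frequent_mass_offsets(pairs)
print(offsets)
```

Rounding to a fixed number of decimals is the simplest binning choice; a real analysis would likely choose the bin width based on instrument mass accuracy.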
Fig. 3. Novel acylcarnitine reference spectra obtained using the nearest neighbor suspect spectral library.
Reference MS/MS spectra are indicated by ★. a Reference MS/MS spectrum for hexanoylcarnitine originally included in the GNPS community spectral libraries. b Nearest neighbor suspects related to hexanoylcarnitine. Annotations based on expert interpretation are: hexanoylcarnitine derived from a C6:1 fatty acid (unknown location of the double bond), benzoylcarnitine, and dodecanedioylcarnitine. c Nearest neighbor suspects related to hexanoylcarnitine for 3-hydroxybutyrylcarnitine and 3-hydroxyhexanoylcarnitine (bottom) confirmed against commercial standards (top). The suspect MS/MS spectra show a very high cosine similarity of 0.9988 and 0.9927 to the reference MS/MS spectra for 3-hydroxybutyrylcarnitine and 3-hydroxyhexanoylcarnitine, respectively.
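
The cosine similarities quoted above compare the fragment peaks of two MS/MS spectra. The sketch below shows a plain cosine score with greedy one-to-one peak matching within a fragment m/z tolerance. It is a simplified illustration and not necessarily the scoring variant used by the authors. The function name, the tolerance, and the peak lists are assumptions.

```python
import math


def cosine_similarity(spec_a, spec_b, tol=0.02):
    """Cosine similarity between two spectra given as lists of (mz, intensity).

    Peaks are matched one-to-one, greedily by largest intensity product,
    when their m/z values differ by at most `tol`.
    """
    candidates = []
    for i, (mz_a, int_a) in enumerate(spec_a):
        for j, (mz_b, int_b) in enumerate(spec_b):
            if abs(mz_a - mz_b) <= tol:
                candidates.append((int_a * int_b, i, j))
    candidates.sort(reverse=True)

    used_a, used_b = set(), set()
    dot = 0.0
    for product, i, j in candidates:
        if i in used_a or j in used_b:
            continue  # each peak may be matched only once
        used_a.add(i)
        used_b.add(j)
        dot += product

    norm_a = math.sqrt(sum(inten * inten for _, inten in spec_a))
    norm_b = math.sqrt(sum(inten * inten for _, inten in spec_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy peak lists (assumed values).
spectrum_1 = [(85.03, 100.0), (144.10, 20.0)]
spectrum_2 = [(85.03, 100.0), (144.10, 20.0)]
spectrum_3 = [(60.08, 50.0), (200.00, 10.0)]

same = cosine_similarity(spectrum_1, spectrum_2)
different = cosine_similarity(spectrum_1, spectrum_3)
print(same, different)
```

Identical spectra score close to 1 and spectra with no shared peaks score 0, so values such as 0.9988 indicate nearly identical fragmentation patterns.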
Fig. 4. Novel apratoxin reference spectra obtained using the nearest neighbor suspect spectral library.
a Apratoxin cluster in a molecular network created from Moorena bouillonii crude extracts. The reference spectral library hits are shown by the blue squares (b). The purple and pink diamond nodes represent matches to the nearest neighbor suspect spectral library, with the purple diamonds matching the MS/MS spectra shown for which structures could be proposed (c). The white nodes are additional MS/MS spectra within the apratoxin molecular family that remained unannotated, even when including the suspect library. b Reference MS/MS spectra and molecular structures of known apratoxins. c MS/MS spectra and structural hypotheses for four novel apratoxin suspects. All four apratoxin suspects were derived from the tropical marine benthic filamentous cyanobacterium Moorena bouillonii, which is known to produce apratoxins (MSV000086109 [10.25345/C52475]).
Fig. 5. Impact of the nearest neighbor suspect spectral library on spectrum matches to enable the formulation of structural hypotheses.
a The MS/MS spectrum match rate with and without the suspect library for 1407 public datasets on GNPS/MassIVE. The full center line indicates the median values, and the dashed center line indicates the mean values. The box limits indicate the first and third quartiles of the data, and the whiskers extend to 1.5 times the interquartile range. b The MS/MS spectrum match rate for different types of datasets with and without the suspect library. The data come from 45,845 raw files in 179 datasets with known sample types recorded using the ReDU metadata system. c MS/MS matches to an untargeted metabolomics human brain dataset from Alzheimer’s disease patients (n = 360) and healthy subjects (n = 154) with and without the suspect library. d Differentially abundant carnitines for Alzheimer’s disease patients (n = 514 biologically independent samples; Benjamini-Hochberg corrected p-value < 0.05). The suspect library identified four additional mass spectrometry features as acylcarnitines, which would have remained unannotated if matched only against the default GNPS libraries. Statistically significant carnitines were determined by computing Spearman correlations between all acylcarnitine extracted ion chromatograms (XICs) and the subjects’ CERAD scores (a measure of Alzheimer’s disease progression, with 1 indicating “definite” Alzheimer’s disease and 4 indicating “no” Alzheimer’s disease) and recording the correlation coefficients and associated p-values. Multiple testing correction of the p-values was performed using the Benjamini-Hochberg procedure. For visualization purposes, the four-level CERAD score was binarized by treating a CERAD score of 1 or 2 as Alzheimer’s disease positive and a CERAD score of 3 or 4 as healthy. The box limits indicate the first and third quartiles of the data, the center represents the median, and the whiskers extend to 1.5 times the interquartile range.
P-values for the statistically significant carnitines are as follows. Hexanoylcarnitine – 12.036 Da → 0.00073; octanoylcarnitine → 0.03558; L-carnitine → 0.03966; hexanoyl-L-carnitine → 0.03966; lauroylcarnitine → 0.03966; decanoyl-L-carnitine – 14.090 Da → 0.03966; decanoyl-L-carnitine – 42.047 Da → 0.03966. Source data are provided as a Source Data file.
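
Two steps described in this caption, the Benjamini-Hochberg correction and the binarization of CERAD scores, can be sketched as follows. This is a minimal illustration that assumes the raw p-values from the Spearman correlations are already available. The function names, group labels, and p-values are hypothetical and are not the study's data.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def binarize_cerad(score):
    """Map a CERAD score of 1 or 2 to 'AD' and a score of 3 or 4 to 'healthy'."""
    if score not in (1, 2, 3, 4):
        raise ValueError(f"unexpected CERAD score: {score}")
    return "AD" if score <= 2 else "healthy"


# Toy p-values (assumed values).
raw_p = [0.01, 0.04, 0.03, 0.005]
adjusted_p = benjamini_hochberg(raw_p)
groups = [binarize_cerad(s) for s in (1, 2, 3, 4)]
print(adjusted_p, groups)
```

Because the procedure takes a running minimum over the ranked p-values, several features can end up with the same adjusted p-value.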
