Review
Metabolomics. 2022 Nov 19;18(12):94.
doi: 10.1007/s11306-022-01947-y.

The critical role that spectral libraries play in capturing the metabolomics community knowledge

Wout Bittremieux et al. Metabolomics.

Abstract

Background: Spectral library searching is currently the most common approach for compound annotation in untargeted metabolomics. Spectral libraries applicable to liquid chromatography mass spectrometry have grown in size over the past decade to include hundreds of thousands to millions of mass spectra and tens of thousands of compounds, forming an essential knowledge base for the interpretation of metabolomics experiments.

Aim of review: We describe existing spectral library resources, highlight different strategies for compiling spectral libraries, and discuss quality considerations that should be taken into account when interpreting spectral library searching results. Finally, we describe how spectral libraries are empowering the next generation of machine learning tools in computational metabolomics, and discuss several opportunities for using increasingly accessible large spectral libraries.

Key scientific concepts of review: This review focuses on the current state of spectral libraries for untargeted LC-MS/MS-based metabolomics. We show how the number of entries in publicly accessible spectral libraries has increased more than 60-fold in the past eight years to aid molecular interpretation, and we discuss how the role of spectral libraries in untargeted metabolomics will evolve in the near future.

Keywords: Compound identification; Mass spectrometry; Spectral library; Untargeted metabolomics.

Conflict of interest statement

PCD is an advisor to Cybele and co-founder and scientific advisor to Ometa and Enveda, with prior approval by UC San Diego. MW is a co-founder of Ometa Labs LLC.

Figures

Figure 1:
Representative example of a molecular family level annotation from spectral library searching that matches to hexenoylcarnitine. The MS/MS spectrum contains several diagnostic fragments and neutral losses that make it possible to assign it to the acylcarnitines molecular family, as indicated on the molecular structures (Yan et al., 2020). However, routine spectral library matching cannot distinguish between the 14 potential stereo- and regioisomers, resulting in a level 3 annotation. This highlights the need for new strategies to communicate the results from spectral library searching: narrowing an annotation down to the molecular family is often already valuable for biological interpretation, even when the exact molecular identity is unknown. The top spectrum is the experimentally observed MS/MS spectrum, with a precursor m/z deviation of 11.6 ppm relative to the calculated m/z of the protonated ion.
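The precursor m/z deviation quoted in this caption is a relative mass error in parts per million. A minimal sketch of that calculation follows; the function name and the numeric values are hypothetical and chosen only for illustration, and they are not taken from the figure.

```python
def ppm_error(observed_mz: float, calculated_mz: float) -> float:
    """Signed mass deviation of an observed m/z from a calculated m/z, in ppm."""
    if calculated_mz <= 0:
        raise ValueError("calculated_mz must be positive")
    return (observed_mz - calculated_mz) / calculated_mz * 1e6


# Hypothetical round numbers (not the values behind Figure 1):
# an ion calculated at m/z 300.000 and observed at m/z 300.003.
example_ppm = ppm_error(observed_mz=300.003, calculated_mz=300.000)
print(f"{example_ppm:.1f} ppm")
```

A search tool would typically compare the absolute value of this deviation against a precursor tolerance that depends on the instrument; that tolerance is a user choice and is not specified in the caption.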
Figure 2:
Advances in spectral libraries for LC-MS/MS-based untargeted metabolomics. (a) The GNPS community spectral libraries (non-commercial only) have grown from 23,790 MS/MS spectra in 2014 to 586,647 MS/MS spectra in September 2022. Concurrently, the number of library spectra that matched to public data has grown from 4,727 MS/MS spectra in 2014 to 127,405 MS/MS spectra in 2022 (22% of the publicly available library spectra have matches to experimental MS/MS spectra in public data). (b) Fueled by growing spectral libraries, the MS/MS spectrum annotation rate for the GNPS continuous identification mode as part of living data (M. Wang et al., 2016), which periodically reanalyses all public datasets on GNPS/MassIVE with the latest spectral libraries, has increased from 2% of MS/MS spectra on average in 2014 to 13% in 2022.
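The counts in this caption can be cross-checked with simple arithmetic. The snippet below is an illustrative check rather than anything from the review itself; the variable names are assumptions, and the counts are copied from the caption.

```python
# Counts copied from the Figure 2 caption (GNPS community spectral libraries).
library_2014 = 23_790    # library MS/MS spectra in 2014
library_2022 = 586_647   # library MS/MS spectra in September 2022
matched_2014 = 4_727     # library spectra with matches in public data, 2014
matched_2022 = 127_405   # library spectra with matches in public data, 2022

library_fold = library_2022 / library_2014      # growth of the library
matched_fold = matched_2022 / matched_2014      # growth of matched spectra
matched_fraction = matched_2022 / library_2022  # share of library with matches

print(f"Library growth: {library_fold:.1f}-fold")
print(f"Matched-spectra growth: {matched_fold:.1f}-fold")
print(f"Library spectra with public matches: {matched_fraction:.1%}")
```

The matched fraction rounds to 22%, in line with the percentage quoted in the caption.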
Figure 3:
Distribution of ion adducts in public spectral libraries. The majority of positive ion mode MS/MS spectra in MoNA (a) and GNPS (b) are protonated, while other adducts, in-source fragments, multiply charged species, and multimers are minimally represented. (c) Ion identity molecular networking was used to extract novel reference MS/MS spectra that exhibit overall broader coverage of different adducts, multimers, and in-source fragments (Schmid et al., 2021). Note that these ion forms are found with a predefined inclusion list, rather than a comprehensive search for all ion forms that might be present in untargeted metabolomics data of a biological sample.
Figure 4:
Spectral entropy distributions for the GNPS, MoNA, and NIST20 spectral libraries. GNPS consists of 497,137 MS/MS spectra from the “ALL_GNPS_NO_PROPOGATED” library (downloaded on 2022-09-08), MoNA contains 145,361 MS/MS spectra from the “LC-MS/MS Spectra” collection (downloaded on 2022-09-08), and NIST20 consists of 1,026,712 MS/MS spectra (high-resolution MS/MS collection). Spectra were processed by removing noise peaks below 1% of the base peak intensity and normalizing fragment intensities to sum to one. (a) There is a strong relationship between spectral entropy and the number of fragment ions (Spearman correlation 0.963). (b) Although the NIST20 library contains smaller molecules than GNPS and MoNA, the difference in entropy distributions cannot be directly explained by the weight of the molecules (Spearman correlation 0.095).
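The preprocessing described in this caption (discard peaks below 1% of the base peak, then normalize intensities to sum to one) can be sketched as follows. This is a minimal illustration, assuming that spectral entropy is the Shannon entropy of the normalized intensities with the natural logarithm; the function name and the `noise_fraction` parameter are hypothetical, and this is not the authors' code.

```python
import math
from typing import Sequence


def spectral_entropy(intensities: Sequence[float],
                     noise_fraction: float = 0.01) -> float:
    """Shannon entropy (natural log) of a fragment intensity distribution.

    Peaks below `noise_fraction` of the base peak are dropped, and the
    remaining intensities are normalized to sum to one before the entropy
    is computed. Assumed definition, see the lead-in paragraph.
    """
    if not intensities:
        return 0.0
    base_peak = max(intensities)
    if base_peak <= 0:
        return 0.0
    # Noise filtering: keep peaks at or above 1% of the base peak by default.
    kept = [i for i in intensities if i >= noise_fraction * base_peak]
    total = sum(kept)
    # Normalize to a probability-like distribution that sums to one.
    probabilities = [i / total for i in kept]
    return -sum(p * math.log(p) for p in probabilities if p > 0)


# Hypothetical intensity lists, for illustration only.
print(spectral_entropy([100.0]))                     # a single fragment
print(spectral_entropy([50.0, 50.0, 50.0, 50.0]))    # four equal fragments
print(spectral_entropy([100.0, 100.0, 0.5]))         # third peak is noise
```

Under this definition, a spectrum with n equally intense fragments has entropy ln(n), the maximum for n peaks, which is consistent with the strong dependence on the number of fragment ions noted in panel (a).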

References

    1. Aksenov AA, da Silva R, Knight R, Lopes NP, & Dorrestein PC (2017). Global chemical analysis of biology by mass spectrometry. Nature Reviews Chemistry, 1(7), 0054. doi: 10.1038/s41570-017-0054
    2. Alka O, Shanthamoorthy P, Witting M, Kleigrewe K, Kohlbacher O, & Röst HL (2022). DIAMetAlyzer allows automated false-discovery rate-controlled analysis for data-independent acquisition in metabolomics. Nature Communications, 13(1), 1347. doi: 10.1038/s41467-022-29006-z
    3. Aron AT, Gentry EC, McPhail KL, Nothias L-F, Nothias-Esposito M, Bouslimani A, Petras D, Gauglitz JM, Sikora N, Vargas F, van der Hooft JJJ, Ernst M, Kang KB, Aceves CM, Caraballo-Rodríguez AM, Koester I, Weldon KC, Bertrand S, Roullier C, … Dorrestein PC (2020). Reproducible molecular networking of untargeted mass spectrometry data using GNPS. Nature Protocols, 15(6), 1954–1991. doi: 10.1038/S41596-020-0317-5
    4. Bittremieux W, Avalon NE, Thomas SP, Kakhkhorov SA, Aksenov AA, Gomes PWP, Aceves CM, Caraballo Rodriguez AM, Gauglitz JM, Gerwick WH, Jarmusch AK, Kaddurah-Daouk RF, Kang KB, Kim HW, Kondic T, Mannochio-Russo H, Meehan MJ, Melnik A, Nothias L-F, … Dorrestein PC (2022). Open access repository-scale propagated nearest neighbor suspect spectral library for untargeted metabolomics. BioRxiv. doi: 10.1101/2022.05.15.490691
    5. Bittremieux W, Laukens K, & Noble WS (2019). Extremely fast and accurate open modification spectral library searching of high-resolution mass spectra using feature hashing and graphics processing units. Journal of Proteome Research, 18(10), 3792–3799. doi: 10.1021/acs.jproteome.9b00291