Nat Commun. 2021 Jun 22;12(1):3832.
doi: 10.1038/s41467-021-23953-9.

Ion identity molecular networking for mass spectrometry-based metabolomics in the GNPS environment

Robin Schmid et al. Nat Commun. 2021.

Abstract

Molecular networking connects mass spectra of molecules based on the similarity of their fragmentation patterns. However, during ionization, molecules commonly form multiple ion species with different fragmentation behavior. As a result, the fragmentation spectra of these ion species often remain unconnected in tandem mass spectrometry-based molecular networks, leading to redundant and disconnected sub-networks of the same compound classes. To overcome this bottleneck, we develop Ion Identity Molecular Networking (IIMN) that integrates chromatographic peak shape correlation analysis into molecular networks to connect and collapse different ion species of the same molecule. The new feature relationships improve network connectivity for structurally related molecules, can be used to reveal unknown ion-ligand complexes, enhance annotation within molecular networks, and facilitate the expansion of spectral reference libraries. IIMN is integrated into various open source feature finding tools and the GNPS environment. Moreover, IIMN-based spectral libraries with a broad coverage of ion species are publicly available.

Conflict of interest statement

M.W. is the founder of Ometa Labs LLC. A.A. is a consultant for Ometa Labs LLC. S.B. and K.D. are co-founders of Bright Giant GmbH. A.K. is an employee of Bruker Daltonics GmbH & Co. KG. P.C.D. is on the advisory board for Sirenas and Cybele. The other authors declare no competing interests.

Figures

Fig. 1. The concept of ion identity molecular networking (IIMN).
The workflow integrates (a) MS1 feature grouping to connect different ion species of the same compound and (b) feature-based molecular networking to connect similar compound structures based on MS2 spectral similarity, yielding (c) combined networks. Panel (d) highlights the data processing steps to create IIMN networks in MZmine and GNPS. After feature detection and alignment across multiple samples, features are grouped based on the correlation of their chromatographic feature shapes (intensity profiles) and other MS1 characteristics. Subsequently, ion species of grouped features are identified with an ion identity library generated based on user input for included adducts, in-source modifications, and a maximum multimer parameter. After uploading these results to the GNPS web server, the IIMN workflow generates combined networks and an alternative output with all ion identity networks (IIN) collapsed into single molecular nodes to reduce complexity and redundancy.
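The grouping and ion identity steps described in this caption can be illustrated with a minimal Python sketch. It is not the MZmine implementation: the function names, the 0.85 correlation threshold, the 0.005 Da tolerance, and the example features are assumptions chosen for illustration. Only the adduct mass shifts are standard values. Two features are treated as co-eluting when their intensity profiles correlate, and an ion pair is a candidate when both m/z values imply the same neutral mass.

```python
from math import sqrt

# Adduct mass shifts in Da for singly charged positive ions (standard values).
ADDUCT_SHIFTS = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033823,
    "[M+Na]+": 22.989218,
}


def pearson(x, y):
    """Pearson correlation of two equally sampled intensity profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm = sqrt(sum((a - mean_x) ** 2 for a in x) * sum((b - mean_y) ** 2 for b in y))
    return cov / norm if norm else 0.0


def ion_library(adducts, max_multimer):
    """Build (name, multimer count, mass shift) entries, e.g. [2M+Na]+."""
    library = []
    for m in range(1, max_multimer + 1):
        for name, shift in adducts.items():
            label = name if m == 1 else name.replace("[M", f"[{m}M", 1)
            library.append((label, m, shift))
    return library


def candidate_identities(mz_a, mz_b, library, tol=0.005):
    """Return ion pairs that assign both m/z values the same neutral mass."""
    hits = []
    for name_a, m_a, shift_a in library:
        for name_b, m_b, shift_b in library:
            mass_a = (mz_a - shift_a) / m_a
            mass_b = (mz_b - shift_b) / m_b
            if name_a != name_b and abs(mass_a - mass_b) <= tol:
                hits.append((name_a, name_b, round(mass_a, 4)))
    return hits


# Hypothetical co-eluting features of a compound with neutral mass 300.0 Da.
profile_a = [0, 10, 50, 100, 50, 10, 0]
profile_b = [0, 5, 25, 50, 25, 5, 0]
library = ion_library(ADDUCT_SHIFTS, max_multimer=2)
hits = []
if pearson(profile_a, profile_b) >= 0.85:  # assumed grouping threshold
    hits = candidate_identities(301.007276, 322.989218, library)
```

In this toy case the pair is consistent with [M+H]+ and [M+Na]+ at 300.0 Da, but also with [2M+H]+ and [2M+Na]+ at 150.0 Da. A mass difference alone cannot separate these candidates, so a real workflow would need further evidence to rank them.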
Fig. 2. Ion identity molecular networking.
Depicted are three visualizations of the same ion identity molecular network from the post-column salt infusion experiments. (a) Sorting by ion identities reveals that MS2 similarity edges (blue) often link sodiated ions (e.g., [M + Na]+ and [2M + Na]+) into a subnetwork that is separated from a subnetwork of ammonium adducts with protonated species. The pie charts indicate relative abundances in different salt addition experiments (Control (H2O), gray; Na-Acetate, yellow; NH4-Acetate, green). The complexity and redundancy are reduced by (b) sorting all ions of the same molecule in a circular layout and (c) collapsing all ion identity networks (IIN) into representative single molecular nodes. This option reduces the complexity of this IIMN from 43 feature nodes to four molecular nodes (A–D) and 15 feature nodes, i.e., 19 nodes in total (−56%). (d) lists the structures of all GNPS library matches and (e) the propagated structures for D (based on A and C) and the in-source fragments A’ to D’. This subset of structurally related compounds gives a first statistical proof of high correct annotation rates during ion identity networking in MZmine, as adduct formation responds to the corresponding salt infusion, e.g., higher [M + Na]+ abundances in the sodium acetate buffer infusion.
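The node reduction reported in this caption can be reproduced with a small Python sketch of the collapsing step. The `collapse` function, the node labels, and the example edges are hypothetical, and the even split of 28 grouped features across the four networks is an assumption. Only the totals (43 feature nodes, four molecular nodes, 15 remaining feature nodes) come from the caption.

```python
def collapse(feature_to_iin, ms2_edges):
    """Merge features of one ion identity network into one molecular node.

    feature_to_iin maps a feature ID to its network ID, or None if ungrouped.
    Only MS2 similarity edges between different collapsed nodes are kept.
    """
    def node(feature):
        iin = feature_to_iin.get(feature)
        return f"IIN-{iin}" if iin is not None else f"F-{feature}"

    nodes = {node(f) for f in feature_to_iin}
    edges = set()
    for a, b in ms2_edges:
        node_a, node_b = node(a), node(b)
        if node_a != node_b:  # edges inside one network disappear
            edges.add(tuple(sorted((node_a, node_b))))
    return nodes, edges


# 43 features: 28 grouped into 4 networks (assumed 7 each), 15 ungrouped.
feature_to_iin = {f: (f // 7 if f < 28 else None) for f in range(43)}
ms2_edges = [(0, 1), (0, 7), (6, 30)]  # hypothetical MS2 similarity edges

nodes, edges = collapse(feature_to_iin, ms2_edges)
reduction_percent = round(100 * (len(feature_to_iin) - len(nodes)) / len(feature_to_iin))
```

Here 43 feature nodes collapse to 19 nodes, and (43 − 19) / 43 rounds to the 56% reduction stated above.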
Fig. 3. Statistical impact of salt addition experiments on ion identity abundances.
The relative intensities of selected ion identities are plotted for each post-column infusion in triplicate. The significant changes for [M + Na]+ and [M + NH4]+ ion identities in the corresponding post-column salt infusions compared to the control samples agree with the expected ionization behavior. The exclusive formation of an uncommon [M + ACN + NH4]+ in-source cluster in the ammonium acetate buffer infusion further verifies ion identity networking results. Boxplots visualize the median as a horizontal line, the mean as an x, the first (Q1) and third quartile (Q3) as the lower and upper hinges, and the whiskers corresponding to the minimum value below Q1 and the maximum value above Q3 within the 1.5 × IQR (where IQR is the interquartile range). The p-values of a Welch two-sample t-test and the corresponding number of ion identities n are provided for each pair of compared triplicate injections with different post-column salt infusion conditions. Source data are provided as a Source Data file.
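As a rough illustration of the statistics named in this caption, the following Python sketch computes the Welch t statistic with its Welch-Satterthwaite degrees of freedom and the boxplot whisker ends within 1.5 × IQR. The function names and the example values are hypothetical, and the "inclusive" quartile method is an assumption, since the caption does not state which quartile definition was used. Converting t and the degrees of freedom to a p-value requires the t distribution, for example through `scipy.stats.ttest_ind` with `equal_var=False`.

```python
from math import sqrt
from statistics import mean, quantiles, variance


def welch_t(a, b):
    """Return the Welch t statistic and Welch-Satterthwaite degrees of freedom."""
    var_a, var_b = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(var_a + var_b)
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    return t, df


def whiskers(data):
    """Return the lowest and highest data values within 1.5 x IQR of the hinges."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")  # assumed quartile method
    iqr = q3 - q1
    inside = [v for v in data if q1 - 1.5 * iqr <= v <= q3 + 1.5 * iqr]
    return min(inside), max(inside)


# Hypothetical relative intensities for two infusion conditions.
control = [1.0, 2.0, 3.0]
sodium = [2.0, 4.0, 6.0]
t_stat, dof = welch_t(control, sodium)
low, high = whiskers([1, 2, 3, 4, 100])  # 100 lies outside the whisker range
```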
Fig. 4. Overview of IIMN results for 24 experimental datasets.
(a) summarizes the relative number of LC-MS features (with an MS2 spectrum) that were annotated by ion identities or matches to the GNPS spectral libraries. The increased annotation rate from propagating library matches to connected unannotated ion identities is highlighted and (b) displayed as relative gains, with a mean increase of 35% compared to all library matches. (c) compares relative ion formation tendencies, measured as the number of ion identities. Boxplots summarize the statistics of all n = 24 datasets by visualizing the median as a horizontal line, the mean as an x, the first and third quartile as the lower and upper hinges, and the whiskers corresponding to the minimum value below Q1 and the maximum value above Q3 within the 1.5 × IQR. Source data are provided as a Source Data file.
Fig. 5. Comparisons of a subnetwork with matches to bile acids from 88 feces and gall bladder samples of various animals (MSV000084170).
This overview compares (a) the feature-based molecular networking (FBMN) results to IIMN (b) before and (c) after collapsing all ion identity networks into single representative nodes. In the top row, nodes are colored according to the adduct that ion identities are based on. In contrast, the lower three networks emphasize nodes with MS2 spectra that match library spectra of specific compound classes, mainly bile acids and their conjugates. The collapsed network (c) reduces the complexity and redundancy of having multiple nodes per compound and only keeps MS2 spectral similarity edges.
Fig. 6. Analysis of the coverage and distribution of ion identities in public LC-MS2 spectral libraries (refer to Supplementary Table 3 for library origins).
Two-thirds of the MassBank of North America LC-MS2-positive ion mode library entries were entered as [M + H]+ while only four other ion types reached more than 1000 entries, namely, [M + Na]+, [M + NH4]+, [M + K]+, and [M − H2O + H]+. Other in-source fragments, multiply charged species, and multimers are only covered for a few compounds. A significant number of entries were either annotated as negatively charged adducts (e.g., [M − H]−) or were missing an annotation. As the ion identity naming was not harmonized, different versions pointing to the same ion identity were added to a total count. A similar ion annotation coverage was found in the GNPS spectral libraries. In contrast, libraries that were generated with the recently described MSMS-Chooser workflow on GNPS or the IIMN-based library extraction workflow described here show an overall broader coverage of different adducts, multimers, and in-source fragments. The depicted statistical visualization compares a subset of significant or representative ion identities. The IIMN-based numbers summarize the libraries from both the 24 experimental datasets and the two NIH natural product standards datasets, with a total of 2659 library entries. Source data are provided as a Source Data file.
