Sci Rep. 2024 Nov 28;14(1):29570.
doi: 10.1038/s41598-024-80955-5.

Exploratory analysis of metabolic changes using mass spectrometry data and graph embeddings

Edwin Alvarez-Mamani et al. Sci Rep.

Abstract

Mass spectrometry (MS)-based metabolomics analysis is a powerful tool, but it comes with its own set of challenges. The MS workflow involves multiple steps before interpretation, in what is termed data mining. Data mining is a two-step process. First, the MS data are ordered, arranged, and presented for filtering before being analyzed. Second, the filtered and reduced data are analyzed using statistics to remove further variability. This holds particularly for MS-based untargeted metabolomics studies, which focus on understanding fold changes in metabolic networks. Because filtering and identifying changes in a large dataset is challenging, automated techniques for mining untargeted MS-based metabolomic data are needed. The traditional statistics-based approach tends to overfilter raw data, which may remove relevant data and lead to the identification of fewer metabolomic changes. This limitation underscores the need for a new method. In this work, we present GEMNA (Graph Embedding-based Metabolomics Network Analysis), a novel deep learning approach that uses node embeddings (powered by GNNs), edge embeddings, and an anomaly detection algorithm to analyze the data generated by MS-based metabolomics. For example, in an untargeted volatile study on Mentos candy, the data clusters produced by GEMNA were better than those produced by traditional tools: GEMNA has [Formula: see text], whereas the traditional approach has [Formula: see text].

Keywords: Graph embeddings; Graph neural networks; Mass spectrometry; Metabolomic networks.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Performance of biological repetitions (Br) and analytical repetitions (Ar). The red dotted line represents the number of common edges found by a greedy algorithm, which does not eliminate edges considered noise (outliers). The solid grey line represents the number of common edges based on the inverse of the covariance matrix (the precision matrix).
Fig. 2
Number of common edges in filtering networks.
Fig. 3
MAD values on Mentos dataset.
Fig. 4
Runtimes to generate embeddings on Synthetic networks.
Fig. 5
Similarity analysis on Mutant network.
Fig. 6
Heatmap on Leaf filter data.
Fig. 7
PCA on Mentos filter data workflow.
Fig. 8
Mass spectrometry data details. (a) The columns are Alignment ID, Retention time, Average Mz, and Metabolite name, followed by the biological repetitions. Each biological repetition has at least two analytical repetitions.
Fig. 9
Pipeline 1: Network generation. Note: "none" denotes a network without variation.
Fig. 10
Pipeline 2: Network filtering. Note: red edges indicate positive correlation and blue edges indicate negative correlation.
Fig. 11
Pipeline 3: Similarity analysis. Note: the “?” mark means that there is no correlation.
Fig. 12
Our method with a toy example. Note: the indexed symbols in the figure denote biological repetitions.
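
The captions for Fig. 1 and Fig. 10 mention edges derived from the inverse of the covariance matrix and edges labeled by the sign of their correlation. As a rough illustration only, the sketch below derives signed edges from a precision matrix with NumPy and intersects the edge sets of several repetitions. It is not GEMNA's implementation: the function names, the `threshold` and `ridge` parameters, and the use of partial correlations as edge weights are all assumptions made for this example.

```python
import numpy as np


def partial_correlation_edges(X, threshold=0.3, ridge=1e-3):
    """Hypothetical helper: signed edges from the precision matrix.

    X is a (samples x metabolites) array. Returns a list of
    (i, j, sign) tuples with i < j and sign in {"positive", "negative"}.
    """
    cov = np.cov(X, rowvar=False)
    # Small ridge term (an assumption) keeps the covariance invertible.
    cov = cov + ridge * np.eye(cov.shape[0])
    precision = np.linalg.inv(cov)

    # Partial correlation: rho_ij = -P_ij / sqrt(P_ii * P_jj).
    d = np.sqrt(np.diag(precision))
    pcorr = -precision / np.outer(d, d)

    edges = []
    n = pcorr.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if abs(pcorr[i, j]) >= threshold:
                sign = "positive" if pcorr[i, j] > 0 else "negative"
                edges.append((i, j, sign))
    return edges


def common_edges(edge_lists):
    """Hypothetical helper: edges (ignoring sign) present in every list."""
    edge_sets = [{(i, j) for i, j, _ in edges} for edges in edge_lists]
    return set.intersection(*edge_sets) if edge_sets else set()
```

Calling `partial_correlation_edges` once per repetition and passing the results to `common_edges` gives a count comparable in spirit to the "number of common edges" plotted in the figures, although the thresholding used in the paper may differ.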
