Sci Rep. 2024 Nov 28;14(1):29570.

doi: 10.1038/s41598-024-80955-5.

Exploratory analysis of metabolic changes using mass spectrometry data and graph embeddings

Edwin Alvarez-Mamani¹ ², Florian Buettner³ ⁴ ⁵, Cesar A Beltran-Castanon¹, Alfredo J Ibanez⁶ ⁷

Affiliations

¹ Engineering Department, Pontificia Universidad Catolica del Peru, Lima, Peru.
² Institute for Omics Sciences and Applied Biotechnology, Pontificia Universidad Catolica del Peru, Lima, Peru.
³ Goethe University, Frankfurt, Frankfurt am Main, Germany.
⁴ German Cancer Consortium (DKTK), Frankfurt am Main, Germany.
⁵ German Cancer Research Center (DKFZ), Frankfurt am Main, Germany.
⁶ Institute for Omics Sciences and Applied Biotechnology, Pontificia Universidad Catolica del Peru, Lima, Peru. aibanez@pucp.edu.pe.
⁷ Science Department, Pontificia Universidad Catolica del Peru, Lima, Peru. aibanez@pucp.edu.pe.

PMID: 39609505
PMCID: PMC11604959
DOI: 10.1038/s41598-024-80955-5

Edwin Alvarez-Mamani et al. Sci Rep. 2024.

Abstract

Mass spectrometry (MS)-based metabolomics analysis is a powerful tool, but it comes with its own set of challenges. The MS workflow involves multiple steps before the data can be interpreted in what is termed data mining. Data mining consists of a two-step process. First, the MS data are ordered, arranged, and presented for filtering before being analyzed. Second, the filtered and reduced data are analyzed using statistics to remove further variability. This holds true particularly for MS-based untargeted metabolomics studies, which focus on understanding fold changes in metabolic networks. Since the task of filtering and identifying changes from a large dataset is challenging, automated techniques for mining untargeted MS-based metabolomic data are needed. The traditional statistics-based approach tends to overfilter raw data, which may result in the removal of relevant data and lead to the identification of fewer metabolomic changes. This limitation of the traditional approach underscores the need for a new method. In this work, we present GEMNA (Graph Embedding-based Metabolomics Network Analysis), a novel deep learning approach that uses node embeddings (powered by GNNs), edge embeddings, and an anomaly detection algorithm to analyze the data generated by MS-based metabolomics. For example, in an untargeted volatile study on Mentos candy, the data clusters produced by GEMNA were better than those produced by traditional tools: GEMNA has [Formula: see text], whereas the traditional approach has [Formula: see text].

Keywords: Graph embeddings; Graph neural networks; Mass spectrometry; Metabolomic networks.
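The abstract names three building blocks: node embeddings, edge embeddings, and anomaly detection on a metabolic network. The following is a minimal sketch of how such pieces could fit together, not the GEMNA implementation. Everything in it is an assumption made for illustration: the function names are hypothetical, the network is built by thresholding Pearson correlations, a spectral embedding of the adjacency matrix stands in for the GNN-based node embeddings, edge embeddings are taken as the element-wise product of the two endpoint embeddings, and edges are flagged with a robust z-score based on the median absolute deviation.

```python
import numpy as np


def correlation_network(intensities, threshold=0.8):
    """Adjacency matrix linking metabolites whose |Pearson r| >= threshold.

    intensities: rows are metabolites, columns are repetitions (assumed layout).
    """
    corr = np.corrcoef(intensities)
    adj = (np.abs(corr) >= threshold).astype(float)
    np.fill_diagonal(adj, 0.0)  # no self-loops
    return adj


def spectral_node_embeddings(adj, dim=2):
    """Stand-in for GNN node embeddings: leading eigenvectors of the adjacency."""
    eigvals, eigvecs = np.linalg.eigh(adj)
    top = np.argsort(-np.abs(eigvals))[:dim]  # largest-magnitude eigenvalues
    return eigvecs[:, top] * np.sqrt(np.abs(eigvals[top]))


def edge_embeddings(adj, node_emb):
    """One vector per undirected edge: element-wise product of endpoint embeddings."""
    rows, cols = np.nonzero(np.triu(adj, k=1))
    edges = list(zip(rows.tolist(), cols.tolist()))
    return edges, node_emb[rows] * node_emb[cols]


def robust_outlier_mask(edge_emb, cutoff=3.5):
    """Flag edges whose embedding norm has a large MAD-based robust z-score."""
    norms = np.linalg.norm(edge_emb, axis=1)
    if norms.size == 0:
        return np.zeros(0, dtype=bool)
    median = np.median(norms)
    mad = np.median(np.abs(norms - median))
    if mad == 0:
        return np.zeros(norms.size, dtype=bool)  # no spread, nothing to flag
    return 0.6745 * np.abs(norms - median) / mad > cutoff


# Toy intensity table (made-up values): 5 metabolites x 8 repetitions.
t = np.arange(8, dtype=float)
alternating = np.array([1, -1, 1, -1, 1, -1, 1, -1], dtype=float)
blocks = np.array([1, 1, -1, -1, 1, 1, -1, -1], dtype=float)
toy = np.vstack([t, 2 * t + 1, -t + 0.1 * alternating, alternating, blocks])

adj = correlation_network(toy)
edges, emb = edge_embeddings(adj, spectral_node_embeddings(adj))
flagged = robust_outlier_mask(emb)
print(edges, flagged)
```

In the toy table the first three rows are near-linear functions of one another, so they form a triangle in the network, while the last two rows stay isolated. The element-wise product is used because it is unaffected by the arbitrary sign of each eigenvector; a learned GNN embedding would not need that property.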

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

**Fig. 1**
Performance of biological repetitions (Br) and analytical repetitions (Ar). The red dotted line shows the number of common edges found by a greedy algorithm, which does not eliminate edges considered noise (outliers). The solid grey line shows the number of common edges based on the inverse of the covariance matrix (precision matrix).

**Fig. 2**
Number of common edges in filtering networks.

**Fig. 3**
MAD values on Mentos dataset.

**Fig. 4**
Runtimes to generate embeddings on Synthetic networks.

**Fig. 5**
Similarity analysis on Mutant network.

**Fig. 7**
PCA on Mentos filter data workflow.

**Fig. 8**
Mass spectrometry data details. The fields are Alignment ID, Retention time, Average Mz, Metabolite name, and the Biological repetitions. In addition, each biological repetition has at least two analytical repetitions.

**Fig. 9**
Pipeline 1: Network generation. Note: ***none*** means a network without variation.

**Fig. 10**
Pipeline 2: Network filtering. Note: red edges indicate positive correlation and blue edges indicate negative correlation.

**Fig. 11**
Pipeline 3: Similarity analysis. Note: the “?” mark means that there is no correlation.

**Fig. 12**
Our method with a toy example. Note: the labels in the figure denote Biological repetitions.
