Front Mol Biosci. 2022 Jul 22:9:917911.
doi: 10.3389/fmolb.2022.917911. eCollection 2022.

Graph Properties of Mass-Difference Networks for Profiling and Discrimination in Untargeted Metabolomics

Francisco Traquete et al. Front Mol Biosci. 2022.

Abstract

Untargeted metabolomics seeks to identify and quantify most metabolites in a biological system. In general, metabolomics results are represented by numerical matrices containing data that represent the intensities of the detected variables. These matrices are subsequently analyzed by methods that seek to extract significant biological information from the data. In mass spectrometry-based metabolomics, if mass is detected with sufficient accuracy, below 1 ppm, it is possible to derive mass-difference networks, which have spectral features as nodes and chemical changes as edges. These networks have previously been used as means to assist formula annotation and to rank the importance of chemical transformations. In this work, we propose a novel role for such networks in untargeted metabolomics data analysis: we demonstrate that their properties as graphs can also be used as signatures for metabolic profiling and class discrimination. For several benchmark examples, we computed six graph properties and we found that the degree profile was consistently the property that allowed for the best performance of several clustering and classification methods, reaching levels that are competitive with the performance using intensity data matrices and traditional pretreatment procedures. Furthermore, we propose two new metrics for the ranking of chemical transformations derived from network properties, which can be applied to sample comparison or clustering. These metrics illustrate how the graph properties of mass-difference networks can highlight the aspects of the information contained in data that are complementary to the information extracted from intensity-based data analysis.

Keywords: Fourier transform mass spectrometry; graph properties; mass-difference networks; metabolomics data analysis; untargeted metabolomics.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
The concept of mass-difference networks (MDiNs). In this four-node example, neutral mass values (in Da) obtained from a mass spectrometry analysis are represented as nodes, connected with mass differences associated with particular mass-difference-based building blocks (MDBs). Δm, mass difference (in Da).
FIGURE 2
Mass-difference network built from the YD dataset. The inset is a close-up of the selected rectangle in the populated area of the largest network component. Edge colors represent each MDB: O(–NH), NH3(–O), H2, CH2, O, H2O, NCH, CO, CHOH, S, CH2O, CONH, CO2, SO3, PO3H, CHCOOH, and CCH3COOH. Node background colors represent the node degree. Network representations were made with Cytoscape 3.8.1 (Shannon et al., 2003).
FIGURE 3
Effect of IDT and sMDiN graph property analysis on clustering performance. (A) Correct clustering in HCA; (B) discrimination distance in HCA; (C) correct first clustering in HCA; (D) correct clustering in K-means clustering; (E) discrimination distance in K-means clustering; (F) adjusted Rand Index in K-means clustering. Methods are as follows: intensity-based data pretreatment (IDT); network analysis: degree analysis (degree), betweenness centrality analysis (betweenness), closeness centrality analysis (closeness), MDB impact (MDBI), weighted MDB impact (WMDBI), and GCD-11 topology analysis (GCD11).
FIGURE 4
Classification performance of models developed from IDT-treated data or sMDiN graph property methods. (A) Performance of random forest (RF) models; (B) performance of projection in latent structures–discriminant analysis (PLS-DA). For all datasets except HD, accuracy was estimated by 20 iterations of internal three- or fivefold stratified cross-validation, with the error bars representing the accuracy standard deviation. For the HD dataset, accuracy was estimated on a test set resulting from a stratified random 70/30% train/test split. Methods are as follows: intensity-based data pretreatment (IDT); network analysis: degree analysis (degree), betweenness centrality analysis (betweenness), closeness centrality analysis (closeness), MDB impact (MDBI), weighted MDB impact (WMDBI), and GCD-11 topology analysis (GCD11).
FIGURE 5
MDBI and WMDBI values for the sMDiNs of the YD dataset. (A) MDB impact; (B) weighted MDB impact. Values were mean-centered and standard scaled. MDBs are ordered by decreasing Gini importance. Samples are triplicates of five yeast strains: the wild-type reference strain (WT) and four isogenic single-gene deletion mutants of this strain, ΔGLO1, ΔGLO2, ΔGRE3, and ΔENO1. MDBs are listed in Table 2. Samples were clustered by HCA, with Euclidean distance and Ward linkage.
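
The network concept described for Figure 1 (neutral masses as nodes, MDB-matching mass differences as edges) and the degree property used in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The MDB subset, the function names `build_mdin` and `degree_profile`, and the rule of taking the ppm tolerance relative to the heavier mass of each pair are all assumptions made for illustration.

```python
# Monoisotopic masses (Da) for a small, illustrative subset of MDBs.
# The subset is an assumption chosen to keep the example short.
MDBS = {
    "H2": 2.015650,
    "CH2": 14.015650,
    "O": 15.994915,
    "H2O": 18.010565,
    "CO2": 43.989829,
}


def build_mdin(masses, mdbs=MDBS, ppm=1.0):
    """Return edges (low_mass, high_mass, mdb_name) for every pair of
    masses whose difference matches an MDB within the given tolerance."""
    ordered = sorted(masses)
    edges = []
    for i, low in enumerate(ordered):
        for high in ordered[i + 1:]:
            delta = high - low
            # Assumption: tolerance is `ppm` parts per million of the heavier mass.
            tolerance = high * ppm * 1e-6
            for name, mdb_mass in mdbs.items():
                if abs(delta - mdb_mass) <= tolerance:
                    edges.append((low, high, name))
    return edges


def degree_profile(masses, edges):
    """Count how many edges touch each node (the node degree)."""
    degree = {mass: 0 for mass in masses}
    for low, high, _ in edges:
        degree[low] += 1
        degree[high] += 1
    return degree


# Hypothetical four-node example: a base mass, two derived masses, one unrelated mass.
base = 180.063388
masses = [base, base + MDBS["CH2"], base + MDBS["H2O"], 250.0]
edges = build_mdin(masses)
degrees = degree_profile(masses, edges)
```

In this hypothetical example the base mass is linked to two neighbours (by CH2 and by H2O) and the unrelated mass stays isolated, so the per-node degrees form a small vector that could serve as a sample signature in place of intensities.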
