Molecules. 2021 Sep 24;26(19):5787.
doi: 10.3390/molecules26195787.

NMF-Based Approach for Missing Values Imputation of Mass Spectrometry Metabolomics Data


Jingjing Xu et al. Molecules.

Abstract

In mass spectrometry (MS)-based metabolomics, missing values (NAs) can arise from several causes, including sample heterogeneity, ion suppression, spectral overlap, inappropriate data processing, and instrumental errors. Although a number of methods have been applied to handle NAs, NA imputation remains a challenging problem. Here, we propose a non-negative matrix factorization (NMF)-based method for NA imputation in MS-based metabolomics data that uses both global and local information in the data. The proposed method was compared with three commonly used methods: k-nearest neighbors (kNN), random forest (RF), and outlier-robust (ORI) missing values imputation. The methods were evaluated in terms of imputation accuracy, retrieval of data structures, and rank of imputation superiority. The experimental results showed that the NMF-based method adapts well to various types of data missingness and to the presence of outliers in MS-based metabolic profiles. It outperformed kNN and ORI and performed comparably to RF. Furthermore, the NMF-based method was more robust and less susceptible to outliers than RF. The proposed NMF-based scheme may serve as an alternative NA imputation method, which may facilitate the biological interpretation of metabolomics data.
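A minimal sketch of NMF-based imputation, assuming a samples × metabolites matrix X with non-negative intensities and NaN marking the NAs. It fits a low-rank factorization X ≈ WH to the observed entries only (weighted multiplicative updates) and reads the missing cells off WH. This captures only the global, low-rank structure; the local-information component of the authors' method is not reproduced, and the function name `nmf_impute`, the rank, and the iteration count are illustrative assumptions.

```python
import numpy as np

def nmf_impute(X, rank=2, n_iter=500, eps=1e-9, seed=0):
    """Hypothetical helper: fill NaNs in a non-negative matrix X
    (samples x metabolites) using a rank-`rank` NMF fitted only to
    the observed entries."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    M = ~np.isnan(X)                  # True where a value was observed
    X0 = np.where(M, X, 0.0)          # placeholder zeros, masked out below
    n, m = X.shape
    scale = np.sqrt(X0[M].mean() / rank)
    W = rng.random((n, rank)) * scale + eps
    H = rng.random((rank, m)) * scale + eps
    for _ in range(n_iter):
        # Weighted multiplicative updates minimizing ||M * (X - WH)||_F^2,
        # so missing cells do not influence the fit.
        WH = W @ H
        H *= (W.T @ (M * X0)) / (W.T @ (M * WH) + eps)
        WH = W @ H
        W *= ((M * X0) @ H.T) / ((M * WH) @ H.T + eps)
    # Keep observed values as-is; impute only the missing cells.
    return np.where(M, X, W @ H)
```

Because W and H stay non-negative under multiplicative updates, the imputed values are also non-negative, which suits intensity data.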

Keywords: mass spectrometry; metabolomics data; missing pattern; missing values imputation; non-negative matrix factorization; outliers.


Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Schematic of imputation methods based on local information. (A) A prediction model f is trained on part of the data X. (B) The NAs are predicted with model f.
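The local-information scheme in Figure 1 can be illustrated with kNN, one of the compared methods. In this hypothetical sketch (`knn_impute` and its default k are assumptions, not the settings used in the paper), the "model" f is a neighbor average: for each sample with NAs, the k most similar samples, measured by RMS distance over jointly observed metabolites, that did observe the missing metabolite supply the imputed value.

```python
import numpy as np

def knn_impute(X, k=3):
    """Hypothetical sample-wise kNN imputation (NaN marks an NA)."""
    X = np.asarray(X, dtype=float)
    out = X.copy()
    obs = ~np.isnan(X)
    for i in range(X.shape[0]):
        miss_cols = np.where(~obs[i])[0]
        if miss_cols.size == 0:
            continue
        # RMS distance to every other sample over the features both observed
        dists = []
        for j in range(X.shape[0]):
            if j == i:
                continue
            shared = obs[i] & obs[j]
            if not shared.any():
                continue
            d = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
            dists.append((d, j))
        dists.sort()
        for c in miss_cols:
            # Donors: the nearest samples that actually observed feature c
            donors = [j for _, j in dists if obs[j, c]][:k]
            out[i, c] = X[donors, c].mean() if donors else np.nanmean(X[:, c])
    return out
```

For example, with rows `[1, 2, 3]`, `[1.1, 2.1, NaN]`, `[5, 6, 7]`, and `[1.2, 1.9, 3.2]`, `knn_impute(X, k=2)` fills the NA with the mean of 3.0 and 3.2 from the two closest rows, ignoring the distant third row.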
Figure 2
NRMSE curves obtained with the NMF, RF, ORI, and kNN methods applied to MNAR and MM patterns and to MAR/MCAR patterns at different missing percentages. Fifty missingness datasets were generated randomly from Dataset I (A,E), Dataset II (B,F), Dataset III (C), and Dataset IV (D). Error bars represent the standard deviation; * denotes p < 0.05 (t-test, Benjamini-Hochberg adjusted) relative to the NMF-based method.
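A sketch of the kind of evaluation behind these curves, under stated assumptions: NAs are introduced artificially into a complete matrix, imputed, and scored with NRMSE over the removed cells only. The normalization below (RMSE divided by the standard deviation of the true values at those cells) is a common definition, and the MNAR mask assumes left-censoring at a per-metabolite quantile; the paper's exact definitions and its MAR and MM generators are not given on this page, and all function names are hypothetical.

```python
import numpy as np

def nrmse(X_true, X_imp, miss):
    """NRMSE over the artificially removed entries (boolean mask `miss`)."""
    err = X_true[miss] - X_imp[miss]
    return np.sqrt(np.mean(err ** 2) / np.var(X_true[miss]))

def mcar_mask(shape, rate, rng):
    """Missing completely at random: each cell dropped with probability `rate`."""
    return rng.random(shape) < rate

def mnar_mask(X, rate):
    """Missing not at random (assumed left-censoring): values below the
    per-column `rate` quantile are treated as below the detection limit."""
    thresh = np.quantile(X, rate, axis=0)   # one threshold per metabolite
    return X < thresh                       # broadcasts row-wise
```

Repeating mask generation, imputation, and scoring over many random draws (fifty in the figure) yields the mean and standard deviation plotted for each missing percentage.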
Figure 3
NRMSE curves for NMF, RF, kNN, and ORI applied to the MM type of missing values with 1%, 3%, and 5% outliers. Fifty missingness datasets were generated randomly from Dataset I (A,C,E) and Dataset II (B,D,F). Error bars represent the standard deviation; * denotes p < 0.05 (t-test, Benjamini-Hochberg adjusted) relative to NMF.
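The asterisks in Figures 2 and 3 mark t-test p-values adjusted for multiple comparisons with the Benjamini-Hochberg (BH) step-up procedure. Below is a minimal NumPy sketch of the BH adjustment; `bh_adjust` is a hypothetical name, and which tests were pooled into one family is not stated here, so that grouping is left to the caller.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (false discovery rate control)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    # Scale the i-th smallest p-value by n / i
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity: running minimum from the largest p-value down
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(ranked, 1.0)   # restore original order, cap at 1
    return adjusted
```

For example, `bh_adjust([0.01, 0.04, 0.03, 0.005])` returns `[0.02, 0.04, 0.04, 0.02]`; an adjusted value below 0.05 would earn an asterisk.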
Figure 4
The F1 scores of predicted CCN at different rates of missing values (horizontal axis) for the NMF, RF, kNN, and ORI methods on four metabolomics datasets.
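The F1 score is the harmonic mean of precision and recall. This page does not define CCN, so the sketch below only assumes that an analysis run on the complete data yields a reference set of items and the same analysis run on imputed data yields a predicted set; `f1_score` is a hypothetical helper, not the paper's code.

```python
def f1_score(reference, predicted):
    """F1 between a reference set (e.g. results on complete data) and a
    predicted set (the same analysis repeated on imputed data)."""
    reference, predicted = set(reference), set(predicted)
    tp = len(reference & predicted)          # items recovered correctly
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)
```

For example, a reference set `{1, 2, 3, 4}` and a predicted set `{2, 3, 4, 5, 6}` give precision 0.6 and recall 0.75, so F1 ≈ 0.667; a value near 1 means imputation left the downstream result essentially unchanged.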
Figure 5
MSR for comparing the performance of NMF and RF on four datasets in the absence and presence of outliers.


