Bioinformatics. 2024 Mar 4;40(3):btae098.
doi: 10.1093/bioinformatics/btae098.

imputomics: web server and R package for missing values imputation in metabolomics data

Jarosław Chilimoniuk et al. Bioinformatics.

Abstract

Motivation: Missing values are commonly observed in metabolomics data from mass spectrometry. Imputing them is crucial because it assures data completeness, increases the statistical power of analyses, prevents inaccurate results, and improves the quality of exploratory analysis, statistical modeling, and machine learning. Numerous Missing Value Imputation Algorithms (MVIAs) employ heuristics or statistical models to replace missing information with estimates. In the context of metabolomics data, we identified 52 MVIAs implemented across 70 R functions. Nevertheless, the usage of those 52 established methods poses challenges due to package dependency issues, lack of documentation, and their instability.

Results: Our R package, 'imputomics', provides a convenient wrapper around 41 (plus random imputation as a baseline model) out of 52 MVIAs in the form of a command-line tool and a web application. In addition, we propose a novel functionality for selecting MVIAs recommended for metabolomics data with the best performance or execution time.

Availability and implementation: 'imputomics' is freely available as an R package (github.com/BioGenies/imputomics) and a Shiny web application (biogenies.info/imputomics-ws). The documentation is available at biogenies.info/imputomics.

Conflict of interest statement

None declared.

Figures

Figure 1.
(A) A graphical representation of missing values in a preliminary analysis of a dataset. (B) Distribution of imputed data compared to the observed data. (C) Occurrence of MVIAs (missing value imputation algorithms). Filled squares mark the presence of a given MVIA. The right-hand side annotations represent the number of MVIAs covered by a given article. The top annotations represent the articles covering a given MVIA. (D) Normalized root mean squared error (NRMSE) of MVIAs. The vertical line marks the baseline MVIA: random imputation. As the NRMSE for PEMM exceeded 6.03×10^16, this MVIA is not represented on the chart. (E) The maximum time [ms] necessary to impute missing values. In both (D) and (E), the color of the bars marks the percentage of datasets on which an MVIA converged successfully in under 2 min.
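
To make the comparison in panel (D) concrete, here is a minimal Python sketch of how an NRMSE score can be computed for an imputation method and compared against a random-imputation baseline. It is not the imputomics implementation. The abstract and caption do not state which normalization the authors use or how their random baseline draws values, so two things are assumed here: the RMSE is divided by the standard deviation of the true (masked) values, and the random baseline samples from the observed values of the same variable. The function names `nrmse`, `random_impute`, and `mean_impute` are hypothetical.

```python
import math
import random


def nrmse(true_values, imputed_values):
    """RMSE divided by the population standard deviation of the true values.

    Assumption: normalization by the standard deviation; other conventions
    (e.g. dividing by the range) exist and give different numbers.
    """
    n = len(true_values)
    mse = sum((t - i) ** 2 for t, i in zip(true_values, imputed_values)) / n
    mean_true = sum(true_values) / n
    var_true = sum((t - mean_true) ** 2 for t in true_values) / n
    return math.sqrt(mse / var_true)


def random_impute(column, rng):
    """Assumed baseline: fill each None with a randomly chosen observed value."""
    observed = [v for v in column if v is not None]
    return [v if v is not None else rng.choice(observed) for v in column]


def mean_impute(column):
    """Simple comparison method: fill each None with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    mean = sum(observed) / len(observed)
    return [v if v is not None else mean for v in column]


# Toy data: hide two known values, impute them, and score only the hidden entries.
complete = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
hidden = [1, 4]
masked = [None if i in hidden else v for i, v in enumerate(complete)]
truth = [complete[i] for i in hidden]

by_mean = mean_impute(masked)
by_random = random_impute(masked, random.Random(0))

print("mean imputation NRMSE:  ", nrmse(truth, [by_mean[i] for i in hidden]))
print("random imputation NRMSE:", nrmse(truth, [by_random[i] for i in hidden]))
```

Under this convention a perfect imputation scores 0, and a method is only worth using if it scores below the random baseline on the same masked entries.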
