imputomics: web server and R package for missing values imputation in metabolomics data

Jarosław Chilimoniuk¹, Krystyna Grzesiak¹,², Jakub Kała¹, Dominik Nowakowski³, Adam Krętowski¹, Rafał Kolenda⁴,⁵, Michał Ciborowski¹, Michał Burdukiewicz¹,⁶

Affiliations

¹ Clinical Research Centre, Medical University of Białystok, Białystok, Poland.
² Faculty of Mathematics and Computer Science, University of Wrocław, Wrocław, Poland.
³ Department of Biostatistics and Medical Informatics, Medical University of Białystok, Białystok, Poland.
⁴ Quadram Institute Biosciences, Norwich Research Park, Norwich, United Kingdom.
⁵ Faculty of Veterinary Medicine, Wrocław University of Environmental and Life Sciences, Wrocław, Poland.
⁶ Institute of Biotechnology and Biomedicine, Autonomous University of Barcelona, Cerdanyola del Vallès, Spain.

PMID: 38377398
PMCID: PMC10918629
DOI: 10.1093/bioinformatics/btae098

Jarosław Chilimoniuk et al. Bioinformatics. 2024 Mar 4;40(3):btae098. doi: 10.1093/bioinformatics/btae098.

Abstract

Motivation: Missing values are commonly observed in metabolomics data from mass spectrometry. Imputing them is crucial because it assures data completeness, increases the statistical power of analyses, prevents inaccurate results, and improves the quality of exploratory analysis, statistical modeling, and machine learning. Numerous Missing Value Imputation Algorithms (MVIAs) employ heuristics or statistical models to replace missing information with estimates. In the context of metabolomics data, we identified 52 MVIAs implemented across 70 R functions. Nevertheless, the usage of those 52 established methods poses challenges due to package dependency issues, lack of documentation, and their instability.

Results: Our R package, 'imputomics', provides a convenient wrapper around 41 (plus random imputation as a baseline model) out of 52 MVIAs in the form of a command-line tool and a web application. In addition, we propose a novel functionality for selecting MVIAs recommended for metabolomics data with the best performance or execution time.

Availability and implementation: 'imputomics' is freely available as an R package (github.com/BioGenies/imputomics) and a Shiny web application (biogenies.info/imputomics-ws). The documentation is available at biogenies.info/imputomics.

Conflict of interest statement

None declared.

Figures

**Figure 1.**
(A) A graphical representation of missing values in a preliminary analysis of a dataset. (B) Distribution of imputed data compared to the observed data. (C) Occurrence of MVIAs (missing value imputation algorithms). Filled squares mark the presence of a given MVIA. The right-hand side annotations represent the number of MVIAs covered by a given article. The top annotations represent the articles covering a given MVIA. (D) Normalized root mean squared error (NRMSE) of MVIAs. The vertical line marks the baseline MVIA: random imputation. As the NRMSE for PEMM exceeded $6.03 \times 10^{16}$, this MVIA is not represented on the chart. (E) The maximum time (ms) necessary to impute missing values. In both (D) and (E), the color of the bars marks the percentage of datasets on which an MVIA converged successfully in under 2 min.
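
The comparison in panel (D) can be illustrated with a minimal sketch, which is not the authors' benchmark code. It hides known values, imputes them, and scores the result with an NRMSE. Several details are assumptions made only for illustration: the NRMSE is normalized by the variance of the hidden true values (the article's exact definition may differ), the random baseline is a uniform draw from the observed range, and all function names (`mask_values`, `impute_mean`, `impute_random`, `nrmse`) are hypothetical. None of this reflects the 'imputomics' API.

```python
import math
import random


def mask_values(values, missing_idx):
    """Return a copy of `values` with the given positions set to None (missing)."""
    missing = set(missing_idx)
    return [None if i in missing else v for i, v in enumerate(values)]


def impute_mean(column):
    """Simple heuristic: replace each missing entry with the mean of the observed ones."""
    observed = [v for v in column if v is not None]
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in column]


def impute_random(column, rng):
    """Assumed baseline: replace each missing entry with a uniform draw
    from the range of the observed values."""
    observed = [v for v in column if v is not None]
    lo, hi = min(observed), max(observed)
    return [rng.uniform(lo, hi) if v is None else v for v in column]


def nrmse(truth, imputed, missing_idx):
    """RMSE over the masked positions, normalized by the (population) variance
    of the true values at those positions. Assumes those values are not all equal."""
    true_vals = [truth[i] for i in missing_idx]
    errors = [truth[i] - imputed[i] for i in missing_idx]
    mse = sum(e * e for e in errors) / len(errors)
    mean_true = sum(true_vals) / len(true_vals)
    var_true = sum((t - mean_true) ** 2 for t in true_vals) / len(true_vals)
    return math.sqrt(mse / var_true)


if __name__ == "__main__":
    # Toy "metabolite" column with two values hidden on purpose.
    complete = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
    hidden = [1, 4]
    masked = mask_values(complete, hidden)

    rng = random.Random(0)  # fixed seed so the baseline is reproducible
    print("mean imputation NRMSE:  ", nrmse(complete, impute_mean(masked), hidden))
    print("random imputation NRMSE:", nrmse(complete, impute_random(masked, rng), hidden))
```

Lower values indicate imputed entries closer to the hidden truth. A method is only useful if it scores better than the random baseline on the same masked positions.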
