Bioinformatics. 2022 Mar 28;38(7):1980-1987.
doi: 10.1093/bioinformatics/btac059.

metaboprep: an R package for preanalysis data description and processing


David A Hughes et al. Bioinformatics. 2022.

Abstract

Motivation: Metabolomics is an increasingly common part of health research, and there is a need for preanalytical data processing. Researchers typically need to characterize the data and to exclude errors within the context of the intended analysis. Whilst some preprocessing steps are common, there is currently a lack of standardization and reporting transparency for these procedures.

Results: Here, we introduce metaboprep, a standardized data processing workflow to extract and characterize high-quality metabolomics datasets. The package extracts data from preformed worksheets, provides summary statistics and enables the user to select samples and metabolites for their analysis based on a set of quality metrics. A report summarizing the quality metrics and the influence of available batch variables on the data is generated for the purpose of open disclosure. Where possible, we give users the flexibility to define their own selection thresholds.
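Selecting samples and metabolites against user-defined quality thresholds can be illustrated with missingness, one of the metrics metaboprep reports. The following is a minimal sketch, not metaboprep's implementation, and it rests on several assumptions:

- The data are a samples × features matrix, with `None` marking a missing value.
- Features are filtered before samples.
- The function names and the 0.2 defaults are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed layout: rows = samples, columns = features (metabolites); None = missing.
Matrix = List[List[Optional[float]]]


def missing_rate(values: List[Optional[float]]) -> float:
    """Proportion of entries that are missing (None); 0.0 for an empty list."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def filter_by_missingness(
    data: Matrix,
    max_feature_missing: float = 0.2,  # hypothetical user-defined threshold
    max_sample_missing: float = 0.2,   # hypothetical user-defined threshold
) -> Tuple[List[int], List[int]]:
    """Return indices of the samples and features that pass both thresholds.

    Ordering (features first, then samples) is an assumption for illustration.
    """
    n_features = len(data[0]) if data else 0
    # Step 1: feature (column) missingness across all samples.
    keep_features = [
        j for j in range(n_features)
        if missing_rate([row[j] for row in data]) <= max_feature_missing
    ]
    # Step 2: sample (row) missingness, computed over the retained features only.
    keep_samples = [
        i for i, row in enumerate(data)
        if missing_rate([row[j] for j in keep_features]) <= max_sample_missing
    ]
    return keep_samples, keep_features
```

The ordering matters. Computing sample missingness only over retained features stops a few almost-empty metabolites from pushing otherwise good samples over the threshold.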

Availability and implementation: metaboprep is an open-source R package available at https://github.com/MRCIEU/metaboprep.

Supplementary information: Supplementary data are available at Bioinformatics online.


Figures

Fig. 1.
Brief description of the metaboprep pipeline. The six primary steps of the pipeline are outlined along the top. The left-hand column outlines the steps for generating summary statistics, whilst the right-hand column outlines the steps for sample and metabolite filtering. Abbreviations: 'dme', derived measures excluded; SD, standard deviations; 'X', a threshold variable defined by the user in the pipeline parameter file; PCs, principal components.
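Several filtering steps in the pipeline are expressed in standard deviations (SD) from the mean, with the threshold X set by the user. The following is a minimal sketch of one such rule, not metaboprep's code. It computes a per-sample total abundance over complete features only, then flags samples that lie more than X SD from the mean. These details are assumptions:

- The choice of statistic.
- The use of the sample SD.
- The names `total_abundance_complete_features` and `flag_sd_outliers`, which are hypothetical.

```python
import statistics
from typing import List, Optional

# Assumed layout: rows = samples, columns = features; None = missing.
Matrix = List[List[Optional[float]]]


def total_abundance_complete_features(data: Matrix) -> List[float]:
    """Per-sample sum over features that have no missing values in any sample."""
    complete = [j for j in range(len(data[0])) if all(row[j] is not None for row in data)]
    return [sum(row[j] for j in complete) for row in data]


def flag_sd_outliers(values: List[float], x_sd: float) -> List[int]:
    """Indices of values lying more than x_sd sample SDs from the mean.

    x_sd plays the role of the user-defined 'X'; requires at least two values.
    """
    mu = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return []  # no spread, so no value can be an outlier
    return [i for i, v in enumerate(values) if abs(v - mu) > x_sd * sd]
```

The sum is restricted to complete features so that a sample's total is not deflated simply because some of its values are missing.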
Fig. 2.
Summary figure found in each HTML report for the filtered dataset, shown here for the BiB dataset. It contains seven panels. (1) The distribution of sample missingness. (2) The distribution of feature missingness. (3) The distribution of TSA, computed over complete features only. (4) A hierarchical clustering dendrogram based on absolute Spearman rho distances (1 - |rho|), cut at a user-defined tree cut height (red horizontal line). Blue branches denote the features selected as 'representative' features for use in the PCA. (5) A table of the number of metabolites used at each step of the dendrogram and PCA. (6) A scree plot of the variance explained by each PC, with vertical lines marking the number of PCs estimated to be informative by Cattell's Scree Test acceleration factor (red, n = 2) and by Parallel Analysis (green, n = 49). (7) A plot of each sample on the top two PCs; the title again gives the number of metabolites used in the analysis. Individuals were grouped into four clusters by k-means (k = 4) on the top two PCs. The k-means clustering and colour coding serve only to help visualize the major axes of variation in the sample population(s).
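Panel (4) selects 'representative' features by clustering metabolites on the distance 1 - |rho| and cutting the tree at a fixed height. The following is a minimal, dependency-free sketch of that idea, not metaboprep's R implementation. These details are assumptions:

- The data contain only complete, non-constant features.
- The tree uses complete linkage.
- The representative of each cluster is the member with the smallest total distance to the rest of its cluster (metaboprep's own rule may differ).
- The function names are hypothetical.

```python
from typing import List, Sequence


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of ranks (assumes non-constant inputs)."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


def representative_features(features: List[Sequence[float]], cut_height: float) -> List[int]:
    """Complete-linkage clustering on 1 - |rho|, cut at cut_height; one index per cluster."""
    n = len(features)
    dist = [[1 - abs(spearman(features[a], features[b])) for b in range(n)] for a in range(n)]
    clusters = [[i] for i in range(n)]

    def linkage(c1: List[int], c2: List[int]) -> float:
        # Complete linkage: the largest pairwise distance between the two clusters.
        return max(dist[a][b] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Find the closest pair of clusters (ties broken by lowest indices).
        d, p, q = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > cut_height:
            break  # every remaining merge would sit above the cut line
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]

    # Medoid-style choice: the member with the smallest summed distance to its cluster.
    reps = [min(m, key=lambda i: (sum(dist[i][j] for j in m), i)) for m in clusters]
    return sorted(reps)
```

Complete linkage never produces inversions in the dendrogram. Stopping once the closest remaining pair exceeds the cut height therefore yields the same clusters as building the full tree and cutting it at that height.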
