Methods Mol Biol. 2019;1978:323-340.
doi: 10.1007/978-1-4939-9236-2_20.

Pre-analytic Considerations for Mass Spectrometry-Based Untargeted Metabolomics Data

Dominik Reinhold et al. Methods Mol Biol. 2019.

Abstract

Metabolomics is the science of characterizing and quantifying small molecule metabolites in biological systems. These metabolites give organisms their biochemical characteristics, providing a link between genotype, environment, and phenotype. With these opportunities come data challenges, such as compound annotation, missing values, and batch effects. We present the steps of a general pipeline for processing untargeted mass spectrometry data that addresses the latter two challenges, missing values and batch effects. We assume the data are available as a matrix of metabolite abundances, with metabolites in rows and samples in columns. The pipeline steps are summarizing technical replicates (if available), filtering, imputing, transforming, and normalizing the data. In each step, the method and its parameters should be chosen based on the assumptions one is willing to make, the question of interest, and diagnostic tools. Besides giving a general pipeline that the reader can adapt, our goal is to review diagnostic tools and criteria that help when making decisions in each step and when assessing the effectiveness of normalization and batch correction. We conclude with a list of useful packages and a discussion of alternative approaches that might be more appropriate for the reader's data.

Keywords: Filtering; Imputation; Mass spectrometry; Metabolomics; Normalization; Pre-analytic; Processing; Technical replicates; Untargeted.

Figures

Figure 1. Pre-analytic pipeline.
The five individual pipeline steps are discussed in more detail in Sections 2.1–2.5, with each section corresponding to one step.
Figure 2. Example of summarization of technical replicates.
For each metabolite and sample, we calculate the coefficient of variation (CV). If two or more replicates have a missing value, we assign a missing value to the metabolite for this sample. If at most one of the three values is missing, and the CV is low (≤0.5), we summarize the replicates with the mean. If the CV is high (>0.5) and we have a missing value, we conclude that the metabolite cannot be measured accurately and we assign a missing value. With a high CV and no missing values, we choose the median as a robust summary of the replicates.
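The decision rule in this caption can be written as a short function. The following is a minimal sketch in Python, not the authors' code: the function name `summarize_replicates` is hypothetical, missing values are assumed to be encoded as `None` or NaN, and the CV is assumed to be the sample standard deviation divided by the mean of the observed replicates.

```python
import math
import statistics

CV_THRESHOLD = 0.5  # cutoff stated in the Figure 2 caption


def summarize_replicates(values):
    """Summarize the technical replicates of one metabolite in one sample.

    Returns a single abundance, or None when the result is a missing value.
    Hypothetical helper; None or NaN marks a missing replicate.
    """
    observed = [v for v in values if v is not None and not math.isnan(v)]
    n_missing = len(values) - len(observed)

    # Two or more missing replicates: the summary is missing.
    if n_missing >= 2:
        return None

    mean = statistics.mean(observed)
    # Assumption: CV = sample SD / mean; a zero mean is treated as CV 0.
    cv = statistics.stdev(observed) / mean if mean != 0 else 0.0

    if cv <= CV_THRESHOLD:
        return mean  # low CV, at most one missing value: use the mean
    if n_missing == 1:
        return None  # high CV and a missing value: not measured accurately
    return statistics.median(observed)  # high CV, nothing missing: use the median
```

For example, `summarize_replicates([10, 12, 100])` has a high CV and no missing values, so it falls through to the median branch, while `summarize_replicates([10, 100, None])` is assigned a missing value.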
Figure 3. Histogram of missingness across samples for each of the 6166 metabolite features, after summarization of replicates.
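The quantity shown in this histogram, the fraction of samples in which a feature is missing, is also what a missingness filter would act on. Below is a minimal Python sketch under stated assumptions: the matrix is a dictionary from feature name to a list of per-sample abundances, missing values are `None` or NaN, and the cutoff `max_missing=0.2` is purely illustrative and not taken from the source. All function names are hypothetical.

```python
import math


def is_missing(value):
    """True for None or NaN (the assumed encodings of a missing abundance)."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def missing_fraction(row):
    """Fraction of samples in which this metabolite feature is missing."""
    return sum(is_missing(v) for v in row) / len(row)


def filter_features(matrix, max_missing=0.2):
    """Keep features whose missing fraction is at most max_missing.

    The default cutoff is a hypothetical value chosen for illustration.
    """
    return {
        name: row
        for name, row in matrix.items()
        if missing_fraction(row) <= max_missing
    }


# Toy matrix: metabolites in rows, five samples in columns (made-up values).
toy = {
    "m1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "m2": [1.0, None, None, 4.0, 5.0],
    "m3": [None, 2.0, 3.0, 4.0, 5.0],
}
kept = filter_features(toy)  # m2 is dropped: it is missing in 2 of 5 samples
```

In practice the cutoff would be chosen by inspecting a histogram such as the one in this figure, rather than fixed in advance.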
Figure 4. Distribution of the abundance of an example metabolite (sphingosine) before (left panel) and after (right panel) log2-transformation.
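The effect of the transformation can be checked numerically as well as visually. The sketch below is illustrative only: the abundances are made-up values, not sphingosine data from the study, the helper names are hypothetical, and it assumes all abundances are strictly positive (zeros or missing values would have to be handled earlier, for example by imputation).

```python
import math


def log2_transform(values):
    """Apply log2 to each abundance; requires strictly positive values."""
    return [math.log2(v) for v in values]


def skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Made-up, right-skewed abundances for a single metabolite.
raw = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0]
logged = log2_transform(raw)

skew_before = skewness(raw)    # positive: long right tail
skew_after = skewness(logged)  # zero here: the logged values are symmetric
```

A skewness near zero after the transformation is one simple indication that the log2 scale is closer to symmetric, which is what the right panel shows graphically.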
Figure 5. RLE Plots.
The kNN-imputed data before normalization (left panel). The kNN-imputed data after median and ComBat normalization (right panel). The nine batches are color-coded. The kNN imputation was done using the VIM R package, with k=5 (40). The RLE plot was produced using plotRLE in the EDASeq package (57).
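To make the RLE diagnostic concrete, here is a minimal Python sketch of the underlying calculation. It is not the `plotRLE` implementation and does not reproduce ComBat or kNN imputation. It assumes a complete matrix of log-scale abundances with metabolites in rows and samples in columns, and it assumes that median normalization means subtracting each sample's median on the log scale; the exact variant used in the study may differ. All names and values are hypothetical.

```python
import statistics


def rle(log_matrix):
    """Relative log expression: each row minus that row's median across samples."""
    result = []
    for row in log_matrix:
        row_median = statistics.median(row)
        result.append([x - row_median for x in row])
    return result


def median_normalize(log_matrix):
    """Subtract each sample's (column's) median from that column."""
    col_medians = [statistics.median(col) for col in zip(*log_matrix)]
    return [[x - m for x, m in zip(row, col_medians)] for row in log_matrix]


def sample_rle_medians(log_matrix):
    """Median RLE value per sample, i.e. the center line of each RLE box."""
    return [statistics.median(col) for col in zip(*rle(log_matrix))]


# Made-up log2 abundances; the third sample carries a +1 offset.
log_abund = [
    [10.0, 10.0, 11.0],
    [12.0, 12.0, 13.0],
    [8.0, 8.0, 9.0],
]
before = sample_rle_medians(log_abund)
after = sample_rle_medians(median_normalize(log_abund))
```

In an RLE plot each sample is drawn as a boxplot of its column of RLE values. Boxes centered away from zero, as for the third sample before normalization here, point to unwanted sample-level or batch-level shifts.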
Figure 6. PCA Plots.
The kNN-imputed data before normalization (left panel) and after median and ComBat normalization (right panel). The PCA plot was produced using the factoextra package (58).
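The idea behind using a PCA plot as a batch diagnostic can be sketched without any plotting library. The code below is not the factoextra implementation: it computes only the first principal component, by power iteration on the centered data, and uses made-up values. It assumes samples in rows and metabolites in columns (the transpose of the layout used for the pipeline), and the function name is hypothetical.

```python
import math


def first_pc_scores(samples, n_iter=200):
    """Score of each sample on the first principal component.

    Uses power iteration on X^T X, where X is the column-centered data.
    The fixed starting vector is an assumption; a start that is exactly
    orthogonal to the leading component would not converge to it.
    """
    n, p = len(samples), len(samples[0])
    means = [sum(row[j] for row in samples) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in samples]

    v = [1.0 + 0.1 * j for j in range(p)]  # deterministic starting direction
    for _ in range(n_iter):
        scores = [sum(x * w for x, w in zip(row, v)) for row in centered]
        v = [sum(scores[i] * centered[i][j] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(w * w for w in v))
        if norm == 0:  # no variance in the data
            return [0.0] * n
        v = [w / norm for w in v]
    return [sum(x * w for x, w in zip(row, v)) for row in centered]


# Made-up log abundances: two samples per batch, batch B shifted upward.
samples = [
    [10.0, 12.0, 8.0],   # batch A
    [10.1, 11.9, 8.0],   # batch A
    [11.0, 13.0, 9.0],   # batch B
    [11.1, 12.9, 9.0],   # batch B
]
pc1 = first_pc_scores(samples)
```

With these values the two batches fall on opposite sides of zero along the first component. That is the pattern to look for before normalization, and it should weaken or disappear after an effective batch correction.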

