Anal Chem. 2012 Sep 18;84(18):7963-71.
doi: 10.1021/ac3016856. Epub 2012 Sep 7.

Data preprocessing method for liquid chromatography-mass spectrometry based metabolomics

Xiaoli Wei et al. Anal Chem.

Abstract

A set of data preprocessing algorithms for peak detection and peak list alignment is reported for the analysis of liquid chromatography-mass spectrometry (LC-MS)-based metabolomics data. For spectrum deconvolution, peak picking is performed at the selected ion chromatogram (XIC) level. To estimate and remove the noise in XICs, each XIC is first segmented into several peak groups based on the continuity of scan numbers, and the noise level is estimated from all XIC signals except those in regions that potentially contain metabolite ion peaks. After noise removal, the peaks of molecular ions are detected using both the first and second derivatives, followed by an efficient exponentially modified Gaussian (EMG)-based peak deconvolution method for peak fitting. A two-stage alignment algorithm is also developed, in which the retention times of all peaks are first transformed into the z-score domain, and the peaks are then aligned based on their mixture scores after retention time correction using a partial linear regression. Analysis of a set of spike-in LC-MS data from three groups of samples containing 16 metabolite standards mixed with metabolite extract from mouse livers demonstrates that the developed data preprocessing method performs better than two popular existing data analysis packages, MZmine2.6 and XCMS2, for peak picking, peak list alignment, and quantification.
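Two of the steps named in the abstract, segmenting an XIC by scan continuity and moving retention times into the z-score domain, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the `max_gap` parameter, and the use of the population standard deviation are assumptions made for illustration only.

```python
from statistics import mean, pstdev


def segment_by_scan_continuity(scans, max_gap=1):
    """Split sorted scan numbers into peak groups.

    A new group starts whenever two consecutive scan numbers differ by
    more than max_gap (hypothetical parameter; the abstract only says
    that segmentation is based on scan continuity).
    """
    groups = []
    for scan in scans:
        if groups and scan - groups[-1][-1] <= max_gap:
            groups[-1].append(scan)
        else:
            groups.append([scan])
    return groups


def to_z_scores(retention_times):
    """Standardize retention times to zero mean and unit variance.

    Assumption: population standard deviation; the abstract does not
    state which estimator is used.
    """
    mu = mean(retention_times)
    sigma = pstdev(retention_times)
    if sigma == 0:
        return [0.0 for _ in retention_times]
    return [(rt - mu) / sigma for rt in retention_times]


# Scans 1-3, 7-8, and 12 form three groups because of the gaps.
print(segment_by_scan_continuity([1, 2, 3, 7, 8, 12]))
print(to_z_scores([1.0, 2.0, 3.0]))
```

The noise estimation, mixture-score alignment, and partial linear regression steps are not sketched here because the abstract does not give enough detail to reproduce them.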


Figures

Figure 1
Flow chart of data analysis methods developed in this study.
Figure 2
An example of spectrum deconvolution by MetSign. (A) XIC and background noise level estimation. The entire XIC is segmented into four peak groups because of the discontinuity of signals in the chromatographic dimension (scan). The first segment was found to contain at least one peak, leaving the remaining retention time range of the XIC as the noise area for polynomial fitting and median filtering. The estimated noise level is shown as a red line. (B) Detection of significant peaks. The dominant peaks are identified where the first derivative crosses zero from positive to negative values, provided that the criterion for the minimum number of data points on both sides of each peak is met. (C) Detection of non-significant peaks (hidden peaks). Hidden peaks are recognized where the second derivative crosses zero from positive to negative values while the first derivative is negative, or from negative to positive values while the first derivative is positive. One hidden peak is detected in this example. (D) Two peaks deconvoluted by mixture EMG models.
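The derivative criteria in panels (B) and (C) can be sketched in Python using finite differences. This is an illustrative approximation, not MetSign's code: the function names, the `min_points` default, the strict inequalities, and the choice of which sample index to report are all assumptions, and real data would normally be denoised or smoothed first.

```python
def first_difference(values):
    """Finite-difference approximation of the derivative."""
    return [b - a for a, b in zip(values, values[1:])]


def find_dominant_peaks(signal, min_points=2):
    """Indices where the first difference goes from positive to negative,
    with at least min_points rising points on the left and falling points
    on the right (min_points is a hypothetical default)."""
    d1 = first_difference(signal)
    peaks = []
    for i in range(1, len(d1)):
        if d1[i - 1] > 0 and d1[i] < 0:
            left, k = 0, i - 1
            while k >= 0 and d1[k] > 0:
                left, k = left + 1, k - 1
            right, k = 0, i
            while k < len(d1) and d1[k] < 0:
                right, k = right + 1, k + 1
            if left >= min_points and right >= min_points:
                peaks.append(i)
    return peaks


def find_hidden_peaks(signal):
    """Indices of shoulder (hidden) peaks from second-difference sign changes.

    Falling side: second difference goes + to - while the first is negative.
    Rising side:  second difference goes - to + while the first is positive.
    The reported index is the concave-down sample next to the crossing
    (an assumption about how to localize the shoulder).
    """
    d1 = first_difference(signal)
    d2 = first_difference(d1)
    hidden = []
    for j in range(1, len(d2)):
        if d2[j - 1] > 0 and d2[j] < 0 and d1[j] < 0:
            hidden.append(j + 1)
        elif d2[j - 1] < 0 and d2[j] > 0 and d1[j] > 0:
            hidden.append(j)
    return hidden


# Synthetic trace: one dominant peak with a shoulder on its trailing edge.
trace = [0, 2, 6, 10, 6, 4, 3.5, 1, 0, 0]
print(find_dominant_peaks(trace))  # [3]
print(find_hidden_peaks(trace))    # [6]
```

In the synthetic trace the shoulder never produces a local maximum, so the first-derivative test alone misses it; only the second-derivative test flags it.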
Figure 3
An example of peak picking by MetSign and MZmine2.6. (A) Peak fitting results by MetSign using the mixture EMG model. (B) Five peak components deconvoluted by the peak detection and EMG fitting algorithm of MetSign. (C), (D), and (E) Peak deconvolution results on the same data by MZmine2.6. MetSign detected five peaks, including four dominant peaks and one hidden peak, whereas MZmine2.6 correctly detected the two abundant peaks on the right but incorrectly treated the three peaks on the left as one peak.
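For readers unfamiliar with the EMG model mentioned in this caption, the sketch below evaluates a standard textbook form of the exponentially modified Gaussian and a sum of such components. The parameterization (area, Gaussian center `mu`, Gaussian width `sigma`, exponential time constant `tau`) and the function names are assumptions; the page does not state which form or fitting procedure MetSign uses, and no fitting is attempted here.

```python
import math


def emg(t, area, mu, sigma, tau):
    """Exponentially modified Gaussian evaluated at time t.

    Textbook form: a Gaussian (mu, sigma) convolved with an exponential
    decay of time constant tau, scaled so the curve integrates to `area`.
    Not numerically hardened for extreme arguments.
    """
    exponent = sigma ** 2 / (2.0 * tau ** 2) - (t - mu) / tau
    z = (sigma / tau - (t - mu) / sigma) / math.sqrt(2.0)
    return (area / (2.0 * tau)) * math.exp(exponent) * math.erfc(z)


def emg_mixture(t, components):
    """Sum of EMG components; each component is (area, mu, sigma, tau)."""
    return sum(emg(t, *component) for component in components)


# Two overlapping components, loosely analogous to unresolved peaks.
components = [(1.0, 5.0, 1.0, 2.0), (2.0, 8.0, 1.0, 2.0)]
profile = [emg_mixture(0.5 * k, components) for k in range(40)]
```

Deconvolution would then amount to choosing the component parameters so that the mixture best matches the observed XIC, typically with a nonlinear least-squares routine.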
Figure 4
Comparison of alignment results among MetSign, MZmine2.6, XCMS2 with retention time correction, and XCMS2 without retention time correction. (A) εm/z ≤ 6 ppm. (B) εm/z ≤ 10 ppm. The εm/z was set to 0.025 for XCMS2, as specified by the software.
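The ppm tolerances in panels (A) and (B) can be read as a relative m/z difference, as in the small sketch below. The helper names and the choice of the mean of the two values as the reference mass are assumptions; the caption does not say how any of the tools compute the difference.

```python
def ppm_difference(mz_a, mz_b):
    """Relative m/z difference in parts per million.

    Assumption: the mean of the two values is used as the reference mass.
    """
    reference = (mz_a + mz_b) / 2.0
    return abs(mz_a - mz_b) / reference * 1e6


def within_tolerance(mz_a, mz_b, tolerance_ppm):
    """True if the two m/z values agree within tolerance_ppm."""
    return ppm_difference(mz_a, mz_b) <= tolerance_ppm


# A 0.004 difference at m/z 500 is about 8 ppm: outside a 6 ppm window,
# inside a 10 ppm window.
print(within_tolerance(500.0000, 500.0040, 6))   # False
print(within_tolerance(500.0000, 500.0040, 10))  # True
```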

