Anal Chem. 2010 Jun 15;82(12):5069-81.
doi: 10.1021/ac100064b.

DISCO: distance and spectrum correlation optimization alignment for two-dimensional gas chromatography time-of-flight mass spectrometry-based metabolomics

Bing Wang et al. Anal Chem. 2010.

Abstract

A novel peak alignment algorithm using a distance and spectrum correlation optimization (DISCO) method has been developed for two-dimensional gas chromatography time-of-flight mass spectrometry (GCxGC/TOF-MS)-based metabolomics. The algorithm takes the output of the instrument control software, ChromaTOF, as its input data. In each input peak list, it detects multiple peak entries of the same metabolite and merges them into one peak entry. After a z-score transformation of metabolite retention times, DISCO selects landmark peaks from all samples based on both their two-dimensional retention times and the mass spectrum similarity of their fragment ions, measured by Pearson's correlation coefficient. A local linear fitting method is then applied in the original two-dimensional retention time space to correct retention time shifts. Finally, a progressive retention time map searching method aligns the metabolite peaks of all samples by optimizing the Euclidean distance and the mass spectrum similarity. The effectiveness of the DISCO algorithm is demonstrated using data sets acquired under different experimental conditions and a spiked-in experiment.

Figures

Figure 1
The workflow of the DISCO alignment algorithm.
Figure 2
A sample process of detecting and removing potential false positive landmark peaks from the initial landmark peak list. DISCO first ranks the elution order of all landmark peaks in the reference sample and the target sample, respectively. It then calculates the absolute value of rank-order difference of each landmark peak pair. The landmark peaks in a landmark peak pair with the maximum rank-order difference are considered as potential false-positive landmark peaks and removed from the two landmark peak lists. This process is repeated until all landmark peak pairs have zero rank-order difference.
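The pruning loop described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each landmark pair is stored as a hypothetical `(reference_rt, target_rt)` tuple, that elution order is obtained by sorting `(first-dimension, second-dimension)` retention times lexicographically, and that all pairs tied at the maximum rank-order difference are removed together. None of these details is stated in the caption.

```python
def rank_order(values):
    """Return the elution rank (0 = earliest) of each value."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def prune_landmarks(pairs):
    """Drop landmark pairs until reference and target elution orders agree.

    pairs: list of (reference_rt, target_rt), where each rt is a
    (first_dim, second_dim) tuple. Layout is an assumption of this sketch.
    """
    pairs = list(pairs)
    while pairs:
        ref_ranks = rank_order([p[0] for p in pairs])
        tgt_ranks = rank_order([p[1] for p in pairs])
        diffs = [abs(r - t) for r, t in zip(ref_ranks, tgt_ranks)]
        worst = max(diffs)
        if worst == 0:
            break  # every pair has the same rank in both samples
        # Assumption: remove every pair tied at the maximum difference.
        pairs = [p for p, d in zip(pairs, diffs) if d != worst]
    return pairs


pairs = [
    ((100, 1.0), (102, 1.0)),
    ((200, 1.5), (203, 1.5)),
    ((300, 2.0), (950, 2.0)),  # elutes far too late in the target sample
    ((400, 2.5), (404, 2.5)),
    ((500, 3.0), (506, 3.0)),
]
kept = prune_landmarks(pairs)
```

In this made-up example the third pair has the largest rank-order difference, so it is removed first; the remaining four pairs then share the same elution order and the loop stops.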
Figure 3
Schematic of retention time correction. DISCO first assigns the values of the retention time of a landmark peak in the reference sample to the retention time of all corresponding landmark peaks in the target sample. It then uses a local partial linear fitting function to interpolate the retention time of non-landmark peaks located between two landmark peaks in each retention time dimension, respectively.
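One way to read this correction step is as piecewise linear interpolation between neighboring landmark peaks, applied to each retention time dimension separately. The sketch below follows that reading. The function name `correct_rt` and the handling of peaks outside the landmark range (shifted by the offset of the nearest landmark) are assumptions of this example; the caption does not describe that case.

```python
from bisect import bisect_right


def correct_rt(rt, tgt_landmarks, ref_landmarks):
    """Map one target retention time onto the reference time axis.

    tgt_landmarks must be strictly increasing; ref_landmarks holds the
    reference retention time of the same landmark at each position.
    """
    # Assumption: outside the landmark range, apply the nearest offset.
    if rt <= tgt_landmarks[0]:
        return rt + (ref_landmarks[0] - tgt_landmarks[0])
    if rt >= tgt_landmarks[-1]:
        return rt + (ref_landmarks[-1] - tgt_landmarks[-1])
    hi = bisect_right(tgt_landmarks, rt)  # first landmark later than rt
    lo = hi - 1
    frac = (rt - tgt_landmarks[lo]) / (tgt_landmarks[hi] - tgt_landmarks[lo])
    return ref_landmarks[lo] + frac * (ref_landmarks[hi] - ref_landmarks[lo])


# Illustrative first-dimension landmarks (seconds); values are made up.
tgt = [100.0, 200.0, 300.0]
ref = [110.0, 205.0, 320.0]
midpoint = correct_rt(150.0, tgt, ref)
```

A landmark peak itself maps exactly onto its reference retention time, and a non-landmark peak halfway between two landmarks lands halfway between their reference times. The same function would be called once per dimension.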
Figure 4
Schematic of progressive retention time map searching to align a non-landmark peak in a target sample to a non-landmark peak in the reference sample. DISCO searches for the matching peak among all peaks of the reference sample in the first searching area by optimizing the balance between the Euclidean distance and the mass spectrum correlation. If a matching peak is not found, the algorithm searches for it in the secondary searching area. If there is no matching peak in the secondary searching area, the non-landmark peak is considered a new reference peak; otherwise, it is aligned to the matching peak in the reference sample.
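The two-stage search can be illustrated with the sketch below. The caption does not give the formula that balances distance against spectrum correlation, so the weighted score used here, the circular search areas, and the parameters `radii`, `min_corr`, and `weight` are all assumptions chosen for illustration. Peaks are represented as hypothetical dictionaries with `rt` and `spectrum` keys.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = math.sqrt(sxx * syy)
    return sxy / denom if denom else 0.0


def find_match(peak, reference, radii=(1.0, 2.0), min_corr=0.8, weight=0.5):
    """Return the index of the best reference peak, or None.

    Searches the first area (radii[0]) and then the secondary area
    (radii[1]). The scoring rule is an assumption of this sketch.
    """
    for radius in radii:
        best, best_score = None, -math.inf
        for idx, ref in enumerate(reference):
            d = math.dist(peak["rt"], ref["rt"])
            if d > radius:
                continue  # outside the current searching area
            corr = pearson(peak["spectrum"], ref["spectrum"])
            if corr < min_corr:
                continue  # spectra too dissimilar
            score = weight * corr + (1 - weight) * (1 - d / radius)
            if score > best_score:
                best, best_score = idx, score
        if best is not None:
            return best
    return None  # caller would add the peak as a new reference peak


reference = [
    {"rt": (0.0, 0.0), "spectrum": [1, 2, 3, 4]},
    {"rt": (0.5, 0.0), "spectrum": [4, 3, 2, 1]},
    {"rt": (1.5, 0.0), "spectrum": [1, 2, 3, 5]},
]
near = find_match({"rt": (0.4, 0.0), "spectrum": [1, 2, 3, 4]}, reference)
far = find_match({"rt": (3.0, 0.0), "spectrum": [1, 2, 3, 5]}, reference)
none = find_match({"rt": (10.0, 10.0), "spectrum": [1, 2, 3, 4]}, reference)
```

In this toy data the first query is matched inside the first area even though a closer reference peak exists, because that closer peak has an anti-correlated spectrum. The second query only finds a match in the secondary area, and the third finds none.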
Figure 5
Receiver operating characteristic (ROC) curve. TPR is the true positive rate, and FPR is the false positive rate. Each point on the ROC curve corresponds to a threshold value, so the spectrum similarity threshold can be chosen from the expected TPR and FPR.
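As a rough illustration of choosing a threshold from an ROC curve, the sketch below computes one (threshold, TPR, FPR) point per distinct similarity score and then picks the threshold with the highest TPR whose FPR stays within a chosen limit. The selection rule, the function names, and the example scores and labels are assumptions for illustration; they are not data from the paper.

```python
def roc_points(scores, labels):
    """Return (threshold, TPR, FPR) for each distinct score, descending.

    labels: 1 for a true match, 0 for a false match. Both classes must
    be present.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for s, l in zip(scores, labels) if s >= thr and l)
        fp = sum(1 for s, l in zip(scores, labels) if s >= thr and not l)
        points.append((thr, tp / pos, fp / neg))
    return points


def pick_threshold(points, max_fpr):
    """Highest-TPR threshold with FPR <= max_fpr, or None if none fits."""
    allowed = [p for p in points if p[2] <= max_fpr]
    if not allowed:
        return None
    return max(allowed, key=lambda p: p[1])[0]


# Made-up similarity scores for candidate peak pairs and their labels.
scores = [0.95, 0.90, 0.85, 0.80, 0.70, 0.60]
labels = [1, 1, 0, 1, 0, 0]
points = roc_points(scores, labels)
strict = pick_threshold(points, max_fpr=0.0)
relaxed = pick_threshold(points, max_fpr=0.34)
```

With these values a zero false positive rate forces the threshold up to 0.90 and misses one true match, while tolerating one false match in three lowers it to 0.80 and recovers all true matches.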
Figure 6
Results of peak entry merging in sample S51. (a) shows the original peaks reported by the ChromaTOF software (blue stars) and the merged peaks (red circles). (b) is the subset of (a) highlighted by a circle. Peak 1 was not merged with other peaks, while peaks 2, 3, and 4 were merged and a representative peak 2′ was generated as a substitute for these three peak entries. A peak-area-weighted method was used to determine the retention times of the representative peak 2′ in the two-dimensional retention time space.
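The peak-area-weighted retention time of a representative peak can be sketched as a weighted mean in each dimension. The tuple layout `(rt1, rt2, area)` and the function name are assumptions of this example, and the numbers are invented rather than taken from sample S51.

```python
def merge_peak_entries(entries):
    """Return the area-weighted (rt1, rt2) of a group of peak entries.

    entries: list of (rt1, rt2, area) tuples believed to belong to the
    same metabolite. The layout is an assumption of this sketch.
    """
    total_area = sum(area for _, _, area in entries)
    if total_area <= 0:
        raise ValueError("total peak area must be positive")
    rt1 = sum(r1 * area for r1, _, area in entries) / total_area
    rt2 = sum(r2 * area for _, r2, area in entries) / total_area
    return rt1, rt2


# Three entries of one metabolite; the largest peak dominates the result.
entries = [(100.0, 1.0, 10.0), (100.0, 1.2, 30.0), (105.0, 1.1, 60.0)]
rep_rt1, rep_rt2 = merge_peak_entries(entries)
```

Here the representative peak sits at roughly (103.0, 1.12), closer to the entry with the largest area than a plain average would place it.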
Figure 7
Retention time distributions of standard compounds, analyzed under different temperature gradients, in the two-dimensional retention time space before and after z-score transformation. (a) is the distribution of metabolite peaks in the original two-dimensional retention time space before z-score transformation and (b) is the distribution of metabolite peaks in the z-score transformed space. Blue stars (*) are metabolites detected with a temperature gradient of 5°C/min, red circles (○) are metabolites detected at 7°C/min, and black triangles (Δ) are metabolites detected at 10°C/min.
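The effect of the z-score transformation can be seen in a small sketch. It assumes the transformation is applied to each retention time dimension of each sample separately and uses the population standard deviation; the source does not state which estimator was used. The retention times are invented, and the second run is modeled as an exactly linear compression of the first, which real gradient changes only approximate.

```python
from statistics import mean, pstdev


def zscore(values):
    """Standardize values to zero mean and unit standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("retention times have zero spread")
    return [(v - mu) / sigma for v in values]


# First-dimension retention times (seconds) of the same three compounds
# under a slow gradient, and a linearly compressed faster run.
slow = [600.0, 900.0, 1500.0]
fast = [0.5 * t + 60.0 for t in slow]

z_slow = zscore(slow)
z_fast = zscore(fast)
```

Because a z-score removes any positive linear scaling and offset, the two runs coincide after transformation, which is why corresponding peaks become comparable across temperature gradients.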
Figure 8
Corresponding landmark peaks discovered between two samples. (a) between S51 and S52; (b) between S51 and S71; (c) between S51 and S101. In the figure, blue circles are landmark peaks in sample S51, and red squares are corresponding landmark peaks in sample S52 in (a), S71 in (b) and S101 in (c), respectively.
Figure 9
Distribution of retention times before and after retention time correction. The probability density function (PDF) of the retention time in each sample is computed using the normal distribution. (a) is the first dimension retention time before correction. (b) is the first dimension retention time after correction. (c) is the second dimension retention time before correction. (d) is the second dimension retention time after correction.
Figure 10
Comparison of the mass spectra of the compound acenaphthene-D10 acquired in the five replicate spiked-in experiments with the standard spectrum in the NIST database. The value of CC is the Pearson’s correlation coefficient between the spiked-in spectrum and the standard spectrum.
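A CC value of this kind can be computed once two spectra are placed on a common m/z axis. The sketch below assumes each spectrum is a hypothetical dictionary mapping integer m/z to intensity and that an m/z missing from one spectrum counts as zero intensity there; the caption does not say how the spectra were aligned. The intensities are invented and are not the acenaphthene-D10 spectrum.

```python
import math


def spectrum_cc(spec_a, spec_b):
    """Pearson's correlation of two spectra given as {m/z: intensity}."""
    mz_axis = sorted(set(spec_a) | set(spec_b))
    # Assumption: a fragment absent from one spectrum has intensity 0.
    a = [spec_a.get(mz, 0.0) for mz in mz_axis]
    b = [spec_b.get(mz, 0.0) for mz in mz_axis]
    n = len(mz_axis)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    denom = math.sqrt(saa * sbb)
    return sab / denom if denom else 0.0


measured = {50: 10.0, 51: 20.0, 52: 30.0}
scaled = {50: 20.0, 51: 40.0, 52: 60.0}   # same pattern, higher abundance
partial = {50: 10.0, 52: 30.0}            # one fragment ion missing

cc_scaled = spectrum_cc(measured, scaled)
cc_partial = spectrum_cc(measured, partial)
```

Pearson's coefficient ignores overall intensity scale, so the same fragmentation pattern at a different abundance still gives a CC of 1, while a missing fragment ion lowers the value.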
