Anal Chem. 2010 Sep 1;82(17):7319-28.
doi: 10.1021/ac101278x.

Construction of confidence regions for isotopic abundance patterns in LC/MS data sets for rigorous determination of molecular formulas

Free PMC article

Andreas Ipsen et al. Anal Chem.

Abstract

It has long been recognized that estimates of isotopic abundance patterns may be instrumental in identifying the many unknown compounds encountered when conducting untargeted metabolic profiling using liquid chromatography/mass spectrometry. While numerous methods have been developed for assigning heuristic scores to rank the degree of fit of the observed abundance patterns with theoretical ones, little work has been done to quantify the errors that are associated with the measurements made. Thus, it is generally not possible to determine, in a statistically meaningful manner, whether a given chemical formula would likely be capable of producing the observed data. In this paper, we present a method for constructing confidence regions for the isotopic abundance patterns based on the fundamental distribution of the ion arrivals. Moreover, we develop a method for doing so that makes use of the information pooled together from the measurements obtained across an entire chromatographic peak, as well as from any adducts, dimers, and fragments observed in the mass spectra. This greatly increases the statistical power, thus enabling the analyst to rule out a potentially much larger number of candidate formulas while explicitly guarding against false positives. In practice, small departures from the model assumptions are possible due to detector saturation and interferences between adjacent isotopologues. While these factors form impediments to statistical rigor, they can to a large extent be overcome by restricting the analysis to moderate ion counts and by applying robust statistical methods. Using real metabolic data, we demonstrate that the method is capable of reducing the number of candidate formulas by a substantial amount, even when no bromine or chlorine atoms are present. We argue that further developments in our ability to characterize the data mathematically could enable much more powerful statistical analyses.
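As a rough illustration of what it means to ask whether a candidate formula "would likely be capable of producing the observed data", the sketch below tests observed ion counts for two isotopologues against a candidate's theoretical abundance pattern. It is a minimal sketch under stated assumptions, not the authors' construction: it uses a plain Pearson goodness-of-fit statistic for a single scan, restricts itself to two isotopologues so that the reference distribution is χ² with 1 degree of freedom, and does not pool information across scans, adducts, dimers, or fragments. The function names and the example counts and abundances are hypothetical.

```python
import math


def pearson_x2(counts, probs):
    """Pearson statistic for observed ion counts against a candidate pattern.

    counts: observed ion counts per isotopologue (hypothetical input).
    probs:  theoretical relative abundances of the candidate, summing to 1.
    """
    total = sum(counts)
    return sum((n - total * p) ** 2 / (total * p) for n, p in zip(counts, probs))


def chi2_1df_pvalue(x2):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x2 / 2.0))


def is_rejected(counts, probs, alpha=0.05):
    """True if the candidate pattern is rejected at significance level alpha.

    Assumes exactly two isotopologues, hence 1 degree of freedom.
    """
    return chi2_1df_pvalue(pearson_x2(counts, probs)) < alpha


# Hypothetical counts for the monoisotopic peak and the next isotopologue.
observed = [9000, 1000]
print(is_rejected(observed, [0.90, 0.10]))  # candidate matching the data
print(is_rejected(observed, [0.88, 0.12]))  # candidate with a different pattern
```

A candidate is kept in the confidence region when the test does not reject it, so the region is the set of formulas whose theoretical patterns are statistically compatible with the counts.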

Figures

Figure 1
Quantile-quantile plots of the x² statistics obtained from the three compounds against the appropriate χ² distributions. The red line indicates the idealized fit that would be obtained if the observed x² statistics coincided exactly with the theoretical quantiles of the χ² distributions. While the observed fit is very good for low quantiles, it is clear that the tails of the distributions obtained for hippurate and nitrotyrosine are too heavy to be consistent with that of the χ²₁ distribution.
Figure 2
Continuum plot of the two lowest mass isotopologues of nitrotyrosine. The tails of the mass peaks are heavy enough to reach the apexes of the mass peaks of adjacent isotopologues, so that it is not possible to construct a centroid that is composed of only one species of isotopologue. While the effect is less apparent for chromatographic scans where the total ion count is lower, the mass peaks at these scans will be all the more sensitive to any contamination.
Figure 3
Quantile-quantile plots of the x² statistics obtained from the three compounds after the most extreme 10% have been trimmed. The quantiles obtained for hippurate and nitrotyrosine are now consistently smaller than those of the χ²₁ distribution, as required. The effects are more moderate for the x² statistics obtained from chenodeoxycholic acid due to the smaller sample size.
Figure 4
Using the robust approach, the median x² and X² statistics were evaluated for the data obtained from hippurate, nitrotyrosine, and chenodeoxycholic acid. The statistics were calculated for all formulas within 0.1 Da of the theoretical mass (black), for all formulas within 30 ppm of the theoretical mass (green), and for the true formula (magenta). Above each plot is listed the number of formulas that may be rejected at the 5% significance level (red line) out of the list of formulas within 30 ppm of the theoretical mass.
Figure 5
Mean number of false candidate formulas within the confidence regions (false negatives) obtained from the simulated isotopic abundance patterns. The probability that a true candidate formula lies outside a given confidence region (a false positive) is given by the chosen significance level, which was set to 0.05 for these simulations.
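
The trimming mentioned in the Figure 3 caption and the median summary mentioned in the Figure 4 caption can be sketched as follows. This is only an illustration under assumptions: it takes "most extreme" to mean the largest per-scan statistics, the function names and input values are hypothetical, and it does not address which reference distribution the trimmed or median statistic should be compared against.

```python
import math
import statistics


def trim_largest(values, fraction=0.10):
    """Drop the largest `fraction` of the values (assumed meaning of 'most extreme')."""
    n_drop = math.floor(fraction * len(values))
    return sorted(values)[: len(values) - n_drop]


def median_statistic(values):
    """Median of the per-scan statistics, a summary that is insensitive to outliers."""
    return statistics.median(values)


# Hypothetical per-scan statistics with one heavy-tailed outlier.
per_scan = [0.1, 0.3, 0.5, 0.7, 0.9, 1.1, 1.3, 1.5, 1.7, 50.0]
print(trim_largest(per_scan))      # the outlier 50.0 is removed
print(median_statistic(per_scan))  # barely affected by the outlier
```

Both summaries limit the influence of a few scans affected by saturation or interference, which is the motivation the abstract gives for using robust methods.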