BMC Bioinformatics. 2007 Mar 27;8:105.
doi: 10.1186/1471-2105-8-105.

Seven Golden Rules for heuristic filtering of molecular formulas obtained by accurate mass spectrometry

Tobias Kind et al. BMC Bioinformatics. 2007.

Abstract

Background: Structure elucidation of unknown small molecules by mass spectrometry is a challenge despite advances in instrumentation. The first crucial step is to obtain correct elemental compositions. In order to automatically constrain the thousands of possible candidate structures, rules need to be developed to select the most likely and chemically correct molecular formulas.

Results: An algorithm for filtering molecular formulas is derived from seven heuristic rules: (1) restrictions for the number of elements, (2) LEWIS and SENIOR chemical rules, (3) isotopic patterns, (4) hydrogen/carbon ratios, (5) element ratios of nitrogen, oxygen, phosphorus, and sulphur versus carbon, (6) element ratio probabilities and (7) presence of trimethylsilylated compounds. Formulas are ranked according to their isotopic patterns and subsequently constrained by presence in public chemical databases. The seven rules were developed on 68,237 existing molecular formulas and were validated in four experiments. First, 432,968 formulas covering five million PubChem database entries were checked for consistency. Only 0.6% of these compounds did not pass all rules. Next, the rules were shown to effectively reduce the full set of eight billion theoretically possible C, H, N, S, O, P-formulas up to 2000 Da to only 623 million most probable elemental compositions. Third, 6,000 pharmaceutical, toxic and natural compounds were selected from the DrugBank, TSCA and DNP databases. The correct formulas were retrieved as the top hit at 80-99% probability when assuming data acquisition with complete resolution of unique compounds, 5% absolute isotope ratio deviation and 3 ppm mass accuracy. Last, some exemplary compounds were analyzed by Fourier transform ion cyclotron resonance mass spectrometry and by gas chromatography-time of flight mass spectrometry. In each case, the correct formula was ranked as the top hit when combining the seven rules with database queries.

Conclusion: The seven rules enable an automatic exclusion of molecular formulas which are either wrong or which contain unlikely high or low numbers of elements. The correct molecular formula is assigned with a probability of 98% if the formula exists in a compound database. For truly novel compounds that are not present in databases, the correct formula is found in the first three hits with a probability of 65-81%. Corresponding software and supplemental data are available for download from the authors' website.
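
As a rough illustration of how rules (2), (4) and (5) could act as a filter, the following Python sketch checks a candidate formula against the SENIOR conditions and against element/carbon ratio limits. It is not the authors' software. The function names, the valence table (lowest common valence states) and the ratio limits are assumptions made for this example; the abstract does not state the limits, so the published ranges should be taken from the full paper.

```python
import re

# ASSUMPTION: lowest common valence state per element. The authors'
# implementation may use other valence states.
VALENCE = {"C": 4, "H": 1, "N": 3, "O": 2, "P": 3, "S": 2}

# ASSUMPTION: illustrative element/carbon ratio limits, not taken
# from the abstract.
RATIO_LIMITS = {
    "H": (0.2, 3.1),
    "N": (0.0, 1.3),
    "O": (0.0, 1.2),
    "P": (0.0, 0.3),
    "S": (0.0, 0.8),
}


def parse_formula(formula):
    """Parse a flat formula such as 'C6H12O6' into {element: count}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def passes_senior(counts):
    """Check the three SENIOR conditions for a neutral molecule."""
    valences = [VALENCE[el] for el, n in counts.items() for _ in range(n)]
    total = sum(valences)
    atoms = len(valences)
    return (
        total % 2 == 0                # sum of valences is even
        and total >= 2 * max(valences)  # at least twice the largest valence
        and total >= 2 * (atoms - 1)    # enough bonds to connect all atoms
    )


def passes_ratios(counts, limits=RATIO_LIMITS):
    """Check each element/carbon ratio against its (low, high) limit."""
    carbon = counts.get("C", 0)
    if carbon == 0:
        return False  # ratios are undefined without carbon; rejected in this sketch
    return all(
        low <= counts.get(el, 0) / carbon <= high
        for el, (low, high) in limits.items()
    )


candidates = ["C6H12O6", "CH6", "C20H2"]
kept = [
    f for f in candidates
    if passes_senior(parse_formula(f)) and passes_ratios(parse_formula(f))
]
print(kept)
```

Keeping the limits in a dictionary makes it easy to swap in stricter or more permissive ranges without touching the checks themselves.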

Figures

Figure 1
Isotopic pattern of 45,000 compound formulas from the Wiley mass spectral database and 60,000 peptide formulas in the small molecule space < 1000 Dalton. M+1 and M+2 are given as relative abundances in [%] and are normalized to 100% of the highest isotope abundance in the molecular formula.
Figure 2
Hydrogen/Carbon ratio (H/C) for 42,000 diverse molecules (containing C, H, N, S, O, P, F, Cl, Br, I, Si) taken from the Wiley mass spectral library.
Figure 3
Frequency distribution for the molecular masses of all elemental compositions downloaded from the PubChem database (2006) covering more than 5 million single compounds.
Figure 4
Frequency distribution for 1,200 randomly selected molecules downloaded from the Dictionary of Natural Products at < 2000 Da and comprising C, H, N, S, O, P, F, Cl and Br. Left panel, 4a: mass distribution. Middle panel, 4b: simulated measured masses at 3 ppm mass accuracy. Right panel, 4c: simulated measured isotope ratios at ± 5% accuracy.
Figure 5
Mass dependence of calculated, chemically possible formulas derived from 1,200 randomly selected DNP molecules, with simulated 3 ppm mass accuracy and ± 5% isotope ratio measurement errors imposed. Red graph: number of calculated formulas with common molecular generators. Green graph: number of formulas constrained by the seven rules. Outliers around 600 Dalton were found to be halogen-containing compounds.
Figure 6
Effect of ranking the output formulas of the 2,400 randomly selected DrugBank molecules, with simulated ± 3 ppm mass accuracy and ± 5% isotope ratio measurement errors imposed. Mass dependence is shown for no database query (red graph, correct formula found in the top three hits), PubChem database query (blue graph, correct formula ranked top) or querying the DrugBank database (green graph, correct formula ranked top).
Figure 7
Relative isotopic abundances of the M+1 and M+2 peak for all elemental compositions that would fit a measured mass of 774.94831 Da (Cangrelor), determined at 1 ppm mass accuracy (values exceeding 100% are removed in the graphic). Most formulas can be discarded if isotope ratios are measured with an accuracy of ± 5% and used as a search constraint (red box).
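
The two constraints used in these captions, a ppm mass window and an isotope ratio tolerance, can be sketched in a few lines of Python. This is an illustration, not the authors' software. The function names are hypothetical, and the heavy/light isotope ratios are approximate natural-abundance values supplied here as assumptions. The M+1 estimate is a first-order approximation that treats the monoisotopic peak as the 100% reference, which holds only for smaller molecules. The tolerance is read as an absolute deviation in percentage points, following the abstract's "5% absolute isotope ratio deviation".

```python
# ASSUMPTION: approximate heavy/light natural-abundance ratios that
# contribute to the M+1 peak (13C/12C, 2H/1H, 15N/14N, 17O/16O, 33S/32S).
# Phosphorus is monoisotopic and contributes nothing.
M1_RATIO = {
    "C": 0.0108,
    "H": 0.000115,
    "N": 0.00365,
    "O": 0.00038,
    "S": 0.0079,
    "P": 0.0,
}


def mass_window(measured_mass, ppm):
    """Return the (low, high) mass range for a given ppm accuracy."""
    delta = measured_mass * ppm / 1e6
    return measured_mass - delta, measured_mass + delta


def m1_percent(counts):
    """First-order estimate of the M+1 abundance in % of the M peak."""
    return 100.0 * sum(n * M1_RATIO[el] for el, n in counts.items())


def isotope_match(predicted, measured, tolerance=5.0):
    """True if two abundances (in %) differ by at most `tolerance` points."""
    return abs(predicted - measured) <= tolerance


# Mass window for the measured mass quoted in the Figure 7 caption.
low, high = mass_window(774.94831, ppm=1.0)
print(f"search window: {low:.5f} to {high:.5f} Da")

# M+1 estimate for a hypothetical candidate formula, C6H12O6.
predicted = m1_percent({"C": 6, "H": 12, "O": 6})
print(f"predicted M+1: {predicted:.2f} %")
print(isotope_match(predicted, measured=9.0))
```

At 1 ppm the window around 774.94831 Da is under 0.001 Da on either side, yet the caption indicates that many elemental compositions still fit. This is why the isotope ratio check is applied as a second, independent constraint.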
