Anal Chem. 2019 Apr 2;91(7):4346-4356.
doi: 10.1021/acs.analchem.8b04567. Epub 2019 Mar 6.

ISiCLE: A Quantum Chemistry Pipeline for Establishing in Silico Collision Cross Section Libraries


Sean M Colby et al. Anal Chem.

Abstract

High-throughput, comprehensive, and confident identifications of metabolites and other chemicals in biological and environmental samples will revolutionize our understanding of the role these chemically diverse molecules play in biological systems. Despite recent technological advances, metabolomics studies still result in the detection of a disproportionate number of features that cannot be confidently assigned to a chemical structure. This inadequacy is driven by the single most significant limitation in metabolomics, the reliance on reference libraries constructed by analysis of authentic reference materials with limited commercial availability. To this end, we have developed the in silico chemical library engine (ISiCLE), a high-performance computing-friendly cheminformatics workflow for generating libraries of chemical properties. In the instantiation described here, we predict probable three-dimensional molecular conformers (i.e., conformational isomers) using chemical identifiers as input, from which collision cross sections (CCS) are derived. The approach employs first-principles simulation, distinguished by the use of molecular dynamics, quantum chemistry, and ion mobility calculations, to generate structures and chemical property libraries, all without training data. Importantly, optimization of ISiCLE included a refactoring of the popular MOBCAL code for trajectory-based mobility calculations, improving its computational efficiency by over 2 orders of magnitude. Calculated CCS values were validated against 1983 experimentally measured CCS values and compared to previously reported CCS calculation approaches. Average calculated CCS error for the validation set is 3.2% using standard parameters, outperforming other density functional theory (DFT)-based methods and machine learning methods (e.g., MetCCS). 
An online database is introduced for sharing both calculated and experimental CCS values (metabolomics.pnnl.gov), initially including a CCS library with over 1 million entries. Finally, three successful applications of molecule characterization using calculated CCS are described, including providing evidence for the presence of an environmental degradation product, the separation of molecular isomers, and an initial characterization of complex blinded mixtures of exposure chemicals. This work represents a method to address the limitations of small molecule identification and offers an alternative to generating chemical identification libraries experimentally by analyzing authentic reference materials. All code is available at github.com/pnnl.
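
The average calculated CCS error quoted in the abstract can be illustrated with a short sketch. The abstract does not say whether the error is signed or absolute, so the absolute percent error used here is an assumption; the function name and the example values are hypothetical and are not taken from the validation set.

```python
def mean_absolute_percent_error(calculated, measured):
    """Average of |calculated - measured| / measured * 100 over paired values."""
    if len(calculated) != len(measured) or not calculated:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [abs(c - m) / m * 100.0 for c, m in zip(calculated, measured)]
    return sum(errors) / len(errors)


# Hypothetical CCS values in square angstroms, for illustration only.
calc = [150.0, 204.0, 97.0]
meas = [150.0, 200.0, 100.0]
mape = mean_absolute_percent_error(calc, meas)
print(f"mean absolute percent error: {mape:.2f}%")
```

Each measured value acts as the reference for its own pair, so large and small ions contribute on the same relative scale.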


Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1.
Validation set property distribution and chemical space coverage. (a) Superclass distribution of compounds, as determined by ClassyFire. (b) Mass distribution with mass labels corresponding to [X-200, X]. (c) Adduct distribution. (d, e) Comparison of the validation set to the Human Metabolome Database (HMDB), with black points corresponding to compounds found in the validation set, and gray points corresponding to compounds found in the HMDB (v4.1, only those with masses 50–1200). (d) Distribution of predicted properties, with the ring bond percentage (number of bonds in rings divided by the total number of bonds), log P, pKa, Balaban index, and Harary index calculated using cxcalc. (e) Independent component analysis performed on the properties plotted in (d), with properties normalized to have a mean of 0 and standard deviation of 1.
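
Two quantities in the Figure 1 caption are simple enough to sketch: the ring bond percentage and the normalization to a mean of 0 and standard deviation of 1. The function names below are hypothetical, and the use of the population standard deviation is an assumption, since the caption does not say which estimator was used.

```python
import statistics


def ring_bond_percentage(ring_bonds, total_bonds):
    """Number of bonds in rings divided by the total number of bonds, as a percent."""
    if total_bonds <= 0:
        raise ValueError("total_bonds must be positive")
    return 100.0 * ring_bonds / total_bonds


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (population std assumed)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        raise ValueError("values have zero variance and cannot be standardized")
    return [(v - mean) / std for v in values]


# Hypothetical property values, for illustration only.
z_scores = standardize([1.0, 2.0, 3.0, 4.0, 5.0])
```

Standardizing each property separately keeps a property with a large numeric range, such as the Harary index, from dominating a later component analysis.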
Figure 2.
Schematic overview of the ISiCLE module for CCS calculation. Major computational tasks are listed for the Standard method and, where appropriate, the associated method used. Tasks include preparation of input geometry from InChI, adduct formation, conformer generation by molecular dynamics, structure optimization by density functional theory, CCS calculation by the trajectory method, and finally, final CCS prediction by Boltzmann weighting across conformers.
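
The last step in the Figure 2 caption, Boltzmann weighting across conformers, can be sketched as follows. This is a minimal illustration and not the ISiCLE code: the function name, the use of relative energies in kcal/mol, and the default temperature of 298.15 K are all assumptions.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872041  # kcal/(mol*K)


def boltzmann_weighted_ccs(energies_kcal, ccs_values, temperature=298.15):
    """Average per-conformer CCS values using Boltzmann weights from energies."""
    if len(energies_kcal) != len(ccs_values) or not ccs_values:
        raise ValueError("need two non-empty sequences of equal length")
    # Shift by the lowest energy so the largest exponent is exp(0) = 1.
    e_min = min(energies_kcal)
    rt = GAS_CONSTANT_KCAL * temperature
    factors = [math.exp(-(e - e_min) / rt) for e in energies_kcal]
    total = sum(factors)
    return sum(f * c for f, c in zip(factors, ccs_values)) / total


# Hypothetical conformer energies (kcal/mol) and CCS values (square angstroms).
weighted = boltzmann_weighted_ccs([0.0, 0.5, 2.0], [150.0, 152.0, 158.0])
```

Subtracting the minimum energy before exponentiating leaves the weights unchanged and avoids underflow when absolute energies are large.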
Figure 3.
Calculated CCS versus m/z. Visual representation of CCS values calculated by ISiCLE Standard for the validation set, plotted against m/z by adduct ion, colored by chemical class as determined by ClassyFire.

References

    1. Feunang YD; Eisner R; Knox C; Chepelev L; Hastings J; Owen G; Fahy E; Steinbeck C; Subramanian S; Bolton E; Greiner R; Wishart DS J. Cheminf. 2016, 8, 61.
    2. Wishart DS; Feunang YD; Marcu A; Guo AC; Liang K; Vazquez-Fresno R; Sajed T; Johnson D; Li C; Karu N; Sayeeda Z; Lo E; Assempour N; Berjanskii M; Singhal S; Arndt D; Liang Y; Badran H; Grant J; Serra-Cayuela A; Liu Y; Mandal R; Neveu V; Pon A; Knox C; Wilson M; Manach C; Scalbert A Nucleic Acids Res. 2018, 46 (D1), D608–D617.
    3. cxcalc, 16.11; ChemAxon, 2017.
    4. Dobson CM Nature 2004, 432 (7019), 824–828.
    5. Rosenberg RN; Robinson AB; Partridge D Clin. Biochem. 1975, 8 (6), 365–368.
