GigaByte. 2025 Jul 4:2025:gigabyte160.
doi: 10.46471/gigabyte.160. eCollection 2025.

Galaxy QCxMS for straightforward semi-empirical quantum mechanical EI-MS prediction

Wudmir Y Rojas et al. GigaByte.

Abstract

High-performance computing (HPC) environments are crucial for computational research, including quantum chemistry (QC), but pose challenges for non-expert users. Researchers with limited computational knowledge struggle to utilise domain-specific software and access mass spectra prediction for in silico annotation. Here, we provide a robust workflow that leverages interoperable file formats for molecular structures to ensure integration across various QC tools. The quantum chemistry package for mass spectral predictions after electron ionization or collision-induced dissociation has been integrated into the Galaxy platform, enabling automated analysis of fragmentation mechanisms. The extended tight binding quantum chemistry package, chosen for its balance between accuracy and computational efficiency, provides molecular geometry optimisation. A Docker image encapsulates the necessary software stack. We demonstrated the workflow for four molecules, highlighting the scalability and efficiency of our solution via runtime performance analysis. This work shows how non-HPC users can make these predictions effortlessly, using advanced computational tools without needing in-depth expertise.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Figure 1.
Galaxy workflow diagram for EI-MS prediction. The workflow begins with simplified molecular input line entry system (SMILES) strings as input, converts them to structure data file (SDF) format, and handles files with multiple structures. The SDF file is first processed by the generate conformers tool (from the ChemicalToolbox [16]) to create multiple conformations for each molecule. Then, the molecular format conversion tool converts these into .xyz files for subsequent steps. Conformers are then optimised using the xtb molecular optimization tool to refine their molecular structure, producing optimised .xyz files. The optimised .xyz files are then fed into the qcxms neutral run tool, which initialises the simulation by generating .in, .start, and .xyz files for each trajectory. These outputs are fed to the qcxms production run tool, which executes the main simulation and generates detailed .res files. Following the simulation, the qcxms get results tool processes the .res files to generate a standardised msp file, consolidating the simulation outcomes. Throughout the workflow, the filter failed jobs tool ensures that only successful job outputs are passed on to subsequent steps, maintaining data integrity and workflow efficiency. Downstream annotation by spectral matching is enabled by the matchms Galaxy tools [19].
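The last step of the workflow described above consolidates the simulation outcomes into an msp file. As a rough illustration of what such a record can look like, the following sketch writes a peak list in a minimal NIST-style MSP text layout. The function name `format_msp_record` and the peak values are hypothetical; they are not taken from QCxMS or the Galaxy tools, and real msp files usually carry more metadata fields.

```python
from typing import Iterable, Tuple


def format_msp_record(name: str, peaks: Iterable[Tuple[float, float]]) -> str:
    """Format one spectrum as a minimal NIST-style MSP text record.

    Assumption: only the 'Name' and 'Num Peaks' fields are written,
    followed by one 'm/z intensity' pair per line.
    """
    peak_list = sorted(peaks)  # order peaks by ascending m/z
    lines = [f"Name: {name}", f"Num Peaks: {len(peak_list)}"]
    for mz, intensity in peak_list:
        lines.append(f"{mz:.4f} {intensity:.2f}")
    return "\n".join(lines) + "\n"


# Made-up peaks for illustration only (not simulation output).
record = format_msp_record("example", [(28.0313, 100.0), (27.0235, 62.5)])
print(record)
```

A plain-text record of this kind is what spectral matching tools typically read when comparing predicted and measured spectra.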
Figure 2.
Technical overview of the QCxMS Galaxy tool suite. The spectra prediction process starts with an input XYZ file, which undergoes a neutral run in QCxMS to generate method parameters, initial trajectory configurations, and initial coordinates. These files are then used in a production run to simulate various trajectories in parallel and produce .res files. The Docker container encapsulates the entire process, ensuring consistency and reproducibility. In the final stage, the results are compiled and analysed, generating a result file, result.msp.
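The input to the tool suite described above is an XYZ file. For readers unfamiliar with the format, the sketch below parses the common plain-text XYZ layout (atom count on the first line, a comment on the second, then one `element x y z` line per atom). The function name `parse_xyz` and the water coordinates are illustrative assumptions, not part of QCxMS or the Galaxy tools.

```python
from typing import List, Tuple

Atom = Tuple[str, float, float, float]


def parse_xyz(text: str) -> Tuple[str, List[Atom]]:
    """Parse a single-structure XYZ string into (comment, atoms)."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0].split()[0])  # first line: number of atoms
    comment = lines[1].strip() if len(lines) > 1 else ""  # second line: comment
    atoms: List[Atom] = []
    for line in lines[2:2 + n_atoms]:
        symbol, x, y, z = line.split()[:4]
        atoms.append((symbol, float(x), float(y), float(z)))
    if len(atoms) != n_atoms:
        raise ValueError(f"expected {n_atoms} atoms, found {len(atoms)}")
    return comment, atoms


# Approximate, illustrative coordinates in angstroms (not an optimised geometry).
example_xyz = """3
water (illustrative coordinates)
O  0.000  0.000  0.000
H  0.757  0.586  0.000
H -0.757  0.586  0.000
"""

comment, atoms = parse_xyz(example_xyz)
print(comment, len(atoms))
```

Checking the declared atom count against the number of coordinate lines catches truncated files before they reach a long-running simulation step.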
Figure 3.
Computational performance metrics for molecular simulations using the QCxMS workflow for ethylene (6 atoms), mirex (22 atoms), benzophenone (24 atoms), and enilconazole (33 atoms). The runtime (orange line, in hours) and memory allocated (black line, in terabytes) demonstrate an overall increase in computational demands as the number of atoms increases. Notably, mirex exhibits a higher runtime compared to benzophenone despite having fewer atoms, indicating variations in computational complexity due to its molecular structure.

References

    1. Aksenov AA, Da Silva R, Knight R et al. Global chemical analysis of biology by mass spectrometry. Nat. Rev. Chem., 2017; 1: 0054. doi: 10.1038/s41570-017-0054.
    2. David A, Chaker J, Price EJ et al. Towards a comprehensive characterisation of the human internal chemical exposome: challenges and perspectives. Environ. Int., 2021; 156: 106630. doi: 10.1016/j.envint.2021.106630.
    3. Chao A, Al-Ghoul H, McEachran AD et al. In silico MS/MS spectra for identifying unknowns: a critical examination using CFM-ID algorithms and ENTACT mixture samples. Anal. Bioanal. Chem., 2020; 412: 1303–1315. doi: 10.1007/s00216-019-02351-7.
    4. Bremer PL, Vaniya A, Kind T et al. How well can we predict mass spectra from structures? Benchmarking competitive fragmentation modeling for metabolite identification on untrained tandem mass spectra. J. Chem. Inf. Model., 2022; 62: 4049–4056. doi: 10.1021/acs.jcim.2c00936.
    5. Grimme S. Towards first principles calculation of electron impact mass spectra of molecules. Angew. Chem. Int. Ed., 2013; 52(24): 6306–6312. doi: 10.1002/anie.201300158.
