Anal Chem. 2025 May 20;97(19):10282-10288.
doi: 10.1021/acs.analchem.5c00286. Epub 2025 May 7.

Mapping the Edges of Mass Spectral Prediction: Evaluation of Machine Learning EIMS Prediction for Xeno Amino Acids

Sean M Brown et al. Anal Chem.

Abstract

Mass spectrometry is one of the most effective analytical methods for identifying unknown compounds. By comparing observed m/z spectra with a database of experimentally determined spectra, this process identifies the compound(s) in a given sample. Identification of unknowns is therefore limited to whatever has already been determined experimentally. To reduce this reliance on experimentally determined signatures, multiple state-of-the-art MS spectral prediction algorithms have been developed within the past half decade. Here we evaluate the accuracy of the NEIMS spectral prediction algorithm. We focus our analyses on monosubstituted α-amino acids, given their significance as targets for astrobiology, synthetic biology, and diverse biomedical applications. Our general intent is to inform those using generated spectra to detect unknown biomolecules. We find that predicted spectra are inaccurate for amino acids beyond the algorithm's training data. Interestingly, these inaccuracies are not explained by physicochemical differences or by the derivatization state of the amino acids measured. We thus highlight the need both to improve current machine-learning-based approaches and to further optimize ab initio spectral prediction algorithms, so as to expand databases to structures beyond what is currently experimentally possible, even including theoretical molecules.
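The library-matching step described in the abstract can be illustrated with a minimal sketch. The sketch rests on several assumptions. Spectra are nominal-mass stick spectra stored as m/z-to-intensity dictionaries. Entries are ranked by plain cosine similarity. The function names (`cosine_similarity`, `search_library`), the toy spectra, and the scoring choice are all hypothetical. This is not the search procedure used in this study or by the NIST or MoNA software.

```python
import math
from typing import Dict, List, Tuple

# Nominal-mass stick spectrum: integer m/z -> relative intensity (assumed format).
Spectrum = Dict[int, float]


def cosine_similarity(a: Spectrum, b: Spectrum) -> float:
    """Unweighted cosine similarity between two stick spectra (0 = no shared peaks, 1 = identical shape)."""
    dot = sum(a[mz] * b[mz] for mz in set(a) & set(b))
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_library(query: Spectrum, library: Dict[str, Spectrum], top_k: int = 3) -> List[Tuple[str, float]]:
    """Score every library entry against the query and return the top_k hits, best first."""
    scored = [(name, cosine_similarity(query, ref)) for name, ref in library.items()]
    scored.sort(key=lambda hit: hit[1], reverse=True)
    return scored[:top_k]


# Toy, made-up spectra for illustration only (not real reference data).
library: Dict[str, Spectrum] = {
    "compound_A": {30: 100.0, 44: 20.0, 74: 5.0},
    "compound_B": {44: 100.0, 88: 40.0},
    "compound_C": {30: 10.0, 57: 100.0},
}
query: Spectrum = {30: 90.0, 44: 25.0, 74: 4.0}

if __name__ == "__main__":
    for name, score in search_library(query, library):
        print(f"{name}: {score:.3f}")
```

`search_library` always returns some best match, even when the true compound is absent from the library. This is the limitation that motivates using predicted spectra to extend libraries.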

Figures

Figure 1
Accuracy of NEIMS-predicted EI-MS spectra. Accuracy measurements span three spectral libraries of amino acids: (i) NIST-MS 2017, (ii) MoNA, and (iii) a hand-curated set of amino acids (IOCB). Accuracy of predicted spectra is measured as (A) spectral root-mean-square error (RMSE), (B) spectral contrast angle (SCA), (C) weighted cosine similarity (WCS), and (D) spectral entropy similarity (SEN). For SCA, angles below ∼26° (shown in green) are typically adequate for library search algorithms. For WCS, the metric used for training within the NEIMS algorithm, scores greater than 0.7 (shown in green) are generally classified as “similar”. For SEN, scores of 0.75 or greater are considered ideal, as they correspond to “false discovery rates of less than 10%”.
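The four accuracy metrics named in Figure 1 can be sketched as follows. This is a minimal illustration under several assumptions, and the study's exact normalization and weighting may differ:

- Spectra are nominal-mass dictionaries.
- RMSE is computed after scaling each spectrum to a base peak of 1.
- SCA is the arccosine of the unweighted cosine similarity.
- The WCS exponents (`mz_power=1.0`, `intensity_power=0.5`) are hypothetical defaults, not NEIMS's documented settings.
- SEN is the unweighted entropy similarity, without the low-entropy intensity re-weighting some implementations apply.

```python
import math
from typing import Dict, List, Tuple

Spectrum = Dict[int, float]  # nominal m/z -> intensity (assumed format)


def _aligned(a: Spectrum, b: Spectrum) -> Tuple[List[int], List[float], List[float]]:
    """Put both spectra on the union of their m/z values, filling missing peaks with 0."""
    mzs = sorted(set(a) | set(b))
    return mzs, [a.get(mz, 0.0) for mz in mzs], [b.get(mz, 0.0) for mz in mzs]


def _cosine(u: List[float], v: List[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu > 0 and nv > 0 else 0.0


def rmse(pred: Spectrum, obs: Spectrum) -> float:
    """Root-mean-square error after scaling each spectrum to a base peak of 1 (assumed normalization)."""
    mzs, p, o = _aligned(pred, obs)
    if not mzs:
        return 0.0
    p_max, o_max = max(p), max(o)
    p = [x / p_max for x in p] if p_max > 0 else p
    o = [y / o_max for y in o] if o_max > 0 else o
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, o)) / len(mzs))


def spectral_contrast_angle(pred: Spectrum, obs: Spectrum) -> float:
    """Angle in degrees between intensity vectors: 0 = identical shape, 90 = no shared peaks."""
    _, p, o = _aligned(pred, obs)
    cos = min(1.0, max(-1.0, _cosine(p, o)))  # clamp floating-point overshoot before acos
    return math.degrees(math.acos(cos))


def weighted_cosine(pred: Spectrum, obs: Spectrum,
                    mz_power: float = 1.0, intensity_power: float = 0.5) -> float:
    """Cosine similarity of (m/z ** mz_power) * (intensity ** intensity_power); exponents are assumptions."""
    mzs, p, o = _aligned(pred, obs)
    wp = [(mz ** mz_power) * (x ** intensity_power) for mz, x in zip(mzs, p)]
    wo = [(mz ** mz_power) * (y ** intensity_power) for mz, y in zip(mzs, o)]
    return _cosine(wp, wo)


def _entropy(probs: List[float]) -> float:
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(q * math.log(q) for q in probs if q > 0)


def entropy_similarity(pred: Spectrum, obs: Spectrum) -> float:
    """Unweighted spectral entropy similarity: 1 - (2*S_AB - S_A - S_B) / ln 4."""
    _, p, o = _aligned(pred, obs)
    p_sum, o_sum = sum(p), sum(o)
    if p_sum == 0 or o_sum == 0:
        return 0.0
    p = [x / p_sum for x in p]
    o = [y / o_sum for y in o]
    merged = [(x + y) / 2 for x, y in zip(p, o)]  # 50:50 mixture of the two spectra
    return 1 - (2 * _entropy(merged) - _entropy(p) - _entropy(o)) / math.log(4)
```

Lower values are better for RMSE and SCA. Higher values are better for WCS and SEN.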
Figure 2
Accuracy across chemical space. Prediction performance is evaluated across three libraries (Columns: NIST17, MoNA, IOCB) with four accuracy metrics (Rows: RMSE, WCS, SCA, SEN). These are plotted as a function of chemical space (Molecular Weight, JChem log P). Accuracy in each plot is color coded from purple (most accurate) to yellow (least accurate) for each respective metric. The physicochemical properties we consider reveal no clear clustering patterns.
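Figure 2 examines whether accuracy varies with molecular weight or log P by plotting the two against each other. One way to put a number on such a relationship is Spearman's rank correlation between an accuracy metric and a physicochemical property. This is a swapped-in check for illustration, not the analysis reported here, and the helper names (`average_ranks`, `pearson`, `spearman`) are hypothetical.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the first one in the run.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return math.nan
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: the Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    return pearson(average_ranks(x), average_ranks(y))
```

For example, `spearman(molecular_weights, rmse_values)` could be computed over the amino acids of one library, where both lists are hypothetical inputs. A coefficient near zero is consistent with no monotonic trend, but it does not rule out non-monotonic structure that a scatter plot can reveal.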
Figure 3
Accuracy measured by RMSE, SCA, WCS, and SEN for underivatized and MTBSTFA-derivatized amino acids. Accuracy metrics are compared between MTBSTFA-derivatized (Deriv) and underivatized (Free) amino acids for the NIST17 and IOCB libraries. (A) For amino acid spectra in the NIST17 library, accuracy measured via RMSE differs significantly (t-test p-value < 0.001) between MTBSTFA-derivatized (Deriv) and underivatized (Free) spectra. All other comparisons (B–H) are not significantly different (t-test p-value > 0.05). The MoNA database was omitted from this analysis because it contained no unique spectra of MTBSTFA-derivatized monosubstituted α-amino acids.
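The caption reports a t-test without naming the variant. The sketch below assumes Welch's unequal-variance t-test. It computes the t statistic and the Welch–Satterthwaite degrees of freedom using only the standard library. A p-value can then be read from the t distribution, for example with `scipy.stats.ttest_ind(a, b, equal_var=False)`. The function name `welch_t` is hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two independent samples."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("each group needs at least two values")
    # Squared standard error of each group mean (statistics.variance uses the n - 1 denominator).
    se2_a = variance(a) / len(a)
    se2_b = variance(b) / len(b)
    if se2_a + se2_b == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1))
    return t, df
```

Here `a` and `b` would hold per-spectrum values of one metric (for example, RMSE) for the Deriv and Free groups of a single library.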
