2025 May 23.
doi: 10.1038/s41587-025-02663-3. Online ahead of print.

Self-supervised learning of molecular representations from millions of tandem mass spectra using DreaMS

Roman Bushuiev et al. Nat Biotechnol.

Abstract

Characterizing biological and environmental samples at the molecular level relies primarily on tandem mass spectrometry (MS/MS), yet interpreting tandem mass spectra from untargeted metabolomics experiments remains a challenge. Existing computational methods for making predictions from mass spectra rely on limited spectral libraries and on hard-coded human expertise. Here we introduce a transformer-based neural network pre-trained in a self-supervised way on millions of unannotated tandem mass spectra from our GNPS Experimental Mass Spectra (GeMS) dataset, mined from the MassIVE GNPS repository. We show that pre-training our model to predict masked spectral peaks and chromatographic retention orders leads to the emergence of rich representations of molecular structures, which we name Deep Representations Empowering the Annotation of Mass Spectra (DreaMS). Further fine-tuning the neural network yields state-of-the-art performance across a variety of tasks. We make our new dataset and model available to the community and release the DreaMS Atlas, a molecular network of 201 million MS/MS spectra constructed using DreaMS annotations.
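
The masked-peak objective mentioned in the abstract can be illustrated with a minimal sketch of how one spectrum might be split into a model input and prediction targets. This is not the authors' implementation: the function name `mask_peaks`, the 30% mask fraction, the uniform random sampling, and the example peak values are all assumptions made for illustration, and the abstract does not specify any of them.

```python
import random


def mask_peaks(peaks, mask_fraction=0.3, seed=0):
    """Hide a random subset of peaks so a model can be trained to predict them.

    peaks: list of (mz, intensity) tuples for one MS/MS spectrum.
    mask_fraction: share of peaks to hide (illustrative assumption).
    Returns (masked_input, targets), where hidden peaks are replaced by None
    in masked_input and targets maps each hidden position to its original peak.
    """
    if not 0.0 < mask_fraction < 1.0:
        raise ValueError("mask_fraction must be strictly between 0 and 1")
    if not peaks:
        return [], {}
    rng = random.Random(seed)
    # Hide at least one peak, but never more peaks than the spectrum has.
    n_masked = min(len(peaks), max(1, round(len(peaks) * mask_fraction)))
    masked_positions = set(rng.sample(range(len(peaks)), n_masked))
    masked_input = [
        None if i in masked_positions else peak for i, peak in enumerate(peaks)
    ]
    targets = {i: peaks[i] for i in masked_positions}
    return masked_input, targets


# Hypothetical spectrum: (m/z, relative intensity) pairs, made up for the example.
spectrum = [(85.03, 0.12), (127.04, 0.55), (145.05, 1.00), (163.06, 0.31), (181.07, 0.08)]
masked_input, targets = mask_peaks(spectrum)
```

In a self-supervised setup of this kind, a network would receive `masked_input` and be scored on how well it recovers the entries in `targets`, so no structural annotation of the spectrum is needed.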

Conflict of interest statement

Ethics and inclusion statement: All co-authors of this publication meet the authorship criteria outlined by Nature Portfolio journals, as detailed in the ‘Author contributions’. The authors have complied with the inclusion and ethics guidelines of the Nature Portfolio journals. Competing interests: T.P. is a co-founder of mzio GmbH, which develops technologies related to mass spectrometry data processing. The other authors declare no competing interests.
