
Language model-guided anticipation and discovery of unknown metabolites

Hantao Qiang et al. bioRxiv.

Abstract

Despite decades of study, large parts of the mammalian metabolome remain unexplored. Mass spectrometry-based metabolomics routinely detects thousands of small molecule-associated peaks within human tissues and biofluids, but typically only a small fraction of these can be identified, and structure elucidation of novel metabolites remains a low-throughput endeavor. Biochemical large language models have transformed the interpretation of DNA, RNA, and protein sequences, but have not yet had a comparable impact on understanding small molecule metabolism. Here, we present an approach that leverages chemical language models to discover previously uncharacterized metabolites. We introduce DeepMet, a chemical language model that learns the latent biosynthetic logic embedded within the structures of known metabolites and exploits this understanding to anticipate the existence of as-yet-undiscovered metabolites. Prospective chemical synthesis of metabolites predicted to exist by DeepMet directs their targeted discovery. Integrating DeepMet with tandem mass spectrometry (MS/MS) data enables automated metabolite discovery within complex tissues. We harness DeepMet to discover several dozen structurally diverse mammalian metabolites. Our work demonstrates the potential for language models to accelerate the mapping of the metabolome.
