Review
J Cheminform. 2024 Oct 7;16(1):113.
doi: 10.1186/s13321-024-00905-1.

Insights into predicting small molecule retention times in liquid chromatography using deep learning

Yuting Liu et al. J Cheminform.

Abstract

In untargeted metabolomics, small-molecule structures are annotated by liquid chromatography-mass spectrometry. This relies on the molecular retention time (RT) from the chromatogram and the m/z (formerly called "mass-to-charge ratio") from the mass spectrum. However, correctly identifying metabolites is challenging because of the vast number of possible small molecules. Various in silico tools for mass spectrometry peak alignment and compound prediction have therefore been developed, yet the resulting lists of candidate compounds remain long. Accurate RT prediction helps exclude false candidates and thus facilitates metabolite annotation. Recent advances in artificial intelligence (AI) have led to significant breakthroughs in applying deep learning models across many fields. The release of a large RT dataset has eased the data bottleneck that limited deep learning models, expanding their application to RT prediction tasks. This review lists the databases that can be used to expand training datasets and addresses inconsistencies in molecular representation across datasets. It also discusses the application of AI technology to RT prediction, particularly in the five years following the release of the METLIN small molecule RT (SMRT) dataset. This review provides a comprehensive overview of the AI applications used for RT prediction, highlighting both the progress made and the remaining challenges.

SCIENTIFIC CONTRIBUTION: This article focuses on advances in small molecule retention time prediction in computational metabolomics over the past five years, with particular emphasis on the application of AI technologies in this field. It reviews the publicly available datasets for small molecule retention time, the molecular representation methods, and the AI algorithms applied in recent studies. Furthermore, it discusses how effectively these models assist in annotating small molecule structures and the challenges that must be addressed before they can be used in practice.
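The candidate-exclusion step mentioned above can be illustrated with a minimal Python sketch. It assumes each candidate structure already has a predicted RT from some model. It keeps only candidates whose prediction falls within a fixed tolerance of the observed RT, ranked by RT error. The `Candidate` class, the `filter_candidates` function, the toy values and the fixed tolerance are all hypothetical. In practice the acceptance window would more likely be derived from the error distribution of the specific RT model and chromatographic method.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str            # hypothetical candidate identifier
    predicted_rt: float  # predicted retention time in minutes, from any RT model


def filter_candidates(candidates, observed_rt, tolerance):
    """Keep candidates whose predicted RT lies within +/- tolerance of the observed RT.

    Survivors are returned sorted by absolute RT error, smallest first, so the
    most RT-consistent structures are listed at the top.
    """
    kept = [c for c in candidates if abs(c.predicted_rt - observed_rt) <= tolerance]
    return sorted(kept, key=lambda c: abs(c.predicted_rt - observed_rt))


if __name__ == "__main__":
    # Toy candidate list for one feature observed at 5.4 min (illustrative values only).
    candidates = [
        Candidate("A", 5.2),
        Candidate("B", 7.9),
        Candidate("C", 5.7),
        Candidate("D", 3.1),
    ]
    for c in filter_candidates(candidates, observed_rt=5.4, tolerance=0.5):
        print(c.name, c.predicted_rt)  # prints "A 5.2", then "C 5.7"
```

RT filtering only narrows the list. Candidates that survive still need confirmation from the MS/MS evidence or reference standards.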

Keywords: Deep learning; Liquid chromatography; MassBank; PredRet; QSRR; RepoRT; Retention time prediction; SMRT; Small molecules; Untargeted metabolomics.


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1
Workflow used in this review to evaluate retention time records from the SMRT, MassBank (release version Nov 2023), MassBank of North America (MoNA; accessed 11 Nov 2023), and PredRet (accessed 11 Nov 2023) databases
Fig. 2
Overview of the liquid chromatography retention time records obtained from the SMRT, MassBank, MoNA, and PredRet databases. (A) Proportions of unique compounds measured by liquid chromatography in each superclass across the datasets. Compound classes were assigned at the superclass taxonomy level with ClassyFire Batch [39] by querying the International Chemical Identifier key (InChIKey). (B) Numbers of compounds shared across the four datasets. Duplicate compounds were removed within each dataset
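Overlap counts like those in panel B can be approximated by first deduplicating each dataset to a set of InChIKeys and then intersecting those sets. The sketch below is an illustration under stated assumptions. Records are assumed to be dictionaries with an `inchikey` field, and the keys `K1` to `K4` are placeholders rather than real compounds. The function names are hypothetical. The sketch reports pairwise overlaps plus the count shared by all datasets, which is not necessarily how the figure partitions its intersections.

```python
from itertools import combinations


def unique_keys(records):
    # Deduplicate one dataset: repeated compounds collapse to a single InChIKey.
    return {rec["inchikey"] for rec in records}


def overlap_counts(datasets):
    """Return (pairwise shared-compound counts, count shared by every dataset)."""
    keys = {name: unique_keys(recs) for name, recs in datasets.items()}
    pairwise = {
        (a, b): len(keys[a] & keys[b]) for a, b in combinations(sorted(keys), 2)
    }
    shared_by_all = set.intersection(*keys.values()) if keys else set()
    return pairwise, len(shared_by_all)


if __name__ == "__main__":
    # Placeholder keys K1..K4 stand in for real InChIKeys.
    datasets = {
        "SMRT": [{"inchikey": "K1"}, {"inchikey": "K2"}, {"inchikey": "K2"}],
        "MassBank": [{"inchikey": "K2"}, {"inchikey": "K3"}],
        "PredRet": [{"inchikey": "K1"}, {"inchikey": "K2"}, {"inchikey": "K4"}],
    }
    pairwise, n_all = overlap_counts(datasets)
    print(pairwise, n_all)
```

Matching on full InChIKeys treats stereoisomers as different compounds. Matching on the first key block instead would merge them, and the counts can change accordingly.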
Fig. 3
Examples of four discrepancy types. (A) These cases can be corrected by searching for the PubChem identifier. (B) Stereoisomers in the SMRT dataset have different RTs but are represented by the same InChI and the same structural information in the SDF file. They can be distinguished by searching the PubChem identifier, at the researcher's discretion. (C) PredRet strips stereo information for its projection methods, so the structure does not always match the reported PubChem entry; how to handle these cases is also at the researcher's discretion. (D) Some entries within individual PredRet datasets may refer to different molecular objects and should be carefully verified before use
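Discrepancies like those in panels B and C can be screened for by grouping records on an identifier and flagging groups whose RTs disagree. Grouping on the full InChIKey catches records that share one identifier yet report different RTs, as with the SMRT stereoisomers in B. Grouping on only the first block of the InChIKey mimics stereo-stripped structures, as in C, and shows which stereoisomers would collapse together. That first block is the 14-character hash of the connectivity layer; stereochemistry is encoded in the second block. This is a hypothetical helper with placeholder keys and an assumed 0.1 min threshold, not a procedure taken from the review. Flagged groups would still need manual checking, for example against PubChem.

```python
from collections import defaultdict


def rt_conflicts(records, use_skeleton=False, min_rt_gap=0.1):
    """Group RT records by InChIKey and return groups whose RTs disagree.

    use_skeleton=False groups on the full InChIKey.
    use_skeleton=True groups on the first block only (the 14-character hash of the
    connectivity layer), so stereoisomers fall into the same group.
    A group is reported when its RT range exceeds min_rt_gap (assumed minutes).
    """
    groups = defaultdict(list)
    for rec in records:
        key = rec["inchikey"]
        if use_skeleton:
            key = key.split("-")[0]
        groups[key].append(rec["rt"])
    return {
        key: sorted(rts)
        for key, rts in groups.items()
        if max(rts) - min(rts) > min_rt_gap
    }


if __name__ == "__main__":
    # Placeholder InChIKey-shaped strings, not real compounds.
    records = [
        {"inchikey": "AAAAAAAAAAAAAA-BBBBBBBBSA-N", "rt": 4.1},  # stereoisomer 1
        {"inchikey": "AAAAAAAAAAAAAA-CCCCCCCCSA-N", "rt": 4.6},  # stereoisomer 2
        {"inchikey": "EEEEEEEEEEEEEE-FFFFFFFFSA-N", "rt": 6.0},  # same key...
        {"inchikey": "EEEEEEEEEEEEEE-FFFFFFFFSA-N", "rt": 6.8},  # ...different RT
        {"inchikey": "DDDDDDDDDDDDDD-UHFFFAOYSA-N", "rt": 2.0},
        {"inchikey": "DDDDDDDDDDDDDD-UHFFFAOYSA-N", "rt": 2.0},
    ]
    print(rt_conflicts(records))                     # same identifier, different RTs
    print(rt_conflicts(records, use_skeleton=True))  # also merges stereoisomers
```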
Fig. 4
Molecular representations used in recent RT prediction models. MDC-ANN [36], RT-transformer [80], qGeoGNN [25], retention_time_GNN [37], 1D-CNN-TL [42], MPNN [70], AWD-LSTM-TL [56], GNN-TL-HILIC [46], GNN-TL-RP [45], RGCN [44], DNNpwa-TL [35], Osipenko [81], Retip [73], Bouwmeester [34], DLM [23], Hall [82], Wen [83], Wen [84], McEachran [85], Falchi [86], and Amos et al. [40]
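String-based inputs such as SMILES are one family of molecular representations that sequence models like CNNs and LSTMs can consume. The sketch below illustrates this generically and is not the featurization used by any specific model listed above. It one-hot encodes SMILES characters into a fixed-size matrix. The function names, the vocabulary construction and the padding length are assumptions. So is the purely character-level split, which breaks two-letter atoms such as `Cl` into separate characters.

```python
def build_vocab(smiles_list):
    # Character-level vocabulary: each distinct character gets a column index.
    # Two-letter atoms such as "Cl" or "Br" are split into separate characters here.
    chars = sorted({ch for smiles in smiles_list for ch in smiles})
    return {ch: i for i, ch in enumerate(chars)}


def one_hot(smiles, vocab, max_len):
    """Encode a SMILES string as a max_len x len(vocab) binary matrix.

    Strings longer than max_len are truncated; shorter ones are padded with
    all-zero rows. Characters missing from vocab raise KeyError.
    """
    matrix = [[0] * len(vocab) for _ in range(max_len)]
    for pos, ch in enumerate(smiles[:max_len]):
        matrix[pos][vocab[ch]] = 1
    return matrix


if __name__ == "__main__":
    vocab = build_vocab(["CCO", "c1ccccc1"])  # ethanol and benzene
    for row in one_hot("CCO", vocab, max_len=5):
        print(row)
```

Graph-based representations (atoms as nodes, bonds as edges) and precomputed descriptors or fingerprints require a cheminformatics toolkit. They are not covered by this dependency-free sketch.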

References

    1. Jordan MI, Mitchell TM (2015) Machine learning: trends, perspectives, and prospects. Science 349(6245):255–260
    2. Dührkop K, Shen H, Meusel M, Rousu J, Böcker S (2015) Searching molecular structure databases with tandem mass spectra using CSI:FingerID. Proc Natl Acad Sci 112(41):12580–12585
    3. Dührkop K, Fleischauer M, Ludwig M, Aksenov AA, Melnik AV, Meusel M, Dorrestein PC, Rousu J, Böcker S (2019) SIRIUS 4: a rapid tool for turning tandem mass spectra into metabolite structure information. Nat Methods 16(4):299–302
    4. Wei JN, Belanger D, Adams RP, Sculley D (2019) Rapid prediction of electron–ionization mass spectrometry using neural networks. ACS Cent Sci 5(4):700–708
    5. Wang F, Liigand J, Tian S, Arndt D, Greiner R, Wishart DS (2021) CFM-ID 4.0: more accurate ESI-MS/MS spectral prediction and compound identification. Anal Chem 93(34):11692–11700
