Anal Chem. 2020 Jun 2;92(11):7515-7522.
doi: 10.1021/acs.analchem.9b05765. Epub 2020 May 21.

Retip: Retention Time Prediction for Compound Annotation in Untargeted Metabolomics

Paolo Bonini et al. Anal Chem. 2020.

Abstract

Unidentified peaks remain a major problem in untargeted metabolomics by LC-MS/MS. Confidence in peak annotations increases when MS/MS matching is combined with retention time. Here we show how retention times can be predicted from molecular structures. Two large, publicly available data sets were used to train machine learning models: the Fiehn hydrophilic interaction liquid chromatography (HILIC) data set of 981 primary metabolites and biogenic amines, and the RIKEN plant specialized metabolome annotation (PlaSMA) database of 852 secondary metabolites acquired by reversed-phase liquid chromatography (RPLC). Five machine learning algorithms for building retention time prediction models have been integrated into the Retip R package: random forest, Bayesian-regularized neural network, XGBoost, light gradient-boosting machine (LightGBM), and Keras. A complete workflow for retention time prediction was developed in R and can be freely downloaded from the GitHub repository (https://www.retip.app). Keras outperformed the other algorithms on the test set with minimal overfitting, as verified by small error differences between the training, test, and validation sets. Keras yielded a mean absolute error of 0.78 min for HILIC and 0.57 min for RPLC. Retip is integrated into the mass spectrometry software tools MS-DIAL and MS-FINDER, enabling a complete compound annotation workflow. In a test application on mouse blood plasma samples, we found a 68% reduction in the number of candidate structures when searching all isomers in the MS-FINDER compound identification software. Retention time prediction increases the identification rate in liquid chromatography and thereby improves the biological interpretation of metabolomics data.
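One way to picture how predicted retention times shrink a candidate list is as a retention time window filter: isomer candidates whose predicted RT lies too far from the observed peak's RT are discarded. The following Python sketch is a minimal illustration under stated assumptions, not the MS-FINDER implementation. It assumes a ±1 min tolerance (borrowed from the reporting window in Figure 2, not necessarily the setting used in MS-FINDER), and the `Candidate` class, the function names, and all RT values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical isomer candidate carrying a model-predicted retention time (min)."""
    name: str
    predicted_rt: float


def filter_by_retention_time(candidates, observed_rt, tolerance=1.0):
    """Keep candidates whose predicted RT is within +/- tolerance min of the observed RT.

    The default tolerance of 1.0 min is an assumption for illustration only.
    """
    return [c for c in candidates if abs(c.predicted_rt - observed_rt) <= tolerance]


def reduction_percent(n_before, n_after):
    """Percentage of candidates removed by the retention time filter."""
    if n_before == 0:
        return 0.0
    return 100.0 * (n_before - n_after) / n_before


# Hypothetical isomers sharing one molecular formula; RT values are invented.
candidates = [
    Candidate("isomer_A", 3.1),
    Candidate("isomer_B", 4.0),
    Candidate("isomer_C", 6.8),
    Candidate("isomer_D", 9.5),
]
kept = filter_by_retention_time(candidates, observed_rt=3.6, tolerance=1.0)
print("kept:", [c.name for c in kept])
print(f"reduction: {reduction_percent(len(candidates), len(kept)):.1f}%")
```

A tighter tolerance removes more candidates but risks discarding the true structure when the prediction error is large, so in practice the window would be chosen with the model's error distribution in mind.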


Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1.
Workflow for predicting LC-retention times from experimental retention time libraries.
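The caption describes the workflow only in outline: an experimental retention time library is used to train a model that then predicts RTs for new structures. The Python sketch below is a hypothetical, minimal illustration of that train/test/predict flow. It assumes each library compound already carries one precomputed numeric descriptor. Retip itself uses many molecular descriptors and the five algorithms named in the abstract; a one-descriptor ordinary least-squares fit is swapped in here purely to keep the example self-contained, and the compound IDs, descriptor values, and RTs are invented.

```python
import random


def fit_linear(xs, ys):
    """Ordinary least squares for rt = intercept + slope * descriptor.

    Stand-in model for illustration; not one of the Retip algorithms.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the library with a fixed seed and hold out a test fraction."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical RT library: (compound id, descriptor value, experimental RT in min).
library = [
    ("c1", -1.2, 1.1), ("c2", -0.4, 2.0), ("c3", 0.3, 3.2), ("c4", 1.0, 4.1),
    ("c5", 1.8, 5.3), ("c6", 2.5, 6.0), ("c7", 3.1, 7.2), ("c8", 3.9, 8.1),
]
train, test = train_test_split(library)
intercept, slope = fit_linear([d for _, d, _ in train], [rt for _, _, rt in train])

# Predict RTs for held-out compounds and compare with their experimental values.
for cid, desc, rt in test:
    predicted = intercept + slope * desc
    print(f"{cid}: experimental {rt:.2f} min, predicted {predicted:.2f} min")
```

Holding out compounds that the model never saw during fitting is what allows the training-versus-test error comparison the abstract uses to judge overfitting.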
Figure 2.
Violin plots of HILIC and RPLC prediction errors for five machine learning models. The number of independent compounds is given in parentheses. Upper panels: training data. Middle panels: test data. Lower panels: external validation data. The numbers in the violin plots show the percentage of compounds within ±1 min retention time windows, indicated by red dotted lines.
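Two summary statistics appear in this paper: the mean absolute error (MAE) quoted in the abstract and the percentage of compounds whose prediction falls within a ±1 min window, shown in Figure 2. A minimal Python sketch of both follows. The observed and predicted values are invented, and counting an error of exactly 1 min as inside the window is an assumption.

```python
def mean_absolute_error(observed, predicted):
    """Mean absolute difference between experimental and predicted RTs (min)."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def percent_within_window(observed, predicted, window=1.0):
    """Percentage of compounds whose absolute RT error is <= window minutes.

    Treating the boundary as inclusive is an assumption.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for o, p in zip(observed, predicted) if abs(o - p) <= window)
    return 100.0 * hits / len(observed)


# Invented example values (min), not data from the paper.
observed = [2.0, 4.5, 6.0, 8.2]
predicted = [2.5, 4.0, 7.5, 8.0]
print(f"MAE: {mean_absolute_error(observed, predicted):.3f} min")
print(f"within +/-1 min: {percent_within_window(observed, predicted):.1f}%")
```

The two numbers are complementary: MAE summarizes the average error, while the window percentage reflects how often a fixed retention time filter would keep the correct compound.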
