Molecules. 2022 Sep 8;27(18):5827.
doi: 10.3390/molecules27185827.

Multi-Task Neural Networks and Molecular Fingerprints to Enhance Compound Identification from LC-MS/MS Data

Viviana Consonni et al. Molecules. 2022.

Abstract

Mass spectrometry (MS) is widely used to identify chemical compounds by matching an experimentally acquired mass spectrum against a database of reference spectra. However, the limited coverage of existing databases means that a compound absent from the database cannot be identified. Among the computational approaches for mining metabolite structures from MS data, one option is to predict molecular fingerprints from the mass spectra by means of chemometric strategies and then use the predicted fingerprints to screen compound libraries. This can be done by calibrating multi-task artificial neural networks on large datasets, with mass spectra as inputs and molecular fingerprints as outputs. In this study, we prepared a large LC-MS/MS dataset from an online open repository. These data were used to train and evaluate deep-learning-based approaches that predict molecular fingerprints and retrieve the structure of unknown compounds from their LC-MS/MS spectra. We evaluated the effects of data sparseness and the impact of different data curation and dimensionality reduction strategies on output accuracy. Moreover, extensive diagnostics were carried out to assess the advantages and drawbacks of the models as a function of the explored chemical space.
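The screening step described in the abstract (predict a fingerprint, then rank library compounds by similarity to it) can be illustrated with a minimal sketch. It assumes binary fingerprints, a 0.5 threshold on per-bit network outputs, and Tanimoto similarity as the matching score; none of these choices is stated in the abstract, and the function names, probabilities, and toy library below are hypothetical.

```python
from typing import Dict, List, Tuple


def tanimoto(a: List[int], b: List[int]) -> float:
    """Tanimoto (Jaccard) similarity between two equal-length binary fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)    # bits set in both
    either = sum(1 for x, y in zip(a, b) if x or y)   # bits set in at least one
    return both / either if either else 0.0


def binarise(probabilities: List[float], threshold: float = 0.5) -> List[int]:
    """Turn per-bit network outputs into a binary fingerprint (threshold is an assumption)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def screen_library(
    predicted: List[int], library: Dict[str, List[int]], top_k: int = 3
) -> List[Tuple[str, float]]:
    """Rank library compounds by similarity to the predicted fingerprint."""
    scored = [(name, tanimoto(predicted, fp)) for name, fp in library.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical per-bit outputs of a multi-task network (one task per fingerprint bit).
predicted_probs = [0.91, 0.12, 0.78, 0.05, 0.66, 0.40, 0.08, 0.97]
predicted_fp = binarise(predicted_probs)

# Hypothetical candidate library; real fingerprints are far longer.
toy_library = {
    "candidate_A": [1, 0, 1, 0, 1, 0, 0, 1],
    "candidate_B": [1, 1, 1, 0, 0, 0, 0, 1],
    "candidate_C": [0, 1, 0, 1, 0, 1, 1, 0],
}

ranking = screen_library(predicted_fp, toy_library)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In this toy case the candidate whose fingerprint matches the prediction bit for bit ranks first. With real data, the ranking depends on how well each fingerprint bit is predicted, which is why per-bit accuracy matters for retrieval.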

Keywords: LC-MS/MS; chemometrics; classification; fingerprints; multi-task; neural networks; similarity matching.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Score plot of the first and second MDS coordinates for the molecular fingerprints. Fingerprints of the 12K and 40K datasets are coloured in blue and grey, respectively.
Figure 2
Bar plot of the effects of the ANN hyperparameters on the objective function with their 95% confidence intervals. AF: activation function; BS: batch size; DO: dropout; LR: learning rate; NTS: number of task-specific neurons; N: number of neurons; PA: patience; OT: optimisation type.
Figure 3
Violin plot of computational times required to train replicates of ANNs with 500 SPCA scores, 1000 SPCA scores and the 6596 raw MS features for the dataset 40K.
Figure 4
Score plot of the first and second MDS dimensions for the predicted fingerprints of the 40K test set; (a) fingerprints are coloured in greyscale: the higher the similarity between predicted and true fingerprint, the darker the colour; (b) predicted fingerprints of example chemicals are coloured in blue (high accuracy between predicted and experimental fingerprints) and orange (low accuracy).
Figure 5
Example chemicals with low accuracy between predicted and true fingerprints.
Figure 6
Example chemicals with high accuracy between predicted and experimental fingerprints.
Figure 7
Violin plot showing the distribution of the percentage of active bits in the fingerprints of chemicals with low and high accuracy of predicted fingerprints.
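
The two quantities named in the figure captions, the percentage of active bits in a fingerprint and the accuracy of a predicted fingerprint, can be computed as in the sketch below. It assumes that accuracy means the fraction of bits on which the predicted and true fingerprints agree, which is an illustrative definition and not necessarily the one used in the paper; the function names and example vectors are hypothetical.

```python
from typing import List


def active_bit_percentage(fingerprint: List[int]) -> float:
    """Percentage of bits set to 1 in a binary fingerprint."""
    return 100.0 * sum(fingerprint) / len(fingerprint)


def bitwise_accuracy(predicted: List[int], true: List[int]) -> float:
    """Fraction of positions where predicted and true bits agree (assumed definition)."""
    if len(predicted) != len(true):
        raise ValueError("fingerprints must have the same length")
    matches = sum(1 for p, t in zip(predicted, true) if p == t)
    return matches / len(true)


# Hypothetical sparse fingerprint and an all-zero prediction.
true_fp = [1, 0, 0, 0, 0, 0, 0, 0]
predicted_fp = [0, 0, 0, 0, 0, 0, 0, 0]

print(active_bit_percentage(true_fp))          # 12.5
print(bitwise_accuracy(predicted_fp, true_fp)) # 0.875
```

Under this assumed definition, a sparse fingerprint can score high bitwise accuracy even when its only active bit is missed, so comparing accuracy against the percentage of active bits is a useful diagnostic.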
