Deep Learning for Odor Prediction on Aroma-Chemical Blends

Laura Sisson et al. ACS Omega. 2025 Mar 3;10(9):8980-8992. doi: 10.1021/acsomega.4c07078. eCollection 2025 Mar 11.

Abstract

The application of deep-learning techniques to aroma chemicals has resulted in models that surpass human experts in predicting olfactory qualities. However, public research in this field has been limited to predicting the qualities of individual molecules, whereas in industry, perfumers and food scientists are often more concerned with blends of multiple molecules. In this paper, we apply both established and novel approaches to a data set we compiled, which consists of labeled pairs of molecules. We present graph neural network models that accurately predict the olfactory qualities emerging from blends of aroma chemicals, along with an analysis of how variations in model architecture can significantly impact predictive performance.


Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
(a,b) Data sets: the nonlinear relationship between the qualities of the constituent aroma chemicals and those of the overall blend. (c,d) Models: (c) the MPNN-GNN trained on single molecules and (d) the MPNN-GNN for the mixture model.
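To make the mixture architecture in panel (d) concrete, here is a minimal sketch (not the authors' code) of a shared molecule encoder whose embeddings for the two molecules in a blend are concatenated and passed to a multi-label descriptor head. The class names, dimensions, and the single dense-adjacency message-passing step are all illustrative assumptions.

```python
# Minimal sketch, not the paper's implementation: a shared molecule encoder
# whose embeddings for the two molecules in a blend are pooled and fed to a
# multi-label descriptor head. All names and dimensions are illustrative.
import torch
import torch.nn as nn

class TinyMPNN(nn.Module):
    """One round of message passing over a dense adjacency matrix."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.msg = nn.Linear(in_dim, hid_dim)
        self.upd = nn.Linear(in_dim + hid_dim, hid_dim)

    def forward(self, x, adj):           # x: [n_atoms, in_dim], adj: [n_atoms, n_atoms]
        m = adj @ self.msg(x)            # aggregate messages from neighbors
        h = torch.relu(self.upd(torch.cat([x, m], dim=-1)))
        return h.mean(dim=0)             # mean-pool atoms -> molecule embedding

class BlendModel(nn.Module):
    """Predict blend descriptors from a pair of molecules with a shared encoder."""
    def __init__(self, in_dim=16, hid_dim=64, n_labels=104):
        super().__init__()
        self.encoder = TinyMPNN(in_dim, hid_dim)
        self.head = nn.Linear(2 * hid_dim, n_labels)

    def forward(self, mol_a, mol_b):     # each mol is (node_features, adjacency)
        za = self.encoder(*mol_a)
        zb = self.encoder(*mol_b)
        return torch.sigmoid(self.head(torch.cat([za, zb], dim=-1)))
```

Concatenation makes the prediction depend on the order of the two molecules; a symmetric pooling such as summing or averaging the two embeddings would make the sketch order-invariant, which may better match how blends are defined.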
Figure 2
Data set features for the molecules and their resultant blends. (a) Co-occurrence matrix for the 25 most common descriptors. The co-occurrence values are normalized to sum to 1 for each row and column. The color scale is logarithmic. The top 15 notes (up to “creamy”) all co-occur at least once, but the matrix becomes increasingly sparse for the less frequent elements. The total co-occurrence matrix for all 104 descriptors is available in the Supporting Information section. (b) Distribution of node degree, on a log scale. The majority of nodes have on the order of hundreds of edges, with the most connected node having 807 edges; 23 nodes have only a single edge. (c) Occurrences for the 25 most common descriptors, ordered by the most frequent notes. The frequency (blue, on the left) for each note is the count of all pairs for which the descriptor appears. The support set (red, on the right) is the number of unique molecules across all pairs labeled with that note.
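The statistics in this caption (co-occurrence, frequency, and support) can be computed straightforwardly from pair-level labels. The sketch below assumes a hypothetical `pairs` layout in which each entry holds the two molecule CIDs and the set of descriptors for the blend; it is illustrative only, and the figure additionally normalizes each row and column of the co-occurrence matrix to sum to 1.

```python
# Illustrative only: descriptor co-occurrence, per-label frequency, and support
# counts from pair-level data. The `pairs` layout is a hypothetical stand-in.
from collections import Counter
from itertools import combinations

pairs = [
    ((6054, 7410), {"fruity", "sweet"}),
    ((6054, 8857), {"fruity", "creamy"}),
]

cooccurrence = Counter()   # (desc_i, desc_j) -> count over all blends
frequency = Counter()      # descriptor -> number of pairs carrying it
support = {}               # descriptor -> unique molecules appearing in such pairs

for (cid_a, cid_b), descriptors in pairs:
    for d in descriptors:
        frequency[d] += 1
        support.setdefault(d, set()).update([cid_a, cid_b])
    for d_i, d_j in combinations(sorted(descriptors), 2):
        cooccurrence[(d_i, d_j)] += 1

support_counts = {d: len(mols) for d, mols in support.items()}
print(frequency.most_common(25))
```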
Figure 3
Experiment overview. (a,b) Schema of the experiment. (c) Graph carving schematic.
Figure 4
Meta-graph backbone consisting of the 15 most used molecules (nodes labeled by CID) and their combinations (edges). This figure is best viewed in color. (a) Aromatic combinations of molecules. Though each molecule has an aroma (node color) of its own, the blend (edge color) may smell different. (b) Graph carving for train/test separation. Using a 50:50 train/test split (node colors) allows the train/test data points (colored dashed edges) to cover similar descriptor distributions but results in some discarded data points (solid gray edges).
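A minimal reading of the carving in panel (b): molecules (nodes) are assigned to the train or test side, and a blend (edge) is kept only when both of its molecules land on the same side; edges that cross the cut are discarded. The function below is a sketch under that assumption, not the authors' implementation.

```python
# Sketch of the carving idea as described in the caption: molecules are nodes,
# blends are edges; an edge survives only if both endpoints share a split.
import random

def carve(edges, split=0.5, seed=0):
    """edges: iterable of (cid_a, cid_b) blend pairs."""
    rng = random.Random(seed)
    nodes = sorted({cid for edge in edges for cid in edge})
    rng.shuffle(nodes)
    cut = int(len(nodes) * split)
    train_nodes = set(nodes[:cut])

    train, test, discarded = [], [], []
    for a, b in edges:
        if a in train_nodes and b in train_nodes:
            train.append((a, b))
        elif a not in train_nodes and b not in train_nodes:
            test.append((a, b))
        else:
            discarded.append((a, b))   # crosses the cut, as in panel (b)
    return train, test, discarded
```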
Figure 5
Kullback–Leibler similarity for label distributions between carved train/test components and the full data set. Similarity is calculated as sim(P‖Q) = exp(−D_KL(P‖Q)), where the KL divergence quantifies how much information is lost when approximating one distribution with another. To avoid distributional shift, only carvings very close to 50:50 are usable.
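The similarity measure can be computed directly from two label distributions. The sketch below is a literal reading of the caption's formula, with hypothetical distributions; `scipy.stats.entropy(p, q)` returns the KL divergence D_KL(P‖Q).

```python
# A direct reading of the caption's measure, sim(P‖Q) = exp(−D_KL(P‖Q)),
# applied to two hypothetical label distributions. Purely illustrative.
import numpy as np
from scipy.stats import entropy   # entropy(p, q) returns the KL divergence

def kl_similarity(p, q, eps=1e-12):
    p = np.asarray(p, dtype=float) + eps
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()
    return float(np.exp(-entropy(p, q)))

full   = [0.40, 0.35, 0.25]   # hypothetical label distribution of the full data set
carved = [0.42, 0.33, 0.25]   # label distribution of one carved component
print(kl_similarity(carved, full))   # close to 1.0 when little information is lost
```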
Figure 6
ROC values for the GIN and MPNN models across 5 folds. The figures show the ROC value for each fold separately, as well as the mean. (a) ROC values for the GIN model; the colored line shows the mean. (b) ROC values for the MPNN model; the colored line shows the mean.
Figure 7
Predictive power of our GNN models and the Morgan fingerprint baseline across all labels, with the random baseline shown as a dashed line. (a) Blended-pair task AUROC scores per descriptor, by model. (b) Single-molecule task AUROC scores per descriptor, by model.
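Per-descriptor AUROC scores of the kind plotted here can be computed label by label; the sketch below uses scikit-learn with made-up labels and scores (the descriptor names and array shapes are assumptions, not the paper's data).

```python
# Hedged sketch: per-descriptor AUROC computed with scikit-learn on
# placeholder labels and scores.
import numpy as np
from sklearn.metrics import roc_auc_score

descriptors = ["fruity", "sweet", "green"]
y_true = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]])   # binary labels
y_score = np.random.rand(4, 3)                                     # model probabilities

per_label_auroc = {
    d: roc_auc_score(y_true[:, i], y_score[:, i])
    for i, d in enumerate(descriptors)
}
print(per_label_auroc)   # 0.5 corresponds to the random baseline (dashed line)
```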
Figure 8
Analysis of odor labels across experiments. (a) KDE plots for the top 5 descriptors by predictive accuracy in the training set of the single-molecule task. (b) As in (a), for the test set. (c) KDE plots for the bottom 5 descriptors by predictive accuracy in the training set of the single-molecule task. (d) As in (c), for the test set. (e) KDE plots for the top 5 descriptors by predictive accuracy in the training set of the blended-pair prediction task. (f) As in (e), for the test set. (g) KDE plots for the bottom 5 descriptors by predictive accuracy in the training set of the blended-pair task. (h) As in (g), for the test set.
Figure 9
Scatter plots of fit coefficients for predicting the blended pair’s embedding from the two single-molecule GNN embeddings. (a) Scatter plot of the fit coefficients using the MPNN-GNN model, with a zoomed inset on the centroid. Notably, the distribution is not centered on the origin: in some cases the blended pair’s embedding is an equal combination of the two individual embeddings, while in others one embedding predominates. (b) Scatter plot, as above, using the GIN-GNN embeddings. The distribution is centered on the origin, suggesting that for many points neither molecule’s embedding factors into the pair-level embedding. The vertical and horizontal lines mark points where one component predominates and the other molecule is not factored in at all.
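The fit coefficients plotted here can be read as the least-squares scalars (alpha, beta) that best reconstruct the pair embedding from the two single-molecule embeddings. The sketch below illustrates that calculation on random placeholder embeddings; it is an interpretation of the caption, not the authors' analysis code.

```python
# Sketch of the fit-coefficient idea: find scalars (alpha, beta) so that
# alpha*z_a + beta*z_b best approximates the pair embedding z_pair in a
# least-squares sense. Embeddings here are random placeholders.
import numpy as np

rng = np.random.default_rng(0)
z_a, z_b = rng.normal(size=64), rng.normal(size=64)            # single-molecule embeddings
z_pair = 0.7 * z_a + 0.1 * z_b + 0.05 * rng.normal(size=64)    # pair-level embedding

A = np.stack([z_a, z_b], axis=1)                               # shape (64, 2)
(alpha, beta), *_ = np.linalg.lstsq(A, z_pair, rcond=None)
print(alpha, beta)   # one point in the scatter plot of Figure 9
```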

