Science. 2023 Sep;381(6661):999-1006.
doi: 10.1126/science.ade4401. Epub 2023 Aug 31.

A principal odor map unifies diverse tasks in olfactory perception

Brian K Lee et al. Science. 2023 Sep.

Abstract

Mapping molecular structure to odor perception is a key challenge in olfaction. We used graph neural networks to generate a principal odor map (POM) that preserves perceptual relationships and enables odor quality prediction for previously uncharacterized odorants. The model was as reliable as a human in describing odor quality: On a prospective validation set of 400 out-of-sample odorants, the model-generated odor profile more closely matched the trained panel mean than did the median panelist. By applying simple, interpretable, theoretically rooted transformations, the POM outperformed chemoinformatic models on several other odor prediction tasks, indicating that the POM successfully encoded a generalized map of structure-odor relationships. This approach broadly enables odor prediction and paves the way toward digitizing odors.

Conflict of interest statement

The original work and funding for this manuscript were provided by Google Research. BKL, JNW, BSL, WWQ, RCG, and ABW were employees of Google at the time this study was conducted. During the review process, ABW, RCG, and WWQ joined Osmo Labs, PBC, a new venture that is commercializing some of the technologies described in this manuscript. ABW, RCG, and WWQ each have an ownership interest in Osmo Labs, PBC, and receive a salary from the company. ABW is an officer of the company. JDM received funding from Google and serves on the Scientific Advisory Board of Osmo Labs, PBC. EJM, KAL, MA, and BBN received funding from Google.

Google has signed a transfer of ownership of all relevant IP (data, code, models, patents) to this new company. The details of this document are confidential, and unfortunately cannot be shared.

Figures

Fig. 1. POM preserves the structure of odor perceptual space.
(A) Example triplet of molecules in which the structurally similar pair is not the perceptually similar pair. (B) The GNN was trained on a curated dataset of ~5000 semantically labeled molecules drawn from GoodScents (13) and Leffingwell (14) flavor and fragrance databases; one square represents 100 molecules; three example training set molecules and their odor descriptions are shown: 2-methyl-2-hexenoic acid (top), 2,5-dimethyl-3-thioisovalerylfuran (middle), 1-methyl-3-hexenyl acetate (bottom). (C) Schematic illustrating the process of training a GNN to generate the POM. (D-F) Odorants plotted by the first and second principal components (PC) of their (D) perceptual labels from GS/LF training dataset (138 labels), (E) cFP structural fingerprints (radius 4, 2048-bit), and (F) POM coordinates (256 dimensions). Areas dense with molecules having the broad category labels floral, meaty, or alcoholic are shaded; areas dense with narrow category labels are outlined. The POM recapitulates the true perceptual map, but the FP map does not; note that only relative (not absolute) coordinates matter. Additional labels are visualized for POM in Fig. S1.
Fig. 2. GNN model displays human-level odor description performance.
(A) GNN model label predictions, (B) random forest (RF) model label predictions, (C) panel mean ratings with standard error bars, and (D) individual panelist ratings, averaged over 2 replicates, for the molecule 2,3-dihydrobenzofuran-5-carboxaldehyde. In panels A-C, the top 5 ranked descriptors are in orange (GNN), purple (RF), or green (panel). Descriptors in panels A-D are ordered by panel mean ratings. Panels A, B, and D are annotated with the Pearson correlation coefficient of their data to the panel mean rating shown in panel C. Panel D includes panelist/panel correlation coefficients for the panelist that best matches the panel mean and for the panelist with the median match. (E) Cumulative density plot showing the distribution of correlations between human panelists and the panel mean (in green) and between the GNN, RF, and GNN shuffled model predictions and the panel mean on a per molecule basis. Curves shifted to the right are more strongly correlated to the panel mean. (F) Difference in the median correlation to the panel mean relative to the median human subject’s correlation to the panel mean for models trained using k-nearest neighbor (KNN) and RF, trained on cFPs or Mordred features, and the GNN model. Only the GNN model has a median correlation to the panel mean that is higher than that of the median panelist.
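The comparison behind panels E and F can be sketched in a few lines: correlate each rating profile with the panel mean for one molecule, then check whether the model's correlation exceeds that of the median panelist. This is a minimal illustration, not the authors' code. The ratings, the model scores, and the helper names `pearson` and `panel_mean` are hypothetical, and the panel mean here includes the panelist being scored, which may differ from the paper's procedure.

```python
import math
from statistics import mean, median


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length rating vectors."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def panel_mean(ratings):
    """Average the panelists' ratings descriptor by descriptor."""
    return [mean(column) for column in zip(*ratings)]


# Hypothetical data for one molecule: rows are panelists, columns are descriptors.
panelists = [
    [4.0, 1.0, 0.0, 2.0],
    [3.0, 2.0, 1.0, 1.0],
    [5.0, 0.0, 0.0, 3.0],
]
model_prediction = [0.9, 0.2, 0.1, 0.5]  # hypothetical model scores

target = panel_mean(panelists)
model_r = pearson(model_prediction, target)
median_panelist_r = median(pearson(p, target) for p in panelists)

# True when the model matches the panel mean better than the median panelist does.
model_beats_median_panelist = model_r > median_panelist_r
print(model_r, median_panelist_r, model_beats_median_panelist)
```

Repeating this per molecule across a test set would give the distribution of correlations that a cumulative density plot such as panel E summarizes.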
Fig. 3. Model performance is robust across structural and perceptual classes.
(A) Correlation of GNN (in orange) and RF (in purple) model predictions and panelist ratings (in gray) to the panel mean for each of the 55 odor labels. (B) GNN model correlation to panel mean for each of the 55 odor labels plotted against the number of molecules in the training data for which the label applies. Circle size is proportional to the number of test set molecules for which the label applies. Selected data points are annotated. (C) Mean correlation of GNN (in orange) and RF (in purple) model predictions and panelist ratings (in gray) to the panel mean for molecules belonging to 10 common chemical classes. (D) Categorization of gas chromatography-olfactometry quality control results for 50 test set stimuli.
Fig. 4. POM is robust to discontinuities in structure-odor mapping.
(A) Example triplet of molecules in which the structurally similar pair is not the perceptually similar pair (i.e. “discordant”), according to the empirical odor labels of each molecule. Training set descriptors (anchor) and mean panel ratings (novel odorants) are shown beneath the molecular structure in colored text; model-predicted labels are listed in black text. Structural nodes highlighted in darker red are more important to model predictions. (B) We selected 41 such triplets from the empirical label data, without consulting the model; by design, 100% of these are discordant, and thus represent a difficult test for a predictive perceptual model based on molecular structure. Each colored line connects molecules in a triplet that share the same anchor, as in (C). (C) Diagram of the psychophysical task in which panelists rated explicit perceptual distances between molecules in triplets. (D) Experimentally-measured explicit perceptual distance ratings in the same triplets also show high discordance with structural distance, i.e. the molecule more structurally similar to the anchor is usually (90%) less perceptually similar. (E) The GNN model-predicted labels agree with the counter-intuitive-but-correct perceptual relationship 50% of the time, i.e. they correctly predict the empirical discordance half of the time, as measured by the cosine distance of the predicted, binarized labels. (F) A baseline model correctly predicts the empirical discordance only 19% of the time. The models in (E) and (F) are the same as those from Figures 2 and 3.
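The discordance check in panel E can be illustrated as follows: binarize the predicted label scores, then test whether the anchor is closer in cosine distance to the perceptually similar molecule than to the structurally similar one. This is a sketch under stated assumptions. The 0.5 threshold, the score vectors, the convention for all-zero vectors, and the function names are hypothetical and do not come from the paper.

```python
import math


def binarize(scores, threshold=0.5):
    """Turn predicted label scores into 0/1 labels (threshold is an assumption)."""
    return [1 if s >= threshold else 0 for s in scores]


def cosine_distance(u, v):
    """1 minus the cosine similarity of two label vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0  # assumed convention when a molecule has no predicted labels
    return 1.0 - dot / (norm_u * norm_v)


def predicts_discordance(anchor, structural_neighbor, perceptual_neighbor, threshold=0.5):
    """True if the predicted labels place the anchor nearer the perceptual neighbor."""
    a = binarize(anchor, threshold)
    s = binarize(structural_neighbor, threshold)
    p = binarize(perceptual_neighbor, threshold)
    return cosine_distance(a, p) < cosine_distance(a, s)


# Hypothetical predicted scores over three odor labels.
anchor = [0.9, 0.8, 0.1]
structural_neighbor = [0.1, 0.2, 0.9]  # similar structure, different predicted odor
perceptual_neighbor = [0.8, 0.7, 0.2]  # different structure, similar predicted odor

agrees = predicts_discordance(anchor, structural_neighbor, perceptual_neighbor)
print(agrees)
```

The fraction of triplets for which such a check returns true is the kind of agreement rate that panels E and F report for the GNN and baseline models.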
Fig. 5. POM solves a fundamental set of olfactory prediction tasks.
(A) 2D trimap embedding of 500,000 unique likely odorants previously uncharacterized. The position of each point (molecule) is determined by POM coordinates, and the RGB values of each point correspond to their coordinates in the first 3 dimensions of a non-negative matrix factorization of the predicted odor labels. The inset plot shows the known odorants from the GS/LF training set (~5,000) in color superimposed over the likely odorants in gray. (B) Intuitive geometric measures like vector length, vector distance, and vector projection correspond to the odor prediction tasks of odor detectability, similarity, and descriptor applicability. Equation shows that the projected space Y represents the dot product between POM and a task-specific projection matrix X. (C) A linear model atop POM outperforms a chemoinformatic SVM baseline at predicting odor applicability on two extant datasets, Dravnieks (27) and DREAM (6), as well as the current data. (D) A linear model atop POM outperforms a chemoinformatic SVM baseline at predicting odor detection threshold using data from Abraham et al, 2011 (28). (E) A linear model atop POM outperforms a chemoinformatic SVM baseline at predicting perceptual similarity on Snitz et al, 2013 (4).
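The three geometric measures named in panel B, and the projection Y = POM · X, can be written out directly. The sketch below uses toy 2-dimensional coordinates in place of the 256-dimensional POM. The coordinates, the projection matrix `x`, and the choice of Euclidean length and distance are illustrative assumptions, since the caption does not specify the metric or how X is fit for each task.

```python
import math


def vector_length(v):
    """Euclidean norm of an embedding (panel B relates length to detectability)."""
    return math.sqrt(sum(a * a for a in v))


def vector_distance(u, v):
    """Euclidean distance between two embeddings (related to perceptual similarity)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def project(pom, x):
    """Compute Y = POM . X, where pom is n_molecules x d and x is d x k."""
    d, k = len(x), len(x[0])
    return [[sum(row[i] * x[i][j] for i in range(d)) for j in range(k)] for row in pom]


# Toy embeddings: two molecules in a 2-dimensional stand-in for the POM.
pom = [
    [3.0, 4.0],
    [0.0, 1.0],
]
# Hypothetical task-specific projection matrix (d x k); in practice it would be fit to data.
x = [
    [1.0, 0.0],
    [0.0, 2.0],
]

length_first = vector_length(pom[0])
distance = vector_distance(pom[0], pom[1])
y = project(pom, x)
print(length_first, distance, y)
```

Each column of `y` plays the role of one task-specific score per molecule, for example the applicability of a single descriptor.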

References

    1. Smith T, Guild J, The C.I.E. colorimetric standards and their use. Trans. Opt. Soc 33, 73–134 (1931).
    2. Evans EF, Frequency selectivity at high signal levels of single units in cochlear nerve and nucleus. Psychophys. Physiol. Hear, 185–192 (1977).
    3. Sell CS, On the Unpredictability of Odor. Angew. Chem. Int. Ed 45, 6254–6261 (2006).
    4. Snitz K et al., Predicting Odor Perceptual Similarity from Odor Structure. PLOS Comput. Biol 9, e1003184 (2013).
    5. Ravia A et al., A measure of smell enables the creation of olfactory metamers. Nature (2020), doi:10/ghmtvk.