PLoS One. 2013 Sep 18;8(9):e73289.
doi: 10.1371/journal.pone.0073289. eCollection 2013.

Categorical dimensions of human odor descriptor space revealed by non-negative matrix factorization

Jason B Castro et al. PLoS One. 2013.

Abstract

In contrast to most other sensory modalities, the basic perceptual dimensions of olfaction remain unclear. Here, we use non-negative matrix factorization (NMF), a dimensionality reduction technique, to uncover structure in a panel of odor profiles, with each odor defined as a point in multi-dimensional descriptor space. The properties of NMF are favorable for the analysis of such lexical and perceptual data, and lead to a high-dimensional account of odor space. We further provide evidence that odor dimensions apply categorically. That is, odor space is not occupied homogeneously, but rather in a discrete and intrinsically clustered manner. We discuss the potential implications of these results for the neural coding of odors, as well as for developing classifiers on larger datasets that may be useful for predicting perceptual qualities from chemical structures.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Summary of non-negative matrix factorization (NMF) applied to odor profiling data.
(A) Schematic overview: NMF seeks a lower, s-dimensional approximation of the data matrix as the product of a basis matrix and a weight matrix. The data matrix consists, in the present study, of odor descriptors (rows) by odors (columns). A given column of the data matrix is the semantic profile of one odor, with each entry providing the percent-used value (see methods) of a given descriptor. Columns of the basis matrix are basis vectors of the reduced, s-dimensional odor descriptor space. Columns of the weight matrix are s-dimensional representations (weights) of the odors in the new basis.
(B) Plot of residual error between the perceptual data and different NMF-derived approximations. For each choice of subspace, data were divided into random training and testing halves, and the residual error between the data and its approximation was computed. One hundred such divisions into training and testing were used to compute the standard errors shown (shaded areas).
(C) Reconstruction error (fraction of unexplained variance) for PCA and NMF vs. number of dimensions. The change in reconstruction error for the first interval is indicated by asterisks (*), and corresponds to the first point in the next panel.
(D) Change in reconstruction error for PCA and NMF, compared to the change in reconstruction error for PCA performed on a scrambled matrix. The scrambled-matrix curve is used to estimate the cutoff number of dimensions for which a given dimensionality reduction method is explaining only noise in a dataset. Note that each point is actually the difference in reconstruction error between two successive dimensions (by way of illustration, points with an asterisk in this panel denote corresponding intervals in the previous panel, C).
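The factorization summarized in the Figure 1 caption can be illustrated with a minimal NumPy sketch. It assumes the standard multiplicative-update rule for NMF, which this page does not confirm as the authors' solver. The names `A`, `W`, `H`, `nmf`, and `relative_error` are conventional or hypothetical, and the toy matrix is random data, not the study's odor profiles. The error measure here is an uncentered squared-norm ratio, used only as a stand-in for the caption's "fraction of unexplained variance".

```python
import numpy as np

def nmf(A, s, n_iter=500, seed=0, eps=1e-9):
    """Approximate a non-negative matrix A (m x n) as W @ H.

    W is m x s (columns act as basis vectors) and H is s x n
    (columns are the weights of each item in the new basis).
    Assumption: multiplicative updates for the Frobenius objective.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, s)) + eps
    H = rng.random((s, n)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a non-negative ratio, so W and H
        # never leave the non-negative orthant.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def relative_error(A, W, H):
    """Squared Frobenius error of W @ H relative to the squared norm of A."""
    return np.linalg.norm(A - W @ H) ** 2 / np.linalg.norm(A) ** 2

# Toy non-negative matrix of rank 3 (rows ~ descriptors, columns ~ odors).
rng = np.random.default_rng(1)
A = rng.random((20, 3)) @ rng.random((3, 15))

errors = {}
for s in (1, 2, 3):
    W, H = nmf(A, s)
    errors[s] = relative_error(A, W, H)
```

Plotting `errors` against `s` gives a curve of the same kind as panel C, although the study's train/test split and scrambled-matrix control are not reproduced here.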
Figure 2. Properties of the perceptual basis set.
(A) Plot of normalized odor descriptor amplitude vs. odor descriptor number for a single basis vector. Each point along the x-axis corresponds to a single odor descriptor, and the amplitude of each descriptor indicates the descriptor's relevance to the shown perceptual basis vector. Colored circles show the largest points in the basis vector, and descriptors corresponding to these points are listed to the right.
(B) Waterfall plot of the 10 basis vectors constituting the basis set, used in subsequent analyses. Note that each vector contains many values close to or equal to zero.
(C) Detailed view of the first four basis vectors and their leading values. Left column: peak-normalized, rank-ordered basis vectors, illustrating their sparseness and non-negativity. Right column: semantic descriptors characterizing the first four basis vectors. Bars show the first six rank-ordered, peak-normalized components of basis vectors 1 through 4 (subset of data from left column). The semantic label for each component is shown to the left.
Figure 3. NMF on full, descriptor-only, and odor-only shuffled versions of the data.
(A) Peak behavior of histograms obtained from NMF performed on shuffled data, for each of the various shuffling conditions (see text for descriptions).
(B) Tail behavior of histograms, same procedure and conditions as in (A); note the difference in scaling of axes between (A) and (B).
(C) Waterfall plots of basis sets obtained when NMF was applied to shuffled data, for various shuffling conditions. Note the comparative lack of sparseness, relative to the basis set shown in Fig. 2B. Reproducibility of basis vectors across iterations of NMF for shuffled data sets was eliminated, or severely compromised, as shown in Fig. 4.
Figure 4. Consensus Matrices for odor-shuffles, descriptor-shuffles, and full-shuffles.
(A) Consensus matrices (see text) showing reliability of basis sets when NMF is applied to various shuffled versions of the data. Only the original data shows the bimodal distribution of 1s and 0s characteristic of highly reliable clustering. Image ranges and color scale are the same for all 4 matrices.
(B) Top: Histograms of consensus matrix values for the three shuffling conditions and the original data, confirming that only the original data shows a bimodal distribution of 1s and 0s (line colors correspond to labels in (A)). Bottom: Cumulative histograms, same data as above.
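The caption defers the definition of a consensus matrix to the main text, which is not reproduced on this page. The sketch below therefore assumes one common definition: over repeated NMF runs with different random initializations, entry (i, j) is the fraction of runs in which items i and j receive the same cluster label, where the label is the index of the largest weight. The functions `nmf` and `consensus_matrix` and the random toy data are hypothetical illustrations, not the authors' code.

```python
import numpy as np

def nmf(A, s, n_iter=200, seed=0, eps=1e-9):
    """Multiplicative-update NMF (an assumed solver): A ~ W @ H."""
    rng = np.random.default_rng(seed)
    W = rng.random((A.shape[0], s)) + eps
    H = rng.random((s, A.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

def consensus_matrix(A, s, n_runs=20):
    """Fraction of runs in which each pair of columns shares a label."""
    n = A.shape[1]
    C = np.zeros((n, n))
    for run in range(n_runs):
        _, H = nmf(A, s, seed=run)
        labels = H.argmax(axis=0)                # peak coordinate per column
        C += labels[:, None] == labels[None, :]  # 1 where a pair co-clusters
    return C / n_runs

# Random toy data with no planted structure (NOT the study's data).
rng = np.random.default_rng(0)
A = rng.random((12, 9))
C = consensus_matrix(A, s=3)
```

Under this definition, perfectly reproducible clustering yields only 0s and 1s in `C`, which is the bimodal pattern the caption attributes to the original data. Intermediate values indicate assignments that change from run to run.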
Figure 5. Approximate orthogonality of the NMF basis vectors.
(A) Histogram of angles subtended by all pairs of basis vectors. Histogram was constructed for all pairwise comparisons between dimensions, excluding self-comparisons. Bar with (*) denotes self-comparisons.
(B) Matrix of pairwise comparisons of angles between dimensions.
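The angle between two basis vectors follows from the normalized dot product, and a small sketch makes the "approximate orthogonality" check concrete. The function name `pairwise_angles_deg` and the three-column toy matrix are hypothetical. The study's own basis vectors are not available on this page.

```python
import numpy as np

def pairwise_angles_deg(W):
    """Angle in degrees between every pair of columns of W."""
    unit = W / np.linalg.norm(W, axis=0, keepdims=True)
    # Clip guards against round-off pushing a cosine just outside [-1, 1].
    cosines = np.clip(unit.T @ unit, -1.0, 1.0)
    return np.degrees(np.arccos(cosines))

# Toy non-negative "basis vectors" stored as columns.
W = np.array([[1.0, 0.0, 1.0],
              [0.0, 2.0, 1.0],
              [0.0, 0.0, 0.0]])
angles = pairwise_angles_deg(W)

# Keep only comparisons between different columns (no self-comparisons).
off_diagonal = angles[~np.eye(W.shape[1], dtype=bool)]
```

Because all entries are non-negative, every pairwise angle lies between 0 and 90 degrees. Values near 90 degrees mean that two vectors load on nearly disjoint sets of descriptors.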
Figure 6. Visualization of odors expressed in coordinates of the new basis.
(A) The weight matrix discovered by NMF. Columns of the weight matrix (each column corresponds to a different odor) are normalized and sorted into groups defined by peak coordinate (1–10).
(B) Plot of all 144 odors (each point is a column of the weight matrix) in the space spanned by the first 3 basis vectors. Black, red, and blue points are those with peak coordinates in dimensions 1, 2, and 3, respectively. Gray points are all remaining odors.
(C) Chemical structures of representative odorants from the second and seventh diagonal blocks of the sorted weight matrix (panel A).
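Grouping odors by peak coordinate, as in the sorted weight matrix, can be sketched as follows. The caption says only that columns are "normalized", so peak normalization (dividing each column by its maximum) is an assumption here. The function `sort_by_peak_coordinate` and the toy matrix `H` are hypothetical, and the sketch assumes no column is entirely zero.

```python
import numpy as np

def sort_by_peak_coordinate(H):
    """Normalize each column by its peak, then group columns by peak row."""
    H_norm = H / H.max(axis=0, keepdims=True)  # assumed: peak normalization
    peak = H_norm.argmax(axis=0)               # dominant dimension per column
    order = np.argsort(peak, kind="stable")    # keeps original order in ties
    return H_norm[:, order], peak[order], order

# Toy weight matrix: 3 dimensions (rows) by 4 items (columns).
H = np.array([[0.1, 3.0, 0.2, 1.0],
              [2.0, 0.5, 0.1, 0.2],
              [0.3, 0.1, 4.0, 0.1]])
H_sorted, groups, order = sort_by_peak_coordinate(H)
```

After sorting, columns that share a dominant dimension sit next to each other, which produces the block-diagonal appearance referred to in panel C.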
Figure 7. Two-dimensional embedding of the descriptor space.
Results of stochastic neighbor embedding (see text) applied to the similarity matrix for the descriptor space. Axis units are arbitrary, but the embedding preserves neighbor relations present in the higher-dimensional space. Note that discrete clusters are clearly evident. Clusters were identified by eye, and descriptors composing each cluster are listed in the table below.
Figure 8. Two-dimensional embedding of the odorant space.
Results of stochastic neighbor embedding (see text) applied to the similarity matrix for the odorant space. As in Figure 7, axis units are arbitrary, but the embedding preserves neighbor relationships observed in the full-dimensional space. Clusters were identified by eye, and odorants composing each cluster are listed in the table below.
Figure 9. Co-clustering of descriptors and odors.
(A) Overview of the method used for defining a bicluster (see text for definition). A column of the basis matrix (descriptors) and the corresponding row of the weight matrix (odors) are rank ordered. The indices derived from the rank-ordering are used to re-order rows and columns of the data matrix (accomplished by computing the outer product between the rank-ordered column of the basis matrix and the rank-ordered row of the weight matrix), producing a submatrix with high correlation among both odors and descriptors. By the nature of the sorting procedure, these matrices (biclusters) will have their largest values in the upper-left corner. For purposes of visualization, biclusters were convolved with an averaging filter.
(B) The 10 biclusters defined by NMF on odor perceptual data.
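The rank-ordering step described in the Figure 9 caption can be sketched in NumPy. This is an illustration under assumptions: `A`, `W`, and `H` stand for the data, basis, and weight matrices in conventional NMF notation, the function `bicluster` is hypothetical, and the toy values are invented. The averaging filter mentioned in the caption is omitted.

```python
import numpy as np

def bicluster(A, W, H, i):
    """Re-order A by the rank order of column i of W and row i of H."""
    row_order = np.argsort(W[:, i])[::-1]  # descriptors, largest loading first
    col_order = np.argsort(H[i, :])[::-1]  # odors, largest weight first
    reordered = A[np.ix_(row_order, col_order)]
    # Outer product of the sorted column and row: a rank-one matrix whose
    # largest entry sits in the upper-left corner.
    rank_one = np.outer(W[row_order, i], H[i, col_order])
    return reordered, rank_one, row_order, col_order

# Toy factors (3 descriptors, 2 dimensions, 3 odors) and their product.
W = np.array([[0.1, 0.9],
              [0.8, 0.0],
              [0.3, 0.2]])
H = np.array([[0.2, 1.0, 0.5],
              [0.7, 0.1, 0.0]])
A = W @ H
reordered, rank_one, row_order, col_order = bicluster(A, W, H, i=0)
```

Because both sorted vectors are non-negative and decreasing, `rank_one` is non-increasing along rows and columns, which is why the largest values collect in the upper-left corner.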

