Gigascience. 2020 Feb 1;9(2):giaa011.
doi: 10.1093/gigascience/giaa011.

Artificial intelligence deciphers codes for color and odor perceptions based on large-scale chemoinformatic data

Xiayin Zhang et al. Gigascience.

Abstract

Background: Color vision is the ability to detect, distinguish, and analyze the wavelength distributions of light independent of the total intensity. It mediates the interaction between an organism and its environment from multiple important aspects. However, the physicochemical basis of color coding has not been explored completely, and how color perception is integrated with other sensory input, typically odor, is unclear.

Results: Here, we developed an artificial intelligence platform to train algorithms for distinguishing color and odor based on the large-scale physicochemical features of 1,267 and 598 structurally diverse molecules, respectively. The predictive accuracies achieved using the random forest and deep belief network for the prediction of color were 100% and 95.23% ± 0.40% (mean ± SD), respectively. The predictive accuracies achieved using the random forest and deep belief network for the prediction of odor were 93.40% ± 0.31% and 94.75% ± 0.44% (mean ± SD), respectively. Twenty-four physicochemical features were sufficient for the accurate prediction of color, while 39 physicochemical features were sufficient for the accurate prediction of odor. A positive correlation between the color-coding and odor-coding properties of the molecules was predicted. A group of descriptors was found to interlink prominently in color and odor perceptions.
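The "mean ± SD" figures above summarize accuracy over repeated evaluation runs. As a minimal sketch of that arithmetic, the snippet below uses made-up per-run accuracies (not values from the study) and assumes the sample standard deviation; the abstract does not state which SD variant the authors used, and the helper name `summarize` is hypothetical.

```python
from statistics import mean, stdev


def summarize(accuracies):
    """Return (mean, sample standard deviation) of per-run accuracies."""
    # Assumption: sample SD (n - 1 denominator); the source does not specify.
    return mean(accuracies), stdev(accuracies)


# Illustrative per-run accuracies in percent; NOT data from the paper.
runs = [95.0, 95.5, 94.8, 95.6]
m, sd = summarize(runs)
print(f"{m:.2f}% ± {sd:.2f}% (mean ± SD)")
```

Swapping `stdev` for `statistics.pstdev` would give the population SD instead, which is slightly smaller for the same runs.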

Conclusions: Our random forest model and deep belief network accurately predicted the colors and odors of structurally diverse molecules. These findings extend our understanding of the molecular and structural basis of color vision and reveal the interrelationship between color and odor perceptions in nature.

Keywords: color perception; deep belief network; odor perception; physicochemical features; random forest.

Figures

Figure 1:
The overall workflow of color prediction and odor prediction. A total of 1,267 structurally diverse molecules were labeled with 12 diverse colors, and 598 structurally diverse molecules were labeled with 12 diverse odors. In addition, 5,270 physicochemical features of each molecule were generated by Dragon. Random forest models and deep belief networks were built to predict colors or odors using their physicochemical features. Feature selection was conducted by random forest models and the genetic algorithm. With the selected features, random forest models and deep belief networks were reused for color and odor prediction. The models were evaluated on the basis of the means and variances of the accuracies between the labeled and predicted colors or odors.
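One step of this workflow, keeping only the highest-ranked features, can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes each feature already has an importance score (for example, one produced by a random forest), the descriptor names and scores are hypothetical, and the genetic algorithm step is not shown.

```python
def top_k_features(importance, k):
    """Return the k feature names with the highest importance scores."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:k]


# Hypothetical descriptor names and importance scores, for illustration only.
scores = {"d1": 0.30, "d2": 0.05, "d3": 0.25, "d4": 0.10}
selected = top_k_features(scores, 2)
print(selected)
```

In the setting described above, `k` would be 24 for color and 39 for odor, and the reduced feature set would then be passed back to the classifiers.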
Figure 2:
Color prediction using the random forest model and DBN. A. Confusion matrix for the classification of color with 100% accuracy by the random forest. The X-axis presents the labeled colors of the molecules, and the Y-axis presents the predicted colors of the molecules. B. The classification results for color were as high as 95.23% using the DBN. The X-axis presents the learning rate, the Y-axis presents the algorithm parameter “momentum,” and the Z-axis presents the accuracy rate. C. Boxplot presenting the accuracy of color prediction from 4-fold cross-validations using the random forest with all features, the top 24 features selected by random forest models, the top 24 features selected by random forest and the genetic algorithm, and the total 48 features from above. The median values of these boxplots are labeled. D. Boxplot presenting the accuracy of color prediction using the DBN with all features, the top 24 features selected by random forest models, the top 24 features selected by random forest and the genetic algorithm, and the total 48 features from above. The median values of these boxplots are labeled. #Random forest models, *random forest models and genetic algorithm. E. Heat map of the correlation values between the top 24 features selected by random forest models and the 12 colors based on the hierarchical clustering framework. The connections between the colors and descriptors were calculated by the Euclidean distances.
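A confusion matrix like the one in panel A, with labeled classes on the X-axis and predicted classes on the Y-axis, can be built as in the minimal sketch below. The three color names and the label lists are hypothetical placeholders (the study used 12 colors, which this page does not list), and the function names are illustrative.

```python
def confusion_matrix(labeled, predicted, classes):
    """Count predictions; rows are predicted classes, columns are labeled classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for true, pred in zip(labeled, predicted):
        matrix[index[pred]][index[true]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (predicted class equals labeled class)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical labels, for illustration only.
classes = ["red", "yellow", "blue"]
labeled = ["red", "red", "yellow", "blue", "blue"]
predicted = ["red", "yellow", "yellow", "blue", "blue"]

cm = confusion_matrix(labeled, predicted, classes)
acc = accuracy(cm)
print(cm, acc)
```

A matrix with all counts on the diagonal corresponds to 100% accuracy, as reported for the random forest in panel A.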
Figure 3:
Odor prediction using the random forest model and DBN. A. The confusion matrix for the classification of odor with 93.40% accuracy by the random forest. B. The classification results for odor were as high as 94.75% using the DBN. The X-axis presents the learning rate, the Y-axis presents the algorithm parameter “momentum,” and the Z-axis presents the accuracy rate. C. Boxplot presenting the accuracy of odor prediction using the random forest with all features, the top 39 features selected by random forest models, the top 39 features selected by the random forest and the genetic algorithm, and the total 78 features from above. The median values of these boxplots are labeled. D. Boxplot presenting the accuracy of odor prediction using the DBN with all features, the top 39 features selected by random forest models, the top 39 features selected by random forest and the genetic algorithm, and the total 78 features from above. The median values of these boxplots are labeled. #Random forest models, *random forest models and genetic algorithm. E. Heat map of the correlation values between the top 39 features selected by random forest models and the 12 odors based on the hierarchical clustering framework. Connections between the odors and descriptors were calculated by the Euclidean distances.
Figure 4:
The correlations between color and olfaction perception. A. Of the 1,267 molecules with color, 90 also had odor information. B. Schematic diagram of the key physicochemical features for color and odor perceptions in the interactome. The key features for color perception were closely connected with the key features for odor perception. The distance of each line represents its correlation value.
