Review
Int J Mol Sci. 2022 Mar 3;23(5):2797.
doi: 10.3390/ijms23052797.

Unsupervised Learning in Drug Design from Self-Organization to Deep Chemistry

Jaroslaw Polanski. Int J Mol Sci. 2022.

Abstract

The availability of computers has brought novel prospects to drug design. Neural networks (NN) were an early tool that cheminformatics tested for converting data into drugs. However, the initial interest faded for almost two decades. The recent success of Deep Learning (DL) has inspired a renaissance of neural networks for their potential application in deep chemistry. DL targets direct data analysis without any human intervention. Although the back-propagation NN is the main algorithm currently used in DL, unsupervised learning can be even more efficient. We review the use of self-organizing maps (SOM) for mapping molecular representations, from the 1990s to current deep chemistry. We found SOM to be highly efficient not only for features that human chemists could expect, but also for those that are not trivial to them. We also review the DL projects in the current literature, especially unsupervised architectures. DL appears to be efficient in pattern recognition (Deep Face) or chess (Deep Blue). However, an efficient deep chemistry is still a matter for the future, because the availability of measured property data in chemistry is still limited.

Keywords: deep chemistry; deep learning; drug design; feature engineering; feature learning; molecular representation; self-organizing maps; supervised learning; unsupervised learning.

Conflict of interest statement

The author declares no conflict of interest.

Figures

Figure 1
The direct drug design problem can be defined as mapping property to structure (P → S)3. It is mainly realized in the indirect mode, by structure to property mapping (S → P). Individual methods allow various domains to be included, (S → P)1 or (S → P)2. Domain diversity is indicated schematically by colors.
Figure 2
Supervised learning vs. unsupervised learning architectures. Both modes require optimization. In supervised learning, each input carries a label, and the error is estimated between the label and the output value; in unsupervised learning, the error is minimized by comparing the unlabeled inputs.
Figure 3
The propane vs. butane colored by methyl (yellow or blue in butane, yellow or green in propane) and methylene (red or green in butane, red in propane) fragments (a) provides a series of two types of CoMSA (SOM) projections (b), depending upon the SOM network regulation. Two types of patterns (b) can be explained by fuzzy topology (c). Details in text.
Figure 4
A series of CBG steroid surface data projected by CoMSA (SOM) without superimposition [21]. Without a single misinterpretation, H (high) and M (medium) activity compounds can be differentiated from the L (low) activity compounds. Details in text. Copyright © 1996 Polish Chemical Society.
Figure 5
Automatic chemical design using a data-driven continuous representation of molecules. In the critical operation of the latent space formation, the architecture analyzes the similarity of the SMILES codes of the candidate and the known inhibitor structures. A deep neural network involves three coupled functions: an encoder, a decoder (a) and a predictor (b) [45]. Copyright © 2018 American Chemical Society.
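
The unsupervised mode described in the Figure 2 caption, where the error is reduced by comparing unlabeled inputs, can be illustrated with a minimal self-organizing map. The sketch below is not the CoMSA procedure or any implementation from the reviewed work; it is a generic SOM written with NumPy, and the function names (train_som, best_matching_unit, quantization_error), the grid size, and the linear decay schedules are all assumptions chosen for illustration.

```python
import numpy as np


def train_som(data, grid_shape=(4, 4), epochs=50, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small self-organizing map on unlabeled row vectors.

    All hyperparameters here are illustrative assumptions, not values
    taken from the reviewed literature.
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    # One weight vector per map neuron, initialized at random in [0, 1).
    weights = rng.random((rows, cols, data.shape[1]))
    # Grid coordinates of every neuron, used by the neighborhood function.
    coords = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1.0 - frac)                 # learning rate decays linearly
        sigma = sigma0 * (1.0 - frac) + 1e-3    # neighborhood radius shrinks
        for x in data[rng.permutation(len(data))]:
            # No label is used: the input is only compared with the weights.
            dists = np.linalg.norm(weights - x, axis=-1)
            bmu = np.unravel_index(np.argmin(dists), dists.shape)
            # Gaussian neighborhood centered on the best-matching unit.
            grid_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
            h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
            # Pull the winner and its grid neighbors toward the input.
            weights += lr * h[..., None] * (x - weights)
    return weights


def best_matching_unit(weights, x):
    """Return the (row, col) of the neuron whose weights are closest to x."""
    dists = np.linalg.norm(weights - x, axis=-1)
    return tuple(int(i) for i in np.unravel_index(np.argmin(dists), dists.shape))


def quantization_error(weights, data):
    """Mean distance from each input to its closest neuron."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return float(d.min(axis=1).mean())
```

In use, each row of data would be a fixed-length molecular descriptor vector (a hypothetical input here), and best_matching_unit gives the map position onto which a molecule is projected. The quantization error plays the role of the error that training reduces without any label.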

References

    1. Polanski J. Encyclopedia of Bioinformatics and Computational Biology. Elsevier; Amsterdam, The Netherlands: 2019. Chemoinformatics: From Chemical Art to Chemistry in Silico; pp. 601–618.
    2. Schneider G. Automating Drug Discovery. Nat. Rev. Drug Discov. 2017;17:97–113. doi: 10.1038/nrd.2017.232.
    3. Dreyfus H.L. What Computers Can’t Do: The Limits of Artificial Intelligence. Harper and Row; New York, NY, USA: 1979.
    4. McCarthy What is AI?/Basic Questions. [(accessed on 26 February 2022)]. Available online: http://jmc.stanford.edu/artificial-intelligence/what-is-ai/index.html#:~....
    5. Sutton R.S., Barto A.G. Reinforcement Learning: An Introduction. MIT Press; Cambridge, MA, USA: 2018.
