FragNet, a Contrastive Learning-Based Transformer Model for Clustering, Interpreting, Visualizing, and Navigating Chemical Space

Aditya Divyakant Shrivastava et al. Molecules. 2021 Apr 3;26(7):2065. doi: 10.3390/molecules26072065.

Abstract

The question of molecular similarity is core in cheminformatics and is usually assessed via a pairwise comparison based on vectors of properties or molecular fingerprints. We recently exploited variational autoencoders to embed 6M molecules in a chemical space, such that their (Euclidean) distance within the latent space so formed could be assessed within the framework of the entire molecular set. However, the standard objective function used did not seek to manipulate the latent space so as to cluster the molecules based on any perceived similarity. Using a set of some 160,000 molecules of biological relevance, we here bring together three modern elements of deep learning to create a novel and disentangled latent space, viz. transformers, contrastive learning, and an embedded autoencoder. The effective dimensionality of the latent space was varied such that clear separation of individual types of molecules could be observed within individual dimensions of the latent space. The capacity of the network was such that many dimensions were not populated at all. As before, we assessed the utility of the representation by comparing clozapine with its near neighbors, and we also did the same for various antibiotics related to flucloxacillin. Transformers, especially when, as here, coupled with contrastive learning, effectively provide one-shot learning and lead to a successful and disentangled representation of molecular latent spaces that at once uses the entire training set in their construction while allowing "similar" molecules to cluster together in an effective and interpretable way.
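The contrastive objective referred to in the abstract is, in methods of this family, typically the NT-Xent (normalized temperature-scaled cross-entropy) loss. The following is a minimal NumPy sketch, not the paper's implementation; the function name and batch layout are our own assumptions.

```python
import numpy as np

def nt_xent_loss(z_a, z_b, temperature=0.05):
    """NT-Xent contrastive loss, sketched in NumPy. z_a[i] and z_b[i]
    are embeddings of two views of the same molecule; every other row
    in the concatenated batch acts as a negative example."""
    z = np.concatenate([z_a, z_b], axis=0)             # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors
    sim = (z @ z.T) / temperature                      # scaled cosine sims
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = len(z_a)
    # the positive partner of row i is row i + n (and vice versa)
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_denom = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(log_denom - sim[np.arange(2 * n), targets]))
```

Lower temperatures sharpen the softmax around the positive pair, which is consistent with the clustering behavior, and eventual numerical instability, reported for small t in the figures below.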

Keywords: artificial intelligence; attention; chemical space; cheminformatics; deep learning; generative methods; neural networks; transformers.


Conflict of interest statement

The authors declare that they have no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Figures

Figure 1
The transformer-based architecture used in the present work. The internals are described in Section 4.
Scheme 1
Pseudocode for the transformer algorithm as implemented here.
Figure 2
Learning curve for training our transformers on (A) drugs, metabolites, fluorophores, and 2000 natural products, and (B) a full set of natural products. Because the transformer is effectively a one-shot learner and the batch size varied, the abscissa is shown as a single epoch. The batch size, as described in Section 4, was (A) 50 (latent space of 64 dimensions) and (B) 20 (latent space of 256 dimensions), leading to an actual number of batches of (A) 92 and (B) 7500.
Figure 3
Effect of adjusting the temperature parameter in the contrastive learning loss on the distribution of molecules in the latent space, as visualized via the t-SNE algorithm. For clarity, only a random subset of 2000 natural products is shown. (A) Learning based purely on the cross-entropy objective function. (B–E) The temperature scalar (as in [112]) was varied between 0.02 and 0.5 as indicated. (Reducing t below this range led to numerical instabilities.) All drugs, fluorophores, and Recon2 metabolites are plotted, along with a randomly chosen 2000 natural products (as in [113]).
Figure 4
Relationship between the extent of population of different dimensions and the dimensionality of the latent space, using transformers with contrastive learning.
Figure 5
Values adopted in dimension 254 of the trained 256-D transformer, showing the values of various tri-hydroxy-benzene-containing compounds (left, ca. 0.59) and two lactones (ca. 0.73). The arrows indicate the bins (0.58, 0.73) in the histogram of values in this dimension from which the representative molecules shown were taken.
Figure 6
Values adopted in dimension 182 of the trained 256-D transformer, showing the values of various halide-containing (~0.835) and other molecules. As in Figure 5, we indicate the bins in the histogram of values (0.76, 0.81, 0.83) in this dimension from which the representative molecules shown were taken.
Figure 7
Histogram of the population of dimension 25 for the 256-D dataset. It is evident that most molecules adopt only a small range of non-zero values in this dimension.
Figure 8
Effective disentanglement of molecular features into individual dimensions, using the indicated values of the 25th dimension of the latent space of the second dataset. In this case we used a latent space of 256 dimensions and a temperature t of 0.05. (A) Trihydroxycyclohexane derivatives, (B) halide-containing moieties.
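The notion of "populated" dimensions discussed in Figures 4 and 7 (and in the abstract's observation that many dimensions were not populated at all) can be made concrete with a small helper. This is a hypothetical sketch; the function name and the eps threshold are our own assumptions, not the paper's code.

```python
import numpy as np

def populated_dimensions(latents, eps=1e-6):
    """Count the latent dimensions that any molecule populates,
    i.e. columns whose values are not all (near) zero.

    latents: array of shape (n_molecules, n_dimensions).
    eps: magnitude below which a value is treated as zero (assumed)."""
    latents = np.asarray(latents, dtype=float)
    # a dimension counts as populated if at least one embedding
    # has a value of magnitude greater than eps in it
    return int((np.abs(latents) > eps).any(axis=0).sum())
```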
Figure 9
Relationship between cosine similarity and Tanimoto similarity for clozapine in our chemical space, using a temperature of 0.05.
Figure 10
Relationship between cosine similarity and Euclidean distance for clozapine in our chemical space, using a temperature of 0.1. (A) Overview. (B) Illustration of molecules in the bifurcation.
Figure 11
Relationship between the cosine similarities obtained with temperature parameter values of 0.05 and 0.02 for clozapine in our chemical space.
Figure 12
Relationship between the cosine similarities obtained with temperature parameter values of 0.05 and 0.1 for clozapine in our chemical space.
Figure 13
Relationship between the cosine similarities obtained with temperature parameter values of 0.05 and 0.5 for clozapine in our chemical space.
Figure 14
Relationship between cosine similarity and Tanimoto similarity (temperature = 0.05) for flucloxacillin in our chemical space.
Figure 15
Relationship between cosine similarity and Euclidean distance for flucloxacillin in our chemical space, with a temperature parameter of 0.1.
Figure 16
Relationship between the cosine similarities obtained with temperature parameter values of 0.05 and 0.02 for flucloxacillin in our chemical space.
Figure 17
Relationship between the cosine similarities obtained with temperature parameter values of 0.05 and 0.1 for flucloxacillin in our chemical space.
Figure 18
Relationship between the cosine similarities obtained with temperature parameter values of 0.05 and 0.5 for flucloxacillin in our chemical space.
Figure 19
Molecules closest to clozapine when a temperature of 0.1 is used, as judged by both cosine similarity and Euclidean distance.
Figure 20
Positions of chlorpromazine, prazosin, and some other molecules in UMAP space when the NT-Xent temperature factor is 0.1.
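Figures 9, 10, 14, and 15 compare the learned cosine similarity against Tanimoto similarity and Euclidean distance. For reference, minimal sketches of the three measures, assuming fingerprints are given as collections of on-bit indices and embeddings as real vectors (these helpers are illustrative, not the paper's code):

```python
import numpy as np

def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity of two binary fingerprints,
    each given as an iterable of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def cosine_similarity(u, v):
    """Cosine similarity of two embedding vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.linalg.norm(u - v))
```

Note that cosine similarity ignores vector magnitude while Euclidean distance does not, which is one reason the two can disagree (as in the bifurcation of Figure 10).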

References

    1. LeCun Y., Bengio Y., Hinton G. Deep learning. Nature. 2015;521:436–444. doi: 10.1038/nature14539.
    2. Schmidhuber J. Deep learning in neural networks: An overview. Neural Netw. 2015;61:85–117. doi: 10.1016/j.neunet.2014.09.003.
    3. Brown T.B., Mann B., Ryder N., Subbiah M., Kaplan J., Dhariwal P., Neelakantan A., Shyam P., Sastry G., Askell A., et al. Language models are few-shot learners. arXiv. 2020. arXiv:2005.14165.
    4. Senior A.W., Evans R., Jumper J., Kirkpatrick J., Sifre L., Green T., Qin C., Zidek A., Nelson A.W.R., Bridgland A., et al. Improved protein structure prediction using potentials from deep learning. Nature. 2020;577:706–710. doi: 10.1038/s41586-019-1923-7.
    5. Samanta S., O'Hagan S., Swainston N., Roberts T.J., Kell D.B. VAE-Sim: A novel molecular similarity measure based on a variational autoencoder. Molecules. 2020;25:3446. doi: 10.3390/molecules25153446.