FragNet, a Contrastive Learning-Based Transformer Model for Clustering, Interpreting, Visualizing, and Navigating Chemical Space

Aditya Divyakant Shrivastava et al. Molecules. 2021 Apr 3;26(7):2065. doi: 10.3390/molecules26072065.

Abstract

The question of molecular similarity is core in cheminformatics and is usually assessed via a pairwise comparison based on vectors of properties or molecular fingerprints. We recently exploited variational autoencoders to embed 6M molecules in a chemical space, such that their (Euclidean) distance within the latent space so formed could be assessed within the framework of the entire molecular set. However, the standard objective function used did not seek to manipulate the latent space so as to cluster the molecules based on any perceived similarity. Using a set of some 160,000 molecules of biological relevance, we here bring together three modern elements of deep learning to create a novel and disentangled latent space, viz. transformers, contrastive learning, and an embedded autoencoder. The effective dimensionality of the latent space was varied such that clear separation of individual types of molecules could be observed within individual dimensions of the latent space. The capacity of the network was such that many dimensions were not populated at all. As before, we assessed the utility of the representation by comparing clozapine with its near neighbors, and we also did the same for various antibiotics related to flucloxacillin. Transformers, especially when, as here, coupled with contrastive learning, effectively provide one-shot learning and lead to a successful and disentangled representation of molecular latent spaces that at once uses the entire training set in their construction while allowing "similar" molecules to cluster together in an effective and interpretable way.
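The contrastive objective referred to in the abstract is, in methods of this family, typically the NT-Xent (normalized temperature-scaled cross-entropy) loss. The following is a minimal NumPy sketch, not the paper's implementation; the function name and batch layout are our own assumptions.

```python
import numpy as np

def nt_xent_loss(z_a, z_b, temperature=0.05):
    """NT-Xent contrastive loss, sketched in NumPy. z_a[i] and z_b[i]
    are embeddings of two views of the same molecule; every other row
    in the concatenated batch acts as a negative example."""
    z = np.concatenate([z_a, z_b], axis=0)             # (2N, d)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors
    sim = (z @ z.T) / temperature                      # scaled cosine sims
    np.fill_diagonal(sim, -np.inf)                     # exclude self-pairs
    n = len(z_a)
    # the positive partner of row i is row i + n (and vice versa)
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_denom = np.log(np.exp(sim).sum(axis=1))
    return float(np.mean(log_denom - sim[np.arange(2 * n), targets]))
```

Lower temperatures sharpen the softmax around the positive pair, which is consistent with the clustering behavior, and eventual numerical instability, reported for small t in the figures below.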

Keywords: artificial intelligence; attention; chemical space; cheminformatics; deep learning; generative methods; neural networks; transformers.


Conflict of interest statement

The authors declare that they have no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Figures

Figure 1
The transformer-based architecture used in the present work. The internals are described in Section 4.
Scheme 1
Pseudocode for the transformer algorithm as implemented here.
Figure 2
Learning curve for training our transformers on (A) drugs, metabolites, fluorophores, and 2000 natural products, and (B) a full set of natural products. Because the transformer is effectively a one-shot learner and the batch size varied, the abscissa is shown as a single epoch. The batch size, as described in Section 4, was (A) 50 (latent space of 64 dimensions) and (B) 20 (latent space of 256 dimensions), leading to an actual number of batches of (A) 92 and (B) 7500.
Figure 3
Effect of adjusting the temperature parameter in the contrastive learning loss on the distribution of molecules in the latent space, as visualized via the t-SNE algorithm. For clarity, only a random subset of 2000 natural products is shown. (A) Learning based purely on the cross-entropy objective function. (B–E) The temperature scalar (as in [112]) was varied between 0.02 and 0.5 as indicated. (Reducing t below this range led to numerical instabilities.) All drugs, fluorophores, and Recon2 metabolites are plotted, along with a randomly chosen 2000 natural products (as in [113]).
Figure 4
Relationship between the extent of population of different dimensions and the dimensionality of the latent space, using transformers with contrastive learning.
Figure 5
Values adopted in dimension 254 of the trained 256-D transformer, showing the values of various tri-hydroxy-benzene-containing compounds (left, ca. 0.59) and two lactones (ca. 0.73). The arrows indicate the bins (0.58, 0.73) in the histogram of values in this dimension from which the representative molecules shown were taken.
Figure 6
Values adopted in dimension 182 of the trained 256-D transformer, showing the values of various halide-containing (~0.835) and other molecules. As in Figure 5, we indicate the bins in the histogram of values (0.76, 0.81, 0.83) in this dimension from which the representative molecules shown were taken.
Figure 7
Histogram of the population of dimension 25 for the 256-D dataset. It is evident that most molecules adopt only a small range of non-zero values in this dimension.
Figure 8
Effective disentanglement of molecular features into individual dimensions, using the indicated values of the 25th dimension of the latent space of the second dataset. In this case we used a latent space of 256 dimensions and a temperature t of 0.05. (A) Trihydroxycyclohexane derivatives, (B) halide-containing moieties.
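The notion of "populated" dimensions discussed in Figures 4 and 7 (and in the abstract's observation that many dimensions were not populated at all) can be made concrete with a small helper. This is a hypothetical sketch; the function name and the eps threshold are our own assumptions, not the paper's code.

```python
import numpy as np

def populated_dimensions(latents, eps=1e-6):
    """Count the latent dimensions that any molecule populates,
    i.e. columns whose values are not all (near) zero.

    latents: array of shape (n_molecules, n_dimensions).
    eps: magnitude below which a value is treated as zero (assumed)."""
    latents = np.asarray(latents, dtype=float)
    # a dimension counts as populated if at least one embedding
    # has a value of magnitude greater than eps in it
    return int((np.abs(latents) > eps).any(axis=0).sum())
```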
Figure 9
Relationship between cosine similarity and Tanimoto similarity for clozapine in our chemical space, using a temperature of 0.05.
Figure 10
Relationship between cosine similarity and Euclidean distance for clozapine in our chemical space, using a temperature of 0.1. (A) Overview. (B) Illustration of molecules in the bifurcation.
Figure 11
Relationship between the cosine similarities obtained with temperature parameter values of 0.05 and 0.02 for clozapine in our chemical space.
Figure 12
Relationship between the cosine similarities obtained with temperature parameter values of 0.05 and 0.1 for clozapine in our chemical space.
Figure 13
Relationship between the cosine similarities obtained with temperature parameter values of 0.05 and 0.5 for clozapine in our chemical space.
Figure 14
Relationship between cosine similarity and Tanimoto similarity (temperature = 0.05) for flucloxacillin in our chemical space.
Figure 15
Relationship between cosine similarity and Euclidean distance for flucloxacillin in our chemical space, with a temperature parameter of 0.1.
Figure 16
Relationship between the cosine similarities obtained with temperature parameter values of 0.05 and 0.02 for flucloxacillin in our chemical space.
Figure 17
Relationship between the cosine similarities obtained with temperature parameter values of 0.05 and 0.1 for flucloxacillin in our chemical space.
Figure 18
Relationship between the cosine similarities obtained with temperature parameter values of 0.05 and 0.5 for flucloxacillin in our chemical space.
Figure 19
Molecules closest to clozapine when a temperature of 0.1 is used, as judged by both cosine similarity and Euclidean distance.
Figure 20
Positions of chlorpromazine, prazosin, and some other molecules in UMAP space when the NT-Xent temperature factor is 0.1.
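Figures 9, 10, 14, and 15 compare the learned cosine similarity against Tanimoto similarity and Euclidean distance. For reference, minimal sketches of the three measures, assuming fingerprints are given as collections of on-bit indices and embeddings as real vectors (these helpers are illustrative, not the paper's code):

```python
import numpy as np

def tanimoto(bits_a, bits_b):
    """Tanimoto (Jaccard) similarity of two binary fingerprints,
    each given as an iterable of on-bit indices."""
    a, b = set(bits_a), set(bits_b)
    return len(a & b) / len(a | b) if (a or b) else 0.0

def cosine_similarity(u, v):
    """Cosine similarity of two embedding vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(np.linalg.norm(u - v))
```

Note that cosine similarity ignores vector magnitude while Euclidean distance does not, which is one reason the two can disagree (as in the bifurcation of Figure 10).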

References

    1. LeCun Y., Bengio Y., Hinton G. Deep learning. Nature. 2015;521:436–444. doi: 10.1038/nature14539.
    2. Schmidhuber J. Deep learning in neural networks: An overview. Neural Netw. 2015;61:85–117. doi: 10.1016/j.neunet.2014.09.003.
    3. Brown T.B., Mann B., Ryder N., Subbiah M., Kaplan J., Dhariwal P., Neelakantan A., Shyam P., Sastry G., Askell A., et al. Language models are few-shot learners. arXiv. 2020. arXiv:2005.14165.
    4. Senior A.W., Evans R., Jumper J., Kirkpatrick J., Sifre L., Green T., Qin C., Zidek A., Nelson A.W.R., Bridgland A., et al. Improved protein structure prediction using potentials from deep learning. Nature. 2020;577:706–710. doi: 10.1038/s41586-019-1923-7.
    5. Samanta S., O'Hagan S., Swainston N., Roberts T.J., Kell D.B. VAE-Sim: A novel molecular similarity measure based on a variational autoencoder. Molecules. 2020;25:3446. doi: 10.3390/molecules25153446.