Nucleic Acids Res. 2014 Jul;42(Web Server issue):W234-9.
doi: 10.1093/nar/gku379. Epub 2014 Apr 29.

A multi-fingerprint browser for the ZINC database

Mahendra Awale et al. Nucleic Acids Res. 2014 Jul.

Abstract

To confirm the activity of an initial small molecule 'hit compound' from an activity screening, one needs to probe the structure-activity relationships by testing close analogs. The multi-fingerprint browser presented here (http://dcb-reymond23.unibe.ch:8080/MCSS/) enables one to rapidly identify such close analogs among commercially available compounds in the ZINC database (>13 million molecules). The browser retrieves nearest neighbors of any query molecule in multi-dimensional chemical spaces defined by four different fingerprints, each of which represents relevant structural and pharmacophoric features in a different way: sFP (substructure fingerprint), ECFP4 (extended connectivity fingerprint), MQNs (molecular quantum numbers) and SMIfp (SMILES fingerprint). Distances are calculated using the city-block distance, a measure that performs as well as Tanimoto similarity but is much faster to compute. The browser retrieves a list of up to 1000 nearest neighbors of any query molecule, which can then be clustered using the K-means clustering algorithm to produce a focused list of analogs with likely similar bioactivity to be considered for experimental evaluation.
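
As a rough illustration of the retrieval step described in the abstract, the sketch below ranks a small library by city-block distance to a query fingerprint and keeps the closest entries. It is not the browser's implementation: the function names, the `max_count` parameter, the dictionary layout and the toy integer fingerprints are all assumptions made for this example, and a real search over millions of ZINC molecules would need an indexed or vectorized approach.

```python
from typing import Dict, List, Sequence, Tuple


def city_block_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute per-position differences between two fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest_neighbors(
    query: Sequence[int],
    library: Dict[str, Sequence[int]],
    max_count: int = 1000,
) -> List[Tuple[str, int]]:
    """Return up to max_count (name, distance) pairs, closest first.

    `library` maps a hypothetical molecule identifier to its fingerprint.
    """
    scored = [(city_block_distance(query, fp), name) for name, fp in library.items()]
    scored.sort()  # ascending distance; ties broken by name
    return [(name, dist) for dist, name in scored[:max_count]]


# Toy fingerprints (made up for illustration, not real MQN or SMIfp values).
toy_library = {
    "mol_a": [1, 2, 3],
    "mol_b": [4, 4, 4],
    "mol_c": [1, 2, 4],
}
top_hits = nearest_neighbors([1, 2, 3], toy_library, max_count=2)
```

Capping the result with `max_count` loosely mirrors the "up to 1000 nearest neighbors" limit mentioned above; the returned list is what a later clustering step would take as input.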

Figures

Figure 1.
Average area under the curve (AUC) values (A) and enrichment factor (EF) at 0.1% of the screened database (B), for recovery of 40 sets of actives in the Directory of Useful Decoys (DUD) from the ZINC database by using CBDfingerprint (blue bars) and Tfingerprint (brown bars) as scoring functions. Receiver operating characteristic (ROC) curves are provided in Supplementary Figures S1 and S2. ROC curves, average AUC and EF at 1% for recovery of DUD actives from the corresponding set of DUD decoys are provided in Supplementary Figures S3–S5.
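
The two metrics named in the Figure 1 caption can be sketched as follows, using their commonly used definitions. The caption does not spell out the authors' exact procedure, so this is an assumption-laden illustration: the function names, the rounding of the top fraction, and the toy distances and labels are all hypothetical. Lower distance is treated as a better score, as with a city-block distance to the query.

```python
from typing import List, Sequence


def enrichment_factor(distances: Sequence[float], labels: Sequence[int], fraction: float) -> float:
    """EF = (actives in top fraction / size of top fraction) / (actives / total).

    labels: 1 for an active, 0 for a decoy. Lower distance ranks first.
    """
    n = len(distances)
    n_actives = sum(labels)
    if n == 0 or n_actives == 0:
        raise ValueError("need at least one compound and one active")
    top_n = max(1, int(round(n * fraction)))  # assumed rounding rule
    ranked = sorted(zip(distances, labels), key=lambda pair: pair[0])
    hits = sum(label for _, label in ranked[:top_n])
    return (hits / top_n) / (n_actives / n)


def roc_auc(distances: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random active ranks ahead of a random decoy."""
    actives: List[float] = [d for d, l in zip(distances, labels) if l == 1]
    decoys: List[float] = [d for d, l in zip(distances, labels) if l == 0]
    if not actives or not decoys:
        raise ValueError("need at least one active and one decoy")
    wins = 0.0
    for a in actives:
        for d in decoys:
            if a < d:
                wins += 1.0
            elif a == d:
                wins += 0.5  # ties count as half
    return wins / (len(actives) * len(decoys))


# Toy ranking: ten compounds, two actives (made-up values).
toy_distances = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
toy_labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
ef_20 = enrichment_factor(toy_distances, toy_labels, 0.2)
auc = roc_auc(toy_distances, toy_labels)
```

The pairwise AUC loop is quadratic in the number of compounds, which is fine for a toy set; a rank-based formula would be the usual choice at database scale.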
Figure 2.
Query page of the Multi-Fingerprint browser for setting up search parameters. Search options can be divided into four parts: (i) molecular drawing panel for entering the query molecule (the structure shown is adrenaline); (ii) selection of one of the four fingerprint spaces (sFP/ECFP4/MQN/SMIfp) and of Max Count or Max Distance mode; (iii) choice of specific vendors for the search (by default all vendors will be searched); (iv) filters to fix certain molecular properties of the query molecule.
Figure 3.
Similarity search results for retrieval of 500 nearest neighbors of adrenaline in MQN space. Structures of nearest neighbors are shown in the molecule table built with the MarvinView Applet from ChemAxon Pvt Ltd. The scatter plot showing the number of compounds as a function of CBD to the query is constructed with the ‘Google Chart’ application. These nearest neighbors can be saved to a file (green button at the bottom of the page) or further analyzed by clustering using the K-means algorithm.
Figure 4.
Visualization/analysis interface for clustering results. The list of 50 clusters for MQN analogs of adrenaline is shown in the table on the left. The molecular table on the right displays the structures of compounds in cluster no. 9, which is selected in the table on the left. The centroid of the cluster is displayed at position 1 in the table. The list of clusters can be saved to a file for further analysis using the ‘Save Clusters’ button. Molecules from the clusters can be selected manually and saved to file using ‘Add to Collection’ and ‘Save Collection’ buttons, respectively.
