ACS Omega. 2018 Apr 30;3(4):3768-3777.
doi: 10.1021/acsomega.8b00344. Epub 2018 Apr 3.

Combining Similarity Searching and Network Analysis for the Identification of Active Compounds

Ryo Kunimoto et al. ACS Omega. 2018.

Abstract

A variety of computational screening methods generate similarity-based compound rankings for hit identification. However, these rankings are difficult to interpret. It is essentially impossible to determine where novel active compounds might be found in database rankings. Thus, compound selection largely depends on intuition and guesswork. Herein, we show that molecular networks can substantially aid in the analysis of similarity-based compound rankings. A series of networks generated for rankings provides visual access to search results and adds chemical neighborhood and context information for reference compounds that is not available in rankings. Network structure is shown to serve as a diagnostic criterion for the likelihood of successfully selecting active compounds from rankings. In addition, comparison of different networks makes it possible to prioritize alternative similarity measures for search calculations and optimize the enrichment of active compounds in rankings.
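As an illustration of the kind of similarity-based ranking the abstract refers to, the following minimal sketch ranks database compounds by Tanimoto coefficient (Tc) against a single reference compound. It is not the authors' implementation. The fingerprints are hypothetical toy sets of "on" bit indices rather than real ECFP4 output, and the names `tanimoto`, `rank_by_similarity`, and `cpd_A` to `cpd_C` are assumptions made for this example.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit indices."""
    union = len(a | b)
    # Two empty fingerprints share nothing; treat their similarity as 0.
    return len(a & b) / union if union else 0.0


def rank_by_similarity(reference: frozenset, database: dict) -> list:
    """Return (name, Tc) pairs sorted by decreasing similarity to the reference."""
    scored = [(name, tanimoto(reference, fp)) for name, fp in database.items()]
    # Sort by descending Tc; break ties alphabetically for a stable ranking.
    return sorted(scored, key=lambda item: (-item[1], item[0]))


# Hypothetical toy fingerprints (assumption: not derived from real molecules).
reference = frozenset({1, 2, 3, 4})
database = {
    "cpd_A": frozenset({1, 2, 3, 4, 5}),  # 4 shared bits, 5 in the union
    "cpd_B": frozenset({1, 2, 7, 8}),     # 2 shared bits, 6 in the union
    "cpd_C": frozenset({9, 10}),          # no shared bits
}

ranking = rank_by_similarity(reference, database)
for name, tc in ranking:
    print(f"{name}: Tc = {tc:.2f}")
```

The ranking itself carries only one number per compound, which is why the abstract argues that it says little about the chemical neighborhood of the reference compound.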

Conflict of interest statement

The authors declare no competing financial interest.

Figures

Figure 1
Similarity search hits in subsets of compound rankings. Boxplots show the distribution of hits in overlapping subsets representing the 500 top-ranked compounds across all 66 search trials. Boxplots report the smallest value (bottom), first quartile (lower boundary of the box), median value (thick red line), third quartile (upper boundary), and largest value (top).
Figure 2
Network properties. Boxplots report the distributions of different network properties of Tc-CSNs (left) and TcMCS-CSNs (right) generated for subsets of the 500 top-ranked compounds across all search trials. Network properties include the (a) clustering coefficient, (b) modularity, and (c) degree of drug nodes. Boxplots are drawn as in Figure 1.
Figure 3
Exemplary CSNs. For four exemplary similarity search sets reported in Table 1, Tc-CSNs (top) and TcMCS-CSNs (bottom) of overlapping subsets representing compound rankings are compared at a constant edge density of 5%. Nodes are color coded as follows: red, reference drug; green, hits with activity against the drug target; blue, database compounds with different activities. Drug nodes have constant size whereas nodes of hits and other database compounds are scaled in size according to their degrees. CSNs are shown for similarity searching using (a) cabozantinib, (b) iloperidone, (c) tolvaptan, and (d) sorafenib as reference drugs.
Figure 4
Similarity search performance. Shown are receiver operating characteristic (ROC) curves for similarity search calculations using (a) cabozantinib, (b) iloperidone, (c) tolvaptan, and (d) sorafenib as reference drugs. ROC curves compare true-positive and false-positive rates over compound rankings. In each case, the ROC curves were calculated for the 500 top-ranked compounds on the basis of ECFP4 Tc values (database search, blue) and after reranking of the top 500 compounds on the basis of TcMCS calculations (red).
Figure 5
Top-ranked compounds. Shown are the top three compounds for Tc- (left) and TcMCS-based (right) rankings according to Figure 4 using (a) cabozantinib, (b) iloperidone, (c) tolvaptan, and (d) sorafenib as reference drugs. Compounds whose ranks are highlighted in green are active against the drug target. For each of the top three compounds, the rank using the alternative similarity measure (Tc, right; TcMCS, left) is also reported (in italics).
Figure 6
Other bioactive compounds related to hits. Shown are the top three database compounds with other activities that are closely connected to correctly identified hits in TcMCS-CSNs. "No. of connections" reports the total number of relationships formed with hits in the 1–200 subsets. In each case, the most similar hit is shown and ChEMBL targets are reported. Compounds are extracted from the TcMCS-CSNs of (a) cabozantinib, (b) iloperidone, (c) tolvaptan, and (d) sorafenib in Figure 3.
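The figure captions describe networks compared at a constant edge density and characterized by properties such as the degree of the drug node and the clustering coefficient. The sketch below shows one way such a network could be built from pairwise similarities: the most similar pairs are kept as edges until the target fraction of all possible pairs is reached. This is an assumption about the construction, not the authors' code. The similarity values, node names, and function names are hypothetical, a density of 50% is used only because the toy set has six pairs (the captions report 5%), and modularity is omitted.

```python
from itertools import combinations


def build_network(similarity: dict, density: float) -> dict:
    """Keep the most similar pairs as edges until the target edge density is reached.

    similarity maps (u, v) tuples to a similarity value and is assumed to
    contain every unordered pair of nodes exactly once.
    """
    pairs = sorted(similarity, key=similarity.get, reverse=True)
    n_edges = round(density * len(pairs))
    adjacency = {node: set() for pair in pairs for node in pair}
    # Ties at the cutoff are resolved by sort order (an arbitrary choice here).
    for u, v in pairs[:n_edges]:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return adjacency


def clustering_coefficient(adjacency: dict, node: str) -> float:
    """Fraction of a node's neighbor pairs that are themselves connected."""
    neighbors = adjacency[node]
    k = len(neighbors)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adjacency[u])
    return 2 * links / (k * (k - 1))


# Hypothetical pairwise similarities for one reference drug, two hits,
# and one unrelated database compound.
similarity = {
    ("drug", "hit_1"): 0.9,
    ("drug", "hit_2"): 0.8,
    ("hit_1", "hit_2"): 0.7,
    ("drug", "other"): 0.3,
    ("hit_1", "other"): 0.2,
    ("hit_2", "other"): 0.1,
}

adjacency = build_network(similarity, density=0.5)
drug_degree = len(adjacency["drug"])
drug_clustering = clustering_coefficient(adjacency, "drug")
print(drug_degree, drug_clustering)
```

Fixing the edge density rather than a similarity threshold keeps networks built with different similarity measures comparable, since each retains the same number of edges.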
