Combining Similarity Searching and Network Analysis for the Identification of Active Compounds

Ryo Kunimoto¹, Jürgen Bajorath¹

Affiliation

¹ Department of Life Science Informatics, B-IT, LIMES Program Unit Chemical Biology and Medicinal Chemistry, Rheinische Friedrich-Wilhelms-Universität, Endenicher Allee 19c, D-53115 Bonn, Germany.

PMID: 30023879
PMCID: PMC6044633
DOI: 10.1021/acsomega.8b00344

Ryo Kunimoto et al. ACS Omega. 2018 Apr 30;3(4):3768-3777. doi: 10.1021/acsomega.8b00344. Epub 2018 Apr 3.

Abstract

A variety of computational screening methods generate similarity-based compound rankings for hit identification. However, these rankings are difficult to interpret. It is essentially impossible to determine where novel active compounds might be found in database rankings. Thus, compound selection largely depends on intuition and guesswork. Herein, we show that molecular networks can substantially aid in the analysis of similarity-based compound rankings. A series of networks generated for rankings provides visual access to search results and adds information about the chemical neighborhood and context of reference compounds that is not available from rankings. Network structure is shown to serve as a diagnostic criterion for the likelihood of successfully selecting active compounds from rankings. In addition, comparison of different networks makes it possible to prioritize alternative similarity measures for search calculations and optimize the enrichment of active compounds in rankings.
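The abstract starts from similarity-based compound rankings. As a minimal sketch (not the authors' code), assuming each fingerprint is already available as a set of "on" bit positions (for example, ECFP4 features computed elsewhere), a Tanimoto coefficient (Tc) ranking of database compounds against a reference could look like this. The function names `tanimoto` and `rank_database` are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    if not a and not b:
        return 0.0  # convention assumed here for two empty fingerprints
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_database(reference: FrozenSet[int],
                  database: Dict[str, FrozenSet[int]]) -> List[Tuple[str, float]]:
    """Rank database compounds by decreasing Tc to the reference (ties broken by ID)."""
    scored = [(cid, tanimoto(reference, fp)) for cid, fp in database.items()]
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

Representing fingerprints as sets keeps the Tc formula, |A ∩ B| / (|A| + |B| − |A ∩ B|), visible in the code; a production pipeline would more likely use packed bit vectors.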

Conflict of interest statement

The authors declare no competing financial interest.

Figures

**Figure 1**
Similarity search hits in subsets of compound rankings. Boxplots show the distribution of hits in overlapping subsets representing the 500 top-ranked compounds across all 66 search trials. Boxplots report the smallest value (bottom), first quartile (lower boundary of the box), median value (thick red line), third quartile (upper boundary), and largest value (top).
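Figure 1 counts hits in overlapping (nested) subsets of the top 500 compounds and summarizes them as boxplots. The sketch below illustrates both steps under stated assumptions: the subset cutoffs 100, 200, ..., 500 are hypothetical (the caption does not list them), and quartiles use Python's `statistics.quantiles` with the "inclusive" method, which is one of several common quartile conventions.

```python
import statistics
from typing import Dict, List, Sequence, Set


def hits_in_top_k(ranking: Sequence[str], actives: Set[str],
                  cutoffs: Sequence[int] = (100, 200, 300, 400, 500)) -> Dict[int, int]:
    """Count actives among the top-k compounds for each nested subset 1..k."""
    return {k: sum(1 for cid in ranking[:k] if cid in actives) for k in cutoffs}


def five_number_summary(values: Sequence[float]) -> List[float]:
    """Smallest value, first quartile, median, third quartile, and largest value."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return [min(values), q1, median, q3, max(values)]
```

Collecting one `hits_in_top_k` result per search trial and passing the counts for each cutoff to `five_number_summary` yields the five values a boxplot draws.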

**Figure 2**
Network properties. Boxplots report the distributions of different network properties of Tc-CSNs (left) and TcMCS-CSNs (right) generated for subsets of the 500 top-ranked compounds across all search trials. Network properties include the (a) clustering coefficient, (b) modularity, and (c) degree of drug nodes. The representation of boxplots is according to Figure 1.
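To make the three network properties in Figure 2 concrete, here is a dependency-free sketch on an undirected adjacency-set graph. Assumptions: the clustering coefficient is the average local clustering coefficient (the caption does not say which variant was used), and modularity is Newman's Q evaluated for a node partition that is supplied as input (for example, from a community-detection algorithm that is not shown). All names are hypothetical.

```python
from collections.abc import Hashable
from itertools import combinations
from typing import Dict, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as symmetric adjacency sets


def degree(graph: Graph, node: Hashable) -> int:
    return len(graph[node])


def local_clustering(graph: Graph, node: Hashable) -> float:
    """Fraction of neighbor pairs that are themselves connected."""
    nbrs = graph[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(graph: Graph) -> float:
    return sum(local_clustering(graph, n) for n in graph) / len(graph)


def modularity(graph: Graph, communities: List[Set[Hashable]]) -> float:
    """Newman modularity Q = sum_c [L_c/m - (d_c/(2m))^2]; assumes at least one edge."""
    m = sum(len(nbrs) for nbrs in graph.values()) / 2
    q = 0.0
    for community in communities:
        # Each internal edge is seen from both endpoints, hence the division by 2.
        internal = sum(1 for u in community for v in graph[u] if v in community) / 2
        total_degree = sum(len(graph[u]) for u in community)
        q += internal / m - (total_degree / (2 * m)) ** 2
    return q
```

For two triangles joined by a single edge, this gives an average clustering coefficient of 7/9 and a modularity of 5/14 for the two-triangle partition.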

**Figure 3**
Exemplary CSNs. For four exemplary similarity search sets reported in Table 1, Tc-CSNs (top) and TcMCS-CSNs (bottom) of overlapping subsets representing compound rankings are compared at a constant edge density of 5%. Nodes are color coded as follows: red, reference drug; green, hits with activity against the drug target; blue, database compounds with different activities. Drug nodes have constant size, whereas nodes of hits and other database compounds are scaled in size according to their degree. CSNs are shown for similarity searching using (a) cabozantinib, (b) iloperidone, (c) tolvaptan, and (d) sorafenib as reference drugs.
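Figure 3 compares compound similarity networks (CSNs) at a constant edge density of 5%, where edge density is the number of edges divided by the n(n − 1)/2 possible pairs. The caption does not say how that density is reached; one plausible approach, assumed here, is to connect only the highest-scoring compound pairs until the target density is met. `build_csn` and its `similarity` callback are hypothetical names.

```python
from collections.abc import Hashable
from itertools import combinations
from typing import Callable, Dict, List, Set


def build_csn(nodes: List[Hashable],
              similarity: Callable[[Hashable, Hashable], float],
              edge_density: float = 0.05) -> Dict[Hashable, Set[Hashable]]:
    """Connect the most similar pairs until the target edge density is reached."""
    pairs = list(combinations(nodes, 2))
    n_edges = round(edge_density * len(pairs))
    # Sort all pairs by decreasing similarity (stable for ties) and keep the top n_edges.
    pairs.sort(key=lambda p: similarity(*p), reverse=True)
    graph: Dict[Hashable, Set[Hashable]] = {n: set() for n in nodes}
    for u, v in pairs[:n_edges]:
        graph[u].add(v)
        graph[v].add(u)
    return graph
```

Fixing the density rather than a similarity threshold keeps networks built from different measures (for example, Tc and TcMCS) comparable in edge count, which seems to be the point of holding it constant in the figure.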

**Figure 4**
Similarity search performance. Shown are receiver operating characteristic (ROC) curves for similarity search calculations using (a) cabozantinib, (b) iloperidone, (c) tolvaptan, and (d) sorafenib as reference drugs. ROC curves compare true-positive and false-positive rates over compound rankings. In each case, the ROC curves were calculated for the 500 top-ranked compounds on the basis of ECFP4 Tc values (database search, blue) and after reranking of the top 500 compounds on the basis of TcMCS calculations (red).
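The ROC curves in Figure 4 trace true-positive rate against false-positive rate while moving down a ranking. A minimal sketch, assuming rates are computed within the ranked subset (here, the top 500) and that the ranking contains at least one active and one inactive compound. The trapezoidal `auc` helper is an addition for summarizing such a curve; the caption itself does not report AUC values.

```python
from typing import List, Sequence, Set, Tuple


def roc_points(ranking: Sequence[str], actives: Set[str]) -> List[Tuple[float, float]]:
    """(FPR, TPR) after each position of a ranked list, starting at (0, 0)."""
    n_pos = sum(1 for cid in ranking if cid in actives)
    n_neg = len(ranking) - n_pos  # both counts must be > 0
    tp = fp = 0
    points = [(0.0, 0.0)]
    for cid in ranking:
        if cid in actives:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points: List[Tuple[float, float]]) -> float:
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

Running `roc_points` once on the Tc order and once on the TcMCS reranking of the same 500 compounds gives the two curves that are compared in each panel.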

**Figure 5**
Top-ranked compounds. Shown are the top three compounds for Tc- (left) and TcMCS-based (right) rankings according to Figure 4 using (a) cabozantinib, (b) iloperidone, (c) tolvaptan, and (d) sorafenib as reference drugs. Compounds whose ranks are highlighted in green are active against the drug target. For each of the top three compounds, the rank using the alternative similarity measure (Tc, right; TcMCS, left) is also reported (in italics).
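Figure 5 lists the top three compounds under one measure together with their rank under the other. The sketch below assumes a TcMCS of the usual Tanimoto form applied to substructure sizes, |MCS| / (|A| + |B| − |MCS|), with the sizes (for example, atom or bond counts from a maximum common substructure search, not shown) supplied as integers. The exact size definition used in the paper is not given on this page, and all function names are hypothetical.

```python
from typing import Dict, List, Tuple


def tc_mcs(size_ref: int, size_cpd: int, size_mcs: int) -> float:
    """Assumed MCS-based Tanimoto: shared (MCS) size over the size of the union."""
    return size_mcs / (size_ref + size_cpd - size_mcs)


def ranks(scores: Dict[str, float]) -> Dict[str, int]:
    """1-based ranks by decreasing score (ties broken by compound ID)."""
    order = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return {cid: pos for pos, cid in enumerate(order, start=1)}


def top_with_alternative_rank(primary: Dict[str, float], alternative: Dict[str, float],
                              top: int = 3) -> List[Tuple[str, int, int]]:
    """Top compounds under one measure with their rank under the other measure."""
    r1, r2 = ranks(primary), ranks(alternative)
    best = sorted(r1, key=r1.get)[:top]
    return [(cid, r1[cid], r2[cid]) for cid in best]
```

Swapping the `primary` and `alternative` arguments produces the table for the other side of the figure.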

**Figure 6**
Other bioactive compounds related to hits. Shown are the top three database compounds with other activities that are most closely connected to correctly identified hits in TcMCS-CSNs. "No. of connections" reports the total number of relationships formed with hits in the 1–200 subsets. In each case, the most similar hit is shown and ChEMBL targets are reported. Compounds are extracted from the TcMCS-CSNs of (a) cabozantinib, (b) iloperidone, (c) tolvaptan, and (d) sorafenib shown in Figure 3.
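Figure 6 ranks database compounds with other activities by how many edges they form with hits. A minimal sketch, assuming the network (for example, the CSN of the 1–200 subset) is given as symmetric adjacency sets and that the reference drug node is excluded from the count. `most_connected_to_hits` is a hypothetical name.

```python
from collections import Counter
from collections.abc import Hashable
from typing import Dict, List, Set, Tuple


def most_connected_to_hits(graph: Dict[Hashable, Set[Hashable]], hits: Set[Hashable],
                           exclude: Set[Hashable], top: int = 3) -> List[Tuple[Hashable, int]]:
    """Rank non-hit, non-excluded nodes by their number of edges to hit nodes."""
    counts = Counter()
    for node, nbrs in graph.items():
        if node in hits or node in exclude:
            continue
        n_links = len(nbrs & hits)
        if n_links:
            counts[node] = n_links
    # Counter.most_common keeps first-seen order among equal counts.
    return counts.most_common(top)
```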
