BMC Bioinformatics. 2012 May 4;13:75.
doi: 10.1186/1471-2105-13-75.

Quantitatively integrating molecular structure and bioactivity profile evidence into drug-target relationship analysis

Tianlei Xu et al. BMC Bioinformatics. 2012.

Abstract

Background: Public resources of chemical compounds are growing rapidly, both in quantity and in the types of data representation. Comprehensively understanding the relationship between the intrinsic features of chemical compounds and their protein targets is essential for evaluating potential protein-binding function in virtual drug screening. Previous studies proposed correlations between bioactivity profiles and target networks, especially when chemical structures were similar. Because effective quantitative methods to uncover such correlations are lacking, it is necessary to integrate information from multiple data sources to produce a comprehensive assessment of the similarity between small molecules, and to use such an integrated scheme to quantitatively uncover the relationship between compounds and their targets.

Results: In this study, a multi-view clustering algorithm was introduced to quantitatively integrate compound similarity from both bioactivity profiles and structural fingerprints. First, hierarchical clustering was performed with the fused similarity on 37 compounds curated from PubChem. Compared with clustering in a single view, the integrated similarity increased the overall number of common targets within the fused classes, indicating that the multi-view clustering is more effective at identifying clusters whose members share more common targets. Analysis of specific classes reveals that the two views complement each other in describing compounds, helping to discover similar compounds that are missed when only a single view is applied. Then, a large-scale virtual drug screen was performed on 1267 compounds curated from the Connectivity Map (CMap) dataset using the fused similarity, which yielded a better ranking than either single view. Together, these tests indicate that combining different data representations improves the assessment of target-specific compound similarity.
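The fusion step summarized above can be sketched as follows. This is a minimal illustration, not the authors' released scripts: it assumes each view is a square, symmetric similarity matrix over the same compounds, standardizes the off-diagonal entries of each view to z-values, and combines the views by a weighted sum with fixed weights. The renormalization step and the learning of the weights are not reproduced, and the helper names `zscore_offdiag`, `fuse` and `rank_neighbors` are hypothetical. `rank_neighbors` illustrates how a fused matrix could order candidates for a query compound in a virtual screen.

```python
from statistics import mean, pstdev

Matrix = list[list[float]]


def zscore_offdiag(sim: Matrix) -> Matrix:
    """Standardize the off-diagonal entries of a square similarity matrix to z-values.

    Diagonal (self-similarity) entries are set to 0.0 because they carry no
    information about pairs of distinct compounds.
    """
    n = len(sim)
    off = [sim[i][j] for i in range(n) for j in range(n) if i != j]
    mu = mean(off)
    sd = pstdev(off) or 1.0  # guard against a constant matrix
    return [[0.0 if i == j else (sim[i][j] - mu) / sd for j in range(n)]
            for i in range(n)]


def fuse(views: list[Matrix], alpha: list[float]) -> Matrix:
    """Weighted sum of z-standardized views; alpha is assumed to sum to 1."""
    zs = [zscore_offdiag(v) for v in views]
    n = len(views[0])
    return [[sum(a * z[i][j] for a, z in zip(alpha, zs)) for j in range(n)]
            for i in range(n)]


def rank_neighbors(fused: Matrix, query: int) -> list[int]:
    """Indices of the other compounds, most similar to `query` first."""
    others = [j for j in range(len(fused)) if j != query]
    return sorted(others, key=lambda j: fused[query][j], reverse=True)


# Toy 3-compound example: the two views disagree about compound 0's neighbour.
structural = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.2], [0.1, 0.2, 1.0]]
bioactivity = [[1.0, 0.3, 0.8], [0.3, 1.0, 0.1], [0.8, 0.1, 1.0]]
print(rank_neighbors(fuse([structural, bioactivity], [0.5, 0.5]), 0))  # [1, 2]
print(rank_neighbors(fuse([structural, bioactivity], [0.2, 0.8]), 0))  # [2, 1]
```

Because each view is standardized before weighting, a view with a naturally wider similarity range is less likely to dominate the sum, and shifting the weights toward one view can change which compound ranks first for a query.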

Conclusions: Our study presents an efficient, extendable and quantitative computational model for integrating different compound representations, which is expected to provide new clues for improving virtual drug screening based on various pharmacological properties. Scripts, supplementary materials and data used in this study are publicly available at http://lifecenter.sgst.cn/fusion/.

Figures

Figure 1
Workflow of this study. Two similarity matrices from different views were first standardized to z-values, renormalized, and used as input. A two-step alternating minimization was then used to obtain appropriate weights for fusing the two similarity matrices. In the first step, given the initial weights $\alpha = (\alpha_1, \alpha_2)$, the cross-entropy between the input matrices and a combined non-negative factorization was minimized by an EM algorithm. In the second step, given the calculated cross-entropy, the weights were updated by minimizing the objective function, i.e., the combination of the cross-entropy and the entropy of the weights. The two steps were iterated until convergence. The final $\alpha$ was used as the weighting vector that balances weighted sparseness and informativeness.
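The caption does not give the exact objective for the second (weight) step. One common choice that matches its description, stated here purely as an assumption, is to minimize $\sum_v \alpha_v C_v + \eta \sum_v \alpha_v \log \alpha_v$ over weights that are non-negative and sum to 1, where $C_v$ is the cross-entropy of view $v$ and $\eta > 0$ trades off sparseness against spreading weight across views (we assume this $\eta$ plays the role of the parameter tuned in Figure 2). Setting the derivative of the Lagrangian to zero gives the closed form $\alpha_v \propto \exp(-C_v/\eta)$, which the hypothetical `update_weights` below computes.

```python
import math


def objective(alpha: list[float], cross_entropies: list[float], eta: float) -> float:
    """Assumed weight-step objective: weighted cross-entropy minus eta * entropy(alpha)."""
    return sum(a * c + eta * a * math.log(a)
               for a, c in zip(alpha, cross_entropies) if a > 0)


def update_weights(cross_entropies: list[float], eta: float) -> list[float]:
    """Closed-form minimizer of `objective` over weights that are >= 0 and sum to 1.

    This is a softmax of -C_v / eta; the largest score is subtracted first so
    that math.exp cannot overflow.
    """
    if eta <= 0:
        raise ValueError("eta must be positive")
    scores = [-c / eta for c in cross_entropies]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


print(update_weights([0.5, 2.0], eta=3.0))  # roughly [0.62, 0.38]
print(update_weights([0.5, 2.0], eta=0.1))  # nearly all weight on view 0
```

Under this assumed objective, a small $\eta$ concentrates almost all weight on the view with the lowest cross-entropy, while a large $\eta$ pushes the weights toward uniform.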
Figure 2
Parameter optimization. Average Mean Disagreement (AMD) and Average Dunn's Index (ADI) for different values of $\eta$. The AMD and ADI values at $\eta = 3$ are marked in red.
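Dunn's index, the basis of the ADI curve, compares cluster separation with cluster compactness: the smallest distance between members of different clusters divided by the largest within-cluster distance, so larger values indicate tighter, better-separated clusters. The sketch below computes the standard index for a single clustering; how the paper averages it into ADI, and how AMD is defined, are not stated in the caption and are not reproduced. The function name `dunn_index` is hypothetical.

```python
import math
from itertools import combinations


def dunn_index(dist: list[list[float]], labels: list[int]) -> float:
    """Dunn's index: min inter-cluster distance / max intra-cluster diameter.

    `dist` is a symmetric distance matrix and `labels[i]` is the cluster of item i.
    """
    groups: dict[int, list[int]] = {}
    for idx, lab in enumerate(labels):
        groups.setdefault(lab, []).append(idx)
    clusters = list(groups.values())
    if len(clusters) < 2:
        raise ValueError("Dunn's index needs at least two clusters")
    # Largest distance between two members of the same cluster (0 if all singletons).
    diameter = max((dist[a][b] for c in clusters for a, b in combinations(c, 2)),
                   default=0.0)
    # Smallest distance between members of two different clusters.
    separation = min(dist[a][b] for c1, c2 in combinations(clusters, 2)
                     for a in c1 for b in c2)
    return math.inf if diameter == 0 else separation / diameter


# Four points on a line at positions 0, 1, 10 and 11.
pos = [0.0, 1.0, 10.0, 11.0]
dist = [[abs(x - y) for y in pos] for x in pos]
print(dunn_index(dist, [0, 0, 1, 1]))  # 9.0: tight, well-separated clusters
print(dunn_index(dist, [0, 1, 0, 1]))  # 0.1: poor clustering
```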
Figure 3
Clustering result using fused similarity. Numbers are PubChem Compound IDs (CIDs). The distance $d$, transformed from the similarity $s$ as $d = 1/s$, is shown on the left. Three distinct classes are marked with red boxes and named A [CID: 2723601, 3246652, 6351879]; B [CID: 24360, 60699, 72402, 354677, 5351879]; C [CID: 4212, 5458171].
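Figure 3 is a hierarchical clustering tree built from distances $d = 1/s$. The sketch below shows one way to build such a clustering, assuming all off-diagonal similarities are strictly positive and using average linkage; the caption does not say which linkage the authors used, and `similarity_to_distance` and `average_linkage` are hypothetical names. It repeatedly merges the closest pair of clusters until `k` remain, which corresponds to cutting the tree at `k` classes.

```python
from itertools import combinations

Matrix = list[list[float]]


def similarity_to_distance(sim: Matrix) -> Matrix:
    """Apply d = 1/s to off-diagonal entries; assumes every such s > 0."""
    return [[0.0 if i == j else 1.0 / s for j, s in enumerate(row)]
            for i, row in enumerate(sim)]


def average_linkage(dist: Matrix, k: int) -> list[list[int]]:
    """Merge the two closest clusters (average linkage) until k clusters remain."""
    if not 1 <= k <= len(dist):
        raise ValueError("k must be between 1 and the number of items")
    clusters = [[i] for i in range(len(dist))]

    def link(a: list[int], b: list[int]) -> float:
        # Mean pairwise distance between the members of clusters a and b.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        x, y = min(combinations(range(len(clusters)), 2),
                   key=lambda p: link(clusters[p[0]], clusters[p[1]]))
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (x, y)]
        clusters.append(merged)
    return sorted(clusters)


sim = [[1.0, 0.9, 0.1, 0.1],
       [0.9, 1.0, 0.1, 0.1],
       [0.1, 0.1, 1.0, 0.8],
       [0.1, 0.1, 0.8, 1.0]]
dist = similarity_to_distance(sim)
print(average_linkage(dist, 2))  # [[0, 1], [2, 3]]
print(average_linkage(dist, 3))  # [[0, 1], [2], [3]]
```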
Figure 4
Average common target number of the clustering result. The hierarchical clustering tree was cut at a range of class numbers. The common target numbers obtained using fused similarity, structural similarity, bioactivity profile similarity and bioactivity profile Euclidean distance are shown as red, green, blue and purple lines, respectively. The default class number, 6, is marked with a yellow line.
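The common-target measure in Figure 4 can be read as follows: for each class produced by cutting the tree, count the protein targets shared by all of its members, then average over classes. The sketch below assumes that reading, skips singleton classes (which have nothing to share with), and uses a hypothetical `targets` mapping from compound ID to a set of target names; it is not the authors' exact definition.

```python
def average_common_targets(clusters: list[list[str]],
                           targets: dict[str, set[str]]) -> float:
    """Mean number of targets shared by every member of a class.

    Singleton classes are skipped (an assumption); returns 0.0 if no class has
    two or more members.
    """
    counts = []
    for members in clusters:
        if len(members) < 2:
            continue
        # Targets hit by every compound in this class.
        shared = set.intersection(*(targets[m] for m in members))
        counts.append(len(shared))
    return sum(counts) / len(counts) if counts else 0.0


# Hypothetical compounds and targets, for illustration only.
targets = {"c1": {"T1", "T2", "T3"}, "c2": {"T2", "T3"},
           "c3": {"T3", "T4"}, "c4": {"T4"}}
print(average_common_targets([["c1", "c2"], ["c3", "c4"]], targets))  # 1.5
print(average_common_targets([["c1", "c2", "c3"], ["c4"]], targets))  # 1.0
```

Evaluating this measure at each cut of the tree, once per similarity type, would produce curves of the kind plotted in Figure 4.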
Figure 5
Compound-target network of cluster B. Protein targets are shown as rectangles and the corresponding compounds in cluster B as ellipses. Compounds in the previous cluster B are marked in purple; the newly discovered class member identified with fused similarity is marked in blue.
Figure 6
Bioactivity profiles (A) and compound structures (B) of the 6 compounds in cluster B.