J Cheminform. 2015 Jan 16;7(1):2.

doi: 10.1186/s13321-014-0050-6. eCollection 2015.

A ranking method for the concurrent learning of compounds with various activity profiles

Alexander Dörr¹, Lars Rosenbaum¹, Andreas Zell¹

PMID: 25643067
PMCID: PMC4306736
DOI: 10.1186/s13321-014-0050-6

Alexander Dörr et al. J Cheminform. 2015.

Affiliation

¹ Center for Bioinformatics Tübingen (ZBIT), University of Tuebingen, Sand 1, Tübingen, 72076 Germany.

Abstract

Background: In this study, we present an SVM-based ranking algorithm for the concurrent learning of compounds with different activity profiles and their varying prioritization. To this end, a specific labeling of each compound was devised in order to infer virtual screening models against multiple targets. We compared the method with several state-of-the-art SVM classification techniques capable of inferring multi-target screening models on three chemical data sets (cytochrome P450s, dehydrogenases, and trypsin-like proteases), each containing three different biological targets.

Results: The experiments show that ranking-based algorithms achieve increased performance in single- and multi-target virtual screening. Moreover, compounds that do not completely fulfill the desired activity profile are still ranked higher than decoys or compounds with an entirely undesired profile, compared to other multi-target SVM methods.

Conclusions: SVM-based ranking methods constitute a valuable approach for virtual screening in multi-target drug design. Such methods are most helpful when dealing with compounds with various activity profiles and when many ligands with an already perfectly matching activity profile cannot be expected.

Keywords: Machine learning; Multi-target; Ranking; Support vector machine; Virtual screening.

Figures

**Figure 1**
**Overview of the workflow of different methods for multi-target screening.** The example assumes a data set with a main target T1 and a secondary target T2 which should be avoided by the ligand. SVM$_{\text{Rank}}$ **(a)** and the multi-class SVM **(b)** learn the encoding $s$ of the activity profile. Separate SVM models **(c)** predict the activity of $i$ via $\mathrm{act}(i)$ and infer distinct classes based on the proposed encoding. The SVM with linear combinations **(d)** uses subsets of the data set to build several models that are combined before prediction. The encoding $s$ of the desired activity profile is reflected in the factors $c$ of the linear combinations.

**Figure 2**
**Support vector classification (SVC).** Illustration of an SVC classification function represented by $w^T x$. The slack variables $\xi_i = y_i w^T x$ facilitate the trade-off between the size of the margin (indicated by a gray tube) and the error due to misclassifications. $w$ denotes the weight vector, $y_i$ the label of instance $i$, and $x$ is the feature vector. $\xi_i$ can assume a positive value between 0 and 1 for training instances located in the margin. For instances on the wrong side of the margin, $\xi_i$ is less than 0. Support vectors are indicated by a red ring.
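The regions described in the Figure 2 caption can be illustrated with a minimal sketch. It computes the quantity $y_i w^T x$ from the caption and, for comparison, the hinge-style slack $\max(0, 1 - y_i w^T x)$ used in the common soft-margin formulation. The weight vector, the toy samples, and the function names are assumptions made for illustration and are not taken from the paper.

```python
def functional_margin(w, x, y):
    """Return y * w^T x, the quantity the caption denotes as xi_i."""
    return y * sum(wj * xj for wj, xj in zip(w, x))


def hinge_slack(w, x, y):
    """Slack of the common soft-margin formulation: max(0, 1 - y * w^T x)."""
    return max(0.0, 1.0 - functional_margin(w, x, y))


def margin_region(value):
    """Map y * w^T x to the region described in the caption."""
    if value < 0:
        return "wrong side"
    if value < 1:
        return "inside margin"
    return "on or outside margin"


# Hypothetical weight vector and labeled samples (feature vector, label).
w = [1.0, -1.0]
samples = [([2.0, 0.0], 1), ([0.5, 0.0], 1), ([1.0, 0.0], -1)]

regions = [margin_region(functional_margin(w, x, y)) for x, y in samples]
slacks = [hinge_slack(w, x, y) for x, y in samples]
```

Under these assumptions the first sample lies outside the margin, the second inside it, and the third on the wrong side of the decision boundary.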

**Figure 3**
**Ranking SVM.** The learning algorithm of the ranking SVM yields a weight vector $w$ that minimizes the pairwise loss dependent on the margin when the training instances are projected onto $w$. The overall ranking error is reduced to approximate the given ordering in the training set as effectively as possible along $w$. The principle of margin re-scaling allows for a ranking dependent on the degree of discrepancy in the ranking order, and the pairwise loss is influenced by the k-partite ranking error. Therefore, ranking an instance with score 2 above one with score 4 is penalized with a greater loss than a wrong order of the scores 4 and 3. This is indicated with an increasing margin dependent on the respective scores that are compared with each other.
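The effect of margin re-scaling described in the Figure 3 caption can be sketched as a pairwise hinge loss whose required margin grows with the difference between the two ranking scores. This is a minimal illustration, not the authors' implementation: using the plain score difference as the required margin, and all function and variable names, are assumptions.

```python
def project(w, x):
    """Project a feature vector onto the weight vector w."""
    return sum(wj * xj for wj, xj in zip(w, x))


def pair_loss(label_hi, label_lo, proj_hi, proj_lo):
    """Hinge loss for one pair where label_hi > label_lo.

    The required margin is assumed to be the label difference, so
    confusing distant scores costs more than confusing adjacent ones.
    """
    required_margin = label_hi - label_lo
    return max(0.0, required_margin - (proj_hi - proj_lo))


def total_pairwise_loss(w, X, labels):
    """Sum the re-scaled pairwise loss over all pairs with different labels."""
    total = 0.0
    for i in range(len(X)):
        for j in range(len(X)):
            if labels[i] > labels[j]:
                total += pair_loss(labels[i], labels[j],
                                   project(w, X[i]), project(w, X[j]))
    return total


# Same projection error (wrong order by one unit), different score gaps.
loss_4_vs_2 = pair_loss(4, 2, 0.0, 1.0)
loss_4_vs_3 = pair_loss(4, 3, 0.0, 1.0)

# Toy one-dimensional data: w = [1.0] orders it correctly, w = [-1.0] reverses it.
X = [[3.0], [2.0], [1.0]]
labels = [4, 3, 2]
loss_correct = total_pairwise_loss([1.0], X, labels)
loss_reversed = total_pairwise_loss([-1.0], X, labels)
```

With the same projection error, the pair of scores 4 and 2 incurs a larger loss than the pair 4 and 3, which mirrors the widening margin shown in the figure.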

**Figure 4**
**Composition of the cytochrome P450 and dehydrogenase data sets.** Both Venn diagrams show the composition of the active compounds with respect to their activity profiles in the cytochrome P450 data set (left) and the dehydrogenase data set (right).

**Figure 5**
**Distribution of pK$_i$ values of the trypsin-like protease data set.** This figure shows the distribution of pK$_i$ values of the three trypsin-like protease targets Factor Xa (FXa), Thrombin (Thr), and Trypsin (Try). The initial activity cutoff is drawn as a vertical dashed black line at 6.1.

**Figure 6**
**Performance of the cytochrome P450 and dehydrogenase baseline data sets for single-target activity.** Each boxplot depicts the mean ranking error on the 20 randomly generated test sets for each target. The given ranking error is equal to 1 − AUC.
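The ranking error of 1 − AUC used in the binary evaluations can be sketched as follows. The AUC is computed here as the fraction of active/inactive pairs in which the active compound receives the higher score, with ties counted as half; the function names and the toy scores are assumptions for illustration, not data from the study.

```python
def auc(scores, labels):
    """AUC as the fraction of (active, inactive) pairs ranked correctly.

    labels are 1 for actives and 0 for inactives; ties count as 0.5.
    """
    actives = [s for s, l in zip(scores, labels) if l == 1]
    inactives = [s for s, l in zip(scores, labels) if l == 0]
    if not actives or not inactives:
        raise ValueError("AUC needs at least one active and one inactive")
    wins = 0.0
    for a in actives:
        for n in inactives:
            if a > n:
                wins += 1.0
            elif a == n:
                wins += 0.5
    return wins / (len(actives) * len(inactives))


def ranking_error(scores, labels):
    """Ranking error as used in the captions: 1 - AUC."""
    return 1.0 - auc(scores, labels)


# Hypothetical model scores for four compounds.
scores = [0.9, 0.8, 0.4, 0.3]
labels = [1, 0, 1, 0]
error = ranking_error(scores, labels)
```

In this toy case three of the four active/inactive pairs are ordered correctly, so the AUC is 0.75 and the ranking error is 0.25.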

**Figure 7**
**Performance of the cytochrome P450 and dehydrogenase baseline data sets for dual-target activity.** Each boxplot depicts the mean ranking error on the 20 randomly generated test sets for each target. The given ranking error is equal to 1 − AUC.

**Figure 8**
**Performance of the cytochrome P450 and dehydrogenase single-target data sets with binary test sets.** Each boxplot depicts the mean ranking error on the 20 randomly generated test sets for each target. The given ranking error is equal to 1 − AUC.

**Figure 9**
**Performance of the cytochrome P450 and dehydrogenase single-target data sets with ranking test sets.** Each boxplot depicts the mean ranking error on the 20 randomly generated test sets for each target.

**Figure 10**
**Performance of the cytochrome P450s single-target data set with binary test sets.** Each boxplot depicts the mean ranking error on the 20 randomly generated test sets for each target separated according to the secondary target that should be avoided with higher priority. The given ranking error is equal to 1 − AUC.

**Figure 11**
**Performance of the dehydrogenases single-target data set with binary test sets.** Each boxplot depicts the mean ranking error on the 20 randomly generated test sets for each target separated according to the secondary target that should be avoided with higher priority. The given ranking error is equal to 1 − AUC.

**Figure 12**
**Performance of the cytochrome P450s single-target data set with ranking test sets.** Each boxplot depicts the mean ranking error on the 20 randomly generated test sets for each target separated according to the secondary target that should be avoided with higher priority.

**Figure 13**
**Performance of the dehydrogenases single-target data set with ranking test sets.** Each boxplot depicts the mean ranking error on the 20 randomly generated test sets for each target separated according to the secondary target that should be avoided with higher priority.

**Figure 14**
**Performance of the cytochrome P450 and dehydrogenase dual-target data sets with binary test sets.** Each boxplot depicts the mean ranking error on the 20 randomly generated test sets for each target. The given ranking error is equal to 1 − AUC.

**Figure 15**
**Performance of the cytochrome P450 and dehydrogenase dual-target data sets with ranking test sets.** Each boxplot depicts the mean ranking error on the 20 randomly generated test sets for each target.

**Figure 16**
**Performance of the cytochrome P450s dual-target data set with binary test sets.** Each boxplot depicts the mean ranking error on the 20 randomly generated test sets for each target separated according to which of the two main targets should be regarded with higher priority. The given ranking error is equal to 1 − AUC.

**Figure 17**
**Performance of the dehydrogenases dual-target data set with binary test sets.** Each boxplot depicts the mean ranking error on the 20 randomly generated test sets for each target separated according to which of the two main targets should be regarded with higher priority. The given ranking error is equal to 1 − AUC.

**Figure 18**
**Performance of the cytochrome P450s dual-target data set with ranking test sets.** Each boxplot depicts the mean ranking error on the 20 randomly generated test sets for each target separated according to which of the two main targets should be regarded with higher priority.

**Figure 19**
**Performance of the dehydrogenases dual-target data set with ranking test sets.** Each boxplot depicts the mean ranking error on the 20 randomly generated test sets for each target separated according to which of the two main targets should be regarded with higher priority.

**Figure 20**
**Performance of the trypsin-like protease data set with FXa as main target.** Each boxplot depicts the mean ranking error on the 20 randomly generated test sets for each target separated according to the selected activity cutoff and which of the secondary targets should be avoided with higher priority.
