Identifying problematic drugs based on the characteristics of their targets

Tiago J S Lopes¹, Jason E Shoemaker², Yukiko Matsuoka³, Yoshihiro Kawaoka¹, Hiroaki Kitano⁴

Affiliations

¹ Japan Science and Technology Agency ERATO Kawaoka Infection-Induced Host Responses Project, Minato-ku, Japan; Department of Pathobiological Sciences, School of Veterinary Medicine, Influenza Research Institute, University of Wisconsin-Madison, Madison, WI, USA; Division of Virology, Department of Microbiology and Immunology, Institute of Medical Science, University of Tokyo, Tokyo, Japan.
² Japan Science and Technology Agency ERATO Kawaoka Infection-Induced Host Responses Project, Minato-ku, Japan; Division of Virology, Department of Microbiology and Immunology, Institute of Medical Science, University of Tokyo, Tokyo, Japan.
³ Japan Science and Technology Agency ERATO Kawaoka Infection-Induced Host Responses Project, Minato-ku, Japan; The Systems Biology Institute, Tokyo, Japan.
⁴ Japan Science and Technology Agency ERATO Kawaoka Infection-Induced Host Responses Project, Minato-ku, Japan; The Systems Biology Institute, Tokyo, Japan; Sony Computer Science Laboratories, Inc., Tokyo, Japan; Integrated Open Systems Unit, Okinawa Institute of Science and Technology, Okinawa, Japan; Laboratory for Disease Systems Modeling, RIKEN Center for Integrative Medical Sciences, Yokohama, Japan.

PMID: 26388775
PMCID: PMC4555035
DOI: 10.3389/fphar.2015.00186

Tiago J S Lopes et al. Front Pharmacol. 2015 Sep 1;6:186. doi: 10.3389/fphar.2015.00186. eCollection 2015.

Abstract

Identifying promising compounds during the early stages of drug development is a major challenge for both academia and the pharmaceutical industry. The difficulties are even more pronounced when we consider multi-target pharmacology, where the compounds often target more than one protein, or multiple compounds are used together. Here, we address this problem by using machine learning and network analysis to process sequence and interaction data from human proteins to identify promising compounds. We used this strategy to identify properties that make certain proteins more likely to cause harmful effects when targeted; such proteins usually have domains commonly found throughout the human proteome. Additionally, since currently marketed drugs hit multiple targets simultaneously, we combined the information from individual proteins to devise a score that quantifies the likelihood of a compound being harmful to humans. This approach enabled us to distinguish between approved and problematic drugs with an accuracy of 60-70%. Moreover, our approach can be applied as soon as candidate drugs are available, as demonstrated with predictions for more than 5000 experimental drugs. These resources are available at http://sourceforge.net/projects/psin/.

Keywords: drug safety; machine learning; multi-target drugs; protein networks; supervised learning; target validation.
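
The abstract describes combining per-target information into a single drug-level score. The sketch below illustrates that idea only. The per-target risk values, the protein and drug names, and the use of a plain average are all assumptions made for illustration; the abstract does not give the authors' formula.

```python
from statistics import mean

# Hypothetical per-target probabilities of being a "problematic" target,
# e.g. the output of any trained classifier. Values are made up.
target_risk = {
    "P1": 0.82,
    "P2": 0.35,
    "P3": 0.10,
}


def drug_score(targets, risk, default=None):
    """Average the per-target risk over a drug's known targets.

    Averaging is an assumption of this sketch, not the paper's formula.
    Targets without a risk estimate are skipped; if none remain,
    `default` is returned.
    """
    known = [risk[t] for t in targets if t in risk]
    return mean(known) if known else default


# Hypothetical drugs and their reported targets.
drugs = {"drug_a": ["P1", "P2"], "drug_b": ["P3"], "drug_c": ["P9"]}
scores = {name: drug_score(ts, target_risk) for name, ts in drugs.items()}
```

A drug with no scored targets (`drug_c` here) gets no score rather than a misleading zero, which matters because problematic drugs often have few reported targets.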

Figures

**Figure 1**
**(A)** Nodes with up to ~500 connections are neighbors of proteins with approximately the same degree. After the peak, nodes with higher degrees are connected to other nodes with ~400 connections. Darker tones of blue indicate a higher concentration of nodes with these degree values. **(B)** A few neighbors of notch1, their families, and their domain compositions (shared domains are boxed in red). In the PSIN, notch1 is connected to members of the Peptidase S1 family through their shared EGF domain. The proteins from the other three families are connected to each other and to notch1 by their ankyrin domain.
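
The connection pattern in panel B can be sketched with a toy network in which two proteins are linked whenever they share a domain. This is a simplification: the explicit domain sets stand in for the sequence data the paper processes, and every protein name except notch1 is a placeholder.

```python
from collections import defaultdict
from itertools import combinations

# Toy domain compositions (assumed for illustration only).
domains = {
    "notch1": {"EGF", "ankyrin"},
    "peptidase_s1_member": {"EGF", "trypsin"},
    "ankyrin_family_member": {"ankyrin"},
    "unrelated_protein": {"kinase"},
}


def shared_domain_network(domains):
    """Return {(protein_a, protein_b): shared_domains} for every pair
    of proteins that has at least one domain in common."""
    edges = {}
    for a, b in combinations(sorted(domains), 2):
        shared = domains[a] & domains[b]
        if shared:
            edges[(a, b)] = shared
    return edges


edges = shared_domain_network(domains)

# Degree = number of neighbours; proteins absent from `degree`
# are singletons (no shared domain with any other protein).
degree = defaultdict(int)
for a, b in edges:
    degree[a] += 1
    degree[b] += 1
```

In this toy example notch1 ends up with two neighbours, one through EGF and one through ankyrin, while `unrelated_protein` remains a singleton.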

**Figure 2**
**(A)** Although most targets of approved drugs are exclusive to that category, the problematic targets are almost entirely covered by the approved category. The number of singleton proteins in the PSIN is given in parentheses. **(B)** Approved and problematic drugs have different numbers of reported targets. While most problematic drugs have only one reported target, approved drugs have several, identified either by the community after the drug is marketed or by companies as part of the drug-approval process. **(C)** Burt's constraint was proposed in a sociological context to study positions of advantage for individuals in a group. In this simple example, if the nodes are individuals, no node on the left can negotiate or bargain with the others, since they all have alternative connections. On the right, however, a structural hole exists and Node 1 is in a better position, since the other two nodes may not be aware of each other's existence; hence, Node 1 is less “constrained” than the other two. In a protein similarity context, proteins with low constraint values are generally those with several common domains, located between different protein families. In contrast, proteins with large constraint values are the peripheral nodes, with a few domains shared among only a few other proteins.
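
The three-node example in panel C can be reproduced numerically. The sketch below uses the standard definition of Burt's constraint for an unweighted, undirected graph, where each node spreads its ties equally over its neighbours. The two toy graphs are assumptions that mirror the caption's description, not data from the paper.

```python
def burt_constraint(adj, node):
    """Burt's constraint of `node` in an unweighted, undirected graph.

    `adj` maps each node to the set of its neighbours.
    p(i, j) is the share of i's ties invested in j: 1/deg(i) if i and j
    are adjacent, otherwise 0. The constraint is
        sum_j ( p(i, j) + sum_{q != i, j} p(i, q) * p(q, j) ) ** 2
    taken over the neighbours j of i. Isolated nodes return 0.0 here.
    """
    def p(i, j):
        return 1.0 / len(adj[i]) if j in adj[i] else 0.0

    total = 0.0
    for j in adj[node]:
        # Indirect investment in j through a mutual contact q.
        indirect = sum(p(node, q) * p(q, j) for q in adj[node] if q != j)
        total += (p(node, j) + indirect) ** 2
    return total


# Left of the panel: every node is connected to every other node.
closed = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
# Right of the panel: nodes 2 and 3 are not connected (structural hole).
hole = {1: {2, 3}, 2: {1}, 3: {1}}

print(burt_constraint(closed, 1))  # 1.125
print(burt_constraint(hole, 1))    # 0.5
print(burt_constraint(hole, 2))    # 1.0
```

With the structural hole, Node 1 drops from 1.125 to 0.5 while the two peripheral nodes sit at 1.0, matching the caption's statement that Node 1 is the least constrained.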

**Figure 3**
**(A–D)** In general, targets of problematic drugs have high degrees and closeness centralities in the PSIN and PPI networks. However, their betweenness values are not significantly different from those of the targets of approved drugs in either protein network (one-way ANOVA, \*\*\*p ≪ 0.0001 and \*p > 0.05; sample sizes for each group are the same as in Figure 2A). The closeness values of the targets in both networks clustered around two main values, differing only in their decimal digits; therefore, we rounded the values to the closest integer, namely 17 or 19 in the PSIN and 14 or 18 in the PPI. While three PSIN centrality measures were found to be strong indicators of the differences between targets of problematic and approved drugs, the centrality measures of the PPI network could also detect these differences, albeit in a moderate fashion (Tukey's Honest Significant Difference test, Supplementary Figure 2). Overall, this likely stems from the fact that current PPI networks still contain only ~10,000 proteins and numerous false-positive interactions; with new proteins and high-quality interactions being constantly added, we expect this to change in the future.
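
For readers unfamiliar with the centrality measures compared here, the following sketch computes degree and closeness on a small toy graph. It assumes the common normalized definition of closeness (number of reachable nodes divided by the sum of shortest-path distances); the values reported in the caption are on a different scale, which this sketch does not attempt to reproduce. The graph and node names are invented.

```python
from collections import deque


def bfs_distances(adj, source):
    """Shortest-path length (in edges) from `source` to every
    reachable node of an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def closeness(adj, node):
    """Reachable nodes divided by the sum of their distances.
    Returns 0.0 for an isolated node."""
    dist = bfs_distances(adj, node)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Toy undirected graph: a hub with three neighbours and one pendant chain.
adj = {
    "hub": {"a", "b", "c"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub", "d"},
    "d": {"c"},
}

degree = {n: len(neigh) for n, neigh in adj.items()}
close = {n: closeness(adj, n) for n in adj}
```

The hub has both the highest degree and the highest closeness, while the peripheral node `d` scores lowest on both, which is the kind of contrast the panels draw between target groups.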

**Figure 4**
**(A)** The cumulative percentage of approved, experimental, and problematic drugs according to their rejection scores (RSs); the complete predictions are available in Supplementary Tables S3 and S4. **(B)** We predicted the status of experimental drugs from the TTD, DrugBank, and ChEMBL databases. In general, more than half of the drugs have high rejection scores, whereas about 20–30% have RSs that position them among the low-risk compounds. Each chart shows the number of drugs in the respective group.
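
A cumulative-percentage curve like the one in panel A can be computed as follows. This is a minimal sketch: the scores, the thresholds, and the assumption that rejection scores lie between 0 and 1 are all illustrative and not taken from the paper.

```python
def cumulative_percentage(scores, thresholds):
    """For each threshold, the percentage of scores at or below it."""
    n = len(scores)
    return [100.0 * sum(s <= t for s in scores) / n for t in thresholds]


# Made-up rejection scores for two drug groups.
rs = {
    "approved": [0.1, 0.2, 0.4, 0.7],
    "problematic": [0.5, 0.8, 0.9, 0.95],
}
thresholds = [0.25, 0.5, 0.75, 1.0]

curves = {group: cumulative_percentage(values, thresholds)
          for group, values in rs.items()}

print(curves["approved"])     # [50.0, 75.0, 100.0, 100.0]
print(curves["problematic"])  # [0.0, 25.0, 25.0, 100.0]
```

A group whose curve rises early is concentrated at low scores; a curve that stays flat until the high thresholds indicates a group dominated by high rejection scores.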

Cited by

Network-Guided Discovery of Influenza Virus Replication Host Factors.
Ackerman EE, Kawakami E, Katoh M, Watanabe T, Watanabe S, Tomita Y, Lopes TJ, Matsuoka Y, Kitano H, Shoemaker JE, Kawaoka Y. mBio. 2018 Dec 18;9(6):e02002-18. doi: 10.1128/mBio.02002-18. PMID: 30563907. Free PMC article.
A dual controllability analysis of influenza virus-host protein-protein interaction networks for antiviral drug target discovery.
Ackerman EE, Alcorn JF, Hase T, Shoemaker JE. BMC Bioinformatics. 2019 Jun 3;20(1):297. doi: 10.1186/s12859-019-2917-z. PMID: 31159726. Free PMC article.
Topological network measures for drug repositioning.
Badkas A, De Landtsheer S, Sauter T. Brief Bioinform. 2021 Jul 20;22(4):bbaa357. doi: 10.1093/bib/bbaa357. PMID: 33348366. Free PMC article.
TREAP: A New Topological Approach to Drug Target Inference.
Wang M, Luciani LL, Noh H, Mochan E, Shoemaker JE. Biophys J. 2020 Dec 1;119(11):2290-2298. doi: 10.1016/j.bpj.2020.10.021. Epub 2020 Oct 29. PMID: 33129831. Free PMC article.
