Front Pharmacol. 2015 Sep 1;6:186.
doi: 10.3389/fphar.2015.00186. eCollection 2015.

Identifying problematic drugs based on the characteristics of their targets

Tiago J S Lopes et al. Front Pharmacol.

Abstract

Identifying promising compounds during the early stages of drug development is a major challenge for both academia and the pharmaceutical industry. The difficulties are even more pronounced when we consider multi-target pharmacology, where the compounds often target more than one protein, or multiple compounds are used together. Here, we address this problem by using machine learning and network analysis to process sequence and interaction data from human proteins to identify promising compounds. We used this strategy to identify properties that make certain proteins more likely to cause harmful effects when targeted; such proteins usually have domains commonly found throughout the human proteome. Additionally, since currently marketed drugs hit multiple targets simultaneously, we combined the information from individual proteins to devise a score that quantifies the likelihood of a compound being harmful to humans. This approach enabled us to distinguish between approved and problematic drugs with an accuracy of 60-70%. Moreover, our approach can be applied as soon as candidate drugs are available, as demonstrated with predictions for more than 5000 experimental drugs. These resources are available at http://sourceforge.net/projects/psin/.

Keywords: drug safety; machine learning; multi-target drugs; protein networks; supervised learning; target validation.

Figures

Figure 1
(A) Nodes with up to ~500 connections are neighbors of proteins with approximately the same degree. After the peak, nodes with higher degrees are connected to other nodes with ~400 connections. Darker tones of blue indicate a higher concentration of nodes with these degree values. (B) Depicted are a few neighbors of notch1, their families, and domain compositions (with the shared domains boxed red). In the PSIN, notch1 is connected to members of the Peptidase S1 family through their shared EGF domain. The proteins from the other three families are connected to each other and to notch1 by their ankyrin domain.
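The relationship in panel (A), between a node's degree and the degree of its neighbors, can be illustrated with a minimal sketch. The function name `mean_neighbor_degree_by_degree` and the toy network are hypothetical and are not taken from the paper; the sketch only assumes an undirected network stored as a dictionary of neighbor sets.

```python
from collections import defaultdict

def mean_neighbor_degree_by_degree(adj):
    """Map each degree k to the mean neighbor degree of nodes with degree k.

    adj: dict mapping node -> set of neighbors (undirected graph).
    """
    per_degree = defaultdict(list)
    for node, neighbors in adj.items():
        if not neighbors:
            continue  # singleton nodes have no neighbors to average over
        k = len(neighbors)
        avg = sum(len(adj[n]) for n in neighbors) / k
        per_degree[k].append(avg)
    return {k: sum(v) / len(v) for k, v in per_degree.items()}

# Hypothetical toy network: one hub linked to three leaves.
toy = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
profile = mean_neighbor_degree_by_degree(toy)
print(profile)
```

In this toy star network the hub (degree 3) has neighbors of degree 1, while each leaf (degree 1) has a neighbor of degree 3. A real similarity network would give one averaged point per observed degree value, which is the kind of curve a degree-versus-neighbor-degree plot summarizes.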
Figure 2
(A) Although most targets of approved drugs are exclusive to that category, the problematic targets are almost entirely covered by the approved category. The numbers in parentheses are the numbers of singleton proteins in the PSIN. (B) Approved and problematic drugs have different numbers of reported targets. While most problematic drugs have only one reported target, approved drugs have several, identified either by the community after the drug is marketed or by companies as part of the drug-approval process. (C) Burt's constraint was proposed in a sociological context to study positions of advantage for individuals in a group. In this simple example, if the nodes are individuals, no node on the left can negotiate or bargain with the others, since they all have alternative connections. On the right, however, a structural hole exists and Node 1 is in a better position, since the other two nodes may not be aware of each other's existence; hence, Node 1 is less “constrained” than the other two. In a protein similarity context, proteins with low constraint values are generally those with several common domains, located between different protein families. In contrast, proteins with large constraint values are the peripheral nodes, with a few domains shared among only a few other proteins.
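The three-node example in panel (C) can be reproduced with a minimal sketch of Burt's constraint for an undirected, unweighted network. This uses the standard textbook definition, in which the constraint of node i sums, over its neighbors j, the squared direct plus indirect investment in j. The function name `burt_constraint` and the two toy graphs are hypothetical, and the paper's own implementation may differ (for example, in how edge weights are handled).

```python
def burt_constraint(adj):
    """Burt's constraint for every node of an undirected, unweighted graph.

    adj: dict mapping node -> set of neighbors.
    Isolated nodes get 0.0 here; that is a simplification of this sketch.
    """
    def p(i, j):
        # Proportion of i's ties invested in j (equal weights assumed).
        return 1.0 / len(adj[i]) if j in adj[i] else 0.0

    result = {}
    for i in adj:
        total = 0.0
        for j in adj[i]:
            # Indirect investment in j through other contacts q of i.
            indirect = sum(p(i, q) * p(q, j) for q in adj[i] if q != j)
            total += (p(i, j) + indirect) ** 2
        result[i] = total
    return result

# Left panel: a closed triangle, every node has an alternative connection.
triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
# Right panel: nodes 2 and 3 are not linked, leaving a structural hole.
hole = {1: {2, 3}, 2: {1}, 3: {1}}

print(burt_constraint(triangle)[1])  # 1.125
print(burt_constraint(hole)[1])      # 0.5
print(burt_constraint(hole)[2])      # 1.0
```

In the triangle, Node 1 has a constraint of 1.125; once the tie between Nodes 2 and 3 is removed, its constraint drops to 0.5 while the other two nodes sit at 1.0, which matches the statement that Node 1 is the least constrained of the three.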
Figure 3
(A–D) In general, targets of problematic drugs have high degrees and closeness centralities in the PSIN and PPI networks. However, their betweenness values are not significantly different from those of the targets of approved drugs in either protein network (one-way ANOVA, ***p << 0.0001 and *p > 0.05; sample sizes for each group are the same as in Figure 2A). The closeness values of the targets in each network clustered around two main values, differing only in their decimal digits; therefore, we rounded the values to the closest integer, namely 17 or 19 in the PSIN and 14 or 18 in the PPI. While three PSIN centrality measures were found to be strong indicators of the differences between targets of problematic and approved drugs, the centrality measures of the PPI network could also detect these differences, albeit in a moderate fashion (Tukey's Honest Significant Difference test; Supplementary Figure 2). Overall, this likely stems from the fact that current PPI networks still cover only ~10,000 proteins and contain numerous false-positive interactions; with new proteins and high-quality interactions being constantly added, we expect this to change in the future.
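As a rough illustration of two of the centrality measures compared here, the sketch below computes degree and closeness on a toy network using breadth-first search. The closeness definition used (number of reachable nodes divided by the sum of shortest-path distances to them) is one common normalized form and is an assumption; the caption's values of 17, 19, 14, and 18 are evidently on a different scale, which this excerpt does not specify. All names and data are hypothetical.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree_and_closeness(adj):
    """Return {node: (degree, closeness)} for an undirected graph."""
    out = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        total = sum(dist.values())   # sum of distances to reachable nodes
        reachable = len(dist) - 1    # exclude the node itself
        closeness = reachable / total if total > 0 else 0.0
        out[node] = (len(adj[node]), closeness)
    return out

# Hypothetical toy network: one hub linked to three leaves.
toy = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
centrality = degree_and_closeness(toy)
print(centrality["hub"], centrality["a"])
```

The hub has both the highest degree and the highest closeness, the same qualitative pattern the caption reports for targets of problematic drugs. Comparing groups of targets would additionally require a statistical test such as the one-way ANOVA named in the caption, which this sketch does not perform.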
Figure 4
(A) The cumulative percentage of approved, experimental, and problematic drugs, according to their rejection scores (RSs) (the complete predictions are available in Supplementary Tables S3, S4). (B) We predicted the status of experimental drugs from the TTD, DrugBank, and ChEMBL databases. In general, more than half of the drugs have high rejection scores, whereas about 20–30% have RSs that position them among the low-risk compounds. Each chart shows the number of drugs in the respective group.
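A cumulative-percentage curve like the one in panel (A) can be built from a list of per-drug scores, as in the minimal sketch below. The score values, the 0 to 1 scale, and the threshold grid are hypothetical placeholders; they are not the paper's rejection scores, and how the RS itself is computed is not shown here.

```python
from bisect import bisect_right

def cumulative_percentage(scores, thresholds):
    """Percentage of drugs whose score is <= each threshold."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    n = len(ordered)
    # bisect_right counts how many sorted scores are <= the threshold.
    return [100.0 * bisect_right(ordered, t) / n for t in thresholds]

# Hypothetical scores on an assumed 0-1 scale (not data from the paper).
approved = [0.1, 0.2, 0.4, 0.9]
problematic = [0.3, 0.7, 0.8, 0.95]
grid = [0.25, 0.5, 0.75, 1.0]

approved_curve = cumulative_percentage(approved, grid)
problematic_curve = cumulative_percentage(problematic, grid)
print(approved_curve)     # [50.0, 75.0, 75.0, 100.0]
print(problematic_curve)  # [0.0, 25.0, 50.0, 100.0]
```

With these made-up values the approved curve rises earlier than the problematic one, which is the shape one would expect if low scores are concentrated among approved drugs.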
