PLoS One. 2018 Oct 4;13(10):e0204999.
doi: 10.1371/journal.pone.0204999. eCollection 2018.

Efficient multi-task chemogenomics for drug specificity prediction

Benoit Playe et al. PLoS One.

Abstract

Adverse drug reactions, also called side effects, range from mild to fatal clinical events and significantly affect the quality of care. Among other causes, side effects occur when drugs bind to proteins other than their intended target. As experimentally testing drug specificity against the entire proteome is out of reach, we investigate the application of chemogenomics approaches. We formulate the study of drug specificity as a problem of predicting interactions between drugs and proteins at the proteome scale. We build several benchmark datasets and propose NN-MT, a multi-task Support Vector Machine (SVM) algorithm that is trained on a limited number of data points, in order to solve the computational issues of proteome-wide SVM for chemogenomics. We compare NN-MT to different state-of-the-art methods and show that its prediction performance is similar or better, at an efficient calculation cost. Compared to its competitors, the proposed method is particularly effective at predicting (protein, ligand) interactions in the difficult double-orphan case, i.e. when no interactions are previously known for either the protein or the ligand. The NN-MT algorithm appears to be a good default method, providing state-of-the-art or better performance in the wide range of prediction scenarios considered in the present study: proteome-wide prediction, protein family prediction, test (protein, ligand) pairs dissimilar to pairs in the train set, and orphan cases.
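
The double-orphan case defined in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the function name `orphan_status`, the identifiers, and the labels "protein-orphan", "ligand-orphan" and "non-orphan" are assumptions introduced here for illustration; only the double-orphan definition comes from the abstract.

```python
from typing import Iterable, Tuple

# A pair is (protein_id, ligand_id); the identifiers below are hypothetical.
Pair = Tuple[str, str]


def orphan_status(pair: Pair, known_interactions: Iterable[Pair]) -> str:
    """Classify a test pair by what is already known about its members.

    "double-orphan" follows the abstract's definition: no interaction is
    known for the protein nor for the ligand. The other three labels are
    illustrative names, not terminology confirmed by the source.
    """
    known = list(known_interactions)
    proteins_seen = {protein for protein, _ in known}
    ligands_seen = {ligand for _, ligand in known}

    protein, ligand = pair
    protein_is_orphan = protein not in proteins_seen
    ligand_is_orphan = ligand not in ligands_seen

    if protein_is_orphan and ligand_is_orphan:
        return "double-orphan"
    if protein_is_orphan:
        return "protein-orphan"
    if ligand_is_orphan:
        return "ligand-orphan"
    return "non-orphan"


# Toy set of known interactions (made-up identifiers, not real data).
known = [("P1", "L1"), ("P1", "L2"), ("P2", "L1")]

for test_pair in [("P3", "L9"), ("P3", "L1"), ("P1", "L9"), ("P2", "L2")]:
    print(test_pair, orphan_status(test_pair, known))
```

In this toy example, the pair ("P2", "L2") is not itself a known interaction, but both of its members appear in other known pairs, so it is not an orphan case.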

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1. Nested 5-fold-CV performance of the MT method on the S1-S4 datasets.
Numerical values can be found in Supporting Information S1 Table.
Fig 2. Nested 5-fold-CV performance of the MT method on the S1-S4 datasets.
Numerical values can be found in Supporting Information S2 Table.
Fig 3. Performance of single-task and MT-intra as a function of the n/n+ ratio.
Numerical values can be found in Supporting Information S4 Table.
Fig 4. AUPR as a function of the ne-/ne+ ratio for increasing numbers of random extra-task points in the train set.
(a): NN-MT. (b): RN-MT. The blue horizontal line corresponds to MT-intra (which is trained only on intra-task pairs). Numerical values can be found in Supporting Information S5 and S6 Tables.
Fig 5. AUPR scores as a function of the n/n+ ratio, for percentile-based threshold θ ranging from 20% to 80%.
(a): MT-intra method. (b): ligand-based ST method. Numerical values can be found in Supporting Information S7 and S8 Tables, respectively.
Fig 6. AUPR score of NN-MT and RN-MT as a function of the ne-/ne+ ratio, for a number of extra-task positive pairs ne+ varying from 0 to 50, and for percentile-based similarity thresholds θ of 20 and 80 applied to the intra-task positive pairs.
(a): NN-MT, θ = 0.20. (b): NN-MT, θ = 0.80. (c): RN-MT, θ = 0.20. (d): RN-MT, θ = 0.80. Numerical values can be found in Supporting Information S9–S12 Tables, respectively.
Fig 7. AUPR score as a function of percentile-based similarity θ, for n/n+ = 10, a number of extra-task positive pairs ne+ = 10 and a ratio of ne-/ne+ = 1 for extra-task pairs.
Numerical values can be found in Supporting Information S13 Table.
Fig 8. AUPR scores of the NN-MT and RN-MT multi-task methods as a function of the ne-/ne+ ratio, for a number of extra-task positive pairs ne+ varying from 1 to 50.
(a): NN-MT, θ = 0.20. (b): NN-MT, θ = 0.80. (c): RN-MT, θ = 0.20. (d): RN-MT, θ = 0.80. The two methods are trained with intra-task and extra-task examples that are both dissimilar to the tested pair (percentile-based similarity thresholds θ of 20 and 80). Numerical values can be found in Supporting Information S14–S17 Tables, respectively.
Fig 9. AUPR score of the considered multi-task methods on the GPCR family as a function of the ne-/ne+ ratio, for a varying number ne+ of extra-task positive pairs.
(a): NN-MT-family (family hierarchy kernel). (b): NN-MT (sequence kernel). (c): RN-MT-family (family hierarchy kernel). (d): RN-MT (sequence kernel). The blue horizontal line corresponds to the MT-intra method trained only on intra-task pairs. Numerical values can be found in Supporting Information S18–S21 Tables, respectively.
Fig 10. AUPR of the multi-task methods on the IC family.
(a): NN-MT-family (family hierarchy kernel). (b): NN-MT (sequence kernel). (c): RN-MT-family (family hierarchy kernel). (d): RN-MT (sequence kernel). The blue horizontal line corresponds to the MT-intra method trained only on intra-task pairs. Numerical values can be found in Supporting Information S22–S25 Tables, respectively.
Fig 11. AUPR score of the multi-task methods within the kinase family.
(a): NN-MT-family (family hierarchy kernel). (b): NN-MT (sequence kernel). (c): RN-MT-family (family hierarchy kernel). (d): RN-MT (sequence kernel). The blue horizontal line corresponds to the MT-intra method trained only on intra-task pairs. Numerical values can be found in Supporting Information S26–S29 Tables, respectively.

