Sci Rep. 2021 Nov 3;11(1):21594.
doi: 10.1038/s41598-021-01099-4.

Explainable machine learning predictions of dual-target compounds reveal characteristic structural features

Christian Feldmann et al. Sci Rep. 2021.

Abstract

Compounds with defined multi-target activity play an increasingly important role in drug discovery. Structural features that might be signatures of such compounds have mostly remained elusive thus far. We have explored the potential of explainable machine learning to uncover structural motifs that are characteristic of dual-target compounds. For a pharmacologically relevant target pair-based test system designed for our study, accurate prediction models were derived and the influence of molecular representation features of test compounds was quantified to explain the predictions. The analysis revealed small numbers of specific features whose presence in dual-target and absence in single-target compounds determined accurate predictions. These features formed coherent substructures in dual-target compounds. From computational analysis of specific feature contributions, structural motifs emerged that were confirmed to be signatures of different dual-target activities. Our findings demonstrate the ability of explainable machine learning to bridge between predictions and intuitive chemical analysis and reveal characteristic substructures of dual-target compounds.

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Shapley value analysis. For a test compound, positive (red) and negative (blue) SV feature contributions yield a probability P of DT activity. In this case, contributions from all but one feature present in the compound are positive. The sum of the base value of the classifier (0.5) and all feature importance values results in a probability of DT activity of 0.98.
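The additivity shown in Figure 1 (base value plus all feature contributions equals the predicted probability) is the efficiency property of Shapley values. The sketch below computes exact Shapley values for a hypothetical four-bit toy classifier by enumerating all feature coalitions. The value function (bits outside a coalition set to a baseline of 0) is an assumption chosen for clarity; SHAP tools use other, approximate schemes, and the toy numbers do not reproduce the 0.98 of the figure.

```python
from itertools import combinations
from math import factorial


def exact_shapley(model, x, baseline):
    """Exact Shapley values for one instance, enumerating all feature coalitions.

    Assumed value function: features outside a coalition take their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(size) * factorial(n - size - 1) / factorial(n)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def toy_model(bits):
    """Hypothetical classifier returning P(DT activity) from four fingerprint bits."""
    return 0.5 + 0.3 * bits[0] + 0.2 * bits[1] * bits[2] - 0.05 * bits[3]


x = [1, 1, 1, 1]          # all four bits on in the hypothetical test compound
baseline = [0, 0, 0, 0]   # reference input, so the base value is toy_model(baseline) = 0.5
phi = exact_shapley(toy_model, x, baseline)
base_value = toy_model(baseline)
print([round(v, 3) for v in phi])       # [0.3, 0.1, 0.1, -0.05]
print(round(base_value + sum(phi), 3))  # 0.95
```

As in the figure, all contributions but one are positive, and base value plus contributions recovers the model output exactly; the interaction term is split evenly between bits 1 and 2.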
Figure 2
Global contributions of present and absent features. For each correctly classified DT- and ST-CPD, the sum of SVs was calculated separately for representation features that were present (bit status on, black) or absent (off, white). (a) shows results for the MAOB-A2aR and (b) for the MAOB-AChE target pair. SV distributions are captured as box plots. The upper and lower whiskers indicate maximum and minimum values, the boundaries of the box represent the upper and lower quartiles, and the median is depicted as a horizontal line.
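The per-compound split underlying Figure 2 can be sketched with NumPy, assuming a compound-by-bit matrix of SVs and a matching binary fingerprint matrix. Both matrices below are randomly generated placeholders, not data from the study; the box statistics follow the caption (whiskers at minimum and maximum rather than a 1.5 × IQR rule).

```python
import numpy as np

# Hypothetical inputs: rows = correctly classified compounds, columns = fingerprint bits
rng = np.random.default_rng(0)
n_compounds, n_bits = 50, 128
fingerprints = rng.integers(0, 2, size=(n_compounds, n_bits))   # 1 = bit on, 0 = off
shap_values = rng.normal(0.0, 0.01, size=(n_compounds, n_bits))  # SV per compound and bit


def present_absent_sums(sv, fp):
    """Per-compound sum of SVs over bits that are on vs. bits that are off."""
    on = fp.astype(bool)
    present = np.where(on, sv, 0.0).sum(axis=1)
    absent = np.where(~on, sv, 0.0).sum(axis=1)
    return present, absent


def box_stats(values):
    """Whiskers at min/max, box at the quartiles, median line (as in the caption)."""
    q1, median, q3 = np.percentile(values, [25, 50, 75])
    return {"min": values.min(), "q1": q1, "median": median, "q3": q3, "max": values.max()}


present, absent = present_absent_sums(shap_values, fingerprints)
print(box_stats(present))
print(box_stats(absent))
```

For plotting, matplotlib's `boxplot` draws whiskers at 1.5 × IQR by default; passing `whis=(0, 100)` places them at the minimum and maximum, matching the caption.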
Figure 3
Feature extraction scheme. On the basis of SVs, the N most important features present in correctly predicted DT-CPDs were pre-selected and the M features occurring most frequently across these compounds were identified and prioritized.
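The two-step scheme in Figure 3 can be sketched in plain Python. The function name, the input layout (per-compound dictionaries of bit → SV and sets of on-bits) and the choice to rank "most important" features by largest positive SV are assumptions for illustration; N and M correspond to `n_top` and `m_top`.

```python
from collections import Counter


def prioritize_features(sv_by_compound, on_bits_by_compound, n_top, m_top):
    """Two-step selection sketched from the caption.

    1. For each correctly predicted DT-CPD, keep the n_top present features with
       the largest SVs (assumption: 'most important' = largest positive SV).
    2. Across these compounds, return the m_top features selected most often.
    """
    counts = Counter()
    for cpd, svs in sv_by_compound.items():
        present = on_bits_by_compound[cpd]
        top = sorted(present, key=lambda bit: svs[bit], reverse=True)[:n_top]
        counts.update(top)
    return [bit for bit, _ in counts.most_common(m_top)]


# Hypothetical SVs (bit -> SV) and on-bits for three DT-CPDs
svs = {
    "A": {1: 0.30, 2: 0.10, 3: 0.05, 4: -0.02},
    "B": {1: 0.20, 3: 0.15, 5: 0.01},
    "C": {2: 0.25, 3: 0.12, 6: 0.30},
}
on_bits = {"A": {1, 2, 3, 4}, "B": {1, 3, 5}, "C": {2, 3, 6}}
print(prioritize_features(svs, on_bits, n_top=2, m_top=2))  # [1, 2]
```

Bits 1 and 2 each reach the per-compound top two in two compounds, so they are prioritized over bits 3 and 6.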
Figure 4
Feature distributions. Boxplots show the number of features per DT-CPD for the (a) MAOB-A2aR and (b) MAOB-AChE target pair.
Figure 5
Distribution of prioritized features. The histogram reports the number of prioritized features in DT-CPDs for the (a) MAOB-A2aR and (b) MAOB-AChE target pair. Predictions are summarized for the two single trials reported in Table 2.
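A histogram like the one in Figure 5 only requires counting, for each DT-CPD, how many prioritized features are present. The sketch below assumes on-bits are stored as sets per compound; the compounds and the prioritized set are hypothetical.

```python
from collections import Counter


def prioritized_feature_histogram(on_bits_by_compound, prioritized):
    """Count prioritized features per DT-CPD, then tally how many compounds
    contain 0, 1, 2, ... prioritized features."""
    prio = set(prioritized)
    per_compound = {cpd: len(prio & set(bits)) for cpd, bits in on_bits_by_compound.items()}
    histogram = Counter(per_compound.values())
    return per_compound, dict(sorted(histogram.items()))


# Hypothetical on-bits for five DT-CPDs and a hypothetical prioritized set
on_bits = {"A": {1, 2, 3}, "B": {1, 5}, "C": {7, 8}, "D": {2, 3, 9}, "E": {1, 2}}
per_cpd, hist = prioritized_feature_histogram(on_bits, prioritized=[1, 2, 3])
print(per_cpd)  # {'A': 3, 'B': 1, 'C': 0, 'D': 2, 'E': 2}
print(hist)     # {0: 1, 1: 1, 2: 2, 3: 1}
```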
Figure 6
Feature mapping onto dual-target compounds from the first pair. Prioritized features are mapped onto the structures of MAOB-A2aR DT-CPDs. Atoms are color-coded according to the number of features containing them, as indicated by the inset at the bottom of (d). The color code ranges from light yellow for atoms contained in one feature to dark red for atoms contained in seven features. Features determining the prediction of the compounds in (a) and (b) delineate a caffeine substructure, while features contained in the compounds in (c) and (d) define a thiazine moiety.
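The atom coloring in Figures 6 and 8 amounts to counting, for each atom, how many mapped features contain it and converting that count to a color. The sketch assumes each prioritized feature has already been mapped to a set of atom indices; the RGB endpoints of the yellow-to-red ramp are assumed values, not those used for the figures.

```python
def atom_feature_counts(feature_atoms, n_atoms):
    """Number of mapped features that contain each atom (list indexed by atom)."""
    counts = [0] * n_atoms
    for atoms in feature_atoms:
        for idx in atoms:
            counts[idx] += 1
    return counts


def count_to_rgb(count, max_count, light=(1.0, 1.0, 0.75), dark=(0.55, 0.0, 0.0)):
    """Linear ramp from light yellow (1 feature) to dark red (max_count features).
    Atoms in no feature get None (no highlight). RGB endpoints are assumptions."""
    if count == 0:
        return None
    t = 0.0 if max_count <= 1 else (count - 1) / (max_count - 1)
    return tuple(round(lo + t * (hi - lo), 3) for lo, hi in zip(light, dark))


# Hypothetical: three prioritized features already mapped to atom-index sets
features = [{0, 1, 2}, {1, 2, 3}, {2, 3, 4}]
counts = atom_feature_counts(features, n_atoms=6)
colors = {i: count_to_rgb(c, max(counts)) for i, c in enumerate(counts) if c > 0}
print(counts)  # [1, 2, 3, 2, 1, 0]
print(colors)
```

A dictionary of this shape (atom index → RGB tuple) could, for example, be passed as `highlightAtomColors` together with `highlightAtoms=list(colors)` to the `DrawMolecule` method of an RDKit `rdMolDraw2D` drawer.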
Figure 7
Contributions of caffeine-delineating features and others. For caffeine-containing DT-CPDs, cumulative SV contributions of features defining the caffeine moiety (green), features mapping elsewhere in the compound (blue), and absent features (red) are reported. The height of each bar corresponds to the sum of feature SVs per compound.
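The three bar segments of Figure 7 can be sketched as a partition of one compound's SVs. Here a present feature is assumed to "define" the moiety when all of its atoms fall inside the moiety's atom set; the study's exact criterion is not stated in the caption, and all atom indices, bits and SVs below are hypothetical.

```python
def split_sv_contributions(svs, on_bits, feature_atoms, moiety_atoms):
    """Split one compound's SVs into three stacked-bar segments:
    present features inside the moiety, present features elsewhere,
    and absent features. The segments sum to the bar height (total SV)."""
    parts = {"moiety": 0.0, "elsewhere": 0.0, "absent": 0.0}
    for bit, sv in svs.items():
        if bit not in on_bits:
            parts["absent"] += sv
        elif feature_atoms[bit] <= moiety_atoms:  # all atoms inside the moiety
            parts["moiety"] += sv
        else:
            parts["elsewhere"] += sv
    return parts


# Hypothetical compound: atoms 0-13 form the caffeine moiety (14 heavy atoms)
moiety_atoms = set(range(14))
feature_atoms = {10: {0, 1, 2}, 11: {5, 6}, 12: {13, 14, 15}, 13: {20, 21}}
svs = {10: 0.12, 11: 0.08, 12: 0.03, 13: 0.02, 99: -0.01}  # bit 99 is off
parts = split_sv_contributions(svs, on_bits={10, 11, 12, 13},
                               feature_atoms=feature_atoms, moiety_atoms=moiety_atoms)
print(parts)
```

In practice, the moiety's atom indices could come from a substructure search such as RDKit's `Mol.GetSubstructMatch`, which returns a tuple of matching atom indices.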
Figure 8
Feature mapping onto dual-target compounds from the second pair. Prioritized features are mapped onto MAOB-AChE DT-CPDs using the same representation as in Fig. 6. Atoms are color-coded according to the number of features containing them, as indicated by the inset at the bottom of (b). The color code ranges from light yellow for atoms contained in one feature to dark red for atoms contained in six features. Features determining the prediction of the compounds in (a) and (b) mostly delineate a coumarin substructure and an acrylamide linker fragment, respectively.
