PLoS Comput Biol. 2014 Apr 24;10(4):e1003589.
doi: 10.1371/journal.pcbi.1003589. eCollection 2014 Apr.

Knowledge-based fragment binding prediction

Grace W Tang et al. PLoS Comput Biol.

Abstract

Target-based drug discovery must assess many drug-like compounds for potential activity. Focusing on low-molecular-weight compounds (fragments) can dramatically reduce the chemical search space. However, approaches for determining protein-fragment interactions have limitations. Experimental assays are time-consuming, expensive, and not always applicable, while computational approaches using physics-based methods have limited accuracy. With increasing high-resolution structural data for protein-ligand complexes, there is now an opportunity for data-driven approaches to fragment binding prediction. We present FragFEATURE, a machine learning approach to predict the small molecule fragments preferred by a target protein structure. We first create a knowledge base of protein structural environments annotated with the small molecule substructures they bind. These substructures have low molecular weight and serve as proxies for fragments. FragFEATURE then compares the structural environments within a target protein to those in the knowledge base to retrieve statistically preferred fragments. It merges information across diverse ligands with shared substructures to generate predictions. Our results demonstrate FragFEATURE's ability to rediscover fragments corresponding to the bound ligand with, on average, 74% precision and 82% recall. For many protein targets, it identifies high-scoring fragments that are substructures of known inhibitors. FragFEATURE thus predicts fragments that can serve as inputs to fragment-based drug design or as refinement criteria for creating target-specific compound libraries for experimental or computational screening.
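
The comparison step in the abstract can be illustrated with a minimal nearest-neighbor sketch. This is not the authors' implementation: the Euclidean distance, the entry layout (`pdb_id`, `vector`, `fragments`), and the `is_homologous` callback are all assumptions made for illustration. Only the choice of five non-homologous neighbors comes from the Figure 2 caption.

```python
import heapq
from math import dist

def nearest_neighbors(query_vec, knowledge_base, is_homologous, k=5):
    """Return the k knowledge-base entries closest to query_vec.

    knowledge_base: list of dicts with keys "pdb_id", "vector", "fragments"
                    (hypothetical layout, not the paper's data format).
    is_homologous:  hypothetical callback; True if the entry's protein is
                    homologous to the query and must be excluded.
    Euclidean distance is an assumed similarity measure.
    """
    candidates = (e for e in knowledge_base if not is_homologous(e["pdb_id"]))
    return heapq.nsmallest(k, candidates, key=lambda e: dist(query_vec, e["vector"]))

# Toy data: three entries, one of which is treated as homologous to the query.
toy_kb = [
    {"pdb_id": "AAAA", "vector": (0.0, 1.0), "fragments": {1}},
    {"pdb_id": "BBBB", "vector": (0.1, 1.0), "fragments": {2}},
    {"pdb_id": "CCCC", "vector": (5.0, 5.0), "fragments": {3}},
]
hits = nearest_neighbors((0.0, 1.0), toy_kb, lambda pdb_id: pdb_id == "AAAA", k=2)
hit_ids = [h["pdb_id"] for h in hits]
```

Filtering homologs before ranking, rather than after, keeps the neighbor count at k even when the closest entries come from related proteins.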

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Knowledge base of protein microenvironments linked to ligand fragments.
For each protein-ligand complex from the PDB, we identify residue atoms interacting with the ligand and note the ligand atoms proximal to them (semi-transparent shaded regions) (top). Next, the FEATURE microenvironments of the residue atoms are calculated (semi-transparent circles) (center). We then map ligand atoms to their pre-computed fragment lists and link them to their proximal microenvironments to form the knowledge base (bottom).
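
The three steps in this caption can be sketched as a small data-building routine. It is a sketch under stated assumptions, not the published pipeline: the 4.0 Å `cutoff`, the input dictionary layout, and the `featurize` callback (a stand-in for the FEATURE microenvironment calculation) are hypothetical, and the toy coordinates and fragment assignments are invented.

```python
from math import dist

def build_knowledge_base(complexes, featurize, cutoff=4.0):
    """Link residue-atom microenvironments to the fragments of nearby ligand atoms.

    complexes: iterable of dicts (hypothetical layout) with keys
        "pdb_id", "residue_atoms" {id: xyz}, "ligand_atoms" {id: xyz},
        "atom_fragments" {ligand_atom_id: set of fragment IDs}.
    featurize: hypothetical stand-in for the FEATURE calculation.
    cutoff:    assumed interaction distance in angstroms.
    """
    knowledge_base = []
    for cplx in complexes:
        for res_xyz in cplx["residue_atoms"].values():
            # Step 1: ligand atoms proximal to this residue atom.
            proximal = [lig_id for lig_id, lig_xyz in cplx["ligand_atoms"].items()
                        if dist(res_xyz, lig_xyz) <= cutoff]
            if not proximal:
                continue  # residue atom does not contact the ligand
            # Step 3: union of the pre-computed fragment lists of those atoms.
            fragments = set()
            for lig_id in proximal:
                fragments |= cplx["atom_fragments"].get(lig_id, set())
            # Step 2: describe the residue atom's microenvironment.
            knowledge_base.append({"pdb_id": cplx["pdb_id"],
                                   "vector": featurize(res_xyz),
                                   "fragments": frozenset(fragments)})
    return knowledge_base

# Toy complex: one residue atom contacts the ligand, the other is far away.
toy_complex = {
    "pdb_id": "TOY1",
    "residue_atoms": {"ASP25:OD1": (0.0, 0.0, 0.0), "GLY80:CA": (20.0, 0.0, 0.0)},
    "ligand_atoms": {"C1": (3.0, 0.0, 0.0), "N2": (0.0, 3.5, 0.0)},
    "atom_fragments": {"C1": {2331}, "N2": {2331, 241}},
}
kb = build_knowledge_base([toy_complex], featurize=lambda xyz: xyz)
```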
Figure 2. FragFEATURE predicts fragments for a protein pocket of interest.
Given a pocket of interest as a series of microenvironments (semi-transparent circles), we compare each microenvironment to knowledge base microenvironments of the same type to retrieve the five most similar non-homologous neighbors. Each neighbor has a list of bound fragments for which a hypergeometric p-value is determined. For spatially proximal microenvironments (orange, blue, and magenta circles), we combine fragment hypergeometric p-values for shared fragments to generate Fisher's p-values. Denoted with an asterisk are statistically significant fragments with p-value(**)
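
The two statistics named in this caption can be sketched with the Python standard library alone. The definition of the hypergeometric population is an assumption here: N knowledge-base microenvironments of the query's type, K of them annotated with the fragment, n retrieved neighbors, and k neighbors annotated with the fragment. The caption does not spell these out. Fisher's method is the standard one: the statistic -2 Σ ln p follows a chi-square distribution with 2m degrees of freedom, which has a closed-form tail for even degrees of freedom.

```python
from math import comb, exp, log

def hypergeom_pvalue(k, n, K, N):
    """Upper-tail probability P(X >= k) for a hypergeometric draw.

    Assumed meaning: n neighbors drawn from N microenvironments,
    K of which are annotated with the fragment; k neighbors carry it.
    """
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def fisher_combined_pvalue(pvalues):
    """Combine m independent p-values with Fisher's method.

    Uses the closed-form chi-square survival function for 2m degrees
    of freedom: exp(-x/2) * sum_{i<m} (x/2)^i / i!.
    Every p-value must be greater than zero.
    """
    m = len(pvalues)
    half_stat = -sum(log(p) for p in pvalues)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, m):
        term *= half_stat / i
        total += term
    return exp(-half_stat) * total

# Toy numbers: 3 of 5 neighbors carry a fragment found on 10 of 1000 entries,
# observed at three spatially proximal microenvironments.
p_single = hypergeom_pvalue(3, 5, 10, 1000)
p_combined = fisher_combined_pvalue([p_single, p_single, p_single])
```

Fisher's method assumes the combined p-values are independent, which spatially proximal microenvironments may only approximate.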
Figure 3. FragFEATURE performance on the validation ligands.
A) Chemical structure (heavy atoms) of each validation ligand. B) FragFEATURE recall and precision on each validation ligand.
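
Recall and precision as reported in panel B can be computed from two sets of fragment IDs. The sketch below uses the standard definitions; treating the "true" set as the fragments that are substructures of the bound validation ligand is an assumption, and the IDs in the example are invented.

```python
def precision_recall(predicted, true):
    """Return (precision, recall) for predicted versus true fragment IDs."""
    predicted, true = set(predicted), set(true)
    true_positives = len(predicted & true)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(true) if true else 0.0
    return precision, recall

# Toy example: four predicted fragments, two of which occur in the ligand.
precision, recall = precision_recall({1, 2, 3, 4}, {2, 3, 5})
```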
Figure 4. Fragment prediction and validation for exotoxin A.
A) Fragment 2331 (benzamide) and the microenvironments from the query exotoxin A structure associated with the fragment prediction. B) PDB ligand P34 and an alternate structure of exotoxin A bound to P34. The benzamide substructure of P34 is in pink. C) Example nearest neighbor microenvironments. The benzamide substructure of the bound ligands is in pink. The percent sequence identity between each knowledge base structure and exotoxin A is in parentheses. 1UK0, 3C49, 3KCZ, 3GEY, and 3HKV are members of the poly [ADP-ribose] polymerase superfamily while 3KI0 is cholix toxin. Proteins are shown in cartoon representation with microenvironments as semi-transparent spheres. Microenvironment color scheme is arbitrary but consistent between panels. Side chains corresponding to microenvironments are shown in stick representation. Ligands are also drawn in stick representation.
Figure 5. Fragment prediction and validation for DAPK1.
A) Fragment 13509097 and the microenvironments from the query DAPK1 structure associated with the fragment prediction. B) Fragment 2331 (benzamide) and the microenvironments from the query DAPK1 structure associated with the fragment prediction. C) PDB ligand STU and an alternate structure of DAPK1 bound to STU. Fragment 13509097 and 2331 substructures of STU are in pink. Proteins are shown in cartoon representation with microenvironments as semi-transparent spheres. Microenvironment color scheme is arbitrary but consistent between panels. Side chains corresponding to microenvironments are shown in stick representation. Ligands are also drawn in stick representation.
Figure 6. Fragment prediction and validation for aPKC.
A) Fragment 1049 and the microenvironments from the query aPKC structure associated with the fragment prediction. B) PDB ligand C58 and an alternate structure of aPKC bound to C58. Fragment 1049 substructure of C58 is in pink. C) Example nearest neighbor microenvironment from GSK3β. Fragment 1049 of the bound PDB ligand 0KD is in pink. The percent sequence identity between GSK3β and aPKC is in parentheses. D) Fragment 241 and the microenvironments from the query aPKC structure associated with the fragment prediction. E) PDB ligand BI1 and an alternate structure of aPKC bound to BI1. Fragment 241 substructure of BI1 is in purple. F) Example nearest neighbor microenvironment from GSK3β. Fragment 241 of the bound PDB ligand 679 is in purple. The percent sequence identity between GSK3β and aPKC is in parentheses. Proteins are shown in cartoon representation with microenvironments as semi-transparent spheres. Microenvironment color scheme is arbitrary but consistent between panels. Side chains corresponding to microenvironments are shown in stick representation. Ligands are also drawn in stick representation.
