Bioinformatics. 2010 Feb 1;26(3):319-25.
doi: 10.1093/bioinformatics/btp664. Epub 2009 Dec 4.

Optimization of minimum set of protein-DNA interactions: a quasi exact solution with minimum over-fitting

N A Temiz et al. Bioinformatics.

Abstract

Motivation: A major limitation in modeling protein interactions is the difficulty of assessing over-fitting of the training set. Recently, an experimentally based approach that integrates crystallographic information of C2H2 zinc finger-DNA complexes with binding data from 11 mutants, 7 of them from EGR finger I, was used to define an improved, though unoptimized, interaction code. Here, we present a novel mixed integer programming (MIP)-based method that transforms this type of data into an optimized code, demonstrating both the advantages of the mathematical formulation in minimizing over- and under-fitting and the robustness of the underlying physical parameters mapped by the code.

Results: Based on the structural models of feasible interaction networks for 35 mutants of EGR-DNA complexes, the MIP method minimizes the cumulative binding energy over all complexes for a general set of fundamental protein-DNA interactions. To guard against over-fitting, we use the scalability of the method to test how the fit responds when related interactions are eliminated. From an initial set of 12 parameters (six hydrogen bonds, five desolvation penalties and a water factor), we eliminate five with only a marginal reduction of the correlation coefficient, to 0.9983. Further reduction of parameters degrades the performance of the code (under-fitting). Besides accurately predicting the change in binding affinity of validation sets, the code identifies possible context-dependent effects in the definition of the interaction networks. Yet, constraining predictions to a pre-selected set of interactions limits the impact of these potential errors to related low-affinity complexes.
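The parameter-reduction test described above can be illustrated with a minimal sketch. It is not the authors' procedure: the paper re-optimizes the remaining parameters with MIP after each elimination, whereas here related parameters are simply tied to their mean as a stand-in. The parameter names (`hb_a`, `hb_b`, `desolv`), their values, the interaction counts and the "experimental" energies are all hypothetical.

```python
from typing import Dict, List, Sequence


def pearson_r2(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Squared Pearson correlation between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def collapse(params: Dict[str, float], groups: List[List[str]]) -> Dict[str, float]:
    """Tie the parameters in each group to one shared value (their mean).

    Assumption: averaging replaces the re-optimization done in the paper.
    """
    tied = dict(params)
    for group in groups:
        mean = sum(params[name] for name in group) / len(group)
        for name in group:
            tied[name] = mean
    return tied


def predict(counts_per_complex: List[Dict[str, int]], params: Dict[str, float]) -> List[float]:
    """Binding energy of each complex as a linear sum of its interactions."""
    return [sum(params[k] * n for k, n in counts.items()) for counts in counts_per_complex]


# Hypothetical data: interaction counts per complex and measured energies.
counts = [
    {"hb_a": 2, "hb_b": 1, "desolv": 1},
    {"hb_a": 1, "hb_b": 2, "desolv": 0},
    {"hb_a": 0, "hb_b": 1, "desolv": 2},
    {"hb_a": 3, "hb_b": 0, "desolv": 1},
]
experimental = [-2.4, -2.9, 0.2, -2.8]
full = {"hb_a": -1.1, "hb_b": -0.9, "desolv": 0.5}
reduced = collapse(full, [["hb_a", "hb_b"]])  # one fewer free parameter

print("R2, full set:   ", pearson_r2(predict(counts, full), experimental))
print("R2, reduced set:", pearson_r2(predict(counts, reduced), experimental))
```

Comparing the two correlation values shows the kind of check involved: if tying two parameters together barely changes the fit, the extra parameter was not needed.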

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
Interactions of ZF-DNA triplets. (A) EGR–DNA complex (Elrod-Erickson et al., 1996). (B) Binding mode of finger I of EGR. H-bonds are shown as pink dashed lines. Binding site residues are indicated. (C) 2D representation of the interaction network of finger I of EGR with its DNA target. Inter-molecular H-bonds are indicated by arrows between residues and DNA; side-chain–backbone and side-chain–side-chain intra-molecular bonds are shown as arrows above and lines below the protein sequence, respectively. (D) Typical interaction network of an EGR-like ZF. H-bonds typically form at positions −1, +2, +3 and +6 with respect to the beginning of the helix. Dashed lines from position +2 in fingers II and III show the possible H-bonds formed by Ser residues with C (finger II) or A (finger III) in the complementary strand.
Fig. 2.
Sketch of three feasible submodels for two mutants, and the nine possible combinations (I, …, IX) in which, depending on the value and number of parameters, different submodels minimize the binding free energy (see text). Arrows correspond to H-bonds, circles and squares correspond to desolvation penalties, and half-open or filled triangles indicate the absence or presence of excess water molecules near the indicated interaction. For details and color codes see Temiz and Camacho (2009).
Fig. 3.
Mapping of MIP parameters onto a free energy landscape of four submodels (‘j’) of complex (‘i’) QDNR/GAC. The condition that only one submodel minimizes the free energy is imposed by the constraint $\sum_j x_{ij} = 1$. H-bonds are represented by two letters, the first corresponding to the residue and the second to the nucleotide. In practice, the arrangement of submodels on the funnel is given by the solution of the MIP application.
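The one-submodel-per-complex constraint can be sketched as follows. This is a minimal illustration and not the paper's MIP formulation: it assumes the parameters are already fixed and only shows how a binary vector x_ij with exactly one non-zero entry picks the lowest-energy submodel. The parameter values and interaction counts are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical parameter values (arbitrary energy units); not from the paper.
PARAMS: Dict[str, float] = {"hbond": -1.0, "desolvation": 0.6, "water": 0.3}

# Hypothetical submodels j of a single complex i: counts of each interaction.
SUBMODELS: List[Dict[str, int]] = [
    {"hbond": 2, "desolvation": 1, "water": 0},
    {"hbond": 3, "desolvation": 2, "water": 1},
    {"hbond": 1, "desolvation": 0, "water": 0},
    {"hbond": 3, "desolvation": 1, "water": 2},
]


def submodel_energy(counts: Dict[str, int], params: Dict[str, float]) -> float:
    """Energy of one submodel as a linear sum over its interactions."""
    return sum(params[name] * n for name, n in counts.items())


def select_submodel(
    submodels: List[Dict[str, int]], params: Dict[str, float]
) -> Tuple[List[int], float]:
    """Return the binary selection x_ij (exactly one 1) and the minimum energy."""
    energies = [submodel_energy(c, params) for c in submodels]
    best = min(range(len(energies)), key=energies.__getitem__)
    x = [1 if j == best else 0 for j in range(len(energies))]
    return x, energies[best]


x, energy = select_submodel(SUBMODELS, PARAMS)
assert sum(x) == 1  # the constraint: sum over j of x_ij equals 1
print(x, energy)
```

In the actual method the parameters and the selections are determined together by the solver; fixing the parameters first, as done here, reduces the selection to a simple minimum over submodels.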
Fig. 4.
Convergence of MIP optimization code based on EGR mutants of finger I. Top panel shows the R² correlation coefficient as a function of the number of parameters. Lower panel displays changes in the optimal parameters as equivalent parameters are collapsed into one. For comparison, we show the results of Temiz and Camacho (2009) with no optimization (boxed points and black square). The two symbols for six parameters correspond to whether an H-bond or a desolvation penalty is further eliminated as a free parameter.
Fig. 5.
Predictions of ΔΔG_bind in independent validation datasets of finger II and III mutants of EGR. R² correlation coefficient of predicted and experimental changes in binding affinities for MIP solutions with different numbers of parameters. Open spheres and open squares show correlation coefficients for finger II and III mutants of EGR, respectively. Solid sphere and solid square show the original predicted correlation coefficient (Temiz and Camacho, 2009). The two symbols for six parameters correspond to whether an H-bond or a desolvation penalty is further eliminated as a free parameter.
Fig. 6.
Re-examining context-dependent solvation effects in finger II mutants of EGR. (A) Changes in solvation patterns for finger II mutants. First column shows the finger I crystal complex of Q (dark boxes) and D (light boxes) binding modes (Elrod-Erickson et al., 1998). Second column shows the original solvation patterns (Temiz and Camacho, 2009) and third column shows the updated solvation patterns (this study). (B) Cartoon of the updated submodel of the QGDR/GCA complex. DNA is shown in dark sticks. Asp+3 is shown in light sticks. Crystal waters are shown as spheres. Dashed lines indicate H-bond interactions. Gln-1 and Arg+6, shown in light sticks, protect Asp+3 from solvation. Light colored numbers are predicted affinities using the optimized code (in parentheses are affinities based on the unoptimized code), and black numbers are predicted and experimental relative affinities.
