Protein-DNA binding specificity predictions with structural models

Alexandre V Morozov¹, James J Havranek, David Baker, Eric D Siggia

PMID: 16246914
PMCID: PMC1270944
DOI: 10.1093/nar/gki875

Alexandre V Morozov et al. Nucleic Acids Res. 2005 Oct 24;33(18):5781-98. doi: 10.1093/nar/gki875. Print 2005.

Affiliation

¹ Center for Studies in Physics and Biology, The Rockefeller University, 1230 York Avenue, New York, NY 10021, USA. morozov@edsb.rockefeller.edu

Abstract

Protein-DNA interactions play a central role in transcriptional regulation and other biological processes. Investigating the mechanism of binding affinity and specificity in protein-DNA complexes is thus an important goal. Here we develop a simple physical energy function that uses electrostatics, solvation, hydrogen bonds and atom-packing terms to model direct readout, and sequence-specific DNA conformational energy to model indirect readout, of DNA sequence by the bound protein. The predictive capability of the model is tested against another model based only on knowledge of the consensus sequence and the number of contacts between amino acids and DNA bases. Both models are used to predict protein-DNA binding affinities, which are then compared with experimental measurements. The nearly additive nature of protein-DNA interaction energies in our model allows us to construct position-specific weight matrices by computing base pair probabilities independently for each position in the binding site. Our approach is less data intensive than knowledge-based models of protein-DNA interactions, and is not limited to any specific family of transcription factors. However, native structures of protein-DNA complexes or their close homologs are required as input to the model. Use of homology modeling can significantly extend the reach of our approach, making it a useful tool for studying regulatory pathways in many organisms and cell types.
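The construction of a position-specific weight matrix from per-position energies can be illustrated with a minimal sketch. It assumes that base pair probabilities at each position follow Boltzmann weights of the predicted binding energy changes, which the abstract does not state explicitly. The function name `pwm_from_energies`, the temperature factor `KT` and the energy values are hypothetical and chosen only for illustration.

```python
import math

BASES = "ACGT"
KT = 0.593  # kcal/mol, roughly room temperature (assumed, not from the source)

def pwm_from_energies(ddg, kt=KT):
    """Turn per-position energies into per-position base probabilities.

    ddg: list of dicts, one per binding-site position, mapping each base
    to a predicted binding energy change (kcal/mol) relative to a reference.
    Each position is treated independently (the additivity assumption).
    """
    pwm = []
    for column in ddg:
        # Boltzmann weight: lower energy means higher probability.
        weights = {b: math.exp(-column[b] / kt) for b in BASES}
        z = sum(weights.values())  # normalization over the four bases
        pwm.append({b: weights[b] / z for b in BASES})
    return pwm

# Made-up energies for a two-position site (illustration only).
example_ddg = [
    {"A": 0.0, "C": 1.2, "G": 0.8, "T": 1.5},  # position with a clear preference
    {"A": 0.0, "C": 0.0, "G": 0.0, "T": 0.0},  # position with no preference
]
pwm = pwm_from_energies(example_ddg)
```

Because each column is normalized on its own, the cost grows linearly with site length rather than with the number of possible sequences, which is the practical benefit of near-additivity.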

Figures

**Figure 1**
ΔΔG predictions (ddG_comp) versus experimental measurements (ddG_exp). (A) Static model binding energy predictions for the set of experimental measurements used in fitting static model weights. (B) Dynamic model binding energy predictions for the same set of experimental measurements. Closed circles, static/dynamic model; red triangles, contact model; green triangles, number of mutations from the consensus sequence. r1, linear correlation coefficient for the static/dynamic model; r2, linear correlation coefficient for the contact model. Three Zif268 datasets from Table 1 [two for Zif268 wild-type (42,67) and one for Zif268 D20A mutant (67)] are combined into one panel.

**Figure 2**
ΔΔG predictions (ddG_comp) versus experimental measurements (ddG_exp). Static model binding energy predictions for Ndt80 (34), MAT a1/α2 (30), AtERF1 (45) and c-Myb (47). Closed circles, static model; red triangles, contact model; green triangles, number of mutations from the consensus sequence. r1, linear correlation coefficient for the static model; r2, linear correlation coefficient for the contact model.

**Figure 3**
Experimental binding affinities conferred by indirect readout can be explained with DNA conformational energies alone. Dynamic model predictions of DNA base step energies (ddG_dna) versus experimental binding free energies (ddG_exp) for BamHI endonuclease (49) and PU.1 ETS domain (50).

**Figure 4**
Degree of pairwise additivity in binding energies predicted with the dynamic model. Comparison of binding energies computed after making multiple base pair substitutions (ddG_comp) with the sum of binding energies computed for the corresponding one-point mutations of the DNA site from the protein–DNA structure (ddG_pw).
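The comparison described in this caption can be sketched as follows. This is an illustration only: the function name `additivity_gap`, the data layout and the numbers are hypothetical and do not come from the paper.

```python
def additivity_gap(single_ddg, substitutions, multi_ddg):
    """Return ddG_comp minus ddG_pw for one multiply substituted site.

    single_ddg: dict mapping (position, base) to the energy change of that
    one-point mutation; substitutions: list of (position, base) pairs made
    together; multi_ddg: energy change computed for the combined mutant.
    """
    # ddG_pw: sum of the corresponding one-point mutation energies.
    ddg_pw = sum(single_ddg[(pos, base)] for pos, base in substitutions)
    return multi_ddg - ddg_pw

# Made-up values (kcal/mol) for a double mutant, for illustration only.
single = {(1, "C"): 0.5, (3, "T"): 1.0}
gap = additivity_gap(single, [(1, "C"), (3, "T")], multi_ddg=1.7)
```

A gap near zero means the combined mutant behaves additively; larger absolute values indicate coupling between positions.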

**Figure 5**
PWM predictions for Ndt80 (A) and Zif268 (B). From top to bottom: experiment, contact model based on the consensus sequence and the number of protein–DNA contacts, static model and dynamic model (see text for details). PWMs are displayed using the uniform height WebLogo representation (58): the height of each letter in the column is proportional to its probability in the PWM.

**Figure 6**
PWM predictions by homology for *D.melanogaster* TF bicoid (Bcd) (A) and giant (Gt) (B). From top to bottom: (A) panel 1: experiment; panels 2–4: contact model, static model with full energy function and static model with DNA conformational energy only (with dna–bp and dna–bs weights multiplied by 5) using *D.melanogaster* Engrailed homeodomain Q50K (2hdd) as a template; panels 5–7: contact model, dynamic model with full energy function (reference DNA energies are not subtracted since DNA is bent in homeodomains) and static model with DNA conformational energy only (with dna–bp and dna–bs weights multiplied by 5) using *D.melanogaster* Engrailed wild-type homeodomain (3hdd) as a template. (B) Experiment, contact model, static model and dynamic model using *Homo sapiens* nuclear factor NF-IL6 (C/EBP-β; 1gu4) as a template. All amino acids substituted at the protein–DNA interface are repacked in the static model. PWMs are displayed using the uniform height WebLogo representation (58).
