BMC Bioinformatics. 2010 May 3;11:225.
doi: 10.1186/1471-2105-11-225.

Creating PWMs of transcription factors using 3D structure-based computation of protein-DNA free binding energies

Denitsa Alamanova et al. BMC Bioinformatics. 2010.

Abstract

Background: Knowledge of transcription factor-DNA binding patterns is crucial for understanding gene transcription. Numerous DNA-binding proteins are annotated as transcription factors in the literature; for many of them, however, the corresponding DNA-binding motifs remain uncharacterized.

Results: The position weight matrices (PWMs) of transcription factors from different structural classes were determined using a knowledge-based statistical potential. The scoring function, calibrated against crystallographic data on protein-DNA contacts, recovered PWMs of various members of widely studied transcription factor families such as p53 and NF-κB. Where possible, the results were compared extensively with experimental binding affinity data and with other physical models. Although the p50p50, p50RelB, and p50p65 dimers belong to the same family, differences in their PWMs were detected, suggesting possibly different in vivo binding modes. The PWMs of p63 and p73 were computed on the basis of homology modeling, and their performance was studied using upstream sequences of 85 p53/p73-regulated human genes. Interestingly, about half of the p63 and p73 hits reported by the Match algorithm across all 126 promoters lay more than 2 kb upstream of the corresponding transcription start sites (TSS), which deviates from the common assumption that most regulatory sites are located closer to the TSS. In most cases the binding sites of p63 and p73 did not overlap with the p53 sites, which suggests that p63 and p73 could influence p53 transcriptional activity cooperatively. The newly computed p50p50 PWM recovered 5 more experimental binding sites than the corresponding TRANSFAC matrix, while both PWMs showed comparable receiver operating characteristics.

Conclusions: A novel algorithm was developed to calculate position weight matrices from protein-DNA complex structures. The algorithm was extensively validated against experimental data. The method was further combined with homology modeling to obtain PWMs of factors for which crystallographic complexes with DNA are not yet available. The performance of the PWMs obtained in this work, compared with traditionally constructed matrices, demonstrates that the structure-based approach is a promising alternative to experimental determination of transcription factor binding properties.
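To make the idea of deriving a PWM from computed binding energies concrete, the following is a minimal sketch. It is not the scoring function or Eq. 1 of the paper, neither of which is given in this abstract. It assumes that a per-position binding free energy is available for each of the four bases and converts those energies to base probabilities with Boltzmann weights. The function names, the value of kT, and the example energies are all hypothetical.

```python
import math

BASES = "ACGT"
KT_KCAL_PER_MOL = 0.593  # approximate kT at 298 K in kcal/mol (assumed units)


def energies_to_pwm(energy_columns, kt=KT_KCAL_PER_MOL):
    """Convert per-position binding free energies into base probabilities.

    energy_columns: one dict per motif position, mapping each base to a
    binding free energy (lower means more favorable binding).
    """
    pwm = []
    for column in energy_columns:
        # Shift by the column minimum so the exponentials stay well scaled.
        e_min = min(column[b] for b in BASES)
        weights = {b: math.exp(-(column[b] - e_min) / kt) for b in BASES}
        total = sum(weights.values())
        pwm.append({b: weights[b] / total for b in BASES})
    return pwm


def log_odds_score(pwm, site, background=0.25):
    """Sum of log2(p / background) over the positions of a candidate site."""
    if len(site) != len(pwm):
        raise ValueError("site length must match PWM length")
    return sum(math.log2(col[base] / background) for col, base in zip(pwm, site))


# Hypothetical energies for a 3-bp motif (illustrative values only).
example_energies = [
    {"A": 0.0, "C": 1.2, "G": 1.5, "T": 0.9},
    {"A": 1.0, "C": 1.0, "G": 1.0, "T": 1.0},
    {"A": 2.0, "C": 0.3, "G": 0.0, "T": 2.2},
]
example_pwm = energies_to_pwm(example_energies)
```

Under this assumption, a position where all four bases have equal energy yields a uniform column, and the base with the lowest energy always receives the highest probability. A uniform background of 0.25 is used for the log-odds score only to keep the sketch short.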

Figures

Figure 1
The PWM computational scheme used in the present study.
Figure 2
Sequence logos for the p53 tetramer. From top: the p53 tetramer PWM from this study, the experimental one from Ref. [19] derived from 100 sites, one computed with the DDNA2 server, the experimental matrix from Ref. [18] obtained from affinity measurements, and the corresponding matrix V$P53_01 from TRANSFAC. All sequence logos presented here were made with enoLOGOS [39].
Figure 3
The calculated scores (according to Eq. 1) of the 51 oligonucleotides provided in Ref. [18], plotted against the logarithms of the dissociation constants measured in the same study.
Figure 4
Homology modeling using the p63 and p73 DNA-binding domains. From top: the p53, p63, and p73 dimer PWMs from this study and the corresponding p53 TRANSFAC logo from entry V$P53_02. The p63 and p73 PWMs were obtained by homology modeling using the p53 binding domain. To show more detail, these logos plot base frequencies rather than relative entropy as in Fig. 2.
Figure 5
Results of the Match scan on 126 human promoter sequences. Left: distribution of reported hits for the p53, p63, and p73 PWMs with the promoter window set to [-1900,100] relative to the transcription start site. Right: the same results using the larger promoter window [-4900,100]. About half of the p63 and p73 hits lay beyond the 2 kb promoter window.
Figure 6
From top: the p50 homodimer, the p50p65 and p50RelB heterodimers, and the general NF-κB logo from the TRANSFAC matrix V$NFKAPPAB_01.
Figure 7
The GABP heterodimer sequence logo from this work (top) and the corresponding logo from TRANSFAC entry V$GABP_B (bottom).
Figure 8
The ERα logo from this work (top) and the corresponding logo from TRANSFAC entry V$ER_Q6_02 (bottom).
Figure 9
Estimation of the PWM accuracy in distinguishing true positive from false positive TF binding sites. From top to bottom: performance of the p53 tetramer, NF-κB, GABP, and ERα PWMs.
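The receiver operating characteristic comparison mentioned in the abstract and in Figure 9 can be illustrated with a short sketch. It assumes that each candidate site has already been assigned a PWM score and labeled as a true or false binding site; the evaluation protocol actually used in the paper is not described on this page. The function names and the example scores are hypothetical.

```python
def roc_points(true_scores, false_scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    A site is called positive when its score is >= the threshold.
    """
    thresholds = sorted(set(true_scores) | set(false_scores), reverse=True)
    points = [(0.0, 0.0)]
    for t in thresholds:
        tpr = sum(s >= t for s in true_scores) / len(true_scores)
        fpr = sum(s >= t for s in false_scores) / len(false_scores)
        points.append((fpr, tpr))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical PWM scores for known sites and for background sequences.
known_site_scores = [8.1, 7.4, 6.9, 5.0]
background_scores = [5.5, 4.2, 3.8, 2.1]
example_auc = auc(roc_points(known_site_scores, background_scores))
```

An area of 1.0 means the scores separate true from false sites perfectly, and 0.5 corresponds to no discrimination. Two PWMs with "comparable" ROC behavior would give similar curves and areas on the same labeled set.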

References

    1. Liu LA, Bader JS. Ab initio prediction of transcription factor binding sites. Pacific Symposium on Biocomputing. 2007;12:484–495.
    2. Rohs R, Bloch I, Sklenar H, Shakked Z. Molecular flexibility in ab initio drug docking to DNA: binding-site and binding-mode transitions in all-atom Monte Carlo simulations. Nucleic Acids Res. 2005;33:7048–7057. doi: 10.1093/nar/gki1008.
    3. Robertson TA, Varani G. An all-atom, distance-dependent scoring function for the prediction of protein-DNA interactions from structure. Proteins: Structure, Function, and Bioinformatics. 2007;66:359–374. doi: 10.1002/prot.21162.
    4. Liu Z, Mao F, Guo JT, Yan B, Wang P, Qu Y, Xu Y. Quantitative evaluation of protein-DNA interactions using an optimized knowledge-based potential. Nucleic Acids Res. 2005;33:546–558. doi: 10.1093/nar/gki204.
    5. Morozov AV, Havranek JJ, Baker D, Siggia ED. Protein-DNA binding specificity predictions with structural models. Nucleic Acids Res. 2005;33:5781–5798. doi: 10.1093/nar/gki875.
