FEBS Lett. 2007 Mar 6;581(5):1058-66.
doi: 10.1016/j.febslet.2007.01.086. Epub 2007 Feb 7.

Residue-level prediction of DNA-binding sites and its application on DNA-binding protein predictions

Nitin Bhardwaj et al. FEBS Lett. 2007.

Abstract

Protein-DNA interactions are crucial to many cellular activities such as expression control and DNA repair. These interactions between amino acids and nucleotides are highly specific, and any aberration at the binding site can abolish the interaction entirely. In this study, we have three aims focusing on DNA-binding residues on the protein surface: to develop an automated approach for fast and reliable recognition of DNA-binding sites; to improve the prediction by distance-dependent refinement; and to use these predictions to identify DNA-binding proteins. We use a support vector machine (SVM)-based approach that exploits the features of DNA-binding residues to distinguish them from non-binding residues. Features used for this distinction include the residue's identity, charge, solvent accessibility, average potential, the secondary structure it is embedded in, neighboring residues, and location in a cationic patch. These features, collected from 50 proteins, are used to train the SVM. Testing is then performed on another set of 37 proteins, much larger than any testing set used in previous studies. The testing set has no more than 20% sequence identity, not only among its own pairs but also with the proteins in the training set, thus removing any undesired redundancy due to homology. This set also includes proteins from a DNA-binding structural class not present in the training set. With the above features, an accuracy of 66% with balanced sensitivity and specificity is achieved without relying on homology or evolutionary information. We then develop a post-processing scheme to improve the prediction using the relative location of the predicted residues. Balanced success is then achieved, with average sensitivity, specificity and accuracy of 71.3%, 69.3% and 70.5%, respectively. Average net prediction is also around 70%.
Finally, we show that the number of predicted DNA-binding residues can be used to differentiate DNA-binding proteins from non-DNA-binding proteins with an accuracy of 78%. The results presented here demonstrate that machine learning can be applied to automated identification of DNA-binding residues and that the success rate can be improved as more features are added. Such functional site prediction protocols can be useful in guiding subsequent work such as site-directed mutagenesis and macromolecular docking.
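The performance measures quoted in the abstract can be illustrated with a short sketch. Sensitivity, specificity and accuracy follow their standard confusion-matrix definitions. The abstract does not define "net prediction"; the sketch assumes it is the mean of sensitivity and specificity, which is an assumption rather than the authors' stated formula. The function name and the counts are hypothetical.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Return standard residue-level performance measures from confusion counts."""
    sensitivity = tp / (tp + fn)          # fraction of binding residues recovered
    specificity = tn / (tn + fp)          # fraction of non-binding residues recovered
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # ASSUMPTION: "net prediction" taken as the mean of sensitivity and
    # specificity; the abstract does not give its definition.
    net_prediction = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "net_prediction": net_prediction,
    }


# Hypothetical counts, not taken from the paper.
metrics = confusion_metrics(tp=70, tn=140, fp=60, fn=30)
```

Reporting sensitivity and specificity together matters here because non-binding residues typically outnumber binding ones, so accuracy alone can look good for a predictor that rarely predicts binding.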

Figures

Figure 1
Plot of binding propensity vs. amino acids with different charges (shown in the parentheses).
Figure 2
Composition of the neighboring residue around binding and non-binding residues.
Figure 3
Fractions of binding and non-binding residues overlapping with large positive
Figure 4
Composition of the binding and non-binding residues.
Figure 5
Box-and-whisker plots of the Accessible Surface Area (ASA) in % and average residue potential for binding and non-binding residues.
Figure 6
Plot of % occurrence of binding and non-binding residues in different secondary structures (white and gray bars) and propensity of binding for a residue present in that secondary structure (black bars). Propensity was obtained by dividing the number of binding residues present in a secondary structure by the total number of residues present in that secondary structure.
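The propensity defined in this caption (binding residues in a secondary structure divided by all residues in that structure) can be computed as in the following minimal sketch. The function name, the input layout and the toy data are assumptions made for illustration only.

```python
from collections import Counter


def binding_propensity(residues):
    """Compute binding propensity per secondary-structure class.

    residues: iterable of (secondary_structure, is_binding) pairs.
    Returns {structure: binding residues / all residues in that structure}.
    """
    total = Counter()
    binding = Counter()
    for structure, is_binding in residues:
        total[structure] += 1
        if is_binding:
            binding[structure] += 1
    return {structure: binding[structure] / total[structure] for structure in total}


# Toy data, not taken from the paper.
sample = [
    ("helix", True), ("helix", False), ("helix", False), ("helix", True),
    ("strand", False), ("strand", False), ("strand", True),
    ("coil", False),
]
propensity = binding_propensity(sample)
```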
Figure 7
Box-plot of the performance for holdout evaluation performed over the set of 37 proteins.
Figure 8
Prediction of DNA binding residues on the protein-DNA complex of arc repressor operator (PDB code 1par). DNA and protein are represented in tube representation in magenta and green, respectively. True positives, false negatives and false positives are represented by ball representation (Cα atoms) in blue, red and yellow, respectively.
Figure 9
Number of positively predicted neighbors within 9 Å against the fraction of different kinds of residues: true positives (TP), true negatives (TN) and false negatives (FN).
Figure 10
Performance after enrichment. Negatively predicted residues having more than 3 positively predicted residues within 9 Å around them were labeled positive, hence increasing sensitivity.
Figure 11
Performance after trimming, in which residues were removed if their minimum distance from any positively predicted residue was greater than 10 Å.
Figure 12
Performance after enrichment followed by trimming (A) and trimming followed by enrichment (B).
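The two refinement steps described in the captions for Figures 10 to 12 can be sketched as follows. This is an illustrative reading, not the authors' implementation. It assumes that each residue is represented by a single coordinate (for example its Cα atom), that enrichment is a single pass over the original predictions, that trimming relabels a positive residue as negative when no other positive residue lies within the cutoff, and that a lone positive counts as isolated. The function names and toy coordinates are hypothetical.

```python
import math


def enrich(coords, predicted, radius=9.0, min_neighbors=3):
    """Relabel a negative residue as positive when more than `min_neighbors`
    positively predicted residues lie within `radius` angstroms of it."""
    positives = [i for i, is_pos in enumerate(predicted) if is_pos]
    result = list(predicted)
    for i, is_pos in enumerate(predicted):
        if is_pos:
            continue
        near = sum(1 for j in positives if math.dist(coords[i], coords[j]) <= radius)
        if near > min_neighbors:
            result[i] = True
    return result


def trim(coords, predicted, cutoff=10.0):
    """Relabel a positive residue as negative when its nearest other
    positively predicted residue is farther than `cutoff` angstroms."""
    positives = [i for i, is_pos in enumerate(predicted) if is_pos]
    result = list(predicted)
    for i in positives:
        distances = [math.dist(coords[i], coords[j]) for j in positives if j != i]
        # ASSUMPTION: a positive with no other positives is treated as isolated.
        if not distances or min(distances) > cutoff:
            result[i] = False
    return result


# Toy coordinates in angstroms, not taken from the paper: a cluster of four
# predicted positives, one missed residue inside the cluster, one distant outlier.
coords = [(0, 0, 0), (3, 0, 0), (0, 3, 0), (0, 0, 3), (1, 1, 1), (40, 0, 0)]
predicted = [True, True, True, True, False, True]

enrich_then_trim = trim(coords, enrich(coords, predicted))
trim_then_enrich = enrich(coords, trim(coords, predicted))
```

In this toy case both orders give the same labels: the missed residue inside the cluster is recovered and the distant outlier is removed. On real structures the two orders can differ, which is why the caption reports them separately.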
Figure 13
ROC curve showing the initial performance curve and the performance points after the refinement steps. The gray diagonal line corresponds to a completely random predictor. Other cited studies are by Kuznetsov et al. [14] and Yan et al. [15]. * Ahmad et al. [12]. ** Ahmad et al. [13].
Figure 14
Distinction between DNA-binding and DNA-non-binding proteins on the basis of residues predicted to be binding before (A) and after (B) post-processing. Y-axis plots the fraction of proteins falling in each bin.

References

    1. Luscombe NM, Austin SE, Berman HM, Thornton JM. An overview of the structures of protein-DNA complexes. Genome Biol. 2000;1:1–37.
    2. Jones S, van Heyningen P, Berman HM, Thornton JM. Protein-DNA interactions: a structural analysis. J Mol Biol. 1999;287:877–896.
    3. Ren B, et al. Genome-wide location and function of DNA binding proteins. Science. 2000;290:2306–9.
    4. Stormo GD. DNA binding sites: representation and discovery. Bioinformatics. 2000;16:16–23.