BMC Bioinformatics. 2005 Feb 19;6:33.
doi: 10.1186/1471-2105-6-33.

PSSM-based prediction of DNA binding sites in proteins

Shandar Ahmad et al. BMC Bioinformatics. 2005.

Abstract

Background: Detection of DNA-binding sites in proteins is of enormous interest for technologies targeting gene regulation and manipulation. We have previously shown that information about a residue and its sequence neighbors can be used to predict DNA-binding candidates in a protein sequence. This sequence-based prediction method is applicable even when no sequence homology with a previously known DNA-binding protein is observed. Here we implement a neural-network-based algorithm that uses the evolutionary information in amino acid sequences, in the form of position-specific scoring matrices (PSSMs), to improve the prediction of DNA-binding sites.

Results: The average of sensitivity and specificity obtained with PSSMs is up to 8.7% higher than that obtained with sequence information alone. Much smaller data sets can be used to generate the PSSMs with minimal loss of prediction accuracy.

Conclusion: One problem with PSSM-derived prediction is that it requires lengthy, time-consuming alignments against large sequence databases. To speed up the generation of PSSMs, we tested different reference data sets (the sequence space) against which a target protein is scanned during PSI-BLAST iterations. We find that a very small set of proteins can serve as this reference data set without losing much of the prediction value. This makes PSSM generation very rapid and even amenable to use at the genome level. A web server has been developed to provide these predictions of DNA-binding sites for any new protein from its amino acid sequence.
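As a rough illustration of scanning a target protein against a small reference set, the sketch below only assembles a PSI-BLAST command line and does not run it. It assumes the present-day NCBI BLAST+ `psiblast` program and its option names, which may differ from the tooling used for the published work. The file names, the database name, and the default of three iterations are hypothetical placeholders.

```python
def psiblast_command(query_fasta, reference_db, pssm_out, iterations=3):
    """Build (but do not run) a psiblast call that writes an ASCII PSSM.

    All arguments are illustrative: reference_db stands for a small,
    pre-formatted protein database used as the reference sequence space.
    """
    if iterations < 1:
        raise ValueError("iterations must be at least 1")
    return [
        "psiblast",
        "-query", query_fasta,              # target protein sequence (FASTA)
        "-db", reference_db,                # reference data set to scan against
        "-num_iterations", str(iterations),
        "-out_ascii_pssm", pssm_out,        # PSSM written as plain text
    ]


# Hypothetical file and database names, for illustration only.
cmd = psiblast_command("target.fasta", "small_reference_db", "target.pssm")
print(" ".join(cmd))
```

The list form can be passed to `subprocess.run` once BLAST+ is installed and the reference database has been built; swapping `reference_db` is the only change needed to compare reference sets of different sizes.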

Availability: Online predictions based on this method are available at http://www.netasa.org/dbs-pssm/

Figures

Figure 1
Rows of position-specific scoring matrices (PSSMs) selected as neural network input. The network input consists of the PSSM rows of the target residue and of its two neighboring residues on each of the C- and N-terminal sides. Each residue is represented by a 20-dimensional vector of integer values, which represent the (logarithmic) effective frequencies of occurrence at the respective positions in a multiple alignment. The input layer therefore has 20 × 5 = 100 units. With two units in the only hidden layer and one unit in the output layer, the fully connected network has a total of 202 connection weights to be trained (100 × 2 + 2 × 1).
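The input encoding described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the PSSM is given as a list of 20-integer rows and that windows running past either terminus are padded with zeros, a choice the caption does not specify. The function and constant names are hypothetical. The count 100 × 2 + 2 × 1 = 202 matches the caption's total if that total is read as connection weights without bias terms.

```python
WINDOW = 5     # target residue plus two neighbors on each side
N_AMINO = 20   # one PSSM score per amino acid type
N_HIDDEN = 2   # units in the single hidden layer
N_OUTPUT = 1   # one output unit (binding or not)


def window_inputs(pssm, pad_value=0):
    """Return one flattened 5 x 20 = 100 value input vector per residue.

    Zero padding at the termini is an assumption made for this sketch.
    """
    for row in pssm:
        if len(row) != N_AMINO:
            raise ValueError("every PSSM row must have 20 scores")
    half = WINDOW // 2
    pad_row = [pad_value] * N_AMINO
    padded = [pad_row] * half + [list(r) for r in pssm] + [pad_row] * half
    # Window i covers padded rows i .. i+4, centered on original residue i.
    return [
        [score for row in padded[i:i + WINDOW] for score in row]
        for i in range(len(pssm))
    ]


def n_connection_weights():
    """Weights of the fully connected 100-2-1 network, excluding biases."""
    return WINDOW * N_AMINO * N_HIDDEN + N_HIDDEN * N_OUTPUT


# Toy 7-residue PSSM with made-up integer scores.
toy_pssm = [[r * N_AMINO + c for c in range(N_AMINO)] for r in range(7)]
inputs = window_inputs(toy_pssm)
print(len(inputs), len(inputs[0]), n_connection_weights())
```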
Figure 2
ROC analysis of binding site prediction using PSSMs generated against the PDNA-RDN reference data set, compared with results obtained from sequence-based predictions. The sensitivity of the prediction can be adjusted by changing the threshold on the predicted probability above which a residue is annotated as DNA-binding. The area under the PSSM-based prediction curve is significantly greater than that obtained from sequence-based predictions. In addition, the balance between sensitivity and specificity seems difficult to adjust for sequence-based predictions, because the points on that curve are very closely spaced. The PDNA-RDN curve also indicates the levels of prediction scores to be expected from our web-based predictions.
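The threshold adjustment described in this caption can be illustrated with a short sketch. It assumes per-residue predicted probabilities and binary labels (1 for DNA-binding) and uses a plain trapezoidal estimate for the area under the curve. The function names and the toy scores are hypothetical and are not taken from the paper.

```python
def roc_points(scores, labels, thresholds):
    """Return (threshold, sensitivity, specificity) for each threshold.

    A residue is called DNA-binding when its score is >= the threshold.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both binding and non-binding residues")
    points = []
    for t in thresholds:
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        tn = sum(1 for s, y in zip(scores, labels) if y != 1 and s < t)
        points.append((t, tp / n_pos, tn / n_neg))
    return points


def roc_area(points):
    """Trapezoidal area under sensitivity versus (1 - specificity)."""
    xy = {(1 - spec, sens) for _, sens, spec in points}
    xy |= {(0.0, 0.0), (1.0, 1.0)}  # curve endpoints
    xy = sorted(xy)
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(xy, xy[1:]))


# Made-up scores for four residues; the first two are labeled as binding.
pts = roc_points([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0], [0.2, 0.5, 0.85])
for t, sens, spec in pts:
    print(t, sens, spec, (sens + spec) / 2)
print(roc_area(pts))
```

Raising the threshold trades sensitivity for specificity; the last printed column per threshold is the average of the two, the summary measure quoted in the Results.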
