BMC Bioinformatics. 2017 Aug 29;18(1):379.

doi: 10.1186/s12859-017-1792-8.

EL_PSSM-RT: DNA-binding residue prediction by integrating ensemble learning with PSSM Relation Transformation

Jiyun Zhou¹ ², Qin Lu², Ruifeng Xu³ ⁴, Yulan He⁵, Hongpeng Wang¹

Affiliations

¹ School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, Guangdong, 518055, China.
² Department of Computing, the Hong Kong Polytechnic University, Kowloon, Hong Kong.
³ School of Computer Science and Technology, Harbin Institute of Technology Shenzhen Graduate School, HIT Campus Shenzhen University Town, Xili, Shenzhen, Guangdong, 518055, China. xuruifeng@hit.edu.cn.
⁴ Shenzhen Engineering Laboratory of Performance Robots at Digital Stage, Shenzhen Graduate School, Harbin Institute of Technology, Shenzhen, China. xuruifeng@hit.edu.cn.
⁵ School of Engineering and Applied Science, Aston University, Birmingham, UK.

PMID: 28851273
PMCID: PMC5576297
DOI: 10.1186/s12859-017-1792-8

Jiyun Zhou et al. BMC Bioinformatics. 2017.

Abstract

Background: Prediction of DNA-binding residues is important for understanding the protein-DNA recognition mechanism. Many computational methods have been proposed for this task, but most do not consider the relationships of evolutionary information between residues.

Results: We first propose a novel residue encoding method, Position Specific Score Matrix (PSSM) Relation Transformation (PSSM-RT), which encodes residues by utilizing the relationships of evolutionary information between residues. PSSM-RT and two existing PSSM encoding methods are evaluated on PDNA-62 and PDNA-224 by five-fold cross-validation. The evaluations indicate that PSSM-RT is more effective than the previous methods, which supports the view that the relationship of evolutionary information between residues is useful in DNA-binding residue prediction. We also propose an ensemble learning classifier (EL_PSSM-RT) that combines an ensemble learning model with PSSM-RT to better handle the imbalance between binding and non-binding residues in the datasets. EL_PSSM-RT is evaluated by five-fold cross-validation on PDNA-62 and PDNA-224 and on two independent datasets, TS-72 and TS-61. Comparisons with existing predictors on the four datasets show that EL_PSSM-RT performs best among the compared methods, with improvements of 0.02-0.07 in MCC, 4.18-21.47% in ST and 0.013-0.131 in AUC. Furthermore, we analyze the importance of the pair-relationships extracted by PSSM-RT, and the results validate the usefulness of PSSM-RT for encoding DNA-binding residues.

Conclusions: We propose a novel method for predicting DNA-binding residues that incorporates the relationship of evolutionary information between residues and ensemble learning. Performance evaluation shows that this relationship is indeed useful in DNA-binding residue prediction and that ensemble learning can be used to address the data imbalance between binding and non-binding residues. A web service for EL_PSSM-RT ( http://hlt.hitsz.edu.cn:8080/PSSM-RT_SVM/ ) is freely available to the biological research community.
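The abstract reports results as MCC, ST and AUC without defining them. The sketch below shows how such per-residue metrics could be computed from binary labels and prediction scores. MCC and AUC follow their standard definitions; ST is assumed here to be the "strength" measure common in the DNA-binding literature, the mean of sensitivity and specificity, which this record does not confirm. All function names are illustrative.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = binding, 0 = non-binding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def strength(tp, tn, fp, fn):
    """ASSUMPTION: ST = (sensitivity + specificity) / 2."""
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    return (sn + sp) / 2


def auc(scores_pos, scores_neg):
    """AUC as the probability that a binding residue outscores a
    non-binding one, counting ties as one half."""
    wins = 0.0
    for p in scores_pos:
        for q in scores_neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Toy labels and predictions, invented for illustration only.
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(mcc(*counts), strength(*counts))
print(auc([0.9, 0.4, 0.8], [0.1, 0.3, 0.7, 0.2, 0.35]))
```

Because binding residues are a small minority, accuracy alone would be misleading, which is presumably why balance-aware measures such as MCC and AUC are reported.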

Keywords: DNA-binding residue; DNA-protein interaction; Ensemble learning; PSSM; Random forest; Relation transformation; SVM.

Conflict of interest statement

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

**Fig. 1**
The framework diagram of EL_PSSM-RT. EL_PSSM-RT contains four steps. The first step is to divide the non-binding residues in the training dataset into n subsets and to construct n new training datasets by combining each subset of non-binding residues with the binding residues. The second step is to extract the three categories of features for all the residues. The third step is to train both an SVM classifier and a Random Forest classifier on each category of features for every training subset. The fourth step is to use a dynamic ranking and selection method to choose the base predictors with the largest diversity between each other to build the ensemble predictor
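The first step in the caption, building n training sets that each pair all binding residues with one subset of the non-binding residues, could look roughly like the sketch below. It assumes the subsets are disjoint and formed by random shuffling, and that n is chosen by the user; none of this is stated in the caption. The function name and the toy data are hypothetical.

```python
import random


def balanced_training_sets(binding, non_binding, n, seed=0):
    """Split non-binding residues into n disjoint subsets and pair each
    subset with the full set of binding residues.

    ASSUMPTION: random, disjoint subsets of near-equal size; the caption
    only says the non-binding residues are divided into n subsets.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    rng = random.Random(seed)
    shuffled = list(non_binding)
    rng.shuffle(shuffled)
    # Striding by n gives n subsets whose sizes differ by at most one.
    subsets = [shuffled[i::n] for i in range(n)]
    return [list(binding) + subset for subset in subsets]


# Toy data: 10 binding residues versus 95 non-binding residues.
binding = [("B", i) for i in range(10)]
non_binding = [("N", i) for i in range(95)]
training_sets = balanced_training_sets(binding, non_binding, n=9)
print([len(s) for s in training_sets])
```

Each resulting set has a far milder class imbalance than the full data, and every non-binding residue is still used by exactly one base predictor.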

**Fig. 2**
The impact of window size w on the performance of PSSM-RT. The x-axis is the window size w and the y-axis is the ST value of PSSM-RT
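To make the window size w concrete, the sketch below extracts the PSSM rows in a window centred on a target residue. It assumes w is odd, that the window is symmetric around the target, and that positions beyond the sequence termini are zero-padded; the caption specifies none of these details, and the function name is hypothetical.

```python
def window_rows(pssm, center, w, pad_value=0):
    """Return the w PSSM rows centred on position `center`.

    ASSUMPTIONS: w is odd, the window is symmetric, and positions that
    fall outside the sequence are filled with `pad_value`.
    """
    if w < 1 or w % 2 == 0:
        raise ValueError("w must be a positive odd integer")
    half = w // 2
    width = len(pssm[0])
    rows = []
    for pos in range(center - half, center + half + 1):
        if 0 <= pos < len(pssm):
            rows.append(list(pssm[pos]))
        else:
            rows.append([pad_value] * width)
    return rows


# Toy PSSM: 5 residues x 20 columns, row i filled with the value i + 1.
toy_pssm = [[i + 1] * 20 for i in range(5)]
first_window = window_rows(toy_pssm, center=0, w=3)
print([row[0] for row in first_window])
```

A larger w gives the encoder more sequence context per residue at the cost of more features, which is the trade-off the figure explores.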

**Fig. 3**
Comparison between different encoding methods. a The ROC curves of PSSM-RT, Ma et al.’s method and Li et al.’s method on PDNA-62. b The ROC curves of PSSM-RT, Ma et al.’s method and Li et al.’s method on PDNA-224

**Fig. 4**
Comparison between different encoding methods when combined with sequence features and physicochemical features. a The ROC curves of PSSM-RT, Ma et al.’s method and Li et al.’s method on PDNA-62. b The ROC curves of PSSM-RT, Ma et al.’s method and Li et al.’s method on PDNA-224

**Fig. 5**
Comparison between EL_PSSM-RT, SVM classifier and Random Forest classifier. a The ROC curves of EL_PSSM-RT, SVM classifier and Random Forest classifier on PDNA-62. b The ROC curves of EL_PSSM-RT, SVM classifier and Random Forest classifier on PDNA-224

**Fig. 6**
The feature analysis results of PSSM-RT on PDNA-62. a The discriminant weights of the 400 features extracted from PSSM-RT. The x-axis and y-axis denote the 20 residue types. Every element denotes a specific pair-relationship. b Six DNA-binding residues and their context residues extracted from the protein in 1u1q. The red residues are the binding residues and the yellow ones are the residues that can form important pair-relationships with them. The remaining residues are unimportant. The black polylines are the important pair-relationships

**Fig. 7**
Actual residues and predicted residues of proteins in 1s40 and 1b3t. a The predicted binding residues on the protein in 1s40. b The actual binding residues on the protein in 1s40. c The predicted binding residues on the protein in 1b3t. d The actual binding residues on the protein in 1b3t

**Fig. 8**
The homepage of the web service of EL_PSSM-RT. The web address of this web server is http://hlt.hitsz.edu.cn:8080/PSSM-RT_SVM/. See the server description for further explanation
