Identify DNA-Binding Proteins Through the Extreme Gradient Boosting Algorithm
- PMID: 35154264
- PMCID: PMC8837382
- DOI: 10.3389/fgene.2021.821996
Abstract
DNA-binding proteins (DBPs) are an important subject in the study of biological life activities. Research on these activities depends on results about DBPs, and the decline of many life activities is closely related to DBPs. DBPs are conventionally identified through biochemical experiments, an approach that is inefficient and requires considerable manpower, material resources, and time. Several computational approaches have been developed to detect DBPs, among which machine learning (ML)-based techniques have shown excellent performance. Our method uses fewer features and a simpler recognition procedure than other methods while still obtaining satisfactory results. First, we apply six feature extraction methods to extract sequence features from the same group of DBPs. Next, the resulting features are concatenated and the data are standardized. Finally, an extreme gradient boosting (XGBoost) model is used to construct the predictive model. Compared with other strong methods, our proposed method achieves better results, with an accuracy of 78.26% on PDB2272 and 85.48% on PDB186. These accuracies are similar to those of previous detection methods.
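The feature-splicing and standardization steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the six feature extraction methods, so two common sequence descriptors (amino acid composition and dipeptide composition) stand in for them, and the function names and toy sequences are hypothetical. The classifier step is left out; the standardized matrix would then be passed to a gradient boosting classifier such as XGBoost.

```python
from collections import Counter
from itertools import product
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Stand-in extractors (assumption): the abstract does not name its six methods.
def aa_composition(seq):
    """Fraction of each of the 20 standard residues (20 values)."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]

def dipeptide_composition(seq):
    """Fraction of each ordered residue pair (400 values)."""
    pairs = Counter(seq[i:i + 2] for i in range(len(seq) - 1))
    total = max(len(seq) - 1, 1)
    return [pairs[a + b] / total for a, b in product(AMINO_ACIDS, repeat=2)]

def splice(seq, extractors):
    """Concatenate the outputs of all extractors into one feature vector."""
    vector = []
    for extract in extractors:
        vector.extend(extract(seq))
    return vector

def standardize(rows):
    """Column-wise z-score; constant columns are mapped to 0.0."""
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    stds = [pstdev(c) for c in columns]
    return [
        [(v - m) / s if s > 0 else 0.0 for v, m, s in zip(row, means, stds)]
        for row in rows
    ]

# Toy sequences (hypothetical), not taken from PDB2272 or PDB186.
sequences = ["MKRLAAKK", "GGSLEDDA", "MKKRRAAL"]
raw = [splice(s, [aa_composition, dipeptide_composition]) for s in sequences]
features = standardize(raw)  # 3 rows x 420 standardized columns
```

In a real experiment, the column means and standard deviations would be estimated on the training set only and reused for the test set, so that no test-set statistics leak into training.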
Keywords: DNA-binding protein prediction; XGBoost model; dimensionality reduction; feature extraction; machine learning.
Copyright © 2022 Zhao, Yang, Zhai, Liang and Zhao.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Similar articles
- HMMPred: Accurate Prediction of DNA-Binding Proteins Based on HMM Profiles and XGBoost Feature Selection. Comput Math Methods Med. 2020 Mar 28;2020:1384749. doi: 10.1155/2020/1384749. eCollection 2020. PMID: 32300371. Free PMC article.
- FTWSVM-SR: DNA-Binding Proteins Identification via Fuzzy Twin Support Vector Machines on Self-Representation. Interdiscip Sci. 2022 Jun;14(2):372-384. doi: 10.1007/s12539-021-00489-6. Epub 2021 Nov 6. PMID: 34743286.
- Prediction of hot spots in protein-DNA binding interfaces based on supervised isometric feature mapping and extreme gradient boosting. BMC Bioinformatics. 2020 Sep 17;21(Suppl 13):381. doi: 10.1186/s12859-020-03683-3. PMID: 32938395. Free PMC article.
- DP-BINDER: machine learning model for prediction of DNA-binding proteins by fusing evolutionary and physicochemical information. J Comput Aided Mol Des. 2019 Jul;33(7):645-658. doi: 10.1007/s10822-019-00207-x. Epub 2019 May 23. PMID: 31123959.
- HKAM-MKM: A hybrid kernel alignment maximization-based multiple kernel model for identifying DNA-binding proteins. Comput Biol Med. 2022 Jun;145:105395. doi: 10.1016/j.compbiomed.2022.105395. Epub 2022 Mar 17. PMID: 35334314.
Cited by
- DBP-iDWT: Improving DNA-Binding Proteins Prediction Using Multi-Perspective Evolutionary Profile and Discrete Wavelet Transform. Comput Intell Neurosci. 2022 Sep 28;2022:2987407. doi: 10.1155/2022/2987407. eCollection 2022. PMID: 36211019. Free PMC article.
- Comparative Analysis on Alignment-Based and Pretrained Feature Representations for the Identification of DNA-Binding Proteins. Comput Math Methods Med. 2022 Jun 28;2022:5847242. doi: 10.1155/2022/5847242. eCollection 2022. PMID: 35799660. Free PMC article.
- Development and validation of AI/ML derived splice-switching oligonucleotides. Mol Syst Biol. 2024 Jun;20(6):676-701. doi: 10.1038/s44320-024-00034-9. Epub 2024 Apr 25. PMID: 38664594. Free PMC article.
- Predicting the retention time of Synthetic Cannabinoids using a combinatorial QSAR approach. Heliyon. 2023 May 25;9(6):e16671. doi: 10.1016/j.heliyon.2023.e16671. eCollection 2023 Jun. PMID: 37484220. Free PMC article.
- Immune landscape-based machine-learning-assisted subclassification, prognosis, and immunotherapy prediction for glioblastoma. Front Immunol. 2022 Dec 1;13:1027631. doi: 10.3389/fimmu.2022.1027631. eCollection 2022. PMID: 36532035. Free PMC article.