Review
Front Genet. 2022 Jan 28:12:821996.
doi: 10.3389/fgene.2021.821996. eCollection 2021.

Identify DNA-Binding Proteins Through the Extreme Gradient Boosting Algorithm


Ziye Zhao et al. Front Genet.

Abstract

DNA-binding proteins (DBPs) are central to the study of biological life activities: research on these activities depends on results about DBPs, and the decline of many life activities is closely related to them. DBPs are generally identified through biochemical experiments, an approach that is inefficient and requires considerable manpower, material resources and time. Several computational approaches have therefore been developed to detect DBPs, among which machine learning (ML)-based techniques have shown excellent performance. Our method uses fewer features and a simpler recognition procedure than other methods while still obtaining satisfactory results. First, we use six feature extraction methods to extract sequence features from the same group of DBPs. Then, these features are spliced together, and the data are standardized. Finally, the extreme gradient boosting (XGBoost) model is used to construct an effective predictive model. Our proposed method compares favorably with other strong methods: it achieves an accuracy of 78.26% on PDB2272 and 85.48% on PDB186, similar to the accuracy of previous detection methods.
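
The splice-and-standardize step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the six feature extraction methods, so the two extractors below (`aa_composition` and `hydrophobic_fraction`) are hypothetical stand-ins, and the toy sequences are made up; only the overall shape (extract, concatenate, z-score each column) follows the text.

```python
from collections import Counter
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aa_composition(seq):
    """Hypothetical extractor: fraction of each standard amino acid (20 values)."""
    counts = Counter(seq)
    return [counts[a] / len(seq) for a in AMINO_ACIDS]


def hydrophobic_fraction(seq):
    """Hypothetical extractor: share of hydrophobic residues (1 value)."""
    return [sum(seq.count(a) for a in "AVILMFWY") / len(seq)]


# Assumption: stand-ins for the six extractors used in the paper.
EXTRACTORS = [aa_composition, hydrophobic_fraction]


def splice_features(seq):
    """Concatenate the outputs of all extractors into one feature vector."""
    vec = []
    for extract in EXTRACTORS:
        vec.extend(extract(seq))
    return vec


def standardize(rows):
    """Column-wise z-score; constant columns are mapped to 0.0."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [
        [(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, mus, sds)]
        for row in rows
    ]


sequences = ["MKRKAVLLE", "GGSAWYKRRK", "MDEELLKKAA"]  # made-up toy sequences
X = standardize([splice_features(s) for s in sequences])
```

In a full pipeline, a matrix like `X` together with binding/non-binding labels would then be passed to a gradient-boosted tree classifier such as `xgboost.XGBClassifier`; that step is omitted here because the abstract gives no hyperparameters.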

Keywords: DNA-binding protein prediction; XGBoost model; dimensionality reduction; feature extraction; machine learning.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1. Process of predicting DBPs.
FIGURE 2. ROC curves of different feature extraction methods on PDB1075 data.
