DualF-PBR: Dual-Extracting Protein Sequence Features for Predicting Plant Resistance Proteins
- PMID: 40811316
- DOI: 10.1109/TCBBIO.2025.3562082
Abstract
Plant resistance proteins evolve during growth and development to cope with complex environmental changes and pathogen infection. Predicting plant resistance proteins is of great significance for further exploring plant disease-resistance mechanisms against viruses. In this paper, we propose a method for predicting plant resistance proteins by dual-extracting features. The dual-extracted features combine features extracted by a self-attention neural network with features obtained by detecting sequence structure information, yielding 2381-dimensional protein sequence features. We use the Least Absolute Shrinkage and Selection Operator (LASSO) algorithm to eliminate redundant features from the extracted 2381-dimensional features, retaining 53 key features. These 53 key features are fed into a Light Gradient Boosting Machine (LightGBM) model to predict plant resistance proteins. Experimental results of five-fold cross-validation on real datasets demonstrate that the proposed method outperforms existing methods overall in accuracy, sensitivity, specificity, Matthews correlation coefficient, F1 score, and area under the curve (AUC) on slightly imbalanced datasets. This work will aid in screening for plant resistance genes and proteins and will promote disease-resistant plant breeding.
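For readers unfamiliar with the threshold-based evaluation metrics listed above, the following is a minimal sketch of how accuracy, sensitivity, specificity, F1 score, and Matthews correlation coefficient are conventionally computed from binary confusion-matrix counts. It uses the standard textbook definitions; the abstract does not state the exact formulas used in the paper, so treating them as identical is an assumption. The function names `confusion_counts` and `binary_metrics` are hypothetical and chosen only for illustration. AUC is omitted because it requires ranked prediction scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (tp, fp, tn, fn) for binary labels, where 1 is the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Standard metric definitions (assumed, not taken from the paper)."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on positives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on negatives
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    pr_sum = precision + sensitivity
    f1 = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0
    # MCC stays informative when the two classes are imbalanced.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the paper.
    y_true = [1, 1, 0, 0, 1, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0, 1, 0]
    print(binary_metrics(*confusion_counts(y_true, y_pred)))
```

In a five-fold cross-validation setting, one common convention is to compute these metrics on each held-out fold and report the mean; whether the paper averages per fold or pools predictions is not stated in the abstract.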