Front Genet. 2022 Jul 22:13:851688.
doi: 10.3389/fgene.2022.851688. eCollection 2022.

Identification of the ubiquitin-proteasome pathway domain by hyperparameter optimization based on a 2D convolutional neural network

Rahu Sikander et al. Front Genet.

Abstract

The major mechanism of proteolysis in the cytosol and nucleus is the ubiquitin-proteasome pathway (UPP). The highly controlled UPP affects a wide range of cellular processes and substrates, and flaws in the system can lead to the pathogenesis of a number of serious human diseases. Knowledge about UPPs provides useful hints for understanding cellular processes and for drug discovery. The exponential growth of next-generation sequencing wet-lab approaches has accelerated the accumulation of unannotated data in online databases, making UPP characterization and analysis more challenging. Computational methods are therefore used as an alternative for fast and accurate identification of UPPs. To this end, we developed a novel deep learning-based predictor named "2DCNN-UPP" for identifying UPPs with a low error rate. The proposed method uses a two-dimensional convolutional neural network (2D-CNN) with dipeptide deviation features. To avoid overfitting, a genetic algorithm is employed to select the optimal features. Finally, the optimized attribute set is fed as input to the 2D-CNN learning engine to build the model. Empirical results demonstrate that, in terms of overall accuracy and AUC (ROC) under a 10-fold cross-validation test, the proposed predictor performs better than other state-of-the-art methods for UPP classification. The model was trained with 10-fold cross-validation and then evaluated on an independent test. As experimentally validated ubiquitination sites emerge, a proteomics-based predictor of ubiquitination must be devised. We also evaluated the generalization power of our trained model via an independent test and obtained remarkable performance: 0.862 accuracy, 0.921 sensitivity, 0.803 specificity, and a Matthews correlation coefficient (MCC) of 0.730.
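The independent-test measures quoted above (accuracy, sensitivity, specificity, MCC) are all functions of a binary confusion matrix. The following is a minimal sketch of the standard definitions, not the authors' evaluation code. The function name `binary_metrics` and the example counts are hypothetical: the counts assume 1,000 positives and 1,000 negatives and were chosen only so that the rates match the reported sensitivity and specificity; they are not the paper's actual confusion matrix.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fn: positives (e.g. UPP proteins) predicted correctly/incorrectly.
    tn/fp: negatives (e.g. non-UPP proteins) predicted correctly/incorrectly.
    """
    def ratio(num, den):
        # Return 0.0 instead of raising when a class is empty.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fn + tn + fp),
        "sensitivity": ratio(tp, tp + fn),   # true-positive rate
        "specificity": ratio(tn, tn + fp),   # true-negative rate
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


if __name__ == "__main__":
    # Hypothetical balanced counts (assumption, not data from the paper).
    for name, value in binary_metrics(tp=921, fn=79, tn=803, fp=197).items():
        print(f"{name}: {value:.3f}")
```

With balanced classes, accuracy is the mean of sensitivity and specificity, which is consistent with the reported 0.862, 0.921, and 0.803.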
Four approaches were applied to the sequences, and the physical properties were calculated in combination. Under 10-fold cross-validation, 2D-CNN-UPP obtained an AUC (ROC) value of 0.862. We analyzed the relationship between the predicted scores of UPP proteins and non-UPP proteins. Last but not least, this research could effectively analyze the large-scale relationship between UPP proteins and non-UPP proteins in particular and other protein problems in general, and it might improve computational biology research. Therefore, the latest features in our model framework and the Dipeptide Deviation from Expected Mean (DDE)-based protein structure features could be utilized for the prediction of protein structure and function and of different molecules, such as DNA and RNA.
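The abstract does not spell out how the DDE features are computed. A minimal sketch, assuming the commonly used DDE definition (observed dipeptide frequency, standardized by a theoretical mean and variance derived from codon counts in the standard genetic code), might look as follows. The names `dde_features` and `as_grid` are hypothetical, and reshaping the 400 values into a 20 x 20 grid for a 2D-CNN is an assumption rather than something stated in the source.

```python
import math
from itertools import product

# Number of codons per amino acid in the standard genetic code (61 sense codons).
CODONS = {
    "A": 4, "C": 2, "D": 2, "E": 2, "F": 2, "G": 4, "H": 2, "I": 3, "K": 2,
    "L": 6, "M": 1, "N": 2, "P": 4, "Q": 2, "R": 6, "S": 6, "T": 4, "V": 4,
    "W": 1, "Y": 2,
}
N_SENSE_CODONS = sum(CODONS.values())  # 61
AMINO_ACIDS = sorted(CODONS)
DIPEPTIDES = [a + b for a, b in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def dde_features(sequence):
    """Return 400 DDE values for one protein sequence.

    Dipeptides containing non-standard residues are skipped (an assumption),
    so the pair count can be smaller than len(sequence) - 1.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    n_pairs = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            n_pairs += 1
    if n_pairs == 0:
        raise ValueError("sequence contains no standard dipeptide")

    features = []
    for pair in DIPEPTIDES:
        observed = counts[pair] / n_pairs                      # dipeptide composition
        t_mean = (CODONS[pair[0]] / N_SENSE_CODONS) * (CODONS[pair[1]] / N_SENSE_CODONS)
        t_var = t_mean * (1.0 - t_mean) / n_pairs              # theoretical variance
        features.append((observed - t_mean) / math.sqrt(t_var))
    return features


def as_grid(features, side=20):
    """Arrange the 400 values as a 20 x 20 grid (assumed 2D-CNN input layout)."""
    return [features[r * side:(r + 1) * side] for r in range(side)]


if __name__ == "__main__":
    grid = as_grid(dde_features("MKTAYIAKQR"))  # toy sequence, not real data
    print(len(grid), len(grid[0]))
```

Rows and columns of the grid correspond to the first and second residue of each dipeptide in alphabetical order; any other consistent ordering would serve equally well.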

Keywords: 2D-CNN; CNN; DDE; protein sequence prediction; ubiquitin-proteasome pathway.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
Method flowchart of the identification of UPP proteins using 2D-CNN.
FIGURE 2
Test model accuracy and model loss.
FIGURE 3
Identification of the validation accuracy of the ubiquitin protein pathway based on different optimizers ranging from 0 to 150.
FIGURE 4
Confusion matrices predicted labels based on (A) cross-validation test and (B) independent test.
FIGURE 5
Comparison of the five optimizers under 10-fold cross-validation on the cross-validation and independent sets.
FIGURE 6
Performance comparison of five machine learning algorithms under 10-fold cross-validation: cross-validation datasets vs. independent datasets.
FIGURE 7
ROC–AUC calculation based on the (A) cross-validation test and (B) independent test.
FIGURE 8
Ubiquitin, related modifiers, and pathways (R&D Systems Europe, Ltd).
FIGURE 9
Ubiquitin–proteasome pathway (Ciechanover, 1998).
FIGURE 10
Ubiquitin protein pathway amino acid sequence.
FIGURE 11
Conservation analysis of the RNF123, UBAC1, RC3H1, and RPS3A genes. Multiple protein sequence alignment and phylogenetic tree construction were performed with SmartBLAST. Values in parentheses give the percent sequence identity to the reference sequence.
FIGURE 12
Protein-protein interactions of the identified genes of the ubiquitin protein pathway.

References

    1. Abadi M., Agarwal A., Barham P., Brevdo E., Chen Z., Citro C., et al. (2015). TensorFlow: Large-scale machine learning on heterogeneous systems. Available from: https://www.tensorflow.org/
    2. Abdel-Hamid O., Deng L., Yu D. (2013). Exploring convolutional neural network structures and optimization techniques for speech recognition. Interspeech 11, 73–75. doi: 10.21437/interspeech.2013-744
    3. Bergstra J., Bardenet R., Bengio Y., Kégl B. (2011). “Algorithms for hyper-parameter optimization,” in Advances in neural information processing systems, 24.
    4. Billones C. D., Demetria O. J. L. D., Hostallero D. E. D., Naval P. C. (2016). “DemNet: A convolutional neural network for the detection of Alzheimer's disease and mild cognitive impairment,” in Proceedings of the 2016 IEEE region 10 conference (TENCON), Singapore, November 2016 (IEEE), 3724–3727. doi: 10.1109/tencon.2016.7848755
    5. Cai B., Jiang X. (2016). Computational methods for ubiquitination site prediction using physicochemical properties of protein sequences. BMC Bioinforma. 17, 116. doi: 10.1186/s12859-016-0959-z