MoRF-FUNCpred: Molecular Recognition Feature Function Prediction Based on Multi-Label Learning and Ensemble Learning
- PMID: 35350759
- PMCID: PMC8957949
- DOI: 10.3389/fphar.2022.856417
Abstract
Intrinsically disordered regions (IDRs), which lack a stable structure, are important for protein structure and function. Some IDRs can bind to molecular fragments and thereby undergo a transition from a disordered to an ordered state; these regions are called molecular recognition features (MoRFs). MoRFs have five main functions: molecular recognition assembler (MoR_assembler), molecular recognition chaperone (MoR_chaperone), molecular recognition display sites (MoR_display_sites), molecular recognition effector (MoR_effector), and molecular recognition scavenger (MoR_scavenger). Research on the functions of MoRFs is important for pharmaceutical research and for understanding disease pathogenesis. However, existing computational methods can only predict MoRFs in proteins and cannot distinguish their different functions. In this paper, we treat MoRF function prediction as a multi-label learning task and solve it with the Binary Relevance (BR) strategy. We then use Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), and Random Forest (RF) as base models to construct MoRF-FUNCpred through ensemble learning. Experimental results show that MoRF-FUNCpred performs well for MoRF function prediction. To the best of our knowledge, MoRF-FUNCpred is the first predictor of MoRF functions. Availability and Implementation: The standalone package of MoRF-FUNCpred can be accessed from https://github.com/LiangYu-Xidian/MoRF-FUNCpred.
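The abstract combines two ideas: Binary Relevance splits the five-label MoRF function task into five independent yes/no problems, and each of those problems is solved by an ensemble of base classifiers. The sketch below illustrates only that structure and rests on several assumptions. It uses just the Python standard library, so a small gradient-descent logistic regression and a one-split decision tree (decision stump) stand in for the paper's SVM, LR, DT, and RF models. It also assumes soft voting (averaging predicted probabilities with a 0.5 threshold) as the combination rule, and its feature vectors and labels are invented toy data. The abstract does not state MoRF-FUNCpred's features or how its base models are combined; the class names here are hypothetical, and the linked repository holds the actual implementation.

```python
import math
from typing import Callable, List

# The five MoRF function labels named in the abstract (one BR sub-problem each).
LABELS = ["MoR_assembler", "MoR_chaperone", "MoR_display_sites",
          "MoR_effector", "MoR_scavenger"]


def sigmoid(z: float) -> float:
    # Clamp z so math.exp cannot overflow for large |z|.
    return 1.0 / (1.0 + math.exp(-max(-500.0, min(500.0, z))))


class LogisticRegression:
    """Hypothetical stand-in for LR: batch gradient descent on log-loss."""

    def __init__(self, lr: float = 0.5, epochs: int = 500) -> None:
        self.lr, self.epochs = lr, epochs

    def fit(self, X: List[List[float]], y: List[int]) -> "LogisticRegression":
        n, d = len(X), len(X[0])
        self.w, self.b = [0.0] * d, 0.0
        for _ in range(self.epochs):
            grad_w, grad_b = [0.0] * d, 0.0
            for xi, yi in zip(X, y):
                err = self.predict_proba(xi) - yi  # gradient of log-loss w.r.t. z
                for j in range(d):
                    grad_w[j] += err * xi[j]
                grad_b += err
            self.w = [wj - self.lr * gj / n for wj, gj in zip(self.w, grad_w)]
            self.b -= self.lr * grad_b / n
        return self

    def predict_proba(self, x: List[float]) -> float:
        return sigmoid(sum(wj * xj for wj, xj in zip(self.w, x)) + self.b)


class DecisionStump:
    """Hypothetical stand-in for DT: one split chosen by training error."""

    def fit(self, X: List[List[float]], y: List[int]) -> "DecisionStump":
        best = None
        for j in range(len(X[0])):
            for t in sorted({xi[j] for xi in X}):
                left = [yi for xi, yi in zip(X, y) if xi[j] <= t]
                right = [yi for xi, yi in zip(X, y) if xi[j] > t]
                # Errors made when each side predicts its majority class.
                err = sum(min(s.count(0), s.count(1)) for s in (left, right))
                if best is None or err < best[0]:
                    best = (err, j, t, left, right)
        _, self.j, self.t, left, right = best
        prior = sum(y) / len(y)
        # Leaf probability = fraction of positive samples in that leaf.
        self.p_left = sum(left) / len(left) if left else prior
        self.p_right = sum(right) / len(right) if right else prior
        return self

    def predict_proba(self, x: List[float]) -> float:
        return self.p_left if x[self.j] <= self.t else self.p_right


class BinaryRelevanceEnsemble:
    """Binary Relevance: an independent soft-voting ensemble for every label."""

    def __init__(self, base_factories: List[Callable], threshold: float = 0.5) -> None:
        self.base_factories = base_factories
        self.threshold = threshold

    def fit(self, X: List[List[float]], Y: List[List[int]]) -> "BinaryRelevanceEnsemble":
        # Y[i][k] == 1 if sample i has label k; a sample may carry several labels.
        self.models = []
        for k in range(len(Y[0])):
            y_k = [row[k] for row in Y]  # binary target for label k only
            self.models.append([make().fit(X, y_k) for make in self.base_factories])
        return self

    def predict(self, x: List[float]) -> List[int]:
        # Average the base models' positive-class probabilities, then threshold.
        out = []
        for ensemble in self.models:
            score = sum(m.predict_proba(x) for m in ensemble) / len(ensemble)
            out.append(int(score >= self.threshold))
        return out


if __name__ == "__main__":
    # Invented two-feature toy data; real inputs would be MoRF feature vectors.
    X = [[0.1, 0.1], [0.2, 0.9], [0.9, 0.2], [0.8, 0.8],
         [0.15, 0.85], [0.85, 0.15], [0.9, 0.9], [0.1, 0.2]]
    Y = [[int(a > 0.5), int(b > 0.5), int(a <= 0.5), int(b <= 0.5), 0]
         for a, b in X]
    model = BinaryRelevanceEnsemble([LogisticRegression, DecisionStump]).fit(X, Y)
    for x in ([0.95, 0.05], [0.05, 0.95]):
        pred = model.predict(x)
        print(x, [name for name, bit in zip(LABELS, pred) if bit])
```

Because every label gets its own ensemble, a sample can be assigned several function labels at once, or none, which is what separates this multi-label setting from a single five-way classification. The price of Binary Relevance's simplicity is that it treats the labels as independent and does not model correlations between them.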
Keywords: binary relevance; ensemble learning; intrinsically disordered regions; molecular recognition features; multi-label learning.
Copyright © 2022 Li, Pang, Liu and Yu.
Conflict of interest statement
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.