Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
Review
. 2022 Mar 8:13:856417.
doi: 10.3389/fphar.2022.856417. eCollection 2022.

MoRF-FUNCpred: Molecular Recognition Feature Function Prediction Based on Multi-Label Learning and Ensemble Learning

Affiliations
Review

MoRF-FUNCpred: Molecular Recognition Feature Function Prediction Based on Multi-Label Learning and Ensemble Learning

Haozheng Li et al. Front Pharmacol. .

Abstract

Intrinsically disordered regions (IDRs) without stable structure are important for protein structures and functions. Some IDRs can be combined with molecular fragments to make itself completed the transition from disordered to ordered, which are called molecular recognition features (MoRFs). There are five main functions of MoRFs: molecular recognition assembler (MoR_assembler), molecular recognition chaperone (MoR_chaperone), molecular recognition display sites (MoR_display_sites), molecular recognition effector (MoR_effector), and molecular recognition scavenger (MoR_scavenger). Researches on functions of molecular recognition features are important for pharmaceutical and disease pathogenesis. However, the existing computational methods can only predict the MoRFs in proteins, failing to distinguish their different functions. In this paper, we treat MoRF function prediction as a multi-label learning task and solve it with the Binary Relevance (BR) strategy. Finally, we use Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), and Random Forest (RF) as basic models to construct MoRF-FUNCpred through ensemble learning. Experimental results show that MoRF-FUNCpred performs well for MoRF function prediction. To the best knowledge of ours, MoRF-FUNCpred is the first predictor for predicting the functions of MoRFs. Availability and Implementation: The stand alone package of MoRF-FUNCpred can be accessed from https://github.com/LiangYu-Xidian/MoRF-FUNCpred.

Keywords: binary relevance; ensemble learning; intrinsically disordered regions; molecular recognition features; multi-label learning.

PubMed Disclaimer

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

FIGURE 1
FIGURE 1
The network architecture of MoRF-FUNCpred. MoRF-FUNCpred uses PSFM to express the protein. MoRF-FUNCpred extracts features and labels of residues divided into training set and testing set. The SVM, LR, DT, and RF models are trained using the training set, and the four models are integrated to obtain better performance through ensemble learning. Obtain the weights of ensemble learning through the genetic algorithm.
FIGURE 2
FIGURE 2
Distance between the two models in the training set under the five MoRF functions.
FIGURE 3
FIGURE 3
MoRF-FUNCpred prediction of the MoR_assembler function of DP01087.

References

    1. Altschul S. F., Gish W., Miller W., Myers E. W., Lipman D. J. (1990). Basic Local Alignment Search Tool. J. Mol. Biol. 215, 403–410. 10.1016/S0022-2836(05)80360-2 - DOI - PubMed
    1. Altschul S. F., Madden T. L., Schäffer A. A., Zhang J., Zhang Z., Miller W., et al. (1997). Gapped BLAST and PSI-BLAST: a New Generation of Protein Database Search Programs. Nucleic Acids Res. 25, 3389–3402. 10.1093/nar/25.17.3389 - DOI - PMC - PubMed
    1. Boutell M. R., Luo J., Shen X., Brown C. M. (2004). Learning Multi-Label Scene Classification. Pattern recognition. 37, 1757–1771. 10.1016/j.patcog.2004.03.009 - DOI
    1. Breiman L. (2001). Random Forests. Machine Learn. 45, 5–32. 10.1023/a:1010933404324 - DOI
    1. Canzhuang S., Yonge F. (2021). Identification of Disordered Regions of Intrinsically Disordered Proteins by Multi-Features Fusion. Curr. Bioinformatics 16, 1126–1132. 10.2174/1574893616666210308102552 - DOI