FusionEncoder: identification of intrinsically disordered regions based on multi-feature fusion
- PMID: 40577786
- PMCID: PMC12231546
- DOI: 10.1093/bioinformatics/btaf362
Abstract
Motivation: Intrinsically disordered regions (IDRs) play a significant role in diverse biological processes and are widely distributed in proteins. Accurately predicting these regions is therefore essential for analyzing protein structure and function. Amino acid feature extraction serves as a foundational step in the development of computational predictive models. Existing methods typically rely on traditional biological features (e.g. PSSM) or use pre-trained protein language models (PPLMs) to capture sequence semantic information, often resorting to straightforward feature concatenation. However, these approaches fail to capture the multi-semantic interactions between traditional biological features and PPLMs-based features.
Results: In this study, we propose a method named FusionEncoder, designed to integrate traditional biological features and PPLMs-based features of proteins. FusionEncoder is a fusion network built on a variant of long short-term memory (LSTM). We consider traditional biological features and PPLMs-based features to be two types of semantic inputs within a "multi-semantic" space. Traditional features are input into the cell state of the LSTM, while PPLMs-based features are fed into the input part. A fusion cell is then used to fuse these two types of features. This strategy leverages the capability of LSTM to encode long sequences, enhancing context-aware semantic learning of amino acid sequences. Finally, a transformer-based encoder layer is employed to predict the IDRs. Evaluation on four independent test datasets indicates that FusionEncoder improves the accuracy of amino acid feature representation and achieves superior performance compared with existing methods.
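To make the fusion idea more concrete, the following is a minimal sketch of one way an LSTM-style cell could route the two feature types differently: PPLMs-based features drive the usual input, forget, and output gates, while traditional features are written into the cell state through a separate gate. The abstract does not give the actual equations, so the class name `FusionCellSketch`, the extra gate, the weight shapes, and the random untrained weights are all assumptions made for illustration; this is not the authors' implementation.

```python
import math
import random


def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def tanh(v):
    return [math.tanh(x) for x in v]


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def mul(a, b):
    return [x * y for x, y in zip(a, b)]


def add(*vectors):
    return [sum(xs) for xs in zip(*vectors)]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


class FusionCellSketch:
    """Hypothetical LSTM variant; gate layout is an assumption, not the paper's."""

    def __init__(self, plm_dim, bio_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # Standard LSTM gates (i, f, o) and candidate (g) act on [x_t ; h_prev].
        self.W = {k: rand_matrix(hidden_dim, plm_dim + hidden_dim, rng)
                  for k in "ifog"}
        # Assumed extra parameters: project traditional features into the
        # cell-state space and gate how much of them is written at each step.
        self.W_bio = rand_matrix(hidden_dim, bio_dim, rng)
        self.W_fuse = rand_matrix(hidden_dim, bio_dim + hidden_dim, rng)

    def step(self, x_t, b_t, h_prev, c_prev):
        xh = x_t + h_prev  # list concatenation: PPLM embedding and hidden state
        i = sigmoid(matvec(self.W["i"], xh))
        f = sigmoid(matvec(self.W["f"], xh))
        o = sigmoid(matvec(self.W["o"], xh))
        g = tanh(matvec(self.W["g"], xh))
        # Traditional features enter the cell state directly, not the input path.
        s = sigmoid(matvec(self.W_fuse, b_t + h_prev))
        bio = tanh(matvec(self.W_bio, b_t))
        c = add(mul(f, c_prev), mul(i, g), mul(s, bio))
        h = mul(o, tanh(c))
        return h, c

    def encode(self, plm_feats, bio_feats):
        """Return one fused hidden vector per residue."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        fused = []
        for x_t, b_t in zip(plm_feats, bio_feats):
            h, c = self.step(x_t, b_t, h, c)
            fused.append(h)
        return fused


# Toy input: 6 residues, 4-dim "PPLM" vectors, 3-dim "traditional" vectors.
rng = random.Random(2)
plm = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(6)]
bio = [[rng.uniform(0, 1) for _ in range(3)] for _ in range(6)]
fused = FusionCellSketch(plm_dim=4, bio_dim=3, hidden_dim=5, seed=1).encode(plm, bio)
```

In a full pipeline, the per-residue vectors in `fused` would be passed to a downstream encoder (a transformer-based layer in the paper) that outputs a disorder score for each residue. That stage is omitted here, as are bias terms and training.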
Availability and implementation: To facilitate accessibility for experimental researchers, a user-friendly and publicly available webserver for the FusionEncoder predictor has been deployed at http://bliulab.net/FusionEncoder/. FusionEncoder is expected to serve as a valuable tool for the accurate identification of IDRs.
© The Author(s) 2025. Published by Oxford University Press.