Identifying molecular recognition features in intrinsically disordered regions of proteins by transfer learning
- PMID: 31504193
- DOI: 10.1093/bioinformatics/btz691
Abstract
Motivation: Protein intrinsic disorder describes the tendency of sequence residues not to fold into a rigid three-dimensional shape by themselves. However, some of these disordered regions can transition from disorder to order when interacting with another molecule, in segments known as molecular recognition features (MoRFs). Previous analysis has shown that these MoRF regions are indirectly encoded within the prediction of residue disorder as low-confidence predictions [i.e. in a semi-disordered state, P(D)≈0.5]. Thus, what has been learned for disorder prediction may be transferable to MoRF prediction. Transferring the internal characterization of protein disorder to the prediction of MoRF residues would allow us to take advantage of the large training set available for disorder prediction, enabling the training of larger analytical models than is feasible on the small number of currently available annotated MoRF proteins. In this paper, we propose a new method for MoRF prediction by transfer learning from the SPOT-Disorder2 ensemble models built for disorder prediction.
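The observation that MoRF regions tend to appear as low-confidence disorder predictions, P(D)≈0.5, can be illustrated with a minimal sketch. This is not the SPOT-MoRF method, which uses transfer learning from the SPOT-Disorder2 models. It is only a simple heuristic that scans per-residue disorder probabilities for contiguous semi-disordered stretches. The function name `semi_disordered_segments`, the tolerance `tol`, the minimum length `min_len`, and the example probabilities are all hypothetical choices made for illustration.

```python
from typing import List, Tuple


def semi_disordered_segments(
    p_disorder: List[float], tol: float = 0.15, min_len: int = 3
) -> List[Tuple[int, int]]:
    """Return (start, end) pairs (0-based, end exclusive) of contiguous
    residues whose disorder probability lies within `tol` of 0.5.

    `tol` and `min_len` are illustrative assumptions, not published values.
    """
    segments = []
    start = None  # index where the current semi-disordered run began
    for i, p in enumerate(p_disorder):
        in_band = abs(p - 0.5) <= tol
        if in_band and start is None:
            start = i
        elif not in_band and start is not None:
            # The run ended at i; keep it only if it is long enough.
            if i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(p_disorder) - start >= min_len:
        segments.append((start, len(p_disorder)))
    return segments


# Made-up per-residue disorder probabilities, for demonstration only.
probs = [0.95, 0.9, 0.55, 0.5, 0.45, 0.6, 0.92, 0.1, 0.05, 0.48, 0.97]
print(semi_disordered_segments(probs))  # prints [(2, 6)]
```

A fixed probability band like this ignores sequence context, which is one reason a learned model that reuses the internal representation of a disorder predictor may be preferable to thresholding its output.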
Results: We confirm that directly training on the MoRF set with a randomly initialized model yields substantially poorer performance on independent test sets than the transfer-learning-based method SPOT-MoRF, for both deep and simple networks. Comparison with current state-of-the-art techniques shows that SPOT-MoRF is superior at identifying MoRF binding regions in proteins across two independent testing sets, including our new dataset of >800 protein chains. These test chains share <30% sequence similarity with all training and validation proteins used in SPOT-Disorder2 and SPOT-MoRF, and provide a much-needed large-scale update on the performance of current MoRF predictors. The method is expected to be useful in locating functional disordered regions in proteins.
Availability and implementation: SPOT-MoRF and its data are available as a web server and as a standalone program at: http://sparks-lab.org/jack/server/SPOT-MoRF/index.php.
Supplementary information: Supplementary data are available at Bioinformatics online.
© The Author(s) 2019. Published by Oxford University Press. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.