Discovering molecular features of intrinsically disordered regions by using evolution for contrastive learning
- PMID: 35767567
- PMCID: PMC9275697
- DOI: 10.1371/journal.pcbi.1010238
Abstract
Intrinsically disordered regions (IDRs) are widespread in the proteome but remain relatively poorly understood. A major challenge to their characterization is identifying the molecular features that mediate their functions, such as short motifs, amino acid repeats and physicochemical properties. Here, we introduce a proteome-scale feature discovery approach for IDRs. Our approach, which we call "reverse homology", exploits the principle that important functional features are conserved over evolution. We use this principle as a contrastive learning signal for deep learning: given a set of homologous IDRs, the neural network must correctly pick out a held-out homolog from among a set of IDRs sampled randomly from the proteome. We pair reverse homology with a simple architecture and standard interpretation techniques, and show that the network learns conserved features of IDRs that can be interpreted as motifs, repeats, or bulk features such as charge or amino acid propensities. We also show that our model can produce visualizations of which residues and regions are most important to IDR function, generating hypotheses for uncharacterized IDRs. Our results suggest that feature discovery using unsupervised neural networks is a promising avenue for gaining systematic insight into poorly understood protein sequences.
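The reverse homology objective can be illustrated with a toy sketch. This is not the authors' implementation: it assumes an InfoNCE-style softmax cross-entropy over similarity scores, and it replaces the trainable neural network with a hypothetical fixed encoder (amino acid composition, one of the bulk features mentioned above). The function names `composition` and `reverse_homology_loss`, the `temperature` parameter, and the example sequences are illustrative assumptions, not taken from the paper.

```python
import math
from typing import List, Sequence, Tuple

# The 20 standard amino acids, in a fixed order for the feature vector.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq: str) -> List[float]:
    """Hypothetical stand-in encoder: amino acid composition of a sequence.

    Non-standard characters are ignored; an empty sequence maps to zeros.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    n = 0
    for aa in seq.upper():
        if aa in counts:
            counts[aa] += 1
            n += 1
    return [counts[aa] / n if n else 0.0 for aa in AMINO_ACIDS]


def mean_vector(vectors: Sequence[Sequence[float]]) -> List[float]:
    """Element-wise mean, used to summarize the set of homologous IDRs."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def reverse_homology_loss(
    homolog_set: Sequence[str],
    held_out: str,
    random_idrs: Sequence[str],
    temperature: float = 0.1,  # assumed hyperparameter
) -> Tuple[float, int]:
    """Softmax cross-entropy for picking the held-out homolog among candidates.

    Candidate 0 is the held-out homolog; the rest are randomly sampled IDRs.
    Returns (loss, index of the highest-scoring candidate).
    """
    query = mean_vector([composition(s) for s in homolog_set])
    candidates = [held_out, *random_idrs]
    logits = [dot(query, composition(c)) / temperature for c in candidates]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    loss = log_z - logits[0]  # -log softmax probability of the true homolog
    predicted = max(range(len(logits)), key=logits.__getitem__)
    return loss, predicted


if __name__ == "__main__":
    # Toy sequences invented for illustration only.
    homologs = ["SGSGSGRGG", "SGGSGRGSG", "GSSGRGGSG"]
    decoys = ["EEDDEEKKE", "PPAPPQPPA", "KKEEKDDKE"]
    loss, pred = reverse_homology_loss(homologs, "SGRGSGGSS", decoys)
    print(f"loss={loss:.3f}, predicted candidate={pred}")
```

In a real setting, the fixed composition encoder would presumably be replaced by a trainable network whose parameters are updated to minimize this loss over many homolog sets; the fixed encoder here only shows how a conserved bulk feature can already separate a homolog from random IDRs in a toy case.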
Conflict of interest statement
I have read the journal’s policy and the authors of this manuscript have the following competing interests: AMM is a Consultant to Dewpoint Therapeutics Inc.