Predicting the involvement of polyQ- and polyA in protein-protein interactions by their amino acid context
- PMID: 39323775
- PMCID: PMC11422028
- DOI: 10.1016/j.heliyon.2024.e37861
Abstract
Homorepeats, specifically polyglutamine (polyQ) and polyalanine (polyA), are often implicated in protein-protein interactions (PPIs), yet a method to predict whether a homorepeat participates in a protein interaction is lacking. We propose a machine learning approach that identifies PPI-involved polyQ and polyA regions within the human proteome based on known interacting regions. Using the dataset of human homorepeats, we identified 157 polyQ and 745 polyA regions potentially involved in PPIs. Machine learning models trained on amino acid context and homorepeat length showed high precision (0.90-0.98) but variable recall (0.42-0.85). Random forest outperformed the other models (AUC polyQ = 0.686, AUC polyA = 0.732) when using positions -10 to +10 surrounding the homorepeat. Integrating paralog information improved predictions only marginally and was excluded for model simplicity. Further optimization revealed that, for polyQ, using the surrounding amino acid positions from -6 to +6 increased the AUC to 0.715; for polyA, no improvement was found. Incorporating coiled coil overlap information enhanced polyA predictions (AUC = 0.745) but not polyQ predictions. Finally, we applied these models to predict PPI involvement across all polyQ and polyA regions, identifying potential interactions. Case studies illustrated the method's predictive capacity, highlighting known interacting regions with high scores and clarifying potential false negatives.
Keywords: Homorepeat; Machine learning; Polyalanine; Polyglutamine; Protein-protein interaction.
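The abstract describes models trained on the amino acid context around each homorepeat (positions -10 to +10) and on homorepeat length. The following is a minimal sketch of how such features might be assembled, not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the definition of a homorepeat as a pure run of at least 4 identical residues, the "-" padding symbol for positions beyond the sequence ends, and the one-hot encoding.

```python
import re

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "-"  # assumed padding symbol for positions beyond the sequence ends


def find_homorepeats(seq, residue, min_len=4):
    """Return (start, end) spans, end exclusive, of pure runs of `residue`.

    Assumption: a homorepeat is an uninterrupted run of at least `min_len`
    identical residues. The source does not state its exact definition.
    """
    pattern = re.compile(f"{re.escape(residue)}{{{min_len},}}")
    return [m.span() for m in pattern.finditer(seq)]


def context_window(seq, start, end, flank=10):
    """Return the `flank` residues before and after the repeat, padded."""
    left = seq[max(0, start - flank):start].rjust(flank, PAD)
    right = seq[end:end + flank].ljust(flank, PAD)
    return left, right


def encode(seq, start, end, flank=10):
    """One-hot encode the flanking residues and append the repeat length.

    Padding and non-standard residues become all-zero blocks.
    """
    left, right = context_window(seq, start, end, flank)
    features = []
    for aa in left + right:
        features.extend(1 if aa == sym else 0 for sym in AMINO_ACIDS)
    features.append(end - start)  # homorepeat length
    return features


if __name__ == "__main__":
    toy_seq = "MKT" + "Q" * 6 + "LLPAG"  # toy sequence, not real data
    for start, end in find_homorepeats(toy_seq, "Q"):
        print(start, end, context_window(toy_seq, start, end, flank=4))
```

Each resulting vector has 2 × flank × 20 + 1 entries and could be passed to a classifier such as a random forest. Setting `flank=6` would correspond to the narrower -6 to +6 window that the abstract reports as better for polyQ.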
© 2024 The Authors.
Conflict of interest statement
The authors declare that they have no competing interests.