Heliyon. 2024 Sep 14;10(18):e37861.
doi: 10.1016/j.heliyon.2024.e37861. eCollection 2024 Sep 30.

Predicting the involvement of polyQ- and polyA in protein-protein interactions by their amino acid context


Pablo Mier et al. Heliyon.

Abstract

Homorepeats, specifically polyglutamine (polyQ) and polyalanine (polyA), are often implicated in protein-protein interactions (PPIs). So far, no method exists to predict whether a homorepeat participates in protein interactions. We propose a machine learning approach to identify PPI-involved polyQ and polyA regions within the human proteome based on known interacting regions. Using the dataset of human homorepeats, we identified 157 polyQ and 745 polyA regions potentially involved in PPIs. Machine learning models, trained on amino acid context and homorepeat length, demonstrated high precision (0.90-0.98) but variable recall (0.42-0.85). Random forest outperformed the other models (AUC polyQ = 0.686, AUC polyA = 0.732) when using positions -10 to +10 surrounding the homorepeat. Integrating paralog information marginally improved predictions but was excluded for model simplicity. Further optimization revealed that, for polyQ, restricting the surrounding amino acid positions to -6 to +6 increased the AUC to 0.715. For polyA, no improvement was found. Incorporating coiled coil overlap information enhanced polyA predictions (AUC = 0.745) but not polyQ predictions. Finally, we applied these models to predict PPI involvement across all polyQ and polyA regions, identifying potential interactions. Case studies illustrated the method's predictive capacity, highlighting known interacting regions with high scores and elucidating potential false negatives.
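To make the idea of "amino acid context and homorepeat length" as model input concrete, the following is a minimal sketch of how the residues flanking a homorepeat might be turned into a feature vector. The abstract does not specify the encoding, so everything here is an assumption for illustration: the one-hot scheme, the function name `context_features`, the toy sequence, and the variable names (`M3A`, `P2L`), which borrow the plus/minus convention described in the Fig. 3 caption.

```python
# Hypothetical encoding, not the authors' implementation.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def context_features(sequence, start, end, window=10):
    """One-hot encode the residues flanking a homorepeat at sequence[start:end].

    Assumed naming: "M3A" is 1 if alanine sits 3 positions before the
    repeat, "P2L" is 1 if leucine sits 2 positions after it. Positions that
    fall outside the sequence leave all of their indicators at 0.
    """
    features = {"length": end - start}
    for offset in range(1, window + 1):
        before = start - offset      # index of the residue at position -offset
        after = end - 1 + offset     # index of the residue at position +offset
        res_before = sequence[before] if before >= 0 else None
        res_after = sequence[after] if after < len(sequence) else None
        for aa in AMINO_ACIDS:
            features[f"M{offset}{aa}"] = int(res_before == aa)
            features[f"P{offset}{aa}"] = int(res_after == aa)
    return features


# Toy sequence (invented) with a polyQ of length 6 at indices 5..10.
seq = "MSLKPQQQQQQAGTW"
feats = context_features(seq, start=5, end=11, window=3)
print(feats["length"], feats["M1P"], feats["P1A"])
```

A dictionary like this, one per homorepeat region, could then be passed to any standard classifier; narrowing `window` from 10 to 6 corresponds to the kind of context reduction the abstract reports for polyQ.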

Keywords: Homorepeat; Machine learning; Polyalanine; Polyglutamine; Protein-protein interaction.

Conflict of interest statement

The authors declare that they have no competing interests.

Figures

Fig. 1
Flowchart illustrating data preparation, model construction and PPI prediction.
Fig. 2
ROC curves obtained with machine learning models created with random forest (RF), boosted logistic regression (logreg), k-nearest neighbors (knn), support vector machines with linear kernel (SVM), and neural network (NN), for (A) polyQ and (B) polyA regions, with positive interaction data obtained from Interactome3D.
Fig. 3
(A) Distribution of prediction scores for polyQ regions. (B) Distribution of prediction scores versus polyQ length, for polyQ regions known to interact (positives) and those with no known interaction (negatives). (C) Importance of the top 20 machine learning variables for the prediction for polyQ regions; the first character denotes whether it is a plus (P) or a minus (M) position, the second the position, and the third the amino acid. (D) Distribution of prediction scores for polyA regions. (E) Distribution of prediction scores versus polyA length, for polyA regions known to interact (positives) and those with no known interaction (negatives). (F) Importance of the top 20 machine learning variables for the prediction for polyA regions; the first character denotes whether it is a plus (P) or a minus (M) position, the second the position, and the third the amino acid.
Fig. 4
(A) Pairwise alignment of human Ubiquitin carboxyl-terminal hydrolase 6 (USP6) with Ubiquitin carboxyl-terminal hydrolase 2 (USP2). Structure of human USP2 (yellow) with ubiquitin (purple) (PDB:2HD5). The polyQ region is marked in red and focused in the inset. Key amino acids are indicated. (B) Pairwise alignment of human Ubiquitin carboxyl-terminal hydrolase 6 (USP6) with Ubiquitin carboxyl-terminal hydrolase 15 (USP15). Structure of human USP15 (yellow) (PDB:6CPM). The polyQ region is marked in red and focused in the inset. Key amino acids are indicated. Missing signal for fragment 376YQQ378 (discontinuous line) suggests that it is flexible.
Fig. 5
(A) Pairwise alignment of human Heterogeneous nuclear ribonucleoprotein C-like 3 (HNRC3) with Poly(U)-binding-splicing factor (PUF60; a.k.a. FIR). Structure of the RRM2 domain of human FIR (yellow) in complex with the Nbox peptide from FBP (purple) (PDB:2KXH). The polyA region is marked in red and focused in the inset. Key amino acids are indicated. (B) Pairwise alignment of human Heterogeneous nuclear ribonucleoprotein C-like 3 (HNRC3) with the 91% identical Heterogeneous nuclear ribonucleoproteins C1/C2 (HNRNPC). Structure of the protein (yellow) in complex with 5′-AUUUUUC-3′ RNA (pink) (PDB:2MXY). The polyA region is marked in red and focused in the inset. Key amino acids are indicated.
Fig. 6
Structure of the ATP-dependent RNA helicase DHX15 in complex with a fragment from NKRF containing a G-patch motif (purple). A domain in DHX15 (positions 338 to 476) is marked in white (with the polyQ in red) and the rest of the sequence in yellow. The polyQ lies on the side of DHX15 opposite the interacting molecule.
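The variable naming described in the Fig. 3 caption (a P or M prefix, then the position, then the amino acid) can be decoded mechanically. The sketch below assumes the three parts are concatenated without separators, as in `M10Q` or `P3L`; the caption does not show the literal labels, so this format and the helper name `parse_variable` are assumptions.

```python
import re

# Assumed label format: side (P or M), position digits, one-letter amino acid.
_VARIABLE = re.compile(r"^([PM])(\d+)([A-Z])$")


def parse_variable(name):
    """Return (signed offset, amino acid) for a label such as "M10Q"."""
    match = _VARIABLE.match(name)
    if match is None:
        raise ValueError(f"unrecognized variable name: {name!r}")
    side, position, residue = match.groups()
    # M means before the homorepeat (negative offset), P means after it.
    offset = int(position) if side == "P" else -int(position)
    return offset, residue


print(parse_variable("M10Q"))  # glutamine at position -10
print(parse_variable("P3L"))   # leucine at position +3
```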
