This is a preprint.
PairK: Pairwise k-mer alignment for quantifying protein motif conservation in disordered regions
- PMID: 39091826
- PMCID: PMC11291154
- DOI: 10.1101/2024.07.23.604860
Update in:
- PairK: Pairwise k-mer alignment for quantifying protein motif conservation in disordered regions. Protein Sci. 2025 Jan;34(1):e70004. doi: 10.1002/pro.70004. PMID: 39720898. Free PMC article.
Abstract
Protein-protein interactions are often mediated by a modular peptide recognition domain binding to a short linear motif (SLiM) in the disordered region of another protein. The ability to predict domain-SLiM interactions would allow researchers to map protein interaction networks, predict the effects of perturbations to those networks, and develop biologically meaningful hypotheses. Unfortunately, sequence database searches for SLiMs typically return mostly biologically irrelevant motif matches (false positives). To improve the prediction of novel SLiM interactions, researchers apply filters that discriminate between biologically relevant and improbable motif matches. One promising criterion for identifying biologically relevant SLiMs is the sequence conservation of the motif, which exploits the fact that functional motifs are more likely to be conserved than spurious motif matches. However, the difficulty of aligning disordered regions has significantly hampered the utility of this approach. We present PairK (pairwise k-mer alignment), a method that quantifies motif conservation in disordered regions without requiring a multiple sequence alignment (MSA). PairK outperforms both standard MSA-based conservation scores and a modern LLM-based conservation score predictor on the task of identifying biologically important motif instances. PairK can quantify conservation over wider phylogenetic distances than MSAs, indicating that SLiMs may be more conserved than MSA-based metrics imply. PairK is available as open-source code at https://github.com/jacksonh1/pairk.
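The idea named above, scoring a motif by how well it is matched in each homolog rather than by reading a column of an MSA, can be illustrated with a minimal sketch. It assumes gapless k-mer comparison with a toy identity score (+1 per identical residue) in place of a substitution matrix, and it summarizes conservation as the mean fraction of identical positions across homologs. The function names (`score_kmer`, `best_kmer_match`, `kmer_conservation`) and the example sequences are hypothetical; this is not the PairK package's API or its scoring scheme.

```python
from typing import Dict, Tuple


def score_kmer(a: str, b: str) -> int:
    """Toy gapless score: +1 per identical residue.
    Assumption: stands in for a real substitution matrix."""
    return sum(1 for x, y in zip(a, b) if x == y)


def best_kmer_match(query_kmer: str, homolog: str) -> Tuple[str, int]:
    """Slide over every k-mer of the homolog (no gaps) and keep the
    highest-scoring one; ties keep the earliest position."""
    k = len(query_kmer)
    best, best_score = "", -1
    for i in range(len(homolog) - k + 1):
        candidate = homolog[i:i + k]
        s = score_kmer(query_kmer, candidate)
        if s > best_score:
            best, best_score = candidate, s
    return best, best_score


def kmer_conservation(query_kmer: str, homologs: Dict[str, str]) -> float:
    """Mean fraction of identical positions between the query k-mer and
    its best match in each homolog (hypothetical summary statistic)."""
    k = len(query_kmer)
    fractions = []
    for seq in homologs.values():
        if len(seq) < k:
            continue  # homolog too short to contain any k-mer
        _, s = best_kmer_match(query_kmer, seq)
        fractions.append(s / k)
    return sum(fractions) / len(fractions) if fractions else 0.0


if __name__ == "__main__":
    # Hypothetical disordered-region fragments from three homologs.
    homologs = {"h1": "SSDFPPPPAEG", "h2": "TTEFPAPPKDE", "h3": "GGGSSSGGG"}
    print(kmer_conservation("FPPPP", homologs))
```

For the query k-mer FPPPP, the three hypothetical homologs contribute 1.0 (exact match), 0.8 (best match FPAPP) and 0.0, for a mean of 0.6. Because each homolog is searched independently, no global alignment of the disordered region is needed for the motif to be found. A more realistic sketch would swap the identity score for a substitution matrix such as BLOSUM62.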