Protein Sci. 2025 Jan;34(1):e70004.
doi: 10.1002/pro.70004.

PairK: Pairwise k-mer alignment for quantifying protein motif conservation in disordered regions

Jackson C Halpin et al. Protein Sci. 2025 Jan.

Abstract

Protein-protein interactions are often mediated by a modular peptide recognition domain binding to a short linear motif (SLiM) in the disordered region of another protein. To understand which features of SLiMs are important for binding, and to identify motif instances that are important for biological function, it is useful to examine the evolutionary conservation of motifs across homologous proteins. However, the intrinsically disordered regions (IDRs) in which SLiMs reside evolve rapidly. Consequently, multiple sequence alignment (MSA) of IDRs often misaligns SLiMs and underestimates their conservation. We present PairK (pairwise k-mer alignment), an MSA-free method to align and quantify the relative local conservation of subsequences within an IDR. Lacking a ground truth for conservation, we tested PairK on the task of distinguishing biologically important motif instances from background motifs, under the assumption that biologically important motifs are more conserved. The method outperforms both standard MSA-based conservation scores and a modern LLM-based conservation score predictor. PairK can quantify conservation over wider phylogenetic distances than MSAs, indicating that some SLiMs are more conserved than MSA-based metrics imply. PairK is available as an open-source Python package at https://github.com/jacksonh1/pairk. It is designed to be easily adapted for use with other SLiM tools and for diverse applications.

Keywords: conservation; intrinsically disordered proteins; multiple sequence alignment; short linear motif.
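
The abstract describes PairK only at a high level. The sketch below illustrates the general idea of MSA-free pairwise k-mer alignment under simplifying assumptions. It is not the `pairk` package's API or the authors' exact scoring. Several choices here are assumptions: a toy identity score stands in for a substitution matrix, each homolog contributes only its single best-scoring ungapped k-mer, per-position conservation is the fraction of homolog residues identical to the query, and "relative" conservation is taken as a z-score across all k-mers of the same query IDR. All function names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional


def kmer_score(a: str, b: str) -> int:
    # Hypothetical toy stand-in for a substitution matrix: +1 per identical position.
    return sum(1 for x, y in zip(a, b) if x == y)


def best_kmer_match(query_kmer: str, homolog_idr: str) -> Optional[str]:
    """Return the best-scoring ungapped k-mer in homolog_idr (first one wins ties),
    or None if the homolog is shorter than k."""
    k = len(query_kmer)
    best, best_score = None, -1
    for i in range(len(homolog_idr) - k + 1):
        candidate = homolog_idr[i:i + k]
        score = kmer_score(query_kmer, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best


def kmer_conservation(query_idr: str, homolog_idrs: List[str], k: int) -> List[float]:
    """Score every k-mer of the query: align the best k-mer from each homolog,
    then average the per-column fraction of residues identical to the query."""
    scores = []
    for start in range(len(query_idr) - k + 1):
        q = query_idr[start:start + k]
        matches = [m for m in (best_kmer_match(q, h) for h in homolog_idrs) if m is not None]
        if not matches:
            scores.append(0.0)  # no homolog long enough to supply a k-mer
            continue
        columns = [sum(m[j] == q[j] for m in matches) / len(matches) for j in range(k)]
        scores.append(mean(columns))
    return scores


def z_scores(values: List[float]) -> List[float]:
    """Express each k-mer's score relative to the other k-mers of the same IDR
    (assumed normalization)."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


if __name__ == "__main__":
    # Hypothetical toy IDRs: a shared PPLPP stretch with unrelated flanks.
    query = "AAPPLPPRNA"
    homologs = ["GSPPLPPKTQ", "KLPPLPPDE"]
    raw = kmer_conservation(query, homologs, k=5)
    for i, (r, z) in enumerate(zip(raw, z_scores(raw))):
        print(i, query[i:i + 5], round(r, 2), round(z, 2))
```

On this toy input, the k-mer `PPLPP` (index 2) appears exactly in both homologs despite different flanking sequence, and it receives the highest score. Because each query k-mer is matched independently and without gaps, the result does not depend on globally aligning the rapidly evolving flanks. That is the motivation the abstract gives for avoiding MSAs.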
