Mol Syst Biol. 2022 Jan;18(1):e10584.

doi: 10.15252/msb.202110584.

Proteome-scale mapping of binding sites in the unstructured regions of the human proteome

Affiliations

¹ Department of Chemistry - BMC, Uppsala University, Uppsala, Sweden.
² Division of Cancer Biology, The Institute of Cancer Research, London, UK.
³ Department of Medical Biochemistry and Microbiology, Uppsala University, Uppsala, Sweden.

PMID: 35044719
PMCID: PMC8769072
DOI: 10.15252/msb.202110584

Caroline Benz et al. Mol Syst Biol. 2022 Jan.

Abstract

Specific protein-protein interactions are central to all processes that underlie cell physiology. Numerous studies have together identified hundreds of thousands of human protein-protein interactions. However, many interactions remain to be discovered, and low affinity, conditional, and cell type-specific interactions are likely to be disproportionately underrepresented. Here, we describe an optimized proteomic peptide-phage display library that tiles all disordered regions of the human proteome and allows the screening of ~1,000,000 overlapping peptides in a single binding assay. We define guidelines for processing, filtering, and ranking the results and provide PepTools, a toolkit to annotate the identified hits. We uncovered >2,000 interaction pairs for 35 known short linear motif (SLiM)-binding domains and confirmed the quality of the produced data by complementary biophysical or cell-based assays. Finally, we show how the amino acid-resolution binding site information can be used to pinpoint functionally important disease mutations and phosphorylation events in intrinsically disordered regions of the proteome. The optimized human disorderome library paired with PepTools represents a powerful pipeline for unbiased proteome-wide discovery of SLiM-based interactions.

Keywords: intrinsically disordered regions; peptides; phage display; protein-protein interactions; short linear motifs.
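
The tiling step described in the abstract can be illustrated with a short sketch. It assumes that disordered regions are already available as 1‐based start/end coordinates (for example from a disorder predictor) and uses hypothetical peptide length and step values; neither the function name `tile_region` nor these parameters come from the text above.

```python
from typing import List, Tuple

# Hypothetical tiling parameters (assumptions, not taken from the text):
# 16-residue peptides with a step of 4, i.e. 12 residues of overlap.
PEPTIDE_LENGTH = 16
STEP = 4


def tile_region(sequence: str, start: int, end: int,
                length: int = PEPTIDE_LENGTH,
                step: int = STEP) -> List[Tuple[int, str]]:
    """Tile residues start..end (1-based, inclusive) with overlapping peptides.

    Returns (1-based start position, peptide) pairs. The last window is
    shifted so that it ends exactly at the region end; a region shorter
    than `length` yields a single, shorter peptide.
    """
    region = sequence[start - 1:end]
    if len(region) <= length:
        return [(start, region)]
    last_offset = len(region) - length
    offsets = list(range(0, last_offset + 1, step))
    if offsets[-1] != last_offset:
        offsets.append(last_offset)  # make sure the C-terminal residues are covered
    return [(start + off, region[off:off + length]) for off in offsets]


if __name__ == "__main__":
    protein = "MSEQNKLTPAAGSSDRPSTPEKKPRSGSGSPEQ"  # made-up sequence
    for pos, peptide in tile_region(protein, 1, len(protein)):
        print(pos, peptide)
```

Shifting the last window, rather than padding it, keeps every peptide at full length at the cost of one window with a smaller step; a full library design would likely also need to remove identical peptides shared between proteins.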

Figures

**Figure 1. ProP‐PD workflow, library design and quality, and initial evaluation of selection results**
(A) Schematic visualization of library design, cloning process, phage selection, and data analysis.
(B) Two main library parameters were explored: (i) selection results from the whole HD2 library compared with those from sublibraries grouped by subcellular localization, and (ii) display of the HD2 peptide library on the phage coat proteins P8 (multivalent, HD2 P8) and P3 (monovalent, HD2 P3).
(C) Percentage of peptides reproduced in pairwise comparisons between replicate selections for the same bait (blue), for the same control bait (green), and for different bait proteins (red).
(D) Percentage of selected peptides that overlap in pairwise comparisons between replicate selections for the same bait (blue), for the same control bait (green), and for different bait proteins (red).
(E) Log₁₀ enrichment probability of the ELM‐defined motif consensus in peptides selected for the bait that binds that consensus (blue) and for all other baits (red).
(F) CompariMotif similarity of the *de novo* SLiMFinder‐defined enriched motif in the overlapping and replicated peptides to the established ELM consensus for the bait (blue) and to all other ELM classes (red).
(G) Selection quality metrics per bait, including the metrics from panels (C) through (F). Enriched *de novo* consensus shows the P‐value of the SLiMFinder‐discovered enriched motif, and Enriched Interactors shows the probability of the selection returning the observed number of previously validated interactors for the bait by chance. An asterisk denotes that no motif is defined for the bait. Data for the panel are available in Dataset EV4.
Data information: Boxen plots (C–F) are used to visualize the distribution of values more accurately. The central section has two blocks, each containing 25% of the data, split by the median (dark black bar); each additional block contains 50% of the data of the previous block. Sample sizes are (C) and (D): n = 358 (bait‐bait), 156 (control‐control), and 23,276 (bait‐other); (E): n = 61 (bait‐bait) and 7,633 (bait‐other); (F): n = 40 (bait‐bait) and 1,560 (bait‐other).
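
The replicate comparisons in this figure and the boxen plots described in its data information can be made concrete with a small sketch. `percent_reproduced` assumes the percentage is taken relative to the first selection of each pair (the caption does not state the denominator), and `letter_values` computes the nested box boundaries of a letter‐value (boxen) plot using linear‐interpolation quantiles; both function names are hypothetical.

```python
import math
from typing import List, Sequence, Set, Tuple


def percent_reproduced(selection_a: Set[str], selection_b: Set[str]) -> float:
    """Percentage of peptides in selection_a that are also in selection_b.

    Assumption: the denominator is |selection_a|; dividing by the union
    instead would give a Jaccard index.
    """
    if not selection_a:
        return 0.0
    return 100.0 * len(selection_a & selection_b) / len(selection_a)


def quantile(sorted_values: Sequence[float], p: float) -> float:
    """Quantile of pre-sorted data by linear interpolation (0 <= p <= 1)."""
    pos = p * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def letter_values(data: Sequence[float], levels: int) -> List[Tuple[float, float]]:
    """(lower, upper) boundaries of the nested boxes of a boxen plot.

    Level 1 is the interquartile box (two blocks of 25% split by the median).
    Each further level moves the boundaries out to the next halved tail
    (12.5%, 6.25%, ...), so every new block holds half the data of the
    block inside it.
    """
    values = sorted(data)
    boxes = []
    for k in range(1, levels + 1):
        tail = 0.5 ** (k + 1)  # 0.25, 0.125, 0.0625, ...
        boxes.append((quantile(values, tail), quantile(values, 1.0 - tail)))
    return boxes
```

In practice a plotting library computes these boundaries itself; seaborn's `boxenplot`, for example, draws letter‐value plots directly from the raw values.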

**Figure 2. Benchmarking of metrics for ranking of ProP‐PD results, evaluation of motif rediscovery, and interactome to interactome comparisons of results**
(A) ROC curves of the metrics used to assign confidence levels.
(B) Boxen plot of the number of replicated peptides for motif‐containing peptides from the benchmarking datasets (blue) compared with all other selected peptides (red).
(C) As panel (B), showing overlapping peptides.
(D) As panel (B), showing the PSSM‐derived specificity determinant score, which measures the similarity of the selected peptides to the SLiMFinder‐discovered enriched motif. The score is the log₁₀ of the PSSMSearch PSSM probability.
(E) As panel (B), showing the log₁₀ of the normalized peptide count.
(F) As panel (B), showing the consensus confidence level defined from the replicated peptides, overlapping peptides, specificity determinant match, and normalized peptide count.
(G) Predictive power of the four confidence metrics and of the consensus confidence level metric, given as the area under the ROC curve (AUC) and the P‐value of a two‐sided Mann–Whitney–Wilcoxon test with Bonferroni correction (M‐W p).
(H) Benchmarking statistics of the four consensus confidence levels and of the grouped high/medium confidence levels. Recall is calculated on motif instances against the benchmarking dataset of 337 motif instances. Precision is calculated as the number of motif‐containing peptides over the number of peptides at a given confidence level.
(I) Partial network of ProP‐PD‐derived high/medium confidence interactors of NEDD4 WW4. Shown interactions are annotated as WW domain ligands in the ELM resource (black) or curated from the literature (orange). Line thickness indicates the number of quality metrics fulfilled by the hit (4, 3, or 2).
(J) Peptides matching previously validated NEDD4‐binding peptides from panel (I), annotated with the number of replicates (#R) and of overlapping peptides (#O; gray, two overlapping peptides for the region; green, three overlapping peptides).
(K) Interaction‐centric benchmarking metrics of ProP‐PD, BioPlex, and HuRI based on the 302 unique motif‐mediated interactions for the 337 motif instances in the motif benchmarking dataset. Found is the number of motif‐mediated interactions from the benchmarking dataset rediscovered by each method; Interactions is the total number of interactions returned by each method for the baits in the motif benchmarking dataset.
(L) Overlap of previously validated motif‐based PPIs (N = 302) in the ProP‐PD benchmarking dataset rediscovered by ProP‐PD, BioPlex, and HuRI.
(M) PABPC1 PPI network for proteins containing high/medium confidence peptides, annotated with BioPlex interaction data (magenta). Edge width represents the ProP‐PD confidence level. Black dots represent peptides that overlap with a known ELM instance. HuRI did not return any of these interactions.
(N) Overlap between the ProP‐PD interactions and the interactions in the HIPPIE database.
Data information: Boxen plots (B–F) are used to visualize the distribution of values more accurately. The central section has two blocks, each containing 25% of the data, split by the median (dark black bar); each additional block contains 50% of the data of the previous block. Asterisks denote the significance of a Mann–Whitney U test of the null hypothesis that the distributions underlying the two samples are the same (****P < 1.0 × 10⁻⁴). Sample sizes are n = 144 (motif) and n = 18,679 (other).
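
The benchmarking logic of this figure can be sketched as follows, under stated assumptions. The ROC AUC of a metric equals the Mann–Whitney U statistic of motif‐containing versus other peptides divided by the product of the two sample sizes; recall and precision follow the definitions given in the caption; and `rediscovered` counts benchmark interactions returned by a method, treating interactions as unordered protein pairs (an assumption). All function names are hypothetical, and the code is an illustration rather than the authors' pipeline.

```python
from typing import Iterable, List, Sequence, Set, Tuple


def mann_whitney_u(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """U statistic of `positives` vs `negatives`, using average ranks for ties."""
    combined = sorted([(v, 1) for v in positives] + [(v, 0) for v in negatives],
                      key=lambda item: item[0])
    ranks: List[float] = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    rank_sum = sum(r for r, (_, is_pos) in zip(ranks, combined) if is_pos)
    n_pos = len(positives)
    return rank_sum - n_pos * (n_pos + 1) / 2


def roc_auc(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """Area under the ROC curve = U / (n_pos * n_neg); ties count as 0.5."""
    return mann_whitney_u(positives, negatives) / (len(positives) * len(negatives))


def recall(found_instances: int, benchmark_instances: int = 337) -> float:
    """Motif instances rediscovered over all benchmark instances (337 in the caption)."""
    return found_instances / benchmark_instances


def precision(motif_peptides: int, peptides_at_level: int) -> float:
    """Motif-containing peptides over all peptides at one confidence level."""
    return motif_peptides / peptides_at_level


def rediscovered(benchmark: Iterable[Tuple[str, str]],
                 method: Iterable[Tuple[str, str]]) -> int:
    """Number of benchmark interactions also returned by a method (unordered pairs)."""
    bench: Set[frozenset] = {frozenset(pair) for pair in benchmark}
    found: Set[frozenset] = {frozenset(pair) for pair in method}
    return len(bench & found)
```

For real data, `scipy.stats.mannwhitneyu` and `sklearn.metrics.roc_auc_score` provide tested implementations; a Bonferroni correction multiplies each P‐value by the number of tests performed, capping the result at 1.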

**Figure 3. ProP‐PD selections capture interactions with a broad range of affinities**
(A) Sequence logos for the indicated bait proteins, generated by PepTools using the medium/high confidence set of ligands.
(B) Structures of KEAP1 Kelch, MDM2 SWIB, TLN1 PTB, and KPNB1 HEAT with the sequences of the bound peptides indicated (PDB codes 2FLU, 1YCR, 2G35, and 1O6O). Larger letters indicate residues that make up the consensus motifs.
(C) FP affinity determinations. Affinities were measured by first determining the K_D value of FITC‐labeled probe peptides and then determining the affinities of unlabeled peptides through competition experiments. All experiments were performed in triplicate (source data are provided). See Dataset EV6 for more details.
(D) Partial network of KPNB1 ligands. Edge thickness reflects the confidence levels. A gray dot indicates that the peptide has an FxFG motif; a red dot indicates an FxF‐COO⁻ motif. Previously known ligands reported in the HIPPIE database are indicated by a yellow circle.
(E) Schematic of KPNB1's role in nuclear transport, together with the identified FxFG‐ or FxF‐COO⁻‐containing ligands. The multiple FxFG repeats in NUP213, POM121/C, and NUP153 are indicated by yellow bars. Arrowheads indicate the KPNB1‐binding sites identified in HD2 selections.
(F) Sequence alignment of identified KPNB1‐binding peptides from proteins involved in nuclear transport (gray, two overlapping peptides for the region; green, three overlapping peptides; red, four overlapping peptides).
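
Two ideas from this figure lend themselves to a short sketch. The first functions model the direct FP titration used to obtain the K_D of a FITC‐labeled probe, using the exact 1:1 binding solution that accounts for probe depletion; they assume free and bound probe are equally bright, and the subsequent competition step for unlabeled peptides is not shown. The second part scans sequences for FxFG and C‐terminal FxF‐COO⁻ motifs with regular expressions. All names are hypothetical, and this is an illustration rather than the authors' analysis code.

```python
import math
import re
from typing import List


def fraction_probe_bound(protein_total: float, probe_total: float, kd: float) -> float:
    """Fraction of labeled probe in complex for 1:1 binding at equilibrium.

    Exact quadratic solution (no assumption that protein is in excess);
    all concentrations must share the unit of kd.
    """
    b = protein_total + probe_total + kd
    complex_conc = (b - math.sqrt(b * b - 4.0 * protein_total * probe_total)) / 2.0
    return complex_conc / probe_total


def fp_signal(protein_total: float, probe_total: float, kd: float,
              fp_free: float, fp_bound: float) -> float:
    """Expected FP signal, assuming free and bound probe are equally bright."""
    fb = fraction_probe_bound(protein_total, probe_total, kd)
    return fp_free + (fp_bound - fp_free) * fb


# FxFG anywhere; the lookahead reports overlapping repeats as separate hits
FXFG = re.compile(r"(?=F.FG)")
# FxF at the very C-terminus of the protein (FxF-COO-)
FXF_CTERM = re.compile(r"F.F\Z")


def fxfg_positions(sequence: str) -> List[int]:
    """1-based start positions of all FxFG motifs."""
    return [m.start() + 1 for m in FXFG.finditer(sequence)]


def has_c_terminal_fxf(protein_sequence: str) -> bool:
    """True if the full-length protein sequence ends in FxF."""
    return FXF_CTERM.search(protein_sequence) is not None
```

To estimate K_D, `fp_signal` could be passed as the model to a least‐squares routine such as `scipy.optimize.curve_fit`, with the probe concentration held fixed.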

**Figure 4. Library design parameters can influence data quality**
(A) Per‐bait comparison of the proportion of findable motifs in the ProP‐PD motif benchmarking dataset found by each library.
(B) Overview of PEX14‐binding peptides in PEX5 returned from different libraries (motif region highlighted in light blue, motif residues in bold).
(C) Summary statistics of the data in panel (A) comparing the recall and precision of the selections against the HD2 P8 library and sublibraries or the HD2 P3 library. HD2 P8 recall is calculated on the subset of motif instances that are present in the compared library.
(D) Amino acid frequencies (green) in (i) the human proteome, (ii) the predicted IDRs, (iii) the HD2 library design, (iv) the binding‐enriched phage pools from selections against the HD2 P8 library, and (v) combinatorial peptide phage display. The log₂ relative amino acid frequencies of HD2 P8 and of combinatorial peptide phage display versus the amino acid frequencies of predicted IDRs are shown on a blue‐to‐red gradient. Note the significant enrichment of tryptophan and depletion of lysine in the combinatorial peptide phage display selections (z‐score > 2, indicated by a white asterisk) but not in the ProP‐PD results.
Source data are available online for this figure.
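
The composition comparison in this figure can be approximated with the following sketch. It computes amino acid frequencies, log₂ ratios against a background such as the predicted IDRs, and a z‐score per residue. The z‐score here uses a binomial approximation (observed count versus the count expected from the background frequency); the caption does not say how its z‐scores were computed, so this choice and all function names are assumptions.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(sequences: Iterable[str]) -> Dict[str, float]:
    """Relative frequency of each standard amino acid across all sequences."""
    counts: Counter = Counter()
    for seq in sequences:
        counts.update(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def log2_enrichment(observed: Dict[str, float], background: Dict[str, float],
                    pseudo: float = 1e-6) -> Dict[str, float]:
    """log2(observed / background) per residue; the pseudo-frequency avoids log(0)."""
    return {aa: math.log2((observed[aa] + pseudo) / (background[aa] + pseudo))
            for aa in AMINO_ACIDS}


def binomial_z(observed_count: int, total_count: int, background_freq: float) -> float:
    """(observed - expected) / SD under a binomial model of the background (assumption)."""
    expected = total_count * background_freq
    sd = math.sqrt(total_count * background_freq * (1.0 - background_freq))
    return (observed_count - expected) / sd


def flag_residues(z_scores: Dict[str, float], threshold: float = 2.0) -> List[str]:
    """Residues whose |z| exceeds the threshold (enriched or depleted)."""
    return sorted(aa for aa, z in z_scores.items() if abs(z) > threshold)
```

Using the absolute z‐score flags both enrichment (as for tryptophan) and depletion (as for lysine) with a single threshold.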

**Figure 5. KPNA4‐binding peptides are functional NLSs**
(A) Sequence logos of four different NLS classes binding to KPNA4, generated using PepTools.
(B) Structure of KPNA2 (PDB: 1PJN; minor groove peptide PDB: 3ZIP) with ligands bound to the major (purple) and minor (blue) grooves.
(C) Representative cellular localization experiment. HEK293 cells were transiently transfected with the NLS sensor, fixed 36 h after transfection, and imaged by epifluorescence microscopy. Nuclei were stained with DAPI (n = 3 independent experiments; scale bar, 10 μm).
(D) FP competition experiments using FITC‐Myc_320–328 as a probe for the major groove (blue) or FITC‐NCOR2_1307–1322 as a probe for the minor groove, competed with unlabeled DMTF1_44–59, KDR_958–973, and TPX2_312–327 peptides (n = 3 technical replicates; individual data points are shown. Source data are provided).
(E) Sequences of the tested NLSs, together with the outcome of the FP affinity measurements and the localization of the GFP‐tagged peptides (see Appendix Fig S8 for details).
(F) Mutational analysis of identified NLSs in the context of full‐length proteins using mCherry‐tagged HJURP, SPRTN, and HNRNPC. Scale bar, 10 μm.
Source data are available online for this figure.
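
Sequence logos such as those generated with PepTools can be approximated from aligned peptides with a standard Shannon information calculation. This sketch assumes the peptides are already aligned to equal length, uses a uniform background over 20 amino acids, and applies no small‐sample correction; PepTools may weight letters differently, and the function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def position_frequencies(peptides: Sequence[str]) -> List[Dict[str, float]]:
    """Per-position residue frequencies for equal-length, pre-aligned peptides."""
    length = len(peptides[0])
    if any(len(p) != length for p in peptides):
        raise ValueError("peptides must be aligned to equal length")
    columns = []
    for i in range(length):
        counts = Counter(p[i] for p in peptides)
        n = sum(counts.values())
        columns.append({aa: c / n for aa, c in counts.items()})
    return columns


def information_content(column: Dict[str, float], alphabet_size: int = 20) -> float:
    """Bits of information: log2(alphabet_size) minus the column's Shannon entropy."""
    entropy = -sum(f * math.log2(f) for f in column.values() if f > 0)
    return math.log2(alphabet_size) - entropy


def letter_heights(peptides: Sequence[str]) -> List[Dict[str, float]]:
    """Letter height = residue frequency x column information content (bits)."""
    heights = []
    for column in position_frequencies(peptides):
        ic = information_content(column)
        heights.append({aa: f * ic for aa, f in column.items()})
    return heights
```

A fully conserved column reaches log₂(20) ≈ 4.32 bits, while a column split evenly between two residues, such as K and R in a basic NLS position, loses one bit.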

**Figure 6. The amino acid resolution binding site information allows accurate predictions of functional effects of disease mutations and PTMs**
(A) PPI networks of KEAP1 showing reproducibly selected high/medium confidence interactions with mutations or phosphosites overlapping the binding motif or its flanking regions (± 2 residues). Disease‐associated mutations are colored red (orange if not disease associated). Phosphosites are colored gray. Dashed edges represent mutations or phosphosites in motif residues.
(B) FP competition experiments of wild‐type and disease mutant peptides binding to KEAP1 Kelch, using FITC‐NFE2L1_228–243 as the probe (n = 3 technical replicates; individual data points are shown. Source data are provided).
(C) Peptide sequences related to the interactions shown in panel (A).
(D) PPI networks of KPNA4 showing reproducibly selected high/medium confidence interactions with mutations or phosphosites overlapping the binding motif or its flanking regions (± 2 residues).
(E) FP competition experiments of wild‐type, disease mutant, and phosphopeptides binding to KPNA4. The affinities of the NKX2‐5 wild‐type and K194R mutant peptides for KPNA4 were determined using FITC‐Myc_320–328 as a probe; the affinities of the unphosphorylated and phosphorylated NCOR2 peptides were determined using FITC‐NCOR2_1307–1322 as a probe (n = 3 technical replicates; individual data points are shown. Source data are provided).
(F) Peptide sequences related to the interactions shown in panel (D).
(G) Representative cellular localization experiments of the GFP‐based NLS sensor fused to the wild‐type or K194R mutant NKX2‐5_192–207 peptide. HEK293 cells were transiently transfected with the NLS sensor, fixed 36 h after transfection, and imaged by epifluorescence microscopy. Nuclei were stained with DAPI. Scale bar, 10 μm (n = 3 independent experiments).
(H) Peptides for additional baits with disease‐associated mutations in the consensus binding motif.
Data information: For (C), (F), and (H): motifs are highlighted with a blue background and key residues are shown in bold, phosphosites are indicated by a box, and disease‐associated mutations of SLiMs are shown in red bold letters. Source data are available online for this figure.
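
The filter used for these networks, which keeps mutations or phosphosites that fall in a binding motif or within ± 2 residues of it, can be sketched as below. It assumes the motif is given as a 1‐based inclusive span of the protein and treats every residue of that span as a motif residue, which may be broader than the key residues highlighted in the figure; the function names, site labels, and coordinates in the example are hypothetical.

```python
from typing import Iterable, List, Tuple


def classify_site(position: int, motif_start: int, motif_end: int, flank: int = 2) -> str:
    """Return 'motif', 'flank' (within +/- flank residues) or 'outside'.

    All coordinates are 1-based and inclusive.
    """
    if motif_start <= position <= motif_end:
        return "motif"
    if motif_start - flank <= position <= motif_end + flank:
        return "flank"
    return "outside"


def annotate_sites(sites: Iterable[Tuple[str, int]], motif_start: int, motif_end: int,
                   flank: int = 2) -> List[Tuple[str, str, str]]:
    """Keep sites in the motif or its flanks; sites inside the motif get a dashed edge."""
    kept = []
    for label, position in sites:
        where = classify_site(position, motif_start, motif_end, flank)
        if where != "outside":
            kept.append((label, where, "dashed" if where == "motif" else "solid"))
    return kept


# Hypothetical example: a motif spanning residues 100-105
example = annotate_sites(
    [("mutation_1", 103), ("phosphosite_1", 98), ("mutation_2", 120)], 100, 105)
```

Here `mutation_1` is kept with a dashed edge, `phosphosite_1` is kept as a flanking site, and `mutation_2` is dropped.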

Grants and funding

28159/CRUK_/Cancer Research UK/United Kingdom
