Bioinformatics. 2015 Jul 15;31(14):2284-93.
doi: 10.1093/bioinformatics/btv155. Epub 2015 Mar 19.

QSLiMFinder: improved short linear motif prediction using specific query protein data

Nicolas Palopoli et al. Bioinformatics.

Abstract

Motivation: The sensitivity of de novo short linear motif (SLiM) prediction is limited by the number of patterns (the motif space) being assessed for enrichment. QSLiMFinder uses specific query protein information to restrict the motif space and thereby increase the sensitivity and specificity of predictions.

Results: QSLiMFinder was extensively benchmarked using known SLiM-containing proteins and simulated protein interaction datasets of real human proteins. Exploiting prior knowledge of a query protein likely to be involved in a SLiM-mediated interaction increased the proportion of true positives correctly returned and reduced the proportion of datasets returning a false positive prediction. The biggest improvement was seen if a short region of the query protein flanking the interaction site was known.

Availability and implementation: All the tools and data used in this study, including QSLiMFinder and the SLiMBench benchmarking software, are freely available under a GNU license as part of SLiMSuite, at: http://bioware.soton.ac.uk.

Figures

Fig. 1. Example reduction of LIG_PCNA motif definition. Each instance of the motif was aligned and used to generate a new motif definition in which only the high-frequency recurring residues are included. For each position, amino acids occurring in at least three sequences are identified (bold, highlighted, centre panel). The summed frequency of these amino acids was then calculated and positions with a combined frequency ≥75% were redefined based on these amino acids alone (centre panel). Instances matching the new definition were identified (highlighted, left panel) and the process repeated for this subset (right panel) to produce the final ELMred definition and instances.
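The reduction described in the Fig. 1 caption can be sketched as follows. This is a minimal illustration, not the SLiMBench implementation: the function names, the use of `.` for redefined-away positions, the assumption of equal-length ungapped aligned instances, and the choice to repeat until the matching subset stops changing are all assumptions, and the sequences in the usage note are made up rather than real LIG_PCNA instances.

```python
import re
from collections import Counter


def reduce_motif(instances, min_count=3, min_freq=0.75):
    """One reduction pass over equal-length, ungapped aligned instances.

    A position keeps only the amino acids seen in at least `min_count`
    instances, provided those amino acids together cover at least
    `min_freq` of the instances; otherwise it becomes a wildcard ('.').
    """
    n = len(instances)
    pattern = []
    for column in zip(*instances):
        counts = Counter(column)
        kept = sorted(aa for aa, c in counts.items() if c >= min_count)
        coverage = sum(counts[aa] for aa in kept) / n
        if kept and coverage >= min_freq:
            pattern.append(kept[0] if len(kept) == 1 else "[" + "".join(kept) + "]")
        else:
            pattern.append(".")
    return "".join(pattern)


def reduce_until_stable(instances, max_rounds=10):
    """Repeat the pass on the matching subset (assumed stopping rule)."""
    current = list(instances)
    pattern = reduce_motif(current)
    for _ in range(max_rounds):
        matching = [s for s in current if re.fullmatch(pattern, s)]
        if not matching or len(matching) == len(current):
            return pattern, matching
        current = matching
        pattern = reduce_motif(current)
    return pattern, current
```

For the made-up instances `["QKLF", "QRLF", "QKIF", "QKLY", "QALF", "AKLF"]`, the first pass gives `Q.LF`, three instances match it, and the second pass on that subset leaves the definition unchanged.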
Fig. 2. (a) ELMBench dataset generation. ELMs are first reduced to only those datasets for which SLiMFinder or QSLiMFinder could theoretically find the ELMred based on the signal within the data (information content of motif and number of unrelated occurrences). For each ELM analysed, each protein is taken in turn and used as a query. Each query is masked at six levels of resolution: (i) full-length protein; (ii) 300 amino acid window, centred on the motif where possible; (iii) 100 amino acid window; (iv) 50 amino acid window; (v) ELM instance plus 2 × 5 amino acid flanking sequences and (vi) ELM instance region only. (b) SimBench dataset generation. ELMred definitions with a normalized IC ≥ 3.0 were searched against the human proteome and 10 queries selected (with replacement) to seed 10 replicate datasets. Next, additional ELMred-positive proteins were selected at random (without replacement) to make a total of 5 or 10 positive proteins and further human proteins selected at random (without replacement) to make the final simulated datasets of different total sizes (TP×1, ×2, ×5, ×10 and ×20). As with ELMBench, the SimBench queries are masked at the same six levels of site resolution.
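The six query resolutions in panel (a) can be illustrated with a short coordinate sketch. It assumes 0-based half-open coordinates, that "centred where possible" means shifting the window to stay inside the protein, and that masking replaces everything outside the window with `X`; the function names and the `win300`/`win100`/`win50` labels are hypothetical, not taken from SLiMBench.

```python
def centred_window(seq_len, start, end, width):
    """Window of `width` residues centred on the motif, shifted to fit the protein."""
    if width >= seq_len:
        return (0, seq_len)
    centre = (start + end) // 2
    w_start = centre - width // 2
    # Slide the window back inside the sequence if it overhangs either end.
    w_start = max(0, min(w_start, seq_len - width))
    return (w_start, w_start + width)


def query_windows(seq_len, start, end, flank=5):
    """The six resolution levels for a motif at [start, end)."""
    return {
        "full": (0, seq_len),
        "win300": centred_window(seq_len, start, end, 300),
        "win100": centred_window(seq_len, start, end, 100),
        "win50": centred_window(seq_len, start, end, 50),
        "flank5": (max(0, start - flank), min(seq_len, end + flank)),
        "site": (start, end),
    }


def mask_outside(seq, window, mask_char="X"):
    """Keep only the window; replace all other residues with `mask_char`."""
    s, e = window
    return mask_char * s + seq[s:e] + mask_char * (len(seq) - e)
```

For a motif near the N-terminus, for example at residues 10 to 16 of a 500-residue protein, the 300-residue window cannot be centred and falls back to `(0, 300)`.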
Fig. 3. Comparison of QSLiMFinder (QSF, top rows) and SLiMFinder (SF, bottom rows) results for the ELMBench data after searching for true instances of an ELM using a region containing the ELM plus five flanking residues at each side. For each dataset, indicated by its ELM name, the percentage of queries returning the TP motif at different significance cut-offs is shown. ELMred patterns below each ELM name were used to assess predictions for both QSLiMFinder and SLiMFinder. Fill intensity represents the percentage of queries that return the TP motif according to the scale on the lower right. Disorder masking (IUPred ≥ 0.2) was used for all analyses. ELMs for which neither method returned a TP prediction are not shown.
Fig. 4. Comparison of (a) QSLiMFinder (QSF) and (b) SLiMFinder (SF) results on SimBench datasets after searching with fragments of the Query protein of decreasing size. SN, the proportion of datasets returning a TP, is plotted against FPX, the proportion of datasets returning a FP, at different SLiMChance significance cut-offs (0.1, 0.05, 0.01, 0.005, 0.001, 5e-04, 1e-04). Searches were made with the whole protein (‘none’, circles), with a window of five residues flanking the known ELM at each side (‘flank5’, triangles) or with the region of the motif only (‘site’, squares). For clarity, plots are truncated at the least significant cut-off for which FPX = 0.
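As a small illustration of the two quantities plotted here, the sketch below computes SN and FPX from per-dataset predictions at a given significance cut-off. The input layout (a list of `(p_value, is_true_positive)` pairs per dataset), the function name and the use of `<=` at the cut-off are assumptions made for the example, and the values are toy numbers, not results from the study.

```python
def sn_fpx(datasets, cutoff):
    """Return (SN, FPX) at a significance cut-off.

    SN  = proportion of datasets with at least one significant TP motif.
    FPX = proportion of datasets with at least one significant FP motif.
    """
    n = len(datasets)
    with_tp = sum(
        any(p <= cutoff and is_tp for p, is_tp in preds) for preds in datasets
    )
    with_fp = sum(
        any(p <= cutoff and not is_tp for p, is_tp in preds) for preds in datasets
    )
    return with_tp / n, with_fp / n


# Toy predictions for four datasets (hypothetical values).
toy = [
    [(0.001, True)],
    [(0.03, True), (0.04, False)],
    [(0.2, False)],
    [],
]
for cutoff in (0.05, 0.01):
    print(cutoff, sn_fpx(toy, cutoff))
```

A dataset can count towards both SN and FPX at the same cut-off if it returns a true and a false motif, which is why the two proportions are reported separately rather than as complements.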
Fig. 5. Comparison of the effect of incorporating ambiguity in motif definition on the proportion of SimBench datasets returning (a) at least one TP (SN) and (b) at least one FP (FPX) when searches are performed using QSLiMFinder (QSF) and SLiMFinder (SF). Results are plotted at different SLiMChance significance cut-offs (0.05, 0.01, 0.005, 0.001, 5e-04, 1e-04, 1e-05, 1e-06, 1e-07, 1e-08, 1e-09, 1e-10; in panel (b) results are truncated at 1e-04, the least significant cut-off for which FPX = 0). Searches were made with the whole protein (‘none’, circles), with a window of five residues flanking the known ELM at each side (‘flank5’, triangles) or with the region of the motif only (‘site’, squares).
Fig. 6. Comparison of QSLiMFinder (QSF) results on SimBench datasets with different masking strategies. The proportion of datasets returning a true motif (SN) is plotted against the proportion of datasets returning a false hit (FPX) for average values of controlled signal-noise combinations at each different SLiMChance significance cut-off (0.05, 0.01, 0.005, 0.001, 5e-04, 1e-04, 5e-05). Searches were made (a) without further masking of the query (‘Nomask’, squares), (b) masking out disordered regions (‘Dismask’, triangles) or (c) masking out both disordered and evolutionary conserved positions (‘Bothmask’, circles). Results were obtained with (a) the whole protein as the query, (b) with a window of five residues at each side of the known motif or (c) with the motif only. For clarity, plots are truncated at the least significant cut-off for which FPX = 0.
Fig. 7. Comparison of (a) QSLiMFinder (QSF) and (b) SLiMFinder (SF) results on SimBench datasets with different signal-to-noise ratios. The proportion of datasets returning a true motif (SN) is plotted against the proportion of datasets returning a false hit (FPX) at each different SLiMChance significance cut-off (0.1, 0.05, 0.01, 0.005, 0.001, 5e-04, 1e-04). Selected combinations of signal (5, open symbols; 10, filled symbols) and dataset sizes (5, circles; 10, diamonds; 50, squares; 100, triangles) are displayed. Searches were made using the whole protein with disorder masking. For clarity, plots are truncated at the least significant cut-off for which FPX = 0.
