Nucleic Acids Res. 2006 Jul 1;34(Web Server issue):W350-5.
doi: 10.1093/nar/gkl159.

DILIMOT: discovery of linear motifs in proteins

Victor Neduva et al. Nucleic Acids Res.

Abstract

Discovery of protein functional motifs is critical in modern biology. Small segments of 3–10 residues play key roles in protein interactions, post-translational modifications and trafficking. DILIMOT (DIscovery of LInear MOTifs) is a server for the prediction of these short linear motifs within a set of proteins. Given a set of sequences sharing a common functional feature (e.g. interaction partner or localization), the method finds statistically over-represented motifs likely to be responsible for it. The input sequences are first passed through a set of filters to remove regions unlikely to contain instances of linear motifs. Motifs are then found in the remaining sequences and ranked according to a statistic that measures over-representation and conservation across homologues in related species. The results are displayed via a visual interface for easy perusal. The server is available at http://dilimot.embl.de.
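
The filter, find and rank steps described above can be illustrated with a small Python sketch. It is not the DILIMOT implementation: exact k-residue words stand in for motifs with wildcard positions such as SxIP, masked intervals (a hypothetical input) stand in for the server's filters, and the over-representation P is assumed here to be a Poisson-binomial tail probability of seeing a word in at least as many sequences as observed, given background residue frequencies. All function names are illustrative.

```python
import math
from collections import Counter


def mask_regions(seq, regions):
    """Replace residues inside 0-based, half-open (start, end) intervals with 'x'.
    The intervals are a hypothetical stand-in for domain/redundancy filters."""
    chars = list(seq)
    for start, end in regions:
        for i in range(max(start, 0), min(end, len(chars))):
            chars[i] = "x"
    return "".join(chars)


def residue_frequencies(seqs):
    """Background residue frequencies over the unmasked positions of the input set."""
    counts = Counter(c for s in seqs for c in s if c != "x")
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def words(seq, k):
    """Distinct length-k words in seq that contain no masked position."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1) if "x" not in seq[i:i + k]}


def prob_in_sequence(word, n_unmasked, freqs):
    """Chance that a random sequence of n_unmasked residues contains word at least
    once, treating start positions as independent (an approximation)."""
    p_word = math.prod(freqs.get(c, 0.0) for c in word)
    n_starts = max(n_unmasked - len(word) + 1, 0)
    return 1.0 - (1.0 - p_word) ** n_starts


def tail_probability(probs, m):
    """P(at least m successes) over independent trials with individual success
    probabilities (Poisson-binomial tail), computed by dynamic programming."""
    dist = [1.0]  # dist[j] = probability of exactly j successes so far
    for p in probs:
        new = [0.0] * (len(dist) + 1)
        for j, q in enumerate(dist):
            new[j] += q * (1.0 - p)
            new[j + 1] += q * p
        dist = new
    return sum(dist[m:])


def rank_words(seqs, k=3, min_support=2):
    """(P, word, support) tuples sorted so the most over-represented word is first."""
    freqs = residue_frequencies(seqs)
    support = Counter(w for s in seqs for w in words(s, k))  # sequences containing w
    lengths = [sum(c != "x" for c in s) for s in seqs]
    ranked = []
    for word, m in support.items():
        if m >= min_support:
            probs = [prob_in_sequence(word, n, freqs) for n in lengths]
            ranked.append((tail_probability(probs, m), word, m))
    return sorted(ranked)


# Toy input: three sequences share the word SRIP; the fourth does not.
seqs = ["AAKLSRIPQW", "GSRIPTTNEV", "MDDSRIPHLY", "PPQQGGEEDD"]
ranked = rank_words(seqs)
for p, word, m in ranked:
    print(word, m, f"{p:.2e}")
```

A lower P means stronger over-representation. In this toy set SRI ranks above RIP because S is rarer than P in the input, so seeing it in three sequences is less expected by chance.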

Figures

Figure 1
The server process and output. (A) Schematic showing how submitted sequences are filtered, motifs are found and then arranged into a ranked list sorted by P (left). When the species is provided, sequences are assigned to orthologous groups, species-specific probabilities for over-represented motifs are calculated (coloured box) and the list is re-sorted by SCONS (right). (B) Example of server output. Putative motifs are reported in an interactive table (left) that gives general details for each of them. Clicking on a motif opens an additional page (right) showing the sequences that contain the motif, where the motif lies in them and the degree to which it is conserved in related species. Motif locations (red bars) and other features found in the sequences, such as domains, are shown graphically and detailed below each image.
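
The caption says the ranked list is re-sorted by SCONS once the species is known, but this section does not define SCONS. The Python sketch below only illustrates the general idea of rewarding motifs whose instances stay conserved in orthologues; the scoring rule (fraction of aligned orthologous segments that still match the pattern, with ties broken by P), the 'x'-as-wildcard pattern syntax and the input layout are all assumptions, not the server's statistic.

```python
import re


def pattern_to_regex(pattern):
    """Translate a pattern written like 'SxIP' ('x' = any residue) into a regex."""
    return re.compile(pattern.replace("x", "."))


def instance_conservation(regex, aligned_segments):
    """Fraction of orthologous aligned segments (gaps removed) still matching the motif."""
    if not aligned_segments:
        return 0.0
    hits = sum(1 for seg in aligned_segments if regex.search(seg.replace("-", "")))
    return hits / len(aligned_segments)


def rerank(motifs):
    """Hypothetical re-ranking: mean conservation (high first), then P (low first).

    Each motif is a dict with 'pattern', 'p_value' and 'orthologs', where
    'orthologs' holds one list of aligned orthologous segments per motif instance.
    """
    scored = []
    for m in motifs:
        regex = pattern_to_regex(m["pattern"])
        per_instance = [instance_conservation(regex, segs) for segs in m["orthologs"]]
        mean_cons = sum(per_instance) / len(per_instance) if per_instance else 0.0
        scored.append((-mean_cons, m["p_value"], m["pattern"], mean_cons))
    scored.sort()
    return [(pattern, p, cons) for _, p, pattern, cons in scored]


# Made-up example: SxIP has a worse P than RGD but is better conserved.
motifs = [
    {"pattern": "SxIP", "p_value": 1e-4,
     "orthologs": [["ASRIPQ", "AS-KIPQ", "ASRLPQ"], ["GSKIPT"]]},
    {"pattern": "RGD", "p_value": 1e-6,
     "orthologs": [["ARGDA", "AKGDA"]]},
]
for pattern, p, cons in rerank(motifs):
    print(pattern, p, round(cons, 3))
```

Sorting on conservation first is a deliberately simple choice for illustration; a real statistic would more likely combine both quantities into a single probability.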
Figure 2
The EB1 motif SxIP detected by the server. (A) A sequence logo (27) for the EB1-binding motif, generated using all instances of the motif in the input set. (B) Examples of EB1-binding proteins from the input set (represented as boxes) and multiple alignments of putative motif-containing regions. Dark blue regions in the boxes denote those removed by the domain and redundancy filters. A known EB1-binding region (in APC) lies at the C-terminus of a Pfam domain. To avoid its removal, we simply cut the sequence down to this region alone (switching the Pfam filter off would have a similar effect). Sequences for the motif-containing region are shown aligned to the best homologues in closely related species. Amino acids in the alignments are coloured according to residue type: blue, positive; red, negative; light-blue, small; yellow, hydrophobic; green, aromatic; magenta, polar; orange, proline. Positions within the predicted motif are denoted by red triangles. Species abbreviations: Hsa, H. sapiens; Mmu, M. musculus; Rno, R. norvegicus; Gga, G. gallus; Fru, F. rubripes; Cgi, Candida glabrata; Kla, Kluyveromyces lactis; Kwa, Kluyveromyces waltii; Ego, Eremothecium gossypii; Sce, Saccharomyces cerevisiae; Dha, Debaryomyces hansenii.
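
Panel A's logo was produced with the logo tool cited as (27). As a rough illustration of what such a logo encodes, the Python sketch below computes the standard per-position letter heights for protein motif instances: each column's information content, log2(20) minus its Shannon entropy, multiplied by each residue's frequency. It applies no small-sample correction, and the toy instances are made up, not taken from the input set.

```python
import math
from collections import Counter


def logo_heights(instances):
    """Letter heights (bits) per column for aligned, equal-length protein motif instances.

    height(aa, i) = freq(aa, i) * (log2(20) - H_i), where H_i is the Shannon entropy
    of column i. No small-sample correction is applied.
    """
    length = len(instances[0])
    if any(len(s) != length for s in instances):
        raise ValueError("instances must be aligned to the same length")
    max_bits = math.log2(20)  # maximum information for a 20-letter alphabet
    columns = []
    for i in range(length):
        counts = Counter(s[i] for s in instances)
        n = sum(counts.values())
        freqs = {aa: c / n for aa, c in counts.items()}
        entropy = -sum(f * math.log2(f) for f in freqs.values())
        info = max_bits - entropy
        columns.append({aa: f * info for aa, f in freqs.items()})
    return columns


# Made-up SxIP-like instances: positions 1, 3 and 4 are invariant, position 2 varies.
instances = ["SKIP", "SRIP", "SNIP", "SRIP"]
cols = logo_heights(instances)
for i, col in enumerate(cols, start=1):
    print(i, {aa: round(h, 2) for aa, h in sorted(col.items())})
```

Invariant positions reach the full log2(20) ≈ 4.32 bits, while the variable 'x' position carries less information, which is why wildcard positions appear as short stacks in a motif logo.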
Figure 3
Features of known linear motifs. (A) Distributions of length (red), number of specified (i.e. non-‘x’; green) positions and number of invariant (i.e. a single specific residue; blue) positions for 120 known linear motifs extracted from the ELM database (7). Four motifs with lengths of 13–18 are omitted from the first (red) plot for clarity. (B) Degree to which residues are over-represented in known motifs. Numbers show the ratio of the abundance of each residue within the 120 motifs from ELM to its abundance in globular domains, as computed from the Protein Data Bank [PDB; (28)]. ‘ALL’ includes all 120 motifs, ‘LIG’ the 66 ligand-binding, ‘TRG’ the 16 targeting and ‘MOD’ the 30 modification-site motifs. For 7 of the 40 residue values in the latter two categories there were too few counts to obtain a confident measurement (i.e. <5); these are denoted by an asterisk. A fourth ELM category, CLV (protein cleavage sites), is not included because there were too few examples to compute meaningful numbers. Colour scheme: red, strongly favoured in linear motifs compared to globular proteins; orange, moderately favoured; light-blue, moderately disfavoured; blue, strongly disfavoured.
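
The numbers in panel B are ratios of relative abundances. The Python sketch below shows that calculation on made-up residue strings. The caption does not say exactly which motif positions are counted or what the <5 threshold is applied to, so two assumptions are made here: motif residues arrive as one flat string (e.g. the specified positions of each motif pooled together), and a residue is flagged when it occurs fewer than 5 times in that string. In the figure the background would be residues from PDB globular domains.

```python
from collections import Counter


def enrichment_ratios(motif_residues, background_residues, min_count=5):
    """Map residue -> (ratio, low_count).

    ratio = (relative abundance in motif residues) / (relative abundance in background).
    low_count is True when the residue occurs fewer than min_count times in the motif
    residues (assumed meaning of the figure's asterisk). Residues absent from the
    background are skipped to avoid dividing by zero.
    """
    motif_counts = Counter(motif_residues)
    bg_counts = Counter(background_residues)
    motif_total = sum(motif_counts.values())
    bg_total = sum(bg_counts.values())
    table = {}
    for aa in sorted(bg_counts):
        motif_freq = motif_counts[aa] / motif_total  # Counter returns 0 if absent
        bg_freq = bg_counts[aa] / bg_total
        table[aa] = (motif_freq / bg_freq, motif_counts[aa] < min_count)
    return table


# Toy data: proline is enriched in the "motif" residues relative to the background.
for aa, (ratio, low_count) in enrichment_ratios("PPPPPPSSTA", "PPSSTTAALL").items():
    print(aa, round(ratio, 2), "*" if low_count else "")
```

A ratio above 1 means the residue is favoured in motifs relative to the background, and below 1 that it is disfavoured; the colour bins in the figure would correspond to thresholds on this ratio that the caption does not state.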

References

    1. Letunic I., Copley R.R., Schmidt S., Ciccarelli F.D., Doerks T., Schultz J., Ponting C.P., Bork P. SMART 4.0: towards genomic data integration. Nucleic Acids Res. 2004;32:D142–D144.
    2. Bateman A., Birney E., Cerruti L., Durbin R., Etwiller L., Eddy S.R., Griffiths-Jones S., Howe K.L., Marshall M., Sonnhammer E.L. The Pfam protein families database. Nucleic Acids Res. 2002;30:276–280.
    3. Eddy S.R. Profile hidden Markov models. Bioinformatics. 1998;14:755–763.
    4. Madera M., Gough J. A comparison of profile hidden Markov model procedures for remote homology detection. Nucleic Acids Res. 2002;30:4321–4328.
    5. Bork P., Gibson T.J. Applying motif and profile searches. Meth. Enzymol. 1996;266:162–184.
