Proc Natl Acad Sci U S A. 1998 May 26;95(11):5857-64.
doi: 10.1073/pnas.95.11.5857.

SMART, a simple modular architecture research tool: identification of signaling domains

J Schultz et al. Proc Natl Acad Sci U S A.

Abstract

Accurate multiple alignments of 86 domains that occur in signaling proteins have been constructed and used to provide a Web-based tool (SMART: simple modular architecture research tool) that allows rapid identification and annotation of signaling domain sequences. The majority of signaling proteins are multidomain in character, with a considerable variety of domain combinations known. Comparison with established databases showed that 25% of our domain set could not be deduced from SwissProt and 41% could not be annotated by Pfam. SMART is able to determine the modular architectures of single sequences or genomes; application to the entire yeast genome revealed that at least 6.7% of its genes contain one or more signaling domains, approximately 350 more than previously annotated. The process of constructing SMART predicted (i) novel domain homologues in unexpected locations, such as band 4.1-homologous domains in focal adhesion kinases; (ii) previously unknown domain families, including a citron-homology domain; (iii) putative functions of domain families after identification of additional family members, for example, a ubiquitin-binding role for ubiquitin-associated domains (UBA); (iv) cellular roles for proteins, such as predicted DEATH domains in netrin receptors, further implicating these molecules in axonal guidance; (v) signaling domains in known disease genes, such as SPRY domains in both marenostrin/pyrin and Midline 1; (vi) domains in unexpected phylogenetic contexts, such as diacylglycerol kinase homologues in yeast and bacteria; and (vii) likely protein misclassifications, exemplified by a predicted pleckstrin homology domain in a Candida albicans protein previously described as an integrin.

Figures

Figure 1
Calibration of thresholds. Selection of thresholds from the distributions of SH3 domain scores. (Upper) A histogram of SWise scores for the best match (optimal alignment; in green) of proteins with an SH3 domain profile. (Lower) Similar histograms for the second- and third-best matches (suboptimal alignments; in light blue and dark blue, respectively). Optimal alignment scores below threshold Tp are mostly derived from sequences considered unlikely to contain SH3 domain homologues. Threshold Tp was selected as the lowest-scoring true positive. Domains that are repeated twice or more in the same protein and that each score above a lower threshold (Tr) are considered to be true positives.
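The two-threshold rule in this legend can be illustrated with a short sketch. It is not the SMART implementation: the function name `accepted_hits`, the example scores, and the threshold values are hypothetical, and it assumes that repeated matches reaching Tr are accepted as domain hits and that both thresholds are inclusive.

```python
from typing import List


def accepted_hits(scores: List[float], t_p: float, t_r: float) -> List[float]:
    """Return the scores accepted as domain hits for one protein.

    scores: scores of the optimal and suboptimal matches of one protein
            against a single domain profile (hypothetical input format).
    t_p:    threshold for a single (optimal) match, called Tp in the legend.
    t_r:    lower threshold for repeated matches, called Tr in the legend.
    """
    ranked = sorted(scores, reverse=True)

    # Repeat rule (assumed reading): two or more matches in the same
    # protein that each reach Tr are all accepted.
    repeats = [s for s in ranked if s >= t_r]
    if len(repeats) >= 2:
        return repeats

    # Single-copy rule: only the optimal match counts, and it must reach Tp.
    return [s for s in ranked[:1] if s >= t_p]


# Hypothetical thresholds, chosen only for illustration.
T_P, T_R = 10.0, 6.0
print(accepted_hits([12.0, 3.0], T_P, T_R))      # one strong match
print(accepted_hits([8.0, 7.0, 2.0], T_P, T_R))  # two weaker repeats
print(accepted_hits([8.0, 3.0], T_P, T_R))       # one weak match
```

Under these assumptions a lone match between Tr and Tp is rejected, while the same score is accepted when a second match in the same protein also reaches Tr.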
Figure 2
Schematic representations, produced using SMART, of the domain architectures of proteins discussed in the text. See Table 1 for the identified domains; gray lines (no SMART match) might contain other known domains not included in SMART. Putative homologues were identified during SWise (16) searches and/or psi-blast (1) searches (E < 0.01).
(a) Domain recognition: A novel PTB domain was identified in tensin, resulting in completion of its modular architecture assignment. A psi-blast search with a previously predicted PTB domain in C. elegans F56D2.1 (53) yields the tensin PTB after four passes. Prediction of molecular function via domain hit: Identification of a domain homologous to band 4.1 protein in focal adhesion kinase (FAK) isoforms. FAKs are predicted to bind cytoplasmic portions of integrins in a similar manner to that of talin, another band 4.1 domain-containing protein. A psi-blast search with a band 4.1-like domain (41_HUMAN, residues 206–401) revealed band 4.1-like domains in human, bovine, and Xenopus FAK isoforms by pass 3.
(b) Detection of new domains because of search space reduction: Putative DEP domains in ROM1 and ROM2 were identified by using SWise (16) and HMMer (14), but could not be detected by using psi-blast. Analysis of the regions surrounding identified domains revealed the presence of a novel domain in the C-terminal regions of ROM1 and ROM2 that also occurs in several Ste20-like protein kinases and in mouse citron (CNH, citron homology). A gapped blast search of the region of citron C-terminal to its PH domain (CTRO_MOUSE, residues 1134–1457) reveals significant similarity with yeast ROM2 (E = 1 × 10⁻⁵).
(c) Functional predictions for an entire domain family: A region of p62 known to bind ubiquitin (40), and its homologous sequence in the Drosophila protein ref(2)P, scored as the highest putative true negatives in a SWise search. We predict ubiquitin-binding functions for UBA domains. psi-blast searches were unable to corroborate this prediction.
(d) Prediction of cellular functions: Although not indicated in the primary sources (43, 44), a DEATH domain was found in rcm and other UNC5 homologues, in agreement with a previous claim (41). At the molecular level, this domain in UNC5 is predicted to form a heterotypic dimer with a homologous domain in UNC44, implying a cellular role in axon guidance. A gapped blast search with the known DEATH domain of death-associated protein kinase (DAPK_HUMAN, residues 1304–1396) predicts a DEATH domain in rat UNC5H1 (E = 9 × 10⁻³).
(e) Signaling domains in “disease genes”: Pyrin or marenostrin, a protein that is mutated in patients with Mediterranean fever and is similar to butyrophilin, contains a SPRY domain. psi-blast with the SPRY domain of human DDX1 (EMBL:X70649, residues 124–240) yields a butyrophilin homologue by pass 5 and pyrin/marenostrin (residues 663–759) by pass 7.
(f) Homologues of domains involved in eukaryotic signaling may not be eukaryotic-specific: DAG kinases have been found previously in mammals, invertebrates, plants, and slime mold. However, it is apparent that DAG kinase homologues of unknown function are present in yeasts and in eubacteria (see Fig. 3). A gapped blast search with Bacillus subtilis bmrU (BMRU_BACSU) yields significant similarities with Arabidopsis thaliana DAG kinase (KDG1_ARATH; E = 4 × 10⁻⁴) and a Schizosaccharomyces pombe ORF (SPAC4A8.07c; E = 1 × 10⁻⁷).
(g) Identification of potential misclassifications: A PH domain and the lack of an obvious transmembrane sequence indicate a cytoplasmic and signaling role for a protein (INT1_CANAL) previously thought to be a yeast integrin. A psi-blast search with the N-terminal PH domain of pleckstrin yielded INT1_CANAL in pass 3.
Figure 3
Multiple alignments of selected RasGEFN domains. A conserved region was found in the N-terminal regions of several proteins with RasGEF (Cdc25-like) domains (37). Surprisingly, this N-terminal domain may be present in the sequence either close to, or far from, the RasGEF domain. A psi-blast search using a region (residues 898–946) of C. albicans Cdc25 (CC25_CANAL) and E < 0.01 identified each of the sequences in Fig. 3 within nine passes before convergence. Predicted (54) secondary structure and 90% consensus sequences are shown beneath the alignments; SwissProt/PIR/EMBL accession codes and residue limits are given after the alignments. Residues are colored according to the consensus sequence [green: hydrophobic (h), ACFGHIKLMRTVWY; blue: polar (p), CDEHKNQRST; red: small (s), ACDGNPSTV; red: tiny (u), AGS; cyan: turn-like (t), ACDEGHKNQRST; green: aliphatic (l), ILV; and magenta: alcohol (o), ST]. The SwissProt sequence KMHC_DICDI has been altered to account for probable frameshifts.
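As an illustration of how a 90% consensus line could be derived from the residue classes listed in this legend, here is a minimal sketch. The class memberships are copied from the legend; everything else is an assumption made for the example: the function names, the toy alignment, the rule of preferring a single conserved residue over a class, the choice of the smallest qualifying class, and the treatment of gaps as non-matching.

```python
from typing import Sequence

# Residue classes exactly as listed in the legend (symbol -> members).
CLASSES = {
    "h": set("ACFGHIKLMRTVWY"),  # hydrophobic
    "p": set("CDEHKNQRST"),      # polar
    "s": set("ACDGNPSTV"),       # small
    "u": set("AGS"),             # tiny
    "t": set("ACDEGHKNQRST"),    # turn-like
    "l": set("ILV"),             # aliphatic
    "o": set("ST"),              # alcohol
}


def column_consensus(column: str, threshold: float = 0.9) -> str:
    """Consensus symbol for one alignment column ('.' if none qualifies)."""
    column = column.upper()
    n = len(column)
    if n == 0:
        return "."

    # Assumption: a single residue reaching the threshold is reported as-is.
    for aa in sorted(set(column)):
        if aa.isalpha() and column.count(aa) / n >= threshold:
            return aa

    # Assumption: otherwise report the smallest class reaching the threshold.
    # Gap characters belong to no class, so they count against every class.
    by_size = sorted(CLASSES.items(), key=lambda kv: (len(kv[1]), kv[0]))
    for symbol, members in by_size:
        if sum(res in members for res in column) / n >= threshold:
            return symbol
    return "."


def consensus(alignment: Sequence[str], threshold: float = 0.9) -> str:
    """Consensus line for equal-length aligned sequences."""
    return "".join(column_consensus("".join(col), threshold)
                   for col in zip(*alignment))


# Toy alignment, invented for illustration only.
print(consensus(["ILST", "VLSA", "LLTG"]))  # lLos
```

In the toy alignment the first column contains only I, V, and L and is therefore reported as aliphatic (l), the second is an invariant leucine, the third is alcohol (o), and the fourth falls back to the broader small (s) class.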

References

    1. Altschul S F, Madden T L, Schaffer A A, Zhang J, Zhang Z, Miller W, Lipman D J. Nucleic Acids Res. 1997;25:3389–3402.
    2. Pearson W R. Genomics. 1991;11:635–650.
    3. Doolittle R F. Annu Rev Biochem. 1995;64:287–314.
    4. Bork P, Downing A K, Kieffer B, Campbell I D. Q Rev Biophys. 1996;29:119–167.
    5. Bork P, Schultz J, Ponting C P. Trends Biochem Sci. 1997;22:296–298.