BMC Evol Biol. 2008 Oct 14;8:285.
doi: 10.1186/1471-2148-8-285.

Just how versatile are domains?

January Weiner 3rd et al. BMC Evol Biol. 2008.

Abstract

Background: Creating new protein domain arrangements is a frequent mechanism of evolutionary innovation. While some domains always form the same combinations, others form many different arrangements. This ability, often referred to as the versatility or promiscuity of domains, is examined here in light of a random evolutionary model in which a domain's promiscuity is determined by its relative frequency.

Results: We show that there is a clear relationship across genomes between the promiscuity of a given domain and its frequency. However, the strength of this relationship differs between domains. We thus redefine domain promiscuity by defining a new index, DVI ("domain versatility index"), which eliminates the effect of domain frequency. We explore links between a domain's versatility, when unlinked from abundance, and its biological properties.

Conclusion: Our results indicate that domains occurring as single-domain proteins and domains appearing frequently at protein termini have a higher DVI. This is consistent with previous observations that the evolution of domain re-arrangements is primarily driven by fusion of pre-existing arrangements and single domains, as well as by loss of domains at protein termini. Furthermore, we studied the link between domain age, defined as the first appearance of a domain in the species tree, and the DVI. Contrary to previous studies based on domain promiscuity, the DVI appears to be independent of age. Finally, we find that, contrary to previously reported findings, versatility is lower in Eukaryotes. In summary, our measure of domain versatility indicates that a random attachment process is sufficient to explain the observed distribution of domain arrangements and that several views on domain promiscuity need to be revised.

Figures

Figure 1
Comparing different measures of domain promiscuity. Comparison of the different measures of versatility, showing that they are correlated with the number of occurrences of a domain. Data were obtained from Pfam (for details, refer to Methods). Each point represents a different domain. Left, correlation with the number of occurrences of a domain. Right, correlation with the number of immediate neighbours. N – number of occurrences, NCO – number of co-occurrences, NN – number of direct neighbours, NTRP – number of triplets. Spearman rank correlation coefficients between the different measures are given in the respective panels.
Figure 2
The relationship between N and NN for selected examples. A) Correlation between the number of occurrences (N) and number of neighbours (NN) for the methyltransferase domain (PF08241) and the Sushi domain (PF00084) (corrected for repeats, see Methods, DVI calculation). Each data point corresponds to the number of occurrences and the number of neighbours that a domain has in one genome. B) Correlation between the number of occurrences (N) and number of neighbours (NN) for selected domains. Each data point corresponds to the number of occurrences and the number of neighbours that a domain has in one genome. Domain ID, description and DVI are given in the upper left corner of the respective graph. For a definition of the DVI, see section "The domain versatility index".
Figure 3
Exemplary calculation of the DVI. Sets of proteins belonging to two distinct genomes are shown in the top left as strings of domains represented by boxes. The occurrence of two exemplary domains, A and B, is displayed in the table, along with two measures of domain promiscuity. N denotes the total occurrence, NN the total number of direct neighbours and NCO the total number of co-occurrences for a given domain in its respective genome. Grey shaded fields within the NN and NCO fields indicate the specific domains that yield the respective values. In essence, the DVI represents the strength of the relationship between N and NN, indicated by the graph to the right. Each line represents a domain, as indicated by the associated boxes. The slope for each of the two domains, A and B, signifies its DVI. The desired unlinking of the versatility measurement from the total occurrence is clearly illustrated: despite its overall lower occurrence, domain B tends to form new combinations more readily, as indicated by the steeper slope in the relationship between N and NN.
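The calculation described in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the toy genomes, the function names `domain_counts` and `slope`, the choice to count distinct neighbouring domain types while ignoring self-pairs, and the use of a plain least-squares slope of NN against N are all assumptions made here. The paper's Methods define the actual DVI calculation, including the correction for repeats, which this sketch does not reproduce.

```python
from collections import defaultdict


def domain_counts(proteins):
    """Return {domain: (N, NN, NCO)} for one genome.

    proteins: list of proteins, each a list of domain IDs ordered from
    the N- to the C-terminus (hypothetical input format).
    NN and NCO count distinct domain types, excluding the domain itself
    (an assumption of this sketch).
    """
    occurrences = defaultdict(int)
    neighbours = defaultdict(set)
    cooccurring = defaultdict(set)
    for protein in proteins:
        for i, domain in enumerate(protein):
            occurrences[domain] += 1
            # Direct neighbours: the domains immediately before and after.
            for j in (i - 1, i + 1):
                if 0 <= j < len(protein) and protein[j] != domain:
                    neighbours[domain].add(protein[j])
            # Co-occurrences: every other domain type in the same protein.
            cooccurring[domain].update(set(protein) - {domain})
    return {
        d: (occurrences[d], len(neighbours[d]), len(cooccurring[d]))
        for d in occurrences
    }


def slope(points):
    """Ordinary least-squares slope of NN against N over (N, NN) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("need at least two genomes with different N")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


# Two toy genomes (invented for illustration only).
genome_1 = [["A", "C"], ["A", "C"], ["B", "D"]]
genome_2 = [
    ["A", "C"], ["A", "C"], ["A", "C"], ["A", "E"],
    ["B", "D"], ["B", "E"], ["F", "B"],
]

counts = [domain_counts(g) for g in (genome_1, genome_2)]
for domain in ("A", "B"):
    # One (N, NN) point per genome for this domain.
    points = [(c[domain][0], c[domain][1]) for c in counts]
    print(domain, points, slope(points))
```

In this toy data, domain A occurs more often than domain B in both genomes but gains neighbours more slowly, so its slope is 0.5 against 1.0 for B. That mirrors the point of the caption: a rarer domain can still be the more versatile one.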
Figure 4
The relationship between the DVI, domain position and domain age. Left: Domain age and the DVI. OLD – domains that are common to all three main branches of life (Bacteria, Archaea, Eukaryota); MID – domains that are present in all taxa of one of these branches (e.g. domains that can be found only in Bacteria, but not in Archaea or Eukaryota); NEW – domains that are present only in one subgroup of one of these branches (e.g. domains that occur only in vertebrates). Right: DVI and position of the domain within the protein. NTERM – N-terminal domains; NTERM1 – next-to N-terminal domains in proteins with four domains or more; CTERM – C-terminal domains; CTERM1 – next-to C-terminal domains in proteins with four domains or more; MID – all remaining (non-terminal) domains; SINGLE – domains in single-domain proteins. The y axis shows the domain versatility index (DVI). Bold line denotes the median; boxes denote the first and third quartiles; whiskers show the minimum and maximum values not including outliers.
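The position categories in this caption can be expressed as a small labelling function. The sketch below is only an illustration: the function name `position_labels` and the input format are hypothetical, and CTERM1 is read here as the next-to C-terminal domain, by symmetry with NTERM1, which is an assumption.

```python
def position_labels(protein):
    """Label each domain of a protein with a position category.

    protein: list of domain IDs ordered from the N- to the C-terminus
    (hypothetical input format). Returns one label per domain.
    """
    n = len(protein)
    if n == 0:
        return []
    if n == 1:
        return ["SINGLE"]
    labels = ["MID"] * n
    labels[0] = "NTERM"
    labels[-1] = "CTERM"
    # Next-to-terminal categories apply only to proteins with four or more domains.
    if n >= 4:
        labels[1] = "NTERM1"
        labels[-2] = "CTERM1"
    return labels


print(position_labels(["A", "B", "C", "D", "E"]))
```

For a five-domain protein this yields NTERM, NTERM1, MID, CTERM1, CTERM, while a three-domain protein has only NTERM, MID and CTERM.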
