BMC Bioinformatics. 2007 Oct 16:8:390.
doi: 10.1186/1471-2105-8-390.

On the detection of functionally coherent groups of protein domains with an extension to protein annotation

William A McLaughlin et al. BMC Bioinformatics. 2007.

Abstract

Background: Protein domains coordinate to perform multifaceted cellular functions, and domain combinations serve as the functional building blocks of the cell. The available methods to identify functional domain combinations are limited in their scope, e.g. to the identification of combinations falling within individual proteins or within specific regions in a translated genome. Further effort is needed to identify groups of domains that span across two or more proteins and are linked by a cooperative function. Such functional domain combinations can be useful for protein annotation.

Results: Using a new computational method, we have identified 114 groups of domains, referred to as domain assembly units (DASSEM units), in the proteome of the budding yeast Saccharomyces cerevisiae. The units participate in many important cellular processes such as transcription regulation, translation initiation, and mRNA splicing. Within the units, the domains were found to function cooperatively, with each domain contributing to a different aspect of the unit's overall function. The member domains of DASSEM units were found to be significantly enriched among proteins contained in transcription modules, defined as sets of genes sharing similar expression profiles and presumably similar functions. This observation further confirmed the functional coherence of DASSEM units. The functional linkages of units were found in both functionally characterized and uncharacterized proteins, which enabled the assessment of protein function based on domain composition.

Conclusion: A new computational method was developed to identify groups of domains that are linked by a common function in the proteome of Saccharomyces cerevisiae. These groups can either lie within individual proteins or span different proteins. We propose that the functional linkages among the domains within the DASSEM units can be used as a non-homology-based tool to annotate uncharacterized proteins.

Figures

Figure 1
An illustration of the derivation of a DASSEM unit and its functional annotation. For the group of proteins shown, there are three prevalent domain compositions within individual proteins (circled). The domain fusions of these prevalent domain compositions were used to link three domains and to create a DASSEM unit that contains the fork head (FH) domain, the fork head associated (FHA) domain and the kinase domain. The overall function of the DASSEM unit was obtained by finding the GO terms that were enriched across the proteins associated with the unit. The GO term enrichment indicated that the unit participates in the cell cycle. Schematics of domains and proteins were taken from the Pfam database [89].
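The linking step in this caption can be sketched as a grouping of domains that are connected through shared membership in a domain composition. This is only an illustration, not the authors' implementation: the function name `link_compositions`, the use of union-find, and the example compositions are assumptions; only the FH, FHA, and kinase domain names come from the caption.

```python
from collections import defaultdict

def link_compositions(compositions):
    """Group domains that are connected through shared compositions.

    ASSUMPTION: two domains belong to the same unit if a chain of
    compositions links them. The paper's actual criteria may differ.
    """
    parent = {}

    def find(domain):
        parent.setdefault(domain, domain)
        while parent[domain] != domain:
            parent[domain] = parent[parent[domain]]  # path halving
            domain = parent[domain]
        return domain

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for composition in compositions:
        members = sorted(composition)
        if not members:
            continue
        find(members[0])  # register single-domain compositions too
        for other in members[1:]:
            union(members[0], other)

    groups = defaultdict(set)
    for domain in parent:
        groups[find(domain)].add(domain)
    return sorted((frozenset(g) for g in groups.values()), key=sorted)

# Hypothetical compositions; "SH3" is included only to show an unlinked domain.
example_compositions = [{"FH", "FHA"}, {"FHA", "kinase"}, {"SH3"}]
example_units = link_compositions(example_compositions)
```

In this toy input, FH and kinase never co-occur in one composition, yet they end up in the same group because both are fused with FHA.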
Figure 2
An illustration of the utilization of DASSEM units within a transcription module. The example transcription module is involved in the process of amino acid biosynthesis, and the DASSEM units contribute to necessary auxiliary processes. The terms listed are from the GO term "biological process" category. M- a transcription module involved in amino acid biosynthesis, 1- a unit involved in aromatic carbon metabolism, 2- a unit involved in sulfate assimilation, 3- a unit involved in serine biosynthesis, 4- a unit involved in ethanol metabolism, and 5- a unit involved in amino acid derivative metabolism. The equation for the overlap score is given along with the overlap scores for the DASSEM units in the example.
Figure 3
Plots of the overlap scores of the domain content of DASSEM units with that of transcription modules. Overlap scores of the DASSEM units with transcription modules are given before (black) and after (white) randomization of the domains in the modules. The highest overlap scores, where one DASSEM unit was paired with each transcription module, are shown in panel A. The overlap scores were also calculated when a collection of five DASSEM units was paired with each transcription module. These are shown in panel B. The plots indicate that the DASSEM units were utilized in transcription modules, based on their overlap of domain content.
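The before-and-after comparison described in this caption might be sketched as follows. The paper's overlap score equation is not reproduced in this text, so the score used here (the fraction of a unit's domains that occur in the module) is a stand-in assumption, as are the randomization scheme, the function names, and all example data.

```python
import random

def overlap_score(unit, module):
    """ASSUMED score: fraction of the unit's domains found in the module."""
    unit, module = set(unit), set(module)
    return len(unit & module) / len(unit) if unit else 0.0

def best_overlap(units, module):
    """Highest score obtained by pairing any single unit with the module."""
    return max(overlap_score(unit, module) for unit in units)

def randomize_module(module, domain_pool, rng):
    """ASSUMED randomization: a same-sized random draw from the domain pool."""
    return set(rng.sample(sorted(domain_pool), len(set(module))))

# Hypothetical units, module, and domain pool, for illustration only.
toy_units = [{"FH", "FHA", "kinase"}, {"A", "B"}]
toy_module = {"FH", "FHA", "kinase", "X"}
toy_pool = {"FH", "FHA", "kinase", "A", "B", "X", "Y", "Z", "W", "V"}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
original_best = best_overlap(toy_units, toy_module)
shuffled_module = randomize_module(toy_module, toy_pool, rng)
shuffled_best = best_overlap(toy_units, shuffled_module)
```

Repeating the randomization many times and collecting `shuffled_best` would give a background distribution against which the original scores can be compared.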
Figure 4
Plot of the overlap scores of the protein content of DASSEM units with that of transcription modules. The overlap scores of the DASSEM units with transcription modules are given before (black) and after (white) randomization of the proteins in the modules. The highest overlap scores, where one DASSEM unit was paired with each transcription module, are shown. Student's t-tests were used to compare the average overlap scores of the DASSEM units with the original versus the randomized modules. The p-value of the Student's t-test that compared the two averages was significant at 0.004. Subsequent t-tests compared the second, third, fourth, and fifth highest overlap scores of the DASSEM units with the original versus the randomized modules. Their p-values were also significant at 0.0029, 0.0033, 0.0036, and 0.046, respectively.
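As a rough illustration of the comparison in this caption, the sketch below computes a two-sample Student's t statistic with pooled variance. Whether the authors used a paired or unpaired test, or assumed equal variances, is not stated here, so the unpaired equal-variance form is an assumption. The score lists are made up and do not come from the paper.

```python
from math import sqrt
from statistics import mean, variance

def student_t(sample_a, sample_b):
    """Return (t statistic, degrees of freedom) for an unpaired,
    equal-variance Student's t-test. Each sample needs at least two values."""
    n_a, n_b = len(sample_a), len(sample_b)
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(sample_a)
                  + (n_b - 1) * variance(sample_b)) / dof
    t_stat = (mean(sample_a) - mean(sample_b)) / sqrt(
        pooled_var * (1 / n_a + 1 / n_b))
    return t_stat, dof

# Made-up overlap scores for original and randomized modules.
original_scores = [0.8, 0.6, 0.7, 0.9]
randomized_scores = [0.3, 0.2, 0.4, 0.1]
t_stat, dof = student_t(original_scores, randomized_scores)
```

Converting the statistic to a p-value requires the t distribution, which the Python standard library does not provide; a statistics package would normally be used for that step.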
Figure 5
The distributions of GO term p-values and hierarchy levels for the DASSEM units, the transcription modules, and the random protein sets. The p-values of GO term enrichments for the DASSEM units (black), the transcription modules (gray), and the random sets of proteins (white) are shown in panel A. Since the range of p-values was large, the second logarithm, i.e. the logarithm of the absolute value of the first logarithm, was plotted for ease of visualization. The plot indicates that the number of GO terms and the magnitudes of the p-values were similar between the transcription modules and the DASSEM units. In contrast, there were far fewer terms associated with the random sets of proteins, and the p-values of these terms were less significant. Panel B shows the levels of the GO terms within the GO hierarchy. For the transcription modules and the DASSEM units, the depths of the GO term levels were similar. In contrast, the terms for the random protein sets were distributed at the higher GO levels, where the terms are less specific.
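The "second logarithm" used in panel A can be written as a small function. The caption does not state the logarithm base; base 10 is assumed here, and the function name `second_log` is hypothetical.

```python
import math

def second_log(p_value):
    """Logarithm of the absolute value of the logarithm of a p-value.

    ASSUMPTION: base-10 logarithms. The transform is undefined for
    p_value == 1, where the first logarithm is zero.
    """
    first = math.log10(p_value)  # negative for p-values below 1
    if first == 0:
        raise ValueError("second logarithm is undefined for p = 1")
    return math.log10(abs(first))

# Smaller (more significant) p-values map to larger transformed values.
weak = second_log(1e-5)
strong = second_log(1e-50)
```

The transform compresses p-values that span many orders of magnitude into a narrow range, which is why it helps when plotting them on one axis.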
Figure 6
A schematic of how the DASSEM units were used to annotate proteins of unknown function. Domains are represented as colored blocks. If a protein of unknown function contains some of the domains of a DASSEM unit, then it is likely to have all or part of the unit's function.
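The annotation idea in this caption can be sketched as a lookup of units that share domains with a query protein. The function name, the `min_shared` threshold, the ranking by shared fraction, and the second unit are all hypothetical; the FH, FHA, kinase unit and its cell cycle function are taken from the Figure 1 caption.

```python
def candidate_annotations(protein_domains, units, min_shared=1):
    """Suggest unit functions for a protein based on shared domains.

    `units` maps a unit name to (set of domains, function label).
    Returns (unit name, function, fraction of unit domains shared),
    sorted with the largest shared fraction first.
    """
    query = set(protein_domains)
    hits = []
    for name, (domains, function) in units.items():
        domains = set(domains)
        if not domains:
            continue
        shared = query & domains
        if len(shared) >= min_shared:
            hits.append((name, function, len(shared) / len(domains)))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)

toy_unit_table = {
    "unit_A": ({"FH", "FHA", "kinase"}, "cell cycle"),
    # Hypothetical second unit, for illustration only.
    "unit_B": ({"dom1", "dom2"}, "hypothetical function"),
}
suggestions = candidate_annotations({"FHA", "kinase"}, toy_unit_table)
```

A protein that carries only some of a unit's domains still receives the unit's function as a candidate, which matches the caption's "all or part of the unit's function" wording; how strongly to trust a partial match is left open here.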

References

    1. Koonin EV, Wolf YI, Karev GP. The structure of the protein universe and genome evolution. Nature. 2002;420:218–223. doi: 10.1038/nature01256.
    2. Todd AE, Orengo CA, Thornton JM. Evolution of function in protein superfamilies, from a structural perspective. J Mol Biol. 2001;307:1113–1143. doi: 10.1006/jmbi.2001.4513.
    3. Chothia C, Gough J, Vogel C, Teichmann SA. Evolution of the protein repertoire. Science. 2003;300:1701–1703. doi: 10.1126/science.1085371.
    4. Vogel C, Bashton M, Kerrison ND, Chothia C, Teichmann SA. Structure, function and evolution of multidomain proteins. Curr Opin Struct Biol. 2004;14:208–216. doi: 10.1016/j.sbi.2004.03.011.
    5. Orengo CA, Thornton JM. Protein families and their evolution-a structural perspective. Annu Rev Biochem. 2005;74:867–900. doi: 10.1146/annurev.biochem.74.082803.133029.