Comparative Study
BMC Bioinformatics. 2007 Aug 8;8:294.
doi: 10.1186/1471-2105-8-294.

Quantitative sequence-function relationships in proteins based on gene ontology

Vineet Sangar et al. BMC Bioinformatics.

Abstract

Background: The relationship between divergence of amino-acid sequence and divergence of function among homologous proteins is complex. The assumption that homologs share function--the basis of transfer of annotations in databases--must therefore be regarded with caution. Here, we present a quantitative study of sequence and function divergence, based on the Gene Ontology classification of function. We determined the relationship between sequence divergence and function divergence in 6828 protein families from the PFAM database. Within families there is a broad range of sequence similarity from very closely related proteins--for instance, orthologs in different mammals--to very distantly-related proteins at the limit of reliable recognition of homology.

Results: We correlated the divergence in sequences, determined from pairwise alignments, with the divergence in function, determined by path lengths in the Gene Ontology graph, taking into account the fact that many proteins have multiple functions. Our results show that, among homologous proteins, the proportion of divergent functions decreases dramatically above a threshold of sequence similarity at about 50% residue identity. For proteins with more than 50% residue identity, transfer of annotation between homologs will lead to an erroneous attribution with a totally dissimilar function in fewer than 6% of cases. This means that for very similar proteins (more than about 50% identical residues) the chance of completely incorrect annotation is low; however, because of the phenomenon of recruitment, it is still non-zero.

Conclusion: Our results describe general features of the evolution of protein function, and serve as a guide to the reliability of annotation transfer, based on the closeness of the relationship between a new protein and its nearest annotated relative.

Figures

Figure 1
The minimal-length path from GO:0004452 to the root node in the molecular function ontology.
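A minimal sketch of how a minimal-length path to the root could be found, assuming the ontology is stored as a child-to-parents mapping. The term names and the toy graph below are hypothetical and do not reproduce the real ancestry of GO:0004452; breadth-first search is one standard way to find a shortest path and is not necessarily the authors' procedure.

```python
from collections import deque

# Hypothetical toy DAG (child -> parents); not real GO structure.
PARENTS = {
    "root": [],
    "catalytic": ["root"],
    "binding": ["root"],
    "isomerase": ["catalytic"],
    "long_b": ["catalytic"],
    "long_a": ["long_b"],
    # Two routes to the root: a short one via "isomerase", a long one via "long_a".
    "specific_isomerase": ["isomerase", "long_a"],
}

def shortest_path_to_root(term, parents, root="root"):
    """Breadth-first search upward; returns one minimal-length path as a list of terms."""
    queue = deque([[term]])
    seen = {term}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == root:
            return path
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(path + [parent])
    return None  # the term is not connected to the root

path = shortest_path_to_root("specific_isomerase", PARENTS)
print(" -> ".join(path))
```

Because a term in a DAG may have several parents, more than one route to the root can exist; the search returns the first shortest one it reaches.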
Figure 2
Distinction between similar and dissimilar function. We regard hydrolase activity, acting on ester bonds and oleoyl-[acyl-carrier protein] hydrolase activity, as similar functions, because their lowest common ancestor, hydrolase activity, acting on ester bonds, is not the root node of the molecular function DAG. However, we would regard hydrolase activity, acting on ester bonds and acyl carrier activity, as dissimilar functions, because their lowest common ancestor is the root node of the DAG. The Figure also illustrates the idea of the distal GO IDs that we extract from an annotation set, in this case describing proteins in the Acyl_ACP thioesterase family. Both acyl carrier activity and oleoyl-[acyl-carrier protein] hydrolase activity have no child nodes within the GO molecular function DAG. These annotations are therefore as specific as possible within the GO function classification. That is, they are distal both within the annotations of this family of proteins and in the overall GO DAG itself. The third GO ID, hydrolase activity, acting on ester bonds, annotates some proteins that are not annotated with the more precise function oleoyl-[acyl-carrier protein] hydrolase activity. For such proteins, hydrolase activity, acting on ester bonds is a distal GO ID.
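The two ideas in this caption can be sketched as follows, assuming a child-to-parents mapping. The graph is a deliberately simplified, hypothetical stand-in: intermediate GO terms are omitted and the short term names are illustrative labels, not GO IDs. Two terms are treated as Similar when they share some ancestor other than the root (a term counts as its own ancestor here), which is equivalent to their lowest common ancestor not being the root.

```python
# Simplified, hypothetical stand-in for the Figure 2 example (child -> parents).
# Intermediate terms between these nodes and the root are omitted.
PARENTS = {
    "root": [],
    "hydrolase_ester": ["root"],
    "oleoyl_hydrolase": ["hydrolase_ester"],
    "acyl_carrier": ["root"],
}

def ancestors(term, parents):
    """The term itself plus every term reachable by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen

def is_similar(a, b, parents, root="root"):
    """Similar if the two terms share any ancestor other than the root."""
    shared = ancestors(a, parents) & ancestors(b, parents)
    return bool(shared - {root})

def distal_terms(annotations, parents):
    """Terms in one protein's annotation set that are not ancestors of another term in the set."""
    distal = set()
    for term in annotations:
        others = annotations - {term}
        if not any(term in ancestors(other, parents) for other in others):
            distal.add(term)
    return distal

print(is_similar("hydrolase_ester", "oleoyl_hydrolase", PARENTS))  # True
print(is_similar("hydrolase_ester", "acyl_carrier", PARENTS))      # False
print(distal_terms({"hydrolase_ester", "oleoyl_hydrolase"}, PARENTS))
print(distal_terms({"hydrolase_ester"}, PARENTS))
```

The last two calls mirror the caption: the general term is dropped when the protein also carries the more precise annotation, and kept as distal when it is the most specific annotation available.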
Figure 3
Distribution of Similar functions in the EF-hand family. Figures 3-5 show the dependence of function divergence on sequence divergence for the EF-hand family. Sequence similarities, measured by the % identical residues in optimal sequence alignment, were divided into bins of width 10%, plotted in different colors as shown in the graph. Abscissa: GO Distance; Ordinate: fraction of comparisons.
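The binning described in this caption can be sketched as below, assuming each pairwise comparison has already been reduced to a (percent identity, GO distance) pair. The function name and the treatment of exactly 100% identity (folded into the top bin) are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def distance_fractions_by_bin(pairs, bin_width=10):
    """Group (percent_identity, go_distance) pairs into identity bins and
    return, for each bin, the fraction of comparisons at each GO distance."""
    counts = defaultdict(Counter)
    for identity, distance in pairs:
        bin_start = int(identity // bin_width) * bin_width
        # Assumption: 100% identity is folded into the top bin.
        bin_start = min(bin_start, 100 - bin_width)
        counts[bin_start][distance] += 1
    fractions = {}
    for bin_start, counter in counts.items():
        total = sum(counter.values())
        fractions[bin_start] = {d: n / total for d, n in counter.items()}
    return fractions
```

Each inner dictionary corresponds to one colored curve in the plot: GO distance on the abscissa and fraction of comparisons on the ordinate, with the fractions in each bin summing to 1.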
Figure 4
Distribution of Dissimilar functions in the EF-hand family.
Figure 5
Distribution of Similar + Dissimilar functions in the EF-hand family.
Figure 6
(a) Path in GO DAG between two annotations of proteins of the EF-hand family with Similar functions corresponding to GO distance = 7. (b) Path in GO DAG between two annotations of proteins of the EF-hand family with Dissimilar functions corresponding to GO distance = 12.
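One way to compute a path-based distance of this kind is sketched below. It assumes the distance between two terms is the smallest number of edges on a path that climbs from each term to a common ancestor; this page does not spell out the exact definition the authors used, so this is an illustrative assumption, and the toy graph is hypothetical.

```python
from collections import deque

# Hypothetical toy DAG (child -> parents); not real GO structure.
PARENTS = {
    "root": [],
    "x1": ["root"],
    "x2": ["x1"],
    "x3": ["x2"],
    "y1": ["x1"],
    "z1": ["root"],
    "z2": ["z1"],
}

def upward_distances(term, parents):
    """Minimal number of edges from the term to each of its ancestors (itself included, at 0)."""
    dist = {term: 0}
    queue = deque([term])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in dist:
                dist[parent] = dist[node] + 1
                queue.append(parent)
    return dist

def go_distance(a, b, parents):
    """Shortest path length between two terms via any common ancestor, or None if they share none."""
    da = upward_distances(a, parents)
    db = upward_distances(b, parents)
    common = da.keys() & db.keys()
    if not common:
        return None
    return min(da[c] + db[c] for c in common)

print(go_distance("x3", "y1", PARENTS))  # meets below the root
print(go_distance("x3", "z2", PARENTS))  # only the root is shared
```

In this toy graph the first pair meets at a non-root ancestor, while the second pair's only common ancestor is the root, so its path passes through the root.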
Figure 7
Distribution of sizes of PFAM families.
Figure 8
Distribution of functional distances (Y-axis in fraction) in bins of 20% sequence identity (X-axis). The graphs present the distribution of all functions (Similar + Dissimilar).
Figure 9
Distribution of fraction of Dissimilar function (Ordinate: fraction) versus sequence identity (X-axis in bins of 10%). The top of each box is the upper 75th percentile, the bottom is the lower 25th percentile. The median of each box is also shown but is superimposed on the 25th percentile. The circles are single extreme cases. The line joins the mean fraction of Dissimilar function at each level of sequence identity. The mean is well above the median due to the extreme skewness of the distribution towards mostly similar function.
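A small numerical illustration of why the mean can sit well above the median in such a skewed distribution. The values are invented for illustration only and are not data from the study.

```python
import statistics

# Hypothetical fractions of Dissimilar function for ten families in one
# identity bin: most families show none, a few show a lot.
fractions = [0.0] * 8 + [0.5, 1.0]

mean = statistics.mean(fractions)
median = statistics.median(fractions)
q1, _, q3 = statistics.quantiles(fractions, n=4)  # quartile cut points

print(f"mean={mean:.3f} median={median:.3f} q1={q1:.3f} q3={q3:.3f}")
```

With most values at zero, the median coincides with the lower quartile, as the caption describes for the box plot, while the few large values pull the mean upward.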
Figure 10
The dependence of function divergence on sequence divergence for the EF-hand family in which the proteins with only the experimentally supported annotations were utilized. Abscissa: GO Distance; Ordinate: fraction of comparisons. Different colors show distributions of sets of pairs of proteins with different ranges of sequence similarity, divided into ranges of width 10% residue identity.
Figure 11
The dependence of function divergence on sequence divergence for the EF-hand family in which the proteins with only the non-experimentally supported annotations were utilized. Abscissa: GO Distance; Ordinate: fraction of comparisons. Different colors show distributions of sets of pairs of proteins with different ranges of sequence similarity, divided into bins of width 10% residue identity.
Figure 12
Possible relationships within the GO DAG between functions of two hypothetical homologous proteins, annotated by X and O respectively.
