PLoS One. 2009 May 29;4(5):e5736.
doi: 10.1371/journal.pone.0005736.

Structure-based phylogeny as a diagnostic for functional characterization of proteins with a cupin fold

Garima Agarwal et al. PLoS One. 2009.

Abstract

Background: Members of the cupin superfamily exhibit large variations in their sequences, functions, domain organization, quaternary associations and the nature of the bound metal ion, despite sharing a conserved beta-barrel structural scaffold. Here, an attempt has been made to understand structure-function relationships among the members of this diverse superfamily and to identify the principles governing functional diversity. The cupin superfamily also contains proteins whose structures are available through worldwide structural genomics initiatives but which are characterized only as "hypothetical". We have explored the feasibility of obtaining clues to the functions of such proteins by comparative analysis with cupins of known structure and function.

Methodology/principal findings: A 3-D structure-based phylogenetic approach was undertaken. Interestingly, a dendrogram generated solely on the basis of a structural dissimilarity measure at the level of domain folds was found to cluster functionally similar members. This clustering also reflects an independent evolution of the two domains in bicupins. Close examination of the structural superposition of members across various functional clusters reveals structural variations in regions that not only form the active site pocket but are also involved in interaction with another domain in the same polypeptide or in the oligomer.
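
As an illustration of how a dendrogram can be built from pairwise structural dissimilarities alone, the following is a minimal sketch using average-linkage (UPGMA) clustering. It is not the authors' procedure: the abstract does not specify the dissimilarity measure or the tree-building method, and the domain labels and distance values below are hypothetical toy data chosen only to show two functional groups separating.

```python
from itertools import combinations


def upgma(names, dist):
    """Average-linkage (UPGMA) clustering.

    names: list of domain labels.
    dist:  dict mapping frozenset({a, b}) to a pairwise dissimilarity.
    Returns the tree as nested tuples.
    """
    clusters = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)                          # working copy of the distances
    while len(clusters) > 1:
        # Merge the pair of clusters with the smallest dissimilarity.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        na, nb = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance to the merged cluster is the size-weighted average.
        for c in clusters:
            d[frozenset((merged, c))] = (
                na * d[frozenset((a, c))] + nb * d[frozenset((b, c))]
            ) / (na + nb)
        clusters[merged] = na + nb
    return next(iter(clusters))


# Hypothetical labels and dissimilarities (not taken from the paper).
names = ["epimerase_A", "epimerase_B", "isomerase_A", "isomerase_B"]
toy = {
    ("epimerase_A", "epimerase_B"): 0.20,
    ("isomerase_A", "isomerase_B"): 0.30,
    ("epimerase_A", "isomerase_A"): 0.80,
    ("epimerase_A", "isomerase_B"): 0.90,
    ("epimerase_B", "isomerase_A"): 0.85,
    ("epimerase_B", "isomerase_B"): 0.95,
}
dist = {frozenset(pair): value for pair, value in toy.items()}

tree = upgma(names, dist)
print(tree)
```

With these toy values the two hypothetical epimerase domains join first, then the two isomerase domains, which mirrors the kind of function-wise grouping the dendrogram is reported to show.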

Conclusions/significance: Structure-based phylogeny of cupins can aid the identification of functions of cupin-fold proteins whose function is as yet unknown. The approach can be extended to other proteins that share a common fold but show high evolutionary divergence, and it is expected to influence function annotation in structural genomics initiatives.

Conflict of interest statement

Competing Interests: The authors have declared that no competing interests exist.

Figures

Figure 1. Active site region of various members of cupin superfamily.
A. Quercetin dioxygenase (1H1Ia1), B. Auxin binding protein (1LRHa_), C. Glucose-6-phosphate isomerase (1QXRa_), D. 3-hydroxyanthranilate-3,4-dioxygenase (1YFYa_), E. Oxalate oxidase (2ETEa_), F. RmlC epimerase (1EPZa_). The metals are shown as spheres, substrates as sticks and metal binding residues as lines. RmlC epimerase does not require a metal for its function. All the structural superposition figures were generated using PyMOL.
Figure 2. Dendrogram generated on the basis of structural comparisons of cupins at the level of domain folds.
The protein domains carrying out similar functions are marked in the same colour. The monocupins are shown in bold. The proteins with unknown function are indicated by black lines. The first four letters in the taxon names refer to protein codes, the fifth is the chain identifier, and the numbers 1 and 2 for bicupins indicate the N- and C-terminal domains respectively. The identity of the metal bound at the active site is indicated in brackets. A detailed investigation was performed on the functional clusters indicated in the dendrogram.
Figure 3. Structure-based sequence alignment of various functional clusters.
A. RmlC epimerase (cyan in dendrogram), B. Ureidoglycolate hydrolase (teal in dendrogram), C. Glucose-6-phosphate isomerase (dark blue in dendrogram), D. 5-keto-4-deoxyuronate isomerase (light blue in dendrogram). The secondary structural elements of the members are highly conserved.
Figure 4. A schematic of the multiple structural alignments of the representative members of each cluster.
Helices are shown as rectangles in different colors, strands as arrows and black rectangles are loops. The gaps in the alignment are indicated as white space. The region corresponding to motifs containing metal binding residues has been boxed.
Figure 5. The structural superposition of the core of the proteins.
The structure of Ureidoglycolate hydrolase (1YQCa_) is in light blue (A), RmlC epimerase (1EPZa_) in magenta (B), Glucose-6-phosphate isomerase (1QXRa_) in blue (C) and 5-keto-4-deoxyuronate isomerase (1XRUa2) in green (D). The substrates are shown as sticks, metals as spheres and the metal binding residues at the active site as lines. Marked structural variations in the region accommodating metal binding residues can be seen.
Figure 6. Plots of Cα-Cα deviation versus the residue number of 1YQCa_.
A. The Y axis corresponds to the deviation in Cα positions of the protein structures when aligned to the reference structure, 1YQCa_. The X axis indicates the residue number of 1YQCa_. Proteins performing the same function are marked in the same color. B. The same plot with deletions and insertions in the proteins with respect to the reference marked as circles and squares, respectively. The region with very high deviation values has been encircled.
Figure 7. Quaternary structures of the proteins (A–D). In each dimeric protein, one chain is coloured as a spectrum from the N to the C terminus and the other chain is pink.
The ligands at the active site are shown as spheres. A. Ureidoglycolate hydrolase (1YQCa_), the reference protein, bound to the product. The fifty residues in the N terminus are involved in interaction with another subunit in the dimer through domain swapping. The region interacts with substrate in the other subunit. B. RmlC epimerase (1EPZa_) bound to the substrate. The region of large deviation with respect to 1YQCa_ corresponds to the strands forming a beta swapped dimer and covers the active site in the other subunit. C. Glucose-6-phosphate isomerase (1QXRa_) dimer bound to a substrate analog. The structurally variable region lies on the opposite face of the domain in contrast to the reference 1YQCa_ and forms the active site in the same domain. D. 5-keto-4-deoxyuronate isomerase (1XRU) is a dimeric bicupin with Zn bound at the active site. The functional domain considered for comparison lies at the C terminal end. The structurally different region lies at the active site but is not involved in domain swapping with the N terminal domain in the polypeptide.
Figure 8. Superposition of the three-dimensional structures of hypothetical protein (BacBa2, magenta) and Quercetinase (1H1Ia1, green).
The substitutions at the active site have been indicated. The metal (sphere), metal binding residues (sticks), known substrate (sticks) and the unknown ligand (dots) are also shown.
Figure 9. The equivalent residues at the active site region.
The structure of the hypothetical protein (1VJ2a_, magenta) is shown superposed on Quercetinase (1H1Ia1, green). The representation of metal, residues and substrates is the same as in the previous figure.
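
The per-residue Cα-Cα deviation plotted in Figure 6 can be sketched as follows, assuming the structures have already been superposed and their residues aligned to the reference. The function name, the coordinates and the 3.0 Å cutoff are hypothetical and only illustrate the calculation; they are not values or tools from the paper.

```python
import math


def ca_deviation(ref_coords, other_coords):
    """Distance between aligned Cα atoms of two superposed structures.

    Both inputs are lists of (x, y, z) tuples ordered by reference residue.
    None in other_coords marks a residue with no aligned partner (a gap).
    """
    return [
        None if other is None else math.dist(ref, other)
        for ref, other in zip(ref_coords, other_coords)
    ]


# Hypothetical coordinates in Å for three reference residues.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
aligned = [(0.0, 0.0, 0.0), (3.8, 3.0, 4.0), None]

deviations = ca_deviation(reference, aligned)

# Flag residues whose deviation exceeds an illustrative cutoff.
cutoff = 3.0
flagged = [i for i, dev in enumerate(deviations) if dev is not None and dev > cutoff]
print(deviations, flagged)
```

Plotting `deviations` against the reference residue number gives a profile of the kind described in the caption, with gaps handled separately from aligned positions.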
