J Biol Chem. 2018 Feb 16;293(7):2342-2357.
doi: 10.1074/jbc.M117.815340. Epub 2017 Nov 28.

A global view of structure-function relationships in the tautomerase superfamily


Rebecca Davidson et al. J Biol Chem. 2018.

Abstract

The tautomerase superfamily (TSF) consists of more than 11,000 nonredundant sequences present throughout the biosphere. Characterized members have attracted much attention because of the unusual and key catalytic role of an N-terminal proline. These few characterized members catalyze a diverse range of chemical reactions, but the full scale of their chemical capabilities and biological functions remains unknown. To gain new insight into TSF structure-function relationships, we performed a global analysis of similarities across the entire superfamily and computed a sequence similarity network to guide classification into distinct subgroups. Our results indicate that TSF members are found in all domains of life, with most being present in bacteria. The eukaryotic members of the cis-3-chloroacrylic acid dehalogenase subgroup are limited to fungal species, whereas the macrophage migration inhibitory factor subgroup has wide eukaryotic representation (including mammals). Unexpectedly, we found that 346 TSF sequences lack Pro-1, of which 85% are present in the malonate semialdehyde decarboxylase subgroup. The computed network also enabled the identification of similarity paths, namely sequences that link functionally diverse subgroups and exhibit transitional structural features that may help explain reaction divergence. A structure-guided comparison of these linker proteins identified conserved transitions between them, and kinetic analysis paralleled these observations. Phylogenetic reconstruction of the linker set was consistent with these findings. Our results also suggest that contemporary TSF members may have evolved from a short 4-oxalocrotonate tautomerase-like ancestor followed by gene duplication and fusion. Our new linker-guided strategy can be used to enrich the discovery of sequence/structure/function transitions in other enzyme superfamilies.

Keywords: enzyme structure; enzyme superfamily; evolution; protein evolution; protein sequence; protein structure; structure-function; structure–function relationships; tautomerase superfamily.


Conflict of interest statement

The authors declare that they have no conflicts of interest with the contents of this article.

Figures

Figure 1.
Major types of reactions characterized in the TSF. 4-OT (25), CHMI (8), PPT activity of MIF (11), cis-CaaD (27), and MSAD (28). The pyruvoyl moiety, which is the common functional group for the three tautomerase reactions of 4-OT, CHMI, and MIF, is boxed inside a broken red line for the 4-OT reaction. The proton that is transferred during each of these reactions is highlighted in red.
Figure 2.
Representative sequence similarity network (SSN) of the TSF summarizes putative sequence–function relationships. The 11,395 sequences of the TSF are represented by 1323 representative nodes, each representing a set of TSF sequences that share >50% pairwise identity. The threshold for drawing edges between representative nodes is an E-value of 10⁻¹¹, with geometric mean E-values used as scores (34). This threshold was chosen to optimize visualization of similarities within subgroups and the remote homologies between them (see “Experimental procedures”). The network was laid out using the Organic layout. For this layout, edge lengths, representing the degree of connectivity, qualitatively track with sequence dissimilarity. Diamond-shaped nodes have one or more experimentally characterized proteins with a SwissProt annotation (72); square-shaped nodes have one or more structurally characterized proteins; triangular nodes have one or more proteins that are functionally and structurally characterized. Nodes containing the sequence of a founder protein (described under “A large-scale comparison of 11,395 sequences reveals new structural and functional features of the TSF”) are shown in bright yellow. (All of the founder enzymes within the large yellow triangular nodes have been biochemically and structurally characterized, although the experimental characterization of the cis-CaaD and MSAD founders is not designated as reviewed in SwissProt.) Although each subgroup is named for its founder reaction, the great majority of proteins in representative nodes in each subgroup have not been characterized; thus, an unknown proportion of each subgroup may not catalyze the founder reaction but instead may catalyze different reactions or have different or no physiological functions.
The subgroups are named and colored as being most similar to their namesake founder proteins: 4-OT–like subgroup, brown (581 nodes representing 4472 nonredundant sequences); CHMI-like subgroup, dark blue (165 nodes representing 1767 nonredundant sequences); MIF-like subgroup, green (194 nodes representing 1655 nonredundant sequences); cis-CaaD–like subgroup, cyan (141 nodes representing 566 nonredundant sequences); MSAD-like subgroup, magenta (143 nodes representing 2050 nonredundant sequences). The labels next to each subgroup in the network denote the simple abbreviations for each founder reaction as given in Fig. 1. Gray nodes designate TSF sequences that have not been assigned to a named subgroup. Except for two larger clusters made up entirely of gray nodes, the majority of the gray nodes are in small clusters of ≤6 nodes or singleton nodes, indicating that they scored below the threshold required to connect them to the named subgroups. The two nodes marked with an asterisk, one in the 4-OT subgroup (brown) and one in an unassigned doubleton (gray), include chains (subunit α and subunit β, respectively) that comprise the heterohexamer CaaD (three α,β dimers form the active enzyme). However, as the α- and β-subunits share only 19% pairwise sequence identity, they are found in different nodes. We note that for many technical reasons (34) the visualized SSNs can only provide an estimate of sequence similarity and so should be considered as a starting point for developing hypotheses about sequence–function relationships rather than as definitive representations of such relationships. This issue is especially relevant in evaluating the significance of remote homologies among subgroups, discussed in detail under “Linkers between cis-CaaD and 4-OT subgroup identify a similarity path between them.”
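The edge rule described in this legend (representative nodes joined when the geometric mean E-value between their member sequences meets the 10⁻¹¹ threshold) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the toy node and sequence identifiers, the toy E-values, and the choice to ignore sequence pairs with no reported hit are all assumptions made for illustration.

```python
import math
from itertools import combinations


def geometric_mean_evalue(evalues):
    """Geometric mean of E-values, computed in log space to avoid underflow.

    Assumes every E-value is > 0 (an E-value reported as 0 would need a floor).
    """
    if not evalues:
        raise ValueError("need at least one E-value")
    return math.exp(sum(math.log(e) for e in evalues) / len(evalues))


def representative_edges(node_members, pairwise_evalues, threshold=1e-11):
    """Return {(node_a, node_b): score} for node pairs that pass the threshold.

    node_members: hypothetical mapping of representative node -> member sequence IDs.
    pairwise_evalues: hypothetical mapping of frozenset({seq_a, seq_b}) -> E-value.
    Assumption: sequence pairs with no reported hit are simply skipped.
    """
    edges = {}
    for a, b in combinations(sorted(node_members), 2):
        scores = [
            pairwise_evalues[frozenset((s, t))]
            for s in node_members[a]
            for t in node_members[b]
            if frozenset((s, t)) in pairwise_evalues
        ]
        if scores:
            score = geometric_mean_evalue(scores)
            if score <= threshold:  # smaller E-value means more significant
                edges[(a, b)] = score
    return edges


# Toy data (entirely hypothetical, not taken from the paper).
toy_nodes = {"n1": ["s1", "s2"], "n2": ["s3"], "n3": ["s4"]}
toy_evalues = {
    frozenset(("s1", "s3")): 1e-20,
    frozenset(("s2", "s3")): 1e-10,
    frozenset(("s3", "s4")): 1e-5,
}
toy_edges = representative_edges(toy_nodes, toy_evalues)
```

In the toy data only n1 and n2 are connected: the geometric mean of 10⁻²⁰ and 10⁻¹⁰ is 10⁻¹⁵, which passes the threshold, whereas the single n2 to n3 hit at 10⁻⁵ does not.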
Figure 3.
Representative SSN showing sequences that lack an N-terminal proline. Top, SSN as shown in Fig. 2, except that nodes containing one or more sequences that lack an N-terminal proline are colored red. Bottom, observed positions of the N-terminal proline (or its absence) in sequences of short and fused TSF members. Gray arrows and maroon blocks designate β-strands and α-helices, respectively.
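As a rough illustration of the flagging step behind the red nodes, the following Python sketch marks sequences that lack an N-terminal proline. It is only a sketch under stated assumptions: the source does not say how the authors screened sequences, the option to strip an initiator methionine before checking position 1 is an assumption, and the function name and toy sequences are hypothetical.

```python
def has_pro1(sequence, strip_initiator_met=True):
    """Return True if the sequence starts with proline.

    Assumption: a leading 'M' is treated as an initiator methionine that is
    removed from the mature protein, so 'MP...' counts as having Pro-1.
    """
    seq = sequence.strip().upper()
    if strip_initiator_met and seq.startswith("M"):
        seq = seq[1:]
    return seq.startswith("P")


# Toy sequences (hypothetical, for illustration only).
toy_sequences = {
    "seqA": "MPIAQIHILE",
    "seqB": "MSLLKVTVHG",
    "seqC": "PFAQIYMIEG",
}
lacking_pro1 = sorted(name for name, seq in toy_sequences.items() if not has_pro1(seq))
```

A node in the representative network would then be colored red if any of its member sequences appears in `lacking_pro1`.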
Figure 4.
Phylogenetic representation in the TSF. A, sequence similarity network as shown in Fig. 2, except that the representative nodes are colored according to the dominant type of life: red, archaea; dark blue, bacteria; green, plants; cyan, fungi; pink, invertebrates; yellow, mammals; magenta, other vertebrates besides mammals. Also 26 gray nodes scattered among the dark blue nodes of the 4-OT subgroup come from environmental sequencing projects. The arrow marks an unnamed subgroup enriched in the archaeal sequences. B, a one-sequence-per-node similarity network of the MIF subgroup. Network nodes represent 1679 MIF sequences, 273 of which come from vertebrates. Coloring is according to type of life, as shown in the key except for the white nodes, which were not designated by type of life in the UniProt database. Triangles represent proteins with solved structures. The large triangles represent two characterized human proteins, MIF1 (UniProt P14174) and MIF2 (UniProt P30046), and the large circle represents a MIF from the human parasite W. bancrofti (UniProt O44786) as indicated by the arrows. The threshold for drawing edges between each node is 10e−18. The network shows two separate MIF groups in vertebrates; the sequences of the group containing the human MIF1 sequence are more similar to their invertebrate relatives than to the group containing the human MIF2 sequence.
Figure 5.
Sequences in the 50% representative network that link the 4-OT and cis-CaaD subgroups. Left, a more detailed version of the region in the 50% representative network (see Fig. 2) that links the 4-OT and cis-CaaD subgroups. Representative nodes are colored as described in the legend for Fig. 2 except that those containing founder 4-OT (labeled 4-OT) and cis-CaaD (labeled CC) sequences are enlarged and colored dark blue and red, respectively. The nodes containing linker sequences Fused 4-OT (labeled f4-OT), Linker 2 (labeled 2), Linker 1 (labeled 1), and CgX are also enlarged. Right, structure-guided multiple sequence alignment of proteins labeled in the SSN enlargement on the left. Structures used and their PDB IDs: 4-OT, 1BJP; Fused 4-OT, 6BLM; Linker 2, 5UNQ; Linker 1, 5UIF; CgX, 3N4G; cis-CaaD, 2FLZ. The catalytic residues of the founder 4-OT, Pro-1, Arg-11, and Arg-39 (positions 69, 79, and 108, respectively, according to the numbering of the MSA) are boxed in blue, and catalytic residues of cis-CaaD, Pro-1, His-28, Arg-70, Arg-73, Tyr-103, and Glu-114 are boxed in red (positions 1, 28, 76, 79, 110, 121, respectively, according to the numbering of the MSA). Note that the short founder 4-OT aligns best with the second β-α-β domain of the five other proteins shown in this figure. The alignment shows that the active-site composition of the linker proteins becomes more cis-CaaD–like across the similarity path shown by the network. (Some catalytic residues in 4-OT and cis-CaaD come from different subunits, as described in the legend for Fig. 6.)
Figure 6.
Structural comparison of conserved active-site residues in Linker 1 and Linker 2 with respect to the known catalytic residues of founder 4-OT and cis-CaaD. The structures for each of these proteins are the same as those included in the structure-guided MSA shown in Fig. 5. The unprimed, primed, and doubly primed residues indicate that they come from different subunits. A, founder 4-OT. Pro-1 is positioned between Arg-11′ and Arg-39″. This arrangement allows binding of the dicarboxylate substrate 2-HM by both arginine residues and proton transfer by Pro-1 from the 2-hydroxyl group of 2-HM to C5. B, Linker 2. The active-site architecture of Linker 2 is much like that of founder 4-OT. Arg-71 and Arg-99′ (boxed in the MSA) are structurally equivalent to Arg-11′ and Arg-39″ in founder 4-OT, respectively. Both arginine residues are present in the second β-α-β subdomain of the Linker 2 monomer, which explains their very different position in the protein sequence. Linker 2 also has an Arg-39 in its first β-α-β subdomain, which structurally forms part of the wall of the active site near Arg-99′. Its proximity to both Arg-99′ and Pro-1 could signify a potential role in catalysis. C, Linker 1. Linker 1 exhibits an active-site architecture very different from that of the founder 4-OT and Linker 2 and instead is more similar to the active site of cis-CaaD (in D). One important difference is the absence of an arginine residue that is structurally equivalent to Arg-39″ in founder 4-OT and Linker 2. Instead, this arginine residue (Arg-68) appears to be repositioned much closer to Arg-71 (structurally equivalent to Arg-11′ and Arg-71 in founder 4-OT and Linker 2, respectively). The other residues that are highlighted, His-28, Ala-101′, and Glu-112, are structurally equivalent to His-28, Tyr-103′, and Glu-114 in founder cis-CaaD. Except for the missing Tyr-103′, which has Ala-101′ in that position, the catalytic machinery of founder cis-CaaD is complete in Linker 1.
D, founder cis-CaaD. The active site of cis-CaaD, showing its known catalytic machinery, is composed of Pro-1, His-28, Arg-70, Arg-73, Tyr-103′, and Glu-114.
Figure 7.
Phylogenetic tree of the 4-OT and cis-CaaD linkers from the 50% representative SSN. The tree was calculated by a Bayesian analysis of 63 sequences from an expanded linker set of proteins composed of two β-α-β domains. Posterior probabilities, designated according to the color key, are indicated for all interior nodes. The main branches associated with sequences from representative nodes of linkers are labeled. As the single-domain short 4-OTs contain only about half of the sequence information of the two subdomain fused proteins of the linker set, they were not included in the tree. Of the two additional linkers, linker N1 and linker N2 (described under “Identification of linkers” in the “Experimental procedures”), linker N1 is labeled in the tree. Linker N2 does not appear in the tree because it is a short 4-OT.
Figure 8.
Structure similarity network of the TSF. Each node represents a single structure, colored by subgroup as described in the legend for Fig. 2. The threshold for drawing edges between structures is a TM-align score of 0.8. (A TM-align score of 0.5 is suggested to be statistically significant (66).) This network contains 48 structures for which the PDB accession codes are listed in File S1. The labeled nodes refer to proteins identified in the 4-OT/cis-CaaD linker set shown in Fig. 5.
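The thresholding described in this legend (an edge is drawn when the TM-align score between two structures is at least 0.8) can be sketched in Python as follows. This is an illustrative sketch, not the authors' procedure: the structure labels and scores are hypothetical, and using one symmetric score per pair is an assumption, since TM-align can report scores normalized by either chain.

```python
def threshold_network(scores, threshold=0.8):
    """Build an adjacency map with edges where the pairwise score >= threshold.

    scores: hypothetical mapping of (structure_a, structure_b) -> TM-align score,
    assumed here to hold one symmetric value per pair.
    """
    adjacency = {}
    for (a, b), score in scores.items():
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Return clusters of structures as sorted lists, via depth-first search."""
    seen = set()
    components = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(sorted(component))
    return components


# Toy scores (hypothetical labels and values, not from File S1).
toy_scores = {
    ("structA", "structB"): 0.85,
    ("structB", "structC"): 0.82,
    ("structA", "structC"): 0.60,
    ("structC", "structD"): 0.55,
}
toy_clusters = connected_components(threshold_network(toy_scores))
```

In the toy data, structA and structC fall in the same cluster even though their direct score is below 0.8, because both connect to structB; structD remains a singleton.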

