PLoS Comput Biol. 2009 Oct;5(10):e1000541.
doi: 10.1371/journal.pcbi.1000541. Epub 2009 Oct 23.

An atlas of the thioredoxin fold class reveals the complexity of function-enabling adaptations

Holly J Atkinson et al. PLoS Comput Biol. 2009 Oct.

Abstract

The group of proteins that contain a thioredoxin (Trx) fold is huge and diverse. Assessment of the variation in catalytic machinery of Trx fold proteins is essential in providing a foundation for understanding their functional diversity and predicting the function of the many uncharacterized members of the class. The proteins of the Trx fold class retain common features, including variations on a dithiol CxxC active site motif, that lead to delivery of function. We use protein similarity networks to guide an analysis of how structural and sequence motifs track with catalytic function and taxonomic categories for 4,082 representative sequences spanning the known superfamilies of the Trx fold. Domain structure in the fold class is varied and modular, with 2.8% of sequences containing more than one Trx fold domain. Most member proteins are bacterial. The fold class exhibits many modifications to the CxxC active site motif: only 56.8% of proteins have both cysteines, and no functional groupings have absolute conservation of the expected catalytic motif. Only a small fraction of Trx fold sequences have been functionally characterized. This work provides a global view of the complex distribution of domains and catalytic machinery throughout the fold class, showing that each superfamily contains remnants of the CxxC active site. The unifying context provided by this work can guide the comparison of members of different Trx fold superfamilies to gain insight about their structure-function relationships, illustrated here with the thioredoxins and peroxiredoxins.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Dithiol and monothiol Trx fold reactions.
(A) The archetypal thioredoxin reaction, entailing the reduction of a disulfide bond by a thioredoxin-like protein equipped with a dithiol CxxC active site. (B) The reduction of a mixed disulfide bond between glutathione and a protein by a monothiol glutaredoxin (Grx). In step I, the interaction between the hydroxyl hydrogen of a serine or threonine (green *) is suggested by conserved sequence motifs. Key: B denotes a general base. (Adapted from .).
Figure 2. Most Trx fold active sites involve catalytic cysteines.
(A) A topological diagram of the Trx fold, showing the four-stranded mixed beta sheet sandwiched by three alpha helices. The archetypal CxxC active site cysteines from thioredoxin are represented by yellow bars near the N-terminus of the first alpha helix. Also shown are common locations for insertions and extensions relative to the Trx fold (dashes), and the position of a cis-proline that is frequently found at the N-terminus of the third beta strand. A grey box denotes the region of the fold shown in C–E. Active site types are abbreviated using a motif like “CxxC”, where a ‘C’ indicates presence of a cysteine, and ‘c’ indicates the presence of some residue other than cysteine. “CxxxC” means the active site cysteines are separated by three amino acids. (B) The classic CxxC active site, illustrated by human Trx 2 (PDB:1UVZ); Cys 31 and Cys 34 are shown. A grey box denotes the corresponding region of the fold shown in C–E. (C) The Cxxc active site, where the second cysteine has been mutated to another residue, illustrated by E. coli ArsC (PDB:1I9D); Cys 12 is shown (active site: CxxS). (D) The cxxC active site, in which the N-terminal Trx Cys has been lost, illustrated by human peroxiredoxin 5 (PDB:1OC3); Cys 47 is shown (active site: TxxC). (E) The CxxxC active site, in which the N-terminal Cys has been shifted further into the loop between the first beta strand and alpha helix, illustrated by S. cerevisiae SCO1 (PDB:2B7J); a disulfide bond between Cys 148 and Cys 152 is shown.
Figure 3. A structure-based similarity network describes a map of the Trx fold class.
(A) Structure similarity network, containing 159 structures that are a maximum of 60% identical (by sequence) and span the Trx fold class. Similarity is defined by FAST scores better than 4.5; edges at this threshold represent alignments with a median of 2.75 Å RMSD across 72 aligned positions, while the rest of the edges represent better alignments. As given in the key, each node is colored by a PFAM Thioredoxin-like Clan family if the chain sequence is a member. (Non-members are colored grey and labeled “No hit to Trx Clan”.) These classes are discussed briefly in Table 1. Nodes with thick white borders and bold labels denote chains present in the hierarchical clustering tree in D. Labels like “1ON4_A” denote PDB ID 1ON4, chain A. Some additional proteins that may be of interest are labeled in plain face text. (B) Structure similarity network containing the same structures as in A, shown at the more stringent threshold of 7.5. Edges at this threshold correspond to alignments with a median of 2.45 Å RMSD across 89 aligned positions. Nodes are colored as in A. (C) Structure similarity network containing the 105 structures from the large connected cluster in B, displayed at a FAST score cutoff of 12.0; edges at this threshold represent alignments with a median of 2.21 Å RMSD across 102 aligned positions. Nodes are colored as in A. (D) Complete linkage hierarchical clustering tree based on pairwise FAST scores for 15 representative structures singled out in the networks in A–C, with PDB IDs in bold, and associated SwissProt sequence IDs in plain text. Note: this is a static figure generated from interactive protein similarity networks that can be downloaded and viewed from http://babbittlab.compbio.ucsf.edu/resources/TrxFold/.
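The complete linkage clustering mentioned for panel D can be illustrated with a minimal sketch. It assumes pairwise similarity scores where higher means more similar (as with FAST scores), so the similarity between two clusters is taken as the weakest score between any pair of their members. The function name, the input layout, and the toy scores are hypothetical and are not taken from the paper.

```python
from itertools import combinations

def complete_linkage(names, score):
    """Agglomerative clustering on similarity scores (higher = more similar).

    names: list of item labels.
    score: dict mapping frozenset({a, b}) -> similarity; missing pairs count as 0.
    Returns the merges in order as (cluster_a, cluster_b, linkage_score).
    """
    clusters = [frozenset([n]) for n in names]
    merges = []

    def linkage(c1, c2):
        # Complete linkage: the weakest pairwise similarity between two clusters.
        return min(score.get(frozenset([a, b]), 0.0) for a in c1 for b in c2)

    while len(clusters) > 1:
        # Merge the pair of clusters whose weakest link is strongest.
        c1, c2 = max(combinations(clusters, 2), key=lambda p: linkage(*p))
        merges.append((set(c1), set(c2), linkage(c1, c2)))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return merges

# Toy scores for four hypothetical structures (not real FAST output).
toy = {
    frozenset(["s1", "s2"]): 14.0,
    frozenset(["s1", "s3"]): 6.0,
    frozenset(["s2", "s3"]): 8.0,
    frozenset(["s1", "s4"]): 3.0,
    frozenset(["s2", "s4"]): 2.0,
    frozenset(["s3", "s4"]): 5.0,
}
merges = complete_linkage(["s1", "s2", "s3", "s4"], toy)
for left, right, s in merges:
    print(sorted(left), sorted(right), s)
```

In the toy data, s3 joins the s1/s2 cluster at a score of 6.0 rather than 8.0, because complete linkage is governed by the weakest member-to-member score.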
Figure 4. A sequence similarity network shows how each Trx fold superfamily is distributed.
Sequence similarity network, containing 4,082 representative sequences that are a maximum of 40% identical and span the Trx fold class. Similarity is defined by pairwise BLAST alignments better than an E-value of 1×10^-12; edges at this threshold represent alignments with a median 30% identity over 120 residues, while the rest of the edges represent better alignments. Each node is colored by a PFAM Thioredoxin-like Clan family if the sequence is a member. (Non-members are colored grey and labeled “No hit to Trx Clan”.) These classes are discussed briefly in Table 1. Large nodes represent sequences that are associated with the 159 structures in Fig. 3. The sequences associated with the 15 representative structures in Fig. 3C are labeled using bold text and white arrows. The general locations of other sequences representing different superfamilies are noted using italicized text. Some edges representing similarity relationships from outside of the domain of interest are colored red, and are discussed in the text. Blue letters in parentheses correspond to the labels defining each group in Figures 5–7.
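The thresholding idea behind such a network can be sketched as follows: pairwise hits become undirected edges only when their E-value is better than the cutoff, and clusters are the connected components of the resulting graph. This is a minimal illustration, not the authors' pipeline; the function names, the tuple layout of `hits`, and the example sequence labels are assumptions.

```python
def build_network(hits, max_evalue=1e-12):
    """Build an undirected adjacency map from (seq_a, seq_b, evalue) tuples.

    An edge is kept only when the E-value is better (smaller) than max_evalue.
    Every sequence appears as a node even if it ends up with no edges.
    """
    adjacency = {}
    for a, b, evalue in hits:
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if a != b and evalue < max_evalue:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency

def connected_components(adjacency):
    """Return the clusters of the network as a list of node sets."""
    seen, components = set(), []
    for start in adjacency:
        if start in seen:
            continue
        stack, component = [start], set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components

# Hypothetical pairwise hits; the labels and E-values are made up.
hits = [
    ("seqA", "seqB", 1e-30),
    ("seqB", "seqC", 1e-15),
    ("seqC", "seqD", 1e-5),   # too weak: no edge at the 1e-12 cutoff
    ("seqD", "seqE", 1e-20),
]
network = build_network(hits)
clusters = connected_components(network)
print(sorted(sorted(c) for c in clusters))
```

Making `max_evalue` smaller removes the weaker edges and splits the graph into tighter clusters, which is the same effect the structure networks show when the FAST score cutoff is raised.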
Figure 5. Summary of taxonomic and active site motif properties for Trx fold sequence groups (A–F).
Selected sequence classes marked with blue letters in Fig. 4 are summarized here. Coloring varies in the four columns of networks and bar charts; each is colored according to the legend at the bottom of each figure. Listed are: Group: the most prevalent PFAM family classification[s], the population without sequence filtering (“Population”) and the population after filtering to a maximum of 40% identity as shown in the adjacent network excerpt (“<40% ID”). See Table S4 for the mapping between these groups and the databases PFAM, SCOP, and CATH. PFAM Family: the network cluster excerpted from Fig. 4. Species: a bar chart showing the distribution of species categories among sequences from the network; note that “Eukaryota” includes all eukaryotic species without a more specific kingdom, and is primarily associated with protozoan parasites. Active Site: the network cluster colored by predicted active site architecture; these clusters are excerpted from Fig. 8. CxxC means both active site cysteines are present, Cxxc means only the N-terminal cysteine is present, cxxC means only the C-terminal cysteine is present, CxxxC indicates that there are three positions between the two cysteines, and “Other” means that neither cysteine is present in the expected position. CxxC Motif: a bar chart indicating the type of residue substitutions at the two key positions of the CxxC motif for that group. The stacked bars include the fraction of active sites incorporating a Cys, Thr, or Ser, as well as any other amino acid occurring more than 10% of the time (orange and light blue in key). Otherwise, residues other than cysteine, threonine, or serine are included in the grey “Other” category. Notes: an example high-frequency CxxC motif and example UniProt IDs for sequences in the group.
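The grouping rule described for the CxxC Motif bar charts can be sketched in a few lines: Cys, Thr, and Ser are always reported, any other residue is reported only if it occurs more than 10% of the time, and everything else is pooled as “Other”. The function name, its parameters, and the example residues are hypothetical; this is an illustration of the rule, not the authors' code.

```python
from collections import Counter

def summarize_position(residues, always_shown=("C", "T", "S"), min_fraction=0.10):
    """Summarize the residues observed at one motif position.

    residues: iterable of one-letter amino acid codes, one per sequence.
    Returns a dict of residue -> fraction, with rare residues pooled as "Other".
    """
    residues = list(residues)
    counts = Counter(residues)
    total = len(residues)
    summary, other = {}, 0.0
    for aa, n in counts.items():
        fraction = n / total
        if aa in always_shown or fraction > min_fraction:
            summary[aa] = fraction
        else:
            other += fraction
    if other:
        summary["Other"] = other
    return summary

# Twelve made-up residues at the N-terminal motif position of one group.
example = list("CCCCCCSTPPGA")
summary = summarize_position(example)
for aa, fraction in sorted(summary.items(), key=lambda item: -item[1]):
    print(f"{aa}: {fraction:.1%}")
```

Here proline is listed separately because it accounts for 2 of 12 residues (above 10%), while glycine and alanine fall into “Other”.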
Figure 6. Summary of taxonomic and active site motif properties for Trx fold sequence groups (G–L).
See Figure 5 legend.
Figure 7. Summary of taxonomic and active site motif properties for Trx fold sequence groups (M–R).
See Figure 5 legend.
Figure 8. Variations of the CxxC active site are associated with Trx superfamilies.
The same sequence similarity network from Fig. 4, containing 4,082 sequences, is colored according to predicted active site architecture. Active site types are abbreviated using a motif like “CxxC”, where a ‘C’ indicates presence of a cysteine, and ‘c’ indicates the presence of some residue other than cysteine. CxxxC means that the two cysteines are present and separated by three amino acids. Examples of each type are shown in Fig. 2. Large nodes represent sequences that are associated with the structures from Fig. 3. Predictions are based on sequence alignments to PFAM Thioredoxin-like Clan HMMs. Cysteines and selenocysteines are treated as equivalent in this figure. Letter labels in blue correspond to sequence groups in Figures 5–7.
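The active site labels used in this legend can be expressed as a small classification function. The sketch below assumes the two motif residues have already been extracted from an alignment (the paper derives them from alignments to PFAM HMMs, which is not reproduced here). The function name and arguments are hypothetical, and labeling a two-cysteine site with any other spacing as “Other” is an assumption of this sketch.

```python
def classify_active_site(n_residue, c_residue, spacing=2):
    """Label an active site from the residues at the two motif positions.

    n_residue, c_residue: one-letter codes at the N- and C-terminal positions.
    spacing: number of residues between the two positions (2 for CxxC).
    """
    cys_like = {"C", "U"}  # selenocysteine (U) is treated like cysteine
    has_n = n_residue in cys_like
    has_c = c_residue in cys_like
    if has_n and has_c:
        if spacing == 2:
            return "CxxC"
        if spacing == 3:
            return "CxxxC"
        return "Other"  # assumption: unusual spacings are not given their own label
    if has_n:
        return "Cxxc"
    if has_c:
        return "cxxC"
    return "Other"

print(classify_active_site("C", "C"))     # both cysteines, two residues apart
print(classify_active_site("C", "S"))     # e.g. a CxxS site
print(classify_active_site("T", "C"))     # e.g. a TxxC site
print(classify_active_site("C", "C", 3))  # cysteines three residues apart
```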
Figure 9. Transitive similarity relationships link the thioredoxins and the peroxiredoxins.
(A) Subset of the sequence similarity network from Fig. 4, with nodes colored according to the identity of the amino acid predicted to occupy the position of the cis-proline at the N-terminus of beta strand 3 in the Trx fold (Pro 75 in human Trx 1). The orange path traces transitive sequence similarity relationships starting at human Trx 2, passing through B. japonicum CMP (CYCY_BRAJA), and ending at bovine Prx 3 (PRDX3_BOVIN). Large nodes represent sequences that are associated with the structures from Fig. 3. Predictions are based on sequence alignments to PFAM Thioredoxin-like Clan HMMs. (B) The same path, connecting the structures associated with the sequences in A, traced through a subset of the structure-based network from Fig. 3B. (C) The same path traced through a subset of the structure-based hierarchical clustering of representative structures from Fig. 3D.
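A transitive similarity relationship of this kind is a chain of edges linking two nodes that share no direct edge. One simple way to find such a chain is a breadth-first search, sketched below. This is only an illustration: the legend does not say how the orange path was chosen, and the function name, node labels, and toy network are hypothetical.

```python
from collections import deque

def shortest_path(adjacency, start, goal):
    """Return one shortest chain of edges from start to goal, or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the recorded predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in adjacency.get(node, ()):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

# Toy network with placeholder labels; "trx" and "prx" share no direct edge.
toy = {
    "trx": ["n1", "x"],
    "n1": ["trx", "cmp"],
    "x": ["trx"],
    "cmp": ["n1", "n2"],
    "n2": ["cmp", "prx"],
    "prx": ["n2"],
    "lonely": [],
}
path = shortest_path(toy, "trx", "prx")
print(" -> ".join(path))
```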
