Methods Enzymol. 2018:606:1-71.
doi: 10.1016/bs.mie.2018.06.004. Epub 2018 Jul 24.

Atlas of the Radical SAM Superfamily: Divergent Evolution of Function Using a "Plug and Play" Domain

Gemma L Holliday et al. Methods Enzymol. 2018.

Abstract

The radical SAM superfamily contains over 100,000 homologous enzymes that catalyze a remarkably broad range of reactions required for life, including metabolism, nucleic acid modification, and biogenesis of cofactors. While the highly conserved SAM-binding motif responsible for formation of the key 5'-deoxyadenosyl radical intermediate is a key structural feature that simplifies identification of superfamily members, our understanding of their structure-function relationships is complicated by the modular nature of their structures, which exhibit varied and complex domain architectures. To gain new insight about these relationships, we classified the entire set of sequences into similarity-based subgroups that could be visualized using sequence similarity networks. This superfamily-wide analysis reveals important features that had not previously been appreciated from studies focused on one or a few members. Functional information mapped to the networks indicates which members have been experimentally or structurally characterized, their known reaction types, and their phylogenetic distribution. Despite the biological importance of radical SAM chemistry, the vast majority of superfamily members have never been experimentally characterized in any way, suggesting that many new reactions remain to be discovered. In addition to 20 subgroups with at least one known function, we identified additional subgroups made up entirely of sequences of unknown function. Importantly, our results indicate that even general reaction types fail to track well with our sequence similarity-based subgroupings, raising major challenges for function prediction for currently identified and new members that continue to be discovered. Interactive similarity networks and other data from this analysis are available from the Structure-Function Linkage Database.

Keywords: Classification of Radical SAM enzymes by sequence similarity; Multiple domain structures of radical SAM superfamily enzymes; Phylogenetic representation; Radical SAM superfamily census; Sequence similarity networks; Structure–function mapping; Subgroups and families in the radical SAM superfamily.

Conflict of interest statement

COMPETING FINANCIAL INTERESTS STATEMENT

None

Figures

Fig. 1.
(A) [Fe4-S4] binding motif from biotin synthase (PDB: 1R30). Image created using LigPlot+ (Laskowski & Swindells, 2011). (B) The common activation step associated with the canonical RSS.
Fig. 2.
Comparison of a canonical RSS structure with structures from unrelated superfamilies of other fold types whose members catalyze RSS-like chemistry. Chains containing the [Fe4S4] cluster are colored by secondary structure, with helices in blue and strands in orange; one copy of the chain per structure is highlighted. The sulfur atoms from the [Fe4S4] clusters and from their adjacent cysteine residues are shown as spheres. Left to right top row: canonical RSS, biotin synthase, PDB: 1R30; Radical SAM 3-amino-3-carboxypropyl Radical Forming Superfamily, diphthamide synthetase, PDB: 3LZD; bottom row: Radical SAM Phosphomethylpyrimidine Synthase Superfamily, phosphomethylpyrimidine synthase, PDB: 4S28; and Radical SAM Phosphonate Metabolism Superfamily, PDB: 4XB6. The physiological unit of all of these structures is a homo-2-mer except for 4XB6, which is a hetero-8-mer.
Fig. 3.
Structural examples of some full-length RSS members of varied architectures. Structures are aligned to show the [Fe4S4] clusters in a similar orientation. For structures with multiple chains, only chain A is shown. Secondary structure coloring is the same as in Fig. 2. Top row: 1OLT, coproporphyrinogen III oxidase, (β/α)6; 1R30, biotin synthase, (β/α)8; 4FHD, spore photoproduct lyase, (β/α)6. Bottom row: 4NJK, 7-carboxy-7-deazaguanine synthase, (β6/α3); 4M7T, 2-deoxy-scyllo-inosamine dehydrogenase, (β5/α4).
Fig. 4.
Predicted domain architectures created using ArchSchema (Tamuri & Laskowski, 2010). Shown are 435 major architecture types of the more than 1,500 representative domain architectures predicted for the 63,785 representative RSS protein sequences in Pfam version 27. These architectures represent 171 distinct domains. The central green rectangle underlined in red in the figure represents the core superfamily domain shared by all members of the canonical RSS, which is repeated in each MDA image shown. Edges connecting individual domains distinguish each complete MDA. The domain (rectangle) connecting each cluster to the larger MDA network is also underlined in red. The circled clusters represent the SPASM/Twitch-like domain (magenta), BATS-like domain (yellow) and B12-binding-like domain (blue) clusters.
Fig. 5.
Representative SSN for the RSS showing major Level 1 subgroups. The SSN was generated from the 113,776 full-length RSS sequences in the SFLD. Sequences that share >50% pairwise identity were binned into 10,741 representative nodes (circles). Edges (lines between representative nodes) were drawn between two representative nodes if the mean of the BLAST (Altschul et al., 1997) E-values (used as scores) between the sequences in those nodes was at least as significant as 1 × 10⁻²⁰. At this E-value, the network has 13,591,858 representative edges with a mean sequence identity of 26% across a mean alignment length of 300 residues. The networks are laid out using the prefuse force-directed layout in Cytoscape. Twenty subgroups are denoted by distinct colors and numbered according to Supplemental Table 1. Colored nodes are further specified by size and shape: Large nodes are colored if they include at least one sequence assigned to a numbered subgroup. Diamond-shaped nodes specify that at least one of the sequences in that node has been experimentally characterized (but not structurally characterized). Nodes shaped like a downward arrow have at least one protein that has been structurally characterized. Small circular colored nodes have been assigned to a subgroup but are composed entirely of sequences of unknown function. Small gray circular nodes have not been assigned to a subgroup and are composed entirely of sequences of unknown function. The 22 largest of these entirely gray clusters are curated in the SFLD as “Uncharacterized Radical SAM Subgroups.” Some small colored clusters and singletons randomly laid out at the bottom of the image are not labeled with a number because they belong to a larger numbered cluster of the same color but fail to meet the E-value cutoff for drawing edges connecting them to that cluster. (The largest subgroup, subgroup 17, provides an example. In this visualization, both the large cluster at the top left and the smaller clusters and singletons colored magenta near the bottom of the image belong to subgroup 17, but the latter nodes are too divergent to be connected to the main cluster because their similarities fall below the E-value threshold (1 × 10⁻²⁰) used for drawing edges.) Note that each representative node may contain many individual sequences (see section 2.3).
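The edge rule described in the Fig. 5 caption can be illustrated with a minimal Python sketch. It is not the SFLD pipeline: the function names, the toy sequence and node identifiers, the choice to average on a -log10(E) scale, and the floor applied to E-values of zero are all assumptions made for illustration. Only the 1 × 10⁻²⁰ cutoff comes from the caption.

```python
import math
from collections import defaultdict

E_VALUE_CUTOFF = 1e-20  # edge threshold quoted in the Fig. 5 caption
MIN_E_VALUE = 1e-180    # assumed floor so an E-value of 0.0 still has a finite score


def score(e_value):
    """Convert an E-value to a -log10 score (assumed averaging scale)."""
    return -math.log10(max(e_value, MIN_E_VALUE))


def representative_edges(pairwise_e_values, node_of, cutoff=E_VALUE_CUTOFF):
    """Return {(node_a, node_b): mean_score} for node pairs that pass the cutoff.

    pairwise_e_values: dict mapping (seq_a, seq_b) -> BLAST E-value
    node_of: dict mapping sequence id -> representative node id
    """
    scores = defaultdict(list)
    for (seq_a, seq_b), e_value in pairwise_e_values.items():
        node_a, node_b = node_of[seq_a], node_of[seq_b]
        if node_a == node_b:
            continue  # hits within one representative node do not create an edge
        key = tuple(sorted((node_a, node_b)))
        scores[key].append(score(e_value))

    threshold = score(cutoff)
    edges = {}
    for key, values in scores.items():
        mean_score = sum(values) / len(values)
        if mean_score >= threshold:  # at least as significant as the cutoff
            edges[key] = mean_score
    return edges


# Hypothetical toy data: four sequences binned into three representative nodes.
node_of = {"s1": "n1", "s2": "n1", "s3": "n2", "s4": "n3"}
pairwise_e_values = {
    ("s1", "s3"): 1e-30,
    ("s2", "s3"): 1e-22,
    ("s1", "s4"): 1e-5,
    ("s1", "s2"): 1e-80,
}
edges = representative_edges(pairwise_e_values, node_of)
```

In this toy input, nodes n1 and n2 are joined because their mean score passes the cutoff, while n3 remains a singleton, which mirrors how divergent members of a subgroup can end up disconnected from the main cluster.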
Fig. 6.
Secondary structure topologies of representative radical SAM superfamily domains. Images for several RSS subgroups created using the PDBSum website (de Beer, Berka, Thornton, & Laskowski, 2014). Red: helices, pink: beta strands, green: [Fe4-S4]-AdoMet binding motif. The common abbreviations of the enzyme names and their PDB identifiers are shown on the figure.
Fig. 7.
RSS SSNs mapped with type of life. The same representative network shown in Fig. 5, except that node coloring is by type of life as defined in the SFLD. Representative nodes are colored by the dominant type of life in each. Bacteria: gray; Archaea: red; Invertebrates: yellow; Vertebrates: blue; Plants: green.
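Coloring a representative node by its dominant type of life, as in the Fig. 7 caption, amounts to a majority count over the node's member sequences. The sketch below assumes hypothetical node identifiers and annotations, and its tie-breaking rule (first type encountered wins) is an assumption, since the caption does not say how ties are resolved.

```python
from collections import Counter

# Color key as listed in the Fig. 7 caption.
COLOR_BY_TYPE_OF_LIFE = {
    "Bacteria": "gray",
    "Archaea": "red",
    "Invertebrates": "yellow",
    "Vertebrates": "blue",
    "Plants": "green",
}


def dominant_type_of_life(member_types):
    """Return the most frequent type of life among a node's member sequences."""
    # Counter.most_common breaks ties by first appearance (an assumed tie rule).
    return Counter(member_types).most_common(1)[0][0]


def color_nodes(types_by_node):
    """Map each representative node to the color of its dominant type of life."""
    return {
        node: COLOR_BY_TYPE_OF_LIFE[dominant_type_of_life(types)]
        for node, types in types_by_node.items()
    }


# Hypothetical nodes and annotations, for illustration only.
types_by_node = {
    "n1": ["Bacteria", "Bacteria", "Archaea"],
    "n2": ["Archaea"],
    "n3": ["Plants", "Vertebrates", "Plants"],
}
node_colors = color_nodes(types_by_node)
```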
Fig. 8.
SSN of RSS mapped with general types of RSS chemistry. The same representative network shown in Fig. 5, except that the highlighted nodes and coloring are by general reaction type as indicated in the key. Diamonds: large representative nodes that include at least one functionally characterized member, colored by dominant reaction type. Downward arrows: representative nodes that include at least one structurally characterized member.

References

    1. Akiva E, Brown S, Almonacid DE, Barber AE 2nd, Custer AF, Hicks MA, … Babbitt PC (2014). The Structure-Function Linkage Database. Nucleic Acids Res, 42(Database issue), D521–530. doi: 10.1093/nar/gkt1130
    2. Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, & Lipman DJ (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 25(17), 3389–3402.
    3. Anantharaman V, Koonin EV, & Aravind L (2001). TRAM, a predicted RNA-binding domain, common to tRNA uracil methylation and adenine thiolation enzymes. FEMS Microbiol Lett, 197(2), 215–221.
    4. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, … Sherlock G (2000). Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet, 25(1), 25–29. doi: 10.1038/75556
    5. Atkinson HJ, Morris JH, Ferrin TE, & Babbitt PC (2009). Using sequence similarity networks for visualization of relationships across diverse protein superfamilies. PLoS One, 4(2), e4345. doi: 10.1371/journal.pone.0004345
