Comparative Study
Biomolecules. 2021 Aug 27;11(9):1282.
doi: 10.3390/biom11091282.

Comparative Genomic Analysis of the DUF34 Protein Family Suggests Role as a Metal Ion Chaperone or Insertase

Colbie J Reed et al. Biomolecules.

Abstract

Members of the DUF34 (domain of unknown function 34) family, also known as the NIF3 protein superfamily, are ubiquitous across superkingdoms. Proteins of this family have been widely annotated as "GTP cyclohydrolase I type 2" through electronic propagation based on one study. Here, the annotation status of this protein family was examined through a comprehensive literature review and integrative bioinformatic analyses that revealed varied pleiotropic associations and phenotypes. This analysis combined with functional complementation studies strongly challenges the current annotation and suggests that DUF34 family members may serve as metal ion insertases, chaperones, or metallocofactor maturases. This general molecular function could explain how DUF34 subgroups participate in highly diversified pathways such as cell differentiation, metal ion homeostasis, pathogen virulence, redox, and universal stress responses.

Keywords: bioinformatics; comparative genomics; conserved unknowns; function prediction; functional annotation; metabolic reconstruction; orthology.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Dinuclear metal-binding site of the E. coli DUF34 homolog, YbgI. The crystal structure of YbgI (DUF34 homolog, E. coli) illustrates conserved residues of the protein family specific to the monomeric cleft of the active site and its dinuclear metal center. Highly conserved residues that Ladner et al. [26] noted as involved in the structure of the binding pocket are colored orange and labeled with residue identity and location.
Figure 2
Key motifs of Bacteria and Archaea compared to those of Eukaryota. Eukaryotic sequences were aligned separately, while bacterial and archaeal sequences were aligned together. A multiple motif method was used to determine and compare family signatures. A full figure illustrating the distinct levels of conservation per superkingdom can be examined in Figure S3.
Figure 3
Inserted domain lengths across model taxa. The lengths of inserted domains were measured for each homolog. The sequences (organisms listed in Data Table S4) were aligned per superkingdom to delimit domains, which then allowed the measurement of each inserted region (if present). An evolutionary tree was generated using phyloT and iTOL and was mapped with the lengths of the inserted domains within each respective homolog. All measured inserted domain lengths were used to generate Figure S5, a histogram of counts by ranges of domain lengths per superkingdom.
Figure 4
COG-InterPro HMM signature profile relationships and defined subgroups across DUF34 family members. Sequences from organisms across the DUF34 protein family, including all fusions and paralogs, were analyzed for co-occurrence relationships between COGs and HMM-determined InterPro family/superfamily/domain annotations. All homologs, paralogs, and fusions were validated using eggNOG and KEGG Paralog Search. Sequences missing InterPro annotation were analyzed by NCBI CDD Search and InterProScan Search. See Data Table S5 for categories and their respective COG designations/InterPro signature profiles in tabular format. The source organisms considered were those also listed in Data Table S4. Groups were designated by the differential keystone signatures shown in (a), and select representative sequences of subgroups (A–G) are shown in (b).
Figure 5
Absence–presence of DUF34 architectural domain subgroups. Absence–presence data for COGs and HMM-determined InterPro family/superfamily/domain signature profiles were added to a species tree generated from organisms harboring published homologs and those used in alignments acquired via OrthoInspector (Data Table S4). Proteins are designated as categories A–G, as detailed in Figure 4 and Data Table S5. These homologous domains are classified in the map according to their HMM-defined DUF34 domain identities (see Figure 4a).
Figure 6
Metal ion-binding of proteins encoded in representative Bacterial and Archaeal operons. (a) A radar chart illustrating the proportions of DUF34-operon encoded proteins documented to interact with certain metals or metal-containing moieties. Accounting for the over-representation of magnesium and zinc among available protein structures, a second radar chart (b) was generated to show the same data without proteins found to exclusively bind either or both ions. Bacterial data are shown in blue while Archaeal data are shown in red. Data used to generate these figures can be found in Table S4.
Figure 7
DUF34 fusions and select gene neighborhoods. (a) Domain architectures of DUF34 fusions. The domain rendering dimensions and positions are approximate. DUF34 domains are rendered in white with black outlines. Domain colors correspond to the key shown in panel b. COGs of fusion domains are listed below each. Fusions deemed “invalid” or “inconclusive” were excluded from panels a and b. (b) Pie chart of DUF34 fusions (126 sequences, total). The outer halo surrounding the chart indicates the superkingdoms in which the respective fusions were observed (Eukaryota: black; Archaea: dark gray; Bacteria: light gray). (c) Neighborhoods of select bacterial and archaeal fusions are shown (12 kb, each), all of at least “conditional” validation confidence (Data Table S11). DUF34 is depicted in bright yellow and fusion domains are indicated by hashing or alternative coloring. For DUF34 sequence labels, “YqfO” denotes a sequence that also contains the inserted domain COG3323, while “YbgI” denotes a sequence without the inserted COG3323 domain. Rendered fusion domains do not reflect exact sizes or locations. The color key is divided into two sets of identities (gray boxes): (top) general metabolic theme or specific annotation with bioinformatic precedent; and (bottom) COGs observed in physical clustering analysis (PCA). COGs also observed in PCA (Table 3) are shown in bold. Six minor exceptions to the top-20 rank cut-off are shown in bold with an asterisk (*): COG1196 (top 31st); COG0564 (top 23rd); COG0648 (top 25th); COG0406 (top 48th) in a fusion with COG0328; and COG0041 (top 36th). Others that were observed in representative operons but ranked beyond the “minor exception” threshold (exceeded top-50) in PCA are shown without additional symbols and not bolded: COG0245 (116th) and COG0761 (61st). Finally, one was not observed in PCA (not bolded) but was present in at least one representative operon (double asterisk, **): COG0642 (SAMN05192534_10671 of A. persepolensis; representative operon, Desulfurispirillum indicum S5) (Data Table S7).
Note: COG4111 (NUDIX hydrolase), present in panel c (neighborhood of M. rubeus), was absent from PCA (any rank) and rep. operons, despite the fusion with COG3323 in F. nucleatum having been resolved in preceding homolog capture and literature review.
Figure 8
DUF34 of E. coli, ybgI, fails complementation in the absence of folE. Plates were imaged after 20 h of growth at 37 °C. (a,b) dT essentiality assay. WT, single mutant, and double mutant (folE, ybgI) strains were grown at 37 °C in LB in the absence (a) or presence (b) of 0.3 mM dT. Each curve shown is averaged across 5 replicates. (c) dT essentiality complementation assay. WT, single mutant, and double mutant (folE, ybgI) strains, containing various derivatives of pBAD24 encoding either E. coli YbgI or FolE, were streaked on LB plates supplemented with 100 µg/mL ampicillin in the presence of either 0.2% glucose to repress gene expression or 0.2% arabinose to overexpress the gene of interest, and in the presence or absence of 0.3 mM dT.

References

    1. Danchin A., Fang G. Unknown unknowns: Essential genes in quest for function. Microb. Biotechnol. 2016;9:530–540. doi: 10.1111/1751-7915.12384.
    2. Niehaus T.D., Thamm A.M., de Crécy-Lagard V., Hanson A.D. Proteins of unknown biochemical function—A persistent problem and a roadmap to help overcome it. Plant Physiol. 2015;169:1436–1442. doi: 10.1104/pp.15.00959.
    3. de Crécy-Lagard V., Haas D., Hanson A.D. Newly-discovered enzymes that function in metabolite damage-control. Curr. Opin. Chem. Biol. 2018;47:101–108. doi: 10.1016/j.cbpa.2018.09.014.
    4. de Crécy-Lagard V., Phillips G., Grochowski L.L., Yacoubi B.E., Jenney F., Adams M.W.W., Murzin A.G., White R.H. Comparative genomics guided discovery of two missing archaeal enzyme families involved in the biosynthesis of the pterin moiety of tetrahydromethanopterin and tetrahydrofolate. ACS Chem. Biol. 2012;7:1807–1816. doi: 10.1021/cb300342u.
    5. Price M.N., Wetmore K.M., Waters R.J., Callaghan M., Ray J., Liu H., Kuehl J.V., Melnyk R.A., Lamson J.S., Suh Y., et al. Mutant phenotypes for thousands of bacterial genes of unknown function. Nature. 2018;557:503–509. doi: 10.1038/s41586-018-0124-0.
