BMC Genomics. 2011 Jan 11;12:21.

doi: 10.1186/1471-2164-12-21.

Bioinformatic evidence for a widely distributed, ribosomally produced electron carrier precursor, its maturation proteins, and its nicotinoprotein redox partners

Daniel H Haft¹

PMID: 21223593
PMCID: PMC3023750
DOI: 10.1186/1471-2164-12-21

Daniel H Haft. BMC Genomics. 2011.

Affiliation

¹ J Craig Venter Institute, 9704 Rockville, MD 20850, USA. haft@jcvi.org

Abstract

Background: Enzymes in the radical SAM (rSAM) domain family serve in a wide variety of biological processes, including RNA modification, enzyme activation, bacteriocin core peptide maturation, and cofactor biosynthesis. Evolutionary pressures and relationships to other cellular constituents impose recognizable grammars on each class of rSAM-containing system, shaping patterns in results obtained through various comparative genomics analyses.

Results: An uncharacterized gene cluster found in many Actinobacteria and sporadically in Firmicutes, Chloroflexi, Deltaproteobacteria, and one Archaeal plasmid contains a PqqE-like rSAM protein family that includes Rv0693 from Mycobacterium tuberculosis. Members occur clustered with a strikingly well-conserved small polypeptide we designate "mycofactocin," similar in size to bacteriocins and PqqA, precursor of pyrroloquinoline quinone (PQQ). Partial Phylogenetic Profiling (PPP) based on the distribution of these markers identifies the mycofactocin cluster, but also a second tier of high-scoring proteins. This tier, strikingly, is filled with up to thirty-one members per genome from three variant subfamilies that occur, one each, in three unrelated classes of nicotinoproteins. The pattern suggests these variant enzymes require not only NAD(P), but also the novel gene cluster. Further study was conducted using SIMBAL, a PPP-like tool, to search these nicotinoproteins for subsequences best correlated across multiple genomes to the presence of mycofactocin. For both the short chain dehydrogenase/reductase (SDR) and iron-containing dehydrogenase families, aligning SIMBAL's top-scoring sequences to homologous solved crystal structures shows signals centered over NAD(P)-binding sites rather than over substrate-binding or active site residues. Previous studies on some of these proteins have revealed a non-exchangeable NAD cofactor, such that enzymatic activity in vitro requires an artificial electron acceptor such as N,N-dimethyl-4-nitrosoaniline (NDMA) for the enzyme to cycle.

Conclusions: Taken together, these findings suggest that the mycofactocin precursor is modified by the Rv0693 family rSAM protein and other enzymes in its cluster. It becomes an electron carrier molecule that serves in vivo as NDMA and other artificial electron acceptors do in vitro. Subclasses from three different nicotinoprotein families show "only-if" relationships to mycofactocin because they require its presence. This framework suggests a segregated redox pool in which mycofactocin mediates communication among enzymes with non-exchangeable cofactors.

Figures

**Figure 1**
**Multiple sequence alignment of gene predictions for mycofactocin precursors**. All detectable members of the family defined by TIGR03969 were collected, sorted by length, and aligned by MUSCLE [33], and then made non-redundant to 80% sequence identity, preferentially keeping sequences previously treated as genes and available through NCBI. Significant sequence similarity is restricted to the last 23 amino acids; the seven invariant residues occur among the last eight positions.

**Figure 2**
**Mycofactocin gene cluster regions**. Examples of the mycofactocin cluster are shown from *Geobacter uraniireducens* (Deltaproteobacteria), *Pelotomaculum thermopropionicum* (Firmicutes), *Mycobacterium avium* (Actinobacteria), *Thermomicrobium roseum* (Chloroflexi), and *Haloterrigena turkmenica* (Archaea). Additional oxidoreductases for these species, beyond those shown in the mycofactocin cluster, include four Fe-dependent from *G. uraniireducens* and nine from *P. thermopropionicum*, twenty-five SDR and five Zn-dependent from *M. avium*, and four Zn-dependent from *T. roseum*.

**Figure 3**
**SDR family dehydrogenase SIMBAL heat map and related structure**. **Panel A**: SIMBAL map for 267-residue protein PTH_0592 from *Pelotomaculum thermopropionicum* SI. Data sets numbered 974 sequences in the TRUE partition and 6998 in the FALSE partition after being made non-redundant to less than 80% sequence identity. The X coordinate represents the location of the center of each subsequence along the full-length protein sequence; the Y coordinate represents the size of the subsequence tested. The plot is triangular, narrowing as it rises, because longer subsequences are more constrained and have less room to slide from N-terminus to C-terminus. **Panel B**: Structure 1NFQ of Rv2002, an NADH-dependent 3α,20β-hydroxysteroid dehydrogenase in the SDR family. Bound NAD+ is shown in yellow and androsterone in blue. Active site residues Ser-140, Tyr-153, and Lys-157 are shown in cyan with their side chains as space-filling spheres. Highlighted in brown and red are regions from Rv2002 that map by pairwise alignment to the locally top SIMBAL hit sequences from PTH_0592 at subsequence sizes of 8 (brown) and 12 (red), both taken from within the absolute highest scoring subsequence.
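
The triangular geometry described in Panel A can be illustrated with a short sketch. This is not the SIMBAL implementation: the function names and the `min_size` cutoff are hypothetical, and positions are 0-based for convenience. It only enumerates the (center, size) coordinates at which a subsequence of a protein of a given length could be scored.

```python
def windows_per_row(seq_len, size):
    """Number of positions a window of `size` residues can occupy
    within a sequence of `seq_len` residues."""
    return seq_len - size + 1


def simbal_windows(seq_len, min_size=6):
    """Yield (center, size) for every contiguous subsequence.

    `center` is the X coordinate (0-based midpoint of the window) and
    `size` is the Y coordinate. `min_size` is an illustrative cutoff,
    not a value taken from the source.
    """
    for size in range(min_size, seq_len + 1):
        for start in range(windows_per_row(seq_len, size)):
            center = start + (size - 1) / 2
            yield center, size
```

For a 267-residue protein such as PTH_0592, a window of size 8 can occupy 260 positions, while the full-length window occupies exactly one, which is the apex of the triangle.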

**Figure 4**
**The SIMBAL heat map for gi|284167095 from *Haloterrigena turkmenica*, a Zn-dependent protein from family TIGR03989**. SIMBAL training sets contained 590 sequences in the TRUE partition and 3620 in the FALSE partition after making the sets non-redundant to no more than 80% sequence identity. The apex score of 30.1 nears the highest local score, consistent with the ability of TIGR03989 to identify large numbers of members found exclusively in mycofactocin-producing species. Interesting features include extended cool (blue and green) regions such as the N-terminal region of about 80 residues, which contains regions well conserved among Zn-dependent alcohol dehydrogenases yet apparently poorly predictive for whether or not a matching protein's genome of origin contains the mycofactocin cluster.
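
The scores in these heat maps reflect how strongly a subsequence's best matches are enriched for the TRUE partition. As a rough sketch only, assume a score defined as the negative log10 of a binomial tail probability, evaluated at the best cutoff depth in a ranked hit list. That definition is an assumption made for illustration, not a statement of the published SIMBAL scoring. The function names and the toy hit list are hypothetical; the background fraction uses the partition sizes given in this caption.

```python
import math


def binomial_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def best_depth_score(hit_labels, p_true):
    """Scan a best-first ranked hit list and return (score, depth).

    hit_labels: booleans, True when the hit comes from the TRUE partition.
    p_true: background fraction of TRUE sequences (assumed null model).
    The score is -log10 of the binomial tail at the most enriched depth.
    """
    best_score, best_depth = 0.0, 0
    yes = 0
    for depth, label in enumerate(hit_labels, start=1):
        yes += bool(label)
        tail = binomial_tail(yes, depth, p_true)
        score = -math.log10(tail) if tail > 0 else float("inf")
        if score > best_score:
            best_score, best_depth = score, depth
    return best_score, best_depth


# Background fraction from the caption: 590 TRUE vs 3620 FALSE sequences.
p_true = 590 / (590 + 3620)
# Toy ranked list (hypothetical): five TRUE hits followed by five FALSE hits.
score, depth = best_depth_score([True] * 5 + [False] * 5, p_true)
```

In this toy case the enrichment peaks at the depth where the last TRUE hit is included, and every FALSE hit added after that point lowers the score.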

**Figure 5**
**Iron-dependent dehydrogenase SIMBAL heat map and related structure**. **Panel A** shows the SIMBAL heat map for Dtox_4270, a 419-residue group III iron-dependent dehydrogenase from *Desulfotomaculum acetoxidans* DSM 771. The training set contains 62 sequences in the TRUE partition and 1555 in the FALSE partition. This heat map shows a plume rising from a short subsequence TSNPKDYE, homologous to the sequence VPNPTITV in the sequence of lactaldehyde:1,2-propanediol oxidoreductase of *Escherichia coli*, which has a solved crystal structure 2BL4. **Panel B** shows the secondary structure cartoon of 2BL4 in green. The NAD cofactor is shown as space-filling spheres in yellow, and the iron atom as a blue sphere. The sequence identified by homology to the SIMBAL hot spot from panel A is shown in red, making close contact with the NAD but not with the iron atom. The location of the hot spot is consistent with a role in controlling the ability of NAD to exchange electrons with other redox carriers.

References

1. Bernal A, Ear U, Kyrpides N. Genomes OnLine Database (GOLD): a monitor of genome projects world-wide. Nucleic Acids Res. 2001;29(1):126–127. doi: 10.1093/nar/29.1.126.
2. Kensche PR, van Noort V, Dutilh BE, Huynen MA. Practical and theoretical advances in predicting the function of a protein by its phylogenetic distribution. J R Soc Interface. 2008;5(19):151–170. doi: 10.1098/rsif.2007.1047.
3. Overbeek R, Begley T, Butler RM, Choudhuri JV, Chuang HY, Cohoon M, de Crecy-Lagard V, Diaz N, Disz T, Edwards R, et al. The subsystems approach to genome annotation and its use in the project to annotate 1000 genomes. Nucleic Acids Res. 2005;33(17):5691–5702. doi: 10.1093/nar/gki866.
4. Jensen LJ, Kuhn M, Stark M, Chaffron S, Creevey C, Muller J, Doerks T, Julien P, Roth A, Simonovic M, et al. STRING 8--a global view on proteins and their functional interactions in 630 organisms. Nucleic Acids Res. 2009. pp. D412–416.
5. Mavromatis K, Chu K, Ivanova N, Hooper SD, Markowitz VM, Kyrpides NC. Gene context analysis in the Integrated Microbial Genomes (IMG) data management system. PLoS One. 2009;4(11):e7979. doi: 10.1371/journal.pone.0007979.

Grants and funding

R01 HG004881/HG/NHGRI NIH HHS/United States
