mBio. 2020 Oct 27;11(5):e02717-20.
doi: 10.1128/mBio.02717-20.

Origins and Molecular Evolution of the NusG Paralog RfaH

Bing Wang et al. mBio. 2020.

Abstract

The only universally conserved family of transcription factors comprises housekeeping regulators and their specialized paralogs, represented by well-studied NusG and RfaH. Despite their ubiquity, little information is available on the evolutionary origins, functions, and gene targets of the NusG family members. We built a hidden Markov model profile of RfaH and identified its homologs in sequenced genomes. While NusG is widespread among bacterial phyla and coresides with genes encoding RNA polymerase and ribosome in all except extremely reduced genomes, RfaH is mostly limited to Proteobacteria and lacks common gene neighbors. RfaH activates only a few xenogeneic operons that are otherwise silenced by NusG and Rho. Phylogenetic reconstructions reveal extensive duplications and horizontal transfer of rfaH genes, including those borne by plasmids, and the molecular evolution pathway of RfaH, from "early" exclusion of the Rho terminator and tightened RNA polymerase binding to "late" interactions with the ops DNA element and autoinhibition, which together define the RfaH regulon. Remarkably, NusG is not only ubiquitous in Bacteria but also common in plants, where it likely modulates the transcription of plastid genes.

IMPORTANCE In all domains of life, NusG-like proteins make contacts similar to those of RNA polymerase and promote pause-free transcription yet may play different roles, defined by their divergent interactions with nucleic acids and accessory proteins, in the same cell. This duality is illustrated by Escherichia coli NusG and RfaH, which silence and activate xenogenes, respectively. We combined sequence analysis and recent functional and structural insights to envision the evolutionary transformation of NusG, a core regulator that we show is present in all cells using bacterial RNA polymerase, into a virulence factor, RfaH. Our results suggest a stepwise conversion of a NusG duplicate copy into a sequence-specific regulator which excludes NusG from its targets but does not compromise the regulation of housekeeping genes. We find that gene duplication and lateral transfer give rise to a surprising diversity within the only ubiquitous family of transcription factors.

Keywords: NusG; RfaH; Spt5; antitermination; transcription.

Figures

FIG 1
RfaH and NusG interactions with the transcription machinery. Autoinhibited RfaH interacts with the ops DNA hairpin formed on the RNAP surface, transforms into an active NusG-like state, and binds to the β′ clamp helices (CHs); NusG makes similar but weaker contacts with RNAP (see Fig. S1 in the supplemental material). The NusG-KOW domain binds to Rho and promotes termination. Residues that make important functionally validated contacts are shown as sticks. PDB accession numbers are as follows: NusG-Rho binary complex, 6DUQ; autoinhibited RfaH, 5OND; RfaH bound to ops-paused transcription elongation complex, 6C6S.
FIG 2
The distribution of NusG-like factors. (A) NusG/Spt5 factors were identified using NusG and Spt5-NGN Pfam models, respectively, in Aquerium (http://aquerium.zhulinlab.org/). The outer ring shows the number of hits; the darker the color, the more hits it represents. The inner rings represent the major taxonomic ranks and supergroups for eukaryotes (93). E, Eukaryota; A, Archaea; B, Bacteria. Plantae are green. (B) RfaH distribution in bacteria on the phylum level. The genome tree was downloaded from AnnoTree (http://annotree.uwaterloo.ca/). Phyla with representatives that contain RfaH (based on hits with our new model) are highlighted in purple. Numbers appended after taxa indicate the number of genome hits divided by the total number of genomes. (C) RfaH distribution in Proteobacteria. The percentages of genome hits were calculated for RfaH-containing families with ≥10 genomes. Families with >50% hits are shown in red, and those with <50% hits are shown in blue. A genome tree of representative Gammaproteobacteria is shown. This and other genome trees are maximum-likelihood trees inferred from the alignment of 120 ubiquitous single-copy proteins (53).
FIG 3
Maximum-likelihood phylogenetic trees. (A) NusG-like proteins are widespread. (B to D) Topology of bacterial trees, with monophyletic groups colored in the genome tree (B). The two clades of Alphaproteobacteria (Alpha) are red and purple; one clade of Zetaproteobacteria (Zeta) is gray. The remaining clades belong to Gammaproteobacteria (Gamma). The branches of NusG (C) and RfaH (D) trees are colored according to the genome tree. Black dots indicate bootstrap values of >50% (A) or >70% (B to D).
FIG 4
Distribution of RfaH proteins in Enterobacteriaceae. The maximum-likelihood phylogenetic tree was built based on sequences of the 16S rRNA genes. Chromosomal RfaH (pink) and plasmid RfaH (purple) are indicated. Plasmid-borne RfaH genes (purple dots) are connected to their best BLASTP hits among the chromosomal genes.
FIG 5
Molecular evolution of NusG and RfaH. (A) Spt5 (black), NusG (gray), unknown NusGSP (light pink), and RfaH (hot pink) are marked on the maximum-likelihood phylogenetic tree. Archaeal Spt5 is used as an outgroup. NusGs with the same pattern of functional sites are collapsed. (Top) Selected functional residues in RfaH and NusG are color coded and numbered as in E. coli RfaH/NusG (NCBI accession no. NP_418284.1/NP_418409.1). Lighter colors indicate conservative substitutions. CL1 to -8 denote RfaH clusters. (B) A stepwise conversion of NusG into RfaH.
FIG 6
RfaH clusters, genomic contexts, and targets. (A) The eight clusters. Footnote a, RfaHs found in GTDB_reps were grouped into eight clusters (Data Set S1H and N). The total number of sequences in each cluster is shown. Footnote b, a subset of each cluster containing NCBI reference sequences only. The number of sequences is shown. (B) Heatmap showing the distribution of COG functional categories (represented by A to W) of RfaH neighbor genes; five genes on each side of rfaH were considered. The number of genes in every COG category was normalized by the number of RfaH reference sequences. (C) Operons activated by enterobacterial RfaHs and other NusGSP proteins; positions of ops sites (green) and NusGSP genes (orange) are shown. COG categories can be accessed at https://www.ncbi.nlm.nih.gov/COG/.
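
The tallies described in the FIG 2C and FIG 6B captions can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the family labels, and the toy counts are hypothetical. Two readings of the captions are assumed: that "≥10 genomes" refers to the total number of genomes in a family, and that a family at exactly 50% is left uncolored because the caption does not cover that case.

```python
from collections import Counter


def family_colors(hits, totals, min_genomes=10):
    """Percent of genomes with an RfaH hit per family, plus a FIG 2C-style color.

    hits and totals map a family name to a genome count (hypothetical inputs).
    """
    result = {}
    for family, total in totals.items():
        n_hits = hits.get(family, 0)
        # Caption rule: only RfaH-containing families with >= 10 genomes.
        # Assumption: the threshold applies to the family's total genome count.
        if n_hits == 0 or total < min_genomes:
            continue
        pct = 100.0 * n_hits / total
        if pct > 50:
            color = "red"
        elif pct < 50:
            color = "blue"
        else:
            color = None  # exactly 50% is not specified in the caption
        result[family] = (pct, color)
    return result


def normalized_cog_counts(neighbor_cogs, n_reference_sequences):
    """Neighbor-gene count per COG category divided by the number of
    RfaH reference sequences, as described for FIG 6B."""
    counts = Counter(neighbor_cogs)
    return {cat: n / n_reference_sequences for cat, n in counts.items()}


# Toy data, invented for illustration only.
toy_hits = {"FamA": 8, "FamB": 3, "FamC": 2, "FamD": 0}
toy_totals = {"FamA": 10, "FamB": 12, "FamC": 5, "FamD": 20}
colors = family_colors(toy_hits, toy_totals)

# COG letters of neighbor genes pooled over two hypothetical reference sequences.
cog_profile = normalized_cog_counts(["M", "M", "G", "K"], 2)
```

With the toy counts, FamC is skipped for having fewer than 10 genomes and FamD for having no hits, so only FamA and FamB receive a color.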

References

    1. Werner F. 2012. A nexus for gene expression—molecular mechanisms of Spt5 and NusG in the three domains of life. J Mol Biol 417:13–27. doi:10.1016/j.jmb.2012.01.031.
    2. Steiner T, Kaiser JT, Marinkovic S, Huber R, Wahl MC. 2002. Crystal structures of transcription factor NusG in light of its nucleic acid- and protein-binding activities. EMBO J 21:4641–4653. doi:10.1093/emboj/cdf455.
    3. Hartzog GA, Fu J. 2013. The Spt4-Spt5 complex: a multi-faceted regulator of transcription elongation. Biochim Biophys Acta 1829:105–115. doi:10.1016/j.bbagrm.2012.08.007.
    4. Ehara H, Yokoyama T, Shigematsu H, Yokoyama S, Shirouzu M, Sekine SI. 2017. Structure of the complete elongation complex of RNA polymerase II with basal factors. Science 357:921–924. doi:10.1126/science.aan8552.
    5. Kang JY, Mooney RA, Nedialkov Y, Saba J, Mishanina TV, Artsimovitch I, Landick R, Darst SA. 2018. Structural basis for transcript elongation control by NusG family universal regulators. Cell 173:1650–1662.e14. doi:10.1016/j.cell.2018.05.017.