Review
Nucleic Acids Res. 2010 Nov;38(21):7364-77.
doi: 10.1093/nar/gkq617. Epub 2010 Jul 30.

Genomic repertoires of DNA-binding transcription factors across the tree of life

Varodom Charoensawan et al. Nucleic Acids Res. 2010 Nov.

Abstract

Sequence-specific transcription factors (TFs) are important to genetic regulation in all organisms because they recognize and directly bind to regulatory regions on DNA. Here, we survey and summarize the TF resources available. We outline the organisms for which TF annotation is provided, and discuss the criteria and methods used to annotate TFs by different databases. By using genomic TF repertoires from ∼700 genomes across the tree of life, covering Bacteria, Archaea and Eukaryota, we review TF abundance with respect to the number of genes, as well as their structural complexity in diverse lineages. While typical eukaryotic TFs are longer than the average eukaryotic protein, the inverse is true for prokaryotes. Only in eukaryotes does the same family of DNA-binding domain (DBD) occur multiple times within one polypeptide chain. This potentially increases the length and diversity of DNA-recognition sequences by reusing DBDs from the same family. We examined the increase in TF abundance with the number of genes in genomes, using the largest set of prokaryotic and eukaryotic genomes to date. As noted previously, the number of prokaryotic TFs increases faster than linearly with the number of genes. We further observe a similar relationship in eukaryotic genomes, though with a slower increase in TFs.

Figures

Figure 1.
Historical timeline of TF resources. The timeline to the left shows the years of the first publications describing the databases (not to scale). The panel on the right shows how the number of completely sequenced eukaryotic and bacterial genomes has increased according to the Genomes OnLine Database (35). The TF resources are grouped according to their main annotation methods (manual curation, automatic plus manual curation or automatic). They are colored according to the organisms the resources annotate (blue for Bacteria, green for Archaea, red for Eukaryota and white if the resource covers two or three superkingdoms).
Figure 2.
(A) TF abundance against number of genes per genome in different lineages across the tree of life. Each colored dot represents a genome. Different colors are used to highlight genomes from different phylogenetic groups. According to the linear model fit on a log–log scale, TF expansion in bacteria strictly follows a power law increase, with an exponent close to quadratic (log T = 1.98 log G – 4.84 with R² = 0.87, where T is the number of predicted TFs, G is the number of genes and R² is the coefficient of determination). The TF increase in eukaryotes has both a lower exponent and a weaker correlation (log T = 1.23 log G – 2.53 with R² = 0.61). (B) The number of unique DBD families increases linearly with the total number of proteins in bacteria (power law exponent = 1.00, R² = 0.71). In contrast, the number of families is independent of the number of genes in metazoans (pink, exponent = 0.09, R² = 0.11) and fungi (orange, exponent = 0.13, R² = 0.23). Grey dots in the figures represent other eukaryotic species that do not belong to the main kingdoms, such as apicomplexans and euglenozoans.
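The fitted relationships in panel A can be made concrete with a short calculation. The sketch below is illustrative only and is not taken from the article: it assumes the logarithms in the caption are base 10 (the caption does not state the base), and the function names and example gene counts are hypothetical. It converts each log–log fit into a predicted TF count and shows how much the prediction grows when the gene count doubles.

```python
import math

# Fit coefficients quoted in the Figure 2A caption, as (slope, intercept).
BACTERIA_FIT = (1.98, -4.84)
EUKARYOTE_FIT = (1.23, -2.53)


def predicted_tfs(genes, fit):
    """Predicted TF count for a genome with `genes` genes.

    Assumption: the caption's log is base 10, so
    log10(T) = slope * log10(G) + intercept.
    """
    slope, intercept = fit
    return 10 ** (slope * math.log10(genes) + intercept)


def fold_change_on_doubling(fit):
    """Factor by which predicted TFs grow when the gene count doubles.

    For a power law T = c * G**slope this is 2**slope, whatever G is.
    """
    slope, _ = fit
    return 2 ** slope


if __name__ == "__main__":
    # Hypothetical gene counts, chosen only to illustrate the scaling.
    for label, fit, genes in [
        ("bacteria", BACTERIA_FIT, 4000),
        ("eukaryotes", EUKARYOTE_FIT, 20000),
    ]:
        print(
            f"{label}: {genes} genes -> about {predicted_tfs(genes, fit):.0f} TFs; "
            f"doubling genes multiplies TFs by {fold_change_on_doubling(fit):.2f}"
        )
```

Under these assumptions, an exponent of 1.98 means that doubling the gene count nearly quadruples the predicted number of TFs in bacteria, whereas an exponent of 1.23 gives a little more than a doubling in eukaryotes. Given the reported R² values, individual genomes can deviate substantially from these predictions, especially among eukaryotes.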
Figure 3.
Examples of lineage-specific DBDs and domain architectures of TFs across the tree of life. Commonly found DBDs and TF architectures in different taxonomic species are projected onto the simplified NCBI taxonomic tree. DBDs and their architectures in TFs at different taxonomic nodes are unique to their descendent branches. DBDs are represented by red oblongs, and other protein domains occurring within the same TFs (partner domains) are represented by colored rectangles.
