Comparative Study
BMC Genomics. 2006 Sep 19;7:239.
doi: 10.1186/1471-2164-7-239.

Coding limits on the number of transcription factors

Shalev Itzkovitz et al. BMC Genomics. 2006.

Abstract

Background: Transcription factor proteins bind specific DNA sequences to control the expression of genes. They contain DNA binding domains which belong to several super-families, each with a specific mechanism of DNA binding. The total number of transcription factors encoded in a genome increases with the number of genes in the genome. Here, we examined the number of transcription factors from each super-family in diverse organisms.

Results: We find that the number of transcription factors from most super-families appears to be bounded. For example, the number of winged helix factors does not generally exceed 300, even in very large genomes. The magnitude of the maximal number of transcription factors from each super-family seems to correlate with the number of DNA bases effectively recognized by the binding mechanism of that super-family. Coding theory predicts that such upper bounds on the number of transcription factors should exist, in order to minimize cross-binding errors between transcription factors. This theory further predicts that factors with similar binding sequences should tend to have similar biological effect, so that errors based on mis-recognition are minimal. We present evidence that transcription factors with similar binding sequences tend to regulate genes with similar biological functions, supporting this prediction.
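The coding argument in the Results can be illustrated with the textbook sphere-packing (Hamming) bound over the four-letter DNA alphabet. The following is a minimal sketch, not the authors' model: the function names, the choice of Hamming distance, and the parameters `n` (number of effectively recognized bases) and `r` (number of mismatches each factor must tolerate without cross-binding) are assumptions introduced here for illustration.

```python
from math import comb


def hamming_ball_size(n: int, r: int, q: int = 4) -> int:
    """Number of length-n sequences within Hamming distance r of a given sequence.

    q is the alphabet size (4 for DNA: A, C, G, T).
    """
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))


def sphere_packing_bound(n: int, r: int, q: int = 4) -> int:
    """Upper bound on the number of non-overlapping radius-r balls in q**n sequences.

    Illustrative assumption: each factor "owns" every sequence within r
    mismatches of its preferred site, and no two factors may share a sequence.
    """
    return q ** n // hamming_ball_size(n, r, q)


if __name__ == "__main__":
    # Hypothetical values of n; r = 1 tolerates a single mismatch.
    for n in (4, 6, 8):
        print(n, 4 ** n, sphere_packing_bound(n, 1))
```

Under these assumptions the bound grows with `n`, which is qualitatively in line with the reported correlation between the maximal number of factors in a super-family and the number of bases its binding mechanism effectively recognizes.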

Conclusion: The present study suggests limits on the transcription factor repertoire of cells, and suggests coding constraints that might apply more generally to the mapping between binding sites and biological function.

Figures

Figure 1
Correlation between the maximal number of transcription factors and number of possible sequences for six super-families, for which details of binding mechanism are known.
Figure 2
Distribution of transcription factors for the 9 organisms in Table 1 among the different super-families. The x axis shows the 10 super-families of Table 1; the y axis shows their counts in each organism. The organisms are sorted by increasing number of genes in the genome. Note that the y-axis scale differs between organisms.
Figure 3
Conceptual coding schemes for the assignment of binding sequences to TFs. Binding sequences are displayed as points, TFs as colored spheres. Colors correspond to the biological function of each TF. a) A sphere-packing code: code-words are covered by non-overlapping spheres. The TFs do not share binding sequences. b) A smooth code: code-words are covered by overlapping spheres with similar function. TFs can share binding sequences with neighboring TFs. This type of code is predicted to be smooth, that is, TFs with shared binding sequences tend to have similar biological function, represented by spheres of similar color in the figure.
Figure 4
Transcription factors with overlapping binding sequences in S. cerevisiae. Nodes represent TFs, edges connect pairs of TFs if their corresponding sets of binding sequences have significant overlap according to the present measure. Bold edges connect TFs which also have biological similarity according to the functional annotation and transcription network (gene co-regulation) measures. Shown are the TF logos [11]. Logo length was limited to the highly conserved base pairs for clarity.
Figure 5
Transcription factors with overlapping binding sequences in E. coli. Nodes represent TFs, edges connect pairs of TFs if their corresponding sets of binding sequences have significant overlap according to the present measure. Bold edges connect TFs which also have biological similarity according to the functional annotation and transcription network (gene co-regulation) measures. Shown are the TF logos [11]. Logo length was limited to the highly conserved base pairs for clarity.
Figure 6
Transcription factors with significantly overlapping binding sequences in humans. Edges connect two TFs with similar binding sequences. Sequence logos are shown for each TF.
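
The overlap graphs described in the captions of Figures 4 to 6 can be sketched as follows. The paper's actual overlap measure and significance test are not given in this abstract, so this sketch substitutes a plain Jaccard index with an arbitrary threshold; the function names, the threshold value, and the toy TF names and sequences are all hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard index of two sets of binding sequences (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_edges(binding_sites: dict, threshold: float = 0.5) -> list:
    """Return pairs of TFs whose binding-sequence sets overlap at or above threshold.

    Assumption: Jaccard overlap stands in for the paper's own measure,
    which is not specified in the abstract.
    """
    edges = []
    for tf1, tf2 in combinations(sorted(binding_sites), 2):
        if jaccard(binding_sites[tf1], binding_sites[tf2]) >= threshold:
            edges.append((tf1, tf2))
    return edges


if __name__ == "__main__":
    # Toy, made-up data: TF names and sequences are placeholders.
    toy = {
        "TF_A": {"ACGT", "ACGA", "ACGC"},
        "TF_B": {"ACGT", "ACGA", "TTTT"},
        "TF_C": {"GGGG"},
    }
    print(overlap_edges(toy))
```

In a fuller analysis, each edge would then be checked against an independent measure of functional similarity, as the bold edges in Figures 4 and 5 indicate.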

References

    1. Robison K, McGuire AM, Church GM. A comprehensive library of DNA-binding site matrices for 55 proteins applied to the complete Escherichia coli K-12 genome. J Mol Biol. 1998;284:241–254. doi: 10.1006/jmbi.1998.2160.
    2. Stormo GD. DNA binding sites: representation and discovery. Bioinformatics. 2000;16:16–23. doi: 10.1093/bioinformatics/16.1.16.
    3. Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES. Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature. 2003;423:241–254. doi: 10.1038/nature01644.
    4. Bussemaker HJ, Li H, Siggia ED. Building a dictionary for genomes: identification of presumptive regulatory sites by statistical analysis. Proc Natl Acad Sci U S A. 2000;97:10096–10100. doi: 10.1073/pnas.180265397.
    5. Wingender E, Chen X, Hehl R, Karas H, Liebich I, Matys V, Meinhardt T, Pruss M, Reuter I, Schacherer F. TRANSFAC: an integrated system for gene expression regulation. Nucleic Acids Res. 2000;28:316–319. doi: 10.1093/nar/28.1.316.
