Genome Res. 2004 Oct;14(10B):2041-7.
doi: 10.1101/gr.2584104.

An ORFeome-based analysis of human transcription factor genes and the construction of a microarray to interrogate their expression

David N Messina et al. Genome Res. 2004 Oct.

Abstract

Transcription factors (TFs) are essential regulators of gene expression, and mutated TF genes have been shown to cause numerous human genetic diseases. Yet to date, no single, comprehensive database of human TFs exists. In this work, we describe the collection of an essentially complete set of TF genes from one representation of the human ORFeome, and the design of a microarray to interrogate their expression. Starting from a seed set of 1468 known TFs drawn from TRANSFAC, InterPro, and FlyBase, we searched the ScriptSure human transcriptome database for additional genes. ScriptSure's genome-anchored transcript clusters allowed us to work with a nonredundant, high-quality representation of the human transcriptome. We ran two searches: a high-stringency BLASTN similarity search, and a protein motif search of the human ORFeome using hidden Markov models of DNA-binding domains known to occur exclusively or primarily in TFs. Four hundred ninety-four additional TF genes were identified in the overlap between the two searches, bringing our estimate of the total number of human TFs to 1962. Zinc finger genes are by far the most abundant family (762 members), followed by homeobox (199 members) and basic helix-loop-helix genes (117 members). We designed a microarray of 50-mer oligonucleotide probes, each targeted to a unique region of the coding sequence of one gene. We have successfully used this microarray to interrogate TF gene expression in humans as well as in species as diverse as chickens and mice.
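The abstract does not spell out how a "unique region" was chosen for each 50-mer probe. As a minimal sketch, assuming uniqueness simply means the 50-mer does not occur as an exact substring of any other gene's coding sequence, probe selection could scan each coding sequence window by window. The function `pick_unique_probe` and the toy sequences are hypothetical; practical oligo design commonly also weighs near-matches, GC content, and melting temperature, none of which is modeled here.

```python
from typing import Dict, Optional

PROBE_LEN = 50  # probe length stated in the abstract


def pick_unique_probe(gene: str, cds: Dict[str, str],
                      probe_len: int = PROBE_LEN) -> Optional[str]:
    """Return the first probe_len-long window of cds[gene] that does not
    occur as an exact substring of any other gene's coding sequence.

    Assumption (hypothetical): 'unique' means no exact match elsewhere;
    the paper's actual uniqueness criteria are not given in the abstract.
    """
    target = cds[gene]
    others = [seq for name, seq in cds.items() if name != gene]
    # Slide a window along the target coding sequence, left to right.
    for start in range(len(target) - probe_len + 1):
        window = target[start:start + probe_len]
        if not any(window in seq for seq in others):
            return window
    return None  # no unique window of this length exists


if __name__ == "__main__":
    # Toy sequences (hypothetical) with a short probe length for readability.
    toy = {"g1": "ACGTACGTTT", "g2": "ACGTACGAAA"}
    print(pick_unique_probe("g1", toy, probe_len=4))  # CGTT
```

Taking the first qualifying window keeps the sketch short; a fuller design would score all unique windows and keep the best one.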


Figures

Figure 1
Creation of the seed set. Known TF genes were gathered from three databases: TRANSFAC, InterPro, and FlyBase. Each gene was manually checked to confirm that it was described as a TF in the literature or annotated as a TF in LocusLink. After removing redundancies and adding known TFs that were absent from our source databases, our seed set of human TFs contained 1468 members.
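The seed-set construction in Figure 1 amounts to pooling identifiers from the three databases, collapsing redundant entries, keeping only confirmed TFs, and adding known TFs missing from the sources. A minimal sketch, assuming each database entry can be mapped to a canonical gene symbol through an alias table; `build_seed_set`, the alias table, and the toy identifiers are hypothetical, and the manual literature/LocusLink check is represented only as a precomputed set of confirmed symbols.

```python
from typing import Dict, Iterable, Set


def build_seed_set(sources: Dict[str, Iterable[str]],
                   aliases: Dict[str, str],
                   confirmed: Set[str],
                   extra_known: Iterable[str]) -> Set[str]:
    """Pool IDs from several databases into one nonredundant seed set.

    sources:     database name -> identifiers taken from it
    aliases:     hypothetical map from database-specific IDs to canonical symbols
    confirmed:   canonical symbols that passed the (manual) TF confirmation step
    extra_known: known TFs absent from every source database
    """
    pooled: Set[str] = set()
    for ids in sources.values():
        for raw in ids:
            # Map to a canonical symbol so that one gene listed in
            # several databases is counted once (redundancy removal).
            pooled.add(aliases.get(raw, raw))
    # Keep only genes confirmed as TFs, then add the missing known TFs.
    seed = {gene for gene in pooled if gene in confirmed}
    seed.update(extra_known)
    return seed


if __name__ == "__main__":
    toy_sources = {
        "TRANSFAC": ["TF1", "TF2"],
        "InterPro": ["TF2", "tf3_alias", "NOT_A_TF"],
        "FlyBase": ["TF4"],
    }
    seed = build_seed_set(toy_sources, {"tf3_alias": "TF3"},
                          {"TF1", "TF2", "TF3", "TF4"}, ["TF5"])
    print(sorted(seed))  # ['TF1', 'TF2', 'TF3', 'TF4', 'TF5']
```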
Figure 2
Search for paralogous TFs. Starting from the seed set of 1468 known human TF genes, we searched ScriptSure, a representation of the human transcriptome, using two methods: a high-stringency BLASTN search and an hmmpfam search for DNA-binding domains known to occur exclusively or primarily in TFs. The BLASTN search returned 3338 additional candidate TFs and the domain search 2512. The 494 genes found by both methods make up the “found” set of human TF genes.
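The "found" set in Figure 2 is the intersection of the two candidate lists, restricted to genes not already in the seed set, and the total estimate is the seed set plus the found set. A minimal sketch with Python sets; the function name and toy gene names are hypothetical, and only the counts come from the text.

```python
from typing import Set


def found_set(seed: Set[str], blast_hits: Set[str], domain_hits: Set[str]) -> Set[str]:
    """Candidates returned by BOTH searches that are not already seed TFs."""
    return (blast_hits & domain_hits) - seed


# Counts reported in the Figure 2 caption and the abstract.
SEED_SIZE = 1468
BLAST_CANDIDATES = 3338
DOMAIN_CANDIDATES = 2512
FOUND_SIZE = 494

# An intersection can be no larger than the smaller of the two lists.
assert FOUND_SIZE <= min(BLAST_CANDIDATES, DOMAIN_CANDIDATES)
TOTAL_TFS = SEED_SIZE + FOUND_SIZE  # 1468 + 494 = 1962

if __name__ == "__main__":
    toy_seed = {"TF1"}
    toy_blast = {"TF1", "CAND_A", "CAND_B"}
    toy_domain = {"CAND_B", "CAND_C", "TF1"}
    print(sorted(found_set(toy_seed, toy_blast, toy_domain)))  # ['CAND_B']
    print(TOTAL_TFS)  # 1962
```

Requiring agreement between the two independent searches trades sensitivity for specificity: a gene must both resemble a known TF at the nucleotide level and carry a TF-type DNA-binding domain.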
Figure 3
Genomic locations of TF clusters. Clusters of TF genes are shown on an ideogram representation of the 24 human chromosomes. As shown in the legend at the top right, the four canonical Hox gene clusters are shown in blue, the previously described chromosome 19 zinc finger gene clusters in green, and putative TF clusters identified in this study in red. The number in parentheses following each legend entry indicates how many clusters of that type are shown in this figure. Clusters containing known genes are labeled. Hypothetical and unnamed genes are not labeled, so clusters consisting entirely of such genes carry no label. For a list of the genes in each cluster, see the Supplemental materials.
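The caption does not state how TF clusters were defined. One simple way to call such clusters, shown here only as a minimal sketch, is to sort genes by position on each chromosome and group neighbors separated by no more than a fixed gap, keeping groups above a minimum size. The `Gene` record, `find_clusters`, and the `max_gap` and `min_size` parameters are hypothetical and are not the study's criteria.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # genomic start coordinate (bp)
    end: int    # genomic end coordinate (bp)


def find_clusters(genes: List[Gene], max_gap: int, min_size: int) -> List[List[Gene]]:
    """Group genes on the same chromosome whose gap to the current group is
    at most max_gap bp; return groups with at least min_size genes."""
    by_chrom: Dict[str, List[Gene]] = defaultdict(list)
    for g in genes:
        by_chrom[g.chrom].append(g)

    clusters: List[List[Gene]] = []
    for chrom in sorted(by_chrom):
        ordered = sorted(by_chrom[chrom], key=lambda g: g.start)
        current = [ordered[0]]
        current_end = ordered[0].end  # rightmost end seen in this group
        for g in ordered[1:]:
            if g.start - current_end <= max_gap:
                current.append(g)
                current_end = max(current_end, g.end)
            else:
                if len(current) >= min_size:
                    clusters.append(current)
                current, current_end = [g], g.end
        if len(current) >= min_size:
            clusters.append(current)
    return clusters


if __name__ == "__main__":
    toy = [Gene("a", "19", 100, 200), Gene("b", "19", 250, 300),
           Gene("c", "19", 320, 400), Gene("d", "19", 10000, 10100),
           Gene("e", "2", 0, 50)]
    print([[g.name for g in c] for c in find_clusters(toy, 100, 3)])  # [['a', 'b', 'c']]
```

Tracking the rightmost end of the group, rather than the end of the last gene added, keeps nested or overlapping genes from splitting a cluster.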

WEB SITE REFERENCES

    1. http://www.ensembl.org; Ensembl genome browser.
    2. http://flybase.bio.indiana.edu/; FlyBase, a database of the Drosophila genome.
    3. http://www.geneontology.org/; Gene Ontology Consortium.
    4. http://www.ebi.ac.uk/interpro/; InterPro.
    5. http://www.ncbi.nlm.nih.gov/LocusLink/; LocusLink.
