Genome Biol. 2001;2(5):RESEARCH0016.
doi: 10.1186/gb-2001-2-5-research0016. Epub 2001 Apr 24.

Identification of conserved C2H2 zinc-finger gene families in the Bilateria

R D Knight et al. Genome Biol. 2001.

Abstract

Background: Identification of orthologous relationships between genes from widely divergent taxa allows partial reconstruction of the gene complement of ancestral genomes. C2H2 zinc-finger genes are one of the largest and most complex gene superfamilies in metazoan genomes, with hundreds of members in the human genome. Here we analyze C2H2 zinc-finger genes from three taxa - Drosophila, Caenorhabditis elegans and human - from which near-complete genome sequence data are available.

Results: Our analyses conclusively identify 39 families of genes, of which 38 can be defined as orthology groups in that they are descended from single ancestral genes in the common ancestor of Drosophila, C. elegans and humans.

Conclusions: On the basis of current metazoan phylogeny, these 39 groups represent the minimum complement of C2H2 zinc-finger genes present in the genome of the bilaterian common ancestor.

Figures

Figure 1
Schematic diagram of a C2H2 zinc-finger motif. The paired cysteines (C) and histidines (H) that bind the zinc ion are shown in yellow and blue, respectively. The linker sequence, shown in green with its consensus sequence in the single-letter amino acid code, frequently joins adjacent fingers. This is apparent in the lower panel, which shows the typical arrangement of fingers in a C2H2 ZNF protein. The two large hydrophobic residues, which are also structurally important, are shown in red. The black residues are not structurally important and include those responsible for contacting DNA during sequence-specific binding [16]. The precise number of 'black' residues between the cysteines, histidines and on the loop may vary [10].
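The motif described in this caption can be approximated with a regular expression. The following is a minimal sketch, not the method used in the paper: the pattern is an assumption based on the widely used PROSITE-style C2H2 signature (C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H), and the function name and test fragment are illustrative. Because the number of intervening residues may vary, as the caption notes, a fixed pattern like this will miss atypical fingers.

```python
import re

# Assumed PROSITE-style C2H2 signature: two cysteines, a large hydrophobic
# residue, then two histidines, with variable spacing between them.
C2H2_PATTERN = re.compile(r"C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H")


def find_c2h2_fingers(protein_sequence):
    """Return (start_index, matched_text) for each non-overlapping match."""
    sequence = protein_sequence.upper()
    return [(m.start(), m.group(0)) for m in C2H2_PATTERN.finditer(sequence)]


# Illustrative fragment containing one finger followed by a linker-like tail.
example_fragment = "YACPVESCDRRFSRSDELTRHIRIHTGQKP"
hits = find_c2h2_fingers(example_fragment)
```

Counting the hits per protein gives a rough finger number, which is the kind of feature the Figure 3 caption uses when comparing families with similar finger arrangements.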
Figure 2
Highest percentage-identity match in 5% intervals for the E < 10 datasets of Drosophila and C. elegans compared to the human dataset. Baseline identity between typical C2H2 ZNF domains is between 20 and 44%, and this is where most genes show their highest identity. Values higher than this range are strongly suggestive of orthology. We also examined the difference between this analysis and an analysis of more stringent datasets (E < 1). All but one of the sequences detected at E < 10 but excluded from E < 1 had maximum identity matches below 40%.
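The binning described in this caption can be sketched as follows. This is a hedged illustration rather than the authors' code: the function names, the input dictionary and the gene labels are hypothetical, and the 44% cutoff is simply the upper end of the baseline range quoted in the caption.

```python
from collections import Counter


def bin_max_identities(max_identity_by_gene, width=5):
    """Count genes per identity interval, keyed by the interval's lower bound."""
    counts = Counter()
    for identity in max_identity_by_gene.values():
        # Place 100% in the top interval instead of opening a new one.
        lower = min(int(identity // width) * width, 100 - width)
        counts[lower] += 1
    return dict(sorted(counts.items()))


def above_baseline(max_identity_by_gene, baseline_upper=44.0):
    """List genes whose best match exceeds the assumed baseline ceiling."""
    return sorted(
        gene for gene, identity in max_identity_by_gene.items()
        if identity > baseline_upper
    )


# Hypothetical best-match identities (percent) against the human dataset.
example = {"geneA": 32.0, "geneB": 38.5, "geneC": 71.2, "geneD": 100.0}
histogram = bin_max_identities(example)
candidates = above_baseline(example)
```

Genes returned by `above_baseline` are only candidates for orthology. The caption describes high identity as strongly suggestive, not conclusive, so they would still need phylogenetic confirmation.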
Figure 3
Phylogenies of the gene families identified in our analysis for which more than three family members were present. (a) SP and KLF families; (b) Odd-like family; (c) Spalt family; (d) YY1 family; (e) Disco family; (f) IA-1 family; (g) Zep family; (h) Zic and Gli families; (i) Evi-1 family; (j) Snail family; (k) Ovo family; (l) Egr family. In each tree, the scale bar indicates a maximum likelihood branch length of 0.1 inferred substitutions per site and the numbers next to relevant branches are percentage quartet-puzzling support values. Genes and branches are color coded according to species: human genes are red, Drosophila genes are blue and C. elegans genes are green. Most trees are unrooted and built with members of only a single orthology group, as in only two cases could sequences from separate groups be confidently aligned. One of these exceptions is the SP and KLF families (a), which were analyzed together as their similar ZNF number and structure suggest relatively recent common ancestry. The other is the Zic and Gli families (h), which have a similar number and arrangement of C2H2 fingers. This tree also includes two 'orphan' Drosophila genes that have a similar finger arrangement. The phylogenetic analyses, with the exception of the KLF group, either failed to resolve relationships sufficiently to confirm or disprove orthology or showed that each group was descended from a single gene present in the common ancestor of humans, C. elegans and Drosophila. We therefore call these families 'orthology groups', implying that genes from different species within each family are orthologs. Consequently, genes from one species within a family are paralogs. For the KLF and SP genes, the tree topology shows monophyly of the SP genes and suggests that multiple KLF orthology groups may be present, although the poor resolution does not allow definition of these.

References

    1. Fitch WM. Homology, a personal view on some of the problems. Trends Genet. 2000;16:227–231.
    2. Tatusov RL, Koonin EV, Lipman DJ. A genomic approach to protein families. Science. 1997;278:631–637.
    3. Tatusov RL, Galperin M, Natale D, Koonin EV. The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res. 2000;28:33–36.
    4. Adoutte A, Balavoine G, Lartillot N, Lespinet O, Prud'homme B, de Rosa R. The new animal phylogeny: Reliability and implications. Proc Natl Acad Sci USA. 2000;97:4453–4456.
    5. Pearson WR, Lipman DJ. Improved tools for biological sequence comparison. Proc Natl Acad Sci USA. 1988;85:2444–2448.