Comparative Study
Database (Oxford). 2015 Sep 27:2015:bav091.
doi: 10.1093/database/bav091. Print 2015.

Comprehensive comparative homeobox gene annotation in human and mouse


Laurens G Wilming et al. Database (Oxford). 2015.

Abstract

Homeobox genes encode transcription factors with a DNA-binding helix-turn-helix structure called a homeodomain, and they play a crucial role in pattern formation during embryogenesis. Many homeobox genes are located in clusters and some of these, most notably the HOX genes, are known to have antisense or opposite-strand long non-coding RNA (lncRNA) genes that play a regulatory role. Because automated annotation of both gene clusters and non-coding genes is fraught with difficulty (over-prediction, under-prediction, inaccurate transcript structures), we set out to manually annotate all homeobox genes in the mouse and human genomes. This includes all supported splice variants, pseudogenes and both antisense and flanking lncRNAs. One of the areas where manual annotation has a significant advantage is the annotation of duplicated gene clusters. After comprehensive annotation of all homeobox genes and their antisense genes in human and in mouse, we found some discrepancies with the current gene set in RefSeq regarding exact gene structures and coding versus pseudogene locus biotype. We also identified previously un-annotated pseudogenes in the DUX, Rhox and Obox gene clusters, which helped us re-evaluate and update the gene nomenclature in these regions. We found that, compared to their mouse orthologues, human homeobox genes are enriched in antisense lncRNA loci, some of which are known to play a role in gene or gene cluster regulation. Of the annotated set of 241 human protein-coding homeobox genes, 98 have an antisense locus (41%), while of the 277 orthologous mouse genes, only 62 protein-coding genes have an antisense locus (22%), based on publicly available transcriptional evidence.
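The two percentages quoted at the end of the abstract can be reproduced from the stated counts. The following is a minimal sketch of that arithmetic only; the dictionary layout and the function name `antisense_percentage` are illustrative assumptions and are not taken from the paper.

```python
# Counts quoted in the abstract: protein-coding homeobox genes with an
# antisense locus, and the total number of annotated genes per species.
counts = {
    "human": {"with_antisense": 98, "total": 241},
    "mouse": {"with_antisense": 62, "total": 277},
}


def antisense_percentage(with_antisense: int, total: int) -> float:
    """Return the share of genes with an antisense locus, in percent."""
    return 100.0 * with_antisense / total


# Hypothetical helper structure: species name -> percentage.
percentages = {
    species: antisense_percentage(**c) for species, c in counts.items()
}

for species, pct in percentages.items():
    print(f"{species}: {pct:.1f}% (rounds to {round(pct)}%)")
```

Rounded to whole numbers, the results agree with the 41% and 22% given in the abstract.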

Figures

Figure 1.
The Obox cluster and its neighbourhood compared to the orthologous region in human. Figure is not to scale. See figure for a guide to symbols and colours. Overlapping symbols on same strand indicate nested genes; overlapping symbols on opposite strands indicate antisense genes. Gene names in italic between brackets indicate—for yet to be named coding genes—the name of the family or closest homologue or—for pseudogenes—the name of the parent gene or gene family; approved gene names are in bold; pseudogene and lncRNA names are in italic. Some unnamed genes are provided with RefSeq or VEGA identifiers (for the latter, prefix the 11-digit number with OTTHUMG or OTTMUSG for the full ID for human and mouse, respectively). Core duplicated gene cassettes are boxed. Note the complete absence of any OBOX loci in the human genome between the orthologues of the mouse genes that flank its Obox cluster. The bulk of the expansion of the cluster, which contains 52 Obox genes, appears to have been through the tandem duplication of a six-gene cassette—Obox–Obox3–Obox4–Gtpbp4–Ranbp2–Obox—of which eleven copies (not all complete) are present. Also note the expansion of the nearby Sult2a cluster in mouse—12 loci in mouse versus one in human—and the duplication of the Bsph gene in mouse. This region of the genome has clearly been subject to considerable rearrangements throughout evolution. Interestingly, the TPRX1 and Crxos homeobox genes are in syntenic positions, but, unlike their neighbouring loci, they are not orthologous. Neither species appears to have an orthologue for the other species’ gene.
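The Figure 1 legend explains that a full VEGA identifier is obtained by prefixing the 11-digit number with OTTHUMG (human) or OTTMUSG (mouse). A minimal sketch of that rule follows; the function name `vega_gene_id` and the example number are hypothetical placeholders, not identifiers taken from the figure.

```python
# Prefixes as stated in the Figure 1 legend.
PREFIXES = {"human": "OTTHUMG", "mouse": "OTTMUSG"}


def vega_gene_id(number: str, species: str) -> str:
    """Build a full VEGA gene ID from the 11-digit number shown in a figure.

    The number is kept as a string so that leading zeros are preserved.
    """
    if species not in PREFIXES:
        raise ValueError(f"unknown species: {species!r}")
    if not (number.isascii() and number.isdigit() and len(number) == 11):
        raise ValueError("expected an 11-digit number")
    return PREFIXES[species] + number


# Placeholder number for illustration only; it is not a real locus.
example_id = vega_gene_id("00000000001", "mouse")
print(example_id)
```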
Figure 2.
Rhox expansion in mouse compared to human. Figure is not to scale. See figure for a guide to colours, Figure 1 for a guide to symbols and Figure 1 legend for notes on naming. Note the considerable expansion of the Rhox genes in mouse. The human genome has three RHOX genes (two of which—RHOXF2 and RHOXF2B—are closely related near-identical duplicates) that share best similarity, amongst the Rhox genes, with Rhox10-14 (RHOXF1) and Rhox6, -8 and -9 (RHOXF2 and RHOXF2B). The main expansion of the mouse cluster comes from the tandem duplication of an Rhox2–Rhox3–Rhox4 cassette of which at least nine copies (not all complete) are present. In all likelihood there are more copies of the cassette, or at least more copies of individual Rhox genes, as there are five genome assembly gaps in this cluster. Also note the inversion of the NKAP–AKAP14–NDUFA1–RNF113A cassette between human and mouse and the tandem duplication of part of the UPF3B gene in human, creating the four UPF3B pseudogenes shown here. This region of the genome has clearly been subject to considerable rearrangements throughout evolution.
Figure 3.
Different Duxbl and DUX clusters in mouse and human and a mouse-specific Duxf cluster. Figure is not to scale. See figure for a guide to colours, Figure 1 for a guide to symbols and Figure 1 legend for notes on naming. (A) Mouse has seen an expansion of a gene cassette containing a Dux gene. Where mouse has three copies of the cassette, human only has one copy of each of the genes (where orthologues exist). This region is close to a synteny breakpoint. (B) A small cluster of five Duxf (pseudo)genes on mouse chromosome 10 has no equivalent in the human genome. For the genes marked with a question mark, it is unclear at this juncture whether these are the indicated biotypes as there is insufficient or conflicting evidence for an accurate determination of their biotype: coding genes could be pseudogenes and vice versa. The cluster is flanked by gaps and synteny breakpoints. Note the presence of a SULT1C cluster next to the human orthologue of Gcc2, the gene flanking the mouse Dux cluster. The mouse orthologue of this cluster has been subject to duplication and rearrangement as part of a six-gene cassette. Coincidentally, there is a Sult2a cluster next to the Obox cluster (Figure 1). There are many synteny breakpoints in these regions, indicating evolutionary instability.
Figure 4.
The human-specific DUX4 clusters. Figure is not to scale. See figure for a guide to colours, Figure 1 for a guide to symbols and Figure 1 legend for notes on naming. The two DUX4 clusters found at the q-telomeres of human chromosomes 4 and 10 have no equivalent in mouse. Both regions are flanked by synteny or paralogy breakpoints. The chromosome 4 cluster, with the two unrelated FRG genes, is most likely the ancestral cluster, which duplicated and rearranged to form the chromosome 10 cluster with one FRG gene and the other FRG copy on chromosome 20. Another copy of the FRG2 section, without the distal DUX4L duplications, is present on chromosome 3. There are many more copies of FRG1, FRG2, TUBBB, FAM166A and the other genes from the chromosome 4 cluster in other regions of the genome, some of which are shown here. Almost all duplicates lie in subtelomeric and pericentromeric regions, and those involving genes from the chromosome 4 cluster are subsets of the chromosome 4 arrangement.
Figure 5.
Comparing human and mouse orthologues in the HOXC and HOXD clusters. (A) HOXC cluster. (B) HOXD cluster. Transcript models are shown with exons (boxes) and introns (connecting lines); green depicts protein-coding regions (CDS) and red depicts non-coding regions. Mouse and human have the same number of HOX genes in these clusters, but they differ in the number of antisense RNAs, with mouse having fewer than human. Antisense loci are indicated by magenta arrows, while members of the homeobox family are depicted by blue arrows and marked with the numerical part of their gene symbol, e.g. HOXD1 (human) and Hoxd1 (mouse) are shown as ‘1’.
