Am J Hum Genet. 2007 Aug;81(2):264-79.
doi: 10.1086/519311. Epub 2007 Jun 27.

Evolutionary conservation of a coding function for D4Z4, the tandem DNA repeat mutated in facioscapulohumeral muscular dystrophy

Jannine Clapp et al. Am J Hum Genet. 2007 Aug.

Abstract

Facioscapulohumeral muscular dystrophy (FSHD) is caused by deletions within the polymorphic DNA tandem array D4Z4. Each D4Z4 repeat unit has an open reading frame (ORF), termed "DUX4," containing two homeobox sequences. Because there has been no evidence of a transcript from the array, these deletions are thought to cause FSHD by a position effect on other genes. Here, we identify D4Z4 homologues in the genomes of rodents, Afrotheria (superorder of elephants and related species), and other species and show that the DUX4 ORF is conserved. Phylogenetic analysis suggests that primate and Afrotherian D4Z4 arrays are orthologous and originated from a retrotransposed copy of an intron-containing DUX gene, DUXC. Reverse-transcriptase polymerase chain reaction and RNA fluorescence and tissue in situ hybridization data indicate transcription of the mouse array. Together with the conservation of the DUX4 ORF for >100 million years, this strongly supports a coding function for D4Z4 and necessitates re-examination of current models of the FSHD disease mechanism.

Figures

Figure 1.
ClustalW alignment of ape D4Z4 repeats. The DNA sequences are from EMBL database accession numbers AF117653 (human), BN000980 (chimpanzee [chimp]), and BN000981 (orangutan [orang]). The ORF is underlined.
Figure 2.
ClustalW alignments of primate D4Z4 sequences. The DNA sequences are from EMBL database accession numbers AF117653 (human), BN000980 (chimpanzee [chimp]), BN000981 (orangutan [orang]), BN000983 (rhesus macaque), and BN000982 (marmoset). Numbering is done according to the location within the database sequences.
Figure 3.
Phylogenetic relationships of DUX proteins. The unrooted maximum-likelihood tree was generated from the aligned, concatenated homeodomain amino acid sequences by use of the PHYLIP package. Branches are scaled according to evolutionary distance. Numbers at nodes represent bootstrap values. The asterisk (*) indicates the location of the rodent Dux node. For both DUXA and DUX4, the chimpanzee and human proteins are identical across the homeodomains; therefore, only the human orthologues are included in the tree. Similar tree topologies were generated using UPGMA and neighbor-joining methods (data not shown).
Figure 4.
Schematic diagram of mammalian D4Z4–related repeat units. Repeats and sequence elements are drawn to scale. DNA repeat elements were identified using RepeatMasker. Sequence elements are shaded according to the key and are defined in the EMBL database entries. The macaque D4Z4 repeat has an insertion of 4.4 kb of mtDNA sequence that has not interrupted the DUX4 ORF. Such nuclear DNA sequences of mitochondrial origin (NUMTs) are found in many eukaryotes. Human sequence is from EMBL accession number AF117653, and rat sequence is from BAC clone CH230-14H6 (EMBL accession number AC135091). Other sequences are for chimpanzee (P. troglodytes [EMBL accession number BN000980]), orangutan (P. pygmaeus [BN000981]), rhesus macaque (M. mulatta [BN000983]), white tufted-ear marmoset (C. jacchus [BN000982]), tree shrew (T. belangeri [BN000984]), mouse (M. musculus [AM398147]), tenrec (E. telfairi [BN000990]), hyrax (P. capensis [BN000988]), and African elephant (L. africana [BN000989]).
Figure 5.
Clustal alignments of mammalian DUX proteins. a, Alignment of the two homeodomain regions. The alignment shows several invariant or highly conserved amino acids; comparison with a homeodomain consensus sequence indicates that the majority of these either are hydrophobic residues associated with protein packing or are involved in DNA binding. The number of amino acid residues in the linker region between the two homeodomains varies between both paralogues and orthologues; indeed, the hyrax DUX4 protein has a repeated GQ motif that varies in copy number between D4Z4 repeats. An asterisk (*) indicates amino acid residues predicted to be involved in DNA binding, and “h” indicates residues that are involved in packing of the structure and are usually hydrophobic in homeodomains. b, Alignment of the C-terminal regions of DUXC, DUX4, and the mouse and rat Dux proteins. Residues that are invariant in all species are highlighted in black. Residues that are conserved in at least 60% of sequences are highlighted in dark gray, with conservative substitutions highlighted in light gray. Numbering relates to the sequence deposited in the EMBL database or transcript information from Ensembl (table 2).
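The shading rule described for panel b of Figure 5 (invariant residues in black, residues conserved in at least 60% of sequences in dark gray) can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the treatment of gaps as non-matching, and the toy sequences are all assumptions made for illustration, and the light-gray class for conservative substitutions is not modeled because it would require a substitution-group table that the caption does not specify.

```python
from collections import Counter


def classify_column(residues, threshold=0.6):
    """Classify one alignment column as 'invariant', 'conserved', or 'variable'.

    Assumption: a gap ('-') never counts as a match, so a column that
    contains a gap cannot be invariant.
    """
    counts = Counter(r for r in residues if r != "-")
    if not counts:
        return "variable"  # column consists only of gaps
    top_count = max(counts.values())
    if top_count == len(residues):
        return "invariant"  # same residue in every sequence
    if top_count / len(residues) >= threshold:
        return "conserved"  # most common residue reaches the threshold
    return "variable"


def classify_alignment(sequences, threshold=0.6):
    """Classify every column of a set of equal-length aligned sequences."""
    if len({len(s) for s in sequences}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return [classify_column(col, threshold) for col in zip(*sequences)]


# Hypothetical toy alignment, not taken from the paper.
toy = ["MKLAG", "MKIAS", "MRLAT", "MKL-G", "MQVAA"]
print(classify_alignment(toy))
```

With the toy input, the first column is invariant, the last column is variable, and the fourth column is only conserved because one sequence has a gap there.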
Figure 6.
Physical mapping of the mouse Dux locus. a, Schematic diagram of a mouse Dux repeat, indicating key restriction-enzyme sites and the locations of probes used in this study. E = EcoRI; B = BamHI; H = HindIII. The ORF is shown as a shaded box, with the homeoboxes in black. b, Southern blot of genomic DNA from C57BL/6J or CD1 mice probed with 32P-labeled Dux_3. The filter was washed under high-stringency conditions and was exposed for 4 h. c, FISH analysis of mouse chromosomes. For both the cosmid 6 and the plasmid Dux_4 probes, a single signal was seen on mouse chromosome 10, identified by the chromosome paint in the cosmid 6 panel. d, PFGE analysis of EcoRV-digested genomic DNA. The filter was hybridized with 32P-labeled Dux_5, was washed under high-stringency conditions, and was exposed for 6 h. e, Map of the Dux array region, indicating the locations of BAC and cosmid clones. Dux sequences are shown as red arrowheads. CEN = centromere; TEL = telomere; EV = EcoRV. There are three separate Dux clusters that are not joined in the current mouse genome assembly. The dotted line indicates the unsequenced region. The mouse Dux locus maps to an evolutionary chromosomal break point; genes that lie telomeric to the arrays have orthologues on human chromosome 2 (blue); genes that lie centromeric have orthologues on human chromosome 6 (green). We could not find any DUX-like sequences in either of these human regions. f, Schematic map of the rat and mouse Dux loci (not to scale), following the Ensembl assembly (release 42). The rat genome sequence is incomplete for this locus. In the current assembly, the rat Dux sequences are located in two arrays, with an intervening region of ∼1 Mb. Comparison of gene order between mouse and rat indicates that there has probably been at least one inversion and additional rearrangements in this region during recent murine evolution. Color coding of genes is as in panel e. MMU10 = mouse chromosome 10; RNO20 = rat chromosome 20.
Figure 7.
RT-PCR analysis of the mouse Dux repeat. Representative agarose gels of RT-PCR and genomic PCR products. M = molecular-weight ladder; -ve = no template; -RT = RNA added after inactivation of reverse transcriptase; +RT = RNA present throughout the OneStep reaction; G = genomic DNA template. The ORF is indicated by the gray rectangle, and the homeobox sequences by the black boxes. The putative polyA addition site is indicated by the black triangle. Primer sequences and reaction conditions are provided in table 1.
Figure 8.
Evidence of transcription from the mouse Dux locus. a, RT-PCR of mouse tissues by use of primers Dux_2f and Dux_2r, which should give a product of 400 bp (table 1). RT indicates a control reaction, where RNA was added after inactivation of the reverse transcriptase. dpc = days post coitum; C2C12 = mouse myoblast cell line. Dux transcripts were amplified from a range of tissues and embryonic stages. Detection of amplification was robust in the brain. In muscle cells (both in vivo and in vitro), amplification was weak but consistent. Sequencing of the products confirmed that they originated from the array. -ve = no template; -RT = control reaction where RNA was added after inactivation of the reverse transcriptase. +RT = RNA present throughout the OneStep reaction. b, Schematic representation of the mouse Dux and Gcc2 loci, indicating the direction of transcription and the positions of the probes used in RNA FISH experiments; both probes are 1.6 kb. Dux probes (Dux_6 in fig. 6a) were labeled with DIG, and Gcc2 probes with dinitrophenol. c, Representative mouse splenocyte nuclei from RNA FISH experiments. Hybridized probes were detected with secondary antibodies conjugated to FITC (for detection of DIG-labeled Dux probes) or Texas Red (for detection of DNP-labeled Gcc2 probes). In both cases, an antisense probe was used to identify sense Gcc2 transcripts, and a sense Gcc2 probe gave no signal (data not shown). However, for Dux, both antisense and sense probes gave signals that colocalized with Gcc2 signal, indicating that both sense and antisense transcripts are generated from the array. d, RT-PCRs were performed, as for panel a, by use of 1 μg of DNase-treated 7-dpc RNA, except that only one primer (as indicated) was included in the reverse-transcriptase step; the second primer was added after the reverse-transcriptase enzyme had been inactivated. 
M = molecular-weight ladder; -ve = no RNA added; + = RNA present through the OneStep reaction; − = RNA added after inactivation of reverse transcriptase; -F = control reaction in which the forward primer was not included; -R = control reaction in which the reverse primer was not included; G = genomic DNA positive control. e, Nonradioactive in situ hybridization analysis of mouse adult brain by use of the probe Dux_6. All sections are from adult mouse brain, except for that from P8 cerebellum. In the hippocampus, the black arrow indicates the weakly stained CA region; a red arrow indicates the dentate gyrus. EGL = external granule layer; IGL = inner granule layer. Scale bar = 100 μm, except for hippocampus, for which scale bar = 200 μm.
Figure 9.
Localization of epitope-tagged mouse Dux protein to the nucleus. Fluorescence images are of EGFP-tagged Dux protein constructs transfected into C2C12 myoblast cells. Nuclei are counterstained with DAPI. Both the full-length protein and the homeodomain regions show nuclear localization. The tagged C-terminal region alone is distributed throughout the cell. Scale bar = 20 μm.

References

Web Resources

    1. BLAST, http://www.ncbi.nlm.nih.gov/blast/
    2. ClustalW, http://www.ebi.ac.uk/clustalw/
    3. EMBL, http://www.ebi.ac.uk/embl/ (for accession numbers AF117653, AC135091, AM398147–AM398151, BN000980–BN000984, and BN000988–BN000990)
    4. Ensembl, http://www.ensembl.org/index.html
    5. GenBank, http://www.ncbi.nlm.nih.gov/Genbank/ (for accession number NM_027375)
