BMC Genomics. 2008 Nov 7;9:533.
doi: 10.1186/1471-2164-9-533.

Analysis of the largest tandemly repeated DNA families in the human genome

Peter E Warburton et al. BMC Genomics. 2008.

Abstract

Background: Tandemly repeated DNA represents a large portion of the human genome and accounts for a significant amount of copy number variation. Here we present a genome-wide analysis of the largest tandem repeats found in the human genome sequence.

Results: Using Tandem Repeats Finder (TRF), we identified tandem repeat arrays greater than 10 kb in total size and classified them into simple sequence arrays (e.g. GAATG), classical satellites (e.g. alpha satellite DNA), and locus-specific VNTR arrays. Analysis of these large sequenced regions revealed that several "simple sequence" arrays actually showed complex domain and/or higher-order repeat organization. Using additional methods, we identified a further 96 arrays with tandem repeat units greater than 2 kb (the detection limit of TRF), 53 of which contained genes or repeated exons. An array of tandem 12 kb repeats that spans a gap on chromosome 8 was found to be 600 kb to 1.7 Mbp in size, making it one of the largest non-centromeric arrays characterized. Several novel megasatellite tandem DNA families were observed that are characterized by repeating patterns of interspersed transposable elements and that have presumably expanded by unequal crossing over. One of these families is found on 11 different chromosomes in >25 arrays and represents one of the largest and most widespread megasatellite DNA families.
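The size filter described in the Results can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: it assumes TRF's whitespace-delimited `.dat` output (a `Sequence:` header followed by rows whose first four fields are start, end, period and copy number), and the names `TandemArray`, `parse_trf_dat`, `large_arrays` and the two example rows are hypothetical. The threshold is taken as ≥ 10 kb, following the Figure 1 legend.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List


@dataclass
class TandemArray:
    sequence: str   # name from the most recent "Sequence:" header
    start: int      # 1-based start coordinate
    end: int        # 1-based end coordinate (inclusive)
    period: int     # repeat unit size in bp
    copies: float   # number of repeat units reported by TRF

    @property
    def length(self) -> int:
        """Total array length in bp."""
        return self.end - self.start + 1


def parse_trf_dat(lines: Iterable[str]) -> Iterator[TandemArray]:
    """Yield one TandemArray per data row of a TRF .dat file (assumed layout)."""
    sequence = ""
    for line in lines:
        line = line.strip()
        if line.startswith("Sequence:"):
            sequence = line.split(":", 1)[1].strip()
            continue
        fields = line.split()
        # Data rows begin with two integer coordinates; skip all other lines.
        if len(fields) < 4 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        yield TandemArray(sequence, int(fields[0]), int(fields[1]),
                          int(fields[2]), float(fields[3]))


def large_arrays(arrays: Iterable[TandemArray],
                 min_length: int = 10_000) -> List[TandemArray]:
    """Keep arrays whose total length is at least min_length bp."""
    return [a for a in arrays if a.length >= min_length]


# Made-up rows for illustration only; they are not data from the paper.
example_dat = """Sequence: chrTest

Parameters: 2 7 7 80 10 50 500

1000 12999 171 70.2 171 85 3 9000 30 20 20 30 1.97 ACGT ACGT
20000 20599 5 120.0 5 95 0 1100 40 0 40 20 1.52 GAATG GAATG
""".splitlines()

for array in large_arrays(parse_trf_dat(example_dat)):
    print(array.sequence, array.start, array.end, array.period, array.length)
```

Classifying the surviving arrays (simple sequence, classical satellite, VNTR) would need the consensus sequence and annotation, which this sketch does not attempt.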

Conclusion: This study represents the most comprehensive genome-wide analysis of large tandem repeats in the human genome, and will serve as an important resource for understanding the organization and copy number variation of these complex DNA families.

Figures

Figure 1
Analysis of tandem repeats from the human genome. Output from Tandem Repeats Finder (TRF), plotted with the repeat unit size on the X axis (log scale) and the array length on the Y axis (log scale). 24,358 arrays between 600 bp and 10,000 bp in length were found (grey squares). The 503 arrays ≥ 10 kb found by TRF are shown classified into different types of repeats (see legend at top). Prominent "simple sequence" satellites are shown as color-coded triangles. Classical satellites are shown as color-coded circles. Single-locus VNTR repeats are indicated by color-coded diamonds. 373 arrays found at multiples of the 171 bp repeat unit size represent alpha satellite DNA (purple circles). Arrays with repeat units greater than 2 kb, which were not found by TRF, are also shown and are listed in Table 2. Some arrays containing repeat units greater than ~1.5 kb are also listed in Table 2 because they contain more complex repeat units than those listed in Table 1. Both the 1.5 kb NBPF repeats (square) and the 1.9 kb "mer5A1" repeats (square) were found by TRF but are listed in Table 2. Multiple LTR arrays (Table 3) are shown as red circles at a repeat unit size of 3.5 kb.
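The caption identifies alpha satellite arrays by repeat unit sizes that fall at multiples of 171 bp. One way to express that test is sketched below. The function name `monomer_multiple` and the 2% relative tolerance are assumptions made for illustration; the source does not state how close to a multiple a period had to be.

```python
from typing import Optional


def monomer_multiple(period: int, unit: int = 171,
                     rel_tol: float = 0.02) -> Optional[int]:
    """Return n if period is within rel_tol of n * unit (n >= 1), else None.

    rel_tol is a hypothetical tolerance, not a value from the paper. Because
    the allowed deviation grows with n, the test stops discriminating once
    rel_tol * n * unit approaches unit / 2 (about n = 25 at the defaults).
    """
    n = round(period / unit)
    if n < 1:
        return None
    if abs(period - n * unit) <= rel_tol * n * unit:
        return n
    return None


# Illustrative periods only; these are not values taken from the paper.
for period in (171, 342, 2057, 250, 5):
    print(period, monomer_multiple(period))
```

A period of 2057 bp, for example, is accepted as a 12-monomer unit (12 × 171 = 2052), whereas 250 bp is rejected.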
Figure 2
Dot-plot analysis of tandem arrays reveals higher-order structure. For each dot plot shown, the type of repeat, chromosomal location and stringency (window size and % homology) are indicated. Black dots and horizontal lines represent tandem orientation, whereas blue dots and vertical lines represent inverted orientation. The RepeatMasker tracks for each region are shown below. A-G) Arrays listed in Table 1. The RepeatMasker tracks indicate a large continuous domain of satellite DNA. A) 70.7 kb array of hsatII from 16q11.2 at low stringency, showing a dense pattern indicative of homologous satellite DNA. A large inversion is seen in this array. ~20 kb of neighboring non-satellite DNA is also shown. B) 121 kb array of GsatII from 12p11.1, showing complex multiple inversions within this array. C) Same region as in B at increased stringency, showing 3 distinct domains of homology within the overall array. D) Same array as in A at increased stringency, showing higher-order repeats in the proximal 50 kb in both orientations. E) ~100 kb array of GAATG on Yq11.1, showing the 3.36 kb higher-order repeats in the distal 60 kb region. F) The 100 kb array of GAATG on Yq12, showing the 3.6 kb HOR across the entire sequenced array. G) The 61 bp VNTR from Xp22.33 at high stringency, showing complex higher-order structure. H-L) Arrays listed in Table 2. The RepeatMasker tracks show repetitive patterns containing the different classes of transposable elements. H) The array containing the CT47 genes. I) The DMBT gene, showing the internal repetitive domain structure. J) The LPA gene, showing the internal repetitive domain structure. K) The 54.5 kb array of 5.4 kb megasatellite repeats, each of which contains a Mer33 repeat. L) The 51.4 kb array containing the ~6.0 kb Acro repeats. This array has an inversion in orientation of the repeat units, indicated by the vertical lines visible on the dot plot.
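The stringency parameters in this caption (window size and % homology) and the distinction between tandem and inverted orientation can be illustrated with a naive self-comparison. This is a sketch under stated assumptions, not the software the authors used: `revcomp`, `identity` and `self_dot_plot` are hypothetical names, the comparison is ungapped, and the output is a list of coordinate pairs rather than the plot layout shown in the figure. It runs in quadratic time and is only suitable for short test sequences.

```python
from typing import List, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def identity(a: str, b: str) -> float:
    """Fraction of matching positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def self_dot_plot(seq: str, window: int = 30, min_identity: float = 0.9,
                  step: int = 1) -> Tuple[List[Tuple[int, int]],
                                          List[Tuple[int, int]]]:
    """Compare every window of seq against every other window.

    Returns (direct, inverted): lists of (i, j) start positions where the
    window at i matches the window at j in the same orientation, or where
    the reverse complement of the window at i matches the window at j.
    """
    starts = range(0, len(seq) - window + 1, step)
    direct, inverted = [], []
    for i in starts:
        w_i = seq[i:i + window]
        w_i_rc = revcomp(w_i)
        for j in starts:
            w_j = seq[j:j + window]
            if identity(w_i, w_j) >= min_identity:
                direct.append((i, j))
            if identity(w_i_rc, w_j) >= min_identity:
                inverted.append((i, j))
    return direct, inverted


# Toy sequence: three tandem copies of a made-up 40 bp unit followed by one
# inverted copy. It is not a sequence from the paper.
unit = "ACGTTGCAAGGCTTAACCGGATATCGCGTAGCTAGGATCC"
toy = unit * 3 + revcomp(unit)
direct, inverted = self_dot_plot(toy, window=10, min_identity=0.9)
print(len(direct), "direct hits;", len(inverted), "inverted hits")
```

Raising `window` or `min_identity` corresponds to the "increased stringency" panels: fewer chance matches survive, so only the more closely related repeat copies remain visible.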
Figure 3
Analysis of the large tandem array in 8q21.2. A) Information from the UCSC genome browser (hg18) showing the region containing the 12 kb tandem repeat from 8q21.2. This repeat array contains an "87 kb" gap with ~5 repeat units on the proximal side and ~1.5 repeats on the distal side. The repeats can be seen in the repeating patterns of the RepeatMasker tracks. The AF495523 (Gor1) gene is found once in each repeat unit. Copy number variation was detected at this repeat array using both BAC microarrays and fosmids. The restriction enzyme PmeI does not cut in the 12 kb repeats, but cuts close to the edge of the array in the genomic DNA sequence. The positions of the PCR-amplified probes used on the Southern blot are indicated. B) Pulsed-field gel analysis of the array size in two pedigrees (lanes 1–4, and lanes 5–10).
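As a rough worked example, the array sizes reported in the abstract for this locus (600 kb to 1.7 Mbp) can be converted into approximate repeat copy numbers. The sketch assumes a repeat unit of exactly 12,000 bp and ignores any flanking sequence contained in the restriction fragment, so the results are order-of-magnitude estimates only; `copy_number` is a hypothetical helper.

```python
def copy_number(array_bp: int, unit_bp: int = 12_000) -> float:
    """Approximate number of repeat units in an array of the given size."""
    return array_bp / unit_bp


# Array size range taken from the abstract; unit size rounded to 12 kb.
smallest = copy_number(600_000)
largest = copy_number(1_700_000)
print(f"about {smallest:.0f} to {largest:.0f} copies of the 12 kb repeat")
```

Under these assumptions the array would hold on the order of 50 to 140 repeat units, compared with the ~6.5 units present in the hg18 assembly on either side of the gap.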
Figure 4
In situ hybridization of megasatellite DNA families. A) FISH using a 636 bp probe to the SST repeats from chromosome 4, which hybridizes to chromosome 19 (two arrays, Table 2d) and chromosome 4. B) FISH using a probe from the acro repeats from chromosome 4p11 (Table 2d), which hybridizes to pericentromeric regions of chromosomes 3 and 4, and the acrocentric chromosomes. C) FISH using a probe to the 3.5 kb repeats from the LTR arrays. Right: additional acrocentric chromosomes from different individuals, showing the variation in hybridization patterns.
Figure 5
Analysis of the repeat unit structure of the LTR arrays from chromosomes 13, 18 and 21. A) Genomic region from chromosome 13q11 containing the LTR array. The RepeatMasker tracks from the UCSC genome browser are shown, which indicate the large 60 kb array of LTR transposons. Homologous monomeric repeat units are indicated by arrows. B) Self-similarity dot plot of the LTR array. 30 bp windows at 90% homology reveal the ~3.5 kb monomeric repeat units as horizontal lines. C) Schematic of a higher-order repeat unit (HOR) consisting of six ~3.5 kb monomeric repeat units. The insertion of an LTR6A into monomer C of each HOR is shown, as well as additional deletions of the MSTA-int repeats. D) Detail of the composition of monomers C and D, indicating the MaLR LTR fragments that make up the repeat units, taken from the RepeatMasker output and numbered relative to the consensus for each element. The insertion of a full-length LTR6A into monomer C can be seen. E) Self-similarity dot plots of the LTR arrays from chromosomes 13q11, 18p11 and 21q11 at 50 bp windows and 90% homology. The HOR organization is revealed as bold solid horizontal lines and is shown schematically by arrows below. Putative unequal crossing over events unique to each LTR array are revealed by the gaps and shifts of these lines, and deleted monomeric repeat units are indicated below.
