Genome Res. 2004 Oct;14(10A):1861-9.
doi: 10.1101/gr.2542904.

Inverted repeat structure of the human genome: the X-chromosome contains a preponderance of large, highly homologous inverted repeats that contain testes genes

Peter E Warburton et al. Genome Res. 2004 Oct.

Abstract

We have performed the first genome-wide analysis of the Inverted Repeat (IR) structure in the human genome, using a novel and efficient software package called Inverted Repeats Finder (IRF). After masking of known repetitive elements, IRF detected 22,624 human IRs characterized by arm size from 25 bp to >100 kb with at least 75% identity, and spacer length up to 100 kb. This analysis required 6 h on a desktop PC. In all, 166 IRs had arm lengths >8 kb. From this set, IRs were excluded if they were in unfinished/unassembled regions of the genome, or clustered with other closely related IRs, yielding a set of 96 large IRs. Of these, 24 (25%) occurred on the X-chromosome, although it represents only approximately 5% of the genome. Of the X-chromosome IRs, 83.3% were ≥99% identical, compared with 28.8% of autosomal IRs. Eleven IRs from Chromosome X, one from Chromosome 11, and seven already described from Chromosome Y contain genes predominantly expressed in testis. PCR analysis of eight of these IRs correctly amplified the corresponding region in the human genome, and six were also confirmed in gorilla or chimpanzee genomes. Similarity dot-plots revealed that 22 IRs contained further secondary homologous structures partially categorized into three distinct patterns. The prevalence of large highly homologous IRs containing testes genes on the X- and Y-chromosomes suggests a possible role in male germ-line gene expression and/or maintaining sequence integrity by gene conversion.
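To make the terms "arm" and "spacer" concrete, the following is a minimal sketch of inverted repeat detection. It is not the IRF algorithm: it assumes exact (100% identical) arms, whereas IRF accepts arms with at least 75% identity, and it uses a brute-force scan that would not scale to a genome. The function name, parameters, and toy sequence are hypothetical.

```python
# Watson-Crick pairs; any other symbol (e.g. N) pairs with nothing.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def find_exact_inverted_repeats(seq, min_arm=25, max_spacer=100):
    """Return (left_start, right_start, arm_len, spacer_len) tuples.

    Simplification: arms must be exact reverse complements of each other.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    # The left arm ends just before index left_end.
    for left_end in range(1, n):
        for spacer in range(max_spacer + 1):
            right_start = left_end + spacer
            if right_start >= n:
                break
            # If the arms could grow inward, a tighter spacer reports them.
            if spacer >= 2 and COMPLEMENT.get(seq[left_end]) == seq[right_start - 1]:
                continue
            # Extend outward while the two arms stay complementary.
            k = 0
            while (left_end - 1 - k >= 0 and right_start + k < n
                   and COMPLEMENT.get(seq[left_end - 1 - k]) == seq[right_start + k]):
                k += 1
            if k >= min_arm:
                hits.append((left_end - k, right_start, k, spacer))
    return hits


# Toy sequence: arm GATTACA, spacer TTT, second arm TGTAATC (its reverse complement).
toy = "AAGATTACATTTTGTAATCAA"
hits = find_exact_inverted_repeats(toy, min_arm=5, max_spacer=10)
```

In this toy case the left arm starts at index 2, the right arm at index 12, the arm length is 7, and the spacer length is 3, so `(2, 12, 7, 3)` is among the reported hits.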


Figures

Figure 1
Results from Inverted Repeat Finder. (A) Distribution of arm lengths of human IRs detected by IRF, mean of 551 bp, median of 85 bp. Note the log scale on the x-axis. (B) Distribution of arm length (x-axis) by percent identity between arms (y-axis). The x-axes in A and B correspond for direct comparison. The “birdcrest” pattern observed resulted from limited ranges of percent identity values for the shortest most similar repeats, for example, 1/25 bp, 1/26 bp,..., 2/25 bp, 2/26 bp,.... As IR length increases, the pattern becomes less constrained. In all, 166 IRs had arm lengths ≥8000 bp (vertical line), of which 37 were on the X-chromosome and 18 on the Y-chromosome. (C) IRs detected by IRF that are ≥8000 bp, ≥95% identical, of which 27 are from the X-chromosome and 10 from the Y-chromosome. (D) Comparison on each human chromosome of percent total IRs (22,624), percent IRs ≥8 kb (166) detected by IRF, and percent IRs ≥8 kb after exclusion (96; see Supplemental S1). Values are compared to the percent of total assembled genome (3.07 × 10^9 bp) for each chromosome.
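The "birdcrest" pattern in panel B can be illustrated with a small calculation. The sketch below assumes that percent identity is simply (arm length minus mismatches) divided by arm length, ignoring indels; the exact scoring used by IRF may differ, and the function name is hypothetical. It lists the identity values a short arm can take at or above the 75% cutoff.

```python
def attainable_identities(arm_len, min_identity=75.0):
    """Percent identities reachable with 0, 1, 2, ... mismatches in one arm.

    Assumption: identity = 100 * (arm_len - mismatches) / arm_len, no indels.
    """
    values = []
    for mismatches in range(arm_len + 1):
        # Compare in integer-scaled form to avoid rounding at the cutoff.
        if 100 * (arm_len - mismatches) < min_identity * arm_len:
            break
        values.append(100 * (arm_len - mismatches) / arm_len)
    return values


# A 25 bp arm can only sit on a handful of identity levels.
levels_25 = attainable_identities(25)
# A much longer arm fills the same range far more densely.
levels_1000 = attainable_identities(1000)
```

Under these assumptions a 25 bp arm has only seven possible values (100, 96, 92, 88, 84, 80, and 76%), while a 1000 bp arm has 251, which is consistent with the pattern becoming less constrained as arm length increases.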
Figure 2
IRs in Xp11.2. (A) Similarity dot-plot of 2 Mb from Xp11.2. Homologous regions indicated by horizontal lines (direct repeats) or vertical lines (inverted repeats). Six large IRs were detected in this region as vertical lines (gray triangles). No other significant sequence similarities were seen in the top portion of the dot-plot not shown. The window size and percent identity for the dot-plot are indicated. (B) Higher-resolution view of the region containing IRX-51.17 and IRX-51.4958, which both contain the GAGE-D genes. Arrows on the same line represent homologous regions as indicated by the dot-plot; percent identity indicated when calculated. Internal IRs detected by IRF are indicated, for example, IRX-51.489 (Supplemental data S1), and the midpoint of the spacer indicated by a dot. IRX-51.17 was not included in the final set of 96 large IRs. (C) Higher-resolution view of IRX-51.73 and surrounding region, which contains an SSX gene cluster. Both arms of the IRX-51.73 contain inverted copies of SSX2 and SSX pseudo5. As in B, internal IRs detected by IRF are indicated, for example, IRX-51.635, and the midpoint of the spacer of the IR for which details are provided is indicated by a dot (Supplemental data S1).
Figure 3
Secondary structure patterns of IRs. (A) Dot-plot similarity analysis of IRX-69.83, which contains smaller, less homologous IRs in each arm. Internal IRs detected by IRF and percent identity are indicated (Supplemental data S1 and S2). Below each dot-plot are the RepeatMasker and TRF (Benson 1999) tracks from the UCSC Genome Browser, where the IR and internal secondary structures can be observed visually as mirror symmetries. (B) Dot-plot similarity analysis of IRX-150.52. The spacer of this IR is homologous to a region in the arms. (C) Analysis of IR7-143.36 and IR7-143.41. One arm of both of these IRs is found in the spacer of the other. IRs with similar secondary structure to A, B, and C are indicated in Table 1. Internal IRs detected in these regions were eliminated from the final data set (Supplemental data S1 and S2). The dot-plot similarity analysis was performed with a window size of 50 bp and minimum identity of 85%. (D) Possible double cruciform structure suggested by IRs that contain internal IRs, for example, in A and Figure 2C. (Black and gray DNA strands) outside of IR region; (blue and red DNA strands) large, high homology IR; (purple and orange DNA strands) internal, lower homology IRs.
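The windowed dot-plot comparison described in this caption can be sketched as follows. This is an illustrative brute-force version, not the software used by the authors: it scores each pair of fixed-size windows by the fraction of matching positions and, to reveal inverted matches, compares the sequence with its own reverse complement. Function names are hypothetical, and the toy call uses a 7 bp window rather than 50 bp.

```python
_COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT_TABLE)[::-1]


def dotplot(seq_x, seq_y, window=50, min_identity=0.85):
    """Return (i, j) pairs where seq_x[i:i+window] and seq_y[j:j+window]
    match at a fraction of positions >= min_identity."""
    hits = []
    for i in range(len(seq_x) - window + 1):
        win_x = seq_x[i:i + window]
        for j in range(len(seq_y) - window + 1):
            win_y = seq_y[j:j + window]
            matches = sum(a == b for a, b in zip(win_x, win_y))
            if matches / window >= min_identity:
                hits.append((i, j))
    return hits


def inverted_dotplot(seq, window=50, min_identity=0.85):
    """Dots for inverted matches, reported in forward-strand coordinates."""
    n = len(seq)
    rc = reverse_complement(seq)
    # Window j on the reverse complement covers forward positions
    # n - j - window up to n - j.
    return [(i, n - j - window) for i, j in dotplot(seq, rc, window, min_identity)]


toy = "AAGATTACATTTTGTAATCAA"
dots = inverted_dotplot(toy, window=7, min_identity=0.85)
```

For the toy sequence, the window starting at index 2 (GATTACA) pairs with the window starting at index 12 (TGTAATC), so both `(2, 12)` and the mirrored `(12, 2)` appear among the dots.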
Figure 4
PCR analysis of internal arm/spacer boundary. For each IR indicated, a single primer (A) was designed to hybridize to both arms. Individual primers (L and R) were designed to hybridize to the spacer region. PCR amplification was performed using primer pairs A-L and A-R with human, gorilla, and chimpanzee genomic DNA. For each IR, a + indicates that PCR products were amplified using both primer pairs. All amplified IRs are >99% homologous (Table 1), except IRX-51.924, which is 97% homologous (Fig. 2A; Supplemental data S1 and S2). The PCR primers and DNA sequence of PCR products are shown in Supplemental data S3. The high homology between the arm and spacer of IRX-51.17 and IRX-51.924 (see Fig. 2B) does not allow for specific STSs, although two distinct sets of primer pairs were used (Supplemental data S3).

References

    1. Aradhya, S., Bardaro, T., Galgoczy, P., Yamagata, T., Esposito, T., Patlan, H., Ciccodicola, A., Munnich, A., Kenwrick, S., Platzer, M., et al. 2001. Multiple pathogenic and benign genomic rearrangements occur at a 35 kb duplication involving the NEMO and LAGE2 genes. Hum. Mol. Genet. 10: 2557-2567.
    2. Bailey, J.A., Gu, Z., Clark, R.A., Reinert, K., Samonte, R.V., Schwartz, S., Adams, M.D., Myers, E.W., Li, P.W., and Eichler, E.E. 2002. Recent segmental duplications in the human genome. Science 297: 1003-1007.
    3. Benham, C.J., Savitt, A.G., and Bauer, W.R. 2002. Extrusion of an imperfect palindrome to a cruciform in superhelical DNA: Complete determination of energetics using a statistical mechanical model. J. Mol. Biol. 316: 563-581.
    4. Benson, G. 1999. Tandem Repeats Finder: A program to analyze DNA sequences. Nucleic Acids Res. 27: 573-580.
    5. Chen, Y.T., Scanlan, M.J., Sahin, U., Tureci, O., Gure, A.O., Tsang, S., Williamson, B., Stockert, E., Pfreundschuh, M., and Old, L.J. 1997. A testicular antigen aberrantly expressed in human cancers detected by autologous antibody screening. Proc. Natl. Acad. Sci. 94: 1914-1918.

WEB SITE REFERENCES

    1. http://ftp.genome.washington.edu/RM/RepeatMasker.html; RepeatMasker.
    2. http://tandem.bu.edu/cgi-bin/irdb/irdb.exe; Inverted Repeat Finder.
    3. http://tandem.bu.edu; Inverted Repeat Data Base (IRDB).
