Chromosome Res. 2013 Aug;21(5):461-74.
doi: 10.1007/s10577-013-9371-y. Epub 2013 Jul 30.

Notable clustering of transcription-factor-binding motifs in human pericentric regions and its biological significance

Yuki Iwasaki et al. Chromosome Res. 2013 Aug.

Abstract

Oligonucleotide composition in genome sequences varies significantly among species, even among those with the same genomic G + C%; it has therefore been used to distinguish a wide range of genomes and is called the "genome signature". Oligonucleotides often represent motif sequences responsible for sequence-specific protein binding (e.g., transcription-factor binding). Occurrences of such motif oligonucleotides in the genome should be biased relative to those observed in random sequences and may differ among genomes and genomic regions. The Self-Organizing Map (SOM) is a powerful tool for clustering high-dimensional data, such as oligonucleotide composition, on a single plane. We previously modified the conventional SOM for genome informatics to produce the batch-learning SOM, or "BLSOM". When we constructed BLSOMs to analyze pentanucleotide composition in 20-, 50-, and 100-kb sequences derived from the human genome, the BLSOMs did not classify human sequences according to chromosome but revealed several specific zones composed primarily of sequences derived from pericentric regions. Interestingly, various transcription-factor-binding motifs were characteristically overrepresented in pericentric regions but underrepresented in most genomic sequences. When we focused on much shorter sequences (e.g., 1 kb), the clustering of transcription-factor-binding motifs was evident in pericentric, subtelomeric, and sex-chromosome pseudoautosomal regions. The biological significance of the clustering in these regions is discussed in connection with cell-type- and stage-dependent chromocenter formation and nuclear organization.
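The input to such a map can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that "DegePenta" means a pentanucleotide and its reverse complement are counted together (as the paired notation AATCA/TGATT in the figure captions suggests), and the function names `canonical`, `window_composition`, and `sliding_windows` are hypothetical.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Represent a k-mer and its reverse complement by one shared key
    (assumption: this is what the 'degenerate' composition merges)."""
    return min(kmer, reverse_complement(kmer))


# 4**5 = 1024 pentanucleotides; no odd-length k-mer equals its own
# reverse complement, so merging the pairs leaves 512 dimensions.
DEGENERATE_PENTAMERS = sorted(
    {canonical("".join(p)) for p in product("ACGT", repeat=5)}
)


def window_composition(seq, k=5):
    """Relative frequency vector of merged pentanucleotides in one window.
    k-mers containing characters other than A/C/G/T are skipped."""
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    total = sum(counts.values())
    return [counts[p] / total if total else 0.0 for p in DEGENERATE_PENTAMERS]


def sliding_windows(seq, window, step):
    """Yield (start, subsequence) for fixed-size windows, e.g. a 50-kb
    window moved in 25-kb steps."""
    for start in range(0, len(seq) - window + 1, step):
        yield start, seq[start:start + window]
```

Each window then contributes one 512-dimensional vector, which is the kind of high-dimensional data a SOM projects onto a plane.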


Figures

Fig. 1
Penta- and DegePenta-BLSOMs for 50-kb sequences derived from the human genome (a) and for those including TetraRan sequences (b). When constructing a BLSOM for a given window size (e.g., 50 kb), a sliding step can be set. The clustering patterns obtained with and without the sliding step resembled each other in this study; the pattern shown is the 50-kb BLSOM with a 25-kb step. Examples without a sliding step are shown in Fig. S3A. The map size was chosen so that each lattice point contained ten sequences on average. Lattice points containing sequences from multiple chromosomes are indicated in black, and those containing sequences from a single chromosome are indicated in color, with a distinct color for each of chr1 to chr22, chrX, and chrY. Lattice points that contain only random sequences or no sequences are left blank (white). Although the colors specifying the 24 chromosomes are not easy to tell apart, this is of no importance here because a colored lattice point shows only that it contains sequences derived from a single chromosome.
Fig. 2
Distribution of Sz sequences on individual chromosomes. The number of Sz sequences per 500 kb is plotted with colored symbols distinguishing Sz1, Sz2, Sz3, Sz4, and Sz5. Centromeric and pericentric heterochromatin regions are marked with horizontal bars just above the X-axis and with two brown arrows.
Fig. 3
Pentanucleotides enriched in Sz. (A) DegePenta-BLSOM listed in Fig. 1d. (Bi–v) TFB pentanucleotides specifically enriched in Sz. The expected frequency of each pentanucleotide was calculated from the mononucleotide composition at each lattice point, and the observed/expected ratio for the pentanucleotide is indicated by the color scale presented under the Bi panel.
Fig. 4
Distribution of TFB pentanucleotides on individual chromosomes. The number of TFB pentanucleotides per 100 kb is plotted with colored symbols distinguishing the TFB pentanucleotides: AATCA/TGATT, AATCT/AGATT, ACCAC/GTGGT, AGATA/TATCT, ATTGG/CCAAT, CTATC/GATAG, CTTCC/GGAAG, GCCAA/TTGGC, and TATCA/TGATA.
Fig. 5
DegePenta-BLSOM for 1-kb sequences derived from Sz. (A) Lattice points containing sequences from multiple Sz are indicated in black, and those containing sequences from a single Sz are indicated in color, with a distinct color for each of Sz1, Sz2, Sz3, Sz4, and Sz5. The Sz2-specific sequences (pink) formed a major extended territory at the bottom and a minor territory at the top left. This split in the Sz2 territory may relate to the internal segmentation observed within Sz2 in Fig. 3a. (B) U-matrix. (Ci–iv) TFB pentanucleotides specifically enriched in the Sz-specific core territory. The observed/expected ratio for each TFB pentanucleotide was calculated as described in Fig. 3b and is indicated by the color scale presented under the Civ panel.
Fig. 6
Distribution of TFB pentanucleotides on individual chromosomes. The number of TFB pentanucleotides per 1 kb is plotted with colored symbols distinguishing the TFB pentanucleotides: for chrY, as described in Fig. 4; for other chromosomes, AATCA/TGATT and GCCAA/TTGGC.
Fig. 7
TfChIP and AcChIP data in 50-kb Sz sequences located within the centromeric band region. A narrow brown bar in the centromeric region (marked in brown in the chromosome ideogram) shows the position of the Sz sequence. (a) An example of Sz4 sequences. Noncoding RNA genes (dashed line with arrow), AcChIP (pale violet chevron mark), and TfChIP (small vertical bar) are shown according to the UCSC Genome Browser. (b) An example of Sz2 sequences. Only TfChIP data were observed; KAP1 and SETDB1 were repeated 5 and 4 times, respectively. The figures were downloaded from the UCSC Genome Browser; because some data were not clear in the presented figures, the nucleotide position and the name and number of TFs observed in the Sz sequence are additionally given in larger letters in each figure.
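The observed/expected ratio mentioned in the Fig. 3 caption can be illustrated with a short sketch. It rests on stated assumptions rather than the authors' code: the expected count comes from the product of mononucleotide frequencies, a motif and its reverse complement are counted together, and the ratio is computed on a raw A/C/G/T sequence rather than on a BLSOM lattice point. The name `observed_expected` is hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T string."""
    return seq.translate(COMPLEMENT)[::-1]


def observed_expected(seq, motif):
    """Observed/expected ratio of a motif (merged with its reverse
    complement), with the expectation taken from the mononucleotide
    composition of seq. Assumes seq contains only A, C, G, and T."""
    k = len(motif)
    rc = reverse_complement(motif)
    positions = len(seq) - k + 1
    if positions <= 0:
        return float("nan")

    # Observed: number of positions where the motif or its complement starts.
    observed = sum(1 for i in range(positions) if seq[i:i + k] in (motif, rc))

    # Mononucleotide frequencies of this sequence.
    freq = {base: seq.count(base) / len(seq) for base in "ACGT"}

    def probability(kmer):
        # Probability of the k-mer if bases were drawn independently.
        p = 1.0
        for base in kmer:
            p *= freq[base]
        return p

    per_position = probability(motif)
    if rc != motif:
        per_position += probability(rc)
    expected = positions * per_position
    return observed / expected if expected > 0 else float("nan")
```

A ratio above 1 means the motif occurs more often than base composition alone would predict, and a ratio below 1 means it occurs less often.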

