Notable clustering of transcription-factor-binding motifs in human pericentric regions and its biological significance
- PMID: 23896648
- PMCID: PMC3761090
- DOI: 10.1007/s10577-013-9371-y
Abstract
Because oligonucleotide composition varies significantly among species, even among those with the same genomic G + C%, it has been used to distinguish a wide range of genomes and is called the "genome signature". Oligonucleotides often represent motif sequences responsible for sequence-specific protein binding (e.g., transcription-factor binding). Occurrences of such motif oligonucleotides in the genome are expected to be biased relative to those in random sequences and may differ among genomes and among genomic portions. The Self-Organizing Map (SOM) is a powerful tool for clustering high-dimensional data, such as oligonucleotide composition, on a single plane. We previously modified the conventional SOM for genome informatics, producing the batch-learning SOM (BLSOM). When we constructed BLSOMs to analyze pentanucleotide composition in 20-, 50-, and 100-kb sequences derived from the human genome, the BLSOMs did not classify human sequences according to chromosome but instead revealed several specific zones composed primarily of sequences derived from pericentric regions. Interestingly, various transcription-factor-binding motifs were characteristically overrepresented in pericentric regions but underrepresented in most genomic sequences. When we focused on much shorter sequences (e.g., 1 kb), the clustering of transcription-factor-binding motifs was evident in pericentric, subtelomeric, and sex-chromosome pseudoautosomal regions. The biological significance of this clustering is discussed in connection with cell-type- and stage-dependent chromocenter formation and nuclear organization.
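To make the pipeline in the abstract concrete, here is a minimal Python sketch under stated assumptions. Each sequence window is reduced to a 512-dimensional vector of pentanucleotide frequencies in which every 5-mer is merged with its reverse complement (mirroring paired labels such as ATTGG/CCAAT), and the vectors are then clustered with a bare-bones batch-learning SOM, followed by a U-matrix (mean distance between neighboring node weights) that highlights cluster boundaries on the map. The function names, the Gaussian neighborhood, the linearly shrinking radius, and the random initialization are illustrative choices, not the authors' BLSOM implementation.

```python
import math
import random
from itertools import product

BASES = "ACGT"
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of an uppercase DNA string."""
    return s.translate(_COMP)[::-1]


# Canonical 5-mers: each pentanucleotide is merged with its reverse complement.
# No odd-length word is its own reverse complement, so 1024 words -> 512 keys.
KEYS = sorted({min(w, revcomp(w)) for w in map("".join, product(BASES, repeat=5))})
INDEX = {w: i for i, w in enumerate(KEYS)}


def composition(seq):
    """Relative frequencies of reverse-complement-merged 5-mers (windows with N skipped)."""
    seq = seq.upper()
    counts = [0] * len(KEYS)
    total = 0
    for i in range(len(seq) - 4):
        w = seq[i:i + 5]
        if set(w) <= set(BASES):
            counts[INDEX[min(w, revcomp(w))]] += 1
            total += 1
    return [c / total for c in counts] if total else [0.0] * len(KEYS)


def _sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def batch_som(data, rows, cols, epochs=10, seed=0):
    """Minimal batch-learning SOM; returns (row-major node weights, BMU index per input)."""
    rng = random.Random(seed)
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    # Assumption: weights are seeded from randomly chosen inputs for brevity;
    # the published BLSOM is reported to initialize them from a PCA instead.
    weights = [list(data[rng.randrange(len(data))]) for _ in nodes]
    sigma0 = max(rows, cols) / 2

    def bmu(x):
        # Best-matching unit: node whose weight vector is closest to x.
        return min(range(len(nodes)), key=lambda j: _sqdist(x, weights[j]))

    for t in range(epochs):
        sigma = max(sigma0 * (1 - t / epochs), 0.5)  # shrinking neighborhood radius
        winners = [bmu(x) for x in data]  # every BMU found with the same fixed weights
        new_weights = []
        for j, (r, c) in enumerate(nodes):
            num = [0.0] * len(data[0])
            den = 0.0
            for x, b in zip(data, winners):
                br, bc = nodes[b]
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                den += h
                num = [acc + h * v for acc, v in zip(num, x)]
            # Batch update: neighborhood-weighted mean of all inputs at once.
            new_weights.append([acc / den for acc in num] if den > 0 else weights[j])
        weights = new_weights
    return weights, [bmu(x) for x in data]


def u_matrix(weights, rows, cols):
    """Mean Euclidean distance from each node's weights to its 4-connected neighbors."""
    out = []
    for r in range(rows):
        for c in range(cols):
            w = weights[r * cols + c]
            nbrs = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if 0 <= r + dr < rows and 0 <= c + dc < cols]
            dists = [math.sqrt(_sqdist(w, weights[i * cols + k])) for i, k in nbrs]
            out.append(sum(dists) / len(dists) if dists else 0.0)
    return out
```

The batch update is what separates this from the classical online SOM: within an epoch, every input's best-matching unit is found with the weights held fixed and all nodes are then updated together, so the update does not depend on the order in which sequences are presented. High U-matrix values mark boundaries between territories on the map.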
Figures
[Figure: lattice points colored by source chromosome (chr1–chr22, chrX, and chrY). Lattice points that contain only random sequences or no sequences are left white. Although the colors distinguishing the 24 chromosomes are hard to tell apart, this does not matter here, because a colored lattice point indicates only that it contains sequences derived from a single chromosome.]
[Figure: specific zones Sz1–Sz5. Centromeric and pericentric heterochromatin regions are marked with horizontal bars just above the X-axis and with two brown arrows.]
[Figure: TFB pentanucleotide pairs, including AATCT/AGATT, ACCAC/GTGGT, AGATA/TATCT, ATTGG/CCAAT, CTATC/GATAG, CTTCC/GGAAG, GCCAA/TTGGC, and TATCA/TGATA.]
[Figure: territories of Sz1–Sz5-specific sequences. The Sz2-specific sequences (pink) formed a major extended territory at the bottom and a minor territory at the top left; this split in the Sz2 territory may relate to the internal segmentation observed within Sz2 in Fig. 3a. (B) U-matrix. (Ci–iv) TFB pentanucleotides specifically enriched in the Sz-specific core territories. The observed/expected ratio for each TFB pentanucleotide was calculated as described in Fig. 3b and is color-coded according to the scale under panel Civ.]
[Figure: panels including the TFB pentanucleotide pair GCCAA/TTGGC.]
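One figure caption reports observed/expected ratios for TFB pentanucleotides computed "as described in Fig. 3b", which is not reproduced here. As a hedged sketch only, the hypothetical Python function below assumes the expected count comes from a zero-order (mononucleotide) model built from the sequence's own base composition, and it scores a motif together with its reverse complement, mirroring pairs such as ATTGG/CCAAT. The name `obs_exp_ratio` and the choice of background model are assumptions, not the authors' procedure.

```python
from collections import Counter

_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(s):
    """Reverse complement of an uppercase DNA string."""
    return s.translate(_COMP)[::-1]


def obs_exp_ratio(seq, motif):
    """Observed/expected count of `motif` plus its reverse complement in `seq`.

    Assumption: expected counts follow a zero-order (mononucleotide) model
    estimated from the base composition of `seq` itself.
    """
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    targets = {motif, revcomp(motif)}  # a set, so palindromic motifs count once
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    valid = [w for w in windows if set(w) <= set("ACGT")]  # skip windows with N
    observed = sum(w in targets for w in valid)

    base = Counter(c for c in seq if c in "ACGT")
    n = sum(base.values())
    if not valid or n == 0:
        return float("nan")

    def prob(word):
        # Probability of `word` under independent, composition-matched bases.
        p = 1.0
        for c in word:
            p *= base[c] / n
        return p

    expected = len(valid) * sum(prob(t) for t in targets)
    return observed / expected if expected > 0 else float("nan")
```

A ratio well above 1 (for example, in a window of tandem CCAAT repeats) flags overrepresentation, whereas values near 1 are expected for random sequence. A higher-order background model, such as one conditioned on dinucleotide frequencies, would change the expected counts and therefore the ratios.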
