Int J Mol Sci. 2024 Apr 16;25(8):4395.
doi: 10.3390/ijms25084395.

Novel Concept of Alpha Satellite Cascading Higher-Order Repeats (HORs) and Precise Identification of 15mer and 20mer Cascading HORs in Complete T2T-CHM13 Assembly of Human Chromosome 15

Matko Glunčić et al. Int J Mol Sci. 2024.

Abstract

Unraveling the intricate centromere structure of human chromosomes holds profound implications, illuminating fundamental genetic mechanisms and potentially advancing our comprehension of genetic disorders and therapeutic interventions. This study rigorously identified and structurally analyzed alpha satellite higher-order repeats (HORs) within the centromere of human chromosome 15 in the complete T2T-CHM13 assembly using the high-precision GRM2023 algorithm. The most extensive alpha satellite HOR array in chromosome 15 reveals a novel cascading HOR, housing 429 15mer HOR copies, containing 4-, 7- and 11-monomer subfragments. Within each row of cascading HORs, all alpha satellite monomers are of distinct types, as in regular Willard's HORs. However, different HOR copies within the same cascading 15mer HOR contain more than one monomer of the same type. Each canonical 15mer HOR copy comprises 15 monomers belonging to only 9 different monomer types. Notably, 65% of the 429 15mer cascading HOR copies exhibit canonical structures, while 35% display variant configurations. Identified as the second most extensive alpha satellite HOR, another novel cascading HOR within human chromosome 15 encompasses 164 20mer HOR copies, each featuring two subfragments. Moreover, a distinct pattern emerges as interspersed 25mer/26mer structures differing from regular Willard's HORs and giving rise to a 34-monomer subfragment. Only a minor 18mer HOR array of 12 HOR copies is of the regular Willard's type. These revelations highlight the complexity within the chromosome 15 centromeric region, accentuating deviations from anticipated highly regular patterns and hinting at profound information encoding and functional potential within the human centromere.

Keywords: GRM algorithm; T2T-CHM13 human assembly; alpha satellites; centromere; higher-order repeats.

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
(a) GRM (Global Repeat Map) diagram for tandemly arranged alpha satellite monomers in the complete T2T-CHM13 assembly of human chromosome 15. Horizontal axis: GRM periods (in ~171 bp units). Vertical axis: frequency of each monomer repeat period. Identified GRM peaks exhibit periods 4, 5, 7, 11, 15, 18, 20, 25, 26 and 34. The significance of these GRM peaks (HORs or associated subfragment repeats) can be inferred from the monomer distance (MD) diagram. (b) Monomer distance (MD) diagram for tandemly organized monomers identified in the T2T-CHM13 assembly of human chromosome 15. Horizontal axis: enumeration of tandemly organized alpha satellite monomers, in sequential order as revealed via GRM analysis of the T2T assembly. Vertical axis: period (the distance between the start of a monomer and the start of the next monomer of the same type). Four distinct regions of monomer tandems, denoted as h1, h2, h4, and h3, correlate with HORs designated as hor1, hor2, hor4, and hor3, respectively. Additionally, there are sporadic MD points that do not correspond to HORs or their subfragments.
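The period plotted in the MD diagram can be illustrated with a minimal sketch. It assumes the monomers have already been classified into types, and it measures distances in monomer units rather than base pairs. The function name `monomer_distances` and the toy 4mer array are hypothetical and are not taken from the GRM2023 software or from the chromosome 15 data.

```python
from typing import List, Optional


def monomer_distances(types: List[str]) -> List[Optional[int]]:
    """For each monomer, return the distance (in monomer units) to the
    next monomer of the same type, or None if no later one exists."""
    next_seen = {}                      # type -> index of its nearest later occurrence
    result: List[Optional[int]] = [None] * len(types)
    # Scan right to left so the nearest later occurrence is always known.
    for i in range(len(types) - 1, -1, -1):
        t = types[i]
        if t in next_seen:
            result[i] = next_seen[t] - i
        next_seen[t] = i
    return result


# Hypothetical toy array: three tandem copies of a regular 4mer HOR.
toy = ["t1", "t2", "t3", "t4"] * 3
md = monomer_distances(toy)
print(md)
```

In a regular HOR every monomer type occurs once per copy, so all defined distances equal the HOR length; a type that recurs inside a copy would add shorter periods, which is how subfragment line segments can arise in the diagram.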
Figure 2
Ideogram of alpha satellite HOR arrays in human chromosome 15. All four regions of HOR arrays are located on the short (p) arm of chromosome 15. The 18mer, 25mer, and 20mer HORs are situated on the p11.2 pericentromeric subregion (light blue color). This gene-rich band is positioned on the distal portion of the p arm (p11.2q13). The array of 15mer HORs initiates within the p11.2 pericentromeric subregion and extends within the p arm heterochromatic centromeric region (red color). The positions of these HOR arrays within the T2T CHM13 genomic sequence are as follows: Willard’s type 18mer HOR array 14,699,937 bp–14,737,927 bp, Willard’s type 25mer HOR array 15,417,939 bp–15,692,443 bp, cascading 20mer HOR array 15,993,645 bp–16,555,446 bp, and cascading 15mer HOR array 16,679,039 bp–17,683,163 bp. The 20mer HOR array extends into the 15mer HOR array, while non-repetitive genetic material is present between the 18mer HOR array and the 25mer HOR array.
Figure 3
Alignment scheme for the cascading 15mer HOR in hor3. (a) Canonical HOR copy displayed in the cascading HOR presentation (n = 15, τ = 9) (15 monomers of 9 different types). Monomers within the HOR copy are labeled as m1, m2, … m15, in order of their appearance within the canonical HOR copy (from left to right within each row and from top to bottom). Each monomer is depicted by a colored box, with distinct colors corresponding to different monomer types. Monomers are organized into columns based on their monomer types: monomer type t1 in the first column, monomer type t2 in the second column, and so forth. The number of columns, i.e., the number of different monomer types in the canonical HOR copy, is denoted by τ. (b) Attribution of 9 monomer types t1, t2, … t9 to 15 monomers m1, m2, … m15 in the one-row presentation of the canonical 15mer HOR copy. (c) Comparison of canonical 15mer HOR copy and some of its variants.
Figure 4
Scheme for interspersed repetitive monomeric subfragments in cascading 15mer HOR array. In the MD diagram (Figure 1b), within this same monomer enumeration interval, there exist equidistant MD line segments for periods 4, 7 and 11, which represent subfragments of the 15mer HOR. Distances between monomers of the same type are illustrated for two neighboring canonical HOR copies. In the first HOR copy, the monomers are labeled $1_1, 2_1, \dots, 4_2$, and in the second HOR copy likewise as $1_1, 2_1, \dots, 4_2$. Here, $1_1$ represents the first monomer of type t1 in the first HOR copy, $2_1$ the first monomer of type t2 in the first HOR copy, and so forth, while $4_2$ denotes the second monomer of type t4 in the first HOR copy. Similarly, $1_1$ denotes the first monomer of type t1 in the second HOR copy; $2_1$ denotes the first monomer of type t2 in the second HOR copy, and so on, with $4_2$ representing the second monomer of type t4 in the second HOR copy. Highly similar monomers are not numbered consecutively, as depicted in the figures above, to facilitate the explanation of monomer distances and MD monomeric subfragments in Figure 1b. The distance between monomers $1_1$ and $1_2$ is denoted $d(1_1, 1_2)$. This distance is equal to the sum of the distances between $1_1$ and $2_1$, $2_1$ and $3_1$, $3_1$ and $4_1$, and $4_1$ and $1_2$, i.e., $d(1_1, 1_2) = d(1_1, 2_1) + d(2_1, 3_1) + d(3_1, 4_1) + d(4_1, 1_2) \approx 4$ monomer units. In this way, we obtain the monomer distances given in the text.
Figure 5
Cascading 20mer HOR alignment in hor4. (a) Canonical HOR copy displayed in cascading HOR presentation (n = 20, τ = 19). Monomers within the HOR copy are labeled as m1, m2, … m20, in order of appearance within the canonical HOR copy (from left to right within each row and from the first row to the second). The number of columns, i.e., the number of different monomer types in the canonical HOR copy, is denoted by τ = 19. (b) Attribution of 19 monomer types t1, t2, … t19 to 20 monomers m1, m2, … m20 in the one-row presentation of the canonical 20mer HOR copy. (c) Comparison of canonical 20mer HOR copy and some of its variants.
Figure 6
Interspersed repetitive monomeric subfragments in cascading 20mer HOR array. In the MD diagram (Figure 1b), within this same monomer enumeration interval, there exist equidistant MD line segments for periods 5 and 15, which represent subfragments of the 20mer HOR. Distances between monomers of the same type are exemplified for two adjacent canonical HOR copies. The relevant distances between the repeated monomers are $d(9_1, 9_2) = 5$ and $d(9_2, 9_1) = 15$, the latter measured to monomer $9_1$ of the next HOR copy, giving rise to subfragment MD line segment periods 5 and 15.
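The two subfragment distances of the repeated t9 monomer add up to the HOR length, which a short sketch can check. The monomer layout used here is hypothetical: the caption states only that 19 types are attributed to 20 monomers and that t9 recurs at a distance of 5, so the position of t9 within the copy and the helper name `cyclic_gaps` are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def cyclic_gaps(copy_types: List[str]) -> Dict[str, List[int]]:
    """Gaps (in monomer units) between successive occurrences of each
    monomer type when the HOR copy is repeated in tandem."""
    n = len(copy_types)
    positions = defaultdict(list)
    for i, t in enumerate(copy_types):
        positions[t].append(i)
    gaps = {}
    for t, pos in positions.items():
        # The last occurrence wraps around to the first one in the next copy;
        # a type that occurs once gets a single gap equal to the copy length.
        gaps[t] = [(pos[(k + 1) % len(pos)] - p) % n or n
                   for k, p in enumerate(pos)]
    return gaps


# Hypothetical 20mer layout: types t1..t19 in order, with a second t9
# placed five monomers after the first one (assumed position).
layout = ([f"t{i}" for i in range(1, 14)] + ["t9"]
          + [f"t{i}" for i in range(14, 20)])
gaps = cyclic_gaps(layout)
print(len(layout), len(gaps), gaps["t9"])
```

Under this assumed layout only t9 yields gaps shorter than 20, and its two gaps sum to the 20-monomer HOR length, as any pair of within-copy and between-copy distances for a twice-occurring type must.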
Figure 7
Alignment of Willard’s type 18mer HOR array.
Figure 8
Aligned scheme of six interspersed sections of Willard’s type 25mer and 26mer HOR arrays (hor2). (a) From top to bottom: four 26mer HOR copies, one 25mer HOR copy, three 26mer HOR copies, two 25mer HOR copies, seven 26mer HOR copies and forty-eight 25mer HOR copies. Of the 48 25mer HOR copies at the bottom part of the figure, only the 1st and 48th 25mer HOR copies are shown. Canonical 25mer and 26mer HOR copies have 22 common t-monomers, while they differ in 7 t-monomers: in the canonical 25mer HOR copy, the missing monomers are t11, t16, t17, and t28, while in the canonical 26mer HOR copy, the missing monomers are t10, t15, and t27. Thus, the 25mer and 26mer HOR copies together comprise 29 monomer types. They are all included in the t-monomer sequence at the top of the figure to demonstrate the common alignment of both 25mer and 26mer HOR copies. (b) Alignment of 25mer and 26mer canonical HOR copies with 29 common monomer types. (c) A segment of HOR copies No. 6 to 17 from the set of 48 25mer HOR copies, illustrating the origin of tertiary 34-monomer repeats (comprising three two-variant copies: No. 8–9, 11–12, and 14–15, each consisting of 12 + 22 = 34 monomers). In this graphical presentation, only the t-monomers present in the canonical 25mer HOR are included (i.e., t11, t16, t17 and t28 are omitted from the t-sequence).
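The type counts in this caption follow from simple set arithmetic, sketched below. The lists of missing monomers are taken from the caption; numbering the shared pool as t1 through t29 is an assumption, since the caption gives the total of 29 types but not every label.

```python
# Assumed labels for the shared pool of 29 monomer types: t1 .. t29.
all_types = {f"t{i}" for i in range(1, 30)}

# Monomer types absent from each canonical copy, as listed in the caption.
missing_25mer = {"t11", "t16", "t17", "t28"}
missing_26mer = {"t10", "t15", "t27"}

types_25mer = all_types - missing_25mer
types_26mer = all_types - missing_26mer

common = types_25mer & types_26mer      # types present in both copies
differing = types_25mer ^ types_26mer   # types present in exactly one copy

print(len(types_25mer), len(types_26mer), len(common), len(differing))
```

The counts reproduce the caption: 25 and 26 monomer types per canonical copy, 22 shared, and 7 differing, for 22 + 7 = 29 types in total.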
