Review
Exp Cell Res. 2020 Sep 15;394(2):112127.
doi: 10.1016/j.yexcr.2020.112127. Epub 2020 Jun 3.

Centromere studies in the era of 'telomere-to-telomere' genomics

Karen H Miga. Exp Cell Res. 2020.

Abstract

We are entering an exciting era of genomics in which truly complete, high-quality assemblies of human chromosomes are available end-to-end, or from 'telomere-to-telomere' (T2T). This technological advance offers a new opportunity to include endogenous human centromeric regions in high-resolution, sequence-based studies. These emerging reference maps are expected to reveal a new functional landscape in the human genome, where centromere proteins, transcriptional regulation, and spatial organization can be examined with base-level resolution across different stages of development and disease. Such studies will depend on innovative methods for assembling extremely long tandem repeats (ETRs), or satellite DNAs, paired with the development of new, orthogonal validation methods to ensure accuracy and completeness. This review reflects on the progress in centromere genomics made possible by recent advances in long-read sequencing and assembly methods. In doing so, I discuss the challenges that remain and the promise of a new period of scientific discovery for satellite DNA biology and centromere function.

Keywords: Centromere; Genomics; Long-read assembly; Satellite DNA; Telomere-to-telomere.

Figures

Fig. 1. Genomic organization of human centromeric regions.
(a) The assembled human centromeric region on chromosome 7 (HiCanu contig: tig00000794:44788681–50796794) is characterized by RepeatMasker [112] as long tracts of alpha satellite DNA (ALR/Alpha; shown in orange) with interspersed repeats (green) and pericentromeric satellites (blue). (b) Alpha satellite ~171 bp monomers (shown as white arrows) are organized into a multi-monomeric repeat unit, or higher order repeat (HOR), with the orientation of the repeats indicated. Chromosome 7 has two HOR arrays: D7Z2 (16-mer HOR) and D7Z1 (6-mer HOR) [38]. The D7Z1 and D7Z2 array sizes are within the expected range as determined by PFGE Southerns. The D7Z1 array has previously been shown to have CENP-A enrichment (shown in red; live or active array), whereas the D7Z2 array, shown in grey, is typically inactive, or not expected to be bound by CENP-A [42]. The genomic region between the D7Z1 and D7Z2 arrays (~700 kb) is concordant with previous physical mapping data for this region [39]. (c) Sequences flanking HOR arrays are typically pericentromeric satellites (HSat, shown in blue), segmental duplications, and monomeric (divergent 171 bp) alpha satellite with interspersed transposable elements.
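
As a rough consistency check on the numbers in this caption, the nominal HOR unit lengths and the span of the contig interval can be computed directly. This is a minimal sketch: the 171 bp monomer length is approximate, the function names are hypothetical, and treating the contig coordinates as 1-based inclusive is an assumption that the caption does not state.

```python
# Approximate alpha satellite monomer length from the caption (~171 bp).
MONOMER_BP = 171


def nominal_hor_length(n_monomers, monomer_bp=MONOMER_BP):
    """Nominal length of an n-monomer higher order repeat (HOR) unit, in bp."""
    return n_monomers * monomer_bp


def interval_length(start, end):
    """Length of a 1-based, inclusive interval (assumed coordinate convention)."""
    return end - start + 1


d7z2_unit_bp = nominal_hor_length(16)  # D7Z2 is described as a 16-mer HOR
d7z1_unit_bp = nominal_hor_length(6)   # D7Z1 is described as a 6-mer HOR
region_bp = interval_length(44788681, 50796794)  # tig00000794 interval

print(d7z2_unit_bp, d7z1_unit_bp, region_bp)
```

Under these assumptions the displayed interval covers roughly 6 Mb, and the two HOR units are nominally about 2.7 kb and 1.0 kb long; real monomers vary slightly in length, so observed unit sizes may differ.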
Fig. 2. Assembly of extra-long tandem repeats (ETRs).
(a) Long tandem repeats (shown as blue arrows) are flanked by unique sequences (indicated in green and orange to mark regions upstream and downstream of the repeat array). These transitions are confidently detected in long, error-prone ultra-long reads, and anchoring reads that fully traverse the repeat region bypasses the need for assembly. Rather, one can derive a consensus of the underlying repeat region. Alternatively, short (often single-nucleotide) unique markers within the repeat units are sufficient to distinguish copies of the repeat and lead to assembly of the array using mid-length reads (here defined as reads that are shorter than the array and cannot span it completely). Assembly using these rare, single-nucleotide markers requires extraordinary read quality, like CCS data from PacBio, to minimize false overlaps due to sequencing errors. (b) The structural variant (SV)-based assembly strategy shown labels each canonical HOR in blue, with the number of uninterrupted canonical repeats (e.g. 16x HOR) indicated. Rearranged HOR structures, or structural variants (SVs), are indicated as colored circles. The first 18 repeats are shown to illustrate the repeat heterogeneity in the array that can be used to guide assemblies of extremely high-quality mid-length reads. The challenge is traversing arrays where the spacing between unique markers is longer than the length of the read (e.g. a spacing greater than 20 kb, shown as a break in the assembly above in grey shading). The SV-based assembly method uses the spacing and organization of HOR rearrangements (colored circles) in array-assigned ultra-long reads. Overlap between SV-maps in ultra-long read data results in repeat contigs with improved sequence quality by consensus.
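
The idea of overlapping SV-maps described in panel (b) can be illustrated with a toy example. The sketch below is not the published assembly method: it assumes each read has already been reduced to an ordered list of per-HOR labels ("C" for canonical, any other label for a rearranged HOR), uses exact matching rather than the error-tolerant comparison real reads would need, and all function names are hypothetical.

```python
def sv_overlap(left, right, min_markers=1):
    """Return the longest suffix/prefix overlap (in HOR units) between two
    SV-maps that contains at least `min_markers` non-canonical HORs.

    Overlaps made only of canonical HORs are rejected, because identical
    canonical repeats cannot place one read unambiguously against another.
    """
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            markers = sum(1 for label in left[-k:] if label != "C")
            if markers >= min_markers:
                return k
    return 0


def merge_maps(left, right, k):
    """Join two SV-maps that overlap by k HOR units."""
    return left + right[k:]


# Toy reads: "A", "B", "D" stand for distinct HOR rearrangements.
read_1 = ["C", "C", "A", "C", "C", "C", "B", "C"]
read_2 = ["C", "B", "C", "C", "D", "C"]

k = sv_overlap(read_1, read_2)
merged = merge_maps(read_1, read_2, k)
print(k, merged)

# Two purely canonical reads share no informative overlap.
ambiguous = sv_overlap(["C", "C", "C"], ["C", "C", "C"])
```

The purely canonical case returns no overlap, which mirrors the caption's point that assembly breaks where the spacing between unique markers exceeds the read length.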
Fig. 3. High-resolution genomic study of the CHM13 T2T DXZ1 centromeric array.
(a) The CHM13 DXZ1 array is defined by 3.1 megabases of alpha satellite (shown as an orange band), which is interrupted once by an L1Hs LINE insertion (green, closest to the q-arm). The DXZ1 repeats are oriented from q-arm to p-arm (relative to the published BamHI DXZ1 repeat (GenBank: X02418)), with no shifts in repeat direction [21]. The ~2 kbp canonical repeat is shown in grey, and the positions of structural variants, or rearrangements (insertions/deletions), are noted with color. SVs that share a repeat structure are connected with a line or an arc. Canonical DXZ1 HORs have four active CENP-B boxes. Each HOR in the array was colored based on the number of active CENP-B boxes: light grey (4/4), teal (3/4), blue (2/4), purple (1/4), and dark purple (0/4). HORs with fewer than two active CENP-B boxes make up < 3% of the array and cannot be detected by eye at the resolution of the entire array. The plot of methylation data (obtained with nanopolish [100] from nanopore alignment and signal data) demonstrates a drop in methylation in the middle of the array. (b) Repeat variation patterns relative to the canonical 12-mer HOR, with blue circles mapping the sites of CENP-B boxes. SV HOR structures are colored to match the SV annotation in panel (a). Dashed lines mark sites of deletion. The L1Hs LINE element insertion is indicated as an inserted orange bar. Event numbers reference the occurrence of each SV in the CHM13 DXZ1 array. A consensus sequence was derived from the 1537 HORs. Pairwise alignments of each HOR with the derived consensus were used to generate a database of nucleotide differences and positions. The low-frequency variants (< 10% of the array) are shown in black. Light blue is used to show data in regions that span the 17-bp CENP-B box. Stars over the CENP-B boxes in the two A repeats indicate that the variant is high-frequency (greater than 10% of the HORs) and modifies one of the 9 conserved, functional bases in the motif. Red peaks show the remaining 35 high-frequency sites (37 in total, including the two peaks in the CENP-B boxes), that is, regions that differ from the consensus sequence in more than 10% of the HORs. (c) Pairwise identity between the ordered 1537 HORs at the 37 high-frequency variant positions is shown (using the heatmap function in R); large similarity domains are defined manually into two groups: A/A’ and B. The SV-annotation and CENP-B status data are positioned on either side of the matrix for genomic context. The array similarity matrix data are reverse-complemented relative to the array in (a) to match the orientation of the canonical published repeat.
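
The analysis outlined in panels (b) and (c) (derive a consensus, flag sites that differ from it in more than 10% of HORs, then compare HORs at those sites) can be sketched on toy data. This is an illustration under simplifying assumptions, not the pipeline used for the figure: the HORs are assumed to be pre-aligned strings of equal length, the consensus is a simple per-column majority vote, and the function names and the toy sequences are hypothetical.

```python
from collections import Counter


def consensus(hors):
    """Majority base at each column of equal-length, pre-aligned HOR strings."""
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*hors))


def high_frequency_sites(hors, cons, threshold=0.10):
    """Columns where more than `threshold` of HORs differ from the consensus."""
    n = len(hors)
    sites = []
    for i, col in enumerate(zip(*hors)):
        n_diff = sum(1 for base in col if base != cons[i])
        if n_diff / n > threshold:
            sites.append(i)
    return sites


def pairwise_identity(hors, sites):
    """Symmetric matrix of the fraction of `sites` at which two HORs agree."""
    n = len(hors)
    matrix = [[1.0] * n for _ in range(n)]
    if not sites:
        return matrix
    for a in range(n):
        for b in range(a + 1, n):
            same = sum(1 for i in sites if hors[a][i] == hors[b][i])
            matrix[a][b] = matrix[b][a] = same / len(sites)
    return matrix


# Toy array of ten 8-bp "HORs": column 2 varies in 3/10 copies (high
# frequency), column 5 varies in 1/10 copies (not above the 10% cutoff).
toy_hors = ["ACGTACGT"] * 6 + ["ACTTACGT"] * 3 + ["ACGTAAGT"]

toy_cons = consensus(toy_hors)
toy_sites = high_frequency_sites(toy_hors, toy_cons)
toy_matrix = pairwise_identity(toy_hors, toy_sites)
print(toy_cons, toy_sites, toy_matrix[0][6])
```

Restricting the comparison to high-frequency sites, as the caption describes, keeps rare variants and likely sequencing errors from dominating the similarity matrix. A matrix of this kind is what a heatmap function would then display.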

References

    1. Yunis JJ, Yasmineh WG, Heterochromatin, satellite DNA, and cell function, Science 174 (1971) 1200–1209.
    2. Vafa O, Sullivan KF, Chromatin containing CENP-A and α-satellite DNA is a major component of the inner kinetochore plate, Curr. Biol. 7 (1997) 897–900.
    3. Pardue ML, Gall JG, Chromosomal localization of mouse satellite DNA, Science 168 (1970) 1356–1358.
    4. Henikoff S, Ahmad K, Malik HS, The centromere paradox: stable inheritance with rapidly evolving DNA, Science 293 (2001) 1098–1102.
    5. Melters DP, Bradnam KR, Young HA, Telis N, May MR, Ruby JG, Sebra R, Peluso P, Eid J, Rank D, Garcia JF, DeRisi JL, Smith T, Tobias C, Ross-Ibarra J, Korf I, Chan SWL, Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution, Genome Biol. 14 (2013) R10.
