Genome Res. 2011 Jan;21(1):137-45.
doi: 10.1101/gr.111278.110. Epub 2010 Nov 16.

Genome-wide characterization of centromeric satellites from multiple mammalian genomes

Can Alkan et al. Genome Res. 2011 Jan.

Abstract

Despite its importance in cell biology and evolution, the centromere has remained the final frontier in genome assembly and annotation due to its complex repeat structure. However, isolation and characterization of the centromeric repeats from newly sequenced species are necessary for a complete understanding of genome evolution and function. In recent years, various genomes have been sequenced, but the characterization of the corresponding centromeric DNA has lagged behind. Here, we present a computational method (RepeatNet) to systematically identify higher-order repeat structures from unassembled whole-genome shotgun sequence and test whether these sequence elements correspond to functional centromeric sequences. We analyzed genome datasets from six species of mammals representing the diversity of the mammalian lineage, namely, horse, dog, elephant, armadillo, opossum, and platypus. We define candidate monomer satellite repeats and demonstrate centromeric localization for five of the six genomes. Our analysis revealed the greatest diversity of centromeric sequences in horse and dog in contrast to elephant and armadillo, which showed high-centromeric sequence homogeneity. We could not isolate centromeric sequences within the platypus genome, suggesting that centromeres in platypus are not enriched in satellite DNA. Our method can be applied to the characterization of thousands of other vertebrate genomes anticipated for sequencing in the near future, providing an important tool for annotation of centromeres.

Figures

Figure 1.
The RepeatNet algorithm. (A) The layout of the alphoid repeat array in the centromere and the paired-end inserts in the centromeric region is shown. Note that since the centromere is larger than the inserts (fosmids, plasmids, BACs, or short inserts used in next-generation sequencing), both ends of the same insert contain alphoid sequence. (B) Close-up view of a paired-end insert over the alphoid repeat array. We also show all possible k-mers (sliding by 1 bp) that can be generated from the reads. (C) The ideal case for the k-mer structure in the end sequences. When both ends of a paired-end insert contain alphoid sequence, we expect that the k-mers in the forward end will be represented with their reverse-complement counterparts in the reverse end. For simplicity, we show only the nonoverlapping k-mers; however, RepeatNet considers all possible overlapping k-mers. In this figure, w1-m1, w2-m2, w3-m3, w′1-m′1, w′2-m′2, w′3-m′3, w′′1-m′′1, w′′2-m′′2, w′′3-m′′3 are the k-mer pairs that are reverse complements of each other, and the triplet k-mer groups (w1-w′1-w′′1), (w2-w′2-w′′2), (w3-w′3-w′′3) are highly similar k-mers. In the case of exact repeats, these k-mers are identical. (D) Since k-mer pairs w1-m1, w2-m2, and w3-m3 exist in the same read pairs, we put an edge between the nodes that represent such k-mers. (E) The repeat graph for the ideal case of a 31-mer tandem repeat with exact repeat units is shown. This graph includes 20 vertices for 20 k-mer pairs that can be generated from a 31-mer repeat structure, and there exists an edge between all pairs of k-mers. Note that this graph is a clique of size 20. For non-ideal cases, the clique property will be lost; however, the graph will still be very dense in terms of the average degree of the vertices. RepeatNet finds such dense subgraphs of the repeat graph with a heuristic that selects the vertex with the highest degree, and other vertices that share an edge with this selected vertex. 
Alternatively, a maximum density subgraph algorithm can be used (Fratkin et al. 2006), though this algorithm has a high running time complexity of O(n·m·log(n²/m)).
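
The graph construction and the highest-degree heuristic described in the Figure 1 caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' RepeatNet implementation: the function names (`build_repeat_graph`, `densest_neighborhood`), the use of a canonical k-mer (the lexicographically smaller of a k-mer and its reverse complement) to represent each k-mer pair as one vertex, and the toy 8-bp repeat unit are all hypothetical choices made for the example.

```python
from collections import defaultdict
from itertools import combinations

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string over A/C/G/T."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(read, k):
    """All overlapping k-mers of a read (sliding by 1 bp), as a set."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def canonical(kmer):
    """Assumed vertex label: one name for a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def build_repeat_graph(read_pairs, k):
    """Connect k-mer pairs that are supported by the same paired-end insert.

    A forward k-mer is "supported" when its reverse complement occurs in
    the reverse end of the same insert. All supported k-mer pairs of one
    insert are joined by edges.
    """
    graph = defaultdict(set)
    for fwd, rev in read_pairs:
        rev_kmers = kmers(rev, k)
        supported = {canonical(w) for w in kmers(fwd, k) if revcomp(w) in rev_kmers}
        for u, v in combinations(sorted(supported), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def densest_neighborhood(graph):
    """Heuristic from the caption: highest-degree vertex plus its neighbors."""
    if not graph:
        return set()
    hub = max(graph, key=lambda v: (len(graph[v]), v))  # ties broken by label
    return {hub} | graph[hub]


# Toy data (hypothetical): an exact tandem array of an 8-bp unit.
unit = "ACCGTTAG"
array = unit * 6
fwd = array[0:20]             # forward end of an insert inside the array
rev = revcomp(array[24:44])   # reverse end, sequenced from the other strand
pairs = [(fwd, rev), ("TTTTTTTTTT", "GGGGGGGGGG")]  # second pair: no support

graph = build_repeat_graph(pairs, k=4)
cluster = densest_neighborhood(graph)
```

With exact repeat units every supported k-mer pair co-occurs in the same insert, so the toy graph is a clique and the heuristic returns all of its vertices. With diverged repeat units some edges would be missing, and the selected neighborhood would be a dense but incomplete subgraph.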
Figure 2.
Examples of FISH results on ECA metaphase spreads using horse PCR products obtained with primers designed on ECA consensus sequences (Table 1). A partial RepeatNet graph is shown in (C), with two different clusters colored red and green. ECAcons70 and ECAcons71 were extracted from the red cluster, while ECA1cons421, ECA2cons424, ECA3cons221, ECA4cons450, and ECA5cons451 were obtained from the green cluster. (A) FISH with PCR product of ECAcons70. (B) FISH with PCR product of ECAcons71. (D) FISH with PCR product of ECAcons421 + 424. (E) FISH with PCR product of ECA3cons221. (F) FISH with PCR product of ECA4cons450. (G) FISH with PCR product of ECA5cons451.
Figure 3.
CENPB box-like motifs extracted from consensus sequences. Conserved bases in the evolutionarily conserved domain (ECD) are shown in red, and bases outside the ECD that are conserved with human (HSA) are shown in blue. The total number of conserved bases is given in the last column. At left is a phylogenetic tree according to Prasad et al. (2008).

References

    1. Alexandrov IA, Mitkevich SP, Yurov YB 1988. The phylogeny of human chromosome specific alpha satellites. Chromosoma 96: 443–453
    2. Alexandrov I, Kazakov A, Tumeneva I, Shepelev V, Yurov Y 2001. Alpha-satellite DNA of primates: Old and new families. Chromosoma 110: 253–266
    3. Alkan C, Ventura M, Archidiacono N, Rocchi M, Sahinalp SC, Eichler EE 2007. Organization and evolution of primate centromeric DNA from whole-genome shotgun sequence data. PLoS Comput Biol 3: 1807–1818
    4. Alves G, Seuanez HN, Fanning T 1994. Alpha satellite DNA in neotropical primates (Platyrrhini). Chromosoma 103: 262–267
    5. Benson G 1999. Tandem repeats finder: A program to analyze DNA sequences. Nucleic Acids Res 27: 573–580
