Genome Biol. 2013 Jan 30;14(1):R10.
doi: 10.1186/gb-2013-14-1-r10.

Comparative analysis of tandem repeats from hundreds of species reveals unique insights into centromere evolution

Daniël P Melters et al. Genome Biol. 2013.

Abstract

Background: Centromeres are essential for chromosome segregation, yet their DNA sequences evolve rapidly. In most animals and plants that have been studied, centromeres contain megabase-scale arrays of tandem repeats. Despite their importance, very little is known about the degree to which centromere tandem repeats share common properties between different species across different phyla. We used bioinformatic methods to identify high-copy tandem repeats from 282 species using publicly available genomic sequence and our own data.

Results: Our methods are compatible with all current sequencing technologies. Long Pacific Biosciences sequence reads allowed us to find tandem repeat monomers up to 1,419 bp. We assumed that the most abundant tandem repeat is the centromere DNA, which was true for most species whose centromeres have been previously characterized, suggesting this is a general property of genomes. High-copy centromere tandem repeats were found in almost all animal and plant genomes, but repeat monomers were highly variable in sequence composition and length. Furthermore, phylogenetic analysis of sequence homology showed little evidence of sequence conservation beyond approximately 50 million years of divergence. We find that despite an overall lack of sequence conservation, centromere tandem repeats from diverse species showed similar modes of evolution.

Conclusions: While centromere position in most eukaryotes is epigenetically determined, our results indicate that tandem repeats are highly prevalent at centromeres of both animal and plant genomes. This suggests a functional role for such repeats, perhaps in promoting concerted evolution of centromere DNA across chromosomes.

Figures

Figure 1
A bioinformatic pipeline to identify candidate centromere DNAs based on their tandem repeat nature and abundance. (a) Random shotgun sequences from a variety of platforms can be used to identify the most common tandem repeat monomer. Sanger and PacBio reads are usually long enough to contain multiple copies of a tandem repeat. Illumina and 454 reads are generally too short, and must be assembled to create longer sequences. Tandem repeat monomers were identified by Tandem Repeats Finder (TRF). (b) Identification of known centromere tandem repeats from three species. The human centromere repeat is 171 bp in length. The 728-bp monkeyflower centromere repeat is too long to be found in Sanger reads, but a PRICE assembly of Illumina reads reveals the known repeat. The 1,419-bp cattle centromere repeat and a less abundant 680-bp tandem repeat were directly identified from PacBio reads. Note that the graph for monkeyflower has no background of low abundance tandem repeats because these were not assembled by PRICE. (c) Three examples of de novo identification of centromere tandem repeats. Sanger WGS reads from the American pika, Hydra, and Colorado Blue Columbine revealed 253-bp, 183-bp, and 329-bp repeat monomers, respectively. nt, nucleotides.
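The selection step in panel (a), choosing the most abundant tandem repeat monomer as the centromere candidate, can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the input is a hypothetical list of `(monomer_length_bp, copies)` tuples standing in for parsed repeat-finder output (it is not TRF's actual output format), and repeats are grouped by monomer length only, whereas a real analysis would presumably also compare monomer sequences.

```python
from collections import defaultdict


def most_abundant_monomer(repeats, total_bases):
    """Return (monomer_length, genomic_fraction) for the most abundant repeat.

    repeats     -- iterable of (monomer_length_bp, copies) tuples, one per
                   tandem array found in the reads (hypothetical format)
    total_bases -- total number of sequenced bases that were searched
    """
    bases_by_length = defaultdict(float)
    for monomer_length, copies in repeats:
        # Bases covered by one array = monomer length x number of copies.
        bases_by_length[monomer_length] += monomer_length * copies

    # The candidate is the monomer length covering the most bases overall.
    best = max(bases_by_length, key=bases_by_length.get)
    return best, bases_by_length[best] / total_bases


# Made-up example values, for illustration only.
hits = [(171, 4.0), (171, 6.5), (32, 3.0), (171, 2.0), (5, 10.0)]
length, fraction = most_abundant_monomer(hits, total_bases=100_000)
print(length, fraction)
```

Ranking by bases covered rather than by number of arrays keeps many short microsatellite hits from outranking a few long arrays of a larger monomer.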
Figure 2
Centromere tandem repeat details from diverse animal and plant genomes. The phylogenetic relationships between 282 species (204 Animalia and 78 Plantae) are shown. For each species, the figure shows tandem repeat length, GC content, and genomic fraction (log 2 scale) for the (candidate) centromere repeat monomer. Taxonomic relationships were derived from the NCBI taxonomy website. Approximately one-third of the species (84 out of 282) could be clustered into 26 groups (light red horizontal bars) that exhibited sequence similarity of the tandem repeat monomer within each group. No sequence similarity was found outside these groups, or between them. The most distantly related species within a group diverged about 50 million years ago.
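Two of the per-species quantities plotted here, GC content and genomic fraction on a log 2 scale, are simple to compute. The sketch below assumes a monomer given as a plain nucleotide string and a genomic fraction defined as repeat bases divided by total sequenced bases; the function names are illustrative and the caption does not specify the authors' exact definitions.

```python
import math


def gc_content(seq):
    """Fraction of G/C among unambiguous bases (A, C, G, T) in seq."""
    counted = [base for base in seq.upper() if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(base in "GC" for base in counted) / len(counted)


def log2_genomic_fraction(repeat_bases, total_bases):
    """log2 of the share of sequenced bases covered by the repeat."""
    return math.log2(repeat_bases / total_bases)


# Made-up example values, for illustration only.
print(gc_content("ACGTNN"))
print(log2_genomic_fraction(1, 8))
```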
Figure 3
Evolution by indel acquisition and coexistence of repeat variants support the 'library' hypothesis. (a) Candidate centromere repeat sequences of eight cichlids were analyzed for interspecies sequence similarity. The Princess cichlid Neolamprologus brichardi lacked centromere repeat similarity with its sister clade of Lake Malawi cichlids (shown in orange, and also including Nile tilapia). (b) Sequence alignment of candidate centromere repeats shows that Nile tilapia (Oreochromis niloticus) has a deletion relative to other cichlid species. (c) Candidate centromere repeat sequences of 15 grass species were analyzed for interspecies sequence similarity. We found two groups of species with centromere repeat sequences that were similar. The closely related Sorghum and Miscanthus species have similar 137 bp repeats (blue bars). The clade shown by red bars contains Oryza sativa (rice), which is relatively distant from the other species that have similar centromere tandem repeats (red bars). Although the centromere repeats of Oryza brachyantha and Brachypodium distachyon have repeat monomer length similar to the orange-highlighted group, no sequence similarity was found between them. Interestingly, no sequence similarity was found between the closely related Zea species and Sorghum species or between Oryza species and Brachypodium, Aegilops, or Hordeum. (d) Sequence alignment of candidate centromere repeats from eight grass species. Switchgrass (Panicum virgatum) is distinguished by the presence of a short insertion relative to the other species.
Figure 4
Centromere tandem repeat monomers are conserved only between closely related species. (a) Percentage identity between candidate centromere repeat sequences plotted against estimated divergence time. We averaged percentage identity between comparisons to generate a single value for each node in the phylogenetic tree (Figure 2). To accommodate unresolved relationships, we repeated the analysis on random resolutions of the tree. One such analysis is shown (quantitative results were very similar between analyses). (b) For primates and grasses, the phylogenetic signal was tested using Blomberg's K analysis for three different parameters: repeat monomer length, repeat monomer GC content and genomic abundance. In primates both repeat length and GC content were more conserved than expected (K > 1), whereas genomic abundance was less conserved than expected by a model of Brownian evolution (K < 1). Though K < 1 for all three traits in the grasses, none were significantly different from 1. P-values are shown in brackets.
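The quantity in panel (a), percentage identity between two repeat monomers, can be illustrated with a small function over a pre-computed pairwise alignment. The caption does not state how gaps were counted, so the convention below (a gap opposite a base counts as a mismatch, columns that are gaps in both sequences are ignored) is an assumption, as is the function name.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical columns between two aligned sequences.

    Both inputs must already be aligned to the same length. A gap facing
    a base counts as a mismatch; gap-gap columns are skipped.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Made-up example alignment, for illustration only.
print(percent_identity("ACGT", "ACGA"))
```

Averaging such pairwise values over all species pairs that span a given node of the tree would give one value per node, as the caption describes.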
Figure 5
Centromere tandem repeats lack conserved sequence properties. (a-c) No strong bias was observed in distribution of centromere repeat monomer length (a), GC content (b), or genomic fraction (c).
Figure 6
Higher order repeat structures are prevalent in diverse animals and plants. (a) Graphical representation of higher order repeat structure compared to simple monomer repeats. In the higher order repeat, two variants, A and B, form a single dimer repeat that is repeated in tandem. When plotting repeat monomer length by GC content by genomic fraction, two distinct peaks are seen for Sorghum bicolor. The second peak (2) is exactly double the length of the first peak (1). (b) Sequence alignment of repeat units from a single Sorghum bicolor Sanger read that exhibits a higher order repeat structure consisting of an AB dimer. The arrows point to SNPs unique for either the A or B repeat of the dimer. (c) Neighbor joining analysis showing grouping of A and B repeats from sequence alignment in B. Bootstrap numbers are shown. (d) Higher order repeat structures can lead to novel centromere repeats. In New World monkeys, the two halves of the 343-bp monomer are weakly related to each other and to the 171-bp repeat in Old World monkeys and apes.
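The observation in panel (a), a second abundance peak at exactly double the monomer length, suggests a simple screen for candidate higher order repeats: look for peak lengths that are integer multiples of the shortest peak. The sketch below is only an illustration of that idea under stated assumptions. The function name and the `tolerance` parameter are hypothetical, and a length multiple on its own does not establish a higher order structure, which in the paper is supported by sequence alignment of the A and B variants.

```python
def find_hor_multiples(peak_lengths, tolerance=0):
    """Map each peak length to its multiple of the shortest peak.

    peak_lengths -- monomer lengths (bp) at which abundance peaks occur
    tolerance    -- allowed deviation in bp from an exact multiple
    Returns {length: multiple} for peaks that are >= 2x the base length.
    Raises ValueError if peak_lengths is empty.
    """
    base = min(peak_lengths)
    multiples = {}
    for length in sorted(peak_lengths):
        if length == base:
            continue
        multiple = round(length / base)
        if multiple >= 2 and abs(length - multiple * base) <= tolerance:
            multiples[length] = multiple
    return multiples


# Illustrative values: a base peak and a second peak at double its length.
print(find_hor_multiples([137, 274]))
```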
Figure 7
Chromosomal localization of repeat variants in grasses is consistent with repeat abundance measured by our bioinformatic pipeline. Chromosomal localization of the different grass repeat variants (maize variant A, switchgrass variants B1 and B2, witchgrass variant C, and foxtail millet variant D) was determined by FISH on metaphase chromosomes of maize (Zea mays), switchgrass (Panicum virgatum), witchgrass (Panicum capillare), and foxtail millet (Setaria italica). Switchgrass variants B1 and B2 differ by a 9-bp deletion, whereas both variants differ from maize, witchgrass and foxtail millet by a 20-bp insertion. Maize and foxtail millet chromosomes hybridized only to variants A and D, respectively. Only one switchgrass chromosome hybridized to variant A (arrow), but variants B1, B2 and C labeled most chromosomes (arrowheads indicate chromosomes that showed weaker hybridization to variant C). Witchgrass chromosomes were most consistently labeled by variant C, but showed chromosome-specific hybridization to variants B1 and B2, consistent with their lower abundance in the genome. In all cases the FISH probes hybridized to the primary constriction, which is indicative of centromere localization. The percentages below the panels represent computational predictions of repeat variant ratios in each species.
Figure 8
Pacific Biosciences sequencing shows homogeneity of repeat arrays and detects long higher order repeat structures. (a) Switchgrass variant B1 hybridized to all switchgrass chromosomes, whereas witchgrass variant C hybridized to all but three switchgrass chromosomes. The three chromosomes that only showed hybridization of variant B1 (arrows) were stained green (see merged). (b) Although both switchgrass variants B1 and B2 co-hybridize to all switchgrass chromosomes, the hybridization signal showed a chromosome-specific pattern. The arrows highlight chromosomes with stronger hybridization signal for one sub-variant over the other. (c) The strength of PacBio sequencing is the extreme length of a small fraction of the reads. In the AP13 switchgrass PacBio sequencing run, the longest inserted sequence was almost 12 kbp in length, although the mean of all the PacBio reads was about 2 kbp. Sanger reads are shorter, but have a more consistent length, whereas both Illumina and 454 reads are very short and very homogeneous in length (only the longest reads in our study are shown). (d) Although no repeat variant mixing was detected in the PacBio reads, several HOR structures were found in longer PacBio reads. These HOR structures consisted of a mixture of complete and truncated repeats. Two switchgrass variant B1 centromere reads with higher order structure and one switchgrass variant B2 centromere repeat are shown. The 1,131-bp HOR structure consisted of six repeat monomers and a truncated repeat (about one-third the size of the 175-bp repeat). In total, five and a half copies of the 1,131-bp repeat were found within the 7 kbp read. One variant B2-containing read is shown, containing three copies of an 886-bp HOR structure (composed of six 166-bp repeats).
