Review
Cells. 2020 Dec 18;9(12):2714.
doi: 10.3390/cells9122714.

Dark Matter of Primate Genomes: Satellite DNA Repeats and Their Evolutionary Dynamics

Syed Farhan Ahmad et al. Cells. 2020.

Abstract

A substantial portion of the primate genome is composed of non-coding regions, so-called "dark matter", which includes an abundance of tandemly repeated sequences called satellite DNA. Collectively known as the satellitome, this genomic component offers exciting evolutionary insights into aspects of primate genome biology that raise new questions and challenge existing paradigms. A complete human reference genome was recently reported with a telomere-to-telomere assembly of the human X chromosome that resolved hundreds of dark regions, encompassing a 3.1 Mb centromeric satellite array that had not been identified previously. With the recent exponential increase in the availability of primate genomes, and the development of modern genomic and bioinformatics tools, extensive growth in our knowledge concerning the structure, function, and evolution of satellite elements is expected. The current state of knowledge on this topic is summarized, highlighting various types of primate-specific satellite repeats to compare their proportions across diverse lineages. Inter- and intraspecific variation of satellite repeats in the primate genome is reviewed. The functional significance of these sequences is discussed by describing how the transcriptional activity of satellite repeats can affect gene expression during different cellular processes. Sex-linked satellites are outlined, together with their respective genomic organization. Mechanisms are proposed whereby satellite repeats might have emerged as novel sequences during different evolutionary phases. Finally, the main challenges that hinder the detection of satellite DNA are outlined and an overview of the latest methodologies to address technological limitations is presented.

Keywords: alpha satellite; centromere; evolution; heterochromatin; non-human primates; tandem repeats.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 2
A comprehensive phylogeny of 301 primate species based on mitochondrial DNA sequences using Bayesian inference. Pie charts for selected common primate species show percentage differences of repeat types in the respective genomes. The abundance of satellite DNA in primate genomes varies considerably among lineages (red areas of the pie charts). Additionally, the comparative repeatomic landscape shows that LINEs and SINEs emerged as the most expanded elements of primate genomes (blue and orange areas of the pie charts), with a consistent pattern across diverse lineages. Phylogenetic data were retrieved from the Primates Section of the 10kTrees website (https://10ktrees.nunn-lab.org/Primates/dataset.html) [53]. The phylogenetic tree was customized and filtered using iTOL v5 software (Interactive Tree Of Life; https://itol.embl.de/) [54]. Different colors represent different clades. Cartoons of the representative primates were drawn using Inkscape software.
Figure 3
Schematic illustration of satellite DNA repeats and their organization in primate genomes. (a) (i) Primate centromeric (red) and pericentromeric (green) regions are enriched with alpha satellite (AS) DNA, the most abundant satellite repeat of primate genomes, which forms the bulk of the heterochromatin core. (ii) A sketch highlighting the orthologous chromosomes and centromeric repositioning as evolutionary new centromeres (ENCs) between human and rhesus macaque. The Circos plot depicts the syntenic relationship between the two genomes and was generated using Synteny Portal [117]. Note that human chromosome 6 is completely orthologous to macaque chromosome 4, with evolved centromeres [109,116]. (iii) The AS monomers constitute the tandem repeat units (blue triangles) and can be organized either as disordered (monomeric) arrays, mostly located in pericentromeres, or as higher-order repeats (HORs), highly ordered in a head-to-tail fashion and forming longer arrays in centromeres. Some monomers may also contain a short sequence termed the CENP-B box (yellow line), which is bound by the DNA-binding centromere protein CENP-B. Diverged monomers (orange and dark triangles) and interspersed repeats (purple rectangles) are also depicted. (b) Telomeric and subtelomeric regions of primate chromosomes are enriched with distinct microsatellites (light blue) and minisatellites (dark blue). Various primate-associated satellite examples are shown.
Figure 1
Overview of repeat contents in vertebrates and primate genome size variation. (a) Relationship of genome size (C value; red line) with repeat contents (blue bars) in representative vertebrate organisms. (b) Boxplot of the distribution and variability of genome size among primate families. Each dot represents a primate species. Note: Data are sourced from the Animal Genome Size Database (http://www.genomesize.com/) [14] and graphics were created in R.
Figure 4
Diagrammatic summary of satellite DNA transcription highlighting various functional roles in cellular processes. The most studied transcripts of satellite repeats are those localized in the centromeres. The centromere/pericentromere core contains the active regions with satellite sequences that can be transcribed into satncRNA. These satncRNAs are associated with various functions. For example, during cellular stress, the satncRNA can regulate the expression of important genes, such as HSF1 (Heat Shock Factor 1), to produce nuclear stress bodies. In addition, satncRNA can also regulate the splicing of associated genes that are vital in stress responses. More importantly, satncRNA transcripts have been linked with centromere-related functions and cell-cycle progression. During the G1 phase, the satncRNA can facilitate the loading of CENP-A (yellow circle) at centromeres, which is distributed to every daughter strand in the S phase. In the G2 phase, the satncRNA transcripts form associations with SUV39H1 (purple pentagon) before initiation of cell division. During mitosis, satncRNA binds with SGO1 and AURORA B proteins, and assists in kinetochore assembly, spindle attachment, and chromosome segregation-related functions.
Figure 5
Model of satellite DNA evolution. Genomic birth of satellite DNA can occur as a result of different mechanisms. According to the Library hypothesis, the two main proposed phenomena are DNA replication slippage and unequal crossing over, which can cause mutations and de novo formation of satellite DNA (variants shown as red triangles). The newly formed satellite region can undergo several duplications and subsequent transposition events that expand the new satellite throughout the genome. Transposable elements mediate the spreading of newly evolved satellite repeats to different loci. This is followed by cohesive evolution of the genomic region to homogenize the entire array through selection. Finally, the evolved satellite repeats are established by sexual reproduction. The chromosome region with expanded satellite is inherited preferentially via a molecular mechanism known as “drive”. Two homologs have the same satellite DNA (red) but with a larger centromere, and one homolog with expanded satellites is attracted by spindle fibers and driven to the daughter cells.
Figure 6
Schematic illustration of genome assembly limitations and new next-generation sequencing approaches for satellite DNA analysis. (a) Centromere region of the human chromosome that could not be assembled using short-read sequencing and is not represented in primary human assemblies. Assembly algorithms cannot resolve short reads (red lines) from the centromere owing to the high-level reiteration of tandem repeats (black triangles), so these regions are not recovered in the assembly. The fragmented assembly may contain gaps and therefore miss satellite DNA sequences, which biases genome annotation. (b) A cheap alternative for analyzing satellites directly is the use of clustering-based pipelines, which can graphically predict different repeats and cluster them into groups. These programs yield assembled contigs, which can be further utilized for downstream analyses, such as repeat abundance, divergence, and comparative genomic analysis. (c) Recent developments (ultra-long-read technology) have successfully recovered the complete human genome assembly [279].
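The limitation described for Figure 6a can be illustrated with a toy example. The Python sketch below is not taken from the article; the 8-bp monomer, the flanking sequences, the read length, and all function names are hypothetical choices made only for illustration. It shows that the set of distinct short reads drawn from a perfect tandem array is the same whether the array holds 10 or 50 copies, so short reads alone cannot tell an assembler how long the array is, whereas a single read spanning the array and both unique flanks recovers the copy number.

```python
def tandem_array(monomer, copies):
    """Build a perfect head-to-tail tandem array (no diverged monomers)."""
    return monomer * copies


def distinct_reads(sequence, read_len):
    """Return the set of all error-free reads of a fixed length from a sequence."""
    return {sequence[i:i + read_len] for i in range(len(sequence) - read_len + 1)}


def copies_from_spanning_read(read, left_flank, right_flank, monomer):
    """Infer copy number from one read that covers both flanks and the array.

    Returns None if the read does not have the expected flank-array-flank layout.
    """
    if not (read.startswith(left_flank) and read.endswith(right_flank)):
        return None
    core = read[len(left_flank):len(read) - len(right_flank)]
    copies, remainder = divmod(len(core), len(monomer))
    if remainder != 0 or core != monomer * copies:
        return None
    return copies


# Hypothetical toy values, not real satellite sequences.
MONOMER = "ACGTTGCA"
LEFT, RIGHT = "GGGGCCCC", "TTTTAAAA"
SHORT_READ_LEN = 6  # shorter than the monomer, like short reads vs. a long array

short_reads_10 = distinct_reads(tandem_array(MONOMER, 10), SHORT_READ_LEN)
short_reads_50 = distinct_reads(tandem_array(MONOMER, 50), SHORT_READ_LEN)

# Short reads from inside the array carry no information about copy number.
short_reads_are_identical = short_reads_10 == short_reads_50

# One long read spanning the whole locus resolves the array length.
long_read = LEFT + tandem_array(MONOMER, 50) + RIGHT
resolved_copies = copies_from_spanning_read(long_read, LEFT, RIGHT, MONOMER)
```

Real satellite arrays contain diverged monomers and sequencing errors, so this sketch only captures the idealized case; it is meant to convey why reads shorter than the repeated region collapse in an assembly, not how any particular assembler or clustering pipeline works.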

References

    1. Rogers J., Gibbs R.A. Comparative primate genomics: Emerging patterns of genome content and dynamics. Nat. Rev. Genet. 2014;15:347–359. doi: 10.1038/nrg3707.
    2. Enard W., Pääbo S. Comparative primate genomics. Annu. Rev. Genom. Hum. Genet. 2004;5:351–378. doi: 10.1146/annurev.genom.5.061903.180040.
    3. Mikkelsen T.S., Hillier L.W., Eichler E.E., Zody M.C., Jaffe D.B., Yang S.P., Enard W., Hellmann I., Lindblad-Toh K., Altheide T.K., et al. Initial sequence of the chimpanzee genome and comparison with the human genome. Nature. 2005;437:69–87. doi: 10.1038/nature04072.
    4. Gibbs R.A., Rogers J., Katze M.G., Bumgarner R., Weinstock G.M., Mardis E.R., Remington K.A., Strausberg R.L., Venter J.C., Wilson R.K., et al. Evolutionary and biomedical insights from the rhesus macaque genome. Science. 2007;316:222–234. doi: 10.1126/science.1139247.
    5. Alföldi J., Lindblad-Toh K. Comparative genomics as a tool to understand evolution and disease. Genome Res. 2013;23:1063–1068. doi: 10.1101/gr.157503.113.
