Review
Biology (Basel). 2023 Jun 13;12(6):849.
doi: 10.3390/biology12060849.

Compositional Structure of the Genome: A Review

Pedro Bernaola-Galván et al. Biology (Basel).

Abstract

As the genome carries the historical information of a species' biotic and environmental interactions, analyzing changes in genome structure over time with powerful statistical physics methods (such as entropic segmentation algorithms, fluctuation analysis in DNA walks, or measures of compositional complexity) provides valuable insights into genome evolution. Nucleotide frequencies tend to vary along the DNA chain, resulting in a hierarchically patchy chromosome structure with heterogeneities at length scales that range from a few nucleotides to tens of millions of them. Fluctuation analysis reveals that these compositional structures can be classified into three main categories: (1) short-range heterogeneities (below a few kilobase pairs (Kbp)), primarily attributed to the alternation of coding and noncoding regions, the density of interspersed or tandem repeats, etc.; (2) isochores, spanning tens to hundreds of tens of Kbp; and (3) superstructures, reaching sizes of tens of megabase pairs (Mbp) or even larger. The isochore and superstructure coordinates obtained for the first complete T2T human sequence are now shared in a public database. Interested researchers can therefore use the T2T isochore data, together with the annotations for different genome elements, to test specific hypotheses about genome structure. As at other levels of biological organization, a hierarchical compositional structure is prevalent in the genome. Once the compositional structure of a genome is identified, various measures can be derived to quantify its heterogeneity. The distribution of segment G+C content has recently been proposed as a new genome signature that proves useful for comparing complete genomes. Another meaningful measure is the sequence compositional complexity (SCC), which has been used for genome structure comparisons.
Lastly, we review the recent genome comparisons in species of the ancient phylum Cyanobacteria, conducted by phylogenetic regression of SCC against time, which have revealed positive trends towards higher genome complexity. These findings provide the first evidence for a driven progressive evolution of genome compositional structure.
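
The ideas behind entropic segmentation and SCC mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the commonly used entropy-based formulation, in which the complexity of a segmented sequence is the Shannon entropy of the whole sequence minus the length-weighted mean entropy of its segments (a generalized Jensen-Shannon divergence); the abstract itself does not state the formula. The function names are hypothetical, and the exhaustive search for a single cut is only illustrative: published segmentation algorithms also apply a statistical significance test before accepting a cut, which is omitted here.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Shannon entropy (in bits) of the symbol frequencies in seq."""
    n = len(seq)
    if n == 0:
        return 0.0
    counts = Counter(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def compositional_complexity(seq, boundaries):
    """Entropy of the whole sequence minus the length-weighted mean
    entropy of the segments delimited by `boundaries` (cut positions).

    Assumed formulation; it equals the Jensen-Shannon divergence
    among the segments' compositions.
    """
    cuts = [0, *sorted(boundaries), len(seq)]
    segments = [seq[a:b] for a, b in zip(cuts, cuts[1:])]
    weighted = sum(len(s) / len(seq) * shannon_entropy(s) for s in segments)
    return shannon_entropy(seq) - weighted


def best_cut(seq, min_len=1):
    """Cut position that maximizes the divergence between the left and
    right parts (brute force, quadratic time; for illustration only)."""
    best_pos, best_div = None, -1.0
    for i in range(min_len, len(seq) - min_len + 1):
        div = compositional_complexity(seq, [i])
        if div > best_div:
            best_pos, best_div = i, div
    return best_pos, best_div


# Toy sequence: an A/T-rich patch followed by a G/C-rich patch.
toy = "AT" * 10 + "GC" * 10
position, divergence = best_cut(toy)
print(position, divergence)
```

On this toy sequence the best cut falls at the junction between the two patches. Applying the same search recursively to each resulting part, and stopping when no cut is statistically significant, is the general shape of a hierarchical segmentation.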

Keywords: DNA compositional structure; evolutionary adaptive trends; hierarchical genome structure; segment compositional signature; sequence compositional complexity.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
T2T human isochores. The image shows the isochore map of the T2T-CHM13 DNA sequence of human chromosome 1, obtained by plotting the isochores predicted by IsoFinder [19] with the help of the UCSC Genome Browser [56,57]. The blue lines indicate the GC content of each isochore. The complete chromosome sequence was obtained by the Telomere-to-Telomere (T2T) Consortium [13], which includes gapless assemblies for all chromosomes except Y. The completed regions include all centromeric satellite arrays and recent segmental duplications. Tracks for G+C density in 5-base windows, genes, and CpG islands, taken from the UCSC Genome Browser database, are also plotted for comparison. The online isochore maps for all chromosomes are available at the UCSC Genome Browser: https://genome.ucsc.edu/s/oliver/T2T%20human%20isochores (accessed on 20 April 2023).
Figure 2
Variation in nucleotide composition along T2T human chromosome 22 at scales between 10 Kbp and 4 Mbp, as revealed by wavelets. The two genome superstructures of this chromosome (green on the left and red on the right) are clearly revealed. The finer-grained isochore structure at lower scales is also discernible.
Figure 3
Comparison of isochore and superstructure maps in human T2T chromosome 21 by means of the UCSC Genome Browser. The online image is available at: https://genome.ucsc.edu/cgi-bin/hgTracks?db=hub_3267197_GCA_009914755.4&lastVirtModeType=default&lastVirtModeExtraState=&virtModeType=default&virtMode=0&nonVirtPosition=&position=chr21%3A1%2D45090682&hgsid=1583990213_Mq2SxT3AB7CVJP4gui2lAeO3ZljM (accessed on 20 April 2023). The blue lines indicate the GC content of each isochore or superstructure. The orange arrows point to a region of chromosome 21 with low GC content, known as the big ‘gene desert’. The green arrows indicate a region where isochore and superstructure boundaries overlap.
Figure 4
G+C composition histograms of the segments obtained by applying a segmentation algorithm, at the s = 0.95 significance level, to the complete genomes of three primates: human (a), gorilla (b), and chimpanzee (c); three carnivores: cat (d), dog (e), and polecat (f); and three rodents: rat (g), mouse (h), and Chinese hamster (i). Note that all histograms in the same row, which correspond to closely related species in terms of evolutionary divergence time (http://www.timetree.org (accessed on 20 April 2023)), look quite similar to each other.
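
A segment G+C histogram of the kind described in the Figure 4 caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the segments are taken as already computed (the segmentation step is not shown), each segment counts once regardless of its length, and the bin count and function names are hypothetical choices rather than the authors' settings.

```python
import math


def gc_content(segment):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    s = segment.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")  # e.g. a run of N's carries no information
    return (s.count("G") + s.count("C")) / acgt


def gc_histogram(segments, n_bins=10):
    """Count segments per equal-width G+C bin on the interval [0, 1]."""
    counts = [0] * n_bins
    for seg in segments:
        gc = gc_content(seg)
        if math.isnan(gc):
            continue  # skip segments with no unambiguous bases
        # A G+C fraction of exactly 1.0 goes into the last bin.
        idx = min(int(gc * n_bins), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical toy segments; real input would come from a segmentation run.
toy_segments = ["AAAA", "GGCC", "ATGC", "NNNN"]
print(gc_histogram(toy_segments, n_bins=4))
```

Comparing two genomes would then amount to comparing their normalized histograms. Whether segments should be weighted by length is a design choice that this sketch leaves open.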

References

    1. Zhou V., Goren A., Bernstein B. Charting Histone Modifications and the Functional Organization of Mammalian Genomes. Nat. Rev. Genet. 2010;12:7–18. doi: 10.1038/nrg2905.
    2. Bernardi G. Structural and Evolutionary Genomics: Natural Selection in Genome Evolution. Elsevier; Amsterdam, The Netherlands: 2004.
    3. Moya A., Oliver J.L., Verdú M., Delaye L., Arnau V., Bernaola-Galván P., de la Fuente R., Díaz W., Gómez-Martín C., González F., et al. Driven Progressive Evolution of Genome Sequence Complexity in Cyanobacteria. Sci. Rep. 2020;10:19073. doi: 10.1038/s41598-020-76014-4.
    4. Elhaik E., Graur D. A Comparative Study and a Phylogenetic Exploration of the Compositional Architectures of Mammalian Nuclear Genomes. PLoS Comput. Biol. 2014;10:e1003925. doi: 10.1371/journal.pcbi.1003925.
    5. Thiery J.P., Macaya G., Bernardi G. An Analysis of Eukaryotic Genomes by Density Gradient Centrifugation. J. Mol. Biol. 1976;108:219–235. doi: 10.1016/S0022-2836(76)80104-0.