PLoS Genet. 2022 May 23;18(5):e1010220.
doi: 10.1371/journal.pgen.1010220. eCollection 2022 May.

Genome size distributions in bacteria and archaea are strongly linked to evolutionary history at broad phylogenetic scales

Carolina A Martinez-Gutierrez et al. PLoS Genet. 2022.

Abstract

The evolutionary forces that determine genome size in bacteria and archaea have been the subject of intense debate over the last few decades. Although the preferential loss of genes observed in prokaryotes is explained by deletional bias, the factors that promote or prevent the fixation of such gene losses often remain unclear. Importantly, statistical analyses on this topic typically do not consider the potential bias introduced by the shared ancestry of many lineages, which is critical when using species as data points because the residuals may not be independent. In this study, we investigated genome size distributions across a broad diversity of bacteria and archaea to evaluate whether this trait is phylogenetically conserved at broad phylogenetic scales. After model fitting, Pagel's lambda indicated a strong phylogenetic signal in genome size data, suggesting that the diversification of this trait is influenced by shared evolutionary histories. We used a phylogenetic generalized least-squares (PGLS) analysis to test whether phylogeny influences the predictability of genome size from dN/dS ratios and 16S copy number, two variables that have previously been linked to genome size. These results confirm that failure to account for evolutionary history can lead to biased interpretations of genome size predictors. Overall, our results indicate that although bacteria and archaea can rapidly gain and lose genetic material through gene transfers and deletions, respectively, a phylogenetic signal for genome size distributions can still be recovered at broad phylogenetic scales, and this signal should be taken into account when inferring the drivers of genome size evolution.
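To make the contrast between an ordinary regression and a phylogenetically informed one concrete, the following minimal sketch may help. It is not the authors' pipeline: the four-taxon covariance matrix, the predictor and response values, and the function names `pagel_lambda_transform` and `gls_fit` are all hypothetical, and lambda is fixed by hand here rather than estimated by maximum likelihood as a real analysis would do.

```python
import numpy as np

def pagel_lambda_transform(C, lam):
    """Scale off-diagonal (shared-history) covariances by lambda.

    The diagonal (root-to-tip variances) is left unchanged, so lam = 1
    returns the Brownian Motion matrix and lam = 0 removes all
    phylogenetic covariance.
    """
    C_lam = lam * C
    np.fill_diagonal(C_lam, np.diag(C))
    return C_lam

def gls_fit(X, y, C):
    """Generalized least squares: beta = (X' C^-1 X)^-1 X' C^-1 y."""
    Ci_X = np.linalg.solve(C, X)
    Ci_y = np.linalg.solve(C, y)
    return np.linalg.solve(X.T @ Ci_X, X.T @ Ci_y)

# Hypothetical tree ((A,B),(C,D)) with total depth 1.0; each pair of
# sister taxa shares a branch of length 0.6, so their covariance is 0.6.
C_bm = np.array([
    [1.0, 0.6, 0.0, 0.0],
    [0.6, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.6],
    [0.0, 0.0, 0.6, 1.0],
])

# Made-up illustrative data: a predictor (e.g. dN/dS) and genome size in Mbp.
x = np.array([0.05, 0.06, 0.12, 0.15])
y = np.array([5.1, 4.8, 2.2, 1.9])
X = np.column_stack([np.ones_like(x), x])  # intercept + slope design matrix

# Ordinary least squares is GLS with an identity covariance matrix.
beta_ols = gls_fit(X, y, np.eye(len(y)))

# PGLS with an assumed lambda of 0.9 (a real analysis would estimate it).
beta_pgls = gls_fit(X, y, pagel_lambda_transform(C_bm, 0.9))

print("OLS  intercept, slope:", beta_ols)
print("PGLS intercept, slope:", beta_pgls)
```

The only difference between the two fits is the covariance matrix passed to `gls_fit`, which is the sense in which ignoring shared ancestry amounts to assuming that all species are independent data points.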

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
A) Distribution of genome size within bacterial and archaeal taxonomic groups at the phylum level. The first and third quartiles, as well as the median, are shown for each phylum distribution. B) Relationship between mean genome size and genome size variance for each genus cluster. Abbreviations: TDS = Thermotogota, Deinococcota, and Synergistota. Raw data for genome size can be found in S2 Data.
Fig 2. Genome size distribution across the Tree of Life of bacteria and archaea using one representative genome for each genus.
The phylogenetic tree was built from a concatenated alignment of ribosomal and RNA polymerase sequences using a maximum likelihood approach with the LG+R10 substitution model. Abbreviations: TACK = Thaumarchaeota, Aigarchaeota, Crenarchaeota, and Korarchaeota; TDS = Thermotogota, Deinococcota, and Synergistota; AMND = Acidobacteriota, Methylomirabilota, Nitrospirota, and Deferribacterota. Raw data for genome size can be found in S2 Data.
Fig 3. Relationship between genome size and genomic traits for bacteria and archaea using one representative genome for each genus.
A) Regression line of the relationship between genome size and dN/dS ratio before (dashed line) and after (solid line) taking phylogenetic relationships into account using Pagel’s model. B) Regression line of the relationship between genome size and 16S rRNA copies before (dashed line) and after (solid line) taking phylogenetic relationships into account using the Brownian Motion model. Parameters of the regression equations for both relationships can be found in Table 2. Raw data can be found in S2 Data.
Fig 4. Relationship between genome size and dN/dS.
dN/dS values represent the median estimate for each genus cluster. Each dot is a representative genome for a genus, and dot size reflects the number of 16S rRNA gene copies. Raw data can be found in S2 Data.
