Sci Rep. 2025 Oct 3;15(1):34482.
doi: 10.1038/s41598-025-22383-7.

Widely-distributed freshwater microorganisms with streamlined genomes co-occur in cohorts with high abundance

Alejandro Rodríguez-Gijón et al. Sci Rep.

Abstract

Genome size is known to reflect the eco-evolutionary history of prokaryotic species, including their lifestyle, environmental preferences, and habitat breadth. However, it remains uncertain how strongly genome size is linked to prokaryotic prevalence, relative abundance and co-occurrence. To address this gap, we present a systematic and global-scale evaluation of the relationship between genome size, relative abundance and prevalence in freshwater ecosystems. Our study includes 80,561 medium-to-high quality genomes, from which we identified 9,028 species (ANI > 95%) present in a manually curated dataset of 636 freshwater metagenomes. Our results show that prokaryotes with reduced genomes exhibited higher prevalence and relative abundance, suggesting that genome streamlining may promote cosmopolitanism. Furthermore, network analyses revealed that the most prevalent prokaryotes have streamlined genomes and are found in co-occurring cohorts potentially sustained by metabolic dependencies. Overall, species in these groups possess a diminished capacity for synthesizing essential metabolites such as vitamins, amino acids and nucleotides, potentially fostering metabolic complementarities within the community. Moreover, we found the presence of the essential biosynthetic functions to be usage-dependent: nucleotide and amino acid biosynthesis pathways are the most complete, whereas vitamin biosynthesis is the most incomplete. Our results underscore genome streamlining as a central eco-evolutionary strategy that both shapes and is shaped by community dynamics, ultimately fostering interdependences among prokaryotes.

Keywords: Archaea; Bacteria; Cohorts; Comparative genomics; Freshwater; Genome size; Prevalence.

Conflict of interest statement

Declarations. Competing interests: The authors declare no competing interests.

Figures

Fig. 1
Overview of the relationship between estimated genome size (Mbp), prevalence (%, over 636 freshwater metagenomes), and average relative abundance (%) across the representative genomes of the 9,028 species-clusters (ANI > 95%) of the FRESH-MAP database. A compares estimated genome size between major phyla. Numbers next to boxes indicate the number of species-clusters per phylum. B shows the relationship between estimated genome size and prevalence. C compares prevalence between phyla. D shows the relationship between estimated genome size and average relative abundance. E compares average relative abundances between phyla. Different letters in A, C and E indicate statistical differences (p < 0.05; Kruskal–Wallis non-parametric test corrected with Benjamini-Hochberg) between phyla. Different colors in A-E indicate different phyla according to the legend at the top-right of the figure.
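As a minimal sketch of how the two per-species summaries in this figure could be derived, assume a list of relative abundances (%) for one species-cluster, with one value per metagenome. Prevalence is taken here as the share of metagenomes in which the species-cluster is detected, and average relative abundance as the mean over all metagenomes. The detection threshold, the function name, the choice to average over all samples, and the toy values are illustrative assumptions, not the authors' pipeline.

```python
def prevalence_and_mean_abundance(abundances, detection_threshold=0.0):
    """Summarise one species-cluster across metagenomes.

    abundances: relative abundances (%), one value per metagenome.
    detection_threshold: hypothetical cutoff; values above it count as "present".
    Returns (prevalence in %, mean relative abundance in %).
    """
    if not abundances:
        raise ValueError("at least one metagenome is required")
    n_samples = len(abundances)
    n_present = sum(1 for a in abundances if a > detection_threshold)
    prevalence = 100.0 * n_present / n_samples
    # Assumption: the mean is taken over all metagenomes, including absences.
    mean_abundance = sum(abundances) / n_samples
    return prevalence, mean_abundance


# Toy example: a species-cluster detected in 2 of 4 metagenomes.
toy_abundances = [0.0, 0.5, 1.5, 0.0]
prevalence, mean_abundance = prevalence_and_mean_abundance(toy_abundances)
print(prevalence, mean_abundance)
```

In the study itself the denominator would be the 636 freshwater metagenomes; whether absences enter the average is not stated in the caption.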
Fig. 2
Overview of the co-occurrence network and analyses. A shows the representative genomes of the 1,202 species-clusters (nodes) included in the co-occurrence network and the connections (edges, grey) between them (SparCC correlation r > 0.4, p-value < 0.05). Different colors denote different co-occurrence cohorts, as can be inferred from B. B shows the preferred environmental conditions for each cohort, where red indicates estimations above the baseline and blue below the baseline. The preferred environmental condition is calculated as the weighted average of relative abundances of each cohort in each sample for each environmental parameter (absolute latitude, temperature and oxygen). C shows the relation between the prevalence (%, over 636 freshwater metagenomes) and average relative abundance (%). Datapoints in black correspond to species-clusters not included in the co-occurrence network, and datapoints with different colors refer to different cohorts as indicated in the subplot. The subplot also compares the residuals of each cohort and those species-clusters out of the co-occurrence network, and includes the number of species-clusters per cohort. D shows the correlation between prevalence and the degree of connectedness (number of edges) within major cohorts (i.e., with more than 200 species-clusters) for each species-cluster. The subplot in D compares residuals of the linear regression for each cohort. E and F compare the average estimated genome size (Mbp) and the coding density (%) between major cohorts, respectively. Different letters in C-D (subplots) and E-F indicate statistical differences (p < 0.05; Kruskal–Wallis non-parametric test corrected with Benjamini-Hochberg) between cohorts.
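The weighted average described for panel B can be illustrated with a short sketch. It assumes one plausible reading of the caption: each sample's environmental value is weighted by the cohort's relative abundance in that sample. The function name, the handling of missing metadata, and the toy numbers are hypothetical; how the baseline is defined is not given in the caption and is not modelled here.

```python
def weighted_preference(cohort_abundances, parameter_values):
    """Abundance-weighted mean of an environmental parameter for one cohort.

    cohort_abundances: relative abundance (%) of the cohort in each sample.
    parameter_values: the parameter (e.g. temperature) in each sample;
                      None marks missing metadata and is skipped (assumption).
    Returns None when no sample carries both a value and a non-zero weight.
    """
    if len(cohort_abundances) != len(parameter_values):
        raise ValueError("inputs must have one entry per sample")
    pairs = [(w, x) for w, x in zip(cohort_abundances, parameter_values)
             if x is not None]
    total_weight = sum(w for w, _ in pairs)
    if total_weight == 0:
        return None
    return sum(w * x for w, x in pairs) / total_weight


# Toy example: the cohort is most abundant in the warmest sample.
abundances = [10.0, 30.0, 60.0]      # % of the community per sample
temperatures = [4.0, 10.0, 20.0]     # degrees Celsius per sample
preferred_temperature = weighted_preference(abundances, temperatures)
print(preferred_temperature)
```

Because the warmest sample carries the largest weight, the estimate lies above the unweighted mean temperature of the three samples.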
Fig. 3
Exploration of the relationship between biosynthetic potential to produce essential metabolites and the estimated genome size (Mbp). A-C show the relationship between estimated genome size and average pathway completeness (%) for different KEGG modules across all 4,725 high-quality representative genomes (completeness > 90% and contamination < 5%) from the FRESH-MAP database. KEGG modules include biosynthesis of amino acids (A), nucleotides (B) and vitamins (C). In A-C, ‘n’ indicates the number of modules per category, and the different colors indicate different phyla according to the legend at the bottom of the figure.
Fig. 4
Overview of module completeness (%; rows in the heatmap) for biosynthesis of amino acids, nucleotides, and vitamins across the species-clusters (columns) in cohort 3. Module completeness is colored in yellow between 0 and 30%, green between 30 and 70%, light blue between 70 and 100%, and dark blue for 100%. We include information on average relative abundance (%), prevalence (%), estimated genome size (Mbp), and genome completeness (%), according to the legend to the right of the figure. Overviews for cohorts 1, 2 and 6 can be found in Figures S12-S14.
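The color scale of the heatmap amounts to binning module completeness into four classes. The sketch below shows one way to express that binning. The caption does not say which class the boundary values 30% and 70% belong to, so assigning them to the higher class is an assumption, as is the function name.

```python
def completeness_color(completeness):
    """Map module completeness (%) to the heatmap color class.

    Assumption: boundary values 30 and 70 fall into the higher class;
    only exactly 100% is shown as dark blue, as stated in the caption.
    """
    if not 0.0 <= completeness <= 100.0:
        raise ValueError("completeness must lie between 0 and 100")
    if completeness == 100.0:
        return "dark blue"
    if completeness >= 70.0:
        return "light blue"
    if completeness >= 30.0:
        return "green"
    return "yellow"


# Toy example: one species-cluster with four modules of differing completeness.
colors = [completeness_color(c) for c in (12.5, 50.0, 87.5, 100.0)]
print(colors)
```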
Fig. 5
Overview of the relationship between estimated genome size and the genetic potential for catabolic and structural functions, expressed as the number of KEGG orthologs (KOs) per Mbp, across all 4,725 high-quality representative genomes (completeness > 90% and contamination < 5%) from the FRESH-MAP database. Analyzed functions include sigma factors (A), two-component systems (B), flagella (C), nitrogen cycle (D), sulfur cycle (E), and carbon fixation (F). The number at the top-right of each panel indicates the total number of KOs per category. Regular linear regressions refer to all datapoints (i.e., all genomes), and the dashed linear regressions exclude those datapoints where 0 KOs per Mbp for that function were detected. Different colors in A-F refer to different phyla according to the legend at the bottom of the figure.
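To make the two regression variants in this caption concrete, the sketch below normalises KO counts to KOs per Mbp and fits an ordinary least-squares line twice: once on all genomes and once after dropping genomes with 0 KOs per Mbp for the function. Ordinary least squares, the helper names, and the toy genomes are assumptions for illustration; the caption does not specify the regression implementation.

```python
def ko_density(ko_count, genome_size_mbp):
    """Number of KOs for a function per Mbp of estimated genome size."""
    if genome_size_mbp <= 0:
        raise ValueError("genome size must be positive")
    return ko_count / genome_size_mbp


def linear_fit(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired datapoints")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Toy genomes: (estimated genome size in Mbp, KO count for one function).
genomes = [(2.0, 4), (4.0, 12), (6.0, 0), (8.0, 40)]
sizes = [size for size, _ in genomes]
densities = [ko_density(kos, size) for size, kos in genomes]

# Fit on all genomes (the "regular" regression in the caption).
fit_all = linear_fit(sizes, densities)

# Fit excluding genomes with 0 KOs per Mbp (the dashed regression).
nonzero = [(s, d) for s, d in zip(sizes, densities) if d > 0]
fit_nonzero = linear_fit([s for s, _ in nonzero], [d for _, d in nonzero])
print(fit_all, fit_nonzero)
```

In this toy data the single genome lacking the function pulls the all-genome slope down, which is the effect the dashed regression is meant to separate out.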
