ISME J. 2020 May;14(5):1247-1259.
doi: 10.1038/s41396-020-0600-z. Epub 2020 Feb 11.

Disentangling the impact of environmental and phylogenetic constraints on prokaryotic within-species diversity

Oleksandr M Maistrenko et al. ISME J. 2020 May.

Abstract

Microbial organisms inhabit virtually all environments and encompass a vast biological diversity. The pangenome concept aims to facilitate an understanding of diversity within defined phylogenetic groups. Hence, pangenomes are increasingly used to characterize the strain diversity of prokaryotic species. To understand the interdependence of pangenome features (such as the number of core and accessory genes) and to study the impact of environmental and phylogenetic constraints on the evolution of conspecific strains, we computed pangenomes for 155 phylogenetically diverse species (from ten phyla) using 7,000 high-quality genomes, each of which was assigned its respective habitats. Species habitat ubiquity was associated with several pangenome features. In particular, core-genome size was more important for ubiquity than accessory genome size. In general, environmental preferences had a stronger impact on pangenome evolution than phylogenetic inertia. Environmental preferences explained up to 49% of the variance for pangenome features, compared with 18% by phylogenetic inertia. This observation was robust when the dataset was extended to 10,100 species (59 phyla). The importance of environmental preferences was further accentuated by convergent evolution of pangenome features in a given habitat type across different phylogenetic clades. For example, the soil environment promotes expansion of pangenome size, while host-associated habitats lead to its reduction. Taken together, we explored the global principles of pangenome evolution, quantified the influence of habitat and phylogenetic inertia on the evolution of pangenomes, and identified criteria governing species ubiquity and habitat specificity.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Fig. 1. Study design.
We used the proGenomes database version 1 [32] of high-quality genomes to compute pangenomes (using the Roary pipeline) and pangenome features. Species were assigned to their preferred habitats using three databases: PATRIC, Microbe Atlas Project, and Global Microbial Gene Catalog (see Methods). As many pangenome features are interdependent (covariates) or affected by sampling bias, we used a multivariate analysis framework to disentangle habitat properties from phylogenetic inertia. This allows for the quantification of environmental and phylogenetic factors that impact diversity within species. To construct the phylogenetic tree, we used the concatenated protein sequences of 40 conserved universal marker genes which were aligned using the ClustalOmega aligner (default parameters). The tree was constructed using FastTree2 (JTT model) [52].
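The split of a pangenome into core and accessory genes can be illustrated with a minimal sketch. It assumes a strict definition (core = gene families present in every genome of the species); pipelines such as Roary may apply a presence threshold instead, so this is not the authors' implementation. The function name, strain names, and gene-family names are hypothetical.

```python
def split_pangenome(genomes):
    """Return (core, accessory) gene-family sets for one species.

    genomes maps a genome ID to the set of gene families it carries.
    Assumption: "core" means present in every genome (strict definition).
    """
    gene_sets = list(genomes.values())
    pangenome = set().union(*gene_sets)      # every family seen in any genome
    core = set.intersection(*gene_sets)      # families shared by all genomes
    return core, pangenome - core


# Made-up strains and gene-family names, for illustration only.
strains = {
    "strain_A": {"dnaA", "gyrB", "recA", "toxin1"},
    "strain_B": {"dnaA", "gyrB", "recA", "pilus2"},
    "strain_C": {"dnaA", "gyrB", "recA", "toxin1", "phage3"},
}
core, accessory = split_pangenome(strains)
print(len(core), len(accessory))
```

Core-genome size and accessory-genome size, two of the pangenome features discussed here, are then simply the sizes of the two returned sets. The function expects at least one genome.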
Fig. 2. Relationship between different pangenome features.
a Correlation matrix between (I) the number of conspecific genomes used to estimate pangenome features, (II) 21 pangenome features, (III) the ubiquity of species as an environmental feature computed from habitat preference of strains, and (IV) major habitat groups from the Microbe Atlas Project. The heatmap visualizes Spearman Rho values for correlations between sample size (I), 21 pangenome features (II), and species ubiquity (III). Four major habitats (aquatic, animal host, plant host, soil (IV)) were correlated to the (I) number of conspecific genomes, (II) pangenome features, and (III) ubiquity via point-biserial correlation. Correlations were considered statistically significant at adjusted p values < 0.05 (Benjamini-Hochberg correction). b Clustering of a subset of nine pangenome features based on their pairwise correlation strengths. Horizontal stacked charts present the amount of variance explained by various predictors (number of genomes, phylogeny, and habitat represented by their principal components (PCs), and genome size or diversity). The first set of stacked charts (“no correction”) shows variance explained in pangenome features by the number of genomes used to compute pangenome features as well as species’ phylogeny and habitat preferences; the second and the third sets of stacked charts represent the amount of variance explained (see “Methods”) by the same set of predictors when correcting for genome size or nucleotide diversity in the core genome, respectively. Size and diversity estimates form distinct feature groups.
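The three statistics named in this caption (Spearman correlation, point-biserial correlation, and Benjamini-Hochberg adjustment) can be sketched in plain Python. This is an illustrative sketch, not the authors' code; the function names are hypothetical, and it relies on two standard facts: Spearman's rho is the Pearson correlation of average ranks, and the point-biserial coefficient is the Pearson correlation between a 0/1 variable and a continuous one.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In use, one p value per feature pair would be collected, passed through `benjamini_hochberg`, and compared against 0.05. Inputs to `pearson` must not be constant, since the denominator would then be zero.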
Fig. 3. Effect of ubiquity on core-genome size and functional content.
a Species ubiquity (number of habitats a species was assigned to), a habitat feature, is linked to core-genome sizes after correction for phylogenetic effect (phylogenetic generalized least squares, p value = 0.00005, λ = 0.98 (95% CI 0.957, 0.992), partial R-square (for ubiquity coefficient) 0.09, see also Supplementary Table 6). b Correlation of ubiquity with the relative frequency of functional categories (COG categories assigned by eggNOG v4.5 [47]) in core and accessory genomes. Species of high ubiquity tend to encode more proteins involved in lipid metabolism (I) and secondary metabolite biosynthesis (Q).
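Ubiquity as defined in this caption (the number of habitats a species was assigned to) reduces to a count of distinct habitats per species. The sketch below assumes strain-level habitat assignments are available as (species, habitat) pairs; the function name, species labels, and data are hypothetical, and the actual habitat assignment procedure is described in the paper's Methods.

```python
from collections import defaultdict


def species_ubiquity(strain_habitats):
    """Count distinct habitats per species.

    strain_habitats is an iterable of (species, habitat) pairs,
    one pair per strain-level habitat assignment.
    """
    habitats = defaultdict(set)
    for species, habitat in strain_habitats:
        habitats[species].add(habitat)
    return {species: len(found) for species, found in habitats.items()}


# Made-up assignments, for illustration only.
assignments = [
    ("species_X", "soil"),
    ("species_X", "aquatic"),
    ("species_X", "soil"),
    ("species_Y", "animal host"),
]
ubiquity = species_ubiquity(assignments)
print(ubiquity)
```

Repeated assignments to the same habitat count once, so `species_X` has a ubiquity of 2 here even though it appears three times.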
Fig. 4. Partitioning of variance in pangenome features explained by phylogenetic inertia and habitat preferences (R-square (CAR score)) based on model {1} from Fig. 2b.
Fig. 5. Phylogenetic tree of 155 microbial species with scatter plots of core-genome size and average nucleotide diversity of core genomes.
Soil-associated species tend to have larger core genomes (marked in red in the left scatter plot), aquatic species tend to be more diverse (marked in blue in right scatter plot). Tree labels and background of scatter plots are colored by their taxonomic annotations (phylum). Bottom panel: Relationships between habitats and core-genome size and average nucleotide diversity of core genomes.