Nat Commun. 2023 Sep 27;14(1):6013.
doi: 10.1038/s41467-023-41699-4.

A systematic analysis of marine lysogens and proviruses

Yi Yi et al. Nat Commun. 2023.

Abstract

Viruses are ubiquitous in the oceans, exhibiting high abundance and diversity. Here, we systematically analyze existing genomic sequences of marine prokaryotes to compile a Marine Prokaryotic Genome Dataset (MPGD, consisting of over 12,000 bacterial and archaeal genomes) and a Marine Temperate Viral Genome Dataset (MTVGD). At least 40% of the MPGD genomes contain one or more proviral sequences, indicating that they are lysogens. The MTVGD includes over 12,900 viral contigs or putative proviruses, clustered into 10,897 viral genera. We show that lysogens and proviruses are abundant in marine ecosystems, particularly in the deep sea, and marine lysogens differ from non-lysogens in multiple genomic features and growth properties. We reveal several virus-host interaction networks of potential ecological relevance, and identify proviruses that appear to be able to infect (or to be transferred between) different bacterial classes and phyla. Auxiliary metabolic genes in the MTVGD are enriched in functions related to carbohydrate metabolism. Finally, we experimentally demonstrate the impact of a prophage on the transcriptome of a representative marine Shewanella bacterium. Our work contributes to a better understanding of the ecology of marine prokaryotes and their viruses.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Overview of the Marine Prokaryotic Genome dataset (MPGD) and the lysogeny landscape in the ocean.
a Geographic distribution of genomes in the MPGD. Each point represents a geographic site, and the colour indicates the environment from which the genomes were derived: seawater, sediment, both or others (information unavailable). The map was drawn using the R package maps (3.4.1), in which the “world” data are derived from Natural Earth (v2.0) (https://www.naturalearthdata.com/). b Water depth distribution of genomes in the MPGD. Each circle in the scatter plots shows the water depth, and the circle size is proportional to the number of genomes found at that depth. The pie chart shows the proportions of genomes sampled from different ocean zones: the epipelagic (water depth: 0–200 m), mesopelagic (200–1000 m) and deep-sea (>1000 m) zones. c Ratios of lysogeny in different marine prokaryotic taxa. The bacterial and archaeal phylogenetic trees were constructed based on 120 and 122 concatenated marker proteins, respectively, using the maximum-likelihood algorithm. All branches were collapsed at the class level. Each pie chart corresponds to a class (with ≥ 20 genomes), showing the proportions of lysogenic and nonlysogenic genomes. d Comparison of lysogeny ratios (LyRs) between bacteria and archaea at the class level. n1: number of prokaryotic classes; n2: number of prokaryotic genomes contained in these classes. The significance of the difference between the LyRs of bacterial and archaeal classes was determined by a two-sided Wilcoxon rank-sum test, and the P-value is shown above the boxes. Each box represents the interquartile range (IQR), in which the middle line represents the median. The whiskers extend to 1.5 × IQR, and all contained data are shown as individual points. e Occurrence of lysogeny in different ocean zones. Source data are provided as a Source Data file.
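The depth zones in panel b and the lysogeny ratios in panels c to e can be illustrated with a minimal sketch. It assumes that a genome counts as a lysogen when it carries at least one proviral sequence (as stated in the abstract) and that the lysogeny ratio of a group is the fraction of its genomes that are lysogens. The function names, the toy input and the handling of the exact 200 m and 1000 m boundaries are assumptions made for illustration, not the authors' pipeline.

```python
from collections import defaultdict


def ocean_zone(depth_m: float) -> str:
    """Map a water depth in metres to one of the three zones named in the caption.

    Assumption: depths of exactly 200 m and 1000 m fall in the shallower zone.
    """
    if depth_m < 0:
        raise ValueError("depth must be non-negative")
    if depth_m <= 200:
        return "epipelagic"
    if depth_m <= 1000:
        return "mesopelagic"
    return "deep-sea"


def lysogeny_ratio_by_zone(genomes):
    """Return {zone: fraction of genomes with >= 1 provirus}.

    genomes: iterable of (depth_m, n_proviruses) pairs (hypothetical input format).
    """
    totals = defaultdict(int)
    lysogens = defaultdict(int)
    for depth_m, n_proviruses in genomes:
        zone = ocean_zone(depth_m)
        totals[zone] += 1
        if n_proviruses >= 1:
            lysogens[zone] += 1
    return {zone: lysogens[zone] / totals[zone] for zone in totals}


# Toy data, invented for illustration only.
example = [(5, 0), (150, 2), (800, 1), (2500, 1), (4000, 0), (3000, 3)]
ratios = lysogeny_ratio_by_zone(example)
```

The same grouping could be keyed on taxonomic class instead of ocean zone to mirror panel c.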
Fig. 2. Comparison of genomic features and growth traits between marine lysogens and nonlysogens.
a–c Box plots representing the genome size (a), protein coding density (b), and GC content (c) of marine lysogens and nonlysogens. All parameters of lysogens were calculated based on genomes with temperate viral sequences excluded. All significant differences between lysogens and nonlysogens were determined by two-sided Wilcoxon rank-sum tests, and the P-values are shown above the boxes. Each box represents the IQR, in which the middle line represents the median. The whiskers extend to 1.5 × IQR, and all contained data are shown as individual points. d Coloured and grey bars represent the percentage of fast growers (MDT < 5 h) in marine lysogens and nonlysogens, respectively. The marine prokaryotes were grouped based on ocean zone (left two panels) or host taxon (right two panels). The data groups in (a–d) share the same sample sizes (number of genomes in each group), which are shown at the top of (a). Source data are provided as a Source Data file.
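The box plot convention described in the caption (box = IQR, middle line = median, whiskers reaching out to 1.5 × IQR) can be sketched as follows. The quartile method (inclusive interpolation from the Python standard library) and the convention that each whisker ends at the most extreme data point inside the fence are assumptions; the plotting software used for the figure may compute quartiles slightly differently. The input values are invented.

```python
import statistics


def box_stats(values):
    """Summarise a sample the way a Tukey-style box plot would draw it."""
    data = sorted(values)
    # Quartiles via inclusive interpolation (assumed method).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    # Whiskers stop at the most extreme observations inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outside = [v for v in data if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "beyond_whiskers": outside,
    }


# Toy genome sizes in Mbp, invented for illustration only.
stats = box_stats([2.1, 2.9, 3.4, 3.8, 4.0, 4.6, 5.2, 5.9, 12.0])
```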
Fig. 3. Interactions between marine temperate viruses and prokaryotic hosts.
a, b Distribution and host ranges of temperate viral clusters (tVCs) in seawater (a) and sediment (b). For clarity, only tVCs with ≥15 (in seawater) or ≥5 (in sediment) viral genomes are shown in the heatmaps. The heatmaps show the number of viral genomes in tVCs derived from 3 depth-stratified ocean zones, which are hierarchically clustered by tVCs. The tVCs present in all 6 ocean zones are considered putative cosmopolitan tVCs and are marked in green next to the tVC names. Each tVC is connected to the class of its host(s), and the transparency of the connecting lines is proportional to the number of infections. c, d Interaction networks of prokaryotic genera and tVCs in seawater (c) and sediment (d). For clarity, only prokaryotic genera with ≥20 genomes (in seawater) or ≥5 genomes (in sediment) are displayed in the network. The circles and hexagons represent host genera and tVCs, respectively, and the sizes are proportional to the numbers of genomes included. The coloured circles represent different bacterial classes or archaeal phyla, the inner letters mark ecologically relevant marine microbial hosts (at the genus level), and the yellow hexagons represent tVCs that infect multiple host genera. The numbers of infections are displayed as the shared edges, whose transparency and width are proportional to these numbers. The networks are visualized using the edge-weighted spring-embedded model, which places the host genera and tVCs sharing higher co-occurrence in closer proximity. Source data are provided as a Source Data file.
Fig. 4. Distribution of auxiliary metabolic genes (AMGs) in marine temperate viruses.
a, b Heatmap showing the relative abundance of all AMGs (a) and the functionally categorized AMGs (b) encoded by marine temperate viruses in each group. The relative abundance was calculated as the average number of AMGs carried per viral genome. The groupings are based on the ocean zone or host taxon. Functional categories of AMGs are annotated by DRAM-v. The number of viral genomes contained in each group is shown at the top of the heatmap. CAZy Carbohydrate-Active enZYmes, MISC miscellaneous. c Composition of AMGs involved in CAZy. The asterisks indicate that AMGs were functionally characterized in this study. CBM carbohydrate-binding module, CE carbohydrate esterase, GH glycoside hydrolase, GT glycosyl transferase, PL polysaccharide lyase. Source data are provided as a Source Data file.
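The relative abundance defined in the caption (average number of AMGs carried per viral genome in a group) reduces to a per-group mean. The sketch below assumes that viral genomes with no AMGs still count in the denominator; the function name, input format and numbers are hypothetical.

```python
from collections import defaultdict


def amg_relative_abundance(viral_genomes):
    """Return {group: mean number of AMGs per viral genome}.

    viral_genomes: iterable of (group, amg_count) pairs, one per viral genome.
    Assumption: genomes with zero AMGs are included in the denominator.
    """
    n_genomes = defaultdict(int)
    n_amgs = defaultdict(int)
    for group, amg_count in viral_genomes:
        n_genomes[group] += 1
        n_amgs[group] += amg_count
    return {group: n_amgs[group] / n_genomes[group] for group in n_genomes}


# Toy data, invented for illustration only.
toy = [
    ("epipelagic", 0), ("epipelagic", 2),
    ("deep-sea", 1), ("deep-sea", 3), ("deep-sea", 2), ("deep-sea", 0),
]
abundance = amg_relative_abundance(toy)
```

Restricting the counts to one functional category (for example CAZy) before calling the function would give the per-category values of panel b.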
Fig. 5. Assessment of marine temperate virus-host complementarity.
a Correlation between marine temperate virus-host complementarity and LyRs. Pearson’s correlation coefficients between the median d2*/codon distance and LyRs of genera (with ≥5 genomes) were calculated and are shown by the colour gradient of squares. Significant correlations were determined by two-sided tests and are marked by white asterisks; the P-values are represented by the sizes of the asterisks and shown above the squares. The slashes indicate that the correlation could not be calculated because of the small sample size (number of genera < 5). n1: number of host genera used for calculation of Pearson’s correlation; n2: number of host genomes contained in the genera. b, c Distribution of temperate virus-host nucleotide (b) and amino acid complementarity (c) in different ocean zones or host taxa. The differences among groups were analysed by the two-sided Wilcoxon rank-sum test, and P-values of the significant differences are shown above the boxes (other compared groups are shown in Supplementary Data 19). Each box represents the IQR, in which the middle line represents the median. The whiskers extend to 1.5 × IQR, and all contained data are shown as individual points. The number of virus-host pairs contained in each group is shown at the top of the graphs. Source data are provided as a Source Data file.
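The per-group correlation in panel a can be sketched as a plain Pearson coefficient between the median virus-host distance of each genus and its LyR, returning no value when fewer than five genera are available (the situation the caption marks with slashes). This is an illustration only: the function name and toy values are assumptions, and the two-sided significance test mentioned in the caption is not reproduced here.

```python
import math

MIN_GENERA = 5  # threshold taken from the caption (number of genera < 5 is skipped)


def pearson_r(xs, ys):
    """Pearson correlation coefficient, or None if it cannot be computed."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n < MIN_GENERA:
        return None  # too few genera for this group
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # a constant series has no defined correlation
    return cov / math.sqrt(var_x * var_y)


# Toy values, invented for illustration: one entry per host genus.
median_distance = [0.10, 0.20, 0.30, 0.40, 0.50]
lysogeny_ratio = [0.9, 0.7, 0.5, 0.3, 0.1]
r = pearson_r(median_distance, lysogeny_ratio)
```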
Fig. 6. Influence of the marine temperate virus SP1 on the host transcriptome.
a Genomic map of the prophage SP1 in the marine bacterium Shewanella psychrophila WP2 (WP2). The arrows depict the location and direction of predicted proteins on the phage genome, and the fill colours indicate different functional categories of genes, as indicated in the legend. b Verification of SP1 excision by PCR. The left schematic graph shows the process of SP1 excision, in which site-specific recombination occurs through the crossover between attL and attR sites to generate the SP1-deleted WP2 genome and a circular SP1 genome. The locations of the primer pairs used for verification are also shown. The right panel shows the electrophoresis of PCR products. The primer pairs and template DNA used for PCR are indicated for each lane, and the target bands are marked with an arrow. M, DNA size marker. A representative agarose gel image from two independent experiments is shown. c Excision rates of SP1 in S. psychrophila WP2 under different treatments. The data represent the mean ± SD and are based on three biologically independent samples. MMC, Mitomycin C. d Graphic display of differentially expressed genes (DEGs) categorized by function in S. psychrophila WP2 after SP1 deletion. The transcriptome data represent three biologically independent samples for each strain (WP2 and WP2ΔSP1). Normalized differential expression levels (fold changes) are represented by heatmaps in boxes according to the scale bar (log2 scale) from most upregulated to most downregulated. The proteins encoded by the DEGs are shown in each box. Source data are provided as a Source Data file.
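Two of the summary quantities in this caption, the mean ± SD over three replicates (panel c) and the log2 fold change (panel d), can be written out as a small sketch. It assumes the sample standard deviation (n - 1 denominator) and a fold change expressed as WP2ΔSP1 relative to WP2; neither choice is stated in the caption, the replicate values are invented, and a real differential expression analysis would involve normalization and statistical testing that are not shown here.

```python
import math
import statistics


def mean_sd(replicates):
    """Mean and sample standard deviation (n - 1 denominator, an assumption)."""
    return statistics.mean(replicates), statistics.stdev(replicates)


def log2_fold_change(deletion_expr, wild_type_expr):
    """log2 of deletion-strain expression over wild-type expression.

    Assumption: the fold change is WP2ΔSP1 relative to WP2; positive values
    then mean higher expression after SP1 deletion.
    """
    if deletion_expr <= 0 or wild_type_expr <= 0:
        raise ValueError("expression values must be positive")
    return math.log2(deletion_expr / wild_type_expr)


# Toy replicate values, invented for illustration only.
mean_rate, sd_rate = mean_sd([1.0, 2.0, 3.0])
lfc = log2_fold_change(8.0, 2.0)
```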

