Microbiol Spectr. 2025 Feb 4;13(2):e0106624.
doi: 10.1128/spectrum.01066-24. Epub 2024 Dec 31.

Identification and characterization of Faecalibacterium prophages rich in diversity-generating retroelements

Anastasia Gulyaeva et al. Microbiol Spectr.

Abstract

Metagenomics has revealed the incredible diversity of phages within the human gut. However, very few of these phages have been subjected to in-depth experimental characterization. One promising method of obtaining novel phages for experimental characterization is through induction of the prophages integrated into the genomes of cultured gut bacteria. Here, we developed a bioinformatic approach to prophage identification that builds on prophage genomic properties, existing prophage-detecting software, and publicly available virome sequencing data. We applied our approach to 22 strains of bacteria belonging to the genus Faecalibacterium, resulting in identification of 15 candidate prophages, and validated the approach by demonstrating the activity of five prophages from four of the strains. The genomes of three active phages were identical or similar to those of known phages, while the other two active phages were not represented in the Viral RefSeq database. Four of the active phages possessed a diversity-generating retroelement (DGR), and one retroelement had two variable regions. DGRs of two phages were active at the time of the induction experiments, as evidenced by nucleotide variation in sequencing reads. We also predicted that the host range of two active phages may include multiple bacterial species. Finally, we noted that four phages were less prevalent in the metagenomes of inflammatory bowel disease patients compared to a general population cohort, a difference mainly explained by differences in the abundance of the host bacteria. Our study highlights the utility of prophage identification and induction for unraveling phage molecular mechanisms and ecological interactions.

IMPORTANCE: While hundreds of thousands of phage genomes have been discovered in metagenomics studies, only a few of these phages have been characterized experimentally. Here, we explore phage characterization through bioinformatic identification of prophages in genomes of cultured bacteria, followed by prophage induction. Using this approach, we detect the activity of five prophages in four strains of commensal gut bacteria Faecalibacterium. We further note that four of the prophages possess diversity-generating retroelements implicated in rapid mutation of phage genome loci associated with phage-host and phage-environment interactions and analyze the intricate patterns of retroelement activity. Our study highlights the potential of prophage characterization for elucidating complex molecular mechanisms employed by the phages.

Keywords: DGR; bacteriophage; microbiome.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig 1
Analysis of bacterial contig properties that can be indicative of prophage presence. The contig NODE_4 of the HTF−128 strain is shown as an example, with the approximate candidate prophage region highlighted by a light gray background. (A) ORF organization of the bacterial contig. The contig is shown as a black rectangular contour. ORFs encoded in three positive (negative) reading frames are shown as red (blue) bars. (B) Average content of the four nucleotides along the contig. (C) Prophage regions of the contig that are predicted by specialized software tools are indicated by dark gray bars. The y-axis indicates the different tools used. (D) Breadth and (E) depth of contig coverage by viral metagenome reads. Each line corresponds to a sample. Line color indicates the respective study. (F) Depth of contig coverage by reads used for bacterial genome assembly. Values on panels B, D–F were recorded using a 3,001-nt window sliding with a 500-nt step. Analysis of all contigs containing candidate prophages is presented in File S2.
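The sliding-window summary mentioned in the Fig 1 legend (3,001-nt window, 500-nt step) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `sliding_window_mean` is hypothetical, and the choice to report only complete windows is an assumption, since the legend does not say how contig ends were handled.

```python
def sliding_window_mean(values, window=3001, step=500):
    """Average per-position values (e.g. coverage depth) in sliding windows.

    Returns a list of (window_start, mean) pairs, with 0-based starts.
    Assumption: only complete windows are reported; the legend does not
    state how partial windows at contig ends were treated.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")

    # Prefix sums make each window total an O(1) subtraction.
    prefix = [0.0]
    for v in values:
        prefix.append(prefix[-1] + v)

    results = []
    for start in range(0, len(values) - window + 1, step):
        total = prefix[start + window] - prefix[start]
        results.append((start, total / window))
    return results
```

Calling `sliding_window_mean(depths)` on a per-position depth list uses the window and step from the legend; smaller values are convenient for checking the logic by hand.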
Fig 2
Coverage of bacterial contigs by sequencing reads obtained following prophage induction experiments. Each row corresponds to a Faecalibacterium strain (strain names in bold). Each black frame corresponds to a bacterial contig (>50 kb, ordered by size). Each light gray rectangle indicates the position of a candidate prophage. Colored lines depict the depth of contig coverage by sequencing reads from the different DNA fractions (bacterial or VLP DNA) obtained after different prophage induction treatments. See Table S2 for MMC concentrations. Depth of coverage was recorded using a 3,001-nt window sliding with a 500-nt step.
Fig 3
Similarity between the known phages and phages detected in this study. Similarity between pairs of genomes assigned to the same genus-level cluster, illustrated by dot plots. X-axis of each dot plot corresponds to a phage described in the literature. Y-axis corresponds to a phage or CP detected in this study. Coordinates are indicated in kilobases. Every 12-letter word (i.e., a 12-nucleotide block) shared by a pair of sequences is presented as a black dot on a dot plot. The percent of columns with identical nucleotides in a pairwise alignment of the two sequences is given below each dot plot.
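The dot-plot construction described in the Fig 3 legend, where every shared 12-nucleotide word becomes a dot, can be illustrated with a short sketch. The function name `shared_word_dots` is hypothetical, and the sketch assumes exact matches on the forward strand only; the legend does not say whether reverse-complement words were also plotted.

```python
from collections import defaultdict


def shared_word_dots(seq_x, seq_y, k=12):
    """Return (x, y) start positions of every k-letter word shared by two sequences.

    Assumptions: exact matching, forward strand only, 0-based coordinates.
    """
    # Index every k-mer of the X-axis sequence by its start positions.
    index = defaultdict(list)
    for i in range(len(seq_x) - k + 1):
        index[seq_x[i:i + k]].append(i)

    # Look up each k-mer of the Y-axis sequence; each hit is one dot.
    dots = []
    for j in range(len(seq_y) - k + 1):
        for i in index.get(seq_y[j:j + k], ()):
            dots.append((i, j))
    return dots
```

Plotting the returned pairs as points (divided by 1,000 to get kilobases) would give a dot plot in which similar genomes show up as diagonal runs.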
Fig 4
Genome maps of the active phages and DGR-containing candidate prophages. Each genome is depicted as a black rectangular contour. ORFs encoded in three positive and three negative reading frames are shown as light gray bars. Regions of ORFs matching Pfam profiles of virus structural proteins, as well as proteins involved in virus DNA packaging and virion assembly, are highlighted in blue. Regions of ORFs matching the Pfam RVT_1 profile are highlighted in orange. The orange arrows point from a DGR template repeat to a DGR variable repeat location.
Fig 5
Nucleotide variation and adenine conservation in multiple sequence alignments of cognate phage genomes. Data for the phage Roos, Lagaffe_CP, and CP11 are presented above their genome maps (see Fig. 4 legend for details). Locations of variable regions are highlighted by light orange vertical lines. Nucleotide variation in an MSA column was calculated as a proportion of symbols different from the most frequent one and subsequently averaged using a 101-nt window sliding with a 20-nt step. Adenine conservation was estimated as the percent of conserved adenine columns among all conserved columns in a 101-nt sliding window. A conserved MSA column was defined as a column in which the most frequent symbol occupied ≥90% positions.
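The two per-column statistics defined in the Fig 5 legend can be sketched as below. The helper names are hypothetical, and two details are assumptions not stated in the legend: gap characters are counted like any other symbol, and a window with no conserved columns yields `None`. Window averaging is left out to keep the sketch short.

```python
from collections import Counter


def msa_columns(rows):
    """Transpose equal-length aligned sequences into column strings."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned sequences must be non-empty and of equal length")
    return ["".join(col) for col in zip(*rows)]


def column_variation(column):
    """Proportion of symbols that differ from the most frequent symbol."""
    top_count = Counter(column).most_common(1)[0][1]
    return (len(column) - top_count) / len(column)


def adenine_conservation(columns, threshold=0.9):
    """Percent of conserved columns whose most frequent symbol is 'A'.

    A column counts as conserved when its most frequent symbol occupies
    at least `threshold` of the positions (>= 90% in the legend).
    """
    conserved_symbols = []
    for column in columns:
        symbol, count = Counter(column).most_common(1)[0]
        if count / len(column) >= threshold:
            conserved_symbols.append(symbol)
    if not conserved_symbols:
        return None  # assumption: undefined when nothing is conserved
    return 100.0 * conserved_symbols.count("A") / len(conserved_symbols)
```

DGR mutagenesis targets adenines in the template repeat, which is why a drop in adenine conservation alongside a rise in variation is informative for locating variable regions.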
Fig 6
Nucleotide variation in sequencing reads mapped to phage genomes. Data for the four DGR-containing phages are shown above their genome maps (see Fig. 4 legend for details). Locations of variable regions are highlighted by light orange vertical lines. Information about each sequencing sample originating from the host culture of a phage is presented on a separate line. They are shown in black for the initial bacterial genome sequencing and in color for sequencing following prophage induction experiments (see legend for colors). Only samples with coverage depth ≥10 along ≥95% phage genome length are shown. The nucleotide variation in reads mapped to a genome position was estimated as a proportion of nucleotides different from the most frequent one and subsequently averaged using a 101-nt window sliding with a 20-nt step.
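The sample filter and the per-position variation measure from the Fig 6 legend can be sketched as follows. Both function names are hypothetical, and the input layout (a depth list per genome, a base-count dictionary per position) is an assumption made for illustration.

```python
def passes_coverage_filter(depths, min_depth=10, min_fraction=0.95):
    """True if depth >= min_depth along >= min_fraction of the genome length."""
    if not depths:
        return False
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths) >= min_fraction


def position_variation(base_counts):
    """Proportion of read bases that differ from the most frequent base.

    `base_counts` maps a nucleotide to its count at one genome position,
    e.g. {"A": 18, "G": 2}. Returns None for an uncovered position
    (an assumption; the legend does not address zero-depth positions).
    """
    total = sum(base_counts.values())
    if total == 0:
        return None
    return (total - max(base_counts.values())) / total
```

Under these definitions, a position covered by 18 reads with A and 2 with G has a variation of 0.1, and per-position values can then be averaged in 101-nt windows with a 20-nt step.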
Fig 7
Sequence similarities of the DGR target proteins. Pairwise comparisons between the DGR target proteins and (A) themselves and (B) selected PDB and Pfam database entries. Each comparison is presented in a frame where the X-axis and Y-axis coordinates correspond to query and target amino acid residues, respectively. Query and target variable regions are designated by orange vertical and horizontal lines, respectively. HHalign alignment paths are shown by gray lines, and the darkness of each line indicates the Probability value assigned to the respective alignment. Panel A alignments are between individual sequences. Panel B alignments are between HHpred-generated query profiles and database entries: DUF6273, Pfam domain of unknown function; Big_3_3, Pfam bacterial immunoglobulin-like domain; 6HHK_A, a profile based on the PDB structure of the Listeria phage A511 tail fiber protein gp105; and 1YU0_A, a profile based on the PDB structure of the Bordetella phage BPP-1 major tropism determinant.
Fig 8
Phage ecology. (A) Predicted hosts of the phages. Host UHGG species-level clusters are specified. Hosts were assigned based on the phage detection source in our study (gray square), phage detection in a UHGG isolate (cross), and phage match with a UHGG isolate CRISPR spacer (black circle). (B) Prevalence of the phages in the gut metagenomes of Lifelines-DEEP and 1000IBD cohort participants. (C) Prevalence and (D) relative abundance of the genus Faecalibacterium in the gut metagenomes of Lifelines-DEEP and 1000IBD cohort participants. P-values in panels B and C are corrected for multiple testing using the Benjamini–Hochberg method. * P < 0.05, ** P < 0.01, and *** P < 0.001.
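The Benjamini–Hochberg correction and the star notation used in the Fig 8 legend can be sketched in a few lines. This is the standard step-up adjustment written from scratch for illustration; the function names are hypothetical, and the authors' actual software is not stated here.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significance_stars(p):
    """Map an adjusted p-value to the star notation of the legend."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return ""
```

For example, raw p-values of 0.01, 0.04, 0.03, and 0.005 adjust to roughly 0.02, 0.04, 0.04, and 0.02, so all four would carry a single star.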
