Proc Natl Acad Sci U S A. 2012 Mar 6;109(10):3962-6.
doi: 10.1073/pnas.1119061109. Epub 2012 Feb 21.

Hypervariable loci in the human gut virome

Samuel Minot et al. Proc Natl Acad Sci U S A. 2012.

Abstract

Genetic variation is critical in microbial immune evasion and drug resistance, but variation has rarely been studied in complex heterogeneous communities such as the human microbiome. To begin to study natural variation, we analyzed DNA viruses present in the lower gastrointestinal tract of 12 human volunteers by determining 48 billion bases of viral DNA sequence. Viral genomes mostly showed low variation, but 51 loci of ∼100 bp showed extremely high variation, so that up to 96% of the viral genomes encoded unique amino acid sequences. Some hotspots of hypervariation were in genes homologous to the bacteriophage BPP-1 viral tail-fiber gene, which is known to be hypermutagenized by a unique reverse-transcriptase (RT)-based mechanism. Unexpectedly, other hypervariable loci in our data were in previously undescribed gene types, including genes encoding predicted Ig-superfamily proteins. Most of the hypervariable loci were linked to genes encoding RTs of a single clade, which we find is the most abundant clade among gut viruses but only a minor component of bacterial RT populations. Hypervariation was targeted to 5'-AAY-3' asparagine codons, which allows maximal chemical diversification of the encoded amino acids while avoiding formation of stop codons. These findings document widespread targeted hypervariation in the human gut virome, identify previously undescribed types of genes targeted for hypervariation, clarify association with RT gene clades, and motivate studies of hypervariation in the full human microbiome.
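The statement about 5'-AAY-3' asparagine codons can be checked against the standard genetic code. The sketch below assumes, purely for illustration, that each adenine in a codon may be replaced by any of the four bases; the exact substitution spectrum is not given in the abstract. The names `adenine_variants` and `encoded_amino_acids` are hypothetical helpers, not part of the study's methods.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)
}


def adenine_variants(codon):
    """Return every codon reachable by substituting any base at each A.

    Assumption for illustration: non-adenine positions are left unchanged
    and each adenine can become T, C, A, or G.
    """
    options = [BASES if base == "A" else base for base in codon]
    return ["".join(p) for p in product(*options)]


def encoded_amino_acids(codon):
    """Return the set of amino acids (or '*' for stop) encoded by the variants."""
    return {CODON_TABLE[v] for v in adenine_variants(codon)}


if __name__ == "__main__":
    for codon in ("AAC", "AAT", "AAA", "AAG"):
        products = encoded_amino_acids(codon)
        print(codon, len(products - {"*"}), "amino acids;",
              "stop reachable" if "*" in products else "no stop codon")
```

Under this assumption, AAC and AAT each reach 15 distinct amino acids without ever producing a stop codon, because all three stop codons end in A or G. The lysine codons AAA and AAG, by contrast, can mutate into stop codons.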

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Assembly, functional assessment, and identification of viral sequences. (A) Summary of contigs assembled from viral sequences. Each contig is shown as a point, the length is shown on the x axis and the depth of reads mapped to each contig on the y axis. Circular contigs are shown in red. (B) Assignment of gene functions from viral contigs using the Pfam database of protein families; assignment of Pfam domains to viral functions is described in ref. . The proportion of sequences that were assigned for each function is indicated on the y axis. On average, 21% of each sample was assigned to a Pfam protein family. (C) The five viral RefSeq genomes with the most similarity to sequences generated in this study. Vertical lines indicate the depth of sequencing at that position, and colored lines indicate mismatches with the reference sequence. The range of coverage is noted to the left of each plot. Blue boxes below each genome indicate annotated genes.
Fig. 2.
RT-associated hypervariable regions from the human gut virome. (A) Hypervariation in a gene predicted to encode a protein with an MTD-like C-type lectin fold. (B) Hypervariation in a gene predicted to encode an Ig-superfamily fold. In A and B, the Upper section shows the contig of origin, with gray vertical lines showing sequencing depth and boxes showing annotated proteins. The indicated area is expanded below to show the TR, the corresponding VR, RT, and the ORF that contains the targeted VR. The inferred direction of information transfer between the TR and VR is shown with an arrow. The Lower section of each plot shows an alignment of the sequences spanning the TR and VR for each element (white space indicates gaps between reads). Above the VR sequence is a barplot indicating the proportion of bases in the VR that differ from the consensus base in the TR. DNA bases are indicated by colors as indicated on the sides of the panels.
Fig. 3.
Characteristics of RT-associated hypervariation in the gut virome. (A) Heatmap showing the relationship of positions in the TR (y axis) to the resulting nucleotides in the VR (x axis). Of 15,447 mutated bases, 14,930 (97%) are located at adenine-positions relative to the TR. (B) Amino acid substitution heatmap showing the relationship of codons in the TR (y axis) to the resulting codons in the VR (x axis). Of 11,462 mutated codons, 9,212 (80%) are located at asparagine (N) codons in the TR.
Fig. 4.
RT sequences found in DNA viruses of the human gut. (A) Phylogenetic tree of RT sequences. Each sequence was aligned to a position-specific scoring matrix to construct a multiple sequence alignment. The tree was constructed using the maximum-likelihood method. Green circles indicate RT sequences on viral contigs from this dataset that contain hypervariable regions and TR/VR pairs. Purple circles indicate other RT sequences from this dataset; the remaining leaves indicate reference sequences from the NCBI. RT clades were adapted from refs. and , and are indicated by gray lines. The bootstrap support of internal nodes is indicated by the color of internal branches as described in the key. Clades are marked according to refs. and : Abi, abortive-phage-infection; DGR, diversity-generating retroelements; G2L, group II intron-like families; Hpdn, hepadnaviruses; LTR, LTR retrotransposons and retroviruses; NLTR, non-LTR retrotransposons; PLE, Penelope-like elements; Rpls, retroplasmid; Telo, telomerase; Unk, unknown families (19). The scale bar indicates the log-corrected distance metric used by FastTree, adapted from BLOSUM45. Distances range from 0, indicating a perfect match, to 3, indicating no overlap. (Scale bar, 1.0.) (B) Relative proportions of RTs in viruses studied here, the RefSeq phage genome database, and the RefSeq bacterial genome database.
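The tally summarized in Fig. 3A (mutated bases grouped by the template base in the TR) can be sketched as follows. This is a minimal illustration that assumes a TR and a VR sequence already aligned to equal length, with `-` marking gaps; the function names and example sequences are hypothetical, and this is not the authors' pipeline.

```python
from collections import Counter


def tally_substitutions(tr, vr):
    """Count (TR base, VR base) pairs at mismatched, ungapped positions.

    Assumes `tr` and `vr` are pre-aligned strings of equal length.
    """
    if len(tr) != len(vr):
        raise ValueError("TR and VR must be aligned to the same length")
    return Counter(
        (t, v) for t, v in zip(tr, vr) if t != v and "-" not in (t, v)
    )


def fraction_at_adenine(counts):
    """Fraction of all counted substitutions whose TR base is adenine."""
    total = sum(counts.values())
    at_adenine = sum(n for (t, _), n in counts.items() if t == "A")
    return at_adenine / total if total else 0.0


if __name__ == "__main__":
    # Made-up sequences for illustration only.
    template = "AACGAAT"
    variable = "GTCTCAT"
    counts = tally_substitutions(template, variable)
    print(dict(counts))
    print(fraction_at_adenine(counts))
```

Applied to the counts reported in the caption, the same ratio is 14,930 / 15,447, which rounds to the stated 97%.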

References

    1. Craig NL, Craigie R, Gellert M, Lambowitz AM. Mobile DNA II. Washington, DC: ASM; 2002.
    2. Bushman FD. Lateral DNA Transfer: Mechanisms and Consequences. Cold Spring Harbor, NY: Cold Spring Harbor Lab Press; 2001.
    3. McMahon SA, et al. The C-type lectin fold as an evolutionary solution for massive sequence variation. Nat Struct Mol Biol. 2005;12:886–892.
    4. Miller JL, et al. Selective ligand recognition by a diversity-generating retroelement variable protein. PLoS Biol. 2008;6:e131.
    5. Dai W, et al. Three-dimensional structure of tropism-switching Bordetella bacteriophage. Proc Natl Acad Sci USA. 2010;107:4347–4352.
