J Virol. 2015 Aug 1;89(15):7673-7695.
doi: 10.1128/JVI.00578-15. Epub 2015 May 13.

High-throughput analysis of human cytomegalovirus genome diversity highlights the widespread occurrence of gene-disrupting mutations and pervasive recombination


Steven Sijmons et al. J Virol.

Abstract

Human cytomegalovirus is a widespread pathogen of major medical importance. It causes significant morbidity and mortality in the immunocompromised, and congenital infections can result in severe disabilities or stillbirth. Development of a vaccine is prioritized, but no candidate is close to release. Although correlations of viral genetic variability with pathogenicity are suspected, knowledge about strain diversity of the 235-kb genome is still limited. In this study, 96 full-length human cytomegalovirus genomes from clinical isolates were characterized, quadrupling the available information for full-genome analysis. These data provide the first high-resolution map of human cytomegalovirus interhost diversity and evolution. We show that cytomegalovirus is significantly more divergent than all other human herpesviruses and highlight hotspots of diversity in the genome. Importantly, 75% of strains are not genetically intact but contain disruptive mutations in a diverse set of 26 genes, including the immunomodulative genes UL40 and UL111A. These mutants are independent of culture passaging artifacts and circulate in natural populations. Pervasive recombination, which is linked to the widespread occurrence of multiple infections, was found throughout the genome. Recombination density was significantly higher than in other human herpesviruses and correlated with strain diversity. While the overall effects of strong purifying selection on virus evolution are apparent, evidence of diversifying selection was found in several genes encoding proteins that interact with the host immune system, including UL18, UL40, UL142, and UL147. These residues may present phylogenetic signatures of past and ongoing virus-host interactions.

Importance: Human cytomegalovirus has the largest genome of all viruses that infect humans. Currently, there is great interest in establishing associations between genetic variants and strain pathogenicity of this herpesvirus. Since the number of publicly available full-genome sequences is limited, knowledge about strain diversity is highly fragmented and biased towards a small set of loci. Combined with our previous work, we have now contributed 101 complete genome sequences. We have used these data to conduct the first high-resolution analysis of interhost genome diversity, providing an unbiased and comprehensive overview of cytomegalovirus variability. These data are of major value to the development of novel antivirals and a vaccine and to the identification of potential targets for genotype-phenotype experiments. Furthermore, they have enabled a thorough study of the evolutionary processes that have shaped cytomegalovirus diversity.

Figures

FIG 1
Diversity and evolution of the HCMV genome. Shown is an overview of genetic diversity and evolutionary pressure along the HCMV genome. The genome is divided into four panels. Each panel consists of four separate tracks. In the top track, nucleotide diversity is calculated in a sliding window of 500 nt with a step size of 100 nt. As only ungapped residues are included in each window, the distance between two data points may vary in areas with many indels. In the second track, the extents of recombination and positive selection are assessed for each gene. Displayed above the center of the appropriate gene (bottom track), dark and light blue bars represent recombination breakpoint density and the percentage of codons under positive selection, respectively. For optimal resolution, values were cut off at 10 breakpoints (brp)/kb and 4% codons under positive selection. Green and red bars in the third track indicate the genome positions of conserved and variable tandem repeats. The bottom track annotates genes and other genome elements in four layers. The first two layers show genes carried on the forward strand, and the last two layers show genes carried on the reverse strand. Spliced exons are connected with thin black lines. Genomic inverted repeats (TRL, IRL/IRS, and TRS) and long noncoding RNAs are represented in black; genes are colored on a scale from green to red, indicating the frequency of ORF-disrupting mutations in separate clinical isolates.
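The sliding-window calculation described for the top track of FIG 1 can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names are hypothetical, and it assumes that nucleotide diversity is the mean number of pairwise differences per site and that a window spans 500 ungapped alignment columns, which is one reading of why the spacing between data points varies in indel-rich regions.

```python
from itertools import combinations


def ungapped_columns(alignment):
    """Indices of alignment columns where every sequence has A, C, G, or T."""
    length = len(alignment[0])
    return [i for i in range(length) if all(seq[i] in "ACGT" for seq in alignment)]


def nucleotide_diversity(alignment, columns):
    """Mean pairwise differences per site over the given columns."""
    if not columns or len(alignment) < 2:
        return 0.0
    pairs = list(combinations(alignment, 2))
    diffs = sum(sum(a[i] != b[i] for i in columns) for a, b in pairs)
    return diffs / (len(pairs) * len(columns))


def sliding_diversity(alignment, window=500, step=100):
    """Return (window midpoint in alignment coordinates, diversity) pairs.

    Assumption: windows are counted in ungapped columns, so midpoints are
    unevenly spaced wherever gapped columns were skipped.
    """
    cols = ungapped_columns(alignment)
    if not cols:
        return []
    results = []
    for start in range(0, max(len(cols) - window, 0) + 1, step):
        chunk = cols[start:start + window]
        midpoint = (chunk[0] + chunk[-1]) / 2
        results.append((midpoint, nucleotide_diversity(alignment, chunk)))
    return results
```

With a toy alignment such as `["ACGTAC", "ACGTAC", "AC-TAT"]`, the gapped third column is skipped before any window is formed.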
FIG 2
Variability, recombination, and selection in HCMV gene families. Gene diversity (dN), recombination breakpoint density, and the percentages of codons under positive and negative selection are indicated for HCMV genes within gene families; each dot represents a gene. Only genes belonging to specific gene families are represented. Group averages are designated with horizontal lines.
FIG 3
Tandem repeats (TRs) in the HCMV genome. TRs in reference strain Merlin were identified, and orthologous repeats were searched for in a data set of 124 complete HCMV genomes. Only TRs with orthologs in >50% of strains were included in the analysis. (A) TR nucleotide content (9,008 nt) in coding and noncoding (intergenic, intron, and ncRNA) regions compared to the distribution of total nucleotides (231,784 nt) over these regions. (B) Similar to panel A, where TR and total nucleotide distributions over unique long (UL), unique short (US), and internal repeat long and short (IRL-IRS) genome regions are compared. Percentages in panels A and B do not add up to 100% because of rounding errors. (C) Conservation of TRs between coding and noncoding TRs and between different TR types (homopolymers, microsatellites, and minisatellites). TRs were reported to be conserved if >50% of strains had identical repeat sequences and copy numbers.
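The conservation rule stated in panel C of FIG 3 (a TR counts as conserved if more than 50% of strains share an identical repeat sequence and copy number) can be sketched as follows. The function name and the input layout are hypothetical, and the sketch assumes the denominator is the set of strains supplied to it; the caption does not say whether strains lacking an ortholog are counted.

```python
from collections import Counter


def is_conserved(repeats, threshold=0.5):
    """Decide whether a tandem repeat is conserved across strains.

    repeats: one (repeat_sequence, copy_number) tuple per strain.
    Returns True if the most common (sequence, copy number) combination
    occurs in strictly more than `threshold` of the strains.
    """
    if not repeats:
        return False
    _, count = Counter(repeats).most_common(1)[0]
    return count / len(repeats) > threshold
```

For example, three strains carrying `("CA", 5)` and two carrying `("CA", 6)` give a majority of 60%, so the repeat would be reported as conserved; an even split would not.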
FIG 4
Distribution of ORF-disrupting mutations. Shown is a graphical representation of the distribution of ORF-disrupting mutations over 124 clinical isolates. Rows represent different isolates, and columns represent all 26 genes containing disruptive mutations. Disrupted genes are represented in red, and intact genes are represented in green.
FIG 5
ORF-disrupting mutations in the UL111A gene. Shown is a nucleotide alignment of wild-type UL111A (strain Merlin) and all 12 mutants. Countries of isolation are listed for all strains with the international two-letter code (GB, Great Britain; BE, Belgium; CZ, Czech Republic; DE, Germany). Mutations (deletions, insertions, and substitutions) are highlighted in red, and the predicted stop codons are underlined, with untranslated sequences after stop codons being crossed out. Introns have a gray background, unless they are aberrantly translated because of the deletion of splice donor sites. LAcmvIL-10 transcripts are similar, but the second intron is not spliced, with translation proceeding into it.
FIG 6
Widespread recombination between HCMV strains. Recombination between separate HCMV strains was analyzed. (A) Neighbor-net split network of 124 full-genome sequences showing numerous reticulate connections that are indicative of recombination. The Phi-test for recombination gave strong statistical evidence for recombination. Countries of isolation of different strains are represented with the international two-letter country codes at the beginning of strain names (CN, China; KR, South Korea; IT, Italy; US, United States; other codes are defined in the legend to Fig. 5). Asian strains JHC and HAN are highlighted with a gray background. (B) BootScan analysis of 9 strains highlighted in the split network. Strain BE/25/2010 (highlighted in black) was used as a reference strain.
FIG 7
Diversity and recombination in HSV-1, VZV, and EBV. Using the full-genome sequences listed in Table S1 in the supplemental material, neighbor-net split networks were constructed for HSV-1, VZV, and EBV strains. In all three cases, statistically significant evidence for recombination was detected. Countries of isolation of different strains are represented with the international two-letter country codes at the beginning of strain names (KE, Kenya; JP, Japan; HK, Hong Kong; NG, Nigeria; GH, Ghana; CA, Canada; RU, Russia; NL, the Netherlands; MA, Morocco; MX, Mexico; other codes are defined in the legends to Fig. 5 and Fig. 6). In all three networks, distinct clusters are recognizable. Strains chosen for recombination analysis with RDP3 are underlined.
FIG 8
The majority of HCMV genes are under strong purifying selection. The selection mode acting on genes is represented by calculation of dN/dS ratios. A ratio close to zero indicates strong negative/purifying selection, and a ratio close to 1 indicates neutral selection or genetic drift. A ratio significantly higher than 1 indicates positive/diversifying selection. Genes are binned in groups with similar dN/dS ratios in steps of 0.1.
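The binning of genes by dN/dS ratio in steps of 0.1, as described for FIG 8, might look like the sketch below. The function name is hypothetical, any gene names and ratios passed to it are illustrative rather than taken from the paper, and the sketch assumes each bin includes its lower bound. It only counts genes per bin and does not test whether a ratio differs significantly from 1.

```python
import math
from collections import Counter


def bin_dnds(ratios, step=0.1):
    """Count genes per dN/dS bin.

    ratios: mapping of gene name to dN/dS ratio.
    Returns a dict mapping each bin's lower bound to the number of genes in it.
    """
    counts = Counter()
    for ratio in ratios.values():
        # A small epsilon guards against float artifacts such as 0.3 / 0.1 < 3.
        index = math.floor(ratio / step + 1e-9)
        counts[round(index * step, 10)] += 1
    return dict(sorted(counts.items()))
```

Under the interpretation given in the caption, bins near 0 would collect genes under strong purifying selection, and bins near 1 would collect genes evolving close to neutrality.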
FIG 9
Residues under diversifying selection in pUL18. Codons under positive/diversifying selection in the UL18 gene were determined with the SLAC, FEL, MEME, and FUBAR algorithms of the HyPhy package. Sites that showed significant evidence of positive selection by at least two of the four methods are represented in red on the protein structure of pUL18 (green). The structure shows a complex of pUL18 (a viral MHC-I homolog), human β-2-microglobulin (blue) (an MHC-I light chain), and an actin peptide (pink) bound to the inhibitory immunoglobulin receptor LIR-1 (yellow) (136). The three-dimensional structure is visualized from two opposite angles. All selected residues are located at the surface of pUL18.
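The "at least two of four methods" criterion used for FIG 9 amounts to a simple vote across per-method site lists. The sketch below assumes each method's significant codon positions have already been extracted into plain lists; the function name and any positions passed to it are hypothetical, and no HyPhy output format or API is implied.

```python
def consensus_sites(calls, min_methods=2):
    """Return sorted codon positions called by at least `min_methods` methods.

    calls: mapping of method name to an iterable of significant codon positions.
    """
    counts = {}
    for sites in calls.values():
        # set() ensures a method votes at most once per site.
        for site in set(sites):
            counts[site] = counts.get(site, 0) + 1
    return sorted(site for site, n in counts.items() if n >= min_methods)
```

Requiring agreement between methods trades some sensitivity for fewer false positives, since a site flagged by a single algorithm is dropped.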

References

    1. Cannon MJ, Schmid DS, Hyde TB. 2010. Review of cytomegalovirus seroprevalence and demographic characteristics associated with infection. Rev Med Virol 20:202–213. doi:10.1002/rmv.655.
    2. Sinclair J, Reeves M. 2014. The intimate relationship between human cytomegalovirus and the dendritic cell lineage. Front Microbiol 5:389. doi:10.3389/fmicb.2014.00389.
    3. Boeckh M, Geballe AP. 2011. Cytomegalovirus: pathogen, paradigm, and puzzle. J Clin Invest 121:1673–1680. doi:10.1172/JCI45449.
    4. Manicklal S, Emery VC, Lazzarotto T, Boppana SB, Gupta RK. 2013. The “silent” global burden of congenital cytomegalovirus. Clin Microbiol Rev 26:86–102. doi:10.1128/CMR.00062-12.
    5. Arvin AM, Fast P, Myers M, Plotkin S, Rabinovich R, National Vaccine Advisory Committee. 2004. Vaccine development to prevent cytomegalovirus disease: report from the National Vaccine Advisory Committee. Clin Infect Dis 39:233–239. doi:10.1086/421999.