PLoS One. 2016 Nov 23;11(11):e0167089.

doi: 10.1371/journal.pone.0167089. eCollection 2016.

Synonymous Co-Variation across the E1/E2 Gene Junction of Hepatitis C Virus Defines Virion Fitness

Brendan A Palmer¹, Liam J Fanning¹

PMID: 27880830
PMCID: PMC5120871
DOI: 10.1371/journal.pone.0167089

Brendan A Palmer et al. PLoS One. 2016.

Affiliation

¹ Molecular Virology Diagnostic & Research Laboratory, Department of Medicine, University College Cork, Cork, Ireland.


Abstract

Hepatitis C virus (HCV) is a positive-sense single-stranded RNA virus. The gene junction partitioning the viral glycoproteins E1 and E2 displays concurrent sequence evolution, with the 3'-end of E1 highly conserved and the 5'-end of E2 highly heterogeneous. This gene junction is also believed to contain structured RNA elements, with a growing body of evidence suggesting that such structures can act as an additional level of viral replication and transcriptional control. We have previously used ultradeep pyrosequencing to analyze an amplicon library spanning the E1/E2 gene junction from a treatment-naïve patient, with samples collected over 10 years of chronic HCV infection. During this timeframe, maintenance of an in-frame insertion, recombination and humoral immune targeting of discrete virus sub-populations were reported. In the current study, we present evidence of epistatic evolution across the E1/E2 gene junction and observe the development of co-varying networks of codons set against a background of a complex virome with periodic shifts in population dominance. Over time, the number of codons actively mutating decreases for all virus groupings. We identify strong synonymous co-variation between codon sites in a group of sequences harbouring a 3 bp in-frame insertion and propose that synonymous mutation acts to stabilize the RNA structural backbone.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Within-lineage codon fixation during the study timeframe of the L1a, L1b, L1c and L2 sequence subsets.**
Left panel: For the purposes of this analysis, a site is designated as fixed (black) when a single codon accounts for all the sequences in that sample and in all subsequent samples. For all sequence subsets, the proportion of codon sites actively mutating decreased over time across the length of the amplicon, including the HVR1. Notably, just 3/27 L1b HVR1 codon sites displayed ongoing codon switching events post sample 7. L2 contained the highest proportion of sites that were invariant throughout the sampling timeframe. In spite of the sample space expansion of L2 between samples 8–10, the number of fixed codon sites increased overall. The E1/E2 gene junction and the last codon of the HVR1 are identified by a solid red line and a dashed red line, respectively. Samples with absent or insufficient sequence data (less than two unique sequences) are shaded as grey horizontal bars. Tick marks along the X-axis identify each codon position of the amplicon sequence. Right panel: The sample specific frequency of each (sub-)lineage.
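
The fixation rule in the Fig 1 legend can be illustrated with a minimal sketch. The function name `fixation_sample` and the input layout (one list of observed codons per sample, in chronological order) are assumptions made for illustration and are not taken from the paper; samples with no sequences are simply treated as not fixed here, whereas the figure shades them grey.

```python
from typing import List, Optional


def fixation_sample(codons_by_sample: List[List[str]]) -> Optional[int]:
    """Return the index of the first sample from which a site is fixed.

    codons_by_sample[i] lists the codon seen at one site in every sequence
    of sample i (hypothetical layout). Returns None if the site is still
    variable in the final sample.
    """
    fixed_from = None
    later_codon = None
    # Walk backwards from the last sample: the site remains "fixed" only
    # while each sample carries exactly one codon, and it is the same codon
    # as in every later sample.
    for i in range(len(codons_by_sample) - 1, -1, -1):
        observed = set(codons_by_sample[i])
        if len(observed) != 1:
            break
        codon = next(iter(observed))
        if later_codon is not None and codon != later_codon:
            break
        later_codon = codon
        fixed_from = i
    return fixed_from


# Illustrative data only: the site is mixed in sample 0, then fixed from sample 1.
site = [["GCA", "GCG"], ["GCA", "GCA"], ["GCA"]]
print(fixation_sample(site))
```

Scanning from the last sample backwards keeps the "and all subsequent samples" condition explicit: a site that becomes monomorphic and later switches codon is only counted as fixed from the switch onward.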

**Fig 2. Effective number of codons utilized by each HCV (sub-)lineage over time.**
Values below the threshold of 40 (dashed line) are taken to indicate biased utilization of the available redundancy within the genetic code.
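
The legend does not state which estimator was used for the effective number of codons. As a rough illustration only, the sketch below assumes Wright's (1990) formulation, in which codon homozygosity F is averaged within each degeneracy class and Nc = 2 + 9/F2 + 1/F3 + 5/F4 + 3/F6. The function name and the decision to raise an error when a degeneracy class is unobserved are simplifications of this sketch, not the authors' method.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def effective_number_of_codons(seq: str) -> float:
    """Wright-style Nc for an in-frame coding sequence (illustrative sketch)."""
    seq = seq.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    counts = Counter(seq[i:i + 3] for i in range(0, usable, 3))

    # Number of synonymous codons per amino acid (stop codons excluded).
    degeneracy = Counter(aa for aa in CODON_TABLE.values() if aa != "*")

    # Group observed codon counts by amino acid.
    by_aa = defaultdict(list)
    for codon, n in counts.items():
        aa = CODON_TABLE.get(codon)
        if aa is not None and aa != "*":
            by_aa[aa].append(n)

    # Codon homozygosity F per amino acid, collected by degeneracy class.
    f_by_class = defaultdict(list)
    for aa, ns in by_aa.items():
        k, n = degeneracy[aa], sum(ns)
        if k == 1 or n < 2:
            continue  # Met/Trp carry no information; F is undefined for n < 2
        sum_p2 = sum((x / n) ** 2 for x in ns)
        f_by_class[k].append((n * sum_p2 - 1) / (n - 1))

    nc = 2.0  # contribution of the two single-codon amino acids
    for k, n_amino_acids in ((2, 9), (3, 1), (4, 5), (6, 3)):
        fs = f_by_class.get(k)
        if not fs or sum(fs) <= 0:
            raise ValueError(f"no usable data for {k}-fold degenerate class")
        nc += n_amino_acids / (sum(fs) / len(fs))
    return min(nc, 61.0)  # Nc cannot exceed the 61 sense codons
```

Under this formulation, a sequence that uses a single codon for every amino acid gives the minimum value of 20, while even usage of all sense codons approaches 61; the threshold of 40 in the legend sits between these extremes.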

**Fig 3. Pronounced epistasis was evident for both L1a and L1b sequence sets covering 10 years of continuous adaptive evolution of the E1/E2 gene junction.**
Nodes (representing codon positions) within the graph are connected by an edge if the probability of a change detected simultaneously at both sites was statistically significant (p-value < 0.01). (A) L1a epistasis was highly ordered, with the majority of significantly linked sites participating in a single large connected component. (B) Epistasis within the L1b sequence set was observed among a greater number of codon sites overall. The majority of sites identified for L1b exclusively underwent synonymous mutation. (C) Two sites were observed in L2 sequences that were below the significance threshold. White nodes define codons within the E1 coding sequence, while grey nodes identify E2 codons. Sites containing nonsynonymous mutations are identified by black numbers, while sites exclusively undergoing synonymous mutation are given by red numbers. Nodes are numbered in accordance with the amino acid positions of the H77 reference genome (GenBank accession: AF009606).
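
The graph construction described in the Fig 3 legend (an edge between two codon sites when p < 0.01, then grouping into connected components) can be sketched as follows. The function name, the dictionary input and the p-values are hypothetical; the statistical test that produces the p-values is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def covariation_components(
    pair_pvalues: Dict[Tuple[int, int], float], alpha: float = 0.01
) -> List[Set[int]]:
    """Group codon sites into connected components of significant pairs."""
    # Build an undirected graph: one edge per pair with p < alpha.
    adjacency = defaultdict(set)
    for (site_a, site_b), p in pair_pvalues.items():
        if p < alpha:
            adjacency[site_a].add(site_b)
            adjacency[site_b].add(site_a)

    # Depth-first search from each unvisited site collects one component.
    seen: Set[int] = set()
    components: List[Set[int]] = []
    for start in sorted(adjacency):
        if start in seen:
            continue
        component: Set[int] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    # Largest component first, as in "a single large connected component".
    return sorted(components, key=len, reverse=True)


# Made-up p-values for arbitrary site labels, for illustration only.
example = {(10, 11): 0.001, (11, 12): 0.005, (20, 21): 0.002, (30, 31): 0.2}
print(covariation_components(example))
```

Sites whose only pairings fall at or above the threshold never enter the graph, which mirrors the legend's restriction of nodes to significantly linked positions.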

**Fig 4. Bipartite mapping of co-evolving sites to the amplicon.**
Top panel: Co-evolving pairs are identified as nonsynonymous-nonsynonymous, synonymous-synonymous or a combination of one site mutating nonsynonymously and one site mutating synonymously. Bottom panel: The combined odds of a mutation co-varying with a mutation at a second site are given by the colored scale bar. Co-varying pairs represented by grey bars have combined odds <0.6, which indicates that, for a given sequence set, one of the two sites has a greater observed mutational flexibility than the other. Raw data counts, individual odds and combined odds are provided in S4 and S5 Tables for reference.

**Fig 5. Sub-division of L1a and L1b sequences by HVR1 motif defined sequence stability.**
Significant co-variation between sites was determined separately for L1a sequences and L1b sequences split by HVR1 motif. White nodes define codons within the E1 coding sequence, while grey nodes identify E2 codons. Sites containing nonsynonymous mutations are identified by black numbers, while sites exclusively undergoing synonymous mutation are given by red numbers. Nodes are numbered in accordance with the amino acid positions of the H77 reference genome (GenBank accession: AF009606).

**Fig 6. Mapping of the L1b sequence set in the context of HVR1 motif groups.**
All trees have been rooted using the nearest detectable ancestor to the insertion event, GQ985348 [red square, 17]. The sample specific isolation of each sequence is defined by the color legend. (A) The L1b sequence subset was split into two groups, A (black squares) and B (open squares), which were defined by the constituent HVR1 amino acid motifs. (B) Colored nodes identify those sequences coding for HVR1 motifs that have previously been associated with IgG-bound virions [7]. A single IgG-associated motif from group A was isolated from sample 1 (red circle). An additional IgG-associated motif for group B was isolated from sample 6 (grey circle), four years after the motif was first detected in whole patient serum. A phylogenetically diverse branch of L1b group A sequences not subject to detectable IgG binding is indicated by red branches. (C) Group A sequences not subject to IgG-targeting exhibited the greatest sequence diversity. Nevertheless, this subpopulation collapsed post-sample 5. Bootstrap values >80 of 1000 resamplings are shown. The genetic distance is shown as a bracketed scale bar.

**Fig 7. Putative reorganization of conserved structures within the HCV genome over time.**
The local RNA structure for codons 357–370 was modeled using example sequences from L1a. (A) L1a dominant sequence motif from sample 1. (B) L1a dominant sequence motif from sample 8. Whereas the mutations observed led to an overall decrease in the minimum free energy of the structure, it is the cumulative contribution of mutations across the length of the amplicon that determines overall stability. The predicted ΔG for the full-length amplicon was -93.94 and -100.4 for the dominant sequence motifs from samples 1 and 8, respectively. Synonymous mutations at codons 363, 366 and 367 are shown in red.

References

1. Smith DB, Bukh J, Kuiken C, Muerhoff AS, Rice CM, Stapleton JT, et al. Expanded classification of hepatitis C virus into 7 genotypes and 67 subtypes: updated criteria and genotype assignment web resource. Hepatology. 2014;59(1):318–27. doi: 10.1002/hep.26744
2. Simmonds P, Bukh J, Combet C, Deleage G, Enomoto N, Feinstone S, et al. Consensus proposals for a unified system of nomenclature of hepatitis C virus genotypes. Hepatology. 2005;42(4):962–73. doi: 10.1002/hep.20819
3. Steinhauer DA, Domingo E, Holland JJ. Lack of evidence for proofreading mechanisms associated with an RNA virus polymerase. Gene. 1992;122(2):281–8.
4. Duffy S, Shackelton LA, Holmes EC. Rates of evolutionary change in viruses: patterns and determinants. Nat Rev Genet. 2008;9(4):267–76. doi: 10.1038/nrg2323
5. Geller R, Estada U, Peris JB, Andreu I, Bou JV, Garijo R, et al. Highly heterogeneous mutation rates in the hepatitis C virus genome. Nat Microbiol. 2016;1(7):16045. doi: 10.1038/nmicrobiol.2016.45
