PLoS Genet. 2022 May 2;18(5):e1010179.
doi: 10.1371/journal.pgen.1010179. eCollection 2022 May.

Assessing in vivo mutation frequencies and creating a high-resolution genome-wide map of fitness costs of Hepatitis C virus

Kaho H Tisthammer et al. PLoS Genet.

Abstract

Like many viruses, Hepatitis C Virus (HCV) has a high mutation rate, which helps the virus adapt quickly, but mutations come with fitness costs. Fitness costs can be studied by different approaches, such as experimental or frequency-based approaches. The frequency-based approach is particularly useful for estimating in vivo fitness costs, but it works best with deep sequencing data from many hosts. In this study, we applied the frequency-based approach to a large dataset of 195 patients and estimated the fitness costs of mutations at 7957 sites along the HCV genome. We used beta regression and random forest models to better understand how different factors influenced fitness costs. Our results revealed that costs of nonsynonymous mutations were three times higher than those of synonymous mutations, and mutations at nucleotides A or T had higher costs than those at C or G. Genome location had a modest effect, with lower costs for mutations in HVR1 and higher costs for mutations in Core and NS5B. Resistance mutations were, on average, costlier than other mutations. Our results show that in vivo fitness costs of mutations can be site and virus specific, reinforcing the utility of constructing in vivo fitness cost maps of viral genomes.
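
As an illustration of the frequency-based idea, the sketch below assumes the standard mutation-selection balance approximation for a haploid population, in which a deleterious mutation settles at frequency f ≈ μ/s, so that s ≈ μ/f. The abstract does not spell out the estimator used in the study, so this is only a minimal sketch under that assumption. The function name, the mutation rate, the site labels, and the frequencies are all hypothetical.

```python
def selection_coefficient(mutation_rate, frequency, max_s=1.0):
    """Estimate a selection coefficient from an observed mutation frequency.

    Assumes mutation-selection balance (f ~= mu / s), hence s ~= mu / f.
    The estimate is capped at max_s, which also covers mutations that
    were never observed (frequency 0).
    """
    if mutation_rate <= 0:
        raise ValueError("mutation_rate must be positive")
    if frequency < 0:
        raise ValueError("frequency must be non-negative")
    if frequency == 0:
        return max_s  # never observed: treated here as maximally costly
    return min(mutation_rate / frequency, max_s)


# Hypothetical per-site mutation rate and average observed frequencies.
mu = 1e-5
observed = {"site_A": 0.005, "site_B": 0.0001, "site_C": 0.0}

costs = {site: selection_coefficient(mu, f) for site, f in observed.items()}
for site, s in costs.items():
    print(f"{site}: s = {s:.4g}")
```

Under this approximation, a mutation seen at a lower frequency is inferred to be costlier, which is why deep sequencing from many hosts matters: rare mutations need many reads and many populations to be measured reliably.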

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Estimated in vivo mutation frequencies in HCV1a, as determined by analysis of 195 viral populations.
(A) Box plots and average estimated frequencies (± 95% confidence intervals) for all, synonymous (Syn), non-synonymous (Nonsyn), and nonsense mutations, stratified by transitions (Ts) vs. transversions (Tvs). Transition mutation frequencies are much higher for every class of mutation. *** denotes statistical significance with adjusted P-values <0.001 by the Holm correction. (B) Genome-wide transition mutation frequencies, ordered by mutation frequency and colored by mutation type, show that synonymous mutations are more common than non-synonymous mutations in the HCV genome. (C) Transition mutation frequencies along the HCV genome, colored by mutation type, show that average mutation frequency is roughly consistent across the genome, with synonymous mutations more common than nonsynonymous mutations. Each dot represents the average mutation frequency at a nucleotide position across the 195 viral populations. The line represents the sliding window average over 100 bases. The regions with the highest (HVR1) and the lowest (Core) average mutation frequencies are highlighted in yellow.
Fig 2
Fig 2. Estimated transition mutation frequencies of HCV by gene.
Aggregated observed frequencies by gene and type of mutations; dots indicate the averages and the error bars represent the estimated 95% confidence intervals from 195 samples. *** denotes statistically significant difference (adjusted P<0.0001) between synonymous and nonsynonymous mutations by Mann-Whitney test.
Fig 3
Fig 3. Various factors that affect the frequency of mutations in HCV, based on analysis of 195 viral populations, each derived from a different patient.
(A) Predicted effects of ancestral nucleotide (T, C, or G, vs. A); CpG-creating status (vs. non-CpG-creating status); nonsynonymous (Nonsyn; vs. synonymous); amino acid-changing (AAChange; vs. non-amino acid-changing); presence in the Core, HVR1, E2, NS1, NS2, NS4A, and NS5B genes (vs. the NS3/NS4B/NS5A regions) on mutation frequencies in the HCV genome. Beta regression models were used to determine the effects of the different factors on mutation frequencies across the genome, and this figure reflects the results of the best-fit model based on AIC. (B) Estimated average transition mutation frequencies from the beta regression model (black dots with standard errors) and the actual observed frequencies from 195 patients infected with HCV (in colors). (C) Top 8 important features identified from the predictive random forest regression model on mutation frequencies (for all features tested, see S1 Table).
Fig 4
Fig 4. Estimated genome-wide selection coefficients (fitness costs) in the HCV genome.
(A) Selection coefficients (1/replication cycle) along the HCV genome, colored by mutation type (syn = synonymous; nonsynon = nonsynonymous); each dot represents the average at each position across 195 patient samples, and lines represent the sliding window average for 50 bases. (B) Selection coefficients (1/replication cycle) stratified by nucleotide and syn/nonsyn status, colored by mutation type. (C) Estimated mutation frequencies stratified by starting nucleotide and syn/nonsyn status, colored by mutation type. Comparison of (B) and (C) shows higher estimated selection coefficients at A and T sites than at C and G sites, even though mutation frequencies were higher at A and T sites compared to C and G sites.
Fig 5
Fig 5. Factors that affect the fitness cost (selection coefficient) in HCV, based on analysis of 195 viral populations, each derived from a different patient.
(A) Predicted effects from beta regression models on selection coefficients (SC) in the HCV genome, shown together with predicted effects on mutation frequencies (see Fig 3A for a description of the factors in the figure). (B) Top 8 important features identified from the predictive random forest regression model (for all features tested, see S1 Table). (C) Overlapping features ranked in the top 20 in the random forest regression models for mutation frequencies (MF) and selection coefficients (SC).
Fig 6
Fig 6. Occurrence and fitness costs of resistance-associated variants in the HCV genome.
Estimated fitness costs (top) and natural prevalence (bottom) of resistance-associated variants (RAVs), where each dot represents a mutation frequency observed in a patient. Variant names in black are created by transition mutations and those in brown by transversion mutations. RAVs that can be created by different mutations are specified in their names, in the format of variant name, nucleotide position, and type of mutation (Tv1 stands for transversion mutations that result in C or A, and Tv2 stands for transversion mutations that result in G or T).
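
The sliding window averages mentioned in the captions for Fig 1C (100 bases) and Fig 4A (50 bases) can be sketched as follows. This is a minimal illustration, not the authors' code: the captions do not say how the window is aligned or how the sequence ends are handled, so the version below assumes it simply reports one mean per full window. The function name and the input values are hypothetical.

```python
def sliding_window_average(values, window):
    """Return the mean of each full-length window of consecutive values.

    The output has len(values) - window + 1 entries. A running sum is
    used so that each step costs O(1) rather than O(window).
    """
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    total = sum(values[:window])
    averages = [total / window]
    for i in range(window, len(values)):
        # Slide by one position: add the new value, drop the oldest one.
        total += values[i] - values[i - window]
        averages.append(total / window)
    return averages


# Hypothetical per-position mutation frequencies; window of 3 for brevity.
freqs = [0.002, 0.004, 0.006, 0.008, 0.010]
smoothed = sliding_window_average(freqs, 3)
print(smoothed)
```

For a genome-scale series, the same call with `window=100` or `window=50` would give a smoothed line of the kind described in the captions, under the alignment assumption stated above.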
