Philos Trans R Soc Lond B Biol Sci. 2013 Aug 12;368(1626):20130168.
doi: 10.1098/rstb.2013.0168. Print 2013 Sep 19.

Evolutionary analysis of hepatitis C virus gene sequences from 1953

Rebecca R Gray et al. Philos Trans R Soc Lond B Biol Sci. 2013.

Abstract

Reconstructing the transmission history of infectious diseases in the absence of medical or epidemiological records often relies on the evolutionary analysis of pathogen genetic sequences. The precision of evolutionary estimates of epidemic history can be increased by the inclusion of sequences derived from 'archived' samples that are genetically distinct from contemporary strains. Historical sequences are especially valuable for viral pathogens that circulated for many years before being formally identified, including HIV and the hepatitis C virus (HCV). However, surprisingly few HCV isolates sampled before discovery of the virus in 1989 are currently available. Here, we report and analyse two HCV subgenomic sequences obtained from infected individuals in 1953, which represent the oldest genetic evidence of HCV infection. The pairwise genetic diversity between the two sequences indicates a substantial period of HCV transmission prior to the 1950s, and their inclusion in evolutionary analyses provides new estimates of the common ancestor of HCV in the USA. To explore and validate the evolutionary information provided by these sequences, we used a new phylogenetic molecular clock method to estimate the date of sampling of the archived strains, plus the dates of four more contemporary reference genomes. Despite the short fragments available, we conclude that the archived sequences are consistent with a proposed sampling date of 1953, although statistical uncertainty is large. Our cross-validation analyses suggest that the bias and low statistical power observed here likely arise from a combination of high evolutionary rate heterogeneity and an unstructured, star-like phylogeny. We expect that attempts to date other historical viruses under similar circumstances will meet similar problems.
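The pairwise genetic diversity mentioned above can be illustrated with a minimal sketch. It assumes the simplest possible measure, the uncorrected p-distance (proportion of differing sites) between two aligned sequences. The abstract does not say which distance measure the authors used, and the function names `p_distance` and `all_pairwise` are hypothetical.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned nucleotide sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    This is an uncorrected distance (an assumption of this sketch); a
    substitution model would give larger values for divergent pairs.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in VALID_BASES and b in VALID_BASES:
            compared += 1
            if a != b:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def all_pairwise(seqs):
    """Map each unordered pair of sequence names to its p-distance."""
    return {
        (i, j): p_distance(seqs[i], seqs[j])
        for i, j in combinations(sorted(seqs), 2)
    }


if __name__ == "__main__":
    # Toy alignment for illustration only; not sequences from the study.
    toy = {"a": "ACGTACGT", "b": "ACGTACGA", "c": "TCGTACGA"}
    for pair, dist in all_pairwise(toy).items():
        print(pair, dist)
```

A histogram of the values returned by `all_pairwise` for a reference set would give the kind of background distribution against which a single pair of archived sequences can be compared.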

Keywords: molecular epidemiology; phylogenetics.

Figures

Figure 1.
Evolutionary analysis of HCV subgenomic sequence fragments obtained from blood sampled in 1953. (a) Pairwise nucleotide diversity. Histogram of all pairwise genetic distances among 170 subtype 1b reference sequences, calculated using a 336 nt region of the NS5b gene. The pairwise genetic distance between the two 1953 sequences is noted with an arrow. (b) ML tree of HCV subtype 1b. All non-1953 sequences were full genomes, and the tree was mid-point rooted. Sampling locations are indicated by the colour of the branch as follows: USA (blue branches), Europe (cyan), Brazil (orange), Japan (magenta) and China (green). The two USA isolates from 1953 are shown in red and highlighted with a dotted line. Branches represent number of substitutions per site according to the scale at the bottom. Selected branches with bootstrap values greater than 70% are noted with an asterisk. (c) Plot of root-to-tip genetic distances against sampling time. Each y-axis value represents the genetic distance from a given tip (sampled sequence) to the root, and the x-axis value represents the corresponding sampling date of that tip. The points and regression line shown in red were obtained from the ML tree presented in (b). The points in black were obtained from 25 ML bootstrap replicate trees, each of which was mid-point rooted.
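The root-to-tip plot in panel (c) rests on a simple linear regression, sketched below under stated assumptions. Given each tip's sampling date and its genetic distance to the root, an ordinary least-squares fit gives a slope that is commonly read as a rough evolutionary rate, and an x-intercept that is read as a rough date for the root. The function name `root_to_tip_regression` is hypothetical, and the caption does not specify the fitting procedure the authors used.

```python
def root_to_tip_regression(dates, distances):
    """Ordinary least-squares fit of root-to-tip distance on sampling date.

    Returns (slope, intercept, root_date), where slope is in substitutions
    per site per unit of time and root_date is the x-intercept, or None
    when the slope is zero.
    """
    n = len(dates)
    if n != len(distances) or n < 2:
        raise ValueError("need at least two (date, distance) pairs")
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    root_date = -intercept / slope if slope != 0 else None
    return slope, intercept, root_date


if __name__ == "__main__":
    # Toy values for illustration only; not measurements from the study.
    toy_dates = [1950, 1970, 1990, 2010]
    toy_distances = [0.05, 0.07, 0.09, 0.11]
    rate, _, root = root_to_tip_regression(toy_dates, toy_distances)
    print(f"rate = {rate:.4g} subs/site/year, root date = {root:.1f}")
```

Repeating the fit on each bootstrap replicate tree, as the caption describes for the black points, would show how much the slope and intercept vary with tree uncertainty.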
Figure 2.
Results of the tip-dating analyses for the two 1953 sequences and four target reference sequences. (a) Marginal posterior probability distributions of sequence sampling dates. The posterior probability densities of the estimated sampling dates for six sequences are shown: (i) sequence US1953a (red) and US1953b (blue); (ii) EU155336 (purple) and EU482849 (grey) and (iii) EU256088 (blue) and HQ110091 (green). (b) Comparison of true and estimated sampling times. Three values are shown for each of the six sequences analysed (the two 1953 sequences plus the four target reference sequences): (i) the mean of the corresponding posterior distribution (squares), (ii) the lower 95% CI of the corresponding posterior distribution (triangles) and (iii) the upper 95% CI of the corresponding posterior distribution (circles). Colours match those used above in (a). The black lines show the best-fit regression for each value, calculated using the four target reference sequences, and subsequently extrapolated back to 1953 (dotted line).
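The three summary values plotted in panel (b) can be computed from posterior samples of a sequence's sampling date, as in the sketch below. It assumes the 95% intervals are highest posterior density (HPD) intervals, that is, the shortest interval containing 95% of the samples. The caption says only "95% CI", so this choice, like the names `hpd_interval` and `summarise`, is an assumption.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Shortest interval containing a fraction `mass` of the samples."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < mass <= 1:
        raise ValueError("mass must be in (0, 1]")
    ordered = sorted(samples)
    n = len(ordered)
    # Number of samples the interval must cover; the small epsilon guards
    # against floating-point products landing just above a whole number.
    k = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of k consecutive samples and keep the narrowest one.
    best = min(range(n - k + 1), key=lambda i: ordered[i + k - 1] - ordered[i])
    return ordered[best], ordered[best + k - 1]


def summarise(samples, mass=0.95):
    """Posterior mean plus lower and upper interval bounds."""
    lower, upper = hpd_interval(samples, mass)
    return {"mean": sum(samples) / len(samples), "lower": lower, "upper": upper}


if __name__ == "__main__":
    # Toy posterior samples for illustration only; not output from the study.
    toy_samples = [1948.2, 1951.0, 1953.5, 1955.1, 1957.9, 1960.3, 1971.4]
    print(summarise(toy_samples))
```

Comparing each summary value with the known sampling date of a reference sequence is what allows the kind of cross-validation the caption describes.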
Figure 3.
Estimates of the epidemic history of HCV subtype 1b in the USA. (a) Estimated mean evolutionary rates with 95% CIs (vertical bars). (b) Estimated dates of the most recent common ancestor of US HCV1b sequences with 95% CIs (vertical bars). Three different molecular clock and coalescent model combinations were tested for each dataset: SC, strict molecular clock and constant size coalescent model; RC, relaxed molecular clock and constant size coalescent model; RBSP, relaxed molecular clock and Bayesian skyline coalescent model. (c) Bayesian skyline plot for HCV subtype 1b in the USA. Estimates include the two 1953 sequences and used a relaxed molecular clock model.
