Virus Evol. 2022 Mar 29;8(1):veac029.
doi: 10.1093/ve/veac029. eCollection 2022.

Genome-wide diversity of Zika virus: Exploring spatio-temporal dynamics to guide a new nomenclature proposal

Sofia G Seabra et al. Virus Evol. 2022.

Abstract

The Zika virus (ZIKV) disease caused a public health emergency of international concern that started in February 2016. The overall number of ZIKV-related cases increased until November 2016, after which it declined sharply. While the evaluation of the potential risk and impact of future arbovirus epidemics remains challenging, intensified surveillance efforts along with a scale-up of ZIKV whole-genome sequencing provide an opportunity to understand the patterns of genetic diversity, evolution, and spread of ZIKV. However, a classification system that reflects the true extent of ZIKV genetic variation is lacking. Our objective was to characterize ZIKV genetic diversity and phylodynamics, identify genomic footprints of differentiation patterns, and propose a dynamic classification system that reflects its divergence levels. We analysed a curated dataset of 762 publicly available sequences spanning the full-length coding region of ZIKV from across its geographical span and collected between 1947 and 2021. The definition of genetic groups was based on comprehensive evolutionary dynamics analyses, which included recombination and phylogenetic analyses, comparison of within- and between-group pairwise genetic distances, detection of selective pressure, and clustering analyses. Evidence for potential recombination events was detected in a few sequences. However, we argue that these events are likely due to sequencing errors, as proposed in previous studies. There was evidence of strong purifying selection, widespread across the genome, as has also been detected in other arboviruses. A total of 50 sites showed evidence of positive selection, and for a few of these sites, there was amino acid (AA) differentiation between genetic clusters. Two main genetic clusters were defined, ZA and ZB, which correspond to the already characterized 'African' and 'Asian' genotypes, respectively.
Within ZB, two subgroups, ZB.1 and ZB.2, represent the Asiatic and the American (and Oceania) lineages, respectively. ZB.1 is further subdivided into ZB.1.0 (a basal Malaysia sequence sampled in the 1960s and a recent Indian sequence), ZB.1.1 (South-Eastern Asia, Southern Asia, and Micronesia sequences), and ZB.1.2 (very similar sequences from the outbreak in Singapore). ZB.2 is subdivided into ZB.2.0 (basal American sequences and the sequences from French Polynesia, the putative origin of the introduction into South America), ZB.2.1 (Central America), and ZB.2.2 (Caribbean and North America). This classification system does not use geographical references and is flexible enough to accommodate potential future lineages. It will be a helpful tool for studies that involve analyses of ZIKV genomic variation and its association with pathogenicity, and it will serve as a starting point for public health surveillance and response to ongoing and future epidemics and to outbreaks that lead to the emergence of new variants.

Keywords: Zika virus; arbovirus; evolutionary biology; molecular epidemiology; phylogeography.
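
The hierarchical labels described in the abstract can be illustrated with a minimal sketch, assuming each label is a dot-separated path from a main cluster down to a subgroup. The label names and descriptions are taken from the abstract; the function names `ancestors` and `is_within` are hypothetical helpers introduced here for illustration and are not part of the published work.

```python
# Labels and descriptions as given in the abstract.
LINEAGES = {
    "ZA": "'African' genotype",
    "ZB": "'Asian' genotype",
    "ZB.1": "Asiatic lineage",
    "ZB.1.0": "basal Malaysia sequence (1960s) and a recent Indian sequence",
    "ZB.1.1": "South-Eastern Asia, Southern Asia, and Micronesia",
    "ZB.1.2": "Singapore outbreak",
    "ZB.2": "American (and Oceania) lineage",
    "ZB.2.0": "basal American sequences and French Polynesia",
    "ZB.2.1": "Central America",
    "ZB.2.2": "Caribbean and North America",
}


def ancestors(label):
    """Return the chain of groups from the main cluster down to `label` itself.

    Hypothetical helper: assumes dot-separated hierarchical labels.
    """
    parts = label.split(".")
    return [".".join(parts[:i]) for i in range(1, len(parts) + 1)]


def is_within(label, group):
    """True if `label` equals `group` or is nested inside it."""
    return group in ancestors(label)


if __name__ == "__main__":
    for name in ancestors("ZB.1.2"):
        print(name, "-", LINEAGES[name])
```

Because a label encodes only its position in the hierarchy, a new subgroup could in principle be appended under an existing group without renaming any existing label, which is one way to read the abstract's claim that the system can accommodate future lineages.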

Figures

Figure 1.
ML tree of the full dataset (759 sequences). Colour annotations are given in the circles around the terminal nodes. From inner to outer circle: geographical regions (same colours as in the map); host; proposed classification. Red circles indicate support values SH-aLRT ≥80 per cent and UFBoot ≥95 per cent. A version of this tree including the node labels is provided in Supplementary Fig. S3.
Figure 2.
A) Entropy per nucleotide position. B) Entropy per AA position; AAs with entropy values higher than 0.2 are labelled. The shaded region between NS4A and NS4B is the peptide 2K.
Figure 3.
TCS haplotype network, with nodes coloured by geographical region. A total of 610 sequences and 714 segregating sites were used to construct the TCS network. Sizes of the nodes are proportional to the number of sequences in that node. The perpendicular dashes on the branches connecting two nodes represent the number of mutations between those nodes. The drawn polygons represent the proposed classification based on the clusterings from HierBAPS. A version of this network including the node labels is provided in Supplementary Fig. S6.
Figure 4.
ML tree of the ‘Afro-Asian’ dataset (759 sequences). Colour annotations represent the clusterings and proposed nomenclature. Branch support values for the clusterings were obtained with SH-aLRT test (first value) and ultrafast bootstrap (second value).
Figure 5.
Histograms of the pairwise genetic distances (nucleotide substitutions per site) for each clustering.
Figure 6.
Ancestral character reconstruction (ACR) of ZIKV geographical locations, either countries (top) or regions (bottom). The compressed visualizations were obtained in PastML from the rooted ML tree, where each node represents the ancestral state (geographical region), and the size of the node is proportional to the number of tips collapsed into that node. This represents the transmissions happening in the same geographical regions and with the same source within that region. The marginal probability of each node being in the state represented is shown on top of the node. The colours correspond to the geographical regions. The results of ACR for the time tree are found in Supplementary Fig. S7.
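
Two of the quantities plotted above, per-position entropy (Figure 2) and within- versus between-group pairwise distances (Figure 5), can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes Shannon entropy with the natural logarithm, and it substitutes the uncorrected p-distance (proportion of differing sites) for whatever distance measure the study used. The toy sequences and all function names are invented for the example.

```python
import math
from collections import Counter
from itertools import combinations


def column_entropy(column, alphabet="ACGT"):
    """Shannon entropy (natural log, an assumption) of one alignment column.

    Symbols outside `alphabet` (gaps, ambiguity codes) are ignored.
    """
    counts = Counter(c for c in column.upper() if c in alphabet)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def entropy_profile(seqs):
    """Entropy at every position of an alignment of equal-length sequences."""
    return [column_entropy("".join(col)) for col in zip(*seqs)]


def p_distance(a, b, alphabet="ACGT"):
    """Uncorrected proportion of differing sites (a simplification)."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper())
             if x in alphabet and y in alphabet]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def within_between(seqs_by_group):
    """Split all pairwise distances into within-group and between-group lists."""
    labelled = [(g, s) for g, seqs in seqs_by_group.items() for s in seqs]
    within, between = [], []
    for (g1, s1), (g2, s2) in combinations(labelled, 2):
        (within if g1 == g2 else between).append(p_distance(s1, s2))
    return within, between


# Toy alignment, invented for illustration only (not real ZIKV data).
toy = {"ZA": ["ACGTAC", "ACGTAT"], "ZB": ["ATGTGC", "ATGTGC"]}
profile = entropy_profile([s for seqs in toy.values() for s in seqs])
within, between = within_between(toy)

if __name__ == "__main__":
    variable = [i for i, h in enumerate(profile) if h > 0.2]
    print("positions with entropy > 0.2:", variable)
    print("max within:", max(within), "min between:", min(between))
```

A clustering is well separated in this sense when the between-group distances sit clearly above the within-group ones, which is the pattern the histograms in Figure 5 are meant to make visible.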
