Review
Nat Rev Genet. 2022 Sep;23(9):547-562.
doi: 10.1038/s41576-022-00483-8. Epub 2022 Apr 22.

Phylogenetic and phylodynamic approaches to understanding and combating the early SARS-CoV-2 pandemic

Stephen W Attwood et al. Nat Rev Genet. 2022 Sep.

Abstract

Determining the transmissibility, prevalence and patterns of movement of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infections is central to our understanding of the impact of the pandemic and to the design of effective control strategies. Phylogenies (evolutionary trees) have provided key insights into the international spread of SARS-CoV-2 and enabled investigation of individual outbreaks and transmission chains in specific settings. Phylodynamic approaches combine evolutionary, demographic and epidemiological concepts and have helped track virus genetic changes, identify emerging variants and inform public health strategy. Here, we review and synthesize studies that illustrate how phylogenetic and phylodynamic techniques were applied during the first year of the pandemic, and summarize their contributions to our understanding of SARS-CoV-2 transmission and control.

Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Phylodynamic approaches to the investigation of SARS-CoV-2 transmission.
Relevant clinical and public health questions are defined (top row); phylodynamic and epidemiological data and models are then brought together (middle row) and used in combined or joint analyses to provide actionable insight into virus transmission (bottom row). a | Phylogenetic approaches estimate the rate of international lineage introductions and distinguish introductions from community transmission. b | Genome sequences and phylogenetics support outbreak analyses by identifying or refuting links between local cases; this can lead to the identification of outbreak sources and drivers or to the assessment of nosocomial transmission. c | Phylodynamic techniques that use epidemiological demographic models, such as the susceptible–exposed–infected–recovered (SEIR) model, allow transmission rates to be compared between lineages bearing different key genotypes (for example, variants of concern (VOCs) and pre-existing lineages). d | The relative timing of variant and lineage emergence in the global (or regional) phylogeny, and the scattering of case genomes across clades, can distinguish persistent from repeat infections in some scenarios. Phylogenetics is also useful in studies of lineage turnover and interactions within the host. Panel colours indicate related themes: blue, public health; green, epidemiological parameters; red, clinical parameters. TMRCA, time to the most recent common ancestor.
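To make panel c more concrete, the sketch below compares two lineages under a deterministic SEIR model. It rests on illustrative assumptions, not on any study reviewed here: a closed population with no births or deaths, forward-Euler integration, and invented parameter values (a hypothetical variant with a 50% higher transmission rate β, a 3-day mean latent period and a 5-day mean infectious period). Phylodynamic analyses fit such models to sequence and case data; this sketch only shows the forward model whose transmission parameter those analyses estimate. The names `SEIRParams`, `simulate_seir` and `basic_reproduction_number` are hypothetical.

```python
# Minimal deterministic SEIR sketch (illustrative assumptions only):
# closed population of size N, no births or deaths, forward-Euler integration.
#   dS/dt = -beta*S*I/N
#   dE/dt =  beta*S*I/N - sigma*E
#   dI/dt =  sigma*E - gamma*I
#   dR/dt =  gamma*I
from dataclasses import dataclass


@dataclass
class SEIRParams:
    beta: float   # transmission rate (per day)
    sigma: float  # 1 / mean latent period (per day)
    gamma: float  # 1 / mean infectious period (per day)


def simulate_seir(params: SEIRParams, population: float, initial_infected: float,
                  days: int, substeps: int = 10) -> list[tuple[float, float, float, float]]:
    """Return (S, E, I, R) at the end of each day, starting from day 0."""
    dt = 1.0 / substeps
    s, e, i, r = population - initial_infected, 0.0, initial_infected, 0.0
    trajectory = [(s, e, i, r)]
    for _ in range(days):
        for _ in range(substeps):
            infections = params.beta * s * i / population * dt  # S -> E
            onsets = params.sigma * e * dt                      # E -> I
            recoveries = params.gamma * i * dt                  # I -> R
            s -= infections
            e += infections - onsets
            i += onsets - recoveries
            r += recoveries
        trajectory.append((s, e, i, r))
    return trajectory


def basic_reproduction_number(params: SEIRParams) -> float:
    """R0 = beta / gamma for this SEIR formulation."""
    return params.beta / params.gamma


if __name__ == "__main__":
    # Hypothetical parameters: the "variant" transmits 50% faster per contact.
    lineages = {
        "pre-existing": SEIRParams(beta=0.30, sigma=1 / 3, gamma=1 / 5),
        "variant": SEIRParams(beta=0.45, sigma=1 / 3, gamma=1 / 5),
    }
    n = 1_000_000
    for name, p in lineages.items():
        s_end = simulate_seir(p, n, initial_infected=10, days=60)[-1][0]
        print(f"{name}: R0={basic_reproduction_number(p):.2f}, "
              f"ever infected by day 60 = {n - s_end:,.0f}")
```

Because the flows between compartments cancel, each Euler step conserves the total population, which is a quick sanity check on any modification of the update rules.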
Fig. 2. The emergence of E484-bearing lineages from late 2020 to March 2021.
Spike amino acid mutations and deletions are shown as symbols on the pins marking the approximate locations of first detection. The symbols include only those mutations that were implicated in possible immune escape or as suspected drivers of lineage growth and that were shared by two or more lineages. The locality of first detection may not be that of the lineage’s origin; however, the intercontinental spread of first detections is consistent with multiple independent origins. The B.1.1.7 lineage coloured in red differs from the other B.1.1.7 viruses in that it bears S494P rather than a substitution at E484. Lineage B.1.617 bears E484Q rather than E484K. Some lineages (B.1.1.7 and A.23.1) also have members that lack E484K, and some virus genotypes may have arisen multiple times (for example, B.1.1.7 with E484K). The near-simultaneous first detection of the same variants in genomes of phylogenetically distant lineages in countries worldwide, in late 2020 and early 2021, is a clear sign of convergent evolution and was a major factor prompting numerous studies aimed at detecting any selective advantage of the variants of concern (VOCs), including the search for vaccine escape phenotypes. Lineages and variants are based on the following publications: A.23.1 (ref.); B.1.1.318, B.1.1.7 + E484K, B.1.1.7 + S494P, B.1.324.1 (ref.); B.1.351 (refs,,); B.1.525 (ref.); B.1.617 (ref.); P.1 (ref.); P.2 (refs,); P.3 (ref.). Note that B.1.324.1 was not designated as a sublineage of B.1.324, and the reference here is to the variant described as B.1.324.1 in the Technical briefing Table 17 of ref.. Pin heights indicate time relative to detection of the first lineage, that is, P.2 in Rio de Janeiro, 13 October 2020 (not to scale, but ranked in time, with days since detection of P.2 marked on each pin).
Fig. 3. Convergent evolution of SARS-CoV-2 spike protein.
a | Phylogenies for the first year of the pandemic show the independent emergence of spike ΔH69/V70, indicated in red, in genomes of the B.1.1.7 and B.1.258 lineages (note that the B.1.258 clade in red includes some branches without the deletion). Phylogeny from Nextstrain (which used data from the Europe ncov GISAID data set), visualized in FigTree. Acknowledgements of authors responsible for the genetic sequence data generated, shared via the GISAID initiative and used to generate the Nextstrain tree, may be found in Supplementary Table 1. For clarity, not all Pango lineages are shown. b | By the start of 2021, several commonly occurring spike substitutions and deletions had been recognized as shared between lineages. The illustrated substitutions are found in the exposed (that is, outermost on the surface of the virion) subunit of spike, termed S1, or in the spike N-terminal domain (NTD), and are those shared by variants of interest or concern, excluding those shared sporadically or in minor sublineages. B.1.351 and P.1 share K417T/N and (in some B.1.351 sublineages) L18F, as well as two other recurrent substitutions; this is indicated by the overlap of their extended shading. ‘Mink’ refers to the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) mink–human sublineage, termed ‘cluster 5’, which exhibited ΔH69/V70 and N501T (and other spike substitutions); the second B.1.1.7 lineage (VOC-202102/02, the grey ellipse with broken-line border) is a cluster of B.1.1.7 that also bears E484K. N501T is a homoplasy that emerged in mink and may have transferred to humans; it is relatively uncommon, having been found in only five mink in the original mink farm epidemic in Denmark. Nevertheless, N501T seemed to have emerged independently four times and has been detected in ten human cases. L18F is an NTD substitution found in some B.1.351 viruses and several of its sublineages, and it is increasing in frequency in B.1.1.7 (ref.). As in Fig. 2, the same substitutions appear in multiple lineages, implying that they arose independently at different times and places. Here, not only are individual substitutions shared, but constellations of several changes also seem to co-occur in more than one lineage; this suggests epistatic interactions, perhaps with compensatory changes following immune escape variants.
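The convergence argument in this figure (the same change appearing on branches that are not closely related) can be framed as a parsimony question. The sketch below, loosely modelled on panel a, applies Fitch's algorithm to a toy, strictly bifurcating rooted tree to count the minimum number of state changes a single character requires; the character is flagged as homoplastic when that count exceeds the number of observed states minus one. The tree, tip names and states are hypothetical, and the sketch does not reproduce any specific analysis cited in the figure.

```python
# Fitch parsimony on a toy, strictly bifurcating rooted tree (illustrative
# sketch). Tree format, tip names and character states are hypothetical.
from typing import Union

Tree = Union[str, tuple]


def fitch_score(tree: Tree, states: dict[str, str]) -> tuple[set[str], int]:
    """Return (Fitch state set at this node, minimum changes in this subtree)."""
    if isinstance(tree, str):
        return {states[tree]}, 0
    if len(tree) != 2:
        raise ValueError("this sketch handles bifurcating trees only")
    left_set, left_cost = fitch_score(tree[0], states)
    right_set, right_cost = fitch_score(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # No state in common: one extra change is required below this node.
    return left_set | right_set, left_cost + right_cost + 1


def is_homoplastic(tree: Tree, states: dict[str, str]) -> bool:
    """Flag a character needing more than (number of observed states - 1) changes."""
    _, score = fitch_score(tree, states)
    return score > len(set(states.values())) - 1


if __name__ == "__main__":
    # Two separate clades carry the same deletion ("del"); "wt" = without it.
    toy_tree = ((("t1", "t2"), "t3"), (("t4", "t5"), "t6"))
    toy_states = {"t1": "del", "t2": "del", "t3": "wt",
                  "t4": "del", "t5": "del", "t6": "wt"}
    print(fitch_score(toy_tree, toy_states)[1])  # 2
    print(is_homoplastic(toy_tree, toy_states))  # True
```

Parsimony alone cannot tell two independent gains apart from a single gain followed by reversals (both cost two changes in this toy tree); deciding which is more plausible also depends on rooting and on the wider phylogeny.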
Fig. 4. Effects of within-host evolution and dynamics on epidemiological observations.
Phylogenetic and phylodynamic approaches help detect and understand complex infections, measure within-patient lineage turnover and explore how host-induced mutation affects outbreak investigations. a | Co-infections may confound transmissibility and aetiological studies, but they can be detected using phylogenetics. Specifically, co-infection is indicated when viral genomes sequenced from multiple isolates from the same patient are not monophyletic. b | Lineage turnover can occur if within-host lineages share a recent common ancestor and arise from evolution within the host itself. Lineage turnover may complicate patient treatment, as a lineage with lesser susceptibility to host immune responses may give way to a more transmissible lineage after apparently successful completion of a course of therapy. Nevertheless, phylogenetic features, such as longitudinal samples falling into different sister lineages and relative branch lengths, can help detect and account for lineage turnover. c | The antiviral activities of host APOBEC cytidine deaminases (which promote C → U hypermutation), adenosine deaminases that act on RNA (ADARs) and similar host systems can lead to biases such as C → U homoplasies (convergent evolution) in the case of APOBECs, and to changes in virus genome CpG content as a response. Phylogenetics can highlight such convergent changes, which arise in lineages that are not closely related, and phylogenetic and phylodynamic approaches can be adjusted to account for the elevated rate of particular transitions. d | Co-infections and superinfections can complicate attempts to trace transmission chains, through either lineage turnover or sampling bias (for example, differential PCR amplification or effects of organotropy). The result can be a failure to connect two related transmission chains. A superinfected individual could also cryptically contribute to more than one heterochronous outbreak.
The schema shows potential transmission events within households, or similar units (for example, workplaces), in a simplified transmission scenario. The dashed lines indicate transmission events between households. Circles represent individuals, with empty circles indicating infection chains involving lineage 1 and filled circles those involving lineage 2. The red asterisk indicates a co-infected individual who carries both lineages. The phylogeny shows that the true relationship between individuals X and Y may be unclear if lineage 1 dominates the co-infection at the time of sampling.
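Panel a rests on a monophyly test: if several genomes from one patient do not form a single clade, co-infection is one possible explanation. A minimal sketch for a rooted tree follows, assuming the tree is given as nested tuples of tip names; the function names and tip names are hypothetical. In practice, a non-monophyletic result would also need to be weighed against phylogenetic uncertainty (for example, low branch support) before co-infection is inferred.

```python
# Rooted monophyly test (illustrative sketch). The tree format (nested tuples
# with tip-name strings at the leaves) and all tip names are hypothetical.
from typing import Union

Tree = Union[str, tuple]


def clade_tip_sets(tree: Tree) -> list[frozenset[str]]:
    """Collect the tip set of every clade (every subtree, including single tips)."""
    clades: list[frozenset[str]] = []

    def collect(node: Tree) -> frozenset[str]:
        if isinstance(node, str):
            tips = frozenset([node])
        else:
            tips = frozenset().union(*(collect(child) for child in node))
        clades.append(tips)
        return tips

    collect(tree)
    return clades


def is_monophyletic(tree: Tree, tips: set[str]) -> bool:
    """True if `tips` is exactly the tip set of one clade in the rooted tree."""
    return frozenset(tips) in clade_tip_sets(tree)


if __name__ == "__main__":
    # Three samples from one hypothetical patient, plus two community genomes.
    tree = ((("patient_s1", "patient_s2"), "community_a"),
            ("patient_s3", "community_b"))
    print(is_monophyletic(tree, {"patient_s1", "patient_s2"}))                # True
    print(is_monophyletic(tree, {"patient_s1", "patient_s2", "patient_s3"}))  # False
```

The result depends on the root: the same unrooted tree can yield a different answer under a different rooting, so the rooting used (for example, an outgroup or a dated root) matters for this check.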

References

1. Eickmann M, et al. Phylogeny of the SARS coronavirus. Science. 2003;302:1504–1505. doi: 10.1126/science.302.5650.1504b.
2. Arias A, et al. Rapid outbreak sequencing of Ebola virus in Sierra Leone identifies transmission chains linked to sporadic cases. Virus Evol. 2016;2:vew016. doi: 10.1093/ve/vew016.
3. Dudas G, et al. Virus genomes reveal factors that spread and sustained the Ebola epidemic. Nature. 2017;544:309–315. doi: 10.1038/nature22040.
4. Grubaugh ND, et al. Genomic epidemiology reveals multiple introductions of Zika virus into the United States. Nature. 2017;546:401–405. doi: 10.1038/nature22400.
5. Ingle DJ, Howden BP, Duchene S. Development of phylodynamic methods for bacterial pathogens. Trends Microbiol. 2021;29:788–797. doi: 10.1016/j.tim.2021.02.008.
