Infect Genet Evol. 2020 Sep;83:104351.
doi: 10.1016/j.meegid.2020.104351. Epub 2020 May 5.

Emergence of genomic diversity and recurrent mutations in SARS-CoV-2

Lucy van Dorp et al. Infect Genet Evol. 2020 Sep.

Abstract

SARS-CoV-2 is a SARS-like coronavirus of likely zoonotic origin first identified in December 2019 in Wuhan, the capital of China's Hubei province. The virus has since spread globally, resulting in the currently ongoing COVID-19 pandemic. The first whole genome sequence was published on January 5 2020, and thousands of genomes have been sequenced since this date. This resource allows unprecedented insights into the past demography of SARS-CoV-2 but also monitoring of how the virus is adapting to its novel human host, providing information to direct drug and vaccine design. We curated a dataset of 7666 public genome assemblies and analysed the emergence of genomic diversity over time. Our results are in line with previous estimates and point to all sequences sharing a common ancestor towards the end of 2019, supporting this as the period when SARS-CoV-2 jumped into its human host. Due to extensive transmission, the genetic diversity of the virus in several countries recapitulates a large fraction of its worldwide genetic diversity. We identify regions of the SARS-CoV-2 genome that have remained largely invariant to date, and others that have already accumulated diversity. By focusing on mutations which have emerged independently multiple times (homoplasies), we identify 198 filtered recurrent mutations in the SARS-CoV-2 genome. Nearly 80% of the recurrent mutations produced non-synonymous changes at the protein level, suggesting possible ongoing adaptation of SARS-CoV-2. Three sites in Orf1ab in the regions encoding Nsp6, Nsp11, Nsp13, and one in the Spike protein are characterised by a particularly large number of recurrent mutations (>15 events) which may signpost convergent evolution and are of particular interest in the context of adaptation of SARS-CoV-2 to the human host. We additionally provide an interactive user-friendly web-application to query the alignment of the 7666 SARS-CoV-2 genomes.

Keywords: Betacoronavirus; Homoplasies; Mutation; Phylogenetics.

Conflict of interest statement

Declaration of Competing Interest: The authors have no competing interests to declare.

Figures

Fig. 1
Global sequencing efforts have contributed hugely to our understanding of the genomic diversity of SARS-CoV-2. a) Viral assemblies available from global regions as of 19/04/2020. b) Cumulative total of viral assemblies uploaded to GISAID included in our analysis. c) Radial Maximum Likelihood phylogeny for 7666 complete SARS-CoV-2 genomes. Colours represent the continents where isolates were collected (Green: Asia; Red: Europe; Purple: North America; Orange: Oceania; Dark blue: South America), according to metadata annotations available on NextStrain (https://github.com/nextstrain/ncov/tree/master/data). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 2
Genomic diversity of SARS-CoV-2 in the USA, UK, Iceland and China. Strains collected from all four countries are highlighted on the global phylogenetic tree. a) Strains collected in the USA shown in purple. b) Strains collected in the UK shown in red. c) Strains collected in Iceland shown in red. d) Strains collected in China shown in green. Regional colours match the global phylogeny shown in Fig. 1c. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
Fig. 3
Inspection of a major homoplastic site in Orf1ab of the SARS-CoV-2 genome (position 11,083). Panel A shows a colour-coded schematic of the SARS-CoV-2 genome annotated as per NC_045512.2 and a plot of all potential homoplastic sites in Orf1ab, measured as the minimal number of character-state changes on a Maximum Parsimony tree (see Methods). The exemplar homoplasy (denoted with *) is shown on the radial ML phylogenetic tree in panel B. Panel C shows the distribution of cophenetic distances between isolates carrying the identified homoplasy (red) and the distribution for all isolates (grey), showing that isolates with the homoplasy tend to cluster in the phylogeny. Equivalent figures for other filtered homoplasies are generated as part of the filtering method (see Methods). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
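
The quantity plotted in Fig. 3A, the minimal number of character-state changes at a site on a tree, can be illustrated with a small sketch. The code below uses the textbook Fitch parsimony count on a rooted binary tree; it is not the authors' pipeline, and the tree, isolate names and nucleotide states are hypothetical toy values chosen only for illustration. The helper names `fitch_changes` and `excess_changes` are likewise made up for this example.

```python
def fitch_changes(tree, states):
    """Fitch parsimony at one alignment site.

    tree   : nested 2-tuples with leaf names as strings (hypothetical toy format)
    states : dict mapping leaf name -> nucleotide at the site
    Returns (candidate ancestral states, minimal number of state changes).
    """
    if isinstance(tree, str):  # leaf: its observed state, no changes yet
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_changes(left, states)
    right_set, right_cost = fitch_changes(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_cost + right_cost
    # Children disagree: take the union and count one additional change.
    return left_set | right_set, left_cost + right_cost + 1


def excess_changes(tree, states):
    """Changes beyond the minimum needed to explain the observed states.

    A site with k distinct states needs at least k - 1 changes on any tree;
    a positive excess suggests the same mutation arose more than once.
    """
    _, changes = fitch_changes(tree, states)
    return changes - (len(set(states.values())) - 1)


# Toy example (assumed data): allele T appears in two separate clades.
toy_tree = ((("A", "B"), "C"), ("D", "E"))
toy_states = {"A": "T", "B": "G", "C": "G", "D": "T", "E": "G"}
_, toy_changes = fitch_changes(toy_tree, toy_states)
toy_excess = excess_changes(toy_tree, toy_states)
```

In this toy case the site needs two changes although only two states are observed, so the excess is one, which is the pattern a homoplasy screen looks for. A real analysis would work on a tree inferred from the full alignment and would apply further filters, as the legend notes.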
