Int J Infect Dis. 2020 Nov:100:216-223.
doi: 10.1016/j.ijid.2020.08.052. Epub 2020 Aug 22.

A genetic barcode of SARS-CoV-2 for monitoring global distribution of different clades during the COVID-19 pandemic

Qingtian Guan et al. Int J Infect Dis. 2020 Nov.

Abstract

Objective: The SARS-CoV-2 pathogen has established endemicity in humans. This necessitates the development of rapid genetic surveillance methodologies to serve as an adjunct to existing comprehensive, albeit slower, genome sequencing-driven approaches.

Methods: A total of 21,789 complete genomes were downloaded from GISAID on May 28, 2020 for analyses. We have defined the major clades and subclades of circulating SARS-CoV-2 genomes. A rapid sequencing-based genotyping protocol was developed and tested on SARS-CoV-2-positive RNA samples by next-generation sequencing.

Results: We describe 11 major mutations that define five major clades (G614, S84, V251, I378 and D392) of globally circulating viral populations. The clades can be specifically identified using an 11-nucleotide genetic barcode. An analysis of amino acid variation in SARS-CoV-2 proteins provided evidence of substitution events in the viral proteins involved in both host entry and genome replication.

Conclusion: Globally circulating SARS-CoV-2 genomes could be classified into 5 major clades based on mutational profiles defined by an 11-nucleotide barcode. We have successfully developed a multiplexed sequencing-based, rapid genotyping protocol for high-throughput classification of major clade types of SARS-CoV-2 in clinical samples. This barcoding strategy will be required to monitor decreases in genetic diversity as treatment and vaccine approaches become widely available.

Keywords: SARS-CoV-2; barcoding; genetic surveillance; genome variation.
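
The barcode idea described in the abstract can be illustrated with a minimal sketch. It assumes each genome has already been aligned to a reference so that positions are comparable. The positions, barcodes and clade labels below (`BARCODE_POSITIONS`, `BARCODE_TO_CLADE`, `clade_X`, `clade_Y`) are hypothetical placeholders, not the 11 positions or clade barcodes from the study, and the function names are illustrative only.

```python
from typing import Dict, Sequence

# Hypothetical 1-based reference positions (NOT the 11 positions used in the study).
BARCODE_POSITIONS: Sequence[int] = (3, 7, 12)

# Hypothetical barcode-to-clade lookup for the toy positions above.
BARCODE_TO_CLADE: Dict[str, str] = {
    "ACG": "clade_X",
    "TCG": "clade_Y",
}


def extract_barcode(aligned_genome: str,
                    positions: Sequence[int] = BARCODE_POSITIONS) -> str:
    """Concatenate the bases at each 1-based position of a reference-aligned genome."""
    return "".join(aligned_genome[p - 1].upper() for p in positions)


def assign_clade(aligned_genome: str,
                 positions: Sequence[int] = BARCODE_POSITIONS,
                 table: Dict[str, str] = BARCODE_TO_CLADE) -> str:
    """Look up the barcode in the table; unknown barcodes are reported as 'unassigned'."""
    return table.get(extract_barcode(aligned_genome, positions), "unassigned")


if __name__ == "__main__":
    for genome in ("GGACCCCTTTTGAAA", "GGTCCCCTTTTGAAA", "GGGCCCCTTTTGAAA"):
        print(extract_barcode(genome), assign_clade(genome))
```

Returning "unassigned" for unknown barcodes, rather than guessing the nearest clade, keeps novel or low-quality genotypes visible for follow-up.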

Figures

Fig. 1
Clades of SARS-CoV-2. (A) A global SNP-based radial phylogeny of SARS-CoV-2 genomes defining five major clades (G614, S84, V251, I378 and D392) and several subclades based on nucleotide substitution events. (B) A simplified phylogenetic tree illustrating the evolutionary relationship of the clades/subclades, based on random sampling of complete genomes from each subclade. (C) The clade- and subclade-defining SNPs for each clade and subclade. *These SNPs arose independently in more than one clade and hence are not clade-defining SNPs (refer to Fig. S2). (D) A comparative guide to the clades defined by our study and the lineages recently defined by Rambaut et al. (Rambaut et al. 2020) and GISAID (Shu and McCauley 2017).
Fig. 2
Global distribution of the major and minor clades of SARS-CoV-2 genomes and their relative prevalence over a 6-month period, from December 24, 2019 to June 30, 2020, covering the outbreak and early stages of the pandemic. The size of each pie chart is proportional to the number of genomes within each respective clade. The cumulative trend of the clades is shown on the right, and the span of time indicates the first and last observed case in each particular clade.
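
A cumulative trend of the kind described in this caption could be tabulated as in the sketch below. The input format (pairs of collection date and clade label) and the function names `cumulative_counts` and `observed_span` are assumptions made for illustration; the dates in the usage example are toy values, not data from the study.

```python
from collections import Counter, defaultdict
from datetime import date
from typing import Dict, Iterable, List, Tuple

Series = List[Tuple[date, int]]


def cumulative_counts(records: Iterable[Tuple[date, str]]) -> Dict[str, Series]:
    """For each clade, return (collection date, cumulative genome count) in date order."""
    per_day: Dict[str, Counter] = defaultdict(Counter)
    for day, clade in records:
        per_day[clade][day] += 1

    trends: Dict[str, Series] = {}
    for clade, counts in per_day.items():
        running = 0
        series: Series = []
        for day in sorted(counts):
            running += counts[day]
            series.append((day, running))
        trends[clade] = series
    return trends


def observed_span(series: Series) -> Tuple[date, date]:
    """Return the first and last dates on which a clade was observed."""
    return series[0][0], series[-1][0]


if __name__ == "__main__":
    toy = [
        (date(2020, 1, 5), "G614"),
        (date(2020, 1, 3), "S84"),
        (date(2020, 1, 5), "G614"),
        (date(2020, 2, 1), "G614"),
    ]
    trends = cumulative_counts(toy)
    print(trends["G614"], observed_span(trends["G614"]))
```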
Fig. 3
Workflow from clinical sample collection through next-generation sequencing to SARS-CoV-2 clade assignment. (A) Schematic representation of the genotyping method described in this study. Positive samples were subjected to RNA extraction and multiplex RT-PCR. The amplicons were purified and used for Illumina library preparation. Sequencing was performed using the MiSeq 600 cycles V3 kit and the results were analyzed using our clade-defining script. (B) Boxplot of the coverage showing the log fold depth of the 11 clade-defining positions across the 24 SARS-CoV-2 genomes in multiplex sequencing-based genotyping. The primer sequences and the PCR products of each primer pair are shown in Table 1 and Fig. S3.
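
The per-position coverage summary behind a boxplot like panel (B) could be computed roughly as follows. This is a sketch under stated assumptions: "log fold depth" is taken here to mean the log10 of the read depth, which the caption does not confirm, and `log_depth_summary` is a hypothetical helper name. The depths in the usage example are toy values.

```python
import math
from statistics import quantiles
from typing import Dict, Sequence


def log_depth_summary(depths: Sequence[int]) -> Dict[str, float]:
    """Five-number summary of log10 read depth at one position across samples.

    Assumes at least two samples have non-zero depth; zero-depth samples are
    skipped because log10(0) is undefined.
    """
    logs = sorted(math.log10(d) for d in depths if d > 0)
    q1, med, q3 = quantiles(logs, n=4, method="inclusive")
    return {"min": logs[0], "q1": q1, "median": med, "q3": q3, "max": logs[-1]}


if __name__ == "__main__":
    # Toy depths for one barcode position across five samples.
    print(log_depth_summary([10, 100, 1000, 10000, 100000]))
```

Dropping zero-depth samples is a simplification; in practice such samples would likely be reported separately, since a missing barcode position prevents clade assignment.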
Fig. 4
Mapping of SARS-CoV-2 clade-defining mutations onto the proteins. Nonsynonymous mutations are shown for proteins whose 3D structure was experimentally determined (spike, nsp12/7/8) or can be inferred with reasonable confidence. Mutations are colour-coded as for the corresponding clades in Fig. 3 (D: magenta; G: light green; I: blue; V: orange). For a detailed analysis, see Fig. S5-11. (A) The structure of the SARS-CoV-2 spike trimer in its open conformation (chains are cyan, magenta and grey) bound to the human receptor ACE2 (black), modelled based on PDB accessions 6m17 and 6vyb. Identified nonsynonymous mutations are shown as spheres in the model. For visibility, only mutations on two of the three spike chains are labeled; memb. indicates the plasma membrane. (B) Fragment comprising residues 180-534 of nsp2, modelled by AlphaFold35. Both clade-defining mutations are located in solvent-exposed regions and would not lead to steric clashes. (C) The substitution A876T (corresponding to residue A58 in the nsp3 cleavage product numbering) is situated in the N-terminal ubiquitin-like domain of nsp3. The structure of this domain can be inferred from the 79% identical structure of residues 1-112 from SARS-CoV (PDB id 2idy). The substitution A876T can be accommodated with only minor structural adjustments and is not expected to have a substantial influence on protein stability or function. (D) The structure shows nsp12 in complex with nsp7 (magenta) and nsp8 (cyan and teal), based on PDB 7btf. P4720 (P323 in nsp12 numbering) is located in the ‘interface domain’ (black). In this position, the P323L substitution is not predicted to disrupt the folding or protein interactions and hence is not expected to have strong effects. (E) A theoretical model for the Orf3a monomer has been proposed by AlphaFold36. The structure-function relationship of this protein remains to be clarified. The mutation G251V is located C-terminal to the β-sandwich domain and the tail (marked by an asterisk).

References

    1. Arnold K., Bordoli L., Kopp J., Schwede T. The SWISS-MODEL workspace: A web-based environment for protein structure homology modelling. Bioinformatics. 2006.
    2. Báez-Santos Y.M., St. John S.E., Mesecar A.D. The SARS-coronavirus papain-like protease: Structure, function and inhibition by designed antiviral compounds. Antiviral Research. 2015.
    3. Bárcena M., Oostergetel G.T., Bartelink W., Faas F.G.A., Verkleij A., Rottier P.J.M. Cryo-electron tomography of mouse hepatitis virus: Insights into the structure of the coronavirion. Proc Natl Acad Sci U S A. 2009.
    4. Brown K.E., Rota P.A., Goodson J.L., Williams D., Abernathy E., Takeda M. Genetic characterization of measles and rubella viruses detected through global measles and rubella elimination surveillance, 2016-2018. Morb Mortal Wkly Rep. 2019.
    5. Capella-Gutiérrez S., Silla-Martínez J.M., Gabaldón T. trimAl: A tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics. 2009;25(15):1972–1973.