[Preprint]. 2021 Aug 24:2021.04.23.441209.
doi: 10.1101/2021.04.23.441209.

Insertions in SARS-CoV-2 genome caused by template switch and duplications give rise to new variants that merit monitoring

Sofya K Garushyants et al. bioRxiv.

Abstract

The appearance of multiple new SARS-CoV-2 variants during the winter of 2020-2021 is a matter of grave concern. Some of these new variants, such as B.1.617.2, B.1.1.7, and B.1.351, manifest higher infectivity and virulence than the earlier SARS-CoV-2 variants, with potentially dramatic effects on the course of the COVID-19 pandemic. So far, analysis of new SARS-CoV-2 variants has focused primarily on point nucleotide substitutions and short deletions that are readily identifiable by comparison to consensus genome sequences. In contrast, insertions have largely escaped the attention of researchers, although the furin site insert in the spike protein is thought to be a determinant of SARS-CoV-2 virulence and other inserts might have contributed to coronavirus pathogenicity as well. Here, we investigate insertions in SARS-CoV-2 genomes and identify 347 unique inserts of different lengths. We present evidence that these inserts reflect actual virus variance rather than sequencing errors. Two principal mechanisms appear to account for the inserts in the SARS-CoV-2 genomes: polymerase slippage, and template switch that might be associated with the synthesis of subgenomic RNAs. We show that inserts in the Spike glycoprotein can affect its antigenic properties and thus merit monitoring. At least three inserts in the N-terminal domain of the Spike (ins245IME, ins246DSWG, and ins248SSLT) that were first detected in 2021 are predicted to lead to escape from neutralizing antibodies, whereas other inserts might result in escape from T-cell immunity.

Figures

Figure 1. Insertions in SARS-CoV-2 genomes.
(a) Distribution of insert lengths. (b) Nucleotide composition of inserts of different lengths and of the full SARS-CoV-2 genome. (c) Distribution of inserts along the genome. Each triangle represents one insertion event. The level of confidence in each variant is represented by color: dark green, confirmed by sequencing read analysis; green, monophyletic in the tree, no read data available; light green, observed multiple times, but not monophyletic; grey, singletons (Supplementary Table 2). The positions of inserts are marked with grey dashed lines. (d) Experimental data on the SARS-CoV-2 transcriptome showing template switch hotspots during the formation of sgRNAs, as the distribution of junction reads connecting recombination hotspots along the genome. (e) Distance from inserts to the closest template switching hotspot site (green) compared with random expectation (grey). The Wilcoxon rank sum test p-value is provided. (f) The number of inserts that occur in structured regions of SARS-CoV-2 genomic RNA (blue) compared with random expectation (grey). The permutation test p-value is provided. The data on SARS-CoV-2 structure was obtained from.
Figure 2. Long insertions possibly occur through template switching and formation of nc sgRNAs.
(a) Each triangle shows an independent insertion event, colored as in Fig. 1. Curves on the upper side of the plot connect the insertion origin site and the insertion position; brown indicates that the origin sequence is on the same strand, and grey indicates that the origin sequence is on the complementary strand. Curves at the bottom of the plot represent the experimental data on sgRNAs from Kim et al. Curves highlighted in violet correspond to the three cases in which the insert and the corresponding origin site co-occur with sgRNA junctions. The SARS-CoV-2 genes are colored as in Fig. 1. (b, c) Permutation tests showing the number of template switches co-occurring with RdRp jumps (x-axis) expected at random (blue): (b) when only the positions of the origins were randomly sampled 10000 times from the genome; (c) when both ends were randomly sampled. The red vertical line represents the number observed in the data.
Figure 3. Location of insertion sites in SARS-CoV-2 S protein.
(a) Surface representation showing that all observed insertions can potentially change surface properties (PDB ID: 7cn8). (b) Ins 245, 246 and 248 are located on the surface interacting with the 4A8 antibody (PDB ID: 7cl2). The interacting surface is shown enlarged. Colors: cyan, N-terminal domain (NTD); wheat, receptor-binding domain (RBD); dark red, receptor-binding motif (RBM); aquamarine, heavy chain of the 4A8 antibody (PDB ID: 7cl2). Each insertion is shown in a distinct color. The models for each insertion were generated with the SWISS-MODEL web server. (c) Location of insertions in the genome of SARS-CoV-2. A full description of the insertions is provided in Supplementary Table 4. Triangle size is proportional to the insert length.
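
The comparisons with random expectation mentioned in the captions of Fig. 1 and Fig. 2 can be illustrated with a small permutation test. The sketch below is not the authors' pipeline: it asks whether insert positions lie closer to template switch hotspots than uniformly sampled genome positions do, using the median distance as the statistic (a permutation variant, not the Wilcoxon rank sum test reported for Fig. 1e). The function names and the toy coordinates are hypothetical; the genome length of 29,903 nt is that of the SARS-CoV-2 reference sequence and is not stated in the abstract.

```python
import bisect
import random

# Assumption: length of the SARS-CoV-2 reference genome (NC_045512.2).
GENOME_LENGTH = 29903


def nearest_distance(position, sorted_sites):
    """Distance from `position` to the closest entry of a sorted, non-empty list."""
    i = bisect.bisect_left(sorted_sites, position)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_sites[max(i - 1, 0):i + 1]
    return min(abs(position - site) for site in candidates)


def median(values):
    """Median of an iterable of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def permutation_p_value(insert_positions, hotspot_sites,
                        n_permutations=10000, seed=0):
    """One-sided test: are inserts closer to hotspots than random positions?

    Returns the observed median distance and an empirical p-value.
    """
    rng = random.Random(seed)
    sites = sorted(hotspot_sites)
    observed = median(nearest_distance(p, sites) for p in insert_positions)
    hits = 0
    for _ in range(n_permutations):
        # Draw as many random genome positions as there are inserts.
        sample = [rng.randint(1, GENOME_LENGTH) for _ in insert_positions]
        if median(nearest_distance(p, sites) for p in sample) <= observed:
            hits += 1
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (hits + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Toy coordinates for illustration only, not data from the preprint.
    toy_hotspots = [100, 5000, 20000]
    toy_inserts = [105, 4990, 20010]
    print(permutation_p_value(toy_inserts, toy_hotspots, n_permutations=2000))
```

Sampling positions uniformly is the simplest null model; a real analysis would need to decide whether to restrict sampling to sequenced regions or to match insert lengths.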
