Front Microbiol. 2023 Jan 4;13:1089399.
doi: 10.3389/fmicb.2022.1089399. eCollection 2022.

Variable number tandem repeats of a 9-base insertion in the N-terminal domain of severe acute respiratory syndrome coronavirus 2 spike gene

Tetsuya Akaishi et al. Front Microbiol.

Abstract

Introduction: In 2022, the world is still struggling against the pandemic of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). The pandemic has been facilitated by the intermittent emergence of variant strains, which have been explained and classified mainly by patterns of point mutations in the spike (S) gene. However, the profiles of insertions/deletions (indels) in SARS-CoV-2 genomes during the pandemic remain largely unevaluated.

Methods: We first screened for genome regions containing polymorphic indel sites by multiple sequence alignment. We then performed NCBI BLAST and GISAID database searches to comprehensively investigate the indel profiles at the polymorphic indel hotspot and to trace the emergence and spread of the indels over time and across geographic regions.
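The screening step can be illustrated with a minimal sketch. It assumes the sequences have already been aligned by an external tool and are given as equal-length strings with `-` marking gaps. The function names and the toy alignment are hypothetical and are not taken from the study. The code only flags alignment columns where some, but not all, sequences carry a gap.

```python
def polymorphic_indel_columns(aligned):
    """Return 0-based columns where some, but not all, sequences have a gap."""
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")
    columns = []
    for i in range(length):
        gaps = sum(1 for seq in aligned if seq[i] == "-")
        if 0 < gaps < len(aligned):
            columns.append(i)
    return columns


def group_runs(columns):
    """Merge consecutive column indices into inclusive (start, end) runs."""
    runs = []
    for col in columns:
        if runs and col == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], col)
        else:
            runs.append((col, col))
    return runs


# Hypothetical toy alignment: the flanking bases are invented for illustration.
toy_alignment = [
    "ACGTATT---------GGA",  # no deletion, no insertion
    "ACGT---AGCCAGAAGGGA",  # 3-base deletion next to a 9-base insertion
    "ACGTATT---------GGA",
]

print(group_runs(polymorphic_indel_columns(toy_alignment)))  # prints [(4, 15)]
```

Because the deletion and the insertion are adjacent in this toy input, they merge into a single run, which mirrors the idea of one hotspot made up of two neighboring indels.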

Results: A polymorphic indel hotspot was identified in the N-terminal domain of the S gene at approximately nucleotide position 22,200, corresponding to amino acid positions 210-215 of the SARS-CoV-2 S protein. This polymorphic hotspot comprised an adjacent 3-base deletion (5'-ATT-3'; Spike_N211del) and 9-base insertion (5'-AGCCAGAAG-3'; Spike_ins214EPE). By performing NCBI BLAST and GISAID database searches, we identified several types of tandem repeats of the 9-base insertion, each creating an 18-base insertion (Spike_ins214EPEEPE, Spike_ins214EPDEPE). The search results suggested that the two-cycle tandem repeats of the 9-base insertion arose in November 2021 in Central Europe, whereas the original one-cycle 9-base insertion (Spike_ins214EPE) appears to date back to the middle of 2020 and to have emerged outside Central Europe. The identified 18-base insertions, based on a two-cycle tandem repeat of the 9-base insertion, were collected between November 2021 and April 2022, suggesting that these mutations did not persist and have already been eliminated.
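The notion of one-cycle versus two-cycle tandem repeats can be made concrete with a short sketch that counts consecutive exact copies of the 9-base motif in a nucleotide string. The motif is the one reported above. The function name and the flanking bases in the examples are hypothetical. Exact matching is an assumption of this sketch, so a variant copy such as the one underlying Spike_ins214EPDEPE would not be counted as a second cycle.

```python
MOTIF = "AGCCAGAAG"  # 9-base insertion reported in the abstract


def max_tandem_repeats(seq, motif=MOTIF):
    """Return the largest number of back-to-back exact copies of motif in seq."""
    best = 0
    step = len(motif)
    start = seq.find(motif)
    while start != -1:
        count = 0
        pos = start
        # Walk forward one motif length at a time while copies keep matching.
        while seq.startswith(motif, pos):
            count += 1
            pos += step
        best = max(best, count)
        start = seq.find(motif, start + 1)
    return best


# Hypothetical flanks ("TTG", "TGC") are invented for illustration only.
no_insertion = "TTG" + "TGC"
one_cycle = "TTG" + MOTIF + "TGC"
two_cycle = "TTG" + MOTIF * 2 + "TGC"

print(max_tandem_repeats(no_insertion))  # prints 0
print(max_tandem_repeats(one_cycle))     # prints 1
print(max_tandem_repeats(two_cycle))     # prints 2
```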

Discussion: The GISAID database search suggested that this polymorphic indel hotspot has one of the highest tolerabilities for incorporating indels in the SARS-CoV-2 S gene. In summary, the present study identified a variable number of tandem repeats of a 9-base insertion in the N-terminal domain of the SARS-CoV-2 S gene, and the repeat could have arisen at a different time from the original 9-base insertion.

Keywords: BLAST search; GISAID; N-terminal domain; insertions/deletions; severe acute respiratory syndrome coronavirus 2; spike gene; variable number tandem repeats.

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

Figure 1
Polymorphic insertion/deletion hotspot in SARS-CoV-2 S1-NTD. Multiple sequence alignment of some of the initially recruited sequences, which were selected at random based on geographic distribution, together with additional sequences identified in subsequent BLAST searches. This polymorphic indel site comprised two adjacent, distinct indels: a 3-base deletion and a 9-base insertion. The combination of the 3-base deletion and the 9-base insertion was confirmed in 5 of the 20 randomly selected initial sequences. Further BLAST searches with virtual RNA sequences of different indel patterns revealed that SARS-CoV-2 strains with a two-cycle tandem repeat of the 9-base insertion were present in the past. BLAST, basic local alignment search tool; S1-NTD, N-terminal domain of S1 gene; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
Figure 2
BLAST search results and geographic distribution of the two-cycle tandem repeat of the 9-base insertion. (A) The aligned sequences at the identified polymorphic indel hotspot in SARS-CoV-2 S1-NTD are shown, together with the number of sequences with 100% sequence identity to each sequence in the NCBI BLAST search. Both the one-cycle and two-cycle tandem repeats of the 9-base insertion were dated to November 2021 in Switzerland. The two-cycle tandem repeat did not spread across the globe, whereas the one-cycle 9-base insertion spread rapidly across the globe in 2022 and was estimated to account for 20–50% of all SARS-CoV-2 sampled from humans in 2022. (B,C) Geographic distributions of the one-cycle and two-cycle repeats of the 9-base insertion with the nearby 3-base deletion. Although both were suggested to originate in November 2021 in Central Europe, the former spread rapidly across the globe, whereas the latter was confined to Switzerland. BLAST, basic local alignment search tool; S1-NTD, N-terminal domain of S1 gene; SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
Figure 3
GISAID database search for insertions/deletions in the N-terminal domain and receptor-binding domain of the SARS-CoV-2 spike gene. The line graphs show the numbers of sequences sampled from humans with insertions/deletions (indels) at each amino acid position in the N-terminal domain (NTD) and receptor-binding domain (RBD) of the SARS-CoV-2 spike gene that were registered in the GISAID database by December 1, 2022. The variable number tandem repeat observed in this study corresponds to the insertion peak at amino acid position 214 of the S1-NTD. The asymmetrical line graphs suggest that tolerability for incorporating indels differs between the SARS-CoV-2 S1-NTD and the S1-RBD, with lower tolerability in the S1-RBD. The Y-axis scale is logarithmic.
