Microbiol Spectr. 2022 Jun 29;10(3):e0071622.
doi: 10.1128/spectrum.00716-22. Epub 2022 Jun 6.

Insertion-and-Deletion Mutations between the Genomes of SARS-CoV, SARS-CoV-2, and Bat Coronavirus RaTG13

Tetsuya Akaishi. Microbiol Spectr. 2022.

Abstract

The evolutionary process that gave rise to severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) remains unresolved. This study compared the genome sequences of severe acute respiratory syndrome coronavirus (SARS-CoV), bat coronavirus RaTG13, and SARS-CoV-2. Overall, the genomes of SARS-CoV-2 and RaTG13 were 77.9% and 77.7% identical to the genome of SARS-CoV, respectively. Relative to the SARS-CoV genome, 3.6% (1,068 bases) of the SARS-CoV-2 genome was derived from insertion and/or deletion (indel) mutations and 18.6% (5,548 bases) from point mutations. At least 35 indel sites were confirmed in the genome of SARS-CoV-2, 17 of which were ≥10 consecutive bases long. Of these 17 relatively long indels, ten were located in the spike (S) gene, five in the nonstructural protein 3 (Nsp3) gene of open reading frame (ORF) 1a, and one each in ORF8 and a noncoding region. Seventeen (48.6%) of the 35 indels were insertion-and-deletion mutations in which 7–325 consecutive bases were exchanged for a different sequence. Almost the entire ORF8 gene was replaced by a single 325-base-long indel. The distribution of these indels roughly matched the distribution of the point mutation rate around them. The genome sequence of SARS-CoV-2 was 96.0% identical to that of RaTG13, and there was no long insertion-and-deletion mutation between the two genomes. The uneven distribution of multiple indels and the presence of multiple long insertion-and-deletion mutations with exchanged consecutive base sequences in the viral genome may provide insights into how SARS-CoV-2 developed.

IMPORTANCE The developmental mechanism of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) remains unclear. This study compared, base by base, the genome sequence of SARS-CoV-2 with those of severe acute respiratory syndrome coronavirus (SARS-CoV) and bat coronavirus RaTG13. The genomes of SARS-CoV-2 and RaTG13 were 77.9% and 77.7% identical to the genome of SARS-CoV, respectively. Seventeen of the 35 sites with insertion and/or deletion mutations between SARS-CoV-2 and SARS-CoV were insertion-and-deletion mutations that replaced 7–325 consecutive bases. Most of these long insertion-and-deletion sites were concentrated in the nonstructural protein 3 (Nsp3) gene of open reading frame (ORF) 1a, the S1 domain of the spike gene, and the ORF8 gene. No such long insertion-and-deletion mutations were observed between the genomes of RaTG13 and SARS-CoV-2. The presence of multiple long insertion-and-deletion mutations in the SARS-CoV-2 genome, and their uneven distribution, may provide further insights into the development of the virus.
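As a quick consistency check, the genome-level percentages can be recomputed from the reported base counts. This sketch assumes a SARS-CoV-2 genome length of 29,903 bases (the length of the NC_045512.2 reference sequence); the abstract does not state the denominator actually used, so treat that constant as an assumption.

```python
# Assumption: genome length of the SARS-CoV-2 reference NC_045512.2 (not given in the abstract).
GENOME_LENGTH = 29_903

# Base counts reported for SARS-CoV-2 relative to SARS-CoV.
INDEL_BASES = 1_068
POINT_MUTATION_BASES = 5_548


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place, matching the abstract's precision."""
    return round(100 * part / whole, 1)


indel_pct = percent(INDEL_BASES, GENOME_LENGTH)
point_pct = percent(POINT_MUTATION_BASES, GENOME_LENGTH)
# Bases that are neither indel-derived nor point-mutated are counted as identical.
identical_pct = percent(GENOME_LENGTH - INDEL_BASES - POINT_MUTATION_BASES, GENOME_LENGTH)
# Share of insertion-and-deletion (exchanged-sequence) sites among all 35 indel sites.
exchange_share = percent(17, 35)

print(indel_pct, point_pct, identical_pct, exchange_share)  # → 3.6 18.6 77.9 48.6
```

Under that assumed length, the 3.6%, 18.6%, and 77.9% figures are mutually consistent, since the three categories together cover the whole genome.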

Keywords: bat coronavirus RaTG13; coronavirus disease 2019 (COVID-19); insertion-and-deletion mutation; mutation; severe acute respiratory syndrome coronavirus (SARS-CoV); severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2).

Conflict of interest statement

The authors declare no conflict of interest.

Figures

FIG 1
Spike gene sequence of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and mutation types relative to severe acute respiratory syndrome coronavirus (SARS-CoV). A comparison of the S gene sequences of SARS-CoV and SARS-CoV-2 revealed that at least seven insertion-and-deletion mutations, each 14–245 bases long, were concentrated in the N-terminal domain (NTD) of the S1 gene. The point substitution rates shown (26.5% for S1 and 18.9% for S2) exclude the insertion and/or deletion sites. If these sites are included in the count, the mutation rate in the S1 gene increases to 45.0%. Turquoise blue, yellow, blue, pink, and gray indicate preserved bases, bases substituted by point mutations, bases with insertions, bases mutated by insertion-and-deletion mutations with preserved base size, and bases mutated by insertion-and-deletion mutations with changed base size, respectively.
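The two S1 figures in this caption (26.5% excluding indel sites, 45.0% including them) differ only in how indel-derived bases are counted. The function below is a hypothetical sketch of one plausible pair of definitions: the excluded rate divides substitutions by the bases outside indel sites, and the included rate counts every differing base over the whole gene. The function name and both definitions are assumptions rather than the paper's stated method, and the example uses toy counts, not the real S1 data.

```python
from typing import Tuple


def substitution_rates(n_substituted: int, n_indel: int, length: int) -> Tuple[float, float]:
    """Return (rate excluding indel sites, rate including indel sites).

    Hypothetical definitions:
      excluded: point substitutions among the bases outside indel sites
      included: every differing base (substituted or indel-derived) over the whole gene
    """
    if not 0 <= n_indel < length:
        raise ValueError("indel bases must be fewer than the gene length")
    if not 0 <= n_substituted <= length - n_indel:
        raise ValueError("substitutions must fit outside the indel sites")
    excluded = n_substituted / (length - n_indel)
    included = (n_substituted + n_indel) / length
    return excluded, included


# Toy counts (not from the paper): 1,000-base gene, 200 indel-derived bases, 160 substitutions.
excl, incl = substitution_rates(160, 200, 1000)
print(f"{excl:.1%} excluding indels, {incl:.1%} including indels")
# → 20.0% excluding indels, 36.0% including indels
```

Under these definitions, every indel base is dropped from the first denominator but counted as a difference in the second, so a gene segment with many long indels (such as S1 here) shows a much larger gap between the two rates than one with few.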
FIG 2
Examples of insertion-and-deletion mutations in the N-terminal domain of the spike gene. In the present study, gene mutations were categorized into four general subtypes: point mutation, insertion, deletion, and insertion-and-deletion mutation. In an insertion-and-deletion mutation, a run of consecutive bases was replaced by an entirely different sequence of the same or a different length. Most of the insertion-and-deletion mutations observed in the spike N-terminal domain involved ≥10 consecutive bases and changed the base size.
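The four subtypes can be made concrete on a gapped pairwise alignment. The sketch below is a hypothetical heuristic, not the procedure used in the study: it groups consecutive non-identical alignment columns into runs and labels each run as an insertion (gaps only in the ancestral sequence), a deletion (gaps only in the compared sequence), point mutations (a short run of substitutions), or an insertion-and-deletion mutation (mixed gaps and substitutions, or a substitution run at least `min_exchange` bases long). The `-` gap symbol, the function names, and the default threshold of 7 bases (the shortest exchanged run reported in the abstract) are all assumptions.

```python
from itertools import groupby
from typing import List, Tuple

Event = Tuple[int, int, str]  # (alignment column, run length, mutation subtype)


def column_kind(a: str, b: str) -> str:
    """Classify one alignment column; a = ancestral base, b = compared base, '-' = gap."""
    if a == b:
        return "same"
    if a == "-":
        return "ins"  # base present only in the compared genome
    if b == "-":
        return "del"  # base present only in the ancestral genome
    return "sub"      # both bases present but different


def classify_mutations(ref: str, qry: str, min_exchange: int = 7) -> List[Event]:
    """Label runs of differing columns with one of four subtypes (hypothetical heuristic)."""
    if len(ref) != len(qry):
        raise ValueError("aligned sequences must have equal length")
    kinds = [column_kind(a, b) for a, b in zip(ref.lower(), qry.lower())]
    events: List[Event] = []
    pos = 0
    # Split the alignment into alternating runs of identical and differing columns.
    for differs, group in groupby(kinds, key=lambda k: k != "same"):
        run = list(group)
        start, pos = pos, pos + len(run)
        if not differs:
            continue
        present = set(run)
        if present == {"ins"}:
            events.append((start, len(run), "insertion"))
        elif present == {"del"}:
            events.append((start, len(run), "deletion"))
        elif present == {"sub"} and len(run) < min_exchange:
            # Short substitution runs: each base counts as its own point mutation.
            events.extend((start + i, 1, "point mutation") for i in range(len(run)))
        else:
            # Mixed gaps/substitutions, or a long block of exchanged bases.
            events.append((start, len(run), "insertion-and-deletion"))
    return events


print(classify_mutations("acgtacgtacgt", "acgaacg--cgt"))
# → [(3, 1, 'point mutation'), (7, 2, 'deletion')]
print(classify_mutations("ggcc---aattgg", "ggttaca----gg"))
# → [(2, 9, 'insertion-and-deletion')]
```

A real analysis would first need a reliable alignment, and the labels depend heavily on where the aligner places gaps; the heuristic only illustrates how the four categories differ.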
FIG 3
Amino acid substitution status in the SARS-CoV-2 spike protein compared with SARS-CoV. The substitution status of amino acids (AA) in the SARS-CoV-2 spike protein relative to the SARS-CoV spike protein is shown. As with base substitutions, amino acid substitutions were concentrated in the S1 domain, especially in the N-terminal domain. The insertion-and-deletion mutations in the S1 gene resulted in entirely different amino acid sequences, replacing runs of consecutive amino acids.
FIG 4
Molecular structures of the spike protein in SARS-CoV and SARS-CoV-2. Three-dimensional molecular structures of the SARS-CoV S protein (closed state) and the SARS-CoV-2 S protein (closed and open states) are shown. (Top) Overall structures of these proteins; (bottom) enlarged views of their S1 domains. The S1 NTD is shown in blue, the receptor-binding domain (RBD) in yellow, the S2 domain in green, and other subdomains, including the S1 CTD, in gray. The indel sites in SARS-CoV-2 S are shown as runs of consecutive amino acids colored red. Conformational differences in the SARS-CoV-2 spike NTD and RBD relative to SARS-CoV are visible, and the RBDs of SARS-CoV-2 lie closer to the central pore than those of SARS-CoV. CTD, C-terminal domain; H-bonds, hydrogen bonds; indel, insertion and/or deletion; PDB, Protein Data Bank; S, spike.
FIG 5
Sequences of the structural protein genes in SARS-CoV-2 and their mutations relative to SARS-CoV. The point substitution rates shown exclude the gray insertion-and-deletion sites; that is, they are calculated only over sequences with preserved bases or point substitutions. Turquoise blue, yellow, and blue indicate preserved bases, bases substituted by point mutations, and bases with insertions, respectively.
FIG 6
Sequences of ORF3a–ORF10 in SARS-CoV-2 and their mutations relative to SARS-CoV. The mutation rate in the ORF10 gene was significantly lower than that in the other ORF genes. Almost the entire ORF8 gene was replaced by a single 333-base-long insertion-and-deletion mutation. Turquoise blue, yellow, blue, and gray indicate preserved bases, bases substituted by point mutations, bases with insertions, and bases mutated by insertion-and-deletion mutations with changed base size, respectively. ORF, open reading frame.
FIG 7
Evolution of mutations in noncoding regions in the three betacoronaviruses. Insertions or deletions occurred in four of the nine noncoding regions between coding genes, among which two of the three coincidentally occurred at Kozak sequence-related positions (positions −3 to −1 from the start codon "aug"). One mutation upstream of the M gene produced the ideal Kozak motif "gcc" in RaTG13 and SARS-CoV-2. Interestingly, the noncoding regions upstream of the S gene and the ORF7a gene contained different insertions or deletions at exactly the same position in RaTG13 and SARS-CoV-2. Turquoise blue, yellow, and gray indicate preserved bases, bases substituted by point mutations, and bases mutated by insertion and/or deletion mutations with changed base size, respectively.
FIG 8
Distribution of insertion and/or deletion mutations in the genomes of SARS-CoV-2 and RaTG13. (a) Medium- to large-sized insertion and/or deletion mutations, approximately 10–300 consecutive bases long, were concentrated in the Nsp3 gene of ORF1a, the S1 domain of the S gene, and the ORF8 gene. (b) The line graphs show the simple moving average (±50 bases) of the point mutation rate across the genome of SARS-CoV-2, compared with the genome of SARS-CoV (top) or RaTG13 (bottom). The distribution of the indels (red and blue bars) roughly matched the distribution of the point mutation rate. (c) The line graph shows the simple moving average (±50 bases) of the point mutation rate across the genome of RaTG13, compared with that of SARS-CoV. The pattern of the point mutation rate and the distribution of indels roughly matched those between SARS-CoV and SARS-CoV-2.
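Panels (b) and (c) smooth a per-base point mutation signal with a simple moving average over ±50 bases. A minimal sketch, assuming a gapped pairwise alignment as input: each base of the compared genome is scored 1 (substituted), 0 (identical), or None (inserted relative to the reference, and therefore excluded from the point mutation rate), and each position is then averaged with its scored neighbors within ±50 bases, with the window truncated at the genome ends. The function names, the exclusion rule, and the edge handling are assumptions; the caption does not specify them.

```python
from typing import List, Optional

Track = List[Optional[int]]


def point_mutation_track(ref: str, qry: str) -> Track:
    """One entry per base of the compared (query) genome.

    1 = point substitution, 0 = identical base,
    None = base inserted relative to the reference (excluded from the rate).
    """
    track: Track = []
    for a, b in zip(ref.lower(), qry.lower()):
        if b == "-":
            continue  # column missing from the query genome: no query coordinate
        if a == "-":
            track.append(None)
        else:
            track.append(1 if a != b else 0)
    return track


def centered_moving_average(track: Track, half_width: int = 50) -> List[Optional[float]]:
    """Mean of scored values within +/- half_width positions, truncated at the ends."""
    n = len(track)
    out: List[Optional[float]] = []
    for i in range(n):
        window = [v for v in track[max(0, i - half_width): i + half_width + 1] if v is not None]
        out.append(sum(window) / len(window) if window else None)
    return out


track = point_mutation_track("acgt-a", "aggtta")
print(track)  # → [0, 1, 0, 0, None, 0]
print(centered_moving_average(track, half_width=1))
```

Indexing the track by query positions keeps the smoothed curve aligned with coordinates on the compared genome, so it can be plotted directly against indel positions in that genome.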
