Biology (Basel). 2021 Jan 12;10(1):52.
doi: 10.3390/biology10010052.

The Long-Term Evolutionary History of Gradual Reduction of CpG Dinucleotides in the SARS-CoV-2 Lineage


Sankar Subramanian. Biology (Basel).

Abstract

Recent studies suggested that the fraction of CG dinucleotides (CpG) is severely reduced in the genome of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). This CpG deficiency was predicted to be an adaptive response of the virus to evade degradation of the viral RNA by the antiviral zinc finger protein, which specifically binds to CpG dinucleotides. By comparing all representative genomes belonging to the genus Betacoronavirus, this study examined the potential time of origin of the CpG depletion. The results revealed a highly significant correlation between the proportion of CpG dinucleotides (CpG content) in the betacoronavirus species and their times of divergence from SARS-CoV-2. Species distantly related to SARS-CoV-2 had much higher CpG contents than SARS-CoV-2. Conversely, closely related species had low CpG contents that were similar to or slightly higher than that of SARS-CoV-2. These results suggest a systematic and continuous reduction in CpG content in the SARS-CoV-2 lineage that may have begun when the Sarbecovirus + Hibecovirus clade separated from Nobecovirus, an event estimated at 1213 years ago. This depletion was not found to be mediated by the GC contents of the genomes. Our results also showed that the depletion of CpG occurred at neutral positions of the genome as well as at positions under selection. The latter is evident from the progressive reduction over time in the proportion of arginine (four of whose six codons contain a CpG dinucleotide) in the SARS-CoV-2 lineage. The results of this study suggest that shedding CpG dinucleotides from the genome is a continuing process in this viral lineage, potentially to escape host defense mechanisms.

Keywords: COVID-19; CpG dinucleotide; SARS-CoV-2; Sarbecovirus; adaptation; host defense; virus evolution; zinc finger protein.

Conflict of interest statement

The author declares no conflict of interest.

Figures

Figure 1
Phylogenetic relationship among 79 representative genomes (Supplementary Table S1) belonging to the genus Betacoronavirus, including severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) (NC_045512.2). The tree was inferred using the maximum likelihood method. Red to blue colors indicate the levels of CpG content: red (2.82%), orange (2.12%), dark green (1.98%), pale green (1.69%), sky blue (1.54%), and dark blue (1.46%). A Bayesian MCMC-based method was used to estimate the time of divergence for each node; these times are shown in red (Supplementary Figure S1). Inset: column graph showing the CpG contents (%) of various clades of the tree. Error bars indicate standard error of the mean.
Figure 2
Relationship between divergence times of other betacoronaviruses from SARS-CoV-2 and their CpG contents. The divergence times for each node are shown in Figure 1. (A) All genomes were included. The correlation is not significant (ρ = 0.18, p = 0.12). (B) Embecovirus and Merbecovirus genomes were excluded. The correlation is highly significant (ρ = 0.86, p < 0.00001). Best-fitting regression lines are shown.
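The correlations reported in this caption can be illustrated with a minimal sketch. It assumes that ρ denotes Spearman's rank correlation, which the caption does not state, and the function names `average_ranks` and `spearman_rho` are hypothetical helpers introduced here, not the author's code. It computes ρ as the Pearson correlation of tie-averaged ranks and does not compute a p-value.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (vx * vy) ** 0.5
```

In use, `x` would hold the divergence time of each genome from SARS-CoV-2 and `y` its CpG content (%); a value of ρ near 1 means that more distantly related genomes tend to have higher CpG contents.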
Figure 3
(A) Proportion of CpG-containing codons that code for the amino acid arginine in three groups of virus genomes. (B) Proportion of arginine in the exomes of SARS-CoV-2 and its relatives (dark blue in Figure 1), SARS-CoV (dark green in Figure 1), and Nobecovirus (red in Figure 1). Error bars denote standard error of the mean.
Figure 4
Correlation between the ratio of observed-to-expected CpG contents (ICpG—Equation (2)—see methods) of the betacoronavirus genomes and their respective times of divergence from SARS-CoV-2. The relationship is highly significant (ρ = 0.87, p < 0.00001).
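The two quantities used throughout these captions, CpG content and the observed-to-expected CpG ratio, can be sketched as follows. Equation (2) is not reproduced on this page, so the sketch assumes the commonly used definition in which the observed CG dinucleotide frequency is divided by the product of the C and G nucleotide frequencies; the paper's exact formula may differ. The function name `cpg_stats` and its return keys are hypothetical.

```python
from collections import Counter


def cpg_stats(seq: str) -> dict:
    """Return CpG content (%) and an observed/expected CpG ratio for a sequence.

    Assumed definitions (not taken from the paper's Equation (2)):
      CpG content = CG dinucleotides / total dinucleotide positions
      obs/exp     = f(CG) / (f(C) * f(G))
    """
    s = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    n = len(s)
    if n < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    base_counts = Counter(s)
    # Count CG across all n - 1 overlapping dinucleotide positions.
    cg = sum(1 for i in range(n - 1) if s[i] == "C" and s[i + 1] == "G")
    f_cg = cg / (n - 1)
    f_c = base_counts["C"] / n
    f_g = base_counts["G"] / n
    expected = f_c * f_g
    return {
        "cpg_percent": 100.0 * f_cg,
        # The ratio is undefined when the sequence lacks C or G.
        "obs_exp": f_cg / expected if expected > 0 else float("nan"),
    }
```

Normalizing by the C and G frequencies is what separates CpG depletion from a general shift in base composition: a ratio below 1 indicates fewer CpG dinucleotides than the nucleotide frequencies alone would predict.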
