J Virol. 2020 Jun 1;94(12):e00411-20.
doi: 10.1128/JVI.00411-20. Print 2020 Jun 1.

Computational Inference of Selection Underlying the Evolution of the Novel Coronavirus, Severe Acute Respiratory Syndrome Coronavirus 2

Rachele Cagliani et al. J Virol.

Abstract

The novel coronavirus severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) that recently emerged in China is thought to have a bat origin, as its closest known relative (BatCoV RaTG13) was described previously in horseshoe bats. We analyzed the selective events that accompanied the divergence of SARS-CoV-2 from BatCoV RaTG13. To this end, we applied a population genetics-phylogenetics approach, which leverages within-population variation and divergence from an outgroup. Results indicated that most sites in the viral open reading frames (ORFs) evolved under conditions of strong to moderate purifying selection. The most highly constrained sequences corresponded to some nonstructural proteins (nsps) and to the M protein. Conversely, nsp1 and accessory ORFs, particularly ORF8, had a nonnegligible proportion of codons evolving under conditions of very weak purifying selection or close to selective neutrality. Overall, limited evidence of positive selection was detected. The 6 bona fide positively selected sites were located in the N protein, in ORF8, and in nsp1. A signal of positive selection was also detected in the receptor-binding motif (RBM) of the spike protein but most likely resulted from a recombination event that involved the BatCoV RaTG13 sequence. In line with previous data, we suggest that the common ancestor of SARS-CoV-2 and BatCoV RaTG13 encoded/encodes an RBM similar to that observed in SARS-CoV-2 itself and in some pangolin viruses. It is presently unknown whether the common ancestor still exists and, if so, which animals it infects. Our data, however, indicate that divergence of SARS-CoV-2 from BatCoV RaTG13 was accompanied by limited episodes of positive selection, suggesting that the common ancestor of the two viruses was poised for human infection.

IMPORTANCE Coronaviruses are dangerous zoonotic pathogens; in the last 2 decades, three coronaviruses have crossed the species barrier and caused human epidemics. One of these is the recently emerged SARS-CoV-2. We investigated how, since its divergence from a closely related bat virus, natural selection shaped the genome of SARS-CoV-2. We found that distinct coding regions in the SARS-CoV-2 genome evolved under conditions of different degrees of constraint and are consequently more or less prone to tolerate amino acid substitutions. In practical terms, the level of constraint provides indications about which proteins/protein regions are better suited as possible targets for the development of antivirals or vaccines. We also detected limited signals of positive selection in three viral ORFs. However, we warn that, in the absence of knowledge about the chain of events that determined the human spillover, these signals should not be necessarily interpreted as evidence of an adaptation to our species.

Keywords: N protein; Nsp1; ORF8; SARS-CoV-2; positive selection; spike protein; viral evolution.

Figures

FIG 1
Selective patterns of SARS-CoV-2. (A) Similarity plot (generated with SimPlot) of BatCoV RaTG13 relative to SARS-CoV-2 (Wuhan-Hu-1 reference strain, NC_045512.2). Similarity (Kimura distance) was calculated within sliding windows of 250 bp moving with steps of 50 bp. A schematic representation of the SARS-CoV-2 genome is also shown. ORF and nsp (nonstructural protein) names, lengths, and relative positions are in accordance with the annotation for the reference Wuhan-Hu-1 sequence. Box colors indicate the level of amino acid identity between the SARS-CoV-2 and BatCoV RaTG13 sequences. Black triangles indicate amino acid changes that are polymorphic in the analyzed SARS-CoV-2 genomes. Asterisks denote positively selected sites, and their sizes are proportional to the number of selected sites/region. Short ORFs with names in red were not analyzed with gammaMap. (B and C) Violin plots (median, white dot; interquartile range, black bar) of selection coefficients (γ) for the longest (more than 80 codons) ORFs (B) and nsp3 subdomains (C) are shown. Nsp3 domains were retrieved from the SARS-CoV annotation (68).
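The sliding-window calculation described for panel A can be sketched roughly as follows. This is an illustration only, not the SimPlot implementation: the function names are hypothetical, the distance is the standard Kimura two-parameter formula, and reporting similarity as 1 minus the distance is an assumption made here for plotting purposes. The inputs are assumed to be two already aligned nucleotide sequences of equal length.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def kimura_distance(a, b):
    """Kimura two-parameter distance between two aligned sequences.

    Positions with gaps or ambiguous bases in either sequence are skipped.
    """
    transitions = transversions = compared = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # ignore gaps and ambiguity codes
        compared += 1
        if x == y:
            continue
        # A<->G and C<->T are transitions; every other change is a transversion
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        return float("nan")
    p = transitions / compared
    q = transversions / compared
    term1 = 1 - 2 * p - q
    term2 = 1 - 2 * q
    if term1 <= 0 or term2 <= 0:
        return float("inf")  # sequences too divergent for the formula
    return -0.5 * math.log(term1 * math.sqrt(term2))


def sliding_similarity(a, b, window=250, step=50):
    """Return (window midpoint, similarity) pairs along the alignment.

    Similarity is taken as 1 - distance (an assumption of this sketch).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(a) - window + 1, step):
        d = kimura_distance(a[start:start + window], b[start:start + window])
        points.append((start + window // 2, 1.0 - d))
    return points
```

With the default arguments, the windows are 250 bp wide and advance by 50 bp, matching the settings given in the caption; each value is reported at the midpoint of its window.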
FIG 2
SARS-CoV-2 positively selected sites. A schematic representation of the nsp1, ORF8, spike (S), and nucleocapsid (N) proteins is presented. Positively selected sites (magenta) and amino acid substitutions between SARS-CoV-2 and BatCoV RaTG13 (red) and between SARS-CoV-2 and pangolin-CoV MP789 (blue) are indicated in the alignments. The location of an insertion (insPRRA) in the spike glycoprotein is also shown. This insertion is predicted to occur in the S1/S2 furin-like cleavage site (69, 70).
FIG 3
Homology modeling of positively selected SARS-CoV-2 proteins. Selected sites are mapped onto the 3D structure of models obtained using SARS-CoV proteins as templates (PDB ID: 6ACG for panel A, 2CJR for panel B, 2HSX for panel C). Coronavirus proteins are colored in hues of blue based on the most likely selection coefficient. Positively selected sites are marked in red. (A) Ribbon representation of the spike glycoprotein model (one monomer is shown) in complex with human ACE2 (green) (48). The binding interface is shown in the enlargement. (B) Ribbon representation of the C-terminal domain of the nucleocapsid protein. (C) Ribbon representation of the N-terminal portion of nsp1. Note that although some sites had the highest posterior probability for γ = 1 (yellow), they were not called as positively selected because the 0.5 threshold was not reached.
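The closing note of this caption distinguishes the most likely selection coefficient at a site from the call of positive selection. A minimal sketch of how the two can disagree is given below. It rests on assumptions that this page does not confirm: the grid of γ classes, the rule that a site is called when the summed posterior probability of all classes with γ ≥ 1 exceeds 0.5, and the example probabilities, which are invented for illustration and are not results from the study.

```python
# Assumed grid of selection coefficient classes (hypothetical for this sketch)
GAMMA_CLASSES = [-500, -100, -50, -10, -5, -1, 0, 1, 5, 10, 50, 100]


def summarize_site(posterior, threshold=0.5):
    """Return (most likely gamma, posterior mass for gamma >= 1, called).

    `posterior` holds one probability per entry of GAMMA_CLASSES.
    The calling rule (summed mass above `threshold`) is an assumption.
    """
    if len(posterior) != len(GAMMA_CLASSES):
        raise ValueError("one probability per gamma class is required")
    if abs(sum(posterior) - 1.0) > 1e-6:
        raise ValueError("posterior probabilities must sum to 1")
    # Class with the single highest posterior probability
    best_index = max(range(len(posterior)), key=posterior.__getitem__)
    best_gamma = GAMMA_CLASSES[best_index]
    # Total probability assigned to the positive classes
    positive_mass = sum(
        p for p, g in zip(posterior, GAMMA_CLASSES) if g >= 1
    )
    return best_gamma, positive_mass, positive_mass > threshold


# Invented example: gamma = 1 is the most likely class, yet the mass on
# gamma >= 1 stays below 0.5, so the site is not called.
example = [0.0, 0.0, 0.0, 0.05, 0.10, 0.20, 0.20, 0.30, 0.10, 0.05, 0.0, 0.0]
print(summarize_site(example))
```

Under these assumptions, a site can be colored by γ = 1 as its most likely class and still fall short of the calling threshold, which is the situation the caption describes.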

References

    1. Coronaviridae Study Group of the International Committee on Taxonomy of Viruses. 2020. The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2. Nat Microbiol 5:536–544. doi:10.1038/s41564-020-0695-z.
    2. Zhu N, Zhang D, Wang W, Li X, Yang B, Song J, Zhao X, Huang B, Shi W, Lu R, Niu P, Zhan F, Ma X, Wang D, Xu W, Wu G, Gao GF, Tan W; China Novel Coronavirus Investigating and Research Team. 2020. A novel coronavirus from patients with pneumonia in China, 2019. N Engl J Med 382:727–733. doi:10.1056/NEJMoa2001017.
    3. Cui J, Li F, Shi ZL. 2019. Origin and evolution of pathogenic coronaviruses. Nat Rev Microbiol 17:181–192. doi:10.1038/s41579-018-0118-9.
    4. Forni D, Cagliani R, Clerici M, Sironi M. 2017. Molecular evolution of human coronavirus genomes. Trends Microbiol 25:35–48. doi:10.1016/j.tim.2016.09.001.
    5. Luk HKH, Li X, Fung J, Lau SKP, Woo PCY. 2019. Molecular epidemiology, evolution and phylogeny of SARS coronavirus. Infect Genet Evol 71:21–30. doi:10.1016/j.meegid.2019.03.001.
