Infect Genet Evol. 2020 Sep:83:104353.
doi: 10.1016/j.meegid.2020.104353. Epub 2020 May 5.

Coding potential and sequence conservation of SARS-CoV-2 and related animal viruses

Rachele Cagliani et al. Infect Genet Evol. 2020 Sep.

Abstract

In December 2019, a novel human-infecting coronavirus (SARS-CoV-2) was recognized in China. Within a few months, SARS-CoV-2 caused thousands of disease cases and deaths in several countries. Phylogenetic analyses indicated that SARS-CoV-2 clusters with SARS-CoV in the Sarbecovirus subgenus, and viruses related to SARS-CoV-2 were identified in bats and pangolins. Coronaviruses have long and complex genomes with high plasticity in terms of gene content. To date, the coding potential of SARS-CoV-2 remains partially unknown. We thus used available sequences of bat and pangolin viruses to determine the selective events that shaped the genome structure of SARS-CoV-2 and to assess its coding potential. By searching for signals of significantly reduced variability at synonymous sites (dS), we identified six genomic regions, one of which corresponds to the programmed -1 ribosomal frameshift. The most prominent signal of dS reduction was observed within the E gene. A genome-wide analysis of conserved RNA structures indicated that this region harbors a putative functional RNA element that is shared with the SARS-CoV lineage. Additional signals of reduced dS indicated the presence of internal ORFs. Whereas the presence of ORF9a (internal to N) was previously proposed by homology with a well-characterized protein of SARS-CoV, ORF3h (for hypothetical, within ORF3a) was not previously described. The predicted product of ORF3h has 90% identity with the corresponding predicted product of SARS-CoV and displays features suggestive of a viroporin. Finally, analysis of the putative ORF10 revealed high dN/dS (3.82) in SARS-CoV-2 and related coronaviruses. In the SARS-CoV lineage, the ORF is predicted to encode a truncated protein and is neutrally evolving. These data suggest that ORF10 encodes a functional protein in SARS-CoV-2 and that positive selection is driving its evolution. Experimental analyses will be necessary to validate and characterize the coding and non-coding functional elements we identified.
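
The dN/dS values quoted in the abstract are conventionally read against a threshold of 1: values above 1 suggest positive selection, values near 1 suggest neutral evolution, and values below 1 suggest purifying selection. A minimal Python sketch of that reading follows. The function name and the tolerance band around 1 are illustrative assumptions, not part of the study, and a point estimate alone does not establish selection without a statistical test.

```python
def interpret_dn_ds(omega: float, tolerance: float = 0.1) -> str:
    """Give the conventional qualitative reading of a dN/dS point estimate.

    `tolerance` is a hypothetical band around 1.0 used only for illustration;
    real analyses rely on likelihood-based tests, not a fixed cutoff.
    """
    if omega < 0:
        raise ValueError("dN/dS cannot be negative")
    if omega > 1 + tolerance:
        return "positive selection"
    if omega < 1 - tolerance:
        return "purifying selection"
    return "approximately neutral"


# The ORF10 estimate reported in the abstract for the SARS-CoV-2 lineage.
print(interpret_dn_ds(3.82))
```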

Keywords: Coding potential; Coronaviruses; Functional RNA elements; SARS-CoV-2.

Conflict of interest statement

Declaration of Competing Interest: The authors declare that they have no competing interests.

Figures

Fig. 1
Sequence conservation, coding potential, and RNA structure prediction in the SARS-CoV-2 lineage. (a) Phylogenetic tree of SARS-CoV-2 and related animal viruses. The maximum-likelihood tree was calculated using the ORF1a non-recombining region (see panel b). Colored dots indicate viral sequences. (b) Schematic representation of recombination events. Each bar represents the recombinant sequence with the corresponding name and the genomic location of the recombination event. Colors indicate the major and the minor parental sequences (color coding as in panel a); gray color indicates an unknown parental sequence. The location of the spike protein, its receptor-binding domain (RBD), and the receptor-binding motif (RBM) are also shown. (c) Distribution of synonymous site variation. Plot of synonymous sites substitution (dS) along SARS-CoV-2 lineage coding sequences. The brown line indicates relative dS variability calculated as the ratio of the observed over the expected values of dS in a sliding window of 25 codons. The red line shows the corresponding p-value and the dashed line represents the p-value cutoff. A schematic representation of SARS-CoV-2 ORFs is reported; ORFs that are not annotated in the SARS-CoV-2 reference genome (NC_045512.2) are in red (genomic positions 25,457–25,582, 28,284–28,577, and 28,734–28,955 of the reference SARS-CoV-2 genome correspond to ORF3h, ORF9a, and ORF9b). Gray boxes indicate recombination events that were masked in the analysis. The location of the three predicted conserved RNA secondary structures is shown. Structures were rendered using RNAalifold. The color scheme reflects the mutational pattern with respect to the structure. Thus, colors indicate conserved base pairs: from red (conservation of only one base-pair type) to purple (all six base-pair types are found); from dark (all sequences contain this base pair) to light colors (1 or 2 sequences are unable to form this base pair). 
(For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
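
The relative dS variability in panel (c) is described as the ratio of observed to expected dS in a sliding window of 25 codons. The Python sketch below illustrates only that windowing arithmetic. It assumes per-codon observed and expected values are already available as plain lists (a hypothetical input format) and takes the ratio of the window sums; it is not the authors' pipeline and does not compute the associated p-values.

```python
from typing import List


def sliding_ds_ratio(observed: List[float],
                     expected: List[float],
                     window: int = 25) -> List[float]:
    """Return observed/expected dS for every full window of `window` codons."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if window <= 0 or window > len(observed):
        raise ValueError("window must be between 1 and the number of codons")

    ratios = []
    for start in range(len(observed) - window + 1):
        obs_sum = sum(observed[start:start + window])
        exp_sum = sum(expected[start:start + window])
        # A window with no expected substitutions has an undefined ratio.
        ratios.append(obs_sum / exp_sum if exp_sum > 0 else float("nan"))
    return ratios


# Toy input (not real data): observed variation is half the expectation.
toy_ratios = sliding_ds_ratio([1.0] * 10, [2.0] * 10, window=5)
print(toy_ratios)
```

Ratios well below 1 in such a profile would correspond to the regions of reduced synonymous variability that the legend highlights.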
Fig. 2
ORF3 Transmembrane regions. Schematic representation of the protein product of ORF3a, with the predicted location of transmembrane (TM) regions. The potential alternative protein encoded by ORF3h is also shown, along with an amino acid alignment of SARS-CoV-2 and related animal viruses. PSIPRED predicted the boxed region to be transmembrane (dark orange) and pore-lining (blue profile). (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
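
As a quick consistency check on the genomic coordinates listed in the Fig. 1 legend (1-based, inclusive positions on NC_045512.2), the Python sketch below computes the length of each interval in nucleotides and codons. The helper name is an assumption for illustration, and the legend does not state whether the intervals include the stop codon.

```python
def interval_length(start: int, end: int) -> int:
    """Length in nucleotides of a 1-based, inclusive genomic interval."""
    if end < start:
        raise ValueError("end must not precede start")
    return end - start + 1


# Coordinates as given in the Fig. 1 legend, in the order listed there.
intervals = [(25457, 25582), (28284, 28577), (28734, 28955)]

for start, end in intervals:
    nt = interval_length(start, end)
    # A coding interval should span a whole number of codons.
    print(f"{start}-{end}: {nt} nt, {nt // 3} codons, remainder {nt % 3}")
```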
Fig. 3
(a) ORF10 phylogenetic tree for the SARS-CoV-2 and the SARS-CoV lineages. Viral sequences belonging to the SARS-CoV-2 lineage are colored in red, viruses belonging to the SARS-CoV lineage are colored in blue. Their corresponding dN/dS values, calculated using SLAC, are also reported. (b) ORF10 protein alignment. An amino acid alignment of the same viral sequences is shown, along with transmembrane region (TM) prediction (dark orange) by PSIPRED. Asterisks indicate stop codons. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
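
The asterisks in panel (b) mark stop codons, which is how a truncated predicted product can be recognized. The Python sketch below shows one simple way to locate the first in-frame stop codon in a nucleotide sequence. The function name and the toy sequences are illustrative assumptions, not data from the study.

```python
from typing import Optional

STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_stop_codon(seq: str) -> Optional[int]:
    """Return the 0-based codon index of the first in-frame stop, or None."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    for i in range(0, len(seq) - 2, 3):
        if seq[i:i + 3] in STOP_CODONS:
            return i // 3
    return None


# Toy sequences (not real ORF10 data): the first has an early stop codon.
print(first_stop_codon("ATGAAATAAGGG"))
print(first_stop_codon("ATGAAAGGG"))
```

A stop codon found earlier than in the reference sequence would indicate a truncated predicted protein of the kind the abstract describes for the SARS-CoV lineage.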
