Virology. 2021 Oct;562:149-157.
doi: 10.1016/j.virol.2021.07.011. Epub 2021 Jul 28.

Prediction of two novel overlapping ORFs in the genome of SARS-CoV-2

Angelo Pavesi. Virology. 2021 Oct.

Abstract

Six candidate overlapping genes have been detected in SARS-CoV-2, yet current methods struggle to detect overlapping genes that originated recently. However, such genes might encode proteins beneficial to the virus, and they provide a model system for understanding gene birth. To complement existing detection methods, I first demonstrated that selection pressure to avoid stop codons in alternative reading frames is a driving force in the origin and retention of overlapping genes. I then built a detection method, CodScr, based on this selection pressure. Finally, I combined CodScr with methods that detect other properties of overlapping genes, such as biased nucleotide and amino acid composition. I detected two novel ORFs (ORF-Sh and ORF-Mh), overlapping the spike and membrane genes respectively, which are under selection pressure and may be beneficial to SARS-CoV-2. ORF-Sh and ORF-Mh are present as ORFs uninterrupted by stop codons in 100% and 95% of the SARS-CoV-2 genomes, respectively.
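The abstract does not spell out how CodScr scores stop-codon avoidance. As a minimal sketch of the underlying idea only (not the author's CodScr), one can count stop codons in the +1 or +2 frame of a coding region and ask how often a codon-order shuffle of the same region yields as few or fewer. The shuffle null model (which keeps the ancestral frame's codon usage but not its protein sequence), the function names, and the empirical P-value formula are assumptions chosen for illustration.

```python
import random

STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons (DNA alphabet)


def codons(seq):
    """Split a sequence into complete codons, starting at its first nucleotide."""
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def stops_in_frame(seq, shift):
    """Count stop codons in the frame shifted `shift` nt (1 or 2) 3' of frame 0."""
    return sum(1 for c in codons(seq[shift:]) if c in STOPS)


def stop_avoidance_p(seq, shift, n_perm=1000, seed=0):
    """Hypothetical stop-avoidance test: return (observed stop count, empirical P).

    P is the fraction of codon-order shuffles of the ancestral frame whose
    shifted frame contains as few or fewer stops than observed; a small P
    suggests the alternative frame avoids stop codons more than expected.
    """
    seq = seq.upper().replace("U", "T")
    seq = seq[:len(seq) - len(seq) % 3]  # keep whole ancestral codons only
    observed = stops_in_frame(seq, shift)
    cods = codons(seq)
    rng = random.Random(seed)  # seeded for reproducibility
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(cods)  # permute codon order; codon usage is preserved
        if stops_in_frame("".join(cods), shift) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Shuffling codon order is used here because alternative-frame stop codons arise at the junctions between adjacent ancestral codons, which is exactly what a shuffle rearranges; a synonymous-codon shuffle that preserves the encoded protein would be a stricter null model.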

Keywords: Codon usage; Membrane protein; Multivariate statistics; Overlapping reading frame; Selection pressure; Spike protein; Virus evolution.

Conflict of interest statement

None.

Figures

Fig. 1
Location of the eight overlapping ORFs detected in the 3′ genome region of SARS-CoV-2.
Fig. 2
Example workflow for CodScr + SeqComp analysis. As input data, CodScr + SeqComp requires the nucleotide sequence of a protein coding region (the ancestral reading frame) which contains an overlapping ORF shifted one nucleotide 3′ (+1 overlapping ORF) or an overlapping ORF shifted two nucleotides 3′ (+2 overlapping ORF). ORF indicates a contiguous stretch of codons, beginning with a start AUG codon, ending with a stop codon, not interrupted by premature stop codons, and having a length ≥ 90 nt. A detailed example of calculation of the five prediction scores (P-value, PLS-DA score, LDA score, LDA-ancestral score, and LDA-novel score) is shown in Supplementary File S1.
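The ORF definition in this caption can be expressed as a scan of the shifted frame. The sketch below is illustrative rather than taken from CodScr + SeqComp: the function name `overlapping_orfs`, the 0-based half-open coordinates, counting the stop codon toward the 90-nt minimum, and reporting only the longest ORF per stop codon (nested downstream AUGs are skipped) are all assumptions. RNA input is accepted by converting U to T.

```python
STOPS = {"TAA", "TAG", "TGA"}  # standard genetic code stop codons (DNA alphabet)


def overlapping_orfs(seq, shift, min_len=90):
    """Find ORFs in the frame shifted `shift` nt (1 or 2) 3' of the ancestral frame.

    An ORF starts at ATG (AUG), ends at the first in-frame stop codon (included),
    and must be at least `min_len` nt long. Returns 0-based (start, end) pairs,
    end exclusive.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    i = shift
    while i + 3 <= len(seq):
        if seq[i:i + 3] == "ATG":
            j = i
            # extend codon by codon until an in-frame stop codon is reached
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOPS:
                j += 3
            if j + 3 > len(seq):
                break  # no stop before the sequence ends: no complete ORF
            if j + 3 - i >= min_len:
                orfs.append((i, j + 3))
            i = j + 3  # resume scanning after the stop codon
            continue
        i += 3
    return orfs


# A +1 ORF: ATG, 30 sense codons, then TAA (96 nt including the stop).
example = "C" + "ATG" + "GCC" * 30 + "TAA" + "GG"
print(overlapping_orfs(example, 1))  # [(1, 97)]
```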
Fig. 3
Nucleotide and amino acid sequence of the two predicted overlapping ORFs in the 3′ genome region of SARS-CoV-2. (A) Overlapping ORF-Sh: the nucleotide sequence (from nt 24,050 to 24,172) encodes the region of protein S spanning residues 830–868, while the +1 overlapping ORF-Sh (from nt 24,051 to 24,170) encodes a predicted protein of 39 aa (underlined characters). Bold characters indicate a predicted transmembrane helix. (B) Overlapping ORF-Mh: the nucleotide sequence (from nt 26,691 to 26,873) encodes the region of protein M spanning residues 57–116, while the +2 overlapping ORF-Mh (from nt 26,693 to 26,872) encodes a predicted protein of 59 aa (underlined characters). Bold characters indicate two predicted transmembrane helices.
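The ORF lengths in this caption follow from the genome coordinates. A quick check, assuming (as the stated protein lengths imply) that the 1-based, inclusive end coordinate of each overlapping ORF includes its stop codon; the helper names are hypothetical:

```python
def orf_length(start_nt, end_nt):
    """Return (nucleotides, codons, amino acids) for a 1-based inclusive ORF
    whose end coordinate includes the stop codon."""
    nt = end_nt - start_nt + 1
    if nt % 3:
        raise ValueError("ORF length is not a multiple of 3")
    n_codons = nt // 3
    return nt, n_codons, n_codons - 1  # the stop codon encodes no residue


def frame_offset(ancestral_start, orf_start):
    """Shift (0, 1 or 2 nt) of an overlapping ORF relative to the ancestral frame."""
    return (orf_start - ancestral_start) % 3


print(orf_length(24051, 24170), frame_offset(24050, 24051))  # ORF-Sh: (120, 40, 39) 1
print(orf_length(26693, 26872), frame_offset(26691, 26693))  # ORF-Mh: (180, 60, 59) 2
```

Both results agree with the 39-aa (+1) and 59-aa (+2) predicted proteins given above.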
