Virol J. 2010 Jan 25;7:17.
doi: 10.1186/1743-422X-7-17.

Candidates in Astroviruses, Seadornaviruses, Cytorhabdoviruses and Coronaviruses for +1 frame overlapping genes accessed by leaky scanning

Andrew E Firth et al. Virol J. 2010.

Abstract

Background: Overlapping genes are common in RNA viruses where they serve as a mechanism to optimize the coding potential of compact genomes. However, annotation of overlapping genes can be difficult using conventional gene-finding software. Recently we have been using a number of complementary approaches to systematically identify previously undetected overlapping genes in RNA virus genomes. In this article we gather together a number of promising candidate new overlapping genes that may be of interest to the community.

Results: Overlapping gene predictions are presented for the astroviruses, seadornaviruses, cytorhabdoviruses and coronaviruses (families Astroviridae, Reoviridae, Rhabdoviridae and Coronaviridae, respectively).

Figures

Figure 1
Coding potential statistics for mamastrovirus (human-porcine-feline astrovirus clade) ORF2 and the overlapping ORFX. (1) Map of the ORF2 region of human astrovirus [GenBank: Z25771], showing the proposed new coding sequence, ORFX, overlapping ORF2 in the +1 reading frame. (2-6) Coding potential statistics based on an alignment of 88 mamastrovirus sequences with complete coverage of ORF2 (see Methods for accession numbers). For clarity, regions with alignment gaps in the arbitrary reference sequence (viz. Z25771) have been removed (e.g. regions where a single sequence in the alignment has an insertion, resulting in alignment gaps in all the other sequences). (2-4) Positions of stop codons in each of the three forward reading frames. The +0 frame corresponds to ORF2 and is therefore devoid of stop codons. Note the conserved absence of stop codons in the +1 frame within the ORFX region. (5-6) Conservation at synonymous sites within ORF2 (see [5] for details). (5) depicts the probability that the degree of conservation within a given window could be obtained under a null model of neutral evolution at synonymous sites, while (6) depicts the absolute amount of conservation as represented by the ratio of the observed number of substitutions within a given window to the number expected under the null model. Note the unusually high conservation within the ORFX region.
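The stop-codon panels described in this caption can be illustrated with a minimal Python sketch. It is not the authors' software: the function name `stop_positions_by_frame` and the toy sequence are hypothetical, and the sketch handles a single ungapped sequence rather than an 88-sequence alignment.

```python
# Illustrative only: locate stop codons in the three forward reading frames
# of one nucleotide sequence. Names and the toy input are assumptions.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_positions_by_frame(seq):
    """Map each forward frame (0, 1, 2) to the 0-based start indices of its stop codons."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    positions = {0: [], 1: [], 2: []}
    for frame in range(3):
        # Step through complete codons only, starting at the frame offset.
        for i in range(frame, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                positions[frame].append(i)
    return positions


if __name__ == "__main__":
    toy = "ATGGCTTAAGCATGA"
    print(stop_positions_by_frame(toy))
```

A frame whose list stays empty across a region, in every aligned sequence, corresponds to the "conserved absence of stop codons" that the caption points out for the +1 frame.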
Figure 2
Sequence data for mamastrovirus ORFX. (A) Representative initiation codon contexts for mamastrovirus ORF2 and ORFX. Spaces separate ORF2-frame codons. Colour coding is as follows: blue - ORF2 initiation codon; green - potential ORFX initiation codon; yellow (olive) - flanking nucleotides matching the optimal (suboptimal) Kozak context. (B) Representative ORFX amino acid sequence.
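As a rough illustration of how an initiation codon context might be scored, the sketch below applies one commonly cited rule of thumb: a purine at position -3 and a G at position +4, counting the A of the AUG as +1. This rule, the three category labels, and the function name `kozak_strength` are assumptions made for illustration; the exact criteria behind the colour coding in this figure are not stated in the caption.

```python
# Illustrative only: classify the context of an AUG using an assumed rule
# (purine at -3, G at +4). Labels and function name are hypothetical.
def kozak_strength(seq, aug_index):
    """Return 'strong', 'adequate' or 'weak' for the AUG starting at aug_index (0-based)."""
    seq = seq.upper().replace("U", "T")
    if seq[aug_index:aug_index + 3] != "ATG":
        raise ValueError("no AUG at the given index")
    # Position -3 is three nucleotides upstream of the A; +4 directly follows the G.
    minus3 = seq[aug_index - 3] if aug_index >= 3 else None
    plus4 = seq[aug_index + 3] if aug_index + 3 < len(seq) else None
    has_purine = minus3 in ("A", "G")
    has_g = plus4 == "G"
    if has_purine and has_g:
        return "strong"
    if has_purine or has_g:
        return "adequate"
    return "weak"


if __name__ == "__main__":
    print(kozak_strength("GCCACCATGG", 6))
    print(kozak_strength("TTTTTTATGT", 6))
```

Under leaky scanning, a weaker context at the upstream initiation codon would allow a proportion of scanning ribosomes to continue to a downstream AUG, which is why the contexts of both codons are shown side by side.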
Figure 3
Coding potential statistics for the seadornavirus VP7 CDS and the overlapping ORFX. (1) Map of the VP7 CDS of Banna virus [GenBank: AF052018], showing the proposed new coding sequence, ORFX, overlapping the VP7 CDS in the +1 reading frame. (2-7) Coding potential statistics based on an alignment of six Banna virus sequences with complete coverage of the VP7 CDS (see Figure 4 for accession numbers). (2-4) Positions of stop codons in each of the three forward reading frames. Note the conserved absence of stop codons in the +1 frame within the ORFX region. (5-6) Conservation at synonymous sites within the VP7 CDS (see Figure 1 caption for details). Note the unusually high conservation within the ORFX region. (7) MLOGD statistics for ORFX (see [2] for details). The null model is that the sequence in the ORFX region is only coding in the +0 (VP7 CDS) frame, while the alternative model is that the ORFX region is coding in both the +0 and the +1 (ORFX) reading frames. Positive scores favour the alternative model. MLOGD coding potential scores are produced for each alignment column and averaged over a 21 nt sliding window for clarity. The predominantly positive scores indicate that ORFX is likely to be a coding sequence.
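The smoothing step mentioned in this caption, averaging per-column scores over a 21 nt sliding window, can be sketched as follows. This is a generic moving average rather than the MLOGD implementation: the function name `sliding_mean` is hypothetical, and reporting only full windows is an assumption, since the caption does not say how alignment ends are handled.

```python
# Illustrative only: moving average of per-column scores over a fixed window.
# Edge handling (full windows only) is an assumption, not taken from the source.
def sliding_mean(scores, window=21):
    """Return the mean of each full window of `window` consecutive scores."""
    if window < 1 or window > len(scores):
        raise ValueError("window must be between 1 and len(scores)")
    total = sum(scores[:window])
    means = [total / window]
    for i in range(window, len(scores)):
        # Slide by one column: add the entering score, drop the leaving one.
        total += scores[i] - scores[i - window]
        means.append(total / window)
    return means


if __name__ == "__main__":
    print(sliding_mean([1, 2, 3, 4, 5], window=3))
```

With the default window of 21, an input of N per-column scores yields N - 20 smoothed values, one for each window position.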
Figure 4
Sequence data for seadornavirus ORFX. (A) Initiation codon contexts for the seadornavirus segment 7 VP7 CDS and ORFX. Colour coding is as follows: blue - VP7 initiation codon; green - potential ORFX initiation codon; yellow (olive) - flanking nucleotides matching the optimal (suboptimal) Kozak context. (B) Representative ORFX amino acid sequence.
Figure 5
Coding potential statistics for the cytorhabdovirus P CDS and the overlapping ORFX. (1) Map of the P CDS of LNYV [GenBank: AJ867584], showing the proposed new coding sequence, ORFX, overlapping the P CDS in the +1 reading frame. (2-5) Coding potential statistics based on an alignment of LNYV and LYMoV (see Figure 6 for accession numbers). (2-4) Positions of stop codons in each of the three forward reading frames. Note the conserved absence of stop codons in the +1 frame within the ORFX region. (5) MLOGD statistics for ORFX (see Figure 3 for details). The predominantly positive scores indicate that ORFX is likely to be a coding sequence.
Figure 6
Sequence data for cytorhabdovirus ORFX. (A) Initiation codon contexts for the cytorhabdovirus P CDS and ORFX. Spaces separate ORFX-frame codons. Colour coding is as follows: blue - P initiation codon; green - potential ORFX initiation codon; yellow - flanking nucleotides matching the optimal Kozak context. (B) Representative ORFX amino acid sequence.
Figure 7
Coding potential statistics for the Group 3c coronavirus NS6 CDS and the overlapping ORFX. (1) Map of the NS6 CDS of BuCoV [GenBank: FJ376620], showing the proposed new coding sequence, ORFX, overlapping the NS6 CDS in the +1 reading frame. (2-6) Coding potential statistics based on an alignment of five Group 3c coronavirus sequences (see Figure 8 for accession numbers). (2-4) Positions of stop codons in each of the three forward reading frames. Note the conserved absence of stop codons in the +1 frame within the ORFX region. (5) MLOGD statistics for NS6 relative to a non-coding null model. (6) MLOGD statistics for ORFX (see Figure 3 for details). The predominantly positive scores indicate that ORFX is likely to be a coding sequence, but is subject to significantly weaker purifying selection than NS6. The negative scores at the 3' end of ORFX indicate that the C-terminal region of the putative product is not subject to strong functional constraints.
Figure 8
Sequence data for Group 3c coronavirus ORFX. (A) Initiation codon contexts for the Group 3c coronavirus NS6 CDS and ORFX. Spaces separate NS6-frame codons. Colour coding is as follows: blue - NS6 initiation codon; green - potential ORFX initiation codon; yellow (olive) - flanking nucleotides matching the optimal (suboptimal) Kozak context. Potential, albeit imperfect, TRSs are indicated in bold. The termination codon of the upstream M CDS is underlined. (B) Representative ORFX amino acid sequence.
Figure 9
Coding potential statistics for bat coronavirus 1A/1B/HKU8 ORF3 and the overlapping ORFX. (1) Map of the ORF3 region of BtCoV 1A [GenBank: EU420138], showing the proposed new coding sequence, ORFX, overlapping ORF3 in the +1 reading frame. (2-6) Coding potential statistics based on an alignment of BtCoV 1A, 1B and HKU8 (see Figure 10 for accession numbers). (2-4) Positions of stop codons in each of the three forward reading frames. Note the conserved absence of stop codons in the +1 frame within the ORFX region. (5) MLOGD statistics for ORF3 relative to a non-coding null model. (6) MLOGD statistics for ORFX (see Figure 3 for details). The predominantly positive scores indicate that ORFX is likely to be a coding sequence, but is subject to significantly weaker purifying selection than ORF3. The negative scores at the 3' end of ORFX indicate that the C-terminal region of the putative product is not subject to strong functional constraints.
Figure 10
Sequence data for bat coronavirus 1A/1B/HKU8 ORFX. (A) Initiation codon contexts for bat coronavirus 1A/1B/HKU8 ORF3 and ORFX. Spaces separate ORF3-frame codons. Colour coding is as follows: blue - ORF3 initiation codon; green - potential ORFX initiation codon; yellow (olive) - flanking nucleotides matching the optimal (suboptimal) Kozak context. The termination codon of the upstream S CDS is underlined. (B) Representative ORFX amino acid sequence.

References

    1. Firth AE, Brown CM. Detecting overlapping coding sequences with pairwise alignments. Bioinformatics. 2005;21:282–292. doi: 10.1093/bioinformatics/bti007.
    2. Firth AE, Brown CM. Detecting overlapping coding sequences in virus genomes. BMC Bioinformatics. 2006;7:75. doi: 10.1186/1471-2105-7-75.
    3. Chung BYW, Miller WA, Atkins JF, Firth AE. An overlapping essential gene in the Potyviridae. Proc Natl Acad Sci USA. 2008;105:5897–5902. doi: 10.1073/pnas.0800468105.
    4. Firth AE, Chung BY, Fleeton MN, Atkins JF. Discovery of frameshifting in Alphavirus 6K resolves a 20-year enigma. Virol J. 2008;5:108. doi: 10.1186/1743-422X-5-108.
    5. Firth AE, Atkins JF. A conserved predicted pseudoknot in the NS2A-encoding sequence of West Nile and Japanese encephalitis flaviviruses suggests NS1' may derive from ribosomal frameshifting. Virol J. 2009;6:14. doi: 10.1186/1743-422X-6-14.
