Proc Natl Acad Sci U S A. 2013 Jul 30;110(31):12744-9.
doi: 10.1073/pnas.1303526110. Epub 2013 Jul 11.

Identification of an overprinting gene in Merkel cell polyomavirus provides evolutionary insight into the birth of viral genes

Joseph J Carter et al. Proc Natl Acad Sci U S A. 2013.

Abstract

Many viruses use overprinting (alternate reading frame utilization) as a means to increase protein diversity in genomes severely constrained by size. However, the evolutionary steps that facilitate the de novo generation of a novel protein within an ancestral ORF have remained poorly characterized. Here, we describe the identification of an overprinting gene, expressed from an Alternate frame of the Large T Open reading frame (ALTO) in the early region of Merkel cell polyomavirus (MCPyV), the causative agent of most Merkel cell carcinomas. ALTO is expressed during, but not required for, replication of the MCPyV genome. Phylogenetic analysis reveals that ALTO is evolutionarily related to the middle T antigen of murine polyomavirus despite almost no sequence similarity. ALTO/MT arose de novo by overprinting of the second exon of T antigen in the common ancestor of a large clade of mammalian polyomaviruses. Taking advantage of the low evolutionary divergence and diverse sampling of polyomaviruses, we propose evolutionary transitions that likely gave birth to this protein. We suggest that two highly constrained regions of the large T antigen ORF provided a start codon and C-terminal hydrophobic motif necessary for cellular localization of ALTO. These two key features, together with stochastic erasure of intervening stop codons, resulted in a unique protein-coding capacity that has been preserved ever since its birth. Our study not only reveals a previously undefined protein encoded by several polyomaviruses including MCPyV, but also provides insight into de novo protein evolution.
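The link between synonymous substitution and stop-codon erasure can be made concrete with a toy example. The sketch below is not taken from the study: the 12-nt sequences, the `translate` helper, and the chosen substitution are illustrative assumptions. It shows a single nucleotide change that leaves the reference-frame protein untouched while removing a stop codon from the +1 frame.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, offset=0):
    """Translate seq starting at offset, ignoring a trailing partial codon."""
    usable = len(seq) - (len(seq) - offset) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(offset, usable, 3))


# Hypothetical sequences, invented for illustration only.
ancestral = "ATGCTAAGCGGT"  # frame 0: ATG CTA AGC GGT
derived = "ATGCTCAGCGGT"    # CTA -> CTC, both leucine in frame 0

for name, seq in (("ancestral", ancestral), ("derived", derived)):
    print(name, "frame 0:", translate(seq), "| +1 frame:", translate(seq, 1))
# ancestral frame 0: MLSG | +1 frame: C*A
# derived frame 0: MLSG | +1 frame: CSA
```

Not every synonymous change has this effect: CTA to CTG is also synonymous in frame 0 but converts the +1 codon TAA into TGA, which is still a stop.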

Keywords: disordered motifs; gene evolution; synonymous substitution.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
ALTO is a unique protein encoded by the MCPyV ER. (A) Sliding window plot (10-aa window size) of amino acid identity between MCPyV and eight other human polyomaviruses. The arrows depict the LT and ALTO ORFs; gray boxes indicate functional domains (14) (DNAj, activators of DNAk chaperones; OBD, origin binding domain; YGS/T and LxCxE, conserved linear motifs). (B) Four transcripts from the early region (ER) are present in MCPyV-positive tumors (14). ORFs encoding ST, LT, 57 kT, and ALTO are shown as rectangles with colors indicating reading frames. Start sites, termination sites, and splice sites are shown (accession no. HM355825). (C) The ALTO protein sequence is shown. Underlined peptides were used to generate immune rabbit sera used in Fig. 2. Blue and red boxes indicate the C-terminal basic and hydrophobic motifs, respectively. (D) Sliding window plot (100-aa window size) of the synonymous site divergence (dS) between the indicated LT ortholog pair. Below is the schematic of LT and ALTO ORFs as in A.
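A sliding-window identity profile of the kind plotted in panel A can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `sliding_identity`, the requirement that the two sequences are already aligned to equal length, and the choice to count gap positions (`-`) as mismatches are all assumptions, since the caption does not state how the alignment or gaps were handled.

```python
def sliding_identity(a, b, window=10):
    """Fraction of identical, non-gap positions in each window of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    # True where both sequences carry the same residue (gaps never match).
    matches = [x == y and x != "-" for x, y in zip(a, b)]
    return [
        sum(matches[i:i + window]) / window
        for i in range(len(a) - window + 1)
    ]


# Hypothetical aligned peptides differing at a single position (index 5).
seq_a = "ACDEFGHIKLMN"
seq_b = "ACDEFAHIKLMN"
print(sliding_identity(seq_a, seq_b, window=10))
# [0.9, 0.9, 0.9]
```

A wider window, such as the 100-aa window of panel D, smooths the profile at the cost of positional resolution.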
Fig. 2.
ALTO is expressed from the MCPyV early region during viral genome replication. (A) Cells (HEK293) were transfected with religated wild-type MCPyVw156 DNA (MCPyV) or MCPyV genome harboring a point mutation disrupting the ATG start codon of ALTO at nt 880 (880ko). An equal amount of an unrelated plasmid was used as a negative control. Immunoblots for ALTO and GAPDH were performed on protein lysates collected 48 h after transfection or with a lysate from cells transfected with an ER expression plasmid (using 50-fold less lysate). (B) Low molecular weight DNA was isolated from transfected cells and digested with BamHI and DpnI. Digestion products were separated on an agarose gel and visualized by staining with ethidium bromide or (C) Southern blotting using a radioactive MCPyV DNA probe. The input control plasmid contained 350 bp of the MCPyV genome overlapping the region used for hybridization. Input and replicated MCPyV DNAs are indicated (SI Appendix, Fig. S3). (D) Cells (U2OS) were transfected with expression plasmids containing the MCPyV ER, ER with a C-terminally truncated ALTO (ALTO 1–234), or an empty vector control. After 48 h, the cells were fixed and stained for ALTO (green), MCPyV LT (red), and DNA (blue). Individual and merged images are shown. (Scale bar: 30 μm.)
Fig. 3.
A single clade of polyomaviruses has a predicted alternate ORF. (A) Genomic sequences from several polyomaviruses (accession nos. in SI Appendix, Table S2) were translated as a +1 frameshift from the coding region of LT exon 2. Predicted stop codons are shown in red. MCPyV ALTO is shown in blue, and exon 2 of MT from MPyV is in gray. Due to length differences and lack of similarity between exon 2 sequences, translations are aligned by the conserved region of the genome overlapping the OBD. The phylogeny shown here is based on an amino acid alignment of LT (SI Appendix, Fig. S5A) but is consistent with a larger phylogeny of polyomaviruses based on the entire genomic sequence (SI Appendix, Fig. S5B). Asterisks indicate >75% bootstrap support in both phylogenies. (B) The C terminus of the indicated alternate ORFs is shown aligned to their conserved hydrophobic region. Conservation data and hydrophobicity plots were generated using Geneious software (40). Stop codons are shown as red asterisks. (C) Alignment of the non-Almipolyomaviruses as in B.
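The +1 frameshift translation described in panel A can be sketched in a few lines. The code below is an illustrative assumption rather than the method used for the figure (the caption credits Geneious for the conservation and hydrophobicity plots): the helper names `plus_one_translation`, `stop_positions`, and `longest_open_stretch` and the 20-nt input are hypothetical. It translates a sequence starting one nucleotide downstream of the reference frame, reports where stop codons fall, and returns the longest stop-free stretch.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def plus_one_translation(exon):
    """Translate starting one nucleotide downstream of the reference frame."""
    exon = exon.upper()
    codons = (exon[i:i + 3] for i in range(1, len(exon) - 2, 3))
    # Codons with ambiguous bases are reported as X rather than raising.
    return "".join(CODON_TABLE.get(c, "X") for c in codons)


def stop_positions(protein):
    """Indices of stop codons (*) in a translated sequence."""
    return [i for i, aa in enumerate(protein) if aa == "*"]


def longest_open_stretch(protein):
    """Longest run of residues uninterrupted by a stop codon."""
    return max(protein.split("*"), key=len)


# Hypothetical 20-nt sequence, invented for illustration only.
toy_exon = "ATGCTAAGCGGTCATTGACC"
alt = plus_one_translation(toy_exon)
print(alt, stop_positions(alt), longest_open_stretch(alt))
# C*AVID [1] AVID
```

Applied to real exon 2 sequences, a long stop-free stretch in the +1 frame is only a candidate alternate ORF; expression still has to be shown experimentally, as done for ALTO in Fig. 2.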
Fig. 4.
A model for polyomavirus de novo gene birth by overprinting. ORFs corresponding to the LT and the ALTO/MT alternate frame are shown as in Fig. 1. Dark blue denotes the region of ALTO/MT that encodes the hydrophobic domain and overlaps the evolutionarily conserved OBD of LT. Red asterisks denote stop codons. Each schematic represents a proposed step along the evolutionary pathway that led to the current repertoire of LT, ALTO, and MT ORFs. An extant viral example of each schematic is also given, with the exception of the inferred ancestor to all Almipolyomaviruses, which gave birth to all current ALTO/MT-containing viruses. Although MCPyV encodes ALTO wholly within exon 2, it remains to be experimentally determined whether other Almipolyomaviruses besides MPyV and HamsterPyV have splice variants that encode an MT-like protein. If all basally branching Almipolyomaviruses were found to encode an MT protein, we would revise this model to suggest that the initial ALTO/MT innovation was actually MT-like and later became ALTO-like due to the use of the downstream methionine (conserved due to the LT YGS/T motif).

References

    1. Keese PK, Gibbs A. Origins of genes: “Big bang” or continuous creation? Proc Natl Acad Sci USA. 1992;89(20):9489–9493.
    2. Chirico N, Vianelli A, Belshaw R. Why genes overlap in viruses. Proc Biol Sci. 2010;277(1701):3809–3817.
    3. Sabath N, Wagner A, Karlin D. Evolution of viral proteins originated de novo by overprinting. Mol Biol Evol. 2012;29(12):3767–3780.
    4. Rancurel C, Khosravi M, Dunker AK, Romero PR, Karlin D. Overlapping genes produce proteins with unusual sequence properties and offer insight into de novo protein creation. J Virol. 2009;83(20):10719–10736.
    5. Firth AE, Atkins JF. Candidates in Astroviruses, Seadornaviruses, Cytorhabdoviruses and Coronaviruses for +1 frame overlapping genes accessed by leaky scanning. Virol J. 2010;7:17.