PLoS Pathog. 2022 Apr 1;18(4):e1010401.

doi: 10.1371/journal.ppat.1010401. eCollection 2022 Apr.

Long-read sequencing reveals complex patterns of wraparound transcription in polyomaviruses

Jason Nomburg^{1,2,3}, Wei Zou⁴, Thomas C Frost^{1,3}, Chandreyee Datta^{5,6,7,8}, Shobha Vasudevan^{5,6,7,8}, Gabriel J Starrett⁹, Michael J Imperiale^{4,10}, Matthew Meyerson^{1,2,11,12}, James A DeCaprio^{1,3,12}

Affiliations

¹ Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America.
² Broad Institute of MIT and Harvard, Cambridge, Massachusetts, United States of America.
³ Harvard Program in Virology, Harvard University Graduate School of Arts and Sciences, Boston, Massachusetts, United States of America.
⁴ Department of Microbiology and Immunology, University of Michigan, Ann Arbor, Michigan, United States of America.
⁵ Massachusetts General Hospital Cancer Center, Harvard Medical School, 185 Cambridge St, CPZN4202, Boston, Massachusetts, United States of America.
⁶ Department of Medicine, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts, United States of America.
⁷ Center for Regenerative Medicine, Massachusetts General Hospital, Harvard Medical School, Boston, Massachusetts, United States of America.
⁸ Harvard Stem Cell Institute, Harvard University, Cambridge, Massachusetts, United States of America.
⁹ Laboratory of Cellular Oncology, CCR, NCI, NIH, Bethesda, Maryland, United States of America.
¹⁰ Rogel Cancer Center, Ann Arbor, Michigan, United States of America.
¹¹ Department of Genetics, Harvard Medical School, Boston, Massachusetts, United States of America.
¹² Department of Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, Massachusetts, United States of America.

PMID: 35363834
PMCID: PMC9007360
DOI: 10.1371/journal.ppat.1010401

Jason Nomburg et al. PLoS Pathog. 2022.

Abstract

Polyomaviruses (PyV) are ubiquitous pathogens that can cause devastating human diseases. Due to the small size of their genomes, PyV utilize complex patterns of RNA splicing to maximize their coding capacity. Despite the importance of PyV to human disease, their transcriptome architecture is poorly characterized. Here, we compare short- and long-read RNA sequencing data from eight human and non-human PyV. We provide a detailed transcriptome atlas for BK polyomavirus (BKPyV), an important human pathogen, and the prototype PyV, simian virus 40 (SV40). We identify pervasive wraparound transcription in PyV, wherein transcription runs through the polyA site and circles the genome multiple times. Comparative analyses identify novel, conserved transcripts that increase PyV coding capacity. One of these conserved transcripts encodes superT, a T antigen containing two RB-binding LxCxE motifs. We find that superT-encoding transcripts are abundant in PyV-associated human cancers. Together, we show that comparative transcriptomic approaches can greatly expand known transcript and coding capacity in one of the simplest and most well-studied viral families.


Conflict of interest statement

I have read the journal’s policy and the authors of this manuscript have the following competing interests: M.M. receives research support from Bayer, Janssen, Ono; consults for Bayer, Interline, Isabl; and receives patent royalties from Labcorp and Bayer. J.A.D. has received research support from Rain Therapeutics, Inc. and is a consultant for Rain Therapeutics, Inc. and Takeda, Inc.

Figures

**Fig 1. RNA sequencing expands known SV40 and BKPyV transcript diversity.**
A. Overview of experimental procedures. Cells were infected with a polyomavirus, and RNA was extracted. RNA was sequenced using long-read (Nanopore dRNAseq and PacBio SMRTseq) and short-read (Illumina short-RNAseq (total) and short-RNAseq (polyA)) methods. Transcripts were analyzed, and the impact of observed splice events on viral open reading frames was assessed. B. Mechanism of transcript clustering in this study. Transcripts were aligned to the viral genome and grouped into transcript classes based on shared introns. Thus, transcripts within a class may differ in their exact start and end positions. This clustering strategy was used for both long- and short-RNAseq data. C. Viral RNA sequence coverage for SV40 and BKPyV as determined from dRNAseq, SMRTseq, and short-RNAseq (total) data. The Y axis indicates scaled coverage, and the X axis indicates the position on the viral genome. Coverage for late transcripts (mapping to the + strand) is shown above the X axis, while coverage for early transcripts (mapping to the − strand) is shown below it. Coverage is scaled separately for each strand such that the maximum observed coverage on each strand is 1. Arrows at the top of the plot indicate the positions of viral genes. **D-E**. UpSet plots indicating the overlap between existing transcript annotations, dRNAseq data, and SMRTseq data for SV40 (D) and BKPyV Dunlop (E). Bars indicating overlap with existing transcript annotations are black, while those indicating no overlap are blue; the blue bars therefore give the number of novel, unannotated transcripts identified. F. Overview of polysome profiling of SV40-infected cells. BSC40 cells were infected with SV40 and lysed, and a portion of the lysate was subjected to dRNAseq (representing the RNA content of the whole cell). The remaining lysate was centrifuged through a sucrose gradient, after which fractions containing RNA associated with two or more ribosomes were pooled and subjected to dRNAseq. Created with BioRender.com. G. Relative abundance of SV40 early and late transcripts in the whole-cell and polysome fractions of SV40-infected cells. The Y axis indicates the percentage of early or late transcripts on a log scale. The X axis indicates each transcript, with black dots indicating each transcript’s whole-cell relative abundance and red dots indicating its polysome relative abundance. A black star indicates a wraparound transcript.
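The grouping described in Fig 1B (transcripts sharing the same introns form one class, regardless of exact start and end positions) and the per-strand scaling in Fig 1C can be illustrated with a short Python sketch. The exon representation, function names, and exact-match rule are assumptions for illustration only, not the authors' pipeline; real alignments would likely need some tolerance around splice sites.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Illustrative types (assumptions, not the authors' data model):
# an exon is a (start, end) pair of 1-based, inclusive genome coordinates.
Exon = Tuple[int, int]
Intron = Tuple[int, int]
IntronChain = Tuple[Intron, ...]


def intron_chain(exons: List[Exon]) -> IntronChain:
    """Return the ordered introns implied by consecutive exons of one transcript."""
    return tuple(
        (exons[i][1] + 1, exons[i + 1][0] - 1) for i in range(len(exons) - 1)
    )


def cluster_by_introns(transcripts: Dict[str, List[Exon]]) -> Dict[IntronChain, List[str]]:
    """Group transcripts that share an identical intron chain into one class.

    Start and end positions are ignored, so members of a class may begin and
    end at different positions. Unspliced transcripts share the empty chain.
    """
    classes: Dict[IntronChain, List[str]] = defaultdict(list)
    for name, exons in transcripts.items():
        classes[intron_chain(exons)].append(name)
    return dict(classes)


def scale_per_strand(plus: List[int], minus: List[int]) -> Tuple[List[float], List[float]]:
    """Scale per-base coverage so that each strand's own maximum becomes 1."""

    def scale(coverage: List[int]) -> List[float]:
        peak = max(coverage, default=0)
        return [c / peak if peak else 0.0 for c in coverage]

    return scale(plus), scale(minus)
```

For example, hypothetical reads with exons [(100, 200), (300, 500)] and [(120, 200), (300, 480)] fall into the same class because both imply the intron (201, 299), even though they start and end at different positions.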

**Fig 2. Annotated and novel SV40 transcripts.**
A. Transcripts are shown relative to the viral genome. Each line is a viral transcript, with red lines indicating exons and dashed blue lines indicating introns. Spokes indicate the positions of common splice donors and splice acceptors. Transcripts that were annotated prior to this study are on a yellow background, and novel transcripts are on a white background. Wraparound transcription that results in multiple copies of a region is annotated with double lines, and the number of copies is indicated in parentheses. The line labeled “pA” indicates the approximate position of the polyA signal sequence. B. The relative abundance of early or late transcripts in SV40 dRNAseq data. Transcripts observed in SMRTseq but not in dRNAseq are not shown. The abundance of each transcript in each of the two SV40 dRNAseq replicates is plotted as an individual dot.

**Fig 3. Annotated and novel BKPyV transcripts.**
A. Transcripts are shown relative to the viral genome. Each line is a viral transcript, with red lines indicating exons and dashed blue lines indicating introns. Spokes indicate the positions of common splice donors and splice acceptors. Transcripts that were annotated prior to this study are on a yellow background, and novel transcripts are on a white background. Wraparound transcription that results in multiple copies of a region is annotated with double lines, and the number of copies is indicated in parentheses. The line labeled “pA” indicates the approximate position of the polyA signal sequence. B. The relative abundance of early or late transcripts in BKPyV (Dunlop) dRNAseq data. Transcripts observed in SMRTseq but not in dRNAseq are not shown.

**Fig 4. Pervasive wraparound transcription across PyV.**
**A-C**. Watch plots showing the four most abundant late wraparound transcript classes in dRNAseq data from SV40 **(A)**, BKPyV Dunlop **(B)**, and MPyV **(C)**. The outer ring of each watch plot indicates the positions of the viral ORFs. The inner arms are histograms showing the distribution of transcript starts (blue) and ends (red) for transcripts within each transcript class. Red segments indicate exons. Transcripts start in the innermost ring; a second or third ring indicates that the pre-mRNA that generated the transcript must have circled the viral genome multiple times. The red arrow at the end of the last exon segment marks the 3’ end of the transcript and the orientation of the plot. The red exon segments start at the most common transcript start site within the transcript class and end at the most common transcript end site within the class. The watch plot key shows an example of the path of the pre-mRNA for SV40 transcript class L6_I. D. Bar plots indicating the percentage of late transcripts that span a given number of genome lengths in SV40, BKPyV Dunlop, and MPyV dRNAseq data. E. The leader-leader junction, which connects the portion of the pre-mRNA from one circuit of the genome to the next wraparound, was identified in Illumina short-RNAseq (total) data. The intron in question is plotted as a black line, with the X axis indicating its genomic position. The top late wraparound transcript for each virus is plotted. The gene map indicates approximate gene positions and is accurate for SV40; the exact positions of the viral genes vary between viruses. Percentages indicate the percentage of late junction-spanning transcripts that support the plotted wraparound leader-leader junction. F. Schematic illustrating how leader-leader wraparound transcription can be detected from short-RNAseq (total) data. Leader-leader splicing appears as a repeated exon in watch plots from long-read RNAseq data. Such splicing implies an original processed mRNA in the cell that contained two tandem leader sequences. When this transcript is sequenced with short reads, reads are generated across its entire length. A minority of these reads span the leader-leader junction, and mapping them against the viral reference genome reveals leader-leader splicing.
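The logic of Fig 4F, that a junction mapped to a linear copy of a circular genome can reveal wraparound, can be sketched as a simplified junction-level test in Python. This is an illustration under stated assumptions, not the authors' method: the class and function names are hypothetical, coordinates are 1-based, and the caller must supply a transcription start site (`tss`) so that a normal intron which merely crosses the end of the linear reference is not mistaken for wraparound.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Junction:
    """A splice junction on a circular genome (1-based coordinates, illustrative)."""

    donor: int     # last exonic base before the intron
    acceptor: int  # first exonic base after the intron
    strand: str    # "+" for late transcripts, "-" for early transcripts


def circular_intron_length(j: Junction, genome_length: int) -> int:
    """Intron length measured in the direction of transcription around the circle."""
    if j.strand == "+":
        return (j.acceptor - j.donor - 1) % genome_length
    return (j.donor - j.acceptor - 1) % genome_length


def offset_from_tss(pos: int, tss: int, strand: str, genome_length: int) -> int:
    """Distance of pos downstream of the transcription start site, wrapping at the origin."""
    if strand == "+":
        return (pos - tss) % genome_length
    return (tss - pos) % genome_length


def is_wraparound(j: Junction, genome_length: int, tss: int) -> bool:
    """True if the acceptor is not downstream of the donor within a single pass from the TSS.

    Such a junction implies that the pre-mRNA continued past the polyA site and
    around the genome (or through a tandem copy) before being spliced.
    """
    if circular_intron_length(j, genome_length) == 0:
        # A contiguous read crossing the end of the linear reference, not a splice.
        return False
    donor_offset = offset_from_tss(j.donor, tss, j.strand, genome_length)
    acceptor_offset = offset_from_tss(j.acceptor, tss, j.strand, genome_length)
    return acceptor_offset <= donor_offset
```

With a hypothetical 5,000 bp genome and a late TSS at position 300, a junction from donor 500 back to acceptor 320 is flagged as wraparound (a leader-leader-like splice), whereas a splice from 500 forward to 1500 is not.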

**Fig 5. Detection of novel, conserved splicing events that expand PyV coding capacity.**
**A-D**. Schematics illustrating identified ORFs. Each row is a reading frame (except for ST and the LT first exon, which are in the same frame), and unannotated amino acids are represented by grey boxes. Colored ORFs are annotated, while grey ORFs are unannotated. The measured intron is indicated by the red arrow. Percentages on the right side of the figure are the percentage of spliced viral transcripts on the same strand, as determined from short-read short-RNAseq (total) data. Numbers after each virus name indicate the transcript class within each short-RNAseq (total) dataset. A) ST2: This ORF is generated by a splicing event that uses the LT first exon donor and an acceptor within the ST ORF. In HPyV7 and BKPyV Dunlop, the splice lands in frame and results in an internal deletion within ST. In MPyV and MCPyV, the splice lands out of frame, resulting in an ORF that contains the N-terminal region of ST followed by novel amino acids at the C terminus. B) MT: MPyV encodes MT via a splice connecting the end of the ST ORF with an ORF in an alternative frame of the LT second exon. In BKPyV, a similar splice connects ST with an MT-like ORF in an alternative frame of the LT second exon. C) VP1X: JCPyV encodes two VP1X ORFs, generated by splicing from within VP1 to an acceptor either in an alternative frame of VP1 or, owing to wraparound transcription, earlier in the late region. Although most prominent in JCPyV, VP1X is also present in many other PyV. D) superT: The superT-specific splice uses the splice donor canonically associated with truncated T antigens such as 17kT in SV40 and truncT in BKPyV. Because of wraparound transcription, an LT second exon acceptor is available 3’ of this donor and serves as the acceptor. For the superT ORF to form, an initial LT splice is also required. As a result, superT contains a duplication of part of the LT second exon that includes the RB-binding LxCxE motif. E. Schematics detailing the BKPyV Dik isolates used to test for the existence of superT. BKPyV WT is the wild-type virus. In M1, the LT intron has been replaced with an intron from the plasmid pCI. Both WT and M1 are expected to generate LT and superT of wild-type size. In M2, the LT intron has been completely removed, and the pCI intron is located directly 5’ of the LT ORF. M2 is expected to encode LT of wild-type size but a larger superT variant, owing to incorporation of a second copy of the LT first exon. F. Western blot of cells infected with BKPyV Dik WT, M1, or M2 and probed with an antibody reactive against LT. The lower molecular weight band is LT, and the higher molecular weight bands are consistent with superT.
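Whether a splice "lands in frame" (as for ST2 in HPyV7 and BKPyV Dunlop) or shifts into another frame (as in MPyV, MCPyV, or the MT-like splices) comes down to arithmetic modulo 3. The Python sketch below illustrates that arithmetic; the function names are hypothetical, and it assumes all positions are given along the pre-mRNA in the direction of transcription, so early-strand and wraparound splices must first be converted to those coordinates. Frames returned are in the same pre-mRNA coordinates.

```python
def downstream_frame(orf_start: int, donor: int, acceptor: int) -> int:
    """Frame (codon-start position modulo 3) in which sequence after the acceptor is read.

    All positions are along the pre-mRNA in the direction of transcription, so
    acceptor > donor even for splices made possible by wraparound transcription.
    orf_start is the first base of the start codon of the ORF containing the donor.
    """
    removed = acceptor - donor - 1  # bases spliced out between donor and acceptor
    return (orf_start + removed) % 3


def lands_in_frame(orf_start: int, donor: int, acceptor: int) -> bool:
    """True if the splice preserves the upstream reading frame (an internal deletion)."""
    return downstream_frame(orf_start, donor, acceptor) == orf_start % 3
```

For a hypothetical ORF starting at position 100 with a donor at 150, an acceptor at 181 removes 30 bases and keeps the frame, whereas an acceptor at 182 removes 31 bases and continues translation in a different frame.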

**Fig 6. Detection of superT-encoding transcripts in PyV-associated cancers.**
A. Schematic detailing the generation of superT during lytic infection and from integrated virus in cancer. During viral infection, RNA polymerase can circle the viral genome multiple times, producing a pre-mRNA that can be spliced to generate superT. In the case of host integration, a polyomavirus can be integrated as tandem copies, so that a pre-mRNA is generated with more than one copy of the viral early region. This pre-mRNA can be spliced similarly to generate a superT transcript. B. Heatmap indicating the abundance of the superT, ST, and LT introns in RNAseq data from two replicates of a BKPyV-positive bladder cancer and six MCPyV-associated Merkel cell carcinomas (MCCs). Percentages indicate the percentage of spliced early viral reads for each sample. The splice measured in each row is indicated by the red arrow in the schematics on the right side of the figure.
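The normalization behind the Fig 6B heatmap (each splice expressed as a percentage of all spliced early viral reads in a sample) can be sketched in Python. The junction keys, splice names, and counts used here are hypothetical placeholders, not coordinates or data from the study.

```python
from typing import Dict, List, Tuple

# Illustrative key for an early-strand splice junction: (donor, acceptor).
JunctionKey = Tuple[int, int]


def splice_percentages(
    early_junction_counts: Dict[JunctionKey, int],
    named_junctions: Dict[str, JunctionKey],
) -> Dict[str, float]:
    """Percentage of all spliced early reads in one sample that support each named splice."""
    total = sum(early_junction_counts.values())
    if total == 0:
        return {name: 0.0 for name in named_junctions}
    return {
        name: 100.0 * early_junction_counts.get(key, 0) / total
        for name, key in named_junctions.items()
    }


def percentage_matrix(
    samples: Dict[str, Dict[JunctionKey, int]],
    named_junctions: Dict[str, JunctionKey],
) -> Dict[str, List[float]]:
    """Heatmap-style table: one row per named splice, one column per sample (in input order)."""
    per_sample = {s: splice_percentages(c, named_junctions) for s, c in samples.items()}
    return {name: [per_sample[s][name] for s in samples] for name in named_junctions}
```

Using every spliced early read as the denominator, rather than only the three named splices, means the rows need not sum to 100% when other early splices are present.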


