Br J Cancer. 2021 Nov;125(10):1408-1419.
doi: 10.1038/s41416-021-01545-0. Epub 2021 Sep 15.

Multi-omics mapping of human papillomavirus integration sites illuminates novel cervical cancer target genes

Marissa Iden et al. Br J Cancer. 2021 Nov.

Abstract

Background: Integration of human papillomavirus (HPV) into the host genome is a dominant feature of invasive cervical cancer (ICC), yet the tumorigenicity of cis genomic changes at integration sites remains largely understudied.

Methods: Combining multi-omics data from The Cancer Genome Atlas with patient-matched long-read sequencing of HPV integration sites, we developed a strategy for using HPV integration events to identify and prioritise novel candidate ICC target genes (integration-detected genes (IDGs)). Four IDGs were then chosen for in vitro functional studies employing small interfering RNA-mediated knockdown in cell migration, proliferation and colony formation assays.

Results: PacBio data revealed 267 unique human-HPV breakpoints comprising 87 total integration events in eight tumours. Candidate IDGs were filtered based on the following criteria: (1) proximity to integration site, (2) clonal representation of integration event, (3) tumour-specific expression (Z-score) and (4) association with ICC survival. Four candidates prioritised based on their unknown function in ICC (BNC1, RSBN1, USP36 and TAOK3) exhibited oncogenic properties in cervical cancer cell lines. Further, annotation of integration events provided clues regarding potential mechanisms underlying altered IDG expression in both integrated and non-integrated ICC tumours.

Conclusions: HPV integration events can guide the identification of novel IDGs for further study in cervical carcinogenesis and as putative therapeutic targets.
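
The four filtering criteria listed in the Results can be read as a sequence of pass/fail checks on each candidate gene. The sketch below is only an illustration of that idea, not the authors' pipeline: the `Candidate` fields, the threshold values and the placeholder gene names are all hypothetical, since the abstract does not state the cut-offs that were actually used.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    gene: str
    distance_to_integration_bp: int  # 0 if the breakpoint lies inside the gene
    iaf: float                       # integration allele fraction (clonality)
    z_score: float                   # expression in the integrated tumour vs. cohort
    survival_p: float                # p-value for association with overall survival


# Hypothetical thresholds, chosen only to make the example concrete.
MAX_DISTANCE_BP = 500_000
MIN_IAF = 0.10
MIN_ABS_Z = 2.0
MAX_SURVIVAL_P = 0.05


def passes_filters(c: Candidate) -> bool:
    """Apply the four criteria in the order given in the abstract."""
    near_site = c.distance_to_integration_bp <= MAX_DISTANCE_BP   # (1) proximity
    clonal = c.iaf >= MIN_IAF                                     # (2) clonality
    tumour_specific = abs(c.z_score) >= MIN_ABS_Z                 # (3) Z-score
    survival_linked = c.survival_p <= MAX_SURVIVAL_P              # (4) survival
    return near_site and clonal and tumour_specific and survival_linked


def prioritise(candidates: List[Candidate]) -> List[str]:
    """Return the names of candidates that pass every filter."""
    return [c.gene for c in candidates if passes_filters(c)]


# Made-up candidates: each of GENE_B..GENE_E fails exactly one criterion.
example = [
    Candidate("GENE_A", 0, 0.50, 3.1, 0.01),
    Candidate("GENE_B", 2_000_000, 0.50, 3.1, 0.01),
    Candidate("GENE_C", 0, 0.01, 3.1, 0.01),
    Candidate("GENE_D", 1_000, 0.40, 0.2, 0.01),
    Candidate("GENE_E", 1_000, 0.40, 2.5, 0.60),
]
print(prioritise(example))
```

Using the absolute Z-score treats both unusually high and unusually low expression as tumour-specific; that choice is an assumption of this sketch.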


Conflict of interest statement

The authors declare no competing interests.

Figures

Fig. 1. Filtering and prioritisation results of four candidate IDGs chosen for validation and functional testing.
Clonality (IAF = integration allele fraction) values and Z-score plots for BNC1 (a), RSBN1 (b), USP36 (c), and TAOK3 (d) demonstrate how each IDG passed our first two filtering criteria. Z-score plots depict a black bar for each TCGA ICC sample (x-axis; CESC = cervical squamous cell carcinoma and endocervical adenocarcinoma; n = 304) and their corresponding IDG-specific expression levels (y-axis; TCGA RNAseq data). Red lines mark IDG expression in the integrated tumour. Numerical values in red (b and c) = RSEM values for the integrated tumour. Next, we filtered on the association of BNC1 (e), RSBN1 (f), USP36 (g), and TAOK3 (h) expression with overall survival in the TCGA ICC cohort (HR = hazard ratio). ICC-specific overall survival association of BNC1 (i), RSBN1 (j), USP36 (k), and TAOK3 (l) expression (measured via qRT-PCR) was validated in a second ICC cohort (MCW-ICC; n = 142).
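
The Z-score plots described in this caption compare expression of a gene in the integrated tumour with its expression across the cohort. A minimal sketch of that calculation follows, assuming a plain Z-score against the sample standard deviation; whether the authors log-transformed the RSEM values or excluded the integrated tumour from the reference set is not stated here, and the function name is hypothetical.

```python
import statistics
from typing import Sequence


def expression_z_score(sample_value: float, cohort_values: Sequence[float]) -> float:
    """Z-score of one tumour's expression value relative to a cohort.

    Assumes cohort_values holds at least two values on the same scale
    as sample_value.
    """
    mean = statistics.mean(cohort_values)
    sd = statistics.stdev(cohort_values)  # sample standard deviation
    if sd == 0:
        raise ValueError("cohort expression has zero variance")
    return (sample_value - mean) / sd


# Made-up numbers: cohort mean is 4.0 and sample SD is 2.0.
z = expression_z_score(6.0, [2.0, 4.0, 6.0])
print(z)
```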
Fig. 2. Annotation of HPV integration affecting BNC1 expression in TCGA-C5-A2LV.
PacBio long-read sequencing, TCGA, and UCSC Genome Browser (http://genome.ucsc.edu) data were used to annotate the HPV integration site proximal to the candidate IDG, BNC1. TCGA-C5-A2LV long-read data (PacBio) and TCGA sequencing (RNAseq), CNV (blue = loss; red = gain), and methylation data (blue = hypomethylation; red = hypermethylation) covering the area of integration (a; red box). The Integrative Genomics Viewer (IGV) was used for the visualisation of PacBio and TCGA RNAseq read alignments. PacBio coverage displays the read depths at each locus with a grey bar chart. PacBio alignments show individual aligned reads, where grey lines represent reads aligning to the human reference genome. For RNAseq, the coverage and alignment tracks are the same, but in between the two is a splice junction track that provides a visualisation of reads spanning splice junctions. Blue lines in the RNAseq IGV image connect reads spanning splice junctions. UCSC Genome Browser GeneHancer track suggests that the integration site is adjacent to a BNC1-specific enhancer (EnH; grey bar) ~88 kb from its promoter (red bar). In addition, HeLa cell-specific cistrome analysis (bottom of a) suggests that these regulatory regions are indeed applicable to cervical cancer. The Ribbon programme was used to generate a schematic of a single PacBio read covering the area of integration, showing how the HPV genome is inserted (red) with human sequence flanking both sides (blue; b). Thick bars across the top (b) represent the HPV and human reference genomes that are connected by dashed lines to a single PacBio read covering the integration to show how it specifically mapped to each genome. Data from all PacBio long reads covering the integration event were used to schematically annotate the integration event (c). Breakpoints identified from TCGA short-read sequencing (SR) of tumour RNA are highlighted in the yellow boxes. The dashed line represents a portion of the human genome not covered by PacBio reads. Collectively, the data support potential HPV integration-induced upregulation of BNC1 enhancer RNA (eRNA), leading to increased BNC1 expression.
Fig. 3. Annotation of HPV integration affecting RSBN1 expression in TCGA-C5-A3HD.
PacBio long-read sequencing, TCGA, and UCSC Genome Browser (http://genome.ucsc.edu) data were used to annotate the HPV integration site within the candidate IDG, RSBN1. TCGA-C5-A3HD long-read data (PacBio) and TCGA sequencing (RNAseq), CNV (blue = loss; red = gain), and methylation data (blue = hypomethylation; red = hypermethylation) covering the integration event, which spans ~50 kb of the human genome (a). The Ribbon programme was used to generate a schematic of a single PacBio read (b) covering the area of integration including the human–viral breakpoint with the highest IAF value (exon 2 of RSBN1; red box in a). Thick bars across the top (b) represent the HPV and human reference genomes, which are connected by dashed lines to a single PacBio read covering the integration to show how it specifically mapped to each genome. Data from all PacBio long reads covering the integration event were used to hand annotate the integration event (c). Breakpoints identified from TCGA short-read sequencing (SR) are highlighted in the yellow boxes. SRa and SRb are segments (connected by dotted line) of a single Illumina read spanning a breakpoint connecting two non-contiguous sequences of the human genome (represented as a diagonal double line in PacBio long read). The dashed purple line represents a portion of the human genome not covered by PacBio reads. The regions of greatest amplification harbour an RSBN1-specific promoter and enhancer (GeneHancer track; red and grey boxes, respectively) poised adjacent to the inserted HPV16 genome, possibly suggesting viral-driven expression of these gene-specific regulatory elements.
Fig. 4. Annotation of HPV integration affecting USP36 expression in TCGA-C5-A8XH.
PacBio long-read sequencing, TCGA, and UCSC Genome Browser (http://genome.ucsc.edu) data were used to annotate the HPV integration site within the candidate IDG, USP36. TCGA-C5-A8XH long-read data (PacBio) and TCGA sequencing (RNAseq), CNV (blue = loss; red = gain), and methylation data (blue = hypomethylation; red = hypermethylation) covering the integration event spanning ~200 kb of the human genome (a). The Ribbon programme was used to generate a schematic of two PacBio reads covering the area of integration. Thick bars across the top (b) represent the HPV and human reference genomes, which are connected by dashed lines to two unique PacBio reads covering the integration to show how they are specifically mapped to each genome. Data from all PacBio long reads covering the integration event were used to hand annotate the integration event (c). Breakpoints identified from TCGA short-read sequencing (SR) are highlighted in the yellow boxes. The diagonal double line represents a breakpoint connecting two non-contiguous sequences of the human genome. TCGA RNAseq data suggest the expression of the fused USP36-encoding DNA with upstream intergenic DNA located between the SCAT and CYTH1 genes and sharp upregulation of USP36 expression beginning at intron 4, potentially driven by the inserted viral URR.
Fig. 5. Annotation of HPV integration affecting TAOK3 expression in TCGA-C5-A2LX.
PacBio long-read sequencing, TCGA, and UCSC Genome Browser (http://genome.ucsc.edu) data were used to annotate the HPV integration site within the candidate IDG, TAOK3. TCGA-C5-A2LX long-read data (PacBio) and TCGA sequencing (RNAseq), CNV (blue = loss; red = gain), and methylation data (blue = hypomethylation; red = hypermethylation) covering the integration event are depicted in panel (a). The Ribbon programme was used to generate a schematic of two PacBio reads covering the area of integration. Thick bars across the top (b) represent the HPV and human reference genomes, which are connected by dashed lines to two unique PacBio reads covering the integration to show how they are specifically mapped to each genome. Data from all PacBio long reads covering the integration event were used to hand annotate the integration event (c). Breakpoints identified from TCGA short-read sequencing (SR) are highlighted in the yellow boxes. PacBio sequencing successfully captured the entirety of the HPV insertion, which comprised almost two full copies of the HPV16 genome (pink) flanked on both sides by intron 9 of TAOK3 (olive).
Fig. 6. Functional testing of candidate IDGs in cervical cancer cell lines.
Two established cervical cancer cell lines (SiHa and HeLa) were subjected to siRNA-mediated knockdown (KD) of each candidate IDG (labelled #1) or scrambled negative control (siCONT) and tested in three functional assays. Of note, results were validated in SiHa cells using a second, unique siRNA targeting each IDG (labelled #2 in each SiHa graph). Knockdown of each IDG was first confirmed via qRT-PCR (a). KD of all four candidate IDGs significantly decreased SiHa and HeLa cell migration (b). KD of all four IDGs significantly decreased SiHa proliferation (c; day 5). In HeLa cells, TAOK3 KD did not significantly affect cell proliferation, while KD of BNC1 (3d and 5d), RSBN1 (3d and 5d) and USP36 (3d) significantly decreased cell proliferation (c). SiHa colony formation was significantly decreased following KD of all four IDGs, while HeLa colony formation was only significantly affected by KD of BNC1 and USP36 (d). Each experiment was run in triplicate and data are presented as mean ± standard error of the mean. *p ≤ 0.05; **p ≤ 0.01; ***p ≤ 0.001; ****p ≤ 0.0001.
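
For readers unfamiliar with the "mean ± standard error of the mean" summary used in these panels, the short sketch below shows how it could be computed from triplicate measurements. The function name and the example values are hypothetical and are not taken from the study.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_and_sem(replicates: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, standard error of the mean) for replicate measurements."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicates to estimate the SEM")
    mean = statistics.mean(replicates)
    # SEM = sample standard deviation divided by sqrt(number of replicates)
    sem = statistics.stdev(replicates) / math.sqrt(len(replicates))
    return mean, sem


# Made-up triplicate readings, for illustration only.
m, s = mean_and_sem([2.0, 4.0, 6.0])
print(f"{m:.2f} ± {s:.2f}")
```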

