Genome Res. 2022 Jan;32(1):55-70.
doi: 10.1101/gr.275911.121. Epub 2021 Dec 13.

Diverse tumorigenic consequences of human papillomavirus integration in primary oropharyngeal cancers

David E Symer et al. Genome Res. 2022 Jan.

Abstract

Human papillomavirus (HPV) causes 5% of all cancers and frequently integrates into host chromosomes. The HPV oncoproteins E6 and E7 are necessary but insufficient for cancer formation, indicating that additional secondary genetic events are required. Here, we investigate potential oncogenic impacts of virus integration. Analysis of 105 HPV-positive oropharyngeal cancers by whole-genome sequencing detects virus integration in 77%, revealing five statistically significant sites of recurrent integration near genes that regulate epithelial stem cell maintenance (i.e., SOX2, TP63, FGFR, MYC) and immune evasion (i.e., CD274). Genomic copy number hyperamplification is enriched 16-fold near HPV integrants, and the extent of focal host genomic instability increases with their local density. The frequency of genes expressed at extreme outlier levels is increased 86-fold within ±150 kb of integrants. Across 95% of tumors with integration, host gene transcription is disrupted via intragenic integrants, chimeric transcription, outlier expression, gene breaking, and/or de novo expression of noncoding or imprinted genes. We conclude that virus integration can contribute to carcinogenesis in a large majority of HPV-positive oropharyngeal cancers by inducing extensive disruption of host genome structure and gene expression.

Figures

Figure 1.
Frequent clustering of virus–host breakpoints in individual tumors. (A) Breakpoints (dots, n = 874) identified in 105 HPV-positive OPSCCs mapped to the HPV16 genome (x-axis). Non-HPV16 breakpoint (n = 50) coordinates are approximated; y-axis, log10 of WGS reads supporting each breakpoint. (B) Breakpoints uniquely mapped to the human genome (x-axis, n = 756, hg19) are clustered within 500-kb windows; y-axis as in A. Colors: breakpoints in individual tumors in clusters, as per key in panel C. (C) Counts of uniquely mapped breakpoints (y-axis), ranked by frequency in individual HPV-positive OPSCCs (x-axis). Colors: breakpoint counts per cluster. (D) Overall frequencies of breakpoints in clusters across all tumors (x-axis; colors, breakpoint counts per cluster). (E) Overall frequencies of distinct genetic loci harboring clusters of various breakpoint counts within 500 kb (x-axis) in individual tumors. See also Supplemental Figures S1.1–S1.6 and Supplemental Tables S1.1–S1.9.
Figure 2.
Genomic hotspots of integration near genes involved in epithelial stem cell maintenance and immune evasion. (A) Counts of independent tumors harboring ≥1 virus–host breakpoints (y-axis) in 1-Mbp genomic segments across the human genome (x-axis). Recurrent hotspots are identified at five segments containing SOX2, TP63, FGFR3, MYC, and CD274, each across at least three tumors (orange, n = 3 tumors; red, n = 4; empirical probability, P = 1 × 10⁻⁶). Other integration sites are not statistically significant hotspots in these tumors (blue, n = 1 tumor; green, n = 2). (B) IGV browser views of WGS depth of coverage (y-axis) and virus–host breakpoints (red) in four independent tumors within a 1-Mbp genomic segment containing MYC (x-axis; light blue vertical lines, exons). Colors as in key at top. Y-axis scale, tick mark, range (maximal count) of mapped WGS reads at locus in each tumor. See also Supplemental Figure S2.1. (C) Transcript levels of SOX2, TP63, FGFR3, MYC, and CD274 (y-axis, Z-score of log2 TPM value), in tumors quantified by RNA-seq (circles, n = 103). Red fill: tumors with breakpoints near the hotspot genes (panels A,B). Box and whiskers, median (brown horizontal line), quartiles (light gray box). See also Supplemental Figures S2.1, S2.2 and Supplemental Tables S2.1, S2.2.
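The per-segment tumor counts in panel A can be illustrated with a minimal sketch. It assumes breakpoints are available as (tumor, chromosome, position) tuples; the function names and the toy coordinates are hypothetical, not taken from the paper. It reproduces only the counting step and does not compute the empirical probability reported in the caption.

```python
from collections import defaultdict

BIN_SIZE = 1_000_000  # 1-Mbp genomic segments, as described for panel A


def tumors_per_bin(breakpoints, bin_size=BIN_SIZE):
    """Map (chrom, bin_index) to the number of distinct tumors with >=1 breakpoint."""
    tumors = defaultdict(set)
    for tumor_id, chrom, pos in breakpoints:
        # A set ensures each tumor is counted once per segment,
        # however many breakpoints it has there.
        tumors[(chrom, pos // bin_size)].add(tumor_id)
    return {key: len(ids) for key, ids in tumors.items()}


def recurrent_bins(breakpoints, min_tumors=3, bin_size=BIN_SIZE):
    """Keep segments hit in at least `min_tumors` independent tumors."""
    counts = tumors_per_bin(breakpoints, bin_size)
    return {key: n for key, n in counts.items() if n >= min_tumors}


# Hypothetical toy input: (tumor_id, chromosome, position)
toy = [
    ("T1", "chr8", 128_700_000),
    ("T1", "chr8", 128_750_000),  # same tumor, same segment: counted once
    ("T2", "chr8", 128_900_000),
    ("T3", "chr8", 128_100_000),
    ("T4", "chr9", 5_450_000),
]
print(recurrent_bins(toy))
```

Counting distinct tumors rather than raw breakpoints keeps a single tumor with a dense breakpoint cluster from looking like a recurrent site.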
Figure 3.
Associations between HPV integrants, CNVs, and SVs in individual tumors. (A) Strudel plot shows virus–host breakpoints in a representative OPSCC, GS18047. Breakpoints mapped to the HPV16 genome (top, x-axis) are connected (black lines) with the host genome (bottom, x-axis), clustered on Chr 4, 5, 9, 10, 19, and 22 (colored dots, as per key in Fig. 1C). Disrupted genes include CD274 and EP300. (B) Haplotype-resolved linked reads (blue, host depths of sequencing coverage; red, HPV) connect HPV16 sequences (right), virus–host breakpoints (red peaks) and host–host breakpoints (gray) on one allele (haplotype 2) at the CD274 locus on Chr 9p24.1. Graph (bottom), shared linked-read barcodes connect HPV16 exclusively to haplotype 2 (red) but not haplotype 1 (black). (C) Fraction of genes with (red) or without (gray) HPV breakpoints within ±500 kb that are annotated cancer genes (y-axis) (Sondka et al. 2018). Fisher's exact test, P = 6.3 × 10⁻⁵ (Supplemental Table S3.2). (D) Scatterplot shows strong correlation between read counts supporting individual breakpoints (red dots) identified with HPV capture-seq (y-axis, n = 164) versus WGS (x-axis, n = 86) in the same tumor (r = 0.91; P = 1.8 × 10⁻⁶³). (E) Adding 53 tumors studied by HPV capture-seq to 105 tumors studied by WGS (Fig. 1A), tumors harboring ≥1 virus–host breakpoints (y-axis) in 1-Mbp genomic segments were recounted across the human genome (x-axis). Statistically significant, recurrent hotspots (orange, n = 3 tumors; red, n = 4 or 5) are detected at segments containing SOX2, TP63, FGFR3, MYC, CD274, and KLF5 (empirical probability, P = 7 × 10⁻⁶). See also Figure 2, Supplemental Figures S3.1–S3.3, and Supplemental Tables S3.1–S3.7.
Figure 4.
HPV integrants are associated with CNVs and SVs across tumors. (A) Shown are distinct frequency distributions (y-axis) of copy numbers (x-axis) of 100-kb genomic segments with (red) versus without (blue) virus–host breakpoints across 105 HPV-positive OPSCC (χ², P = 1.8 × 10⁻¹⁸). (B) Quantile-quantile (Q-Q) plot confirms differences in copy numbers of genomic segments with (y-axis) and without (x-axis) breakpoints, deviating significantly from the line of identity (P < 2.2 × 10⁻¹⁶, Kolmogorov–Smirnov test). (C) Frequencies (y-axis) of structural variation (SV, left) and step-changes in copy number (copy number transition [CT] ±0.5 n, right) are significantly greater in 100-kb segments with a breakpoint (red) versus without (gray) (SV; binomial test, one-tailed, P = 3.3 × 10⁻²²⁴; CT, P = 2.39 × 10⁻¹⁴). (D,E) Among 500-kb genomic segments with ≥1 breakpoints, frequencies (y-axis) of (D) SVs and (E) CTs increase with breakpoint counts in the cluster. See also Supplemental Table S4.
Figure 5.
HPV integrants are associated with outlier expression of neighboring host genes. (A) Q-Q plot compares Z-score distributions of expression levels for genes near (±500 kb) virus–host breakpoints (y-axis) versus expression of the same genes without nearby breakpoints in all other tumors (x-axis; Kolmogorov–Smirnov test, P < 2.2 × 10⁻¹⁶). Line of identity (dark gray). (B) Percent of genes with outlier expression that are not cancer genes (–, left) or are cancer genes (+, right) as per Cancer Gene Census Database, and are not (gray) or are (red) within ±150 kb of an HPV integrant (Fisher's exact test, FDR correction, P = 2.2 × 10⁻¹¹ [left] and P = 7.2 × 10⁻⁵⁵ [right], respectively). (C) Of 220 genes expressed at outlier levels (Z-score ≥ 2 or ≤ −2) in ≥1 tumor and within ±500 kb of a breakpoint, 16 are cancer genes as shown. Box and whiskers plot, Z-scores (y-axis) for cancer genes (x-axis) in samples harboring nearby breakpoints (red) versus lacking them (no fill). (D) Comparison of gene counts (y-axis) expressed at various levels (Z-scores, x-axis), grouped in 50-kb genomic distances from the nearest breakpoint, in tumors with (red) and without (blue) breakpoints in those segments. Of 194 genes harboring breakpoints, 155 are expressed as per available RNA-seq data. Left to right, breakpoints across the tumors inside or outside genes as indicated; n, counts; q = adj. P-values, binomial test. (E) Percentages of genes expressed at outlier levels (Z ≥ 2) at indicated copy numbers (x-axis) in absence (gray) or presence (red) of breakpoints within ±500 kb. Copy number loss, n < 1.5; normal, 1.5 ≤ n ≤ 2.5; gain 2.5 ≤ n ≤ 5; hyper-gain n > 5. Asterisks, adj. P < 1 × 10⁻⁴, binomial test, one-tailed, adjusted by FDR. (F) Q-Q plot of χ² adjusted P-values calculated from comparison of gene expression Z-score distributions (i.e., sum of the square of Z-scores) at chromosomal loci with HPV-mediated rearrangements (y-axis) versus matched loci without rearrangements (x-axis). (G, top) Depth of sequencing coverage (y-axis); (bottom) Z-scores of log2 TPM (y-axis) for genes in a tumor with an HPV-linked rearrangement on Chr 11q13.3 (x-axis); genes with outlier expression (red fill). Breakpoints (red vertical lines) mapping within a 2.4-Mbp region with eightfold amplification result in outlier expression (Z-score ≥ 2) for 22 (67%) of 33 genes including cyclin D1 (CCND1). Tumor with HPV-linked rearrangement: Z-score < 2 (pink), outlier Z-score ≥ 2 (red); all other tumors without detectable local HPV insertions (black). See also Supplemental Figures S4.1, S4.2 and Supplemental Tables S5.1–S5.3.
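The outlier threshold and copy number classes named in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the pseudocount of 1, the use of the population standard deviation, the function names, and the toy TPM values are all hypothetical. The caption lists 2.5 in both the normal and gain ranges; the sketch assigns exactly 2.5 to normal.

```python
import math
import statistics


def zscores_log2_tpm(tpm_by_tumor, pseudocount=1.0):
    """Z-score of log2(TPM + pseudocount) for one gene across tumors.

    The pseudocount and the population standard deviation are assumptions.
    """
    logs = {t: math.log2(v + pseudocount) for t, v in tpm_by_tumor.items()}
    mean = statistics.mean(logs.values())
    sd = statistics.pstdev(logs.values())
    if sd == 0:
        return {t: 0.0 for t in logs}  # no variation, so no outliers
    return {t: (x - mean) / sd for t, x in logs.items()}


def outlier_tumors(tpm_by_tumor, threshold=2.0):
    """Tumors whose Z-score is >= threshold or <= -threshold."""
    z = zscores_log2_tpm(tpm_by_tumor)
    return sorted(t for t, score in z.items() if abs(score) >= threshold)


def copy_number_class(n):
    """Classes from panel E; a value of exactly 2.5 is treated as normal (assumption)."""
    if n < 1.5:
        return "loss"
    if n <= 2.5:
        return "normal"
    if n <= 5:
        return "gain"
    return "hyper-gain"


# Hypothetical toy data: nine tumors at TPM 1, one tumor at TPM 1023
toy_tpm = {f"T{i}": 1.0 for i in range(1, 10)}
toy_tpm["T10"] = 1023.0
print(outlier_tumors(toy_tpm))
print([copy_number_class(n) for n in (1.0, 2.0, 3.0, 8.0)])
```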
Figure 6.
HPV integrants induce various forms of genetic disruption including gene breakage and chimeric transcription. (A) Counts of virus–host chimeric transcript junctions (y-axis) in 91 HPV16-positive tumors, aligned to the HPV16 genome (x-axis) with known splice donor (SD coordinate, red), splice acceptor (SA coordinate, blue), and other (gray) sites. (B) Counts of split or discordant RNA-seq reads (y-axis) in 103 HPV-positive tumors supporting chimeric transcript junctions (n = 673), aligned to the human genome (x-axis). (C) Frequency distribution (y-axis, percent total) of log10-transformed genomic distances (x-axis) between virus–host junctions from RNA-seq versus nearest DNA breakpoint (n = 604). (D) Venn diagram counts chimeric transcripts expressed at 147 genes, via host splice donor (SD, blue, n = 61); splice acceptor (SA, red, n = 75); readthrough transcription (purple, n = 40), and/or cryptic splice sites (green, n = 114). (E–G) Sashimi plots depict counts of mapped RNA-seq reads at genes with HPV integrants in affected tumor (top, center panels) versus without integrants in control tumor (bottom). Intron sequences not shown to scale. Center, bottom panels, identical scale of reads (y-axis, brackets). Black arcs, numbers, read counts connecting spliced exons. (E) Intragenic HPV integrants in MAML2 (red) flank a ∼75-kb duplication including exon 2, and delete small intronic segment C. Gene breaking involves premature transcriptional termination of MAML2 after exon 2 and de novo initiation of downstream transcripts from HPV. Segment B is truncated for visualization. (F) Intragenic HPV integrants in IPO8 (red) delete distal exons 23–25, disrupting 3′ transcripts and up-regulating upstream exons. (G) Intergenic HPV integrants both upstream of and downstream from INSIG2 flank a ∼665-kb duplication on Chr 2q14, extending from Chr 2:118.826 to 119.492 Mbp. Numerous chimeric transcripts originating from an upstream, intergenic HPV16 integrant are spliced to a novel exon, novel splice acceptor site, and exons 1, 2, and 3, causing gene disruption. See also Supplemental Figures S5.1–S5.4 and Supplemental Tables S6.1–S6.6.
Figure 7.
Simple HPV69 integrants induce high expression of an imprinted oncogene, RTL1, and of C15orf65. (A–C) Box and whisker plots depict log10-transformed counts of (A) SNVs or small indels per megabase pair, (B) copy number step-changes (±0.5 n), and (C) SV breakpoints in 105 OPSCC (circles): red, tumor with HPV69 integrants; no fill, all others. (D,E) Sashimi plots of (top) chimeric transcripts initiated in HPV69, spliced to exons of (D) RTL1 and (E) C15orf65, leading to extremely high expression relative to controls. Black line, numbers, read counts connecting spliced exons; bottom, read counts of conventional transcripts in control tumor. Some RTL1 fusion transcripts extend past the 3′ transcription termination signal. (F,G) Box and whiskers plots of expression levels of (F) RTL1 and (G) C15orf65 in 103 HPV-positive OPSCC. Red, HPV69-positive tumor; gray, all others. See also Supplemental Table S7.
