Proc Natl Acad Sci U S A. 2016 Apr 26;113(17):E2373-82.
doi: 10.1073/pnas.1520010113. Epub 2016 Apr 7.

The tandem duplicator phenotype as a distinct genomic configuration in cancer

Francesca Menghi et al. Proc Natl Acad Sci U S A. 2016.

Abstract

Next-generation sequencing studies have revealed genome-wide structural variation patterns in cancer, such as chromothripsis and chromoplexy, that do not engage a single discernable driver mutation, and whose clinical relevance is unclear. We devised a robust genomic metric able to identify cancers with a chromotype called tandem duplicator phenotype (TDP) characterized by frequent and distributed tandem duplications (TDs). Enriched only in triple-negative breast cancer (TNBC) and in ovarian, endometrial, and liver cancers, TDP tumors conjointly exhibit tumor protein p53 (TP53) mutations, disruption of breast cancer 1 (BRCA1), and increased expression of DNA replication genes pointing at rereplication in a defective checkpoint environment as a plausible causal mechanism. The resultant TDs in TDP augment global oncogene expression and disrupt tumor suppressor genes. Importantly, the TDP strongly correlates with cisplatin sensitivity in both TNBC cell lines and primary patient-derived xenografts. We conclude that the TDP is a common cancer chromotype that coordinately alters oncogene/tumor suppressor expression with potential as a marker for chemotherapeutic response.

Keywords: BRCA1; TP53; cisplatin; tandem duplications; triple-negative breast cancer.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
TDP scoring and sample classification. (A) Circos plots showing structural variations of representative cancer genomes with different levels of TDP scores. For each plot, sample identification number, the TDP score, and number of TDs over the total number of detected rearrangements are indicated (top to bottom). Structural variations were classified based on the four basic discordant paired-end mappings as TDs (red), deletions (blue), unpaired inversions (green), or interchromosomal translocations (gray). (B) Trimodal distribution of the TDP score values across the 277 cancer samples examined.
Fig. S1.
Structural variation-based score distributions and TDP status assignment. (A) Trimodal distribution of TDP scores (n = 266 samples with detected TDs) and cutoff for TDP classification. We resolved the trimodal distribution of TDP scores using the normalmixEM function of the mixtools package in R. The fraction of samples belonging to each one of the three underlying normal distributions, as well as the median and SD values of each curve, is shown in the table. The cutoff value to classify TDP samples is set to −0.71, which corresponds to the median + 2 × SDs of the second distribution. For better visualization, TDP scores were then centered around 0, as shown in C. (B) Scatter plot of TDP scores and TD numbers across tumor types (n = 266 cancer genomes analyzed by WGS). A color code differentiates between tumor types that are TDP-enriched (red), TDP-depleted (blue), or with no significant TDP prevalence (grey), as indicated in Table 1. (C) Distribution of the four basic structural variation scores across all cancer samples (n = 277). A calculation analogous to the one used to compute TDP scores was applied to other structural variation types (deletion, inversion, and interchromosomal translocation). Only the distribution of TD scores (red) shows a clear sample subpopulation characterized by distinctively higher scores.
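The cutoff rule in panel A (median + 2 × SD of the second mixture component) can be illustrated with a minimal Python sketch. The component parameters and raw scores below are hypothetical placeholders, not the values fitted in the paper, and subtracting the cutoff to center the scores at 0 is an assumption about how the centering was done.

```python
def tdp_cutoff(median: float, sd: float, n_sd: float = 2.0) -> float:
    """Cutoff = median + n_sd * SD of the chosen mixture component."""
    return median + n_sd * sd


def classify(scores, cutoff):
    """Label a sample TDP when its score lies above the cutoff."""
    return ["TDP" if s > cutoff else "non-TDP" for s in scores]


def center_scores(scores, cutoff):
    """Shift scores so the cutoff sits at 0 (assumed centering scheme)."""
    return [s - cutoff for s in scores]


# Hypothetical parameters of the second normal component (not from the paper).
second_median, second_sd = -1.0, 0.25
cutoff = tdp_cutoff(second_median, second_sd)

# Hypothetical raw TDP scores for four samples.
raw_scores = [-1.2, -0.5, -0.1, 0.3]
labels = classify(raw_scores, cutoff)
centered = center_scores(raw_scores, cutoff)

for raw, cen, lab in zip(raw_scores, centered, labels):
    print(f"raw={raw:+.2f} centered={cen:+.2f} {lab}")
```

After centering under this assumption, a sample is TDP exactly when its centered score is above 0.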
Fig. S2.
TDP status prediction using array-based copy number (CN) data. (A) TD-like segmental duplications were defined as CN segments ranging between 1 kb and 2 Mb in length, which showed an increase in CN compared with both of their neighboring segments (log2 CN ratio ≥ 0.3) in a genomic region of otherwise homogeneous CN (difference in log2 CN ratios between the two flanking segments ≤ 0.3). (B and C) Scatter plots of TD numbers (B) and TDP scores (C) as predicted by WGS or SNP array CN analysis for each of the 81 TCGA cancer samples for which both types of data were available. (D) Sensitivity and specificity of TDP predictions based on CN data. The TDP classification obtained based on WGS data is used as a reference. (E and F) More stringent differentiation between TDP and non-TDP samples improves the sensitivity (0.80) of TDP sample detection using SNP array data, while maintaining a high degree of specificity (0.94). TDP tumors are defined as samples whose TDP score is higher than 0, as previously defined for WG-sequenced genomes. However, non-TDP samples are identified relative to a non-TDP SNP array-based threshold computed based on the trimodal distribution of TDP scores across the entire SNP array dataset (n = 3,535 samples, threshold = −0.4).
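The segment criteria in panel A can be sketched in Python as follows. This is one reading of the caption, not the authors' code: the gain threshold is applied to the difference in log2 CN ratio between a segment and each neighbor, the length bounds are treated as inclusive, and the `Segment` type and example coordinates are hypothetical.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    chrom: str
    start: int
    end: int
    log2_ratio: float


def td_like_segments(
    segments: List[Segment],
    min_len: int = 1_000,          # 1 kb
    max_len: int = 2_000_000,      # 2 Mb
    min_gain: float = 0.3,         # gain over each neighbor (log2 CN ratio)
    max_flank_diff: float = 0.3,   # flanks must have similar CN
) -> List[Segment]:
    """Return segments that look like tandem duplications.

    Assumes `segments` is sorted by chromosome and start position.
    """
    hits = []
    for left, mid, right in zip(segments, segments[1:], segments[2:]):
        if not (left.chrom == mid.chrom == right.chrom):
            continue  # neighbors must lie on the same chromosome
        length = mid.end - mid.start
        if not (min_len <= length <= max_len):
            continue
        gain_left = mid.log2_ratio - left.log2_ratio
        gain_right = mid.log2_ratio - right.log2_ratio
        flank_diff = abs(left.log2_ratio - right.log2_ratio)
        if gain_left >= min_gain and gain_right >= min_gain and flank_diff <= max_flank_diff:
            hits.append(mid)
    return hits


# Hypothetical segmentation of one chromosome.
example = [
    Segment("chr1", 0, 5_000_000, 0.0),
    Segment("chr1", 5_000_000, 5_200_000, 0.8),   # 200 kb gain: TD-like
    Segment("chr1", 5_200_000, 9_000_000, 0.1),
    Segment("chr1", 9_000_000, 12_500_000, 0.9),  # 3.5 Mb gain: too long
    Segment("chr1", 12_500_000, 20_000_000, 0.0),
]
found = td_like_segments(example)
print(found)
```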
Fig. S3.
Molecular features of the genomic regions affected by TD breakpoints in TDP cancer genomes. (A) TD breakpoints cluster in gene-dense regions. The scatter plot shows a positive correlation between gene density and TD breakpoint density, computed per 10-Mb overlapping windows (1-Mb offset) along the entire genome. The combined TD coordinate data corresponding to the total of 50 TDP tumor genomes identified via WGS (including all available tumor types) were used in this analysis. The Pearson correlation coefficient (R) and its corresponding P value are reported in the graphs. (B) TDs are more likely to engage gene bodies than intergenic regions. Histogram bars represent the fraction of TD breakpoints that map within gene bodies in TDP genomes. A red line indicates the overall fraction of the genome occupied by gene bodies (including coding and noncoding sequences). ***P < 0.0001, computed using the binomial test. (C) Genes that are frequently located at the boundaries of TDs in TDP breast cancer genomes are generally expressed at high levels in the normal breast epithelium. Density plots represent the distribution of gene expression levels in normal breast tissue samples from the TCGA dataset (n = 106). Median values for each distribution are indicated by dashed lines. A P value (vs. all RefSeq genes, n = 20,502) was computed using the Mann–Whitney U test. (D) Pol2 binding site enrichment in the proximity of breast cancer TD breakpoints. Histogram bars correspond to the average OR of 43 Pol2 ChIP-seq datasets. ***P < 0.0001. (E and F) Histone modification marks enrichment/depletion in the proximity of breast cancer TD breakpoints. The results shown correspond to ChIP-seq datasets generated from the HMEC (E) and the vHMEC (F) cell lines. ***P < 0.0001. (G) Enrichment ORs for different histone modification marks in the proximity of breast cancer TD breakpoints in TDP breast tumors (n = 23 tumors). ChIP-seq data for both the HMEC (Top) and the vHMEC (Bottom) cell lines are shown. Each bin on the horizontal axis represents a range of nonoverlapping distances (e.g., a mark between 10 kb and 20 kb corresponds to the enrichment in regions >10 kb but <20 kb from the nearest TD breakpoint).
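The windowed density comparison in panel A can be sketched in Python as below. It counts features in 10-Mb windows that advance by 1 Mb and then computes a Pearson correlation between the two count vectors. Counting genes by a single coordinate, using only full-length windows, and the example positions are all simplifying assumptions made for illustration.

```python
from math import sqrt


def window_counts(positions, chrom_length, window=10_000_000, step=1_000_000):
    """Count positions in overlapping half-open windows [start, start + window)."""
    counts = []
    for start in range(0, chrom_length - window + 1, step):
        end = start + window
        counts.append(sum(start <= p < end for p in positions))
    return counts


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical coordinates on a 12-Mb chromosome.
gene_positions = [500_000, 5_000_000, 11_500_000]
breakpoints = [600_000, 700_000, 11_000_000, 11_900_000]

gene_density = window_counts(gene_positions, 12_000_000)
bp_density = window_counts(breakpoints, 12_000_000)
r = pearson(gene_density, bp_density)
print(gene_density, bp_density, r)
```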
Fig. 2.
Genomic features of TDs in TDP and non-TDP tumors. (A) Correlation of TDP score and median TD span size across the 277 tumor genomes analyzed by WGS. Horizontal lines indicate the overall median span size for the TDP and non-TDP sample subgroups. A P value was computed using Student’s t test. (B) TD span distributions for the TDP and the non-TDP sample groups. TDP samples feature TDs with span peaks at ∼10 kb and ∼150 kb. Non-TDP samples feature a much larger TD span range, which homogeneously ranges from ∼1 to ∼10 Mb. A P value for the distance between the two empirical distributions was generated using the two-sample Kolmogorov–Smirnov test. (C) Sequence analysis of TD breakpoints across TDP (n = 4) and non-TDP (n = 7) TNBC cell line genomes. ORs and P values were computed using Fisher’s exact test. (D) Replication time (RT) of genes located inside or on the boundary of TDs in TDP and non-TDP samples based on the breast cancer dataset. RT is expressed on a scale of 100 (early) to 1,500 (late). P values were computed based on the Mann–Whitney U test.
Fig. 3.
TDP is characterized by the coordinated perturbation of several cancer genes. (A) Fold change (FC) in gene expression (breast tumor/normal breast) for genes frequently located inside or at the boundary of TDs in TDP tumors (P values determined by the Mann–Whitney U test). (B) Genes frequently affected by a TD breakpoint are enriched in anticancer genes (Left), whereas genes frequently spanned by a TD are enriched in procancer genes (Middle). (Right) Short-span TDs appear to interfere with the integrity of anticancer genes more frequently than with that of procancer genes (P values determined by Fisher’s exact test).
Fig. S4.
TD-like features specifically affect TSGs and oncogenes. (A) Data from 418 TDP genomes assessed by SNP array (TNB, NTNB, OV, and UCEC datasets). P values and ORs were computed using Fisher’s exact test. NS, not significant. (B) Histograms of frequencies for genes found at the boundaries (Left) or inside (Right) TD-like features in TDP tumors. Thresholds for frequency significance were defined based on 1,000 random gene samplings as described in Materials and Methods. Specific examples of oncogenes (red) and TSGs (blue) are indicated by arrows, together with the number of unique TDP tumors in which they are affected. (C) Heat map of co-occurrences for the top 25 genes found inside (red) and at the boundaries of (blue) TD-like features in TDP tumors. The top known cancer genes are indicated with the percentage of samples in which they are affected. (D) Co-occurrences are likely for genes that map within a short distance of each other, and are therefore affected by the same TDs. The top 25 TD-inside genes shown in C are clustered based on chromosomal location. (E) Overview of all TD-like features at specific chromosomal loci. TD-like features are color-coded based on their effect on the gene of interest depicted in each graph [i.e., PAX8 (Top), PTEN (Bottom)]: gray, no effect; red, gene duplication (the target gene is located inside the TD); blue, gene disruption [the target gene is located at the TD boundary (i.e., BP)]. BP, breakpoint.
Fig. S5.
Short-span TDs cause TSG disruption. (A) Short-span TDs (<100 kb) are more likely to fall completely within gene bodies than expected by chance. Short-span TD genomic coordinates (n = 3,086, based on WGS data from 50 TDP cancer genomes) were randomly permuted 1,000 times, preserving their sizes. At each permutation, the percentage of TDs integrally falling within gene bodies was recorded to generate the expected distribution. A red vertical line indicates the observed percentage of gene-embedded TDs, which exceeds all of the 1,000 permuted values. (B) UCSC Genome Browser screen shot showing the location of two short-span TDs affecting the integrity of the PTEN TSG on chromosome (chr) 10.
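The permutation scheme in panel A can be sketched in Python as below: each TD is moved to a random position while keeping its size, and the fraction of TDs lying entirely within a gene body is recorded per permutation. The sketch uses a single chromosome, uniform random placement, and made-up coordinates, all of which are simplifying assumptions; the empirical P value with a +1 correction is a common convention rather than something stated in the caption.

```python
import random


def fraction_embedded(tds, genes):
    """Fraction of TDs (start, end) that fall completely inside any gene body."""
    inside = sum(
        any(g_start <= s and e <= g_end for g_start, g_end in genes)
        for s, e in tds
    )
    return inside / len(tds)


def permute_tds(tds, chrom_length, rng):
    """Place each TD at a uniformly random start, preserving its size."""
    shuffled = []
    for s, e in tds:
        size = e - s
        new_start = rng.randrange(0, chrom_length - size + 1)
        shuffled.append((new_start, new_start + size))
    return shuffled


def permutation_test(tds, genes, chrom_length, n_perm=1000, seed=0):
    """Return the observed fraction, the permuted fractions, and an empirical P value."""
    rng = random.Random(seed)
    observed = fraction_embedded(tds, genes)
    expected = [
        fraction_embedded(permute_tds(tds, chrom_length, rng), genes)
        for _ in range(n_perm)
    ]
    n_at_least = sum(x >= observed for x in expected)
    p_value = (n_at_least + 1) / (n_perm + 1)
    return observed, expected, p_value


# Hypothetical gene bodies and short-span TDs on a 100-kb toy chromosome.
genes = [(1_000, 2_000), (5_000, 9_000)]
tds = [(1_100, 1_200), (5_500, 6_000), (3_000, 3_100)]
observed, expected, p_value = permutation_test(tds, genes, 100_000)
print(observed, max(expected), p_value)
```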
Fig. S6.
TDP samples do not consistently show a higher mutation burden compared with non-TDP samples. Box plots represent distributions in the number of unique genes per sample that are affected by nonsilent somatic mutations. Although there is a significant increase in the overall number of mutations detected in TDP compared with non-TDP samples in the two breast cancer datasets analyzed, and a more modest but still significant increase in the OV dataset, the trend was completely reversed in the UCEC dataset. TDP status was assigned based on SNP array data. P values were computed using the Mann–Whitney U test.
Fig. 4.
Loss of the TP53 and BRCA1 tumor suppressor genes in the context of abnormal DNA replication may provide a permissive background for the insurgence of the TDP. (A) TP53 mutation rate is recurrently higher in TDP samples compared with non-TDP samples. ORs and corresponding P values refer to the enrichment of TDP samples for samples with gene disruption. Percentages of TDP and non-TDP samples carrying the gene disruption are indicated in purple and green, respectively. (B and C) DNA replication genes are consistently up-regulated in TDP vs. non-TDP samples. (B) Top 10 GO terms significantly enriched in up-regulated genes (TDP vs. non-TDP) across the four different datasets analyzed. (C) Heat map of individual gene expression levels. Tumor samples are sorted based on tumor type and increasing TDP score. Only the 23 DEGs closely involved in DNA replication are shown. (D) TDP samples are significantly enriched in BRCA1 low expressors across different tumor types. The threshold for low BRCA1 expression was defined based on the bimodal distribution of BRCA1 transcriptional levels in each individual dataset. Graph annotations are as in A. Expression levels of the BRCA1 gene in TDP (purple) and non-TDP (green) TNBC cell lines (E) and PDXs (F) are shown. TDP scores for these genomes were computed based on WGS data. The BRCA1 somatic mutational status is indicated in brackets. mt, mutated; na, not available; wt, wild type. Pearson correlation coefficients (R) and their corresponding P values are reported in each graph. (Right) Box plots of BRCA1 expression values for TDP and non-TDP sample groups, log2-fold changes and Student’s t test P values are shown. (G) TDP samples are enriched for BRCA1-deficient tumors in both the TNB and OV datasets. BRCA1 loss is defined by the presence of germline or somatic mutations, or promoter methylation.
Fig. S7.
Loss of BRCA1, but not of BRCA2, in TDP tumors. (A) Box plot of BRCA1 expression values for the TNB dataset. The BRCA1 gene is significantly down-regulated in TDP compared with non-TDP samples. Adj., adjusted. (B) Bimodal distribution of BRCA1 expression values was resolved to identify low expressors. Low BRCA1 expressors are significantly enriched for TDP samples. (C) BRCA1 expression levels are inversely correlated with BRCA1 promoter methylation levels in the TNB and OV datasets (Pearson correlation: R = −0.61, P = 2.3E-07 for the TNB dataset; R = −0.74, P < 1.0E-05 for the OV dataset). The 10% most highly methylated samples at the BRCA1 promoter are indicated in red. (D) Contrary to the BRCA1 gene, the BRCA2 gene is more frequently mutated in non-TDP compared with TDP tumors across different tumor types. Only somatic mutations were analyzed for the UCEC dataset.
Fig. S8.
TDP-associated overexpression of DNA replication genes does not depend on their duplication status. Frequently up-regulated DNA replication genes that are also often affected by TDs across TDP samples were tested to assess whether their expression levels could be explained by the presence of TDs that increased their CN status. For each gene, TDP samples with TDs spanning its entire length were removed from the analysis of differential gene expression. In all four cases, differences in expression levels between non-TDP and TDP tumors remained significant. ***P < 0.0001, Mann–Whitney U test.
Fig. 5.
TDP as a genomic marker for drug sensitivity. (A) TDP scores correlate with cisplatin or carboplatin sensitivities in TNBC cell lines. Pearson correlation coefficients (R) and their corresponding P values are reported in the graph. Ln, natural logarithm. (B) TDP scores associate with cisplatin sensitivity in vivo. Waterfall plots representing cisplatin response for eight TNB PDX models sorted by decreasing values of TDP scores are shown. Response calls are indicated underneath each bar and were computed based on adapted Response Evaluation Criteria in Solid Tumors (RECIST) criteria as described in SI Materials and Methods.
Fig. S9.
Molecular and functional features discriminating between TDs found in TDP and non-TDP cancer genomes. (A) Graphic summary. (B) OncoPrints for the 90 TNBC samples for which RNA-seq, SNP array, and mutation data were available. BRCA1 down-regulation was defined as in Fig. S7B. CCNE1 and CDT1 up-regulation was defined as a greater than twofold increase in expression compared with the average gene expression level across all TNB non-TDP tumors. Thirteen of 33 TDP tumors show perturbation of three or four of the candidate genes, whereas only two of 57 non-TDP tumors do. OR = 17.2, P = 2.1E-05 (Fisher’s exact test).
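The 2 × 2 comparison in panel B (13 of 33 TDP tumors vs. 2 of 57 non-TDP tumors) can be worked through with a short Python sketch of a two-sided Fisher's exact test built on the hypergeometric distribution. The sample (cross-product) odds ratio computed here is 17.875, which differs slightly from the reported OR of 17.2; one possible explanation, offered as an assumption, is that the reported value is a conditional maximum-likelihood estimate of the kind some statistical packages return.

```python
from math import comb


def sample_odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def fisher_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value: sum of all table probabilities
    that are no larger than the probability of the observed table."""
    row1, col1, n = a + b, a + c, a + b + c + d
    denom = comb(n, col1)

    def prob(k):
        # Hypergeometric probability of k "perturbed" samples in row 1.
        return comb(row1, k) * comb(n - row1, col1 - k) / denom

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Counts taken from the caption: perturbed / not perturbed per group.
a, b = 13, 33 - 13   # TDP tumors
c, d = 2, 57 - 2     # non-TDP tumors

odds_ratio = sample_odds_ratio(a, b, c, d)
p_value = fisher_two_sided(a, b, c, d)
print(f"sample OR = {odds_ratio:.3f}, two-sided P = {p_value:.2e}")
```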
