PLoS Comput Biol. 2010 Jan;6(1):e1000631.

doi: 10.1371/journal.pcbi.1000631. Epub 2010 Jan 1.

Identification of networks of co-occurring, tumor-related DNA copy number changes using a genome-wide scoring approach

Christiaan Klijn¹, Jan Bot, David J Adams, Marcel Reinders, Lodewyk Wessels, Jos Jonkers

PMID: 20052266
PMCID: PMC2791203
DOI: 10.1371/journal.pcbi.1000631

Christiaan Klijn et al. PLoS Comput Biol. 2010 Jan.

Affiliation

¹ Division of Molecular Biology, The Netherlands Cancer Institute, Amsterdam, The Netherlands.

Abstract

Tumorigenesis is a multi-step process in which normal cells transform into malignant tumors following the accumulation of genetic mutations that enable them to evade the growth control checkpoints that would normally suppress their growth or result in apoptosis. It is therefore important to identify those combinations of mutations that collaborate in cancer development and progression. DNA copy number alterations (CNAs) are one of the ways in which cancer genes are deregulated in tumor cells. We hypothesized that synergistic interactions between cancer genes might be identified by looking for regions of co-occurring gain and/or loss. To this end we developed a scoring framework to separate truly co-occurring aberrations from passenger mutations and dominant single signals present in the data. The resulting regions of high co-occurrence can be investigated for between-region functional interactions. Analysis of high-resolution DNA copy number data from a panel of 95 hematological tumor cell lines correctly identified co-occurring recombinations at the T-cell receptor and immunoglobulin loci in T- and B-cell malignancies, respectively, showing that we can recover truly co-occurring genomic alterations. In addition, our analysis revealed networks of co-occurring genomic losses and gains that are enriched for cancer genes. These networks are also highly enriched for functional relationships between genes. We further examine sub-networks of these networks, core networks, which contain many known cancer genes. The core network for co-occurring DNA losses we find seems to be independent of the canonical cancer genes within the network. Our findings suggest that large-scale, low-intensity copy number alterations may be an important feature of cancer development or maintenance by affecting gene dosage of a large interconnected network of functionally related genes.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Figure 1. Co-occurrence score for paired continuous variables.**
a. Four hypothetical pairs of DNA copy number change measurements are shown for a set of samples. Each pair is plotted in a scatter plot, giving each sample in the set an x- and y-coordinate. The random pair (first panel) is a noisy pair containing no effect. The constitutive member pair (second panel) consists of one measurement that is continuously high, paired with a measurement that varies between two noisy levels. The co-occurring signal (third panel) consists of two noisy measurements that alternate between a high and a basal level and change in concert. The mutually exclusive pair (fourth panel) also alternates between two levels, but a high value in one measurement excludes a high value in the other. b. This example shows scoring for co-occurring gains, so all negative values are set to zero. To score loss-loss pairs, all positive values are set to zero and the absolute values of the remaining values are used. For loss-gain analysis, the positive values on the x (y) axis are set to zero and the absolute values in the x (y) direction are used. c. The first panel shows the resulting scores of the four pairs of measurements if only the sum of the minimum is used. The second panel shows the score when the covariance is included.
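The scoring idea in this caption can be illustrated with a minimal sketch. It assumes the score is the sum over samples of the pairwise minimum, multiplied by the covariance over samples, as the captions of Figures 1 and 2 describe. The function names, the use of the sample covariance, and the toy profiles are illustrative assumptions, not the authors' implementation.

```python
from statistics import mean

def clip_gains(values):
    """Keep gains only: negative values are set to zero."""
    return [max(v, 0.0) for v in values]

def covariance(x, y):
    """Sample covariance of two equal-length sequences (an assumed choice)."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

def cooccurrence_score(x, y):
    """Sum of per-sample minima multiplied by the covariance over samples."""
    x, y = clip_gains(x), clip_gains(y)
    min_sum = sum(min(a, b) for a, b in zip(x, y))
    return min_sum * covariance(x, y)

# Hypothetical, noise-free profiles for six samples at two loci.
alternating = [1.0, 0.0, 1.0, 0.0, 1.0, 0.0]
always_high = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
opposite = [0.0, 1.0, 0.0, 1.0, 0.0, 1.0]

co_occurring = cooccurrence_score(alternating, alternating)
constitutive = cooccurrence_score(always_high, alternating)
exclusive = cooccurrence_score(alternating, opposite)
```

In this toy setting the constitutive pair has the same sum of minima as the co-occurring pair, but its covariance is zero, so only the co-occurring pair keeps a positive score. The mutually exclusive pair already scores zero on the sum of minima.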

**Figure 2. Schematic overview of co-occurrence analysis.**
a. Overview of aCGH data. Two vectors of genomic grid points, each spanning a chromosome arm, are shown (see Materials and Methods). The genomic grid is constructed from aCGH probe measurements, as explained in the Materials and Methods section. b. The combinations of the two vectors are used to construct a genomic pair-wise space in which all further calculations are performed. In this panel a schematic view of the genomic pair-wise space is shown. Each pair of genomic grid points, one from each vector, is a point in this space, and each point contains two values. A pair-wise genomic matrix exists for each tumor in the data set. c. To score for co-occurrence, the minimum value of each pair of genomic grid points is summed over the tumors, and the covariance over tumors is calculated for each pair of genomic grid points. This results in two equally sized matrices, which are multiplied element-wise to produce the co-occurrence score matrix. This matrix is again represented in the genomic pair-wise space. d. The co-occurrence score matrix is convolved with a Gaussian matrix to find local enrichment of high co-occurrence scores in the pair-wise space. Peaks in the convolved co-occurrence matrix are translated back to two genomic regions that are annotated as being co-aberrated across the tumor set. e. For the n-th peak in the Convolved Co-occurrence Matrix (CCM), two gene sets are defined, based on a 2σ window centered on the peak. f1. Using a protein-protein interaction database, the interactions between the gene sets derived from a single co-occurrence peak are analyzed, producing a set of interactions. f2. Using the Cancer Gene Census, the resulting gene sets are inspected for the presence of known tumor-suppressor genes and oncogenes.
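Steps d and e of this caption can be sketched as follows. This is a simplified illustration in plain Python, assuming a zero-padded convolution, a single global peak, and a 2σ window measured in grid points; the kernel radius, the helper names, and the toy score matrix are hypothetical and do not come from the paper.

```python
import math

def gaussian_kernel(sigma, radius):
    """Square Gaussian kernel, normalised to sum to 1."""
    span = range(-radius, radius + 1)
    k = [[math.exp(-(i * i + j * j) / (2.0 * sigma * sigma)) for j in span] for i in span]
    total = sum(map(sum, k))
    return [[v / total for v in row] for row in k]

def convolve2d(matrix, kernel):
    """Same-size 2D convolution with zero padding (kernel is symmetric)."""
    rows, cols, r = len(matrix), len(matrix[0]), len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for di in range(-r, r + 1):
                for dj in range(-r, r + 1):
                    ii, jj = i + di, j + dj
                    if 0 <= ii < rows and 0 <= jj < cols:
                        acc += matrix[ii][jj] * kernel[di + r][dj + r]
            out[i][j] = acc
    return out

def find_peak(matrix):
    """Return (row, col) of the largest entry."""
    cells = ((i, j) for i in range(len(matrix)) for j in range(len(matrix[0])))
    return max(cells, key=lambda ij: matrix[ij[0]][ij[1]])

def peak_window(center, sigma, n_points):
    """Grid indices within 2*sigma of the peak, clipped to the grid."""
    lo = max(0, math.ceil(center - 2 * sigma))
    hi = min(n_points - 1, math.floor(center + 2 * sigma))
    return list(range(lo, hi + 1))

# Toy 7x7 score matrix: a 3x3 cluster of moderate scores and one isolated spike.
scores = [[0.0] * 7 for _ in range(7)]
for i in range(1, 4):
    for j in range(3, 6):
        scores[i][j] = 1.0
scores[5][1] = 2.0

smoothed = convolve2d(scores, gaussian_kernel(sigma=1.0, radius=2))
peak = find_peak(smoothed)
region_rows = peak_window(peak[0], 1.0, 7)
region_cols = peak_window(peak[1], 1.0, 7)
```

In the toy matrix the largest raw score is the isolated spike, but after smoothing the peak moves to the centre of the cluster, which is the kind of local enrichment the convolution is meant to highlight. The two index windows stand in for the two genomic regions from which gene sets would be drawn.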

**Figure 3. Two co-occurring losses detected in the 2Mb scale analysis.**
Raw aCGH data of two co-occurring losses corresponding to four genomic loci are shown. The y-axis of the heatmaps contains the samples, ordered through standard hierarchical clustering. The x-axis contains the probes present in the four genomic loci, ordered by genomic location. The sample information bar contains the names of the cell lines analyzed, the disease of origin and whether the sample has a T-cell or B-cell lineage. These representations are based on the results of the analysis on the 2 Mb scale.

**Figure 4. Significance of finding direct interactions in co-occurring genomic loci.**
For two scales the top 50 co-occurring gene lists for the gain-gain, loss-loss and loss-gain situations were compared to a random set of 100 pairs of genomic loci. For each genomic pair two gene sets were queried for direct interactions using the STRING database. Significance was ascertained using Fisher's exact test on the ratios between all genes and the interacting genes for the co-occurrence gene sets versus the random gene set.
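The comparison described here can be sketched with a small Fisher's exact test on a 2x2 table of interacting versus non-interacting genes. The sketch assumes a one-sided test for enrichment, which the caption does not specify, and the gene counts are invented purely for illustration.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the hypergeometric probability of observing a count of at
    least `a` in the top-left cell, given fixed row and column totals.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(n - row1, col1 - k) for k in range(a, upper + 1))
    return tail / comb(n, col1)

# Hypothetical counts: rows are co-occurring vs. random gene sets,
# columns are interacting vs. non-interacting genes.
p_value = fisher_exact_greater(30, 170, 10, 190)
```

A small p-value would indicate that the co-occurring gene sets contain a larger fraction of directly interacting genes than the random pairs of loci.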

**Figure 5. Networks of co-occurring gain and loss.**
The networks that result from hierarchical clustering of Scale 2 results are shown in different panels. Each panel represents either the gain-gain, loss-loss or gain-loss analysis. The resultant network is visualized using the Cytoscape software package (www.cytoscape.org). Edge thickness scales according to the number of co-occurrence links found between the two genomic loci. The size of the nodes is proportional to the highest rank found among the different individual loci that constitute a node. If only one genomic location is present in a node, i.e. this location did not cluster with any other locations, it is colored gray. The cancer gene enrichment among all genes mapping to the locations described by the nodes is shown in the top right-hand corner. P-values are determined by Fisher's exact test. The functional interaction enrichment of all genes between nodes that are linked with an edge is represented in the lower right-hand corner of each panel. P-values are determined using Fisher's exact test, with randomly generated pairs of loci representing the null hypothesis.

**Figure 6. The gain-gain core network.**
a. The reduced core network for the gain-gain analysis obtained by pruning all edges with less than 5% support in the top 500 list of the Scale 2 analysis. Edge thickness and label represent the number of functional interactions between genes associated with the nodes being connected based on the STRING database. The oncogenes as defined by the Cancer Gene Census that map within the regions defined by the nodes are shown in rectangular insets. b. Representation of the 10 most enriched Ingenuity terms associated with the entire collection of genes in the core network that have a STRING interaction along the edges. The x-axis shows the −log transformed p-value, corrected by the Benjamini-Hochberg procedure as implemented in the Ingenuity software. c. Functional interaction enrichment is shown as a bar graph, which represents the ratio of interacting genes with respect to the total number of genes. P-values are determined using Fisher's exact test with randomly selected pairs of loci representing the null hypothesis.
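The edge-pruning step in panel a can be sketched as below. The sketch assumes that the support of an edge is the fraction of entries in the ranked list that link the same two nodes, which is one reading of the caption rather than a definition taken from the paper. The function name and the shortened toy list are hypothetical.

```python
from collections import Counter

def prune_edges(top_pairs, min_support=0.05):
    """Keep node pairs that account for at least `min_support` of the list.

    `top_pairs` is a list of (node_a, node_b) tuples, one per ranked
    co-occurrence; the order of the two nodes within a pair is ignored.
    """
    counts = Counter(frozenset(pair) for pair in top_pairs)
    total = len(top_pairs)
    return {
        tuple(sorted(edge)): n
        for edge, n in counts.items()
        if n / total >= min_support
    }

# Toy ranked list of 100 entries (the paper uses the top 500).
top_pairs = (
    [("17p", "9p")] * 60
    + [("17p", "13q")] * 36
    + [("9p", "22q")] * 4  # 4% support, below the 5% threshold
)
core_edges = prune_edges(top_pairs)
```

Edges that survive the threshold form the reduced core network; the retained counts could then be used for edge weights in a visualization.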

**Figure 7. The loss-loss core network.**
a. The reduced core network for the loss-loss analysis determined by pruning all edges with less than 5% support in the top 500 list of the Scale 2 analysis. Edge thickness and label represent the number of functional interactions between genes associated with the nodes being connected based on the STRING database. The tumor suppressor genes as defined by the Cancer Gene Census that map within the regions defined by the nodes are shown in rectangular insets. b. Representation of the 10 most enriched Ingenuity terms associated with the entire collection of genes that have a STRING interaction between the 17p region and 9p, 9q, 13q, 16q or 22q as determined by the Ingenuity software. The x-axis shows the −log transformed p-value, corrected by the Benjamini-Hochberg procedure as implemented in the Ingenuity software. c. Functional interaction enrichment is shown as a bar graph, which represents the ratio of interacting genes with respect to the total number of genes. P-values are determined using Fisher's exact test with randomly selected pairs of loci representing the null hypothesis. d. A functional interaction network around the nuclear co-repressor *NCOR1* (also known as *TRAC1*) is shown. This network is a part of the network of interactors derived from the 17p interacting regions after removal of the canonical cancer genes *TP53*, *RB1*, *CDKN2A* and *CDKN2B* from the analysis. e. Illustration of the retroviral insertions mapped near *CBFA2T3*, recovered in a large screen of MuLV retroviral mutagenesis. Insertions are shown as triangles. Blue triangles indicate insertions in the direction of transcription (plus), red triangles indicate insertions in the anti-transcription direction (minus). Insertions linked by dashed boxes are bi-allelic integrations recovered from the same tumor.
