Identification of cancer genes using a statistical framework for multiexperiment analysis of nondiscretized array CGH data

Christiaan Klijn¹, Henne Holstege, Jeroen de Ridder, Xiaoling Liu, Marcel Reinders, Jos Jonkers, Lodewyk Wessels

PMID: 18187509
PMCID: PMC2241875
DOI: 10.1093/nar/gkm1143

Christiaan Klijn et al. Nucleic Acids Res. 2008 Feb;36(2):e13. doi: 10.1093/nar/gkm1143. Epub 2008 Jan 10.

Affiliation

¹ Netherlands Cancer Institute, Division of Molecular Biology, Plesmanlaan 121 1066 CX Amsterdam, The Netherlands.

Erratum in

Nucleic Acids Res. 2008 Apr;36(6):2106

Abstract

Tumor formation is in part driven by DNA copy number alterations (CNAs), which can be measured using microarray-based Comparative Genomic Hybridization (aCGH). Multiexperiment analysis of aCGH data from tumors allows discovery of recurrent CNAs that are potentially causal to cancer development. Until now, multiexperiment aCGH data analysis has been dependent on discretization of measurement data to a gain, loss or no-change state. Valuable biological information is lost when a heterogeneous system such as a solid tumor is reduced to these states. We have developed a new approach which inputs nondiscretized aCGH data to identify regions that are significantly aberrant across an entire tumor set. Our method is based on kernel regression and accounts for the strength of a probe's signal, its local genomic environment and the signal distribution across multiple tumors. In an analysis of 89 human breast tumors, our method showed enrichment for known cancer genes in the detected regions and identified aberrations that are strongly associated with breast cancer subtypes and clinical parameters. Furthermore, we identified 18 recurrent aberrant regions in a new dataset of 19 p53-deficient mouse mammary tumors. These regions, combined with gene expression microarray data, point to known cancer genes and novel candidate cancer genes.

Figures

**Figure 1.**
A schematic overview of KC-SMART. T1, T2 and T3 represent three arbitrary tumor samples. (a) Illustration of the nature of the data measured and how it is represented on the genome. The BAC clones are spaced along the genome where the sizes of the gaps depend on the platform used. Per BAC clone a log2 value is measured that is a representation of the CNA at that point on the genome. (b) The positive and negative log2 values in the data are separated and summed across tumors and per BAC clone. After summation the kernel convolution is applied and the Kernel Smoothed Estimate (KSE: blue line) is determined. (c) An overview for the method of determining statistical significance is shown here. First, the original log2 values are shuffled randomly within each tumor. After summation across tumors the KSE is computed. For both the gains and the losses a cumulative density function (CDF) of the detected peaks is calculated. By testing the significance level against this CDF a value is obtained, above which peaks are found to be significant. (d) Here the result of a genome-wide analysis is shown. The blue line is the KSE obtained from the data and the red line is the significance threshold at P = 0.05, which was determined in (c). (e) The scale space is constructed by arranging the significantly aberrant areas (visualized as blocks) in order of scale on the genomic position.
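The steps in panels (b) and (c) can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: it assumes a plain Gaussian kernel instead of the flat top kernel, leaves the estimate unnormalized, and pools all local maxima from the permuted data to form the null distribution. The function names (`split_and_sum`, `kse`, `local_peaks`, `gain_threshold`) and the parameters `width`, `alpha` and `n_perm` are hypothetical.

```python
import numpy as np

def split_and_sum(log2):
    """log2 has shape (n_tumors, n_probes); return per-probe summed gains and losses."""
    gains = np.maximum(log2, 0.0).sum(axis=0)   # positive values only
    losses = np.minimum(log2, 0.0).sum(axis=0)  # negative values only
    return gains, losses

def kse(positions, summed, grid, width):
    """Kernel-smoothed estimate: Gaussian-weighted sum of probe values on a grid."""
    diffs = grid[:, None] - positions[None, :]
    return np.exp(-0.5 * (diffs / width) ** 2) @ summed

def local_peaks(curve):
    """Heights of interior local maxima; fall back to the global maximum."""
    mid = curve[1:-1]
    peaks = mid[(mid > curve[:-2]) & (mid >= curve[2:])]
    return peaks if peaks.size else np.array([curve.max()])

def gain_threshold(log2, positions, grid, width, alpha=0.05, n_perm=100, seed=0):
    """Peak height exceeded by a fraction alpha of peaks in shuffled data."""
    rng = np.random.default_rng(seed)
    pooled = []
    for _ in range(n_perm):
        # Shuffle the log2 values within each tumor, as in panel (c).
        shuffled = np.array([rng.permutation(row) for row in log2])
        gains, _ = split_and_sum(shuffled)
        pooled.append(local_peaks(kse(positions, gains, grid, width)))
    return float(np.quantile(np.concatenate(pooled), 1.0 - alpha))
```

Losses could be handled the same way by negating the summed losses before smoothing. Repeating the analysis over a range of `width` values would give the scale space of panel (e).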

**Figure 2.**
Main properties of the flat top Gaussian function. The BAC clone is depicted as a red rectangle. The amplitude of the kernel is determined by the summed log2 value of that BAC clone across all tumors. The red line is a representation of g_i.
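One way to picture such a kernel is sketched below in Python. The exact functional form is not given in this record, so the sketch assumes that the kernel is constant across the extent of the BAC clone and decays as a Gaussian beyond its edges. The function name `flat_top_gaussian` and the parameters `start`, `end` and `sigma` are hypothetical.

```python
import numpy as np

def flat_top_gaussian(x, start, end, amplitude, sigma):
    """Assumed shape: flat at `amplitude` on [start, end], Gaussian decay outside."""
    x = np.asarray(x, dtype=float)
    # Distance from x to the clone; zero for positions inside the clone.
    dist = np.maximum(0.0, np.maximum(start - x, x - end))
    return amplitude * np.exp(-0.5 * (dist / sigma) ** 2)
```

Under this assumption, `amplitude` would be the summed log2 value of the clone across all tumors, and `sigma` would set the kernel width.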

**Figure 3.**
Results of KC-SMART analysis of 89 human breast tumor samples. The human tumor set was acquired from Chin *et al*. (22). Significant recurrent regions found by KC-SMART are shown in green. Significantly correlating genes from the Cancer Gene Census (CGC) list are shown below for each result. The CGC list was split into CGC dominant genes (for gains) and CGC recessive genes (for losses). Black dotted lines represent the ends of chromosomes; magenta dotted lines represent the centromere locations.

**Figure 4.**
Comparison of KC-SMART to frequency-based analysis. (a) A genome-wide frequency analysis of copy-number changes is shown (gray bars). On top of the frequency analysis the KC-SMART result for a 20-Mb kernel width is plotted (orange line). The dataset used for analysis is the 89 breast tumor data published by Chin *et al*. The significance threshold for KC-SMART is shown as an orange dotted line. The 30% frequency level is shown as a gray dotted line. The zoom panel shows a magnification of chromosome 17. Here the result for KC-SMART at 4 Mb is shown in green. The green dotted line shows the significance threshold for a 4-Mb kernel width. (b) Proportional Venn diagrams showing the overlap between results from KC-SMART and the frequency analysis. Overlap is determined on the basis of probes in significant regions (KC-SMART) or in regions over 30% frequency. (c) Smoothed histograms of within-region BAC pair correlation coefficients.
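For contrast, a frequency-based analysis of the kind used as the baseline here can be sketched in a few lines of Python. The discretization cutoffs are not stated in this record, so `gain_cut` and `loss_cut` below are placeholder assumptions, as is the function name `aberration_frequency`. Only the 30% level comes from the caption.

```python
import numpy as np

def aberration_frequency(log2, gain_cut=0.3, loss_cut=-0.3):
    """log2 has shape (n_tumors, n_probes); return per-probe gain and loss frequencies."""
    gain_freq = (log2 > gain_cut).mean(axis=0)  # fraction of tumors called as gain
    loss_freq = (log2 < loss_cut).mean(axis=0)  # fraction of tumors called as loss
    return gain_freq, loss_freq

# Four hypothetical tumors measured on three probes.
log2 = np.array([
    [0.5, 0.0, -0.6],
    [0.4, 0.1, -0.1],
    [0.0, 0.0, -0.5],
    [0.6, 0.0, 0.0],
])
gain_freq, loss_freq = aberration_frequency(log2)
recurrent_gain = gain_freq > 0.30  # probes above the 30% frequency level
```

The sketch makes the information loss visible: once a value passes the cutoff, its magnitude no longer contributes, which is the limitation the nondiscretized approach is meant to avoid.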

**Figure 5.**
KC-SMART results on the p53-deficient mouse model mammary tumors. The y-axis represents the interpolated scale space running from 2 to 40 Mb. Black dotted lines represent the end of chromosomes.

**Figure 6.**
Expression and genomic profiles of the known cancer genes and the cancer gene candidates discovered in p53-deficient mouse mammary tumors. For each aberration the chromosome number is given as well as the average aberration BAC profile of the BAC clones in the aberrant region. The gene expression profiles of the genes that were selected based on their correlation with the BAC profile are depicted below each BAC profile. Green indicates downregulation/loss, red indicates overexpression/gain.
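The gene selection step described in this caption can be illustrated with a short Python sketch. It assumes Pearson correlation between the average BAC profile of a region and each gene's expression across the same tumors. The cutoff `min_r` and the function name `correlated_genes` are hypothetical, since the selection criterion is not specified in this record.

```python
import numpy as np

def correlated_genes(bac_profile, expression, names, min_r=0.6):
    """Return (name, r) for genes whose expression correlates with the BAC profile.

    bac_profile: shape (n_tumors,), average log2 value of the region per tumor.
    expression:  shape (n_genes, n_tumors), expression per gene and tumor.
    Genes with constant expression are not handled (their correlation is undefined).
    """
    b = bac_profile - bac_profile.mean()
    e = expression - expression.mean(axis=1, keepdims=True)
    r = (e @ b) / (np.linalg.norm(e, axis=1) * np.linalg.norm(b))
    return [(name, float(value)) for name, value in zip(names, r) if value >= min_r]

# Hypothetical region profile and two hypothetical genes across four tumors.
bac_profile = np.array([0.0, 1.0, 2.0, 3.0])
expression = np.array([
    [0.0, 2.0, 4.0, 6.0],  # tracks the copy number profile
    [3.0, 1.0, 2.0, 0.0],  # runs against it
])
selected = correlated_genes(bac_profile, expression, ["geneA", "geneB"])
```

Only positively correlated genes are kept here, on the reasoning that a candidate cancer gene in a gained region should be overexpressed and one in a lost region downregulated.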

**Figure 7.**
Scale-space analysis of the chromosome 15 aberration. (a) Scale-space analysis of the complete chromosome. The outer black dotted lines denote the ends of the chromosome. The inner black dotted lines denote the area shown in (b). (b) Zoom-in view of the scale-space analysis of chromosome 15. The BAC clones that are mapped to this region are shown as blue blocks. The genes that are situated in this region are depicted as red blocks. (c) Genes close to the region that is significant across all scales are shown in more detail. (d) Heatmap of the BAC clones shown in (b). Numbers along the horizontal axis correspond to the BAC clone numbers in (c). Positive log2 values are shown in red, negative log2 values in green. (e) Heatmap of the gene expression of genes *Myc*, *Ddef1* and *Adcy8*. Note: The tumors are now depicted along the horizontal direction, as opposed to (d), where the tumors are depicted in the vertical direction. Positive log2 values are shown in red, negative log2 values in green. No probe against *Pvt1* was present on the gene expression array. The two unknown-function transcripts overlapping with BAC clone 7 show equally uncorrelated expression profiles, and are denoted by I and II. (f) Scale-space analysis of significant gains on chromosome 17. The analysis is from a set of 89 human breast tumors. (g) Scale-space analysis of chromosome 9 losses. The analysis is from a set of 89 human breast tumors.

References

1. Hanahan D, Weinberg R. The hallmarks of cancer. Cell. 2000;100:57–70.
2. Myllykangas S. Manifestation, mechanisms and mysteries of gene amplifications. Cancer Lett. 2006;232:79–89.
3. Kallioniemi A, Kallioniemi OP, Sudar D, Rutovitz D, Gray JW, Waldman F, Pinkel D. Comparative genomic hybridization for molecular cytogenetic analysis of solid tumors. Science. 1992;258:818–821.
4. Pinkel D, Segraves R, Sudar D, Clark S, Poole I, Kowbel D, Collins C, Kuo WL, Chen C, et al. High resolution analysis of DNA copy number variation using comparative genomic hybridization to microarrays. Nat. Genet. 1998;20:207–211.
5. Pollack JR, Perou CM, Alizadeh AA, Eisen MB, Pergamenschikov A, Williams CF, Jeffrey SS, Botstein D, Brown PO. Genome-wide analysis of DNA copy-number changes using cDNA microarrays. Nat. Genet. 1999;23:41–46.
