PLoS One. 2013;8(1):e53014.
doi: 10.1371/journal.pone.0053014. Epub 2013 Jan 30.

Identifying in-trans process associated genes in breast cancer by integrated analysis of copy number and expression data


Miriam Ragle Aure et al. PLoS One. 2013.

Abstract

Genomic copy number alterations are common in cancer. Finding the genes causally implicated in oncogenesis is challenging because the gain or loss of a chromosomal region may affect a few key driver genes and many passengers. Integrative analyses have opened new vistas for addressing this issue. One approach is to identify genes with frequent copy number alterations and corresponding changes in expression. Several methods also analyse effects of transcriptional changes on known pathways. Here, we propose a method that analyses in-cis correlated genes for evidence of in-trans association to biological processes, with no bias towards processes of a particular type or function. The method aims to identify cis-regulated genes for which the expression correlation to other genes provides further evidence of a network-perturbing role in cancer. The proposed unsupervised approach involves a sequence of statistical tests to systematically narrow down the list of relevant genes, based on integrative analysis of copy number and gene expression data. A novel adjustment method handles confounding effects of co-occurring copy number aberrations, potentially a large source of false positives in such studies. Applying the method to whole-genome copy number and expression data from 100 primary breast carcinomas, we identified 6373 genes as commonly aberrant, 578 as highly in-cis correlated, and 56 as additionally associated in-trans to biological processes. Among these in-trans process associated and cis-correlated (iPAC) genes, 28% have previously been reported as breast cancer associated, and 64% as cancer associated. By combining statistical evidence from three separate subanalyses that focus respectively on copy number, gene expression and the combination of the two, the proposed method identifies several known and novel cancer driver candidates. Validation in an independent data set supports the conclusion that the method identifies genes implicated in cancer.

Conflict of interest statement

Competing Interests: One author (Zohar Yakhini) is affiliated with Agilent Laboratories, Tel Aviv, Israel. This does not alter the authors' adherence to all the PLOS ONE policies on sharing data and materials.

Figures

Figure 1. Workflow of the proposed method to identify iPAC genes.
(1) Starting with all genes, the commonly aberrant genes are selected as those that have more than 10% gains or losses; (2) Next, those genes which in addition have an in-cis Pearson correlation above 0.6 are selected and referred to as in-cis genes; (3) Finally, statistical enrichment analysis is performed to assess in-trans functionality, leading to identification of the 56 iPAC genes.
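
Steps (1) and (2) of the workflow can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the per-sample gain/loss encoding (-1, 0, +1) and the dictionary layout are assumptions made for the example, and "more than 10% gains or losses" is read here as either the gain frequency or the loss frequency exceeding 10%.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant profile
    return sxy / sqrt(sxx * syy)


def aberration_frequency(calls):
    """Fraction of samples with a gain (+1) and with a loss (-1)."""
    n = len(calls)
    gain = sum(1 for c in calls if c > 0) / n
    loss = sum(1 for c in calls if c < 0) / n
    return gain, loss


def select_in_cis_genes(calls, copy_number, expression,
                        min_freq=0.10, min_r=0.6):
    """Keep genes that are commonly aberrant AND in-cis correlated.

    calls, copy_number, expression: dicts mapping a gene name to one
    value per sample (hypothetical layout chosen for this sketch).
    """
    selected = []
    for gene in calls:
        gain, loss = aberration_frequency(calls[gene])
        if max(gain, loss) <= min_freq:
            continue  # step (1): not commonly aberrant
        r = pearson(copy_number[gene], expression[gene])
        if r > min_r:  # step (2): in-cis correlation above threshold
            selected.append(gene)
    return selected
```

The thresholds 0.10 and 0.6 are the ones stated in the caption; step (3), the enrichment analysis, is not covered by this sketch.
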
Figure 2. Copy number aberrations and in-cis correlations.
The frequency of samples with gains (red) and losses (green) is shown at the top. Each gray point shows the level of in-cis correlation between copy number and expression for a particular gene. The chromosomal positions of the genes selected in our workflow are shown at the bottom. This includes commonly aberrant genes (n = 6373; upper band), in-cis genes (n = 578; middle band), and the iPAC genes (n = 56; lower band). Colors indicate whether the gene is most frequently amplified (red) or deleted (green).
Figure 3. Association between expression and copy number.
Linear regression of log-expression as a function of log-copy number for four selected iPAC genes.
Figure 4. Effect of using copy number-adjusted residual expression.
(A) Comparison of in-trans correlations calculated with and without adjustment for in-cis correlation, i.e. copy number-adjusted-residual expression. In each panel, the x-axis represents the in-trans correlation without adjustment for in-cis correlation, and the y-axis represents the in-trans correlation with adjustment for in-cis correlation. The diagonal lines extend from (−1,−1) to (1, 1). Each point represents one pair of genes among all the 578×25,688 gene pairs (G, g) where G is an in-cis gene and g denotes any gene; (I) All pairs for which G and g are either on different chromosomes or on the same chromosome but on different arms; (II) All pairs for which G and g are within a distance of 30 Mb from each other; (III) All pairs for which G and g are within a distance of 5 Mb from each other; (IV) All pairs for which G and g are within a distance of 1 Mb from each other. (B) The copy number-adjusted residual expression as a function of the non-adjusted expression, in log space. Shown here are the expression levels for six genes with an in-cis correlation ranging from 0 to 0.9. Each dot represents one breast cancer patient. The effect of copy number-adjusted-residual expression increases with increasing in-cis correlation level. The dotted line is the diagonal, and the solid line is the regression line.
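
One plausible reading of "copy number-adjusted residual expression" is the residual of an ordinary least-squares fit of a gene's log-expression on its own log-copy number, as in the regressions of Figure 3. The sketch below illustrates that reading only; the abstract does not spell out the adjustment, so the function names and the choice of a simple one-variable fit are assumptions.

```python
def fit_line(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        return 0.0, my  # constant copy number: nothing to adjust for
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def residual_expression(log_copy_number, log_expression):
    """Expression left over after removing the linear copy number effect."""
    slope, intercept = fit_line(log_copy_number, log_expression)
    return [e - (slope * c + intercept)
            for c, e in zip(log_copy_number, log_expression)]
```

By construction these residuals are uncorrelated with the gene's own copy number, so the stronger the in-cis correlation, the more the residual differs from the raw expression, which matches the trend described for panel (B).
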
Figure 5. Effect of residual expression.
Correlation plots showing how the pattern of strong in-trans correlations changes across the genome with and without copy number-adjusted residual expression. Red dots signify positive in-trans Pearson correlation above 0.6, and green dots signify negative in-trans Pearson correlation below −0.6. The x-axis shows the genomic positions of all 25,688 genes and the y-axis represents the genomic position of the 578 in-cis genes. (A) High in-trans correlations between expression of in-cis genes and expression of all genes. (B) High in-trans correlations between expression of in-cis genes and residual expression of all genes. (C) High in-trans correlations between copy number of in-cis genes and expression of all genes.
Figure 6. Enrichment of the Cell Cycle Process GO term in ATAD2 correlated genes.
All genes were ranked according to the level of correlation between their copy number-adjusted-residual expression profile and the expression levels of ATAD2 (pivot for this analysis). The heatmap represents the expression levels of all 25,688 genes after ranking them according to the criteria mentioned above and after sorting the samples according to ATAD2 expression levels. Top panel in blue and red presents the expression and copy number levels of ATAD2 across the 100 samples, respectively. The graph shows the significance level in –log(hypergeometric p-value) of cell cycle process genes in the ranked list of genes. Optimal enrichment is attained at the top 189 genes, with 14 times more cell cycle process genes than would be expected by chance (mHG formula image).
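
The search for an optimal cutoff in a ranked gene list can be illustrated with the minimum hypergeometric (mHG) score: for every prefix of the ranked list, compute the hypergeometric tail probability of seeing at least the observed number of annotated genes, and keep the smallest value. The sketch below computes only this score and the cutoff at which it is attained. The score is not itself a p-value, because the minimum is taken over many cutoffs; the multiple-testing-corrected mHG p-value is not computed here. Function names are hypothetical, and the brute-force loop is meant for short lists, not for 25,688 genes.

```python
from math import comb


def hypergeom_tail(N, B, n, b):
    """P(X >= b) when n of N items are drawn and B of the N are annotated."""
    upper = min(n, B)
    favourable = sum(comb(B, k) * comb(N - B, n - k)
                     for k in range(b, upper + 1))
    return favourable / comb(N, n)


def mhg_score(labels):
    """Minimum tail probability over all prefixes of a ranked 0/1 list.

    labels[i] is 1 if the i-th ranked gene carries the annotation
    (e.g. a GO term), else 0. Returns (score, cutoff).
    """
    N, B = len(labels), sum(labels)
    best_score, best_cutoff = 1.0, 0
    hits = 0
    for n, label in enumerate(labels, start=1):
        hits += label
        score = hypergeom_tail(N, B, n, hits)
        if score < best_score:
            best_score, best_cutoff = score, n
    return best_score, best_cutoff


def enrichment_fold(labels, cutoff):
    """Observed annotated fraction in the top `cutoff` genes over the overall fraction."""
    observed = sum(labels[:cutoff]) / cutoff
    expected = sum(labels) / len(labels)
    return observed / expected
```

The fold returned by `enrichment_fold` corresponds to statements of the form "14 times more cell cycle process genes than would be expected by chance" at the optimal cutoff.
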
Figure 7. Associations between iPAC genes and traits (biological processes).
A hierarchically clustered heatmap of traits associated with at least four iPAC genes. A red entry indicates a significant association between an iPAC gene and the corresponding trait (see Figure S4 for all the significant associations). The hierarchical clustering was computed and visualized with the Expander suite using average Euclidean distance.
Figure 8. Distribution of in-cis correlation levels between copy number and expression in the MicMa and UNC cohorts.
Green bins in the histogram show distribution of in-cis correlation levels of all genes in the data set, while red bins show the distribution for only the identified iPAC genes. The left-hand y-axes in each histogram show the count in each bin among all genes, and the right-hand axes show the count for iPAC genes in each bin. (A) Distribution of the in-cis correlation levels in the MicMa cohort. (B) Distribution of the in-cis correlation levels in the UNC cohort. The iPAC genes were inferred from the MicMa cohort.
Figure 9. Association consistency of iPAC genes in the validation cohort.
Blue dots represent associations between an iPAC gene and a GO term. The blue dots are plotted according to the level of association, as signed –log(p-value), in the MicMa cohort (x-axis) and in the UNC cohort (y-axis), where signed –log(p-value) refers to –log(mHG p-value) for positive associations and log(mHG p-value) for negative associations. A monotone relation is observed, supporting the iPAC behavior of the MicMa inferred iPAC genes in the validation cohort. A bar with a red dot in the center is plotted for each blue dot representing 1 standard deviation (SD) of the associations generated by associating 100 random genes from the UNC cohort to the relevant GO term.
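
The signed –log(p-value) used on both axes can be written as a small helper. This is a sketch: the caption does not state the base of the logarithm, so base 10 is an assumption, and the function name is hypothetical.

```python
from math import log10


def signed_neg_log_p(p_value, positive):
    """-log10(p) for positive associations, +log10(p) for negative ones."""
    magnitude = -log10(p_value)  # base 10 is an assumption of this sketch
    return magnitude if positive else -magnitude
```

For example, a positive association with p = 0.001 maps to about 3, and a negative association with the same p-value maps to about −3, so strongly positive and strongly negative associations fall at opposite ends of each axis.
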
