BMC Syst Biol. 2011 Aug 1:5:121.
doi: 10.1186/1752-0509-5-121.

Inferring causal genomic alterations in breast cancer using gene expression data

Linh M Tran et al. BMC Syst Biol. 2011.

Abstract

Background: One of the primary objectives in cancer research is to identify causal genomic alterations, such as somatic copy number variation (CNV) and somatic mutations, during tumor development. Many valuable studies lack genomic data to detect CNV; therefore, methods that are able to infer CNVs from gene expression data would help maximize the value of these studies.

Results: We developed a framework for identifying recurrent regions of CNV and distinguishing cancer driver genes from passenger genes within those regions. By inferring CNV regions across many datasets, we identified 109 recurrent amplified/deleted CNV regions. Many of these regions are enriched for genes involved in important processes associated with tumorigenesis and cancer progression. Genes in these recurrent CNV regions were then examined in the context of gene regulatory networks to prioritize putative cancer driver genes. The cancer driver genes uncovered by the framework include not only well-known oncogenes but also a number of novel cancer susceptibility genes validated via siRNA experiments.

Conclusions: To our knowledge, this is the first effort to systematically identify and validate drivers for expression-based CNV regions in breast cancer. The framework, in which wavelet analysis of expression-based copy number alteration is coupled with gene regulatory network analysis, provides a blueprint for leveraging genomic data to identify key regulatory components and gene targets. This integrative approach can be applied to many other large-scale gene expression studies and to other novel types of cancer data, such as next-generation sequencing based expression (RNA-Seq) data and CNV data.

Figures

Figure 1
A framework for integrating wavelet-based CNV inference and gene network analysis. The samples in a given gene expression study are first partitioned into two groups based on phenotypes such as poor versus good outcome, followed by differential expression analysis (t-test) to yield expression scores (ES, t-statistics). Wavelet analysis is then performed on the ES values, ordered by gene chromosomal location, to detect significant consecutive regions (called inferred CNV regions). Using the same gene expression data, a gene regulatory network (Bayesian network) is constructed. Finally, the inferred CNV regions and the Bayesian network are input to the key driver analysis to identify potential cancer driver genes.
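The first step of the caption above (two-group t-test, then ordering by chromosomal location) could be sketched as follows. This is a minimal illustration, not the authors' code: it assumes Welch's unequal-variance form of the t-statistic, and the names `expression_scores`, `order_by_location`, `expr`, `labels` and `positions` are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def expression_scores(expr, labels):
    """Return a Welch t-statistic (ES) per gene.

    expr:   dict mapping gene -> list of expression values, one per sample
    labels: list of 0/1 group labels (e.g. 1 = poor outcome, 0 = good outcome)
    Assumes each group has at least two samples.
    """
    scores = {}
    for gene, values in expr.items():
        group1 = [v for v, g in zip(values, labels) if g == 1]
        group0 = [v for v, g in zip(values, labels) if g == 0]
        se = sqrt(variance(group1) / len(group1) + variance(group0) / len(group0))
        # A gene with no variation in either group gets a score of 0.
        scores[gene] = (mean(group1) - mean(group0)) / se if se > 0 else 0.0
    return scores


def order_by_location(scores, positions):
    """Return (gene, ES) pairs sorted by (chromosome, start position).

    positions: dict mapping gene -> (chromosome, start), an assumed annotation.
    """
    return [(g, scores[g]) for g in sorted(scores, key=lambda g: positions[g])]


expr = {"g1": [5, 6, 7, 1, 2, 3], "g2": [1, 2, 3, 1, 2, 3]}
labels = [1, 1, 1, 0, 0, 0]
positions = {"g1": ("chr8", 200), "g2": ("chr8", 100)}
scores = expression_scores(expr, labels)
ordered = order_by_location(scores, positions)
```

The ordered list of scores is what a wavelet transform would then smooth along each chromosome.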
Figure 2
Outline of the WACE algorithm. For a given gene expression dataset, the samples are classified into two groups based on phenotypes such as poor versus good outcome, and the genes are ordered based on their physical location on chromosomes. Expression scores (ES, t-statistics) for all the genes are computed and then subjected to a wavelet transform to obtain smoothed ES, called neighboring scores (NS). The significance (false discovery rate, FDR) of the NS on each individual chromosome is empirically approximated from its null distribution, obtained by performing the same wavelet transform on "random" ES values computed from randomized samples. A segment containing at least n consecutive positive/negative NS with FDR ≤ 0.01 is defined as an inferred CNV (ICNV) region. ICNV regions from multiple datasets are finally aligned to determine the recurrent regions of CNV.
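The smoothing and region-calling steps in this caption might look roughly like the sketch below. Several pieces are assumptions rather than details from the source: the wavelet is taken to be Haar with detail coefficients zeroed (which reduces to block averaging), the FDR is estimated as the ratio of null to observed exceedances at a cutoff, and all function names are hypothetical.

```python
def haar_smooth(es, level=2):
    """Haar approximation of ES at the given level (block means of size 2**level)."""
    block = 2 ** level
    ns = []
    for i in range(0, len(es), block):
        chunk = es[i:i + block]
        ns.extend([sum(chunk) / len(chunk)] * len(chunk))
    return ns


def empirical_fdr(observed, null_sets, cutoff):
    """Estimate FDR at |NS| >= cutoff from NS profiles of randomized samples."""
    n_obs = sum(abs(v) >= cutoff for v in observed)
    if n_obs == 0:
        return 0.0
    # Average number of null exceedances per randomization.
    n_null = sum(abs(v) >= cutoff for ns in null_sets for v in ns) / len(null_sets)
    return min(1.0, n_null / n_obs)


def call_regions(ns, significant, min_len):
    """Return (start, end_exclusive, sign) for runs of at least min_len
    consecutive significant NS values sharing the same sign."""
    regions, start, sign = [], None, 0
    for i, (value, ok) in enumerate(zip(ns, significant)):
        s = ((value > 0) - (value < 0)) if ok else 0
        if s != sign:
            if sign != 0 and i - start >= min_len:
                regions.append((start, i, sign))
            start, sign = i, s
    if sign != 0 and len(ns) - start >= min_len:
        regions.append((start, len(ns), sign))
    return regions


es = [2, 2, 2, 2, 0, 0, 0, 0, -3, -3, -3, -3]
ns = haar_smooth(es, level=2)
significant = [abs(v) >= 1 for v in ns]  # stand-in for an FDR <= 0.01 mask
regions = call_regions(ns, significant, min_len=3)
```

Here a sign of `1` marks a candidate amplification and `-1` a candidate deletion. In practice the `significant` mask would come from the permutation-based FDR rather than a fixed cutoff.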
Figure 3
Cancer driver genes versus passenger genes. Two candidate genes, C1 and C2, are located in an inferred CNV region and are likely cis-regulated by a CNV. To distinguish potential drivers from passengers, we test whether they can causally regulate other genes. Determining whether a downstream gene G is regulated by C1, by C2, by both, or by neither is equivalent to selecting among four competing models. Because many other genes or factors may affect the expression level of gene G, the models must be evaluated conditional on all other factors, which amounts to a Bayesian network reconstruction process.
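The four-way model comparison for a single downstream gene G can be illustrated with a simple score-based selection. This sketch uses BIC-scored linear regression as a stand-in for the Bayesian network scoring, which the caption does not specify. It ignores all other factors and assumes the predictors are neither constant nor collinear. The names `rss` and `select_model` are hypothetical.

```python
from math import log


def _centered(x):
    m = sum(x) / len(x)
    return [v - m for v in x]


def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def rss(y, predictors):
    """Residual sum of squares for OLS of y on an intercept plus 0, 1 or 2 predictors."""
    yc = _centered(y)
    xs = [_centered(x) for x in predictors]
    syy = _dot(yc, yc)
    if not xs:
        return syy
    if len(xs) == 1:
        return syy - _dot(xs[0], yc) ** 2 / _dot(xs[0], xs[0])
    a, b = xs
    saa, sbb, sab = _dot(a, a), _dot(b, b), _dot(a, b)
    say, sby = _dot(a, yc), _dot(b, yc)
    det = saa * sbb - sab ** 2
    beta_a = (say * sbb - sby * sab) / det  # 2x2 normal equations, Cramer's rule
    beta_b = (sby * saa - say * sab) / det
    return syy - beta_a * say - beta_b * sby


def select_model(g, c1, c2):
    """Score the four candidate parent sets of G by BIC and return the best."""
    n = len(g)
    candidates = {"neither": [], "C1": [c1], "C2": [c2], "C1+C2": [c1, c2]}
    bic = {
        name: n * log(max(rss(g, preds), 1e-12) / n) + (len(preds) + 1) * log(n)
        for name, preds in candidates.items()
    }
    return min(bic, key=bic.get), bic


# Toy data in which G tracks C1 only.
c1 = [0, 1, 2, 3, 4, 5, 6, 7]
c2 = [1, -1, -1, 1, 1, -1, -1, 1]
g = [0.5, 2.5, 3.5, 5.5, 7.5, 9.5, 12.5, 14.5]
best, bic = select_model(g, c1, c2)
```

In this toy case the model with C1 as the sole parent has the lowest BIC, so C1 would be the candidate driver of G and C2 a passenger with respect to G.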
Figure 4
NS profiles on chromosome 8 from the four independent breast cancer studies using WACE. Red lines mark the identified CNV regions. The MYC neighborhood region (121-133 Mb) is detected as amplified in two studies by WACE but in only one study by GACE.
Figure 5
Validation of predicted key drivers by testing the enrichment of siRNA hit signatures in various gene sets. Two siRNA hit signatures, V1 (an across-cell-line signature with 216 genes) and V2 (with 484 genes, the union of the signatures across cell lines and from the individual cell lines), were tested for enrichment in the following gene sets: the genes not on the recurrent regions (nonRR), the genes on the amplified ICNV regions (RR Gain), the non-driver genes (non Drivers) on the amplified regions, the drivers (global and local), the local drivers and the global drivers. The drivers and non-drivers were based on the key driver analysis on the amplified regions and the Bayesian network. A) and B) show the fold enrichment of the V1 and V2 signatures in the gene sets, respectively. The genes on the recurrent ICNV regions are more than twice as likely to be enriched for the siRNA signatures as the genes not on the recurrent regions, while the global drivers have the highest likelihood (7.8 and 5.9), followed by the drivers (6.8 and 5.4), the local drivers (4.8 and 4.4), and the non-drivers (2 and 1.7).
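A fold enrichment of this kind is commonly computed as the fraction of a gene set that falls in the signature, divided by the fraction expected from the background. The caption does not give the exact definition or test used, so the sketch below assumes that common form plus a one-sided hypergeometric tail probability. The function names and the toy gene identifiers are hypothetical.

```python
from math import comb


def fold_enrichment(gene_set, signature, background):
    """Observed over expected overlap between gene_set and signature."""
    background = set(background)
    gene_set = set(gene_set) & background
    signature = set(signature) & background
    hits = len(gene_set & signature)
    # (hits / |set|) / (|signature| / |background|), written as one division.
    return hits * len(background) / (len(gene_set) * len(signature))


def hypergeom_pvalue(k, K, n, N):
    """P(X >= k) when drawing n genes from N, of which K are signature genes."""
    numerator = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return numerator / comb(N, n)


background = [f"g{i}" for i in range(100)]
signature = background[:10]                      # toy siRNA hit signature
gene_set = background[:5] + background[50:55]    # toy driver set with 5 hits
fold = fold_enrichment(gene_set, signature, background)
pval = hypergeom_pvalue(5, K=10, n=10, N=100)
```

With 5 of 10 set members in a signature that covers 10% of the background, the fold enrichment is 5.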
Figure 6
A regulatory network for the genes on the amplified recurrent ICNV regions. The key driver analysis was applied to the genes on the regions and the combined Bayesian network to identify potential key drivers. The largest nodes in the network are the global drivers and the medium size nodes are local drivers while the non-drivers are the smallest nodes. The genes found in the siRNA signature V1 are highlighted in green.
