BMC Syst Biol. 2011 Aug 1;5:121.

doi: 10.1186/1752-0509-5-121.

Inferring causal genomic alterations in breast cancer using gene expression data

Linh M Tran¹, Bin Zhang, Zhan Zhang, Chunsheng Zhang, Tao Xie, John R Lamb, Hongyue Dai, Eric E Schadt, Jun Zhu

PMID: 21806811
PMCID: PMC3162519
DOI: 10.1186/1752-0509-5-121

Linh M Tran et al. BMC Syst Biol. 2011.

Affiliation

¹ Sage Bionetworks, Seattle, WA 98109, USA.

Abstract

Background: One of the primary objectives in cancer research is to identify causal genomic alterations, such as somatic copy number variation (CNV) and somatic mutations, during tumor development. Many valuable studies lack genomic data to detect CNV; therefore, methods that are able to infer CNVs from gene expression data would help maximize the value of these studies.

Results: We developed a framework for identifying recurrent regions of CNV and distinguishing the cancer driver genes from the passenger genes in those regions. By inferring CNV regions across many datasets, we identified 109 recurrent amplified/deleted CNV regions. Many of these regions are enriched for genes involved in important processes associated with tumorigenesis and cancer progression. Genes in these recurrent CNV regions were then examined in the context of gene regulatory networks to prioritize putative cancer driver genes. The cancer driver genes uncovered by the framework include not only well-known oncogenes but also a number of novel cancer susceptibility genes validated via siRNA experiments.

Conclusions: To our knowledge, this is the first effort to systematically identify and validate drivers for expression-based CNV regions in breast cancer. The framework, which couples wavelet analysis of expression-based copy number alteration with gene regulatory network analysis, provides a blueprint for leveraging genomic data to identify key regulatory components and gene targets. This integrative approach can be applied to many other large-scale gene expression studies and to other novel types of cancer data, such as next-generation sequencing based expression (RNA-Seq) data and CNV data.

Figures

**Figure 1**
**A framework for integrating wavelet based CNV inference and gene network analysis**. The samples in a given gene expression study are first partitioned into two groups based on phenotypes such as poor versus good outcome, followed by differential expression analysis (t-test) to yield expression scores (ES, t-statistics). Wavelet analysis is then performed on the ES, ordered by gene chromosomal location, to detect significant consecutive regions (called inferred CNV regions). Using the same gene expression data, a gene regulatory network (Bayesian network) is constructed. Finally, the inferred CNV regions and the Bayesian network are input to the key driver analysis to identify potential cancer driver genes.

**Figure 2**
**Outline of the WACE algorithm**. For a given gene expression dataset, the samples are classified into two groups based on phenotypes such as poor versus good outcome, and the genes are ordered by their physical location on the chromosomes. Expression scores (ES, t-statistics) are computed for all genes and then subjected to a wavelet transform to obtain smoothed ES, called neighboring scores (NS). The significance (false discovery rate, FDR) of the NS on each individual chromosome is empirically approximated from its null distribution, obtained by performing the same wavelet transform on "random" ES computed from randomized samples. A segment containing at least n consecutive positive/negative NS with FDR ≤ 0.01 is defined as an inferred CNV (ICNV) region. ICNV regions from multiple datasets are finally aligned to determine the recurrent regions of CNV.
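The steps in the Figure 2 caption can be sketched for a single chromosome as follows. This is a minimal illustration under stated assumptions, not the authors' WACE implementation: the Haar wavelet, the number of smoothing levels, the Welch form of the t-statistic, the tail-count FDR estimate, and every function name (`haar_smooth`, `neighboring_scores`, `empirical_fdr`, `find_regions`, `wace_sketch`) are hypothetical choices made for brevity.

```python
import random
from math import sqrt
from statistics import mean, variance

def t_stat(a, b):
    """Welch t-statistic between two sample groups (assumed form of the ES)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def haar_smooth(values, levels=2):
    """Haar wavelet smoothing: drop the detail coefficients of the finest
    `levels` scales and reconstruct from the approximation coefficients."""
    block = 2 ** levels
    approx = list(values) + [values[-1]] * (-len(values) % block)  # pad to a full block
    for _ in range(levels):   # forward transform, keeping pairwise averages only
        approx = [(approx[i] + approx[i + 1]) / 2 for i in range(0, len(approx), 2)]
    for _ in range(levels):   # inverse transform with the details set to zero
        approx = [a for a in approx for _ in (0, 1)]
    return approx[:len(values)]

def neighboring_scores(expr, labels, levels=2):
    """expr: one list of sample values per gene, genes in chromosomal order.
    labels: one boolean per sample (True for one phenotype group)."""
    es = []
    for row in expr:
        group_a = [v for v, flag in zip(row, labels) if flag]
        group_b = [v for v, flag in zip(row, labels) if not flag]
        es.append(t_stat(group_a, group_b))
    return haar_smooth(es, levels)

def empirical_fdr(observed, null_sets):
    """Per-score FDR estimate: mean count of null |NS| at least as extreme,
    divided by the count of observed |NS| at least as extreme."""
    fdr = []
    for x in observed:
        n_obs = sum(abs(o) >= abs(x) for o in observed)
        n_null = mean(sum(abs(v) >= abs(x) for v in ns) for ns in null_sets)
        fdr.append(min(1.0, n_null / n_obs))
    return fdr

def find_regions(ns, fdr, min_len=3, alpha=0.01):
    """Return (start, end, sign) for runs of at least `min_len` consecutive
    same-sign NS whose FDR is at most `alpha`."""
    regions, start = [], None
    for i in range(len(ns) + 1):
        ok = i < len(ns) and fdr[i] <= alpha and ns[i] != 0
        same = ok and start is not None and (ns[i] > 0) == (ns[start] > 0)
        if start is not None and not same:
            if i - start >= min_len:
                regions.append((start, i - 1, 1 if ns[start] > 0 else -1))
            start = None
        if ok and start is None:
            start = i
    return regions

def wace_sketch(expr, labels, n_perm=100, levels=2, min_len=3, alpha=0.01, seed=0):
    """Observed NS, permutation null from shuffled sample labels, then region calls."""
    rng = random.Random(seed)
    ns = neighboring_scores(expr, labels, levels)
    null_sets = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        null_sets.append(neighboring_scores(expr, shuffled, levels))
    return find_regions(ns, empirical_fdr(ns, null_sets), min_len, alpha)
```

Calling `wace_sketch(expr, labels)` on one chromosome's expression matrix returns index ranges of candidate regions with their sign. The caption's minimum run length n is exposed here as `min_len`; its default of 3 is a placeholder, since the caption does not give a value.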

**Figure 3**
**Cancer driver genes versus passenger genes**. Two candidate genes, C1 and C2, are located in an inferred CNV region and are likely cis-regulated by a CNV. To distinguish potential drivers from passengers, we test whether they can causally regulate other genes. Determining whether a downstream gene G is regulated by C1, by C2, by both, or by neither is equivalent to selecting among four competing models. Given that many other genes or factors may affect the expression level of gene G, models conditioned on all other factors must be evaluated, i.e., a Bayesian network reconstruction process is required.
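The four-way model choice in the Figure 3 caption can be illustrated with a small scoring sketch. It is a deliberate simplification and an assumption on our part: each model is a linear-Gaussian regression of G on a subset of {C1, C2}, scored with the Bayesian information criterion (BIC), without the conditioning on all other factors that a full Bayesian network reconstruction performs. The names `rss`, `bic`, and `best_model` are hypothetical.

```python
from math import log

def _center(x):
    m = sum(x) / len(x)
    return [v - m for v in x]

def _dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def rss(y, predictors):
    """Residual sum of squares of an ordinary least-squares fit of y on
    zero, one, or two predictors, with an intercept."""
    y = _center(y)
    xs = [_center(x) for x in predictors]
    if not xs:
        return _dot(y, y)
    if len(xs) == 1:
        slope = _dot(xs[0], y) / _dot(xs[0], xs[0])
        res = [yi - slope * xi for yi, xi in zip(y, xs[0])]
        return _dot(res, res)
    x1, x2 = xs
    s11, s22, s12 = _dot(x1, x1), _dot(x2, x2), _dot(x1, x2)
    det = s11 * s22 - s12 ** 2            # zero if the predictors are collinear
    b1 = (s22 * _dot(x1, y) - s12 * _dot(x2, y)) / det
    b2 = (s11 * _dot(x2, y) - s12 * _dot(x1, y)) / det
    res = [yi - b1 * u - b2 * v for yi, u, v in zip(y, x1, x2)]
    return _dot(res, res)

def bic(y, predictors):
    """Gaussian BIC; parameters are the intercept, the slopes, and the noise variance."""
    n, k = len(y), len(predictors) + 2
    return n * log(rss(y, predictors) / n) + k * log(n)

def best_model(g, c1, c2):
    """Score the four competing parent sets for G and return the lowest-BIC one."""
    models = {"neither": [], "C1": [c1], "C2": [c2], "C1+C2": [c1, c2]}
    scores = {name: bic(g, preds) for name, preds in models.items()}
    return min(scores, key=scores.get), scores
```

A lower BIC indicates a better trade-off between fit and complexity, so a candidate that adds no explanatory power for G beyond the other candidate is penalized. The sketch assumes noisy data: a perfect fit (zero residual) or a constant predictor would make the score undefined.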

**Figure 4**
**NS profiles on chromosome 8 from the four independent breast cancer studies using WACE**. Red lines mark the identified CNV regions. The *MYC* neighborhood region (121-133 Mb) is detected as amplified in two studies by WACE but in only one study by GACE.

**Figure 5**
**Validation of predicted key drivers by testing the enrichment of siRNA hit signatures in various gene sets**. Two siRNA hit signatures, V1 (an across cell line signature with 216 genes) and V2 (with 484 genes as a union of the signatures across cell lines and from the individual cell lines), were tested for enrichment in the following gene sets: the genes not on the recurrent regions (nonRR), the genes on the amplified ICNV regions (RR Gain), the non-driver genes (non Drivers) on the amplified regions, the drivers (global and local), the local drivers, and the global drivers. The drivers and non-drivers were determined by the key driver analysis on the amplified regions and the Bayesian network. A) and B) show the fold enrichment of the V1 and V2 signatures in the gene sets, respectively. The genes on the recurrent ICNV regions are more than twice as likely to be enriched for the siRNA signatures as the genes not on the recurrent regions, while the global drivers have the highest enrichment (7.8 and 5.9), followed by the drivers (6.8 and 5.4), the local drivers (4.8 and 4.4), and the non-drivers (2 and 1.7).
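Fold enrichment of a signature in a gene set, as reported in Figure 5, can be computed along the following lines. This is a hedged sketch: the caption does not state the background gene universe or the significance test, so the `universe` argument and the hypergeometric tail probability are assumptions, and the function names are hypothetical.

```python
from math import comb

def overlap_counts(gene_set, signature, universe):
    """Restrict both sets to the universe and return (N, K, n, k):
    universe size, signature size, gene-set size, and overlap."""
    universe = set(universe)
    gene_set = set(gene_set) & universe
    signature = set(signature) & universe
    return len(universe), len(signature), len(gene_set), len(gene_set & signature)

def fold_enrichment(gene_set, signature, universe):
    """Observed hit fraction in the gene set divided by the background hit fraction."""
    N, K, n, k = overlap_counts(gene_set, signature, universe)
    return (k * N) / (n * K)

def hypergeom_pvalue(gene_set, signature, universe):
    """Probability of an overlap at least as large as observed when the
    gene set is drawn at random from the universe."""
    N, K, n, k = overlap_counts(gene_set, signature, universe)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)
```

For example, with a universe of 100 genes, a 10-gene signature, and a 10-gene set that contains 5 signature genes, the observed hit fraction is 0.5 against a background of 0.1, giving a fold enrichment of 5.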

**Figure 6**
**A regulatory network for the genes on the amplified recurrent ICNV regions**. The key driver analysis was applied to the genes on the regions and the combined Bayesian network to identify potential key drivers. The largest nodes in the network are the global drivers and the medium size nodes are local drivers while the non-drivers are the smallest nodes. The genes found in the siRNA signature V1 are highlighted in green.

Cited by

Decrease of mRNA Editing after Spinal Cord Injury is Caused by Down-regulation of ADAR2 that is Triggered by Inflammatory Response.
Di Narzo AF, Kozlenkov A, Ge Y, Zhang B, Sanelli L, May Z, Li Y, Fouad K, Cardozo C, Koonin EV, Bennett DJ, Dracheva S. Sci Rep. 2015 Jul 30;5:12615. doi: 10.1038/srep12615. PMID: 26223940. Free PMC article.
Applying Expression Profile Similarity for Discovery of Patient-Specific Functional Mutations.
Meng G. High Throughput. 2018 Feb 22;7(1):6. doi: 10.3390/ht7010006. PMID: 29485617. Free PMC article.
Identifying high-risk multiple myeloma patients: A novel approach using a clonal gene signature.
Li JR, Wang C, Cheng C. Int J Cancer. 2024 Nov 1;155(9):1684-1695. doi: 10.1002/ijc.35057. Epub 2024 Jun 14. PMID: 38874435.
Reverse engineering biomolecular systems using -omic data: challenges, progress and opportunities.
Quo CF, Kaddi C, Phan JH, Zollanvari A, Xu M, Wang MD, Alterovitz G. Brief Bioinform. 2012 Jul;13(4):430-45. doi: 10.1093/bib/bbs026. PMID: 22833495. Free PMC article.
Systems analysis of eleven rodent disease models reveals an inflammatome signature and key drivers.
Wang IM, Zhang B, Yang X, Zhu J, Stepaniants S, Zhang C, Meng Q, Peters M, He Y, Ni C, Slipetz D, Crackower MA, Houshyar H, Tan CM, Asante-Appiah E, O'Neill G, Luo MJ, Thieringer R, Yuan J, Chiu CS, Lum PY, Lamb J, Boie Y, Wilkinson HA, Schadt EE, Dai H, Roberts C. Mol Syst Biol. 2012 Jul 17;8:594. doi: 10.1038/msb.2012.24. PMID: 22806142. Free PMC article.
