Bioinformatics. 2015 Jan 1;31(1):84-93.
doi: 10.1093/bioinformatics/btu603. Epub 2014 Sep 5.

Snowball: resampling combined with distance-based regression to discover transcriptional consequences of a driver mutation

Yaomin Xu et al. Bioinformatics. 2015.

Abstract

Motivation: Large-scale cancer genomic studies, such as The Cancer Genome Atlas (TCGA), have profiled multidimensional genomic data, including mutation and expression profiles, across a variety of cancer types to uncover the molecular mechanisms of carcinogenesis. More than a hundred driver mutations that confer a cell growth advantage have been characterized. However, how driver mutations regulate the transcriptome to affect cellular functions remains largely unexplored. Differential analysis of gene expression with respect to a driver mutation in patient samples could provide new insights into how driver mutations dysregulate the tumor genome and could help in developing personalized treatment strategies.

Results: Here, we introduce Snowball, a highly sensitive statistical method for identifying transcriptional signatures affected by a recurrent driver mutation. Snowball combines resampling with a distance-based regression framework to assign each gene a robust ranking index based on its aggregated association with the presence of the mutation, and then selects the top significant genes for downstream data analyses or experiments. Applying Snowball to both synthetic and TCGA data, we demonstrated that it outperforms standard methods (random forests and LIMMA-based analyses) and provides more accurate inferences about the functional effects and transcriptional dysregulation of driver mutations.
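The final ranking-and-selection step can be illustrated with a minimal Python sketch. It assumes the aggregated association scores are already computed (one per gene, higher meaning a stronger association with the mutation) and, as a swapped-in stand-in for the paper's robust distance measurement, ranks genes by a median/MAD robust z-score. The function name `select_outstanding_genes` and the cutoff of 3 are hypothetical choices, not part of the DESnowball package.

```python
import numpy as np


def select_outstanding_genes(genes, scores, cutoff=3.0):
    """Rank genes by a robust z-score of their aggregated association
    scores and keep those above `cutoff` (a hypothetical threshold).

    Median/MAD scaling is an assumed stand-in for the paper's robust distance.
    """
    scores = np.asarray(scores, dtype=float)
    centre = np.median(scores)
    # MAD scaled by 1.4826 so that it estimates the SD under normality
    spread = 1.4826 * np.median(np.abs(scores - centre))
    if spread == 0:
        raise ValueError("scores have zero MAD; robust z-scores are undefined")
    robust_z = (scores - centre) / spread
    # One-sided: only unusually large association scores are of interest
    order = np.argsort(-robust_z)
    return [(genes[i], float(robust_z[i])) for i in order if robust_z[i] > cutoff]
```

Using the median and MAD rather than the mean and standard deviation keeps a handful of strongly affected genes from inflating the genome-wide background they are compared against.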

Availability and implementation: R package and source code are available from CRAN at http://cran.r-project.org/web/packages/DESnowball, and also available at http://bioinfo.mc.vanderbilt.edu/DESnowball/.

Figures

Fig. 1.
Schematic demonstration of the difference between gene-by-gene and distance-based analyses. Gene expression profiles of multiple cancer patients are measured in two mutation groups (mutated or wild type); in addition, the patients have different disease predispositions (d1 and d2), so that the target genes perturbed by the mutation show different expression profiles between the mutation groups. Although all genes (g1–g5) clearly distinguish the mutated and wild-type groups based on their co-expression profiles, a gene-by-gene analysis would miss most of them (g1, g3 and g5) because of their small marginal differences between the mutated and wild-type groups.
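The distance-based side of this contrast can be sketched in Python as the pseudo-F statistic of a generic distance-based regression: sample-to-sample distances computed over a whole gene set are Gower-centred and split into a part explained by mutation status and a residual part. Euclidean distance, a two-group design and the name `pseudo_f` are assumptions for illustration; the exact statistic and distance used by Snowball may differ.

```python
import numpy as np


def pseudo_f(expr, groups):
    """Distance-based regression pseudo-F for a gene set (sketch).

    expr: (n_samples, n_genes) expression values for the gene set.
    groups: length-n sequence of 0/1 mutation labels.
    """
    expr = np.asarray(expr, dtype=float)
    n = expr.shape[0]
    # Pairwise Euclidean distances between samples (assumed metric)
    diff = expr[:, None, :] - expr[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # Gower-centred matrix G = C A C, with A = -D^2 / 2 and C = I - J/n
    a = -0.5 * d ** 2
    c = np.eye(n) - np.ones((n, n)) / n
    g = c @ a @ c
    # Hat matrix for the design [intercept, mutation indicator]
    x = np.column_stack([np.ones(n), np.asarray(groups, dtype=float)])
    h = x @ np.linalg.pinv(x.T @ x) @ x.T
    m = x.shape[1]
    resid = np.eye(n) - h
    # Explained vs residual variation, each divided by its degrees of freedom
    explained = np.trace(h @ g @ h) / (m - 1)
    residual = np.trace(resid @ g @ resid) / (n - m)
    return explained / residual
```

With Euclidean distance and a single gene, the pseudo-F reduces to the one-way ANOVA F statistic, which is a convenient sanity check: `pseudo_f([[1], [2], [3], [4], [6], [8]], [0, 0, 0, 1, 1, 1])` returns approximately 9.6. Passing several genes at once instead scores their joint separation of the two groups.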
Fig. 2.
Snowball workflow. First, a whole-genome gene expression matrix, combined with a known driver mutation measurement on a patient cohort, is resampled along the gene dimension to generate B matrices, each with a fixed number d of genes and each containing a specific gene Xi. Second, the resulting matrices are resampled along the sample dimension to obtain an equal number of samples within each group. Third, distance-based regression is applied to evaluate the association of the genes in each subsampled matrix with the mutation status of the corresponding patients, and the resulting association scores of gene Xi are aggregated into the association score Jn. Finally, robust distance measurements are calculated from the aggregated scores Jn, and genes whose Jn values are significantly more extreme than the genome background are selected.
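Steps one to three can be sketched in Python under several stated assumptions: Euclidean distance, a compact pseudo-F as the distance-based regression score, balancing by downsampling the larger group, and, as a hypothetical simplification, crediting each subset's score to its focal gene Xi and averaging over the B subsets to stand in for Jn. The names `snowball_scores` and `_pseudo_f` are illustrative only and do not describe the DESnowball API; the published method may attribute association to individual genes differently.

```python
import numpy as np


def _pseudo_f(expr, labels):
    """Distance-based regression pseudo-F (Euclidean distance assumed)."""
    n = expr.shape[0]
    sq = ((expr[:, None, :] - expr[None, :, :]) ** 2).sum(axis=-1)
    c = np.eye(n) - 1.0 / n                      # centring matrix I - J/n
    g = c @ (-0.5 * sq) @ c                      # Gower-centred matrix
    x = np.column_stack([np.ones(n), labels])    # intercept + mutation status
    h = x @ np.linalg.pinv(x.T @ x) @ x.T        # hat matrix
    r = np.eye(n) - h
    between = np.trace(h @ g @ h)                # 1 df for two groups
    within = np.trace(r @ g @ r) / (n - 2)
    return between / within


def snowball_scores(expr, mutated, d=10, b=200, seed=0):
    """Hypothetical sketch of the Fig. 2 workflow, steps 1-3.

    expr: (n_samples, n_genes) matrix; mutated: 0/1 label per sample.
    Returns, for each gene, the mean pseudo-F over its B subsets,
    used here as an assumed stand-in for the aggregated score Jn.
    """
    rng = np.random.default_rng(seed)
    expr = np.asarray(expr, dtype=float)
    mutated = np.asarray(mutated)
    n_genes = expr.shape[1]
    if not 1 <= d <= n_genes:
        raise ValueError("d must be between 1 and the number of genes")
    idx_mut = np.flatnonzero(mutated == 1)
    idx_wt = np.flatnonzero(mutated == 0)
    k = min(len(idx_mut), len(idx_wt))           # per-group size after balancing
    scores = np.zeros(n_genes)
    for gene in range(n_genes):
        others = np.delete(np.arange(n_genes), gene)
        total = 0.0
        for _ in range(b):
            # Step 1: d genes drawn at random, always including `gene` (Xi)
            subset = np.append(rng.choice(others, size=d - 1, replace=False), gene)
            # Step 2: downsample both groups to k samples each
            rows = np.concatenate([rng.choice(idx_mut, size=k, replace=False),
                                   rng.choice(idx_wt, size=k, replace=False)])
            labels = mutated[rows].astype(float)
            # Step 3: score the subset and credit the score to the focal gene
            total += _pseudo_f(expr[np.ix_(rows, subset)], labels)
        scores[gene] = total / b
    return scores
```

Step four would then be applied to the returned vector, for instance by flagging genes whose scores are outliers relative to a robust estimate of the genome-wide centre and spread. The loop evaluates n_genes × B subsets, so B and d trade runtime against the stability of the averaged scores.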
Fig. 3.
Comparison of the performance of Snowball, random forests and LIMMA-based approaches on a simulated expression dataset. The figure shows the ROC curves for scenario 5 in Table 1.
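Because the simulated data come with known target genes, each method's ranking can be summarised by the area under its ROC curve. The sketch below computes the AUC through its Mann–Whitney interpretation: the fraction of (true target, non-target) gene pairs that the method ranks in the right order. It assumes binary ground-truth labels and that a higher score means a gene is more likely affected; the function name `roc_auc` is hypothetical and this is not the authors' evaluation code.

```python
import numpy as np


def roc_auc(scores, is_true_target):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    scores: higher means a gene is ranked as more likely affected.
    is_true_target: 1 for genes simulated as truly affected, else 0.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(is_true_target).astype(bool)
    pos, neg = scores[labels], scores[~labels]
    # Count correctly ordered (positive, negative) pairs; ties count 1/2
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

For scores `[0.9, 0.8, 0.3, 0.1]` with true-target labels `[1, 0, 1, 0]`, three of the four positive/negative pairs are ordered correctly, giving an AUC of 0.75.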
Fig. 4.
Comparison of results from Snowball, random forests and LIMMA on the case study data. (A) Venn diagram of the top 400 genes detected by the three approaches. (B) Overlap of the top 1000 genes with the BRAF knockdown experiment results.
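The overlap summaries in this figure reduce to set operations on each method's top-ranked genes. A minimal sketch, assuming each method's output is a ranked gene list (the function name, method names and gene identifiers are hypothetical), computes the Venn-diagram region sizes for the top k genes:

```python
def top_k_overlap(rankings, k):
    """Venn-diagram region sizes for the top-k genes of several methods.

    rankings: dict mapping method name -> list of genes, best first.
    Returns a dict mapping each non-empty combination of methods (as a
    frozenset) to the number of genes found by exactly that combination.
    """
    tops = {name: set(genes[:k]) for name, genes in rankings.items()}
    regions = {}
    for gene in set().union(*tops.values()):
        # The set of methods whose top k contains this gene
        members = frozenset(name for name, top in tops.items() if gene in top)
        regions[members] = regions.get(members, 0) + 1
    return regions
```

Setting k to 400 corresponds to panel A. For panel B, each method's top-1000 set would instead be intersected with the gene list from the BRAF knockdown experiment, e.g. `len(set(genes[:1000]) & knockdown_genes)`, where `knockdown_genes` is a hypothetical set.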
Fig. 5.
Multiple driver mutations in TCGA melanoma metastasis samples and Snowball application. (A) Driver mutation profiles of BRAF, NRAS and CDKN2A. Red, blue and green indicate amplification, deletion and mutation, respectively. (B) Comparison of the Snowball results for the three driver mutations.
