Bioinformatics. 2018 Jul 1;34(13):i404-i411. doi: 10.1093/bioinformatics/bty232.

Driver gene mutations based clustering of tumors: methods and applications

Wensheng Zhang¹, Erik K Flemington², Kun Zhang¹

Affiliations

¹ Department of Computer Science, Bioinformatics Facility of Xavier NIH RCMI Cancer Research Center, Xavier University of Louisiana, New Orleans, LA, USA.
² Department of Pathology, Tulane School of Medicine, Tulane Cancer Center, Tulane University, New Orleans, LA, USA.

PMID: 29950003
PMCID: PMC6022677
DOI: 10.1093/bioinformatics/bty232

Wensheng Zhang et al. Bioinformatics. 2018.

Abstract

Motivation: Somatic mutations in proto-oncogenes and tumor suppressor genes constitute a major category of causal genetic abnormalities in tumor cells. The mutation spectra of thousands of tumors have been generated by The Cancer Genome Atlas (TCGA) and other whole genome (exome) sequencing projects. A promising approach to utilizing these resources for precision medicine is to identify genetic similarity-based sub-types within a cancer type and relate the pinpointed sub-types to the clinical outcomes and pathologic characteristics of patients.

Results: We propose two novel methods, ccpwModel and xGeneModel, for mutation-based clustering of tumors. In the former, binary variables indicating the mutation status of cancer driver genes in tumors and the genes' involvement in the core cancer pathways are treated as the features in the clustering process. In the latter, the functional similarities of putative cancer driver genes and their confidence scores as 'true' driver genes are integrated with the mutation spectra to calculate the genetic distances between tumors. We apply both methods to the TCGA data of 16 cancer types. Compared with state-of-the-art approaches, both methods yield promising results in terms of the associations between the determined tumor clusters and patient race (or survival time). We further extend the analysis to detect mutation-characterized transcriptomic prognostic signatures, which are directly relevant to the etiology of carcinogenesis.
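
The feature construction described for ccpwModel can be illustrated with a minimal Python sketch. This is not the authors' R implementation: the tumor IDs, the gene-to-pathway map, the Jaccard distance and the average-linkage merging rule are all assumptions chosen for illustration, since the abstract does not specify the distance or linkage used.

```python
from itertools import combinations

# Hypothetical toy data: tumor ID -> set of mutated driver genes.
MUTATIONS = {
    "T1": {"TP53", "RB1"},
    "T2": {"TP53"},
    "T3": {"PIK3CA"},
    "T4": {"PIK3CA", "PTEN"},
}
# Hypothetical gene -> pathway membership (illustrative, not the paper's list).
PATHWAYS = {
    "Cell cycle/apoptosis": {"TP53", "RB1"},
    "PI3K": {"PIK3CA", "PTEN"},
}
GENES = sorted(set().union(*MUTATIONS.values()))


def feature_vector(mutated):
    """Binary gene-status features followed by binary pathway-hit features."""
    gene_bits = [int(g in mutated) for g in GENES]
    pathway_bits = [int(bool(mutated & members)) for members in PATHWAYS.values()]
    return gene_bits + pathway_bits


def jaccard_distance(a, b):
    """1 - |intersection| / |union| over the positions set to 1."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 0.0 if either == 0 else 1.0 - both / either


def average_linkage(features, k):
    """Naive agglomerative clustering: merge the closest pair until k clusters remain."""
    clusters = [[name] for name in sorted(features)]

    def dist(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(jaccard_distance(features[a], features[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > k:
        # combinations() yields i < j, so deleting index j never shifts index i.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return sorted(clusters)


FEATURES = {t: feature_vector(m) for t, m in MUTATIONS.items()}
CLUSTERS = average_linkage(FEATURES, k=2)
print(CLUSTERS)
```

In this toy example the pathway indicators pull together tumors that share a pathway hit even when the mutated genes differ, which is the intuition behind adding pathway involvement to the gene-level features.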

Availability and implementation: R code and example data for ccpwModel and xGeneModel can be obtained from http://webusers.xula.edu/kzhang/ISMB2018/ccpw_xGene_software.zip.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

**Fig. 1.**
The flowchart of xGeneModel. In the heatmaps for matrices M1 and M3, grey and black indicate 0 and 1 elements, respectively. In the matrices M2, M4, M5, M6 and the vector W, the element values range from 0 to 1, indicated by a light-grey to black gradient.

**Fig. 2.**
xGeneModel results for BLCA. In all the plots of this figure, the tumor clusters (groups) are consistently represented by red, green, blue and purple. **Top-left**: The dendrogram generated from the mutation-based clustering of tumors. **Top-middle**: The cluster-specific Kaplan–Meier survival curves. The P-value is calculated for the comparison between the aggregate of Cluster-1 (C1) and Cluster-2 (C2) and the aggregate of the other two clusters. Cluster-3 is of main interest: ∼95% of its patients have a mutation in the TP53 gene, and its survival profile is poorer than that of the other clusters, with modest significance (P = 0.09, log-rank test). **Top-right**: The association between the tumor clusters and patient race. AN, BL and WH indicate Asian, black and white Americans, respectively. Beside each race ID is the corresponding number of tumor samples. **Bottom**: The mutation characteristics of individual clusters. The bar length denotes the proportion of tumors (or patients) with at least one mutation in the corresponding gene.

**Fig. 3.**
Prediction strength and robustness of the prognostic signature identified from the result of xGeneModel for bladder cancer. (**A, C and D**) Clustering-analysis-based evaluation of the prediction strength of the signature using the enlarged TCGA dataset, Riester’s dataset and Kim’s dataset, respectively. P-values are calculated for the comparisons between the good (red) and bad (blue) survival clusters. (**B**) The QQ plot for the P-values obtained from 1000 tests. In each test, the SVD-based survival analysis is performed on a randomly sampled dataset that contains 75% of the patients in the enlarged TCGA data.

**Fig. 4.**
ccpwModel results for LIHC. In all the plots of this figure, the tumor clusters (groups) are consistently represented by red, green, blue and purple. **Top-left**: The dendrogram generated from the mutation-based clustering of tumors. **Top-middle**: The cluster-specific Kaplan–Meier survival curves. The P-value is calculated for the comparison between the aggregate of Cluster-2 (C2) and Cluster-4 (C4) and the aggregate of Cluster-1 (C1) and Cluster-3 (C3). **Top-right**: The association between the tumor clusters and patient race. AN, BL and WH indicate Asian, black and white Americans, respectively. Beside each race ID is the corresponding number of tumor samples. **Bottom**: The mutation characteristics of individual clusters. The bar length denotes the proportion of tumors (or patients) with at least one mutation in the member genes of the corresponding cancer pathway. Among the abbreviated terms, ‘Trans.’, ‘Regu.’, ‘Chrom.’, ‘Mod.’, ‘Apo.’, ‘Dam.’ and ‘Con.’ represent ‘Transcription’, ‘Regulation’, ‘Chromatin’, ‘Modification’, ‘Apoptosis’, ‘Damage’ and ‘Control’, respectively.

**Fig. 5.**
Prediction strength and robustness of the prognostic signature identified from the clustering result of ccpwModel for liver cancer. (**A, C and D**) Clustering-analysis-based evaluation of the prediction strength of the signature using the enlarged TCGA dataset, Roessler’s dataset and Villa’s dataset, respectively. (**B**) The QQ plot for the P-values obtained from 1000 tests. In each test, the SVD-based survival analysis is performed on a randomly sampled dataset that contains 75% of the patients in the enlarged TCGA data.

References

1. Dees N.D. et al. (2012) MuSiC: identifying mutational significance in cancer genomes. Genome Res., 22, 1589–1598.
2. Gonzalez-Perez A., Lopez-Bigas N. (2012) Functional impact bias reveals cancer drivers. Nucleic Acids Res., 40, e169.
3. Hofree M. et al. (2013) Network-based stratification of tumor mutations. Nat. Methods, 10, 1108–1115.
4. Kim S. et al. (2015) A mutation profile for top-k patient search exploiting Gene-Ontology and orthogonal non-negative matrix factorization. Bioinformatics, 31, 3653–3659.
5. Kim W.J. et al. (2010) Predictive value of progression-related gene classifier in primary non-muscle invasive bladder cancer. Mol. Cancer, 9, 3.
