Bioinformatics. 2018 Jul 1;34(13):i404-i411.
doi: 10.1093/bioinformatics/bty232.

Driver gene mutations based clustering of tumors: methods and applications

Wensheng Zhang et al. Bioinformatics. 2018.

Abstract

Motivation: Somatic mutations in proto-oncogenes and tumor suppressor genes constitute a major category of causal genetic abnormalities in tumor cells. The mutation spectra of thousands of tumors have been generated by The Cancer Genome Atlas (TCGA) and other whole genome (exome) sequencing projects. A promising approach to utilizing these resources for precision medicine is to identify genetic similarity-based sub-types within a cancer type and relate the pinpointed sub-types to the clinical outcomes and pathologic characteristics of patients.

Results: We propose two novel methods, ccpwModel and xGeneModel, for mutation-based clustering of tumors. In ccpwModel, binary variables indicating the mutation status of cancer driver genes in tumors and the genes' involvement in the core cancer pathways are treated as the features in the clustering process. In xGeneModel, the functional similarities of putative cancer driver genes and their confidence scores as the 'true' driver genes are integrated with the mutation spectra to calculate the genetic distances between tumors. We apply both methods to the TCGA data of 16 cancer types. Compared with state-of-the-art approaches, both methods yield promising results with respect to the associations between the determined tumor clusters and patient race (or survival time). We further extend the analysis to detect mutation-characterized transcriptomic prognostic signatures, which are directly relevant to the etiology of carcinogenesis.
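The idea of clustering tumors on binary gene-level and pathway-level mutation indicators can be illustrated with a minimal Python sketch. This is not the authors' ccpwModel implementation (their code is in R): the gene and pathway names, the toy tumors, the Jaccard distance and the average-linkage merging rule are all assumptions chosen only to make the example small and self-contained.

```python
from itertools import combinations

# Hypothetical toy inputs: placeholder gene and pathway names, not real annotations.
genes = ["geneA", "geneB", "geneC", "geneD"]
pathways = {"pathwayX": {"geneA", "geneB"}, "pathwayY": {"geneC", "geneD"}}
tumors = {
    "T1": {"geneA"},
    "T2": {"geneA", "geneB"},
    "T3": {"geneC"},
    "T4": {"geneC", "geneD"},
}

def binary_features(mutated, genes, pathways):
    """Return a 0/1 vector: one entry per gene, then one entry per pathway."""
    gene_part = [1 if g in mutated else 0 for g in genes]
    # Assumption: a pathway is flagged when at least one member gene is mutated.
    pathway_part = [1 if mutated & members else 0 for members in pathways.values()]
    return gene_part + pathway_part

def jaccard_distance(u, v):
    """1 - |intersection| / |union| over the positions set to 1."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either

def average_linkage(features, k):
    """Merge the two closest clusters until k remain; returns lists of tumor IDs."""
    clusters = [[name] for name in features]

    def dist(c1, c2):
        pairs = [(a, b) for a in c1 for b in c2]
        total = sum(jaccard_distance(features[a], features[b]) for a, b in pairs)
        return total / len(pairs)

    while len(clusters) > k:
        # combinations yields i < j, so deleting index j leaves index i valid.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

features = {name: binary_features(m, genes, pathways) for name, m in tumors.items()}
clusters = average_linkage(features, k=2)
print(clusters)
```

In this toy setting the pathway indicators pull together tumors that carry mutations in different genes of the same pathway, which is the intuition behind adding pathway involvement to the gene-level features.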

Availability and implementation: R code and example data for ccpwModel and xGeneModel can be obtained from http://webusers.xula.edu/kzhang/ISMB2018/ccpw_xGene_software.zip.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.
The flowchart of xGeneModel. In the heatmaps for matrices M1 and M3, grey and black colors indicate 0 and 1 elements, respectively. In the matrices M2, M4, M5, M6 and the vector W, the element values range from 0 to 1, indicated by a light-grey to black gradient
Fig. 2.
xGeneModel results for BLCA. In all plots of this figure, the tumor clusters (groups) are consistently represented by red, green, blue and purple. Top-left: The dendrogram generated from the mutation-based clustering of tumors. Top-middle: The cluster-specific Kaplan–Meier survival curves. The P-value is calculated for the comparison between the aggregate of Cluster-1 (C1) and Cluster-2 (C2) and the aggregate of the other two clusters. Cluster-3 is of main interest: ∼95% of its patients have a mutation in the TP53 gene, and its survival profile is poorer than that of the other clusters with modest significance (P = 0.09, log-rank test). Top-right: The association between the tumor clusters and patient race. AN, BL and WH indicate Asian, black and white Americans, respectively. Beside each race ID is the corresponding number of tumor samples. Bottom: The mutation characteristics of individual clusters. The bar length denotes the proportion of tumors (or patients) with at least one mutation in the corresponding gene.
Fig. 3.
Prediction strength and robustness of the prognostic signature identified from the result of xGeneModel for bladder cancer. (A, C and D) Clustering-analysis based evaluation of the prediction strength of the signature using the enlarged TCGA dataset, Riester’s dataset and Kim’s dataset, respectively. P-values are calculated for the comparisons between the good (red) and bad (blue) survival clusters. (B) The QQ plot for the P-values obtained from 1000 tests. In each test, the SVD-based survival analysis is performed on a randomly sampled dataset that contains 75% of the patients in the enlarged TCGA data.
Fig. 4.
ccpwModel results for LIHC. In all the plots of this figure, the tumor clusters (groups) are consistently represented by red, green, blue and purple. Top-left: The dendrogram generated from the mutation-based clustering of tumors. Top-middle: The cluster-specific Kaplan–Meier survival curves. The P-value is calculated for the comparison between the aggregate of Cluster-2 and Cluster-4 and the aggregate of Cluster-1 (C1) and Cluster-3 (C3). Top-right: The association between tumor clusters and patient races. AN, BL and WH indicate Asian, black and white Americans, respectively. Beside each race ID is the corresponding number of tumor samples. Bottom: The mutation characteristics of individual clusters. The bar length denotes the proportion of tumors (patients) with at least one mutation in the member genes of the corresponding cancer pathway. Among the abbreviated terms, ‘Trans.’, ‘Regu.’, ‘Chrom.’, ‘Mod.’, ‘Apo.’, ‘Dam.’ and ‘Con.’ represent ‘Transcription’, ‘Regulation’, ‘Chromatin’, ‘Modification’, ‘Apoptosis’, ‘Damage’ and ‘Control’, respectively
Fig. 5.
Prediction strength and robustness of the prognostic signature identified from the clustering result of ccpwModel for liver cancer. (A, C and D) Clustering analysis based evaluation of the prediction strength of the signature using the enlarged TCGA dataset, Roessler’s dataset and Villa’s dataset, respectively. (B) The QQ plot for the P-values obtained from 1000 tests. In each test, the SVD-based survival analysis is performed on a randomly sampled dataset that contains 75% of the patients in the enlarged TCGA data

References

    1. Dees N.D. et al. (2012) MuSiC: identifying mutational significance in cancer genomes. Genome Res., 22, 1589–1598.
    2. Gonzalez-Perez A., Lopez-Bigas N. (2012) Functional impact bias reveals cancer drivers. Nucleic Acids Res., 40, e169.
    3. Hofree M. et al. (2013) Network-based stratification of tumor mutations. Nat. Methods, 10, 1108–1115.
    4. Kim S. et al. (2015) A mutation profile for top-k patient search exploiting Gene-Ontology and orthogonal non-negative matrix factorization. Bioinformatics, 31, 3653–3659.
    5. Kim W.J. et al. (2010) Predictive value of progression-related gene classifier in primary non-muscle invasive bladder cancer. Mol. Cancer, 9, 3.
