Proc Natl Acad Sci U S A. 2016 May 10;113(19):5394-9.
doi: 10.1073/pnas.1601591113. Epub 2016 Apr 26.

Big data visualization identifies the multidimensional molecular landscape of human gliomas

Hamid Bolouri et al. Proc Natl Acad Sci U S A. 2016.

Abstract

We show that visualizing large molecular and clinical datasets enables discovery of molecularly defined categories of highly similar patients. We generated a series of linked 2D sample similarity plots using genome-wide single nucleotide alterations (SNAs), copy number alterations (CNAs), DNA methylation, and RNA expression data. Applying this approach to the combined glioblastoma (GBM) and lower grade glioma (LGG) The Cancer Genome Atlas datasets, we find that combined CNA/SNA data divide gliomas into three highly distinct molecular groups. The mutations commonly used in clinical evaluation of these tumors are regionally distributed in these plots. One of the three groups is a mixture of GBM and LGG that shows similar methylation and survival characteristics to GBM. Altogether, our approach identifies eight molecularly defined glioma groups with distinct sequence/expression/methylation profiles. Importantly, we show that regionally clustered samples are enriched for specific drug targets.

Keywords: big data; biomarkers; glioma; precision medicine; visualization.

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Fig. 1.
Sample similarity plots reveal three distinct subtypes of gliomas. (A) Two-dimensional MDS projection of sample similarities based on combined genome-wide sample SNA and CNA profiles. Three distinct sample clusters stand out. (B) Same as A, but with samples colored by their histologic subtype. The cluster on the Left is primarily GBMs, whereas the Top Right cluster is composed mostly of astrocytomas and oligoastrocytomas and the Bottom Right cluster is predominantly oligodendrogliomas. (C) Sample similarity visualized using a collection of ∼1,500 DNA methylation probes distinguishing CIMP versus non-CIMP tumors. (D) Coloring of all samples with mutations of IDH1/2, codeletions of chromosome arms 1p and 19q, and GBM samples previously shown to be G-CIMP shows that samples in the Left cluster are G-CIMP, whereas samples in the Right cluster are non-CIMP. (E) CIMP GBM samples all fall within or near the astro sample cluster in the SNA/CNA plot (B). Non-CIMP LGGs are genomically more like non-CIMP GBMs than like CIMP-LGGs. (F) Kaplan–Meier survival plot shows that patients with non-CIMP LGGs have much shorter survival than those with CIMP-LGGs (P value ∼0).
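As an illustration of the kind of 2D projection described in panel A, the following is a minimal sketch of classical (Torgerson) MDS in pure Python. It is not the authors' pipeline: the Euclidean distance, the power-iteration eigen-solver, the function names, and the toy `profiles` matrix are all assumptions made for this example, since the record does not state which similarity measure or MDS variant was used.

```python
import math
import random


def pairwise_distances(rows):
    """Euclidean distance between every pair of sample profiles (assumed metric)."""
    return [[math.dist(a, b) for b in rows] for a in rows]


def classical_mds(dist, dims=2, iters=500, seed=0):
    """Classical MDS via power iteration; returns one dims-long coordinate list per sample."""
    n = len(dist)
    sq = [[d * d for d in row] for row in dist]
    row_mean = [sum(row) / n for row in sq]
    grand_mean = sum(row_mean) / n
    # Double-centre the squared distances: B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row_mean[i] - row_mean[j] + grand_mean)
          for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for k in range(dims):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iters):
            w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        bv = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        eigenvalue = sum(v[i] * bv[i] for i in range(n))
        if eigenvalue <= 1e-12:
            break  # no further positive axis; remaining coordinates stay 0
        for i in range(n):
            coords[i][k] = math.sqrt(eigenvalue) * v[i]
        # Deflate so the next pass converges to the next-largest eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= eigenvalue * v[i] * v[j]
    return coords


# Hypothetical alteration matrix: rows are samples, columns are genes (1 = altered).
profiles = [[1, 0, 1, 0], [1, 0, 1, 1], [0, 1, 0, 0], [0, 1, 0, 1]]
layout = classical_mds(pairwise_distances(profiles))
```

Each row of `layout` is the 2D position of one sample; samples with similar alteration profiles land close together, which is what makes clusters visible in a scatter plot.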
Fig. 2.
Genomic variations divide gliomas into eight distinct subtypes. (A) (1p,19q) codeletions occur exclusively in the Lower Right (oligo) cluster of samples. (B) IDH1 mutations occur in both astro and oligo LGG clusters, but most IDH2 mutations occur in the oligo CIMP-LGG cluster. (C) TP53 mutations are largely confined to the astro CIMP-LGG cluster and the diffuse portion of the non-CIMP cluster. (D) Mutations in ATRX primarily impact a subset of the astro cluster (Top Right), whereas CIC and FUBP1 mutations define a subset of the oligo cluster. (E) Heterozygous deletions and low-copy gains of NRAS mark the oligo cluster and a diffuse portion of the non-CIMP cluster. (F) Together, the genomic markers described in A–E define eight distinct tumor subtypes, as follows.
Group1 = nonCIMP & gainNRAS & mutTP53.
Group2 = nonCIMP & gainNRAS & wtTP53.
Group3 = nonCIMP & wtNRAS & wtTP53.
Group4 = nonCIMP & wtNRAS & mutTP53.
Group5 = CIMP.LGG & not1p19q & mutATRX & mutTP53.
Group6 = CIMP.LGG & not1p19q & wtATRX & mutTP53.
Group7 = (CIMP.LGG & del.1p19q) & (mutCIC OR mutFUBP1).
Group8 = CIMP.LGG & del.1p19q & wtCIC & wtFUBP1.
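The eight group definitions in panel F are Boolean rules, so they can be written as a small decision function. The sketch below is one possible encoding: the `Markers` class, its field names, and `assign_group` are hypothetical, and treating "wtNRAS" as "no NRAS gain" is an assumption. Marker combinations that the caption does not list (for example CIMP.LGG without 1p/19q codeletion and with wild-type TP53) return `None` rather than being forced into a group.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Markers:
    """Per-sample marker calls; field names are illustrative, not from the paper."""
    cimp_status: str          # "nonCIMP" or "CIMP.LGG", labels as in the caption
    gain_nras: bool = False   # False is read as "wtNRAS" (assumption)
    mut_tp53: bool = False
    del_1p19q: bool = False
    mut_atrx: bool = False
    mut_cic: bool = False
    mut_fubp1: bool = False


def assign_group(m: Markers) -> Optional[int]:
    """Return the group number (1-8) from the caption's rules, or None if no rule matches."""
    if m.cimp_status == "nonCIMP":
        if m.gain_nras:
            return 1 if m.mut_tp53 else 2
        return 4 if m.mut_tp53 else 3
    if m.cimp_status == "CIMP.LGG":
        if m.del_1p19q:
            return 7 if (m.mut_cic or m.mut_fubp1) else 8
        if m.mut_tp53:
            return 5 if m.mut_atrx else 6
    return None  # combination not covered by the eight definitions
```

For example, `assign_group(Markers("CIMP.LGG", del_1p19q=True, mut_cic=True))` evaluates the Group7 rule.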
Fig. 3.
Co-occurrence of chromosome 7 low-copy gain and chromosome 10 single-copy deletion (7+/10−). Samples with whole-chromosome copy number changes were defined as those with more than 85% of their thresholded per-gene GISTIC2.0 scores matching the expected value. (A) The 7+/10− pattern does not occur in CIMP-LGGs. (B) Non-CIMP LGGs are highly enriched for 7+/10−. (C and D) Expression similarity plots. Although non-CIMP LGGs are genomically very similar to non-CIMP GBMs, the expression patterns of stemness (C) and metabolism (D)-associated genes are very different in non-CIMP LGGs and GBMs.
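The 85% rule for calling whole-chromosome changes can be sketched as follows. This assumes the usual GISTIC2.0 thresholded coding in which +1 is a low-level gain and -1 is a single-copy deletion; the function names and the per-chromosome input lists are hypothetical, and how genes are assigned to chromosomes is left out.

```python
def has_whole_chromosome_change(scores, expected, min_fraction=0.85):
    """True if more than min_fraction of the per-gene thresholded scores equal `expected`."""
    if not scores:
        return False
    matches = sum(1 for s in scores if s == expected)
    return matches / len(scores) > min_fraction


def has_chr7_gain_and_chr10_loss(chr7_scores, chr10_scores):
    """Chromosome 7 low-copy gain (+1) together with chromosome 10 single-copy deletion (-1)."""
    return (has_whole_chromosome_change(chr7_scores, expected=1)
            and has_whole_chromosome_change(chr10_scores, expected=-1))
```

The strict `>` follows the caption's wording ("more than 85%").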
Fig. 4.
Stability of the plot structure. (A) Change in total intersample distances (y-axis, arbitrary units) when each gene in the genome is removed individually and the sample similarity plot is recalculated. Three genes (IDH1, TP53, and ATRX) have a large effect on the plot layout. (B) Coremoval of the two highest impact genes leaves the three major sample clusters largely distinct, but the subgroupings within these large clusters are lost. (C) Coremoval of the three highest-impact genes further degrades the sample clustering. However, as shown in D, the three highest-impact genes by themselves are not sufficient to reproduce any of the sample clustering. (E) The 15 highest-impact genes are sufficient to capture a large portion of the sample clustering obtained from genome-wide data. (F) The 45 highest-impact genes reasonably reproduce the clustering pattern obtained by using genome-wide data.
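The leave-one-gene-out analysis in panel A can be approximated with the sketch below. It is a simplification: it measures the change in total pairwise Euclidean distance directly in the gene feature space rather than recomputing the similarity plot after each removal, and the distance metric, function names, and toy matrix are assumptions for illustration.

```python
import math
from itertools import combinations


def total_intersample_distance(matrix):
    """Sum of Euclidean distances over all sample pairs (rows are samples)."""
    return sum(math.dist(a, b) for a, b in combinations(matrix, 2))


def gene_impacts(matrix, genes):
    """Drop each gene (column) in turn and report the decrease in total distance."""
    baseline = total_intersample_distance(matrix)
    impacts = {}
    for k, gene in enumerate(genes):
        reduced = [row[:k] + row[k + 1:] for row in matrix]
        impacts[gene] = baseline - total_intersample_distance(reduced)
    return impacts


# Hypothetical toy data: geneA varies across samples, geneB is constant.
toy_matrix = [[1, 0], [0, 0], [1, 0]]
toy_impacts = gene_impacts(toy_matrix, ["geneA", "geneB"])
```

Sorting the resulting dictionary by value gives a ranking from which the top 2, 3, 15, or 45 genes could be selected, in the spirit of panels B to F.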
Fig. 5.
Tightly defined regions of the genomic sample similarity plot are highly enriched for specific drug targets. (A) Distribution of samples with high and low levels of Her2 mRNA. Here “high” and “low” are defined as the Top and Bottom decile of the samples. (B) Samples with total and phosphorylated Her2 protein levels in the Top decile are concentrated in a tight region of the non-CIMP cluster that coincides with the region of high Her2 mRNA expression. (C) Red disks mark the non-CIMP tight region of genomically highly similar samples. Blue disks mark samples with high levels of Her2 mRNA/protein/phosphorylated protein. (D) Within the tight non-CIMP sample cluster delineated in C, a large majority of LGG samples are high in Her2 mRNA/protein/phosphoprotein levels.
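The "high" and "low" labels in panel A are decile cut-offs, which can be sketched as a rank-based labeling. The function name and the "mid" label are hypothetical, and tie handling is an assumption: ties are broken by input order, and cohorts with fewer than 10 samples get no "high" or "low" labels.

```python
def decile_labels(values):
    """Label the bottom 10% of values "low", the top 10% "high", and the rest "mid"."""
    n = len(values)
    k = n // 10  # number of samples in one decile (floor)
    order = sorted(range(n), key=lambda i: values[i])
    labels = ["mid"] * n
    for i in order[:k]:
        labels[i] = "low"
    for i in order[n - k:]:
        labels[i] = "high"
    return labels
```

Applied separately to mRNA, total protein, and phosphoprotein measurements, the labels could then be intersected to find samples that are high on all three.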
