Neuro Oncol. 2025 Oct 1;27(10):2605-2616.
doi: 10.1093/neuonc/noaf130.

A clinically annotated transcriptomic atlas of nervous system tumors

Chi H Le et al. Neuro Oncol. 2025.

Abstract

Background: While DNA methylation signatures are distinct across nervous system neoplasms, it has not been comprehensively demonstrated whether transcriptomic signatures exhibit similar uniqueness. Additionally, no large-scale dataset for comparative gene expression analyses exists. This study addresses these knowledge and resource gaps.

Methods: We compiled and harmonized raw transcriptomic and clinical data for neoplastic (n = 5,402) and nonneoplastic (n = 1,973) nervous system samples from publicly available sources, all profiled on the same microarray platform. After adjusting for surrogate variable effects ("batch effects"), machine learning methods were used to visualize, cluster, and reclassify samples with uncertain diagnoses (n = 2,225).

Results: We generated the largest clinically annotated transcriptomic atlas of nervous system tumors to date. Sample clustering was primarily driven by diagnosis. We show the utility of the atlas by refining the transcriptional subtypes of pheochromocytoma and paraganglioma (PH/PG), revealing 6 robust subtypes (Neuronal, Vascular, Metabolic, Steroidal, Developmental, Indeterminate), which were independently validated using TCGA RNA-seq data and correlated with specific mutational signatures and clinical behaviors of these tumors.

Conclusions: We demonstrate that bulk transcriptomic signatures, like bulk DNA methylation signatures, are distinct across the diagnostic spectrum of nervous system neoplasms. The atlas covers a broad range of diagnoses, including rarely studied entities, spans all ages, and includes individuals from diverse geographical regions, which enhances its utility for comprehensive and robust comparative gene expression analyses, as exemplified by our PH/PG analyses. For access, visit http://kdph.shinyapps.io/atlas/ or https://github.com/axitamm/BrainTumorAtlas.

Keywords: brain neoplasms; gene expression; nervous system neoplasms; paraganglioma; transcriptome.

Conflict of interest statement

A.M.M. consults for Neosoma, Inc., and received honorarium from GT Medical Technologies, Inc.

Figures

Figure 1:
Representation of the training dataset (5,150 samples) in the t-SNE dimensionality reduction performed on the full dataset. Individual samples (dots) are color-coded and labelled according to the diagnosis listed in the side legend. Full names of the 54 diagnostic entities are provided in Supplementary Table 1. Of note, we chose to label a specific type of supratentorial ependymoma as “RELA” instead of the most recent nomenclature “ZFTA fusion-positive” because there are non-RELA, ZFTA-fused ependymomas and these samples were identified as RELA-fusion positive.
Figure 2:
Representation of all samples (5,150 samples in the training dataset and 2,184 samples with an uncertain diagnosis that are reclassified) in the t-SNE dimensionality reduction performed on the full dataset. Individual samples (dots) are color-coded and labelled according to the diagnosis listed in the side legend. Full names of the 54 diagnostic entities are provided in Supplementary Table 1. Of note, we chose to label a specific type of supratentorial ependymoma as “RELA” instead of the most recent nomenclature “ZFTA fusion-positive” because there are non-RELA, ZFTA-fused ependymomas and samples used in the training dataset were identified as RELA-fusion positive.
Figure 3:
In the 679 samples with nonunanimous predictions among the classifiers, (A) the frequency of each diagnosis prediction is normalized to its proportion in the “core” dataset used for training the classifiers. (B) When one classifier predicted a sample as ganglioglioma, desmoplastic infantile ganglioglioma, or pleomorphic xanthoastrocytoma (below the dashed line), this chord diagram illustrates the other diagnoses predicted by another classifier (above the dashed line). Specific relationships are highlighted with a black border and an arrow. Full names of the diagnostic entities are provided in Supplementary Table 1.
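The normalization in panel A can be read as an enrichment ratio: the share of nonunanimous predictions assigned to a diagnosis, divided by that diagnosis's share of the core training set. The sketch below shows one plausible reading of that calculation; the function name and the toy labels are illustrative assumptions, not taken from the study's code.

```python
from collections import Counter


def normalized_prediction_frequency(predictions, core_labels):
    """Ratio of each diagnosis's share among predictions to its share in the core set.

    Assumed reading of the caption: values above 1 mean the diagnosis is
    predicted more often than its representation in the core dataset suggests.
    """
    pred_counts = Counter(predictions)
    core_counts = Counter(core_labels)
    n_pred, n_core = len(predictions), len(core_labels)
    ratios = {}
    for diagnosis, count in pred_counts.items():
        core_share = core_counts[diagnosis] / n_core
        if core_share == 0:
            continue  # diagnosis absent from the core set: ratio undefined
        ratios[diagnosis] = (count / n_pred) / core_share
    return ratios


# Toy labels only (hypothetical, not data from the atlas).
toy_predictions = ["ganglioglioma", "ganglioglioma", "PXA", "glioblastoma"]
toy_core = ["glioblastoma"] * 8 + ["ganglioglioma"] + ["PXA"]
ratios = normalized_prediction_frequency(toy_predictions, toy_core)
```

With a ratio like this, a rare diagnosis that accounts for many ambiguous predictions stands out even when its absolute count is small.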
Figure 4:
Manually curated metadata and clinical data associated with the 7,375 samples in the atlas generated. The number of samples per (A) country listed in the contact information of the deposited raw data, (B) age group, and (C) anatomic location. The sex of the samples is color-coded (pink/blue or gray if not available) in the latter 2 plots. (D) The number of samples with associated genetic information. (E-N) Overall survival, visualized using Kaplan-Meier curves, based on the final diagnosis. E-G survival curves are of all types of diffuse gliomas according to different subgroups. Tick marks on the curves represent censored values. Full names of the diagnostic entities are provided in Supplementary Table 1. Abbreviations: amplif = amplification; codel = co-deletion; meth = methylation; mut = mutation; NA = not available; NOS = not otherwise specified; pTERT/pMGMT = promoter of the respective gene; yrs = years.
Figure 5:
Transcriptional subtypes of pheochromocytoma and paraganglioma (PH/PG). (A) t-SNE embedding of gene expression profiles from 240 PH/PG tumors in the atlas. Color of the dots (samples) represents transcriptional subtype. (B) Heatmap of scaled expression for top 30 differentially expressed genes (rows) across the samples (columns) clustered into the 6 subtypes in the atlas. Top 2 rows show cluster membership and genetic mutations. Names on the left designate the subtype cluster. (C) Similar heatmap generated for TCGA samples, and ten or eleven representative genes upregulated in each subtype are shown on the right. Pseudo RET, pseudo VHL, and pseudo SDHx represent designations used in the original study analyzing the samples in the atlas. SDHx represents SDH gene family (e.g., SDHA, SDHB, SDHC, SDHD). Fus. = fusion. (D) t-SNE embedding of TCGA-PCPG samples, based on the most variable genes (~20,000). Color of the dots (samples) represents transcriptional subtype. (E) Word clouds summarizing dominant Gene Ontology Biological Processes enriched in the top upregulated genes in each transcriptional subtype. (F) Shared hub transcription factors identified through weighted gene co-expression network analysis performed independently on atlas and TCGA samples. kME represents the Pearson correlation between the expression of the transcription factor across samples and the first principal component of the expression matrix of genes in a module. Higher absolute values of kME (dot size) represent greater interconnectivity between the transcription factor and genes in a module. Color of the kME values (positive or negative direction) represents direct or indirect interconnectivity, respectively. (G) Distribution of tumor anatomical locations (pheochromocytoma vs paraganglioma), (H) tumor aggressiveness and metastatic rates, and (I) age at diagnosis across subtypes, showing median and interquartile range, using TCGA clinical data.
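The kME definition in panel F can be illustrated with a minimal sketch. It assumes that each gene is standardized across samples before the first principal component is taken and that the component's sign is oriented along the module's average expression; both are assumptions for illustration, and the function names and toy values are hypothetical. The first principal component is found here by plain power iteration so that the example needs only the standard library.

```python
import math


def standardize(values):
    """Center to mean 0 and scale to unit (population) standard deviation.

    Assumes the values are not all identical (nonzero variance).
    """
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [(v - mean) / sd for v in values]


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / len(zx)


def module_eigengene(module, iterations=200):
    """First principal component across samples of a genes x samples matrix."""
    z = [standardize(gene) for gene in module]
    n_samples = len(z[0])
    v = list(z[0])  # start from the first gene's standardized profile
    for _ in range(iterations):
        # Power iteration on Z^T Z: project onto genes, then back onto samples.
        scores = [sum(g[j] * v[j] for j in range(n_samples)) for g in z]
        v = [sum(z[i][j] * scores[i] for i in range(len(z)))
             for j in range(n_samples)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    # Orient the component along the module's average expression (assumption).
    average = [sum(g[j] for g in z) / len(z) for j in range(n_samples)]
    if sum(a * b for a, b in zip(v, average)) < 0:
        v = [-x for x in v]
    return v


def kme(tf_expression, module):
    """Correlation of a transcription factor's expression with the module eigengene."""
    return pearson(tf_expression, module_eigengene(module))


# Toy module: three genes sharing one pattern across five samples.
toy_module = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 10], [1.5, 2.5, 3.5, 4.5, 5.5]]
positive_hub = kme([10, 20, 30, 40, 50], toy_module)  # close to +1 by construction
negative_hub = kme([5, 4, 3, 2, 1], toy_module)       # close to -1 by construction
```

Under this reading, the absolute value of kME corresponds to dot size in the panel and its sign to the color.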
