Skip to main page content
U.S. flag

An official website of the United States government

Dot gov

The .gov means it’s official.
Federal government websites often end in .gov or .mil. Before sharing sensitive information, make sure you’re on a federal government site.

Https

The site is secure.
The https:// ensures that you are connecting to the official website and that any information you provide is encrypted and transmitted securely.

Access keys NCBI Homepage MyNCBI Homepage Main Content Main Navigation
. 2023 Jul 21;13(1):11790.
doi: 10.1038/s41598-023-35401-3.

Identification of the molecular subtypes and construction of risk models in neuroblastoma

Affiliations

Identification of the molecular subtypes and construction of risk models in neuroblastoma

Enyang He et al. Sci Rep. .

Abstract

The heterogeneity of neuroblastoma directly affects the prognosis of patients. Individualization of patient treatment to improve prognosis is a clinical challenge at this stage and the aim of this study is to characterize different patient populations. To achieve this, immune-related cell cycle genes, identified in the GSE45547 dataset using WGCNA, were used to classify cases from multiple datasets (GSE45547, GSE49710, GSE73517, GES120559, E-MTAB-8248, and TARGET) into subgroups by consensus clustering. ESTIMATES, CIBERSORT and ssGSEA were used to assess the immune status of the patients. And a 7-gene risk model was constructed based on differentially expressed genes between subtypes using randomForestSRC and LASSO. Enrichment analysis was used to demonstrate the biological characteristics between different groups. Key genes were screened using randomForest to construct neural network and validated. Finally, drug sensitivity was assessed in the GSCA and CellMiner databases. We classified the 1811 patients into two subtypes based on immune-related cell cycle genes. The two subtypes (Cluster1 and Cluster2) exhibited distinct clinical features, immune levels, chromosomal instability and prognosis. The same significant differences were demonstrated between the high-risk and low-risk groups. Through our analysis, we identified neuroblastoma subtypes with unique characteristics and established risk models which will improve our understanding of neuroblastoma heterogeneity.

PubMed Disclaimer

Conflict of interest statement

The authors declare no competing interests.

Figures

Figure 1
Figure 1
Identification and functional analysis of IRCCGs. (A) The results of the t-SNE algorithm show heterogeneity in the patients. (B) Analysis of network topology for various soft-thresholding powers. The left panel shows the scale-free fit index as a function of the soft-thresholding power. The right panel displays the mean connectivity as a function of the soft-thresholding power. Based on the scale-free fit index greater than 0.9, we chose 4 as the soft thresholding power. (C) At the top is Clustering dendrogram of genes with assigned module colors. At the bottom is Module-trait associations. Each cell contains the corresponding correlation and P value. The darker the color of the cell, the higher the correlation. (D) Results of KEGG enrichment. The numbers in the graph indicated the counts of the pathway. (E) Results of Biological Process enrichment. The line between dots indicated the presence of identical genes between pathways. (F) The top 5 pathways of Cellular Component enrichment was demonstrated. The length of the yellow bar indicated the number of pathway genes. The height of the blue bar indicated the number of intersecting genes. (G) The first 5 enriched to Molecular Function terms and the genes in the terms. (H,I) Principal component analysis (PCA) of the gene expression in datasets. The visualization of patients by scatter plots were based on the top two Dims of gene expression profiles with the removal of batch effect.
Figure 2
Figure 2
Identification of subtypes and clinical correlations of subtypes. (A) Consensus matrix heatmap with cluster count of 2. (B) The bar-plots represent the consensus scores for subgroups and we chose the results with consensus scores greater than 0.8. (C) Heatmap of Top 50 Immune-related cell cycle genes levels and distribution of age, MYCN status, and INSS stage in the two clusters. (D) The Sankey diagram showing whether TERT was rearrangement and whether APB existed. (E) The bar chart showed the distribution of chromosomal abnormalities in the two clusters. (F) The Kaplan–Meier curves showed the OS time of the two clusters of patients inside the E-MTAB-8248 and the TARGET datasets.
Figure 3
Figure 3
Comparison of immunization of two clustered subtypes. (A) Box plots showed the mRNA expression of immune checkpoints in two clusters (*P < 0.05; **P < 0.01; ***P < 0.001). (B) Stacked bar chart showed the percentage of immune cells in 1811 patients. (C) Box plots were used to display the distribution of the cell proportions calculated by the CIBERSORT algorithm of immune cells between the two clusters (*P < 0.05; **P < 0.01; ***P < 0.001). (D) Box plot of the distribution of immune cell expression between the two clusters as calculated by the ssGSEA algorithm (*P < 0.05; **P < 0.01; ***P < 0.001). (EJ) Box plots were created to visualize the distribution of the Stromal Score, Immune Score, ESTIMATES Score, and  Tumor Purity, which were calculated by the ESTIMATE algorithm between the two clusters in the GSE45547 (E), GSE49710 (F), GSE73517 (G), GSE120559 (H), E-MTAB-828 (I) and TARGET (J) datasets.
Figure 4
Figure 4
Identification DEGs and functional annotation of DEGs. (A) Volcano plot depicted the distribution of DEGs in TARGET dataset (Cluster1 VS Cluster2) and labeled the top 5 genes with the smallest ranking according to adjusted P value. (Genes with adjusted P value > 0.05 were not shown in the plot). (B) Heatmap of the DEGs derived from the 5 microarray datasets. (C) The Ven diagram showed the number of intersecting genes in the results of the difference analysis. (D) The TOP 30 genes based on the MCC algorithm, with the darker colors, indicating the higher MCC scores. (E,F) Bar graph (E) showed the results of GO enrichment and Bubble plots (F) showed KEGG enrichment results for Cluster 1 relative to Cluster 2 highly expressed genes. The numbers in the Bar graph represented the counts in the pathway. (G,H) Bar graph (G) showed the results of GO enrichment and Bubble plots (H) showed KEGG enrichment results for Cluster 1 relative to Cluster 2 low expressed genes. The numbers in the Bar graph represented the counts in the pathway. In the GO enrichment results (E,G), BP refers to Biological Process, CC denotes Cellular Component, and MF represents Molecular Function.
Figure 5
Figure 5
Construction of the risk model. (A) The forest plot showed the HR and 95% confidence interval of the most significant TOP 10 genes in the univariate regression results, sorted by P value. (B) The left graph showed the variation of Error rate with the number of trees. The right graph showed the ranking of genes according to the importance of the VIMP algorithm, where blue represents favorable to the correct judgment of the endings and red represents unfavorable. (C) Each line in the above graph represented a gene, the vertical coordinate was the value of the coefficient, the lower horizontal coordinate was log(λ), and the upper horizontal coordinate was the number of non-zero coefficients in the model at this time. (D) Based on cross-validation, for each value of λ, around the mean value of the target covariate shown in red, we can obtain a confidence interval for the target covariate. The two dashed lines indicate each of the two particular λ values. We chose lambda.1se as the final model parameter. (E) Each point in the scatter plot represented the survival status and survival time of a patient. The horizontal coordinates were the patients ranked from lowest to highest according to their risk scores. (F) Based on the risk score of each point in the scatter plot representing one patient, we divided them into high-risk and low-risk groups. (G,H) The Kaplan–Meier curves showed the progression-free survival time (G) and OS time (H) of the two risk groups of patients inside the E-MTAB-8248 dataset.
Figure 6
Figure 6
Validation and investigation of risk models. (A) The ability of clinical indicators and risk scores to determine prognosis at year 1, year 3, and year 5 were compared using ROC curves. (B) The ROC curve demonstrates the ability of the risk score to determine prognosis at year 1, year 3, and year 5. (C) The Kaplan–Meier curves showed the OS time of the two risk groups of patients inside the TARGET dataset. (D) The heat map demonstrated the expression levels of seven risk model genes in patients. (E) Mutations of 7 risk model genes in multiple tumors. (F) Comparison of differences in ESTIMATES results between high and low risk groups. Red dots indicated that patients belong to Cluster 1 and green dots indicated that patients belong to Cluster 2. (G) Box plots showed the mRNA expression of immune checkpoints in two risk groups (*P < 0.05; **P < 0.01; ***P < 0.001). (H) Box plot of the distribution of immune cell expression between the two risk groups as calculated by the ssGSEA algorithm (*P < 0.05; **P < 0.01; ***P < 0.001).
Figure 7
Figure 7
Comparison between high and low risk groups. (A) Comparison of risk scores in MYCN status, age groups and INSS stages. (B) The Sankey diagram showed the distribution of chromosomal abnormalities in the two risk groups. (C) The heat map showed the levels of differential genes between the high and low risk groups. (D) TOP 10 hub genes identified by MCC algorithm. (E) The bar graph showed the results of GSVA enrichment. Purple represented the major pathways enriched to in the high-risk group and green represented the major pathways in the low-risk group. (F) The TOP 3 most significant GO enriched terms in the high-risk and low-risk groups. (G) The TOP 3 most significant KEGG enriched pathways in the high-risk and low-risk groups.
Figure 8
Figure 8
Neural Networks and Treatment Analysis. (A) Importance ranking chart of variables based on MDA and MDG. (B) Neural network structure schematic. The outer red layer represents the input layer, the middle blue represents the 20 hidden neurons, and the output layer is yellow. (C) ROC curves demonstrate the classification performance of the neural network in the GSE49710 dataset. (D) Assessment of patient survival probability using nomograms. (E) Immunotherapy target gene expression levels. (F) G1/S cell cycle checkpoint gene expression levels. (G) G2/M cell cycle checkpoint gene expression levels. (H) GDSC database drug sensitivity analysis results. (I) Heat map of correlation between risk score and drug sensitivity.

References

    1. Zafar A, et al. Molecular targeting therapies for neuroblastoma: Progress and challenges. Med. Res. Rev. 2021;41:961–1021. doi: 10.1002/med.21750. - DOI - PMC - PubMed
    1. Fulda S. The PI3K/Akt/mTOR pathway as therapeutic target in neuroblastoma. Curr. Cancer Drug Targets. 2009;9:729–737. doi: 10.2174/156800909789271521. - DOI - PubMed
    1. Liu X, et al. Deregulated Wnt/beta-catenin program in high-risk neuroblastomas without MYCN amplification. Oncogene. 2008;27:1478–1488. doi: 10.1038/sj.onc.1210769. - DOI - PubMed
    1. Takita J. The role of anaplastic lymphoma kinase in pediatric cancers. Cancer Sci. 2017;108:1913–1920. doi: 10.1111/cas.13333. - DOI - PMC - PubMed
    1. Evan GI, Vousden KH. Proliferation, cell cycle and apoptosis in cancer. Nature. 2001;411:342–348. doi: 10.1038/35077213. - DOI - PubMed

Publication types