Cells. 2022 Aug 8;11(15):2456.
doi: 10.3390/cells11152456.

HSSG: Identification of Cancer Subtypes Based on Heterogeneity Score of A Single Gene

Shanchen Pang et al. Cells.

Abstract

Cancer is a highly heterogeneous disease: even a single cancer type can be further divided into subtypes according to its pathology. As multi-omics data become widely used for cancer subtype identification, effective feature selection is essential for identifying subtypes accurately. However, feature selection in existing subtype identification methods does not select the most informative features from a biomolecular perspective, nor does it reflect the relationships among the selected features. To address this problem, we propose HSSG, a feature selection method for cancer subtype identification based on the heterogeneity score of a single gene. In the proposed method, a sample-similarity network is constructed for each single gene, and a pseudo-F statistic is used to calculate each gene's heterogeneity score for cancer subtype identification. We then construct gene-gene networks from the genes with higher heterogeneity scores and mine essential genes from these networks. In three experiments on seven TCGA data sets, covering cancer subtype identification in single-omics data, feature selection performance on multi-omics data, and the effectiveness and stability of the selected features, HSSG performs well throughout. This indicates that HSSG can effectively select features for subtype identification.

Keywords: cancer subtypes; heterogeneity; pseudo-F statistic; single gene.
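
The per-gene scoring step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact network construction or the form of the statistic, so the choices here are assumptions made for illustration only: the dissimilarity between two samples for one gene is taken as the absolute difference of their expression values, subtype labels are assumed to be available, and the score is the PERMANOVA-style pseudo-F computed from pairwise distances. The function names pseudo_f and heterogeneity_scores are hypothetical and do not come from the paper.

```python
import numpy as np


def pseudo_f(dist, labels):
    """PERMANOVA-style pseudo-F from a square distance matrix and group labels.

    Assumption: this is one common definition of the pseudo-F statistic;
    the paper may define or normalize it differently.
    """
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    n = dist.shape[0]
    groups = np.unique(labels)
    a = len(groups)
    sq = dist ** 2

    # Total sum of squares: squared distances over unordered pairs, divided by N.
    ss_total = sq[np.triu_indices(n, k=1)].sum() / n

    # Within-group sum of squares: the same quantity computed inside each group.
    ss_within = 0.0
    for g in groups:
        idx = np.where(labels == g)[0]
        sub = sq[np.ix_(idx, idx)]
        ss_within += sub[np.triu_indices(len(idx), k=1)].sum() / len(idx)

    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def heterogeneity_scores(expr, labels):
    """Score each gene (one row of expr) from its single-gene sample distances.

    Assumption: absolute expression difference stands in for the paper's
    sample-similarity network of a single gene.
    """
    expr = np.asarray(expr, dtype=float)
    scores = np.empty(expr.shape[0])
    for k, row in enumerate(expr):
        dist = np.abs(row[:, None] - row[None, :])  # samples x samples
        scores[k] = pseudo_f(dist, labels)
    return scores
```

Under these assumptions, a gene whose expression separates the labeled subtypes cleanly receives a high score, while a gene whose variation lies mostly within subtypes receives a score near zero. Ranking genes by this score and keeping the top ones would correspond to the "genes with higher heterogeneity scores" mentioned above; the later gene-gene network step is not sketched here because the abstract gives no detail on it.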

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1. The overall framework of HSSG.
Figure 2. The process for constructing the sample-similarity network of a single gene.
Figure 3. Cluster accuracy curves for gene selection when identifying two subtypes. (a) Accuracy of three different gene screening methods as the number of added genes changes. (b) Accuracy as the number of added genes changes when gene modules are added.
Figure 4. Simple heat maps of gene expression for the genes selected in identifying two cancer subtypes. (a) Simple heat map of the expression of 644 genes. (b) Simple heat map of the expression of 67 genes.
Figure 5. Survival rate and enrichment analysis of the 67 genes and the 644 selected genes. (a) Survival rate schematic for the 67-gene cluster. (b) Survival rate schematic for the 644-gene cluster. (c) GO enrichment analysis of the 67 genes. (d) GO enrichment analysis and KEGG pathway enrichment analysis of the 644 genes.
Figure 6. Heat maps of gene expression clustering for genes selected by six different methods. (a) HSSG-selected genes. (b) Variance-selected genes. (c) Entropy-selected genes. (d) Kruskal-test-selected genes. (e) Differential-expression-selected genes. (f) Random-Forest-selected genes.
Figure 7. GO enrichment analysis and KEGG pathway enrichment analysis of the genes screened by differential expression.
Figure 8. Accuracy curves of three different methods in identifying three cancer subtypes.
Figure 9. Simple heat maps of gene expression for the genes selected in identifying multiple subtypes. (a) Simple heat map of the 500 genes selected by the pseudo-F statistic. (b) Simple heat map of the 196 genes selected by network module mining.
Figure 10. GO and KEGG pathway enrichment analysis of the 196 obtained genes.
Figure 11. Cluster accuracy curves for two different ways of adding genes.
Figure 12. The PCA plot of three different cancer samples.
