Cancers (Basel). 2022 May 7;14(9):2325.
doi: 10.3390/cancers14092325.

A Data Science Approach for the Identification of Molecular Signatures of Aggressive Cancers

Adriano Barbosa-Silva et al. Cancers (Basel).

Abstract

The main hallmarks of cancer include sustaining proliferative signaling and resisting cell death. We analyzed the genes of the WNT pathway and of seven cross-linked pathways that may explain the differences in aggressiveness among cancer types. Based on TCGA data and on the correlation between Shannon entropy and 5-year overall survival (OS), we divided six cancer types (liver, lung, stomach, kidney, prostate, and thyroid) into high (H) and low (L) aggressiveness classes. We then used principal component analysis (PCA), a random forest classifier (RFC), and protein-protein interactions (PPIs) to find the genes that correlated with aggressiveness. PCA identified GRB2, CTNNB1, SKP1, CSNK2A1, PRKDC, HDAC1, YWHAZ, YWHAB, and PSMD2. Apart from PSMD2, the RFC analysis yielded a different list: CAD, PSMD14, APH1A, PSMD2, SHC1, TMEFF2, PSMD11, H2AFZ, PSMB5, and NOTCH1. The two methods rely on different algorithms and serve different purposes, which explains the discrepancy between the two gene lists. The key aggressiveness genes found by PCA were those that maximized the separation of the H and L classes along the third principal component (PC3), which represented 19% of the total variance. By contrast, the RFC classified the RNA-seq profile of each tumor sample as H or L. Interestingly, the PPIs showed that the genes in the PCA and RFC lists were connected neighbors in the PPI signaling network of the WNT and cross-linked pathways.

Keywords: PCA; RFC; RNA-seq; WNT pathways; aggressiveness; cancer; interactome; machine learning; prognostic genes.
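The aggressiveness classes above rest on the Shannon entropy of signaling subnetworks. The exact definition is in the source cited for Figure 1 ([14]), not on this page; as a minimal sketch, assuming entropy is taken over the vertex-degree distribution of a subnetwork, H = −Σₖ p(k) log₂ p(k), where p(k) is the fraction of vertices with degree k, it could be computed as follows (the function name and toy graphs are hypothetical):

```python
import math
from collections import Counter


def degree_entropy(edges):
    """Shannon entropy (bits) of the vertex-degree distribution of an undirected graph.

    Assumption: entropy over degrees; `edges` is an iterable of (u, v) pairs,
    and self-loops and repeated edges are ignored.
    """
    neighbors = {}
    for u, v in edges:
        if u == v:
            continue  # a self-loop adds no neighbor
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    n = len(neighbors)
    if n == 0:
        return 0.0
    # p(k) = fraction of vertices whose degree is k
    degree_counts = Counter(len(nbrs) for nbrs in neighbors.values())
    return -sum((c / n) * math.log2(c / n) for c in degree_counts.values())


# Hypothetical toy subnetworks (placeholder node names, not study data)
star = [("hub", "a"), ("hub", "b"), ("hub", "c")]
ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
print(round(degree_entropy(star), 3))  # 0.811
print(degree_entropy(ring) == 0)       # True
```

A regular graph, in which every vertex has the same degree, has zero entropy under this definition; a more heterogeneous degree distribution raises it.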

Conflict of interest statement

The authors declare no conflict of interest.

Figures

Figure 1
Correlation of subnetwork entropies vs. 5-year OS for p = 0.025 with GDC RPKMupper + Log2 (source: [14]).
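Figure 1 relates subnetwork entropy to 5-year OS. The strength of such a linear relationship is commonly summarized with Pearson's r; a minimal stdlib sketch follows, where the input values are hypothetical and are not the study's measurements:

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for illustration only
entropy = [1.0, 2.0, 3.0, 4.0]
five_year_os = [0.9, 0.7, 0.6, 0.3]
print(round(pearson_r(entropy, five_year_os), 3))  # -0.981
```

A strongly negative r, as in this toy input, would mean that higher subnetwork entropy goes with shorter 5-year survival.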
Figure 2
Frequencies (counts) of upregulated genes' connections per target pathway and cancer type. The plot is faceted by pathway, with Kruskal–Wallis p-values of (A) 9.1 × 10⁻⁹, (B) 1.9 × 10⁻⁴, (C) 1.6 × 10⁻¹, (D) 4.9 × 10⁻⁸, (E) 1.4 × 10⁻², (F) 2.9 × 10⁻¹⁰, (G) 1.4 × 10⁻⁸, and (H) 1.1 × 10⁻⁸. Each value indicates a group whose average connection count differs from the global average within that facet.
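The Kruskal–Wallis test used in Figure 2 compares connection counts across several groups (presumably the cancer types within each pathway facet) without assuming normality. Below is a minimal stdlib sketch of the tie-corrected H statistic; the function names and toy counts are hypothetical. In practice a library routine such as `scipy.stats.kruskal` returns both H and the p-value, which comes from a chi-square distribution with k − 1 degrees of freedom for k groups.

```python
from collections import Counter


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Tie-corrected Kruskal–Wallis H statistic for two or more samples."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Correct for ties: divide by 1 - sum(t^3 - t) / (n^3 - n)
    ties = Counter(pooled).values()
    correction = 1 - sum(t ** 3 - t for t in ties) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


# Hypothetical connection counts for three groups in one facet
print(kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # approximately 7.2
```

With k = 3 groups (2 degrees of freedom) the chi-square upper tail is exp(−H/2), so H ≈ 7.2 corresponds to p ≈ 0.027.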
Figure 3
Frequencies (counts) of normalized gene counts (NormCounts) associated with genes of tumors from the H and L cancer classes. Of the 783 genes in this study, 609 were expressed in the H class and 559 in the L class. Panel (A) displays genes whose proteins had 27 or fewer normalized connections (NormConnections), and panels (B–D) those with more than 27 normalized connections. For proteins with fewer than 27 normalized connections (L = 509 and H = 531), the frequency distributions were similar in the L (blue) and H (red) classes for genes upregulated in less than 27% of tumor samples (A). Proteins with more than 27 normalized connections (L = 14 and H = 12) (B–D) are still present in a large proportion of the L class, even though they tended to be upregulated in more H-class samples (B). The same trend was observed for genes upregulated in at least 27%, but no more than 73%, of tumors of the L (n = 36) and H (n = 57) classes (C). This is clearest in panel (D): nine genes with at least 27 normalized connections were upregulated in at least 73% of the tumors from the H class, whereas no genes with such connection rates were observed in the L class. These results show that, taking the IntAct interactome as a reference, a protein can be considered a hub when it has at least 27 normalized connections with its neighbors. Such hubs were observed in a larger proportion of aggressive cancers (class H), which shows that the WNT and cross-linked pathways are more branched in tumors of this class.
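The cutoffs in the Figure 3 caption (a hub has at least 27 normalized connections; genes are binned by whether they are upregulated in under 27%, 27% to 73%, or at least 73% of tumors) can be expressed as a small classification step. This is a hypothetical sketch: the caption does not say how connections are normalized, and assigning exactly 73% to the top bin is an assumption, since the caption's ranges touch at that value.

```python
from collections import Counter

# Cutoffs from the Figure 3 caption; putting exactly 73% in the top bin is an assumption
HUB_MIN_CONNECTIONS = 27
LOW_FRACTION, HIGH_FRACTION = 0.27, 0.73


def frequency_bin(upregulated_fraction):
    """Bin a gene by the share (0..1) of tumor samples in which it is upregulated."""
    if upregulated_fraction < LOW_FRACTION:
        return "<27%"
    if upregulated_fraction < HIGH_FRACTION:
        return "27-73%"
    return ">=73%"


def count_hubs_by_bin(genes):
    """Count hub genes per frequency bin.

    `genes` maps a gene name to (normalized_connections, upregulated_fraction).
    """
    counts = Counter()
    for norm_connections, fraction in genes.values():
        if norm_connections >= HUB_MIN_CONNECTIONS:
            counts[frequency_bin(fraction)] += 1
    return counts


# Hypothetical genes and values, for illustration only
toy = {"G1": (30, 0.80), "G2": (5, 0.90), "G3": (40, 0.50), "G4": (28, 0.75)}
print(dict(count_hubs_by_bin(toy)))  # {'>=73%': 2, '27-73%': 1}
```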
Figure 4
Two-dimensional (A) and three-dimensional (B) principal component analysis (PCA) representations of the variance in upregulated-gene frequency across the eight pathways and the cancer types, and dendrograms of hierarchical clustering on the PC3 component, showing a clear division of cancer types into the H (red) and L (green) classes, in unrooted (C) and rooted (D) representations.
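The abstract notes that PC3 accounted for 19% of the total variance and that the PCA genes best separated the H and L classes along it. To illustrate what "explained variance" and "scores along a component" mean, here is a minimal, dependency-free PCA sketch using power iteration with deflation on the sample covariance matrix. It is not the authors' pipeline (a numerical library's SVD routine would normally be used), and the function names and toy data are hypothetical.

```python
import math
import random


def pca(rows, n_components=3, iters=500, seed=0):
    """Leading principal components of `rows` (samples x features).

    Power iteration on the sample covariance matrix, deflating after each
    component. Returns (components, explained_variance_ratios).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    total_var = sum(cov[i][i] for i in range(d))
    rng = random.Random(seed)
    components, ratios = [], []
    for _ in range(min(n_components, d)):
        v = [rng.random() + 0.1 for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break  # no variance left to explain
            v = [x / norm for x in w]
        # Rayleigh quotient v^T C v = variance captured along the unit vector v
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        ratios.append(lam / total_var if total_var else 0.0)
        # Deflation: subtract lam * v v^T so the next pass finds the next component
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return components, ratios


def scores(rows, component):
    """Project mean-centered rows onto one component (for example PC3)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [sum((r[j] - means[j]) * component[j] for j in range(d)) for r in rows]


# Toy data: more spread along the first feature than along the second
data = [[2, 1], [-2, 1], [2, -1], [-2, -1]]
comps, ratios = pca(data, n_components=2)
print([round(r, 3) for r in ratios])  # [0.8, 0.2]
```

Component signs are arbitrary in PCA, so scores along a component are only meaningful up to a global sign flip.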
Figure 5
Out-of-bag (OOB) error rate of the RFC as the number of trees (ntree parameter) increases, for the H (red) and L (green) classes.
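Figure 5 tracks the OOB error as trees are added. In a random forest, each tree is trained on a bootstrap sample, and each tumor sample is scored only by the trees that did not see it during training. The sketch below illustrates that bookkeeping, overall and per class. To stay short and dependency-free it swaps in one-feature decision stumps for full decision trees and omits random feature sub-sampling, so it is a bagging illustration rather than a full RFC; `n_trees` plays the role of the caption's ntree, and all names and data are hypothetical.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-feature threshold classifier (decision stump) for labels 'H'/'L'."""
    values = sorted(set(xs))
    # Candidate thresholds: midpoints between consecutive distinct values
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    best = None
    for t in thresholds:
        for left, right in (("L", "H"), ("H", "L")):
            errors = sum((left if x <= t else right) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, left, right)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def oob_error_rates(xs, ys, n_trees, seed=0):
    """Out-of-bag error rates (overall and per class) of a bagged ensemble of stumps."""
    rng = random.Random(seed)
    n = len(xs)
    votes = [{"H": 0, "L": 0} for _ in range(n)]
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # bootstrap sample, with replacement
        in_bag = set(boot)
        stump = fit_stump([xs[i] for i in boot], [ys[i] for i in boot])
        for i in range(n):
            if i not in in_bag:  # only learners that never saw sample i vote on it
                votes[i][stump(xs[i])] += 1
    scored = [i for i in range(n) if sum(votes[i].values()) > 0]
    # Majority vote; a tie goes to "H" because it is the first key
    predicted = {i: max(votes[i], key=votes[i].get) for i in scored}

    def rate(idx):
        return sum(predicted[i] != ys[i] for i in idx) / len(idx) if idx else float("nan")

    return {
        "OOB": rate(scored),
        "H": rate([i for i in scored if ys[i] == "H"]),
        "L": rate([i for i in scored if ys[i] == "L"]),
    }


# Hypothetical one-feature data: L samples low, H samples high
xs = [1, 2, 3, 4, 11, 12, 13, 14]
ys = ["L"] * 4 + ["H"] * 4
for n_trees in (1, 10, 100):
    print(n_trees, oob_error_rates(xs, ys, n_trees))
```

With very few trees some samples are never out of bag, so a per-class rate can come out as `nan`; the curve usually stabilizes as trees are added, which is what plots like Figure 5 are used to check.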
Figure 6
Interaction network of the proteins encoded by the genes in Table 3 and Table 4. Table 3 (PCA) genes are shown in beige blocks and form a major component. Most Table 4 (RFC) genes, shown in purple blocks, are connected to this major component, except APH1A, H2AFZ, and TMEFF2. H2AFZ and TMEFF2 were disregarded here because of their low connectivity. According to the IntAct interactome, APH1A is connected to the major network component through six intermediary proteins (blue).
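Figure 6 reports that APH1A reaches the main network component only through six intermediary proteins. Counting intermediaries this way amounts to a shortest-path search on the interaction graph, which breadth-first search solves for unweighted edges. A minimal sketch; the node names and edges are hypothetical placeholders, not IntAct data:

```python
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from (u, v) interaction pairs."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set()).add(u)
    return graph


def shortest_path(graph, source, target):
    """Breadth-first search; returns the nodes on a shortest path, or None."""
    if source not in graph or target not in graph:
        return None
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in graph[node]:
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None


# Hypothetical network: a query protein linked to a core component by six intermediaries
chain = ["QUERY", "I1", "I2", "I3", "I4", "I5", "I6", "CORE"]
edges = list(zip(chain, chain[1:])) + [("CORE", "OTHER")]
path = shortest_path(build_graph(edges), "QUERY", "CORE")
print(len(path) - 2)  # number of intermediary proteins: 6
```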

References

1. Bray F., Ferlay J., Soerjomataram I., Siegel R.L., Torre L.A., Jemal A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2018;68:394–424. doi: 10.3322/caac.21492.
2. Carels N., Conforte A.J., Lima C.R., da Silva F.A.B. Challenges for the optimization of drug therapy in the treatment of cancer. In: Da Silva F.A.B., Carels N., dos Santos T.M., Lopes F.J.P., editors. Computational Biology. Springer International Publishing; Cham, Switzerland: 2020. pp. 163–198.
3. Heudobler D., Lüke F., Vogelhuber M., Klobuch S., Pukrop T., Herr W., Gerner C., Pantziarka P., Ghibelli L., Reichle A. Anakoinosis: Correcting aberrant homeostasis of cancer tissue-going beyond apoptosis induction. Front. Oncol. 2019;9:1408. doi: 10.3389/fonc.2019.01408.
4. Lahiri C., Pawar S., Rohit Mishra R. Precision medicine and future of cancer treatment. Precis. Cancer Med. 2019;2:5167. doi: 10.21037/pcm.2019.09.01.
5. Whirl-Carrillo M., McDonagh E.M., Hebert J.M., Gong L., Sangkuhl K., Thorn C.F., Altman R.B., Klein T.E. Pharmacogenomics knowledge for personalized medicine. Clin. Pharmacol. Ther. 2012;92:414–417. doi: 10.1038/clpt.2012.96.