Brief Bioinform. 2024 Nov 22;26(1):bbae628.
doi: 10.1093/bib/bbae628.

Comprehensive bioinformatics and machine learning analyses for breast cancer staging using TCGA dataset

Saurav Chandra Das et al. Brief Bioinform.

Abstract

Breast cancer is a major global health concern and comprises a diverse set of diseases with distinct molecular characteristics. Combining sophisticated computational methods with extensive biological datasets has emerged as an effective strategy for unravelling complex patterns in oncology. This research addresses breast cancer staging, classification, and diagnosis using the comprehensive dataset provided by The Cancer Genome Atlas (TCGA). By integrating machine learning algorithms with bioinformatics analysis, it introduces a methodology for identifying complex molecular signatures associated with different subtypes and stages of breast cancer. The study uses TCGA gene expression data to detect and categorize breast cancer by applying machine learning and systems biology techniques. Differentially expressed genes in breast cancer were identified and analyzed using signaling pathways, protein-protein interactions, and regulatory networks to uncover potential therapeutic targets. Based on these analyses, the study also highlights specific proteins (MYH2, MYL1, MYL2, MYH7) and microRNAs (such as hsa-let-7d-5p) as potential biomarkers of cancer progression. For cancer staging, the random forest method achieved an accuracy of 97.19%, while the XGBoost algorithm attained 95.23%. The study thus combines bioinformatics and machine learning to find potential biomarkers that influence the progression of breast cancer. The combination of sophisticated analytical methods and extensive genomic datasets presents a promising path for expanding our understanding and improving clinical outcomes in identifying and categorizing this complex disease.

Keywords: TCGA; breast cancer; cancer staging; machine learning; ontology; transcription factors.
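
The differential-expression step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the cutoffs given in the Figure 2 caption (P-value < 0.05 and an absolute LogFC above 1), and it assumes the downregulation cutoff is the negative counterpart of the upregulation cutoff. The `GeneStat` type, the `classify_deg` helper, and the gene names and values are hypothetical and serve only as illustration; they are not the authors' pipeline or data.

```python
from typing import NamedTuple


class GeneStat(NamedTuple):
    gene: str
    log_fc: float   # log fold-change, tumor vs. normal
    p_value: float


def classify_deg(stat: GeneStat, lfc_cutoff: float = 1.0, p_cutoff: float = 0.05) -> str:
    """Label a gene 'up', 'down', or 'ns' (not significant)."""
    if stat.p_value >= p_cutoff:
        return "ns"
    if stat.log_fc > lfc_cutoff:
        return "up"
    if stat.log_fc < -lfc_cutoff:  # assumed symmetric cutoff
        return "down"
    return "ns"


# Hypothetical values for illustration only; not results from the study.
stats = [
    GeneStat("GENE_A", 2.4, 0.001),
    GeneStat("GENE_B", -1.8, 0.01),
    GeneStat("GENE_C", 0.3, 0.0005),
    GeneStat("GENE_D", 3.1, 0.2),
]
labels = {s.gene: classify_deg(s) for s in stats}
print(labels)
```

A gene with a small P-value but a small fold-change (GENE_C) and a gene with a large fold-change but a large P-value (GENE_D) are both left unlabeled, which is why both cutoffs appear in a volcano plot.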

Figures

Figure 1
Working flowchart of the analytical study performed in this research.
Figure 2
Volcano plot of DEGs. The DEGs are obtained based on criteria of log fold-change (LogFC) < -1 for downregulated genes and LogFC > 1 for upregulated genes, with a P-value < 0.05.
Figure 3
The PPI network of the top 10 upregulated and top 10 downregulated genes of BRCA. The larger circles in different colors represent the top 4 hub proteins.
Figure 4
miRNA–gene interaction regulatory network. Target regulatory molecules are represented by square nodes, while associated genes are represented by circular nodes.
Figure 5
TF–gene interaction regulatory network. Square nodes indicate target regulatory molecules (TFs), and circular nodes represent the associated DEGs.
Figure 6
Combined protein–drug and protein–chemical interaction network. Pentangle nodes indicate chemical compounds and rhombus nodes indicate drug regulatory molecules.
Figure 7
Overall survival analysis for the genes ACTL8, CGA, IBSP, and MUC2.
Figure 8
Precision–recall curve of machine learning models RF, GNB, KNN, and XGB (XGBoost).
Figure 9
ROC curve of machine learning models RF, GNB, KNN, and XGB.
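
For readers unfamiliar with the curves in Figures 8 and 9, the following is a minimal sketch of how ROC points, the area under the ROC curve, and a precision/recall pair are derived from classifier scores. It treats a single binary decision (for example, one stage versus the rest, which is an assumption here, since the page does not say how the multi-stage problem was scored). The function names, labels, and scores are hypothetical, and this is not the authors' evaluation code.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by sweeping the decision threshold."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def precision_recall(y_true, scores, threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, t in zip(pred, y_true) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, y_true) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, y_true) if p == 0 and t == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels and scores for illustration only.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
roc_auc = auc(roc_points(y_true, scores))
pr_at_half = precision_recall(y_true, scores, 0.5)
print(round(roc_auc, 4), pr_at_half)
```

Repeating `precision_recall` over all thresholds traces a precision–recall curve in the same way that `roc_points` traces the ROC curve.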
