Comput Intell Neurosci. 2020 Oct 29;2020:4737969.
doi: 10.1155/2020/4737969. eCollection 2020.

RNA-Seq-Based Breast Cancer Subtypes Classification Using Machine Learning Approaches


Zhezhou Yu et al. Comput Intell Neurosci.

Abstract

Background: Breast invasive carcinoma (BRCA) is not a single disease, as each subtype has a distinct morphological structure. Although several computational methods have been proposed for breast cancer subtype identification, the specific interaction mechanisms of the genes involved in each subtype remain incompletely understood. Identifying and exploring these gene interaction mechanisms for each breast cancer subtype could have an important impact on personalized treatment for different patients.

Methods: We integrated the biological importance of genes, derived from gene regulatory networks, into the differential expression analysis to obtain weighted differentially expressed genes (weighted DEGs). A gene with a high weight regulates more target genes and therefore carries more biological importance. In addition, we constructed gene coexpression networks for the control and experiment groups; the significantly different interaction structures between them motivated us to design a corresponding Gene Ontology (GO) enrichment analysis based on gene coexpression networks (GOEGCN). GOEGCN considers a two-sided distinction analysis between the control and experiment coexpression networks, which allows us to study how modulated coexpressed gene pairs affect biological functions at the GO level.
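The weighting idea can be illustrated with a short Python sketch (the authors' own code is in R and is not reproduced here). It rests on stated assumptions: each gene's weight is taken to be its number of distinct targets (out-degree) in a regulator-target edge list, and DEGs are selected with hypothetical cutoffs of adjusted p < 0.05 and |log2 fold change| ≥ 1. This abstract does not say exactly how the weight is combined with the differential expression statistics, so the sketch simply attaches the weight to each DEG and ranks by it. The function name `weighted_degs` and the input formats are illustrative only.

```python
from collections import Counter


def weighted_degs(de_results, grn_edges, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Return (gene, weight) pairs for DEGs, highest weight first.

    de_results: dict gene -> (log2 fold change, adjusted p value)  [assumed format]
    grn_edges: iterable of (regulator, target) pairs from a gene regulatory network
    """
    # Weight = number of distinct targets a gene regulates (out-degree in the GRN).
    out_degree = Counter(regulator for regulator, _ in set(grn_edges))
    # Hypothetical DEG filter: significant and with a large enough fold change.
    degs = [
        gene
        for gene, (lfc, padj) in de_results.items()
        if padj < padj_cutoff and abs(lfc) >= lfc_cutoff
    ]
    # Genes that regulate nothing in the GRN receive weight 0.
    weighted = [(gene, out_degree.get(gene, 0)) for gene in degs]
    # Sort by weight (descending), breaking ties by gene name for determinism.
    weighted.sort(key=lambda item: (-item[1], item[0]))
    return weighted
```

Genes with no outgoing edges still pass through with weight 0 if they meet the expression cutoffs. Whether to drop them instead is a design choice this abstract does not settle.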

Results: We modeled binary classification with the weighted DEGs for each subtype. The binary classifiers made good predictions for unseen samples, and the experimental results validated the effectiveness of the proposed approaches. The novel GO terms enriched by GOEGCN for the control and experiment groups of each subtype partly explain the subtype-specific changes in biological function in terms of the two-sided distinction between coexpression network structures.
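The per-subtype binary setup (for example, Basal-like versus non-Basal-like, as in Figure 1) can be sketched as one-vs-rest relabeling followed by a classifier. This is a hypothetical Python illustration, not the authors' R code. The "nb" learner named in Figure 4 is stood in for by a minimal Gaussian naive Bayes written from scratch. Rows of `X` are assumed to be samples and columns the expression values of the weighted DEGs. The names `one_vs_rest_labels` and `GaussianNaiveBayes` are illustrative.

```python
import math


def one_vs_rest_labels(subtypes, positive):
    """1 for samples of the target subtype, 0 for every other subtype."""
    return [1 if s == positive else 0 for s in subtypes]


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes for dense numeric features."""

    def __init__(self, var_smoothing=1e-9):
        self.var_smoothing = var_smoothing  # added to variances to avoid division by zero

    def fit(self, X, y):
        self.classes = sorted(set(y))
        n_features = len(X[0])
        self.stats = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            means = [sum(r[j] for r in rows) / len(rows) for j in range(n_features)]
            variances = [
                sum((r[j] - means[j]) ** 2 for r in rows) / len(rows) + self.var_smoothing
                for j in range(n_features)
            ]
            self.stats[c] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def _log_joint(self, x, c):
        log_prior, means, variances = self.stats[c]
        # Features are treated as independent given the class (the "naive" assumption).
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances)
        )

    def predict_proba(self, x):
        """Posterior probability of class 1; usable as a score for ROC analysis."""
        logs = {c: self._log_joint(x, c) for c in self.classes}
        top = max(logs.values())  # subtract the max before exp for numerical stability
        weights = {c: math.exp(v - top) for c, v in logs.items()}
        return weights.get(1, 0.0) / sum(weights.values())

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0
```

Repeating the relabel-and-fit step once per subtype gives five binary classifiers. The "rf" and "svmRadial" learners would replace the classifier while leaving the relabeling unchanged.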

Conclusion: The weighted DEGs incorporate biological importance derived from the gene regulatory network. Based on the weighted DEGs, five binary classifiers were learned and showed good performance on the "Sensitivity," "Specificity," "Accuracy," "F1," and "AUC" metrics. GOEGCN with weighted DEGs for the control and experiment groups produced novel GO enrichment results, and the newly enriched GO terms may further reveal changes in specific biological functions across the BRCA subtypes. The R code for this research is available at https://github.com/yxchspring/GOEGCN_BRCA_Subtypes.
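The four threshold-based metrics listed above follow from the confusion matrix of a binary classifier. The following Python sketch shows the standard definitions, assuming 0/1 labels where 1 marks the target subtype. The function name `binary_metrics` is illustrative. AUC needs continuous scores rather than hard 0/1 predictions, so it is not computed here.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy and F1 for 0/1 labels (1 = target subtype)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall, true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    accuracy = (tp + tn) / len(pairs)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"Sensitivity": sensitivity, "Specificity": specificity,
            "Accuracy": accuracy, "F1": f1}
```

For example, with true labels `[1, 1, 1, 0, 0, 0, 0, 0]` and predictions `[1, 1, 0, 0, 0, 0, 1, 0]`, there are 2 true positives, 1 false negative, 4 true negatives and 1 false positive. This gives a sensitivity of 2/3, a specificity of 0.8, an accuracy of 0.75 and an F1 of 2/3.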

Conflict of interest statement

The authors declare no conflicts of interest.

Figures

Figure 1
Heatmap for Basal-like and non-Basal-like groups. The left group 1 represents the Basal-like group and the right group 2 denotes the non-Basal-like group.
Figure 2
Flowchart for discovering the interaction network structures of the control and experiment groups. (a) Construct the gene coexpression networks using the Pearson correlation coefficient (PCC). Bold edges denote higher PCC values and thin edges lower PCC values. (b) Prune the networks by removing edges whose PCC values are below the cutoff. (c) Represent the pruned networks from step (b) as symmetric matrices. (d) Remove the network structures shared by the control and experiment groups, and focus only on the group-specific structures in the upper triangular matrix of each group, since the matrices are symmetric.
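Steps (a) to (d) above can be sketched in plain Python. The following assumptions are not taken from the paper. Expression is given as a dictionary of per-gene values across one group's samples. The default cutoff of 0.8 is an arbitrary placeholder. Pruning compares the signed PCC against the cutoff, as the caption states, although many pipelines use |PCC| instead. A gene with a constant profile is treated as uncorrelated. Storing each undirected edge once as a sorted gene pair is equivalent to keeping the upper triangle of the symmetric matrix. All function names are illustrative.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # constant profile: treated as uncorrelated (an assumption)
    return sxy / math.sqrt(sxx * syy)


def pruned_edges(expr, cutoff):
    """Steps (a) to (c) for ONE group: PCC for every gene pair, keep PCC >= cutoff.

    expr: dict gene -> expression values across the group's samples.
    Each undirected edge is stored once as a sorted (gene_i, gene_j) tuple.
    """
    return {
        (g1, g2)
        for g1, g2 in combinations(sorted(expr), 2)
        if pearson(expr[g1], expr[g2]) >= cutoff
    }


def group_specific_edges(expr_control, expr_experiment, cutoff=0.8):
    """Step (d): drop edges shared by both networks, keep each group's own edges."""
    control = pruned_edges(expr_control, cutoff)
    experiment = pruned_edges(expr_experiment, cutoff)
    return control - experiment, experiment - control
```

Set difference removes exactly the shared edges, so the two returned sets are the control-specific and experiment-specific structures.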
Figure 3
Framework of our proposed algorithm.
Figure 4
ROC curves for each subtype using three machine learning approaches: naive Bayes (nb), random forest (rf), and a support vector machine with a radial basis kernel (svmRadial). The area under the curve (AUC) is used to assess the performance of each binary classifier. (a) Basal-like. (b) Her2. (c) LumA. (d) LumB. (e) Normal-like.
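Each ROC curve in Figure 4 is traced by sweeping a decision threshold over a classifier's scores. The following minimal Python sketch assumes 0/1 labels with at least one sample of each class, and that a higher score means "more likely the target subtype". Samples with tied scores cross the threshold together, so ties produce a diagonal segment. The AUC is the trapezoidal area under the resulting points. The function names are illustrative.

```python
def roc_curve(y_true, scores):
    """Return (FPR, TPR) points, lowering the threshold through every distinct score."""
    pos = sum(y_true)
    neg = len(y_true) - pos  # both must be > 0
    ranked = sorted(zip(scores, y_true), key=lambda p: -p[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Move every sample tied at this score across the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))
```

The trapezoidal AUC equals the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half.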
Figure 5
Flowchart for conducting the GOEGCN analysis using weighted DEGs. (a) First, the initial enriched GO terms are obtained by GO enrichment analysis. Then, for the control or experiment group, a symmetric coexpression sub-matrix is constructed from the genes listed in the “geneID” field of each GO term, with the interaction network structures of that group's original symmetric coexpression matrix serving as the background. (b) Equations (2) and (3) are used to recalculate the p values for the control and experiment groups, respectively. (c) The recalculated enriched GO terms are collected and reordered to form the final enriched GO term lists for the control and experiment groups, respectively.
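Equations (2) and (3) are not reproduced in this abstract, so the following is not the paper's formula. It is a hypothetical Python sketch of one way to carry out steps (b) and (c). A one-sided hypergeometric test asks whether the gene pairs inside a GO term's sub-matrix contain more group edges than expected, given the edge density of that group's full coexpression matrix (the background). The input format, with edges as sorted gene-pair tuples, and all function names are assumptions. `math.comb` requires Python 3.8 or later.

```python
from itertools import combinations
from math import comb


def hypergeom_sf(k, M, K, n):
    """P(X >= k) for X ~ Hypergeometric(population M, K successes, n draws)."""
    return sum(comb(K, i) * comb(M - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(M, n)


def term_edge_pvalue(term_genes, background_genes, group_edges):
    """Hypothetical step (b): test one GO term's sub-matrix against the background.

    term_genes: genes in the term's "geneID" list
    background_genes: all genes in the group's coexpression matrix
    group_edges: set of sorted (gene_i, gene_j) tuples for this group (upper triangle)
    """
    background = set(background_genes)
    genes = sorted(set(term_genes) & background)
    M = comb(len(background), 2)  # all gene pairs in the background matrix
    K = len(group_edges)          # pairs that are edges in the background
    n = comb(len(genes), 2)       # pairs inside the GO term's sub-matrix
    k = sum(1 for pair in combinations(genes, 2) if pair in group_edges)
    return hypergeom_sf(k, M, K, n)


def reorder_terms(term_to_genes, background_genes, group_edges):
    """Hypothetical step (c): recompute every term's p value and sort ascending."""
    pvals = {term: term_edge_pvalue(genes, background_genes, group_edges)
             for term, genes in term_to_genes.items()}
    return sorted(pvals.items(), key=lambda item: item[1])
```

Running `reorder_terms` once with the control group's edges and once with the experiment group's edges yields the two separate ranked lists described in step (c).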
