An autoencoder learning method for predicting breast cancer subtypes

PLoS One. 2025 Jul 23;20(7):e0327773. doi: 10.1371/journal.pone.0327773. eCollection 2025.

Zahra Rostami¹, Kavitha Mukund², Maryam Masnadi-Shirazi³, Shankar Subramaniam¹ ² ⁴

Affiliations

¹ Department of Computer Science and Engineering, University of California San Diego, San Diego, California, United States of America.
² Department of Bioengineering, University of California San Diego, San Diego, California, United States of America.
³ Amazon, Seattle, Washington, United States of America.
⁴ Department of Cellular and Molecular Medicine, University of California San Diego, San Diego, California, United States of America.

PMID: 40700427
PMCID: PMC12286384
DOI: 10.1371/journal.pone.0327773

Zahra Rostami et al. PLoS One. 2025.


Abstract

Heterogeneity of breast cancer poses several challenges for detection and treatment. With next-generation sequencing, we can now map the transcriptional profile of each patient's breast tissue, which has the potential to identify and characterize cancer subtypes. However, the high dimensionality of these transcriptomic data and the heterogeneity among the molecular profiles of breast cancers pose a barrier to identifying minimal markers and mechanistic consequences. In this study, we develop an autoencoder to identify a reduced set of gene markers that characterize the four major breast cancer subtypes with an accuracy of 82.38%. The reduced feature space created by our model captures the functional characteristics of each breast cancer subtype, highlighting mechanisms that are unique to each subtype as well as those that are shared. Our high prediction accuracy shows that our markers can be valuable for breast cancer subtype detection and have the potential to provide insights into mechanisms associated with each subtype.

Copyright: © 2025 Rostami et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

**Fig 1. Schematic of our autoencoder-based feature extraction framework.**
a) We collect data and categorize it into the four subtypes of breast cancer (BC): TNBC, HER2-enriched, luminal A, and luminal B. The gene expression data for each subtype are passed to the autoencoder one subtype at a time. b) For each subtype, we perform autoencoder-based feature inference and select the 3000 genes with the highest feature scores according to our algorithm. c) For each subtype, we run the autoencoder 30 times with a fixed seed and take the overlap of the resulting gene sets (each containing 3000 genes). We repeat this whole step with 16 different random seeds. d) Taking the overlap of the results from all 16 seeds for each subtype, we identify 704, 900, 158, and 865 marker genes for TNBC, HER2-enriched, luminal A, and luminal B cancers, respectively. e) The marker genes are examined via two routes: first, multiclass classification using a random forest (RF); second, assessment of the protein-protein interactions (PPI) of the marker genes for each subtype separately. The latter involves MCL clustering of the PPI network derived from the marker genes of each cancer to capture distinct functional modules associated with each BC subtype.
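
The overlap steps in panels c and d amount to repeated set intersection, which can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the `toy_scores` stand-in for autoencoder-derived feature scores, and the way run-to-run variation is modeled are all assumptions, and the toy example uses a handful of made-up gene labels rather than real expression data.

```python
import random


def top_k_genes(scores, k):
    """Return the set of the k genes with the highest scores."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def stable_markers(score_fn, genes, seeds, runs_per_seed, k):
    """Keep only genes that reach the top k in every run of every seed."""
    per_seed = []
    for seed in seeds:
        run_sets = [
            top_k_genes(score_fn(genes, seed, run), k)
            for run in range(runs_per_seed)
        ]
        # Overlap across the repeated runs for this seed (panel c).
        per_seed.append(set.intersection(*run_sets))
    # Overlap across all seeds (panel d).
    return set.intersection(*per_seed)


def toy_scores(genes, seed, run):
    """Hypothetical stand-in for feature scores from a trained autoencoder.

    Labels starting with "MARKER" get a large score plus noise; all other
    labels get noise only. The run index is mixed into the random state
    purely to imitate variation between training runs (an assumption).
    """
    rng = random.Random(seed * 1000 + run)
    return {
        g: (10.0 if g.startswith("MARKER") else 0.0) + rng.random()
        for g in genes
    }


genes = ["MARKER1", "MARKER2", "MARKER3"] + [f"G{i}" for i in range(20)]
markers = stable_markers(toy_scores, genes, seeds=range(16), runs_per_seed=30, k=3)
print(sorted(markers))
```

With the settings in the caption, `k` would be 3000, `runs_per_seed` 30, and `seeds` a collection of 16 values. Intersection is a strict criterion: a gene that misses the top `k` in a single run is dropped, which is one reason the final marker sets are much smaller than 3000.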

**Fig 2. The loss curve representing the performance of the autoencoder for each BC subtype.**
a) TNBC, b) HER2-enriched, c) luminal A, d) luminal B. The x-axis shows the number of epochs. The curves demonstrate that the model fits the data well, does not overfit, and stays consistent throughout training.

**Fig 3. Receiver operating characteristic curves showing the performance of the model.**

**Fig 4. Comparing functional profile of the four BC subtypes using the genes derived from our model.**
The plot shows the 18 most significant gene ontology terms. The node size represents the ratio $k/n$, where $n$ is the size of the list of genes corresponding to a subtype (this number is shown in parentheses on the x-axis) and $k$ is the number of genes within that list that are annotated to the node. The dot colors indicate the adjusted p-values.
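
As a small worked example of the $k/n$ ratio defined in this caption, the sketch below computes it for one subtype gene list and one annotation term. The function name and the gene labels are hypothetical placeholders, not values from the study.

```python
def gene_ratio(subtype_genes, term_genes):
    """Return k / n for one subtype gene list and one annotation term.

    n is the number of distinct genes in the subtype list and k is the
    number of those genes that are annotated to the term.
    """
    subtype = set(subtype_genes)
    n = len(subtype)
    if n == 0:
        raise ValueError("subtype gene list is empty")
    k = len(subtype & set(term_genes))
    return k / n


# Made-up labels: 4 genes in the subtype list, 2 of them annotated to the term.
subtype_list = ["geneA", "geneB", "geneC", "geneD"]
term_annotations = ["geneB", "geneD", "geneX"]
ratio = gene_ratio(subtype_list, term_annotations)
print(ratio)  # 2 / 4 = 0.5
```

Because $n$ differs between subtypes (for example, 158 marker genes for luminal A versus 900 for HER2-enriched, per Fig 1), the same $k$ gives a different node size in each column of the plot.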

**Fig 5. Functional modules and super-modules associated with different BC subtypes.**
The genes associated with TNBC, HER2-enriched, luminal A, and luminal B cancers are shown in green, blue, orange, and pink, respectively. a) Autophagy super-module (CAMKK2, shown in gray, is shared by TNBC, HER2-enriched, and luminal B cancers), b) Ciliary trafficking machinery and cilium assembly super-module, c) Chromatin organization and remodeling super-module, d) NF-kappa B signaling module, e) Centrosome localization and biogenesis module.
