Dynamic Meta-data Network Sparse PCA for Cancer Subtype Biomarker Screening

Rui Miao¹, Xin Dong¹, Xiao-Ying Liu², Sio-Long Lo¹, Xin-Yue Mei¹, Qi Dang¹, Jie Cai¹, Shao Li³, Kuo Yang³, Sheng-Li Xie⁴, Yong Liang⁵

Affiliations

¹ Institute of Systems Engineering, Macau University of Science and Technology, Avenida Wai Long, Taipa, China.
² Computer Engineering Technical College, Guangdong Polytechnic of Science and Technology, Zhuhai, China.
³ MOE Key Laboratory of Bioinformatics, TCM-X Center/Bioinformatics Division, BNRIST/Department of Automation, Tsinghua University, Beijing, China.
⁴ Guangdong-HongKong-Macao Joint Laboratory for Smart Discrete Manufacturing, Guangzhou, China.
⁵ Peng Cheng Laboratory, Shenzhen, China.

PMID: 35711917
PMCID: PMC9197542
DOI: 10.3389/fgene.2022.869906

Rui Miao et al. Front Genet. 2022 May 9;13:869906. doi: 10.3389/fgene.2022.869906. eCollection 2022.

Abstract

Previous research shows that each type of cancer can be divided into multiple subtypes, which is one of the key reasons cancer is difficult to cure. Finding new target genes for cancer subtypes is therefore of great significance for developing new anti-cancer drugs and personalized treatments. Because cancer gene expression data sets are usually high-dimensional and noisy and contain information on multiple potential subtypes, many sparse principal component analysis (sparse PCA) methods have been used to identify cancer subtype biomarkers and subtype clusters. However, existing sparse PCA methods do not use known cancer subtype information as prior knowledge, and their results are strongly affected by sample quality. We therefore propose the Dynamic Metadata Edge-group Sparse PCA (DM-ESPCA) model, which draws on the idea of meta-learning to address the sample quality problem and uses known cancer subtype information as prior knowledge to capture gene modules with better biological interpretations. Experimental results on three biological data sets showed that the DM-ESPCA model can find potential target gene probes that carry richer biological information about the cancer subtypes. Moreover, clustering and machine learning classification models based on the target genes screened by the DM-ESPCA model improved accuracy by up to 22-23% compared with existing sparse PCA methods. We also showed that the DM-ESPCA model outperforms four classic supervised machine learning models in the task of classifying cancer subtypes.

Keywords: Cancer subtype; DM-ESPCA model; biomarkers; dynamic network; meta-data; sparse PCA.
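
To make the idea of sparse PCA as a gene-screening tool concrete, the following is a minimal sketch of a generic rank-one sparse PCA (a truncated power iteration that keeps only the `k` largest loadings). It is not the DM-ESPCA model: it uses no pathway network, no meta-data, and no subtype prior. The function name `sparse_loading`, the synthetic expression matrix, and all parameter values are assumptions made purely for illustration.

```python
import numpy as np

def sparse_loading(X, k, n_iter=100):
    """Rank-one sparse PCA sketch: unit-norm loading with at most k non-zeros.

    X is a (samples x genes) matrix. This is a generic truncated power
    iteration, assumed here for illustration; it is not DM-ESPCA.
    """
    Xc = X - X.mean(axis=0)                  # center each gene
    cov = Xc.T @ Xc / (X.shape[0] - 1)       # sample covariance (genes x genes)
    # Start from the ordinary (dense) leading principal component.
    _, eigvecs = np.linalg.eigh(cov)         # eigenvalues come back ascending
    v = eigvecs[:, -1]
    for _ in range(n_iter):
        w = cov @ v                          # power-iteration step
        keep = np.argsort(np.abs(w))[-k:]    # indices of the k largest |loadings|
        v_new = np.zeros_like(w)
        v_new[keep] = w[keep]                # zero out every other gene
        v_new /= np.linalg.norm(v_new)
        converged = np.allclose(v_new, v)
        v = v_new
        if converged:
            break
    return v

# Synthetic data (hypothetical): 60 samples, 200 genes, two sample groups.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1], 30)
X = rng.standard_normal((60, 200))
X[:, :10] += 3.0 * (2 * labels - 1)[:, None]   # genes 0-9 separate the groups

loading = sparse_loading(X, k=10)
selected = np.flatnonzero(loading)             # indices of the screened genes
print(selected)
```

In this toy setting the non-zero loadings mark the genes that drive the dominant axis of variation, which is the sense in which sparse PCA "screens" candidate biomarkers. The abstract's point is that such unsupervised selection ignores known subtype labels and is sensitive to sample quality, which is what DM-ESPCA is proposed to address.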

Conflict of interest statement

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figures

**FIGURE 1**
Flow chart of the DM-ESPCA model. **(A)** The DM-ESPCA model requires input gene expression and pathway data. **(B)** The DM-ESPCA model selects meta-data by clustering all samples. **(C)** Workflow of the DM-ESPCA model to screen targeted genes. The DM-ESPCA model will generate a dynamic gene network for each subtype. **(D)** Finally, this model will output the screened genes.

**FIGURE 2**
Algorithm of the DM-ESPCA model.

**FIGURE 3**
Heat maps of the DM-ESPCA model. **(A)** Result of the BCI data set. **(B)** Result of the BCII data set. **(C)** Result of the GC data set. The rows are the gene probes; different color blocks of rows indicate genes selected by different PC loadings. The columns are the samples. The color of each block in the heat maps is the expression value of the gene.

**FIGURE 4**
Pathway numbers with screened genes of GO, KEGG, and Reactome in the bio-enrichment analysis; **(A)** number of pathways in the BCI data set; **(B)** number of pathways in the BCII data set; **(C)** number of pathways in the GC data set. The blue bar is the DM-ESPCA model, the orange bar is the ESPCA model, and the gray one is the SPCA model.

**FIGURE 5**
Results of the DisGeNET dataset and PPI pathways of the Basal subtype in the BCI dataset; **(A)** relationship between diseases and the genes selected by the DM-ESPCA model for the Basal subtype in the BCI dataset. The blue bar shows the z-score of each gene. Data collected from the DisGeNET dataset. **(B)** Key PPI pathways of some of the genes selected by the DM-ESPCA model.

**FIGURE 6**
Functional pathways collected from the BCI data set Luminal A subtype; **(A)** results of GO-BP in the DM-ESPCA model; **(B)** results of GO-BP in the ESPCA model; and **(C)** results of GO-BP in the SPCA model.

**FIGURE 7**
Boxplots and comprehensive classification indicators of the BCI data set; **(A)** p-values of selected genes in all subtypes. **(B)** Results of KNN using the genes selected by the three sparse PCA methods and using all genes.
