PWSC: a novel clustering method based on polynomial weight-adjusted sparse clustering for sparse biomedical data and its application in cancer subtyping

Xiaomeng Zhang¹, Hongtao Zhang², Zhihao Wang², Xiaofei Ma², Jiancheng Luo³, Yingying Zhu⁴

Affiliations

¹ Department of Nephrology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei Province, China.
² School of Mathematics and Statistics, Wuhan University, Wuhan, 430070, Hubei Province, China.
³ School of Mathematics and Statistics, Wuhan University, Wuhan, 430070, Hubei Province, China. luojc@aiyi.link.
⁴ Department of Oncology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, 430030, Hubei Province, China. julianzyy@hotmail.com.

PMID: 38129803
PMCID: PMC10740247
DOI: 10.1186/s12859-023-05595-4

Xiaomeng Zhang et al. BMC Bioinformatics. 2023 Dec 21;24(1):490. doi: 10.1186/s12859-023-05595-4.

Abstract

Background: Clustering analysis is widely used to interpret biomedical data and uncover new knowledge and patterns. However, conventional clustering methods are not effective when dealing with sparse biomedical data. To overcome this limitation, we propose a hierarchical clustering method called polynomial weight-adjusted sparse clustering (PWSC).

Results: The PWSC algorithm adjusts feature weights using a polynomial function, redefines the distances between samples, and performs hierarchical clustering based on these adjusted distances. We also incorporate a consensus clustering approach to determine the optimal number of clusters. This approach uses the relative change in the cumulative distribution function (CDF) to identify the best number of clusters, which yields more stable clustering results. Using PWSC, we classified a cohort of gastric cancer patients into groups carrying different types of altered genes. Evaluation by entropy showed a significant improvement (p = 2.905e-05), and the Calinski-Harabasz index (CHI) showed a 100% improvement in the quality of the best classification compared to conventional algorithms. Similarly, significantly increased entropy (p = 0.0336) and comparable CHI were observed when classifying a separate colorectal cancer cohort using microbial abundance data. These cancer subtyping results indicate that PWSC is applicable to different types of biomedical data. To facilitate its application, we have developed a user-friendly tool that implements the PWSC algorithm, which can be accessed at http://pwsc.aiyimed.com/ .
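The weight-then-cluster idea described in the Results can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the polynomial, so the weight used here (a feature's non-zero frequency raised to a power `degree`) is hypothetical, as are the function names, the weighted Euclidean distance, and the choice of average linkage.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist


def polynomial_weights(X, degree=2):
    """Hypothetical polynomial weight: non-zero frequency of each feature ** degree.

    The exact polynomial used by PWSC is not stated in the abstract; this form
    is an assumption chosen only to make the sketch concrete.
    """
    freq = (X != 0).mean(axis=0)  # fraction of samples in which each feature occurs
    return freq ** degree


def weighted_hierarchical_clustering(X, k, degree=2):
    """Cluster samples (rows of X) into k groups using weight-adjusted distances."""
    w = polynomial_weights(X, degree)
    # Scaling each column by sqrt(w_j) makes ordinary Euclidean distance equal to
    # the weighted distance sqrt(sum_j w_j * (x_j - y_j)^2).
    Xw = X * np.sqrt(w)
    dists = pdist(Xw, metric="euclidean")
    Z = linkage(dists, method="average")  # assumed linkage; the source does not specify one
    return fcluster(Z, t=k, criterion="maxclust")


# Toy sparse binary matrix: 6 samples x 5 features (e.g. gene alteration flags).
X_toy = np.array([
    [1, 1, 0, 0, 0],
    [1, 1, 0, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 1],
])
labels = weighted_hierarchical_clustering(X_toy, k=2)
```

In this toy example the rare fifth feature receives a smaller weight than the four block-defining features, so it contributes less to the distance between samples. Whether PWSC down-weights or up-weights rare features is not stated in the abstract.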

Conclusions: PWSC addresses the limitations of conventional approaches when clustering sparse biomedical data. By adjusting feature weights and employing consensus clustering, we achieve improved clustering results compared to conventional methods. The PWSC algorithm provides a valuable tool for researchers in the field, enabling more accurate and stable clustering analysis. Its application can enhance our understanding of complex biological systems and contribute to advancements in various biomedical disciplines.
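The consensus clustering step, which selects the number of clusters from the relative change in the CDF, can be illustrated with a generic resampling sketch. It follows the commonly used consensus clustering recipe (subsample, cluster, record how often pairs of samples co-cluster, then compare the area under the CDF of consensus values across k). It is an assumption that PWSC uses this recipe; the abstract does not give its exact selection rule. Plain Euclidean average-linkage clustering stands in for the weight-adjusted step, and all function names and parameters are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def consensus_matrix(X, k, n_resamples=50, frac=0.8, seed=0):
    """Fraction of resamples in which each pair of samples lands in the same cluster."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    m = min(n, max(k + 1, int(round(frac * n))))  # subsample size
    for _ in range(n_resamples):
        idx = rng.choice(n, size=m, replace=False)
        Z = linkage(X[idx], method="average")  # Euclidean distance by default
        lab = fcluster(Z, t=k, criterion="maxclust")
        same = (lab[:, None] == lab[None, :]).astype(float)
        sampled[np.ix_(idx, idx)] += 1.0
        together[np.ix_(idx, idx)] += same
    # Pairs never sampled together get a consensus value of 0.
    return np.divide(together, sampled, out=np.zeros_like(together), where=sampled > 0)


def cdf_area(M, n_bins=100):
    """Area under the empirical CDF of the off-diagonal consensus values."""
    vals = M[np.triu_indices_from(M, k=1)]
    grid = np.linspace(0.0, 1.0, n_bins + 1)
    cdf = np.array([(vals <= g).mean() for g in grid])
    return float(np.sum((cdf[1:] + cdf[:-1]) / 2.0 * np.diff(grid)))  # trapezoid rule


def relative_area_change(X, ks):
    """Relative change in CDF area from one k to the next; the first k reports its raw area."""
    areas = [cdf_area(consensus_matrix(X, k)) for k in ks]
    deltas = [areas[0]]
    for prev, cur in zip(areas[:-1], areas[1:]):
        deltas.append((cur - prev) / prev)
    return dict(zip(ks, deltas))


# Toy data: two tight, well-separated groups of six samples each.
_rng = np.random.default_rng(1)
X_demo = np.vstack([_rng.normal(0.0, 0.1, (6, 3)), _rng.normal(5.0, 0.1, (6, 3))])
M_demo = consensus_matrix(X_demo, k=2)
deltas_demo = relative_area_change(X_demo, [2, 3, 4])
```

A common heuristic is to pick the k beyond which the relative change in area becomes small, which reflects that adding further clusters no longer makes the consensus matrix appreciably cleaner.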

Keywords: Consensus clustering; Hierarchical clustering; Polynomial weight; Sparse biomedical data.

Conflict of interest statement

The authors declare no competing interests.

Figures

**Algorithm 1**
**PWSC (Polynomial Weight-adjusted Sparse Clustering)**

**Fig. 2**
Clustering results and assessing coefficients. a The clustering heatmap of biomedical data. b The Entropy of PWSC and conventional algorithm when k is from 2 to 15. c The CHI of PWSC and conventional algorithm when k is from 2 to 15

**Fig. 3**
Optimal clustering and occurrence of genes for the gastric cancer cohort. a The consensus clustering CDF curve when k is from 2 to 15. b The consensus matrix when k is 4. c The clustering heatmap of gene mutations for four groups. d The most valuable genes with high occurrence in each group

**Fig. 4**
Subtyping of colorectal cancer cohort using tumor microbial data. a The consensus clustering CDF curve when k is from 2 to 15. b The consensus matrix when k is 4. c The clustering heatmap of microbial data for five groups. d The most valuable biomarkers with high occurrence in each group. e The Entropy of PWSC and conventional algorithm when k is from 2 to 15. f The CHI of PWSC and conventional algorithm when k is from 2 to 15

**Fig. 5**
Online web service interface presentation
