BMC Bioinformatics. 2023 Dec 21;24(1):490.
doi: 10.1186/s12859-023-05595-4.

PWSC: a novel clustering method based on polynomial weight-adjusted sparse clustering for sparse biomedical data and its application in cancer subtyping

Xiaomeng Zhang et al. BMC Bioinformatics.

Abstract

Background: Clustering analysis is widely used to interpret biomedical data and uncover new knowledge and patterns. However, conventional clustering methods are not effective when dealing with sparse biomedical data. To overcome this limitation, we propose a hierarchical clustering method called polynomial weight-adjusted sparse clustering (PWSC).

Results: The PWSC algorithm adjusts feature weights using a polynomial function, redefines the distances between samples, and performs hierarchical clustering based on these adjusted distances. Additionally, we incorporate a consensus clustering approach to determine the optimal number of clusters. This approach uses the relative change in the cumulative distribution function to identify the best number of clusters, resulting in more stable clustering results. Using the PWSC algorithm, we successfully classified a cohort of gastric cancer patients, enabling categorization of patients carrying different types of altered genes. Further evaluation using entropy showed a significant improvement (p = 2.905e-05), while the Calinski-Harabasz index (CHI) showed a 100% improvement in the quality of the best classification compared with conventional algorithms. Similarly, significantly increased entropy (p = 0.0336) and comparable CHI were observed when classifying a colorectal cancer cohort using microbial abundance. These applications in cancer subtyping demonstrate that PWSC is highly applicable to different types of biomedical data. To facilitate its application, we have developed a user-friendly tool that implements the PWSC algorithm, which can be accessed at http://pwsc.aiyimed.com/.
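The three steps named in the Results (polynomial feature weighting, redefined sample distances, hierarchical clustering) can be illustrated with a minimal sketch. The abstract does not give the polynomial, so the weight used here, (1 - f_j)^degree with f_j the fraction of samples in which feature j is non-zero, is purely an assumption for illustration, as are the function names, the weighted Euclidean distance, and the choice of average linkage. It should not be read as the authors' implementation.

```python
from itertools import combinations
from math import sqrt


def polynomial_weights(rows, degree=2):
    """Hypothetical weights: (1 - nonzero_fraction) ** degree per feature.

    This polynomial is an assumption; the abstract does not state the formula.
    """
    n = len(rows)
    weights = []
    for column in zip(*rows):
        freq = sum(1 for v in column if v != 0) / n
        weights.append((1.0 - freq) ** degree)
    return weights


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two samples."""
    return sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def hierarchical_clusters(rows, n_clusters, degree=2):
    """Average-linkage agglomerative clustering on the weighted distances.

    Returns a sorted list of clusters, each a sorted list of row indices.
    """
    weights = polynomial_weights(rows, degree)
    dist = {
        (i, j): weighted_distance(rows[i], rows[j], weights)
        for i, j in combinations(range(len(rows)), 2)
    }
    clusters = [[i] for i in range(len(rows))]

    def linkage(c1, c2):
        # Average pairwise distance between members of the two clusters.
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    # Repeatedly merge the closest pair until n_clusters remain.
    while len(clusters) > n_clusters:
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = sorted(clusters[a] + clusters[b])
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return sorted(clusters)
```

Called on a small binary matrix (rows as samples, columns as features such as mutated genes), `hierarchical_clusters(rows, 2)` returns two lists of sample indices. The naive merge loop is written for readability rather than speed; a real analysis would more likely rely on an established clustering library.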

Conclusions: PWSC addresses the limitations of conventional approaches when clustering sparse biomedical data. By adjusting feature weights and employing consensus clustering, we achieve improved clustering results compared to conventional methods. The PWSC algorithm provides a valuable tool for researchers in the field, enabling more accurate and stable clustering analysis. Its application can enhance our understanding of complex biological systems and contribute to advancements in various biomedical disciplines.
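The consensus clustering step, which relies on the relative change in the cumulative distribution function (CDF), can be sketched as follows. This is a generic illustration under stated assumptions, not the authors' code: the function names are hypothetical, each resampling run is assumed to be a dict from sample index to cluster label, and the relative change is computed between consecutive values of k, which is one common convention in consensus clustering. How the final k is chosen from these values is not specified in the abstract and is left out.

```python
def consensus_matrix(n_samples, runs):
    """Fraction of runs in which each pair of samples shares a cluster.

    Each run is a dict {sample_index: cluster_label} covering only the
    samples drawn in that run (an assumed input format).
    """
    together = [[0] * n_samples for _ in range(n_samples)]
    sampled = [[0] * n_samples for _ in range(n_samples)]
    for labels in runs:
        for i in labels:
            for j in labels:
                sampled[i][j] += 1
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
         for j in range(n_samples)]
        for i in range(n_samples)
    ]


def cdf_area(consensus):
    """Area under the empirical CDF of the upper-triangle consensus values."""
    n = len(consensus)
    values = [consensus[i][j] for i in range(n) for j in range(i + 1, n)]
    # For values in [0, 1], the integral of the empirical CDF over [0, 1]
    # equals 1 - mean(values).
    return 1.0 - sum(values) / len(values)


def relative_area_change(areas_by_k):
    """Relative change in CDF area between consecutive cluster counts.

    areas_by_k maps k -> area. The smallest k keeps its raw area; every
    later k gets (area_k - area_prev) / area_prev.
    """
    ks = sorted(areas_by_k)
    change = {ks[0]: areas_by_k[ks[0]]}
    for prev, k in zip(ks, ks[1:]):
        change[k] = (areas_by_k[k] - areas_by_k[prev]) / areas_by_k[prev]
    return change
```

In use, one consensus matrix would be built per candidate k (for example k from 2 to 15, as in the figures), `cdf_area` applied to each, and the resulting dictionary passed to `relative_area_change` to obtain the curve from which the number of clusters is read.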

Keywords: Consensus clustering; Hierarchical clustering; Polynomial weight; Sparse biomedical data.

Conflict of interest statement

The authors declare no competing interests.

Figures

Algorithm 1. PWSC (Polynomial Weight-adjusted Sparse Clustering)
Fig. 1. Framework of PWSC
Fig. 2. Clustering results and assessing coefficients. a The clustering heatmap of biomedical data. b The entropy of PWSC and the conventional algorithm for k from 2 to 15. c The CHI of PWSC and the conventional algorithm for k from 2 to 15
Fig. 3. Optimal clustering and occurrence of genes for the gastric cancer cohort. a The consensus clustering CDF curve for k from 2 to 15. b The consensus matrix when k is 4. c The clustering heatmap of gene mutations for four groups. d The most valuable genes with high occurrence in each group
Fig. 4. Subtyping of the colorectal cancer cohort using tumor microbial data. a The consensus clustering CDF curve for k from 2 to 15. b The consensus matrix when k is 4. c The clustering heatmap of microbial data for five groups. d The most valuable biomarkers with high occurrence in each group. e The entropy of PWSC and the conventional algorithm for k from 2 to 15. f The CHI of PWSC and the conventional algorithm for k from 2 to 15
Fig. 5. Online web service interface presentation
