Brief Bioinform. 2021 Jul 20;22(4):bbaa291.

doi: 10.1093/bib/bbaa291.

Supervised clustering of high-dimensional data using regularized mixture modeling

Wennan Chang¹, Changlin Wan¹, Yong Zang², Chi Zhang³, Sha Cao²

Affiliations

¹ Department of Electrical and Computer Engineering, Purdue University.
² Department of Biostatistics and a member of the Center for Computational Biology and Bioinformatics, Indiana University School of Medicine.
³ Department of Medical and Molecular Genetics and a member of the Center for Computational Biology and Bioinformatics, Indiana University School of Medicine.

PMID: 34293851
PMCID: PMC8294591
DOI: 10.1093/bib/bbaa291

Wennan Chang et al. Brief Bioinform. 2021.

Abstract

Identifying relationships between genetic variations and their clinical presentations has been challenged by the heterogeneous causes of a disease. It is imperative to unveil the relationship between the high-dimensional genetic manifestations and the clinical presentations while taking into account the possible heterogeneity of the study subjects. We proposed a novel supervised clustering algorithm based on a penalized mixture regression model, called component-wise sparse mixture regression (CSMR), to address the challenges of studying the heterogeneous relationships between high-dimensional genetic features and a phenotype. The algorithm was adapted from the classification expectation maximization algorithm and offers a novel supervised solution to the clustering problem, with substantial improvement in both computational efficiency and biological interpretability. Experimental evaluation on simulated benchmark datasets demonstrated that CSMR can accurately identify the subspaces on which a subset of features is explanatory of the response variables, and that it outperformed the baseline methods. Application of CSMR to a drug sensitivity dataset again demonstrated its superior performance over the other methods: CSMR was powerful in recapitulating the distinct subgroups hidden in the pool of cell lines with regard to their coping mechanisms for different drugs. CSMR represents a big data analysis tool with the potential to resolve the complexity of translating the clinical presentations of a disease to the real causes underpinning it. We believe that it will bring new understanding to the molecular basis of a disease and could be of special relevance in the growing field of personalized medicine.

Keywords: disease heterogeneity; mixture modeling; supervised learning.
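The abstract names the ingredients of CSMR (a classification EM loop wrapped around a penalized mixture regression) but does not give the update rules. The block below is therefore only a minimal sketch of that general idea, not the authors' implementation: it alternates a sparse (lasso) regression fit per component with a hard reassignment of each sample to the component that explains its response best. The function names `lasso_cd` and `cem_sparse_mixture`, the fixed penalty `lam`, the fixed number of components `k`, the Gaussian noise model and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def soft_threshold(z, t):
    """Scalar soft-thresholding operator used by the lasso update."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate descent for (1/2n)||y - b0 - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    beta = np.zeros(p)
    intercept = y.mean()
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y - intercept                      # residual for the current fit
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            resid += X[:, j] * beta[j]         # drop feature j from the fit
            rho = X[:, j] @ resid / n
            beta[j] = soft_threshold(rho, lam) / col_sq[j]
            resid -= X[:, j] * beta[j]         # add the updated feature back
        shift = resid.mean()                   # unpenalized intercept update
        intercept += shift
        resid -= shift
    return intercept, beta


def cem_sparse_mixture(X, y, k=2, lam=0.05, n_iter=30, seed=0):
    """Hypothetical CEM loop: per-component lasso fit, then hard assignment."""
    rng = np.random.default_rng(seed)
    n = len(y)
    labels = rng.integers(k, size=n)           # random initial partition
    fits = []
    for _ in range(n_iter):
        fits, sigmas, weights = [], [], []
        for c in range(k):                     # M-step: one sparse fit per group
            idx = labels == c
            weights.append(max(idx.mean(), 1e-6))
            if idx.sum() < 2:                  # re-seed a collapsed component
                idx = rng.random(n) < 0.5
            b0, b = lasso_cd(X[idx], y[idx], lam)
            res = y[idx] - b0 - X[idx] @ b
            sigmas.append(max(res.std(), 1e-3))
            fits.append((b0, b))
        # C-step: assign each sample to its most likely component
        logp = np.column_stack([
            np.log(w) - np.log(s) - 0.5 * ((y - b0 - X @ b) / s) ** 2
            for (b0, b), s, w in zip(fits, sigmas, weights)
        ])
        new_labels = logp.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break                              # partition is stable
        labels = new_labels
    return labels, fits


if __name__ == "__main__":
    # Synthetic data (assumed): two hidden subgroups driven by different features.
    rng = np.random.default_rng(42)
    X = rng.normal(size=(200, 10))
    group = np.arange(200) % 2
    y = np.where(group == 0, 3.0 * X[:, 0], -3.0 * X[:, 1])
    y = y + 0.1 * rng.normal(size=200)
    labels, fits = cem_sparse_mixture(X, y, k=2, lam=0.05)
    for c, (b0, b) in enumerate(fits):
        print("component", c, "selected features:", np.flatnonzero(b))
```

Because the partition is initialized at random, a loop like this can stop at a poor local solution, so in practice one would likely try several seeds and choose `k` and `lam` by a model selection criterion; how CSMR itself handles these choices is not stated in the abstract.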

Figures

**Figure 1**
The motivation of CSMR. Under the same treatment, some patients acquired one mechanism to deal with the drug (blue), while others picked up another (pink), resulting in different prognoses for the same treatment. The motivation of CSMR is to cluster the patients in a supervised fashion and to examine which genes (yellow) are selected in tumor progression and lead to the different drug resistance subtypes of patients, and what their functions are (network).

**Figure 2**
Time consumption of CSMR, and ICC on simulation data for (left) and (right), and over 100 repetitions; error bars indicate standard deviations.

**Figure 3**
The distributions of the RMSE over 100 repetitions for the five methods, for the 24 drugs. The lower the RMSE value, the better the performance. ‘C’, ‘I’, ‘A’, ‘G’ and ‘F’ stand for ‘CSMR’, ‘ICC’, ‘LASSO’, ‘RIDGE’ and ‘Random Forest’, respectively.

**Figure 4**
For each drug, the Venn diagram of the selected genes for different mixing components is shown. The numbers show the size of overlap among the gene sets.

Comment in

Letter to the Editor: on the stability and internal consistency of component-wise sparse mixture regression-based clustering.
Zhang B, He J, Hu J, Koestler DC, Chalise P. Brief Bioinform. 2022 Jan 17;23(1):bbab532. doi: 10.1093/bib/bbab532. PMID: 34953466.
