Brief Bioinform. 2022 Jan 17;23(1):bbab532.
doi: 10.1093/bib/bbab532.

Letter to the Editor: on the stability and internal consistency of component-wise sparse mixture regression-based clustering

Bo Zhang et al. Brief Bioinform. 2022.

Abstract

Understanding the relationship between molecular markers and a phenotype of interest is often obscured by patient-level heterogeneity. To address this challenge, Chang et al. recently published a novel method called Component-wise Sparse Mixture Regression (CSMR). CSMR is a regression-based clustering method that promises to detect heterogeneous relationships between molecular markers and a phenotype of interest in high-dimensional settings. In this Letter to the Editor, we raise awareness of several issues concerning the assessment of CSMR in Chang et al., particularly in settings where the number of features, P, exceeds the study sample size, N. We also advocate for additional metrics and approaches when assessing the performance of regression-based clustering methods.

Keywords: disease heterogeneity; mixture modeling; supervised learning.


Figures

Figure 1
Performance of CSMR on the CCLE data set. (A) Boxplot of clustering internal consistency (IC), calculated as the adjusted Rand index (ARI) between the cluster memberships of each pair of runs, across 100 separate applications of CSMR to the CCLE data set with the same tuning parameters in every run. (B) Bar plot of how often each feature was selected across the same 100 applications of CSMR. Of the 500 candidate features, 425 were selected by CSMR at least once. On average, CSMR selected 43.6 features per run (standard deviation = 11.7).
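Both quantities summarized in Figure 1 can be computed from the outputs of repeated runs. The sketch below is a minimal, standard-library-only illustration, assuming each CSMR run has already produced (i) a cluster label for every sample, listed in the same sample order across runs, and (ii) the set of selected feature names. The function names (`adjusted_rand_index`, `internal_consistency`, `selection_summary`) are hypothetical and are not part of any CSMR software. ARI is computed from its standard contingency-table definition, which `sklearn.metrics.adjusted_rand_score` also implements.

```python
from collections import Counter
from itertools import combinations
from math import comb
from statistics import mean, stdev


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index (ARI) between two clusterings of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label vectors must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two samples")
    # Contingency-table cell counts n_ij and the two sets of marginal counts.
    cell_counts = Counter(zip(labels_a, labels_b))
    counts_a = Counter(labels_a)
    counts_b = Counter(labels_b)
    index = sum(comb(c, 2) for c in cell_counts.values())
    sum_a = sum(comb(c, 2) for c in counts_a.values())
    sum_b = sum(comb(c, 2) for c in counts_b.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Only happens when both clusterings are trivial (one cluster, or all
        # singletons); treat this as perfect agreement.
        return 1.0
    return (index - expected) / (max_index - expected)


def internal_consistency(runs):
    """ARI for every pair of runs; each run is a list of cluster labels."""
    return [adjusted_rand_index(a, b) for a, b in combinations(runs, 2)]


def selection_summary(selected_per_run, n_features):
    """Summarize which features were selected across repeated runs."""
    runs = [set(run) for run in selected_per_run]
    # Number of runs that selected each feature (never-selected features are absent).
    frequency = Counter(f for run in runs for f in run)
    sizes = [len(run) for run in runs]
    return {
        "frequency": frequency,
        "ever_selected": len(frequency),
        "never_selected": n_features - len(frequency),
        "mean_size": mean(sizes),
        # Sample standard deviation; whether the letter used the sample or the
        # population SD is not stated, so this choice is an assumption.
        "sd_size": stdev(sizes) if len(sizes) > 1 else 0.0,
    }


if __name__ == "__main__":
    # Toy labels from three hypothetical runs on six samples.
    runs = [[0, 0, 1, 1, 2, 2], [1, 1, 0, 0, 2, 2], [0, 0, 0, 1, 1, 1]]
    ic = internal_consistency(runs)
    print("pairwise ARI:", ic)  # first value is 1.0: same partition, relabeled
    summary = selection_summary([["g1", "g2"], ["g2", "g3", "g4"], ["g2"]], n_features=500)
    print(summary["ever_selected"], summary["mean_size"], summary["sd_size"])
```

ARI is invariant to permutations of the cluster labels. Two runs that find the same partition under different label names therefore score 1.0, so a low IC reflects genuinely different memberships rather than label switching.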
Figure 2
CSMR model performance in various simulation settings. Boxplots show performance metric values when [formula not recovered]. The x-axis shows [formula not recovered] at 100, 200 and 300, and the colors indicate [formula not recovered]. As [formula not recovered] and P increase, both accuracy and IC decline.

References

    1. Chang W, Wan C, Zang Y, et al. Supervised clustering of high-dimensional data using regularized mixture modeling. Brief Bioinform 2020;22(4):1–11.
    2. Li Q, Shi R, Liang F. Drug sensitivity prediction with high-dimensional mixture regression. PLoS One 2019;14(2):e0212108.
    3. Khalili A, Chen J. Variable selection in finite mixture of regression models. J Am Stat Assoc 2007;102(479):1025–38.
    4. Wang H, Leng C. Unified LASSO estimation by least squares approximation. J Am Stat Assoc 2007;102(479):1039–48.
    5. Barretina J, Caponigro G, Stransky N, et al. The cancer cell line encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2012;483(7391):603–7.
