Multi-View Cluster Analysis with Incomplete Data to Understand Treatment Effects

Guoqing Chao¹, Jiangwen Sun², Jin Lu¹, An-Li Wang³, Daniel D Langleben³, Chiang-Shan Li⁴, Jinbo Bi¹

Affiliations

¹ Department of Computer Science and Engineering, University of Connecticut, Storrs, CT, USA.
² Department of Computer Science, Old Dominion University, Norfolk, Virginia, USA.
³ University of Pennsylvania Perelman School of Medicine, Philadelphia, PA, USA.
⁴ Department of Psychiatry, Yale University, New Haven, CT, USA.

PMID: 32863420
PMCID: PMC7455020
DOI: 10.1016/j.ins.2019.04.039

Guoqing Chao et al. Inf Sci (N Y). 2019 Aug;494:278-293. doi: 10.1016/j.ins.2019.04.039. Epub 2019 Apr 22.

Abstract

Multi-view cluster analysis, a popular granular computing method, aims to partition sample subjects into clusters that are consistent across the different views in which the subjects are characterized. Frequently, data entries are missing from some of the views. The latest multi-view co-clustering methods cannot effectively deal with incomplete data, especially when there are mixed patterns of missing values. We propose an enhanced formulation for a family of multi-view co-clustering methods that copes with the missing data problem by introducing an indicator matrix whose elements indicate which data entries are observed, and by assessing cluster validity only on observed entries. In comparison with the simple strategy of removing subjects with missing values, our approach can use all available data in cluster analysis. In comparison with common methods that impute missing data in order to use regular multi-view analytics, our approach is less sensitive to imputation uncertainty. In comparison with other state-of-the-art multi-view incomplete clustering methods, our approach remains sensible whether individual values in a view or an entire view is missing, the most common scenario in practice. We first validated the proposed strategy in simulations, and then applied it to a treatment study of heroin dependence, which would have been impossible with previous methods due to a number of missing-data patterns. Patients in a treatment study were naturally assessed in different feature spaces, such as the pre-, during-, and post-treatment time windows. Our algorithm was able to identify subgroups in which patients showed similarities in all three time windows, thus leading to the recognition of pre-treatment (baseline) features predictive of post-treatment outcomes.

Keywords: co-clustering; granular computing; heroin pharmacotherapy; missing value; multi-view data analysis.
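
The indicator-matrix idea described in the abstract can be illustrated with a small sketch. This is not the authors' co-clustering formulation: it swaps in a much simpler k-means-style objective, shared across views, purely to show how an indicator matrix restricts both centroid estimation and the clustering cost to observed entries. All function names, the toy data, and the use of `None` to mark a missing entry are assumptions made for illustration.

```python
def indicator(view):
    """Indicator matrix: 1 where an entry is observed, 0 where it is missing (None)."""
    return [[0 if v is None else 1 for v in row] for row in view]


def masked_centroids(view, mask, labels, k):
    """Per-cluster feature means computed from observed entries only."""
    n_features = len(view[0])
    centroids = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        centroid = []
        for j in range(n_features):
            observed = [view[i][j] for i in members if mask[i][j]]
            # Fall back to 0.0 if no member of the cluster has this feature observed.
            centroid.append(sum(observed) / len(observed) if observed else 0.0)
        centroids.append(centroid)
    return centroids


def masked_cost(row, mask_row, centroid):
    """Squared error between a subject and a centroid, skipping missing entries."""
    return sum((x - c) ** 2 for x, m, c in zip(row, mask_row, centroid) if m)


def multiview_masked_kmeans(views, k, init_labels, n_iter=10):
    """Alternate centroid and label updates; one label per subject is shared by all views."""
    masks = [indicator(v) for v in views]
    labels = list(init_labels)
    for _ in range(n_iter):
        cents = [masked_centroids(v, m, labels, k) for v, m in zip(views, masks)]
        new_labels = []
        for i in range(len(labels)):
            # Total cost of putting subject i in cluster c, summed over all views.
            costs = [
                sum(masked_cost(v[i], m[i], cv[c]) for v, m, cv in zip(views, masks, cents))
                for c in range(k)
            ]
            new_labels.append(costs.index(min(costs)))
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Toy data (hypothetical): six subjects, two views, mixed missing patterns.
# Subject 2 is missing all of view A; subject 4 is missing all of view B.
view_a = [[0.0, 0.1], [0.2, None], [None, None], [5.0, 5.1], [4.8, None], [None, 5.2]]
view_b = [[1.0], [1.1], [0.9], [9.0], [None], [9.2]]

labels = multiview_masked_kmeans([view_a, view_b], k=2, init_labels=[0, 1, 0, 1, 0, 1])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

No subject is dropped and no value is imputed: a subject missing an entire view simply contributes nothing to that view's cost and is placed using whatever it has in the other views.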

Figures

**Figure 1**
The distribution of the missing values in the heroin treatment dataset.

**Figure 2**
The simulated block data structure. The numbers on the vertical axis represent the associated variables; the numbers on the horizontal axis represent the subject index.

**Figure 3**
The adherence characteristics of the two clusters (high adherence (HA) versus low adherence (LA)) obtained by our algorithm when variables were grouped in the three views according to variable type.

**Figure 4**
The mean values of the selected variables by cluster when data were grouped in the three views according to variable type. **Abbreviations**: ∆Pre_Cra_Oth, change in craving for other drugs after cue exposure at baseline; Pre_THC, tetrahydrocannabinol level in urine drug screen at baseline; Pre_COC, cocaine level in urine drug screen at baseline; ∆Pre_MAP, change in mean arterial pressure after cue exposure at baseline.

**Figure 5**
The adherence characteristics of the two clusters (high adherence (HA) versus low adherence (LA)) obtained by our algorithm when variables were grouped in three time windows.

**Figure 6**
The mean values of the selected variables by cluster when data were grouped in three time windows. **Abbreviations**: ∆Pre_SOWS, change in the subjective opioid withdrawal scale after cue exposure at baseline; ∆Pre_Cra_Oth, change in craving for other drugs after cue exposure at baseline; ∆Pre_WD_Oth, change in withdrawal for other drugs after cue exposure at baseline; ∆Pre_Cra_Heroin, change in craving for heroin after cue exposure at baseline; ∆On_Cra_Oth, change in craving for other drugs after cue exposure during treatment; ∆On_WD_Oth, change in withdrawal for other drugs after cue exposure during treatment; Post_THC, tetrahydrocannabinol urine drug screen after treatment; ∆Post_high_Heroin, change in feeling “high” for heroin after cue exposure after treatment.
