S3CMTF: Fast, accurate, and scalable method for incomplete coupled matrix-tensor factorization

Dongjin Choi et al. PLoS One. 2019 Jun 28;14(6):e0217316. doi: 10.1371/journal.pone.0217316. eCollection 2019.

Abstract

How can we extract hidden relations from tensor and matrix data simultaneously in a fast, accurate, and scalable way? Coupled matrix-tensor factorization (CMTF) is an important tool for this purpose. Designing an accurate and efficient CMTF method has become more crucial as the size and dimension of real-world data grow explosively. However, existing CMTF methods suffer from a lack of accuracy, slow running time, and limited scalability. In this paper, we propose S3CMTF, a fast, accurate, and scalable CMTF method. In contrast to previous methods, which do not handle large sparse tensors and are not parallelizable, S3CMTF provides parallel sparse CMTF by carefully deriving gradient update rules. S3CMTF asynchronously updates partial gradients without expensive locking. We show theoretically and empirically that our method is guaranteed to converge to a quality solution. S3CMTF further boosts performance by carefully storing intermediate computation results and reusing them. We show theoretically and empirically that S3CMTF is the fastest method, outperforming existing approaches. Experimental results show that S3CMTF is up to 930× faster than existing methods while providing the best accuracy. S3CMTF shows linear scalability in the number of data entries and the number of cores. In addition, we apply S3CMTF to Yelp rating tensor data coupled with three additional matrices to discover interesting patterns.
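To make the update structure described above concrete, the following is a minimal sequential sketch of per-entry SGD for a 3-mode sparse tensor coupled with a matrix on its second mode. It uses a CP-style model and illustrative sizes and hyperparameters (R, lr, lam are not from the paper); the actual S3CMTF updates are derived for a Tucker model with a core tensor and are run lock-free in parallel with cached intermediate results.

```python
import numpy as np

# Minimal sequential sketch of SGD-based sparse coupled matrix-tensor
# factorization, assuming a CP-style model for brevity (S3CMTF itself uses a
# Tucker core, lock-free parallel updates, and reuse of intermediate results).
# All sizes and hyperparameters below (R, lr, lam) are illustrative choices.

rng = np.random.default_rng(0)
I, J, K, M, R = 30, 40, 20, 25, 5          # tensor/matrix mode sizes, rank
U1 = rng.standard_normal((I, R)) * 0.1     # factor for mode 1 of X
U2 = rng.standard_normal((J, R)) * 0.1     # shared factor: mode 2 of X and rows of Y
U3 = rng.standard_normal((K, R)) * 0.1     # factor for mode 3 of X
V  = rng.standard_normal((M, R)) * 0.1     # coupled-matrix factor (columns of Y)

# Observed entries in coordinate (sparse) form: only these drive the updates.
X_obs = [(rng.integers(I), rng.integers(J), rng.integers(K), rng.standard_normal())
         for _ in range(2000)]
Y_obs = [(rng.integers(J), rng.integers(M), rng.standard_normal())
         for _ in range(1000)]

lr, lam = 0.01, 0.1                        # step size, L2 regularization weight
for epoch in range(20):
    for i, j, k, x in X_obs:               # SGD pass over observed tensor entries
        err = x - np.sum(U1[i] * U2[j] * U3[k])   # residual of the reconstruction
        g1, g2, g3 = U2[j] * U3[k], U1[i] * U3[k], U1[i] * U2[j]
        U1[i] += lr * (err * g1 - lam * U1[i])
        U2[j] += lr * (err * g2 - lam * U2[j])    # shared row, also touched by Y below
        U3[k] += lr * (err * g3 - lam * U3[k])
    for j, m, y in Y_obs:                  # SGD pass over observed coupled-matrix entries
        err = y - np.sum(U2[j] * V[m])
        u2_old = U2[j].copy()
        U2[j] += lr * (err * V[m] - lam * U2[j])
        V[m]  += lr * (err * u2_old - lam * V[m])
```

Because most per-entry updates touch disjoint factor rows, S3CMTF runs such updates asynchronously across threads without locking and still converges; its optimized variants additionally cache intermediate products instead of recomputing them for every entry.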


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Comparison of our proposed S3CMTF and the existing methods.
(a) For a fixed number of nonzeros, S3CMTF takes constant time as the dimensionality grows, while existing methods become slower. Our sequential method S3CMTF-opt1 is 930× and 54× faster than CMTF-OPT and CMTF-Tucker-ALS, respectively. (b) S3CMTF-opt20 shows the best convergence rate and accuracy on the real-world Yelp dataset. CMTF-Tucker-ALS shows O.O.M. in both experiments (O.O.M.: out of memory error).
Fig 2
Fig 2. The scheme for S3CMTF.
Fig 3
Fig 3. Example hypergraphs induced by S3CMTF objective function (Eq (7)).
A matrix Y is coupled to the second mode of X through the coupled factor matrix V. Each node represents a factor row or the core tensor. Each hyperedge contains the factors involved in a single SGD update. (a) Induced hypergraph with the core tensor: every hyperedge corresponding to a tensor entry includes G. (b) Induced hypergraph without the core tensor: the graph has a sparse structure, as every node is shared by only a few hyperedges.
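For orientation, a sparse coupled Tucker objective of the following general shape is the kind of loss that induces the hypergraphs in Fig 3; this is a sketch over the observed entry sets, and the paper's Eq (7), including its exact weighting and regularization, may differ in detail.

```latex
% Sketch of a sparse coupled Tucker objective of the kind that induces the
% hypergraphs in Fig 3; the paper's Eq (7) may weight or regularize differently.
\min_{\mathcal{G},\,U^{(1)},U^{(2)},U^{(3)},V}\;
  \sum_{(i,j,k)\in\Omega_{\mathcal{X}}}
    \Bigl(\mathcal{X}_{ijk} - \bigl[\!\bigl[\mathcal{G};U^{(1)},U^{(2)},U^{(3)}\bigr]\!\bigr]_{ijk}\Bigr)^{2}
  \;+\; \lambda_{Y} \sum_{(j,m)\in\Omega_{Y}}
    \Bigl(Y_{jm} - \bigl(U^{(2)}V^{\top}\bigr)_{jm}\Bigr)^{2}
  \;+\; \lambda\Bigl(\lVert\mathcal{G}\rVert_F^{2} + \sum_{n=1}^{3}\lVert U^{(n)}\rVert_F^{2} + \lVert V\rVert_F^{2}\Bigr)
```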
Fig 4
Fig 4. Test RMSE of S3CMTF and other CMTF methods over iterations.
S3CMTF-opt20 shows the best convergence rate and accuracy.
Fig 5
Fig 5. Comparison with SALS-single on the MovieLens dataset.
We compare two non-coupled versions of S3CMTF, S3CMTF-CP-opt and S3CMTF-TUCKER-opt, with the parallel CP decomposition method SALS-single. For (a), we place one mark per 20 iterations for clarity. (a) S3CMTF converges faster to a lower error than SALS. (b) S3CMTF-CP-opt is 2.3× faster than SALS-single.
Fig 6
Fig 6. Comparison of scalability.
(a) S3CMTF shows linear scalability as the number of entries increases. (b) S3CMTF-base and S3CMTF-opt show linear speed-up as the number of cores grows. O.O.M.: out of memory error.
Fig 7
(a) Gap statistics on U(2) of S3CMTF and the Tucker decomposition for the Yelp dataset. S3CMTF outperforms the naive Tucker decomposition in clustering ability. (b) Visualization of the personal recommendation scenario.


