S3CMTF: Fast, accurate, and scalable method for incomplete coupled matrix-tensor factorization

Dongjin Choi et al. PLoS One. 2019 Jun 28;14(6):e0217316. doi: 10.1371/journal.pone.0217316. eCollection 2019.

Abstract

How can we extract hidden relations from tensor and matrix data simultaneously in a fast, accurate, and scalable way? Coupled matrix-tensor factorization (CMTF) is an important tool for this purpose. Designing an accurate and efficient CMTF method has become more crucial as the size and dimension of real-world data grow explosively. However, existing CMTF methods suffer from a lack of accuracy, slow running time, and limited scalability. In this paper, we propose S3CMTF, a fast, accurate, and scalable CMTF method. In contrast to previous methods, which do not handle large sparse tensors and are not parallelizable, S3CMTF provides parallel sparse CMTF by carefully deriving gradient update rules. S3CMTF asynchronously updates partial gradients without expensive locking. We show theoretically and empirically that our method is guaranteed to converge to a quality solution. S3CMTF further boosts performance by carefully storing intermediate computation results and reusing them. We show theoretically and empirically that S3CMTF is the fastest method, outperforming existing approaches. Experimental results show that S3CMTF is up to 930× faster than existing methods while providing the best accuracy. S3CMTF shows linear scalability in the number of data entries and the number of cores. In addition, we apply S3CMTF to Yelp rating tensor data coupled with three additional matrices to discover interesting patterns.
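To make the update structure described above concrete, the following is a minimal sequential sketch of per-entry SGD for a 3-mode sparse tensor coupled with a matrix on its second mode. It uses a CP-style model and illustrative sizes and hyperparameters (R, lr, lam are not from the paper); the actual S3CMTF updates are derived for a Tucker model with a core tensor and are run lock-free in parallel with cached intermediate results.

```python
import numpy as np

# Minimal sequential sketch of SGD-based sparse coupled matrix-tensor
# factorization, assuming a CP-style model for brevity (S3CMTF itself uses a
# Tucker core, lock-free parallel updates, and reuse of intermediate results).
# All sizes and hyperparameters below (R, lr, lam) are illustrative choices.

rng = np.random.default_rng(0)
I, J, K, M, R = 30, 40, 20, 25, 5          # tensor/matrix mode sizes, rank
U1 = rng.standard_normal((I, R)) * 0.1     # factor for mode 1 of X
U2 = rng.standard_normal((J, R)) * 0.1     # shared factor: mode 2 of X and rows of Y
U3 = rng.standard_normal((K, R)) * 0.1     # factor for mode 3 of X
V  = rng.standard_normal((M, R)) * 0.1     # coupled-matrix factor (columns of Y)

# Observed entries in coordinate (sparse) form: only these drive the updates.
X_obs = [(rng.integers(I), rng.integers(J), rng.integers(K), rng.standard_normal())
         for _ in range(2000)]
Y_obs = [(rng.integers(J), rng.integers(M), rng.standard_normal())
         for _ in range(1000)]

lr, lam = 0.01, 0.1                        # step size, L2 regularization weight
for epoch in range(20):
    for i, j, k, x in X_obs:               # SGD pass over observed tensor entries
        err = x - np.sum(U1[i] * U2[j] * U3[k])   # residual of the reconstruction
        g1, g2, g3 = U2[j] * U3[k], U1[i] * U3[k], U1[i] * U2[j]
        U1[i] += lr * (err * g1 - lam * U1[i])
        U2[j] += lr * (err * g2 - lam * U2[j])    # shared row, also touched by Y below
        U3[k] += lr * (err * g3 - lam * U3[k])
    for j, m, y in Y_obs:                  # SGD pass over observed coupled-matrix entries
        err = y - np.sum(U2[j] * V[m])
        u2_old = U2[j].copy()
        U2[j] += lr * (err * V[m] - lam * U2[j])
        V[m]  += lr * (err * u2_old - lam * V[m])
```

Because most per-entry updates touch disjoint factor rows, S3CMTF runs such updates asynchronously across threads without locking and still converges; its optimized variants additionally cache intermediate products instead of recomputing them for every entry.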


Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Fig 1
Fig 1. Comparison of our proposed S3CMTF and the existing methods.
(a) For a fixed number of nonzeros, S3CMTF takes constant time as the dimensionality grows, while existing methods become slower. Our sequential method S3CMTF-opt1 is 930× and 54× faster than CMTF-OPT and CMTF-Tucker-ALS, respectively. (b) S3CMTF-opt20 shows the best convergence rate and accuracy on the real-world Yelp dataset. CMTF-Tucker-ALS shows O.O.M. in both experiments (O.O.M.: out of memory error).
Fig 2
Fig 2. The scheme for S3CMTF.
Fig 3
Fig 3. Example hypergraphs induced by S3CMTF objective function (Eq (7)).
A matrix Y is coupled to the second mode of X through the coupled factor matrix V. Each node represents a factor row or the core tensor. Each hyperedge contains the factors involved in a single SGD update. (a) Induced hypergraph with the core tensor: every hyperedge corresponding to a tensor entry includes G. (b) Induced hypergraph without the core tensor: the graph has a sparse structure, as every node is shared by only a few hyperedges.
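For orientation, a sparse coupled Tucker objective of the following general shape is the kind of loss that induces the hypergraphs in Fig 3; this is a sketch over the observed entry sets, and the paper's Eq (7), including its exact weighting and regularization, may differ in detail.

```latex
% Sketch of a sparse coupled Tucker objective of the kind that induces the
% hypergraphs in Fig 3; the paper's Eq (7) may weight or regularize differently.
\min_{\mathcal{G},\,U^{(1)},U^{(2)},U^{(3)},V}\;
  \sum_{(i,j,k)\in\Omega_{\mathcal{X}}}
    \Bigl(\mathcal{X}_{ijk} - \bigl[\!\bigl[\mathcal{G};U^{(1)},U^{(2)},U^{(3)}\bigr]\!\bigr]_{ijk}\Bigr)^{2}
  \;+\; \lambda_{Y} \sum_{(j,m)\in\Omega_{Y}}
    \Bigl(Y_{jm} - \bigl(U^{(2)}V^{\top}\bigr)_{jm}\Bigr)^{2}
  \;+\; \lambda\Bigl(\lVert\mathcal{G}\rVert_F^{2} + \sum_{n=1}^{3}\lVert U^{(n)}\rVert_F^{2} + \lVert V\rVert_F^{2}\Bigr)
```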
Fig 4
Fig 4. Test RMSE of S3CMTF and other CMTF methods over iterations.
S3CMTF-opt20 shows the best convergence rate and accuracy.
Fig 5
Fig 5. Comparison with SALS-single on the MovieLens dataset.
We compare two non-coupled versions of S3CMTF, S3CMTF-CP-opt and S3CMTF-TUCKER-opt, with the parallel CP decomposition method SALS-single. For (a), we place one mark per 20 iterations for clarity. (a) S3CMTF converges faster to a lower error than SALS. (b) S3CMTF-CP-opt is 2.3× faster than SALS-single.
Fig 6
Fig 6. Comparison of scalability.
(a) S3CMTF shows linear scalability as the number of entries increases. (b) S3CMTF-base and S3CMTF-opt show linear speed-up as the number of cores grows. O.O.M.: out of memory error.
Fig 7
(a) Gap statistics on U(2) of S3CMTF and the Tucker decomposition for the Yelp dataset. S3CMTF outperforms the naive Tucker decomposition in clustering ability. (b) Visualization of the personal recommendation scenario.


