Stat Anal Data Min. 2016 Aug;9(4):269-290. doi: 10.1002/sam.11315. Epub 2016 Jun 30.

Turbo-SMT: Parallel Coupled Sparse Matrix-Tensor Factorizations and Applications


Evangelos E Papalexakis et al. Stat Anal Data Min. 2016 Aug.

Abstract

How can we correlate the neural activity in the human brain as it responds to typed words with properties of these terms (like 'edible', 'fits in hand')? In short, we want to find latent variables that jointly explain both the brain activity and the behavioral responses. This is one of many settings of the Coupled Matrix-Tensor Factorization (CMTF) problem. Can we enhance any CMTF solver so that it can operate on potentially very large datasets that may not fit in main memory? We introduce Turbo-SMT, a meta-method capable of doing exactly that: it boosts the performance of any CMTF algorithm by parallelizing it and operating on small pieces of the data, and it produces sparse and interpretable solutions (with up to 65-fold smaller output). Additionally, we improve upon ALS, the work-horse algorithm for CMTF, with respect to efficiency and robustness to missing values. We apply Turbo-SMT to BrainQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, with coupling along the nouns dimension. Turbo-SMT is able to find meaningful latent variables, as well as to predict brain activity with competitive accuracy. Finally, we demonstrate the generality of Turbo-SMT by applying it to a Facebook dataset (users, 'friends', wall postings); there, Turbo-SMT spots spammer-like anomalies.
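For readers unfamiliar with CMTF, the coupled model referenced in the abstract can be written compactly. A standard formulation (in the spirit of the all-at-once optimization of Acar, Kolda, and Dunlavy listed in the references; the notation here is an editorial addition, not copied from this paper) fits a PARAFAC model to the tensor and a bilinear model to the matrix, sharing the factor of the coupled mode:

\min_{\mathbf{A},\mathbf{B},\mathbf{C},\mathbf{D}} \; \left\| \underline{\mathbf{X}} - \sum_{f=1}^{F} \mathbf{a}_f \circ \mathbf{b}_f \circ \mathbf{c}_f \right\|_F^2 + \left\| \mathbf{Y} - \mathbf{A}\mathbf{D}^\top \right\|_F^2

where the columns a_f, b_f, c_f of A, B, C factor the (nouns, voxels, subjects) tensor, D factors the (nouns, properties) matrix, and A is shared because the two datasets are coupled along the nouns mode.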


Figures

Figure 1
(a): Analyzing the BrainQ dataset for the Neurosemantics application, finding semantically similar nouns that activate the same part of the brain. In this case, the premotor cortex is associated with movements such as holding or picking up small items. (b): The proposed algorithm Turbo-SMT reduces the approximation error while running in parallel and operating on much smaller pieces of data at any given point in time.
Figure 2
PARAFAC decomposition [24] of a three-way brain activity tensor as a sum of F outer products (rank-one tensors), reminiscent of the rank-F singular value decomposition of a matrix. Each component corresponds to a latent concept (e.g. "insects", "tools", and so on), a set of brain regions that are most active for that particular set of words, as well as a group of persons.
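In elementwise form, the rank-F PARAFAC model sketched in Figure 2 approximates every tensor entry as (notation is an editorial addition, following the caption):

X(i,j,k) \approx \sum_{f=1}^{F} \mathbf{A}(i,f)\, \mathbf{B}(j,f)\, \mathbf{C}(k,f)

so the f-th columns of A (words), B (brain voxels), and C (subjects) together describe the f-th latent concept.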
Figure 3
Coupled matrix-tensor example: tensors often share one or more modes (marked with a thick, wavy line); here, the tensor holds the brain activity and Y is the semantic matrix. As the wavy line indicates, these two datasets are coupled in the 'word' dimension.
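To make the coupling concrete, the following minimal NumPy sketch (sizes, variable names, and random data are illustrative assumptions, not values from the paper) builds random factors, reconstructs the tensor as a sum of rank-one terms, and evaluates the coupled squared-error cost that a CMTF solver aims to minimize:

import numpy as np

# Illustrative sizes only; the real BrainQ dimensions differ.
I, J, K, P, F = 60, 100, 9, 200, 5        # nouns, voxels, subjects, properties, rank
rng = np.random.default_rng(0)

X = rng.standard_normal((I, J, K))        # (nouns, voxels, subjects) brain activity tensor
Y = rng.standard_normal((I, P))           # (nouns, properties) semantic matrix, coupled on nouns

A = rng.standard_normal((I, F))           # shared 'nouns' factor
B = rng.standard_normal((J, F))           # voxel factor
C = rng.standard_normal((K, F))           # subject factor
D = rng.standard_normal((P, F))           # property factor

# PARAFAC reconstruction: sum over f of the outer products of a_f, b_f, c_f.
X_hat = np.einsum('if,jf,kf->ijk', A, B, C)
Y_hat = A @ D.T

# Coupled squared-error cost: how well the shared factors explain both datasets.
cost = np.sum((X - X_hat) ** 2) + np.sum((Y - Y_hat) ** 2)
print(cost)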
Figure 4
Turbo-SMT finds meaningful groups of words, questions, and brain regions that are (both negatively and positively) correlated. For instance, Group 3 refers to small items that can be held in one hand, such as a tomato or a glass, and its activation pattern is very different from that of Group 1, which mostly refers to insects, such as a bee or a beetle. Additionally, Group 3 shows high activation in the premotor cortex, which is associated with the concepts of that group.
Figure 5
A pattern extracted using Turbo-SMT which shows what appears to be a spammer in the Facebook dataset: one person posting to many different walls on a single day.
Figure 6
Run time (in seconds) for the baselines, on the full BrainQ dataset. The results shown are over 5 full runs of the algorithm.
Figure 7
Run time (in seconds) for Turbo-SMT when using each of the baselines as the core solver. The results shown are over 5 full runs of the algorithm. For every value of F, the bars on the plot correspond, from left to right, to the parameter settings shown in the legend. The number of repetitions r was set to 4.
Figure 8
Reconstruction error of the tensor and the matrix (as measured by the cost function of the problem) for the two baselines, for r = 4.
Figure 9
Reconstruction error for Turbo-SMT.
Figure 10
The relative error of the model decreases as a function of the number of repetitions r, which empirically shows that Turbo-SMT generally reduces the approximation error of the CMTF model.
Figure 11
The relative output size vs. the relative cost indicates that, even for very dense datasets such as BrainQ, we are able to obtain up to a 65-fold decrease in output size (for CMTF-OPT), while maintaining almost the same approximation error as the baseline.
Figure 12
Comparison of Turbo-SMT and a compression-based technique that uses the Tucker3 decomposition, as described in Sec. 7.5. Turbo-SMT significantly outperforms the alternative method.
Figure 13
The signal-to-noise ratio (SNR), as defined in the main text, as a function of the percentage of missing values. We observe that, even for a fair amount of missing values, the SNR is quite high, signifying that Turbo-SMT is able to handle such ill-conditioned settings with reasonable fidelity.
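The exact SNR definition lives in the paper's main text, which is not reproduced on this page; a commonly used definition of reconstruction quality (an assumption here, possibly differing from the authors' formula) is

\mathrm{SNR} = 10 \log_{10} \frac{\|\underline{\mathbf{X}}\|_F^2}{\|\underline{\mathbf{X}} - \hat{\underline{\mathbf{X}}}\|_F^2}

i.e., the energy of the data relative to the energy of the reconstruction residual, in decibels.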
Figure 14
Run time (in seconds) for the baselines.
Figure 15
Run time (in seconds) for Turbo-SMT.
Figure 16
Run times, sparsity, and approximation error for the Facebook dataset, using CMTF-OPT and Turbo-SMT with CMTF-OPT as a core solver.


References

    1. Read the web. http://rtw.ml.cmu.edu/rtw/.
    2. Acar E, Aykut-Bingol C, Bingol H, Bro R, Yener B. Multiway analysis of epilepsy tensors. Bioinformatics. 2007;23(13):i10–i18.
    3. Acar E, Gurdeniz G, Rasmussen MA, Rago D, Dragsted LO, Bro R. Coupled matrix factorization with sparse factors to identify potential biomarkers in metabolomics. IEEE ICDM Workshops, IEEE; 2012. pp. 1–8.
    4. Acar E, Kolda TG, Dunlavy DM. All-at-once optimization for coupled matrix and tensor factorizations. arXiv preprint arXiv:1105.3422; 2011.
    5. Acar E, Plopper GE, Yener B. Coupled analysis of in vitro and histology tissue samples to quantify structure-function relationship. PLoS ONE. 2012;7(3):e32227.
