Stat Anal Data Min. 2016 Aug;9(4):269-290. doi: 10.1002/sam.11315. Epub 2016 Jun 30.

Turbo-SMT: Parallel Coupled Sparse Matrix-Tensor Factorizations and Applications


Evangelos E Papalexakis et al. Stat Anal Data Min. 2016 Aug.

Abstract

How can we correlate the neural activity in the human brain as it responds to typed words with properties of these terms (like 'edible', 'fits in hand')? In short, we want to find latent variables that jointly explain both the brain activity and the behavioral responses. This is one of many settings of the Coupled Matrix-Tensor Factorization (CMTF) problem. Can we enhance any CMTF solver so that it can operate on potentially very large datasets that may not fit in main memory? We introduce Turbo-SMT, a meta-method capable of doing exactly that: it boosts the performance of any CMTF algorithm by parallelizing it and operating on small pieces of the data, and it produces sparse and interpretable solutions (with up to 65-fold smaller output). Additionally, we improve upon ALS, the work-horse algorithm for CMTF, with respect to efficiency and robustness to missing values. We apply Turbo-SMT to BrainQ, a dataset consisting of a (nouns, brain voxels, human subjects) tensor and a (nouns, properties) matrix, with coupling along the nouns dimension. Turbo-SMT is able to find meaningful latent variables, as well as to predict brain activity with competitive accuracy. Finally, we demonstrate the generality of Turbo-SMT by applying it to a Facebook dataset (users, 'friends', wall postings); there, Turbo-SMT spots spammer-like anomalies.
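For readers unfamiliar with CMTF, the coupled model referenced in the abstract can be written compactly. A standard formulation (in the spirit of the all-at-once optimization of Acar, Kolda, and Dunlavy listed in the references; the notation here is an editorial addition, not copied from this paper) fits a PARAFAC model to the tensor and a bilinear model to the matrix, sharing the factor of the coupled mode:

\min_{\mathbf{A},\mathbf{B},\mathbf{C},\mathbf{D}} \; \left\| \underline{\mathbf{X}} - \sum_{f=1}^{F} \mathbf{a}_f \circ \mathbf{b}_f \circ \mathbf{c}_f \right\|_F^2 + \left\| \mathbf{Y} - \mathbf{A}\mathbf{D}^\top \right\|_F^2

where the columns a_f, b_f, c_f of A, B, C factor the (nouns, voxels, subjects) tensor, D factors the (nouns, properties) matrix, and A is shared because the two datasets are coupled along the nouns mode.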


Figures

Figure 1
(a): Analyzing the BrainQ dataset for the Neurosemantics application, finding semantically similar nouns that activate the same part of the brain. In this case, the premotor cortex is associated with movements such as holding or picking up small items. (b): The proposed algorithm Turbo-SMT reduces the approximation error while running in parallel and operating on much smaller pieces of data at any given point in time.
Figure 2
PARAFAC decomposition [24] of a three-way brain activity tensor as a sum of F outer products (rank-one tensors), reminiscent of the rank-F singular value decomposition of a matrix. Each component corresponds to a latent concept (e.g. "insects", "tools", and so on), a set of brain regions that are most active for that particular set of words, as well as a group of persons.
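In elementwise form, the rank-F PARAFAC model sketched in Figure 2 approximates every tensor entry as (notation is an editorial addition, following the caption):

X(i,j,k) \approx \sum_{f=1}^{F} \mathbf{A}(i,f)\, \mathbf{B}(j,f)\, \mathbf{C}(k,f)

so the f-th columns of A (words), B (brain voxels), and C (subjects) together describe the f-th latent concept.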
Figure 3
Coupled matrix-tensor example: tensors often share one or more modes (marked with a thick, wavy line); here, the tensor holds the brain activity and Y is the semantic matrix. As the wavy line indicates, these two datasets are coupled in the 'word' dimension.
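To make the coupling concrete, the following minimal NumPy sketch (sizes, variable names, and random data are illustrative assumptions, not values from the paper) builds random factors, reconstructs the tensor as a sum of rank-one terms, and evaluates the coupled squared-error cost that a CMTF solver aims to minimize:

import numpy as np

# Illustrative sizes only; the real BrainQ dimensions differ.
I, J, K, P, F = 60, 100, 9, 200, 5        # nouns, voxels, subjects, properties, rank
rng = np.random.default_rng(0)

X = rng.standard_normal((I, J, K))        # (nouns, voxels, subjects) brain activity tensor
Y = rng.standard_normal((I, P))           # (nouns, properties) semantic matrix, coupled on nouns

A = rng.standard_normal((I, F))           # shared 'nouns' factor
B = rng.standard_normal((J, F))           # voxel factor
C = rng.standard_normal((K, F))           # subject factor
D = rng.standard_normal((P, F))           # property factor

# PARAFAC reconstruction: sum over f of the outer products of a_f, b_f, c_f.
X_hat = np.einsum('if,jf,kf->ijk', A, B, C)
Y_hat = A @ D.T

# Coupled squared-error cost: how well the shared factors explain both datasets.
cost = np.sum((X - X_hat) ** 2) + np.sum((Y - Y_hat) ** 2)
print(cost)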
Figure 4
Turbo-SMT finds meaningful groups of words, questions, and brain regions that are (both negatively and positively) correlated. For instance, Group 3 refers to small items that can be held in one hand, such as a tomato or a glass, and its activation pattern is very different from that of Group 1, which mostly refers to insects, such as a bee or a beetle. Additionally, Group 3 shows high activation in the premotor cortex, which is associated with the concepts of that group.
Figure 5
A pattern extracted using Turbo-SMT which shows what appears to be a spammer in the Facebook dataset: one person posting to many different walls on a single day.
Figure 6
Run time (in seconds) for the baselines, on the full BrainQ dataset. The results shown are over 5 full runs of the algorithm.
Figure 7
Run time (in seconds) for Turbo-SMT when using each of the baselines as the core solver. The results shown are over 5 full runs of the algorithm. For every value of F, the bars on the plot correspond, from left to right, to the parameter settings shown in the legend. The number of repetitions r was set to 4.
Figure 8
Reconstruction error of the tensor and the matrix (as measured by the cost function of the problem) for the two baselines, for r = 4.
Figure 9
Reconstruction error for Turbo-SMT.
Figure 10
The relative error of the model decreases as a function of the number of repetitions r, which empirically shows that Turbo-SMT generally reduces the approximation error of the CMTF model.
Figure 11
The relative output size vs. the relative cost indicates that, even for very dense datasets such as BrainQ, we are able to obtain up to a 65-fold decrease in output size (for CMTF-OPT), while maintaining almost the same approximation error as the baseline.
Figure 12
Comparison of Turbo-SMT and a compression-based technique that uses the Tucker3 decomposition, as described in Sec. 7.5. Turbo-SMT significantly outperforms the alternative method.
Figure 13
The signal-to-noise ratio (SNR), as defined in the main text, as a function of the percentage of missing values. We observe that, even for a fair amount of missing values, the SNR is quite high, signifying that Turbo-SMT is able to handle such ill-conditioned settings with reasonable fidelity.
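The exact SNR definition lives in the paper's main text, which is not reproduced on this page; a commonly used definition of reconstruction quality (an assumption here, possibly differing from the authors' formula) is

\mathrm{SNR} = 10 \log_{10} \frac{\|\underline{\mathbf{X}}\|_F^2}{\|\underline{\mathbf{X}} - \hat{\underline{\mathbf{X}}}\|_F^2}

i.e., the energy of the data relative to the energy of the reconstruction residual, in decibels.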
Figure 14
Run time (in seconds) for the baselines.
Figure 15
Run time (in seconds) for Turbo-SMT.
Figure 16
Run times, sparsity, and approximation error for the Facebook dataset, using CMTF-OPT and Turbo-SMT with CMTF-OPT as a core solver.


References

    1. Read the web. http://rtw.ml.cmu.edu/rtw/.
    2. Acar E, Aykut-Bingol C, Bingol H, Bro R, Yener B. Multiway analysis of epilepsy tensors. Bioinformatics. 2007;23(13):i10–i18.
    3. Acar E, Gurdeniz G, Rasmussen MA, Rago D, Dragsted LO, Bro R. Coupled matrix factorization with sparse factors to identify potential biomarkers in metabolomics. IEEE ICDM Workshops, IEEE; 2012. pp. 1–8.
    4. Acar E, Kolda TG, Dunlavy DM. All-at-once optimization for coupled matrix and tensor factorizations. arXiv preprint arXiv:1105.3422; 2011.
    5. Acar E, Plopper GE, Yener B. Coupled analysis of in vitro and histology tissue samples to quantify structure-function relationship. PLoS ONE. 2012;7(3):e32227.
