nips nips2013 nips2013-74 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ryota Tomioka, Taiji Suzuki
Abstract: We study a new class of structured Schatten norms for tensors that includes two recently proposed norms (“overlapped” and “latent”) for convex-optimization-based tensor decomposition. We analyze the performance of the “latent” approach for tensor decomposition, which was empirically found to perform better than the “overlapped” approach in some settings. We show theoretically that this is indeed the case. In particular, when the unknown true tensor is low-rank in a specific unknown mode, this approach performs as well as knowing the mode with the smallest rank. Along the way, we show a novel duality result for structured Schatten norms, which is also interesting in the general context of structured sparsity. We confirm through numerical simulations that our theory can precisely predict the scaling behaviour of the mean squared error.
Reference: text
sentIndex sentText sentNum sentScore
1 We study a new class of structured Schatten norms for tensors that includes two recently proposed norms (“overlapped” and “latent”) for convex-optimization-based tensor decomposition. [sent-5, score-0.775]
2 We analyze the performance of the “latent” approach for tensor decomposition, which was empirically found to perform better than the “overlapped” approach in some settings. [sent-6, score-0.502]
3 In particular, when the unknown true tensor is low-rank in a specific unknown mode, this approach performs as well as knowing the mode with the smallest rank. [sent-8, score-0.561]
4 For example, in neuroimaging, spatio-temporal patterns of neural activities that are related to certain experimental conditions or subjects can be found by computing the tensor decomposition of the data tensor, which can be of size channels × timepoints × subjects × conditions [18]. [sent-12, score-0.519]
5 Conventionally, tensor decomposition has been tackled through non-convex optimization problems, using alternating least squares or higher-order orthogonal iteration [6]. [sent-18, score-0.519]
6 In contrast to its empirical success, little has been understood theoretically about the performance of tensor decomposition algorithms. [sent-19, score-0.56]
7 Recently a convex-optimization-based approach for tensor decomposition has been proposed by several authors [9, 15, 23, 25], and its performance has been analyzed in [26]. [sent-24, score-0.566]
8 This approach does not require the rank of the decomposition to be specified beforehand, and due to the low-rank inducing property of the Schatten 1-norm, the rank of the decomposition is automatically determined. [sent-40, score-0.517]
9 However, it has been noticed that the above overlapped approach has the limitation that it performs poorly for a tensor that is only low-rank in a certain mode. [sent-41, score-1.103]
10 The authors of [25] proposed an alternative approach, which we call latent approach, that decomposes a given tensor into a mixture of tensors that each are low-rank in a specific mode. [sent-42, score-0.754]
11 Figure 1 demonstrates that the latent approach is preferable to the overlapped approach when the underlying tensor is almost full rank in all but one mode. [sent-43, score-1.468]
12 Along the way, we show a novel duality between the two types of norms employed in the above two approaches, namely the overlapped Schatten norm and the latent Schatten norm. [sent-46, score-1.084]
13 In fact, the (plain) overlapped group lasso constrains the weights to be simultaneously group sparse over overlapping groups. [sent-48, score-0.765]
14 The latent group lasso predicts with a mixture of group sparse weights [see also 1, 3, 12]. [sent-49, score-0.355]
15 These approaches clearly correspond to the two variations of tensor decomposition algorithms we discussed above. [sent-50, score-0.519]
16 Finally we empirically compare the overlapped approach and latent approach and show that even when the unknown tensor is simultaneously low-rank, which is a favorable situation for the overlapped approach, the latent approach performs better in many cases. [sent-51, score-2.261]
17 Thus we provide both theoretical and empirical evidence that for noisy tensor decomposition, the latent approach is preferable to the overlapped approach. [sent-52, score-1.33]
18 Our result is complementary to the previous study [25, 26], which mainly focused on the noise-less tensor completion setting. [sent-53, score-0.444]
19 1 For a K-way tensor, there are K ways to unfold a tensor into a matrix. [sent-54, score-0.416]
20 Section 3 presents our main theoretical contributions; we establish the consistency of the latent approach, and we analyze the denoising performance of the latent approach. [sent-58, score-0.424]
21 2 Structured Schatten norms for tensors In this section, we define the overlapped Schatten norm and the latent Schatten norm and discuss their basic properties. [sent-62, score-1.194]
22 The Frobenius norm of a tensor is defined as $\|W\|_F = \sqrt{\langle W, W \rangle}$. [sent-69, score-0.487]
23 Each dimensionality of a tensor is called a mode. [sent-70, score-0.416]
24 The mode-k unfolding $W_{(k)} \in \mathbb{R}^{n_k \times N/n_k}$ is a matrix obtained by concatenating the mode-k fibers along columns; here a mode-k fiber is an $n_k$-dimensional vector obtained by fixing all the indices but the k-th index of $W$. [sent-71, score-0.226]
25 The mode-k rank rk of W is the rank of the mode-k unfolding W (k) . [sent-72, score-0.432]
26 We say that a tensor W has multilinear rank (r1 , . [sent-73, score-0.601]
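The mode-k unfolding and mode-k rank defined above are straightforward to compute. A minimal sketch in Python/NumPy (an implementation language assumed here purely for illustration; the paper provides no code), using one common fiber-ordering convention; the Schatten norms and ranks of the unfolding do not depend on the column order:

```python
import numpy as np

def unfold(W, k):
    """Mode-k unfolding: an (n_k, N/n_k) matrix whose columns are mode-k fibers.

    The exact column ordering is a convention; Schatten norms and ranks of the
    unfolding are invariant to it.
    """
    return np.moveaxis(W, k, 0).reshape(W.shape[k], -1)

def mode_k_rank(W, k, tol=1e-9):
    """Mode-k rank r_k: the matrix rank of the mode-k unfolding."""
    return np.linalg.matrix_rank(unfold(W, k), tol=tol)
```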
27 1 Overlapped Schatten norms The low-rank inducing norm studied in [9, 15, 23, 25], which we call the overlapped Schatten 1-norm, can be written as follows: $\|W\|_{S_1/1} = \sum_{k=1}^{K} \|W_{(k)}\|_{S_1}$ (1). [sent-82, score-0.815]
28 In this paper, we consider the following more general overlapped $S_p/q$-norm, which includes the Schatten 1-norm as the special case $(p, q) = (1, 1)$. [sent-83, score-0.638]
29 The overlapped $S_p/q$-norm is written as follows: $\|W\|_{S_p/q} = \bigl(\sum_{k=1}^{K} \|W_{(k)}\|_{S_p}^{q}\bigr)^{1/q}$ (2), where $1 \le p, q \le \infty$; here $\|W\|_{S_p} = \bigl(\sum_{j=1}^{r} \sigma_j^{p}(W)\bigr)^{1/p}$ is the Schatten p-norm for matrices, and $\sigma_j(W)$ is the j-th largest singular value of $W$. [sent-84, score-0.638]
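Computing the overlapped S_p/q-norm (2) only requires the singular values of the K unfoldings. A minimal NumPy sketch (the function names `schatten` and `overlapped_norm` are our own illustrative choices, not from the paper), reusing the `unfold` helper sketched earlier:

```python
def schatten(M, p):
    """Schatten p-norm of a matrix: the l_p norm of its singular values
    (p = np.inf gives the spectral norm)."""
    return np.linalg.norm(np.linalg.svd(M, compute_uv=False), ord=p)

def overlapped_norm(W, p=1, q=1):
    """Overlapped S_p/q-norm, Eq. (2): the l_q norm over modes of the Schatten
    p-norms of the unfoldings.  (p, q) = (1, 1) gives the Schatten 1-norm (1)."""
    return np.linalg.norm(
        np.array([schatten(unfold(W, k), p) for k in range(W.ndim)]), ord=q)
```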
30 When used as a regularizer, the overlapped Schatten 1-norm penalizes all modes of W to be jointly low-rank. [sent-85, score-0.661]
31 It is related to the overlapped group regularization [see 13, 16] in the sense that the same object W appears repeatedly in the norm. [sent-86, score-0.734]
32 The following inequality relates the overlapped Schatten 1-norm with the Frobenius norm, which was a key step in the analysis of [26]: $\|W\|_{S_1/1} \le \sum_{k=1}^{K} \sqrt{r_k}\, \|W\|_F$ (3), where $r_k$ is the mode-k rank of $W$. [sent-87, score-1.039]
33 Now we are interested in the dual norm of the overlapped $S_p/q$-norm, because deriving the dual norm is a key step in solving the minimization problem that involves the norm (2) [see 16], as well as in computing various complexity measures, such as the Rademacher complexity [8] and the Gaussian width [4]. [sent-88, score-1.0]
34 It turns out that the dual norm of the overlapped $S_p/q$-norm is the latent $S_{p^*}/q^*$-norm, as shown in the following lemma (the proof is presented in Appendix A). [sent-89, score-0.965]
35 The dual norm of the overlapped $S_p/q$-norm is the latent $S_{p^*}/q^*$-norm, where $1/p + 1/p^* = 1$ and $1/q + 1/q^* = 1$, which is defined as follows: $\|X\|_{\underline{S}_{p^*}/q^*} = \inf_{X^{(1)} + \cdots + X^{(K)} = X} \bigl(\sum_{k=1}^{K} \|X^{(k)}_{(k)}\|_{S_{p^*}}^{q^*}\bigr)^{1/q^*}$. [sent-91, score-0.932]
36 In the supplementary material, we show a slightly more general version of the above lemma that naturally generalizes the duality between overlapped/latent group sparsity norms [1, 12, 17, 21]; see Section A. [sent-96, score-0.239]
37 Note that when the groups have no overlap, the overlapped/latent group sparsity norms become identical, and the duality is the ordinary duality between the group Sp /q-norms and the group Sp∗ /q ∗ -norms. [sent-97, score-0.379]
38 2 Latent Schatten norms The latent approach for tensor decomposition [25] solves the following minimization problem: $\min_{W^{(1)}, \ldots, W^{(K)}} \; L(W^{(1)} + \cdots + W^{(K)}) + \lambda \sum_{k=1}^{K} \|W^{(k)}_{(k)}\|_{S_1}$ (5). [sent-99, score-0.866]
39 Intuitively speaking, the latent approach for tensor decomposition predicts with a mixture of K tensors that each are regularized to be low-rank in a specific mode. [sent-103, score-0.918]
40 In other words, we have identified the structured Schatten norm employed in the latent approach as the latent $S_1/1$-norm (or latent Schatten 1-norm for short), which can be written as follows: $\|W\|_{\underline{S}_1/1} = \inf_{W^{(1)} + \cdots + W^{(K)} = W} \sum_{k=1}^{K} \|W^{(k)}_{(k)}\|_{S_1}$ (6). [sent-108, score-0.769]
41 According to Lemma 1, the dual norm of the latent $S_1/1$-norm is the overlapped $S_\infty/\infty$-norm $\|X\|_{S_\infty/\infty} = \max_{k} \|X_{(k)}\|_{S_\infty}$ (7), where $\|\cdot\|_{S_\infty}$ is the spectral norm. [sent-109, score-0.932]
42 $\|W\|_{\underline{S}_1/1} \le \bigl(\min_{k} \sqrt{r_k}\bigr) \|W\|_F$, where $r_k$ is the mode-k rank of $W$. [sent-112, score-0.366]
43 Compared to inequality (3), the latent Schatten 1-norm is bounded by the minimal square root of the ranks instead of the sum. [sent-113, score-0.387]
44 This is the fundamental reason why the latent approach performs better than the overlapped approach, as in Figure 1. [sent-114, score-0.918]
45 [1], we study the generalization performance of the latent approach for tensor decomposition in the context of recovering an unknown tensor W ∗ from noisy measurements. [sent-116, score-1.213]
46 Finally, we discuss the difference between the overlapped approach and the latent approach and provide an explanation for the empirically observed superior performance of the latent approach in Figure 1. [sent-120, score-1.171]
47 1 Consistency Let $W^*$ be the underlying true tensor, and suppose the noisy version $Y$ is obtained as $Y = W^* + E$, where $E \in \mathbb{R}^{n_1 \times \cdots \times n_K}$ is the noise tensor. [sent-122, score-0.465]
48 Assume that the regularization constant $\lambda$ satisfies $\lambda \ge \|E\|_{S_\infty/\infty}$ (the overlapped $S_\infty/\infty$-norm of the noise); then the estimator defined by $\hat{W} = \mathrm{argmin}_{W} \, \tfrac{1}{2} \|Y - W\|_F^2 + \lambda \|W\|_{\underline{S}_1/1}$ satisfies the inequality $\|\hat{W} - W^*\|_F \le 2\lambda \min_{k} \sqrt{n_k}$. [sent-124, score-0.284]
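The condition on λ can be checked numerically, since the overlapped S∞/∞-norm is just the largest spectral norm among the K unfoldings (Eq. (7)). A small sketch reusing `overlapped_norm` from above; the noise level 0.1, the tensor size, and the random seed are illustrative assumptions, not values prescribed by the theorem:

```python
rng = np.random.default_rng(0)
sigma = 0.1                                    # assumed noise level, for illustration only
E = sigma * rng.standard_normal((50, 50, 20))  # simulated noise tensor
# Overlapped S_inf/inf norm of the noise = max_k of the spectral norms of E_(k);
# the theorem asks for lambda to be at least this large.
lam = overlapped_norm(E, p=np.inf, q=np.inf)
```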
49 We employ the following optimization problem to recover the unknown tensor $W^*$: $\hat{W} = \mathrm{argmin}_{W} \, \tfrac{1}{2} \|Y - W\|_F^2 + \lambda \|W\|_{\underline{S}_1/1}$, subject to $W = \sum_{l=1}^{K} W^{(l)}$ (11). [sent-136, score-0.437]
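Problem (11) is convex, and each latent component W^(l) enters the objective through its own Schatten 1-norm term, so one simple way to solve it is block-coordinate descent: update one component at a time by singular value thresholding of the current residual, which is the exact minimizer for that block. The sketch below is our own minimal illustration (not the algorithm used in the paper, which this excerpt does not specify) and reuses the `unfold` helper from earlier:

```python
def fold(M, k, shape):
    """Inverse of unfold: reshape an (n_k, N/n_k) matrix back into a tensor."""
    moved = (shape[k],) + tuple(n for i, n in enumerate(shape) if i != k)
    return np.moveaxis(M.reshape(moved), 0, k)

def svt(M, thresh):
    """Singular value thresholding: the prox operator of thresh * (Schatten 1-norm)."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return (U * np.maximum(s - thresh, 0.0)) @ Vt

def latent_denoise(Y, lam, n_iter=200):
    """Block-coordinate descent for
       min_{W^(1..K)}  0.5 * ||Y - sum_l W^(l)||_F^2 + lam * sum_l ||W^(l)_(l)||_S1.
    Returns the estimate W_hat = sum_l W^(l) and the latent components."""
    K = Y.ndim
    Ws = [np.zeros_like(Y) for _ in range(K)]
    for _ in range(n_iter):
        for k in range(K):
            R = Y - sum(Ws[l] for l in range(K) if l != k)  # residual seen by block k
            Ws[k] = fold(svt(unfold(R, k), lam), k, Y.shape)
    return sum(Ws), Ws
```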
50 Let W (k) be an optimal decomposition of W induced by the latent Schatten 1-norm (6). [sent-141, score-0.302]
51 $\sum_{k=1}^{K} \|\hat{W}^{(k)} - W^{*(k)}\|_F^2 \le c\lambda^2 \sum_{k=1}^{K} \bar{r}_k$ (12). Moreover, the overall error can be bounded in terms of the multilinear rank of $W^*$ as follows: $\|\hat{W} - W^*\|_F^2 \le c\lambda^2 \min_{k} r_k$ (13). [sent-144, score-0.302]
52 Note that in order to get inequality (13), we exploit the arbitrariness of the decomposition $W^* = \sum_{k=1}^{K} W^{*(k)}$ to replace the sum over the ranks with the minimal mode-k rank. [sent-145, score-0.256]
53 On the other hand, when the noise is larger and/or $\min_k r_k \ll \min_k n_k$, (13) is tighter. [sent-151, score-0.351]
54 3 Gaussian noise When the elements of the noise tensor E are Gaussian, we obtain the following theorem. [sent-153, score-0.462]
55 Assume that the elements of the noise tensor E are independent zero-mean Gaussian random variables with variance σ 2 . [sent-155, score-0.439]
56 Note that the theoretically optimal choice of regularization constant λ is independent of the ranks of the truth W ∗ or its factors in (9), which are unknown in practice. [sent-160, score-0.243]
57 Again we can obtain a bound corresponding to the minimum-rank singleton decomposition, as in inequality (13), as follows: $\tfrac{1}{N} \|\hat{W} - W^*\|_F^2 \le c F \sigma^2 \, \min_{k} r_k / n_K$ (15), where $F$ is the same factor as in Theorem 3. [sent-161, score-0.498]
58 4 Comparison with the overlapped approach Inequality (15) explains the superior performance of the latent approach for tensor decomposition in Figure 1. [sent-163, score-1.459]
59 The inequality obtained in [26] for the overlapped approach, which uses the overlapped Schatten 1-norm (1), can be stated as follows: $\tfrac{1}{N} \|\hat{W} - W^*\|_F^2 \le c' \sigma^2 \bigl(\tfrac{1}{K} \sum_{k=1}^{K} \tfrac{1}{\sqrt{n_k}}\bigr)^2 \bigl(\tfrac{1}{K} \sum_{k=1}^{K} \sqrt{r_k}\bigr)^2$ (16). [sent-164, score-1.46]
60 Comparing inequalities (15) and (16), we notice that the complexity of the overlapped approach depends on the average (square root) of the mode-k ranks r1, . . . [sent-165, score-0.906]
61 . . . , rK, whereas that of the latent approach only grows linearly with the minimum mode-k rank. [sent-168, score-0.254]
62 Interestingly, the latent approach performs as if it knows the mode with the minimum rank, although such information is not given. [sent-169, score-0.325]
63 [19] proved a lower bound on the number of measurements needed for solving the linear inverse problem via the overlapped approach. [sent-171, score-0.638]
64 Although the setting is different, the lower bound depends on the minimum mode-k rank, which agrees with the complexity of the latent approach. [sent-172, score-0.28]
65 The goal of this experiment is to recover the true low rank tensor W ∗ from a noisy observation Y. [sent-174, score-0.574]
66 We randomly generated the true low rank tensors W ∗ of size 50 × 50 × 20 or 80 × 80 × 40 with various mode-k ranks (r1 , r2 , r3 ). [sent-175, score-0.349]
67 A low-rank tensor is generated by first randomly drawing the . . . [sent-176, score-0.48]
68 Figure 2: Performance of the overlapped approach and the latent approach for tensor decomposition, shown against their theoretically predicted complexity measures (see Eqs. (17) and (18)). [sent-208, score-1.554]
69 The right panel shows the improvement of the latent approach over the overlapped approach, plotted against the ratio of their complexity measures. [sent-210, score-0.983]
70 r1 × r2 × r3 core tensor from the standard normal distribution and multiplying an orthogonal factor matrix, drawn uniformly, into each of its modes. [sent-211, score-0.416]
71 The observation tensor Y is obtained by adding Gaussian noise with standard deviation σ = 0. [sent-212, score-0.439]
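The data-generating process just described (a random r1 × r2 × r3 core multiplied by an orthogonal factor on each mode, plus i.i.d. Gaussian noise) can be sketched as follows; the ranks (7, 8, 9), the noise level 0.1, and the seed are placeholders, since the exact σ is truncated above and the ranks were varied in the experiments:

```python
def random_low_rank_tensor(shape, ranks, rng):
    """Tensor with multilinear rank (r_1,...,r_K): a random core times a random
    column-orthogonal factor on each mode (Tucker construction)."""
    W = rng.standard_normal(ranks)                               # r1 x ... x rK core
    for k, (n, r) in enumerate(zip(shape, ranks)):
        Q, _ = np.linalg.qr(rng.standard_normal((n, r)))         # n x r, orthonormal columns
        W = np.moveaxis(np.tensordot(Q, W, axes=(1, k)), 0, k)   # mode-k product
    return W

rng = np.random.default_rng(0)
W_true = random_low_rank_tensor((50, 50, 20), (7, 8, 9), rng)    # example ranks
Y = W_true + 0.1 * rng.standard_normal(W_true.shape)             # assumed sigma = 0.1
```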
72 For each observation Y, we computed tensor decompositions using the overlapped approach and the latent approach (11). [sent-215, score-1.337]
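As a usage illustration (a sketch only, not the paper's experimental code), the latent estimator sketched earlier can be run on the simulated observation and scored by the per-entry mean squared error reported in Figure 2; the regularization value 0.5 is an arbitrary placeholder rather than the paper's choice:

```python
W_hat, components = latent_denoise(Y, lam=0.5)        # lam chosen arbitrarily here
mse = np.sum((W_hat - W_true) ** 2) / W_true.size      # (1/N) * ||W_hat - W*||_F^2
print(f"per-entry MSE of the latent approach: {mse:.4f}")
```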
73 , rK ) of the truth W ∗ , whereas the LR complexity (18) is defined in terms of the ranks of the latent factors (r1 , . [sent-227, score-0.345]
74 In order to find a decomposition that minimizes the right hand side of (18), we ran the latent approach to the true tensor W ∗ without noise, and took the minimum of the sum of ranks found by the run and mink rk , i. [sent-231, score-1.036]
75 The left panel shows the MSE of the overlapped approach against the TR complexity (17). [sent-236, score-0.752]
76 The middle panel shows the MSE of the latent approach against the LR complexity (18). [sent-237, score-0.313]
77 , MSE of the overlapped approach over that of the latent approach) against the ratio of the respective complexity measures. [sent-240, score-0.299]
78 First, from the left panel, we can confirm that as predicted by [26], the MSE of the overlapped approach scales linearly against the TR complexity (17) for each value of the regularization constant. [sent-241, score-0.777]
79 From the central panel, we can clearly see that the MSE of the latent approach scales linearly against the LR complexity (18) as predicted by Theorem 3. [sent-242, score-0.286]
80 46 for 80 × 80 × 40) is mostly below the other series, which means that the optimal choice of the regularization constant is independent of the rank of the true tensor and only depends on the size; this agrees with the condition on λ in Theorem 3. [sent-247, score-0.641]
81 The right panel reveals that in many cases the latent approach performs better than the overlapped approach, i. [sent-251, score-0.93]
82 Moreover, we can see that the success of the latent approach relative to the overlapped approach is correlated with high TR complexity to LR complexity ratio. [sent-254, score-0.977]
83 Indeed, we found that an optimal decomposition of the true tensor W ∗ was typically a singleton decomposition corresponding to the smallest Tucker rank (see Section 3. [sent-255, score-0.837]
84 It is encouraging that the latent approach performs at least as well as the overlapped approach in such situations as well. [sent-260, score-0.901]
85 The current framework includes both the overlapped Schatten 1-norm and latent Schatten 1-norm recently proposed in the context of convex-optimization-based tensor decomposition [9, 15, 23, 25], and connects these studies to the broader studies on structured sparsity [2, 13, 17, 21]. [sent-262, score-1.427]
86 Furthermore, we have rigorously studied the performance of the latent approach for tensor decomposition. [sent-264, score-0.669]
87 Next, we have analyzed the denoising performance of the latent approach and shown that the error of the latent approach is upper bounded by the minimal mode-k rank, which contrasts sharply with the average (square-root) dependence of the overlapped approach analyzed in [26]. [sent-266, score-1.187]
88 This explains the empirically observed superior performance of the latent approach compared to the overlapped approach. [sent-267, score-0.93]
89 The most difficult case for the overlapped approach is when the unknown tensor is only low-rank in one mode as in Figure 1. [sent-268, score-1.161]
90 We have also confirmed through numerical simulations that our analysis precisely predicts the scaling of the mean squared error as a function of the dimensionalities and the sum of ranks of the factors of the unknown tensor, which is dominated by the minimal mode-k rank. [sent-269, score-0.241]
91 Thus, we have theoretically and empirically shown that for noisy tensor decomposition, the latent approach is more likely to perform better than the overlapped approach. [sent-272, score-1.374]
92 Analyzing the performance of the latent approach for tensor completion would be an important future work. [sent-273, score-0.675]
93 The structured Schatten norms proposed in this paper include norms for tensors that are not employed in practice yet. [sent-274, score-0.375]
94 Therefore, it would be interesting to explore various extensions, such as using the overlapped S1/∞-norm instead of the S1/1-norm, or a non-sparse tensor decomposition. [sent-275, score-1.054]
95 Tensor completion and low-n-rank tensor recovery via convex optimization. [sent-349, score-0.47]
96 The expression of a tensor or a polyadic as a sum of products. [sent-354, score-0.416]
97 Applications of tensor (multiway array) factorizations and decompositions in data mining. [sent-418, score-0.436]
98 Square deal: Lower bounds and improved relaxations for tensor recovery. [sent-425, score-0.416]
99 Group lasso with overlaps: the latent group lasso approach. [sent-441, score-0.291]
100 Nuclear norms for tensors and their use for convex multilinear estimation. [sent-454, score-0.294]
wordName wordTfidf (topN-words)
[('overlapped', 0.638), ('tensor', 0.416), ('schatten', 0.412), ('latent', 0.199), ('rank', 0.132), ('tensors', 0.124), ('rk', 0.117), ('sp', 0.111), ('nk', 0.105), ('decomposition', 0.103), ('ranks', 0.093), ('norms', 0.091), ('mse', 0.079), ('norm', 0.071), ('duality', 0.069), ('lr', 0.058), ('mode', 0.054), ('structured', 0.053), ('mink', 0.053), ('multilinear', 0.053), ('tomioka', 0.053), ('regularization', 0.052), ('unfolding', 0.051), ('lathauwer', 0.05), ('tucker', 0.048), ('group', 0.044), ('panel', 0.044), ('theoretically', 0.041), ('tr', 0.04), ('complexity', 0.038), ('inequality', 0.035), ('singleton', 0.035), ('dimensionalities', 0.034), ('approach', 0.032), ('overlap', 0.03), ('predicts', 0.029), ('jsps', 0.029), ('jenatton', 0.028), ('completion', 0.028), ('statement', 0.028), ('consistency', 0.026), ('convex', 0.026), ('noisy', 0.026), ('minimization', 0.025), ('moor', 0.025), ('minimal', 0.025), ('dual', 0.024), ('lasso', 0.024), ('arxiv', 0.024), ('minimum', 0.023), ('modes', 0.023), ('noise', 0.023), ('cf', 0.023), ('mu', 0.023), ('explains', 0.022), ('rigorously', 0.022), ('empirically', 0.022), ('nuclear', 0.022), ('squared', 0.021), ('hayashi', 0.021), ('tokyo', 0.021), ('unknown', 0.021), ('recht', 0.021), ('constant', 0.021), ('suzuki', 0.02), ('missing', 0.02), ('incoherence', 0.02), ('agrees', 0.02), ('decompositions', 0.02), ('report', 0.019), ('preferable', 0.019), ('negahban', 0.019), ('square', 0.019), ('fazel', 0.019), ('obozinski', 0.018), ('scaling', 0.018), ('technical', 0.018), ('mairal', 0.018), ('vec', 0.018), ('sparsity', 0.018), ('dashed', 0.017), ('chicago', 0.017), ('predicted', 0.017), ('lemma', 0.017), ('superior', 0.017), ('performs', 0.017), ('appendix', 0.016), ('root', 0.016), ('employed', 0.016), ('presented', 0.016), ('kth', 0.016), ('inducing', 0.015), ('marked', 0.015), ('rm', 0.015), ('analyzed', 0.015), ('ravikumar', 0.015), ('simultaneously', 0.015), ('mixture', 0.015), ('truth', 0.015), ('dot', 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999964 74 nips-2013-Convex Tensor Decomposition via Structured Schatten Norm Regularization
Author: Ryota Tomioka, Taiji Suzuki
Abstract: We study a new class of structured Schatten norms for tensors that includes two recently proposed norms (“overlapped” and “latent”) for convex-optimizationbased tensor decomposition. We analyze the performance of “latent” approach for tensor decomposition, which was empirically found to perform better than the “overlapped” approach in some settings. We show theoretically that this is indeed the case. In particular, when the unknown true tensor is low-rank in a specific unknown mode, this approach performs as well as knowing the mode with the smallest rank. Along the way, we show a novel duality result for structured Schatten norms, which is also interesting in the general context of structured sparsity. We confirm through numerical simulations that our theory can precisely predict the scaling behaviour of the mean squared error. 1
2 0.3776015 11 nips-2013-A New Convex Relaxation for Tensor Completion
Author: Bernardino Romera-Paredes, Massimiliano Pontil
Abstract: We study the problem of learning a tensor from a set of linear measurements. A prominent methodology for this problem is based on a generalization of trace norm regularization, which has been used extensively for learning low rank matrices, to the tensor setting. In this paper, we highlight some limitations of this approach and propose an alternative convex relaxation on the Euclidean ball. We then describe a technique to solve the associated regularization problem, which builds upon the alternating direction method of multipliers. Experiments on one synthetic dataset and two real datasets indicate that the proposed method improves significantly over tensor trace norm regularization in terms of estimation error, while remaining computationally tractable. 1
3 0.29635686 295 nips-2013-Simultaneous Rectification and Alignment via Robust Recovery of Low-rank Tensors
Author: Xiaoqin Zhang, Di Wang, Zhengyuan Zhou, Yi Ma
Abstract: In this work, we propose a general method for recovering low-rank three-order tensors, in which the data can be deformed by some unknown transformation and corrupted by arbitrary sparse errors. Since the unfolding matrices of a tensor are interdependent, we introduce auxiliary variables and relax the hard equality constraints by the augmented Lagrange multiplier method. To improve the computational efficiency, we introduce a proximal gradient step to the alternating direction minimization method. We have provided proof for the convergence of the linearized version of the problem which is the inner loop of the overall algorithm. Both simulations and experiments show that our methods are more efficient and effective than previous work. The proposed method can be easily applied to simultaneously rectify and align multiple images or videos frames. In this context, the state-of-the-art algorithms “RASL” and “TILT” can be viewed as two special cases of our work, and yet each only performs part of the function of our method.
4 0.27058598 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition
Author: Tzu-Kuo Huang, Jeff Schneider
Abstract: Learning dynamic models from observed data has been a central issue in many scientific studies or engineering tasks. The usual setting is that data are collected sequentially from trajectories of some dynamical system operation. In quite a few modern scientific modeling tasks, however, it turns out that reliable sequential data are rather difficult to gather, whereas out-of-order snapshots are much easier to obtain. Examples include the modeling of galaxies, chronic diseases such Alzheimer’s, or certain biological processes. Existing methods for learning dynamic model from non-sequence data are mostly based on Expectation-Maximization, which involves non-convex optimization and is thus hard to analyze. Inspired by recent advances in spectral learning methods, we propose to study this problem from a different perspective: moment matching and spectral decomposition. Under that framework, we identify reasonable assumptions on the generative process of non-sequence data, and propose learning algorithms based on the tensor decomposition method [2] to provably recover firstorder Markov models and hidden Markov models. To the best of our knowledge, this is the first formal guarantee on learning from non-sequence data. Preliminary simulation results confirm our theoretical findings. 1
5 0.26325485 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors
Author: Shouyuan Chen, Michael R. Lyu, Irwin King, Zenglin Xu
Abstract: Tensor completion from incomplete observations is a problem of significant practical interest. However, it is unlikely that there exists an efficient algorithm with provable guarantee to recover a general tensor from a limited number of observations. In this paper, we study the recovery algorithm for pairwise interaction tensors, which has recently gained considerable attention for modeling multiple attribute data due to its simplicity and effectiveness. Specifically, in the absence of noise, we show that one can exactly recover a pairwise interaction tensor by solving a constrained convex program which minimizes the weighted sum of nuclear norms of matrices from O(nr log2 (n)) observations. For the noisy cases, we also prove error bounds for a constrained convex program for recovering the tensors. Our experiments on the synthetic dataset demonstrate that the recovery performance of our algorithm agrees well with the theory. In addition, we apply our algorithm on a temporal collaborative filtering task and obtain state-of-the-art results. 1
6 0.23376565 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling
7 0.20745918 203 nips-2013-Multilinear Dynamical Systems for Tensor Time Series
8 0.19586071 281 nips-2013-Robust Low Rank Kernel Embeddings of Multivariate Distributions
9 0.11406036 263 nips-2013-Reasoning With Neural Tensor Networks for Knowledge Base Completion
10 0.10855158 75 nips-2013-Convex Two-Layer Modeling
11 0.10410114 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
12 0.10222603 185 nips-2013-Matrix Completion From any Given Set of Observations
13 0.098088048 353 nips-2013-When are Overcomplete Topic Models Identifiable? Uniqueness of Tensor Tucker Decompositions with Structured Sparsity
14 0.081035241 91 nips-2013-Dirty Statistical Models
15 0.073947378 92 nips-2013-Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests
16 0.0734277 70 nips-2013-Contrastive Learning Using Spectral Methods
17 0.066866867 254 nips-2013-Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms
18 0.063712671 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
19 0.057814512 108 nips-2013-Error-Minimizing Estimates and Universal Entry-Wise Error Bounds for Low-Rank Matrix Completion
20 0.057814237 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
topicId topicWeight
[(0, 0.161), (1, 0.117), (2, 0.135), (3, 0.331), (4, 0.007), (5, -0.324), (6, 0.078), (7, -0.048), (8, 0.18), (9, 0.034), (10, -0.005), (11, -0.045), (12, -0.039), (13, -0.014), (14, -0.002), (15, -0.03), (16, -0.023), (17, -0.031), (18, 0.079), (19, -0.037), (20, 0.029), (21, 0.13), (22, 0.011), (23, 0.074), (24, -0.01), (25, -0.002), (26, -0.021), (27, 0.064), (28, 0.055), (29, -0.023), (30, 0.01), (31, -0.005), (32, 0.025), (33, -0.035), (34, 0.01), (35, 0.026), (36, -0.083), (37, -0.053), (38, -0.011), (39, 0.011), (40, -0.006), (41, -0.005), (42, 0.039), (43, -0.016), (44, -0.022), (45, -0.013), (46, 0.009), (47, 0.07), (48, 0.023), (49, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.96006799 74 nips-2013-Convex Tensor Decomposition via Structured Schatten Norm Regularization
Author: Ryota Tomioka, Taiji Suzuki
Abstract: We study a new class of structured Schatten norms for tensors that includes two recently proposed norms (“overlapped” and “latent”) for convex-optimizationbased tensor decomposition. We analyze the performance of “latent” approach for tensor decomposition, which was empirically found to perform better than the “overlapped” approach in some settings. We show theoretically that this is indeed the case. In particular, when the unknown true tensor is low-rank in a specific unknown mode, this approach performs as well as knowing the mode with the smallest rank. Along the way, we show a novel duality result for structured Schatten norms, which is also interesting in the general context of structured sparsity. We confirm through numerical simulations that our theory can precisely predict the scaling behaviour of the mean squared error. 1
2 0.90064335 11 nips-2013-A New Convex Relaxation for Tensor Completion
Author: Bernardino Romera-Paredes, Massimiliano Pontil
Abstract: We study the problem of learning a tensor from a set of linear measurements. A prominent methodology for this problem is based on a generalization of trace norm regularization, which has been used extensively for learning low rank matrices, to the tensor setting. In this paper, we highlight some limitations of this approach and propose an alternative convex relaxation on the Euclidean ball. We then describe a technique to solve the associated regularization problem, which builds upon the alternating direction method of multipliers. Experiments on one synthetic dataset and two real datasets indicate that the proposed method improves significantly over tensor trace norm regularization in terms of estimation error, while remaining computationally tractable. 1
3 0.89788312 203 nips-2013-Multilinear Dynamical Systems for Tensor Time Series
Author: Mark Rogers, Lei Li, Stuart Russell
Abstract: Data in the sciences frequently occur as sequences of multidimensional arrays called tensors. How can hidden, evolving trends in such data be extracted while preserving the tensor structure? The model that is traditionally used is the linear dynamical system (LDS) with Gaussian noise, which treats the latent state and observation at each time slice as a vector. We present the multilinear dynamical system (MLDS) for modeling tensor time series and an expectation–maximization (EM) algorithm to estimate the parameters. The MLDS models each tensor observation in the time series as the multilinear projection of the corresponding member of a sequence of latent tensors. The latent tensors are again evolving with respect to a multilinear projection. Compared to the LDS with an equal number of parameters, the MLDS achieves higher prediction accuracy and marginal likelihood for both artificial and real datasets. 1
4 0.8927626 295 nips-2013-Simultaneous Rectification and Alignment via Robust Recovery of Low-rank Tensors
Author: Xiaoqin Zhang, Di Wang, Zhengyuan Zhou, Yi Ma
Abstract: In this work, we propose a general method for recovering low-rank three-order tensors, in which the data can be deformed by some unknown transformation and corrupted by arbitrary sparse errors. Since the unfolding matrices of a tensor are interdependent, we introduce auxiliary variables and relax the hard equality constraints by the augmented Lagrange multiplier method. To improve the computational efficiency, we introduce a proximal gradient step to the alternating direction minimization method. We have provided proof for the convergence of the linearized version of the problem which is the inner loop of the overall algorithm. Both simulations and experiments show that our methods are more efficient and effective than previous work. The proposed method can be easily applied to simultaneously rectify and align multiple images or videos frames. In this context, the state-of-the-art algorithms “RASL” and “TILT” can be viewed as two special cases of our work, and yet each only performs part of the function of our method.
5 0.8035571 113 nips-2013-Exact and Stable Recovery of Pairwise Interaction Tensors
Author: Shouyuan Chen, Michael R. Lyu, Irwin King, Zenglin Xu
Abstract: Tensor completion from incomplete observations is a problem of significant practical interest. However, it is unlikely that there exists an efficient algorithm with provable guarantee to recover a general tensor from a limited number of observations. In this paper, we study the recovery algorithm for pairwise interaction tensors, which has recently gained considerable attention for modeling multiple attribute data due to its simplicity and effectiveness. Specifically, in the absence of noise, we show that one can exactly recover a pairwise interaction tensor by solving a constrained convex program which minimizes the weighted sum of nuclear norms of matrices from O(nr log2 (n)) observations. For the noisy cases, we also prove error bounds for a constrained convex program for recovering the tensors. Our experiments on the synthetic dataset demonstrate that the recovery performance of our algorithm agrees well with the theory. In addition, we apply our algorithm on a temporal collaborative filtering task and obtain state-of-the-art results. 1
6 0.66883689 155 nips-2013-Learning Hidden Markov Models from Non-sequence Data via Tensor Decomposition
7 0.58190364 179 nips-2013-Low-Rank Matrix and Tensor Completion via Adaptive Sampling
8 0.53525192 281 nips-2013-Robust Low Rank Kernel Embeddings of Multivariate Distributions
9 0.42594674 263 nips-2013-Reasoning With Neural Tensor Networks for Knowledge Base Completion
10 0.41095343 70 nips-2013-Contrastive Learning Using Spectral Methods
11 0.39389756 92 nips-2013-Discovering Hidden Variables in Noisy-Or Networks using Quartet Tests
13 0.35968053 254 nips-2013-Probabilistic Low-Rank Matrix Completion with Adaptive Spectral Regularization Algorithms
14 0.35026219 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
15 0.33904001 75 nips-2013-Convex Two-Layer Modeling
16 0.33513197 342 nips-2013-Unsupervised Spectral Learning of Finite State Transducers
17 0.30062994 307 nips-2013-Speedup Matrix Completion with Side Information: Application to Multi-Label Learning
18 0.29328403 148 nips-2013-Latent Maximum Margin Clustering
19 0.29325435 336 nips-2013-Translating Embeddings for Modeling Multi-relational Data
20 0.28754643 185 nips-2013-Matrix Completion From any Given Set of Observations
topicId topicWeight
[(16, 0.04), (33, 0.164), (34, 0.105), (41, 0.058), (49, 0.022), (56, 0.167), (70, 0.03), (85, 0.037), (89, 0.055), (93, 0.028), (94, 0.155), (95, 0.037)]
simIndex simValue paperId paperTitle
1 0.90586418 32 nips-2013-Aggregating Optimistic Planning Trees for Solving Markov Decision Processes
Author: Gunnar Kedenburg, Raphael Fonteneau, Remi Munos
Abstract: This paper addresses the problem of online planning in Markov decision processes using a randomized simulator, under a budget constraint. We propose a new algorithm which is based on the construction of a forest of planning trees, where each tree corresponds to a random realization of the stochastic environment. The trees are constructed using a “safe” optimistic planning strategy combining the optimistic principle (in order to explore the most promising part of the search space first) with a safety principle (which guarantees a certain amount of uniform exploration). In the decision-making step of the algorithm, the individual trees are aggregated and an immediate action is recommended. We provide a finite-sample analysis and discuss the trade-off between the principles of optimism and safety. We also report numerical results on a benchmark problem. Our algorithm performs as well as state-of-the-art optimistic planning algorithms, and better than a related algorithm which additionally assumes the knowledge of all transition distributions. 1
same-paper 2 0.90513211 74 nips-2013-Convex Tensor Decomposition via Structured Schatten Norm Regularization
Author: Ryota Tomioka, Taiji Suzuki
Abstract: We study a new class of structured Schatten norms for tensors that includes two recently proposed norms (“overlapped” and “latent”) for convex-optimizationbased tensor decomposition. We analyze the performance of “latent” approach for tensor decomposition, which was empirically found to perform better than the “overlapped” approach in some settings. We show theoretically that this is indeed the case. In particular, when the unknown true tensor is low-rank in a specific unknown mode, this approach performs as well as knowing the mode with the smallest rank. Along the way, we show a novel duality result for structured Schatten norms, which is also interesting in the general context of structured sparsity. We confirm through numerical simulations that our theory can precisely predict the scaling behaviour of the mean squared error. 1
3 0.89708143 194 nips-2013-Model Selection for High-Dimensional Regression under the Generalized Irrepresentability Condition
Author: Adel Javanmard, Andrea Montanari
Abstract: In the high-dimensional regression model a response variable is linearly related to p covariates, but the sample size n is smaller than p. We assume that only a small subset of covariates is ‘active’ (i.e., the corresponding coefficients are non-zero), and consider the model-selection problem of identifying the active covariates. A popular approach is to estimate the regression coefficients through the Lasso ( 1 -regularized least squares). This is known to correctly identify the active set only if the irrelevant covariates are roughly orthogonal to the relevant ones, as quantified through the so called ‘irrepresentability’ condition. In this paper we study the ‘Gauss-Lasso’ selector, a simple two-stage method that first solves the Lasso, and then performs ordinary least squares restricted to the Lasso active set. We formulate ‘generalized irrepresentability condition’ (GIC), an assumption that is substantially weaker than irrepresentability. We prove that, under GIC, the Gauss-Lasso correctly recovers the active set. 1
4 0.89520413 325 nips-2013-The Pareto Regret Frontier
Author: Wouter M. Koolen
Abstract: Performance guarantees for online learning algorithms typically take the form of regret bounds, which express that the cumulative loss overhead compared to the best expert in hindsight is small. In the common case of large but structured expert sets we typically wish to keep the regret especially small compared to simple experts, at the cost of modest additional overhead compared to more complex others. We study which such regret trade-offs can be achieved, and how. We analyse regret w.r.t. each individual expert as a multi-objective criterion in the simple but fundamental case of absolute loss. We characterise the achievable and Pareto optimal trade-offs, and the corresponding optimal strategies for each sample size both exactly for each finite horizon and asymptotically. 1
5 0.85920328 249 nips-2013-Polar Operators for Structured Sparse Estimation
Author: Xinhua Zhang, Yao-Liang Yu, Dale Schuurmans
Abstract: Structured sparse estimation has become an important technique in many areas of data analysis. Unfortunately, these estimators normally create computational difficulties that entail sophisticated algorithms. Our first contribution is to uncover a rich class of structured sparse regularizers whose polar operator can be evaluated efficiently. With such an operator, a simple conditional gradient method can then be developed that, when combined with smoothing and local optimization, significantly reduces training time vs. the state of the art. We also demonstrate a new reduction of polar to proximal maps that enables more efficient latent fused lasso. 1
6 0.85466319 11 nips-2013-A New Convex Relaxation for Tensor Completion
7 0.85408407 184 nips-2013-Marginals-to-Models Reducibility
8 0.85341269 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
9 0.85171723 318 nips-2013-Structured Learning via Logistic Regression
10 0.85169715 355 nips-2013-Which Space Partitioning Tree to Use for Search?
11 0.85110509 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks
12 0.85104328 224 nips-2013-On the Sample Complexity of Subspace Learning
13 0.85009485 233 nips-2013-Online Robust PCA via Stochastic Optimization
14 0.84945387 311 nips-2013-Stochastic Convex Optimization with Multiple Objectives
15 0.84927303 218 nips-2013-On Sampling from the Gibbs Distribution with Random Maximum A-Posteriori Perturbations
16 0.8490026 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
17 0.84887207 80 nips-2013-Data-driven Distributionally Robust Polynomial Optimization
18 0.84788322 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints
19 0.84756368 28 nips-2013-Adaptive Step-Size for Policy Gradient Methods
20 0.84713614 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding