nips nips2011 nips2011-270 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ryota Tomioka, Taiji Suzuki, Kohei Hayashi, Hisashi Kashima
Abstract: We analyze the statistical performance of a recently proposed convex tensor decomposition algorithm. Conventionally tensor decomposition has been formulated as non-convex optimization problems, which hindered the analysis of their performance. We show under some conditions that the mean squared error of the convex method scales linearly with the quantity we call the normalized rank of the true tensor. The current analysis naturally extends the analysis of convex low-rank matrix estimation to tensors. Furthermore, we show through numerical experiments that our theory can precisely predict the scaling behaviour in practice.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We analyze the statistical performance of a recently proposed convex tensor decomposition algorithm. [sent-15, score-0.885]
2 Conventionally tensor decomposition has been formulated as non-convex optimization problems, which hindered the analysis of their performance. [sent-16, score-0.799]
3 We show under some conditions that the mean squared error of the convex method scales linearly with the quantity we call the normalized rank of the true tensor. [sent-17, score-0.534]
4 Moreover, in collaborative filtering, users’ preferences on products, conventionally represented as a matrix, can be represented as a tensor when the preferences change over time or context. [sent-22, score-0.758]
5 For the analysis of tensor data, various models and methods for the low-rank decomposition of tensors have been proposed (see Kolda & Bader [12] for a recent survey). [sent-23, score-1.004]
6 Despite empirical success, the statistical performance of tensor decomposition algorithms has not been fully elucidated. [sent-26, score-0.799]
7 The difficulty lies in the non-convexity of the conventional tensor decomposition algorithms. [sent-27, score-0.799]
8 In addition, studies have revealed many discrepancies (see [12]) between matrix rank and tensor rank, which make it difficult to extend studies on the performance of low-rank matrix models to tensors. [sent-30, score-0.978]
9 Recently, several authors [21, 10, 13, 23] have focused on the notion of tensor mode-k rank (instead of tensor rank), which is related to the Tucker decomposition [24]. [sent-33, score-1.702]
10 They discovered that regularized estimation based on the Schatten 1-norm, which is a popular technique for recovering low-rank matrices via convex optimization, can also be applied to tensor decomposition. [sent-34, score-0.786]
11 Figure 1: Result of estimation of a rank-(7, 8, 9) tensor of dimensions 50 × 50 × 20 from partial measurements; see [23] for the details. [sent-39, score-0.766]
12 The estimation error ∥Ŵ − W∗∥_F is plotted against the fraction of observed elements m = M/N. [sent-40, score-0.171]
13 Convex refers to the convex tensor decomposition based on the minimization problem (7). [sent-42, score-0.893]
14 In this paper, motivated by the above recent work, we mathematically analyze the performance of convex tensor decomposition. [sent-48, score-0.768]
15 The new convex formulation for tensor decomposition allows us to generalize recent results on Schatten 1-norm-regularized estimation of matrices (see [17, 18, 5, 19]). [sent-49, score-0.903]
16 Under a general setting we show how the estimation error scales with the mode-k ranks of the true tensor. [sent-50, score-0.169]
17 Furthermore, we analyze the specific settings of (i) noisy tensor decomposition and (ii) random Gaussian design. [sent-51, score-0.856]
18 In the first setting, we assume that all the elements of a low-rank tensor are observed with noise and the goal is to recover the underlying low-rank structure. [sent-52, score-0.764]
19 This is the most common setting in which a tensor decomposition algorithm is used. [sent-53, score-0.818]
20 In the second setting, we assume that the unknown tensor is the coefficient of a tensor-input scalar-output regression problem and the input tensors (the design) are drawn from independent Gaussian distributions. [sent-54, score-0.887]
21 To the best of our knowledge, this is the first paper that rigorously studies the performance of a tensor decomposition algorithm. [sent-56, score-0.799]
22 Moreover, we introduce a Hölder-like inequality (3) and the notion of mode-k decomposability (5), which play central roles in our analysis. [sent-58, score-0.193]
23 We denote the number of elements in X by N = ∏_{k=1}^K n_k. [sent-60, score-0.384]
24 The inner product between two tensors 〈W, X 〉 is defined as 〈W, X 〉 = vec(W)⊤ vec(X ), where vec is a vectorization. [sent-61, score-0.244]
25 In addition, we define the Frobenius norm of a tensor as ∥X∥_F = √〈X, X〉. [sent-62, score-0.73]
26 The mode-k unfolding X_(k) is the n_k × n̄_{\k} (n̄_{\k} := ∏_{k′≠k} n_{k′}) matrix obtained by concatenating the mode-k fibers (the vectors obtained by fixing every index of X but the kth index) of X as column vectors. [sent-63, score-0.481]
27 The mode-k rank of a tensor X, denoted by rank_k(X), is the rank of the mode-k unfolding X_(k) (as a matrix). [sent-64, score-1.215]
28 Note that the mode-k rank can be computed in polynomial time, because it boils down to computing a matrix rank, whereas computing the tensor rank is NP-complete [11]. [sent-73, score-1.119]
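The following is a minimal numpy sketch of these definitions (mode-k unfolding, mode-k rank, inner product, Frobenius norm). The helper names are illustrative and not from the paper or [23]; the column ordering of the unfolding follows numpy's default rather than any particular convention, which does not affect the rank.

```python
import numpy as np

def unfold(X, k):
    # Mode-k unfolding: an n_k x (N / n_k) matrix whose columns are mode-k fibers.
    # (Column ordering differs from some conventions, but the rank is unaffected.)
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def mode_k_rank(X, k, tol=1e-10):
    # Mode-k rank = matrix rank of the mode-k unfolding (polynomial time).
    return np.linalg.matrix_rank(unfold(X, k), tol=tol)

X = np.random.randn(4, 5, 6)
W = np.random.randn(4, 5, 6)
inner = np.vdot(W, X)            # <W, X> = vec(W)^T vec(X)
frob = np.sqrt(np.vdot(X, X))    # ||X||_F = sqrt(<X, X>)
ranks = [mode_k_rank(X, k) for k in range(X.ndim)]
```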
29 The same inequality holds for the overlapped Schatten 1-norm (1) and its dual norm. [sent-82, score-0.216]
30 The dual norm of the overlapped Schatten 1-norm can be characterized by the following lemma. [sent-83, score-0.206]
31 The dual norm of the overlapped Schatten 1-norm, denoted ∥·∥_{S1*}, is defined as the infimum of the maximum mode-k spectral norm over the tensors whose average equals the given tensor X, as follows: ∥X∥_{S1*} = inf_{(1/K)(Y^(1)+Y^(2)+···+Y^(K))=X} max_{k=1,...,K} ∥Y^(k)_(k)∥_{S∞}. [sent-85, score-1.141]
32 Moreover, the following upper bound on the dual norm ∥·∥_{S1*} is valid: ∥X∥_{S1*} ≤ ∥X∥_mean := (1/K) ∑_{k=1}^K ∥X_(k)∥_{S∞}. [sent-89, score-0.151]
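A small sketch of how the overlapped Schatten 1-norm (1) and the upper bound ∥·∥_mean on its dual can be evaluated numerically. Whether (1) carries the 1/K averaging factor is an assumption made here for consistency with the 1/K factors in the dual-norm expressions above; `unfold` is the helper from the previous sketch.

```python
import numpy as np

def unfold(X, k):
    return np.moveaxis(X, k, 0).reshape(X.shape[k], -1)

def overlapped_schatten1(W):
    # Average of the nuclear (Schatten-1) norms of all mode-k unfoldings.
    return sum(np.linalg.norm(unfold(W, k), 'nuc') for k in range(W.ndim)) / W.ndim

def mean_spectral(X):
    # ||X||_mean: average of the mode-k spectral norms; upper-bounds the dual norm.
    return sum(np.linalg.norm(unfold(X, k), 2) for k in range(X.ndim)) / X.ndim
```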
33 Finally, let W∗ ∈ ℝ^{n1×···×nK} be the low-rank tensor that we wish to recover. [sent-100, score-0.682]
34 We define the mode-k orthogonal complement ∆″_k of an unfolding ∆_(k) ∈ ℝ^{n_k × n̄_{\k}} of ∆ with respect to the true low-rank tensor W∗ as follows: ∆″_k = (I_{n_k} − U_k U_k^⊤) ∆_(k) (I_{n̄_{\k}} − V_k V_k^⊤). (4) [sent-110, score-1.146]
35 In addition, ∆′_k := ∆_(k) − ∆″_k is the component whose row/column space overlaps with that of the unfolding of the true tensor W∗. [sent-111, score-0.704]
36 Note that the decomposition ∆_(k) = ∆′_k + ∆″_k is defined for each mode; thus we use the subscript k instead of (k). [sent-112, score-0.316]
37 Using the decomposition defined above we have the following equality, which we call mode-k decomposability of the Schatten 1-norm: ∥W∗_(k) + ∆″_k∥_{S1} = ∥W∗_(k)∥_{S1} + ∥∆″_k∥_{S1} (k = 1, . . . , K). (5) [sent-113, score-0.249]
38 The above decomposition is defined for each mode, and thus it is weaker than the notion of decomposability discussed by Negahban et al. [sent-117, score-0.295]
39 3 Theory. In this section, we first present a deterministic result that holds under a certain choice of the regularization constant λM and an assumption called restricted strong convexity. [sent-119, score-0.164]
40 Then, we focus on special cases to justify the choice of regularization constant and the restricted strong convexity assumption. [sent-120, score-0.213]
41 We analyze the setting of (i) noisy tensor decomposition and (ii) random Gaussian design in Section 3. [sent-121, score-0.896]
42 3.1 Main result. Our goal is to estimate an unknown rank-(r1, . . . , rK) tensor W∗ ∈ ℝ^{n1×···×nK} [sent-125, score-0.199]
43 from observations yi = 〈Xi, W∗〉 + ϵi (i = 1, . . . , M). [sent-128, score-0.682]
44 Accordingly, the solution of the minimization problem (7) is typically a low-rank tensor when λM is sufficiently large. [sent-141, score-0.725]
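As an illustration, here is a sketch of the flavour of the regularized objective behind the minimization problem (7): a squared-loss data-fit term over the M observations plus λM times the overlapped Schatten 1-norm. The exact scaling of the loss term is not visible in this extraction and is an assumption; `overlapped_schatten1` is the helper defined in the sketch above.

```python
import numpy as np

def objective(W, Xs, y, lam):
    # 0.5/M * sum_i (y_i - <X_i, W>)^2 + lam * ||W||_{S1}   (loss scaling assumed)
    preds = np.array([np.vdot(Xi, W) for Xi in Xs])
    loss = 0.5 * np.mean((y - preds) ** 2)
    return loss + lam * overlapped_schatten1(W)
```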
45 The first step in our analysis is to characterize the particularity of the residual tensor ∆ := Ŵ − W∗ as in the following lemma. [sent-143, score-0.682]
46 Although “strong” may sound like a strong assumption, the point is that we require this assumption to hold only for the particular residual tensor we characterized in Lemma 2. [sent-157, score-0.72]
47 Suppose that the operator X satisfies the restricted strong convexity condition. [sent-167, score-0.149]
48 Then the following bound is true: ∥Ŵ − W∗∥_F ≤ 32 λM ∑_{k=1}^K √rk. [sent-168, score-0.162]
49 Combining the fact that the objective value for Ŵ is smaller than that for W∗, the Hölder-like inequality (3), the triangle inequality ∥Ŵ∥_{S1} − ∥W∗∥_{S1} ≤ ∥∆∥_{S1}, and the assumption ∥X∗(ϵ)/M∥_mean ≤ λM/2, we obtain (1/M)∥X(∆)∥₂² ≤ ∥X∗(ϵ)/M∥_mean ∥∆∥_{S1} + λM ∥∆∥_{S1} ≤ 2λM ∥∆∥_{S1}. (10) [sent-171, score-0.182]
50 [15] (see also [17]) pointed out that the key properties for establishing a sharp convergence result for a regularized M-estimator are the decomposability of the regularizer and the restricted strong convexity. [sent-179, score-0.189]
51 What we have shown suggests that the weaker mode-k decomposability (5) suffices to obtain the above convergence result for the overlapped Schatten 1-norm (1) regularization. [sent-180, score-0.22]
52 3.2 Noisy Tensor Decomposition. In this subsection, we consider the setting where all the elements are observed (with noise) and the goal is to recover the underlying low-rank tensor without noise. [sent-182, score-0.735]
53 Since all the elements are observed only once, X is simply a vectorization (M = N), and the left-hand side of inequality (10) gives the quantity of interest ∥X(∆)∥₂ = ∥Ŵ − W∗∥_F. [sent-183, score-0.192]
54 With high probability the quantity ∥X∗(ϵ)∥_mean is concentrated around its mean, which can be bounded as follows: E∥X∗(ϵ)∥_mean ≤ (σ/K) ∑_{k=1}^K (√nk + √n̄_{\k}). [sent-187, score-0.468]
55 There are universal constants c0 and c1 such that, with high probability, any solution of the minimization problem (7) with regularization constant λM = c0 σ ∑_{k=1}^K (√nk + √n̄_{\k})/(KN) satisfies the following bound: ∥Ŵ − W∗∥²_F ≤ c1 σ² [(1/K) ∑_{k=1}^K (√nk + √n̄_{\k})]² [(1/K) ∑_{k=1}^K √rk]². [sent-191, score-0.974]
56 Combining Equations (10)–(11) with the fact that X is simply a vectorization and M = N, we have ∥Ŵ − W∗∥_F ≤ 16√2 λM (1/K) ∑_{k=1}^K √rk. [sent-193, score-0.212]
57 We can simplify the result of Theorem 2 by noting that n̄_{\k} = N/nk ≫ nk when the dimensions are of the same order. [sent-195, score-0.388]
58 (13) We call the quantity r̄ = ∥n^{-1}∥_{1/2} ∥r∥_{1/2} the normalized rank, because r̄ = r/n when the dimensions are balanced (nk = n and rk = r for all k = 1, . . . , K). [sent-200, score-0.305]
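A small sketch of the normalized rank r̄ = ∥n^{-1}∥_{1/2} ∥r∥_{1/2}. The 1/K averaging inside each factor is an assumption, chosen so that balanced dimensions (nk = n, rk = r) give r̄ = r/n as stated above.

```python
import numpy as np

def normalized_rank(dims, ranks):
    # r_bar = ||n^{-1}||_{1/2} * ||r||_{1/2} with 1/K averaging (assumed),
    # so that n_k = n, r_k = r for all k gives r_bar = r / n.
    n = np.asarray(dims, dtype=float)
    r = np.asarray(ranks, dtype=float)
    K = len(n)
    return (np.sum(1.0 / np.sqrt(n)) / K) ** 2 * (np.sum(np.sqrt(r)) / K) ** 2

# e.g. the rank-(7, 8, 9) tensor of dimensions 50 x 50 x 20 from Figure 1
print(normalized_rank([50, 50, 20], [7, 8, 9]))
```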
59 3.3 Random Gaussian Design. In this subsection, we consider the case where the elements of the input tensors Xi (i = 1, . . . , M) are drawn from independent Gaussian distributions. [sent-205, score-0.239]
60 First we show an upper bound on the norm ∥X∗(ϵ)∥_mean used to choose the regularization constant λM in Theorem 1. [sent-210, score-0.155]
61 Then with high probability the quantity ∥X∗(ϵ)∥_mean is concentrated around its mean, which can be bounded as follows: E∥X∗(ϵ)∥_mean ≤ (σ√M/K) ∑_{k=1}^K (√nk + √n̄_{\k}). [sent-214, score-0.468]
62 ¯ k=1 Next the following lemma, which is a generalization of a result presented in Negahban and Wainwright [17, Proposition 1], provides a ground for the restricted strong convexity assumption (8). [sent-215, score-0.148]
63 Then it satisfies ∥X(∆)∥₂/√M ≥ (1/4)∥∆∥_F − (1/K) ∑_{k=1}^K (√(nk/M) + √(n̄_{\k}/M)) ∥∆∥_{S1}, with probability at least 1 − 2 exp(−N/32). [sent-218, score-0.35]
64 The proof is analogous to that of Proposition 1 in [17] except that we use the Hölder-like inequality (3) for tensors instead of inequality (2) for matrices. [sent-220, score-0.263]
65 Figure 2: Result of noisy tensor decomposition for tensors of size 50 × 50 × 20 and 100 × 100 × 50. [sent-260, score-1.064]
66 4.1 Noisy Tensor Decomposition. We randomly generated low-rank tensors of dimensions n(1) = (50, 50, 20) and n(2) = (100, 100, 50) for various ranks (r1, . . . , rK). [sent-262, score-0.272]
67 For a specific rank, we generated the true tensor by drawing the elements of the r1 × · · · × rK “core tensor” from the standard normal distribution and multiplying each of its modes by an orthonormal factor randomly drawn from the Haar measure. [sent-266, score-0.781]
68 As in Section 3.2, the observation y consists of all the elements of the original tensor observed once (M = N) with additive independent Gaussian noise of variance σ². [sent-268, score-0.764]
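A sketch of this data-generating process (random Tucker core times orthonormal factors, then additive Gaussian noise on every element). The QR-based factor sampling is a common stand-in for drawing from the Haar measure, and the function name is illustrative.

```python
import numpy as np

def random_low_rank_tensor(dims, ranks, rng):
    # Draw an r1 x ... x rK core and multiply each mode by an orthonormal factor.
    W = rng.standard_normal(ranks)
    for k, (n_k, r_k) in enumerate(zip(dims, ranks)):
        Q, _ = np.linalg.qr(rng.standard_normal((n_k, r_k)))  # orthonormal columns
        W = np.moveaxis(np.tensordot(Q, np.moveaxis(W, k, 0), axes=1), 0, k)
    return W

rng = np.random.default_rng(0)
W_true = random_low_rank_tensor((50, 50, 20), (7, 8, 9), rng)
sigma = 0.1
y = W_true + sigma * rng.standard_normal(W_true.shape)  # observe all N elements with noise
```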
69 The mean squared error ∥Ŵ − W∗∥²_F/N is plotted against the normalized rank r̄ = ∥n^{-1}∥_{1/2}∥r∥_{1/2} (of the true tensor) defined in Equation (13). [sent-272, score-0.421]
70 As predicted by Theorem 2, the squared error ∥Ŵ − W∗∥²_F grows linearly with the normalized rank r̄. [sent-277, score-0.377]
71 This behaviour is consistently observed not only around the preferred regularization constant value (triangles) but also in the over-fitting case (circles) and the under-fitting case (crosses). [sent-278, score-0.169]
72 Moreover, as predicted by Theorem 2, the preferred regularization constant value scales linearly, and the squared error quadratically, with the noise standard deviation σ. [sent-279, score-0.294]
73 As predicted by Lemma 3, the curves for the smaller 50 × 50 × 20 tensor and those for the larger 100 × 100 × 50 tensor seem to agree when the regularization constant is scaled by a factor of two. [sent-280, score-1.478]
74 4.2 Tensor completion from partial observations. In this subsection, we repeat the simulation originally done by Tomioka et al. [23]. [sent-283, score-0.167]
75 We show that the theory in Section 3.3 can precisely predict the empirical scaling behaviour with respect to both the size and rank of a tensor. [sent-285, score-0.377]
76 We present results for both matrix completion (K = 2) and tensor completion (K = 3). [sent-286, score-1.055]
77 For the tensor case, we randomly generated low-rank tensors of dimensions 50 × 50 × 20 and 100 × 100 × 50. [sent-288, score-0.907]
78 We generated the matrices or tensors as in the previous subsection for various ranks. [sent-289, score-0.269]
79 (Figure 3 axes: normalized rank ∥n^{-1}∥_{1/2}∥r∥_{1/2}; fraction at error ≤ 0.01.)
80 Figure 3: Scaling behaviour of matrix/tensor completion with respect to the size n and the rank r. [sent-312, score-0.485]
81 We used ADMM for the “as a matrix” and “constraint” approaches described in [23] to solve the minimization problem (7) for matrix completion and tensor completion, respectively. [sent-315, score-0.931]
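For reference, a sketch of the singular-value soft-thresholding step that ADMM-type solvers of problem (7) typically apply to each mode-k unfolding (or to the single matrix in the K = 2 case). This is the standard proximal operator of the nuclear norm, not the actual implementation of [23].

```python
import numpy as np

def svt(A, tau):
    # Proximal operator of tau * (nuclear norm): soft-threshold the singular values.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt
```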
82 A single experiment for a specific size and rank can be visualized as in Figure 1. [sent-317, score-0.237]
83 We plot the fraction of observations m = M/N at which the estimation error falls below 0.01 against the normalized rank r̄ = ∥n^{-1}∥_{1/2}∥r∥_{1/2} (of the true tensor) defined in Equation (13). [sent-319, score-0.295]
84 The matrix case is plotted in Figure 3(a) and the tensor case is plotted in Figure 3(b). [sent-320, score-0.791]
85 We can see that the fraction of observations m = M/N scales linearly with the normalized rank r̄, which agrees with the condition M/N ≥ c1 ∥n^{-1}∥_{1/2}∥r∥_{1/2} = c1 r̄ in Theorem 3 (see Equation (14)). [sent-322, score-0.368]
86 The agreement is especially good for tensor completion (Figure 3(b)), where the two series almost overlap. [sent-323, score-0.849]
87 Interestingly, we can see that when compared at the same normalized rank, tensor completion is easier than matrix completion. [sent-324, score-0.962]
88 For example, when nk = 50 and rk = 10 for each k = 1, . . . , K, the normalized rank is r̄ = 10/50 = 0.2 for both the matrix (K = 2) and tensor (K = 3) cases. [sent-325, score-0.493]
89 From Figure 3, we can see that we only need to observe about 30% of the entries in the tensor case to achieve an error smaller than 0.01. [sent-330, score-0.706]
90 5 Conclusion. We have analyzed the statistical performance of a tensor decomposition algorithm based on the overlapped Schatten 1-norm regularization (7). [sent-332, score-0.97]
91 The fraction of observations m = M/N at the threshold predicted by our theory is proportional to the quantity we call the normalized rank, which refines the conjecture (sum of the mode-k ranks) in [23]. [sent-334, score-0.198]
92 In this paper, we have focused on the convergence of the estimation error; it would be meaningful to also analyze the condition for the consistency of the estimated rank as in [2]. [sent-336, score-0.264]
93 Second, although we have succeeded in predicting the empirical scaling behaviour, the setting of random Gaussian design does not match the tensor completion setting in Section 4. [sent-337, score-0.944]
94 This might also explain why tensor completion is easier than matrix completion at the same normalized rank. [sent-340, score-1.129]
95 Moreover, when the target tensor is only low-rank in a certain mode, Schatten 1-norm regularization fails badly (as predicted by the high normalized rank). [sent-341, score-0.846]
96 In a broader context, we believe that the current paper could serve as a basis for re-examining the concept of tensor rank and low-rank approximation of tensors based on convex optimization. [sent-343, score-1.137]
97 Tensor completion and low-n-rank tensor recovery via convex optimization. [sent-421, score-0.9]
98 Tensor completion for estimating missing values in visual data. [sent-439, score-0.167]
99 Applications of tensor (multiway array) factorizations and decompositions in data mining. [sent-444, score-0.705]
100 Nuclear norms for tensors and their use for convex multilinear estimation. [sent-507, score-0.302]
wordName wordTfidf (topN-words)
[('tensor', 0.682), ('nk', 0.35), ('schatten', 0.259), ('tensors', 0.205), ('rank', 0.199), ('completion', 0.167), ('rk', 0.143), ('decomposition', 0.117), ('decomposability', 0.113), ('overlapped', 0.107), ('unfolding', 0.092), ('behaviour', 0.081), ('negahban', 0.077), ('normalized', 0.074), ('tucker', 0.069), ('vectorization', 0.069), ('tomioka', 0.069), ('regularization', 0.064), ('inequality', 0.058), ('convex', 0.051), ('dual', 0.051), ('lemma', 0.049), ('convexity', 0.049), ('noise', 0.048), ('fraction', 0.048), ('norm', 0.048), ('ranks', 0.047), ('multilinear', 0.046), ('mode', 0.043), ('nara', 0.043), ('rankk', 0.043), ('minimization', 0.043), ('subsection', 0.041), ('vec', 0.039), ('japan', 0.039), ('matrix', 0.039), ('rm', 0.039), ('strong', 0.038), ('tokyo', 0.038), ('restricted', 0.038), ('size', 0.038), ('dimen', 0.038), ('sions', 0.038), ('admm', 0.038), ('psychometrika', 0.038), ('lathauwer', 0.038), ('rnk', 0.038), ('hayashi', 0.038), ('scaling', 0.036), ('analyze', 0.035), ('nuclear', 0.035), ('plotted', 0.035), ('kashima', 0.035), ('squared', 0.034), ('elements', 0.034), ('mean', 0.033), ('conventionally', 0.032), ('op', 0.032), ('quantity', 0.031), ('multiway', 0.031), ('estimation', 0.03), ('kolda', 0.029), ('gaussian', 0.028), ('scales', 0.027), ('recht', 0.026), ('predicted', 0.026), ('crosses', 0.026), ('operator', 0.024), ('error', 0.024), ('constant', 0.024), ('theorem', 0.024), ('generalization', 0.023), ('decompositions', 0.023), ('matrices', 0.023), ('predict', 0.023), ('array', 0.023), ('preferences', 0.022), ('notion', 0.022), ('true', 0.022), ('noisy', 0.022), ('circles', 0.022), ('design', 0.021), ('tolerance', 0.021), ('concentrated', 0.021), ('dimensions', 0.02), ('linearly', 0.02), ('signal', 0.02), ('setting', 0.019), ('call', 0.019), ('equation', 0.019), ('bound', 0.019), ('psychometrics', 0.019), ('discrepancies', 0.019), ('ryota', 0.019), ('bader', 0.019), ('spikiness', 0.019), ('bro', 0.019), ('sidiropoulos', 0.019), ('ntt', 0.019), ('ly', 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000002 270 nips-2011-Statistical Performance of Convex Tensor Decomposition
Author: Ryota Tomioka, Taiji Suzuki, Kohei Hayashi, Hisashi Kashima
Abstract: We analyze the statistical performance of a recently proposed convex tensor decomposition algorithm. Conventionally tensor decomposition has been formulated as non-convex optimization problems, which hindered the analysis of their performance. We show under some conditions that the mean squared error of the convex method scales linearly with the quantity we call the normalized rank of the true tensor. The current analysis naturally extends the analysis of convex low-rank matrix estimation to tensors. Furthermore, we show through numerical experiments that our theory can precisely predict the scaling behaviour in practice.
2 0.33885983 102 nips-2011-Generalised Coupled Tensor Factorisation
Author: Kenan Y. Yılmaz, Ali T. Cemgil, Umut Simsekli
Abstract: We derive algorithms for generalised tensor factorisation (GTF) by building upon the well-established theory of Generalised Linear Models. Our algorithms are general in the sense that we can compute arbitrary factorisations in a message passing framework, derived for a broad class of exponential family distributions including special cases such as Tweedie’s distributions corresponding to βdivergences. By bounding the step size of the Fisher Scoring iteration of the GLM, we obtain general updates for real data and multiplicative updates for non-negative data. The GTF framework is, then extended easily to address the problems when multiple observed tensors are factorised simultaneously. We illustrate our coupled factorisation approach on synthetic data as well as on a musical audio restoration problem. 1
3 0.32291171 179 nips-2011-Multilinear Subspace Regression: An Orthogonal Tensor Decomposition Approach
Author: Qibin Zhao, Cesar F. Caiafa, Danilo P. Mandic, Liqing Zhang, Tonio Ball, Andreas Schulze-bonhage, Andrzej S. Cichocki
Abstract: A multilinear subspace regression model based on so called latent variable decomposition is introduced. Unlike standard regression methods which typically employ matrix (2D) data representations followed by vector subspace transformations, the proposed approach uses tensor subspace transformations to model common latent variables across both the independent and dependent data. The proposed approach aims to maximize the correlation between the so derived latent variables and is shown to be suitable for the prediction of multidimensional dependent data from multidimensional independent data, where for the estimation of the latent variables we introduce an algorithm based on Multilinear Singular Value Decomposition (MSVD) on a specially defined cross-covariance tensor. It is next shown that in this way we are also able to unify the existing Partial Least Squares (PLS) and N-way PLS regression algorithms within the same framework. Simulations on benchmark synthetic data confirm the advantages of the proposed approach, in terms of its predictive ability and robustness, especially for small sample sizes. The potential of the proposed technique is further illustrated on a real world task of the decoding of human intracranial electrocorticogram (ECoG) from a simultaneously recorded scalp electroencephalograph (EEG). 1
4 0.16813293 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure
Author: Fatma K. Karzan, Arkadi S. Nemirovski, Boris T. Polyak, Anatoli Juditsky
Abstract: We discuss new methods for the recovery of signals with block-sparse structure, based on 1 -minimization. Our emphasis is on the efficiently computable error bounds for the recovery routines. We optimize these bounds with respect to the method parameters to construct the estimators with improved statistical properties. We justify the proposed approach with an oracle inequality which links the properties of the recovery algorithms and the best estimation performance. 1
5 0.15321119 211 nips-2011-Penalty Decomposition Methods for Rank Minimization
Author: Yong Zhang, Zhaosong Lu
Abstract: In this paper we consider general rank minimization problems with rank appearing in either objective function or constraint. We first show that a class of matrix optimization problems can be solved as lower dimensional vector optimization problems. As a consequence, we establish that a class of rank minimization problems have closed form solutions. Using this result, we then propose penalty decomposition methods for general rank minimization problems. The convergence results of the PD methods have been shown in the longer version of the paper [19]. Finally, we test the performance of our methods by applying them to matrix completion and nearest low-rank correlation matrix problems. The computational results demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed. 1
6 0.12901311 108 nips-2011-Greedy Algorithms for Structurally Constrained High Dimensional Problems
7 0.10828675 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models
8 0.097774528 165 nips-2011-Matrix Completion for Multi-label Image Classification
9 0.092659757 297 nips-2011-Universal low-rank matrix recovery from Pauli measurements
10 0.080908783 202 nips-2011-On the Universality of Online Mirror Descent
11 0.074406989 94 nips-2011-Facial Expression Transfer with Input-Output Temporal Restricted Boltzmann Machines
12 0.074048638 70 nips-2011-Dimensionality Reduction Using the Sparse Linear Model
13 0.07315319 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
14 0.068105415 230 nips-2011-RTRMC: A Riemannian trust-region method for low-rank matrix completion
15 0.066993572 239 nips-2011-Robust Lasso with missing and grossly corrupted observations
16 0.066141255 289 nips-2011-Trace Lasso: a trace norm regularization for correlated designs
17 0.057978101 257 nips-2011-SpaRCS: Recovering low-rank and sparse matrices from compressive measurements
18 0.053319711 161 nips-2011-Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation
19 0.051803466 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
20 0.050407153 5 nips-2011-A Denoising View of Matrix Completion
topicId topicWeight
[(0, 0.148), (1, -0.007), (2, -0.04), (3, -0.171), (4, -0.104), (5, 0.048), (6, 0.026), (7, 0.027), (8, 0.156), (9, 0.081), (10, 0.114), (11, -0.066), (12, -0.101), (13, 0.007), (14, 0.101), (15, -0.315), (16, -0.199), (17, 0.025), (18, -0.135), (19, -0.077), (20, -0.206), (21, -0.217), (22, -0.119), (23, -0.075), (24, 0.093), (25, 0.078), (26, 0.048), (27, -0.084), (28, 0.008), (29, -0.095), (30, 0.095), (31, -0.009), (32, -0.051), (33, 0.174), (34, -0.077), (35, -0.084), (36, 0.039), (37, -0.027), (38, 0.033), (39, 0.16), (40, 0.011), (41, -0.004), (42, -0.018), (43, -0.015), (44, 0.003), (45, -0.075), (46, -0.055), (47, -0.014), (48, 0.062), (49, -0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.95943892 270 nips-2011-Statistical Performance of Convex Tensor Decomposition
Author: Ryota Tomioka, Taiji Suzuki, Kohei Hayashi, Hisashi Kashima
Abstract: We analyze the statistical performance of a recently proposed convex tensor decomposition algorithm. Conventionally tensor decomposition has been formulated as non-convex optimization problems, which hindered the analysis of their performance. We show under some conditions that the mean squared error of the convex method scales linearly with the quantity we call the normalized rank of the true tensor. The current analysis naturally extends the analysis of convex low-rank matrix estimation to tensors. Furthermore, we show through numerical experiments that our theory can precisely predict the scaling behaviour in practice.
2 0.8289243 102 nips-2011-Generalised Coupled Tensor Factorisation
Author: Kenan Y. Yılmaz, Ali T. Cemgil, Umut Simsekli
Abstract: We derive algorithms for generalised tensor factorisation (GTF) by building upon the well-established theory of Generalised Linear Models. Our algorithms are general in the sense that we can compute arbitrary factorisations in a message passing framework, derived for a broad class of exponential family distributions including special cases such as Tweedie’s distributions corresponding to βdivergences. By bounding the step size of the Fisher Scoring iteration of the GLM, we obtain general updates for real data and multiplicative updates for non-negative data. The GTF framework is, then extended easily to address the problems when multiple observed tensors are factorised simultaneously. We illustrate our coupled factorisation approach on synthetic data as well as on a musical audio restoration problem. 1
3 0.82876283 179 nips-2011-Multilinear Subspace Regression: An Orthogonal Tensor Decomposition Approach
Author: Qibin Zhao, Cesar F. Caiafa, Danilo P. Mandic, Liqing Zhang, Tonio Ball, Andreas Schulze-bonhage, Andrzej S. Cichocki
Abstract: A multilinear subspace regression model based on so called latent variable decomposition is introduced. Unlike standard regression methods which typically employ matrix (2D) data representations followed by vector subspace transformations, the proposed approach uses tensor subspace transformations to model common latent variables across both the independent and dependent data. The proposed approach aims to maximize the correlation between the so derived latent variables and is shown to be suitable for the prediction of multidimensional dependent data from multidimensional independent data, where for the estimation of the latent variables we introduce an algorithm based on Multilinear Singular Value Decomposition (MSVD) on a specially defined cross-covariance tensor. It is next shown that in this way we are also able to unify the existing Partial Least Squares (PLS) and N-way PLS regression algorithms within the same framework. Simulations on benchmark synthetic data confirm the advantages of the proposed approach, in terms of its predictive ability and robustness, especially for small sample sizes. The potential of the proposed technique is further illustrated on a real world task of the decoding of human intracranial electrocorticogram (ECoG) from a simultaneously recorded scalp electroencephalograph (EEG). 1
4 0.60283381 108 nips-2011-Greedy Algorithms for Structurally Constrained High Dimensional Problems
Author: Ambuj Tewari, Pradeep K. Ravikumar, Inderjit S. Dhillon
Abstract: A hallmark of modern machine learning is its ability to deal with high dimensional problems by exploiting structural assumptions that limit the degrees of freedom in the underlying model. A deep understanding of the capabilities and limits of high dimensional learning methods under specific assumptions such as sparsity, group sparsity, and low rank has been attained. Efforts [1,2] are now underway to distill this valuable experience by proposing general unified frameworks that can achieve the two goals of summarizing previous analyses and enabling their application to notions of structure hitherto unexplored. Inspired by these developments, we propose and analyze a general computational scheme based on a greedy strategy to solve convex optimization problems that arise when dealing with structurally constrained high-dimensional problems. Our framework not only unifies existing greedy algorithms by recovering them as special cases but also yields novel ones. Finally, we extend our results to infinite dimensional settings by using interesting connections between smoothness of norms and behavior of martingales in Banach spaces.
5 0.51525438 211 nips-2011-Penalty Decomposition Methods for Rank Minimization
Author: Yong Zhang, Zhaosong Lu
Abstract: In this paper we consider general rank minimization problems with rank appearing in either objective function or constraint. We first show that a class of matrix optimization problems can be solved as lower dimensional vector optimization problems. As a consequence, we establish that a class of rank minimization problems have closed form solutions. Using this result, we then propose penalty decomposition methods for general rank minimization problems. The convergence results of the PD methods have been shown in the longer version of the paper [19]. Finally, we test the performance of our methods by applying them to matrix completion and nearest low-rank correlation matrix problems. The computational results demonstrate that our methods generally outperform the existing methods in terms of solution quality and/or speed. 1
6 0.42485982 230 nips-2011-RTRMC: A Riemannian trust-region method for low-rank matrix completion
7 0.39853328 5 nips-2011-A Denoising View of Matrix Completion
8 0.39803138 297 nips-2011-Universal low-rank matrix recovery from Pauli measurements
9 0.363671 203 nips-2011-On the accuracy of l1-filtering of signals with block-sparse structure
10 0.32117644 73 nips-2011-Divide-and-Conquer Matrix Factorization
11 0.321156 257 nips-2011-SpaRCS: Recovering low-rank and sparse matrices from compressive measurements
12 0.314531 164 nips-2011-Manifold Precis: An Annealing Technique for Diverse Sampling of Manifolds
13 0.31336567 107 nips-2011-Global Solution of Fully-Observed Variational Bayesian Matrix Factorization is Column-Wise Independent
14 0.29686207 202 nips-2011-On the Universality of Online Mirror Descent
15 0.29651958 165 nips-2011-Matrix Completion for Multi-label Image Classification
16 0.27839991 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
17 0.27622175 161 nips-2011-Linearized Alternating Direction Method with Adaptive Penalty for Low-Rank Representation
18 0.25446954 65 nips-2011-Convergent Fitted Value Iteration with Linear Function Approximation
19 0.25289172 225 nips-2011-Probabilistic amplitude and frequency demodulation
20 0.25220764 83 nips-2011-Efficient inference in matrix-variate Gaussian models with \iid observation noise
topicId topicWeight
[(0, 0.063), (4, 0.022), (20, 0.023), (26, 0.054), (31, 0.058), (39, 0.011), (43, 0.063), (45, 0.083), (57, 0.039), (65, 0.013), (74, 0.044), (83, 0.018), (84, 0.011), (99, 0.411)]
simIndex simValue paperId paperTitle
1 0.96914887 248 nips-2011-Semi-supervised Regression via Parallel Field Regularization
Author: Binbin Lin, Chiyuan Zhang, Xiaofei He
Abstract: This paper studies the problem of semi-supervised learning from the vector field perspective. Many of the existing work use the graph Laplacian to ensure the smoothness of the prediction function on the data manifold. However, beyond smoothness, it is suggested by recent theoretical work that we should ensure second order smoothness for achieving faster rates of convergence for semisupervised regression problems. To achieve this goal, we show that the second order smoothness measures the linearity of the function, and the gradient field of a linear function has to be a parallel vector field. Consequently, we propose to find a function which minimizes the empirical error, and simultaneously requires its gradient field to be as parallel as possible. We give a continuous objective function on the manifold and discuss how to discretize it by using random points. The discretized optimization problem turns out to be a sparse linear system which can be solved very efficiently. The experimental results have demonstrated the effectiveness of our proposed approach. 1
2 0.94717121 50 nips-2011-Budgeted Optimization with Concurrent Stochastic-Duration Experiments
Author: Javad Azimi, Alan Fern, Xiaoli Z. Fern
Abstract: Budgeted optimization involves optimizing an unknown function that is costly to evaluate by requesting a limited number of function evaluations at intelligently selected inputs. Typical problem formulations assume that experiments are selected one at a time with a limited total number of experiments, which fail to capture important aspects of many real-world problems. This paper defines a novel problem formulation with the following important extensions: 1) allowing for concurrent experiments; 2) allowing for stochastic experiment durations; and 3) placing constraints on both the total number of experiments and the total experimental time. We develop both offline and online algorithms for selecting concurrent experiments in this new setting and provide experimental results on a number of optimization benchmarks. The results show that our algorithms produce highly effective schedules compared to natural baselines. 1
3 0.94198954 54 nips-2011-Co-regularized Multi-view Spectral Clustering
Author: Abhishek Kumar, Piyush Rai, Hal Daume
Abstract: In many clustering problems, we have access to multiple views of the data each of which could be individually used for clustering. Exploiting information from multiple views, one can hope to find a clustering that is more accurate than the ones obtained using the individual views. Often these different views admit same underlying clustering of the data, so we can approach this problem by looking for clusterings that are consistent across the views, i.e., corresponding data points in each view should have same cluster membership. We propose a spectral clustering framework that achieves this goal by co-regularizing the clustering hypotheses, and propose two co-regularization schemes to accomplish this. Experimental comparisons with a number of baselines on two synthetic and three real-world datasets establish the efficacy of our proposed approaches.
4 0.90777004 8 nips-2011-A Model for Temporal Dependencies in Event Streams
Author: Asela Gunawardana, Christopher Meek, Puyang Xu
Abstract: We introduce the Piecewise-Constant Conditional Intensity Model, a model for learning temporal dependencies in event streams. We describe a closed-form Bayesian approach to learning these models, and describe an importance sampling algorithm for forecasting future events using these models, using a proposal distribution based on Poisson superposition. We then use synthetic data, supercomputer event logs, and web search query logs to illustrate that our learning algorithm can efficiently learn nonlinear temporal dependencies, and that our importance sampling algorithm can effectively forecast future events. 1
same-paper 5 0.90079236 270 nips-2011-Statistical Performance of Convex Tensor Decomposition
Author: Ryota Tomioka, Taiji Suzuki, Kohei Hayashi, Hisashi Kashima
Abstract: We analyze the statistical performance of a recently proposed convex tensor decomposition algorithm. Conventionally tensor decomposition has been formulated as non-convex optimization problems, which hindered the analysis of their performance. We show under some conditions that the mean squared error of the convex method scales linearly with the quantity we call the normalized rank of the true tensor. The current analysis naturally extends the analysis of convex low-rank matrix estimation to tensors. Furthermore, we show through numerical experiments that our theory can precisely predict the scaling behaviour in practice.
6 0.84245408 46 nips-2011-Better Mini-Batch Algorithms via Accelerated Gradient Methods
7 0.63089389 186 nips-2011-Noise Thresholds for Spectral Clustering
8 0.61896837 71 nips-2011-Directed Graph Embedding: an Algorithm based on Continuous Limits of Laplacian-type Operators
9 0.61877406 102 nips-2011-Generalised Coupled Tensor Factorisation
10 0.61191708 263 nips-2011-Sparse Manifold Clustering and Embedding
11 0.60813594 29 nips-2011-Algorithms and hardness results for parallel large margin learning
12 0.59038621 121 nips-2011-Hogwild: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
13 0.58996654 16 nips-2011-A reinterpretation of the policy oscillation phenomenon in approximate policy iteration
14 0.57324469 198 nips-2011-On U-processes and clustering performance
15 0.57193351 36 nips-2011-Analysis and Improvement of Policy Gradient Estimation
16 0.56849813 37 nips-2011-Analytical Results for the Error in Filtering of Gaussian Processes
17 0.56625003 72 nips-2011-Distributed Delayed Stochastic Optimization
18 0.55796814 278 nips-2011-TD gamma: Re-evaluating Complex Backups in Temporal Difference Learning
19 0.55334502 66 nips-2011-Crowdclustering
20 0.55301964 45 nips-2011-Beating SGD: Learning SVMs in Sublinear Time