nips nips2012 nips2012-301 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Thanh Ngo, Yousef Saad
Abstract: This paper describes gradient methods based on a scaled metric on the Grassmann manifold for low-rank matrix completion. The proposed methods significantly improve canonical gradient methods, especially on ill-conditioned matrices, while maintaining established global convergence and exact recovery guarantees. A connection between a form of subspace iteration for matrix completion and the scaled gradient descent procedure is also established. The proposed conjugate gradient method based on the scaled gradient outperforms several existing algorithms for matrix completion and is competitive with recently proposed methods. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper describes gradient methods based on a scaled metric on the Grassmann manifold for low-rank matrix completion. [sent-6, score-0.551]
2 The proposed methods significantly improve canonical gradient methods, especially on ill-conditioned matrices, while maintaining established global convergence and exact recovery guarantees. [sent-7, score-0.21]
3 A connection between a form of subspace iteration for matrix completion and the scaled gradient descent procedure is also established. [sent-8, score-0.714]
4 The proposed conjugate gradient method based on the scaled gradient outperforms several existing algorithms for matrix completion and is competitive with recently proposed methods. [sent-9, score-0.619]
5 The matrix completion problem is to reconstruct A given a subset of entries of A. [sent-11, score-0.273]
6 If the rank is unknown and there is no noise, the problem can be formulated as: Minimize rank (X) subject to PΩ (X) = PΩ (A). [sent-20, score-0.212]
7 (1) Rank minimization is NP-hard and so work has been done to solve a convex relaxation of it by approximating the rank by the nuclear norm. [sent-21, score-0.106]
8 Under some conditions, the solution of the relaxed problem can be shown to be the exact solution of the rank minimization problem with overwhelming probability [8, 18]. [sent-22, score-0.195]
9 Usually, algorithms to minimize the nuclear norm iteratively use the Singular Value Decomposition (SVD), specifically the singular value thresholding operator [7, 15, 17], which makes them expensive. [sent-23, score-0.205]
10 If the rank is known, we can formulate the matrix completion problem as follows: Find matrix X to minimize ||PΩ (X) − PΩ (A)||F subject to rank (X) = r. [sent-24, score-0.505]
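To make the fixed-rank objective concrete, the following minimal numpy sketch evaluates ||PΩ(X) − PΩ(A)||F for a factored iterate X = U S V^T. The function name, the dense boolean mask, and the factored representation are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def masked_residual_norm(U, S, V, A, mask):
    """Frobenius norm of P_Omega(U S V^T - A).

    U: m x r, S: r x r, V: n x r; mask is a boolean m x n array
    marking the observed entries Omega of A.
    """
    X = U @ S @ V.T                      # current rank-r estimate
    R = np.where(mask, X - A, 0.0)       # residual restricted to Omega
    return np.linalg.norm(R, "fro")
```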
11 A number of algorithms based on the non-convex formulation use the framework of optimization on matrix manifolds [14, 22, 6]. [sent-27, score-0.141]
12 The authors of [14] propose a steepest descent procedure on the product of Grassmann manifolds of r-dimensional subspaces. [sent-29, score-0.174]
13 Vandereycken [22] discusses a conjugate gradient algorithm on the Riemannian manifold of rank-r matrices. [sent-30, score-0.325]
14 The authors of [23] perform a low-rank matrix factorization based on a successive over-relaxation iteration. [sent-33, score-0.208]
15 Also, Srebro and Jaakkola [21] discuss SVD-EM, one of the early fixed-rank methods using truncated singular value decomposition iteratively. [sent-34, score-0.205]
16 In fact, even when the rank is unknown, the sparse matrix which consists of observed entries can give us a very good approximation of the rank based on its singular spectrum [14]. [sent-39, score-0.588]
17 Moreover, the singular spectrum is revealed during the iterations, so many fixed rank methods can also be adapted to find the rank of the matrix. [sent-41, score-0.439]
18 2 Our contribution OptSpace [14] is an efficient algorithm for low-rank matrix completion with global convergence and exact recovery guarantees. [sent-43, score-0.322]
19 We propose using a non-canonical metric on the Grassmann manifold to improve OptSpace while maintaining its appealing properties. [sent-44, score-0.222]
20 The non-canonical metric introduces a scaling factor to the gradient of the objective function which can be interpreted as an adaptive preconditioner for the matrix completion problem. [sent-45, score-0.374]
21 The gradient descent procedure using the scaled gradient is related to a form of subspace iteration for matrix completion. [sent-46, score-0.635]
22 Each iteration of the subspace iteration is inexpensive and the procedure converges very rapidly. [sent-47, score-0.199]
23 2 Subspace iteration for incomplete matrices We begin with a form of subspace iteration for matrix completion depicted in Algorithm 1. [sent-52, score-0.498]
24 Output: Left and right dominant subspaces U and V and associated singular values. [sent-55, score-0.307]
25 10: end if 11: end for If the matrix A is fully observed, U and V can be randomly initialized, line 3 is not needed, and in lines 4 and 6 we use A instead of Xi+1 to update the subspaces. [sent-60, score-0.153]
26 In this case, we have the classical two-sided subspace iteration for singular value decomposition. [sent-61, score-0.342]
27 Lines 6-9 correspond to a Rayleigh-Ritz projection to obtain current approximations of singular vectors and singular values. [sent-62, score-0.41]
28 It is known that if the initial columns of U and V are not orthogonal to any of the first r left and right singular vectors of A respectively, the algorithm converges to the dominant subspaces of A [20, Theorem 5. [sent-63, score-0.307]
29 Returning to the case where the matrix A is not fully observed, the basic idea of Algorithm 1 is to use an approximation of A in each iteration to update the subspaces U and V, and then from the new U and V we can obtain a better approximation of A for the next iteration. [sent-65, score-0.261]
30 This step provides current approximations of the singular values, which could be useful for several purposes such as regularization or a convergence test. [sent-69, score-0.232]
31 Each iteration of Algorithm 1 can be seen as an approximation of an iteration of SVD-EM where a few matrix multiplications are used to update U and V instead of using a truncated SVD to compute the dominant subspaces of Xi+1 . [sent-71, score-0.369]
32 by a Lanczos type procedure, requires several, possibly a large number of, matrix multiplications of this type. [sent-74, score-0.114]
33 Then Xi+1 = PΩ̄(Ui Si Vi^T) + AΩ = X̂i + Ei (with Ω̄ the complement of Ω), where Ei = PΩ(A − X̂i) is a sparse matrix of errors at the known entries, which can be computed efficiently by exploiting the structure of X̂i. [sent-78, score-0.126]
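Because Ei = PΩ(A − X̂i) is nonzero only on Ω, it can be assembled from |Ω| inner products of length r without ever forming the dense product Ui Si Vi^T. A possible sketch using scipy's COO storage is shown below; the helper name and the (rows, cols, vals) layout of the observed entries are assumptions made for illustration.

```python
import numpy as np
from scipy.sparse import coo_matrix

def sparse_error(U, S, V, rows, cols, vals, shape):
    """E = P_Omega(A - U S V^T) as a sparse matrix.

    rows, cols, vals give the observed entries of A in COO form;
    only the entries of U S V^T indexed by Omega are evaluated.
    """
    US = U @ S                                        # m x r
    est = np.einsum("ij,ij->i", US[rows], V[cols])    # (U S V^T)[rows, cols]
    return coo_matrix((vals - est, (rows, cols)), shape=shape)
```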
34 Assume that each Si is not singular (the non-singularity of Si will be discussed in Section 4). [sent-79, score-0.205]
35 Then if we post-multiply the update of U in line 4 by Si^{-1}, the subspace remains the same, and the update becomes: Ui+1 = Xi+1 Vi Si^{-1} = (X̂i + Ei) Vi Si^{-1} = Ui + Ei Vi Si^{-1}. (3) The update of V can also be efficiently implemented. [sent-80, score-0.184]
36 We observe that the convergence speed remains roughly the same (when A is fully observed, the algorithm is a slower version of subspace iteration where the convergence rate is halved). [sent-82, score-0.22]
37 With this change, we can derive an update to V that is similar to (3): Vi+1 = Vi + Ei^T Ui Si^{-T}. (4) We will point out in Section 3 that the updating terms Ei Vi Si^{-1} and Ei^T Ui Si^{-T} are related to the gradients of a matrix completion objective function on the Grassmann manifold. [sent-83, score-0.303]
38 As a result, to improve the convergence speed, we can add an adaptive step size ti to the process, as follows: Ui+1 = Ui + ti Ei Vi Si^{-1} and Vi+1 = Vi + ti Ei^T Ui Si^{-T}. [sent-84, score-0.372]
39 This is equivalent to using X̂i + ti Ei as the estimate of A in each iteration. [sent-85, score-0.115]
40 Si+1 = (Ui+1^T Ui) Si (Vi^T Vi+1) + ti Ui+1^T Ei Vi+1 (5) There are also other ways to obtain Si+1 once Ui+1 and Vi+1 are determined to improve the current approximation of A. [sent-91, score-0.115]
41 Output: Left and right dominant subspaces U and V and associated singular values. [sent-95, score-0.307]
42 do 3: Compute Ei and an appropriate step size ti 4: Ui+1 = Ui + ti Ei Vi Si^{-1} and Vi+1 = Vi + ti Ei^T Ui Si^{-T} 5: Orthonormalize Ui+1 and Vi+1 6: Find Si+1 such that PΩ(Ui+1 Si+1 Vi+1^T) is close to AΩ (e. [sent-100, score-0.345]
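Putting these pieces together, one pass of the scaled subspace-iteration procedure sketched in the lines above might look as follows. This is a dense sketch for clarity only: the error matrix would be kept sparse in practice, the step size is fixed here, and S is refreshed with the approximate update (5), all of which are simplifying assumptions.

```python
import numpy as np

def scaled_subspace_step(U, S, V, A, mask, t=1.0):
    """One iteration of the scaled subspace iteration (dense sketch)."""
    E = np.where(mask, A - U @ S @ V.T, 0.0)          # E_i = P_Omega(A - X_i)
    Sinv = np.linalg.inv(S)
    U_new, _ = np.linalg.qr(U + t * E @ V @ Sinv)     # update then orthonormalize
    V_new, _ = np.linalg.qr(V + t * E.T @ U @ Sinv.T)
    # approximate update of S as in (5)
    S_new = (U_new.T @ U) @ S @ (V.T @ V_new) + t * U_new.T @ E @ V_new
    return U_new, S_new, V_new
```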
43 The updates (3) and (4) are reminiscent of the gradient descent steps for minimizing the matrix completion error on the Grassmann manifold introduced in [14], and the next section discusses the connection to optimization on the Grassmann manifold. [sent-107, score-0.596]
44 3 Optimization on the Grassmann manifold In this section, we show that, using a non-canonical Riemannian metric on the Grassmann manifold, the gradient of the same objective function as in [14] is of a form similar to (3) and (4). [sent-108, score-0.315]
45 Based on this, improvements to the gradient descent algorithms can be made and exact recovery results similar to those of [14] can be maintained. [sent-109, score-0.244]
46 1 Gradients on the Grassmann manifold for the matrix completion problem Let G(m, r) be the Grassmann manifold in which each point corresponds to a subspace of dimension r in R^m. [sent-112, score-0.659]
47 One of the results of [14] is that, under a few assumptions (to be addressed in Section 4), one can obtain with high probability the exact matrix A by minimizing a regularized version of the function F : G(m, r) × G(n, r) → R defined below. [sent-113, score-0.109]
48 Here, we abuse notation by letting U and V denote both orthonormal matrices and the points on the Grassmann manifold which they span. [sent-115, score-0.277]
49 Note that F only depends on the subspaces spanned by matrices U and V . [sent-116, score-0.114]
50 If G(m, r) is endowed with the canonical inner product ⟨W, W′⟩ = Tr(W^T W′), where W and W′ are tangent vectors of G(m, r) at U (i. [sent-118, score-0.119]
51 e. W, W′ ∈ R^{m×r} such that W^T U = 0 and W′^T U = 0) and similarly for G(n, r), the gradients of F(U, V) on the product manifold are: gradFU(U, V) = (I − U U^T) PΩ(U S V^T − A) V S^T (8) and gradFV(U, V) = (I − V V^T) PΩ(U S V^T − A)^T U S. (9) [sent-120, score-0.236]
52 In the above formulas, (I − U U^T) and (I − V V^T) are the projections of the derivatives PΩ(U S V^T − A) V S^T and PΩ(U S V^T − A)^T U S onto the tangent space of the manifold at (U, V). [sent-121, score-0.237]
53 It is therefore compelling to use a scaled metric on the Grassmann manifold. [sent-128, score-0.203]
54 We will derive the partial gradients of F on the Grassmann manifold endowed with this scaled inner product. [sent-130, score-0.421]
55 According to [11], gradFU is the tangent vector of G(m, r) at U such that Tr(FU^T W) = ⟨gradFU, W⟩_D, (10) for all tangent vectors W at U, where FU is the partial derivative of F with respect to U. [sent-131, score-0.11]
56 The solution of (10) with the constraints that W^T U = 0 and (gradFU)^T U = 0 gives us the gradient based on the scaled metric, which we will denote by grads FU and grads FV. [sent-133, score-1.068]
57 grads FU(U, V) = (I − U U^T) FU D^{-1} = (I − U U^T) PΩ(U S V^T − A) V S D^{-1} (11), and grads FV(U, V) has the analogous form (12). [sent-134, score-0.812]
58 Notice the additional scaling D appearing in these scaled gradients. [sent-136, score-0.184]
59 If S is not diagonalized, we use SS^T and S^T S to derive grads FU and grads FV respectively, and the scalings appear exactly as in (3) and (4). [sent-138, score-0.812]
60 grads FU(U, V) = (I − U U^T) PΩ(U S V^T − A) V S^{-1} (13) and grads FV(U, V) = (I − V V^T) PΩ(U S V^T − A)^T U S^{-T}. (14) This scaling can be interpreted as an adaptive preconditioning step similar to those that are popular in the scientific computing literature [4]. [sent-139, score-0.833]
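In code, the scaled gradients (13)-(14) differ from the canonical gradients (8)-(9) only in the trailing factor (S^{-1} instead of S^T). A dense numpy sketch is given below; a practical implementation would keep the masked residual sparse, as in the earlier sketch.

```python
import numpy as np

def scaled_gradients(U, S, V, A, mask):
    """grads F_U and grads F_V from (13)-(14), dense for clarity."""
    R = np.where(mask, U @ S @ V.T - A, 0.0)   # P_Omega(U S V^T - A)
    Sinv = np.linalg.inv(S)
    GU = R @ V @ Sinv
    GV = R.T @ U @ Sinv.T
    GU -= U @ (U.T @ GU)                       # project onto tangent space at U
    GV -= V @ (V.T @ GV)                       # project onto tangent space at V
    return GU, GV
```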
61 As will be shown in our experiments, this scaled gradient direction outperforms canonical gradient directions especially for ill-conditioned matrices. [sent-140, score-0.413]
62 The optimization framework on matrix manifolds allows one to define several elements of the manifold in a flexible way. [sent-141, score-0.323]
63 Here, we use the scaled metric to obtain a good descent direction, while other operations on the manifold can be based on the canonical metric, which has simple and efficient computational forms. [sent-142, score-0.34]
64 We use R(U) = span(U) as the retraction on the Grassmann manifold, where span(U) is represented by qf(U), the Q factor in the QR factorization of U. [sent-146, score-0.288]
65 Algorithm 3 is an outline of our gradient descent method for matrix completion. [sent-151, score-0.242]
66 We let grads FU^(i) ≡ grads FU(Ui, Vi) and grads FV^(i) ≡ grads FV(Ui, Vi). [sent-152, score-1.624]
67 The subspace iteration and LMaFit can be seen as relaxed versions of this gradient descent procedure. [sent-157, score-0.359]
68 The next section goes further and describes the conjugate gradient iteration. [sent-158, score-0.143]
69 do 3: Compute grads FU^(i) and grads FV^(i) according to (13) and (14). [sent-166, score-0.812]
70 4: Find an appropriate step size ti and compute (Ui+1, Vi+1) = (qf(Ui − ti grads FU^(i)), qf(Vi − ti grads FV^(i))) 5: Compute Si+1 according to (6) (exact) or (5) (approximate). [sent-167, score-0.115]
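A minimal numpy sketch of one pass of Algorithm 3 follows, using the QR-based retraction qf(·). The fixed step size and the reuse of the cheap update (5) for S are simplifying assumptions; the exact variant would instead refit S so that PΩ(U S V^T) matches AΩ as closely as possible, as in line 6 of the earlier algorithm.

```python
import numpy as np

def grassmann_gd_step(U, S, V, A, mask, t):
    """One scaled-gradient descent step on G(m, r) x G(n, r) (dense sketch)."""
    E = np.where(mask, A - U @ S @ V.T, 0.0)   # P_Omega(A - U S V^T)
    Sinv = np.linalg.inv(S)
    GU = -E @ V @ Sinv                         # scaled gradient directions (13)-(14)
    GV = -E.T @ U @ Sinv.T
    GU -= U @ (U.T @ GU)                       # tangent-space projections
    GV -= V @ (V.T @ GV)
    U_new = np.linalg.qr(U - t * GU)[0]        # retraction qf(.)
    V_new = np.linalg.qr(V - t * GV)[0]
    S_new = (U_new.T @ U) @ S @ (V.T @ V_new) + t * U_new.T @ E @ V_new   # cf. (5)
    return U_new, S_new, V_new
```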
71 3 Conjugate gradient method on the Grassmann manifold In this section, we describe the conjugate gradient (CG) method on the Grassmann manifold based on the scaled gradients to solve the matrix completion problem. [sent-169, score-1.037]
72 The main additional ingredient we need is vector transport which is used to transport the old search direction to the current point on the manifold. [sent-170, score-0.116]
73 The transported search direction is then combined with the scaled gradient at the current point, e. [sent-171, score-0.313]
74 1, we will use the canonical metric to derive vector transport when considering the natural quotient manifold structure of the Grassmann manifold. [sent-177, score-0.311]
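A common and inexpensive realization of vector transport on the Grassmann manifold is orthogonal projection of the old direction onto the tangent space at the new point; the sketch below uses that projection together with a Polak-Ribière-style β when forming the directions used in Algorithm 4 below. Both choices, and the fact that the old gradients are compared without being transported themselves, are simplifying assumptions rather than the paper's exact formulas.

```python
import numpy as np

def transport(U_new, H):
    """Project a previous search direction H onto the tangent space at U_new."""
    return H - U_new @ (U_new.T @ H)

def cg_directions(GU_new, GV_new, GU_old, GV_old, etaU, etaV, U_new, V_new):
    """New conjugate directions on G(m, r) x G(n, r) from the scaled gradients."""
    tU, tV = transport(U_new, etaU), transport(V_new, etaV)
    # Polak-Ribiere coefficient, computed jointly over both factors
    num = np.sum(GU_new * (GU_new - GU_old)) + np.sum(GV_new * (GV_new - GV_old))
    den = np.sum(GU_old * GU_old) + np.sum(GV_old * GV_old)
    beta = max(num / den, 0.0)
    return -GU_new + beta * tU, -GV_new + beta * tV
```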
75 Algorithm 4 is a sketch of the resulting conjugate gradient procedure. [sent-179, score-0.143]
76 2: Compute (η0, ξ0) = (grads FU^(0), grads FV^(0)). [sent-184, score-0.406]
77 do 4: Compute a step size ti and compute (Ui+1, Vi+1) = (qf(Ui + ti ηi), qf(Vi + ti ξi)) 5: Compute βi+1 (Polak-Ribière) and set (ηi+1, ξi+1) = (−grads FU^(i+1) + βi+1 TUi+1(ηi), −grads FV^(i+1) + βi+1 TVi+1(ξi)) 6: Compute Si+1 according to (6) or (5). [sent-188, score-0.345]
78 7: end for 4 Convergence and exact recovery of scaled-gradient descent methods Let A = U∗ Σ∗ V∗^T be the singular value decomposition of A, where U∗ ∈ R^{m×r}, V∗ ∈ R^{n×r} and Σ∗ ∈ R^{r×r}. [sent-189, score-0.356]
79 Assume that A is incoherent [14]; A has bounded entries and the minimum singular value of A is bounded away from 0. [sent-192, score-0.295]
80 From these observations, given an initial point that is sufficiently close to z∗ , a gradient descent procedure on F (with an additional regularization term to keep the intermediate points incoherent) converges to z∗ and exact recovery is obtained. [sent-197, score-0.244]
81 The singular value decomposition of a trimmed version of the observed matrix AΩ can give us the initial point that ensures convergence. [sent-198, score-0.278]
82 4 in [14], the extreme singular values of any intermediate S are bounded by the extreme singular values σ∗min and σ∗max of Σ∗: σmax ≤ 2σ∗max and σmin ≥ (1/2)σ∗min. [sent-207, score-0.41]
83 The scaled gradient is a descent direction of F̃, as a direct result of the fact that it is indeed the gradient of F̃ based on a non-canonical metric. [sent-209, score-0.191]
84 ) are the canonical norm and distance on the Grassmann manifold respectively. [sent-214, score-0.224]
85 Based on this, a similar lower bound on grads F̃ can be derived. [sent-215, score-0.406]
86 Therefore, the scaled gradients only vanish at z∗, which means the scaled-gradient descent procedure must converge to z∗, which is the exact solution [3]. [sent-218, score-0.351]
87 5 Experiments and results The proposed algorithms were implemented in Matlab with some mex-routines to perform matrix multiplications with sparse masks. [sent-219, score-0.114]
88 First, we illustrate the improvement of scaled gradients over canonical gradients for steepest descent and conjugate gradient methods on 5000 × 5000 matrices with rank 5 (Figure 1). [sent-222, score-0.714]
89 The time needed for each iteration is roughly the same for all methods so we only present the results in terms of iteration counts. [sent-225, score-0.124]
90 We can see that there are some small improvements for the fully random case (Figure 1a) since the singular values are roughly the same. [sent-226, score-0.234]
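To generate ill-conditioned rank-r test problems of the kind used in Figure 1 (the variant with chosen singular values; the fully random variant would presumably just multiply two Gaussian factors), one can proceed along the following lines. The specific condition number and sampling density below are placeholders, not the paper's exact experimental settings.

```python
import numpy as np

def make_test_problem(m, n, r, cond=100.0, frac_observed=0.1, seed=0):
    """Rank-r matrix with chosen condition number plus a random observation mask."""
    rng = np.random.default_rng(seed)
    U = np.linalg.qr(rng.standard_normal((m, r)))[0]
    V = np.linalg.qr(rng.standard_normal((n, r)))[0]
    sigma = np.logspace(0, -np.log10(cond), r)     # singular values from 1 down to 1/cond
    A = (U * sigma) @ V.T                          # A = U diag(sigma) V^T
    mask = rng.random((m, n)) < frac_observed      # observed set Omega
    return A, mask
```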
91 Now, we compare the relaxed version of the scaled conjugate gradient which uses (5) to compute Si (ScGrass-CG) to LMaFit [23], Riemann-CG [22], RTRMC2 [6] (trust region method with second order information), SVP [12] and GROUSE [5] (Figure 2). [sent-231, score-0.359]
92 The subspace iteration method and the relaxed version of scaled steepest descent converge similarly to LMaFit, so we omit them in the graph. [sent-233, score-0.459]
93 When the condition number of the matrix is higher, ScGrass-CG converges fastest both in terms of iteration counts and execution time. [sent-239, score-0.135]
94 ScGrass-CG is the relaxed scaled CG method and ScGrassCG-Reg is the exact scaled CG method using a spectral-regularization version of F. (Figure: 10000x10000, rank 10.) [sent-247, score-0.415]
95 Upper row is fully random, lower row is random with chosen singular values. [sent-252, score-0.234]
96 6 Conclusion and future work The gradients obtained from a scaled metric on the Grassmann manifold can result in improved convergence of gradient methods on matrix manifolds for matrix completion while maintaining good global convergence and exact recovery guarantees. [sent-281, score-1.022]
97 We have established a connection between scaled gradient methods and a subspace iteration method for matrix completion. [sent-282, score-0.491]
98 The relaxed versions of the proposed gradient methods, adapted from the subspace iteration, are faster than previously discussed algorithms, sometimes much faster depending on the conditioning of the data matrix. [sent-283, score-0.221]
99 Fixed point and Bregman iterative methods for matrix rank minimization. [sent-393, score-0.179]
100 Solving a low-rank factorization model for matrix completion using a non-linear successive over-relaxation algorithm. [sent-441, score-0.249]
wordName wordTfidf (topN-words)
[('grads', 0.406), ('grassmann', 0.358), ('ui', 0.262), ('singular', 0.205), ('si', 0.186), ('manifold', 0.182), ('fv', 0.172), ('fu', 0.166), ('vi', 0.164), ('scaled', 0.163), ('grouse', 0.159), ('lmafit', 0.159), ('completion', 0.147), ('cg', 0.14), ('sv', 0.133), ('ei', 0.126), ('grass', 0.125), ('svp', 0.124), ('qf', 0.123), ('ti', 0.115), ('riemann', 0.109), ('gradfu', 0.106), ('rank', 0.106), ('gradient', 0.093), ('descent', 0.076), ('subspace', 0.075), ('matrix', 0.073), ('canon', 0.071), ('gradfv', 0.071), ('jester', 0.071), ('optspace', 0.071), ('randn', 0.071), ('vit', 0.071), ('subspaces', 0.068), ('manifolds', 0.068), ('rmse', 0.067), ('iteration', 0.062), ('tangent', 0.055), ('gradients', 0.054), ('keshavan', 0.054), ('entries', 0.053), ('cinc', 0.053), ('gradf', 0.053), ('scgrass', 0.053), ('relaxed', 0.053), ('steep', 0.051), ('conjugate', 0.05), ('orthonormal', 0.049), ('transport', 0.047), ('realizes', 0.047), ('matrices', 0.046), ('svd', 0.045), ('movielens', 0.043), ('canonical', 0.042), ('multiplications', 0.041), ('metric', 0.04), ('recovery', 0.039), ('incoherent', 0.037), ('exact', 0.036), ('boumal', 0.035), ('nmaes', 0.035), ('randomizations', 0.035), ('rassmann', 0.035), ('retraction', 0.035), ('saad', 0.035), ('teration', 0.035), ('transported', 0.035), ('ubspace', 0.035), ('dominant', 0.034), ('incomplete', 0.033), ('rm', 0.031), ('absil', 0.031), ('steepest', 0.03), ('factorization', 0.029), ('srebro', 0.029), ('fully', 0.029), ('update', 0.029), ('find', 0.029), ('matlab', 0.027), ('xi', 0.027), ('dai', 0.027), ('ru', 0.027), ('convergence', 0.027), ('count', 0.026), ('connection', 0.025), ('qr', 0.025), ('trust', 0.024), ('observed', 0.023), ('readers', 0.023), ('rv', 0.023), ('line', 0.022), ('tu', 0.022), ('spectrum', 0.022), ('direction', 0.022), ('endowed', 0.022), ('vanish', 0.022), ('riemannian', 0.021), ('ss', 0.021), ('scaling', 0.021), ('span', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion
Author: Thanh Ngo, Yousef Saad
Abstract: This paper describes gradient methods based on a scaled metric on the Grassmann manifold for low-rank matrix completion. The proposed methods significantly improve canonical gradient methods, especially on ill-conditioned matrices, while maintaining established global convergence and exact recovery guarantees. A connection between a form of subspace iteration for matrix completion and the scaled gradient descent procedure is also established. The proposed conjugate gradient method based on the scaled gradient outperforms several existing algorithms for matrix completion and is competitive with recently proposed methods. 1
2 0.13420397 277 nips-2012-Probabilistic Low-Rank Subspace Clustering
Author: S. D. Babacan, Shinichi Nakajima, Minh Do
Abstract: In this paper, we consider the problem of clustering data points into lowdimensional subspaces in the presence of outliers. We pose the problem using a density estimation formulation with an associated generative model. Based on this probability model, we first develop an iterative expectation-maximization (EM) algorithm and then derive its global solution. In addition, we develop two Bayesian methods based on variational Bayesian (VB) approximation, which are capable of automatic dimensionality selection. While the first method is based on an alternating optimization scheme for all unknowns, the second method makes use of recent results in VB matrix factorization leading to fast and effective estimation. Both methods are extended to handle sparse outliers for robustness and can handle missing values. Experimental results suggest that proposed methods are very effective in subspace clustering and identifying outliers. 1
3 0.12523238 134 nips-2012-Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods
Author: Andre Wibisono, Martin J. Wainwright, Michael I. Jordan, John C. Duchi
Abstract: We consider derivative-free algorithms for stochastic optimization problems that use only noisy function values rather than gradients, analyzing their finite-sample convergence rates. We show that if pairs of function values are available, algorithms that use gradient estimates based on random perturbations suffer a factor of at most √d in convergence rate over traditional stochastic gradient methods, where d is the problem dimension. We complement our algorithmic development with information-theoretic lower bounds on the minimax convergence rate of such problems, which show that our bounds are sharp with respect to all problem-dependent quantities: they cannot be improved by more than constant factors. 1
4 0.10486764 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC
Author: Ben Calderhead, Mátyás A. Sustik
Abstract: One of the enduring challenges in Markov chain Monte Carlo methodology is the development of proposal mechanisms to make moves distant from the current point, that are accepted with high probability and at low computational cost. The recent introduction of locally adaptive MCMC methods based on the natural underlying Riemannian geometry of such models goes some way to alleviating these problems for certain classes of models for which the metric tensor is analytically tractable, however computational efficiency is not assured due to the necessity of potentially high-dimensional matrix operations at each iteration. In this paper we firstly investigate a sampling-based approach for approximating the metric tensor and suggest a valid MCMC algorithm that extends the applicability of Riemannian Manifold MCMC methods to statistical models that do not admit an analytically computable metric tensor. Secondly, we show how the approximation scheme we consider naturally motivates the use of 1 regularisation to improve estimates and obtain a sparse approximate inverse of the metric, which enables stable and sparse approximations of the local geometry to be made. We demonstrate the application of this algorithm for inferring the parameters of a realistic system of ordinary differential equations using a biologically motivated robust Student-t error model, for which the Expected Fisher Information is analytically intractable. 1
5 0.09965767 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach
Author: Xinhua Zhang, Dale Schuurmans, Yao-liang Yu
Abstract: Sparse learning models typically combine a smooth loss with a nonsmooth penalty, such as trace norm. Although recent developments in sparse approximation have offered promising solution methods, current approaches either apply only to matrix-norm constrained problems or provide suboptimal convergence rates. In this paper, we propose a boosting method for regularized learning that guarantees accuracy within O(1/ε) iterations. Performance is further accelerated by interlacing boosting with fixed-rank local optimization, exploiting a simpler local objective than previous work. The proposed method yields state-of-the-art performance on large-scale problems. We also demonstrate an application to latent multiview learning for which we provide the first efficient weak-oracle. 1
6 0.096103735 206 nips-2012-Majorization for CRFs and Latent Likelihoods
7 0.092118941 225 nips-2012-Multi-task Vector Field Learning
8 0.088157415 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction
9 0.083764374 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion
10 0.081843786 199 nips-2012-Link Prediction in Graphs with Autoregressive Features
11 0.078741945 9 nips-2012-A Geometric take on Metric Learning
12 0.076622061 320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion
13 0.075199343 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data
14 0.074058309 272 nips-2012-Practical Bayesian Optimization of Machine Learning Algorithms
15 0.072273165 27 nips-2012-A quasi-Newton proximal splitting method
16 0.071248248 247 nips-2012-Nonparametric Reduced Rank Regression
17 0.07107114 179 nips-2012-Learning Manifolds with K-Means and K-Flats
18 0.069503717 204 nips-2012-MAP Inference in Chains using Column Generation
19 0.067177385 240 nips-2012-Newton-Like Methods for Sparse Inverse Covariance Estimation
20 0.067008071 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes
topicId topicWeight
[(0, 0.162), (1, 0.024), (2, 0.063), (3, -0.083), (4, 0.015), (5, 0.05), (6, -0.023), (7, 0.01), (8, 0.052), (9, -0.01), (10, 0.01), (11, -0.058), (12, -0.072), (13, -0.025), (14, 0.009), (15, 0.148), (16, -0.0), (17, -0.063), (18, 0.178), (19, -0.011), (20, 0.017), (21, -0.023), (22, -0.01), (23, -0.029), (24, -0.12), (25, 0.03), (26, 0.03), (27, -0.06), (28, -0.028), (29, 0.019), (30, 0.017), (31, 0.069), (32, -0.051), (33, 0.16), (34, -0.08), (35, -0.034), (36, 0.004), (37, 0.019), (38, 0.085), (39, 0.088), (40, 0.107), (41, -0.149), (42, 0.048), (43, 0.052), (44, 0.049), (45, -0.043), (46, -0.048), (47, 0.022), (48, -0.071), (49, 0.036)]
simIndex simValue paperId paperTitle
same-paper 1 0.94432485 301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion
Author: Thanh Ngo, Yousef Saad
Abstract: This paper describes gradient methods based on a scaled metric on the Grassmann manifold for low-rank matrix completion. The proposed methods significantly improve canonical gradient methods, especially on ill-conditioned matrices, while maintaining established global convergence and exact recovery guarantees. A connection between a form of subspace iteration for matrix completion and the scaled gradient descent procedure is also established. The proposed conjugate gradient method based on the scaled gradient outperforms several existing algorithms for matrix completion and is competitive with recently proposed methods. 1
2 0.65002882 17 nips-2012-A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound
Author: Shusen Wang, Zhihua Zhang
Abstract: The CUR matrix decomposition is an important extension of Nyström approximation to a general matrix. It approximates any data matrix in terms of a small number of its columns and rows. In this paper we propose a novel randomized CUR algorithm with an expected relative-error bound. The proposed algorithm has the advantages over the existing relative-error CUR algorithms that it possesses tighter theoretical bound and lower time complexity, and that it can avoid maintaining the whole data matrix in main memory. Finally, experiments on several real-world datasets demonstrate significant improvement over the existing relative-error algorithms. 1
3 0.61918753 320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion
Author: Borja Balle, Mehryar Mohri
Abstract: Many tasks in text and speech processing and computational biology require estimating functions mapping strings to real numbers. A broad class of such functions can be defined by weighted automata. Spectral methods based on the singular value decomposition of a Hankel matrix have been recently proposed for learning a probability distribution represented by a weighted automaton from a training sample drawn according to this same target distribution. In this paper, we show how spectral methods can be extended to the problem of learning a general weighted automaton from a sample generated by an arbitrary distribution. The main obstruction to this approach is that, in general, some entries of the Hankel matrix may be missing. We present a solution to this problem based on solving a constrained matrix completion problem. Combining these two ingredients, matrix completion and spectral method, a whole new family of algorithms for learning general weighted automata is obtained. We present generalization bounds for a particular algorithm in this family. The proofs rely on a joint stability analysis of matrix completion and spectral learning. 1
4 0.61345226 225 nips-2012-Multi-task Vector Field Learning
Author: Binbin Lin, Sen Yang, Chiyuan Zhang, Jieping Ye, Xiaofei He
Abstract: Multi-task learning (MTL) aims to improve generalization performance by learning multiple related tasks simultaneously and identifying the shared information among tasks. Most of existing MTL methods focus on learning linear models under the supervised setting. We propose a novel semi-supervised and nonlinear approach for MTL using vector fields. A vector field is a smooth mapping from the manifold to the tangent spaces which can be viewed as a directional derivative of functions on the manifold. We argue that vector fields provide a natural way to exploit the geometric structure of data as well as the shared differential structure of tasks, both of which are crucial for semi-supervised multi-task learning. In this paper, we develop multi-task vector field learning (MTVFL) which learns the predictor functions and the vector fields simultaneously. MTVFL has the following key properties. (1) The vector fields MTVFL learns are close to the gradient fields of the predictor functions. (2) Within each task, the vector field is required to be as parallel as possible which is expected to span a low dimensional subspace. (3) The vector fields from all tasks share a low dimensional subspace. We formalize our idea in a regularization framework and also provide a convex relaxation method to solve the original non-convex problem. The experimental results on synthetic and real data demonstrate the effectiveness of our proposed approach. 1
5 0.54052526 277 nips-2012-Probabilistic Low-Rank Subspace Clustering
Author: S. D. Babacan, Shinichi Nakajima, Minh Do
Abstract: In this paper, we consider the problem of clustering data points into lowdimensional subspaces in the presence of outliers. We pose the problem using a density estimation formulation with an associated generative model. Based on this probability model, we first develop an iterative expectation-maximization (EM) algorithm and then derive its global solution. In addition, we develop two Bayesian methods based on variational Bayesian (VB) approximation, which are capable of automatic dimensionality selection. While the first method is based on an alternating optimization scheme for all unknowns, the second method makes use of recent results in VB matrix factorization leading to fast and effective estimation. Both methods are extended to handle sparse outliers for robustness and can handle missing values. Experimental results suggest that proposed methods are very effective in subspace clustering and identifying outliers. 1
6 0.53365111 125 nips-2012-Factoring nonnegative matrices with linear programs
7 0.5313583 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach
8 0.505045 86 nips-2012-Convex Multi-view Subspace Learning
9 0.50262576 334 nips-2012-Tensor Decomposition for Fast Parsing with Latent-Variable PCFGs
10 0.49376401 22 nips-2012-A latent factor model for highly multi-relational data
11 0.49327549 135 nips-2012-Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach
12 0.48168784 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC
13 0.47047377 44 nips-2012-Approximating Concavely Parameterized Optimization Problems
14 0.45790708 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes
15 0.4512459 206 nips-2012-Majorization for CRFs and Latent Likelihoods
16 0.45065728 134 nips-2012-Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods
18 0.43839401 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction
19 0.43763658 54 nips-2012-Bayesian Probabilistic Co-Subspace Addition
20 0.43748295 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders
topicId topicWeight
[(0, 0.022), (9, 0.01), (20, 0.253), (21, 0.013), (38, 0.182), (42, 0.032), (54, 0.047), (55, 0.013), (61, 0.018), (74, 0.058), (76, 0.127), (80, 0.084), (92, 0.038)]
simIndex simValue paperId paperTitle
1 0.87519264 350 nips-2012-Trajectory-Based Short-Sighted Probabilistic Planning
Author: Felipe Trevizan, Manuela Veloso
Abstract: Probabilistic planning captures the uncertainty of plan execution by probabilistically modeling the effects of actions in the environment, and therefore the probability of reaching different states from a given state and action. In order to compute a solution for a probabilistic planning problem, planners need to manage the uncertainty associated with the different paths from the initial state to a goal state. Several approaches to manage uncertainty were proposed, e.g., consider all paths at once, perform determinization of actions, and sampling. In this paper, we introduce trajectory-based short-sighted Stochastic Shortest Path Problems (SSPs), a novel approach to manage uncertainty for probabilistic planning problems in which states reachable with low probability are substituted by artificial goals that heuristically estimate their cost to reach a goal state. We also extend the theoretical results of Short-Sighted Probabilistic Planner (SSiPP) [1] by proving that SSiPP always finishes and is asymptotically optimal under sufficient conditions on the structure of short-sighted SSPs. We empirically compare SSiPP using trajectorybased short-sighted SSPs with the winners of the previous probabilistic planning competitions and other state-of-the-art planners in the triangle tireworld problems. Trajectory-based SSiPP outperforms all the competitors and is the only planner able to scale up to problem number 60, a problem in which the optimal solution contains approximately 1070 states. 1
same-paper 2 0.80156088 301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion
Author: Thanh Ngo, Yousef Saad
Abstract: This paper describes gradient methods based on a scaled metric on the Grassmann manifold for low-rank matrix completion. The proposed methods significantly improve canonical gradient methods, especially on ill-conditioned matrices, while maintaining established global convergence and exact recovery guarantees. A connection between a form of subspace iteration for matrix completion and the scaled gradient descent procedure is also established. The proposed conjugate gradient method based on the scaled gradient outperforms several existing algorithms for matrix completion and is competitive with recently proposed methods. 1
3 0.73423308 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance
Author: Arnak Dalalyan, Yin Chen
Abstract: In this paper, we develop a novel approach to the problem of learning sparse representations in the context of fused sparsity and unknown noise level. We propose an algorithm, termed Scaled Fused Dantzig Selector (SFDS), that accomplishes the aforementioned learning task by means of a second-order cone program. A special emphasize is put on the particular instance of fused sparsity corresponding to the learning in presence of outliers. We establish finite sample risk bounds and carry out an experimental evaluation on both synthetic and real data. 1
4 0.72850424 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video
Author: Will Zou, Shenghuo Zhu, Kai Yu, Andrew Y. Ng
Abstract: We apply salient feature detection and tracking in videos to simulate fixations and smooth pursuit in human vision. With tracked sequences as input, a hierarchical network of modules learns invariant features using a temporal slowness constraint. The network encodes invariance which are increasingly complex with hierarchy. Although learned from videos, our features are spatial instead of spatial-temporal, and well suited for extracting features from still images. We applied our features to four datasets (COIL-100, Caltech 101, STL-10, PubFig), and observe a consistent improvement of 4% to 5% in classification accuracy. With this approach, we achieve state-of-the-art recognition accuracy 61% on STL-10 dataset. 1
5 0.70106035 255 nips-2012-On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes
Author: Bruno Scherrer, Boris Lesner
Abstract: We consider infinite-horizon stationary γ-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy. Using Value and Policy Iteration with some error at each iteration, it is well-known that one can compute stationary policies that are 2γ/(1−γ)^2-optimal. After arguing that this guarantee is tight, we develop variations of Value and Policy Iteration for computing non-stationary policies that can be up to 2γ/(1−γ)-optimal, which constitutes a significant improvement in the usual situation when γ is close to 1. Surprisingly, this shows that the problem of “computing near-optimal non-stationary policies” is much simpler than that of “computing near-optimal stationary policies”. 1
6 0.69907898 162 nips-2012-Inverse Reinforcement Learning through Structured Classification
7 0.69603401 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback
8 0.69530004 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration
9 0.69454437 199 nips-2012-Link Prediction in Graphs with Autoregressive Features
10 0.69451845 38 nips-2012-Algorithms for Learning Markov Field Policies
11 0.69377458 187 nips-2012-Learning curves for multi-task Gaussian process regression
12 0.69245684 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification
13 0.69245362 230 nips-2012-Multiple Choice Learning: Learning to Produce Multiple Structured Outputs
14 0.69191438 358 nips-2012-Value Pursuit Iteration
15 0.69154155 17 nips-2012-A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound
16 0.691248 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction
17 0.69100034 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes
18 0.69032753 160 nips-2012-Imitation Learning by Coaching
19 0.68989819 227 nips-2012-Multiclass Learning with Simplex Coding
20 0.68961167 241 nips-2012-No-Regret Algorithms for Unconstrained Online Convex Optimization