nips nips2012 nips2012-301 knowledge-graph by maker-knowledge-mining

301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion


Source: pdf

Author: Thanh Ngo, Yousef Saad

Abstract: This paper describes gradient methods based on a scaled metric on the Grassmann manifold for low-rank matrix completion. The proposed methods significantly improve canonical gradient methods, especially on ill-conditioned matrices, while maintaining established global convergence and exact recovery guarantees. A connection between a form of subspace iteration for matrix completion and the scaled gradient descent procedure is also established. The proposed conjugate gradient method based on the scaled gradient outperforms several existing algorithms for matrix completion and is competitive with recently proposed methods. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 This paper describes gradient methods based on a scaled metric on the Grassmann manifold for low-rank matrix completion. [sent-6, score-0.551]

2 The proposed methods significantly improve canonical gradient methods, especially on ill-conditioned matrices, while maintaining established global convergence and exact recovery guarantees. [sent-7, score-0.21]

3 A connection between a form of subspace iteration for matrix completion and the scaled gradient descent procedure is also established. [sent-8, score-0.714]

4 The proposed conjugate gradient method based on the scaled gradient outperforms several existing algorithms for matrix completion and is competitive with recently proposed methods. [sent-9, score-0.619]

5 The matrix completion problem is to reconstruct A given a subset of entries of A. [sent-11, score-0.273]

6 If the rank is unknown and there is no noise, the problem can be formulated as: Minimize rank(X) subject to PΩ(X) = PΩ(A). (1) [sent-20, score-0.212]

7 Rank minimization is NP-hard, and so work has been done to solve a convex relaxation of it by approximating the rank by the nuclear norm. [sent-21, score-0.106]

8 Under some conditions, the solution of the relaxed problem can be shown to be the exact solution of the rank minimization problem with overwhelming probability [8, 18]. [sent-22, score-0.195]

9 Usually, algorithms to minimize the nuclear norm iteratively use the Singular Value Decomposition (SVD), specifically the singular value thresholding operator [7, 15, 17], which makes them expensive. [sent-23, score-0.205]
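
As a side illustration (a minimal sketch, not code from any of the cited papers), the singular value thresholding operator referred to here shrinks the singular values of its argument by a threshold; the function name and the parameter tau are choices made for this sketch only.

import numpy as np

def singular_value_thresholding(X, tau):
    # Soft-threshold the singular values of X; each application needs an SVD,
    # which is what makes nuclear-norm solvers built on this operator expensive.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt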

10 If the rank is known, we can formulate the matrix completion problem as follows: Find matrix X to minimize ||PΩ (X) − PΩ (A)||F subject to rank (X) = r. [sent-24, score-0.505]
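
A minimal sketch of how the fixed-rank objective ||PΩ(X) − PΩ(A)||F with X = U S V^T can be evaluated from the observed entries alone; the arrays rows, cols, vals encoding Ω are hypothetical names introduced for this sketch, not the paper's notation.

import numpy as np

def masked_residual(U, S, V, rows, cols, vals):
    # Entries of U S V^T at the observed positions (rows[k], cols[k]) only;
    # no m-by-n dense matrix is ever formed.
    pred = np.einsum('ij,ij->i', U[rows] @ S, V[cols])
    return pred - vals                      # P_Omega(U S V^T - A) as a vector over Omega

def masked_objective(U, S, V, rows, cols, vals):
    r = masked_residual(U, S, V, rows, cols, vals)
    return 0.5 * float(r @ r)               # 0.5 * ||P_Omega(U S V^T) - P_Omega(A)||_F^2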

11 A number of algorithms based on non-convex formulation use the framework of optimization on matrix manifolds [14, 22, 6]. [sent-27, score-0.141]

12 The authors of [14] propose a steepest descent procedure on the product of Grassmann manifolds of r-dimensional subspaces. [sent-29, score-0.174]

13 Vandereycken [22] discusses a conjugate gradient algorithm on the Riemannian manifold of rank-r matrices. [sent-30, score-0.325]

14 The authors of [23] perform a low-rank matrix factorization based on a successive over-relaxation iteration. [sent-33, score-0.208]

15 Also, Srebro and Jaakkola [21] discuss SVD-EM, one of the early fixed-rank methods using truncated singular value decomposition iteratively. [sent-34, score-0.205]

16 In fact, even when the rank is unknown, the sparse matrix which consists of observed entries can give us a very good approximation of the rank based on its singular spectrum [14]. [sent-39, score-0.588]

17 Moreover, the singular spectrum is revealed during the iterations, so many fixed rank methods can also be adapted to find the rank of the matrix. [sent-41, score-0.439]

18 Our contribution: OptSpace [14] is an efficient algorithm for low-rank matrix completion with global convergence and exact recovery guarantees. [sent-43, score-0.322]

19 We propose using a non-canonical metric on the Grassmann manifold to improve OptSpace while maintaining its appealing properties. [sent-44, score-0.222]

20 The non-canonical metric introduces a scaling factor to the gradient of the objective function which can be interpreted as an adaptive preconditioner for the matrix completion problem. [sent-45, score-0.374]

21 The gradient descent procedure using the scaled gradient is related to a form of subspace iteration for matrix completion. [sent-46, score-0.635]

22 Each iteration of the subspace iteration is inexpensive and the procedure converges very rapidly. [sent-47, score-0.199]

23 Section 2 (Subspace iteration for incomplete matrices): We begin with a form of subspace iteration for matrix completion, depicted in Algorithm 1. [sent-52, score-0.498]

24 Output: Left and right dominant subspaces U and V and associated singular values. [sent-55, score-0.307]

25 If the matrix A is fully observed, U and V can be randomly initialized, line 3 is not needed, and in lines 4 and 6 we use A instead of Xi+1 to update the subspaces. [sent-60, score-0.153]

26 In this case, we have the classical two-sided subspace iteration for the singular value decomposition. [sent-61, score-0.342]

27 Lines 6-9 correspond to a Rayleigh-Ritz projection to obtain current approximations of singular vectors and singular values. [sent-62, score-0.41]

28 It is known that if the initial columns of U and V are not orthogonal to any of the first r left and right singular vectors of A respectively, the algorithm converges to the dominant subspaces of A [20, Theorem 5. [sent-63, score-0.307]

29 Returning to the case when the matrix A is not fully observed, the basic idea of Algorithm 1 is to use an approximation of A in each iteration to update the subspaces U and V; from the new U and V, we can then obtain a better approximation of A for the next iteration. [sent-65, score-0.261]

30 This step provides current approximations of the singular values, which could be useful for several purposes, such as regularization or a convergence test. [sent-69, score-0.232]

31 Each iteration of Algorithm 1 can be seen as an approximation of an iteration of SVD-EM where a few matrix multiplications are used to update U and V instead of using a truncated SVD to compute the dominant subspaces of Xi+1 . [sent-71, score-0.369]

32 Computing a truncated SVD, e.g., by a Lanczos-type procedure, requires several, possibly a large number of, matrix multiplications of this type. [sent-74, score-0.114]

33 Then Xi+1 = PΩ̄(Ui Si Vi^T) + AΩ = X̂i + Ei, where X̂i = Ui Si Vi^T and Ei = PΩ(A − X̂i) is a sparse matrix of errors at the known entries, which can be computed efficiently by exploiting the structure of X̂i. [sent-78, score-0.126]

34 Assume that each Si is not singular (the non-singularity of Si will be discussed in Section 4). [sent-79, score-0.205]

35 Then if we post-multiply the update of U in line 4 by Si^{-1}, the subspace remains the same, and the update becomes: Ui+1 = Xi+1 Vi Si^{-1} = (X̂i + Ei) Vi Si^{-1} = Ui + Ei Vi Si^{-1}. (3) The update of V can also be efficiently implemented. [sent-80, score-0.184]

36 We observe that the convergence speed remains roughly the same (when A is fully observed, the algorithm is a slower version of subspace iteration where the convergence rate is halved). [sent-82, score-0.22]

37 With this change, we can derive an update to V that is similar to (3): Vi+1 = Vi + Ei^T Ui Si^{-T}. (4) We will point out in Section 3 that the updating terms Ei Vi Si^{-1} and Ei^T Ui Si^{-T} are related to the gradients of a matrix completion objective function on the Grassmann manifold. [sent-83, score-0.303]

38 As a result, to improve the convergence speed, we can add an adaptive step size ti to the process, as follows: Ui+1 = Ui + ti Ei Vi Si^{-1} and Vi+1 = Vi + ti Ei^T Ui Si^{-T}. [sent-84, score-0.372]

39 This is equivalent to using X̂i + ti Ei as the estimate of A in each iteration. [sent-85, score-0.115]

40 Si+1 = (Ui+1^T Ui) Si (Vi^T Vi+1) + ti Ui+1^T Ei Vi+1. (5) There are also other ways to obtain Si+1, once Ui+1 and Vi+1 are determined, to improve the current approximation of A. [sent-91, score-0.115]

41 Output: Left and right dominant subspaces U and V and associated singular values. [sent-95, score-0.307]

42 3: Compute Ei and an appropriate step size ti. 4: Ui+1 = Ui + ti Ei Vi Si^{-1} and Vi+1 = Vi + ti Ei^T Ui Si^{-T}. 5: Orthonormalize Ui+1 and Vi+1. 6: Find Si+1 such that PΩ(Ui+1 Si+1 Vi+1^T) is close to AΩ (e.g., according to (6) or (5)). [sent-100, score-0.345]
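
A NumPy sketch of the scaled subspace iteration just listed (updates (3)-(5) followed by orthonormalization); the fixed step size, the random initialization, and the COO/CSR representation of Ω are assumptions of this sketch rather than details taken from the paper.

import numpy as np
from scipy.sparse import coo_matrix

def scaled_subspace_iteration(rows, cols, vals, m, n, r, t=1.0, iters=200, seed=0):
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, r)))
    V, _ = np.linalg.qr(rng.standard_normal((n, r)))
    S = np.eye(r)
    for _ in range(iters):
        # E_i = P_Omega(A - U_i S_i V_i^T), a sparse matrix supported on Omega
        pred = np.einsum('ij,ij->i', U[rows] @ S, V[cols])
        E = coo_matrix((vals - pred, (rows, cols)), shape=(m, n)).tocsr()
        S_inv = np.linalg.inv(S)
        U_new = U + t * (E @ V) @ S_inv        # update (3), scaled by S^{-1}
        V_new = V + t * (E.T @ U) @ S_inv.T    # update (4), scaled by S^{-T}
        U_new, _ = np.linalg.qr(U_new)         # line 5: orthonormalize
        V_new, _ = np.linalg.qr(V_new)
        # line 6, approximate choice (5) for S_{i+1}
        S = (U_new.T @ U) @ S @ (V.T @ V_new) + t * (U_new.T @ (E @ V_new))
        U, V = U_new, V_new
    return U, S, V

With t = 1 this reduces to the plain subspace-iteration view; an adaptive step size ti, as the text suggests, is generally preferable.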

43 The updates (3) and (4) are reminiscent of the gradient descent steps for minimizing the matrix completion error on the Grassmann manifold introduced in [14], and the next section discusses the connection to optimization on the Grassmann manifold. [sent-107, score-0.596]

44 Section 3 (Optimization on the Grassmann manifold): In this section, we show that, using a non-canonical Riemannian metric on the Grassmann manifold, the gradient of the same objective function as in [14] is of a form similar to (3) and (4). [sent-108, score-0.315]

45 Based on this, improvements to the gradient descent algorithms can be made and exact recovery results similar to those of [14] can be maintained. [sent-109, score-0.244]

46 Gradients on the Grassmann manifold for the matrix completion problem: Let G(m, r) be the Grassmann manifold in which each point corresponds to a subspace of dimension r in Rm. [sent-112, score-0.659]

47 One of the results of [14] is that, under a few assumptions (to be addressed in Section 4), one can obtain with high probability the exact matrix A by minimizing a regularized version of the function F : G(m, r) × G(n, r) → R defined below. [sent-113, score-0.109]

48 Here, we abuse the notation by denoting by U and V both orthonormal matrices as well as the points on the Grassmann manifold which they span. [sent-115, score-0.277]

49 Note that F only depends on the subspaces spanned by matrices U and V . [sent-116, score-0.114]

50 If G(m, r) is endowed with the canonical inner product ⟨W, W′⟩ = Tr(W^T W′), where W and W′ are tangent vectors of G(m, r) at U (i.e. [sent-118, score-0.119]

51 W, W′ ∈ Rm×r such that W^T U = 0 and W′^T U = 0) and similarly for G(n, r), the gradients of F(U, V) on the product manifold are: gradFU(U, V) = (I − U U^T) PΩ(U S V^T − A) V S^T (8) and gradFV(U, V) = (I − V V^T) PΩ(U S V^T − A)^T U S. (9) [sent-120, score-0.236]

52 In the above formulas, (I − U U^T) and (I − V V^T) are the projections of the derivatives PΩ(U S V^T − A) V S^T and PΩ(U S V^T − A)^T U S onto the tangent space of the manifold at (U, V). [sent-121, score-0.237]

53 It is therefore compelling to use a scaled metric on the Grassmann manifold. [sent-128, score-0.203]

54 We will derive the partial gradients of F on the Grassmann manifold endowed with this scaled inner product. [sent-130, score-0.421]

55 According to [11], gradFU is the tangent vector of G(m, r) at U such that Tr(FU^T W) = Tr((gradFU)^T W D) (10) for all tangent vectors W at U, where FU is the partial derivative of F with respect to U. [sent-131, score-0.11]

56 The solution of (10) with the constraints that W^T U = 0 and (gradFU)^T U = 0 gives us the gradient based on the scaled metric, which we will denote by grads FU and grads FV. [sent-133, score-1.068]

57 grads FU(U, V) = (I − U U^T) FU D^{-1} = (I − U U^T) PΩ(U S V^T − A) V S D^{-1}, (11) and grads FV(U, V) is given by the analogous expression with the roles of U and V exchanged. (12) [sent-134, score-0.812]

58 Notice the additional scaling D appearing in these scaled gradients. [sent-136, score-0.184]
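
A short sketch of where the D^{-1} comes from, under the assumption (made only for this sketch) that the scaled inner product is ⟨W, W′⟩_D = Tr(W^T W′ D) with a symmetric scaling D (the text uses D = SS^T on the U side when S is not diagonalized):

\begin{align*}
\operatorname{Tr}(F_U^{T}W)
  &= \langle \operatorname{grad}_s F_U,\ W\rangle_D
   = \operatorname{Tr}\!\big((\operatorname{grad}_s F_U\, D)^{T} W\big)
   \quad \text{for all tangent } W \ (W^{T}U = 0) \\
  &\Longrightarrow\ (I - UU^{T})\big(F_U - \operatorname{grad}_s F_U\, D\big) = 0 \\
  &\Longrightarrow\ \operatorname{grad}_s F_U = (I - UU^{T})\, F_U\, D^{-1},
\end{align*}

using that grads FU is itself tangent, so (I − U U^T) grads FU = grads FU; this matches (11), and grads FV follows in the same way with the V-side scaling.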

59 If S is not diagonalized, we use SS^T and S^T S to derive grads FU and grads FV respectively, and the scalings appear exactly as in (3) and (4). [sent-138, score-0.812]

60 grads FU(U, V) = (I − U U^T) PΩ(U S V^T − A) V S^{-1} (13) and grads FV(U, V) = (I − V V^T) PΩ(U S V^T − A)^T U S^{-T}. (14) This scaling can be interpreted as an adaptive preconditioning step similar to those that are popular in the scientific computing literature [4]. [sent-139, score-0.833]
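
A small NumPy sketch of (13)-(14), evaluating the residual only on the observed set; as in the earlier sketches, rows, cols, vals are hypothetical names for the sample Ω.

import numpy as np
from scipy.sparse import coo_matrix

def scaled_gradients(U, S, V, rows, cols, vals):
    m, n = U.shape[0], V.shape[0]
    pred = np.einsum('ij,ij->i', U[rows] @ S, V[cols])                 # (U S V^T) on Omega
    R = coo_matrix((pred - vals, (rows, cols)), shape=(m, n)).tocsr()  # P_Omega(U S V^T - A)
    S_inv = np.linalg.inv(S)
    GU = (R @ V) @ S_inv              # ... V S^{-1}    as in (13)
    GV = (R.T @ U) @ S_inv.T          # ...^T U S^{-T}  as in (14)
    GU -= U @ (U.T @ GU)              # apply (I - U U^T): project onto the tangent space
    GV -= V @ (V.T @ GV)
    return GU, GV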

61 As will be shown in our experiments, this scaled gradient direction outperforms canonical gradient directions especially for ill-conditioned matrices. [sent-140, score-0.413]

62 The optimization framework on matrix manifolds allows one to define several elements of the manifold in a flexible way. [sent-141, score-0.323]

63 Here, we use the scaled-metric to obtain a good descent direction, while other operations on the manifold can be based on the canonical metric which has simple and efficient computational forms. [sent-142, score-0.34]

64 We use R(U ) = span (U ) as the retraction on the Grassmann manifold where span (U ) is represented by qf(U ), which is the Q factor in the QR factorization of U . [sent-146, score-0.288]

65 Algorithm 3 is an outline of our gradient descent method for matrix completion. [sent-151, score-0.242]

66 We let grads FU^(i) ≡ grads FU(Ui, Vi) and grads FV^(i) ≡ grads FV(Ui, Vi). [sent-152, score-1.624]

67 The subspace iteration and LMaFit can be seen as relaxed versions of this gradient descent procedure. [sent-157, score-0.359]

68 The next section goes further and describes the conjugate gradient iteration. [sent-158, score-0.143]

69 3: Compute grads FU^(i) and grads FV^(i) according to (13) and (14). [sent-166, score-0.812]

70 4: Find an appropriate step size ti and compute (Ui+1, Vi+1) = (qf(Ui − ti grads FU^(i)), qf(Vi − ti grads FV^(i))). 5: Compute Si+1 according to (6) (exact) or (5) (approximate). [sent-167, score-1.157]
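
A hedged sketch of one pass through lines 3-5 of Algorithm 3, reusing the scaled_gradients helper sketched earlier; qf(·) is the Q factor of a thin QR factorization, and the fixed step size t is illustrative only.

import numpy as np

def qf(M):
    # Retraction: represent span(M) by the Q factor of its thin QR factorization
    Q, _ = np.linalg.qr(M)
    return Q

def gradient_descent_step(U, S, V, rows, cols, vals, t):
    GU, GV = scaled_gradients(U, S, V, rows, cols, vals)   # line 3: scaled gradients (13)-(14)
    U_new = qf(U - t * GU)                                 # line 4: step along -grads and retract
    V_new = qf(V - t * GV)
    # line 5: S_{i+1} would then be computed exactly, via (6), or approximately, via (5)
    return U_new, V_new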

71 Conjugate gradient method on the Grassmann manifold: In this section, we describe the conjugate gradient (CG) method on the Grassmann manifold based on the scaled gradients to solve the matrix completion problem. [sent-169, score-1.037]

72 The main additional ingredient we need is vector transport which is used to transport the old search direction to the current point on the manifold. [sent-170, score-0.116]

73 The transported search direction is then combined with the scaled gradient at the current point, e.g., via the Polak-Ribière rule used in Algorithm 4. [sent-171, score-0.313]

74 We will use the canonical metric to derive the vector transport when considering the natural quotient manifold structure of the Grassmann manifold. [sent-177, score-0.311]

75 Algorithm 4 is a sketch of the resulting conjugate gradient procedure. [sent-179, score-0.143]

76 2: Compute (η0, ξ0) = (grads FU^(0), grads FV^(0)). [sent-184, score-0.406]

77 4: Compute a step size ti and compute (Ui+1, Vi+1) = (qf(Ui + ti ηi), qf(Vi + ti ξi)). 5: Compute βi+1 (Polak-Ribière) and set (ηi+1, ξi+1) = (−grads FU^(i) + βi+1 TUi+1(ηi), −grads FV^(i) + βi+1 TVi+1(ξi)). 6: Compute Si+1 according to (6) or (5). [sent-188, score-0.345]
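
A sketch of the search-direction update in line 5 of Algorithm 4. The vector transport T is taken here to be projection onto the tangent space at the new point, a standard canonical-metric choice on the Grassmann manifold; whether it coincides with the paper's exact transport is an assumption of this sketch.

import numpy as np

def transport(U_new, H):
    # Transport a tangent vector by projecting it onto the tangent space at U_new
    return H - U_new @ (U_new.T @ H)

def cg_direction(U_new, V_new, GU_new, GV_new, GU_old, GV_old, eta_old, xi_old):
    eta_t, xi_t = transport(U_new, eta_old), transport(V_new, xi_old)   # T_{U_{i+1}}(eta_i), T_{V_{i+1}}(xi_i)
    gU_t, gV_t = transport(U_new, GU_old), transport(V_new, GV_old)
    # Polak-Ribiere coefficient computed on the stacked (U, V) gradient
    num = np.sum(GU_new * (GU_new - gU_t)) + np.sum(GV_new * (GV_new - gV_t))
    den = np.sum(GU_old * GU_old) + np.sum(GV_old * GV_old)
    beta = max(num / den, 0.0)          # PR+ safeguard, an assumption of this sketch
    eta_new = -GU_new + beta * eta_t    # new search directions, as in line 5
    xi_new = -GV_new + beta * xi_t
    return eta_new, xi_new, beta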

78 Section 4 (Convergence and exact recovery of scaled-gradient descent methods): Let A = U∗ Σ∗ V∗^T be the singular value decomposition of A, where U∗ ∈ Rm×r, V∗ ∈ Rn×r and Σ∗ ∈ Rr×r. [sent-189, score-0.356]

79 Assume that A is incoherent [14]; A has bounded entries and the minimum singular value of A is bounded away from 0. [sent-192, score-0.295]

80 From these observations, given an initial point that is sufficiently close to z∗ , a gradient descent procedure on F (with an additional regularization term to keep the intermediate points incoherent) converges to z∗ and exact recovery is obtained. [sent-197, score-0.244]

81 The singular value decomposition of a trimmed version of the observed matrix AΩ can give us the initial point that ensures convergence. [sent-198, score-0.278]

82 As shown in [14], the extreme singular values of any intermediate S are bounded by the extreme singular values σ*min and σ*max of Σ∗: σmax ≤ 2σ*max and σmin ≥ σ*min/2. [sent-207, score-0.41]

83 The scaled gradient is a descent direction of F̃ as a direct result of the fact that it is indeed the gradient of F̃ based on a non-canonical metric. [sent-209, score-0.191]

84 Here, ∥·∥ and d(·, ·) are the canonical norm and distance on the Grassmann manifold, respectively. [sent-214, score-0.224]

85 Based on this, a similar lower bound of grads F̃ can be derived. [sent-215, score-0.406]

86 Therefore, the scaled gradients only vanish at z∗, which means the scaled-gradient descent procedure must converge to z∗, which is the exact solution [3]. [sent-218, score-0.351]

87 Section 5 (Experiments and results): The proposed algorithms were implemented in Matlab with some mex-routines to perform matrix multiplications with sparse masks. [sent-219, score-0.114]

88 First, we illustrate the improvement of scaled gradients over canonical gradients for steepest descent and conjugate gradient methods on 5000 × 5000 matrices with rank 5 (Figure 1). [sent-222, score-0.714]
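
A sketch of how such synthetic problems can be set up (random orthonormal factors, optionally with chosen singular values to control conditioning, plus a uniformly random mask); the sampling rate and seed are illustrative assumptions, not the paper's exact protocol.

import numpy as np

def make_test_problem(m, n, r, sigma=None, sample_rate=0.1, seed=0):
    rng = np.random.default_rng(seed)
    U0, _ = np.linalg.qr(rng.standard_normal((m, r)))
    V0, _ = np.linalg.qr(rng.standard_normal((n, r)))
    if sigma is None:
        sigma = np.ones(r)                   # comparable singular values (well-conditioned case)
    A = (U0 * sigma) @ V0.T                  # A = U0 diag(sigma) V0^T, rank r (dense, for illustration)
    k = int(sample_rate * m * n)             # number of observed entries
    idx = rng.choice(m * n, size=k, replace=False)
    rows, cols = np.unravel_index(idx, (m, n))
    return rows, cols, A[rows, cols], A

# Ill-conditioned example: singular values spread over three orders of magnitude
# rows, cols, vals, A = make_test_problem(5000, 5000, 5, sigma=np.logspace(0, 3, 5))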

89 The time needed for each iteration is roughly the same for all methods so we only present the results in terms of iteration counts. [sent-225, score-0.124]

90 We can see that there are some small improvements for the fully random case (Figure 1a) since the singular values are roughly the same. [sent-226, score-0.234]

91 Now, we compare the relaxed version of the scaled conjugate gradient which uses (5) to compute Si (ScGrass-CG) to LMaFit [23], Riemann-CG [22], RTRMC2 [6] (trust region method with second order information), SVP [12] and GROUSE [5] (Figure 2). [sent-231, score-0.359]

92 The subspace iteration method and the relaxed version of scaled steepest descent converge similarly to LMaFit, so we omit them in the graph. [sent-233, score-0.459]

93 When the condition number of the matrix is higher, ScGrass-CG converges fastest both in terms of iteration counts and execution time. [sent-239, score-0.135]

94 ScGrass-CG is the relaxed scaled CG method and ScGrassCG-Reg is the exact scaled CG method using a spectral-regularization version of F. (Figure panel: 10000x10000, rank 10.) [sent-247, score-0.415]

95 Upper row is fully random, lower row is random with chosen singular values. [sent-252, score-0.234]

96 Section 6 (Conclusion and future work): The gradients obtained from a scaled metric on the Grassmann manifold can result in improved convergence of gradient methods on matrix manifolds for matrix completion, while maintaining good global convergence and exact recovery guarantees. [sent-281, score-1.022]

97 We have established a connection between scaled gradient methods and a subspace iteration method for matrix completion. [sent-282, score-0.491]

98 The relaxed versions of the proposed gradient methods, adapted from the subspace iteration, are faster than previously discussed algorithms, sometimes much faster depending on the conditioning of the data matrix. [sent-283, score-0.221]

99 Fixed point and Bregman iterative methods for matrix rank minimization. [sent-393, score-0.179]

100 Solving a low-rank factorization model for matrix completion using a non-linear successive over-relaxation algorithm. [sent-441, score-0.249]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('grads', 0.406), ('grassmann', 0.358), ('ui', 0.262), ('singular', 0.205), ('si', 0.186), ('manifold', 0.182), ('fv', 0.172), ('fu', 0.166), ('vi', 0.164), ('scaled', 0.163), ('grouse', 0.159), ('lmafit', 0.159), ('completion', 0.147), ('cg', 0.14), ('sv', 0.133), ('ei', 0.126), ('grass', 0.125), ('svp', 0.124), ('qf', 0.123), ('ti', 0.115), ('riemann', 0.109), ('gradfu', 0.106), ('rank', 0.106), ('gradient', 0.093), ('descent', 0.076), ('subspace', 0.075), ('matrix', 0.073), ('canon', 0.071), ('gradfv', 0.071), ('jester', 0.071), ('optspace', 0.071), ('randn', 0.071), ('vit', 0.071), ('subspaces', 0.068), ('manifolds', 0.068), ('rmse', 0.067), ('iteration', 0.062), ('tangent', 0.055), ('gradients', 0.054), ('keshavan', 0.054), ('entries', 0.053), ('cinc', 0.053), ('gradf', 0.053), ('scgrass', 0.053), ('relaxed', 0.053), ('steep', 0.051), ('conjugate', 0.05), ('orthonormal', 0.049), ('transport', 0.047), ('realizes', 0.047), ('matrices', 0.046), ('svd', 0.045), ('movielens', 0.043), ('canonical', 0.042), ('multiplications', 0.041), ('metric', 0.04), ('recovery', 0.039), ('incoherent', 0.037), ('exact', 0.036), ('boumal', 0.035), ('nmaes', 0.035), ('randomizations', 0.035), ('rassmann', 0.035), ('retraction', 0.035), ('saad', 0.035), ('teration', 0.035), ('transported', 0.035), ('ubspace', 0.035), ('dominant', 0.034), ('incomplete', 0.033), ('rm', 0.031), ('absil', 0.031), ('steepest', 0.03), ('factorization', 0.029), ('srebro', 0.029), ('fully', 0.029), ('update', 0.029), ('find', 0.029), ('matlab', 0.027), ('xi', 0.027), ('dai', 0.027), ('ru', 0.027), ('convergence', 0.027), ('count', 0.026), ('connection', 0.025), ('qr', 0.025), ('trust', 0.024), ('observed', 0.023), ('readers', 0.023), ('rv', 0.023), ('line', 0.022), ('tu', 0.022), ('spectrum', 0.022), ('direction', 0.022), ('endowed', 0.022), ('vanish', 0.022), ('riemannian', 0.021), ('ss', 0.021), ('scaling', 0.021), ('span', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion

Author: Thanh Ngo, Yousef Saad

Abstract: This paper describes gradient methods based on a scaled metric on the Grassmann manifold for low-rank matrix completion. The proposed methods significantly improve canonical gradient methods, especially on ill-conditioned matrices, while maintaining established global convergence and exact recovery guarantees. A connection between a form of subspace iteration for matrix completion and the scaled gradient descent procedure is also established. The proposed conjugate gradient method based on the scaled gradient outperforms several existing algorithms for matrix completion and is competitive with recently proposed methods. 1

2 0.13420397 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

Author: S. D. Babacan, Shinichi Nakajima, Minh Do

Abstract: In this paper, we consider the problem of clustering data points into low-dimensional subspaces in the presence of outliers. We pose the problem using a density estimation formulation with an associated generative model. Based on this probability model, we first develop an iterative expectation-maximization (EM) algorithm and then derive its global solution. In addition, we develop two Bayesian methods based on variational Bayesian (VB) approximation, which are capable of automatic dimensionality selection. While the first method is based on an alternating optimization scheme for all unknowns, the second method makes use of recent results in VB matrix factorization leading to fast and effective estimation. Both methods are extended to handle sparse outliers for robustness and can handle missing values. Experimental results suggest that proposed methods are very effective in subspace clustering and identifying outliers. 1

3 0.12523238 134 nips-2012-Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

Author: Andre Wibisono, Martin J. Wainwright, Michael I. Jordan, John C. Duchi

Abstract: We consider derivative-free algorithms for stochastic optimization problems that use only noisy function values rather than gradients, analyzing their finite-sample convergence rates. We show that if pairs of function values are available, algorithms that use gradient estimates based on random perturbations suffer a factor of at most √d in convergence rate over traditional stochastic gradient methods, where d is the problem dimension. We complement our algorithmic development with information-theoretic lower bounds on the minimax convergence rate of such problems, which show that our bounds are sharp with respect to all problem-dependent quantities: they cannot be improved by more than constant factors. 1

4 0.10486764 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC

Author: Ben Calderhead, Mátyás A. Sustik

Abstract: One of the enduring challenges in Markov chain Monte Carlo methodology is the development of proposal mechanisms to make moves distant from the current point, that are accepted with high probability and at low computational cost. The recent introduction of locally adaptive MCMC methods based on the natural underlying Riemannian geometry of such models goes some way to alleviating these problems for certain classes of models for which the metric tensor is analytically tractable, however computational efficiency is not assured due to the necessity of potentially high-dimensional matrix operations at each iteration. In this paper we firstly investigate a sampling-based approach for approximating the metric tensor and suggest a valid MCMC algorithm that extends the applicability of Riemannian Manifold MCMC methods to statistical models that do not admit an analytically computable metric tensor. Secondly, we show how the approximation scheme we consider naturally motivates the use of ℓ1 regularisation to improve estimates and obtain a sparse approximate inverse of the metric, which enables stable and sparse approximations of the local geometry to be made. We demonstrate the application of this algorithm for inferring the parameters of a realistic system of ordinary differential equations using a biologically motivated robust Student-t error model, for which the Expected Fisher Information is analytically intractable. 1

5 0.09965767 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

Author: Xinhua Zhang, Dale Schuurmans, Yao-liang Yu

Abstract: Sparse learning models typically combine a smooth loss with a nonsmooth penalty, such as trace norm. Although recent developments in sparse approximation have offered promising solution methods, current approaches either apply only to matrix-norm constrained problems or provide suboptimal convergence rates. In this paper, we propose a boosting method for regularized learning that guarantees ε accuracy within O(1/ε) iterations. Performance is further accelerated by interlacing boosting with fixed-rank local optimization—exploiting a simpler local objective than previous work. The proposed method yields state-of-the-art performance on large-scale problems. We also demonstrate an application to latent multiview learning for which we provide the first efficient weak-oracle. 1

6 0.096103735 206 nips-2012-Majorization for CRFs and Latent Likelihoods

7 0.092118941 225 nips-2012-Multi-task Vector Field Learning

8 0.088157415 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

9 0.083764374 64 nips-2012-Calibrated Elastic Regularization in Matrix Completion

10 0.081843786 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

11 0.078741945 9 nips-2012-A Geometric take on Metric Learning

12 0.076622061 320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion

13 0.075199343 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

14 0.074058309 272 nips-2012-Practical Bayesian Optimization of Machine Learning Algorithms

15 0.072273165 27 nips-2012-A quasi-Newton proximal splitting method

16 0.071248248 247 nips-2012-Nonparametric Reduced Rank Regression

17 0.07107114 179 nips-2012-Learning Manifolds with K-Means and K-Flats

18 0.069503717 204 nips-2012-MAP Inference in Chains using Column Generation

19 0.067177385 240 nips-2012-Newton-Like Methods for Sparse Inverse Covariance Estimation

20 0.067008071 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.162), (1, 0.024), (2, 0.063), (3, -0.083), (4, 0.015), (5, 0.05), (6, -0.023), (7, 0.01), (8, 0.052), (9, -0.01), (10, 0.01), (11, -0.058), (12, -0.072), (13, -0.025), (14, 0.009), (15, 0.148), (16, -0.0), (17, -0.063), (18, 0.178), (19, -0.011), (20, 0.017), (21, -0.023), (22, -0.01), (23, -0.029), (24, -0.12), (25, 0.03), (26, 0.03), (27, -0.06), (28, -0.028), (29, 0.019), (30, 0.017), (31, 0.069), (32, -0.051), (33, 0.16), (34, -0.08), (35, -0.034), (36, 0.004), (37, 0.019), (38, 0.085), (39, 0.088), (40, 0.107), (41, -0.149), (42, 0.048), (43, 0.052), (44, 0.049), (45, -0.043), (46, -0.048), (47, 0.022), (48, -0.071), (49, 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94432485 301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion

Author: Thanh Ngo, Yousef Saad

Abstract: This paper describes gradient methods based on a scaled metric on the Grassmann manifold for low-rank matrix completion. The proposed methods significantly improve canonical gradient methods, especially on ill-conditioned matrices, while maintaining established global convergence and exact recovery guarantees. A connection between a form of subspace iteration for matrix completion and the scaled gradient descent procedure is also established. The proposed conjugate gradient method based on the scaled gradient outperforms several existing algorithms for matrix completion and is competitive with recently proposed methods. 1

2 0.65002882 17 nips-2012-A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound

Author: Shusen Wang, Zhihua Zhang

Abstract: The CUR matrix decomposition is an important extension of Nyström approximation to a general matrix. It approximates any data matrix in terms of a small number of its columns and rows. In this paper we propose a novel randomized CUR algorithm with an expected relative-error bound. The proposed algorithm has the advantages over the existing relative-error CUR algorithms that it possesses tighter theoretical bound and lower time complexity, and that it can avoid maintaining the whole data matrix in main memory. Finally, experiments on several real-world datasets demonstrate significant improvement over the existing relative-error algorithms. 1

3 0.61918753 320 nips-2012-Spectral Learning of General Weighted Automata via Constrained Matrix Completion

Author: Borja Balle, Mehryar Mohri

Abstract: Many tasks in text and speech processing and computational biology require estimating functions mapping strings to real numbers. A broad class of such functions can be defined by weighted automata. Spectral methods based on the singular value decomposition of a Hankel matrix have been recently proposed for learning a probability distribution represented by a weighted automaton from a training sample drawn according to this same target distribution. In this paper, we show how spectral methods can be extended to the problem of learning a general weighted automaton from a sample generated by an arbitrary distribution. The main obstruction to this approach is that, in general, some entries of the Hankel matrix may be missing. We present a solution to this problem based on solving a constrained matrix completion problem. Combining these two ingredients, matrix completion and spectral method, a whole new family of algorithms for learning general weighted automata is obtained. We present generalization bounds for a particular algorithm in this family. The proofs rely on a joint stability analysis of matrix completion and spectral learning. 1

4 0.61345226 225 nips-2012-Multi-task Vector Field Learning

Author: Binbin Lin, Sen Yang, Chiyuan Zhang, Jieping Ye, Xiaofei He

Abstract: Multi-task learning (MTL) aims to improve generalization performance by learning multiple related tasks simultaneously and identifying the shared information among tasks. Most of existing MTL methods focus on learning linear models under the supervised setting. We propose a novel semi-supervised and nonlinear approach for MTL using vector fields. A vector field is a smooth mapping from the manifold to the tangent spaces which can be viewed as a directional derivative of functions on the manifold. We argue that vector fields provide a natural way to exploit the geometric structure of data as well as the shared differential structure of tasks, both of which are crucial for semi-supervised multi-task learning. In this paper, we develop multi-task vector field learning (MTVFL) which learns the predictor functions and the vector fields simultaneously. MTVFL has the following key properties. (1) The vector fields MTVFL learns are close to the gradient fields of the predictor functions. (2) Within each task, the vector field is required to be as parallel as possible which is expected to span a low dimensional subspace. (3) The vector fields from all tasks share a low dimensional subspace. We formalize our idea in a regularization framework and also provide a convex relaxation method to solve the original non-convex problem. The experimental results on synthetic and real data demonstrate the effectiveness of our proposed approach. 1

5 0.54052526 277 nips-2012-Probabilistic Low-Rank Subspace Clustering

Author: S. D. Babacan, Shinichi Nakajima, Minh Do

Abstract: In this paper, we consider the problem of clustering data points into low-dimensional subspaces in the presence of outliers. We pose the problem using a density estimation formulation with an associated generative model. Based on this probability model, we first develop an iterative expectation-maximization (EM) algorithm and then derive its global solution. In addition, we develop two Bayesian methods based on variational Bayesian (VB) approximation, which are capable of automatic dimensionality selection. While the first method is based on an alternating optimization scheme for all unknowns, the second method makes use of recent results in VB matrix factorization leading to fast and effective estimation. Both methods are extended to handle sparse outliers for robustness and can handle missing values. Experimental results suggest that proposed methods are very effective in subspace clustering and identifying outliers. 1

6 0.53365111 125 nips-2012-Factoring nonnegative matrices with linear programs

7 0.5313583 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

8 0.505045 86 nips-2012-Convex Multi-view Subspace Learning

9 0.50262576 334 nips-2012-Tensor Decomposition for Fast Parsing with Latent-Variable PCFGs

10 0.49376401 22 nips-2012-A latent factor model for highly multi-relational data

11 0.49327549 135 nips-2012-Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach

12 0.48168784 318 nips-2012-Sparse Approximate Manifolds for Differential Geometric MCMC

13 0.47047377 44 nips-2012-Approximating Concavely Parameterized Optimization Problems

14 0.45790708 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

15 0.4512459 206 nips-2012-Majorization for CRFs and Latent Likelihoods

16 0.45065728 134 nips-2012-Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

17 0.45034653 321 nips-2012-Spectral learning of linear dynamics from generalised-linear observations with application to neural population data

18 0.43839401 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

19 0.43763658 54 nips-2012-Bayesian Probabilistic Co-Subspace Addition

20 0.43748295 281 nips-2012-Provable ICA with Unknown Gaussian Noise, with Implications for Gaussian Mixtures and Autoencoders


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.022), (9, 0.01), (20, 0.253), (21, 0.013), (38, 0.182), (42, 0.032), (54, 0.047), (55, 0.013), (61, 0.018), (74, 0.058), (76, 0.127), (80, 0.084), (92, 0.038)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.87519264 350 nips-2012-Trajectory-Based Short-Sighted Probabilistic Planning

Author: Felipe Trevizan, Manuela Veloso

Abstract: Probabilistic planning captures the uncertainty of plan execution by probabilistically modeling the effects of actions in the environment, and therefore the probability of reaching different states from a given state and action. In order to compute a solution for a probabilistic planning problem, planners need to manage the uncertainty associated with the different paths from the initial state to a goal state. Several approaches to manage uncertainty were proposed, e.g., consider all paths at once, perform determinization of actions, and sampling. In this paper, we introduce trajectory-based short-sighted Stochastic Shortest Path Problems (SSPs), a novel approach to manage uncertainty for probabilistic planning problems in which states reachable with low probability are substituted by artificial goals that heuristically estimate their cost to reach a goal state. We also extend the theoretical results of Short-Sighted Probabilistic Planner (SSiPP) [1] by proving that SSiPP always finishes and is asymptotically optimal under sufficient conditions on the structure of short-sighted SSPs. We empirically compare SSiPP using trajectory-based short-sighted SSPs with the winners of the previous probabilistic planning competitions and other state-of-the-art planners in the triangle tireworld problems. Trajectory-based SSiPP outperforms all the competitors and is the only planner able to scale up to problem number 60, a problem in which the optimal solution contains approximately 10^70 states. 1

same-paper 2 0.80156088 301 nips-2012-Scaled Gradients on Grassmann Manifolds for Matrix Completion

Author: Thanh Ngo, Yousef Saad

Abstract: This paper describes gradient methods based on a scaled metric on the Grassmann manifold for low-rank matrix completion. The proposed methods significantly improve canonical gradient methods, especially on ill-conditioned matrices, while maintaining established global convergence and exact recovery guarantees. A connection between a form of subspace iteration for matrix completion and the scaled gradient descent procedure is also established. The proposed conjugate gradient method based on the scaled gradient outperforms several existing algorithms for matrix completion and is competitive with recently proposed methods. 1

3 0.73423308 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

Author: Arnak Dalalyan, Yin Chen

Abstract: In this paper, we develop a novel approach to the problem of learning sparse representations in the context of fused sparsity and unknown noise level. We propose an algorithm, termed Scaled Fused Dantzig Selector (SFDS), that accomplishes the aforementioned learning task by means of a second-order cone program. A special emphasis is put on the particular instance of fused sparsity corresponding to the learning in presence of outliers. We establish finite sample risk bounds and carry out an experimental evaluation on both synthetic and real data. 1

4 0.72850424 90 nips-2012-Deep Learning of Invariant Features via Simulated Fixations in Video

Author: Will Zou, Shenghuo Zhu, Kai Yu, Andrew Y. Ng

Abstract: We apply salient feature detection and tracking in videos to simulate fixations and smooth pursuit in human vision. With tracked sequences as input, a hierarchical network of modules learns invariant features using a temporal slowness constraint. The network encodes invariance which are increasingly complex with hierarchy. Although learned from videos, our features are spatial instead of spatial-temporal, and well suited for extracting features from still images. We applied our features to four datasets (COIL-100, Caltech 101, STL-10, PubFig), and observe a consistent improvement of 4% to 5% in classification accuracy. With this approach, we achieve state-of-the-art recognition accuracy 61% on STL-10 dataset. 1

5 0.70106035 255 nips-2012-On the Use of Non-Stationary Policies for Stationary Infinite-Horizon Markov Decision Processes

Author: Bruno Scherrer, Boris Lesner

Abstract: We consider infinite-horizon stationary γ-discounted Markov Decision Processes, for which it is known that there exists a stationary optimal policy. Using Value and Policy Iteration with some error at each iteration, it is well-known that one can compute stationary policies that are 2γ/(1−γ)²-optimal. After arguing that this guarantee is tight, we develop variations of Value and Policy Iteration for computing non-stationary policies that can be up to 2γ/(1−γ)-optimal, which constitutes a significant improvement in the usual situation when γ is close to 1. Surprisingly, this shows that the problem of “computing near-optimal non-stationary policies” is much simpler than that of “computing near-optimal stationary policies”. 1

6 0.69907898 162 nips-2012-Inverse Reinforcement Learning through Structured Classification

7 0.69603401 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

8 0.69530004 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

9 0.69454437 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

10 0.69451845 38 nips-2012-Algorithms for Learning Markov Field Policies

11 0.69377458 187 nips-2012-Learning curves for multi-task Gaussian process regression

12 0.69245684 15 nips-2012-A Polylog Pivot Steps Simplex Algorithm for Classification

13 0.69245362 230 nips-2012-Multiple Choice Learning: Learning to Produce Multiple Structured Outputs

14 0.69191438 358 nips-2012-Value Pursuit Iteration

15 0.69154155 17 nips-2012-A Scalable CUR Matrix Decomposition Algorithm: Lower Time Complexity and Tighter Bound

16 0.691248 246 nips-2012-Nonparametric Max-Margin Matrix Factorization for Collaborative Prediction

17 0.69100034 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

18 0.69032753 160 nips-2012-Imitation Learning by Coaching

19 0.68989819 227 nips-2012-Multiclass Learning with Simplex Coding

20 0.68961167 241 nips-2012-No-Regret Algorithms for Unconstrained Online Convex Optimization