nips nips2009 nips2009-177 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Raman Arora
Abstract: An algorithm is presented for online learning of rotations. The proposed algorithm involves matrix exponentiated gradient updates and is motivated by the von Neumann divergence. The multiplicative updates are exponentiated skew-symmetric matrices which comprise the Lie algebra of the rotation group. The orthonormality and unit determinant of the matrix parameter are preserved using matrix logarithms and exponentials and the algorithm lends itself to intuitive interpretation in terms of the differential geometry of the manifold associated with the rotation group. A complexity reduction result is presented that exploits the eigenstructure of the matrix updates to simplify matrix exponentiation to a quadratic form. 1
Reference: text
sentIndex sentText sentNum sentScore
1 The proposed algorithm involves matrix exponentiated gradient updates and is motivated by the von Neumann divergence. [sent-4, score-0.576]
2 The multiplicative updates are exponentiated skew-symmetric matrices which comprise the Lie algebra of the rotation group. [sent-5, score-0.865]
3 The orthonormality and unit determinant of the matrix parameter are preserved using matrix logarithms and exponentials and the algorithm lends itself to intuitive interpretation in terms of the differential geometry of the manifold associated with the rotation group. [sent-6, score-0.896]
4 A complexity reduction result is presented that exploits the eigenstructure of the matrix updates to simplify matrix exponentiation to a quadratic form. [sent-7, score-0.405]
5 1 Introduction The problem of learning rotations finds application in many areas of signal processing and machine learning. [sent-8, score-0.205]
6 It is an important problem since many problems can be reduced to that of learning rotations; for instance Euclidean motion in Rn−1 is simply rotation in Rn . [sent-9, score-0.395]
7 A conformal embedding was presented in [1] that extends rotations to a representation for all Euclidean transformations. [sent-10, score-0.233]
8 Furthermore, the rotation group provides a universal representation for all Lie groups. [sent-11, score-0.422]
9 This was established in [2] by showing that any Lie algebra can be expressed as a bivector algebra. [sent-12, score-0.164]
10 Since the Lie algebra describes the structure of the associated Lie group completely, any Lie group can be represented as a rotation group. [sent-13, score-0.663]
11 While the batch version of the problem is well understood, the online learning of rotations from vector instances is challenging since the manifold associated with the rotation group is a curved space and it is not possible to form updates that are linear combinations of rotations [13]. [sent-17, score-1.195]
12 The set of rotations about the origin in n-dimensional Euclidean space forms a compact Lie group, SO(n), under the operation of composition. [sent-18, score-0.205]
13 The manifold associated with the n-dimensional rotation group is the unit sphere Sn−1 in n-dimensional Euclidean space. [sent-19, score-0.514]
14 1 Related Work The online version of learning rotations was posed as an open problem by Smith and Warmuth [13]. [sent-21, score-0.312]
15 In [14], an online algorithm was proposed for learning density matrix parameters and was extended in [15] to the problem of learning subspaces of low rank. [sent-23, score-0.192]
16 However, the extension of these algorithms to learning rotations will require repeated projection and approximation [13]. [sent-24, score-0.205]
17 2 Our Approach This paper presents an online algorithm for learning rotations that utilizes the Bregman matrix divergence with respect to the quantum relative entropy (also known as von Neumann divergence) as a distance measure between two rotation matrices. [sent-28, score-0.983]
18 The resulting algorithm has matrix-exponentiated gradient (MEG) updates [14]. [sent-29, score-0.231]
19 The key ingredients of our approach are (a) von Neumann Divergence between rotation matrices [17], (b) squared error loss function and (c) matrix exponentiated gradient (MEG) updates. [sent-30, score-0.911]
20 Any Lie group is also a smooth manifold and the updates in the proposed algorithm have an intuitive interpretation in terms of the differential topology of the associated manifold. [sent-31, score-0.365]
21 We also utilize various elementary Lie algebra concepts to provide intuitive interpretation of the updates. [sent-32, score-0.236]
22 The development in the paper closely follows that of the matrix exponentiated gradient (MEG) updates in [14] for density matrix parameters. [sent-33, score-0.589]
23 The form of the updates is similar to the steepest descent methods of [16], but the updates are derived for learning rotations from vector instances using an information-theoretic approach. [sent-34, score-0.431]
24 The MEG updates are reduced to a quadratic form in the Lie algebra element corresponding to the gradient of loss function on the rotation group. [sent-35, score-0.875]
25 Section 3 presents mathematical preliminaries in differential geometry and Bregman matrix divergence. [sent-38, score-0.199]
26 The matrix exponentiated gradient updates are developed in Section 4. [sent-39, score-0.487]
27 2 Problem Statement Let xt be a stream of instances of n-dimensional unit vectors. [sent-42, score-0.179]
28 Let R∗ be an unknown n × n rotation matrix that acts on xt to give the rotated vector yt = R∗ xt. [sent-43, score-0.961]
29 The matrix R̂t denotes the estimate of R∗ at instance t and ŷt = R̂t xt represents the prediction for the rotated vector yt. [sent-44, score-0.622]
30 The loss incurred due to error in prediction is Lt(R̂t) = d(ŷt, yt), where d(·, ·) is a distance function. [sent-45, score-0.249]
31 The estimate of the rotation needs to be updated based on the loss incurred at every instance, and the objective is to develop an algorithm for learning R∗ that has bounded regret. [sent-46, score-0.474]
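For concreteness, the data model just described can be sketched in a few lines of NumPy. This is an illustration only (not code from the paper); all names are made up for the sketch, and the squared-error choice of d(·, ·) anticipates Section 4.

```python
# A minimal sketch (not the authors' code; NumPy assumed) of the data model:
# an unknown rotation R_star acts on streaming unit vectors x_t, and the
# learner predicts yhat_t = Rhat_t @ x_t. All names are illustrative.
import numpy as np

def random_rotation(n, rng):
    # QR factorization of a Gaussian matrix gives an orthogonal Q; flip one
    # column if needed so that det(Q) = +1, i.e. Q lies in SO(n).
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]
    return Q

rng = np.random.default_rng(0)
n = 12
R_star = random_rotation(n, rng)      # unknown rotation R* to be learned
R_hat = np.eye(n)                     # current estimate of R*
x_t = rng.standard_normal(n)
x_t /= np.linalg.norm(x_t)            # unit-norm instance x_t
y_t = R_star @ x_t                    # observed rotated vector y_t
y_hat = R_hat @ x_t                   # prediction yhat_t
loss = np.sum((y_hat - y_t) ** 2)     # squared-error loss d(yhat_t, y_t)
```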
32 This is a typical problem formulation in online learning where the objective comprises a loss function and a divergence term. [sent-48, score-0.203]
33 Minimizing the weighted objective therefore yields smooth updates while also minimizing the loss function. [sent-50, score-0.205]
34 In this paper, the updates are smoothed using the von Neumann divergence, which is defined for matrices as ∆F(R, R̂t) = tr(R log R − R log R̂t − R + R̂t), (2) where tr(A) is the trace of the matrix A. [sent-51, score-0.517]
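Equation (2) can be evaluated directly with a matrix logarithm. The sketch below is an illustration only (NumPy/SciPy assumed), with the small imaginary round-off that logm may return discarded.

```python
# A minimal sketch (NumPy/SciPy assumed) of the von Neumann divergence of
# eq. (2), Delta_F(R, Rt) = tr(R log R - R log Rt - R + Rt).
import numpy as np
from scipy.linalg import logm

def von_neumann_divergence(R, R_t):
    log_R = np.real(logm(R))      # logm may carry tiny imaginary round-off
    log_Rt = np.real(logm(R_t))
    return np.trace(R @ log_R - R @ log_Rt - R + R_t)
```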
35 3 Mathematical Preliminaries This section reviews some basic definitions and concepts in linear algebra and differential geometry that are utilized for the development of the updates in the next section. [sent-55, score-0.415]
36 1 Matrix Calculus Given a real-valued matrix function F : Rn×n → R, the gradient of the function with respect to the matrix R ∈ Rn×n is defined to be the matrix [18] whose (i, j)-th entry is ∂F/∂Rij, i.e. ∇R F = [ ∂F/∂R11 · · · ∂F/∂R1n ; · · · ; ∂F/∂Rn1 · · · ∂F/∂Rnn ]. [sent-57, score-0.383]
37 Some of the matrix derivatives that are used later in the paper are the following: for a constant matrix Γ ∈ Rn×n , 1. [sent-67, score-0.204]
38 A related concept in differential geometry is that of the space of vectors tangent to a group at the identity element of the group. [sent-71, score-0.244]
39 This is defined to be the Lie algebra associated with the group. [sent-72, score-0.182]
40 The utility of the Lie algebra is due to the fact that it is a vector space and thus it is much easier to work with it than with the linear group. [sent-74, score-0.164]
41 A real n × n matrix A is in the Lie algebra of the rotation group SO(n) if and only if it is a skew-symmetric matrix (i.e., A^T = −A). [sent-75, score-0.79]
42 Furthermore, for any matrix A in the Lie algebra of SO(n), exp(ηA) is a one-parameter subgroup of the rotation group, parametrized by η ∈ R [19]. [sent-78, score-0.629]
43 The matrix exponential and logarithm play an important role in relating a matrix Lie group G and the associated Lie algebra g. [sent-79, score-0.528]
44 Given an element A ∈ g, the matrix exponential exp(A) is the corresponding element in the group. [sent-82, score-0.239]
45 The matrix logarithm log (R) is defined to be the inverse of the matrix exponential: it maps from the Lie group G into the Lie algebra g. [sent-83, score-0.493]
46 The matrix logarithm is a well-defined map since the exponential map is a local diffeomorphism between a neighborhood of the zero matrix and a neighborhood of the identity matrix [19, 20]. [sent-84, score-0.41]
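These facts are easy to check numerically. The following sketch is an illustration only (NumPy/SciPy assumed), with the skew-symmetric element kept small so that logm returns the principal, real logarithm.

```python
# A minimal sketch (NumPy/SciPy assumed): a skew-symmetric A (A^T = -A) lies in
# so(n); expm(A) lies in SO(n); near the identity, logm inverts expm.
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(1)
n = 4
M = rng.standard_normal((n, n))
A = 0.1 * (M - M.T)                           # small skew-symmetric element of so(n)

R = expm(A)                                   # exponential map into the group
print(np.allclose(R.T @ R, np.eye(n)))        # orthonormality: R^T R = I
print(np.isclose(np.linalg.det(R), 1.0))      # unit determinant
print(np.allclose(np.real(logm(R)), A))       # logarithm maps back into so(n)
```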
47 2 Riemannian Gradient Consider a real-valued differentiable function, Lt : SO(n) → R, defined on the rotation group. [sent-86, score-0.363]
48 This paper utilizes the von Neumann divergence which is a special case of the Bregman divergence and measures discrepancy between two matrices. [sent-90, score-0.281]
49 (6) The gradient of the Bregman divergence with respect to R1 is given as ∇R1 ∆F(R1, R2) = f(R1) − f(R2). [sent-93, score-0.156]
50 (7) Choosing the function F in the definition of the Bregman divergence to be the von Neumann entropy, given as F(R) = tr(R log R − R), we obtain the von Neumann divergence [14, 17]: ∆F(R1, R2) = tr(R1 log R1 − R1 log R2 − R1 + R2). [sent-94, score-0.396]
51 (8) Finally, the gradient of the von Neumann entropy was shown to be f(R) = ∇R F(R) = log R in [14]. [sent-95, score-0.186]
52 Consequently, the gradient of the von Neumann divergence can be expressed as ∇R1 ∆F(R1, R2) = log(R1) − log(R2). [sent-96, score-0.285]
53 (9) 4 Online Algorithm The problem of online learning of rotations can be expressed as the optimization problem R̂t+1 = arg minR ∆F(R, R̂t) + ηLt(R) s.t. [sent-97, score-0.278]
54 (10) R^T R = I, RR^T = I, det(R) = 1, where R̂t is the estimate of the rotation matrix at time instance t and Lt is the loss incurred in the prediction of yt. [sent-99, score-0.72]
55 The loss function is defined to be the squared error loss function and therefore the gradient of the loss function is given by the matrix ∇R Lt(R̂t) = 2(ŷt − yt) xt^T. [sent-102, score-0.521]
56 This results in the online updates R̂t+1 = exp( log R̂t − 2η skew( R̂t^T (ŷt − yt) xt^T ) ) = R̂t exp( −2η skew( R̂t^T (ŷt − yt) xt^T ) ). [sent-103, score-0.565]
57 Introducing the Lagrange multiplier matrix Γ for the orthonormality constraint and the Lagrange multiplier λ for the unit determinant constraint, the objective function can be written as J(R, Γ, λ) = ∆F(R, R̂t) + ηLt(R) + tr(Γ(RR^T − I)) + λ(det(R) − 1). [sent-107, score-0.222]
58 (13) Taking the gradient of both sides of the equation with respect to the matrix R, we get ∇R J(R, Γ, λ) = ∇R ∆F(R, R̂t) + η ∇̃R Lt(R) + (Γ + Γ^T)R + λ det(R)(R^{-1})^T, (14) using the matrix derivatives from Section 3. [sent-108, score-0.281]
59 (16) Since the objective is convex, it is sufficient to produce a choice of Lagrange multipliers that enforces the rotation constraint. [sent-113, score-0.363]
60 Choosing λ = det(R)^{-1} and Γ = −(1/2) R^{-1} (R^{-1})^T yields the following implicit update R̂t+1 = exp( log R̂t − η skew( R̂t^T ∇R Lt(R̂t+1) ) ). [sent-114, score-0.194]
61 However, by approximating ∇R Lt(R̂t+1) with ∇R Lt(R̂t) (as in [21, 14]), we obtain the explicit update R̂t+1 = exp( log R̂t − η skew( R̂t^T ∇R Lt(R̂t) ) ). [sent-118, score-0.194]
62 (18) The next result ensures the closure property for the matrix exponentiated gradient updates in the equation above. [sent-119, score-0.487]
63 In other words, the estimates for the rotation matrix do not steer away from the manifold associated with the rotation group. [sent-120, score-0.881]
64 If R̂t ∈ SO(n), then R̂t+1 given by the updates in (18) is a rotation matrix in SO(n). [sent-123, score-0.619]
65 Using the properties of the matrix logarithm and matrix exponential, we can express (18) as R̂t+1 = R̂t exp(−ηS), where S = skew( R̂t^T ∇R Lt(R̂t) ) is skew-symmetric and hence has trace zero. [sent-125, score-0.273]
66 Similarly, det(R̂t+1) = det(R̂t e^{−ηS}) = det(R̂t) · det(e^{−ηS}) = e^{−η tr(S)} = 1, since det(R̂t) = 1, tr(S) = 0, and the determinant of the exponential of a matrix is equal to the exponential of its trace. [sent-128, score-0.226]
67 2 Differential Geometrical Interpretation The resulting updates in (18) have a nice interpretation in terms of the differential geometry of the rotation group. [sent-131, score-0.636]
68 The gradient of the cost function, ∇R Lt(R̂t), in the Euclidean space gives a tangent direction at the current estimate of the rotation matrix. [sent-132, score-0.457]
69 The Riemannian gradient at the identity element of the group is obtained by de-rotation by R̂t, giving ∇̃R Lt(R̂t), as in (5). [sent-134, score-0.207]
70 The gradient corresponds to an element of the Lie algebra, so(n), of the rotation group. [sent-135, score-0.49]
71 The exponential map gives the corresponding rotation matrix which is the multiplicative update to the estimate of the rotation matrix at the previous instance. [sent-136, score-0.987]
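Putting the pieces of this section together, a minimal sketch of the resulting online procedure is given below. It is not the authors' implementation; the step size eta and the convention skew(M) = M − M^T are assumptions of the sketch.

```python
# A minimal sketch (not the authors' code; NumPy/SciPy assumed) of the explicit
# MEG update of eq. (18) with the squared-error loss:
#   Rhat_{t+1} = expm( logm(Rhat_t) - eta * skew( Rhat_t^T grad L_t(Rhat_t) ) ),
# where grad L_t(Rhat_t) = 2 (yhat_t - y_t) x_t^T and skew(M) = M - M^T.
import numpy as np
from scipy.linalg import expm, logm

def skew(M):
    return M - M.T

def meg_update(R_hat, x_t, y_t, eta):
    y_hat = R_hat @ x_t
    grad = 2.0 * np.outer(y_hat - y_t, x_t)      # Euclidean gradient of the loss
    S = skew(R_hat.T @ grad)                     # de-rotated, skew-symmetric direction
    return expm(np.real(logm(R_hat)) - eta * S)  # the update stays in SO(n)

# Toy run: recover a fixed unknown rotation from a stream of unit vectors.
rng = np.random.default_rng(2)
n, eta = 5, 0.5                                  # eta is an assumed step size
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
if np.linalg.det(Q) < 0:
    Q[:, 0] = -Q[:, 0]
R_star, R_hat = Q, np.eye(n)
for _ in range(500):
    x = rng.standard_normal(n)
    x /= np.linalg.norm(x)
    R_hat = meg_update(R_hat, x, R_star @ x, eta)
print(np.linalg.norm(R_hat - R_star, 'fro'))     # Frobenius error should decay (cf. Figure 1)
```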
72 5 Complexity Reduction of MEG Updates The matrix exponentiated gradient updates ensure that the estimates for the rotation matrix stay on the manifold associated with the rotation group at each iteration. [sent-137, score-1.427]
73 However, with the matrix exponentiation at each step, the updates are computationally intensive, and in fact the computational complexity of the updates is comparable to other approaches that would require repeated approximation and projection onto the manifold. [sent-138, score-0.441]
74 (12) (for the case of the squared error loss function) can be written as S = −2η skew( R̂t^T (ŷt − yt) xt^T ) = −2η skew( R̂t^T (R̂t xt − R∗ xt) xt^T ) = −2η skew( xt xt^T − R̂t^T R∗ xt xt^T ) = 2η ( R̂t^T R∗ xt xt^T − xt xt^T R∗^T R̂t ) = A^T X − X A, (20) where X ≡ xt xt^T and A ≡ 2η R∗^T R̂t. [sent-141, score-1.594]
75 Each term in the matrix S is a rank-one matrix (due to pre- and post-multiplication with xt xt^T, respectively). [sent-142, score-0.34]
76 With λ = 2η √(1 − (ŷt^T yt)^2), the update becomes R̂t+1 = R̂t ( I + (sin(λ)/λ) S + ((1 − cos(λ))/λ^2) S^2 ), (21) and S is the skew-symmetric matrix given in eqn. [sent-148, score-0.39]
77 (20) with eigenvalues 0 and ±iλ. Note that ŷt, yt are unit vectors in Rn and therefore λ is real-valued. [sent-149, score-0.31]
78 Owing to the result above, the matrix exponential reduces to a simple quadratic form involving an element from the Lie algebra of the rotation group. [sent-152, score-0.732]
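The reduction can be checked numerically. The sketch below is an illustration only (NumPy/SciPy assumed); it relies on the reconstruction above (skew(M) = M − M^T, S = 2η(v xt^T − xt v^T) with v = R̂t^T yt, and λ = 2η √(1 − (ŷt^T yt)^2)), which should be treated as assumptions of the sketch.

```python
# A minimal sketch (NumPy/SciPy assumed) checking that, for the rank-two
# skew-symmetric S of eq. (20) with nonzero eigenvalues +/- i*lambda, the matrix
# exponential equals the quadratic form of eq. (21).
import numpy as np
from scipy.linalg import expm

def quadratic_form_exp(S, lam):
    n = S.shape[0]
    return np.eye(n) + (np.sin(lam) / lam) * S + ((1.0 - np.cos(lam)) / lam ** 2) * (S @ S)

rng = np.random.default_rng(3)
n, eta = 6, 0.5
x = rng.standard_normal(n); x /= np.linalg.norm(x)       # plays the role of x_t
v = rng.standard_normal(n); v /= np.linalg.norm(v)       # plays the role of Rhat_t^T y_t
S = 2.0 * eta * (np.outer(v, x) - np.outer(x, v))        # eq. (20) in outer-product form
lam = 2.0 * eta * np.sqrt(1.0 - (x @ v) ** 2)            # magnitude of the nonzero eigenvalues

print(np.allclose(quadratic_form_exp(S, lam), expm(S)))  # closed form agrees with expm
```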
79 The performance of the algorithm is evaluated in terms of the Frobenius norm of the difference of the true rotation matrix and the estimate. [sent-158, score-0.495]
80 The unknown rotation is a 12 × 12 matrix and changes randomly every 200 instances. [sent-160, score-0.481]
81 It is clear from the plot that the estimation error decays rapidly to zero and estimates of the rotation matrices are exact. [sent-162, score-0.437]
82 [Figure 1 plot: estimation error versus time in SO(12); y-axis: estimation error (Frobenius norm, spectral norm); x-axis: time (instance index).] Figure 1: Online learning of rotations: the estimate of the unknown rotation is updated every time a new instance of rotation is observed. [sent-163, score-0.876]
83 The true rotation matrix changes randomly at regular intervals (N = 200). [sent-164, score-0.465]
84 the observations are now given as yt = R∗ xt + α wt, where α determines the signal-to-noise ratio. [sent-168, score-0.28]
85 In Figure 3, the unknown rotation R∗ ∈ SO(30) changes slightly after every 30 instances. [sent-176, score-0.379]
86 The smoothly changing rotation is induced by composing the matrix R∗ with a matrix Rδ every thirty iterations. [sent-177, score-0.587]
87 The matrix Rδ is block-diagonal with 3 × 3 blocks, each corresponding to a rotation about the X-axis in 3D space by π/360 radians. [sent-178, score-0.465]
88 The batch version stores the last 30 instances in a 30 × 30 matrix X and the corresponding rotated vectors in a matrix Y. [sent-179, score-0.351]
89 The estimate of the unknown rotation is given as YX−1 . [sent-180, score-0.379]
90 The batch version achieves zero error only at time instances when all the data in X, Y correspond to the same rotation, whereas the online version consistently achieves a low error and tracks the changing rotation. [sent-181, score-0.587]
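A sketch of the batch baseline described above (illustrative only; it assumes the 30 stored instances are linearly independent so that X is invertible):

```python
# A minimal sketch (NumPy assumed) of the batch estimate: columns of X are the
# stored instances x_t, columns of Y are the corresponding y_t; the batch
# estimate of the rotation is Y X^{-1}.
import numpy as np

def batch_estimate(X, Y):
    return Y @ np.linalg.inv(X)
```

This recovers R∗ exactly only when every stored pair comes from the same rotation, which is the behavior compared against in Figure 3.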
91 The proposed algorithm was also applied to learning and tracking the rotations of 3D objects. [sent-184, score-0.242]
92 The algorithm was motivated using the von Neumann divergence and the squared error loss function, and the updates were [sent-187, score-0.418]
93 [Figure 3 plot: tracking rotations in SO(30); y-axis: error (Frobenius norm) for the batch version and the online algorithm; x-axis: time (instance index).] Figure 3: Comparing the performance of tracking rotations for the batch version versus the online algorithm. [sent-190, score-0.614]
94 The rotation matrix changes smoothly every M = 30 instances. [sent-191, score-0.485]
95 developed in the Lie algebra of the rotation group. [sent-192, score-0.527]
96 The resulting matrix exponentiated gradient updates were reduced to a simple quadratic form. [sent-193, score-0.503]
97 References [1] Rich Wareham, Jonathan Cameron, and Joan Lasenby, “Applications of conformal geometric algebra in computer vision and graphics,” in IWMM/GIAE, 2004, pp. [sent-203, score-0.192]
98 [14] Koji Tsuda, Gunnar Rätsch, and Manfred K. Warmuth, “Matrix exponentiated gradient updates for on-line learning and Bregman projection,” Journal of Machine Learning Research, vol. [sent-259, score-0.385]
99 Warmuth, “Exponentiated gradient versus gradient descent for linear predictors,” Information and Computation, vol. [sent-286, score-0.186]
100 [27] Raman Arora, “Tracking rotations of 3D Stanford bunny,” http://www. [sent-307, score-0.205]
wordName wordTfidf (topN-words)
[('rt', 0.632), ('rotation', 0.363), ('lt', 0.276), ('rotations', 0.205), ('algebra', 0.164), ('lie', 0.155), ('updates', 0.154), ('exponentiated', 0.154), ('yt', 0.144), ('xt', 0.136), ('skew', 0.134), ('neumann', 0.13), ('det', 0.122), ('matrix', 0.102), ('von', 0.089), ('meg', 0.083), ('divergence', 0.079), ('rrt', 0.079), ('gradient', 0.077), ('online', 0.073), ('rotated', 0.064), ('arora', 0.063), ('bregman', 0.063), ('frobenius', 0.061), ('riemannian', 0.059), ('group', 0.059), ('quantum', 0.055), ('raman', 0.055), ('differential', 0.054), ('loss', 0.051), ('element', 0.05), ('logarithm', 0.046), ('batch', 0.046), ('geometry', 0.043), ('warmuth', 0.041), ('tr', 0.038), ('tracking', 0.037), ('spherical', 0.037), ('exponential', 0.037), ('steepest', 0.035), ('manifold', 0.035), ('rn', 0.034), ('exponentials', 0.032), ('instance', 0.032), ('bunny', 0.031), ('exponentiation', 0.031), ('multiplier', 0.031), ('orthonormality', 0.031), ('procrustes', 0.031), ('manfred', 0.031), ('norm', 0.03), ('rx', 0.03), ('matrices', 0.03), ('euclidean', 0.029), ('stanford', 0.028), ('incurred', 0.028), ('attitude', 0.028), ('wahba', 0.028), ('conformal', 0.028), ('elementary', 0.027), ('determinant', 0.027), ('error', 0.026), ('unitary', 0.024), ('intuitive', 0.023), ('trace', 0.023), ('tsuda', 0.022), ('logarithms', 0.022), ('interpretation', 0.022), ('unit', 0.022), ('instances', 0.021), ('identity', 0.021), ('madison', 0.02), ('update', 0.02), ('exp', 0.02), ('smoothly', 0.02), ('log', 0.02), ('pseudocode', 0.019), ('jun', 0.019), ('squared', 0.019), ('posed', 0.018), ('associated', 0.018), ('facts', 0.018), ('decays', 0.018), ('discrepancy', 0.017), ('subspaces', 0.017), ('utilizes', 0.017), ('eigenvalues', 0.017), ('tangent', 0.017), ('sphere', 0.017), ('robotics', 0.017), ('unknown', 0.016), ('descent', 0.016), ('lagrange', 0.016), ('lagrangian', 0.016), ('smith', 0.016), ('quadratic', 0.016), ('sin', 0.016), ('august', 0.016), ('version', 0.016), ('versus', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999946 177 nips-2009-On Learning Rotations
Author: Raman Arora
Abstract: An algorithm is presented for online learning of rotations. The proposed algorithm involves matrix exponentiated gradient updates and is motivated by the von Neumann divergence. The multiplicative updates are exponentiated skew-symmetric matrices which comprise the Lie algebra of the rotation group. The orthonormality and unit determinant of the matrix parameter are preserved using matrix logarithms and exponentials and the algorithm lends itself to intuitive interpretation in terms of the differential geometry of the manifold associated with the rotation group. A complexity reduction result is presented that exploits the eigenstructure of the matrix updates to simplify matrix exponentiation to a quadratic form. 1
2 0.24435893 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning
Author: Chonghai Hu, Weike Pan, James T. Kwok
Abstract: Regularized risk minimization often involves non-smooth optimization, either because of the loss function (e.g., hinge loss) or the regularizer (e.g., ℓ1 -regularizer). Gradient methods, though highly scalable and easy to implement, are known to converge slowly. In this paper, we develop a novel accelerated gradient method for stochastic optimization while still preserving their computational simplicity and scalability. The proposed algorithm, called SAGE (Stochastic Accelerated GradiEnt), exhibits fast convergence rates on stochastic composite optimization with convex or strongly convex objectives. Experimental results show that SAGE is faster than recent (sub)gradient methods including FOLOS, SMIDAS and SCD. Moreover, SAGE can also be extended for online learning, resulting in a simple algorithm but with the best regret bounds currently known for these problems. 1
3 0.16078553 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization
Author: Pierre-arnaud Coquelin, Romain Deguest, Rémi Munos
Abstract: This paper considers a sensitivity analysis in Hidden Markov Models with continuous state and observation spaces. We propose an Infinitesimal Perturbation Analysis (IPA) on the filtering distribution with respect to some parameters of the model. We describe a methodology for using any algorithm that estimates the filtering density, such as Sequential Monte Carlo methods, to design an algorithm that estimates its gradient. The resulting IPA estimator is proven to be asymptotically unbiased, consistent and has computational complexity linear in the number of particles. We consider an application of this analysis to the problem of identifying unknown parameters of the model given a sequence of observations. We derive an IPA estimator for the gradient of the log-likelihood, which may be used in a gradient method for the purpose of likelihood maximization. We illustrate the method with several numerical experiments.
4 0.14839956 178 nips-2009-On Stochastic and Worst-case Models for Investing
Author: Elad Hazan, Satyen Kale
Abstract: In practice, most investing is done assuming a probabilistic model of stock price returns known as the Geometric Brownian Motion (GBM). While often an acceptable approximation, the GBM model is not always valid empirically. This motivates a worst-case approach to investing, called universal portfolio management, where the objective is to maximize wealth relative to the wealth earned by the best fixed portfolio in hindsight. In this paper we tie the two approaches, and design an investment strategy which is universal in the worst-case, and yet capable of exploiting the mostly valid GBM model. Our method is based on new and improved regret bounds for online convex optimization with exp-concave loss functions. 1
5 0.13188472 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities
Author: Matthew Wilder, Matt Jones, Michael C. Mozer
Abstract: Across a wide range of cognitive tasks, recent experience influences behavior. For example, when individuals repeatedly perform a simple two-alternative forcedchoice task (2AFC), response latencies vary dramatically based on the immediately preceding trial sequence. These sequential effects have been interpreted as adaptation to the statistical structure of an uncertain, changing environment (e.g., Jones and Sieck, 2003; Mozer, Kinoshita, and Shettel, 2007; Yu and Cohen, 2008). The Dynamic Belief Model (DBM) (Yu and Cohen, 2008) explains sequential effects in 2AFC tasks as a rational consequence of a dynamic internal representation that tracks second-order statistics of the trial sequence (repetition rates) and predicts whether the upcoming trial will be a repetition or an alternation of the previous trial. Experimental results suggest that first-order statistics (base rates) also influence sequential effects. We propose a model that learns both first- and second-order sequence properties, each according to the basic principles of the DBM but under a unified inferential framework. This model, the Dynamic Belief Mixture Model (DBM2), obtains precise, parsimonious fits to data. Furthermore, the model predicts dissociations in behavioral (Maloney, Martello, Sahm, and Spillmann, 2005) and electrophysiological studies (Jentzsch and Sommer, 2002), supporting the psychological and neurobiological reality of its two components. 1
6 0.10178815 27 nips-2009-Adaptive Regularization of Weight Vectors
7 0.096171111 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm
8 0.083689041 137 nips-2009-Learning transport operators for image manifolds
9 0.07929758 151 nips-2009-Measuring Invariances in Deep Networks
10 0.070457205 154 nips-2009-Modeling the spacing effect in sequential category learning
11 0.068300001 220 nips-2009-Slow Learners are Fast
12 0.063373595 176 nips-2009-On Invariance in Hierarchical Models
13 0.063133858 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms
14 0.06151326 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models
15 0.060972035 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization
16 0.059371632 147 nips-2009-Matrix Completion from Noisy Entries
17 0.056267921 134 nips-2009-Learning to Explore and Exploit in POMDPs
18 0.053866718 57 nips-2009-Conditional Random Fields with High-Order Features for Sequence Labeling
19 0.053702042 223 nips-2009-Sparse Metric Learning via Smooth Optimization
20 0.052075665 74 nips-2009-Efficient Bregman Range Search
topicId topicWeight
[(0, -0.148), (1, 0.132), (2, 0.091), (3, -0.04), (4, 0.168), (5, 0.159), (6, 0.058), (7, 0.067), (8, 0.039), (9, -0.025), (10, 0.116), (11, 0.104), (12, -0.064), (13, 0.005), (14, 0.013), (15, -0.146), (16, -0.053), (17, -0.032), (18, -0.034), (19, 0.053), (20, 0.01), (21, 0.007), (22, 0.04), (23, 0.0), (24, 0.022), (25, 0.034), (26, 0.032), (27, -0.107), (28, 0.088), (29, -0.053), (30, 0.054), (31, -0.017), (32, -0.038), (33, 0.03), (34, -0.008), (35, -0.069), (36, 0.062), (37, -0.046), (38, 0.009), (39, 0.098), (40, 0.163), (41, 0.037), (42, 0.058), (43, 0.008), (44, 0.006), (45, 0.05), (46, -0.063), (47, 0.054), (48, 0.098), (49, -0.043)]
simIndex simValue paperId paperTitle
same-paper 1 0.9597885 177 nips-2009-On Learning Rotations
Author: Raman Arora
Abstract: An algorithm is presented for online learning of rotations. The proposed algorithm involves matrix exponentiated gradient updates and is motivated by the von Neumann divergence. The multiplicative updates are exponentiated skew-symmetric matrices which comprise the Lie algebra of the rotation group. The orthonormality and unit determinant of the matrix parameter are preserved using matrix logarithms and exponentials and the algorithm lends itself to intuitive interpretation in terms of the differential geometry of the manifold associated with the rotation group. A complexity reduction result is presented that exploits the eigenstructure of the matrix updates to simplify matrix exponentiation to a quadratic form. 1
2 0.75613958 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning
Author: Chonghai Hu, Weike Pan, James T. Kwok
Abstract: Regularized risk minimization often involves non-smooth optimization, either because of the loss function (e.g., hinge loss) or the regularizer (e.g., ℓ1 -regularizer). Gradient methods, though highly scalable and easy to implement, are known to converge slowly. In this paper, we develop a novel accelerated gradient method for stochastic optimization while still preserving their computational simplicity and scalability. The proposed algorithm, called SAGE (Stochastic Accelerated GradiEnt), exhibits fast convergence rates on stochastic composite optimization with convex or strongly convex objectives. Experimental results show that SAGE is faster than recent (sub)gradient methods including FOLOS, SMIDAS and SCD. Moreover, SAGE can also be extended for online learning, resulting in a simple algorithm but with the best regret bounds currently known for these problems. 1
3 0.68079752 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization
Author: Pierre-arnaud Coquelin, Romain Deguest, Rémi Munos
Abstract: This paper considers a sensitivity analysis in Hidden Markov Models with continuous state and observation spaces. We propose an Infinitesimal Perturbation Analysis (IPA) on the filtering distribution with respect to some parameters of the model. We describe a methodology for using any algorithm that estimates the filtering density, such as Sequential Monte Carlo methods, to design an algorithm that estimates its gradient. The resulting IPA estimator is proven to be asymptotically unbiased, consistent and has computational complexity linear in the number of particles. We consider an application of this analysis to the problem of identifying unknown parameters of the model given a sequence of observations. We derive an IPA estimator for the gradient of the log-likelihood, which may be used in a gradient method for the purpose of likelihood maximization. We illustrate the method with several numerical experiments.
4 0.66000974 27 nips-2009-Adaptive Regularization of Weight Vectors
Author: Koby Crammer, Alex Kulesza, Mark Dredze
Abstract: We present AROW, a new online learning algorithm that combines several useful properties: large margin training, confidence weighting, and the capacity to handle non-separable data. AROW performs adaptive regularization of the prediction function upon seeing each new instance, allowing it to perform especially well in the presence of label noise. We derive a mistake bound, similar in form to the second order perceptron bound, that does not assume separability. We also relate our algorithm to recent confidence-weighted online learning techniques and show empirically that AROW achieves state-of-the-art performance and notable robustness in the case of non-separable data. 1
5 0.5787999 178 nips-2009-On Stochastic and Worst-case Models for Investing
Author: Elad Hazan, Satyen Kale
Abstract: In practice, most investing is done assuming a probabilistic model of stock price returns known as the Geometric Brownian Motion (GBM). While often an acceptable approximation, the GBM model is not always valid empirically. This motivates a worst-case approach to investing, called universal portfolio management, where the objective is to maximize wealth relative to the wealth earned by the best fixed portfolio in hindsight. In this paper we tie the two approaches, and design an investment strategy which is universal in the worst-case, and yet capable of exploiting the mostly valid GBM model. Our method is based on new and improved regret bounds for online convex optimization with exp-concave loss functions. 1
6 0.49749705 220 nips-2009-Slow Learners are Fast
7 0.4624145 146 nips-2009-Manifold Regularization for SIR with Rate Root-n Convergence
8 0.44804901 154 nips-2009-Modeling the spacing effect in sequential category learning
9 0.40733755 216 nips-2009-Sequential effects reflect parallel learning of multiple environmental regularities
10 0.38984463 137 nips-2009-Learning transport operators for image manifolds
11 0.36371911 176 nips-2009-On Invariance in Hierarchical Models
12 0.354514 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization
13 0.33481669 257 nips-2009-White Functionals for Anomaly Detection in Dynamical Systems
14 0.33405223 63 nips-2009-DUOL: A Double Updating Approach for Online Learning
15 0.314017 159 nips-2009-Multi-Step Dyna Planning for Policy Evaluation and Control
16 0.31052911 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm
17 0.30271652 252 nips-2009-Unsupervised Feature Selection for the $k$-means Clustering Problem
18 0.29671323 147 nips-2009-Matrix Completion from Noisy Entries
19 0.29296145 134 nips-2009-Learning to Explore and Exploit in POMDPs
20 0.28095132 228 nips-2009-Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing
topicId topicWeight
[(10, 0.281), (21, 0.012), (24, 0.052), (25, 0.044), (35, 0.033), (36, 0.108), (39, 0.033), (58, 0.126), (60, 0.01), (61, 0.017), (71, 0.044), (81, 0.034), (86, 0.091), (91, 0.012)]
simIndex simValue paperId paperTitle
1 0.88980156 186 nips-2009-Parallel Inference for Latent Dirichlet Allocation on Graphics Processing Units
Author: Feng Yan, Ningyi Xu, Yuan Qi
Abstract: The recent emergence of Graphics Processing Units (GPUs) as general-purpose parallel computing devices provides us with new opportunities to develop scalable learning methods for massive data. In this work, we consider the problem of parallelizing two inference methods on GPUs for latent Dirichlet Allocation (LDA) models, collapsed Gibbs sampling (CGS) and collapsed variational Bayesian (CVB). To address limited memory constraints on GPUs, we propose a novel data partitioning scheme that effectively reduces the memory cost. This partitioning scheme also balances the computational cost on each multiprocessor and enables us to easily avoid memory access conflicts. We use data streaming to handle extremely large datasets. Extensive experiments showed that our parallel inference methods consistently produced LDA models with the same predictive power as sequential training methods did but with 26x speedup for CGS and 196x speedup for CVB on a GPU with 30 multiprocessors. The proposed partitioning scheme and data streaming make our approach scalable with more multiprocessors. Furthermore, they can be used as general techniques to parallelize other machine learning models. 1
same-paper 2 0.8090511 177 nips-2009-On Learning Rotations
Author: Raman Arora
Abstract: An algorithm is presented for online learning of rotations. The proposed algorithm involves matrix exponentiated gradient updates and is motivated by the von Neumann divergence. The multiplicative updates are exponentiated skew-symmetric matrices which comprise the Lie algebra of the rotation group. The orthonormality and unit determinant of the matrix parameter are preserved using matrix logarithms and exponentials and the algorithm lends itself to intuitive interpretation in terms of the differential geometry of the manifold associated with the rotation group. A complexity reduction result is presented that exploits the eigenstructure of the matrix updates to simplify matrix exponentiation to a quadratic form. 1
3 0.74742651 220 nips-2009-Slow Learners are Fast
Author: Martin Zinkevich, John Langford, Alex J. Smola
Abstract: Online learning algorithms have impressive convergence properties when it comes to risk minimization and convex games on very large problems. However, they are inherently sequential in their design which prevents them from taking advantage of modern multi-core architectures. In this paper we prove that online learning with delayed updates converges well, thereby facilitating parallel online learning. 1
4 0.58864337 104 nips-2009-Group Sparse Coding
Author: Samy Bengio, Fernando Pereira, Yoram Singer, Dennis Strelow
Abstract: Bag-of-words document representations are often used in text, image and video processing. While it is relatively easy to determine a suitable word dictionary for text documents, there is no simple mapping from raw images or videos to dictionary terms. The classical approach builds a dictionary using vector quantization over a large set of useful visual descriptors extracted from a training set, and uses a nearest-neighbor algorithm to count the number of occurrences of each dictionary word in documents to be encoded. More robust approaches have been proposed recently that represent each visual descriptor as a sparse weighted combination of dictionary words. While favoring a sparse representation at the level of visual descriptors, those methods however do not ensure that images have sparse representation. In this work, we use mixed-norm regularization to achieve sparsity at the image level as well as a small overall dictionary. This approach can also be used to encourage using the same dictionary words for all the images in a class, providing a discriminative signal in the construction of image representations. Experimental results on a benchmark image classification dataset show that when compact image or dictionary representations are needed for computational efficiency, the proposed approach yields better mean average precision in classification. 1
5 0.5848738 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm
Author: Rong Jin, Shijun Wang, Yang Zhou
Abstract: In this paper, we examine the generalization error of regularized distance metric learning. We show that with appropriate constraints, the generalization error of regularized distance metric learning could be independent from the dimensionality, making it suitable for handling high dimensional data. In addition, we present an efficient online learning algorithm for regularized distance metric learning. Our empirical studies with data classification and face recognition show that the proposed algorithm is (i) effective for distance metric learning when compared to the state-of-the-art methods, and (ii) efficient and robust for high dimensional data.
6 0.58262837 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data
7 0.57668608 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms
8 0.5762226 27 nips-2009-Adaptive Regularization of Weight Vectors
9 0.57605731 191 nips-2009-Positive Semidefinite Metric Learning with Boosting
10 0.57582432 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models
11 0.57497311 169 nips-2009-Nonlinear Learning using Local Coordinate Coding
12 0.57468063 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA
13 0.57247531 79 nips-2009-Efficient Recovery of Jointly Sparse Vectors
14 0.57094991 70 nips-2009-Discriminative Network Models of Schizophrenia
15 0.57077092 35 nips-2009-Approximating MAP by Compensating for Structural Relaxations
16 0.57057148 126 nips-2009-Learning Bregman Distance Functions and Its Application for Semi-Supervised Clustering
17 0.57006788 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning
18 0.56912297 137 nips-2009-Learning transport operators for image manifolds
19 0.56872511 128 nips-2009-Learning Non-Linear Combinations of Kernels
20 0.56856191 108 nips-2009-Heterogeneous multitask learning with joint sparsity constraints