nips nips2012 nips2012-282 knowledge-graph by maker-knowledge-mining

282 nips-2012-Proximal Newton-type methods for convex optimization


Source: pdf

Author: Jason Lee, Yuekai Sun, Michael Saunders

Abstract: We seek to solve convex optimization problems in composite form: minimize_{x∈R^n} f(x) := g(x) + h(x), where g is convex and continuously differentiable and h : R^n → R is a convex but not necessarily differentiable function whose proximal mapping can be evaluated efficiently. We derive a generalization of Newton-type methods to handle such convex but nonsmooth objective functions. We prove such methods are globally convergent and achieve superlinear rates of convergence in the vicinity of an optimal solution. We also demonstrate the performance of these methods using problems of relevance in machine learning and statistics.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Proximal Newton-type methods for convex optimization Jason D. [sent-1, score-0.15]

2 Abstract: We seek to solve convex optimization problems in composite form: minimize_{x∈R^n} f(x) := g(x) + h(x), where g is convex and continuously differentiable and h : R^n → R is a convex but not necessarily differentiable function whose proximal mapping can be evaluated efficiently. [sent-5, score-1.152]

3 We derive a generalization of Newton-type methods to handle such convex but nonsmooth objective functions. [sent-6, score-0.334]

4 We prove such methods are globally convergent and achieve superlinear rates of convergence in the vicinity of an optimal solution. [sent-7, score-0.172]

5 We also demonstrate the performance of these methods using problems of relevance in machine learning and statistics. [sent-8, score-0.101]

6 We describe a family of Newton-type methods tailored to these problems that achieve superlinear rates of convergence subject to standard assumptions. [sent-11, score-0.204]

7 These methods can be interpreted as generalizations of the classic proximal gradient method that use the curvature of the objective function to select a search direction. [sent-12, score-0.672]

8 First-order methods. The most popular methods for solving convex optimization problems in composite form are first-order methods that use proximal mappings to handle the nonsmooth part. [sent-14, score-1.048]

9 SpaRSA is a generalized spectral projected gradient method that uses a spectral step length together with a nonmonotone line search to improve convergence [24]. [sent-15, score-0.352]

10 TRIP also uses a spectral step length but selects search directions using a trust-region strategy [12]. [sent-17, score-0.219]

11 TRIP performs comparably with SpaRSA and the projected Newton-type methods we describe later. [sent-18, score-0.077]

12 These methods have been implemented in the software package TFOCS and used to solve problems that commonly arise in statistics, machine learning, and signal processing [3]. [sent-21, score-0.102]

13 Newton-type methods. There are three classes of methods that generalize Newton-type methods to handle nonsmooth objective functions. [sent-23, score-0.324]

14 The first are projected Newton-type methods for constrained optimization [20]. [sent-24, score-0.109]

15 Such methods cannot handle nonsmooth objective functions; they tackle problems in composite form via constraints of the form h(x) ≤ τ . [sent-25, score-0.393]

16 [25] uses a local quadratic approximation to the smooth part of the form Q(x) := f(x) + sup_{g∈∂f(x)} gᵀd + (1/2)dᵀHd, where ∂f(x) denotes the subdifferential of f at x. [sent-28, score-0.125]

17 These methods achieve state-of-the-art performance on many problems of relevance, such as ℓ1-regularized logistic regression and ℓ2-regularized support vector machines. [sent-29, score-0.161]

18 This paper focuses on proximal Newton-type methods that were previously studied in [16, 18] and are closely related to the methods of Fukushima and Mine [10] and Tseng and Yun [21]. [sent-30, score-0.524]

19 Both use search directions ∆x that are solutions to subproblems of the form minimize_d ∇g(x)ᵀd + (1/2)dᵀHd + h(x + d), where H is a positive definite matrix that approximates the Hessian ∇²g(x). [sent-31, score-0.268]

20 Fukushima and Mine choose H to be a multiple of the identity, while Tseng and Yun set some components of the search direction ∆x to be zero to obtain a (block) coordinate descent direction. [sent-32, score-0.213]

21 The methods of [11, 15] (sparse inverse covariance estimation) are special cases of proximal Newton-type methods. [sent-37, score-0.452]

22 These methods are considered state-of-the-art for their specific applications, often outperforming generic methods by orders of magnitude. [sent-38, score-0.102]

23 QUIC and LIBLINEAR also achieve a quadratic rate of convergence, although these results rely crucially on the structure of the ℓ1 norm and do not generalize to generic nonsmooth regularizers. [sent-39, score-0.231]

24 The quasi-Newton splitting method developed by Becker and Fadili is equivalent to a proximal quasi-Newton method with a rank-one Hessian approximation [4]. [sent-40, score-0.452]

25 In this case, they can solve the subproblem via the solution of a single variable root finding problem, making their method significantly more efficient than a generic proximal Newton-type method. [sent-41, score-0.674]

26 The methods described in this paper are a special case of cost approximation (CA), a class of methods developed by Patriksson [16]. [sent-42, score-0.117]

27 CA requires a CA function ϕ and selects search directions via subproblems of the form minimize_d g(x) + ϕ(x + d) − ϕ(x) + h(x + d) − ∇g(x)ᵀd. [sent-43, score-0.268]

28 We refer to [16] for details about cost approximation and its convergence analysis. [sent-46, score-0.097]

29 2 Proximal Newton-type methods. We seek to solve convex optimization problems in composite form: minimize_{x∈R^n} f(x) := g(x) + h(x). (2) [sent-47, score-0.367]

30 We assume g : R^n → R is a closed, proper convex, continuously differentiable function, and its gradient ∇g is Lipschitz continuous with constant L1; i.e., ‖∇g(x) − ∇g(y)‖ ≤ L1 ‖x − y‖. [sent-48, score-0.176]

31 h : R^n → R is a closed, proper convex but not necessarily everywhere differentiable function whose proximal mapping can be evaluated efficiently. [sent-51, score-0.619]

32 The proximal gradient method. The proximal mapping of a convex function h at x is prox_h(x) := arg min_y h(y) + (1/2)‖y − x‖². [sent-54, score-1.042]

33 Proximal mappings can be interpreted as generalized projections because if h is the indicator function of a convex set, then prox_h(x) is the projection of x onto the set. [sent-55, score-0.277]

34 We can also interpret the proximal gradient step as a generalized gradient step Gf(x) = prox_h(x − ∇g(x)) − x. (3) [sent-57, score-0.777]
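
For concreteness, the following is a minimal NumPy sketch (not from the paper) of this step for the common case h(x) = λ‖x‖1, whose proximal mapping is elementwise soft-thresholding; the names grad_g, lam, and step are illustrative placeholders.

    import numpy as np

    def prox_l1(v, t):
        # proximal mapping of t*||.||_1: elementwise soft-thresholding
        return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

    def proximal_gradient_step(x, grad_g, lam, step=1.0):
        # generalized gradient step: prox_{step*h}(x - step*grad_g(x)) - x
        # (the paper's Gf in (3) corresponds to step = 1)
        z = prox_l1(x - step * grad_g(x), step * lam)
        return z, z - x   # new iterate and the generalized gradient step

Repeating x ← x + Gf(x), possibly with a line search, recovers the basic proximal gradient method that SpaRSA and the accelerated first-order methods build on.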

35 Many state-of-the-art methods for problems in composite form, such as SpaRSA and the optimal first-order methods, are variants of this method. [sent-59, score-0.177]

36 Our method uses a Newton-type approximation in lieu of the simple quadratic to achieve faster convergence. [sent-60, score-0.135]

37 Let h be a convex function and H, a positive definite matrix. [sent-64, score-0.082]

38 Then the scaled proximal mapping of h at x is defined to be prox_h^H(x) := arg min_y h(y) + (1/2)‖y − x‖²_H. (4) [sent-65, score-0.501]

39 Proximal Newton-type methods use the iteration xk+1 = xk + tk ∆xk, (5) ∆xk := prox_h^{Hk}(xk − Hk^{-1}∇g(xk)) − xk, (6) where tk > 0 is the k-th step length, usually determined using a line search procedure, and Hk is an approximation to the Hessian of g at xk. [sent-66, score-1.737]

40 Hence, the search direction solves the subproblem ∆xk = arg min_d ∇g(xk)ᵀd + (1/2)dᵀHk d + h(xk + d) = arg min_d Qk(d) + h(xk + d). (7) [sent-68, score-0.352]

41 To simplify notation, we shall drop the subscripts and write x+ = x + t∆x in lieu of xk+1 = xk + tk ∆xk when discussing a single iteration. [sent-69, score-0.631]

42 If H is a positive definite matrix, then the search direction ∆x = arg min_d Q(d) + h(x + d) satisfies f(x+) ≤ f(x) + t(∇g(x)ᵀ∆x + h(x + ∆x) − h(x)) + O(t²), (8) and ∇g(x)ᵀ∆x + h(x + ∆x) − h(x) ≤ −∆xᵀH∆x. (9) [sent-72, score-0.194]

43 This implies the search direction is a descent direction for f, because substituting (9) into (8) gives f(x+) ≤ f(x) − t∆xᵀH∆x + O(t²). [sent-74, score-0.261]

44 We use a quasi-Newton approximation to the Hessian and a first-order method to solve the subproblem for a search direction, although the user is free to use a method of their choice. [sent-75, score-0.334]
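
As one way to realize such an inner first-order solve, the sketch below runs plain proximal gradient on the subproblem d ↦ ∇g(x)ᵀd + (1/2)dᵀHd + h(x + d); this is a hedged illustration, not the paper's SpaRSA-based subproblem solver, and prox_h is assumed to be a callable computing the prox of t·h.

    import numpy as np

    def solve_subproblem(grad_gx, H, x, prox_h, inner_iters=50):
        # minimize over d:  grad_gx^T d + 0.5 * d^T H d + h(x + d)
        # via proximal gradient with step 1/L, L = largest eigenvalue of H
        L = np.linalg.eigvalsh(H)[-1]
        d = np.zeros_like(x)
        for _ in range(inner_iters):
            grad_Q = grad_gx + H @ d                       # gradient of the quadratic model Q
            d = prox_h(x + d - grad_Q / L, 1.0 / L) - x    # prox step written in terms of x + d
        return d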

45 Empirically, we find that inexact solutions to the subproblem yield viable descent directions. [sent-76, score-0.278]

46 We use a backtracking line search to select a step length t that satisfies a sufficient descent condition: f(x+) ≤ f(x) + αt∆, (10) where ∆ := ∇g(x)ᵀ∆x + h(x + ∆x) − h(x), (11) and α ∈ (0, 0.5). [sent-77, score-0.266]
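
A minimal sketch of a backtracking line search enforcing (10)-(11); the shrink factor beta, the value of alpha, and the trial cap are illustrative defaults rather than PNOPT's actual settings.

    def backtracking(f, grad_gx, h, x, dx, alpha=1e-4, beta=0.5, max_trials=50):
        # Delta in (11): grad_g(x)^T dx + h(x + dx) - h(x)
        delta = grad_gx @ dx + h(x + dx) - h(x)
        t = 1.0
        for _ in range(max_trials):
            if f(x + t * dx) <= f(x) + alpha * t * delta:   # sufficient descent (10)
                return t
            t *= beta      # shrink the step and try again
        return t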

47 This sufficient descent condition is motivated by our convergence analysis but it also seems to perform well in practice. [sent-79, score-0.155]

48 L1 (12) satisfies the sufficient descent condition (10). [sent-84, score-0.103]

49 Algorithm 1 (a generic proximal Newton-type method). Require: x0 in dom f. 1: repeat; 2: update Hk using a quasi-Newton update rule; 3: zk ← prox_h^{Hk}(xk − Hk^{-1}∇g(xk)); 4: ∆xk ← zk − xk; 5: conduct a backtracking line search to select tk; 6: xk+1 ← xk + tk ∆xk; 7: until stopping conditions are satisfied. [sent-85, score-2.176]
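
Putting the pieces together, here is a hedged end-to-end sketch of Algorithm 1 with a BFGS Hessian approximation; it reuses solve_subproblem and backtracking from the sketches above, and the stopping rule, tolerances, and update safeguards are illustrative assumptions rather than the PNOPT implementation.

    import numpy as np

    def proximal_newton(f, grad_g, h, prox_h, x0, max_iters=100, tol=1e-6):
        x = x0.copy()
        H = np.eye(x.size)                                 # initial Hessian approximation
        g_old = grad_g(x)
        for _ in range(max_iters):
            dx = solve_subproblem(g_old, H, x, prox_h)     # steps 3-4 of Algorithm 1
            t = backtracking(f, g_old, h, x, dx)           # step 5
            x_new = x + t * dx                             # step 6
            if np.linalg.norm(t * dx) < tol:               # simple stopping test
                return x_new
            g_new = grad_g(x_new)
            s, y = x_new - x, g_new - g_old
            if s @ y > 1e-12:                              # step 2: BFGS update of H
                Hs = H @ s
                H = H + np.outer(y, y) / (y @ s) - np.outer(Hs, Hs) / (s @ Hs)
            x, g_old = x_new, g_new
        return x

For h(x) = λ‖x‖1, one could pass, for example, prox_h = lambda v, t: prox_l1(v, t * lam) and h = lambda v: lam * np.linalg.norm(v, 1).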

50 Convergence analysis (global convergence). We assume our Hessian approximations are sufficiently positive definite; i.e., Hk ⪰ mI for some m > 0. [sent-86, score-0.052]

51 This assumption guarantees the existence of step lengths that satisfy the sufficient decrease condition. [sent-92, score-0.139]

52 Then x is a minimizer of f if and only if the search direction is zero at x; i.e., ∆x = 0. [sent-96, score-0.145]

53 The global convergence of proximal Newton-type methods follows from the facts that the search directions are descent directions and that, if our Hessian approximations are sufficiently positive definite, the step lengths are bounded away from zero. [sent-99, score-0.905]

54 Then the sequence {xk } generated by a proximal Newton-type method converges to a minimizer of f . [sent-106, score-0.486]

55 Convergence rate. If g is twice continuously differentiable and we use the second-order Taylor approximation as our local quadratic approximation to g, then we can prove {xk} converges Q-quadratically to the optimal solution x*. [sent-108, score-0.257]

56 We assume that, in a neighborhood of x*, (i) g is strongly convex with constant m; i.e., ∇²g(x) ⪰ mI. [sent-109, score-0.082]

57 This convergence analysis is similar to that of Fukushima and Mine [10] and Patriksson [16]. [sent-112, score-0.052]

58 First, we state two lemmas: (i) step lengths of unity satisfy the sufficient descent condition after sufficiently many iterations, and (ii) the backward step is nonexpansive. [sent-113, score-0.321]

59 Then the step length tk = 1 satisfies the sufficient decrease condition (10) for k sufficiently large. [sent-120, score-0.197]

60 We can characterize the solution of the subproblem using the first-order optimality conditions for (4). [sent-121, score-0.158]

61 We can also use the fact that the proximal Newton method converges quadratically to prove that a proximal quasi-Newton method converges superlinearly. [sent-144, score-1.418]

62 We assume the quasi-Newton Hessian approximations satisfy the Dennis-Moré criterion [7]: ‖(Hk − ∇²g(x*))(xk+1 − xk)‖ / ‖xk+1 − xk‖ → 0. (13) [sent-145, score-0.493]

63 We first prove two lemmas: (i) step lengths of unity satisfy the sufficient descent condition after sufficiently many iterations, and (ii) the proximal quasi-Newton step is close to the proximal Newton step. [sent-146, score-1.671]
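
As a numerical aside, the ratio in (13) can be monitored as a diagnostic; hess_g_star below stands for a hypothetical handle to ∇²g(x*), which is of course unknown in practice and would only be available in controlled experiments.

    import numpy as np

    def dennis_more_ratio(Hk, hess_g_star, x_next, x_k):
        # ||(Hk - grad^2 g(x*)) (x_{k+1} - x_k)|| / ||x_{k+1} - x_k||; should tend to 0
        s = x_next - x_k
        return np.linalg.norm((Hk - hess_g_star) @ s) / np.linalg.norm(s)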

64 Suppose g is twice-continuously differentiable and the eigenvalues of Hk , k = 1, 2, . [sent-149, score-0.085]

65 If {Hk} satisfy the Dennis-Moré criterion, then the unit step length satisfies the sufficient descent condition (10) after sufficiently many iterations. [sent-155, score-0.218]

66 Let ∆x and ∆x̂ denote the search directions generated using H and Ĥ respectively, where mI ⪯ H ⪯ MI and mI ⪯ Ĥ ⪯ MI. [sent-161, score-0.151]

67 Then these two search directions satisfy ‖∆x − ∆x̂‖ ≤ ((1 + c(H, Ĥ))/m) ‖(H − Ĥ)∆x‖^{1/2} ‖∆x‖^{1/2}, where c is a constant that depends on H and Ĥ. [sent-164, score-0.198]

68 Suppose g is twice-continuously differentiable and the eigenvalues of Hk , k = 1, 2, . [sent-167, score-0.085]

69 If {Hk} satisfy the Dennis-Moré criterion, then the sequence {xk} converges to x* Q-superlinearly; i.e., ‖xk+1 − x*‖ / ‖xk − x*‖ → 0. [sent-171, score-0.081]

70 PNOPT: Proximal Newton OPTimizer. PNOPT is a MATLAB package that uses proximal Newton-type methods to minimize convex objective functions in composite form. [sent-175, score-0.752]

71 PNOPT can build BFGS and L-BFGS approximations to the Hessian (the user can also supply a Hessian approximation) and uses our implementation of SpaRSA or an optimal first-order method to solve the subproblem for a search direction. [sent-176, score-0.334]

72 PNOPT uses an early stopping condition for the subproblem solver based on two ideas: (i) the subproblem should be solved to a higher accuracy if Qk is a good approximation to g, and (ii) near a solution, the subproblem should be solved almost exactly to achieve fast convergence. [sent-177, score-0.554]

73 We thus require that the solution yk to the k-th subproblem (7) satisfy ‖G_{Qk+h}(yk)‖ ≤ ηk ‖G_f(yk)‖, (14) where G_f(x) denotes the generalized gradient step at x (3) and ηk is a forcing term. [sent-178, score-0.384]

74 We choose forcing terms based on the agreement between g and the previous quadratic approximation Qk−1 to g. [sent-179, score-0.148]

75 This choice (15) measures the agreement between g(xk) and Qk−1(xk) and is borrowed from the choice of forcing terms for inexact Newton methods described by Eisenstat and Walker [8]. [sent-185, score-0.143]
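
A hedged sketch of this adaptive stopping logic: the forcing-term formula below is one Eisenstat-Walker-style choice and is not necessarily the exact rule (15); gen_grad_Qh_yk and gen_grad_f_yk denote the generalized gradient steps appearing in (14), computed elsewhere.

    import numpy as np

    def forcing_term(grad_g_xk, grad_Qprev_xk, eta_prev=0.5):
        # shrink eta_k when the previous model Q_{k-1} predicted the gradient at x_k well
        agreement = np.linalg.norm(grad_Qprev_xk - grad_g_xk) / np.linalg.norm(grad_g_xk)
        return min(eta_prev, agreement)

    def subproblem_converged(gen_grad_Qh_yk, gen_grad_f_yk, eta_k):
        # early stopping test (14): ||G_{Qk+h}(yk)|| <= eta_k * ||G_f(yk)||
        return np.linalg.norm(gen_grad_Qh_yk) <= eta_k * np.linalg.norm(gen_grad_f_yk)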

76 Empirically, we find that this choice avoids “oversolving” the subproblem and yields desirable convergence behavior. [sent-186, score-0.21]

77 We compare the performance of PNOPT, our implementation of SpaRSA, and the TFOCS implementations of Auslender and Teboulle's method (AT) and FISTA on ℓ1-regularized logistic regression and Markov random field structure learning. [sent-187, score-0.093]

78 SpaRSA: We use a nonmonotone line search with a 10 iteration memory and also set the sufficient decrease parameter to α = 0. [sent-198, score-0.135]

79 The objective function is given by minimize_θ −Σ_{(r,j)∈E} θrj(xr, xj) + log Z(θ) + Σ_{(r,j)∈E} (λ1 ‖θrj‖_2 + λ2 ‖θrj‖²_F). (16) [sent-208, score-0.073]

80 The algorithms for solving (16) require evaluating the value and gradient of the smooth part. [sent-218, score-0.088]

81 Thus even for our small example, where k = 3 and |V | = 12, function and gradient evaluations dominate the computational expense required to solve (16). [sent-220, score-0.177]

82 Proximal Newton-type methods are well-suited to solve such problems because the main computational expense is shifted to solving the subproblems that do not require function evaluations. [sent-222, score-0.269]

83 ℓ1-regularized logistic regression. Given training data (xi, yi), i = 1, 2, . . . [sent-225, score-0.093]

84 . . . , n, ℓ1-regularized logistic regression trains a classifier via the solution of the convex optimization problem minimize_{w∈R^p} (1/n) Σ_{i=1}^{n} log(1 + exp(−yi wᵀxi)) + λ‖w‖_1. [sent-228, score-0.249]
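
To connect this problem to the framework above, the composite pieces can be coded as below and passed to a proximal Newton-type solver such as the proximal_newton sketch earlier; X, y, and lam are assumed data and regularization level, and this is an illustration rather than the paper's experimental setup.

    import numpy as np

    def logistic_g(w, X, y):
        # smooth part: g(w) = (1/n) * sum_i log(1 + exp(-y_i * x_i^T w))
        return np.mean(np.logaddexp(0.0, -y * (X @ w)))

    def logistic_grad(w, X, y):
        z = -y * (X @ w)
        sigma = 1.0 / (1.0 + np.exp(-z))          # derivative of log(1 + e^z) w.r.t. z
        return X.T @ (-y * sigma) / len(y)

    # nonsmooth part h(w) = lam * ||w||_1, whose proximal mapping is the
    # soft-thresholding operator prox_l1 from the earlier sketch

The full objective f(w) is then logistic_g(w, X, y) + lam * np.linalg.norm(w, 1).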

85 We see in Figure 2 that PNOPT outperforms the other methods because the computational expense is shifted to solving the subproblems, whose objective functions are cheap to evaluate. [sent-241, score-0.159]

86 5 Conclusion. Proximal Newton-type methods are natural generalizations of first-order methods that account for the curvature of the objective function. [sent-242, score-0.103]

87 They share many of the desirable characteristics of traditional first-order methods for convex optimization problems in composite form and achieve superlinear rates of convergence subject to standard assumptions. [sent-243, score-0.427]

88 These methods are especially suited to problems with expensive function evaluations because the main computational expense is shifted to solving subproblems that do not require function evaluations. [sent-244, score-0.268]

89 Teboulle, Interior gradient and proximal methods for convex and conic optimization, SIAM J. [sent-248, score-0.626]

90 Grant, Templates for convex cone problems with applications to sparse signal recovery, Math. [sent-264, score-0.114]

91 Fadili, A quasi-Newton proximal splitting method, NIPS, Lake Tahoe, California, 2012. [sent-271, score-0.452]

92 Moré, A characterization of superlinear convergence and its application to quasi-Newton methods, Math. [sent-287, score-0.136]

93 Walker, Choosing the forcing terms in an inexact Newton method, SIAM J. [sent-295, score-0.107]

94 Mine, A generalized proximal point algorithm for certain non-convex minimization problems, Internat. [sent-310, score-0.452]

95 Dhillon, Sparse inverse covariance matrix estimation using quadratic approximation, NIPS, Granada, Spain, 2011. [sent-322, score-0.048]

96 Nesterov, Gradient methods for minimizing composite objective function, CORE discussion paper, 2007. [sent-329, score-0.176]

97 Lafferty, High-dimensional Ising model selection using ℓ1-regularized logistic regression, Ann. [sent-349, score-0.056]

98 Yun, A coordinate gradient descent method for nonsmooth separable minimization, Math. [sent-371, score-0.277]

99 Tseng, On accelerated proximal gradient methods for convex-concave optimization, submitted to SIAM J. [sent-377, score-0.544]

100 Lin, An improved GLMNET for ℓ1-regularized logistic regression and support vector machines, National Taiwan University, Tech. [sent-415, score-0.093]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('proximal', 0.452), ('xk', 0.446), ('sparsa', 0.273), ('hk', 0.202), ('subproblem', 0.158), ('nonsmooth', 0.153), ('proxh', 0.147), ('pn', 0.127), ('hessian', 0.123), ('opt', 0.122), ('teboulle', 0.114), ('composite', 0.109), ('tfocs', 0.105), ('fista', 0.1), ('search', 0.097), ('gf', 0.094), ('tk', 0.094), ('differentiable', 0.085), ('fukushima', 0.084), ('superlinear', 0.084), ('convex', 0.082), ('newton', 0.082), ('mi', 0.079), ('tseng', 0.079), ('patriksson', 0.077), ('proxhk', 0.077), ('subproblems', 0.075), ('qk', 0.07), ('auslender', 0.068), ('descent', 0.068), ('rj', 0.062), ('lengths', 0.059), ('mine', 0.059), ('yun', 0.059), ('schmidt', 0.057), ('logistic', 0.056), ('gradient', 0.056), ('forcing', 0.055), ('directions', 0.054), ('expense', 0.054), ('inexact', 0.052), ('convergence', 0.052), ('glmnet', 0.052), ('trip', 0.052), ('lipschitz', 0.051), ('arg', 0.049), ('becker', 0.049), ('mappings', 0.048), ('quadratic', 0.048), ('direction', 0.048), ('satisfy', 0.047), ('suppose', 0.047), ('sra', 0.047), ('ca', 0.046), ('eisenstat', 0.046), ('unity', 0.046), ('suboptimality', 0.046), ('approximation', 0.045), ('stanford', 0.043), ('minimize', 0.042), ('fadili', 0.042), ('gisette', 0.042), ('lieu', 0.042), ('projected', 0.041), ('quic', 0.039), ('xr', 0.039), ('walker', 0.039), ('tahoe', 0.039), ('suf', 0.039), ('shifted', 0.038), ('lemma', 0.038), ('dom', 0.038), ('nonmonotone', 0.038), ('regression', 0.037), ('olsen', 0.036), ('lake', 0.036), ('saunders', 0.036), ('methods', 0.036), ('ii', 0.036), ('length', 0.035), ('condition', 0.035), ('yk', 0.035), ('continuously', 0.035), ('liblinear', 0.035), ('converges', 0.034), ('solve', 0.034), ('backtracking', 0.033), ('relevance', 0.033), ('step', 0.033), ('evaluations', 0.033), ('hd', 0.032), ('promotes', 0.032), ('handle', 0.032), ('optimization', 0.032), ('smooth', 0.032), ('problems', 0.032), ('kim', 0.032), ('hastie', 0.031), ('objective', 0.031), ('generic', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 282 nips-2012-Proximal Newton-type methods for convex optimization

Author: Jason Lee, Yuekai Sun, Michael Saunders

Abstract: We seek to solve convex optimization problems in composite form: minimize_{x∈R^n} f(x) := g(x) + h(x), where g is convex and continuously differentiable and h : R^n → R is a convex but not necessarily differentiable function whose proximal mapping can be evaluated efficiently. We derive a generalization of Newton-type methods to handle such convex but nonsmooth objective functions. We prove such methods are globally convergent and achieve superlinear rates of convergence in the vicinity of an optimal solution. We also demonstrate the performance of these methods using problems of relevance in machine learning and statistics.

2 0.48830867 300 nips-2012-Scalable nonconvex inexact proximal splitting

Author: Suvrit Sra

Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is first to develop and analyze incremental nonconvex proximalsplitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization. 1

3 0.33376262 27 nips-2012-A quasi-Newton proximal splitting method

Author: Stephen Becker, Jalal Fadili

Abstract: A new result in convex analysis on the calculation of proximity operators in certain scaled norms is derived. We describe efficient implementations of the proximity calculation for a useful class of functions; the implementations exploit the piece-wise linear nature of the dual problem. The second part of the paper applies the previous result to acceleration of convex minimization problems, and leads to an elegant quasi-Newton method. The optimization method compares favorably against state-of-the-art alternatives. The algorithm has extensive applications including signal processing, sparse recovery and machine learning and classification. 1

4 0.23606952 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

Author: Demba Ba, Behtash Babadi, Patrick Purdon, Emery Brown

Abstract: We consider the problem of recovering a sequence of vectors, (xk)_{k=0}^K, for which the increments xk − xk−1 are Sk-sparse (with Sk typically smaller than S1), based on linear measurements (yk = Ak xk + ek)_{k=1}^K, where Ak and ek denote the measurement matrix and noise, respectively. Assuming each Ak obeys the restricted isometry property (RIP) of a certain order (depending only on Sk), we show that in the absence of noise a convex program, which minimizes the weighted sum of the ℓ1-norm of successive differences subject to the linear measurement constraints, recovers the sequence (xk)_{k=1}^K exactly. This is an interesting result because this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. We supplement our theoretical analysis with simulations and an application to real video data. These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity.

5 0.23423621 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

Author: Aaron Defazio, Tibério S. Caetano

Abstract: A key problem in statistics and machine learning is the determination of network structure from data. We consider the case where the structure of the graph to be reconstructed is known to be scale-free. We show that in such cases it is natural to formulate structured sparsity inducing priors using submodular functions, and we use their Lovász extension to obtain a convex relaxation. For tractable classes such as Gaussian graphical models, this leads to a convex optimization problem that can be efficiently solved. We show that our method results in an improvement in the accuracy of reconstructed networks for synthetic data. We also show how our prior encourages scale-free reconstructions on a bioinformatics dataset.

6 0.1962485 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

7 0.15013637 240 nips-2012-Newton-Like Methods for Sparse Inverse Covariance Estimation

8 0.14866747 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

9 0.12494597 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

10 0.12353576 134 nips-2012-Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

11 0.11081449 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

12 0.11000144 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

13 0.10803243 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

14 0.10384099 20 nips-2012-A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets

15 0.10306867 319 nips-2012-Sparse Prediction with the $k$-Support Norm

16 0.10059207 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition

17 0.092780374 291 nips-2012-Reducing statistical time-series problems to binary classification

18 0.092374288 292 nips-2012-Regularized Off-Policy TD-Learning

19 0.090200707 264 nips-2012-Optimal kernel choice for large-scale two-sample tests

20 0.087902114 263 nips-2012-Optimal Regularized Dual Averaging Methods for Stochastic Optimization


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.211), (1, 0.05), (2, 0.223), (3, -0.128), (4, 0.152), (5, 0.179), (6, 0.009), (7, -0.251), (8, 0.343), (9, 0.076), (10, -0.181), (11, 0.172), (12, 0.12), (13, -0.015), (14, -0.093), (15, -0.032), (16, -0.092), (17, 0.088), (18, -0.025), (19, -0.02), (20, -0.089), (21, 0.015), (22, 0.049), (23, -0.011), (24, 0.092), (25, -0.082), (26, 0.01), (27, -0.102), (28, 0.005), (29, -0.086), (30, 0.036), (31, -0.065), (32, 0.008), (33, 0.014), (34, 0.085), (35, -0.039), (36, -0.016), (37, -0.025), (38, 0.044), (39, -0.003), (40, 0.029), (41, -0.016), (42, -0.018), (43, -0.008), (44, 0.028), (45, -0.031), (46, -0.014), (47, 0.08), (48, -0.056), (49, -0.043)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97314978 282 nips-2012-Proximal Newton-type methods for convex optimization

Author: Jason Lee, Yuekai Sun, Michael Saunders

Abstract: We seek to solve convex optimization problems in composite form: minimize_{x∈R^n} f(x) := g(x) + h(x), where g is convex and continuously differentiable and h : R^n → R is a convex but not necessarily differentiable function whose proximal mapping can be evaluated efficiently. We derive a generalization of Newton-type methods to handle such convex but nonsmooth objective functions. We prove such methods are globally convergent and achieve superlinear rates of convergence in the vicinity of an optimal solution. We also demonstrate the performance of these methods using problems of relevance in machine learning and statistics.

2 0.93913233 300 nips-2012-Scalable nonconvex inexact proximal splitting

Author: Suvrit Sra

Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is first to develop and analyze incremental nonconvex proximalsplitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization. 1

3 0.87532216 27 nips-2012-A quasi-Newton proximal splitting method

Author: Stephen Becker, Jalal Fadili

Abstract: A new result in convex analysis on the calculation of proximity operators in certain scaled norms is derived. We describe efficient implementations of the proximity calculation for a useful class of functions; the implementations exploit the piece-wise linear nature of the dual problem. The second part of the paper applies the previous result to acceleration of convex minimization problems, and leads to an elegant quasi-Newton method. The optimization method compares favorably against state-of-the-art alternatives. The algorithm has extensive applications including signal processing, sparse recovery and machine learning and classification. 1

4 0.76904708 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

Author: Demba Ba, Behtash Babadi, Patrick Purdon, Emery Brown

Abstract: We consider the problem of recovering a sequence of vectors, (xk)_{k=0}^K, for which the increments xk − xk−1 are Sk-sparse (with Sk typically smaller than S1), based on linear measurements (yk = Ak xk + ek)_{k=1}^K, where Ak and ek denote the measurement matrix and noise, respectively. Assuming each Ak obeys the restricted isometry property (RIP) of a certain order (depending only on Sk), we show that in the absence of noise a convex program, which minimizes the weighted sum of the ℓ1-norm of successive differences subject to the linear measurement constraints, recovers the sequence (xk)_{k=1}^K exactly. This is an interesting result because this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. We supplement our theoretical analysis with simulations and an application to real video data. These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity.

5 0.66349423 20 nips-2012-A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets

Author: Nicolas L. Roux, Mark Schmidt, Francis R. Bach

Abstract: We propose a new stochastic gradient method for optimizing the sum of a finite set of smooth functions, where the sum is strongly convex. While standard stochastic gradient methods converge at sublinear rates for this problem, the proposed method incorporates a memory of previous gradient values in order to achieve a linear convergence rate. In a machine learning context, numerical experiments indicate that the new algorithm can dramatically outperform standard algorithms, both in terms of optimizing the training error and reducing the test error quickly. 1

6 0.59075183 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition

7 0.52717972 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

8 0.52266514 240 nips-2012-Newton-Like Methods for Sparse Inverse Covariance Estimation

9 0.4756794 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

10 0.45519844 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

11 0.45409533 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

12 0.44243363 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

13 0.43913487 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

14 0.42372617 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

15 0.42328131 85 nips-2012-Convergence and Energy Landscape for Cheeger Cut Clustering

16 0.41459379 263 nips-2012-Optimal Regularized Dual Averaging Methods for Stochastic Optimization

17 0.40459779 35 nips-2012-Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting

18 0.37382957 134 nips-2012-Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

19 0.3628723 21 nips-2012-A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes

20 0.344787 285 nips-2012-Query Complexity of Derivative-Free Optimization


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.449), (21, 0.016), (38, 0.178), (42, 0.016), (54, 0.01), (74, 0.024), (76, 0.13), (80, 0.06), (92, 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.93682575 124 nips-2012-Factorial LDA: Sparse Multi-Dimensional Text Models

Author: Michael Paul, Mark Dredze

Abstract: Latent variable models can be enriched with a multi-dimensional structure to consider the many latent factors in a text corpus, such as topic, author perspective and sentiment. We introduce factorial LDA, a multi-dimensional model in which a document is influenced by K different factors, and each word token depends on a K-dimensional vector of latent variables. Our model incorporates structured word priors and learns a sparse product of factors. Experiments on research abstracts show that our model can learn latent factors such as research topic, scientific discipline, and focus (methods vs. applications). Our modeling improvements reduce test perplexity and improve human interpretability of the discovered factors. 1

2 0.89279211 191 nips-2012-Learning the Architecture of Sum-Product Networks Using Clustering on Variables

Author: Aaron Dennis, Dan Ventura

Abstract: The sum-product network (SPN) is a recently-proposed deep model consisting of a network of sum and product nodes, and has been shown to be competitive with state-of-the-art deep models on certain difficult tasks such as image completion. Designing an SPN network architecture that is suitable for the task at hand is an open question. We propose an algorithm for learning the SPN architecture from data. The idea is to cluster variables (as opposed to data instances) in order to identify variable subsets that strongly interact with one another. Nodes in the SPN network are then allocated towards explaining these interactions. Experimental evidence shows that learning the SPN architecture significantly improves its performance compared to using a previously-proposed static architecture. 1

same-paper 3 0.86808974 282 nips-2012-Proximal Newton-type methods for convex optimization

Author: Jason Lee, Yuekai Sun, Michael Saunders

Abstract: We seek to solve convex optimization problems in composite form: minimize_{x∈R^n} f(x) := g(x) + h(x), where g is convex and continuously differentiable and h : R^n → R is a convex but not necessarily differentiable function whose proximal mapping can be evaluated efficiently. We derive a generalization of Newton-type methods to handle such convex but nonsmooth objective functions. We prove such methods are globally convergent and achieve superlinear rates of convergence in the vicinity of an optimal solution. We also demonstrate the performance of these methods using problems of relevance in machine learning and statistics.

4 0.8482824 233 nips-2012-Multiresolution Gaussian Processes

Author: David B. Dunson, Emily B. Fox

Abstract: We propose a multiresolution Gaussian process to capture long-range, nonMarkovian dependencies while allowing for abrupt changes and non-stationarity. The multiresolution GP hierarchically couples a collection of smooth GPs, each defined over an element of a random nested partition. Long-range dependencies are captured by the top-level GP while the partition points define the abrupt changes. Due to the inherent conjugacy of the GPs, one can analytically marginalize the GPs and compute the marginal likelihood of the observations given the partition tree. This property allows for efficient inference of the partition itself, for which we employ graph-theoretic techniques. We apply the multiresolution GP to the analysis of magnetoencephalography (MEG) recordings of brain activity.

5 0.84030783 192 nips-2012-Learning the Dependency Structure of Latent Factors

Author: Yunlong He, Yanjun Qi, Koray Kavukcuoglu, Haesun Park

Abstract: In this paper, we study latent factor models with dependency structure in the latent space. We propose a general learning framework which induces sparsity on the undirected graphical model imposed on the vector of latent factors. A novel latent factor model SLFA is then proposed as a matrix factorization problem with a special regularization term that encourages collaborative reconstruction. The main benefit (novelty) of the model is that we can simultaneously learn the lowerdimensional representation for data and model the pairwise relationships between latent factors explicitly. An on-line learning algorithm is devised to make the model feasible for large-scale learning problems. Experimental results on two synthetic data and two real-world data sets demonstrate that pairwise relationships and latent factors learned by our model provide a more structured way of exploring high-dimensional data, and the learned representations achieve the state-of-the-art classification performance. 1

6 0.83966011 270 nips-2012-Phoneme Classification using Constrained Variational Gaussian Process Dynamical System

7 0.81767565 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation

8 0.74086517 12 nips-2012-A Neural Autoregressive Topic Model

9 0.72928935 332 nips-2012-Symmetric Correspondence Topic Models for Multilingual Text Analysis

10 0.69691139 342 nips-2012-The variational hierarchical EM algorithm for clustering hidden Markov models

11 0.64613849 78 nips-2012-Compressive Sensing MRI with Wavelet Tree Sparsity

12 0.64360917 19 nips-2012-A Spectral Algorithm for Latent Dirichlet Allocation

13 0.63930953 354 nips-2012-Truly Nonparametric Online Variational Inference for Hierarchical Dirichlet Processes

14 0.6326499 166 nips-2012-Joint Modeling of a Matrix with Associated Text via Latent Binary Features

15 0.62838793 69 nips-2012-Clustering Sparse Graphs

16 0.62460142 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

17 0.62281817 150 nips-2012-Hierarchical spike coding of sound

18 0.61597729 104 nips-2012-Dual-Space Analysis of the Sparse Linear Model

19 0.60965097 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

20 0.60864413 220 nips-2012-Monte Carlo Methods for Maximum Margin Supervised Topic Models