nips nips2012 nips2012-300 knowledge-graph by maker-knowledge-mining

300 nips-2012-Scalable nonconvex inexact proximal splitting


Source: pdf

Author: Suvrit Sra

Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is the first to develop and analyze incremental nonconvex proximal-splitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Scalable nonconvex inexact proximal splitting Suvrit Sra Max Planck Institute for Intelligent Systems 72076 Tübingen, Germany suvrit@tuebingen. [sent-1, score-0.521]

2 Abstract We study a class of large-scale, nonsmooth, and nonconvex optimization problems. [sent-3, score-0.227]

3 In particular, we focus on nonconvex problems with composite objectives. [sent-4, score-0.32]

4 This class includes the extensively studied class of convex composite objective problems as a subclass. [sent-5, score-0.184]

5 To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. [sent-6, score-0.465]

6 Within our new framework we derive both batch and incremental proximal splitting algorithms. [sent-7, score-0.321]

7 To our knowledge, our work is the first to develop and analyze incremental nonconvex proximal-splitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. [sent-8, score-0.475]

8 We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization. [sent-9, score-0.102]

9 Problem (1) is a natural but far-reaching generalization of composite objective convex problems, which enjoy tremendous importance in machine learning. [sent-14, score-0.184]

10 Although convex formulations are extremely useful, for many difficult problems a nonconvex formulation is natural. [sent-17, score-0.279]

11 Our framework solves (1) by “splitting” the task into smooth (gradient) and nonsmooth (proximal) parts. [sent-21, score-0.102]

12 This capability proves critical to obtaining a scalable, incremental-gradient variant of NIPS, which, to our knowledge, is the first incremental proximal-splitting method for nonconvex problems. [sent-23, score-0.363]

13 Notably, it does not require the errors to vanish in the limit, which is a more realistic assumption as often one has limited to no control over computational errors inherent to a complex system. [sent-25, score-0.111]

14 In accord with the errors, NIPS also does not require stepsizes (learning rates) to shrink to zero. [sent-26, score-0.084]

15 In contrast, most incremental-gradient methods [5] and stochastic gradient algorithms [16] do assume that the computational errors and stepsizes decay to zero. [sent-27, score-0.201]

16 Our analysis builds on the remarkable work of Solodov [29], who studied the simpler setting of differentiable nonconvex problems (which correspond to h ≡ 0 in (1)). [sent-29, score-0.267]

17 NIPS is strictly more general: unlike [29] it solves a non-differentiable problem by allowing a nonsmooth regularizer h ≢ 0, and this h is tackled by invoking proximal-splitting [8]. [sent-30, score-0.102]

18 It retains the simplicity of gradient-projection while handling the nonsmooth regularizer h via its proximity operator. [sent-32, score-0.167]

19 This approach is especially attractive because for several important choices of h, efficient implementations of the associated proximity operators exist [2, 22, 23]. [sent-33, score-0.093]

20 For convex problems, an alternative to proximal splitting is the subgradient method; similarly, for nonconvex problems one may use a generalized subgradient method [7, 12]. [sent-34, score-0.532]

21 However, as in the convex case, the use of subgradients has drawbacks: it fails to exploit the composite structure, and even when using sparsity promoting regularizers it does not generate intermediate sparse iterates [11]. [sent-35, score-0.235]

22 Among batch nonconvex splitting methods, an early paper is [14]. [sent-36, score-0.325]

23 More recently, in his pioneering paper on convex composite minimization, Nesterov [26] also briefly discussed nonconvex problems. [sent-37, score-0.372]

24 The authors of [1] have introduced a generic method for nonconvex nonsmooth problems based on Kurdyka-Łojasiewicz theory, but their framework, too, hinges on descent. [sent-40, score-0.329]

25 In general, the insistence on strict descent and exact gradients makes many of the methods unsuitable for incremental, stochastic, or online variants, all of which usually lead to nonmonotone objective values, especially due to inexact gradients. [sent-42, score-0.224]

26 Among nonmonotonic methods that apply to (1), we are aware of the generalized gradient-type algorithms of [31] and the stochastic generalized gradient methods of [12]. [sent-43, score-0.15]

27 Both methods, however, are analogous to the usual subgradient-based algorithms that fail to exploit the composite objective structure, unlike proximal-splitting methods. [sent-44, score-0.132]

28 But proximal-splitting methods do not apply out-of-the-box to (1): nonconvexity raises significant obstructions, especially because nonmonotonic descent in the objective function values is allowed and inexact gradients might be used. [sent-45, score-0.256]

29 Overcoming these obstructions to achieve a scalable non-descent based method that allows inexact gradients is what makes our NIPS framework novel. [sent-46, score-0.157]

30 The proximity operator for g, indexed by η > 0, is the nonlinear map P^g_η(y) := argmin_{x ∈ X} { (1/(2η))‖x − y‖² + g(x) }. [sent-52, score-0.095]

31 It is also key to Rockafellar’s classic proximal point algorithm [27], and it arises in a host of proximal-splitting methods [2, 3, 8, 11], most notably in forward-backward splitting (FBS) [8]. [sent-58, score-0.189]

32 It minimizes convex composite objective functions by alternating between “forward” (gradient) steps and “backward” (proximal) steps. [sent-60, score-0.184]
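
To make the forward-backward splitting (FBS) step concrete, here is a minimal Python/NumPy sketch of the forward (gradient) / backward (proximal) alternation. The quadratic loss and the choice h(x) = λ‖x‖₁ (whose proximity operator is soft-thresholding) are illustrative assumptions introduced here, not the paper's specific setup.

```python
import numpy as np

def prox_l1(v, t):
    """Proximity operator of t * ||.||_1, i.e., soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def fbs(grad_f, prox_h, x0, eta, iters=200):
    """Forward-backward splitting: a forward gradient step on the smooth
    part f followed by a backward proximal step on the nonsmooth part h."""
    x = x0.copy()
    for _ in range(iters):
        x = prox_h(x - eta * grad_f(x), eta)
    return x

# Illustrative use: f(x) = 0.5 * ||Ax - b||^2, h(x) = lam * ||x||_1.
rng = np.random.default_rng(0)
A, b, lam = rng.standard_normal((20, 50)), rng.standard_normal(20), 0.1
grad_f = lambda x: A.T @ (A @ x - b)
eta = 1.0 / np.linalg.norm(A, 2) ** 2          # stepsize <= 1/L with L = ||A||^2
x_hat = fbs(grad_f, lambda v, t: prox_l1(v, lam * t), np.zeros(50), eta)
```

Any other regularizer with a cheap proximity operator can be dropped in for prox_l1; the cheap backward step is what keeps the scheme scalable.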

33 Therefore, to tackle nonconvex f we must take a different approach. [sent-66, score-0.227]

34 Thus, the key challenge is: how to retain the algorithmic simplicity of FBS and allow nonconvex losses, without sacrificing scalability? [sent-69, score-0.227]

35 We address this challenge by introducing the following inexact proximal-splitting iteration: xk+1 = P^g_{ηk}(xk − ηk ∇f(xk) + ηk e(xk)), k = 0, 1, . . . , (7) [sent-70, score-0.105]

36 In (7), e(xk) models the computational errors in computing the gradient ∇f(xk). [sent-73, score-0.097]

37 Condition (8) is weaker than the typical vanishing-error requirements Σ_k ηk‖e(xk)‖ < ∞ and lim_{k→∞} ηk‖e(xk)‖ = 0, which are stipulated by most analyses of methods with gradient errors [4, 5]. [sent-75, score-0.159]

38 We will, however, show that the iterates produced by (7) do progress towards reasonable inexact stationary points. [sent-77, score-0.159]

39 We note in passing that even if we assume the simpler case of vanishing errors, NIPS is still the first nonconvex proximal-splitting framework that does not insist on monotonicity, which complicates convergence analysis but ultimately proves crucial to scalability. [sent-78, score-0.393]

40 Algorithm 1 Inexact Nonconvex Proximal Splitting (NIPS). Input: operator P^g_η and a sequence {ηk} satisfying c ≤ lim inf_k ηk, lim sup_k ηk ≤ min{1, 2/L − c}, 0 < c < 1/L. [sent-79, score-0.097]
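
As a rough illustration of Algorithm 1 (not the authors' reference implementation), the sketch below runs iteration (7) with a fixed stepsize bounded away from zero and an explicitly injected, nonvanishing gradient error; the Gaussian error model and the helper names are assumptions made here for illustration. The proximal residual ρ defined just below is included as a natural monitoring quantity.

```python
import numpy as np

def nips_batch(grad_f, prox_g, x0, eta, iters=300, err_level=0.0, rng=None):
    """Sketch of iteration (7): x_{k+1} = prox_{eta, g}(x_k - eta*(grad f(x_k) - e(x_k))),
    with a stepsize bounded away from zero and a possibly nonvanishing error e(x_k),
    modeled here as Gaussian noise of fixed magnitude (an illustrative assumption)."""
    rng = rng or np.random.default_rng(0)
    x = x0.copy()
    for _ in range(iters):
        e = err_level * rng.standard_normal(x.shape)   # nonvanishing gradient error
        x = prox_g(x - eta * (grad_f(x) - e), eta)     # inexact forward-backward step
    return x

def prox_residual(grad_f, prox_g, x):
    """Inexact-stationarity measure: rho(x) = ||x - prox_{1, g}(x - grad f(x))||."""
    return np.linalg.norm(x - prox_g(x - grad_f(x), 1.0))
```

With err_level = 0 this reduces to an ordinary proximal-gradient loop; the point of the analysis below is that neither vanishing errors nor vanishing stepsizes are required.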

41 The algorithm iterates (7) with the inexact gradient ∇f(xk) − e(xk). Convergence analysis: We begin by characterizing inexact stationarity. [sent-81, score-0.105]

42 This equation helps us define a measure of inexact stationarity: the proximal residual ρ(x) := ‖x − P^g_1(x − ∇f(x))‖. [sent-84, score-0.252]

43 Define the functions pg(η) := (1/η)‖P^g_η(y − ηz) − y‖ and qg(η) := ‖P^g_η(y − ηz) − y‖. (16) [sent-105, score-0.139]

44 Then, pg(η) is a decreasing function of η, and qg(η) an increasing function of η. [sent-106, score-0.139]
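
The monotonicity claim can be checked numerically; the snippet below does so for the illustrative choice g = ‖·‖₁ (so that P^g_η is soft-thresholding) at a random pair (y, z). The choice of g and the random data are assumptions made only for this check.

```python
import numpy as np

# Numerical check: p_g(eta) decreasing and q_g(eta) increasing in eta,
# for g = ||.||_1, whose prox map is soft-thresholding (illustrative choice).
prox = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
rng = np.random.default_rng(1)
y, z = rng.standard_normal(5), rng.standard_normal(5)
for eta in (0.1, 0.5, 1.0, 2.0, 5.0):
    q = np.linalg.norm(prox(y - eta * z, eta) - y)   # q_g(eta)
    print(f"eta={eta:4.1f}   p_g={q / eta:8.4f}   q_g={q:8.4f}")
```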

45 Consider the "deflected" proximal objective mg(x, η; y, z) := ⟨z, x − y⟩ + (1/(2η))‖x − y‖² + g(x), for some y, z ∈ X. (17) [sent-110, score-0.133]

46 Associate to the objective mg the deflected Moreau envelope Eg(η) := inf_{x ∈ X} mg(x, η; y, z), (18) whose infimum is attained at the unique point P^g_η(y − ηz). [sent-111, score-0.133]

47 Similarly, define eg(γ) := Eg(1/γ); this function is concave in γ, as it is a pointwise infimum (indexed by x) of functions linear in γ. [sent-116, score-0.091]

48 Thus, its derivative e′g(γ) = ½‖P^g_{1/γ}(x − γ^{-1}y) − x‖² = ½ qg(1/γ)² is a decreasing function of γ. [sent-121, score-0.206]

49 Set η = 1/γ to conclude the argument about qg (η). [sent-122, score-0.097]

50 Let xk+1, xk, ηk, and X be as in (7), and suppose that ηk‖e(xk)‖ ≤ ε(xk) holds. [sent-127, score-0.813]

51 Then, Φ(xk) − Φ(xk+1) ≥ ((2 − Lηk)/(2ηk))‖xk+1 − xk‖² − (1/ηk) ε(xk)‖xk+1 − xk‖. (21) [sent-128, score-1.626]

52 Plugging this into (21) and rearranging yields a bound involving ⟨∇f(xk), xk+1 − xk⟩ and ⟨sk+1, xk − xk+1⟩, where sk+1 denotes a subgradient of g at xk+1. [sent-131, score-2.439]

53 Next we further bound (20) by deriving two-sided bounds on ‖xk+1 − xk‖. [sent-135, score-0.813]

54 Let xk+1, xk, and ε(xk) be as before; also let c and ηk satisfy (9). [sent-137, score-0.813]

55 Then, c ρ(xk) − ε(xk) ≤ ‖xk+1 − xk‖ ≤ ρ(xk) + ε(xk). [sent-138, score-0.813]

56 First observe from Lemma 4 that for ηk > 0, if 1 ≤ ηk then qg(1) ≤ qg(ηk), and if ηk ≤ 1 then pg(1) ≤ pg(ηk) = (1/ηk) qg(ηk). [sent-140, score-0.278]

57 Using (25), the triangle inequality, and Lemma 3, we have min{1, ηk} qg(1) = min{1, ηk} ρ(xk) ≤ ‖P^g_{ηk}(xk − ηk ∇f(xk)) − xk‖ ≤ ‖xk+1 − xk‖ + ‖xk+1 − P^g_{ηk}(xk − ηk ∇f(xk))‖ ≤ ‖xk+1 − xk‖ + ηk‖e(xk)‖ ≤ ‖xk+1 − xk‖ + ε(xk). [sent-141, score-3.349]

58 From (9) it follows that for sufficiently large k we have ‖xk+1 − xk‖ ≥ c ρ(xk) − ε(xk). [sent-142, score-0.813]

59 For the upper bound note that ‖xk+1 − xk‖ ≤ ‖xk − P^g_{ηk}(xk − ηk ∇f(xk))‖ + ‖P^g_{ηk}(xk − ηk ∇f(xk)) − xk+1‖ ≤ max{1, ηk} ρ(xk) + ηk‖e(xk)‖ ≤ ρ(xk) + ε(xk). [sent-143, score-1.626]

60 Let xk , xk+1 , ηk , and c be as above and k sufficiently large so that c and ηk satisfy (9). [sent-146, score-0.813]

61 There exists a limit point x∗ of the sequence xk, and a constant K > 0, such that ρ(x∗) ≤ K ε(x∗). [sent-154, score-0.813]

62 If Φ(xk) converges, then for every limit point x∗ of xk it holds that ρ(x∗) ≤ K ε(x∗). [sent-155, score-0.813]

63 The statement of the theorem is written in a conditional form, because nonvanishing errors e(x) prevent us from making a stronger statement. [sent-162, score-0.156]

64 In particular, once the iterates enter a region where the residual norm falls below the error threshold, the behavior of xk may be arbitrary. [sent-163, score-0.89]

65 This, however, is a small price to pay for having the added flexibility of nonvanishing errors. [sent-164, score-0.11]

66 Under the stronger assumption of vanishing errors (and diminishing stepsizes), we can also obtain guarantees of convergence to exact stationary points. [sent-165, score-0.107]

67 3 Scaling up NIPS: incremental variant. We now apply NIPS to the large-scale setting, where we have composite objectives of the form Φ(x) := Σ_{t=1}^T ft(x) + g(x), (27) where each ft : Rn → R is a C^1_{Lt}(X) function. [sent-166, score-0.594]

68 It is well-known that for such decomposable objectives it can be advantageous to replace the full gradient Σ_t ∇ft(x) by an incremental gradient ∇fσ(t)(x), where σ(t) is some suitable index. [sent-168, score-0.406]

69 Nonconvex incremental methods for differentiable problems have been extensively analyzed. [sent-169, score-0.147]

70 However, when g(x) ≢ 0, the only incremental methods that we are aware of are the stochastic generalized gradient methods of [12] or the generalized gradient methods of [31]. [sent-172, score-0.277]

71 As previously mentioned, both of these fail to exploit the composite structure of the objective function, a disadvantage even in the convex case [11]. [sent-173, score-0.184]

72 In stark contrast, we do exploit the composite structure of (27). [sent-174, score-0.093]

73 Formally, we propose the following incremental nonconvex proximal-splitting iteration: xk+1 = M(xk − ηk Σ_{t=1}^T ∇ft(xk,t)), xk,1 = xk, k = 0, 1, . . . , [sent-175, score-2.157]

74 xk,t+1 = O(xk,t − ηk ∇ft(xk,t)), t = 1, . . . (28) [sent-178, score-0.197]

75 For example, when X = Rn , g(x) ≡ 0, M = O = Id, and ηk → 0, then (28) reduces to the classic incremental gradient method (IGM) [4], and to the IGM of [30], if lim ηk = η > 0. [sent-182, score-0.185]

76 If X is a closed convex set, g(x) ≡ 0, M is the orthogonal projection onto X, O = Id, and ηk → 0, then iteration (28) reduces to (projected) IGM [4, 5]. [sent-183, score-0.097]
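
A minimal sketch of the incremental iteration (28) follows; it assumes per-component gradient callables and uses the admissible choice O = Id for the inner step and M = the proximity operator of g for the outer step (projections or other operators discussed above fit the same template).

```python
import numpy as np

def nips_incremental(grad_fts, prox_g, x0, eta, epochs=50):
    """Sketch of iteration (28): sweep through the components f_t, maintaining an
    inner iterate x_{k,t}, then apply the operator M to the aggregated step.
    grad_fts is a list of callables returning grad f_t (an assumption of this sketch)."""
    x = x0.copy()
    for _ in range(epochs):
        z = x.copy()                         # x_{k,1} = x_k
        total = np.zeros_like(x)
        for grad_ft in grad_fts:
            g_t = grad_ft(z)
            total += g_t
            z = z - eta * g_t                # inner step: x_{k,t+1} = O(x_{k,t} - eta*grad f_t), O = Id
        x = prox_g(x - eta * total, eta)     # outer step with M = prox of g
    return x
```

Only one component gradient is evaluated per inner step, which is what lets the scheme scale with the number of components T.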

77 We begin by rewriting (28) in a form that matches the main iteration (7): xk+1 = P^g_{ηk}(xk − ηk Σ_{t=1}^T ∇ft(xk,t)) = P^g_{ηk}(xk − ηk Σ_{t=1}^T ∇ft(xk) + ηk Σ_{t=1}^T (∇ft(xk) − ∇ft(xk,t))) = P^g_{ηk}(xk − ηk Σ_{t=1}^T ∇ft(xk) + ηk e(xk)). (29) [sent-190, score-3.45]

78 From the definition of a proximity operator (5), we have the inequality ½‖xk,t+1 − xk,t + ηk ∇ft(xk,t)‖² + ηk g(xk,t+1) ≤ ½‖ηk ∇ft(xk,t)‖² + ηk g(xk,t), which implies ½‖xk,t+1 − xk,t‖² ≤ ηk ⟨∇ft(xk,t), xk,t − xk,t+1⟩ + ηk (g(xk,t) − g(xk,t+1)). [sent-197, score-0.686]

79 Therefore, ½‖xk,t+1 − xk,t‖² ≤ ηk ⟨st, xk,t − xk,t+1⟩ + ηk ⟨∇ft(xk,t), xk,t − xk,t+1⟩ ≤ ηk (‖st‖ + ‖∇ft(xk,t)‖)‖xk,t − xk,t+1‖, whence ‖xk,t+1 − xk,t‖ ≤ 2ηk (‖st‖ + ‖∇ft(xk,t)‖). Lemma 9 proves helpful in bounding the overall error. [sent-199, score-0.501]

80 If for all xk ∈ X, ‖∇ft(xk)‖ ≤ M and ‖∂g(xk)‖ ≤ G, then there exists a constant K1 > 0 such that ‖e(xk)‖ ≤ K1. [sent-202, score-1.01]

81 To bound the error of using xk,t instead of xk, first define the terms βt := ‖∇ft(xk,t) − ∇ft(xk)‖, t = 1, . . . [sent-204, score-1.207]

82 Since β1 = 0, unrolling (32) and using ‖∇ft(x)‖ + ‖st‖ ≤ M + G bounds each term (1 + 2ηk L)^{T−t} βt, which leads to Σ_t βt ≤ C1 (T − 1)(M + G) =: K1, where C1 absorbs the factor (1 + 2ηk L)^{T−1}. [sent-209, score-0.236]

83 Thus, the error norm ‖e(xk)‖ is bounded from above by a constant, whereby it satisfies the requirement (8), making the incremental NIPS method (28) a special case of the general NIPS framework. [sent-210, score-0.143]

84 We do, however, provide an illustrative application of NIPS to a challenging nonconvex problem: sparsity-regularized low-rank matrix factorization min_{X, A ≥ 0} ½‖Y − XA‖²_F + ψ0(X) + Σ_{t=1}^T ψt(at), (33) where Y ∈ R^{m×T}, X ∈ R^{m×K} and A ∈ R^{K×T}, with a1, . . . , aT the columns of A. [sent-213, score-0.292]

85 Problem (33) generalizes the well-known nonnegative matrix factorization (NMF) problem of [20] by permitting arbitrary Y (not necessarily nonnegative), and adding regularizers on X and A. [sent-217, score-0.091]

86 A related class of problems was studied in [23], but with a crucial difference: the formulation in [23] does not allow nonsmooth regularizers on X. [sent-218, score-0.146]

87 On a more theoretical note, [23] considered stochastic-gradient like methods whose analysis requires computational errors and stepsizes to vanish, whereas our method is deterministic and allows nonvanishing stepsizes and errors. [sent-220, score-0.324]

88 We eliminate A and consider min_X φ(X) := Σ_{t=1}^T ft(X) + g(X), where g(X) := ψ0(X) + δ(X | ≥ 0), (34) and where each ft(X) for 1 ≤ t ≤ T is defined as ft(X) := min_a ½‖yt − Xa‖² + gt(a), (35) where gt(a) := ψt(a) + δ(a | ≥ 0). [sent-222, score-0.591]

89 For simplicity, assume that (35) attains its unique minimum, say a∗; then ft(X) is differentiable and we have ∇_X ft(X) = (Xa∗ − yt)(a∗)ᵀ. [sent-223, score-0.434]
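
For the matrix-factorization application, the sketch below computes ∇_X ft(X) = (Xa∗ − yt)(a∗)ᵀ by approximately solving the inner problem (35); the specific choice gt(a) = λ‖a‖₁ + δ(a ≥ 0) and the parameter λ are illustrative assumptions of this sketch, not necessarily the exact setting used in the paper's experiments.

```python
import numpy as np

def solve_inner(X, y, lam, iters=300):
    """Approximately solve (35) with g_t(a) = lam*||a||_1 + indicator(a >= 0)
    (an assumed choice), via projected proximal-gradient steps."""
    L = np.linalg.norm(X, 2) ** 2 + 1e-12          # Lipschitz constant of the smooth part
    a = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (X @ a - y)
        a = np.maximum(a - (grad + lam) / L, 0.0)  # soft-threshold by lam/L, then project onto a >= 0
    return a

def grad_ft(X, y, lam):
    """Gradient of f_t at X, per (35): grad_X f_t(X) = (X a* - y) (a*)^T."""
    a_star = solve_inner(X, y, lam)
    return np.outer(X @ a_star - y, a_star)
```

These per-column gradients are exactly the components ∇ft(X) that the incremental iteration (28) sweeps over.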

90 Figure 2 shows numerical results comparing the stochastic generalized gradient (SGGD) algorithm of [12] against NIPS, when started at the same point. [sent-249, score-0.095]

91 5 Discussion. We presented a new framework called NIPS, which solves a broad class of nonconvex composite objective problems. [sent-266, score-0.359]

92 NIPS permits nonvanishing computational errors, which can be practically useful. [sent-267, score-0.11]

93 We specialized NIPS to also obtain a scalable incremental version. [sent-268, score-0.128]

94 For example, batch and incremental convex FBS, convex and nonconvex gradient projection, and the proximal-point algorithm, among others. [sent-271, score-0.514]

95 Theoretically, however, the most exciting open problem resulting from this paper is: extend NIPS in a scalable way when even the nonsmooth part is nonconvex. [sent-272, score-0.123]

96 Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. [sent-279, score-0.146]

97 Stochastic generalized gradient method for nonconvex nonsmooth stochastic optimization. [sent-359, score-0.424]

98 A generalized proximal point algorithm for certain non-convex minimization problems. [sent-377, score-0.14]

99 Convergence properties of backpropagation for neural nets via theory of stochastic gradient methods. [sent-391, score-0.089]

100 Incremental gradient algorithms with stepsizes bounded away from zero. [sent-488, score-0.135]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('xk', 0.813), ('nonconvex', 0.227), ('ft', 0.197), ('proximal', 0.116), ('nonvanishing', 0.11), ('sggd', 0.11), ('incremental', 0.107), ('inexact', 0.105), ('nonsmooth', 0.102), ('spams', 0.097), ('qg', 0.097), ('composite', 0.093), ('eg', 0.091), ('stepsizes', 0.084), ('fbs', 0.083), ('nmf', 0.073), ('splitting', 0.073), ('nips', 0.071), ('proximity', 0.065), ('convex', 0.052), ('gradient', 0.051), ('factorization', 0.047), ('igm', 0.047), ('errors', 0.046), ('lemma', 0.046), ('pg', 0.042), ('cl', 0.041), ('differentiable', 0.04), ('id', 0.04), ('st', 0.039), ('stepsize', 0.039), ('objective', 0.039), ('lsc', 0.038), ('mg', 0.038), ('atlab', 0.036), ('cbcl', 0.036), ('vanishing', 0.035), ('rn', 0.035), ('yale', 0.033), ('stationarity', 0.031), ('attouch', 0.031), ('dmg', 0.031), ('gafni', 0.031), ('insistence', 0.031), ('nonmonotonic', 0.031), ('obstructions', 0.031), ('proximalsplitting', 0.031), ('solodov', 0.031), ('residual', 0.031), ('descent', 0.03), ('operator', 0.03), ('convergence', 0.03), ('xa', 0.029), ('rand', 0.029), ('penalized', 0.029), ('proves', 0.029), ('iterates', 0.028), ('operators', 0.028), ('moreau', 0.028), ('rockafellar', 0.028), ('variants', 0.027), ('lim', 0.027), ('iteration', 0.026), ('regularizers', 0.026), ('stationary', 0.026), ('suvrit', 0.025), ('fukushima', 0.025), ('supk', 0.025), ('batch', 0.025), ('combettes', 0.024), ('invoke', 0.024), ('generalized', 0.024), ('sk', 0.023), ('complicates', 0.023), ('obviously', 0.022), ('ected', 0.022), ('scalable', 0.021), ('seconds', 0.021), ('stochastic', 0.02), ('subgradient', 0.02), ('unconstrained', 0.02), ('projection', 0.019), ('sra', 0.019), ('vanish', 0.019), ('strict', 0.019), ('mum', 0.019), ('minx', 0.019), ('crucial', 0.018), ('sparsity', 0.018), ('nonnegative', 0.018), ('inf', 0.018), ('derivative', 0.018), ('whereby', 0.018), ('sparse', 0.018), ('norm', 0.018), ('monotonicity', 0.018), ('backpropagation', 0.018), ('plots', 0.018), ('nesterov', 0.017), ('mairal', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 300 nips-2012-Scalable nonconvex inexact proximal splitting

Author: Suvrit Sra

Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is the first to develop and analyze incremental nonconvex proximal-splitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization.

2 0.48830867 282 nips-2012-Proximal Newton-type methods for convex optimization

Author: Jason Lee, Yuekai Sun, Michael Saunders

Abstract: We seek to solve convex optimization problems in composite form: minimize_{x ∈ Rn} f(x) := g(x) + h(x), where g is convex and continuously differentiable and h : Rn → R is a convex but not necessarily differentiable function whose proximal mapping can be evaluated efficiently. We derive a generalization of Newton-type methods to handle such convex but nonsmooth objective functions. We prove such methods are globally convergent and achieve superlinear rates of convergence in the vicinity of an optimal solution. We also demonstrate the performance of these methods using problems of relevance in machine learning and statistics.

3 0.37017182 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

Author: Demba Ba, Behtash Babadi, Patrick Purdon, Emery Brown

Abstract: We consider the problem of recovering a sequence of vectors, (xk)_{k=0}^K, for which the increments xk − xk−1 are Sk-sparse (with Sk typically smaller than S1), based on linear measurements (yk = Ak xk + ek)_{k=1}^K, where Ak and ek denote the measurement matrix and noise, respectively. Assuming each Ak obeys the restricted isometry property (RIP) of a certain order—depending only on Sk—we show that in the absence of noise a convex program, which minimizes the weighted sum of the ℓ1-norm of successive differences subject to the linear measurement constraints, recovers the sequence (xk)_{k=1}^K exactly. This is an interesting result because this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. We supplement our theoretical analysis with simulations and an application to real video data. These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity.

4 0.26524007 27 nips-2012-A quasi-Newton proximal splitting method

Author: Stephen Becker, Jalal Fadili

Abstract: A new result in convex analysis on the calculation of proximity operators in certain scaled norms is derived. We describe efficient implementations of the proximity calculation for a useful class of functions; the implementations exploit the piece-wise linear nature of the dual problem. The second part of the paper applies the previous result to acceleration of convex minimization problems, and leads to an elegant quasi-Newton method. The optimization method compares favorably against state-of-the-art alternatives. The algorithm has extensive applications including signal processing, sparse recovery and machine learning and classification. 1

5 0.26324677 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

Author: Nicholas Ruozzi

Abstract: Sudderth, Wainwright, and Willsky conjectured that the Bethe approximation corresponding to any fixed point of the belief propagation algorithm over an attractive, pairwise binary graphical model provides a lower bound on the true partition function. In this work, we resolve this conjecture in the affirmative by demonstrating that, for any graphical model with binary variables whose potential functions (not necessarily pairwise) are all log-supermodular, the Bethe partition function always lower bounds the true partition function. The proof of this result follows from a new variant of the “four functions” theorem that may be of independent interest. 1

6 0.15180667 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition

7 0.14907669 20 nips-2012-A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets

8 0.14384542 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

9 0.12979063 60 nips-2012-Bayesian nonparametric models for ranked data

10 0.12517273 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

11 0.11803151 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

12 0.10689329 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation

13 0.10410269 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

14 0.097904302 35 nips-2012-Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting

15 0.094233342 244 nips-2012-Nonconvex Penalization Using Laplace Exponents and Concave Conjugates

16 0.080742002 351 nips-2012-Transelliptical Component Analysis

17 0.079630025 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

18 0.073623747 164 nips-2012-Iterative Thresholding Algorithm for Sparse Inverse Covariance Estimation

19 0.072076783 134 nips-2012-Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

20 0.070970766 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.163), (1, 0.052), (2, 0.204), (3, -0.085), (4, 0.133), (5, 0.13), (6, 0.032), (7, -0.246), (8, 0.391), (9, 0.119), (10, -0.197), (11, 0.188), (12, 0.152), (13, -0.082), (14, -0.172), (15, -0.104), (16, -0.073), (17, 0.098), (18, -0.036), (19, 0.012), (20, -0.187), (21, -0.025), (22, -0.015), (23, -0.041), (24, 0.066), (25, -0.014), (26, -0.03), (27, -0.065), (28, 0.02), (29, -0.088), (30, 0.004), (31, -0.06), (32, 0.034), (33, 0.077), (34, 0.066), (35, 0.025), (36, 0.036), (37, -0.011), (38, 0.017), (39, -0.029), (40, 0.011), (41, -0.023), (42, 0.025), (43, -0.055), (44, 0.045), (45, 0.051), (46, -0.012), (47, 0.042), (48, 0.004), (49, -0.067)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99094737 300 nips-2012-Scalable nonconvex inexact proximal splitting

Author: Suvrit Sra

Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is the first to develop and analyze incremental nonconvex proximal-splitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization.

2 0.88025194 282 nips-2012-Proximal Newton-type methods for convex optimization

Author: Jason Lee, Yuekai Sun, Michael Saunders

Abstract: We seek to solve convex optimization problems in composite form: minimize_{x ∈ Rn} f(x) := g(x) + h(x), where g is convex and continuously differentiable and h : Rn → R is a convex but not necessarily differentiable function whose proximal mapping can be evaluated efficiently. We derive a generalization of Newton-type methods to handle such convex but nonsmooth objective functions. We prove such methods are globally convergent and achieve superlinear rates of convergence in the vicinity of an optimal solution. We also demonstrate the performance of these methods using problems of relevance in machine learning and statistics.

3 0.83448875 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

Author: Demba Ba, Behtash Babadi, Patrick Purdon, Emery Brown

Abstract: We consider the problem of recovering a sequence of vectors, (xk)_{k=0}^K, for which the increments xk − xk−1 are Sk-sparse (with Sk typically smaller than S1), based on linear measurements (yk = Ak xk + ek)_{k=1}^K, where Ak and ek denote the measurement matrix and noise, respectively. Assuming each Ak obeys the restricted isometry property (RIP) of a certain order—depending only on Sk—we show that in the absence of noise a convex program, which minimizes the weighted sum of the ℓ1-norm of successive differences subject to the linear measurement constraints, recovers the sequence (xk)_{k=1}^K exactly. This is an interesting result because this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. We supplement our theoretical analysis with simulations and an application to real video data. These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity.

4 0.77457494 27 nips-2012-A quasi-Newton proximal splitting method

Author: Stephen Becker, Jalal Fadili

Abstract: A new result in convex analysis on the calculation of proximity operators in certain scaled norms is derived. We describe efficient implementations of the proximity calculation for a useful class of functions; the implementations exploit the piece-wise linear nature of the dual problem. The second part of the paper applies the previous result to acceleration of convex minimization problems, and leads to an elegant quasi-Newton method. The optimization method compares favorably against state-of-the-art alternatives. The algorithm has extensive applications including signal processing, sparse recovery and machine learning and classification. 1

5 0.60194749 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition

Author: Francesco Dinuzzo, Bernhard Schölkopf

Abstract: The representer theorem is a property that lies at the foundation of regularization theory and kernel methods. A class of regularization functionals is said to admit a linear representer theorem if every member of the class admits minimizers that lie in the finite dimensional subspace spanned by the representers of the data. A recent characterization states that certain classes of regularization functionals with differentiable regularization term admit a linear representer theorem for any choice of the data if and only if the regularization term is a radial nondecreasing function. In this paper, we extend such result by weakening the assumptions on the regularization term. In particular, the main result of this paper implies that, for a sufficiently large family of regularization functionals, radial nondecreasing functions are the only lower semicontinuous regularization terms that guarantee existence of a representer theorem for any choice of the data. 1

6 0.59350693 20 nips-2012-A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets

7 0.59237319 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

8 0.43504718 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

9 0.40823817 35 nips-2012-Adaptive Learning of Smoothing Functions: Application to Electricity Load Forecasting

10 0.38967618 290 nips-2012-Recovery of Sparse Probability Measures via Convex Programming

11 0.38436064 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

12 0.34073597 85 nips-2012-Convergence and Energy Landscape for Cheeger Cut Clustering

13 0.3283577 60 nips-2012-Bayesian nonparametric models for ranked data

14 0.32230774 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation

15 0.32210961 29 nips-2012-Accelerated Training for Matrix-norm Regularization: A Boosting Approach

16 0.2687901 263 nips-2012-Optimal Regularized Dual Averaging Methods for Stochastic Optimization

17 0.26493505 240 nips-2012-Newton-Like Methods for Sparse Inverse Covariance Estimation

18 0.25583327 34 nips-2012-Active Learning of Multi-Index Function Models

19 0.25000161 134 nips-2012-Finite Sample Convergence Rates of Zero-Order Stochastic Optimization Methods

20 0.23778148 21 nips-2012-A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.086), (21, 0.324), (38, 0.143), (42, 0.026), (54, 0.03), (55, 0.017), (64, 0.013), (74, 0.02), (76, 0.116), (80, 0.085), (92, 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.92191941 195 nips-2012-Learning visual motion in recurrent neural networks

Author: Marius Pachitariu, Maneesh Sahani

Abstract: We present a dynamic nonlinear generative model for visual motion based on a latent representation of binary-gated Gaussian variables. Trained on sequences of images, the model learns to represent different movement directions in different variables. We use an online approximate inference scheme that can be mapped to the dynamics of networks of neurons. Probed with drifting grating stimuli and moving bars of light, neurons in the model show patterns of responses analogous to those of direction-selective simple cells in primary visual cortex. Most model neurons also show speed tuning and respond equally well to a range of motion directions and speeds aligned to the constraint line of their respective preferred speed. We show how these computations are enabled by a specific pattern of recurrent connections learned by the model. 1

same-paper 2 0.87792677 300 nips-2012-Scalable nonconvex inexact proximal splitting

Author: Suvrit Sra

Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is the first to develop and analyze incremental nonconvex proximal-splitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization.

3 0.87135738 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

Author: Sanja Fidler, Sven Dickinson, Raquel Urtasun

Abstract: This paper addresses the problem of category-level 3D object detection. Given a monocular image, our aim is to localize the objects in 3D by enclosing them with tight oriented 3D bounding boxes. We propose a novel approach that extends the well-acclaimed deformable part-based model [1] to reason in 3D. Our model represents an object class as a deformable 3D cuboid composed of faces and parts, which are both allowed to deform with respect to their anchors on the 3D box. We model the appearance of each face in fronto-parallel coordinates, thus effectively factoring out the appearance variation induced by viewpoint. Our model reasons about face visibility patterns called aspects. We train the cuboid model jointly and discriminatively and share weights across all aspects to attain efficiency. Inference then entails sliding and rotating the box in 3D and scoring object hypotheses. While for inference we discretize the search space, the variables are continuous in our model. We demonstrate the effectiveness of our approach in indoor and outdoor scenarios, and show that our approach significantly outperforms the state-of-the-art in both 2D [1] and 3D object detection [2].

4 0.82711673 60 nips-2012-Bayesian nonparametric models for ranked data

Author: Francois Caron, Yee W. Teh

Abstract: We develop a Bayesian nonparametric extension of the popular Plackett-Luce choice model that can handle an infinite number of choice items. Our framework is based on the theory of random atomic measures, with the prior specified by a gamma process. We derive a posterior characterization and a simple and effective Gibbs sampler for posterior simulation. We develop a time-varying extension of our model, and apply it to the New York Times lists of weekly bestselling books. 1

5 0.80315816 105 nips-2012-Dynamic Pruning of Factor Graphs for Maximum Marginal Prediction

Author: Christoph H. Lampert

Abstract: We study the problem of maximum marginal prediction (MMP) in probabilistic graphical models, a task that occurs, for example, as the Bayes optimal decision rule under a Hamming loss. MMP is typically performed as a two-stage procedure: one estimates each variable’s marginal probability and then forms a prediction from the states of maximal probability. In this work we propose a simple yet effective technique for accelerating MMP when inference is sampling-based: instead of the above two-stage procedure we directly estimate the posterior probability of each decision variable. This allows us to identify the point of time when we are sufficiently certain about any individual decision. Whenever this is the case, we dynamically prune the variables we are confident about from the underlying factor graph. Consequently, at any time only samples of variables whose decision is still uncertain need to be created. Experiments in two prototypical scenarios, multi-label classification and image inpainting, show that adaptive sampling can drastically accelerate MMP without sacrificing prediction accuracy. 1

6 0.79440826 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

7 0.76646972 123 nips-2012-Exponential Concentration for Mutual Information Estimation with Application to Forests

8 0.71760118 23 nips-2012-A lattice filter model of the visual pathway

9 0.70891345 18 nips-2012-A Simple and Practical Algorithm for Differentially Private Data Release

10 0.69741064 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

11 0.67947042 190 nips-2012-Learning optimal spike-based representations

12 0.66361821 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

13 0.66169137 94 nips-2012-Delay Compensation with Dynamical Synapses

14 0.6604135 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

15 0.65453911 152 nips-2012-Homeostatic plasticity in Bayesian spiking networks as Expectation Maximization with posterior constraints

16 0.65339452 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

17 0.64687127 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

18 0.64385659 24 nips-2012-A mechanistic model of early sensory processing based on subtracting sparse representations

19 0.64058137 137 nips-2012-From Deformations to Parts: Motion-based Segmentation of 3D Objects

20 0.63706577 239 nips-2012-Neuronal Spike Generation Mechanism as an Oversampling, Noise-shaping A-to-D converter