nips nips2009 nips2009-180 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Gert R. Lanckriet, Bharath K. Sriperumbudur
Abstract: The concave-convex procedure (CCCP) is a majorization-minimization algorithm that solves d.c. (difference of convex functions) programs as a sequence of convex programs. In machine learning, CCCP is extensively used in many learning algorithms like sparse support vector machines (SVMs), transductive SVMs, sparse principal component analysis, etc. Though widely used in many applications, the convergence behavior of CCCP has received little specific attention. Yuille and Rangarajan analyzed its convergence in their original paper; however, we believe the analysis is not complete. Although the convergence of CCCP can be derived from the convergence of the d.c. algorithm (DCA), its proof is more specialized and technical than actually required for the specific case of CCCP. In this paper, we follow a different reasoning and show how Zangwill’s global convergence theory of iterative algorithms provides a natural framework to prove the convergence of CCCP, allowing a more elegant and simple proof. This underlines Zangwill’s theory as a powerful and general framework to deal with the convergence issues of iterative algorithms, after also being used to prove the convergence of algorithms like expectation-maximization, generalized alternating minimization, etc. In this paper, we provide a rigorous analysis of the convergence of CCCP by addressing these questions: (i) When does CCCP find a local minimum or a stationary point of the d.c. program under consideration? (ii) When does the sequence generated by CCCP converge? We also present an open problem on the issue of local convergence of CCCP.
Reference: text
sentIndex sentText sentNum sentScore
1 (difference of convex functions) programs as a sequence of convex programs. [sent-9, score-0.262]
2 Though widely used in many applications, the convergence behavior of CCCP has received little specific attention. [sent-11, score-0.236]
3 Yuille and Rangarajan analyzed its convergence in their original paper; however, we believe the analysis is not complete. [sent-12, score-0.233]
4 Although the convergence of CCCP can be derived from the convergence of the d.c. algorithm (DCA), its proof is more specialized and technical than actually required for the specific case of CCCP. [sent-13, score-0.414]
5 In this paper, we follow a different reasoning and show how Zangwill’s global convergence theory of iterative algorithms provides a natural framework to prove the convergence of CCCP, allowing a more elegant and simple proof. [sent-16, score-0.592]
6 This underlines Zangwill’s theory as a powerful and general framework to deal with the convergence issues of iterative algorithms, after also being used to prove the convergence of algorithms like expectation-maximization, generalized alternating minimization, etc. [sent-17, score-0.575]
7 In this paper, we provide a rigorous analysis of the convergence of CCCP by addressing these questions: (i) When does CCCP find a local minimum or a stationary point of the d.c. program under consideration? [sent-18, score-0.435]
8 We also present an open problem on the issue of local convergence of CCCP. [sent-22, score-0.266]
9 (difference of convex functions) programs of the form min_x f(x) s.t. ci(x) ≤ 0, i ∈ [m], dj(x) = 0, j ∈ [p], (1) [sent-25, score-0.172]
10 where f(x) = u(x) − v(x), with u, v and the ci being real-valued convex functions and the dj being affine functions, all defined on R^n. [sent-27, score-0.348]
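The class of d.c. functions is very rich; as a standard illustration (our example, not one taken from the paper), any twice-differentiable f whose Hessian is bounded below, ∇²f(x) ⪰ −ρI for some ρ ≥ 0, admits the decomposition
\[
f(x) = \Big(f(x) + \tfrac{\rho}{2}\|x\|^{2}\Big) - \tfrac{\rho}{2}\|x\|^{2} =: u(x) - v(x),
\]
with both u and v convex, so such objectives fall within the scope of (1).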
11 The CCCP algorithm is an iterative procedure that solves the following sequence of convex programs: x(l+1) ∈ arg min_x u(x) − xT ∇v(x(l)) s.t. ci(x) ≤ 0, i ∈ [m], dj(x) = 0, j ∈ [p]. (2) [sent-33, score-0.29]
12 As can be seen from (2), the idea of CCCP is to linearize the concave part of f, which is −v, around a solution obtained in the current iterate so that u(x) − xT ∇v(x(l)) is convex in x, and therefore the non-convex program in (1) is solved as a sequence of convex programs as shown in (2). [sent-36, score-0.359]
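To make the update concrete, the following is a minimal Python sketch of the CCCP loop in (2) (our own illustration, not code from the paper); the function names, the toy objective, and the use of scipy.optimize.minimize to solve each convex subproblem are assumptions made for the example:

    import numpy as np
    from scipy.optimize import minimize

    def cccp(u, grad_v, x0, constraints=(), max_iter=100, tol=1e-8):
        """Minimize f(x) = u(x) - v(x) by repeatedly linearizing v at the current iterate."""
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad_v(x)  # gradient of the convex function v at the current iterate x(l)
            # Convex subproblem (2): minimize u(y) - y^T grad_v(x(l)) over the feasible set.
            sub = minimize(lambda y: u(y) - y @ g, x, constraints=constraints)
            step = np.linalg.norm(sub.x - x)
            x = sub.x
            if step < tol:  # iterates have stalled; stop
                break
        return x

    # Toy d.c. objective f(x) = 0.25*sum(x**4) - sum(x**2); both parts are convex.
    u = lambda x: 0.25 * np.sum(np.asarray(x) ** 4)
    grad_v = lambda x: 2.0 * np.asarray(x)
    x_star = cccp(u, grad_v, x0=[1.0, -0.5])
    # Stationary points satisfy grad u(x) = grad v(x), i.e. x**3 = 2*x componentwise,
    # so each coordinate converges to 0, sqrt(2) or -sqrt(2).

When each subproblem is solved exactly, the descent property f(x(l+1)) ≤ f(x(l)) mentioned below holds at every iteration of this loop, because the linearized objective majorizes f.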
13 The algorithm in (2) starts at some random point x(0) ∈ {x : ci(x) ≤ 0, i ∈ [m]; dj(x) = 0, j ∈ [p]}, solves the program in (2), and therefore generates a sequence {x(l)}_{l=0}^∞. [sent-42, score-0.329]
14 The goal of this paper is to study the convergence of {x(l)}_{l=0}^∞: (i) When does CCCP find a local minimum or a stationary point of the program in (1)? [sent-43, score-0.458]
15 i.e., f(x(l+1)) ≤ f(x(l)), and argued that this descent property ensures the convergence of {x(l)}_{l=0}^∞ to a minimum or saddle point of the program in (1). [sent-50, score-0.346]
16 Answering the previous questions, however, requires a rigorous proof of the convergence of CCCP that explicitly mentions the conditions under which it can happen. [sent-52, score-0.292]
17 DCA addresses a d.c. program of the form min{u(x) − v(x) : x ∈ R^n}, where u and v are assumed to be proper lower semi-continuous convex functions, a class considerably larger than that of differentiable functions. [sent-59, score-0.19]
18 Unlike in CCCP, DCA involves constructing two sets of convex programs (called the primal and dual programs) and solving them iteratively in succession such that the solution of the primal is the initialization to the dual and vice-versa. [sent-61, score-0.145]
19 [8, Theorem 3] proves the convergence of DCA for general d.c. programs. [sent-63, score-0.207]
20 In this work, we follow a fundamentally different approach and show that the convergence of CCCP, specifically, can be analyzed in a simpler and more elegant way, by relying on Zangwill’s global convergence theory of iterative algorithms. [sent-70, score-0.624]
21 The tools employed in our proof are of a completely different flavor from the ones used in the proof of DCA convergence: the DCA convergence analysis exploits d.c. duality. [sent-72, score-0.273]
22 Zangwill’s theory is a powerful and general framework to deal with the convergence issues of iterative algorithms. [sent-75, score-0.279]
23 It has also been used to prove the convergence of the expectation-maximization (EM) algorithm [29], generalized alternating minimization algorithms [12], multiplicative updates in non-negative quadratic programming [25], etc. [sent-76, score-0.349]
24 and is therefore a natural framework to analyze the convergence of CCCP in a more direct way. [sent-77, score-0.231]
25 In Section 3, we present Zangwill’s theory of global convergence, which is a general framework to analyze the convergence behavior of iterative algorithms. [sent-85, score-0.393]
26 This theory is used to address the global convergence of CCCP in Section 4. [sent-86, score-0.268]
27 This involves analyzing the fixed points of the CCCP algorithm in (2) and then showing that the fixed points are the stationary points of the program in (1). [sent-87, score-0.394]
28 1 to analyze the convergence of the constrained concave-convex procedure that was proposed by [26] to deal with d.c. programs with d.c. constraints. [sent-89, score-0.289]
29 We briefly discuss the local convergence issues of CCCP in Section 5 and conclude the section with an open question. [sent-94, score-0.266]
30 The majorization algorithm corresponding to this majorization function g updates x at iteration l by x(l+1) ∈ arg min_{x∈Ω} g(x, x(l)), (4) unless we already have x(l) ∈ arg min_{x∈Ω} g(x, x(l)), in which case the algorithm stops. [sent-103, score-0.323]
31 The majorization function g is usually constructed using Jensen’s inequality for convex functions, a first-order Taylor approximation, or the quadratic upper bound principle [1]. [sent-104, score-0.215]
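For the d.c. objective f = u − v in (1), the first-order Taylor construction applied to the convex function v yields exactly the majorizer that CCCP minimizes; a brief derivation, written here in our own notation:
\[
v(x) \;\ge\; v(x^{(l)}) + (x - x^{(l)})^{\top}\nabla v(x^{(l)})
\;\;\Longrightarrow\;\;
g(x, x^{(l)}) := u(x) - v(x^{(l)}) - (x - x^{(l)})^{\top}\nabla v(x^{(l)}) \;\ge\; u(x) - v(x) = f(x),
\]
with equality at x = x(l). Minimizing g(·, x(l)) over Ω is therefore, up to additive constants, the CCCP update (2).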
32 If Ω is a convex set, then the above procedure reduces to CCCP, which solves a sequence of convex programs. [sent-125, score-0.234]
33 This behavior can be analyzed by using the global convergence theory of iterative algorithms developed by Zangwill [31]. [sent-134, score-0.417]
34 To understand the convergence of an iterative procedure like CCCP, we need to understand the notion of a set-valued mapping, or point-to-set mapping, which is central to the theory of global convergence. [sent-137, score-0.36]
35 A point-to-set map Ψ is said to be closed at x0 ∈ X if xk → x0 as k → ∞, xk ∈ X and yk → y0 as k → ∞, yk ∈ Ψ(xk ), imply y0 ∈ Ψ(x0 ). [sent-141, score-0.537]
36 A point-to-set map Ψ is said to be closed on S ⊂ X if it is closed at every point of S. [sent-143, score-0.21]
37 A fixed point of the map Ψ : X → P(X) is a point x for which {x} = Ψ(x), whereas a generalized fixed point of Ψ is a point for which x ∈ Ψ(x). [sent-144, score-0.213]
38 Ψ is said to be uniformly compact on X if there exists a compact set H independent of x such that Ψ(x) ⊂ H for all x ∈ X. [sent-145, score-0.218]
39 A is said to be globally convergent if, for any chosen initial point x0, the sequence {xk}_{k=0}^∞ generated by xk+1 ∈ A(xk) (or a subsequence of it) converges to a point at which a necessary condition of optimality holds. [sent-156, score-0.235]
40 The property of global convergence expresses, in a sense, the certainty that the algorithm works. [sent-157, score-0.268]
41 It is important to stress that this does not imply (contrary to what the term might suggest) convergence to a global optimum for all initial points x0. [sent-158, score-0.355]
42 With the above-mentioned concepts, we now state Zangwill’s global convergence theorem [31, Convergence Theorem A, page 91]. [sent-159, score-0.448]
43 Suppose (1) All points xk are in a compact set S ⊂ X. [sent-163, score-0.325]
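For reference, the full statement of the theorem reads roughly as follows (our paraphrase, not a quotation of the statement on page 91 of [31]):
\[
\begin{aligned}
&\text{Let } A : X \to \mathcal{P}(X) \text{ generate } \{x_k\}_{k=0}^{\infty} \text{ via } x_{k+1} \in A(x_k),\ \text{let } \Gamma \subset X \text{ be the solution set, and let } \phi : X \to \mathbb{R} \text{ be continuous. Suppose:}\\
&\quad (1)\ \text{all } x_k \text{ lie in a compact set } S \subset X;\\
&\quad (2)\ \phi(y) < \phi(x)\ \ \forall\, y \in A(x) \text{ whenever } x \notin \Gamma,\ \text{and } \phi(y) \le \phi(x)\ \ \forall\, y \in A(x) \text{ whenever } x \in \Gamma;\\
&\quad (3)\ A \text{ is closed at every } x \notin \Gamma.\\
&\text{Then every limit point of } \{x_k\}_{k=0}^{\infty} \text{ lies in } \Gamma,\ \text{and } \phi(x_k) \text{ converges monotonically to } \phi(x^*) \text{ for some } x^* \in \Gamma.
\end{aligned}
\]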
44 The general idea in showing the global convergence of an algorithm A is to invoke Theorem 2 by appropriately defining φ and Γ. [sent-171, score-0.291]
45 For an algorithm A that solves the minimization problem min{f(x) : x ∈ Ω}, the solution set Γ is usually chosen to be the set of corresponding stationary points, and φ can be chosen to be the objective function itself, i.e., φ = f. [sent-172, score-0.256]
46 In Theorem 2, the convergence of φ(xk ) to φ(x∗ ) does not automatically imply the convergence of xk to x∗ . [sent-175, score-0.634]
47 Let A : X → P(X) be a point-to-set map such that A is uniformly compact, closed, and strictly monotone on X, where X is a closed subset of R^n. [sent-180, score-0.225]
48 If {xk}_{k=0}^∞ is any sequence generated by A, then all limit points will be fixed points of A, φ(xk) → φ(x∗) =: φ∗ as k → ∞, where x∗ is a fixed point, ‖xk+1 − xk‖ → 0, and either {xk}_{k=0}^∞ converges or the set of its limit points is connected. [sent-181, score-0.522]
49 Using these results on the global convergence of algorithms, [29] has studied the convergence properties of the EM algorithm, while [12] analyzed the convergence of generalized alternating minimization procedures. [sent-185, score-0.803]
50 In the following section, we use these results to analyze the convergence of CCCP. [sent-186, score-0.231]
51 Let Acccp be the point-to-set map, x(l+1) ∈ Acccp(x(l)), such that Acccp(y) = arg min{u(x) − xT ∇v(y) : x ∈ Ω}, (9) where Ω := {x : ci(x) ≤ 0, i ∈ [m], dj(x) = 0, j ∈ [p]}. [sent-190, score-0.165]
52 We now present the global convergence theorem for CCCP. [sent-192, score-0.358]
53 Then, assuming suitable constraint qualification, all the limit points of {x(l)}_{l=0}^∞ are stationary points of the d.c. program in (1). [sent-198, score-0.321]
54 In addition, lim_{l→∞} (u(x(l)) − v(x(l))) = u(x∗) − v(x∗), where x∗ is some stationary point of Acccp. [sent-201, score-0.174]
55 The idea of the proof is to show that any generalized fixed point of Acccp is a stationary point of (1), which is shown below in Lemma 5, and then use Theorem 2 to analyze the generalized fixed points. [sent-203, score-0.351]
56 Then, x∗ is a stationary point of the program in (1). [sent-206, score-0.251]
57 Then, there exist Lagrange multipliers {ηi∗}_{i=1}^m ⊂ R+ and {µj∗}_{j=1}^p ⊂ R such that the following KKT conditions hold:
   ∇u(x∗) − ∇v(x∗) + Σ_{i=1}^m ηi∗ ∇ci(x∗) + Σ_{j=1}^p µj∗ ∇dj(x∗) = 0,
   ci(x∗) ≤ 0,  ηi∗ ≥ 0,  ηi∗ ci(x∗) = 0,  ∀ i ∈ [m],
   dj(x∗) = 0,  µj∗ ∈ R,  ∀ j ∈ [p].   (10) [sent-209, score-0.322]
58 (10) is exactly the set of KKT conditions of (1), which are satisfied by (x∗, {ηi∗}, {µj∗}); therefore, x∗ is a stationary point of (1). [sent-210, score-0.206]
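The key step behind (10) is that x∗, being a generalized fixed point, solves the convex subproblem (9) with y = x∗, and the gradient of the linear term xT ∇v(x∗) is simply ∇v(x∗); in our own notation,
\[
\nabla_x\big[\,u(x) - x^{\top}\nabla v(x^*)\,\big]\Big|_{x = x^*} = \nabla u(x^*) - \nabla v(x^*),
\]
so the stationarity condition of the subproblem (9) at y = x∗ coincides with the first line of (10), i.e., with the KKT stationarity condition of the d.c. program (1).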
59 Therefore, by Theorem 2, all the limit points of {x(l)}_{l=0}^∞ are the generalized fixed points of Acccp and lim_{l→∞} (u(x(l)) − v(x(l))) = u(x∗) − v(x∗), where x∗ is some generalized fixed point of Acccp. [sent-225, score-0.279]
60 By Lemma 5, since the generalized fixed points of Acccp are stationary points of (1), the result follows. [sent-226, score-0.301]
61 Such an oscillatory behavior can be avoided if we allow Acccp to have fixed points instead of generalized fixed points. [sent-239, score-0.154]
62 With appropriate assumptions on u and v, the following stronger result on the convergence of CCCP can be obtained through Theorem 3. [sent-240, score-0.207]
63 Suppose Acccp is uniformly compact on Ω and Acccp (x) is nonempty for any x ∈ Ω. [sent-245, score-0.153]
64 Then, assuming suitable constraint qualification, all the limit points of {x(l)}_{l=0}^∞ are stationary points of the d.c. program in (1). [sent-246, score-0.321]
65 Since u and v are strictly convex, the strict descent property in (8) holds and therefore Acccp is strictly monotonic with respect to f . [sent-251, score-0.187]
66 Under the assumptions made about Acccp, Theorem 3 can be invoked, which says that all the limit points of {x(l)}_{l=0}^∞ are fixed points of Acccp, which either converge or form a connected compact set. [sent-252, score-0.258]
67 From Lemma 5, the set of fixed points of Acccp is contained in the set of stationary points of (1), and the desired result follows from Theorem 3. [sent-253, score-0.258]
68 These results explicitly provide sufficient conditions on u, v, {ci } and {dj } under which the CCCP algorithm finds a stationary point of (1) along with the convergence of the sequence generated by the algorithm. [sent-255, score-0.462]
69 From Theorem 8, it should be clear that convergence of f (x(l) ) to f ∗ does not automatically imply the convergence of x(l) to x∗ . [sent-256, score-0.442]
70 The convergence in the latter sense requires more stringent conditions like the finiteness of the set of stationary points of (1) that assume the value of f ∗ . [sent-257, score-0.438]
71 The Weierstrass theorem states: if f is a real continuous function on a compact set K ⊂ R^n, then the problem min{f(x) : x ∈ K} has an optimal solution x∗ ∈ K. [sent-258, score-0.164]
72 min_x u0(x) − v0(x) s.t. ui(x) − vi(x) ≤ 0, i ∈ [m], (12) where {ui}, {vi} are real-valued convex and differentiable functions defined on R^n. [sent-267, score-0.204]
73 min_x u0(x) − v0(x; x(l)) s.t. ui(x) − vi(x; x(l)) ≤ 0, i ∈ [m], (13) where vi(x; x(l)) := vi(x(l)) + (x − x(l))T ∇vi(x(l)). [sent-270, score-0.238]
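A minimal Python sketch of one update of (13) (our own illustration, not code from [26] or from the paper); the helper names and the use of scipy.optimize.minimize with SLSQP for the convex subproblem are assumptions made for the example:

    import numpy as np
    from scipy.optimize import minimize

    def ccp_constrained_step(u0, grad_v0, us, vs, grad_vs, x_l):
        """One update of (13): every v_i (i = 0, ..., m) is linearized at x_l."""
        x_l = np.asarray(x_l, dtype=float)
        g0 = grad_v0(x_l)

        def objective(x):
            # u0(x) - [v0(x_l) + (x - x_l)^T grad v0(x_l)], with the constant terms dropped
            return u0(x) - x @ g0

        cons = []
        for ui, vi, gvi in zip(us, vs, grad_vs):
            gi, ci = gvi(x_l), vi(x_l)
            # Convex constraint u_i(x) - v_i(x; x_l) <= 0, written as fun(x) >= 0 for SLSQP.
            cons.append({"type": "ineq",
                         "fun": lambda x, ui=ui, gi=gi, ci=ci: -(ui(x) - ci - (x - x_l) @ gi)})
        return minimize(objective, x_l, method="SLSQP", constraints=cons).x

Each v_i is replaced by its supporting hyperplane at x(l), so every subproblem is a convex program whenever the u_i are convex.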
74 Though [26, Theorem 1] provides a convergence analysis for the algorithm in (13), it is not complete because the convergence of {x(l)}_{l=0}^∞ is assumed. [sent-272, score-0.414]
75 In this subsection, we provide its convergence analysis, following an approach similar to what we did for CCCP, by considering a point-to-set map Bccp associated with the iterative algorithm in (13), where x(l+1) ∈ Bccp(x(l)). [sent-273, score-0.279]
76 In Theorem 10, we provide the global convergence result for the constrained concave-convex procedure, the counterpart of Theorem 4 for CCCP. [sent-274, score-0.306]
77 Then, x∗ is a stationary point of the program in (12). [sent-279, score-0.251]
78 which are exactly the KKT conditions for (12), satisfied by (x∗, {ηi∗}); therefore, x∗ is a stationary point of (12). [sent-282, score-0.206]
79 Suppose Bccp is uniformly compact on Ω := {x : ui (x) − vi (x) ≤ 0, i ∈ [m]} and Bccp (x) is nonempty for any x ∈ Ω. [sent-287, score-0.244]
80 Then, assuming suitable constraint qualification, all the limit points of {x(l)}_{l=0}^∞ are stationary points of the d.c. program in (12). [sent-288, score-0.321]
81 In addition, lim_{l→∞} (u0(x(l)) − v0(x(l))) = u0(x∗) − v0(x∗), where x∗ is some stationary point of Bccp. [sent-291, score-0.174]
82 [26, Theorem 1] has proved the descent property, similar to that of (5), which simply follows from the linear majorization idea and therefore the descent property in condition (2) of Theorem 2 holds. [sent-295, score-0.179]
83 5 On the local convergence of CCCP: An open problem
The study so far has been devoted to the global convergence analysis of CCCP and the constrained concave-convex procedure. [sent-297, score-0.572]
84 As mentioned before, we say an algorithm is globally convergent if, for any chosen starting point x0, the sequence {xk}_{k=0}^∞ generated by xk+1 ∈ A(xk) converges to a point for which a necessary condition of optimality holds. [sent-298, score-0.161]
85 In the results so far, we have shown that all the limit points of any sequence generated by CCCP (resp. [sent-299, score-0.149]
86 its constrained version) are the stationary points (local extrema or saddle points) of the program in (1) (resp. (12)). [sent-300, score-0.314]
87 This is the question of local convergence that needs to be addressed. [sent-304, score-0.241]
88 [24] has studied the local convergence of bound optimization algorithms (of which CCCP is an example) to compare the rate of convergence of such methods to that of gradient and second-order methods. [sent-305, score-0.49]
89 They showed that depending on the curvature of u and v, CCCP will exhibit either quasi-Newton behavior with fast, typically superlinear convergence or extremely slow, first-order convergence behavior. [sent-307, score-0.443]
90 3] provides a way to study the local convergence of iterative algorithms. [sent-311, score-0.313]
91 A few remarks are in order regarding the usage of Proposition 11 to study the local convergence of CCCP. [sent-315, score-0.241]
92 Note that Proposition 11 treats Ψ as a point-to-point map which can be obtained by choosing u and v to be strictly convex so that x(l+1) is the unique minimizer of (2). [sent-316, score-0.141]
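In the smooth, unconstrained case this map has an explicit Jacobian (our own remark, not a claim made in the paper): the update (2) is then characterized by ∇u(x(l+1)) = ∇v(x(l)), and implicit differentiation at a fixed point x∗ gives
\[
\nabla^{2}u\big(\Psi(x)\big)\,\Psi'(x) = \nabla^{2}v(x)
\;\;\Longrightarrow\;\;
\Psi'(x^{*}) = \big[\nabla^{2}u(x^{*})\big]^{-1}\nabla^{2}v(x^{*}),
\]
so that, with u strictly convex and v convex, ρ(Ψ′(x∗)) < 1 holds exactly when ∇²v(x∗) ≺ ∇²u(x∗), i.e., when the d.c. objective has positive-definite curvature at x∗; this is in the spirit of the curvature-based discussion in [24].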
93 Therefore, the desired result of local convergence with at least a linear rate of convergence is obtained if we show that ρ(Ψ′(x∗)) < 1. [sent-318, score-0.448]
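The classical result behind this criterion is the point-of-attraction (Ostrowski) theorem from Ortega and Rheinboldt's monograph, paraphrased here in our own words:
\[
\text{if } \Psi \text{ is differentiable at a fixed point } x^{*} \text{ and } \rho\big(\Psi'(x^{*})\big) < 1, \text{ then } x^{*} \text{ is a point of attraction of the iteration } x^{(l+1)} = \Psi(x^{(l)}),
\]
that is, the iterates converge to x∗ from every starting point in some neighborhood of x∗, with an asymptotic rate governed by ρ(Ψ′(x∗)).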
94 On the other hand, the local convergence behavior of DCA has been proved for two important classes of d.c. programs. [sent-321, score-0.27]
95 In this work, we analyze its global convergence behavior by using results from the global convergence theory of iterative algorithms. [sent-328, score-0.661]
96 We explicitly mention the conditions under which any sequence generated by CCCP converges to a stationary point of a d.c. program. [sent-329, score-0.277]
97 The proposed approach allows an elegant and direct proof and is fundamentally different from the highly technical proof for the convergence of DCA, which implies convergence for CCCP. [sent-332, score-0.531]
98 It illustrates the power and generality of Zangwill’s global convergence theory as a framework for proving the convergence of iterative algorithms. [sent-333, score-0.573]
99 We also briefly discuss the local convergence of CCCP and present an open question, whose resolution would settle the local convergence behavior of CCCP. [sent-334, score-0.536]
100 Sufficient conditions for the convergence of monotonic mathematical programming algorithms. [sent-496, score-0.298]
wordName wordTfidf (topN-words)
[('cccp', 0.579), ('acccp', 0.555), ('convergence', 0.207), ('xk', 0.192), ('stationary', 0.14), ('bccp', 0.139), ('majorization', 0.123), ('dca', 0.121), ('mm', 0.105), ('zangwill', 0.099), ('quali', 0.093), ('theorem', 0.09), ('program', 0.077), ('programs', 0.077), ('ci', 0.075), ('compact', 0.074), ('iterative', 0.072), ('kkt', 0.069), ('convex', 0.068), ('dj', 0.065), ('global', 0.061), ('points', 0.059), ('closed', 0.051), ('nonempty', 0.049), ('sequence', 0.049), ('vi', 0.049), ('rn', 0.048), ('strict', 0.047), ('hunter', 0.046), ('rangarajan', 0.046), ('differentiable', 0.045), ('yuille', 0.044), ('generalized', 0.043), ('ui', 0.042), ('limit', 0.041), ('dinh', 0.04), ('hoai', 0.04), ('liml', 0.04), ('said', 0.04), ('strictly', 0.039), ('constrained', 0.038), ('pham', 0.037), ('suppose', 0.037), ('lemma', 0.036), ('point', 0.034), ('local', 0.034), ('map', 0.034), ('monotonic', 0.034), ('transductive', 0.034), ('svms', 0.034), ('convergent', 0.033), ('proof', 0.033), ('conditions', 0.032), ('surrogate', 0.032), ('em', 0.031), ('minorization', 0.031), ('ortega', 0.031), ('ostrowski', 0.031), ('weierstrass', 0.031), ('uniformly', 0.03), ('solves', 0.029), ('satis', 0.029), ('behavior', 0.029), ('minimization', 0.028), ('fundamentally', 0.028), ('descent', 0.028), ('imply', 0.028), ('min', 0.027), ('proposition', 0.027), ('questions', 0.026), ('analyzed', 0.026), ('proving', 0.026), ('open', 0.025), ('programming', 0.025), ('arg', 0.025), ('converge', 0.025), ('inequality', 0.024), ('alternating', 0.024), ('analyze', 0.024), ('xed', 0.023), ('elegant', 0.023), ('gert', 0.023), ('invoke', 0.023), ('oscillatory', 0.023), ('optimality', 0.023), ('converges', 0.022), ('constraint', 0.022), ('nonconvex', 0.022), ('jolla', 0.022), ('sriperumbudur', 0.022), ('subsequence', 0.022), ('trust', 0.022), ('algorithms', 0.022), ('unconstrained', 0.021), ('optimization', 0.02), ('concave', 0.02), ('closure', 0.02), ('monotone', 0.02), ('procedure', 0.02), ('rigorous', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999928 180 nips-2009-On the Convergence of the Concave-Convex Procedure
Author: Gert R. Lanckriet, Bharath K. Sriperumbudur
Abstract: The concave-convex procedure (CCCP) is a majorization-minimization algorithm that solves d.c. (difference of convex functions) programs as a sequence of convex programs. In machine learning, CCCP is extensively used in many learning algorithms like sparse support vector machines (SVMs), transductive SVMs, sparse principal component analysis, etc. Though widely used in many applications, the convergence behavior of CCCP has not gotten a lot of specific attention. Yuille and Rangarajan analyzed its convergence in their original paper, however, we believe the analysis is not complete. Although the convergence of CCCP can be derived from the convergence of the d.c. algorithm (DCA), its proof is more specialized and technical than actually required for the specific case of CCCP. In this paper, we follow a different reasoning and show how Zangwill’s global convergence theory of iterative algorithms provides a natural framework to prove the convergence of CCCP, allowing a more elegant and simple proof. This underlines Zangwill’s theory as a powerful and general framework to deal with the convergence issues of iterative algorithms, after also being used to prove the convergence of algorithms like expectation-maximization, generalized alternating minimization, etc. In this paper, we provide a rigorous analysis of the convergence of CCCP by addressing these questions: (i) When does CCCP find a local minimum or a stationary point of the d.c. program under consideration? (ii) When does the sequence generated by CCCP converge? We also present an open problem on the issue of local convergence of CCCP.
2 0.083095565 147 nips-2009-Matrix Completion from Noisy Entries
Author: Raghunandan Keshavan, Andrea Montanari, Sewoong Oh
Abstract: Given a matrix M of low-rank, we consider the problem of reconstructing it from noisy observations of a small, random subset of its entries. The problem arises in a variety of applications, from collaborative filtering (the ‘Netflix problem’) to structure-from-motion and positioning. We study a low complexity algorithm introduced in [1], based on a combination of spectral techniques and manifold optimization, that we call here O PT S PACE. We prove performance guarantees that are order-optimal in a number of circumstances. 1
3 0.077131435 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference
Author: Marius Leordeanu, Martial Hebert, Rahul Sukthankar
Abstract: Graph matching and MAP inference are essential problems in computer vision and machine learning. We introduce a novel algorithm that can accommodate both problems and solve them efficiently. Recent graph matching algorithms are based on a general quadratic programming formulation, which takes in consideration both unary and second-order terms reflecting the similarities in local appearance as well as in the pairwise geometric relationships between the matched features. This problem is NP-hard, therefore most algorithms find approximate solutions by relaxing the original problem. They find the optimal continuous solution of the modified problem, ignoring during optimization the original discrete constraints. Then the continuous solution is quickly binarized at the end, but very little attention is put into this final discretization step. In this paper we argue that the stage in which a discrete solution is found is crucial for good performance. We propose an efficient algorithm, with climbing and convergence properties, that optimizes in the discrete domain the quadratic score, and it gives excellent results either by itself or by starting from the solution returned by any graph matching algorithm. In practice it outperforms state-or-the art graph matching algorithms and it also significantly improves their performance if used in combination. When applied to MAP inference, the algorithm is a parallel extension of Iterated Conditional Modes (ICM) with climbing and convergence properties that make it a compelling alternative to the sequential ICM. In our experiments on MAP inference our algorithm proved its effectiveness by significantly outperforming [13], ICM and Max-Product Belief Propagation. 1
4 0.057203159 33 nips-2009-Analysis of SVM with Indefinite Kernels
Author: Yiming Ying, Colin Campbell, Mark Girolami
Abstract: The recent introduction of indefinite SVM by Luss and d’Aspremont [15] has effectively demonstrated SVM classification with a non-positive semi-definite kernel (indefinite kernel). This paper studies the properties of the objective function introduced there. In particular, we show that the objective function is continuously differentiable and its gradient can be explicitly computed. Indeed, we further show that its gradient is Lipschitz continuous. The main idea behind our analysis is that the objective function is smoothed by the penalty term, in its saddle (min-max) representation, measuring the distance between the indefinite kernel matrix and the proxy positive semi-definite one. Our elementary result greatly facilitates the application of gradient-based algorithms. Based on our analysis, we further develop Nesterov’s smooth optimization approach [17, 18] for indefinite SVM which has an optimal convergence rate for smooth problems. Experiments on various benchmark datasets validate our analysis and demonstrate the efficiency of our proposed algorithms.
5 0.055987298 222 nips-2009-Sparse Estimation Using General Likelihoods and Non-Factorial Priors
Author: David P. Wipf, Srikantan S. Nagarajan
Abstract: Finding maximally sparse representations from overcomplete feature dictionaries frequently involves minimizing a cost function composed of a likelihood (or data fit) term and a prior (or penalty function) that favors sparsity. While typically the prior is factorial, here we examine non-factorial alternatives that have a number of desirable properties relevant to sparse estimation and are easily implemented using an efficient and globally-convergent, reweighted 1 -norm minimization procedure. The first method under consideration arises from the sparse Bayesian learning (SBL) framework. Although based on a highly non-convex underlying cost function, in the context of canonical sparse estimation problems, we prove uniform superiority of this method over the Lasso in that, (i) it can never do worse, and (ii) for any dictionary and sparsity profile, there will always exist cases where it does better. These results challenge the prevailing reliance on strictly convex penalty functions for finding sparse solutions. We then derive a new non-factorial variant with similar properties that exhibits further performance improvements in some empirical tests. For both of these methods, as well as traditional factorial analogs, we demonstrate the effectiveness of reweighted 1 -norm algorithms in handling more general sparse estimation problems involving classification, group feature selection, and non-negativity constraints. As a byproduct of this development, a rigorous reformulation of sparse Bayesian classification (e.g., the relevance vector machine) is derived that, unlike the original, involves no approximation steps and descends a well-defined objective function. 1
6 0.054950364 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers
7 0.053770386 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms
8 0.050410375 208 nips-2009-Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization
9 0.050124452 55 nips-2009-Compressed Least-Squares Regression
10 0.049755365 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization
11 0.049418334 245 nips-2009-Thresholding Procedures for High Dimensional Variable Selection and Statistical Estimation
12 0.048702255 67 nips-2009-Directed Regression
13 0.046718337 128 nips-2009-Learning Non-Linear Combinations of Kernels
14 0.046614829 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior
15 0.044466335 223 nips-2009-Sparse Metric Learning via Smooth Optimization
16 0.044091668 26 nips-2009-Adaptive Regularization for Transductive Support Vector Machine
17 0.043444175 9 nips-2009-A Game-Theoretic Approach to Hypergraph Clustering
18 0.043433867 60 nips-2009-Convergent Temporal-Difference Learning with Arbitrary Smooth Function Approximation
19 0.042940237 229 nips-2009-Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data
20 0.042770498 187 nips-2009-Particle-based Variational Inference for Continuous Systems
topicId topicWeight
[(0, -0.131), (1, 0.111), (2, 0.018), (3, 0.035), (4, -0.005), (5, -0.007), (6, 0.03), (7, -0.017), (8, 0.038), (9, 0.009), (10, -0.011), (11, -0.011), (12, 0.013), (13, 0.024), (14, 0.014), (15, -0.012), (16, -0.024), (17, -0.047), (18, -0.043), (19, -0.038), (20, -0.032), (21, 0.013), (22, 0.014), (23, 0.02), (24, -0.041), (25, -0.032), (26, 0.03), (27, -0.073), (28, -0.055), (29, 0.062), (30, 0.098), (31, -0.019), (32, -0.038), (33, -0.069), (34, 0.051), (35, 0.013), (36, 0.055), (37, -0.066), (38, -0.086), (39, 0.027), (40, -0.041), (41, 0.026), (42, -0.09), (43, 0.003), (44, 0.098), (45, 0.012), (46, -0.053), (47, -0.053), (48, -0.122), (49, -0.11)]
simIndex simValue paperId paperTitle
same-paper 1 0.92690325 180 nips-2009-On the Convergence of the Concave-Convex Procedure
Author: Gert R. Lanckriet, Bharath K. Sriperumbudur
Abstract: The concave-convex procedure (CCCP) is a majorization-minimization algorithm that solves d.c. (difference of convex functions) programs as a sequence of convex programs. In machine learning, CCCP is extensively used in many learning algorithms like sparse support vector machines (SVMs), transductive SVMs, sparse principal component analysis, etc. Though widely used in many applications, the convergence behavior of CCCP has not gotten a lot of specific attention. Yuille and Rangarajan analyzed its convergence in their original paper, however, we believe the analysis is not complete. Although the convergence of CCCP can be derived from the convergence of the d.c. algorithm (DCA), its proof is more specialized and technical than actually required for the specific case of CCCP. In this paper, we follow a different reasoning and show how Zangwill’s global convergence theory of iterative algorithms provides a natural framework to prove the convergence of CCCP, allowing a more elegant and simple proof. This underlines Zangwill’s theory as a powerful and general framework to deal with the convergence issues of iterative algorithms, after also being used to prove the convergence of algorithms like expectation-maximization, generalized alternating minimization, etc. In this paper, we provide a rigorous analysis of the convergence of CCCP by addressing these questions: (i) When does CCCP find a local minimum or a stationary point of the d.c. program under consideration? (ii) When does the sequence generated by CCCP converge? We also present an open problem on the issue of local convergence of CCCP.
2 0.70603472 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference
Author: Marius Leordeanu, Martial Hebert, Rahul Sukthankar
Abstract: Graph matching and MAP inference are essential problems in computer vision and machine learning. We introduce a novel algorithm that can accommodate both problems and solve them efficiently. Recent graph matching algorithms are based on a general quadratic programming formulation, which takes in consideration both unary and second-order terms reflecting the similarities in local appearance as well as in the pairwise geometric relationships between the matched features. This problem is NP-hard, therefore most algorithms find approximate solutions by relaxing the original problem. They find the optimal continuous solution of the modified problem, ignoring during optimization the original discrete constraints. Then the continuous solution is quickly binarized at the end, but very little attention is put into this final discretization step. In this paper we argue that the stage in which a discrete solution is found is crucial for good performance. We propose an efficient algorithm, with climbing and convergence properties, that optimizes in the discrete domain the quadratic score, and it gives excellent results either by itself or by starting from the solution returned by any graph matching algorithm. In practice it outperforms state-or-the art graph matching algorithms and it also significantly improves their performance if used in combination. When applied to MAP inference, the algorithm is a parallel extension of Iterated Conditional Modes (ICM) with climbing and convergence properties that make it a compelling alternative to the sequential ICM. In our experiments on MAP inference our algorithm proved its effectiveness by significantly outperforming [13], ICM and Max-Product Belief Propagation. 1
3 0.571787 147 nips-2009-Matrix Completion from Noisy Entries
Author: Raghunandan Keshavan, Andrea Montanari, Sewoong Oh
Abstract: Given a matrix M of low-rank, we consider the problem of reconstructing it from noisy observations of a small, random subset of its entries. The problem arises in a variety of applications, from collaborative filtering (the ‘Netflix problem’) to structure-from-motion and positioning. We study a low complexity algorithm introduced in [1], based on a combination of spectral techniques and manifold optimization, that we call here O PT S PACE. We prove performance guarantees that are order-optimal in a number of circumstances. 1
4 0.5528661 33 nips-2009-Analysis of SVM with Indefinite Kernels
Author: Yiming Ying, Colin Campbell, Mark Girolami
Abstract: The recent introduction of indefinite SVM by Luss and d’Aspremont [15] has effectively demonstrated SVM classification with a non-positive semi-definite kernel (indefinite kernel). This paper studies the properties of the objective function introduced there. In particular, we show that the objective function is continuously differentiable and its gradient can be explicitly computed. Indeed, we further show that its gradient is Lipschitz continuous. The main idea behind our analysis is that the objective function is smoothed by the penalty term, in its saddle (min-max) representation, measuring the distance between the indefinite kernel matrix and the proxy positive semi-definite one. Our elementary result greatly facilitates the application of gradient-based algorithms. Based on our analysis, we further develop Nesterov’s smooth optimization approach [17, 18] for indefinite SVM which has an optimal convergence rate for smooth problems. Experiments on various benchmark datasets validate our analysis and demonstrate the efficiency of our proposed algorithms.
5 0.48903152 79 nips-2009-Efficient Recovery of Jointly Sparse Vectors
Author: Liang Sun, Jun Liu, Jianhui Chen, Jieping Ye
Abstract: We consider the reconstruction of sparse signals in the multiple measurement vector (MMV) model, in which the signal, represented as a matrix, consists of a set of jointly sparse vectors. MMV is an extension of the single measurement vector (SMV) model employed in standard compressive sensing (CS). Recent theoretical studies focus on the convex relaxation of the MMV problem based on the (2, 1)-norm minimization, which is an extension of the well-known 1-norm minimization employed in SMV. However, the resulting convex optimization problem in MMV is significantly much more difficult to solve than the one in SMV. Existing algorithms reformulate it as a second-order cone programming (SOCP) or semidefinite programming (SDP) problem, which is computationally expensive to solve for problems of moderate size. In this paper, we propose a new (dual) reformulation of the convex optimization problem in MMV and develop an efficient algorithm based on the prox-method. Interestingly, our theoretical analysis reveals the close connection between the proposed reformulation and multiple kernel learning. Our simulation studies demonstrate the scalability of the proposed algorithm.
6 0.47048536 116 nips-2009-Information-theoretic lower bounds on the oracle complexity of convex optimization
7 0.46809152 141 nips-2009-Local Rules for Global MAP: When Do They Work ?
8 0.46731976 222 nips-2009-Sparse Estimation Using General Likelihoods and Non-Factorial Priors
9 0.45452702 11 nips-2009-A General Projection Property for Distribution Families
10 0.45312908 239 nips-2009-Submodularity Cuts and Applications
11 0.44934052 138 nips-2009-Learning with Compressible Priors
12 0.44865268 208 nips-2009-Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization
13 0.44336623 245 nips-2009-Thresholding Procedures for High Dimensional Variable Selection and Statistical Estimation
14 0.44055742 67 nips-2009-Directed Regression
15 0.4328787 55 nips-2009-Compressed Least-Squares Regression
16 0.43112493 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry
17 0.40039939 223 nips-2009-Sparse Metric Learning via Smooth Optimization
18 0.3974933 94 nips-2009-Fast Learning from Non-i.i.d. Observations
19 0.3962402 92 nips-2009-Fast Graph Laplacian Regularized Kernel Learning via Semidefinite–Quadratic–Linear Programming
20 0.38934684 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms
topicId topicWeight
[(7, 0.011), (24, 0.083), (25, 0.082), (35, 0.038), (36, 0.134), (39, 0.024), (58, 0.117), (61, 0.034), (70, 0.242), (71, 0.05), (81, 0.015), (86, 0.054), (91, 0.017)]
simIndex simValue paperId paperTitle
1 0.8727982 24 nips-2009-Adapting to the Shifting Intent of Search Queries
Author: Umar Syed, Aleksandrs Slivkins, Nina Mishra
Abstract: Search engines today present results that are often oblivious to recent shifts in intent. For example, the meaning of the query ‘independence day’ shifts in early July to a US holiday and to a movie around the time of the box office release. While no studies exactly quantify the magnitude of intent-shifting traffic, studies suggest that news events, seasonal topics, pop culture, etc account for 1/2 the search queries. This paper shows that the signals a search engine receives can be used to both determine that a shift in intent happened, as well as find a result that is now more relevant. We present a meta-algorithm that marries a classifier with a bandit algorithm to achieve regret that depends logarithmically on the number of query impressions, under certain assumptions. We provide strong evidence that this regret is close to the best achievable. Finally, via a series of experiments, we demonstrate that our algorithm outperforms prior approaches, particularly as the amount of intent-shifting traffic increases. 1
same-paper 2 0.82207274 180 nips-2009-On the Convergence of the Concave-Convex Procedure
Author: Gert R. Lanckriet, Bharath K. Sriperumbudur
Abstract: The concave-convex procedure (CCCP) is a majorization-minimization algorithm that solves d.c. (difference of convex functions) programs as a sequence of convex programs. In machine learning, CCCP is extensively used in many learning algorithms like sparse support vector machines (SVMs), transductive SVMs, sparse principal component analysis, etc. Though widely used in many applications, the convergence behavior of CCCP has not gotten a lot of specific attention. Yuille and Rangarajan analyzed its convergence in their original paper, however, we believe the analysis is not complete. Although the convergence of CCCP can be derived from the convergence of the d.c. algorithm (DCA), its proof is more specialized and technical than actually required for the specific case of CCCP. In this paper, we follow a different reasoning and show how Zangwill’s global convergence theory of iterative algorithms provides a natural framework to prove the convergence of CCCP, allowing a more elegant and simple proof. This underlines Zangwill’s theory as a powerful and general framework to deal with the convergence issues of iterative algorithms, after also being used to prove the convergence of algorithms like expectation-maximization, generalized alternating minimization, etc. In this paper, we provide a rigorous analysis of the convergence of CCCP by addressing these questions: (i) When does CCCP find a local minimum or a stationary point of the d.c. program under consideration? (ii) When does the sequence generated by CCCP converge? We also present an open problem on the issue of local convergence of CCCP.
3 0.67915887 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers
Author: Sahand Negahban, Bin Yu, Martin J. Wainwright, Pradeep K. Ravikumar
Abstract: High-dimensional statistical inference deals with models in which the the number of parameters p is comparable to or larger than the sample size n. Since it is usually impossible to obtain consistent procedures unless p/n → 0, a line of recent work has studied models with various types of structure (e.g., sparse vectors; block-structured matrices; low-rank matrices; Markov assumptions). In such settings, a general approach to estimation is to solve a regularized convex program (known as a regularized M -estimator) which combines a loss function (measuring how well the model fits the data) with some regularization function that encourages the assumed structure. The goal of this paper is to provide a unified framework for establishing consistency and convergence rates for such regularized M estimators under high-dimensional scaling. We state one main theorem and show how it can be used to re-derive several existing results, and also to obtain several new results on consistency and convergence rates. Our analysis also identifies two key properties of loss and regularization functions, referred to as restricted strong convexity and decomposability, that ensure the corresponding regularized M -estimators have fast convergence rates. 1
4 0.66905791 207 nips-2009-Robust Nonparametric Regression with Metric-Space Valued Output
Author: Matthias Hein
Abstract: Motivated by recent developments in manifold-valued regression we propose a family of nonparametric kernel-smoothing estimators with metric-space valued output including several robust versions. Depending on the choice of the output space and the metric the estimator reduces to partially well-known procedures for multi-class classification, multivariate regression in Euclidean space, regression with manifold-valued output and even some cases of structured output learning. In this paper we focus on the case of regression with manifold-valued input and output. We show pointwise and Bayes consistency for all estimators in the family for the case of manifold-valued output and illustrate the robustness properties of the estimators with experiments. 1
5 0.66677898 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning
Author: Chonghai Hu, Weike Pan, James T. Kwok
Abstract: Regularized risk minimization often involves non-smooth optimization, either because of the loss function (e.g., hinge loss) or the regularizer (e.g., ℓ1 -regularizer). Gradient methods, though highly scalable and easy to implement, are known to converge slowly. In this paper, we develop a novel accelerated gradient method for stochastic optimization while still preserving their computational simplicity and scalability. The proposed algorithm, called SAGE (Stochastic Accelerated GradiEnt), exhibits fast convergence rates on stochastic composite optimization with convex or strongly convex objectives. Experimental results show that SAGE is faster than recent (sub)gradient methods including FOLOS, SMIDAS and SCD. Moreover, SAGE can also be extended for online learning, resulting in a simple algorithm but with the best regret bounds currently known for these problems. 1
6 0.66489494 173 nips-2009-Nonparametric Greedy Algorithms for the Sparse Learning Problem
7 0.66365349 94 nips-2009-Fast Learning from Non-i.i.d. Observations
8 0.66278023 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms
9 0.66228062 76 nips-2009-Efficient Learning using Forward-Backward Splitting
10 0.6614958 37 nips-2009-Asymptotically Optimal Regularization in Smooth Parametric Models
11 0.66058683 169 nips-2009-Nonlinear Learning using Local Coordinate Coding
12 0.65937859 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data
13 0.65786189 97 nips-2009-Free energy score space
14 0.65784544 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry
15 0.65724832 71 nips-2009-Distribution-Calibrated Hierarchical Classification
16 0.65711445 225 nips-2009-Sparsistent Learning of Varying-coefficient Models with Structural Changes
17 0.65615362 36 nips-2009-Asymptotic Analysis of MAP Estimation via the Replica Method and Compressed Sensing
18 0.65507179 157 nips-2009-Multi-Label Prediction via Compressed Sensing
19 0.65423226 98 nips-2009-From PAC-Bayes Bounds to KL Regularization
20 0.6541698 147 nips-2009-Matrix Completion from Noisy Entries