nips nips2013 nips2013-268 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Stefanie Jegelka, Francis Bach, Suvrit Sra
Abstract: Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. Consequently, there is need for efficient optimization procedures for submodular functions, especially for minimization problems. While general submodular minimization is challenging, we propose a new method that exploits existing decomposability of submodular functions. In contrast to previous approaches, our method is neither approximate, nor impractical, nor does it need any cumbersome parameter tuning. Moreover, it is easy to implement and parallelize. A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution. This method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction. In our experiments, we illustrate the benefits of our method on two image segmentation tasks. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Consequently, there is need for efficient optimization procedures for submodular functions, especially for minimization problems. [sent-2, score-0.598]
2 While general submodular minimization is challenging, we propose a new method that exploits existing decomposability of submodular functions. [sent-3, score-1.064]
3 A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution. [sent-6, score-0.748]
4 A set function F : 2^V → R on a set V is submodular if for all subsets S, T ⊆ V, we have F(S ∪ T) + F(S ∩ T) ≤ F(S) + F(T). [sent-10, score-0.491]
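As a self-contained illustration of this definition (not taken from the paper), the sketch below checks the submodular inequality exhaustively on a tiny ground set; the coverage function used as the example is a standard submodular function, and all names are illustrative.

```python
from itertools import combinations

def is_submodular(F, V, tol=1e-12):
    """Exhaustively verify F(S | T) + F(S & T) <= F(S) + F(T) over all subset pairs of V.
    Only feasible for tiny ground sets; intended purely as a sanity check."""
    subsets = [frozenset(c) for r in range(len(V) + 1) for c in combinations(V, r)]
    return all(F(S | T) + F(S & T) <= F(S) + F(T) + tol
               for S in subsets for T in subsets)

# Example: a coverage function F(S) = |union of areas covered by elements of S|.
areas = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d"}}
F = lambda S: len(set().union(*(areas[i] for i in S))) if S else 0
print(is_submodular(F, [1, 2, 3]))   # coverage functions are submodular -> True
```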
5 Several problems in these areas can be phrased as submodular optimization tasks: notable examples include graph cut-based image segmentation [7], sensor placement [30], or document summarization [31]. [sent-12, score-0.618]
6 The theoretical complexity of submodular optimization is well-understood: unconstrained minimization of submodular set functions is polynomial-time [19] while submodular maximization is NP-hard. [sent-14, score-1.637]
7 Generic submodular maximization admits efficient algorithms that can attain approximate optima with global guarantees; these algorithms are typically based on local search techniques [16, 35]. [sent-16, score-0.513]
8 In contrast, although polynomial-time solvable, submodular function minimization (SFM), which seeks to solve min_{S⊆V} F(S), (1) poses substantial algorithmic difficulties. [sent-17, score-0.603]
9 Submodular minimization algorithms may be obtained from two main perspectives: combinatorial and continuous. [sent-19, score-0.167]
10 This idea exploits the fundamental connection between a submodular function F and its Lovász extension f [32], which is continuous and convex. [sent-25, score-0.613]
11 min_{x∈[0,1]^n} f(x). (2) The Lovász extension f is nonsmooth, so we might have to resort to subgradient methods. [sent-27, score-0.228]
12 While a fundamental result of Edmonds [15] demonstrates that a subgradient of f can be computed in O(n log n) time, subgradient methods can be sensitive to choices of the step size, and can be slow. [sent-28, score-0.212]
13 The “smoothing technique” of [36] does not in general apply here because computing a smoothed gradient is equivalent to solving the submodular minimization problem. [sent-30, score-0.648]
14 An alternative to minimizing the Lovász extension directly on [0, 1]^n is to consider a slightly modified convex problem. [sent-32, score-0.214]
15 Specifically, the exact solution of the discrete problem min_{S⊆V} F(S) and of its nonsmooth convex relaxation min_{x∈[0,1]^n} f(x) may be found as a level set S_0 = {k | x*_k ≥ 0} of the unique point x* that minimizes the strongly convex function [1, 10]: f(x) + (1/2)‖x‖². (3) [sent-33, score-0.341]
16 We will refer to the minimization of (3) as the proximal problem due to its close similarity to proximity operators used in convex optimization [12]. [sent-34, score-0.298]
17 These changes allow us to consider a convex dual which is amenable to smooth optimization techniques. [sent-39, score-0.405]
18 However, the submodular function is not always generic and given only via a black box; often it has known structure. [sent-46, score-0.491]
19 This structure allows the use of (parallelizable) dual decomposition techniques for the problem in Eq. [sent-49, score-0.249]
20 Our main insight is that, though it may seem counter-intuitive, the proximal problem (3) offers a much more user-friendly tool for solving (1) than its natural convex counterpart (2), both in implementation and running time. [sent-53, score-0.233]
21 This allows decomposition techniques which combine well with orthogonal projection and reflection methods that (a) exhibit faster convergence, (b) are easily parallelizable, (c) require no extra hyperparameters, and (d) are extremely easy to implement. [sent-55, score-0.16]
22 Second, our experiments suggest that projection and reflection methods can work very well for solving the combinatorial problem (1). [sent-60, score-0.152]
23 (2) In Section 5, we demonstrate the empirical gains of using accelerated proximal methods, Douglas-Rachford and block coordinate descent methods over existing approaches: fewer hyperparameters and faster convergence. [sent-67, score-0.277]
24 2 Review of relevant results from submodular analysis. The relevant concepts we review here are the Lovász extension, base polytopes of submodular functions, and relationships between proximal and discrete problems. [sent-68, score-1.311]
25 In this paper, we are going to use two important results: (a) if the set function F is submodular, then its Lovász extension f is convex, and (b) minimizing the set function F is equivalent to minimizing f(x) with respect to x ∈ [0, 1]^n. [sent-83, score-0.182]
26 Moreover, for a submodular function, the Lovász extension happens to be the support function of the base polytope B(F) defined as B(F) = {y ∈ R^n | ∀S ⊂ V, y(S) ≤ F(S) and y(V) = F(V)}, that is, f(x) = max_{y∈B(F)} y^⊤x [15]. [sent-85, score-0.613]
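The support-function identity f(x) = max_{y∈B(F)} y^⊤x is exactly what Edmonds' greedy algorithm evaluates: sort the coordinates of x and take marginal gains of F along the induced chain of sets. A minimal sketch (assuming F is a Python callable on frozensets; this is an illustrative reimplementation, not the paper's code):

```python
import numpy as np

def lovasz_greedy(F, x):
    """Edmonds' greedy algorithm.  Returns f(x) and a maximizer y of y^T x over B(F),
    which is simultaneously a subgradient of the Lovasz extension f at x.
    Cost: one sort (O(n log n)) plus n evaluations of F."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(-x)                 # coordinates of x in decreasing order
    y = np.zeros(len(x))
    prev = frozenset()
    for k in order:
        S = prev | {int(k)}
        y[k] = F(S) - F(prev)              # marginal gain along the chain of sets
        prev = S
    return float(y @ x), y

# Tiny example: cut function of the path 0-1-2 with unit edge weights.
edges = [(0, 1), (1, 2)]
cut = lambda S: sum((u in S) != (v in S) for u, v in edges)
val, y = lovasz_greedy(cut, [0.9, 0.1, 0.5])   # y lies in B(cut), val = f(x)
```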
27 We may derive a dual problem to the discrete problem in Eq. (1). [sent-96, score-0.299]
28 This allows us to obtain dual certificates of optimality from any y ∈ B(F) and x ∈ [0, 1]^n. [sent-99, score-0.213]
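One standard certificate, consistent with the min-max relation behind this duality (the exact form used by the paper is in its appendix), is that for any y ∈ B(F), Σ_k min(y_k, 0) lower-bounds min_S F(S); a sketch:

```python
import numpy as np

def discrete_duality_gap(F, S, y):
    """For any y in B(F), sum_k min(y_k, 0) <= min_T F(T) <= F(S), so this difference
    bounds the suboptimality of the candidate set S (zero gap certifies optimality)."""
    return F(S) - float(np.minimum(np.asarray(y, dtype=float), 0.0).sum())
```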
29 In practice, the duality gap of the discrete problem is usually much lower than that of the proximal version of the same problem, as we will see in Section 5. [sent-114, score-0.4]
30 The dual problem of Problem (3) reads as follows: min_{x∈R^n} f(x) + (1/2)‖x‖² = min_{x∈R^n} max_{y∈B(F)} y^⊤x + (1/2)‖x‖² = max_{y∈B(F)} min_{x∈R^n} y^⊤x + (1/2)‖x‖² = max_{y∈B(F)} −(1/2)‖y‖², where primal and dual variables are linked as x = −y. [sent-118, score-0.662]
31 Observe that this dual problem is equivalent to finding the orthogonal projection of 0 onto B(F ). [sent-119, score-0.325]
32 Given a solution x* of the proximal problem, we have seen how to get S_µ for any µ by simply thresholding x* at µ. [sent-122, score-0.15]
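Thresholding itself is a one-liner; a sketch (the ≥ convention for the level set is an assumption consistent with S_0 = {k | x*_k ≥ 0} above):

```python
import numpy as np

def level_set(x_star, mu=0.0):
    """S_mu = {k : x*_k >= mu}; with mu = 0 this recovers a candidate minimizer of F."""
    return frozenset(np.flatnonzero(np.asarray(x_star) >= mu).tolist())
```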
33 The resulting algorithm makes O(n) calls to the submodular function oracle. [sent-125, score-0.491]
34 This extends the result of [42] for cuts to general submodular functions and obtains a solution to (3) up to precision ε in O(min{n, log(1/ε)}) iterations. [sent-127, score-0.597]
35 Beyond squared ℓ2-norms, our algorithm equally applies to computing all minimizers of f(x) + Σ_{j=1}^n h_j(x_j) for arbitrary smooth strictly convex functions h_j, j = 1, …, n. [sent-129, score-0.224]
36 3 Decomposition of submodular functions. Following [28, 29, 38, 41], we assume that our function F may be decomposed as the sum F(S) = Σ_{j=1}^r F_j(S) of r “simple” functions. [sent-133, score-0.569]
37 For such functions, Problems (1) and (3) become min_{S⊆V} Σ_{j=1}^r F_j(S) = min_{x∈[0,1]^n} Σ_{j=1}^r f_j(x) and min_{x∈R^n} Σ_{j=1}^r f_j(x) + (1/2)‖x‖². (5) [sent-136, score-0.638]
38 The key to the algorithms presented here is to be able to minimize (1/2)‖x − z‖² + f_j(x), or equivalently, to orthogonally project z onto B(F_j): min (1/2)‖y − z‖² subject to y ∈ B(F_j). [sent-137, score-0.341]
39 As shown at the end of Section 2, projecting onto B(Fj ) is easy as soon as the corresponding submodular minimization problems are easy. [sent-139, score-0.644]
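For intuition (and for testing the decomposition algorithms discussed later on tiny instances), the projection onto B(F_j) can also be computed by brute force from its exponentially many inequality constraints. The sketch below assumes a cvxpy dependency and is only a sanity check, not one of the fast specialized routines the paper relies on.

```python
import itertools
import numpy as np
import cvxpy as cp

def project_base_polytope_bruteforce(F, n, z):
    """Project z onto B(F) = {y : y(S) <= F(S) for all S, y(V) = F(V)} by listing every
    constraint explicitly.  Exponential in n -- use only for tiny ground sets."""
    V = list(range(n))
    y = cp.Variable(n)
    constraints = [cp.sum(y) == F(frozenset(V))]
    for r in range(1, n):
        for S in itertools.combinations(V, r):
            constraints.append(cp.sum(y[list(S)]) <= F(frozenset(S)))
    cp.Problem(cp.Minimize(cp.sum_squares(y - z)), constraints).solve()
    return np.asarray(y.value)
```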
40 A widely used class of submodular functions is that of graph cuts. [sent-142, score-0.58]
41 Message passing algorithms apply to trees, while the proximal problem for paths is very efficiently solved by [2]. [sent-144, score-0.153]
42 Another important class of submodular functions is that of concave functions of cardinality, i.e., functions of the form F(S) = g(|S|) for a concave function g. [sent-148, score-0.669]
43 For this class of functions, too, the proximal problem can be solved in O(n log n) time [22, 23, 26]. [sent-157, score-0.208]
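A minimal example of this class (illustrative, not from the paper): F(S) = g(|S|) with g concave, which can be combined with the exhaustive check from the first sketch.

```python
import math

# Concave-of-cardinality function: F(S) = sqrt(|S|), with g(k) = sqrt(k) concave.
F_card = lambda S: math.sqrt(len(S))
# Sanity check on a tiny ground set using is_submodular from the first sketch:
# is_submodular(F_card, [1, 2, 3, 4])  -> True
```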
44 3.1 Dual decomposition of the nonsmooth problem. We first review existing dual decomposition techniques for the nonsmooth problem (1). [sent-161, score-0.505]
45 We follow [29] to derive a dual formulation (see appendix in [25]): [sent-163, score-0.213]
46 Lemma 1. The dual of Problem (1) may be written in terms of variables λ_1, …, λ_r as max Σ_{j=1}^r g_j(λ_j) subject to (λ_1, …, λ_r) ∈ H^r with Σ_{j=1}^r λ_j = 0, (6) [sent-164, score-0.234]
47 where g_j(λ_j) = min_{S⊆V} F_j(S) − λ_j(S) is a nonsmooth concave function. [sent-172, score-0.227]
48 The dual is the maximization of a nonsmooth concave function over a convex set, onto which it is easy to project: the projection of a vector y has j-th block equal to y_j − (1/r) Σ_{k=1}^r y_k. [sent-173, score-0.164]
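That block projection is one line of code; a sketch with the r blocks stored as the rows of a matrix (the layout is an assumption):

```python
import numpy as np

def project_blocks_to_zero_sum(Lam):
    """Project blocks lambda_1..lambda_r (rows of Lam, shape (r, n)) onto the subspace
    {sum_j lambda_j = 0}: subtract the average block from every block."""
    return Lam - Lam.mean(axis=0, keepdims=True)
```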
49 Moreover, in our setup, functions g_j and their subgradients may be computed efficiently through SFM. [sent-174, score-0.164]
50 Computing subgradients for any fj means calling the greedy algorithm, which runs in time O(n log n). [sent-176, score-0.307]
51 Primal subgradient descent (primal-sgd): Agnostic to any decomposition properties, we may apply a standard simple subgradient method to f . [sent-178, score-0.306]
52 A subgradient of f may be obtained from the subgradients of the components f_j. [sent-179, score-0.434]
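A sketch of primal-sgd using the greedy subgradients from the earlier lovasz_greedy sketch; the projection onto [0,1]^n and the 1/sqrt(t) step size are illustrative choices, not the paper's exact settings.

```python
import numpy as np

def primal_sgd(Fs, n, iters=500, step=0.1):
    """Projected subgradient descent on sum_j f_j(x) over [0,1]^n.
    Each component subgradient comes from lovasz_greedy (defined in an earlier sketch)."""
    x = 0.5 * np.ones(n)
    best_x, best_val = x.copy(), np.inf
    for t in range(1, iters + 1):
        pairs = [lovasz_greedy(F, x) for F in Fs]          # (f_j(x), subgradient) pairs
        val = sum(v for v, _ in pairs)
        if val < best_val:
            best_val, best_x = val, x.copy()
        g = np.sum([grad for _, grad in pairs], axis=0)    # subgradient of the sum
        x = np.clip(x - step / np.sqrt(t) * g, 0.0, 1.0)   # project back onto [0,1]^n
    return best_x, best_val
```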
53 Dual subgradient descent (dual-sgd) [29]: Applying a subgradient method to the nonsmooth dual in Lemma 1 leads to a convergence rate of O(1/√t). [sent-181, score-0.594]
54 Computing a subgradient requires minimizing the submodular functions Fj individually. [sent-182, score-0.684]
55 Primal smoothing (primal-smooth) [41]: The nonsmooth primal may be smoothed in several ways by smoothing the f_j individually; one example is f̃_j^ε(x) = max_{y_j∈B(F_j)} y_j^⊤x − (ε/2)‖y_j‖². [sent-184, score-1.071]
56 Computing f̃_j^ε means solving the proximal problem for F_j. [sent-186, score-0.445]
57 Dual smoothing (dual-smooth): Instead of the primal, the dual (6) may be smoothed, [sent-188, score-0.307]
58 e.g., by entropy [8, 38] applied to each g_j as g̃_j^ε(λ_j) = min_{x∈[0,1]^n} f_j(x) − λ_j^⊤x + εh(x), where h(x) is a negative entropy. [sent-190, score-0.38]
59 This method too requires solving proximal problems for all Fj in each iteration. [sent-192, score-0.205]
60 Dual smoothing with entropy also admits coordinate descent methods [34] that exploit the decomposition, but we do not compare to those here. [sent-193, score-0.159]
61 3.2 Dual decomposition methods for proximal problems. We may also consider Eq. [sent-195, score-0.22]
62 (3) and first derive a dual problem using the same technique as in Section 3.1. [sent-196, score-0.213]
63 Lemma 2 (proved in the appendix in [25]) formally presents our dual formulation as a best approximation problem. [sent-198, score-0.213]
64 The primal variable can be recovered as x = −Σ_j y_j. [sent-199, score-0.169]
65 (7) We can actually eliminate the λ_j and obtain the simpler looking dual problem [sent-208, score-0.257]
66 max −(1/2)‖Σ_{j=1}^r y_j‖² s.t. y_j ∈ B(F_j) for all j ∈ {1, …, r}. (8) Such a dual was also used in [40]. [sent-213, score-0.213]
67 For the simpler dual (8) the case r = 2 is of special interest; it reads max −(1/2)‖y_1 + y_2‖² ⇐⇒ min ‖y_1 − (−y_2)‖². (9) [sent-215, score-0.264]
68 We are now ready to present algorithms that exploit our dual formulations. [sent-220, score-0.213]
69 4 Algorithms. We describe a few competing methods for solving our smooth dual formulations. [sent-221, score-0.36]
70 We describe the details for the special 2-block case (9); the same arguments apply to the block dual from Lemma 2. [sent-222, score-0.24]
71 Notice that the BCD iteration (10) is nothing but alternating projections onto the convex polyhedra B(F1 ) and B(F2 ). [sent-228, score-0.202]
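In the two-block case this reduces to a few lines once projection oracles for the two polyhedra are available (for example, the specialized routines discussed in Section 3, or the brute-force cvxpy sketch on tiny problems); a minimal sketch with proj_A and proj_B as assumed callables:

```python
def alternating_projections(proj_A, proj_B, a0, iters=200):
    """Block coordinate descent on the 2-block dual = alternating projections between
    two convex sets A and B, given their projection operators."""
    a = a0
    b = proj_B(a)
    for _ in range(iters):
        a = proj_A(b)     # closest point of A to the current b
        b = proj_B(a)     # closest point of B to the current a
    return a, b           # an approximate best-approximation pair
```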
72 However, despite its attractive simplicity, it is known that BCD (in its alternating projections form) can converge arbitrarily slowly [4], depending on the relative orientation of the convex sets onto which one projects. [sent-230, score-0.164]
73 z_{k+1} = z_k + γ_k(v_k − z_k), (12) where γ_k ∈ [0, 2] is a sequence of scalars that satisfies Σ_k γ_k(2 − γ_k) = ∞. [sent-239, score-0.244]
74 This is where the special structure of our dual problem proves crucial, a recognition that is subtle yet remarkably important. [sent-245, score-0.213]
75 [4] show how to extract convergent sequences from these iterates, which actually solve the corresponding best approximation problem; for us this is nothing but the dual (9) that we wanted to solve in the first place. [sent-252, score-0.213]
76 The sequences {x_k} and {Π_A Π_B z_k} are bounded; the weak cluster points of either of the two sequences {(Π_A R_B z_k, x_k)}_{k≥0} and {(Π_A x_k, x_k)}_{k≥0}, (15) are solutions of the best approximation problem min_{a,b} ‖a − b‖ such that a ∈ A and b ∈ B. [sent-261, score-0.406]
77 The key consequence of Theorem 3 is that we can apply DR with impunity to (14), and extract from its iterates the optimal solution to problem (9) (from which recovering the primal is trivial). [sent-262, score-0.18]
78 The most important feature of solving the dual (9) in this way is that absolutely no stepsize tuning is required, making the method very practical and user friendly. [sent-263, score-0.255]
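A relaxed Douglas-Rachford sketch for the same two-set best-approximation problem, again with proj_A and proj_B as assumed callables operating on numpy arrays; the shadow-sequence bookkeeping follows the theorem quoted above, and gamma = 1 already satisfies the condition on γ_k, so no step-size tuning is needed.

```python
def douglas_rachford(proj_A, proj_B, z0, iters=200, gamma=1.0):
    """Relaxed Douglas-Rachford / averaged alternating reflections.
    One step: z <- z + gamma * (P_A(2 P_B(z) - z) - P_B(z)), i.e. reflect through B,
    project onto A, and relax.  Returns the pair (P_A(R_B z), P_B z) whose cluster
    points solve min ||a - b|| over a in A, b in B."""
    z = z0
    for _ in range(iters):
        xb = proj_B(z)                                   # shadow iterate in B
        z = z + gamma * (proj_A(2.0 * xb - z) - xb)      # relaxed DR update
    xb = proj_B(z)
    xa = proj_A(2.0 * xb - z)
    return xa, xb
```

From the resulting pair the primal point is recovered as in the text (x = −Σ_j y_j, up to the sign convention used for the second block) and thresholded at 0 to obtain a candidate discrete minimizer.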
79 Figure 1 panels: pBCD, iter 1; pBCD, iter 7; DR, iter 1; DR, iter 4; smooth gap ν_s = 3.9 · 10^−1. [sent-264, score-0.386]
80 Figure 1: Segmentation results for the slowest and fastest projection method, with smooth (ν_s) and discrete (ν_d) duality gaps. [sent-272, score-0.357]
81 5 Experiments. We empirically compare the proposed projection methods to the (smoothed) subgradient methods discussed in Section 3. [sent-274, score-0.152]
82 For solving the proximal problem, we apply block coordinate descent (BCD) and Douglas-Rachford (DR) to Problem (8) if applicable, and also to (7) (BCD-para, DR-para). [sent-276, score-0.262]
83 The main iteration cost of all methods except for the primal subgradient method is the orthogonal projection onto polytopes B(Fj ). [sent-278, score-0.428]
84 The primal subgradient method uses the greedy algorithm in each iteration, which runs in O(n log n). [sent-279, score-0.231]
85 We do not include Frank-Wolfe methods here, since FW is equivalent to a subgradient descent on the primal and converges correspondingly slowly. [sent-281, score-0.268]
86 As benchmark problems, we use (i) graph cut problems for segmentation, or MAP inference in a 4-neighborhood grid-structured MRF, and (ii) concave functions similar to [41], but together with graph cut functions. [sent-282, score-0.313]
87 Figure 2 shows the duality gaps for the discrete and smooth (where applicable) problems for two instances of segmentation problems. [sent-291, score-0.503]
88 The algorithms working with the proximal problems are much faster than the ones directly solving the nonsmooth problem. [sent-292, score-0.34]
89 We also see that the discrete gap shrinks faster than the smooth gap, i.e., [sent-298, score-0.26]
90 the optimal discrete solution does not require solving the smooth problem to extremely high accuracy. [sent-300, score-0.215]
91 In summary, our experiments suggest that projection methods can be extremely useful for solving the combinatorial submodular minimization problem. [sent-306, score-0.749]
92 6 Conclusion. We have presented a novel approach to submodular function minimization based on the equivalence with a best approximation problem. [sent-311, score-0.573]
93 Left: discrete duality gaps for various optimization schemes for the nonsmooth problem, from 1 to 1000 iterations. [sent-315, score-0.463]
94 Middle: discrete duality gaps for various optimization schemes for the smooth problem, from 1 to 100 iterations. [sent-316, score-0.458]
95 Moreover, a generalization beyond submodular functions of the relationships between combinatorial optimization problems and convex problems would enable the application of our framework to other common situations such as multiple labels (see, e. [sent-325, score-0.767]
96 Efficient solutions to relaxations of combinatorial problems with submodular penalties via the Lovász extension and non-smooth convex optimization. [sent-398, score-0.773]
97 A submodular function minimization algorithm based on the minimum-norm base. [sent-452, score-0.573]
98 About strongly polynomial time algorithms for quadratic optimization over submodular constraints. [sent-463, score-0.516]
99 An analysis of approximations for maximizing submodular set functions–I. [sent-544, score-0.491]
100 A faster strongly polynomial time algorithm for submodular function minimization. [sent-557, score-0.516]
wordName wordTfidf (topN-words)
[('submodular', 0.491), ('bcd', 0.373), ('dr', 0.294), ('fj', 0.274), ('dual', 0.213), ('para', 0.213), ('duality', 0.141), ('proximal', 0.129), ('sfm', 0.125), ('primal', 0.125), ('gaps', 0.122), ('zk', 0.122), ('lov', 0.119), ('nonsmooth', 0.11), ('accel', 0.107), ('subgradient', 0.106), ('smooth', 0.105), ('grad', 0.094), ('ection', 0.092), ('sz', 0.088), ('minimization', 0.082), ('smoothing', 0.073), ('sgd', 0.072), ('gap', 0.065), ('discrete', 0.065), ('combinatorial', 0.064), ('concave', 0.064), ('convex', 0.062), ('functions', 0.057), ('iter', 0.054), ('xk', 0.054), ('gj', 0.053), ('prox', 0.048), ('polytopes', 0.047), ('cut', 0.047), ('projection', 0.046), ('yj', 0.044), ('dom', 0.043), ('submodularity', 0.042), ('solving', 0.042), ('bauschke', 0.041), ('jegelka', 0.041), ('hr', 0.041), ('rb', 0.041), ('minx', 0.04), ('iteration', 0.038), ('descent', 0.037), ('onto', 0.037), ('segmentation', 0.036), ('decomposition', 0.036), ('pbcd', 0.036), ('tarjan', 0.036), ('mrf', 0.034), ('iterates', 0.034), ('extension', 0.034), ('alternating', 0.034), ('problems', 0.034), ('rn', 0.034), ('subgradients', 0.033), ('smoothed', 0.033), ('graph', 0.032), ('accelerated', 0.032), ('parallelizable', 0.031), ('projections', 0.031), ('splitting', 0.03), ('min', 0.03), ('variation', 0.03), ('ow', 0.03), ('minimizing', 0.03), ('orthogonal', 0.029), ('polymatroid', 0.029), ('duals', 0.029), ('rj', 0.029), ('cuts', 0.028), ('parallel', 0.028), ('re', 0.028), ('coordinate', 0.027), ('ections', 0.027), ('block', 0.027), ('combettes', 0.026), ('optimization', 0.025), ('occurring', 0.025), ('faster', 0.025), ('speedup', 0.025), ('solved', 0.024), ('mins', 0.024), ('diverge', 0.024), ('extremely', 0.024), ('ri', 0.024), ('energy', 0.023), ('convergence', 0.022), ('admits', 0.022), ('ra', 0.022), ('decompositions', 0.021), ('tpami', 0.021), ('solution', 0.021), ('nesterov', 0.021), ('reads', 0.021), ('rooted', 0.021), ('may', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 268 nips-2013-Reflection methods for user-friendly submodular optimization
Author: Stefanie Jegelka, Francis Bach, Suvrit Sra
Abstract: Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. Consequently, there is need for efficient optimization procedures for submodular functions, especially for minimization problems. While general submodular minimization is challenging, we propose a new method that exploits existing decomposability of submodular functions. In contrast to previous approaches, our method is neither approximate, nor impractical, nor does it need any cumbersome parameter tuning. Moreover, it is easy to implement and parallelize. A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution. This method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction. In our experiments, we illustrate the benefits of our method on two image segmentation tasks. 1
2 0.42490038 78 nips-2013-Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions
Author: Rishabh K. Iyer, Stefanie Jegelka, Jeff A. Bilmes
Abstract: We investigate three related and important problems connected to machine learning: approximating a submodular function everywhere, learning a submodular function (in a PAC-like setting [28]), and constrained minimization of submodular functions. We show that the complexity of all three problems depends on the “curvature” of the submodular function, and provide lower and upper bounds that refine and improve previous results [2, 6, 8, 27]. Our proof techniques are fairly generic. We either use a black-box transformation of the function (for approximation and learning), or a transformation of algorithms to use an appropriate surrogate function (for minimization). Curiously, curvature has been known to influence approximations for submodular maximization [3, 29], but its effect on minimization, approximation and learning has hitherto been open. We complete this picture, and also support our theoretical claims by empirical results. 1
3 0.38648215 319 nips-2013-Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints
Author: Rishabh K. Iyer, Jeff A. Bilmes
Abstract: We investigate two new optimization problems — minimizing a submodular function subject to a submodular lower bound constraint (submodular cover) and maximizing a submodular function subject to a submodular upper bound constraint (submodular knapsack). We are motivated by a number of real-world applications in machine learning including sensor placement and data subset selection, which require maximizing a certain submodular function (like coverage or diversity) while simultaneously minimizing another (like cooperative cost). These problems are often posed as minimizing the difference between submodular functions [9, 25] which is in the worst case inapproximable. We show, however, that by phrasing these problems as constrained optimization, which is more natural for many applications, we achieve a number of bounded approximation guarantees. We also show that both these problems are closely related and an approximation algorithm solving one can be used to obtain an approximation guarantee for the other. We provide hardness results for both problems thus showing that our approximation factors are tight up to log-factors. Finally, we empirically demonstrate the performance and good scalability properties of our algorithms. 1
4 0.24048412 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data
Author: Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, Andreas Krause
Abstract: Many large-scale machine learning problems (such as clustering, non-parametric learning, kernel machines, etc.) require selecting, out of a massive data set, a manageable yet representative subset. Such problems can often be reduced to maximizing a submodular set function subject to cardinality constraints. Classical approaches require centralized access to the full data set; but for truly large-scale problems, rendering the data centrally is often impractical. In this paper, we consider the problem of submodular function maximization in a distributed fashion. We develop a simple, two-stage protocol G REE D I, that is easily implemented using MapReduce style computations. We theoretically analyze our approach, and show, that under certain natural conditions, performance close to the (impractical) centralized approach can be achieved. In our extensive experiments, we demonstrate the effectiveness of our approach on several applications, including sparse Gaussian process inference and exemplar-based clustering, on tens of millions of data points using Hadoop. 1
5 0.19222341 29 nips-2013-Adaptive Submodular Maximization in Bandit Setting
Author: Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, S. Muthukrishnan
Abstract: Maximization of submodular functions has wide applications in machine learning and artificial intelligence. Adaptive submodular maximization has been traditionally studied under the assumption that the model of the world, the expected gain of choosing an item given previously selected items and their states, is known. In this paper, we study the setting where the expected gain is initially unknown, and it is learned by interacting repeatedly with the optimized function. We propose an efficient algorithm for solving our problem and prove that its expected cumulative regret increases logarithmically with time. Our regret bound captures the inherent property of submodular maximization, earlier mistakes are more costly than later ones. We refer to our approach as Optimistic Adaptive Submodular Maximization (OASM) because it trades off exploration and exploitation based on the optimism in the face of uncertainty principle. We evaluate our method on a preference elicitation problem and show that non-trivial K-step policies can be learned from just a few hundred interactions with the problem. 1
6 0.18076621 56 nips-2013-Better Approximation and Faster Algorithm Using the Proximal Average
7 0.14235091 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
8 0.10193785 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
9 0.095254987 224 nips-2013-On the Sample Complexity of Subspace Learning
10 0.092548683 318 nips-2013-Structured Learning via Logistic Regression
11 0.087188572 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
12 0.082955599 20 nips-2013-Accelerating Stochastic Gradient Descent using Predictive Variance Reduction
13 0.076409139 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited
14 0.072658151 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation
15 0.071169659 19 nips-2013-Accelerated Mini-Batch Stochastic Dual Coordinate Ascent
16 0.069433123 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields
17 0.064854026 202 nips-2013-Multiclass Total Variation Clustering
18 0.06473536 249 nips-2013-Polar Operators for Structured Sparse Estimation
19 0.062648319 313 nips-2013-Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization
20 0.060715478 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
topicId topicWeight
[(0, 0.165), (1, 0.023), (2, 0.076), (3, 0.056), (4, 0.077), (5, 0.159), (6, -0.34), (7, -0.399), (8, 0.151), (9, -0.179), (10, -0.12), (11, 0.018), (12, -0.088), (13, 0.003), (14, -0.042), (15, 0.103), (16, 0.066), (17, -0.019), (18, 0.031), (19, -0.005), (20, 0.005), (21, 0.007), (22, 0.022), (23, 0.034), (24, 0.002), (25, -0.016), (26, 0.044), (27, -0.009), (28, 0.041), (29, 0.05), (30, -0.028), (31, 0.02), (32, -0.017), (33, 0.019), (34, 0.012), (35, -0.014), (36, -0.016), (37, 0.036), (38, -0.01), (39, -0.021), (40, 0.023), (41, 0.026), (42, -0.001), (43, 0.028), (44, -0.025), (45, 0.032), (46, -0.005), (47, -0.02), (48, 0.007), (49, -0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.94147247 268 nips-2013-Reflection methods for user-friendly submodular optimization
Author: Stefanie Jegelka, Francis Bach, Suvrit Sra
Abstract: Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. Consequently, there is need for efficient optimization procedures for submodular functions, especially for minimization problems. While general submodular minimization is challenging, we propose a new method that exploits existing decomposability of submodular functions. In contrast to previous approaches, our method is neither approximate, nor impractical, nor does it need any cumbersome parameter tuning. Moreover, it is easy to implement and parallelize. A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution. This method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction. In our experiments, we illustrate the benefits of our method on two image segmentation tasks. 1
2 0.92014927 319 nips-2013-Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints
Author: Rishabh K. Iyer, Jeff A. Bilmes
Abstract: We investigate two new optimization problems — minimizing a submodular function subject to a submodular lower bound constraint (submodular cover) and maximizing a submodular function subject to a submodular upper bound constraint (submodular knapsack). We are motivated by a number of real-world applications in machine learning including sensor placement and data subset selection, which require maximizing a certain submodular function (like coverage or diversity) while simultaneously minimizing another (like cooperative cost). These problems are often posed as minimizing the difference between submodular functions [9, 25] which is in the worst case inapproximable. We show, however, that by phrasing these problems as constrained optimization, which is more natural for many applications, we achieve a number of bounded approximation guarantees. We also show that both these problems are closely related and an approximation algorithm solving one can be used to obtain an approximation guarantee for the other. We provide hardness results for both problems thus showing that our approximation factors are tight up to log-factors. Finally, we empirically demonstrate the performance and good scalability properties of our algorithms. 1
3 0.90808564 78 nips-2013-Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions
Author: Rishabh K. Iyer, Stefanie Jegelka, Jeff A. Bilmes
Abstract: We investigate three related and important problems connected to machine learning: approximating a submodular function everywhere, learning a submodular function (in a PAC-like setting [28]), and constrained minimization of submodular functions. We show that the complexity of all three problems depends on the “curvature” of the submodular function, and provide lower and upper bounds that refine and improve previous results [2, 6, 8, 27]. Our proof techniques are fairly generic. We either use a black-box transformation of the function (for approximation and learning), or a transformation of algorithms to use an appropriate surrogate function (for minimization). Curiously, curvature has been known to influence approximations for submodular maximization [3, 29], but its effect on minimization, approximation and learning has hitherto been open. We complete this picture, and also support our theoretical claims by empirical results. 1
4 0.80802423 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data
Author: Baharan Mirzasoleiman, Amin Karbasi, Rik Sarkar, Andreas Krause
Abstract: Many large-scale machine learning problems (such as clustering, non-parametric learning, kernel machines, etc.) require selecting, out of a massive data set, a manageable yet representative subset. Such problems can often be reduced to maximizing a submodular set function subject to cardinality constraints. Classical approaches require centralized access to the full data set; but for truly large-scale problems, rendering the data centrally is often impractical. In this paper, we consider the problem of submodular function maximization in a distributed fashion. We develop a simple, two-stage protocol G REE D I, that is easily implemented using MapReduce style computations. We theoretically analyze our approach, and show, that under certain natural conditions, performance close to the (impractical) centralized approach can be achieved. In our extensive experiments, we demonstrate the effectiveness of our approach on several applications, including sparse Gaussian process inference and exemplar-based clustering, on tens of millions of data points using Hadoop. 1
5 0.54986775 29 nips-2013-Adaptive Submodular Maximization in Bandit Setting
Author: Victor Gabillon, Branislav Kveton, Zheng Wen, Brian Eriksson, S. Muthukrishnan
Abstract: Maximization of submodular functions has wide applications in machine learning and artificial intelligence. Adaptive submodular maximization has been traditionally studied under the assumption that the model of the world, the expected gain of choosing an item given previously selected items and their states, is known. In this paper, we study the setting where the expected gain is initially unknown, and it is learned by interacting repeatedly with the optimized function. We propose an efficient algorithm for solving our problem and prove that its expected cumulative regret increases logarithmically with time. Our regret bound captures the inherent property of submodular maximization, earlier mistakes are more costly than later ones. We refer to our approach as Optimistic Adaptive Submodular Maximization (OASM) because it trades off exploration and exploitation based on the optimism in the face of uncertainty principle. We evaluate our method on a preference elicitation problem and show that non-trivial K-step policies can be learned from just a few hundred interactions with the problem. 1
6 0.44171333 249 nips-2013-Polar Operators for Structured Sparse Estimation
7 0.43493128 56 nips-2013-Better Approximation and Faster Algorithm Using the Proximal Average
8 0.42187738 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited
9 0.41167411 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
10 0.37815776 202 nips-2013-Multiclass Total Variation Clustering
11 0.36242163 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation
12 0.35220313 215 nips-2013-On Decomposing the Proximal Map
13 0.34540683 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding
14 0.343741 19 nips-2013-Accelerated Mini-Batch Stochastic Dual Coordinate Ascent
15 0.3435109 313 nips-2013-Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization
16 0.33077964 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
17 0.32730049 214 nips-2013-On Algorithms for Sparse Multi-factor NMF
18 0.31238413 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
19 0.29992738 111 nips-2013-Estimation, Optimization, and Parallelism when Data is Sparse
20 0.27989164 184 nips-2013-Marginals-to-Models Reducibility
topicId topicWeight
[(13, 0.014), (16, 0.032), (33, 0.121), (34, 0.094), (41, 0.048), (49, 0.031), (56, 0.079), (70, 0.051), (85, 0.025), (89, 0.02), (93, 0.04), (95, 0.36)]
simIndex simValue paperId paperTitle
1 0.86935937 220 nips-2013-On the Complexity and Approximation of Binary Evidence in Lifted Inference
Author: Guy van den Broeck, Adnan Darwiche
Abstract: Lifted inference algorithms exploit symmetries in probabilistic models to speed up inference. They show impressive performance when calculating unconditional probabilities in relational models, but often resort to non-lifted inference when computing conditional probabilities. The reason is that conditioning on evidence breaks many of the model’s symmetries, which can preempt standard lifting techniques. Recent theoretical results show, for example, that conditioning on evidence which corresponds to binary relations is #P-hard, suggesting that no lifting is to be expected in the worst case. In this paper, we balance this negative result by identifying the Boolean rank of the evidence as a key parameter for characterizing the complexity of conditioning in lifted inference. In particular, we show that conditioning on binary evidence with bounded Boolean rank is efficient. This opens up the possibility of approximating evidence by a low-rank Boolean matrix factorization, which we investigate both theoretically and empirically. 1
same-paper 2 0.81823927 268 nips-2013-Reflection methods for user-friendly submodular optimization
Author: Stefanie Jegelka, Francis Bach, Suvrit Sra
Abstract: Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. Consequently, there is need for efficient optimization procedures for submodular functions, especially for minimization problems. While general submodular minimization is challenging, we propose a new method that exploits existing decomposability of submodular functions. In contrast to previous approaches, our method is neither approximate, nor impractical, nor does it need any cumbersome parameter tuning. Moreover, it is easy to implement and parallelize. A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution. This method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction. In our experiments, we illustrate the benefits of our method on two image segmentation tasks. 1
3 0.74946946 309 nips-2013-Statistical Active Learning Algorithms
Author: Maria-Florina Balcan, Vitaly Feldman
Abstract: We describe a framework for designing efficient active learning algorithms that are tolerant to random classification noise and differentially-private. The framework is based on active learning algorithms that are statistical in the sense that they rely on estimates of expectations of functions of filtered random examples. It builds on the powerful statistical query framework of Kearns [30]. We show that any efficient active statistical learning algorithm can be automatically converted to an efficient active learning algorithm which is tolerant to random classification noise as well as other forms of “uncorrelated” noise. We show that commonly studied concept classes including thresholds, rectangles, and linear separators can be efficiently actively learned in our framework. These results combined with our generic conversion lead to the first computationally-efficient algorithms for actively learning some of these concept classes in the presence of random classification noise that provide exponential improvement in the dependence on the error over their passive counterparts. In addition, we show that our algorithms can be automatically converted to efficient active differentially-private algorithms. This leads to the first differentially-private active learning algorithms with exponential label savings over the passive case. 1
4 0.71658111 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima
Author: Po-Ling Loh, Martin J. Wainwright
Abstract: We establish theoretical results concerning local optima of regularized M estimators, where both loss and penalty functions are allowed to be nonconvex. Our results show that as long as the loss satisfies restricted strong convexity and the penalty satisfies suitable regularity conditions, any local optimum of the composite objective lies within statistical precision of the true parameter vector. Our theory covers a broad class of nonconvex objective functions, including corrected versions of the Lasso for errors-in-variables linear models and regression in generalized linear models using nonconvex regularizers such as SCAD and MCP. On the optimization side, we show that a simple adaptation of composite gradient descent may be used to compute a global optimum up to the statistical precision stat in log(1/ stat ) iterations, the fastest possible rate for any first-order method. We provide simulations to illustrate the sharpness of our theoretical predictions. 1
5 0.69157577 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs
Author: Ying Liu, Alan Willsky
Abstract: Gaussian Graphical Models (GGMs) or Gauss Markov random fields are widely used in many applications, and the trade-off between the modeling capacity and the efficiency of learning and inference has been an important research problem. In this paper, we study the family of GGMs with small feedback vertex sets (FVSs), where an FVS is a set of nodes whose removal breaks all the cycles. Exact inference such as computing the marginal distributions and the partition function has complexity O(k 2 n) using message-passing algorithms, where k is the size of the FVS, and n is the total number of nodes. We propose efficient structure learning algorithms for two cases: 1) All nodes are observed, which is useful in modeling social or flight networks where the FVS nodes often correspond to a small number of highly influential nodes, or hubs, while the rest of the networks is modeled by a tree. Regardless of the maximum degree, without knowing the full graph structure, we can exactly compute the maximum likelihood estimate with complexity O(kn2 + n2 log n) if the FVS is known or in polynomial time if the FVS is unknown but has bounded size. 2) The FVS nodes are latent variables, where structure learning is equivalent to decomposing an inverse covariance matrix (exactly or approximately) into the sum of a tree-structured matrix and a low-rank matrix. By incorporating efficient inference into the learning steps, we can obtain a learning algorithm using alternating low-rank corrections with complexity O(kn2 + n2 log n) per iteration. We perform experiments using both synthetic data as well as real data of flight delays to demonstrate the modeling capacity with FVSs of various sizes. 1
6 0.63386035 185 nips-2013-Matrix Completion From any Given Set of Observations
7 0.60205352 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries
8 0.59616637 78 nips-2013-Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions
9 0.58446109 42 nips-2013-Auditing: Active Learning with Outcome-Dependent Query Costs
10 0.56819254 149 nips-2013-Latent Structured Active Learning
11 0.56553549 184 nips-2013-Marginals-to-Models Reducibility
12 0.56002963 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields
13 0.55303085 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
14 0.55160844 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
15 0.55131602 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data
16 0.54823929 319 nips-2013-Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints
17 0.54409128 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation
18 0.53952312 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding
19 0.53543836 57 nips-2013-Beyond Pairwise: Provably Fast Algorithms for Approximate $k$-Way Similarity Search
20 0.5342387 122 nips-2013-First-order Decomposition Trees