nips nips2010 nips2010-69 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Peter Stobbe, Andreas Krause
Abstract: Many combinatorial problems arising in machine learning can be reduced to the problem of minimizing a submodular function. Submodular functions are a natural discrete analog of convex functions, and can be minimized in strongly polynomial time. Unfortunately, state-of-the-art algorithms for general submodular minimization are intractable for larger problems. In this paper, we introduce a novel subclass of submodular minimization problems that we call decomposable. Decomposable submodular functions are those that can be represented as sums of concave functions applied to modular functions. We develop an algorithm, SLG, that can efficiently minimize decomposable submodular functions with tens of thousands of variables. Our algorithm exploits recent results in smoothed convex minimization. We apply SLG to synthetic benchmarks and a joint classification-and-segmentation task, and show that it outperforms the state-of-the-art general purpose submodular minimization algorithms by several orders of magnitude. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Many combinatorial problems arising in machine learning can be reduced to the problem of minimizing a submodular function. [sent-4, score-0.697]
2 Submodular functions are a natural discrete analog of convex functions, and can be minimized in strongly polynomial time. [sent-5, score-0.145]
3 Unfortunately, state-of-the-art algorithms for general submodular minimization are intractable for larger problems. [sent-6, score-0.678]
4 In this paper, we introduce a novel subclass of submodular minimization problems that we call decomposable. [sent-7, score-0.678]
5 Decomposable submodular functions are those that can be represented as sums of concave functions applied to modular functions. [sent-8, score-0.886]
6 We develop an algorithm, SLG, that can efficiently minimize decomposable submodular functions with tens of thousands of variables. [sent-10, score-0.778]
7 Our algorithm exploits recent results in smoothed convex minimization. [sent-11, score-0.195]
8 We apply SLG to synthetic benchmarks and a joint classification-and-segmentation task, and show that it outperforms the state-of-the-art general purpose submodular minimization algorithms by several orders of magnitude. [sent-13, score-0.678]
9 In particular, the minimum of a submodular function can be found in strongly polynomial time [11]. [sent-27, score-0.603]
10 Unfortunately, while polynomial-time solvable, exact techniques for submodular minimization require a number of function evaluations on the order of n^5 [12], where n is the number of variables in the problem (e.g. [sent-28, score-0.678]
11 Fortunately, several submodular minimization problems arising in machine learning have structure that allows solving them more efficiently. [sent-31, score-0.71]
12 Examples include symmetric functions that can be solved in O(n^3) evaluations using Queyranne’s algorithm [19], and functions that decompose into attractive, pairwise potentials, that can be solved using graph cut techniques [7]. [sent-33, score-0.115]
13 In this paper, we introduce a novel class of submodular minimization problems that can be solved efficiently. [sent-34, score-0.678]
14 In particular, we develop an algorithm, SLG, that can minimize a class of submodular functions that we call decomposable: These are functions that can be decomposed into sums of concave functions applied to modular (additive) functions. [sent-36, score-0.959]
15 Our algorithm is based on recent techniques of smoothed convex minimization [18] applied to the Lovász extension. [sent-37, score-0.469]
16 We apply SLG to a joint classification-and-segmentation task involving tens of thousands of variables, and show that it outperforms state-of-the-art algorithms for general submodular function minimization by several orders of magnitude. [sent-39, score-0.678]
17 A set function f is called submodular iff, for all A, B ⊆ E, we have f(A ∪ B) + f(A ∩ B) ≤ f(A) + f(B). (1) Submodular functions can alternatively, and perhaps more intuitively, be characterized in terms of their discrete derivatives. [sent-50, score-0.659]
18 Define the discrete derivative ∆_k f(A) := f(A ∪ {k}) − f(A). Then, f is submodular iff ∆_k f(A) ≥ [sent-53, score-0.612]
19 ∆_k f(B), for all A ⊆ B ⊆ E and k ∈ E \ B. Note the analogy to concave functions; the discrete derivative is smaller for larger sets, in the same way that φ(x + h) − φ(x) ≥ φ(y + h) − φ(y) for h ≥ 0 and x ≤ y when φ is concave. [sent-54, score-0.167]
20 Thus a simple example of a submodular function is f(A) = φ(|A|), where φ is any concave function. [sent-57, score-0.714]
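To make the definition concrete, here is a small sketch (not from the paper) that builds f(A) = sqrt(|A|), a concave function of cardinality, and exhaustively checks inequality (1) on a toy ground set; the ground-set size and the choice of concave function are arbitrary.

```python
import itertools
import math

def f(A):
    """Concave function of cardinality: f(A) = sqrt(|A|) is submodular."""
    return math.sqrt(len(A))

E = range(6)  # small ground set so we can check all subset pairs exhaustively
subsets = [frozenset(s) for r in range(len(E) + 1)
           for s in itertools.combinations(E, r)]

# Check f(A | B) + f(A & B) <= f(A) + f(B) for every pair of subsets.
violations = [(A, B) for A in subsets for B in subsets
              if f(A | B) + f(A & B) > f(A) + f(B) + 1e-12]
print("violations:", len(violations))  # expected: 0
```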
21 Yet despite this connection to concavity, it is in fact ‘easier’ to minimize a submodular function than to maximize it1 , just as it is easier to minimize a convex function. [sent-58, score-0.677]
22 One explanation for this is that submodular minimization can be reformulated as a convex minimization problem. [sent-59, score-0.851]
23 To see this, consider taking a set function minimization problem, and reformulating it as a minimization problem over the unit cube [0, 1]^n ⊂ R^n. [sent-60, score-0.22]
24 We define the Lovász extension in terms of the submodular polyhedron P_f, which is defined as P_f = {v ∈ R^n : v · e_A ≤ f(A), for all A ⊆ E}; then f̃(x) = sup_{v ∈ P_f} v · x. (2) [sent-71, score-0.096]
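As an illustration of how the Lovász extension can be evaluated in practice, the sketch below uses the standard greedy characterization (sort the coordinates of x in decreasing order and weight the marginal gains of f); this is textbook material rather than code from the paper, and the example function is a made-up threshold potential.

```python
import numpy as np

def lovasz_extension(f, x):
    """Evaluate the Lovász extension of a set function f (with f(emptyset)=0)
    at x, using the greedy characterization: sort the coordinates of x in
    decreasing order and weight the marginal gains of f by the sorted values."""
    order = np.argsort(-x)          # indices sorted by decreasing x
    value, prefix = 0.0, []
    prev = f(frozenset())
    for k in order:
        prefix.append(int(k))
        cur = f(frozenset(prefix))
        value += x[k] * (cur - prev)
        prev = cur
    return value

# Example: f(A) = min(1.0, |A|) -- a threshold potential with unit weights.
f = lambda A: min(1.0, float(len(A)))
x = np.array([0.9, 0.2, 0.5])
print(lovasz_extension(f, x))   # 0.9: only the largest coordinate contributes
```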
25 1 With the additional assumption that f is nondecreasing, maximizing a submodular function subject to a cardinality constraint |A| ≤ M is ‘easy’; a greedy algorithm is known to give a near-optimal answer [17]. [sent-81, score-0.58]
26 2 Equation (2) was used to show that submodular minimization can be achieved in polynomial time [16]. [sent-82, score-0.701]
27 Despite being convex, the Lovász extension is non-smooth, and hence a simple subgradient descent algorithm would need O(1/ε²) steps to achieve O(ε) accuracy. [sent-84, score-0.281]
28 One way this is done is to construct a smooth approximation of the non-smooth function, and then use an accelerated gradient descent algorithm which is highly effective for smooth functions. [sent-86, score-0.139]
29 Connections of this work with submodularity and combinatorial optimization are also explored in [4] and [2]. [sent-87, score-0.112]
30 In fact, in [2], Bach shows that computing the smoothed Lovász gradient of a general submodular function is equivalent to solving a submodular minimization problem. [sent-88, score-0.93]
31 In this paper, we do not treat general submodular functions, but rather a large class of submodular functions that we call decomposable. [sent-89, score-1.304]
32 (To apply the smoothing technique of [18], special structural knowledge about the convex function is required, so it is natural that we would need special structural knowledge about the submodular function to leverage those results. [sent-90, score-0.623]
33 ) We further show that we can exploit the discrete structure of submodular minimization in a way that allows terminating the algorithm early with a certificate of optimality. [sent-91, score-0.711]
34 We call this class of functions decomposable submodular functions, as they decompose into a sum of concave functions applied to nonnegative modular functions2 . [sent-95, score-1.035]
35 Below, we give examples of decomposable submodular functions arising in applications. [sent-96, score-0.783]
36 We first focus on the special case where all the concave functions are of the form φ_j(·) = d_j min(y_j, ·) for some y_j, d_j > 0. [sent-98, score-0.376]
37 Since these potentials are of key importance, we define [sent-99, score-0.13]
38 the submodular functions Φ_{w,y}(A) := min(y, w · e_A) and call them threshold potentials. [sent-100, score-0.684]
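A threshold potential is easy to write down directly; the following sketch evaluates Φ_{w,y}(A) = min(y, w · e_A) for made-up w and y (illustrative values, not from the paper).

```python
import numpy as np

def threshold_potential(w, y):
    """Return the set function A -> min(y, sum_{k in A} w_k), with w >= 0, y >= 0.
    This is the 'threshold potential' Phi_{w,y} used as a building block."""
    w = np.asarray(w, dtype=float)
    return lambda A: float(min(y, w[list(A)].sum()))

# Illustrative numbers:
phi = threshold_potential(w=[0.5, 1.0, 2.0], y=2.5)
print(phi(set()), phi({0}), phi({0, 1}), phi({0, 1, 2}))   # 0.0 0.5 1.5 2.5
```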
39 In Section 5, we will show how to generalize our approach to arbitrary decomposable submodular functions. [sent-101, score-0.705]
40 It can be expressed as a sum of a modular function and a threshold potential: φ(|A ∩ {k, l}|) = φ(0) + (φ(2) − φ(1)) e_{kl} · e_A + (2φ(1) − φ(0) − φ(2)) Φ_{e_{kl}, 1}(A). Why are such potential functions interesting? [sent-105, score-0.244]
41 One classifies pixels by minimizing a sum of 2-potentials: f(A) = c · e_A + Σ_{k,l} d_{kl} (1 − |1 − e_{kl} · e_A|). [sent-113, score-0.186]
42 More generally, consider a higher order potential function: a concave function of the number of elements in some activation set S, φ(|A ∩ S|), where φ is concave. [sent-116, score-0.162]
43 It can be shown that this can be written as a sum of a modular function and a positive linear combination of |S| − 1 threshold potentials. [sent-117, score-0.138]
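That claim can be checked numerically. The sketch below is my own construction (summation by parts on the increments of φ, not the paper's derivation): it rewrites φ(|A ∩ S|) as φ(0) plus a modular term plus nonnegative weights on the threshold functions min(|A ∩ S|, i), i = 1, ..., |S| − 1.

```python
import math

def decompose_concave_cardinality(phi, m):
    """Given concave phi on {0,...,m}, return (phi(0), slope, coeffs) so that
    phi(t) = phi(0) + slope*t + sum_i coeffs[i]*min(t, i) for t = 0..m,
    with coeffs[i] >= 0 for i = 1..m-1 (summation by parts on the increments)."""
    d = [phi(i) - phi(i - 1) for i in range(1, m + 1)]   # nonincreasing increments
    slope = d[-1]                                        # modular part: d_m * t
    coeffs = {i: d[i - 1] - d[i] for i in range(1, m)}   # d_i - d_{i+1} >= 0
    return phi(0), slope, coeffs

phi = lambda t: math.sqrt(t)      # an arbitrary concave function, for illustration
m = 5                             # plays the role of |S|
c0, slope, coeffs = decompose_concave_cardinality(phi, m)
for t in range(m + 1):
    recon = c0 + slope * t + sum(c * min(t, i) for i, c in coeffs.items())
    assert abs(recon - phi(t)) < 1e-9
print("decomposition verified; threshold weights:", coeffs)
```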
44 Classification performance can be improved by adding terms corresponding to such higher order potentials φ_j(|R_j ∩ A|) to the objective function, where the functions φ_j are piecewise linear concave functions, and the regions R_j of various sizes are generated from a segmentation algorithm. [sent-119, score-0.391]
45 Another canonical example of a submodular function is a set cover function. [sent-121, score-0.58]
46 Such a function can be reformulated as a combination of concave cardinality functions (details omitted here). [sent-122, score-0.212]
47 Functions which are weighted combinations of set cover functions can be expressed as threshold potentials. [sent-125, score-0.15]
48 However, threshold potentials with nonuniform weights are strictly more general than concave cardinality potentials. [sent-126, score-0.322]
49 That is, there exist w and y such that Φ_{w,y}(A) cannot be expressed as Σ_j φ_j(|R_j ∩ A|) for any collection of concave φ_j and sets R_j. [sent-127, score-0.134]
50 Another example of decomposable functions arises in multiclass queueing systems [10]. [sent-128, score-0.171]
51 These are of the form f(A) = c · e_A + (u · e_A) φ(v · e_A), where u, v are nonnegative weight vectors and φ is a nonincreasing concave function. [sent-129, score-0.158]
52 That is, we use TextonBoost to generate per-pixel scores c, and then minimize f(A) = c · e_A + Σ_j |A ∩ R_j| · |R_j \ A|, where the regions R_j are regions of pixels that we expect to be of the same class (e.g. [sent-133, score-0.16]
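A toy version of this objective is easy to set up; in the sketch below the scores c, the regions R_j, the weight, and the candidate labeling are all made up for illustration and are not the paper's experimental setup.

```python
import numpy as np

# A toy version of the objective above: per-pixel scores c plus region terms
# |A ∩ R_j| * |R_j \ A|, which penalize splitting a region across classes.
rng = np.random.default_rng(0)
n = 12                                           # number of "pixels"
c = rng.normal(size=n)                           # per-pixel scores
regions = [set(range(0, 6)), set(range(4, 10))]  # overlapping regions (made up)

def objective(A, weight=0.1):
    modular = sum(c[k] for k in A)
    region_terms = sum(len(A & R) * len(R - A) for R in regions)
    return modular + weight * region_terms

A = {k for k in range(n) if c[k] < 0}            # a simple candidate labeling
print("f(A) =", objective(A))
```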
53 In our experience, this simple idea of using higher-order potentials can dramatically increase the quality of the classiÄ? [sent-142, score-0.13]
54 We now turn to the efficient minimization of a decomposable submodular function f based on smoothed convex minimization. [sent-145, score-0.998]
55 Our key idea is to efficiently smooth the Lovász extension of f. [sent-148, score-0.261]
56 We then apply accelerated gradient descent to the gradient of the smoothed function. [sent-149, score-0.279]
57 The challenge is to efficiently smooth the Lovász extension of f, so that we can resort to algorithms for accelerated convex minimization. [sent-154, score-0.343]
58 Our strategy is to efficiently smooth the threshold potentials Φ_{w,y}(A) = min(y, w · e_A) of Section 3, which are simple enough to allow efficient [sent-156, score-0.216]
59 smoothing, but rich enough when combined to express a large class of submodular functions. [sent-157, score-0.58]
60 For w, y ≥ 0, the Lovász extension of Φ_{w,y} is f̃_{w,y}(x) = max v · x s.t. 0 ≤ v ≤ w, v · 1 ≤ y. [sent-159, score-0.233]
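One way to smooth such a max-form extension is the generic Nesterov device of subtracting a strongly convex prox term from the inner maximization; the sketch below does this with (μ/2)‖v‖² and solves the resulting separable problem by bisection on the multiplier of the coupling constraint. The paper's Equation 4 may use a different prox function, so treat this purely as an illustration of the idea; the numbers in the example call are arbitrary.

```python
import numpy as np

def smoothed_threshold_gradient(x, w, y, mu):
    """Smooth f(x) = max{ v.x : 0 <= v <= w, sum(v) <= y } by subtracting
    (mu/2)||v||^2 (a generic Nesterov-style prox term; the paper's exact
    smoothing may differ). Returns (smoothed value, gradient = maximizing v).
    The inner QP is solved by bisection on the multiplier of sum(v) <= y."""
    x, w = np.asarray(x, float), np.asarray(w, float)

    def v_of(lam):
        return np.clip((x - lam) / mu, 0.0, w)

    v = v_of(0.0)
    if v.sum() > y:                       # coupling constraint active: find lam
        lo, hi = 0.0, float(x.max())
        for _ in range(100):              # bisection on a monotone quantity
            mid = 0.5 * (lo + hi)
            if v_of(mid).sum() > y:
                lo = mid
            else:
                hi = mid
        v = v_of(hi)                      # feasible maximizer
    value = float(v @ x - 0.5 * mu * v @ v)
    return value, v                       # v is also the (1/mu)-Lipschitz gradient

# Illustrative call with made-up numbers:
val, grad = smoothed_threshold_gradient(x=[0.9, 0.2, 0.5], w=[1.0, 1.0, 1.0],
                                        y=1.0, mu=0.05)
print(val, grad)
```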
61 4.2 The SLG Algorithm for Minimizing Sums of Threshold Potentials. Stepping beyond a single threshold potential, we now assume that the submodular function to be minimized can be written as a nonnegative linear combination of threshold potentials and a modular function, i.e. [sent-182, score-0.93]
62 This algorithm requires that the smoothed objective has a Lipschitz continuous gradient. [sent-185, score-0.182]
63 Fortunately, by construction, the smoothed threshold extensions all have 1/μ-Lipschitz gradients, a direct consequence of the characterization in Equation 4. [sent-187, score-0.21]
64 Furthermore, the smoothed threshold extensions approximate the original threshold extensions uniformly (to within a constant multiple of μ), so |f̃^μ(x) − f̃(x)| ≤ μD for all x, for a constant D determined by the decomposition. [sent-189, score-0.331]
65 One way to use the smoothed gradient is to specify an accuracy ε, then minimize f̃^μ for sufficiently small μ. [sent-190, score-0.223]
66 Algorithm 1 formalizes our Smoothed Lovász Gradient (SLG) algorithm. Algorithm 1: SLG: Smoothed Lovász Gradient. Input: Accuracy ε; decomposable function f. [sent-196, score-0.509]
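For reference, a generic projected Nesterov/FISTA-style loop of the kind SLG builds on looks as follows; the momentum schedule and the toy quadratic objective are standard illustrative choices, not the paper's exact Algorithm 1.

```python
import numpy as np

def accelerated_gradient(grad, x0, lipschitz, steps=200):
    """Generic Nesterov-style accelerated gradient descent for a convex function
    with L-Lipschitz gradient, projected onto the unit cube [0,1]^n.
    One of several standard variants; the paper's Algorithm 1 may use a
    different momentum/parameter schedule."""
    x = y = np.asarray(x0, float)
    t = 1.0
    for _ in range(steps):
        x_new = np.clip(y - grad(y) / lipschitz, 0.0, 1.0)    # projected step
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        y = x_new + ((t - 1.0) / t_new) * (x_new - x)          # momentum
        x, t = x_new, t_new
    return x

# Example: minimize 0.5*||x||^2 + c.x over the cube (a smooth stand-in for the
# smoothed Lovász extension); c is an arbitrary illustrative vector.
c = np.array([0.3, -0.8, 0.1, -0.2])
x_star = accelerated_gradient(grad=lambda x: x + c,
                              x0=0.5 * np.ones(4), lipschitz=1.0)
print(np.round(x_star, 3))   # approx [0, 0.8, 0, 0.2]
```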
67 Fortunately, once we have an ε-optimal point for the Lovász extension, we can [sent-199, score-0.176]
68 efficiently round it to a set which is ε-optimal for the original submodular function using Alg. [sent-200, score-0.58]
69 Algorithm 2: Set generation by rounding the continuous solution. Input: Vector x ∈ [0, 1]^n; submodular function f. [sent-202, score-0.61]
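A sketch of the standard rounding step: evaluate f on every level set of x (thresholding at each sorted coordinate) and keep the best one, which is never worse than the Lovász extension value at x. The toy decomposable function in the example is made up.

```python
import numpy as np

def round_to_set(f, x):
    """Round a fractional point x in [0,1]^n to a set by checking every
    threshold level set {k : x_k >= theta} and keeping the best one.
    By properties of the Lovász extension, the best level set is no worse
    than f_tilde(x). This is a sketch of the standard rounding step."""
    order = np.argsort(-np.asarray(x, float))
    best_set, best_val = frozenset(), f(frozenset())
    current = []
    for k in order:
        current.append(int(k))
        val = f(frozenset(current))
        if val < best_val:
            best_set, best_val = frozenset(current), val
    return best_set, best_val

# Example with a toy decomposable function: modular part plus one threshold term.
c = np.array([0.4, -1.0, 0.3, -0.2])
f = lambda A: sum(c[k] for k in A) + min(1.0, 0.5 * len(A))
print(round_to_set(f, x=[0.1, 0.9, 0.05, 0.6]))   # (frozenset({1}), -0.5)
```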
70 Below, we show that it is possible to use information about the current iterates to check optimality of a set and terminate the algorithm before the continuous problem has converged. [sent-211, score-0.131]
71 To prove optimality of a candidate set A, we can use a subgradient of f̃ at e_A. [sent-212, score-0.181]
72 However, finding a subgradient g ∈ ∂f̃(e_A) which demonstrates optimality may be extremely difficult. [sent-216, score-0.117]
73 But our algorithm naturally suggests the subgradient we could use; the gradient of the smoothed extension is one such subgradient – provided a certain condition is satisfied. [sent-218, score-0.349]
74 Lemma 1 states that if the components of the point x corresponding to elements of A are all larger than all the other components by at least μ, then the gradient at x is a subgradient for f̃ at e_A (which by Equation 5 allows us to compute an optimality gap). [sent-223, score-0.161]
75 But even if the components are not separated, we can easily add a positive multiple of eA to separate them and then compute the gradient there to get an optimality gap. [sent-225, score-0.113]
76 But by Equation 5, for a set to be certified optimal we need a suitable subgradient; this is what Algorithm 3 provides. Algorithm 3: Set Optimality Check. Input: Set A; decomposable function f; scale parameter; x ∈ R^n. [sent-227, score-0.157]
77 begin: compute a separation δ from max_{k∈A, l∈E\A} (x_l − x_k); set g ← ∇f̃(x + δ e_A); compute gap from the entries of g on A and E \ A. Output: gap, which satisfies f(A) − min_B f(B) ≤ gap. [sent-228, score-0.152]
78 So it is natural to choose A = {k : [∇f̃(x)]_k ≤ 0}. [sent-231, score-0.133]
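The bookkeeping of Algorithm 3 is garbled in this extraction, but the underlying certificate is simple: any subgradient g of the Lovász extension at e_A yields a duality gap by minimizing the linear lower bound over the unit cube. The sketch below computes that gap; it illustrates the idea rather than reproducing the paper's exact algorithm.

```python
import numpy as np

def optimality_gap(g, A, n):
    """Given a subgradient g of the Lovász extension at e_A, the set A, and the
    ground-set size n, return a bound 'gap' with f(A) - min_B f(B) <= gap.
    It comes from minimizing the linear lower bound f(A) + g.(x - e_A) over the
    unit cube: x_k = 0 where g_k > 0 and x_k = 1 where g_k < 0."""
    g = np.asarray(g, float)
    in_A = np.zeros(n, dtype=bool)
    in_A[list(A)] = True
    gap = g[in_A].clip(min=0.0).sum() + (-g[~in_A]).clip(min=0.0).sum()
    return float(gap)

# If A = {k : g_k <= 0}, as suggested above, the gap is zero:
g = np.array([0.2, -0.1, 0.4, -0.3])
print(optimality_gap(g, A={1, 3}, n=4))   # 0.0
```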
79 5 Extension to General Concave Potentials. To extend our algorithm to work on general concave functions, we note that an arbitrary concave function can be expressed as an integral of threshold potential functions. [sent-235, score-0.354]
80 Lemma 2: For φ ∈ C²[0, T], φ(x) = φ(0) + φ'(T) x − ∫_0^T min(x, y) φ''(y) dy, for all x ∈ [0, T]. This means that for a general sum of concave potentials as in Equation (3), we have: [sent-237, score-0.264]
81 f(A) = c · e_A + Σ_j [ φ_j(0) + φ_j'(w_j · 1) (w_j · e_A) − ∫_0^{w_j · 1} Φ_{w_j, y}(A) φ_j''(y) dy ]. Then we can define the smoothed gradient analogously. [sent-238, score-0.211]
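Lemma 2 can be sanity-checked numerically by discretizing the integral; in the sketch below φ, T, and the quadrature resolution are arbitrary illustrative choices.

```python
import numpy as np

# Numerical check of the identity behind Lemma 2:
#   phi(x) = phi(0) + phi'(T)*x - integral_0^T min(x, y) phi''(y) dy,  x in [0, T].
# Here phi(s) = sqrt(s + 1) is an arbitrary smooth concave function and the
# integral is approximated with a simple midpoint rule.
phi   = lambda s: np.sqrt(s + 1.0)
dphi  = lambda s: 0.5 / np.sqrt(s + 1.0)
ddphi = lambda s: -0.25 * (s + 1.0) ** -1.5

T, m = 4.0, 20000
ys = (np.arange(m) + 0.5) * (T / m)          # midpoints of the quadrature grid
wts = -ddphi(ys) * (T / m)                   # nonnegative weights, since phi'' <= 0

for x in [0.0, 0.7, 2.3, 4.0]:
    approx = phi(0.0) + dphi(T) * x + np.sum(np.minimum(x, ys) * wts)
    print(x, approx, phi(x))                 # the two values should nearly agree
```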
82 Conceptually, we just use a different smoothed gradient, but calculating it is more involved. [sent-241, score-0.152]
83 We reproduce the experimental setup of [8] designed to compare submodular minimization algorithms. [sent-246, score-0.678]
84 The first task is to find the minimum cut of a randomly generated graph (which requires submodular minimization of a sum of 2-potentials), with the graph generated by the specified procedure. [sent-248, score-0.678]
85 To generate regions for our potentials, we randomly picked seed pixels and grew the regions based on HSV channels of the image. [sent-266, score-0.155]
86 We picked our seed pixels with a preference for pixels which were included in the least number of previously generated regions. [sent-267, score-0.132]
87 To generate the result shown in Figure 2(k), a problem with 10^4 variables and concave potentials, our MATLAB/mex implementation of SLG took 71. [sent-276, score-0.134]
88 We have presented a method for efficiently minimizing a large class of submodular functions of practical importance. [sent-281, score-0.651]
89 We do so by decomposing the function into a sum of threshold potentials, whose Lovász extensions are convenient for using modern smoothing techniques of convex optimization. [sent-282, score-0.277]
90 This allows us to solve submodular minimization problems with thousands of variables that cannot be expressed using only pairwise potentials. [sent-283, score-0.701]
91 Our method fills a gap between algorithms designed for specific types of submodular minimization problems, and combinatorial algorithms which assume nothing but submodularity but are impractical for large-scale problems. [sent-285, score-0.79]
92 Bach, Structured sparsity-inducing norms through submodular functions, Advances in Neural Information Processing Systems (2010). [sent-290, score-0.58]
93 Efficient solutions to relaxations of combinatorial problems with submodular penalties via the Lovász extension and non-smooth convex optimization, Proceedings of the eighteenth annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics, 2007, pp. [sent-301, score-0.916]
94 Iwata, A push-relabel framework for submodular function minimization and applications to parametric optimization, Discrete Applied Mathematics 131 (2003), no. [sent-316, score-0.678]
95 Kale, Beyond convexity: Online submodular minimization, Advances in Neural Information Processing Systems 22 (Y. [sent-326, score-0.58]
96 Iwata, Computational geometric approach to submodular function minimization for multiclass queueing systems, Integer Programming and Combinatorial Optimization (2007), 267–279. [sent-338, score-0.678]
97 Fujishige, A combinatorial strongly polynomial algorithm for minimizing submodular functions, Journal of the ACM (JACM) 48 (2001), no. [sent-342, score-0.688]
98 Orlin, A simple combinatorial algorithm for submodular function minimization, Proceedings of the Nineteenth Annual ACM-SIAM Symposium on Discrete Algorithms, Society for Industrial and Applied Mathematics, 2009, pp. [sent-347, score-0.64]
99 Fisher, An analysis of the approximations for maximizing submodular set functions, Mathematical Programming 14 (1978), 265–294. [sent-370, score-0.58]
100 Queyranne, Minimizing symmetric submodular functions, Mathematical Programming 82 (1998), no. [sent-375, score-0.58]
wordName wordTfidf (topN-words)
[('submodular', 0.58), ('ea', 0.356), ('slg', 0.335), ('aa', 0.19), ('sz', 0.176), ('lov', 0.171), ('smoothed', 0.152), ('concave', 0.134), ('potentials', 0.13), ('ia', 0.129), ('decomposable', 0.125), ('minimization', 0.098), ('rf', 0.098), ('minnorm', 0.096), ('textonboost', 0.09), ('rg', 0.09), ('modular', 0.08), ('ja', 0.072), ('rj', 0.069), ('optimality', 0.069), ('xa', 0.068), ('ekl', 0.064), ('kpa', 0.064), ('vph', 0.064), ('wj', 0.063), ('yt', 0.062), ('xp', 0.061), ('ha', 0.06), ('combinatorial', 0.06), ('threshold', 0.058), ('extension', 0.057), ('pixels', 0.055), ('sk', 0.054), ('dj', 0.052), ('submodularity', 0.052), ('subgradient', 0.048), ('iwata', 0.048), ('jjrj', 0.048), ('stobbe', 0.048), ('yj', 0.046), ('functions', 0.046), ('gradient', 0.044), ('convex', 0.043), ('dkl', 0.042), ('accelerated', 0.039), ('figures', 0.039), ('regions', 0.039), ('polyhedron', 0.039), ('hh', 0.036), ('fk', 0.035), ('certi', 0.034), ('running', 0.034), ('gap', 0.033), ('pf', 0.033), ('discrete', 0.033), ('iterates', 0.032), ('reformulated', 0.032), ('arising', 0.032), ('dimacs', 0.032), ('gapt', 0.032), ('ih', 0.032), ('jaj', 0.032), ('jrj', 0.032), ('lgj', 0.032), ('lpe', 0.032), ('naj', 0.032), ('sfo', 0.032), ('min', 0.032), ('na', 0.031), ('krause', 0.03), ('continuous', 0.03), ('rn', 0.029), ('pasadena', 0.028), ('fleischer', 0.028), ('fv', 0.028), ('kvk', 0.028), ('smooth', 0.028), ('potential', 0.028), ('minimize', 0.027), ('tpr', 0.026), ('fujishige', 0.026), ('queyranne', 0.026), ('aj', 0.025), ('minimizing', 0.025), ('ac', 0.024), ('cate', 0.024), ('cube', 0.024), ('nonnegative', 0.024), ('pairwise', 0.023), ('polynomial', 0.023), ('kohli', 0.023), ('torr', 0.023), ('wh', 0.023), ('dy', 0.022), ('seed', 0.022), ('permutation', 0.022), ('piecewise', 0.021), ('segmentation', 0.021), ('begin', 0.021), ('kx', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 69 nips-2010-Efficient Minimization of Decomposable Submodular Functions
Author: Peter Stobbe, Andreas Krause
Abstract: Many combinatorial problems arising in machine learning can be reduced to the problem of minimizing a submodular function. Submodular functions are a natural discrete analog of convex functions, and can be minimized in strongly polynomial time. Unfortunately, state-of-the-art algorithms for general submodular minimization are intractable for larger problems. In this paper, we introduce a novel subclass of submodular minimization problems that we call decomposable. Decomposable submodular functions are those that can be represented as sums of concave functions applied to modular functions. We develop an algorithm, SLG, that can efficiently minimize decomposable submodular functions with tens of thousands of variables. Our algorithm exploits recent results in smoothed convex minimization. We apply SLG to synthetic benchmarks and a joint classification-and-segmentation task, and show that it outperforms the state-of-the-art general purpose submodular minimization algorithms by several orders of magnitude. 1
2 0.48474485 258 nips-2010-Structured sparsity-inducing norms through submodular functions
Author: Francis R. Bach
Abstract: Sparse methods for supervised learning aim at finding good linear predictors from as few variables as possible, i.e., with small cardinality of their supports. This combinatorial selection problem is often turned into a convex optimization problem by replacing the cardinality function by its convex envelope (tightest convex lower bound), in this case the ℓ1-norm. In this paper, we investigate more general set-functions than the cardinality, that may incorporate prior knowledge or structural constraints which are common in many applications: namely, we show that for nondecreasing submodular set-functions, the corresponding convex envelope can be obtained from its Lovász extension, a common tool in submodular analysis. This defines a family of polyhedral norms, for which we provide generic algorithmic tools (subgradients and proximal operators) and theoretical results (conditions for support recovery or high-dimensional inference). By selecting specific submodular functions, we can give a new interpretation to known norms, such as those based on rank-statistics or grouped norms with potentially overlapping groups; we also define new norms, in particular ones that can be used as non-factorial priors for supervised learning.
3 0.33118981 166 nips-2010-Minimum Average Cost Clustering
Author: Kiyohito Nagano, Yoshinobu Kawahara, Satoru Iwata
Abstract: A number of objective functions in clustering problems can be described with submodular functions. In this paper, we introduce the minimum average cost criterion, and show that the theory of intersecting submodular functions can be used for clustering with submodular objective functions. The proposed algorithm does not require the number of clusters in advance, and it will be determined by the property of a given set of data points. The minimum average cost clustering problem is parameterized with a real variable, and surprisingly, we show that all information about optimal clusterings for all parameters can be computed in polynomial time in total. Additionally, we evaluate the performance of the proposed algorithm through computational experiments. 1
4 0.2700173 47 nips-2010-Co-regularization Based Semi-supervised Domain Adaptation
Author: Abhishek Kumar, Avishek Saha, Hal Daume
Abstract: This paper presents a co-regularization based approach to semi-supervised domain adaptation. Our proposed approach (EA++) builds on the notion of augmented space (introduced in E ASYA DAPT (EA) [1]) and harnesses unlabeled data in target domain to further assist the transfer of information from source to target. This semi-supervised approach to domain adaptation is extremely simple to implement and can be applied as a pre-processing step to any supervised learner. Our theoretical analysis (in terms of Rademacher complexity) of EA and EA++ show that the hypothesis class of EA++ has lower complexity (compared to EA) and hence results in tighter generalization bounds. Experimental results on sentiment analysis tasks reinforce our theoretical findings and demonstrate the efficacy of the proposed method when compared to EA as well as few other representative baseline approaches.
5 0.22797899 102 nips-2010-Generalized roof duality and bisubmodular functions
Author: Vladimir Kolmogorov
Abstract: Consider a convex relaxation f̂ of a pseudo-boolean function f. We say that the relaxation is totally half-integral if f̂(x) is a polyhedral function with half-integral extreme points x, and this property is preserved after adding an arbitrary combination of constraints of the form x_i = x_j, x_i = 1 − x_j, and x_i = γ where γ ∈ {0, 1, 1/2} is a constant. A well-known example is the roof duality relaxation for quadratic pseudo-boolean functions f. We argue that total half-integrality is a natural requirement for generalizations of roof duality to arbitrary pseudo-boolean functions. Our contributions are as follows. First, we provide a complete characterization of totally half-integral relaxations f̂ by establishing a one-to-one correspondence with bisubmodular functions. Second, we give a new characterization of bisubmodular functions. Finally, we show some relationships between general totally half-integral relaxations and relaxations based on the roof duality. 1
6 0.096500367 1 nips-2010-(RF)^2 -- Random Forest Random Field
7 0.083705239 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations
8 0.070436865 68 nips-2010-Effects of Synaptic Weight Diffusion on Learning in Decision Making Networks
9 0.068018191 222 nips-2010-Random Walk Approach to Regret Minimization
10 0.062498506 181 nips-2010-Network Flow Algorithms for Structured Sparsity
11 0.060585044 63 nips-2010-Distributed Dual Averaging In Networks
12 0.052980412 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
13 0.050983131 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
14 0.049401831 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
15 0.049014907 165 nips-2010-MAP estimation in Binary MRFs via Bipartite Multi-cuts
16 0.046941556 225 nips-2010-Relaxed Clipping: A Global Training Method for Robust Regression and Classification
17 0.046494331 239 nips-2010-Sidestepping Intractable Inference with Structured Ensemble Cascades
18 0.045800049 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
19 0.044669971 38 nips-2010-Batch Bayesian Optimization via Simulation Matching
20 0.042359248 283 nips-2010-Variational Inference over Combinatorial Spaces
topicId topicWeight
[(0, 0.153), (1, 0.036), (2, 0.107), (3, 0.044), (4, 0.021), (5, -0.159), (6, -0.088), (7, 0.15), (8, 0.436), (9, 0.045), (10, 0.231), (11, 0.186), (12, 0.135), (13, -0.103), (14, -0.277), (15, 0.255), (16, 0.014), (17, -0.042), (18, 0.009), (19, 0.107), (20, -0.128), (21, -0.01), (22, 0.023), (23, -0.033), (24, 0.068), (25, 0.112), (26, 0.026), (27, -0.038), (28, 0.027), (29, -0.022), (30, 0.026), (31, 0.021), (32, -0.033), (33, -0.036), (34, -0.013), (35, -0.025), (36, 0.023), (37, 0.066), (38, 0.003), (39, -0.028), (40, 0.033), (41, -0.027), (42, 0.037), (43, 0.05), (44, 0.008), (45, -0.001), (46, -0.012), (47, -0.008), (48, -0.006), (49, -0.062)]
simIndex simValue paperId paperTitle
same-paper 1 0.95343697 69 nips-2010-Efficient Minimization of Decomposable Submodular Functions
Author: Peter Stobbe, Andreas Krause
Abstract: Many combinatorial problems arising in machine learning can be reduced to the problem of minimizing a submodular function. Submodular functions are a natural discrete analog of convex functions, and can be minimized in strongly polynomial time. Unfortunately, state-of-the-art algorithms for general submodular minimization are intractable for larger problems. In this paper, we introduce a novel subclass of submodular minimization problems that we call decomposable. Decomposable submodular functions are those that can be represented as sums of concave functions applied to modular functions. We develop an algorithm, SLG, that can efficiently minimize decomposable submodular functions with tens of thousands of variables. Our algorithm exploits recent results in smoothed convex minimization. We apply SLG to synthetic benchmarks and a joint classification-and-segmentation task, and show that it outperforms the state-of-the-art general purpose submodular minimization algorithms by several orders of magnitude. 1
2 0.84187108 258 nips-2010-Structured sparsity-inducing norms through submodular functions
Author: Francis R. Bach
Abstract: Sparse methods for supervised learning aim at finding good linear predictors from as few variables as possible, i.e., with small cardinality of their supports. This combinatorial selection problem is often turned into a convex optimization problem by replacing the cardinality function by its convex envelope (tightest convex lower bound), in this case the ℓ1-norm. In this paper, we investigate more general set-functions than the cardinality, that may incorporate prior knowledge or structural constraints which are common in many applications: namely, we show that for nondecreasing submodular set-functions, the corresponding convex envelope can be obtained from its Lovász extension, a common tool in submodular analysis. This defines a family of polyhedral norms, for which we provide generic algorithmic tools (subgradients and proximal operators) and theoretical results (conditions for support recovery or high-dimensional inference). By selecting specific submodular functions, we can give a new interpretation to known norms, such as those based on rank-statistics or grouped norms with potentially overlapping groups; we also define new norms, in particular ones that can be used as non-factorial priors for supervised learning.
3 0.77349669 102 nips-2010-Generalized roof duality and bisubmodular functions
Author: Vladimir Kolmogorov
Abstract: Consider a convex relaxation f̂ of a pseudo-boolean function f. We say that the relaxation is totally half-integral if f̂(x) is a polyhedral function with half-integral extreme points x, and this property is preserved after adding an arbitrary combination of constraints of the form x_i = x_j, x_i = 1 − x_j, and x_i = γ where γ ∈ {0, 1, 1/2} is a constant. A well-known example is the roof duality relaxation for quadratic pseudo-boolean functions f. We argue that total half-integrality is a natural requirement for generalizations of roof duality to arbitrary pseudo-boolean functions. Our contributions are as follows. First, we provide a complete characterization of totally half-integral relaxations f̂ by establishing a one-to-one correspondence with bisubmodular functions. Second, we give a new characterization of bisubmodular functions. Finally, we show some relationships between general totally half-integral relaxations and relaxations based on the roof duality. 1
4 0.74931234 166 nips-2010-Minimum Average Cost Clustering
Author: Kiyohito Nagano, Yoshinobu Kawahara, Satoru Iwata
Abstract: A number of objective functions in clustering problems can be described with submodular functions. In this paper, we introduce the minimum average cost criterion, and show that the theory of intersecting submodular functions can be used for clustering with submodular objective functions. The proposed algorithm does not require the number of clusters in advance, and it will be determined by the property of a given set of data points. The minimum average cost clustering problem is parameterized with a real variable, and surprisingly, we show that all information about optimal clusterings for all parameters can be computed in polynomial time in total. Additionally, we evaluate the performance of the proposed algorithm through computational experiments. 1
5 0.37779635 47 nips-2010-Co-regularization Based Semi-supervised Domain Adaptation
Author: Abhishek Kumar, Avishek Saha, Hal Daume
Abstract: This paper presents a co-regularization based approach to semi-supervised domain adaptation. Our proposed approach (EA++) builds on the notion of augmented space (introduced in E ASYA DAPT (EA) [1]) and harnesses unlabeled data in target domain to further assist the transfer of information from source to target. This semi-supervised approach to domain adaptation is extremely simple to implement and can be applied as a pre-processing step to any supervised learner. Our theoretical analysis (in terms of Rademacher complexity) of EA and EA++ show that the hypothesis class of EA++ has lower complexity (compared to EA) and hence results in tighter generalization bounds. Experimental results on sentiment analysis tasks reinforce our theoretical findings and demonstrate the efficacy of the proposed method when compared to EA as well as few other representative baseline approaches.
6 0.26231444 180 nips-2010-Near-Optimal Bayesian Active Learning with Noisy Observations
7 0.21990155 68 nips-2010-Effects of Synaptic Weight Diffusion on Learning in Decision Making Networks
8 0.21379682 181 nips-2010-Network Flow Algorithms for Structured Sparsity
9 0.20535102 1 nips-2010-(RF)^2 -- Random Forest Random Field
10 0.19954772 2 nips-2010-A Bayesian Approach to Concept Drift
11 0.1989948 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
12 0.18997762 274 nips-2010-Trading off Mistakes and Don't-Know Predictions
13 0.18541403 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
14 0.17997408 222 nips-2010-Random Walk Approach to Regret Minimization
15 0.17783572 234 nips-2010-Segmentation as Maximum-Weight Independent Set
16 0.17693314 86 nips-2010-Exploiting weakly-labeled Web images to improve object classification: a domain adaptation approach
17 0.17374796 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
18 0.17226912 283 nips-2010-Variational Inference over Combinatorial Spaces
19 0.17097051 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
20 0.16652599 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
topicId topicWeight
[(13, 0.039), (27, 0.048), (30, 0.059), (33, 0.02), (35, 0.035), (45, 0.2), (50, 0.046), (52, 0.029), (60, 0.034), (77, 0.037), (78, 0.026), (90, 0.024), (98, 0.312)]
simIndex simValue paperId paperTitle
same-paper 1 0.75444657 69 nips-2010-Efficient Minimization of Decomposable Submodular Functions
Author: Peter Stobbe, Andreas Krause
Abstract: Many combinatorial problems arising in machine learning can be reduced to the problem of minimizing a submodular function. Submodular functions are a natural discrete analog of convex functions, and can be minimized in strongly polynomial time. Unfortunately, state-of-the-art algorithms for general submodular minimization are intractable for larger problems. In this paper, we introduce a novel subclass of submodular minimization problems that we call decomposable. Decomposable submodular functions are those that can be represented as sums of concave functions applied to modular functions. We develop an algorithm, SLG, that can efficiently minimize decomposable submodular functions with tens of thousands of variables. Our algorithm exploits recent results in smoothed convex minimization. We apply SLG to synthetic benchmarks and a joint classification-and-segmentation task, and show that it outperforms the state-of-the-art general purpose submodular minimization algorithms by several orders of magnitude. 1
2 0.71459854 96 nips-2010-Fractionally Predictive Spiking Neurons
Author: Jaldert Rombouts, Sander M. Bohte
Abstract: Recent experimental work has suggested that the neural firing rate can be interpreted as a fractional derivative, at least when signal variation induces neural adaptation. Here, we show that the actual neural spike-train itself can be considered as the fractional derivative, provided that the neural signal is approximated by a sum of power-law kernels. A simple standard thresholding spiking neuron suffices to carry out such an approximation, given a suitable refractory response. Empirically, we find that the online approximation of signals with a sum of powerlaw kernels is beneficial for encoding signals with slowly varying components, like long-memory self-similar signals. For such signals, the online power-law kernel approximation typically required less than half the number of spikes for similar SNR as compared to sums of similar but exponentially decaying kernels. As power-law kernels can be accurately approximated using sums or cascades of weighted exponentials, we demonstrate that the corresponding decoding of spiketrains by a receiving neuron allows for natural and transparent temporal signal filtering by tuning the weights of the decoding kernel. 1
3 0.63469809 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction
Author: Tamir Hazan, Raquel Urtasun
Abstract: In this paper we propose an approximated structured prediction framework for large scale graphical models and derive message-passing algorithms for learning their parameters efficiently. We first relate CRFs and structured SVMs and show that in CRFs a variant of the log-partition function, known as the soft-max, smoothly approximates the hinge loss function of structured SVMs. We then propose an intuitive approximation for the structured prediction problem, using duality, based on a local entropy approximation and derive an efficient messagepassing algorithm that is guaranteed to converge. Unlike existing approaches, this allows us to learn efficiently graphical models with cycles and very large number of parameters. 1
4 0.59288508 63 nips-2010-Distributed Dual Averaging In Networks
Author: Alekh Agarwal, Martin J. Wainwright, John C. Duchi
Abstract: The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. We develop and analyze distributed algorithms based on dual averaging of subgradients, and provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis clearly separates the convergence of the optimization algorithm itself from the effects of communication constraints arising from the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. 1
5 0.59141707 258 nips-2010-Structured sparsity-inducing norms through submodular functions
Author: Francis R. Bach
Abstract: Sparse methods for supervised learning aim at finding good linear predictors from as few variables as possible, i.e., with small cardinality of their supports. This combinatorial selection problem is often turned into a convex optimization problem by replacing the cardinality function by its convex envelope (tightest convex lower bound), in this case the ℓ1-norm. In this paper, we investigate more general set-functions than the cardinality, that may incorporate prior knowledge or structural constraints which are common in many applications: namely, we show that for nondecreasing submodular set-functions, the corresponding convex envelope can be obtained from its Lovász extension, a common tool in submodular analysis. This defines a family of polyhedral norms, for which we provide generic algorithmic tools (subgradients and proximal operators) and theoretical results (conditions for support recovery or high-dimensional inference). By selecting specific submodular functions, we can give a new interpretation to known norms, such as those based on rank-statistics or grouped norms with potentially overlapping groups; we also define new norms, in particular ones that can be used as non-factorial priors for supervised learning.
6 0.58985609 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups
7 0.58955324 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
8 0.58886153 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
9 0.58852738 148 nips-2010-Learning Networks of Stochastic Differential Equations
10 0.58846945 243 nips-2010-Smoothness, Low Noise and Fast Rates
11 0.58821207 49 nips-2010-Computing Marginal Distributions over Continuous Markov Networks for Statistical Relational Learning
12 0.58817291 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
13 0.58681703 287 nips-2010-Worst-Case Linear Discriminant Analysis
14 0.58677167 155 nips-2010-Learning the context of a category
15 0.58673352 277 nips-2010-Two-Layer Generalization Analysis for Ranking Using Rademacher Average
16 0.58656532 158 nips-2010-Learning via Gaussian Herding
17 0.58638227 290 nips-2010-t-logistic regression
18 0.58584869 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework
19 0.58571965 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior
20 0.58541328 199 nips-2010-Optimal learning rates for Kernel Conjugate Gradient regression