nips nips2010 nips2010-181 knowledge-graph by maker-knowledge-mining

181 nips-2010-Network Flow Algorithms for Structured Sparsity


Source: pdf

Author: Julien Mairal, Rodolphe Jenatton, Francis R. Bach, Guillaume R. Obozinski

Abstract: We consider a class of learning problems that involve a structured sparsity-inducing norm defined as the sum of ℓ∞ -norms over groups of variables. Whereas a lot of effort has been put in developing fast optimization methods when the groups are disjoint or embedded in a specific hierarchical structure, we address here the case of general overlapping groups. To this end, we show that the corresponding optimization problem is related to network flow optimization. More precisely, the proximal problem associated with the norm we consider is dual to a quadratic min-cost flow problem. We propose an efficient procedure which computes its solution exactly in polynomial time. Our algorithm scales up to millions of variables, and opens up a whole new range of applications for structured sparse models. We present several experiments on image and video data, demonstrating the applicability and scalability of our approach for various problems.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract We consider a class of learning problems that involve a structured sparsity-inducing norm defined as the sum of ℓ∞ -norms over groups of variables. [sent-9, score-0.401]

2 Whereas a lot of effort has been put in developing fast optimization methods when the groups are disjoint or embedded in a specific hierarchical structure, we address here the case of general overlapping groups. [sent-10, score-0.381]

3 More precisely, the proximal problem associated with the norm we consider is dual to a quadratic min-cost flow problem. [sent-12, score-0.482]

4 Our algorithm scales up to millions of variables, and opens up a whole new range of applications for structured sparse models. [sent-14, score-0.159]

5 By considering sums of norms of appropriate subsets, or groups, of variables, these regularizations control the sparsity patterns of the solutions. [sent-23, score-0.105]

6 While the settings where the penalized groups of variables do not overlap or are embedded in a tree-shaped hierarchy [12] have already been studied, regularizations with general overlapping groups have, to the best of our knowledge, never been addressed with proximal methods. [sent-26, score-0.884]

7 This paper makes the following contributions: − It shows that the proximal operator associated with the structured norm we consider can be [sent-27, score-0.469]

8 computed with a fast and scalable procedure by solving a quadratic min-cost flow problem. [sent-28, score-0.092]

9 − It shows that the dual norm of the sparsity-inducing norm we consider can also be evaluated efficiently, which enables us to compute duality gaps for the corresponding optimization problems. [sent-29, score-0.267]

10 − It demonstrates that our method is relevant for various applications, from video background subtraction to estimation of hierarchical structures for dictionary learning of natural image patches. [sent-30, score-0.296]

11 Denoting by G a set of groups of indices, such a penalty might for example take the form: Ω(w) = Σ_{g∈G} ηg max_{j∈g} |wj| = Σ_{g∈G} ηg ‖wg‖∞ , (2) where wj is the j-th entry of w for j in [1; p] ≜ {1, . [sent-34, score-0.259]

12 , p}, the vector wg in R^{|g|} records the coefficients of w indexed by g in G, and the scalars ηg are positive weights. [sent-37, score-0.095]

13 If G is a more general partition of [1; p], variables are selected in groups rather than individually. [sent-40, score-0.176]

14 When the groups overlap, Ω is still a norm and sets groups of variables to zero together [5]. [sent-41, score-0.411]
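
To make Eq. (2) concrete, here is a minimal sketch (not the authors' code) that evaluates Ω(w) for arbitrary, possibly overlapping groups; the index sets and weights below are hypothetical.

```python
# Sketch: evaluating the structured norm of Eq. (2),
# Omega(w) = sum_g eta_g * max_{j in g} |w_j|,
# for arbitrary (possibly overlapping) groups. Illustrative only.
import numpy as np

def omega(w, groups, etas):
    """groups: list of index lists; etas: matching positive weights."""
    return sum(eta * np.max(np.abs(w[g])) for g, eta in zip(groups, etas))

w = np.array([0.5, -2.0, 0.0, 1.5])
groups = [[0, 1], [1, 2, 3]]   # index 1 belongs to both groups (overlap)
etas = [1.0, 1.0]
print(omega(w, groups, etas))  # 2.0 + 2.0 = 4.0
```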

15 The latter setting has first been considered for hierarchies [7, 11, 15], and then extended to general group structures [5]. [sent-42, score-0.091]

16 Like the authors of [12], who tackled the case of hierarchical groups, we propose to approach this problem with proximal methods, which we now introduce. [sent-46, score-0.337]

17 At each iteration, the function f is linearized at the current estimate w0 and the so-called proximal problem has to be solved: min_{w∈Rp} f(w0) + (w − w0)⊤ ∇f(w0) + λΩ(w) + (L/2)‖w − w0‖² . (3) [sent-49, score-0.272]

18 We call proximal operator associated with the regularization λ′Ω the function that maps a vector u in Rp onto the (unique, by strong convexity) solution w⋆ of Eq. (3). [sent-52, score-0.321]

19 Simple proximal methods use w⋆ as the next iterate, but accelerated variants [3, 4] are also based on the proximal operator and require solving problem (3) exactly and efficiently to enjoy their fast convergence rates. [sent-54, score-0.593]
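
For illustration, a minimal proximal gradient loop around Eq. (3); `prox_omega` stands in for the network-flow solver developed later in the paper and is an assumption here, not the authors' implementation.

```python
import numpy as np

def proximal_gradient(grad_f, prox_omega, w0, step, lam, n_iters=100):
    # Each iteration linearizes f at w (gradient step) and then solves
    # Eq. (3), i.e. applies the proximal operator of (lam*step)*Omega.
    w = np.array(w0, dtype=float)
    for _ in range(n_iters):
        u = w - step * grad_f(w)       # gradient step on the smooth part f
        w = prox_omega(u, lam * step)  # prox: argmin_w 0.5||w - u||^2 + lam*step*Omega(w)
    return w
```

Accelerated variants [3, 4] reuse the same prox inside an extrapolation step, which is why solving problem (3) exactly matters.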

20 The approach we develop in the rest of this paper extends [12] to the case of general overlapping groups when Ω is a weighted sum of ℓ∞ -norms, broadening the application of these regularizations to a wider spectrum of problems. [sent-57, score-0.368]

21 Note that other types of structured sparse models have also been introduced, either through a different norm [6], or through non-convex criteria [8, 9, 10]. [sent-58, score-0.218]

22 3 A Quadratic Min-Cost Flow Formulation In this section, we show that a convex dual of problem (3) for general overlapping groups G can be reformulated as a quadratic min-cost flow problem. [sent-60, score-0.411]

23 We present an efficient algorithm to solve it exactly, as well as a related algorithm to compute the dual norm of Ω. [sent-61, score-0.147]

24 We start by considering the dual formulation to problem (3) introduced in [12], for the case where Ω is a sum of ℓ∞ -norms: Lemma 1 (Dual of the proximal problem [12]) Given u in Rp , consider the problem min_{ξ∈R^{p×|G|}} ½‖u − Σ_{g∈G} ξ^g‖² s.t. ∀g ∈ G, ‖ξ^g‖₁ ≤ ληg and ξ^g_j = 0 if j ∉ g. (4) [sent-62, score-0.423]

25 Without loss of generality, we assume from now on that the scalars uj are all non-negative, and we constrain the entries of ξ to be non-negative. [sent-68, score-0.15]

26 We now introduce a graph modeling of problem (4). [sent-69, score-0.103]

27 3.1 Graph Model Let G be a directed graph G = (V, E, s, t), where V is a set of vertices, E ⊆ V × V a set of arcs, s a source, and t a sink. [sent-71, score-0.103]

28 Let c and c′ be two functions on the arcs, c : E → R and c′ : E → R+ , where c is a cost function and c′ is a non-negative capacity function. [sent-72, score-0.118]

29 For simplicity, we identify groups and indices with the vertices of the graph. [sent-76, score-0.226]

30 (ii) For every group g in G, E contains an arc (s, g). [sent-77, score-0.224]

31 (iii) For every group g in G, and every index j in g, E contains an arc (g, j) with zero cost and infinite capacity. [sent-79, score-0.281]

32 (iv) For every index j in [1; p], E contains an arc (j, t) with infinite capacity and a cost cj = ½(uj − ξ̄j)², where ξ̄j is the flow on (j, t). [sent-81, score-0.292]

33 The flows ξ^g_j associated with G can now be identified with the variables of problem (4): indeed, the sum of the costs on the edges leading to the sink is equal to the objective function of (4), while the capacities of the arcs (s, g) match the constraints on each group. [sent-84, score-0.505]

34 This shows that finding a flow minimizing the sum of the costs on such a graph is equivalent to solving problem (4). [sent-85, score-0.192]
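
A sketch of that canonical graph, built with networkx; node names are hypothetical. The quadratic arc costs of item (iv) are stored only as data, since networkx's flow routines assume linear costs, so this illustrates the construction rather than solving problem (4).

```python
import networkx as nx

def canonical_graph(u, groups, etas, lam):
    """Arcs: (s, g) with capacity lam*eta_g; (g, j) uncapacitated;
    (j, t) carrying the quadratic cost 0.5*(u_j - xi_j)^2 (stored as data)."""
    G = nx.DiGraph()
    for g_id, (g, eta) in enumerate(zip(groups, etas)):
        G.add_edge("s", ("grp", g_id), capacity=lam * eta)  # item (ii)
        for j in g:
            G.add_edge(("grp", g_id), ("var", j))           # item (iii): infinite capacity
    for j, uj in enumerate(u):
        G.add_edge(("var", j), "t", u_j=float(uj))          # item (iv): quadratic-cost arc
    return G
```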

35 When some groups are included in others, the canonical graph can be simplified to yield a graph with a smaller number of edges. [sent-86, score-0.41]

36 Specifically, if h and g are groups with h ⊂ g, the edges (g, j) for j ∈ h carrying a flow ξ^g_j can be removed and replaced by a single edge (g, h) of infinite capacity and zero cost, carrying the flow Σ_{j∈h} ξ^g_j. [sent-87, score-0.362]

37 This simplification is illustrated in Figure 1(d), with a graph equivalent to the one of Figure 1(c). [sent-88, score-0.103]

38 These simplifications are useful in practice, since they reduce the number of edges in the graph and improve the speed of the algorithms we are now going to present. [sent-90, score-0.155]

39 The optimality conditions of Eq. (4) derived in [12] show that for all j in [1; p], the signs of the non-zero coefficients ξ⋆g for g in G are the same as the signs of the entries uj . [sent-94, score-0.236]

40 To solve Eq. (4), one can therefore flip the signs of the negative variables uj , then solve the modified dual formulation (with non-negative variables), which gives the magnitude of the entries ξ⋆g (the signs of these being known). [sent-96, score-0.357]

41 Figure 1: Graph representation of simple proximal problems with different group structures G. [sent-101, score-0.322]

42 The three indices 1, 2, 3 are represented as grey squares, and the groups g, h in G as red discs. [sent-102, score-0.176]

43 The source is linked to every group g, h with respective maximum capacity ληg , ληh and zero cost. [sent-103, score-0.173]

44 Each variable uj is linked to the sink t, with an infinite capacity, and with a cost cj = ½(uj − ξ̄j)². [sent-104, score-0.189]

45 All other arcs in the graph have zero cost and infinite capacity. [sent-105, score-0.372]

46 They represent inclusion relationships between groups, and between groups and variables. [sent-106, score-0.176]

47 One of the simplest cases, where G contains a single group g (Ω is the ℓ∞ -norm) as in Figure 1(a), can be solved by an orthogonal projection on the ℓ1 -ball of radius ληg . [sent-111, score-0.108]
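
Sentence 47 corresponds to a closed-form special case: for a single group, the dual problem (4) is a projection onto the ℓ1-ball, and the primal solution is then w = u − ξ. A sketch using the well-known sorting-based ℓ1 projection (Duchi et al.); function names are hypothetical.

```python
import numpy as np

def project_l1_ball(v, radius):
    """Euclidean projection of v onto the l1-ball of the given radius."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    mu = np.sort(np.abs(v))[::-1]                 # sorted magnitudes, descending
    cssv = np.cumsum(mu)
    rho = np.nonzero(mu - (cssv - radius) / np.arange(1, v.size + 1) > 0)[0][-1]
    theta = (cssv[rho] - radius) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def prox_single_linf_group(u, lam_eta):
    """Prox of lam_eta*||.||_inf: by Moreau decomposition,
    w = u - projection of u onto the l1-ball of radius lam_eta."""
    return u - project_l1_ball(u, lam_eta)
```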

48 When the group structure is a tree as in Figure 1(d), the problem can be solved in O(pd) operations, where d is the depth of the tree [12, 18]. [sent-113, score-0.286]

49 The general case of overlapping groups is more difficult. [sent-114, score-0.26]

50 Hochbaum and Hong have shown in [18] that quadratic min-cost flow problems can be reduced to a specific parametric max-flow problem, for which an efficient algorithm exists [20]. [sent-115, score-0.091]

51 To solve problem (4), we propose to use Algorithm 1, which also exploits the fact that our graphs have non-zero costs only on edges leading to the sink. [sent-117, score-0.097]

52 By definition, a parametric max-flow problem consists in solving, for every value of a parameter, a max-flow problem on a graph whose arc capacities depend on this parameter. [sent-121, score-0.43]

53 This equivalence, however, requires a non-trivial representation of structured sparsity-inducing norms with submodular functions, as recently pointed out by [23]. [sent-123, score-0.204]

54 Algorithm 1 Computation of the proximal operator for overlapping groups. [sent-124, score-0.405]

55 1: Inputs: u ∈ Rp , a set of groups G, positive weights (ηg )g∈G , and λ (regularization parameter). [sent-125, score-0.176]

56 2: Build the initial graph G0 = (V0 , E0 , s, t) as explained in Section 3.1. [sent-126, score-0.103]

57 4: Return: w = u − ξ̄ (optimal solution of the proximal problem). [sent-129, score-0.272]

58 1: Projection step: γ ← argmin_γ Σ_{j∈Vu} ½(uj − γj)² s.t. Σ_{j∈Vu} γj ≤ λ Σ_{g∈Vgr} ηg . 2: For all nodes j in Vu , set γj to be the capacity of the arc (j, t). [sent-133, score-0.235]

59 3: Max-flow step: Update (ξ̄j)j∈Vu by computing a max-flow on the graph (V, E, s, t). [sent-134, score-0.103]

60 5: Denote by (s, V +) and (V −, t) the two disjoint subsets of (V, s, t) separated by the minimum (s, t)-cut of the graph, and remove the arcs between V + and V −. [sent-137, score-0.298]

61 At this point, it is possible to show that the value of the optimal min-cost flow on all arcs between V + and V − is necessarily zero. [sent-149, score-0.243]

62 Some implementation details are crucial to the efficiency of the algorithm: • Exploiting connected components: When there exists no arc between two subsets of V , it is possible to process them independently in order to solve the global min-cost flow problem. [sent-155, score-0.17]
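
A sketch of the connected-components detail just mentioned: variables that never co-occur in a group form independent blocks that can be processed separately. `independent_blocks` is a hypothetical helper.

```python
import networkx as nx

def independent_blocks(n_vars, groups):
    """Partition variable indices into blocks with no group spanning two blocks."""
    H = nx.Graph()
    H.add_nodes_from(range(n_vars))
    for g in groups:
        g = list(g)
        for j in g[1:]:
            H.add_edge(g[0], j)   # a star is enough to connect the whole group
    return [sorted(c) for c in nx.connected_components(H)]

# Variables 0-2 interact through overlapping groups; 3 and 4 are independent.
print(independent_blocks(5, [[0, 1], [1, 2], [4]]))  # [[0, 1, 2], [3], [4]] (order may vary)
```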

63 The projection step can be carried out under the additional constraints Σ_{j∈Vu} γj ≤ λ Σ_{g∈Vgr} ηg and |γj| ≤ λ Σ_{g∋j} ηg ; the idea is that the structure of the graph will not allow ξ̄j to be greater than λ Σ_{g∋j} ηg after the max-flow step. [sent-162, score-0.103]

64 Adding these additional constraints leads to better performance when the graph is not well balanced. [sent-163, score-0.103]

65 3.3 Computation of the Dual Norm The dual norm Ω∗ of Ω, defined for any vector κ in Rp by Ω∗(κ) ≜ max_{Ω(z)≤1} z⊤κ, is a key quantity to study sparsity-inducing regularizations [5, 15, 27]. [sent-166, score-0.225]

66 We use it here to monitor the convergence of the proximal method through a duality gap, and define a proper optimality criterion for problem (1). [sent-167, score-0.333]

67 The duality gap for problem (1) can be derived from standard Fenchel duality arguments [28] and is equal to f (w) + λΩ(w) + f ∗ (−κ) for w, κ in Rp with Ω∗ (κ) ≤ λ. [sent-169, score-0.155]

68 Therefore, evaluating the duality gap requires computing Ω∗ efficiently in order to find a feasible dual variable κ. [sent-170, score-0.182]
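
For intuition, when the groups are disjoint the dual norm has the closed form Ω∗(κ) = max_g ‖κ_g‖₁/ηg, and a feasible κ can be obtained by rescaling any candidate. This closed form does not apply to overlapping groups, which is exactly why the max-flow procedure below is needed; the sketch is illustrative only.

```python
import numpy as np

def dual_norm_disjoint(kappa, groups, etas):
    """Omega*(kappa) = max_g ||kappa_g||_1 / eta_g -- valid for DISJOINT groups only."""
    return max(np.abs(kappa[g]).sum() / eta for g, eta in zip(groups, etas))

def feasible_dual(candidate, lam, dual_norm):
    """Scale a candidate dual variable (e.g. grad f(w)) so that Omega*(kappa) <= lam."""
    dn = dual_norm(candidate)
    return candidate if dn <= lam else candidate * (lam / dn)
```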

69 In the network problem associated with (5), the capacities on the arcs (s, g), g ∈ G, are set to τ ηg , and the capacities on the arcs (j, t), j in [1; p], are fixed to κj . [sent-174, score-0.764]

70 Solving problem (5) amounts to finding the smallest value of τ such that there exists a flow saturating the capacities κj on the arcs leading to the sink t. [sent-175, score-0.419]
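
That characterization suggests a simple (approximate) alternative to the paper's exact recursive procedure: bisect on τ with a max-flow feasibility check. A sketch with networkx, assuming every index with κj ≠ 0 belongs to at least one group (otherwise Ω∗(κ) is infinite); Algorithm 3, whose steps follow, computes the exact value instead.

```python
import numpy as np
import networkx as nx

def dual_norm_bisect(kappa, groups, etas, tol=1e-6):
    kappa = np.abs(np.asarray(kappa, dtype=float))   # signs can be flipped w.l.o.g.

    def saturates(tau):
        G = nx.DiGraph()
        for g_id, (g, eta) in enumerate(zip(groups, etas)):
            G.add_edge("s", ("grp", g_id), capacity=tau * eta)
            for j in g:
                G.add_edge(("grp", g_id), ("var", j))  # infinite capacity
        for j, kj in enumerate(kappa):
            G.add_edge(("var", j), "t", capacity=kj)
        value, _ = nx.maximum_flow(G, "s", "t")
        return value >= kappa.sum() - tol              # all sink arcs saturated?

    lo, hi = 0.0, kappa.sum() / min(etas)              # hi is always feasible
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if saturates(mid) else (mid, hi)
    return hi
```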

71 1: Inputs: κ ∈ Rp , a set of groups G, positive weights (ηg )g∈G . [sent-180, score-0.176]

72 2: Build the initial graph G0 = (V0 , E0 , s, t) as explained in Section 3.1. [sent-181, score-0.103]

73 Function dualNorm(V = Vu ∪ Vgr , E) 1: τ ← (Σ_{j∈Vu} κj)/(Σ_{g∈Vgr} ηg) and set the capacities of arcs (s, g) to τ ηg for all g in Vgr . [sent-185, score-0.368]

74 2: Max-flow step: Update (ξ̄j)j∈Vu by computing a max-flow on the graph (V, E, s, t). [sent-186, score-0.103]

75 4 Applications and Experiments Our experiments use the algorithm of [4] based on our proximal operator, with weights ηg set to 1. [sent-191, score-0.272]

76 The following families of groups G using this spatial information are thus considered: (1) every contiguous sequence of length 3 for the one-dimensional case, and (2) every 3×3-square in the two-dimensional setting. [sent-198, score-0.238]
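
A sketch generating exactly these two families of groups as index sets (row-major indexing for the 2-D case); the helper names are hypothetical.

```python
def seq_groups(p, length=3):
    """Every contiguous 1-D window of the given length over p variables."""
    return [list(range(i, i + length)) for i in range(p - length + 1)]

def square_groups(h, w, size=3):
    """Every size x size square on an h x w pixel grid (row-major indices)."""
    return [[(i + a) * w + (j + b) for a in range(size) for b in range(size)]
            for i in range(h - size + 1) for j in range(w - size + 1)]

print(seq_groups(5))             # [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
print(len(square_groups(4, 4)))  # 4 overlapping 3x3 squares
```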

77 We have also run ProxFlow and SG on a larger dataset with (n, p) = (100, 10^6): after 12 hours, ProxFlow and SG have reached a relative duality gap of 0. [sent-210, score-0.094]

78 4.2 Background Subtraction Following [9, 10], we consider a background subtraction task. [sent-226, score-0.125]

79 Given a sequence of frames from a fixed camera, we try to segment out foreground objects in a new image. [sent-227, score-0.113]

80 In this formulation, the ℓ1 -norm penalty on e does not take into account the fact that neighboring pixels in y are likely to share the same label (background or foreground), which may lead to scattered pieces of foreground and background regions (Figure 3). [sent-234, score-0.229]

81 We therefore put an additional structured regularization term Ω on e, where the groups in G are all the overlapping 3×3-squares on the image. [sent-235, score-0.382]

82 As shown in Figure 3, adding Ω improves the background subtraction results for the two tested videos, by encoding, unlike the ℓ1 -norm, both spatial and color consistency. [sent-243, score-0.125]
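
A hedged sketch of the resulting objective: the new frame y is decomposed as Xw + e, with the foreground e penalized by the ℓ1 term plus the structured term over all 3×3 squares. This only evaluates the cost (with unit weights ηg, a simplifying assumption); minimizing it is what the proximal scheme above is for.

```python
import numpy as np

def omega_unit(v, groups):
    # sum of l_inf norms over (overlapping) groups, unit weights eta_g = 1
    return sum(np.max(np.abs(v[g])) for g in groups)

def background_subtraction_cost(y, X, w, e, lam1, lam2, groups):
    residual = y - X @ w - e
    return (0.5 * residual @ residual         # data-fitting term
            + lam1 * np.abs(e).sum()          # sparse foreground
            + lam2 * omega_unit(e, groups))   # spatial consistency (3x3 squares)
```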

83 The authors of [12] have recently proposed to use a hierarchical structured norm to learn dictionaries of natural image patches. [sent-246, score-0.252]

84 The task is to represent a set of signals {y1 , . . . , yn } of dimension m as sparse linear combinations of elements from a dictionary X = [x1 , . . . , xp ]. [sent-250, score-0.176]

85 In [12], the dictionary elements are embedded in a predefined tree T , via a particular instance of the structured norm Ω; we refer to it as Ωtree , and call G the underlying set of groups. [sent-255, score-0.386]

86 In this case, each signal yi admits a sparse decomposition in the form of a subtree of dictionary elements. [sent-256, score-0.176]
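
In the hierarchical norm of [12], the set of groups G contains, for each node of T , that node together with all of its descendants; zeroing a group therefore removes a whole subtree, which yields the subtree-shaped decompositions mentioned above. A small sketch (the dict representation of the tree is hypothetical):

```python
def tree_groups(children):
    """children: dict mapping each node to its list of children.
    Returns one group per node: the node plus all of its descendants."""
    def descendants(v):
        out = [v]
        for c in children.get(v, []):
            out += descendants(c)
        return out
    return [descendants(v) for v in children]

# A small tree: 0 -> {1, 2}, 1 -> {3}
print(tree_groups({0: [1, 2], 1: [3], 2: [], 3: []}))
# [[0, 1, 3, 2], [1, 3], [2], [3]]
```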

87 Inspired by ideas from multi-task learning [14], we propose to learn the tree structure T by pruning irrelevant parts of a larger initial tree T0 . [sent-260, score-0.208]

88 The overall penalty on W, which results from the combination of Ωtree and Ωjoint , is itself an instance of Ω with general overlapping groups, as defined in Eq. (2). [sent-269, score-0.11]

89 We study whether learning the hierarchy of the dictionary elements improves the denoising performance, compared to standard sparse coding (i.e., [sent-274, score-0.253]

90 when Ωtree is the ℓ1 -norm and λ2 = 0) and the hierarchical dictionary learning of [12] based on predefined trees. [sent-276, score-0.171]

91 Since problem (6) is too large to be solved many times to select the regularization parameters (λ1 , λ2 ) rigorously, we use the following heuristics: we optimize mostly with the currently pruned tree held fixed. [sent-281, score-0.165]

92 We consider the same hierarchies as in [12], involving between 30 and 400 dictionary elements. [sent-286, score-0.147]

93 The regularization parameter λ1 is selected on the validation set of 25 000 patches, for both sparse coding (Flat) and hierarchical dictionary learning (Tree). [sent-287, score-0.274]

94 Starting from the tree giving the best performance (in this case the largest one, see Figure 4), we solve problem (6) following our heuristics, for increasing values of λ2 . [sent-288, score-0.104]

95 Figure 4: Left: Hierarchy obtained by pruning a larger tree of 76 elements. (Right: performance plotted against dictionary size, 0–400; axis ticks omitted.) [sent-295, score-0.104]

96 5 Conclusion We have presented a new optimization framework for solving sparse structured problems involving sums of ℓ∞ -norms of any (overlapping) groups of variables. [sent-298, score-0.364]

97 Interestingly, this sheds new light on connections between sparse methods and the literature of network flow optimization. [sent-299, score-0.098]

98 In particular, the proximal operator for the formulation we consider can be cast as a quadratic min-cost flow problem, for which we propose an efficient and simple algorithm. [sent-300, score-0.417]

99 Tree-guided group lasso for multi-task regression with structured sparsity. [sent-391, score-0.139]

100 About strongly polynomial time algorithms for quadratic optimization over submodular constraints. [sent-447, score-0.104]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ow', 0.476), ('vu', 0.288), ('proximal', 0.272), ('arcs', 0.243), ('proxflow', 0.189), ('vgr', 0.189), ('groups', 0.176), ('rp', 0.168), ('computeflow', 0.166), ('sg', 0.146), ('arc', 0.143), ('capacities', 0.125), ('foreground', 0.113), ('uj', 0.112), ('dictionary', 0.106), ('tree', 0.104), ('graph', 0.103), ('capacity', 0.092), ('structured', 0.089), ('dual', 0.088), ('jenatton', 0.087), ('overlapping', 0.084), ('dct', 0.083), ('regularizations', 0.078), ('subtraction', 0.076), ('willow', 0.071), ('dualnorm', 0.071), ('sparse', 0.07), ('xw', 0.068), ('cp', 0.066), ('hierarchical', 0.065), ('quadratic', 0.063), ('signs', 0.062), ('duality', 0.061), ('norm', 0.059), ('wg', 0.057), ('qp', 0.056), ('obozinski', 0.054), ('inria', 0.052), ('sink', 0.051), ('cpu', 0.051), ('group', 0.05), ('vertices', 0.05), ('background', 0.049), ('operator', 0.049), ('sparsityinducing', 0.047), ('xwi', 0.047), ('ows', 0.042), ('flat', 0.042), ('polymatroid', 0.042), ('graphs', 0.041), ('hierarchies', 0.041), ('submodular', 0.041), ('hierarchy', 0.041), ('pixels', 0.041), ('mairal', 0.04), ('cients', 0.039), ('dictionaries', 0.039), ('duarte', 0.038), ('conservation', 0.038), ('scalars', 0.038), ('denoising', 0.036), ('hochbaum', 0.036), ('carrying', 0.034), ('preprint', 0.033), ('simpli', 0.033), ('gap', 0.033), ('formulation', 0.033), ('coef', 0.033), ('regularization', 0.033), ('heuristics', 0.032), ('vertex', 0.031), ('every', 0.031), ('fenchel', 0.031), ('patches', 0.03), ('costs', 0.03), ('technical', 0.03), ('flow', 0.03), ('sum', 0.03), ('projection', 0.03), ('operations', 0.029), ('nonsmooth', 0.029), ('overlap', 0.029), ('solving', 0.029), ('canonical', 0.028), ('disjoint', 0.028), ('parametric', 0.028), ('solved', 0.028), ('network', 0.028), ('embedded', 0.028), ('rn', 0.027), ('norms', 0.027), ('arxiv', 0.027), ('therein', 0.027), ('subsets', 0.027), ('return', 0.027), ('penalty', 0.026), ('cost', 0.026), ('edges', 0.026), ('speed', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 181 nips-2010-Network Flow Algorithms for Structured Sparsity

Author: Julien Mairal, Rodolphe Jenatton, Francis R. Bach, Guillaume R. Obozinski

Abstract: We consider a class of learning problems that involve a structured sparsity-inducing norm defined as the sum of ℓ∞ -norms over groups of variables. Whereas a lot of effort has been put in developing fast optimization methods when the groups are disjoint or embedded in a specific hierarchical structure, we address here the case of general overlapping groups. To this end, we show that the corresponding optimization problem is related to network flow optimization. More precisely, the proximal problem associated with the norm we consider is dual to a quadratic min-cost flow problem. We propose an efficient procedure which computes its solution exactly in polynomial time. Our algorithm scales up to millions of variables, and opens up a whole new range of applications for structured sparse models. We present several experiments on image and video data, demonstrating the applicability and scalability of our approach for various problems.

2 0.2006522 258 nips-2010-Structured sparsity-inducing norms through submodular functions

Author: Francis R. Bach

Abstract: Sparse methods for supervised learning aim at finding good linear predictors from as few variables as possible, i.e., with small cardinality of their supports. This combinatorial selection problem is often turned into a convex optimization problem by replacing the cardinality function by its convex envelope (tightest convex lower bound), in this case the ℓ1 -norm. In this paper, we investigate more general set-functions than the cardinality, that may incorporate prior knowledge or structural constraints which are common in many applications: namely, we show that for nondecreasing submodular set-functions, the corresponding convex envelope can be obtained from its Lovász extension, a common tool in submodular analysis. This defines a family of polyhedral norms, for which we provide generic algorithmic tools (subgradients and proximal operators) and theoretical results (conditions for support recovery or high-dimensional inference). By selecting specific submodular functions, we can give a new interpretation to known norms, such as those based on rank-statistics or grouped norms with potentially overlapping groups; we also define new norms, in particular ones that can be used as non-factorial priors for supervised learning.

3 0.19907019 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups

Author: Sofia Mosci, Silvia Villa, Alessandro Verri, Lorenzo Rosasco

Abstract: We deal with the problem of variable selection when variables must be selected group-wise, with possibly overlapping groups defined a priori. In particular we propose a new optimization procedure for solving the regularized algorithm presented in [12], where the group lasso penalty is generalized to overlapping groups of variables. While in [12] the proposed implementation requires explicit replication of the variables belonging to more than one group, our iterative procedure is based on a combination of proximal methods in the primal space and projected Newton method in a reduced dual space, corresponding to the active groups. This procedure provides a scalable alternative with no need for data duplication, and allows to deal with high dimensional problems without pre-processing for dimensionality reduction. The computational advantages of our scheme with respect to state-of-the-art algorithms using data duplication are shown empirically with numerical simulations. 1

4 0.19369505 141 nips-2010-Layered image motion with explicit occlusions, temporal consistency, and depth ordering

Author: Deqing Sun, Erik B. Sudderth, Michael J. Black

Abstract: Layered models are a powerful way of describing natural scenes containing smooth surfaces that may overlap and occlude each other. For image motion estimation, such models have a long history but have not achieved the wide use or accuracy of non-layered methods. We present a new probabilistic model of optical flow in layers that addresses many of the shortcomings of previous approaches. In particular, we define a probabilistic graphical model that explicitly captures: 1) occlusions and disocclusions; 2) depth ordering of the layers; 3) temporal consistency of the layer segmentation. Additionally the optical flow in each layer is modeled by a combination of a parametric model and a smooth deviation based on an MRF with a robust spatial prior; the resulting model allows roughness in layers. Finally, a key contribution is the formulation of the layers using an imagedependent hidden field prior based on recent models for static scene segmentation. The method achieves state-of-the-art results on the Middlebury benchmark and produces meaningful scene segmentations as well as detected occlusion regions.

5 0.18071423 208 nips-2010-Policy gradients in linearly-solvable MDPs

Author: Emanuel Todorov

Abstract: We present policy gradient results within the framework of linearly-solvable MDPs. For the first time, compatible function approximators and natural policy gradients are obtained by estimating the cost-to-go function, rather than the (much larger) state-action advantage function as is necessary in traditional MDPs. We also develop the first compatible function approximators and natural policy gradients for continuous-time stochastic systems.

6 0.12759221 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems

7 0.11904529 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning

8 0.11680168 187 nips-2010-Occlusion Detection and Motion Estimation with Convex Optimization

9 0.11308835 63 nips-2010-Distributed Dual Averaging In Networks

10 0.10569778 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction

11 0.10488939 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives

12 0.10446548 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization

13 0.093730316 198 nips-2010-Optimal Web-Scale Tiering as a Flow Problem

14 0.093527012 135 nips-2010-Label Embedding Trees for Large Multi-Class Tasks

15 0.090423062 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior

16 0.089622386 89 nips-2010-Factorized Latent Spaces with Structured Sparsity

17 0.08819183 7 nips-2010-A Family of Penalty Functions for Structured Sparsity

18 0.087487735 77 nips-2010-Epitome driven 3-D Diffusion Tensor image segmentation: on extracting specific structures

19 0.082683042 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models

20 0.081744954 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.223), (1, 0.059), (2, 0.032), (3, 0.065), (4, 0.0), (5, -0.259), (6, -0.005), (7, 0.156), (8, 0.092), (9, -0.004), (10, 0.072), (11, 0.0), (12, -0.034), (13, 0.118), (14, -0.087), (15, -0.044), (16, -0.074), (17, -0.054), (18, -0.068), (19, -0.153), (20, 0.059), (21, -0.033), (22, -0.099), (23, -0.08), (24, -0.007), (25, -0.102), (26, -0.005), (27, 0.071), (28, -0.013), (29, 0.046), (30, 0.009), (31, 0.03), (32, 0.105), (33, -0.025), (34, -0.234), (35, 0.054), (36, -0.056), (37, -0.047), (38, 0.017), (39, 0.151), (40, -0.04), (41, 0.01), (42, -0.154), (43, -0.112), (44, -0.035), (45, 0.067), (46, -0.015), (47, -0.049), (48, -0.005), (49, 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95385563 181 nips-2010-Network Flow Algorithms for Structured Sparsity

Author: Julien Mairal, Rodolphe Jenatton, Francis R. Bach, Guillaume R. Obozinski

Abstract: We consider a class of learning problems that involve a structured sparsity-inducing norm defined as the sum of ℓ∞ -norms over groups of variables. Whereas a lot of effort has been put in developing fast optimization methods when the groups are disjoint or embedded in a specific hierarchical structure, we address here the case of general overlapping groups. To this end, we show that the corresponding optimization problem is related to network flow optimization. More precisely, the proximal problem associated with the norm we consider is dual to a quadratic min-cost flow problem. We propose an efficient procedure which computes its solution exactly in polynomial time. Our algorithm scales up to millions of variables, and opens up a whole new range of applications for structured sparse models. We present several experiments on image and video data, demonstrating the applicability and scalability of our approach for various problems.

2 0.7153064 58 nips-2010-Decomposing Isotonic Regression for Efficiently Solving Large Problems

Author: Ronny Luss, Saharon Rosset, Moni Shahar

Abstract: A new algorithm for isotonic regression is presented based on recursively partitioning the solution space. We develop efficient methods for each partitioning subproblem through an equivalent representation as a network flow problem, and prove that this sequence of partitions converges to the global solution. These network flow problems can further be decomposed in order to solve very large problems. Success of isotonic regression in prediction and our algorithm’s favorable computational properties are demonstrated through simulated examples as large as 2 × 10^5 variables and 10^7 constraints.

3 0.63951641 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups

Author: Sofia Mosci, Silvia Villa, Alessandro Verri, Lorenzo Rosasco

Abstract: We deal with the problem of variable selection when variables must be selected group-wise, with possibly overlapping groups defined a priori. In particular we propose a new optimization procedure for solving the regularized algorithm presented in [12], where the group lasso penalty is generalized to overlapping groups of variables. While in [12] the proposed implementation requires explicit replication of the variables belonging to more than one group, our iterative procedure is based on a combination of proximal methods in the primal space and projected Newton method in a reduced dual space, corresponding to the active groups. This procedure provides a scalable alternative with no need for data duplication, and allows to deal with high dimensional problems without pre-processing for dimensionality reduction. The computational advantages of our scheme with respect to state-of-the-art algorithms using data duplication are shown empirically with numerical simulations. 1

4 0.57644451 198 nips-2010-Optimal Web-Scale Tiering as a Flow Problem

Author: Gilbert Leung, Novi Quadrianto, Kostas Tsioutsiouliklis, Alex J. Smola

Abstract: We present a fast online solver for large scale parametric max-flow problems as they occur in portfolio optimization, inventory management, computer vision, and logistics. Our algorithm solves an integer linear program in an online fashion. It exploits total unimodularity of the constraint matrix and a Lagrangian relaxation to solve the problem as a convex online game. The algorithm generates approximate solutions of max-flow problems by performing stochastic gradient descent on a set of flows. We apply the algorithm to optimize tier arrangement of over 84 million web pages on a layered set of caches to serve an incoming query stream optimally. 1

5 0.5645411 141 nips-2010-Layered image motion with explicit occlusions, temporal consistency, and depth ordering

Author: Deqing Sun, Erik B. Sudderth, Michael J. Black

Abstract: Layered models are a powerful way of describing natural scenes containing smooth surfaces that may overlap and occlude each other. For image motion estimation, such models have a long history but have not achieved the wide use or accuracy of non-layered methods. We present a new probabilistic model of optical flow in layers that addresses many of the shortcomings of previous approaches. In particular, we define a probabilistic graphical model that explicitly captures: 1) occlusions and disocclusions; 2) depth ordering of the layers; 3) temporal consistency of the layer segmentation. Additionally the optical flow in each layer is modeled by a combination of a parametric model and a smooth deviation based on an MRF with a robust spatial prior; the resulting model allows roughness in layers. Finally, a key contribution is the formulation of the layers using an imagedependent hidden field prior based on recent models for static scene segmentation. The method achieves state-of-the-art results on the Middlebury benchmark and produces meaningful scene segmentations as well as detected occlusion regions.

6 0.55428118 187 nips-2010-Occlusion Detection and Motion Estimation with Convex Optimization

7 0.51901621 248 nips-2010-Sparse Inverse Covariance Selection via Alternating Linearization Methods

8 0.51178312 255 nips-2010-Static Analysis of Binary Executables Using Structural SVMs

9 0.48788297 170 nips-2010-Moreau-Yosida Regularization for Grouped Tree Structure Learning

10 0.48451516 234 nips-2010-Segmentation as Maximum-Weight Independent Set

11 0.48379797 30 nips-2010-An Inverse Power Method for Nonlinear Eigenproblems with Applications in 1-Spectral Clustering and Sparse PCA

12 0.48148718 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction

13 0.47624105 258 nips-2010-Structured sparsity-inducing norms through submodular functions

14 0.45084327 77 nips-2010-Epitome driven 3-D Diffusion Tensor image segmentation: on extracting specific structures

15 0.41818562 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization

16 0.40737784 7 nips-2010-A Family of Penalty Functions for Structured Sparsity

17 0.39557385 259 nips-2010-Subgraph Detection Using Eigenvector L1 Norms

18 0.39354274 208 nips-2010-Policy gradients in linearly-solvable MDPs

19 0.39110816 246 nips-2010-Sparse Coding for Learning Interpretable Spatio-Temporal Primitives

20 0.38903004 102 nips-2010-Generalized roof duality and bisubmodular functions


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.052), (17, 0.036), (25, 0.221), (27, 0.057), (30, 0.06), (35, 0.068), (45, 0.226), (50, 0.047), (52, 0.039), (60, 0.032), (77, 0.043), (90, 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.89676428 285 nips-2010-Why are some word orders more common than others? A uniform information density account

Author: Luke Maurits, Dan Navarro, Amy Perfors

Abstract: Languages vary widely in many ways, including their canonical word order. A basic aspect of the observed variation is the fact that some word orders are much more common than others. Although this regularity has been recognized for some time, it has not been well-explained. In this paper we offer an informationtheoretic explanation for the observed word-order distribution across languages, based on the concept of Uniform Information Density (UID). We suggest that object-first languages are particularly disfavored because they are highly nonoptimal if the goal is to distribute information content approximately evenly throughout a sentence, and that the rest of the observed word-order distribution is at least partially explainable in terms of UID. We support our theoretical analysis with data from child-directed speech and experimental work. 1

2 0.87457907 79 nips-2010-Estimating Spatial Layout of Rooms using Volumetric Reasoning about Objects and Surfaces

Author: Abhinav Gupta, Martial Hebert, Takeo Kanade, David M. Blei

Abstract: There has been a recent push in extraction of 3D spatial layout of scenes. However, none of these approaches model the 3D interaction between objects and the spatial layout. In this paper, we argue for a parametric representation of objects in 3D, which allows us to incorporate volumetric constraints of the physical world. We show that augmenting current structured prediction techniques with volumetric reasoning significantly improves the performance of the state-of-the-art. 1

same-paper 3 0.84104317 181 nips-2010-Network Flow Algorithms for Structured Sparsity

Author: Julien Mairal, Rodolphe Jenatton, Francis R. Bach, Guillaume R. Obozinski

Abstract: We consider a class of learning problems that involve a structured sparsity-inducing norm defined as the sum of ℓ∞ -norms over groups of variables. Whereas a lot of effort has been put in developing fast optimization methods when the groups are disjoint or embedded in a specific hierarchical structure, we address here the case of general overlapping groups. To this end, we show that the corresponding optimization problem is related to network flow optimization. More precisely, the proximal problem associated with the norm we consider is dual to a quadratic min-cost flow problem. We propose an efficient procedure which computes its solution exactly in polynomial time. Our algorithm scales up to millions of variables, and opens up a whole new range of applications for structured sparse models. We present several experiments on image and video data, demonstrating the applicability and scalability of our approach for various problems.

4 0.82654214 85 nips-2010-Exact learning curves for Gaussian process regression on large random graphs

Author: Matthew Urry, Peter Sollich

Abstract: We study learning curves for Gaussian process regression which characterise performance in terms of the Bayes error averaged over datasets of a given size. Whilst learning curves are in general very difficult to calculate we show that for discrete input domains, where similarity between input points is characterised in terms of a graph, accurate predictions can be obtained. These should in fact become exact for large graphs drawn from a broad range of random graph ensembles with arbitrary degree distributions where each input (node) is connected only to a finite number of others. Our approach is based on translating the appropriate belief propagation equations to the graph ensemble. We demonstrate the accuracy of the predictions for Poisson (Erdos-Renyi) and regular random graphs, and discuss when and why previous approximations of the learning curve fail. 1

5 0.76163673 109 nips-2010-Group Sparse Coding with a Laplacian Scale Mixture Prior

Author: Pierre Garrigues, Bruno A. Olshausen

Abstract: We propose a class of sparse coding models that utilizes a Laplacian Scale Mixture (LSM) prior to model dependencies among coefficients. Each coefficient is modeled as a Laplacian distribution with a variable scale parameter, with a Gamma distribution prior over the scale parameter. We show that, due to the conjugacy of the Gamma prior, it is possible to derive efficient inference procedures for both the coefficients and the scale parameter. When the scale parameters of a group of coefficients are combined into a single variable, it is possible to describe the dependencies that occur due to common amplitude fluctuations among coefficients, which have been shown to constitute a large fraction of the redundancy in natural images [1]. We show that, as a consequence of this group sparse coding, the resulting inference of the coefficients follows a divisive normalization rule, and that this may be efficiently implemented in a network architecture similar to that which has been proposed to occur in primary visual cortex. We also demonstrate improvements in image coding and compressive sensing recovery using the LSM model. 1

6 0.76028585 7 nips-2010-A Family of Penalty Functions for Structured Sparsity

7 0.75415206 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups

8 0.75208336 260 nips-2010-Sufficient Conditions for Generating Group Level Sparsity in a Robust Minimax Framework

9 0.75052392 149 nips-2010-Learning To Count Objects in Images

10 0.74980134 258 nips-2010-Structured sparsity-inducing norms through submodular functions

11 0.74969929 13 nips-2010-A Primal-Dual Message-Passing Algorithm for Approximated Large Scale Structured Prediction

12 0.74742627 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery

13 0.74673045 26 nips-2010-Adaptive Multi-Task Lasso: with Application to eQTL Detection

14 0.74626076 217 nips-2010-Probabilistic Multi-Task Feature Selection

15 0.7455669 265 nips-2010-The LASSO risk: asymptotic results and real world examples

16 0.74541134 73 nips-2010-Efficient and Robust Feature Selection via Joint ℓ2,1-Norms Minimization

17 0.74536359 117 nips-2010-Identifying graph-structured activation patterns in networks

18 0.74501568 148 nips-2010-Learning Networks of Stochastic Differential Equations

19 0.74498916 5 nips-2010-A Dirty Model for Multi-task Learning

20 0.74494094 155 nips-2010-Learning the context of a category