jmlr jmlr2011 jmlr2011-41 knowledge-graph by maker-knowledge-mining

41 jmlr-2011-Improved Moves for Truncated Convex Models


Source: pdf

Author: M. Pawan Kumar, Olga Veksler, Philip H.S. Torr

Abstract: We consider the problem of obtaining an approximate maximum a posteriori estimate of a discrete random field characterized by pairwise potentials that form a truncated convex model. For this problem, we propose two st-MINCUT based move making algorithms that we call Range Swap and Range Expansion. Our algorithms can be thought of as extensions of αβ-Swap and α-Expansion respectively that fully exploit the form of the pairwise potentials. Specifically, instead of dealing with one or two labels at each iteration, our methods explore a large search space by considering a range of labels (that is, an interval of consecutive labels). Furthermore, we show that Range Expansion provides the same multiplicative bounds as the standard linear programming (LP) relaxation in polynomial time. Compared to previous approaches based on the LP relaxation, for example interior-point algorithms or tree-reweighted message passing (TRW), our methods are faster as they use only the efficient st-MINCUT algorithm in their design. We demonstrate the usefulness of the proposed approaches on both synthetic and standard real data problems. Keywords: truncated convex models, move making algorithms, range moves, multiplicative bounds, linear programming relaxation

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 (b)-(c) Two examples of truncated convex potentials that will be of interest to us in this work: truncated linear metric (b) and truncated quadratic semi-metric (c). [sent-31, score-0.601]

2 It is common practice in computer vision to specify an energy function with arbitrary unary potentials and truncated convex pairwise potentials (Boykov et al. [sent-32, score-0.733]

3 Here, θa( f (a)) denotes unary potentials and θab( f (a), f (b)) denotes pairwise potentials, that is, θa( f (a)) is the cost of assigning label l f (a) to variable va and θab( f (a), f (b)) is the cost of assigning labels l f (a) and l f (b) to variables va and vb respectively. [sent-127, score-0.867]

4 Formally speaking, the pairwise potentials are of the form θab( f (a), f (b)) = wab min{d( f (a) − f (b)), M}, where wab ≥ 0 for all (a, b) ∈ E , d(·) is a convex function and M > 0 is the truncation factor. [sent-131, score-0.943]

5 Examples of pairwise potentials of this form include the truncated linear metric and the truncated quadratic semi-metric, that is, θab( f (a), f (b)) = wab min{| f (a) − f (b)|, M} and θab( f (a), f (b)) = wab min{( f (a) − f (b))^2, M}. [sent-135, score-1.186]
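
To make the form of these potentials concrete, the following is a minimal Python sketch of the two truncated convex potentials quoted above; the weight w_ab and truncation factor M are illustrative values, not taken from the paper's experiments:

def truncated_linear(i, j, w_ab=1.0, M=5):
    # theta_ab(i, j) = w_ab * min(|i - j|, M)  (truncated linear metric)
    return w_ab * min(abs(i - j), M)

def truncated_quadratic(i, j, w_ab=1.0, M=5):
    # theta_ab(i, j) = w_ab * min((i - j)**2, M)  (truncated quadratic semi-metric)
    return w_ab * min((i - j) ** 2, M)

# The potential grows with the label difference until it saturates at w_ab * M.
print([truncated_linear(0, j) for j in range(8)])     # [0, 1, 2, 3, 4, 5, 5, 5]
print([truncated_quadratic(0, j) for j in range(8)])  # [0, 1, 4, 5, 5, 5, 5, 5]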

6 Formally, let f be the labeling obtained by an algorithm A (randomized or deterministic) for an instance of the MAP estimation problem belonging to a particular class (in our case when the pairwise potentials form a truncated convex model). [sent-140, score-0.577]

7 The pairwise potential wab min{d(i − j), M} of assigning labels li and l j to neighboring random variables va and vb respectively. [sent-169, score-0.721]

8 Definition 2: A labeling fˆ is said to be a local minimum over smooth labelings if the energy cannot be reduced further by changing the labels of any subset of random variables, say defined by S, such that the new labeling f is smooth with respect to S. [sent-197, score-0.656]

9 Note that this labeling is smooth since we can find a path from va to vc via vb such that the edges in the path lie in the convex part. [sent-205, score-0.57]

10 Iteration • Set im = 0 (where im indexes the interval to be used). [sent-218, score-1.188]

11 • While im < h — Define interval Im = [im + 1, jm ] where jm = min{im + L, h − 1} and d(L) ≥ M. [sent-219, score-1.093]

12 — Move from current labeling fm to a new labeling fm+1 using st-MINCUT such that (i) if fm+1 (a) ≠ fm (a) then fm+1 (a) ∈ Im , (ii) Q( fm+1 , D; θ) ≤ Q( fm , D; θ). [sent-220, score-1.547]
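
A minimal sketch of this outer loop, with the st-MINCUT move abstracted behind a hypothetical callable range_swap_move and the energy Q(·, D; θ) behind a callable energy; the extracted text does not spell out how im is incremented, so a step of one label is assumed here:

def range_swap(f, h, L, energy, range_swap_move):
    # f: current labeling (dict: variable -> label), h: number of labels,
    # L: interval length chosen so that d(L) >= M.
    i_m = 0
    while i_m < h:
        j_m = min(i_m + L, h - 1)
        interval = range(i_m + 1, j_m + 1)      # I_m = [i_m + 1, j_m]
        f_new = range_swap_move(f, interval)    # st-MINCUT based move (not shown)
        if energy(f_new) <= energy(f):          # never accept an energy increase
            f = f_new
        i_m += 1                                # assumed increment; see note above
    return f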

13 At an iteration m, the Range Swap algorithm only considers the random variables va whose current label fm (a) lies in the interval Im = [im + 1, jm ] of length L. [sent-230, score-1.092]

14 In what follows, we will assume that jm = im + L instead of jm = min{im + L, h − 1}. [sent-236, score-1.055]

15 The new labeling fm+1 is obtained by constructing a graph such that every st-cut on the graph corresponds to a labeling f of the random variables that satisfies: f (a) ∈ Im , ∀va ∈ v(Sm ), f (a) = fm (a), ∀va ∈ v − v(Sm ). [sent-240, score-0.859]

16 At each iteration of our algorithm, we are given an interval Im = [im + 1, jm ] of L labels (that is, jm = im + L) where d(L) = M. [sent-248, score-1.135]

17 We also have the current labeling fm for all the random variables. [sent-249, score-0.585]

18 We construct a directed weighted graph (with non-negative weights) Gm = {Vm , Em , cm (·, ·)} such that for each va ∈ v(Sm ), we define vertices {aim +1 , aim +2 , · · · , a jm } ∈ Vm . [sent-250, score-0.608]

19 Note that all other potentials that specify the energy of the labeling are fixed during the iteration. [sent-253, score-0.483]

20 Representing Unary Potentials: For all random variables va ∈ v(Sm ), we define the following edges that belong to the set Em : • For all k ∈ [im + 1, jm ), edges (ak , ak+1 ) have capacity cm (ak , ak+1 ) = θa (k), that is, the cost of assigning label lk to variable va . [sent-256, score-1.009]

21 • For all k ∈ [im + 1, jm ), edges (ak+1 , ak ) have capacity cm (ak+1 , ak ) = ∞. [sent-257, score-0.571]

22 • Edges (a jm , t) have capacity cm (a jm , t) = θa ( jm ). [sent-258, score-0.849]

23 • Edges (t, a jm ) have capacity cm (t, a jm ) = ∞. [sent-259, score-0.609]

24 We interpret a finite cost st-cut as a relabeling of the random variables as follows: f (a) = k if the st-cut includes edge (ak , ak+1 ) where k ∈ [im + 1, jm ), and f (a) = jm if the st-cut includes edge (a jm , t). (3) [sent-267, score-0.785]

25 Note that the sum of the unary potentials for the labeling f is exactly equal to the cost of the st-cut over the edges defined above. [sent-268, score-0.554]
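
A small sketch of these unary-potential edges for a single variable va, using plain Python dictionaries for the graph; the vertex names and the unary-cost callable theta_a are assumptions for illustration, and no min-cut solver is included:

INF = float("inf")

def unary_edges(a, i_m, j_m, theta_a):
    # Returns {(u, v): c_m(u, v)} for the chain of vertices a_{i_m+1}, ..., a_{j_m}.
    edges = {}
    for k in range(i_m + 1, j_m):
        edges[((a, k), (a, k + 1))] = theta_a(k)   # cutting this edge assigns label l_k
        edges[((a, k + 1), (a, k))] = INF          # infinite reverse edge
    edges[((a, j_m), "t")] = theta_a(j_m)          # cutting this edge assigns label l_{j_m}
    edges[("t", (a, j_m))] = INF
    return edges

# Example: interval [i_m + 1, j_m] = [1, 3] with unary costs theta_a(k) = k.
print(unary_edges("a", 0, 3, lambda k: float(k)))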

26 Since fm+1 (b) is fixed to fm (b), the pairwise potential θab (i, fm+1 (b)) = θab (i, fm (b)) can be effectively treated as a unary potential of va . [sent-275, score-1.164]

27 Hence, similar to unary potentials, it can be formulated using the following edge in set Em : • For all k ∈ [im + 1, jm ), edges (ak , ak+1 ) have capacity cm (ak , ak+1 ) = θab (k, fm (b)), that is, the cost of assigning label lk to variable va and keeping the label of vb fixed to fm (b). [sent-276, score-1.632]

28 • For all k ∈ [im + 1, jm ), edges (ak+1 , ak ) have capacity cm (ak+1 , ak ) = ∞. [sent-277, score-0.571]

29 • Edges (a jm ,t) have capacity cm (a jm ,t) = θab ( jm , fm (b)). [sent-278, score-1.226]

30 • Edges (t, a jm ) have capacity cm (t, a jm ) = ∞. [sent-279, score-0.609]

31 Figure 3: Edges that are used to represent the pairwise potentials of two neighboring random variables va and vb such that (a, b) ∈ A (Sm ) are shown. [sent-282, score-0.486]

32 Representing Pairwise Potentials with No Fixed Variables: For all random variables va and vb such that (a, b) ∈ A (Sm ), we define edges (ak , bk′ ) ∈ Em where either one or both of k and k′ belong to the set (im + 1, jm ] (that is, at least one of them is not im + 1). [sent-287, score-1.133]

33 The capacity of these edges is given by cm (ak , bk′ ) = wab [d(k − k′ + 1) − 2d(k − k′ ) + d(k − k′ − 1)]. [sent-288, score-0.542]

34 Lemma 1: For the capacities defined in Equations (4) and (5), the cost of the st-cut which includes the edges (ak , ak+1 ) and (bk′ , bk′ +1 ) (that is, va and vb take labels lk and lk′ respectively) is given by wab d(k − k′ ) + κab , where the constant κab = wab d(L) (Proof in Appendix B). [sent-297, score-1.101]
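
The capacity quoted above is a second difference of the convex function d; the sketch below evaluates it exactly as written in the extracted text (no factor of 1/2 is assumed), with d passed in as a callable:

def pairwise_edge_capacity(k, k_prime, w_ab, d):
    # c_m(a_k, b_{k'}) = w_ab * (d(k - k' + 1) - 2*d(k - k') + d(k - k' - 1))
    x = k - k_prime
    return w_ab * (d(x + 1) - 2 * d(x) + d(x - 1))

# For the linear choice d(x) = |x| the second difference vanishes except at k = k',
# so only the "diagonal" edges receive non-zero capacity.
print(pairwise_edge_capacity(3, 3, 1.0, abs))  # 2.0
print(pairwise_edge_capacity(4, 3, 1.0, abs))  # 0.0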

35 In this case, we define the set Sm such that Sm = {a | fm (a) ∈ Im , d( fm (a) − fm (b)) ≤ M, ∀(a, b) ∈ E such that fm (b) ∈ Im }. [sent-305, score-1.508]

36 In other words, Sm consists of those random variables whose current label belongs to the interval Im and whose pairwise potential with all its neighboring random variables vb such that fm (b) ∈ Im lies in the convex part of the truncated convex model. [sent-306, score-0.757]
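
A sketch of how the set Sm described above could be computed from the current labeling; the data structures (dicts of labels and neighbour lists) and the convex function d are assumptions for illustration:

def compute_S_m(f_m, neighbours, interval, d, M):
    # f_m: dict variable -> current label; neighbours: dict variable -> list of variables.
    interval = set(interval)
    S = set()
    for a, label_a in f_m.items():
        if label_a not in interval:
            continue
        if all(d(label_a - f_m[b]) <= M
               for b in neighbours[a] if f_m[b] in interval):
            S.add(a)   # all in-interval neighbours lie in the convex part of the model
    return S

# Example: a chain of three variables with labels 2, 3, 9 and interval [1, 5].
print(compute_S_m({"a": 2, "b": 3, "c": 9},
                  {"a": ["b"], "b": ["a", "c"], "c": ["b"]},
                  range(1, 6), abs, M=4))   # {'a', 'b'}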

37 Property 2: For (a, b) ∈ B1 (Sm ), the cost of the st-cut exactly represents the pairwise potential θab ( f (a), fm (b)). [sent-314, score-0.49]

38 Similarly, for (a, b) ∈ B2 (Sm ), the cost of the st-cut exactly represents the pairwise potential θab ( fm (a), f (b)). [sent-315, score-0.49]

39 Property 3: For (a, b) ∈ A (Sm ), if f (a) ∈ Im and f (b) ∈ Im such that d( f (a) − f (b)) ≤ M, then the cost of the st-cut exactly represents the pairwise potential θab ( f (a), f (b)) plus a constant κab , that is, wab d( f (a) − f (b)) + κab . [sent-316, score-0.452]

40 This follows from the fact that our graph construction overestimates the truncation part by the convex function wab d(·). [sent-319, score-0.461]

41 Since the potentials are either modeled exactly or are overestimated, it follows that the energy of the labeling fm+1 is less than or equal to the cost of the st-MINCUT on Gm . [sent-322, score-0.512]

42 We show that this interval provides a labeling that is at least as good as the labeling obtained by considering any of its subsets for which the optimal move can be computed. [sent-331, score-0.504]

43 Formally, let fm+1 be the labeling obtained by using an interval of length L such that d(L) > M and let f′m+1 be the labeling obtained by using a subset of the interval of length L′ such that d(L′ ) = M. [sent-332, score-0.492]

44 It is worth noting that the corresponding graph construction ensures that the cut corresponding to the labeling fm exactly models the energy Q( fm , D; θ) up to a constant. [sent-345, score-1.149]

45 This implies that the energy of the new labeling fm+1 is less than or equal to the energy of fm , that is, Q( fm+1 , D; θ) ≤ Q( fm , D; θ). [sent-346, score-1.204]

46 This follows from the fact that the cost of the st-MINCUT is less than or equal to the energy of the labeling fm but is greater than or equal to the energy of fm+1 . [sent-347, score-0.856]

47 It is worth noting that, unlike previous move making algorithms, Range Swap is not guaranteed to compute the optimal move other than in the special case when d(L) = M (where L = jm − im is the length of the interval). [sent-349, score-0.915]

48 In other words, for the case where d(L) > M, if in the mth iteration we move from labeling fm to fm+1 then it is possible that there exists another labeling f′m+1 such that Q( f′m+1 , D; θ) ≤ Q( fm+1 , D; θ), f′m+1 (a) ∈ Im , ∀va ∈ v(Sm ), f′m+1 (a) = fm (a), ∀va ∈ v − v(Sm ). [sent-350, score-1.047]

49 Unlike Range Swap, at an iteration m it considers all the random variables va regardless of whether their current label fm (a) lies in the interval Im . [sent-356, score-0.852]

50 It provides the option for each random variable va to either retain its old label fm (a) or change its label to fm+1 (a) ∈ Im . [sent-357, score-0.631]

51 Formally, the Range Expansion algorithm moves from labeling fm to fm+1 such that Q( fm+1 , D; θ) ≤ Q( fm , D; θ), fm+1 (a) = fm (a) OR fm+1 (a) ∈ Im , ∀va ∈ v. [sent-358, score-1.358]

52 In other words, if in the mth iteration we move from labeling fm to fm+1 then it is possible that there exists another labeling f′m+1 such that Q( f′m+1 , D; θ) < Q( fm+1 , D; θ), f′m+1 (a) = fm (a) OR f′m+1 (a) ∈ Im , ∀va ∈ v. [sent-360, score-1.047]

53 As in the case of Range Swap, we move from labeling fm to fm+1 by constructing a graph such that every st-cut on the graph corresponds to a labeling f of the random variables that satisfies: f (a) = fm (a) OR f (a) ∈ Im , ∀va ∈ v. [sent-363, score-1.286]

54 The new labeling fm+1 is obtained in two steps: (i) we obtain a labeling f that corresponds to the st-MINCUT on our graph; and (ii) we choose the new labeling fm+1 as fm+1 = f if Q( f , D; θ) ≤ Q( fm , D; θ), and fm+1 = fm otherwise. (6) [sent-364, score-1.378]

55 Note that, unlike Range Swap, step (ii) is required in Range Expansion since the labeling f obtained in step (i) may have greater energy than fm . [sent-365, score-0.706]
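
A sketch of this two-step acceptance rule; expansion_move (the graph construction plus st-MINCUT) and energy (Q(·, D; θ)) are hypothetical callables standing in for the machinery described in the text:

def range_expansion_step(f_m, interval, expansion_move, energy):
    f = expansion_move(f_m, interval)   # labeling decoded from the st-MINCUT
    # Step (ii): keep the new labeling only if it does not increase the energy.
    return f if energy(f) <= energy(f_m) else f_m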

56 1 Graph Construction We construct a directed weighted graph (with non-negative weights) Gm = {Vm , Em , cm (·, ·)} such that Vm contains the source s, the sink t and the vertices {aim +1 , aim +2 , · · · , a jm } for each random variable va ∈ v. [sent-368, score-0.625]

57 The edges e ∈ Em with capacity cm (e) are of two types: (i) those that represent the unary potentials of a labeling corresponding to an st-cut in the graph; and (ii) those that represent the pairwise potentials of the labeling. [sent-369, score-0.902]

58 To this end, we change the capacity of the edge (s, aim +1 ) to cm (s, aim +1 ) = θa ( fm (a)) if fm (a) ∉ Im , and ∞ otherwise. [sent-377, score-0.592]

59 We interpret a finite cost st-cut as a relabeling of the random variables as follows: f (a) = k if the st-cut includes edge (ak , ak+1 ) where k ∈ [im + 1, jm ); f (a) = jm if the st-cut includes edge (a jm , t); and f (a) = fm (a) if the st-cut includes edge (s, aim +1 ). (8) [sent-380, score-1.214]
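
A sketch of this relabeling rule: given which edges of variable va's chain are cut, recover its new label (vertex names follow the construction above; this is illustrative code, not the authors' implementation):

def decode_label(a, i_m, j_m, f_m_a, cut_edges):
    # cut_edges: set of (u, v) pairs crossing the st-cut; one finite edge per variable is cut.
    if ("s", (a, i_m + 1)) in cut_edges:
        return f_m_a                            # variable retains its current label f_m(a)
    if ((a, j_m), "t") in cut_edges:
        return j_m                              # label l_{j_m}
    for k in range(i_m + 1, j_m):
        if ((a, k), (a, k + 1)) in cut_edges:
            return k                            # label l_k
    raise ValueError("no finite-capacity edge of this variable is cut")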

60 Note that the sum of the unary potentials for the labeling f is exactly equal to the cost of the st-cut over the edges defined above. [sent-381, score-0.554]

61 In order to model these cases, we incorporate the following additional edges: • If fm (a) ∈ Im and fm (b) ∉ Im then we add an edge (aim +1 , bim +1 ) with capacity wab M + κab /2 (see Fig. [sent-395, score-1.216]

62 • If fm (a) ∉ Im and fm (b) ∈ Im then we add an edge (bim +1 , aim +1 ) with capacity wab M + κab /2 (see Fig. [sent-397, score-1.201]

63 • If fm (a) ∉ Im and fm (b) ∉ Im , we introduce a new vertex pab . [sent-399, score-0.834]

64 We define the following edges (see Fig. 5(c)): cm (aim +1 , pab ) = cm (pab , aim +1 ) = wab M + κab /2, cm (bim +1 , pab ) = cm (pab , bim +1 ) = wab M + κab /2, cm (s, pab ) = θab ( fm (a), fm (b)) + κab . [sent-401, score-2.12]
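
A sketch of these auxiliary edges as plain dictionary entries; kappa_ab = wab d(L) as in Lemma 1, and theta_ab_fm stands for the fixed cost θab( fm (a), fm (b)):

def pab_edges(a, b, i_m, w_ab, M, kappa_ab, theta_ab_fm):
    p = ("p", a, b)                     # the new auxiliary vertex p_ab
    half = w_ab * M + kappa_ab / 2.0
    return {
        ((a, i_m + 1), p): half, (p, (a, i_m + 1)): half,
        ((b, i_m + 1), p): half, (p, (b, i_m + 1)): half,
        ("s", p): theta_ab_fm + kappa_ab,   # cut when both variables retain old labels
    }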

65 Property 5: The cost of the st-cut exactly represents the sum of the unary potentials associated with the corresponding labeling f , that is, ∑va ∈v θa ( f (a)). [sent-410, score-0.48]

66 Property 6: For (a, b) ∈ E , if f (a) = fm (a) ∉ Im and f (b) = fm (b) ∉ Im then the cost of the st-cut exactly represents the pairwise potential θab ( f (a), f (b)) plus a constant κab . [sent-411, score-0.867]

67 This is due to the fact that the st-cut contains the edge (s, pab ) whose capacity is θab ( fm (a), fm (b)) + κab . [sent-412, score-0.908]

68 This follows from the fact that our graph construction overestimates the truncation part by the convex function wab d(·) (see Lemma 1). [sent-418, score-0.461]

69 Similarly, if f (a) = fm (a) ∉ Im and f (b) ∈ Im then the cost of the st-cut incorrectly represents the pairwise potential θab ( f (a), f (b)), being wab d̂( f (b) − (im + 1)) + wab d( f (b) − (im + 1)) + wab M + κab . [sent-423, score-1.507]

70 In other words, the energy of the labeling f , and hence the energy of fm+1 , is less than or equal to the cost of the st-MINCUT on Gm . [sent-429, score-0.479]

71 Clearly, the following equation holds true: ∑_{va ∈ v} θa ( f ∗ (a)) = ∑_{Im ∈ Ir} ∑_{va ∈ v( f ∗ ,Im )} θa ( f ∗ (a)), (11) since f ∗ (a) belongs to exactly one interval in Ir for all va ∈ v. [sent-451, score-0.68]

72 • For (a, b) ∈ B1 ( f ∗ , Im ), we denote wab d̂( f ∗ (a) − (im + 1)) + wab d( f ∗ (a) − (im + 1)) + wab M by e^m_a. [sent-453, score-1.206]

73 • For (a, b) ∈ B2 ( f ∗ , Im ), we denote wab d̂( f ∗ (b) − (im + 1)) + wab d( f ∗ (b) − (im + 1)) + wab M by e^m_b. [sent-454, score-1.017]

74 In other words, ∑_{va ∈ v( f ∗ ,Im )} θa ( f (a)) + ∑_{(a,b) ∈ A ( f ∗ ,Im ) ∪ B ( f ∗ ,Im )} θab ( f (a), f (b)) ≤ ∑_{va ∈ v( f ∗ ,Im )} θa ( f ∗ (a)) + ∑_{(a,b) ∈ A ( f ∗ ,Im )} e^m_ab + ∑_{(a,b) ∈ B1 ( f ∗ ,Im )} e^m_a + ∑_{(a,b) ∈ B2 ( f ∗ ,Im )} e^m_b , ∀Im . [sent-460, score-0.875]

75 Furthermore, using Equation (11), the summation of the above inequality can be written as Q( f , D; θ) ≤ ∑_{va ∈ v} θa ( f ∗ (a)) + ∑_{Im ∈ Ir} ( ∑_{(a,b) ∈ A ( f ∗ ,Im )} e^m_ab + ∑_{(a,b) ∈ B1 ( f ∗ ,Im )} e^m_a + ∑_{(a,b) ∈ B2 ( f ∗ ,Im )} e^m_b ). [sent-463, score-0.661]

76 Hence, we get Q( f , D; θ) ≤ (1/L) ∑_r ( ∑_{va ∈ v} θa ( f ∗ (a)) + ∑_{Im ∈ Ir} ( ∑_{(a,b) ∈ A ( f ∗ ,Im )} e^m_ab + ∑_{(a,b) ∈ B1 ( f ∗ ,Im )} e^m_a + ∑_{(a,b) ∈ B2 ( f ∗ ,Im )} e^m_b ) ). [sent-466, score-0.661]

77 Specifically, the unary potentials θa (i) were sampled uniformly from the interval [0, 10] while the weights wab , which determine the pairwise potentials, were sampled uniformly from [0, 5]. [sent-502, score-0.681]

78 The Range Swap algorithm guarantees that at each iteration the energy of the new labeling obtained by the st-MINCUT algorithm is less than or equal to the energy of the previous labeling. [sent-555, score-0.465]

79 labeling may have a higher energy than the previous labeling (in which case the new labeling is discarded and the previous labeling is retained). [sent-602, score-0.953]

80 By canceling out the common terms, we see that ∑_{va ∈ v(Si )} θa ( fˆ(a)) + ∑ θab ( fˆ(a), fˆ(b)) ≤ ∑_{va ∈ v(Si )} θa ( fi (a)) + ∑ θab ( fi (a), fi (b)) ≤ ∑_{va ∈ v(Si )} θa ( f ∗ (a)) + ∑ θab ( f ∗ (a), f ∗ (b)). [sent-644, score-0.47]

81 Proof of Lemma 1 Lemma 1: For the capacities defined in Equations (4) and (5), the cost of the st-cut which includes the edges (ak , ak+1 ) and (bk′ , bk′ +1 ) (that is, va and vb take labels lk and lk′ respectively) is given by wab d(k − k′ ) + κab , where the constant κab = wab d(L). [sent-650, score-1.101]

82 · · · + d(i′ − jm + 2) − 2d(i′ − jm + 1) + d(i′ − jm ) + d(i′ − jm + 1) − 2d(i′ − jm ) + d(i′ − jm − 1) = d(i′ − k′ ) − d(i′ − k′ − 1) − d(i′ − jm ) + d(i′ − jm − 1). (14) [sent-656, score-1.92]

83 Hence, it follows that ∑_{i′ = im +1}^{k} ∑_{j′ = k′ +1}^{jm} [d(i′ − j′ + 1) − 2d(i′ − j′ ) + d(i′ − j′ − 1)] = [d(im + 1 − k′ ) − d(im − k′ ) − d(im − jm + 1) + d(im − jm )] + [d(im + 2 − k′ ) − d(im + 1 − k′ ) − d(im − jm + 2) + d(im − jm + 1)] + · · · [sent-657, score-1.2]

84 · · · + d(k − k′ − 1) − d(k − k′ − 2) − d(k − jm − 1) + d(k − jm − 2) + d(k − k′ ) − d(k − k′ − 1) − d(k − jm ) + d(k − jm − 1) = d(k − k′ ) − d( jm − k) − d(k′ − im ) + d( jm − im ) = d(k − k′ ) − d(L − k + im ) − d(im − k′ ) + d(L), (15) where the last expression holds because L = jm − im . [sent-660, score-3.98]

85 Similarly, it can be shown that ∑_{i′ = k+1}^{jm} ∑_{j′ = im +1}^{k′} [d(i′ − j′ + 1) − 2d(i′ − j′ ) + d(i′ − j′ − 1)] = d(k − k′ ) − d(L − k′ + im ) − d(im − k) + d(L). [sent-662, score-0.88]

86 Figure 8: The st-cut (the dashed curve between the two sets of nodes {aim +1 , · · · , a jm } and {bim +1 , · · · , b jm }; shown in red if viewed in color) that assigns f (a) ∈ Im and f (b) ∉ Im . [sent-665, score-0.48]

87 There are two possible cases to be considered: (i) fm (a) ∈ Im ; and (ii) fm (a) ∉ Im . [sent-676, score-0.754]

88 In case (i), the st-cut consists of the edges (see Fig. 8(a)): (a f (a) , a f (a)+1 ) ∪ {(ai′ , b j′ ), im + 2 ≤ i′ ≤ f (a), im + 1 ≤ j′ ≤ jm } ∪ {(aim +1 , b j′ ), im + 2 ≤ j′ ≤ jm } ∪ (aim +1 , bim +1 ). [sent-678, score-2.254]

89 In case (ii), the st-cut consists of the edges (see Fig. 8(b)): (a f (a) , a f (a)+1 ) ∪ {(ai′ , b j′ ), im + 2 ≤ i′ ≤ f (a), im + 1 ≤ j′ ≤ jm } ∪ {(aim +1 , b j′ ), im + 2 ≤ j′ ≤ jm } ∪ (pab , bim +1 ). [sent-680, score-2.254]

90 However, the capacity of both these edges is equal to wab M + κab /2. [sent-684, score-0.469]

91 The cost of the st-cut for the edges in Equation (17) is given by (wab /2)(d(L − f (a) + im ) + d( f (a) − im )) + (wab /2) ∑_{i′ = im +2}^{f (a)} ∑_{j′ = im +1}^{jm} [d(i′ − j′ + 1) − 2d(i′ − j′ ) + d(i′ − j′ − 1)] + (wab /2) ∑_{j′ = im +2}^{jm} [d(im − j′ + 2) − 2d(im − j′ + 1) + d(im − j′ )] + wab M + κab /2. (18) [sent-687, score-2.75]

92 In order to simplify the above expression, we begin by observing that ∑_{j′ = im +1}^{jm} [d(i′ − j′ + 1) − 2d(i′ − j′ ) + d(i′ − j′ − 1)] = d(i′ − im ) − d(i′ − im − 1) − d(i′ − jm ) + d(i′ − jm − 1). [sent-688, score-1.87]

93 The above equation is obtained by substituting k′ = im in Equation (14). [sent-689, score-0.575]

94 It follows that (wab /2) ∑_{i′ = im +2}^{f (a)} ∑_{j′ = im +1}^{jm} [d(i′ − j′ + 1) − 2d(i′ − j′ ) + d(i′ − j′ − 1)] = (wab /2) [ (d(2) − d(1) − d(im − jm + 2) + d(im − jm + 1)) + (d(3) − d(2) − d(im − jm + 3) + d(im − jm + 2)) + · · · ]. [sent-690, score-1.539]

95 Similarly, by substituting k′ = im + 1 in Equation (14), we get (wab /2) ∑_{j′ = im +2}^{jm} [d(im − j′ + 2) − 2d(im − j′ + 1) + d(im − j′ )] = (wab /2) [d(0) − d(1) − d( jm − im − 1) + d( jm − im )] = (wab /2) [d(0) − d(1) − d(L − 1) + d(L)]. [sent-695, score-2.784]

96 Consider one such st-cut that results in the following labeling: f (a) = f ∗ (a) if va ∈ v( f ∗ , Im ), and f (a) = fm (a) otherwise. [sent-703, score-0.591]

97 We consider the following six cases: • For random variables va ∉ v( f ∗ , Im ) it follows from Property 5 that the cost of the st-cut will include the unary potentials associated with such variables exactly, that is, ∑_{va ∉ v( f ∗ ,Im )} θa ( fm (a)). [sent-706, score-1.077]

98 • For random variables va ∈ v( f ∗ , Im ), it follows from Property 5 that the cost of the st-cut will include the unary potentials associated with such variables exactly, that is, ∑_{va ∈ v( f ∗ ,Im )} θa ( f ∗ (a)). [sent-708, score-0.486]

99 Since we are dealing with the truncated linear metric, the terms e^m_ab, e^m_a and e^m_b can be simplified as e^m_ab = wab | f ∗ (a) − f ∗ (b)|, e^m_a = wab ( f ∗ (a) − im − 1 + M), e^m_b = wab ( f ∗ (b) − im − 1 + M). [sent-720, score-2.999]

100 Furthermore, the conditions for (a, b) ∈ B1 ( f ∗ , Im ) and (a, b) ∈ B2 ( f ∗ , Im ) are given by (a, b) ∈ B1 ( f ∗ , Im ) ⇐⇒ im ∈ [ f ∗ (a) − L, f ∗ (a) − 1], (a, b) ∈ B2 ( f ∗ , Im ) ⇐⇒ im ∈ [ f ∗ (b) − L, f ∗ (b) − 1]. [sent-725, score-1.15]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('im', 0.575), ('fm', 0.377), ('wab', 0.339), ('jm', 0.24), ('va', 0.214), ('labeling', 0.208), ('ab', 0.189), ('potentials', 0.154), ('truncated', 0.127), ('energy', 0.121), ('sm', 0.1), ('swap', 0.091), ('unary', 0.089), ('em', 0.086), ('eksler', 0.081), ('oves', 0.081), ('runcated', 0.081), ('pab', 0.08), ('edges', 0.074), ('cm', 0.073), ('onvex', 0.069), ('mproved', 0.069), ('umar', 0.069), ('boykov', 0.065), ('ak', 0.064), ('pairwise', 0.061), ('labelings', 0.058), ('orr', 0.057), ('si', 0.056), ('capacity', 0.056), ('tardos', 0.054), ('move', 0.05), ('bim', 0.049), ('chekuri', 0.049), ('lp', 0.049), ('range', 0.047), ('multiplicative', 0.044), ('relaxation', 0.04), ('komodakis', 0.038), ('interval', 0.038), ('expansion', 0.038), ('odels', 0.036), ('schlesinger', 0.036), ('veksler', 0.036), ('bp', 0.035), ('capacities', 0.034), ('aim', 0.034), ('rounding', 0.034), ('mrf', 0.034), ('graph', 0.033), ('ishikawa', 0.031), ('gm', 0.031), ('stereo', 0.03), ('vb', 0.03), ('cost', 0.029), ('gupta', 0.027), ('neighboring', 0.027), ('bk', 0.027), ('labels', 0.027), ('convex', 0.027), ('kumar', 0.026), ('truncation', 0.023), ('potential', 0.023), ('tziritas', 0.023), ('pami', 0.023), ('szeliski', 0.023), ('epresenting', 0.022), ('otentials', 0.022), ('overestimates', 0.022), ('trw', 0.022), ('quadratic', 0.02), ('label', 0.02), ('ir', 0.019), ('metric', 0.019), ('elds', 0.019), ('moves', 0.019), ('map', 0.019), ('potts', 0.018), ('edge', 0.018), ('construction', 0.017), ('kolmogorov', 0.017), ('smooth', 0.017), ('sink', 0.017), ('felzenszwalb', 0.017), ('partitioning', 0.017), ('cut', 0.016), ('pawan', 0.015), ('lhs', 0.015), ('olga', 0.015), ('iteration', 0.015), ('intervals', 0.015), ('lk', 0.015), ('passing', 0.014), ('pixels', 0.014), ('vertices', 0.014), ('vm', 0.014), ('torr', 0.014), ('fi', 0.014), ('alahari', 0.013), ('message', 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 41 jmlr-2011-Improved Moves for Truncated Convex Models

Author: M. Pawan Kumar, Olga Veksler, Philip H.S. Torr

Abstract: We consider the problem of obtaining an approximate maximum a posteriori estimate of a discrete random field characterized by pairwise potentials that form a truncated convex model. For this problem, we propose two st-MINCUT based move making algorithms that we call Range Swap and Range Expansion. Our algorithms can be thought of as extensions of αβ-Swap and α-Expansion respectively that fully exploit the form of the pairwise potentials. Specifically, instead of dealing with one or two labels at each iteration, our methods explore a large search space by considering a range of labels (that is, an interval of consecutive labels). Furthermore, we show that Range Expansion provides the same multiplicative bounds as the standard linear programming (LP) relaxation in polynomial time. Compared to previous approaches based on the LP relaxation, for example interior-point algorithms or tree-reweighted message passing (TRW), our methods are faster as they use only the efficient st-MINCUT algorithm in their design. We demonstrate the usefulness of the proposed approaches on both synthetic and standard real data problems. Keywords: truncated convex models, move making algorithms, range moves, multiplicative bounds, linear programming relaxation

2 0.15385212 28 jmlr-2011-Double Updating Online Learning

Author: Peilin Zhao, Steven C.H. Hoi, Rong Jin

Abstract: In most kernel based online learning algorithms, when an incoming instance is misclassified, it will be added into the pool of support vectors and assigned with a weight, which often remains unchanged during the rest of the learning process. This is clearly insufficient since when a new support vector is added, we generally expect the weights of the other existing support vectors to be updated in order to reflect the influence of the added support vector. In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short, that explicitly addresses this problem. Instead of only assigning a fixed weight to the misclassified example received at the current trial, the proposed online learning algorithm also tries to update the weight for one of the existing support vectors. We show that the mistake bound can be improved by the proposed online learning method. We conduct an extensive set of empirical evaluations for both binary and multi-class online learning tasks. The experimental results show that the proposed technique is considerably more effective than the state-of-the-art online learning algorithms. The source code is available to public at http://www.cais.ntu.edu.sg/˜chhoi/DUOL/. Keywords: online learning, kernel method, support vector machines, maximum margin learning, classification

3 0.14494702 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing

Author: Julian J. McAuley, TibĂŠrio S. Caetano

Abstract: Maximum A Posteriori inference in graphical models is often solved via message-passing algorithms, such as the junction-tree algorithm or loopy belief-propagation. The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). This is a common case in a wide variety of applications that deal with grid-, tree-, and ring-structured models. In such cases, we are able to decrease the exponent of complexity for message-passing by 0.5 for both exact and approximate inference. We demonstrate that message-passing operations in such models are equivalent to some variant of matrix multiplication in the tropical semiring, for which we offer an O(N 2.5 ) expected-case solution. Keywords: graphical models, belief-propagation, tropical matrix multiplication

4 0.058421552 12 jmlr-2011-Bayesian Co-Training

Author: Shipeng Yu, Balaji Krishnapuram, Rómer Rosales, R. Bharat Rao

Abstract: Co-training (or more generally, co-regularization) has been a popular algorithm for semi-supervised learning in data with two feature representations (or views), but the fundamental assumptions underlying this type of models are still unclear. In this paper we propose a Bayesian undirected graphical model for co-training, or more generally for semi-supervised multi-view learning. This makes explicit the previously unstated assumptions of a large class of co-training type algorithms, and also clarifies the circumstances under which these assumptions fail. Building upon new insights from this model, we propose an improved method for co-training, which is a novel co-training kernel for Gaussian process classifiers. The resulting approach is convex and avoids local-maxima problems, and it can also automatically estimate how much each view should be trusted to accommodate noisy or unreliable views. The Bayesian co-training approach can also elegantly handle data samples with missing views, that is, some of the views are not available for some data points at learning time. This is further extended to an active sensing framework, in which the missing (sample, view) pairs are actively acquired to improve learning performance. The strength of active sensing model is that one actively sensed (sample, view) pair would improve the joint multi-view classification on all the samples. Experiments on toy data and several real world data sets illustrate the benefits of this approach. Keywords: co-training, multi-view learning, semi-supervised learning, Gaussian processes, undirected graphical models, active sensing

5 0.052945536 91 jmlr-2011-The Sample Complexity of Dictionary Learning

Author: Daniel Vainsencher, Shie Mannor, Alfred M. Bruckstein

Abstract: A large set of signals can sometimes be described sparsely using a dictionary, that is, every element can be represented as a linear combination of few elements from the dictionary. Algorithms for various signal processing applications, including classification, denoising and signal separation, learn a dictionary from a given set of signals to be represented. Can we expect that the error in representing by such a dictionary a previously unseen signal from the same source will be of similar magnitude as those for the given examples? We assume signals are generated from a fixed distribution, and study these questions from a statistical learning theory perspective. We develop generalization bounds on the quality of the learned dictionary for two types of constraints on the coefficient selection, as measured by the expected L2 error in representation when the dictionary is used. For the case of l1 regularized coefficient selection we provide a generalization bound of the order of O(√(np ln(mλ)/m)), where n is the dimension, p is the number of elements in the dictionary, λ is a bound on the l1 norm of the coefficient vector and m is the number of samples, which complements existing results. For the case of representing a new signal as a combination of at most k dictionary elements, we provide a bound of the order O(√(np ln(mk)/m)) under an assumption on the closeness to orthogonality of the dictionary (low Babel function). We further show that this assumption holds for most dictionaries in high dimensions in a strong probabilistic sense. Our results also include bounds that converge as 1/m, not previously known for this problem. We provide similar results in a general setting using kernels with weak smoothness requirements. Keywords: dictionary learning, generalization bound, sparse representation

6 0.031500466 18 jmlr-2011-Convergence Rates of Efficient Global Optimization Algorithms

7 0.030954711 103 jmlr-2011-Weisfeiler-Lehman Graph Kernels

8 0.030260231 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing

9 0.027053766 15 jmlr-2011-CARP: Software for Fishing Out Good Clustering Algorithms

10 0.026959959 58 jmlr-2011-Learning from Partial Labels

11 0.026905848 20 jmlr-2011-Convex and Network Flow Optimization for Structured Sparsity

12 0.02680237 47 jmlr-2011-Inverse Reinforcement Learning in Partially Observable Environments

13 0.023005137 16 jmlr-2011-Clustering Algorithms for Chains

14 0.022466948 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series

15 0.021772206 7 jmlr-2011-Adaptive Exact Inference in Graphical Models

16 0.020671779 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling

17 0.020569263 82 jmlr-2011-Robust Gaussian Process Regression with a Student-tLikelihood

18 0.020031968 39 jmlr-2011-High-dimensional Covariance Estimation Based On Gaussian Graphical Models

19 0.019703336 84 jmlr-2011-Semi-Supervised Learning with Measure Propagation

20 0.019148607 22 jmlr-2011-Differentially Private Empirical Risk Minimization


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.119), (1, -0.004), (2, -0.03), (3, 0.012), (4, 0.034), (5, 0.044), (6, 0.044), (7, 0.019), (8, 0.082), (9, -0.012), (10, -0.222), (11, 0.117), (12, 0.302), (13, -0.105), (14, 0.324), (15, 0.012), (16, 0.065), (17, 0.113), (18, -0.24), (19, 0.119), (20, 0.153), (21, -0.207), (22, -0.096), (23, -0.124), (24, 0.139), (25, 0.104), (26, 0.1), (27, 0.102), (28, -0.166), (29, -0.086), (30, -0.158), (31, 0.025), (32, -0.164), (33, -0.061), (34, -0.044), (35, -0.077), (36, -0.02), (37, -0.099), (38, 0.048), (39, -0.056), (40, -0.013), (41, -0.012), (42, 0.028), (43, 0.031), (44, 0.055), (45, 0.026), (46, 0.004), (47, -0.037), (48, 0.044), (49, 0.003)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.98940402 41 jmlr-2011-Improved Moves for Truncated Convex Models

Author: M. Pawan Kumar, Olga Veksler, Philip H.S. Torr

Abstract: We consider the problem of obtaining an approximate maximum a posteriori estimate of a discrete random field characterized by pairwise potentials that form a truncated convex model. For this problem, we propose two st-MINCUT based move making algorithms that we call Range Swap and Range Expansion. Our algorithms can be thought of as extensions of αβ-Swap and α-Expansion respectively that fully exploit the form of the pairwise potentials. Specifically, instead of dealing with one or two labels at each iteration, our methods explore a large search space by considering a range of labels (that is, an interval of consecutive labels). Furthermore, we show that Range Expansion provides the same multiplicative bounds as the standard linear programming (LP) relaxation in polynomial time. Compared to previous approaches based on the LP relaxation, for example interior-point algorithms or tree-reweighted message passing (TRW), our methods are faster as they use only the efficient st-MINCUT algorithm in their design. We demonstrate the usefulness of the proposed approaches on both synthetic and standard real data problems. Keywords: truncated convex models, move making algorithms, range moves, multiplicative bounds, linear programming relaxation

2 0.54765451 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing

Author: Julian J. McAuley, Tibério S. Caetano

Abstract: Maximum A Posteriori inference in graphical models is often solved via message-passing algorithms, such as the junction-tree algorithm or loopy belief-propagation. The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). This is a common case in a wide variety of applications that deal with grid-, tree-, and ring-structured models. In such cases, we are able to decrease the exponent of complexity for message-passing by 0.5 for both exact and approximate inference. We demonstrate that message-passing operations in such models are equivalent to some variant of matrix multiplication in the tropical semiring, for which we offer an O(N 2.5 ) expected-case solution. Keywords: graphical models, belief-propagation, tropical matrix multiplication

3 0.53441578 28 jmlr-2011-Double Updating Online Learning

Author: Peilin Zhao, Steven C.H. Hoi, Rong Jin

Abstract: In most kernel based online learning algorithms, when an incoming instance is misclassified, it will be added into the pool of support vectors and assigned with a weight, which often remains unchanged during the rest of the learning process. This is clearly insufficient since when a new support vector is added, we generally expect the weights of the other existing support vectors to be updated in order to reflect the influence of the added support vector. In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short, that explicitly addresses this problem. Instead of only assigning a fixed weight to the misclassified example received at the current trial, the proposed online learning algorithm also tries to update the weight for one of the existing support vectors. We show that the mistake bound can be improved by the proposed online learning method. We conduct an extensive set of empirical evaluations for both binary and multi-class online learning tasks. The experimental results show that the proposed technique is considerably more effective than the state-of-the-art online learning algorithms. The source code is available to public at http://www.cais.ntu.edu.sg/˜chhoi/DUOL/. Keywords: online learning, kernel method, support vector machines, maximum margin learning, classification

4 0.1942586 12 jmlr-2011-Bayesian Co-Training

Author: Shipeng Yu, Balaji Krishnapuram, Rómer Rosales, R. Bharat Rao

Abstract: Co-training (or more generally, co-regularization) has been a popular algorithm for semi-supervised learning in data with two feature representations (or views), but the fundamental assumptions underlying this type of models are still unclear. In this paper we propose a Bayesian undirected graphical model for co-training, or more generally for semi-supervised multi-view learning. This makes explicit the previously unstated assumptions of a large class of co-training type algorithms, and also clarifies the circumstances under which these assumptions fail. Building upon new insights from this model, we propose an improved method for co-training, which is a novel co-training kernel for Gaussian process classifiers. The resulting approach is convex and avoids local-maxima problems, and it can also automatically estimate how much each view should be trusted to accommodate noisy or unreliable views. The Bayesian co-training approach can also elegantly handle data samples with missing views, that is, some of the views are not available for some data points at learning time. This is further extended to an active sensing framework, in which the missing (sample, view) pairs are actively acquired to improve learning performance. The strength of active sensing model is that one actively sensed (sample, view) pair would improve the joint multi-view classification on all the samples. Experiments on toy data and several real world data sets illustrate the benefits of this approach. Keywords: co-training, multi-view learning, semi-supervised learning, Gaussian processes, undirected graphical models, active sensing

5 0.17722131 15 jmlr-2011-CARP: Software for Fishing Out Good Clustering Algorithms

Author: Volodymyr Melnykov, Ranjan Maitra

Abstract: This paper presents the C LUSTERING A LGORITHMS ’ R EFEREE PACKAGE or CARP, an open source GNU GPL-licensed C package for evaluating clustering algorithms. Calibrating performance of such algorithms is important and CARP addresses this need by generating datasets of different clustering complexity and by assessing the performance of the concerned algorithm in terms of its ability to classify each dataset relative to the true grouping. This paper briefly describes the software and its capabilities. Keywords: CARP, M IX S IM, clustering algorithm, Gaussian mixture, overlap

6 0.17264466 91 jmlr-2011-The Sample Complexity of Dictionary Learning

7 0.13725254 103 jmlr-2011-Weisfeiler-Lehman Graph Kernels

8 0.12317001 58 jmlr-2011-Learning from Partial Labels

9 0.11834177 43 jmlr-2011-Information, Divergence and Risk for Binary Experiments

10 0.11193668 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing

11 0.10862324 84 jmlr-2011-Semi-Supervised Learning with Measure Propagation

12 0.10460746 39 jmlr-2011-High-dimensional Covariance Estimation Based On Gaussian Graphical Models

13 0.10205837 18 jmlr-2011-Convergence Rates of Efficient Global Optimization Algorithms

14 0.10123192 62 jmlr-2011-MSVMpack: A Multi-Class Support Vector Machine Package

15 0.093370698 21 jmlr-2011-Cumulative Distribution Networks and the Derivative-sum-product Algorithm: Models and Inference for Cumulative Distribution Functions on Graphs

16 0.089510322 20 jmlr-2011-Convex and Network Flow Optimization for Structured Sparsity

17 0.088412642 10 jmlr-2011-Anechoic Blind Source Separation Using Wigner Marginals

18 0.086808883 16 jmlr-2011-Clustering Algorithms for Chains

19 0.086767375 45 jmlr-2011-Internal Regret with Partial Monitoring: Calibration-Based Optimal Algorithms

20 0.084565006 68 jmlr-2011-Natural Language Processing (Almost) from Scratch


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.046), (9, 0.025), (10, 0.079), (11, 0.479), (24, 0.031), (31, 0.058), (32, 0.017), (41, 0.019), (60, 0.012), (73, 0.034), (78, 0.059), (90, 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.83859289 41 jmlr-2011-Improved Moves for Truncated Convex Models

Author: M. Pawan Kumar, Olga Veksler, Philip H.S. Torr

Abstract: We consider the problem of obtaining an approximate maximum a posteriori estimate of a discrete random field characterized by pairwise potentials that form a truncated convex model. For this problem, we propose two st-MINCUT based move making algorithms that we call Range Swap and Range Expansion. Our algorithms can be thought of as extensions of αβ-Swap and α-Expansion respectively that fully exploit the form of the pairwise potentials. Specifically, instead of dealing with one or two labels at each iteration, our methods explore a large search space by considering a range of labels (that is, an interval of consecutive labels). Furthermore, we show that Range Expansion provides the same multiplicative bounds as the standard linear programming (LP) relaxation in polynomial time. Compared to previous approaches based on the LP relaxation, for example interior-point algorithms or tree-reweighted message passing (TRW), our methods are faster as they use only the efficient st-MINCUT algorithm in their design. We demonstrate the usefulness of the proposed approaches on both synthetic and standard real data problems. Keywords: truncated convex models, move making algorithms, range moves, multiplicative bounds, linear programming relaxation

2 0.69465852 80 jmlr-2011-Regression on Fixed-Rank Positive Semidefinite Matrices: A Riemannian Approach

Author: Gilles Meyer, Silvère Bonnabel, Rodolphe Sepulchre

Abstract: The paper addresses the problem of learning a regression model parameterized by a fixed-rank positive semidefinite matrix. The focus is on the nonlinear nature of the search space and on scalability to high-dimensional problems. The mathematical developments rely on the theory of gradient descent algorithms adapted to the Riemannian geometry that underlies the set of fixedrank positive semidefinite matrices. In contrast with previous contributions in the literature, no restrictions are imposed on the range space of the learned matrix. The resulting algorithms maintain a linear complexity in the problem size and enjoy important invariance properties. We apply the proposed algorithms to the problem of learning a distance function parameterized by a positive semidefinite matrix. Good performance is observed on classical benchmarks. Keywords: linear regression, positive semidefinite matrices, low-rank approximation, Riemannian geometry, gradient-based learning

3 0.26767823 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms

Author: Jinfeng Zhuang, Ivor W. Tsang, Steven C.H. Hoi

Abstract: Previous studies of Non-Parametric Kernel Learning (NPKL) usually formulate the learning task as a Semi-Definite Programming (SDP) problem that is often solved by some general purpose SDP solvers. However, for N data examples, the time complexity of NPKL using a standard interiorpoint SDP solver could be as high as O(N 6.5 ), which prohibits NPKL methods applicable to real applications, even for data sets of moderate size. In this paper, we present a family of efficient NPKL algorithms, termed “SimpleNPKL”, which can learn non-parametric kernels from a large set of pairwise constraints efficiently. In particular, we propose two efficient SimpleNPKL algorithms. One is SimpleNPKL algorithm with linear loss, which enjoys a closed-form solution that can be efficiently computed by the Lanczos sparse eigen decomposition technique. Another one is SimpleNPKL algorithm with other loss functions (including square hinge loss, hinge loss, square loss) that can be re-formulated as a saddle-point optimization problem, which can be further resolved by a fast iterative algorithm. In contrast to the previous NPKL approaches, our empirical results show that the proposed new technique, maintaining the same accuracy, is significantly more efficient and scalable. Finally, we also demonstrate that the proposed new technique is also applicable to speed up many kernel learning tasks, including colored maximum variance unfolding, minimum volume embedding, and structure preserving embedding. Keywords: non-parametric kernel learning, semi-definite programming, semi-supervised learning, side information, pairwise constraints

4 0.24724248 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing

Author: Julian J. McAuley, TibĂŠrio S. Caetano

Abstract: Maximum A Posteriori inference in graphical models is often solved via message-passing algorithms, such as the junction-tree algorithm or loopy belief-propagation. The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). This is a common case in a wide variety of applications that deal with grid-, tree-, and ring-structured models. In such cases, we are able to decrease the exponent of complexity for message-passing by 0.5 for both exact and approximate inference. We demonstrate that message-passing operations in such models are equivalent to some variant of matrix multiplication in the tropical semiring, for which we offer an O(N 2.5 ) expected-case solution. Keywords: graphical models, belief-propagation, tropical matrix multiplication

5 0.24327986 20 jmlr-2011-Convex and Network Flow Optimization for Structured Sparsity

Author: Julien Mairal, Rodolphe Jenatton, Guillaume Obozinski, Francis Bach

Abstract: We consider a class of learning problems regularized by a structured sparsity-inducing norm defined as the sum of ℓ2 - or ℓ∞ -norms over groups of variables. Whereas much effort has been put in developing fast optimization techniques when the groups are disjoint or embedded in a hierarchy, we address here the case of general overlapping groups. To this end, we present two different strategies: On the one hand, we show that the proximal operator associated with a sum of ℓ∞ norms can be computed exactly in polynomial time by solving a quadratic min-cost flow problem, allowing the use of accelerated proximal gradient methods. On the other hand, we use proximal splitting techniques, and address an equivalent formulation with non-overlapping groups, but in higher dimension and with additional constraints. We propose efficient and scalable algorithms exploiting these two strategies, which are significantly faster than alternative approaches. We illustrate these methods with several problems such as CUR matrix factorization, multi-task learning of tree-structured dictionaries, background subtraction in video sequences, image denoising with wavelets, and topographic dictionary learning of natural image patches. Keywords: convex optimization, proximal methods, sparse coding, structured sparsity, matrix factorization, network flow optimization, alternating direction method of multipliers

6 0.24075381 7 jmlr-2011-Adaptive Exact Inference in Graphical Models

7 0.23743556 59 jmlr-2011-Learning with Structured Sparsity

8 0.23610184 16 jmlr-2011-Clustering Algorithms for Chains

9 0.23280503 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling

10 0.23252568 12 jmlr-2011-Bayesian Co-Training

11 0.22901887 91 jmlr-2011-The Sample Complexity of Dictionary Learning

12 0.22900881 29 jmlr-2011-Efficient Learning with Partially Observed Attributes

13 0.22869165 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing

14 0.22839801 104 jmlr-2011-X-Armed Bandits

15 0.22732896 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination

16 0.22685923 19 jmlr-2011-Convergence of Distributed Asynchronous Learning Vector Quantization Algorithms

17 0.22652338 21 jmlr-2011-Cumulative Distribution Networks and the Derivative-sum-product Algorithm: Models and Inference for Cumulative Distribution Functions on Graphs

18 0.2264618 53 jmlr-2011-Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates

19 0.22566138 76 jmlr-2011-Parameter Screening and Optimisation for ILP using Designed Experiments

20 0.22532161 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning