nips nips2012 nips2012-214 knowledge-graph by maker-knowledge-mining

214 nips-2012-Minimizing Sparse High-Order Energies by Submodular Vertex-Cover


Source: pdf

Author: Andrew Delong, Olga Veksler, Anton Osokin, Yuri Boykov

Abstract: Inference in high-order graphical models has become important in recent years. Several approaches are based, for example, on generalized message-passing, or on transformation to a pairwise model with extra ‘auxiliary’ variables. We focus on a special case where a much more efficient transformation is possible. Instead of adding variables, we transform the original problem into a comparatively small instance of submodular vertex-cover. These vertex-cover instances can then be attacked by existing algorithms (e.g. belief propagation, QPBO), where they often run 4–15 times faster and find better solutions than when applied to the original problem. We evaluate our approach on synthetic data, then we show applications within a fast hierarchical clustering and model-fitting framework. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Several approaches are based, for example, on generalized message-passing, or on transformation to a pairwise model with extra ‘auxiliary’ variables. [sent-10, score-0.106]

2 Instead of adding variables, we transform the original problem into a comparatively small instance of submodular vertex-cover. [sent-12, score-0.332]

3 Several algorithms have emerged as practical tools for inference, especially for graphs containing only unary and pairwise factors. [sent-18, score-0.096]

4 Prominent examples include belief propagation [30], more advanced message passing methods like TRW-S [21] or MPLP [33], combinatorial methods like α-expansion [6] (for ‘metric’ factors) and QPBO [32] (mainly for binary problems). [sent-19, score-0.084]

5 In terms of optimization, these algorithms are designed to minimize objective functions (energies) containing unary and pairwise terms. [sent-20, score-0.129]

6 Recent developments in high-order inference include, for example, high-arity CRF potentials [19, 38, 25, 31], cardinality-based potentials [13, 34], global potentials controlling the appearance of labels [24, 26, 7], learning with high-order loss functions [35], among many others. [sent-22, score-0.165]

7 One standard approach to high-order inference is to transform the problem to the pairwise case and then simply apply one of the aforementioned ‘pairwise’ algorithms. [sent-23, score-0.115]

8 There can be several equivalent high-order-to-pairwise transformations, and this choice affects the difficulty of the resulting pairwise inference problem. [sent-26, score-0.086]

9 Our work is about fast energy minimization (MAP inference) for particularly sparse, high-order “pattern potentials” used in [25, 31, 29]: each energy term prefers a specific (but arbitrary) assignment to its subset of variables. [sent-29, score-0.245]

10 Instead of directly transforming the high-order problem to pairwise, we transform the entire problem to a comparatively small instance of submodular vertex-cover ( SVC). [sent-30, score-0.332]

11 We also show that our ‘sparse’ high-order energies naturally appear when trying to solve hierarchical clustering problems using the algorithmic approach called fusion moves [27], also conceptually known as optimized crossover [1]. [sent-33, score-0.355]

12 The fusion approach is not standard for the kind of clustering objective we will consider, but we believe it is an interesting optimization strategy. [sent-35, score-0.173]

13 Section 2 introduces the class of high-order energies we consider, then derives the transformation to SVC and the subsequent decoding. [sent-37, score-0.133]

14 Sparse High-Order Energies Reducible to SVC: In what follows we use x to denote a vector of binary variables, xP to denote the product ∏_{i∈P} xi, and x̄Q to denote the product ∏_{i∈Q} x̄i, where x̄i = 1 − xi. [sent-39, score-0.095]

15 It is well-known that any pseudo-boolean function (binary energy) can be written in the form F (x) = ∑_{i∈I} ai xi − ∑_{j∈V} bj xPj x̄Qj (1), where each clique j has coefficient −bj with bj ≥ 0, and is defined over variables in sets Pj, Qj ⊆ I. [sent-42, score-0.432]

16 For example, if x = (x1, . . . , x7), then a clique j with Pj = {2, 3} and Qj = {4, 5, 6} will explicitly reward binary configuration (·, 1, 1, 0, 0, 0, ·) by the amount bj (depicted as b1 in Figure 1). [sent-47, score-0.197]
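
To make the form (1) concrete, here is a minimal Python sketch (our own illustration, not the authors' code; all names are hypothetical) that evaluates such an energy given unary coefficients ai and pattern cliques (Pj, Qj, bj):

```python
def energy_F(x, a, cliques):
    """Evaluate F(x) of form (1): sum_i a_i x_i - sum_j b_j * [pattern j matched].
    x, a: dicts keyed by variable index; cliques: list of (P, Q, b) with b >= 0."""
    val = sum(a[i] * x[i] for i in x)
    for P, Q, b in cliques:
        # Clique j is 'active' when all P-variables are 1 and all Q-variables are 0.
        if all(x[i] == 1 for i in P) and all(x[i] == 0 for i in Q):
            val -= b
    return val

# The example above: P = {2, 3}, Q = {4, 5, 6} rewards (., 1, 1, 0, 0, 0, .) by b.
x = {1: 0, 2: 1, 3: 1, 4: 0, 5: 0, 6: 0, 7: 1}
a = {i: 0.0 for i in x}
print(energy_F(x, a, [({2, 3}, {4, 5, 6}, 5.0)]))   # -> -5.0
```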

17 A standard way to minimize F (x) would be to substitute each −bj xPj x̄Qj term with a collection of equivalent pairwise terms. [sent-49, score-0.089]

18 However, we aim to minimize F (x) in a novel way, so first we review the submodular vertex-cover problem. [sent-52, score-0.274]

19 1 Review of Submodular Vertex-Cover The classic minimum-weighted vertex-cover (VC) problem can be stated as a 0-1 integer program where variable uj = 1 if and only if vertex j is included in the cover. [sent-54, score-0.245]

20 (VC) minimize ∑_{j∈V} wj uj subject to uj + uj′ ≥ 1 ∀{j, j′} ∈ E, uj ∈ {0, 1}. (2) [sent-55, score-0.768]
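
To make the covering constraints in (2) concrete, the sketch below solves tiny instances exhaustively; it is only an illustration under our own naming (real instances would be attacked with maxflow or an LP relaxation, as the text notes):

```python
from itertools import product

def min_weighted_vertex_cover(weights, edges):
    """Exhaustive solver for the 0-1 program (2); only suitable for tiny graphs.
    weights: {vertex j: w_j >= 0}; edges: iterable of pairs (j, j')."""
    verts = list(weights)
    best_u, best_cost = None, float("inf")
    for bits in product((0, 1), repeat=len(verts)):
        u = dict(zip(verts, bits))
        if all(u[j] + u[k] >= 1 for j, k in edges):          # covering constraints
            cost = sum(weights[j] * u[j] for j in verts)
            if cost < best_cost:
                best_u, best_cost = u, cost
    return best_u, best_cost

print(min_weighted_vertex_cover({"a": 1.0, "b": 2.0, "c": 1.5},
                                [("a", "b"), ("b", "c")]))   # cover {'b'}, cost 2.0
```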

21 If the graph (V, E) is bipartite, then we call the specialized problem VC - B and it can be solved very efficiently by specialized bipartite maximum flow algorithms such as [2]. [sent-57, score-0.089]

22 A function f (x) is called submodular if f (x ∧ y) + f (x ∨ y) ≤ f (x) + f (y) for all x, y ∈ {0, 1}^V, where (x ∧ y)j = xj yj and (x ∨ y)j = 1 − (1 − xj)(1 − yj). [sent-58, score-0.241]
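
The lattice inequality can be checked exhaustively for small ground sets; the snippet below (our own illustration) verifies it and confirms that a product term with a negative coefficient, the building block used later in (13), is submodular:

```python
from itertools import product

def is_submodular(f, n):
    """Check f(x ∧ y) + f(x ∨ y) <= f(x) + f(y) over all pairs of 0-1 vectors of length n."""
    vecs = list(product((0, 1), repeat=n))
    for x in vecs:
        for y in vecs:
            meet = tuple(xi & yi for xi, yi in zip(x, y))   # (x ∧ y)_j = x_j y_j
            join = tuple(xi | yi for xi, yi in zip(x, y))   # (x ∨ y)_j = 1 - (1-x_j)(1-y_j)
            if f(meet) + f(join) > f(x) + f(y) + 1e-9:
                return False
    return True

print(is_submodular(lambda u: -3.0 * u[0] * u[1] * u[2], 3))   # -> True
```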

23 A submodular function can be minimized in strongly polynomial time by combinatorial methods [17], but becomes NP-hard when subject to arbitrary covering constraints like (3). [sent-59, score-0.402]

24 The submodular vertex-cover (SVC) problem generalizes VC by replacing the linear (modular) objective (2) with an arbitrary submodular objective: (SVC) minimize f (u) subject to uj + uj′ ≥ 1 ∀{j, j′} ∈ E, uj ∈ {0, 1}. [sent-60, score-1.005]

25 It turns out that a half-integral relaxation uj ∈ {0, 1/2, 1} (call this problem SVC-H), followed by upward rounding, gives a 2-approximation much like for standard VC. [sent-62, score-0.245]

26 They also show how to transform any SVC - H instance into a bipartite instance of SVC (see below); this extends a classic result by Nemhauser & Trotter [28], allowing specialized combinatorial algorithms like [17] to solve the relaxation. [sent-63, score-0.181]

27 Suppose we have an SVC-B instance (J, K, E, f, g) where we can write submodular f and g as f (u) = ∑_{S∈S^0} wS uS and g(v) = ∑_{S∈S^1} wS v̄S. (6) [sent-68, score-0.274]

28 Here S^0 and S^1 are collections of subsets of J and K respectively, and typescript uS denotes the product ∏_{j∈S} uj throughout (as distinct from typescript u, which denotes a vector). [sent-69, score-0.343]

29 We can define an equivalent problem over variables uj and zk = v̄k (i.e., zk = 1 − vk). [sent-73, score-0.305]

30 With this substitution, the covering constraints become uj ≥ zk . [sent-74, score-0.436]

31 Since “g(v) submodular in v” implies “g(1−v) submodular in v,” letting ḡ(z) = g(1−z) = g(v) means ḡ(z) is submodular as a function of z. [sent-75, score-0.723]

32 Minimizing f (u) + ḡ(z) subject to uj ≥ zk is equivalent to our original problem. [sent-76, score-0.305]

33 Since uj ≥ zk can be enforced by a large (submodular) penalty on the assignment ūj zk, SVC-B is equivalent to: minimize f (u) + ḡ(z) + ∑_{(j,k)∈E} η ūj zk where η = ∞. (7) [sent-77, score-0.942]

34 When f and g take the form (6), we have ḡ(z) = ∑_{S∈S^1} wS zS where zS denotes the product ∏_{k∈S} zk. [sent-78, score-0.093]

35 Figure 1: Left: factor graph F (x) = ∑_{i=1}^{7} ai xi − b1 x2 x3 x4 x5 x6 − b2 x1 x2 x3 x4 x5 − b3 x3 x4 x5 x6 x7. [sent-87, score-0.188]

36 A small white square indicates ai > 0, a black square ai < 0. [sent-88, score-0.242]

37 A hollow edge connecting xi to factor j indicates i ∈ Pj , and a filled-in edge indicates i ∈ Qj . [sent-89, score-0.098]

38 Introduce auxiliary binary variables u ∈ {0, 1}^V where uj = 1 − xPj x̄Qj (so uj = 0 exactly when clique j is active). [sent-98, score-0.272]

39 Because each bj ≥ 0, minimizing F (x) is equivalent to the 0-1 integer program with non-linear constraints: minimize F (x, u) subject to 1 − uj ≤ xPj x̄Qj ∀j ∈ V. (8) [sent-99, score-0.45]

40 Inequality (8) is sufficient if bj ≥ 0 because, for any fixed x, equality uj = 1 − xPj x̄Qj holds for some u that minimizes F (x, u). [sent-100, score-0.352]

41 As a consequence of (8) we have uj = 0 ⇒ xi = 1 ∀i ∈ Pj and xi = 0 ∀i ∈ Qj. [sent-102, score-0.245]

42 In other words, u can be feasible if and only if, for each i: ∃ uj = 0, j ∈ Ji ⟹ uk = 1 ∀k ∈ Ki, and ∃ uk = 0, k ∈ Ki ⟹ uj = 1 ∀j ∈ Ji. (9) [sent-107, score-0.49]

43 (⇒) If 1 − uj ≤ xPj x̄Qj for all j ∈ V, then having uJi + uKi ≥ 1 is necessary: if both uJi = 0 and uKi = 0 for some i, there would exist j ∈ Ji and k ∈ Ki requiring both xi = 1 and xi = 0, contradicting any unique assignment to xi. [sent-108, score-0.306]

44 (⇐) If uJi + uKi ≥ 1 for all i ∈ I, then we can always choose some x ∈ {0, 1}^I for which 1 − uj ≤ xPj x̄Qj for every j ∈ V. [sent-109, score-0.245]

45 It will be convenient to choose a minimum cost assignment for each xi , subject to the constraints uJi = 0 ⇒ xi = 1 and uKi = 0 ⇒ xi = 0. [sent-110, score-0.157]

46 The assignment x(u) is feasible with respect to (8) because whenever uj = 0 we have x(u)i = 1 for all i ∈ Pj and x(u)i = 0 for all i ∈ Qj. [sent-112, score-0.272]

47 To express minimization of F solely in terms of u, first write (10) in the equivalent form: x(u)i = uKi if ai < 0, and x(u)i = 1 − uJi otherwise. (11) [sent-114, score-0.151]
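
A small sketch of the decoding rule (10)-(11), assuming Ji = {j : i ∈ Pj} and Ki = {k : i ∈ Qk} (our reading of the text) and implementing typescript uS as a product over the set; names are ours:

```python
def decode_x(u, a, Ji, Ki):
    """Decode x(u) as in (11). u: {j: 0/1}; a: {i: a_i}; Ji[i], Ki[i]: index sets."""
    def u_S(S):
        return int(all(u[j] == 1 for j in S))   # typescript u_S, empty product = 1
    return {i: (u_S(Ki[i]) if ai < 0 else 1 - u_S(Ji[i])) for i, ai in a.items()}
```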

48 Use (11) to write the new SVC objective f (u) = F (x(u), u), which becomes f (u) = ∑_{i : ai>0} ai (1 − uJi) + ∑_{i : ai<0} ai uKi − ∑_{j∈V} bj (1 − uj) = ∑_{i : ai>0} −ai uJi + ∑_{i : ai<0} ai uKi + ∑_{j∈V} bj uj + const. [sent-116, score-1.551]

49 We define the set S = {S ⊆ V | (∃Ji = S) ∨ (∃Ki = S)} and write f (u) = ∑_{S∈S} wS uS + const (13) where wS = ∑_{i : ai>0, Ji=S} −ai + ∑_{i : ai<0, Ki=S} ai (+ bj if S = {j}). (14) [sent-118, score-0.349]

50 Since the high-order terms uS in (13) have non-positive coefficients wS ≤ 0, f (u) is submodular [5]. [sent-119, score-0.362]

51 Finally, to ensure (9) holds we add a covering constraint uj + uk ≥ 1 whenever there exists i such that j ∈ Ji , k ∈ Ki . [sent-122, score-0.348]

52 For this SVC instance, an optimal covering u minimizes F (x(u), u). [sent-123, score-0.103]
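
Putting the pieces together, the sketch below assembles the SVC instance of Theorem 1: the sets Ji, Ki, the weights wS of (13)-(14), and the covering constraints required by (9). It assumes Ji = {j : i ∈ Pj} and Ki = {k : i ∈ Qk}, and all names are ours; the resulting instance would then be handed to an SVC solver (e.g. QPBO or belief propagation on the covering graph), or simply enumerated for tiny cases:

```python
from collections import defaultdict

def build_svc_instance(a, cliques):
    """Hedged sketch of the construction (not the authors' code).
    a: {i: a_i}; cliques: list of (P_j, Q_j, b_j) with b_j >= 0.
    Returns (f, edges): a submodular objective f(u) and covering edges."""
    Ji, Ki = defaultdict(set), defaultdict(set)
    for j, (P, Q, _) in enumerate(cliques):
        for i in P:
            Ji[i].add(j)
        for i in Q:
            Ki[i].add(j)

    # Weights w_S of (13)-(14); empty J_i / K_i sets only contribute constants.
    w = defaultdict(float)
    for j, (_, _, b) in enumerate(cliques):
        w[frozenset({j})] += b                    # the "+ b_j if S = {j}" term of (14)
    for i, ai in a.items():
        if ai > 0 and Ji[i]:
            w[frozenset(Ji[i])] -= ai             # -a_i for S = J_i
        elif ai < 0 and Ki[i]:
            w[frozenset(Ki[i])] += ai             # a_i (negative) for S = K_i

    # Covering constraints (9): u_j + u_k >= 1 whenever some i has j in J_i, k in K_i.
    edges = {(j, k) for i in a for j in Ji[i] for k in Ki[i]}

    def f(u):                                     # f(u) = sum_S w_S * u_S (+ const)
        return sum(wS * int(all(u[j] for j in S)) for S, wS in w.items())

    return f, edges
```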

53 (Decode the covering as in (10).) One reviewer suggested an extension that scales better with the number of overlapping cliques. [sent-126, score-0.103]

54 Specifically, let y ∈ {0, 1}^S and use the submodular objective f (y) = ∑_{S∈S} ( wS yS + ∑_{j∈S} (bj + 1) yS (1 − y{j}) ), where the inner sum ensures yS = ∏_{j∈S} y{j} at a local minimum because w{j} ≤ bj. [sent-128, score-0.348]

55 For each unique pair {Ji , Ki }, add a covering constraint yJi + yKi ≥ 1 (instead of O(|Ji |·|Ki |) constraints). [sent-129, score-0.103]

56 An optimal covering y∗ of S then gives an optimal covering of V by assigning uj = y∗{j}. [sent-130, score-0.245]

57 If {Pj }j∈V are disjoint and, separately, {Qj }j∈V are disjoint (equivalently each |Ji |, |Ki | ≤ 1), then the SVC instance in Theorem 1 reduces to standard VC . [sent-135, score-0.093]

58 The objective then becomes f (u) = ∑_{j∈V} w{j} uj + const, a form of standard VC. [sent-138, score-0.245]

59 If sets {Pj }j∈V and {Qj }j∈V satisfy the conditions of propositions 2 and 3, then minimizing F (x) reduces to an instance of VC - B and can be solved by bipartite maximum flow. [sent-149, score-0.126]

60 Even if F (x) ≥ 0 for all x, it does not imply f (u) ≥ 0 for configurations of u that violate the covering constraints, as would be required. [sent-151, score-0.103]

61 For example, when P^n-Potts potentials [19] are incorporated into α-expansion, the resulting expansion step contains high-order terms that are compact in this form; in the absence of pairwise CRF terms, Proposition 3 would apply. [sent-155, score-0.101]

62 The α-expansion algorithm has also been extended to optimize the facility location objective [7] commonly used for clustering (e. [sent-156, score-0.189]

63 For a fixed λ, the final energy of each algorithm was normalized between 0.0 and 1.0. [sent-166, score-0.094]

64 Here 1.0 corresponds to the baseline ICM energy; the true energy gap between lower bound and baseline is indicated at top. [sent-168, score-0.094]

65 also take the form (1) (in fact, Corollary 1 applies here); with no need to build the ‘full’ high-order graph, this would allow α-expansion to work as a fast alternative to the classic greedy algorithm for facility location, very similar to the fusion-based algorithm in [4]. [sent-171, score-0.12]

66 We show that our generalized transformation allows for a novel way to optimize a hierarchical facility location objective. [sent-173, score-0.241]

67 We compare a number of methods on synthetic instances of energy (1). [sent-176, score-0.094]

68 Results on Synthetic Instances: Each instance is a function F (x) where x represents a 100 × 100 grid of binary variables with random unary coefficients ai ∈ [−10, 10]. [sent-178, score-0.221]

69 Each instance also has |J | = 50 high-order cliques with bj ∈ [250λ, 500λ] (we will vary λ), where variable sets Pj and Qj each cover a random nj × nj and mj × mj region respectively (here the region size nj , mj ∈ {10, . [sent-179, score-0.171]
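
A rough generator for such synthetic instances might look as follows; the grid size, coefficient ranges, and clique count follow the text, while the exact region-size range (truncated above) and the choice to keep Pj and Qj disjoint are assumptions on our part:

```python
import random

def make_instance(n=100, m=50, lam=1.0, size_range=(10, 20), seed=0):
    """Sketch of a synthetic instance of (1): an n x n grid of binary variables
    with random unary terms a_i in [-10, 10], plus m pattern cliques with
    b_j in [250*lam, 500*lam]; P_j and Q_j each cover a random square region."""
    rng = random.Random(seed)
    a = {(r, c): rng.uniform(-10, 10) for r in range(n) for c in range(n)}

    def random_region():
        s = rng.randint(*size_range)
        r0, c0 = rng.randint(0, n - s), rng.randint(0, n - s)
        return {(r, c) for r in range(r0, r0 + s) for c in range(c0, c0 + s)}

    cliques = []
    for _ in range(m):
        P, Q = random_region(), random_region()
        cliques.append((P, Q - P, rng.uniform(250 * lam, 500 * lam)))
    return a, cliques
```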

70 To transform high-order potentials to quadratic, we report results using Type-II binary reduction [31] because for TRW-S/MPLP it dominated the Type-I reduction in our experiments, and for BP and the others it made no difference. [sent-189, score-0.101]

71 This runs counter to the conventional use of “number of supermodular terms” as an estimate of difficulty: the Type-I reduction would generate one supermodular edge per high-order term, whereas Type-II generates |Pj| supermodular edges for each term (∑_{i∈Pj} xi y). [sent-190, score-0.186]

72 Still, SVC-QPBOI is 20 times faster than QPBOI while giving similar energies on average. [sent-206, score-0.083]

73 A classic operations research problem with the same fundamental components is facility location: the clients (data points) must be assigned to a nearby facility (cluster) but each facility costs money to open. [sent-219, score-0.36]

74 For hard optimization problems there is a particular algorithmic approach called fusion [27] or optimized crossover [1]. [sent-221, score-0.192]

75 If l0 is the first candidate labeling and l1 is the second candidate labeling, a fusion operation seeks a binary string x∗ minimizing E(l(x)), where the crossover labeling is l(x) = (li^{xi})_{i∈I}. [sent-226, score-0.386]
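
A minimal sketch of the crossover labeling (our own illustration): the paper minimizes E(l(x)) by reducing it to SVC, whereas the brute-force search below is only meaningful for tiny index sets:

```python
from itertools import product

def crossover(l0, l1, x):
    """Crossover labeling l(x): take l0_i where x_i = 0 and l1_i where x_i = 1."""
    return {i: (l1[i] if x[i] else l0[i]) for i in l0}

def best_fusion_bruteforce(l0, l1, energy):
    """Exhaustive fusion over all binary strings x (tiny instances only)."""
    idx = sorted(l0)
    best = None
    for bits in product((0, 1), repeat=len(idx)):
        lab = crossover(l0, l1, dict(zip(idx, bits)))
        if best is None or energy(lab) < energy(best):
            best = lab
    return best
```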

76 In [4] we derived a fusion operation based on the greedy formulation of facility location, and found that the subproblem reduced to minimum-weighted vertex-cover. [sent-228, score-0.254]

77 We will now show that the fusion operation for hierarchical facility location objectives requires minimizing an energy of the form (1), which we have already shown can be transformed to a submodular vertex-cover problem. [sent-229, score-0.697]

78 [12] recently proposed a message-passing scheme for hierarchical facility location, with experiments on synthetic and HIV strain data. [sent-231, score-0.161]

79 We focus on a more computer-vision-centric application: detecting a hierarchy of lines and vanishing points in images using the geometric image parsing objective proposed by Tretyak et al. [sent-232, score-0.179]

80 The hierarchical energy proposed by [36] contains five ‘layers’: edges, line segments, lines, vanishing points, and horizon. [sent-234, score-0.304]

81 Each layer provides evidence for subsequent (higher) layers, and at each level there is a complexity cost that regulates how much evidence is needed to detect a line, to detect a vanishing point, etc. [sent-235, score-0.139]

82 For simplicity we only model edges, lines, and vanishing points, but our fusion-based framework easily extends to the full model. [sent-236, score-0.139]

83 Let L be a set of candidate lines, and let V be a set of candidate vanishing points. [sent-239, score-0.267]

84 These sets are built by randomly sampling: one oriented edge to generate each candidate line, and pairs of lines to generate each candidate vanishing point. [sent-240, score-0.339]

85 Each line j ∈ L is associated with one vanishing point kj ∈ V. [sent-241, score-0.169]

86 (If a line passes close to multiple vanishing points, a copy of the line is made for each.) [sent-242, score-0.199]

87 ) We seek a labeling l where li ∈ L ∪ ⊘ identifies the line (and vanishing point) that edge i belongs to, or assigns outlier label ⊘. [sent-243, score-0.24]

88 Let Di (j) = distj (xi, yi) + distj (ψi) denote the spatial distance and angular deviation of edge i to line j, and let the outlier cost be Di (⊘) = const. [sent-244, score-0.16]

89 Similarly, let Dj = distj (kj ) be the distance of line j and its associated vanishing point projected onto the Gaussian sphere (see [36]). [sent-245, score-0.218]

90 Finally let Cl and Cv denote positive constants that penalize the detection of a line and a vanishing point respectively. [sent-246, score-0.169]

91 The hierarchical energy we minimize is E(l) = ∑_{i∈I} Di (li) + ∑_{j∈L} (Cl + Dj)·[∃ li = j] + ∑_{k∈V} Cv·[∃ kli = k]. (15) [sent-247, score-0.169]

92 This energy penalizes the number of unique lines, and the number of unique vanishing points, that labeling l depends on. [sent-248, score-0.272]
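
A small sketch of evaluating (15) for a given labeling; the data structures (dictionaries keyed by edge, line, and vanishing point) and all names are our own choices:

```python
def hierarchical_energy(labels, D, D_line, C_l, C_v, line_to_vp, outlier=None):
    """Evaluate E(l) of (15). labels: {edge i: line j or outlier}; D[i][j] is D_i(j)
    (D[i][outlier] is the constant outlier cost); D_line[j] is D_j; line_to_vp[j] is k_j."""
    E = sum(D[i][labels[i]] for i in labels)                   # data terms D_i(l_i)
    used_lines = {j for j in labels.values() if j is not outlier}
    E += sum(C_l + D_line[j] for j in used_lines)              # cost per detected line
    used_vps = {line_to_vp[j] for j in used_lines}
    E += C_v * len(used_vps)                                   # cost per detected vanishing point
    return E
```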

93 Given two candidate labelings l0, l1, writing the fusion energy for (15) gives E(l(x)) = ∑_{i∈I} ( Di (li^0) + (Di (li^1) − Di (li^0)) xi ) + ∑_{j∈L} (Cl + Dj)·(1 − xPj x̄Qj) + ∑_{k∈V} Cv·(1 − xPk x̄Qk), (16) where Pj = { i | li^0 = j }, Qj = { i | li^1 = j }, Pk = { i | k_{li^0} = k }, and Qk = { i | k_{li^1} = k }. [sent-249, score-0.453]

94 For each image we used 10,000 edges, generated 8,000 candidate lines and 150 candidate vanishing points. [sent-251, score-0.095]
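
The crossover sets appearing in (16) can be assembled from two candidate labelings as in the following sketch (names and data structures are ours):

```python
from collections import defaultdict

def fusion_sets(l0, l1, line_to_vp, outlier=None):
    """P_j = {i : l0_i = j}, Q_j = {i : l1_i = j} for each line j, and the
    analogous P_k, Q_k for each vanishing point k (via the map k_{l_i})."""
    P_line, Q_line = defaultdict(set), defaultdict(set)
    P_vp, Q_vp = defaultdict(set), defaultdict(set)
    for i in l0:
        if l0[i] is not outlier:
            P_line[l0[i]].add(i)
            P_vp[line_to_vp[l0[i]]].add(i)
        if l1[i] is not outlier:
            Q_line[l1[i]].add(i)
            Q_vp[line_to_vp[l1[i]]].add(i)
    return P_line, Q_line, P_vp, Q_vp
```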

95 We then generated 4 candidate labelings, each by allowing vanishing points to be detected in randomized order, and their associated lines to be detected in greedy order, and then we fused the labelings together by minimizing (16). [sent-252, score-0.343]

96 As argued in [27], fusion is a robust approach because it combines the strengths—quite literally—of all methods used to generate candidates. [sent-257, score-0.134]

97 Acknowledgements: We thank Danny Tarlow for helpful discussion regarding MPLP, and an anonymous reviewer for suggesting a more efficient way to enforce covering constraints (!). [sent-262, score-0.103]

98 (2001) A combinatorial, strongly polynomial-time algorithm for minimizing submodular functions. [sent-364, score-0.278]

99 (2009) A simple combinatorial algorithm for submodular function minimization. [sent-369, score-0.271]

100 (2009) Minimizing sparse higher order energy functions of discrete variables. [sent-440, score-0.094]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('svc', 0.535), ('uj', 0.245), ('submodular', 0.241), ('uki', 0.213), ('xpj', 0.18), ('pj', 0.179), ('uji', 0.173), ('qj', 0.165), ('xqj', 0.164), ('vanishing', 0.139), ('fusion', 0.134), ('ji', 0.132), ('ki', 0.128), ('ws', 0.127), ('ai', 0.121), ('facility', 0.12), ('mplp', 0.116), ('qpbo', 0.116), ('bj', 0.107), ('covering', 0.103), ('iwata', 0.098), ('energy', 0.094), ('icm', 0.087), ('bp', 0.084), ('energies', 0.083), ('inimize', 0.082), ('candidate', 0.064), ('clique', 0.063), ('labelings', 0.063), ('vc', 0.061), ('zk', 0.06), ('boykov', 0.06), ('crossover', 0.058), ('delong', 0.058), ('osokin', 0.058), ('bipartite', 0.056), ('pairwise', 0.056), ('orlin', 0.053), ('kolmogorov', 0.053), ('transformation', 0.05), ('distj', 0.049), ('kli', 0.049), ('qpboi', 0.049), ('typescript', 0.049), ('veksler', 0.048), ('potentials', 0.045), ('rother', 0.044), ('kohli', 0.044), ('hierarchical', 0.041), ('lempitsky', 0.04), ('lb', 0.04), ('supermodular', 0.04), ('unary', 0.04), ('lines', 0.04), ('labeling', 0.039), ('clustering', 0.039), ('western', 0.038), ('minimizing', 0.037), ('pattern', 0.036), ('givoni', 0.036), ('proposition', 0.035), ('xi', 0.034), ('graph', 0.033), ('instance', 0.033), ('gallagher', 0.033), ('nagano', 0.033), ('tretyak', 0.033), ('trotter', 0.033), ('xqk', 0.033), ('minimize', 0.033), ('edge', 0.032), ('di', 0.032), ('tarlow', 0.032), ('cl', 0.032), ('cliques', 0.031), ('ys', 0.031), ('inference', 0.03), ('construction', 0.03), ('minimization', 0.03), ('disjoint', 0.03), ('vision', 0.03), ('line', 0.03), ('location', 0.03), ('combinatorial', 0.03), ('transform', 0.029), ('cients', 0.029), ('torr', 0.029), ('comparatively', 0.029), ('komodakis', 0.029), ('tightening', 0.029), ('roof', 0.029), ('reducible', 0.029), ('zs', 0.029), ('dj', 0.028), ('coef', 0.028), ('constraints', 0.028), ('assignment', 0.027), ('binary', 0.027), ('message', 0.027), ('stitching', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000005 214 nips-2012-Minimizing Sparse High-Order Energies by Submodular Vertex-Cover

Author: Andrew Delong, Olga Veksler, Anton Osokin, Yuri Boykov

Abstract: Inference in high-order graphical models has become important in recent years. Several approaches are based, for example, on generalized message-passing, or on transformation to a pairwise model with extra ‘auxiliary’ variables. We focus on a special case where a much more efficient transformation is possible. Instead of adding variables, we transform the original problem into a comparatively small instance of submodular vertex-cover. These vertex-cover instances can then be attacked by existing algorithms (e.g. belief propagation, QPBO), where they often run 4–15 times faster and find better solutions than when applied to the original problem. We evaluate our approach on synthetic data, then we show applications within a fast hierarchical clustering and model-fitting framework. 1

2 0.14154455 328 nips-2012-Submodular-Bregman and the Lovász-Bregman Divergences with Applications

Author: Rishabh Iyer, Jeff A. Bilmes

Abstract: We introduce a class of discrete divergences on sets (equivalently binary vectors) that we call the submodular-Bregman divergences. We consider two kinds, defined either from tight modular upper or tight modular lower bounds of a submodular function. We show that the properties of these divergences are analogous to the (standard continuous) Bregman divergence. We demonstrate how they generalize many useful divergences, including the weighted Hamming distance, squared weighted Hamming, weighted precision, recall, conditional mutual information, and a generalized KL-divergence on sets. We also show that the generalized Bregman divergence on the Lovász extension of a submodular function, which we call the Lovász-Bregman divergence, is a continuous extension of a submodular Bregman divergence. We point out a number of applications, and in particular show that a proximal algorithm defined through the submodular Bregman divergence provides a framework for many mirror-descent style algorithms related to submodular function optimization. We also show that a generalization of the k-means algorithm using the Lovász Bregman divergence is natural in clustering scenarios where ordering is important. A unique property of this algorithm is that computing the mean ordering is extremely efficient unlike other order based distance measures. 1

3 0.14112845 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

Author: Aaron Defazio, Tibério S. Caetano

Abstract: A key problem in statistics and machine learning is the determination of network structure from data. We consider the case where the structure of the graph to be reconstructed is known to be scale-free. We show that in such cases it is natural to formulate structured sparsity inducing priors using submodular functions, and we use their Lovász extension to obtain a convex relaxation. For tractable classes such as Gaussian graphical models, this leads to a convex optimization problem that can be efficiently solved. We show that our method results in an improvement in the accuracy of reconstructed networks for synthetic data. We also show how our prior encourages scale-free reconstructions on a bioinformatics dataset.

4 0.13762456 359 nips-2012-Variational Inference for Crowdsourcing

Author: Qiang Liu, Jian Peng, Alex Ihler

Abstract: Crowdsourcing has become a popular paradigm for labeling large datasets. However, it has given rise to the computational task of aggregating the crowdsourced labels provided by a collection of unreliable annotators. We approach this problem by transforming it into a standard inference problem in graphical models, and applying approximate variational methods, including belief propagation (BP) and mean field (MF). We show that our BP algorithm generalizes both majority voting and a recent algorithm by Karger et al. [1], while our MF method is closely related to a commonly used EM algorithm. In both cases, we find that the performance of the algorithms critically depends on the choice of a prior distribution on the workers’ reliability; by choosing the prior properly, both BP and MF (and EM) perform surprisingly well on both simulated and real-world datasets, competitive with state-of-the-art algorithms based on more complicated modeling assumptions. 1

5 0.1167206 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

Author: Jennifer Gillenwater, Alex Kulesza, Ben Taskar

Abstract: Determinantal point processes (DPPs) have recently been proposed as computationally efficient probabilistic models of diverse sets for a variety of applications, including document summarization, image search, and pose estimation. Many DPP inference operations, including normalization and sampling, are tractable; however, finding the most likely configuration (MAP), which is often required in practice for decoding, is NP-hard, so we must resort to approximate inference. This optimization problem, which also arises in experimental design and sensor placement, involves finding the largest principal minor of a positive semidefinite matrix. Because the objective is log-submodular, greedy algorithms have been used in the past with some empirical success; however, these methods only give approximation guarantees in the special case of monotone objectives, which correspond to a restricted class of DPPs. In this paper we propose a new algorithm for approximating the MAP problem based on continuous techniques for submodular function maximization. Our method involves a novel continuous relaxation of the log-probability function, which, in contrast to the multilinear extension used for general submodular functions, can be evaluated and differentiated exactly and efficiently. We obtain a practical algorithm with a 1/4-approximation guarantee for a more general class of non-monotone DPPs; our algorithm also extends to MAP inference under complex polytope constraints, making it possible to combine DPPs with Markov random fields, weighted matchings, and other models. We demonstrate that our approach outperforms standard and recent methods on both synthetic and real-world data. 1

6 0.10041688 304 nips-2012-Selecting Diverse Features via Spectral Regularization

7 0.080702126 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

8 0.068718016 47 nips-2012-Augment-and-Conquer Negative Binomial Processes

9 0.067081168 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

10 0.064425461 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

11 0.061107837 69 nips-2012-Clustering Sparse Graphs

12 0.06006825 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs

13 0.058572765 247 nips-2012-Nonparametric Reduced Rank Regression

14 0.05759665 278 nips-2012-Probabilistic n-Choose-k Models for Classification and Ranking

15 0.057177115 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

16 0.056853622 204 nips-2012-MAP Inference in Chains using Column Generation

17 0.056789082 84 nips-2012-Convergence Rate Analysis of MAP Coordinate Minimization Algorithms

18 0.053847238 85 nips-2012-Convergence and Energy Landscape for Cheeger Cut Clustering

19 0.052412234 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

20 0.052403327 7 nips-2012-A Divide-and-Conquer Method for Sparse Inverse Covariance Estimation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.148), (1, 0.037), (2, 0.022), (3, -0.07), (4, -0.012), (5, -0.0), (6, -0.017), (7, -0.067), (8, -0.048), (9, 0.056), (10, -0.004), (11, -0.031), (12, 0.03), (13, 0.05), (14, 0.102), (15, -0.023), (16, -0.078), (17, -0.049), (18, -0.051), (19, -0.001), (20, 0.122), (21, -0.019), (22, 0.039), (23, 0.064), (24, 0.156), (25, 0.152), (26, -0.06), (27, -0.038), (28, -0.026), (29, -0.042), (30, -0.046), (31, 0.096), (32, -0.011), (33, 0.046), (34, -0.029), (35, 0.04), (36, -0.038), (37, -0.043), (38, -0.116), (39, 0.069), (40, -0.051), (41, -0.004), (42, -0.042), (43, -0.023), (44, 0.069), (45, -0.032), (46, -0.088), (47, 0.02), (48, 0.033), (49, 0.066)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90715963 214 nips-2012-Minimizing Sparse High-Order Energies by Submodular Vertex-Cover

Author: Andrew Delong, Olga Veksler, Anton Osokin, Yuri Boykov

Abstract: Inference in high-order graphical models has become important in recent years. Several approaches are based, for example, on generalized message-passing, or on transformation to a pairwise model with extra ‘auxiliary’ variables. We focus on a special case where a much more efficient transformation is possible. Instead of adding variables, we transform the original problem into a comparatively small instance of submodular vertex-cover. These vertex-cover instances can then be attacked by existing algorithms (e.g. belief propagation, QPBO), where they often run 4–15 times faster and find better solutions than when applied to the original problem. We evaluate our approach on synthetic data, then we show applications within a fast hierarchical clustering and model-fitting framework. 1

2 0.77992618 328 nips-2012-Submodular-Bregman and the Lovász-Bregman Divergences with Applications

Author: Rishabh Iyer, Jeff A. Bilmes

Abstract: We introduce a class of discrete divergences on sets (equivalently binary vectors) that we call the submodular-Bregman divergences. We consider two kinds, defined either from tight modular upper or tight modular lower bounds of a submodular function. We show that the properties of these divergences are analogous to the (standard continuous) Bregman divergence. We demonstrate how they generalize many useful divergences, including the weighted Hamming distance, squared weighted Hamming, weighted precision, recall, conditional mutual information, and a generalized KL-divergence on sets. We also show that the generalized Bregman divergence on the Lovász extension of a submodular function, which we call the Lovász-Bregman divergence, is a continuous extension of a submodular Bregman divergence. We point out a number of applications, and in particular show that a proximal algorithm defined through the submodular Bregman divergence provides a framework for many mirror-descent style algorithms related to submodular function optimization. We also show that a generalization of the k-means algorithm using the Lovász Bregman divergence is natural in clustering scenarios where ordering is important. A unique property of this algorithm is that computing the mean ordering is extremely efficient unlike other order based distance measures. 1

3 0.6922394 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

Author: Jennifer Gillenwater, Alex Kulesza, Ben Taskar

Abstract: Determinantal point processes (DPPs) have recently been proposed as computationally efficient probabilistic models of diverse sets for a variety of applications, including document summarization, image search, and pose estimation. Many DPP inference operations, including normalization and sampling, are tractable; however, finding the most likely configuration (MAP), which is often required in practice for decoding, is NP-hard, so we must resort to approximate inference. This optimization problem, which also arises in experimental design and sensor placement, involves finding the largest principal minor of a positive semidefinite matrix. Because the objective is log-submodular, greedy algorithms have been used in the past with some empirical success; however, these methods only give approximation guarantees in the special case of monotone objectives, which correspond to a restricted class of DPPs. In this paper we propose a new algorithm for approximating the MAP problem based on continuous techniques for submodular function maximization. Our method involves a novel continuous relaxation of the log-probability function, which, in contrast to the multilinear extension used for general submodular functions, can be evaluated and differentiated exactly and efficiently. We obtain a practical algorithm with a 1/4-approximation guarantee for a more general class of non-monotone DPPs; our algorithm also extends to MAP inference under complex polytope constraints, making it possible to combine DPPs with Markov random fields, weighted matchings, and other models. We demonstrate that our approach outperforms standard and recent methods on both synthetic and real-world data. 1

4 0.61889029 304 nips-2012-Selecting Diverse Features via Spectral Regularization

Author: Abhimanyu Das, Anirban Dasgupta, Ravi Kumar

Abstract: We study the problem of diverse feature selection in linear regression: selecting a small subset of diverse features that can predict a given objective. Diversity is useful for several reasons such as interpretability, robustness to noise, etc. We propose several spectral regularizers that capture a notion of diversity of features and show that these are all submodular set functions. These regularizers, when added to the objective function for linear regression, result in approximately submodular functions, which can then be maximized by efficient greedy and local search algorithms, with provable guarantees. We compare our algorithms to traditional greedy and ℓ1-regularization schemes and show that we obtain a more diverse set of features that result in the regression problem being stable under perturbations. 1

5 0.5820936 359 nips-2012-Variational Inference for Crowdsourcing

Author: Qiang Liu, Jian Peng, Alex Ihler

Abstract: Crowdsourcing has become a popular paradigm for labeling large datasets. However, it has given rise to the computational task of aggregating the crowdsourced labels provided by a collection of unreliable annotators. We approach this problem by transforming it into a standard inference problem in graphical models, and applying approximate variational methods, including belief propagation (BP) and mean field (MF). We show that our BP algorithm generalizes both majority voting and a recent algorithm by Karger et al. [1], while our MF method is closely related to a commonly used EM algorithm. In both cases, we find that the performance of the algorithms critically depends on the choice of a prior distribution on the workers’ reliability; by choosing the prior properly, both BP and MF (and EM) perform surprisingly well on both simulated and real-world datasets, competitive with state-of-the-art algorithms based on more complicated modeling assumptions. 1

6 0.54679656 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

7 0.44738683 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

8 0.43362847 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs

9 0.40819699 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

10 0.36319897 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)

11 0.36187658 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions

12 0.36077711 85 nips-2012-Convergence and Energy Landscape for Cheeger Cut Clustering

13 0.35112315 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

14 0.3464703 133 nips-2012-Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery

15 0.34210107 206 nips-2012-Majorization for CRFs and Latent Likelihoods

16 0.33671352 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

17 0.33431476 82 nips-2012-Continuous Relaxations for Discrete Hamiltonian Monte Carlo

18 0.33370608 230 nips-2012-Multiple Choice Learning: Learning to Produce Multiple Structured Outputs

19 0.33290869 267 nips-2012-Perceptron Learning of SAT

20 0.32943848 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.042), (21, 0.041), (23, 0.302), (38, 0.148), (42, 0.019), (53, 0.049), (54, 0.028), (55, 0.014), (74, 0.055), (76, 0.11), (80, 0.067), (92, 0.043)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.77249372 132 nips-2012-Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling

Author: Antonino Freno, Mikaela Keller, Marc Tommasi

Abstract: Statistical models for networks have been typically committed to strong prior assumptions concerning the form of the modeled distributions. Moreover, the vast majority of currently available models are explicitly designed for capturing some specific graph properties (such as power-law degree distributions), which makes them unsuitable for application to domains where the behavior of the target quantities is not known a priori. The key contribution of this paper is twofold. First, we introduce the Fiedler delta statistic, based on the Laplacian spectrum of graphs, which allows to dispense with any parametric assumption concerning the modeled network properties. Second, we use the defined statistic to develop the Fiedler random field model, which allows for efficient estimation of edge distributions over large-scale random networks. After analyzing the dependence structure involved in Fiedler random fields, we estimate them over several real-world networks, showing that they achieve a much higher modeling accuracy than other well-known statistical approaches.

same-paper 2 0.76989323 214 nips-2012-Minimizing Sparse High-Order Energies by Submodular Vertex-Cover

Author: Andrew Delong, Olga Veksler, Anton Osokin, Yuri Boykov

Abstract: Inference in high-order graphical models has become important in recent years. Several approaches are based, for example, on generalized message-passing, or on transformation to a pairwise model with extra ‘auxiliary’ variables. We focus on a special case where a much more efficient transformation is possible. Instead of adding variables, we transform the original problem into a comparatively small instance of submodular vertex-cover. These vertex-cover instances can then be attacked by existing algorithms (e.g. belief propagation, QPBO), where they often run 4–15 times faster and find better solutions than when applied to the original problem. We evaluate our approach on synthetic data, then we show applications within a fast hierarchical clustering and model-fitting framework. 1

3 0.66347343 41 nips-2012-Ancestor Sampling for Particle Gibbs

Author: Fredrik Lindsten, Thomas Schön, Michael I. Jordan

Abstract: We present a novel method in the family of particle MCMC methods that we refer to as particle Gibbs with ancestor sampling (PG-AS). Similarly to the existing PG with backward simulation (PG-BS) procedure, we use backward sampling to (considerably) improve the mixing of the PG kernel. Instead of using separate forward and backward sweeps as in PG-BS, however, we achieve the same effect in a single forward sweep. We apply the PG-AS framework to the challenging class of non-Markovian state-space models. We develop a truncation strategy of these models that is applicable in principle to any backward-simulation-based method, but which is particularly well suited to the PG-AS framework. In particular, as we show in a simulation study, PG-AS can yield an order-of-magnitude improved accuracy relative to PG-BS due to its robustness to the truncation error. Several application examples are discussed, including Rao-Blackwellized particle smoothing and inference in degenerate state-space models. 1

4 0.58732218 324 nips-2012-Stochastic Gradient Descent with Only One Projection

Author: Mehrdad Mahdavi, Tianbao Yang, Rong Jin, Shenghuo Zhu, Jinfeng Yi

Abstract: Although many variants of stochastic gradient descent have been proposed for large-scale convex optimization, most of them require projecting the solution at each iteration to ensure that the obtained solution stays within the feasible domain. For complex domains (e.g., positive semidefinite cone), the projection step can be computationally expensive, making stochastic gradient descent unattractive for large-scale optimization problems. We address this limitation by developing novel stochastic optimization algorithms that do not need intermediate projections. Instead, only one projection at the last iteration is needed to obtain a feasible solution in the given domain. Our theoretical analysis shows that with a high probability, √ the proposed algorithms achieve an O(1/ T ) convergence rate for general convex optimization, and an O(ln T /T ) rate for strongly convex optimization under mild conditions about the domain and the objective function. 1

5 0.57294703 313 nips-2012-Sketch-Based Linear Value Function Approximation

Author: Marc Bellemare, Joel Veness, Michael Bowling

Abstract: Hashing is a common method to reduce large, potentially infinite feature vectors to a fixed-size table. In reinforcement learning, hashing is often used in conjunction with tile coding to represent states in continuous spaces. Hashing is also a promising approach to value function approximation in large discrete domains such as Go and Hearts, where feature vectors can be constructed by exhaustively combining a set of atomic features. Unfortunately, the typical use of hashing in value function approximation results in biased value estimates due to the possibility of collisions. Recent work in data stream summaries has led to the development of the tug-of-war sketch, an unbiased estimator for approximating inner products. Our work investigates the application of this new data structure to linear value function approximation. Although in the reinforcement learning setting the use of the tug-of-war sketch leads to biased value estimates, we show that this bias can be orders of magnitude less than that of standard hashing. We provide empirical results on two RL benchmark domains and fifty-five Atari 2600 games to highlight the superior learning performance obtained when using tug-of-war hashing. 1

6 0.57210255 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

7 0.57068574 21 nips-2012-A Unifying Perspective of Parametric Policy Search Methods for Markov Decision Processes

8 0.56877762 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

9 0.56826538 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

10 0.56752336 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

11 0.56710601 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification

12 0.56687778 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

13 0.56673437 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

14 0.56671953 359 nips-2012-Variational Inference for Crowdsourcing

15 0.56537426 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

16 0.56516141 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

17 0.56455618 260 nips-2012-Online Sum-Product Computation Over Trees

18 0.56391752 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

19 0.56333643 38 nips-2012-Algorithms for Learning Markov Field Policies

20 0.5627057 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model