nips nips2009 nips2009-35 knowledge-graph by maker-knowledge-mining

35 nips-2009-Approximating MAP by Compensating for Structural Relaxations


Source: pdf

Author: Arthur Choi, Adnan Darwiche

Abstract: We introduce a new perspective on approximations to the maximum a posteriori (MAP) task in probabilistic graphical models that is based on simplifying a given instance, and then tightening the approximation. First, we start with a structural relaxation of the original model. We then infer from the relaxation its deficiencies, and compensate for them. This perspective allows us to identify two distinct classes of approximations. First, we find that max-product belief propagation can be viewed as a way to compensate for a relaxation, based on a particular idealized case for exactness. We identify a second approach to compensation that is based on a more refined idealized case, resulting in a new approximation with distinct properties. We go on to propose a new class of algorithms that, starting with a relaxation, iteratively seek tighter approximations.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We identify a second approach to compensation that is based on a more refined idealized case, resulting in a new approximation with distinct properties. [sent-10, score-0.612]

2 For example, tree-reweighted belief propagation (TRW-BP) can be thought of as a linear programming relaxation of an integer program for a given MAP problem [1, 2]. [sent-14, score-0.313]

3 We illustrate here instead how it is possible to tighten a given relaxation itself, without impacting its structural complexity. [sent-17, score-0.253]

4 Second, we compensate for the relaxation by introducing auxiliary parameters, which we use to restore certain properties, leading to a tighter approximation. [sent-20, score-0.3]

5 We shall in fact propose two distinct properties on which a compensation can be based. [sent-21, score-0.581]

6 The first is based on a simplified case where a compensation can be guaranteed to yield exact results. [sent-22, score-0.548]

7 We further propose an algorithm for finding such a compensation, that starts with a relaxation and iteratively provides monotonically decreasing upper bounds on the MAP solution (at least empirically). [sent-26, score-0.261]

8 Here x = {X1 = x1 , . . . , Xn = xn } is an assignment of factor graph variables Xi to states xi , and a is an index to the factor ψa (Xa ) over the domain Xa ⊆ X. [sent-31, score-0.35]
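To make this notation concrete, here is a minimal brute-force sketch (Python; not from the paper, with made-up factor tables and names) that evaluates map(x) = Σa log ψa (xa ) and maximizes it by enumeration:

```python
import itertools
import math

# A tiny factor graph: three binary variables and two pairwise factors.
# Each factor is (scope, table); table maps a tuple of states to a value.
factors = [
    (("A", "B"), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6}),
    (("B", "C"), {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}),
]
variables = ["A", "B", "C"]

def map_value(x):
    # map(x) = sum over factors a of log psi_a(x_a), for an assignment x (a dict).
    total = 0.0
    for scope, table in factors:
        value = table[tuple(x[v] for v in scope)]
        total += math.log(value) if value > 0 else float("-inf")
    return total

# Brute-force MAP: maximize map(x) over all joint assignments.
assignments = (dict(zip(variables, s)) for s in itertools.product((0, 1), repeat=3))
x_star = max(assignments, key=map_value)
print(x_star, map_value(x_star))
```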

9 3 Relaxation The structural relaxations that we consider here are based on the relaxation of equivalence constraints from a model M, where an equivalence constraint Xi ≡ Xj is a factor ψeq (Xi , Xj ) over two variables Xi and Xj that have the same states. [sent-42, score-0.832]

10 Further, ψeq (xi , xj ) is 1 if xi = xj and 0 otherwise. [sent-43, score-0.778]

11 We call an assignment x valid, with respect to an equivalence constraint Xi ≡ Xj , if it sets variables Xi and Xj to the same state, and invalid otherwise. [sent-44, score-0.367]
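A direct reading of this definition as a helper (illustrative names; a sketch, not the paper's code):

```python
def is_valid(x, relaxed_constraints):
    # x: dict mapping variable names to states.
    # relaxed_constraints: pairs (Xi, Xj) whose equivalence constraints
    # Xi == Xj were removed from the model.
    return all(x[xi] == x[xj] for xi, xj in relaxed_constraints)

# Example: with ("C", "C'") relaxed, {"C": 0, "C'": 0} is valid,
# while {"C": 0, "C'": 1} is invalid.
```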

12 Note that when we remove an equivalence constraint from a model M, the values map(x) for valid configurations x do not change, since log 1 = 0. [sent-45, score-0.32]

13 Thus, the MAP value after relaxing an equivalence constraint in M is an upper bound on the original MAP value. [sent-48, score-0.396]
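This upper-bound property can be checked by brute force on a toy chain A — Xi ≡ Xj — B; a sketch under made-up tables (assumptions for illustration, not the paper's experiments):

```python
import itertools
import math

NEG_INF = float("-inf")
psi1 = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6}  # factor over (A, X_i)
psi2 = {(0, 0): 0.3, (0, 1): 0.7, (1, 0): 0.8, (1, 1): 0.2}  # factor over (X_j, B)

def log_psi_eq(xi, xj):
    # Equivalence factor: psi_eq = 1 if xi == xj else 0, so log 1 = 0 and log 0 = -inf.
    return 0.0 if xi == xj else NEG_INF

def map_original():
    return max(
        math.log(psi1[(a, xi)]) + log_psi_eq(xi, xj) + math.log(psi2[(xj, b)])
        for a, xi, xj, b in itertools.product((0, 1), repeat=4)
    )

def map_relaxed():
    # With psi_eq removed, a log-factor that is <= 0 is dropped, so the max can only rise.
    return max(
        math.log(psi1[(a, xi)]) + math.log(psi2[(xj, b)])
        for a, xi, xj, b in itertools.product((0, 1), repeat=4)
    )

assert map_relaxed() >= map_original()  # r-map* upper-bounds map*
```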

14 When we now add the factor ψ2 (C, C′ ) for the equivalence constraint C ≡ C′ , we have a new model M′ which is equivalent to the original model M, in that an assignment x in M corresponds to an assignment x′ in M′ , where assignment x′ sets a variable and its clone to the same state. [sent-52, score-0.649]

15 We note that a number of structural relaxations can be reduced to the removal of equivalence constraints, including relaxations found by deleting edges [6, 7], as well as mini-bucket approximations [5, 4]. [sent-54, score-0.523]

16 In fact, the example above can be considered a relaxation where we delete a factor graph edge C → ψ1 , substituting clone C ′ in place of variable C. [sent-55, score-0.264]
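A sketch of this clone-based edge deletion (the dictionary layout is an assumption for illustration):

```python
def delete_edge(factors, var, factor_name):
    # factors: dict mapping factor name -> (scope tuple, table dict).
    # Replace `var` by a fresh clone in one factor's scope; the returned pair
    # is the equivalence constraint whose relaxation this deletion performs.
    clone = var + "'"
    scope, table = factors[factor_name]
    factors[factor_name] = (tuple(clone if v == var else v for v in scope), table)
    return (var, clone)  # pending constraint: var == clone
```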

17 Say that we remove the equivalence constraints in M, resulting in a relaxed model with MAP values r-map(·). [sent-59, score-0.293]

18 To this end, we introduce into the relaxation additional factors ψij;i (Xi ) and ψij;j (Xj ) for each equivalence constraint Xi ≡ Xj that we remove. [sent-65, score-0.461]

19 These new factors add new parameters into the approximation, which we shall use to recover a weaker notion of equivalence into the model. [sent-67, score-0.332]

20 More specifically, given a set of equivalence constraints Xi ≡ Xj to relax, we have the original MAP values map(·). [sent-68, score-0.267]

21 Consider then the case where an optimal assignment x⋆ for the relaxation happens to set variables Xi and Xj to the same state x, for each equivalence constraint Xi ≡ Xj that we relaxed. [sent-72, score-0.575]

22 In this case, the optimal solution for the relaxation is also an optimal solution for the original model, i. [sent-73, score-0.248]

23 On the other hand, if a relaxation’s optimal assignment sets Xi and Xj to different states, then it is not a valid assignment for the original model M, as it violates the equivalence constraint and thus has log probability −∞. [sent-76, score-0.543]

24 Consider, for a given equivalence constraint Xi ≡ Xj , the relaxation’s MAP values r-map(Xi = x) and r-map(Xj = x) when we set, respectively, a single variable Xi or Xj to a state x. [sent-77, score-0.288]

25 In this case, if there is a MAP assignment that sets variable Xi to a state x, then there is at least one MAP assignment that sets variable Xj to the same state, even if there is no MAP assignment that sets both Xi and Xj to the same state at the same time. [sent-80, score-0.34]

26 We now want to identify parameters θ(Xi ) and θ(Xj ) to compensate for a relaxation in this manner. [sent-81, score-0.276]

27 To get the intuitions behind these approaches, we consider first the simplified case where a single equivalence constraint is relaxed. [sent-83, score-0.282]

28 1 Intuitions: Splitting a Model into Two Consider the case where relaxing a single equivalence constraint Xi ≡ Xj splits a model M into two independent sub-models, Mi and Mj , where sub-model Mi contains variable Xi and sub-model Mj contains variable Xj . [sent-85, score-0.331]

29 Since sub-models Mi and Mj become independent after relaxing the single equivalence constraint Xi ≡ Xj , computing these parameters is sufficient to reconstruct the MAP solution for the original model M. [sent-89, score-0.369]

30 Let c-map(·) denote the MAP values of a compensation that results from relaxing an equivalence constraint Xi ≡ Xj that splits M into two independent sub-models. [sent-97, score-0.898]

31 Then the compensation has parameters satisfying Equation 1 iff c-map(Xi = x) = c-map(Xj = x) = map(Xi = x, Xj = x) + γ. [sent-98, score-0.594]

32 In the case where relaxing an equivalence constraint does not split a model into two, a compensation satisfying Equation 1 at least satisfies a weaker notion of equivalence. [sent-100, score-0.954]

33 We might expect that such a compensation may lead to more meaningful, and hopefully more accurate, approximations than a relaxation. [sent-101, score-0.596]

34 Indeed, this compensation will eventually lead to a generalized class of belief propagation approximations. [sent-102, score-0.675]

35 Thus, we call a compensation satisfying Equation 1 a REC - BP approximation. [sent-103, score-0.578]

36 If a compensation has valid configurations, then its optimal solution sets variables Xi and Xj to the same state, and is thus a valid assignment for the original instance (it satisfies the equivalence constraint). [sent-110, score-0.983]

37 Moreover, if it has scaled values, then the compensation further allows us to recover the MAP value as well. [sent-111, score-0.589]

38 A compensation having valid configurations and scaled values is thus ideal as it is sufficient for us to recover the exact solution. [sent-112, score-0.692]

39 As the following proposition tells us, if a compensation is an ideal one, then it must at least satisfy Equation 2. [sent-115, score-0.629]

40 Let c-map(·) denote the MAP values of a compensation that results from relaxing an equivalence constraint Xi ≡ Xj in M. [sent-118, score-0.898]

41 We thus call a compensation satisfying Equation 2 a REC - I compensation. [sent-122, score-0.578]

42 In particular, if a compensation happens to satisfy c-map(Xi = x) = c-map(Xj = x) = c-map(Xi = x, Xj = x) for some state x, we have that θ(Xi = x) + θ(Xj = x) = map(Xi = x, Xj = x) (i. [sent-124, score-0.601]

43 Before we discuss the general case where we relax multiple equivalence constraints, we highlight first a few properties shared by both REC - BP and REC - I compensations, that shall follow from more general results that we shall present. [sent-127, score-0.321]

44 First, if the optimal assignment x⋆ for a compensation sets the variables Xi and Xj to the same state, then: (1) the assignment x⋆ is also optimal for the original model M; and (2) (1/2) c-map⋆ = map⋆ . [sent-128, score-0.81]

45 Thus, at least in the case where a single equivalence constraint is relaxed, the compensations implied by Equations 1 and 2 do indeed tighten a relaxation (see the auxiliary Appendix for further details). [sent-131, score-0.699]

46 3 General Properties In this section, we identify the conditions that compensations should satisfy in the more general case where multiple equivalence constraints are relaxed, and further highlight some of their properties. [sent-133, score-0.504]

47 4 Suppose that k equivalence constraints Xi ≡ Xj are relaxed from a given model M. [sent-134, score-0.274]

48 Then compensations REC - BP and REC - I seek to recover into the relaxation two weaker notions of equivalence. [sent-135, score-0.454]

49 First, a REC - BP compensation has auxiliary parameters satisfying: c-map(Xi = x) = c-map(Xj = x) = θ(Xi = x) + θ(Xj = x) + γ (3), where γ = (k/(1+k)) c-map⋆ . [sent-136, score-0.59]

50 We then approximate the exact MAP value map⋆ by the value (1/(1+k)) c-map⋆ . [sent-137, score-0.262]
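In code form, these two scalar quantities are (a trivial sketch; c_map_star and k are given inputs):

```python
def rec_bp_gamma(c_map_star, k):
    # gamma = (k / (1 + k)) * c-map*, the offset appearing in Equation 3.
    return k * c_map_star / (1.0 + k)

def map_estimate(c_map_star, k):
    # The exact MAP value map* is approximated by (1 / (1 + k)) * c-map*.
    return c_map_star / (1.0 + k)
```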

51 Let c-map(·) denote the MAP values of a compensation that results from relaxing enough equivalence constraints Xi ≡ Xj in M to render it fully disconnected. [sent-141, score-0.895]

52 Then a compensation whose parameters satisfy Equation 3 has values exp{c-map(Xi = x)} that correspond to the max-marginals of a fixed-point of max-product belief propagation run on M, and vice-versa. [sent-142, score-0.743]

53 Loopy max-product belief propagation is thus the degenerate case of a REC - BP compensation, when the approximation is fully disconnected (by deleting every factor graph edge, as defined in Section 3). [sent-143, score-0.277]

54 Next, a REC - I compensation has parameters satisfying: c-map(Xi = x) = c-map(Xj = x) = (1 + k)[θ(Xi = x) + θ(Xj = x)] (4). We again approximate the exact MAP value map⋆ with the value (1/(1+k)) c-map⋆ . [sent-145, score-0.564]

55 In both compensations, it is possible to determine if the optimal assignment x⋆ of a compensation is an optimal assignment for the original model M: we need only check that it is a valid assignment. [sent-146, score-0.862]

56 Let c-map(·) denote the MAP values of a compensation that results from relaxing k equivalence constraints Xi ≡ Xj . [sent-149, score-0.875]

57 A REC - I compensation has additional properties over a REC - BP compensation. [sent-153, score-0.548]

58 First, a REC - I compensation yields upper bounds on the MAP value, whereas REC - BP does not yield a bound in general. [sent-154, score-0.623]

59 Let c-map(·) denote the MAP values of a compensation that results from relaxing k equivalence constraints Xi ≡ Xj . [sent-157, score-0.875]

60 If an assignment x⋆ is optimal for a relaxation with MAP values r-map(·). [sent-160, score-0.325]

61 (i.e., it does not violate the equivalence constraints Xi ≡ Xj ), then x⋆ is also optimal for M, where r-map(x⋆ ) = map(x⋆ ) (since they are composed of the same factor values). [sent-163, score-0.277]

62 If an assignment x⋆ of a relaxation is not valid for model M, then the MAP value of the relaxation is an upper bound on the original MAP value. [sent-164, score-0.589]

63 On the other hand, REC - I compensations are tighter approximations than the corresponding relaxation, at least in the case when a single equivalence constraint is relaxed: map⋆ ≤ (1/2) c-map⋆ ≤ r-map⋆ . [sent-165, score-0.53]

64 When we relax multiple equivalence constraints we find, at least empirically, that REC - I bounds are never worse than relaxations, although we leave this point open. [sent-166, score-0.298]

65 Let c-map(·) denote the MAP values of a compensation that results from relaxing k equivalence constraints Xi ≡ Xj . [sent-170, score-0.875]

66 4, and if z is a partial assignment that sets the same state to variables Xi and Xj , for any relaxed equivalence constraint Xi ≡ Xj , then: map(z) ≤ (1/(1+k)) c-map(z). [sent-172, score-0.349]

67 5 Algorithm 1 RelaxEq-and-Compensate (REC). input: a model M with k equivalence constraints Xi ≡ Xj . output: a compensation M′ . main: 1: M′ ← result of relaxing all Xi ≡ Xj in M; 2: add to M′ the factors θ0 (Xi ), θ0 (Xj ), for each Xi ≡ Xj ; 3: initialize all parameters θ0 (Xi = x), θ0 (Xj = x), e.g. [sent-176, score-0.898]

68 Consider, for the case of REC - BP, the fact that parameters satisfy Equation 3 iff they satisfy: θ(Xi = x) = c-map(Xj = x) − θ(Xj = x) − γ and θ(Xj = x) = c-map(Xi = x) − θ(Xi = x) − γ. This suggests an iterative fixed-point procedure for finding the parameters of a compensation that satisfy Equation 3. [sent-180, score-0.668]

69 First, we start with an initial compensation with MAP values c-map0 (·). [sent-181, score-0.567]

70 For an iteration t > 0, we can update our parameters using the compensation from the previous iteration: θt (Xi = x) = c-mapt−1 (Xj = x) − θt−1 (Xj = x) − γt−1 (5) and θt (Xj = x) = c-mapt−1 (Xi = x) − θt−1 (Xi = x) − γt−1 (6), where γt−1 = (k/(1+k)) c-map⋆ from iteration t−1. [sent-183, score-0.564]

71 If at some point, the parameters of one iteration do not change in the next, then we can say that the iterations have converged, and that the compensation satisfies Equation 3. [sent-184, score-0.586]

72 Similarly, for REC - I compensations, we use the update equations: θt (Xi = x) = (1/(1+k)) c-mapt−1 (Xj = x) − θt−1 (Xj = x) (7) and θt (Xj = x) = (1/(1+k)) c-mapt−1 (Xi = x) − θt−1 (Xi = x) (8) to identify compensations that satisfy Equation 4. [sent-185, score-0.252]
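Putting Equations 7-8 together, an end-to-end sketch of the REC - I iteration on a toy relaxation with k = 1 (tables and names are made up, c-map is computed by brute force, and this plain fixed-point iteration carries no general convergence guarantee):

```python
import itertools
import math

# Toy chain A -- X_i == X_j -- B with its single equivalence constraint relaxed.
psi1 = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6}  # over (A, X_i)
psi2 = {(0, 0): 0.3, (0, 1): 0.7, (1, 0): 0.8, (1, 1): 0.2}  # over (X_j, B)
STATES, k = (0, 1), 1

def c_map(th_i, th_j, fix_i=None, fix_j=None):
    # MAP value of the compensation, optionally clamping X_i or X_j to a state.
    return max(
        math.log(psi1[(a, xi)]) + th_i[xi] + th_j[xj] + math.log(psi2[(xj, b)])
        for a, xi, xj, b in itertools.product(STATES, repeat=4)
        if (fix_i is None or xi == fix_i) and (fix_j is None or xj == fix_j)
    )

th_i = {x: 0.0 for x in STATES}  # auxiliary parameters theta(X_i = x)
th_j = {x: 0.0 for x in STATES}  # auxiliary parameters theta(X_j = x)
for t in range(100):
    # Equations 7-8 (REC-I); for REC-BP one would subtract gamma instead of scaling.
    new_i = {x: c_map(th_i, th_j, fix_j=x) / (1 + k) - th_j[x] for x in STATES}
    new_j = {x: c_map(th_i, th_j, fix_i=x) / (1 + k) - th_i[x] for x in STATES}
    delta = sum(abs(new_i[x] - th_i[x]) + abs(new_j[x] - th_j[x]) for x in STATES)
    th_i, th_j = new_i, new_j
    if delta < 1e-9:
        break

print("upper bound (1/(1+k)) c-map*:", c_map(th_i, th_j) / (1 + k))
```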

73 The consequence of this is that our initial compensation has the MAP value (1/(1+k)) c-map⋆ 0 = r-map⋆ . [sent-190, score-0.548]

74 That is, the initial compensation is equivalent to the relaxation, for both REC - BP and REC - I. [sent-191, score-0.548]

75 On the other hand, by Theorem 3, we know that a REC - I compensation must yield an upper bound on the true MAP value map⋆ . [sent-194, score-0.591]

76 Right column: frustrated grids, using REC - BP (top) with a fully disconnected relaxation, and REC - I (bottom) with a relaxation with max cluster size 3. [sent-236, score-0.356]

77 6 Experiments Our goal in this section is to highlight the degree to which different types of compensations can tighten a relaxation, as well as to highlight the differences in the iterative algorithms to find them. [sent-237, score-0.312]

78 We evaluated our compensations using randomly parametrized 10 × 10 grid networks. [sent-238, score-0.259]

79 We judge the quality of an approximation by the degree to which a compensation is able to improve a relaxation. [sent-239, score-0.568]

80 We first consider grid networks where factors ψa (xi , xj ) were assigned to grid edges (i, j), with values drawn uniformly at random from 0 to 1 (we assigned no factors to nodes). [sent-244, score-0.518]
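A sketch of how such a grid instance might be generated (binary states assumed; the paper does not publish its generator):

```python
import random

def random_grid_factors(n=10, seed=0):
    # Pairwise factors on the edges of an n x n grid, with table values drawn
    # uniformly at random from 0 to 1; no node factors, as described above.
    rng = random.Random(seed)
    factors = {}
    for r in range(n):
        for c in range(n):
            for r2, c2 in ((r, c + 1), (r + 1, c)):  # right and down neighbours
                if r2 < n and c2 < n:
                    factors[((r, c), (r2, c2))] = {
                        (xi, xj): rng.random() for xi in (0, 1) for xj in (0, 1)
                    }
    return factors  # an n x n grid has 2 * n * (n - 1) edge factors
```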

81 Consider first Figure 1 (left), where we generated random grid networks (we plotted only ten for clarity) and plotted the compensation errors (y-axis) as they evolved over iterations (x-axis). [sent-247, score-0.687]

82 At iteration 0, the MAP value of each compensation is equivalent to that of the relaxation (by design). [sent-248, score-0.734]

83 We see that, once we start iterating, both methods of compensation can tighten the approximation of our very coarse relaxation. [sent-249, score-0.606]

84 For REC - I, convergence is slower, but the compensation is still a significant improvement over the relaxation. [sent-251, score-0.548]

85 Moreover, it is apparent that additional iterations would improve the compensation further. [sent-252, score-0.57]

86 An attractive factor ψa (Xi , Xj ) was given a value at random from 1 − p to 1 if xi = xj , and a value at random from 0 to p otherwise. For each factor ψa and for each variable X in ψa , we replaced variable X with a unique clone X̂ and introduced the equivalence constraint X ≡ X̂. [sent-255, score-0.829]

87 When we then relax all equivalence constraints, the resulting factor graph is fully disconnected. [sent-256, score-0.28]

88 a value from 0 to p if xi ≠ xj , which favors configurations xi = xj when p ≤ 1/2. [sent-258, score-0.942]

89 Similarly for repulsive factors, which favor instead configurations where xi ≠ xj . [sent-259, score-0.492]
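A hedged sketch of these interaction potentials (the random mixing of attractive and repulsive edges for frustrated grids is an assumption):

```python
import random

def pairwise_factor(p, attractive, rng):
    # Attractive: values in [1 - p, 1] when x_i == x_j and in [0, p] otherwise,
    # favoring agreement when p <= 1/2; repulsive swaps the two cases.
    table = {}
    for xi in (0, 1):
        for xj in (0, 1):
            favored = (xi == xj) if attractive else (xi != xj)
            table[(xi, xj)] = rng.uniform(1 - p, 1) if favored else rng.uniform(0, p)
    return table

rng = random.Random(0)
edge = pairwise_factor(p=0.1, attractive=(rng.random() < 0.5), rng=rng)
```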

90 It is well known that belief propagation tends to not converge in networks with frustrated interactions [11]. [sent-260, score-0.255]

91 On the other hand, REC - I compensations were relatively well behaved, and produced monotonically decreasing upper bounds on the MAP value; see Figure 1 (center). [sent-264, score-0.275]

92 Although the degree of compensation is not as dramatic, we note that we are compensating for a very coarse relaxation (fully disconnected). [sent-265, score-0.772]

93 Moreover, we can see that in one instance REC - BP converges below the true MAP value; remember that by Theorem 3, REC - I compensations always yield upper bounds. [sent-267, score-0.261]

94 In the case of REC - I, the compensations did not improve significantly on the fully disconnected relaxations (not shown). [sent-268, score-0.375]

95 Surprisingly, this was enough for REC - I to compensate for the relaxation completely (to within 10^−8 ) in 7 of the 10 instances plotted. [sent-273, score-0.257]

96 The idea of using structural relaxations for upper-bounding MAP solutions in probabilistic graphical models goes back to mini-bucket approximations [13], which can be considered to be a particular way of relaxing equivalence constraints from a model [4]. [sent-278, score-0.516]

97 One approach to compensation identified a generalized class of max-product belief propagation approximations. [sent-280, score-0.675]

98 We then identified a second approach that led to another class of approximations that we have observed to yield tighter upper bounds on MAP solutions as compared to a relaxation alone. [sent-281, score-0.362]

99 In contrast to the above approaches, compensations further guarantee, in Theorem 4, upper bounds on MAP solutions under any partial assignment (without rerunning the algorithm). [sent-286, score-0.395]

100 We hope that this perspective will enable the design of new approximations, especially in domains where specific notions of compensation may suggest themselves. [sent-290, score-0.585]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('compensation', 0.548), ('rec', 0.472), ('xj', 0.307), ('map', 0.262), ('compensations', 0.2), ('equivalence', 0.189), ('relaxation', 0.186), ('xi', 0.164), ('bp', 0.15), ('relaxations', 0.111), ('frustrated', 0.106), ('assignment', 0.1), ('relaxing', 0.082), ('propagation', 0.069), ('xa', 0.062), ('constraint', 0.06), ('adnan', 0.059), ('grid', 0.059), ('belief', 0.058), ('maxx', 0.056), ('compensate', 0.055), ('valid', 0.052), ('gurations', 0.049), ('relaxed', 0.048), ('approximations', 0.048), ('choi', 0.047), ('clone', 0.047), ('tommi', 0.047), ('disconnected', 0.044), ('upper', 0.043), ('angeles', 0.041), ('rina', 0.041), ('relax', 0.04), ('arthur', 0.038), ('tighten', 0.038), ('compensating', 0.038), ('uai', 0.037), ('constraints', 0.037), ('deleting', 0.035), ('intuitions', 0.033), ('los', 0.033), ('satisfy', 0.033), ('shall', 0.033), ('tighter', 0.033), ('ideal', 0.032), ('bounds', 0.032), ('factor', 0.031), ('satisfying', 0.03), ('structural', 0.029), ('weaker', 0.027), ('compensated', 0.027), ('factors', 0.026), ('highlight', 0.026), ('auxiliary', 0.026), ('assignments', 0.025), ('idealized', 0.025), ('states', 0.024), ('grids', 0.024), ('lp', 0.024), ('dechter', 0.024), ('kalev', 0.024), ('kask', 0.024), ('pensation', 0.024), ('recover', 0.023), ('iterations', 0.022), ('networks', 0.022), ('original', 0.022), ('iterative', 0.022), ('mj', 0.021), ('eq', 0.021), ('talya', 0.021), ('darwiche', 0.021), ('repulsive', 0.021), ('fully', 0.02), ('approximation', 0.02), ('message', 0.02), ('alan', 0.02), ('state', 0.02), ('optimal', 0.02), ('solutions', 0.02), ('equation', 0.02), ('values', 0.019), ('identify', 0.019), ('perspective', 0.019), ('jaakkola', 0.019), ('yair', 0.019), ('notion', 0.018), ('scaled', 0.018), ('plotted', 0.018), ('notions', 0.018), ('remember', 0.018), ('tightening', 0.018), ('invalid', 0.018), ('con', 0.017), ('posteriori', 0.017), ('proposition', 0.016), ('converged', 0.016), ('equations', 0.016), ('instances', 0.016), ('parameters', 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999988 35 nips-2009-Approximating MAP by Compensating for Structural Relaxations

Author: Arthur Choi, Adnan Darwiche

Abstract: We introduce a new perspective on approximations to the maximum a posteriori (MAP) task in probabilistic graphical models that is based on simplifying a given instance, and then tightening the approximation. First, we start with a structural relaxation of the original model. We then infer from the relaxation its deficiencies, and compensate for them. This perspective allows us to identify two distinct classes of approximations. First, we find that max-product belief propagation can be viewed as a way to compensate for a relaxation, based on a particular idealized case for exactness. We identify a second approach to compensation that is based on a more refined idealized case, resulting in a new approximation with distinct properties. We go on to propose a new class of algorithms that, starting with a relaxation, iteratively seek tighter approximations.

2 0.14999104 31 nips-2009-An LP View of the M-best MAP problem

Author: Menachem Fromer, Amir Globerson

Abstract: We consider the problem of finding the M assignments with maximum probability in a probabilistic graphical model. We show how this problem can be formulated as a linear program (LP) on a particular polytope. We prove that, for tree graphs (and junction trees in general), this polytope has a particularly simple form and differs from the marginal polytope in a single inequality constraint. We use this characterization to provide an approximation scheme for non-tree graphs, by using the set of spanning trees over such graphs. The method we present puts the M-best inference problem in the context of LP relaxations, which have recently received considerable attention and have proven useful in solving difficult inference problems. We show empirically that our method often finds the provably exact M best configurations for problems of high tree-width.

3 0.13775943 227 nips-2009-Speaker Comparison with Inner Product Discriminant Functions

Author: Zahi Karam, Douglas Sturim, William M. Campbell

Abstract: Speaker comparison, the process of finding the speaker similarity between two speech signals, occupies a central role in a variety of applications—speaker verification, clustering, and identification. Speaker comparison can be placed in a geometric framework by casting the problem as a model comparison process. For a given speech signal, feature vectors are produced and used to adapt a Gaussian mixture model (GMM). Speaker comparison can then be viewed as the process of compensating and finding metrics on the space of adapted models. We propose a framework, inner product discriminant functions (IPDFs), which extends many common techniques for speaker comparison—support vector machines, joint factor analysis, and linear scoring. The framework uses inner products between the parameter vectors of GMM models motivated by several statistical methods. Compensation of nuisances is performed via linear transforms on GMM parameter vectors. Using the IPDF framework, we show that many current techniques are simple variations of each other. We demonstrate, on a 2006 NIST speaker recognition evaluation task, new scoring methods using IPDFs which produce excellent error rates and require significantly less computation than current techniques.

4 0.093889974 141 nips-2009-Local Rules for Global MAP: When Do They Work ?

Author: Kyomin Jung, Pushmeet Kohli, Devavrat Shah

Abstract: We consider the question of computing the Maximum A Posteriori (MAP) assignment in an arbitrary pair-wise Markov Random Field (MRF). We present a randomized iterative algorithm based on simple local updates. The algorithm, starting with an arbitrary initial assignment, updates it in each iteration by first picking a random node, then selecting an (appropriately chosen) random local neighborhood and optimizing over this local neighborhood. Somewhat surprisingly, we show that this algorithm finds a near optimal assignment within n log^2 n iterations with high probability for any n-node pair-wise MRF with geometry (i.e., an MRF graph with polynomial growth), with the approximation error depending (in a reasonable manner) on the geometric growth rate of the graph and the average radius of the local neighborhood – this allows for a graceful tradeoff between the complexity of the algorithm and the approximation error. Through extensive simulations, we show that our algorithm finds extremely good approximate solutions for various kinds of MRFs with geometry.
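A minimal sketch of this local-update idea, under simplifying assumptions (a small binary grid, radius-1 neighborhoods re-optimized by brute force, and a fixed iteration count rather than the paper's schedule):

import itertools
import numpy as np

rng = np.random.default_rng(1)
n = 4  # 4x4 binary grid MRF
nodes = [(i, j) for i in range(n) for j in range(n)]
edges = [((i, j), (i + di, j + dj)) for (i, j) in nodes
         for di, dj in [(0, 1), (1, 0)] if i + di < n and j + dj < n]
theta_i = {v: rng.uniform(-0.5, 0.5, 2) for v in nodes}
theta_ij = {e: rng.uniform(-0.5, 0.5, (2, 2)) for e in edges}

def score(x):
    s = sum(theta_i[v][x[v]] for v in nodes)
    return s + sum(theta_ij[u, v][x[u], x[v]] for (u, v) in edges)

x = {v: rng.integers(0, 2) for v in nodes}  # arbitrary initial assignment
for _ in range(200):
    c = nodes[rng.integers(len(nodes))]                   # pick a random node
    ball = sorted({c} | {w for e in edges for w in e if c in e})  # radius-1 ball
    best = max(itertools.product([0, 1], repeat=len(ball)),
               key=lambda a: score({**x, **dict(zip(ball, a))}))
    x.update(dict(zip(ball, best)))                       # optimize over the ball
print("local-update score:", score(x))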

5 0.092209771 103 nips-2009-Graph Zeta Function in the Bethe Free Energy and Loopy Belief Propagation

Author: Yusuke Watanabe, Kenji Fukumizu

Abstract: We propose a new approach to the analysis of Loopy Belief Propagation (LBP) by establishing a formula that connects the Hessian of the Bethe free energy with the edge zeta function. The formula has a number of theoretical implications for LBP. It is applied to give a sufficient condition for the Hessian of the Bethe free energy to be positive definite, which shows non-convexity for graphs with multiple cycles. The formula clarifies the relation between the local stability of a fixed point of LBP and local minima of the Bethe free energy. We also propose a new approach to the uniqueness of the LBP fixed point, and show various conditions of uniqueness.
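For readers unfamiliar with the edge zeta function the abstract refers to, the sketch below evaluates it on a small multi-cycle graph, assuming the standard Hashimoto identity 1/zeta(u) = det(I − uB) for the directed-edge (non-backtracking) matrix B; the particular graph and evaluation point are arbitrary choices, not taken from the paper.

import numpy as np

# Small graph with two cycles (a "theta" graph); each undirected edge
# contributes two directed darts.
und_edges = [(0, 1), (1, 2), (2, 0), (1, 3), (3, 0)]
darts = []
for (u, v) in und_edges:
    darts += [(u, v), (v, u)]
m = len(darts)

# Non-backtracking (Hashimoto) matrix: dart (u, v) feeds dart (v, w), w != u.
B = np.zeros((m, m))
for i, (u, v) in enumerate(darts):
    for j, (v2, w) in enumerate(darts):
        if v2 == v and w != u:
            B[i, j] = 1.0

def inv_edge_zeta(u):
    # 1/zeta(u) = det(I - u B), assumed Hashimoto/Bass identity.
    return np.linalg.det(np.eye(m) - u * B)

print("1/zeta(0.2) =", inv_edge_zeta(0.2))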

6 0.085843481 229 nips-2009-Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data

7 0.08196222 10 nips-2009-A Gaussian Tree Approximation for Integer Least-Squares

8 0.079093687 187 nips-2009-Particle-based Variational Inference for Continuous Systems

9 0.075395942 36 nips-2009-Asymptotic Analysis of MAP Estimation via the Replica Method and Compressed Sensing

10 0.074220456 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

11 0.070490487 167 nips-2009-Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations

12 0.069567889 43 nips-2009-Bayesian estimation of orientation preference maps

13 0.066301115 214 nips-2009-Semi-supervised Regression using Hessian energy with an application to semi-supervised dimensionality reduction

14 0.065180279 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms

15 0.062450949 47 nips-2009-Boosting with Spatial Regularization

16 0.059483521 146 nips-2009-Manifold Regularization for SIR with Rate Root-n Convergence

17 0.059369862 129 nips-2009-Learning a Small Mixture of Trees

18 0.054759648 223 nips-2009-Sparse Metric Learning via Smooth Optimization

19 0.050508559 58 nips-2009-Constructing Topological Maps using Markov Random Fields and Loop-Closure Detection

20 0.04891501 5 nips-2009-A Bayesian Model for Simultaneous Image Clustering, Annotation and Object Segmentation


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.136), (1, 0.054), (2, -0.011), (3, 0.021), (4, -0.04), (5, -0.04), (6, 0.041), (7, -0.006), (8, -0.003), (9, -0.096), (10, -0.099), (11, 0.078), (12, 0.062), (13, 0.07), (14, 0.032), (15, -0.048), (16, -0.026), (17, -0.036), (18, 0.032), (19, -0.011), (20, -0.089), (21, -0.135), (22, 0.019), (23, 0.005), (24, 0.019), (25, 0.124), (26, 0.203), (27, 0.019), (28, -0.149), (29, 0.147), (30, -0.145), (31, 0.131), (32, 0.171), (33, 0.045), (34, 0.062), (35, -0.026), (36, 0.014), (37, 0.05), (38, -0.003), (39, 0.006), (40, 0.18), (41, -0.006), (42, -0.098), (43, -0.084), (44, 0.117), (45, 0.014), (46, -0.029), (47, -0.051), (48, -0.002), (49, -0.038)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97253597 35 nips-2009-Approximating MAP by Compensating for Structural Relaxations

Author: Arthur Choi, Adnan Darwiche

Abstract: We introduce a new perspective on approximations to the maximum a posteriori (MAP) task in probabilistic graphical models, that is based on simplifying a given instance, and then tightening the approximation. First, we start with a structural relaxation of the original model. We then infer from the relaxation its deficiencies, and compensate for them. This perspective allows us to identify two distinct classes of approximations. First, we find that max-product belief propagation can be viewed as a way to compensate for a relaxation, based on a particular idealized case for exactness. We identify a second approach to compensation that is based on a more refined idealized case, resulting in a new approximation with distinct properties. We go on to propose a new class of algorithms that, starting with a relaxation, iteratively seeks tighter approximations. 1

2 0.7527442 103 nips-2009-Graph Zeta Function in the Bethe Free Energy and Loopy Belief Propagation

Author: Yusuke Watanabe, Kenji Fukumizu

Abstract: We propose a new approach to the analysis of Loopy Belief Propagation (LBP) by establishing a formula that connects the Hessian of the Bethe free energy with the edge zeta function. The formula has a number of theoretical implications for LBP. It is applied to give a sufficient condition for the Hessian of the Bethe free energy to be positive definite, which shows non-convexity for graphs with multiple cycles. The formula clarifies the relation between the local stability of a fixed point of LBP and local minima of the Bethe free energy. We also propose a new approach to the uniqueness of the LBP fixed point, and show various conditions of uniqueness.

3 0.67375982 10 nips-2009-A Gaussian Tree Approximation for Integer Least-Squares

Author: Jacob Goldberger, Amir Leshem

Abstract: This paper proposes a new algorithm for the linear least squares problem where the unknown variables are constrained to be in a finite set. The factor graph that corresponds to this problem is very loopy; in fact, it is a complete graph. Hence, applying the Belief Propagation (BP) algorithm yields very poor results. The algorithm described here is based on an optimal tree approximation of the Gaussian density of the unconstrained linear system. It is shown that even though the approximation is not directly applied to the exact discrete distribution, applying the BP algorithm to the modified factor graph outperforms current methods in terms of both performance and complexity. The improved performance of the proposed algorithm is demonstrated on the problem of MIMO detection.
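A hedged sketch of the tree-approximation step only: fit the Gaussian of the unconstrained least-squares solution, then take the maximum-mutual-information spanning tree of its covariance (Chow–Liu for Gaussians, using I(Xi; Xj) = −1/2 log(1 − rho^2)). The discrete BP pass over the resulting tree, which does the actual detection, is omitted here, and all problem sizes are toy assumptions.

import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree

rng = np.random.default_rng(2)
n_obs, n_var = 12, 6
H = rng.standard_normal((n_obs, n_var))        # known mixing matrix
x_true = rng.integers(0, 2, n_var) * 2 - 1     # +/-1 symbols
y = H @ x_true + 0.1 * rng.standard_normal(n_obs)

# Gaussian of the unconstrained least-squares solution (unit noise scale).
cov = np.linalg.inv(H.T @ H)
mean = cov @ H.T @ y

# Pairwise Gaussian mutual information from the correlation matrix.
d = np.sqrt(np.diag(cov))
rho = np.clip(cov / np.outer(d, d), -0.999, 0.999)
mi = -0.5 * np.log(1.0 - rho ** 2)
np.fill_diagonal(mi, 0.0)

# Maximum-MI spanning tree = minimum spanning tree on negated weights.
tree = minimum_spanning_tree(-mi).toarray()
tree_edges = [(i, j) for i in range(n_var) for j in range(n_var) if tree[i, j] != 0]
print("tree for BP:", tree_edges, " naive rounding:", np.sign(mean).astype(int))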

4 0.65546393 31 nips-2009-An LP View of the M-best MAP problem

Author: Menachem Fromer, Amir Globerson

Abstract: We consider the problem of finding the M assignments with maximum probability in a probabilistic graphical model. We show how this problem can be formulated as a linear program (LP) on a particular polytope. We prove that, for tree graphs (and junction trees in general), this polytope has a particularly simple form and differs from the marginal polytope in a single inequality constraint. We use this characterization to provide an approximation scheme for non-tree graphs, by using the set of spanning trees over such graphs. The method we present puts the M-best inference problem in the context of LP relaxations, which have recently received considerable attention and have proven useful in solving difficult inference problems. We show empirically that our method often finds the provably exact M best configurations for problems of high tree-width.
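To make the "single inequality constraint" concrete: for a tree with node degrees d_i, the excluded-assignment polytope adds I(mu, z) = sum_i (1 − d_i) mu_i(z_i) + sum_{ij in E} mu_ij(z_i, z_j) <= 0 to the marginal polytope. The sketch below only checks this definition numerically on integral vertices of a 4-node chain — it cuts off v(z) and keeps every other assignment — and is not a sketch of the LP solver itself.

import numpy as np

edges = [(0, 1), (1, 2), (2, 3)]   # a chain on 4 binary variables
deg = [1, 2, 2, 1]
z = (0, 0, 0, 0)                   # the assignment to exclude

def vertex(x):
    # Integral marginal-polytope vertex v(x) for an assignment x.
    mu_i = [np.eye(2)[xi] for xi in x]
    mu_ij = {e: np.outer(mu_i[e[0]], mu_i[e[1]]) for e in edges}
    return mu_i, mu_ij

def I_mu(mu_i, mu_ij):
    # I(mu, z) = sum_i (1 - d_i) mu_i(z_i) + sum_{ij} mu_ij(z_i, z_j)
    val = sum((1 - deg[i]) * mu_i[i][z[i]] for i in range(len(deg)))
    return val + sum(mu_ij[(i, j)][z[i], z[j]] for (i, j) in edges)

print(I_mu(*vertex(z)))             # 1.0 -> v(z) violates I <= 0, is cut off
print(I_mu(*vertex((1, 0, 0, 0))))  # 0.0 -> other vertices satisfy I <= 0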

5 0.51596171 227 nips-2009-Speaker Comparison with Inner Product Discriminant Functions

Author: Zahi Karam, Douglas Sturim, William M. Campbell

Abstract: Speaker comparison, the process of finding the speaker similarity between two speech signals, occupies a central role in a variety of applications—speaker verification, clustering, and identification. Speaker comparison can be placed in a geometric framework by casting the problem as a model comparison process. For a given speech signal, feature vectors are produced and used to adapt a Gaussian mixture model (GMM). Speaker comparison can then be viewed as the process of compensating and finding metrics on the space of adapted models. We propose a framework, inner product discriminant functions (IPDFs), which extends many common techniques for speaker comparison—support vector machines, joint factor analysis, and linear scoring. The framework uses inner products between the parameter vectors of GMM models motivated by several statistical methods. Compensation of nuisances is performed via linear transforms on GMM parameter vectors. Using the IPDF framework, we show that many current techniques are simple variations of each other. We demonstrate, on a 2006 NIST speaker recognition evaluation task, new scoring methods using IPDFs which produce excellent error rates and require significantly less computation than current techniques.

6 0.50915217 36 nips-2009-Asymptotic Analysis of MAP Estimation via the Replica Method and Compressed Sensing

7 0.46362057 141 nips-2009-Local Rules for Global MAP: When Do They Work ?

8 0.44393778 129 nips-2009-Learning a Small Mixture of Trees

9 0.43875363 146 nips-2009-Manifold Regularization for SIR with Rate Root-n Convergence

10 0.39165071 47 nips-2009-Boosting with Spatial Regularization

11 0.38974249 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

12 0.37729052 7 nips-2009-A Data-Driven Approach to Modeling Choice

13 0.37570608 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference

14 0.35194135 187 nips-2009-Particle-based Variational Inference for Continuous Systems

15 0.33615124 160 nips-2009-Multiple Incremental Decremental Learning of Support Vector Machines

16 0.3302502 214 nips-2009-Semi-supervised Regression using Hessian energy with an application to semi-supervised dimensionality reduction

17 0.31640673 105 nips-2009-Grouped Orthogonal Matching Pursuit for Variable Selection and Prediction

18 0.31402183 173 nips-2009-Nonparametric Greedy Algorithms for the Sparse Learning Problem

19 0.30960086 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry

20 0.2865223 229 nips-2009-Statistical Analysis of Semi-Supervised Learning: The Limit of Infinite Unlabelled Data


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(21, 0.044), (24, 0.057), (25, 0.056), (35, 0.088), (36, 0.091), (39, 0.028), (41, 0.251), (58, 0.141), (61, 0.016), (71, 0.031), (81, 0.019), (86, 0.056), (91, 0.015)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.79437506 35 nips-2009-Approximating MAP by Compensating for Structural Relaxations

Author: Arthur Choi, Adnan Darwiche

Abstract: We introduce a new perspective on approximations to the maximum a posteriori (MAP) task in probabilistic graphical models, that is based on simplifying a given instance, and then tightening the approximation. First, we start with a structural relaxation of the original model. We then infer from the relaxation its deficiencies, and compensate for them. This perspective allows us to identify two distinct classes of approximations. First, we find that max-product belief propagation can be viewed as a way to compensate for a relaxation, based on a particular idealized case for exactness. We identify a second approach to compensation that is based on a more refined idealized case, resulting in a new approximation with distinct properties. We go on to propose a new class of algorithms that, starting with a relaxation, iteratively seeks tighter approximations. 1

2 0.62391883 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data

Author: Jaakko Luttinen, Alexander Ilin

Abstract: We present a probabilistic factor analysis model which can be used for studying spatio-temporal datasets. The spatial and temporal structure is modeled by using Gaussian process priors both for the loading matrix and the factors. The posterior distributions are approximated using the variational Bayesian framework. The high computational cost of Gaussian process modeling is reduced by using sparse approximations. The model is used to compute reconstructions of the global sea surface temperatures from a historical dataset. The results suggest that the proposed model can outperform state-of-the-art reconstruction systems.
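A minimal generative sketch of the model class this abstract describes — loadings and factors both drawn from GP priors (spatial and temporal, respectively) and combined into data — with the variational inference and sparse approximations omitted; the kernel choices and sizes are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(3)

def rbf(xs, ell):
    d = xs[:, None] - xs[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

n_space, n_time, n_fac = 20, 50, 2
Ks = rbf(np.linspace(0, 1, n_space), 0.2) + 1e-6 * np.eye(n_space)
Kt = rbf(np.linspace(0, 1, n_time), 0.1) + 1e-6 * np.eye(n_time)

Ls, Lt = np.linalg.cholesky(Ks), np.linalg.cholesky(Kt)
W = Ls @ rng.standard_normal((n_space, n_fac))      # spatially smooth loadings
F = (Lt @ rng.standard_normal((n_time, n_fac))).T   # temporally smooth factors
Y = W @ F + 0.05 * rng.standard_normal((n_space, n_time))
print("spatio-temporal data shape:", Y.shape)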

3 0.61762512 100 nips-2009-Gaussian process regression with Student-t likelihood

Author: Jarno Vanhatalo, Pasi Jylänki, Aki Vehtari

Abstract: In Gaussian process regression, the observation model is commonly assumed to be Gaussian, which is computationally convenient. However, the drawback is that the predictive accuracy of the model can be significantly compromised if the observations are contaminated by outliers. A robust observation model, such as the Student-t distribution, reduces the influence of outlying observations and improves the predictions. The problem, however, is the analytically intractable inference. In this work, we discuss the properties of a Gaussian process regression model with the Student-t likelihood and utilize the Laplace approximation for approximate inference. We compare our approach to a variational approximation and a Markov chain Monte Carlo scheme, which utilize the commonly used scale mixture representation of the Student-t distribution.
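The sketch below illustrates only the robustness mechanism: it climbs to a posterior mode of a GP with a Student-t likelihood by plain gradient ascent in whitened coordinates (f = Lu). The paper's actual Laplace machinery (curvature handling for the non-log-concave likelihood, predictive variances) is not reproduced, and all hyperparameters here are assumptions.

import numpy as np

rng = np.random.default_rng(4)
x = np.linspace(0, 1, 30)
K = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.15) ** 2) + 1e-6 * np.eye(30)
L = np.linalg.cholesky(K)
y = np.sin(6 * x) + 0.1 * rng.standard_normal(30)
y[5] += 3.0                        # outlier a Gaussian likelihood would chase

nu, sigma = 4.0, 0.2               # Student-t dof and scale (assumed)
u = np.zeros(30)                   # whitened latent, f = L @ u
for _ in range(10000):             # small-step gradient ascent to a mode
    r = y - L @ u
    dlik = (nu + 1) * r / (nu * sigma ** 2 + r ** 2)   # d log p(y|f) / df
    u += 2e-3 * (L.T @ dlik - u)
print("mode vs. observation at the outlier:", (L @ u)[5], y[5])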

4 0.61061323 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models

Author: Jean Honorio, Dimitris Samaras, Nikos Paragios, Rita Goldstein, Luis E. Ortiz

Abstract: Locality information is crucial in datasets where each variable corresponds to a measurement in a manifold (silhouettes, motion trajectories, 2D and 3D images). Although these datasets are typically under-sampled and high-dimensional, they often need to be represented with low-complexity statistical models, which are comprised of only the important probabilistic dependencies in the datasets. Most methods attempt to reduce model complexity by enforcing structure sparseness. However, sparseness cannot describe inherent regularities in the structure. Hence, in this paper we first propose a new class of Gaussian graphical models which, together with sparseness, imposes local constancy through ℓ1-norm penalization. Second, we propose an efficient algorithm which decomposes the strictly convex maximum likelihood estimation into a sequence of problems with closed form solutions. Through synthetic experiments, we evaluate the closeness of the recovered models to the ground truth. We also test the generalization performance of our method in a wide range of complex real-world datasets and demonstrate that it captures useful structures such as the rotation and shrinking of a beating heart, motion correlations between body parts during walking and functional interactions of brain regions. Our method outperforms the state-of-the-art structure learning techniques for Gaussian graphical models both for small and large datasets.
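To fix ideas, here is a sketch of a penalized objective in the spirit of this abstract: the Gaussian log-likelihood with an ℓ1 penalty on off-diagonal precision entries, plus a local-constancy penalty tying precision entries that couple a variable to spatially neighboring variables. The chain "manifold", the penalty weights, and the exact form of the constancy term are assumptions; the paper's decomposition into closed-form subproblems is not reproduced.

import numpy as np

rng = np.random.default_rng(5)
n = 5                                   # variables on a 1-D chain manifold
X = rng.standard_normal((200, n))
S = np.cov(X.T)

def objective(Omega, lam=0.1, tau=0.1):
    sign, logdet = np.linalg.slogdet(Omega)
    if sign <= 0:
        return -np.inf                  # outside the positive-definite cone
    ll = logdet - np.trace(S @ Omega)
    l1 = lam * np.abs(Omega - np.diag(np.diag(Omega))).sum()
    # Local constancy: entries (i, j) and (i, j+1) couple variable i to
    # spatially neighboring variables, so penalize their difference.
    lc = tau * sum(abs(Omega[i, j] - Omega[i, j + 1])
                   for i in range(n) for j in range(n - 1)
                   if i not in (j, j + 1))
    return ll - l1 - lc

print("objective at the identity:", objective(np.eye(n)))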

5 0.61053002 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference

Author: Marius Leordeanu, Martial Hebert, Rahul Sukthankar

Abstract: Graph matching and MAP inference are essential problems in computer vision and machine learning. We introduce a novel algorithm that can accommodate both problems and solve them efficiently. Recent graph matching algorithms are based on a general quadratic programming formulation, which takes into consideration both unary and second-order terms reflecting the similarities in local appearance as well as in the pairwise geometric relationships between the matched features. This problem is NP-hard; therefore, most algorithms find approximate solutions by relaxing the original problem. They find the optimal continuous solution of the modified problem, ignoring the original discrete constraints during optimization. The continuous solution is then quickly binarized at the end, but very little attention is paid to this final discretization step. In this paper we argue that the stage in which a discrete solution is found is crucial for good performance. We propose an efficient algorithm, with climbing and convergence properties, that optimizes the quadratic score in the discrete domain, and it gives excellent results either by itself or by starting from the solution returned by any graph matching algorithm. In practice it outperforms state-of-the-art graph matching algorithms and it also significantly improves their performance if used in combination. When applied to MAP inference, the algorithm is a parallel extension of Iterated Conditional Modes (ICM) with climbing and convergence properties that make it a compelling alternative to the sequential ICM. In our experiments on MAP inference our algorithm proved its effectiveness by significantly outperforming [13], ICM and Max-Product Belief Propagation.
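A minimal sketch of an integer-projected fixed-point iteration of the kind described above: the discrete projection is a linear-assignment (Hungarian) step, followed by the optimal line search for the quadratic score along the segment to the projected point. The random affinity matrix and the stopping tolerance are assumptions; this shows the general pattern, not the authors' exact procedure.

import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(6)
n = 5
M = rng.random((n * n, n * n))
M = 0.5 * (M + M.T)                     # symmetric, nonnegative affinity

def project(v):
    # Discrete projection: best permutation under linear scores v.
    row, col = linear_sum_assignment(-v.reshape(n, n))
    P = np.zeros((n, n))
    P[row, col] = 1.0
    return P.ravel()

x = np.full(n * n, 1.0 / n)             # uniform (continuous) start
for _ in range(30):
    g = M @ x
    b = project(g)                      # climb in the discrete domain
    d = b - x
    C, D2 = d @ g, d @ M @ d
    if C <= 1e-12:                      # b no better than x: fixed point
        break
    t = 1.0 if D2 >= 0 else min(1.0, C / -D2)
    x = x + t * d                       # optimal step along [x, b]
x_disc = project(M @ x)                 # final discretization
print("matching score:", x_disc @ M @ x_disc)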

6 0.60948396 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms

7 0.60728335 31 nips-2009-An LP View of the M-best MAP problem

8 0.60690987 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA

9 0.60560429 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies

10 0.60460448 228 nips-2009-Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing

11 0.60436726 163 nips-2009-Neurometric function analysis of population codes

12 0.60399127 36 nips-2009-Asymptotic Analysis of MAP Estimation via the Replica Method and Compressed Sensing

13 0.60302866 147 nips-2009-Matrix Completion from Noisy Entries

14 0.60068887 173 nips-2009-Nonparametric Greedy Algorithms for the Sparse Learning Problem

15 0.59966123 104 nips-2009-Group Sparse Coding

16 0.5976516 81 nips-2009-Ensemble Nystrom Method

17 0.59678388 1 nips-2009-$L 1$-Penalized Robust Estimation for a Class of Inverse Problems Arising in Multiview Geometry

18 0.59586245 257 nips-2009-White Functionals for Anomaly Detection in Dynamical Systems

19 0.59443504 223 nips-2009-Sparse Metric Learning via Smooth Optimization

20 0.59426361 195 nips-2009-Probabilistic Relational PCA