74 nips-2003-Finding the M Most Probable Configurations using Loopy Belief Propagation

Chen Yanover, Yair Weiss

Abstract: Loopy belief propagation (BP) has been successfully used in a number of difficult graphical models to find the most probable configuration of the hidden variables. In applications ranging from protein folding to image analysis one would like to find not just the best configuration but rather the top M. While this problem has been solved using the junction tree formalism, in many real world problems the clique size in the junction tree is prohibitively large. In this work we address the problem of finding the M best configurations when exact inference is impossible. We start by developing a new exact inference algorithm for calculating the best configurations that uses only max-marginals. For approximate inference, we replace the max-marginals with the beliefs calculated using max-product BP and generalized BP. We show empirically that the algorithm can accurately and rapidly approximate the M best configurations in graphs with hundreds of variables.

1 Loopy belief propagation (BP) has been successfully used in a number of difficult graphical models to find the most probable configuration of the hidden variables. [sent-4, score-0.282]

2 In applications ranging from protein folding to image analysis one would like to find not just the best configuration but rather the top M. [sent-5, score-0.173]

3 While this problem has been solved using the junction tree formalism, in many real world problems the clique size in the junction tree is prohibitively large. [sent-6, score-0.362]

4 In this work we address the problem of finding the M best configurations when exact inference is impossible. [sent-7, score-0.135]

5 We start by developing a new exact inference algorithm for calculating the best configurations that uses only max-marginals. [sent-8, score-0.256]

6 For approximate inference, we replace the max-marginals with the beliefs calculated using max-product BP and generalized BP. [sent-9, score-0.13]

7 We show empirically that the algorithm can accurately and rapidly approximate the M best configurations in graphs with hundreds of variables. [sent-10, score-0.171]

8 1 Introduction Considerable progress has been made in the field of approximate inference using techniques such as variational methods [7], Monte-Carlo methods [5], mini-bucket elimination [4] and belief propagation (BP) [6]. [sent-11, score-0.275]

9 These techniques allow approximate solutions to various inference tasks in graphical models where building a junction tree is infeasible due to the exponentially large clique size. [sent-12, score-0.349]

10 The inference tasks that have been considered include calculating marginal probabilities, finding the most likely configuration, and evaluating or bounding the log likelihood. [sent-13, score-0.154]

11 In this paper we consider an inference task that has not been tackled with the same tools of approximate inference: calculating the M most probable configurations (MPCs). [sent-14, score-0.3]

12 As a motivating example, consider the protein folding task known as the side-chain prediction problem. [sent-16, score-0.108]

13 In our previous work [17], we showed how to find the minimal-energy side-chain configuration using approximate inference in a graphical model. [sent-17, score-0.151]

14 The graph has 300 nodes and the clique size in a junction tree calculated using standard software [10] can be up to the order of $10^{42}$, so that exact inference is obviously impossible. [sent-18, score-0.393]

15 We showed that loopy max-product belief propagation (BP) achieved excellent results in finding the first MPC for this graph. [sent-19, score-0.381]

16 But we are also interested in finding the second best configuration, the third best or, more generally, the top M configurations. [sent-21, score-0.092]

17 The problem of finding the M MPCs has been successfully solved within the junction tree (JT) framework. [sent-23, score-0.164]

18 However, to the best of our knowledge, there has been no equivalent solution when building a junction tree is infeasible. [sent-24, score-0.191]

19 A simple solution would be outputting the top M configurations that are generated by a Monte-Carlo simulation or by a local search algorithm from multiple initializations. [sent-25, score-0.108]

20 Alternatively, one can attempt to use more sophisticated heuristically guided search methods (such as A∗ ) or use exact MPCs algorithms on an approximated, reduced size junction tree [4, 1]. [sent-27, score-0.237]

21 We start by showing why the standard algorithm [11] for calculating the top M MPCs cannot be used in graphs with cycles. [sent-30, score-0.189]

22 We then introduce a novel algorithm called Best Max-Marginal First (BMMF) and show that when the max-marginals are exact it provably finds the M MPCs. [sent-31, score-0.099]

23 We show simulation results of BMMF in graphs where exact inference is impossible, with excellent performance on challenging graphical models with hundreds of variables. [sent-32, score-0.212]

24 Let $m_k = (m_k(1), m_k(2), \dots, m_k(N))$ denote the $k$th MPC. [sent-34, score-0.741]

25 Pearl, Dawid and others [12, 3, 11] have shown that this configuration can be calculated using a quantity known as max-marginals (MMs): $\text{max\_marginal}(i, j) = \max_{x : x(i) = j} \Pr(X = x \mid y)$ (1) Max-marginal lemma: If there exists a unique MAP assignment $m_1$ (i.e., [sent-36, score-0.294]

26 $\Pr(X = m_1 \mid y) > \Pr(X = x \mid y), \forall x \neq m_1$) then $x_1$ defined by $x_1(i) = \arg\max_j \text{max\_marginal}(i, j)$ will recover the MAP assignment, $m_1 = x_1$. [sent-38, score-0.105]
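As an illustration of the max-marginal lemma, the following is a minimal brute-force sketch in Python. It enumerates every configuration, so it is only usable on tiny models; the names `max_marginals` and `toy_score` and the three-variable toy model are hypothetical and are not taken from the paper.

```python
import itertools

def max_marginals(score, n_vars, n_states):
    """Brute force: mm[i][j] = max of score(x) over all x with x[i] == j."""
    mm = [[float("-inf")] * n_states for _ in range(n_vars)]
    for x in itertools.product(range(n_states), repeat=n_vars):
        p = score(x)
        for i, j in enumerate(x):
            mm[i][j] = max(mm[i][j], p)
    return mm

def toy_score(x):
    """Hypothetical unnormalized probability on a 3-node binary chain."""
    unary = (1.5, 1.1, 1.3)          # assumed weights that make all scores distinct
    s = 1.0
    for a, b in zip(x, x[1:]):
        s *= 2.0 if a != b else 1.0  # neighbors prefer different values
    for xi, u in zip(x, unary):
        s *= u if xi == 1 else 1.0
    return s

mm = max_marginals(toy_score, n_vars=3, n_states=2)
# Max-marginal lemma: with a unique MAP, the per-variable argmax recovers it.
x1 = tuple(max(range(2), key=row.__getitem__) for row in mm)
```

Because the toy scores are all distinct, the MAP assignment is unique and the per-variable argmax agrees with the exhaustive maximum.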

27 When the graph is a tree, the MMs can be calculated exactly using max-product belief propagation [16, 15, 12] using two passes: one up the tree and the other down the tree. [sent-41, score-0.333]

28 Similarly, for an arbitrary graph they can be calculated exactly using two passes of max-propagation in the junction tree [2, 11, 3]. [sent-42, score-0.305]

29 A more efficient algorithm for calculating $m_1$ requires only one pass of max-propagation. [sent-43, score-0.121]

30 After calculating the max-marginal exactly at the root node, the MAP assignment m1 can be calculated by tracing back the pointers that were used during the max-propagation [11]. [sent-44, score-0.309]

31 Figure 1a illustrates this traceback operation in the Viterbi algorithm in HMMs [13] (the pairwise potentials favor configurations where neighboring nodes have different values). [sent-45, score-0.231]
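The forward pass plus traceback on a chain can be sketched as follows. This is a minimal illustration assuming a chain with a shared pairwise potential table; `viterbi_map`, `unary` and `pairwise` are hypothetical names, and the potentials below are made up for the example.

```python
def viterbi_map(unary, pairwise):
    """MAP assignment on a chain via max-product forward pass and traceback.

    unary[t][j]    : potential of node t taking state j
    pairwise[a][b] : potential of neighboring nodes taking states (a, b)
    """
    n, k = len(unary), len(unary[0])
    msg = [list(unary[0])]  # msg[t][j]: best score of any prefix ending in state j
    back = []               # back[t-1][j]: best predecessor state of (t, j)
    for t in range(1, n):
        row, ptr = [], []
        for j in range(k):
            best = max(range(k), key=lambda a: msg[t - 1][a] * pairwise[a][j])
            ptr.append(best)
            row.append(msg[t - 1][best] * pairwise[best][j] * unary[t][j])
        msg.append(row)
        back.append(ptr)
    # Pick the best state at the last node, then follow the stored pointers.
    x = [max(range(k), key=lambda j: msg[-1][j])]
    for ptr in reversed(back):
        x.append(ptr[x[-1]])
    return x[::-1]

# Pairwise potentials favor neighbors with different values.
pairwise = [[1.0, 2.0], [2.0, 1.0]]
unary = [[1.0, 1.5], [1.0, 1.0], [1.0, 1.0]]
x_map = viterbi_map(unary, pairwise)
```

On a chain every node has a single predecessor, so the pointers are always consistent; that guarantee is what is lost once the graph has cycles.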

32 After calculating messages from left [Figure 1, panels a and b: graphs over nodes x(1), x(2), x(3) with binary values] Figure 1: a. [sent-46, score-0.085]

33 The MAP configuration can be calculated by a forward message passing scheme followed by a backward “traceback”. [sent-48, score-0.087]

34 The same traceback operation applied to a loopy graph may give inconsistent results. [sent-50, score-0.461]

35 These traceback operations, however, are problematic in loopy graphs. [sent-54, score-0.436]

36 After setting $x_1(3) = 1$ we trace back and find $x_1(2) = 0$, $x_1(1) = 1$ and finally $x_1(3) = 0$, which is obviously inconsistent with our initial choice. [sent-56, score-0.22]

37 One advantage of using traceback is that it can recover m1 even if there are “ties” in the MMs, i. [sent-57, score-0.195]

38 The proof is a special case of the proof we present for claim 2 in the next section. [sent-64, score-0.116]

39 1 The Simplified Max-Flow Propagation Algorithm Nilsson’s Simplified Max-Flow Propagation (SMFP) [11] starts by calculating the MMs and using the max-marginal lemma to find m1 . [sent-69, score-0.157]

40 Since $m_2$ must differ from $m_1$ in at least one variable, the algorithm defines $N$ conditioning sets, $C_i = \{x(1) = m_1(1), x(2) = m_1(2), \dots, x(i-1) = m_1(i-1), x(i) \neq m_1(i)\}$. [sent-70, score-0.086]

41 It then uses the max-marginal lemma to find the most probable configuration given each conditioning set, $x_i = \arg\max_x \Pr(X = x \mid y, C_i)$, and finally $m_2 = \arg\max_{x \in \{x_i\}} \Pr(X = x \mid y)$. [sent-71, score-0.41]

42 Since the conditioning sets form a partition, it is easy to show that the algorithm finds m2 after N calculations of the MMs. [sent-72, score-0.125]
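The partition argument for the second-best configuration can be checked by brute force. The sketch below assumes that conditioning set number i fixes all earlier variables to their values in the best configuration and forces variable i to differ; `second_best_by_partition` and `toy_score` are hypothetical names, and exhaustive enumeration stands in for the max-marginal computation.

```python
import itertools

def toy_score(x):
    """Hypothetical unnormalized probability on a 3-node binary chain."""
    unary = (1.5, 1.1, 1.3)          # assumed weights that make all scores distinct
    s = 1.0
    for a, b in zip(x, x[1:]):
        s *= 2.0 if a != b else 1.0
    for xi, u in zip(x, unary):
        s *= u if xi == 1 else 1.0
    return s

def second_best_by_partition(score, m1, n_states):
    """Best configuration within each conditioning set, then the best of those."""
    n = len(m1)
    candidates = []
    for i in range(n):
        # Conditioning set i: agree with m1 before position i, differ at position i.
        members = [x for x in itertools.product(range(n_states), repeat=n)
                   if x[:i] == m1[:i] and x[i] != m1[i]]
        candidates.append(max(members, key=score))
    return max(candidates, key=score)

m1 = (1, 0, 1)  # best configuration of toy_score
m2 = second_best_by_partition(toy_score, m1, n_states=2)
```

Every configuration other than the best one falls in exactly one conditioning set (indexed by the first position where it differs), which is why taking the best candidate over the N sets is enough.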

43 Similarly, to find $m_k$ the algorithm uses the fact that $m_k$ must differ from $m_1, m_2, \dots, m_{k-1}$ in at least one variable and forms a new set of up to $N$ conditioning sets. [sent-73, score-0.58]

44 Using the max-marginal lemma one can find the MPC given each of these new conditioning sets. [sent-74, score-0.122]

45 This gives up to N new candidates, in addition to (k − 1)(N − 1) previously calculated candidates. [sent-75, score-0.087]

46 The most probable candidate out of these $k(N-1)+1$ is guaranteed to be $m_k$. [sent-77, score-0.35]

47 Nilsson suggested an algorithm that uses traceback operations to reduce the computation significantly. [sent-79, score-0.231]

48 Since traceback operations are problematic in loopy graphs, we now present a novel algorithm that does not use traceback but may require far fewer calculations of the MMs than SMFP. [sent-80, score-0.691]

49 2 A novel algorithm: Best Max-Marginal First For simplicity of exposition, we will describe the BMMF algorithm under what we call the strict order assumption, that no two configurations have exactly the same probability. [sent-82, score-0.087]

50 In the first iteration, t = 1, we start by calculating the MMs, and using the max-marginal lemma we find m1 . [sent-86, score-0.157]

51 We now search the max-marginal table for the next best max-marginal value. [sent-87, score-0.1]

52 We then add the complementary constraint $x(3) \neq 1$ to the originating constraints set and calculate the MMs. [sent-93, score-0.092]

53 We now add the constraint x(1) = 0 to the constraints set from t = 1, calculate the MMs and use the max-marginal lemma to find x3 = 0001. [sent-96, score-0.121]

54 Finally, we add the complementary constraint $x(1) \neq 0$ to the originating constraints set and calculate the MMs. [sent-97, score-0.092]

55 Claim 2: $x_2$ calculated by the BMMF algorithm is equal to the second MPC $m_2$. [sent-101, score-0.123]

56 Then, after iteration $k$, the collection $\{\mathrm{SAT}_1, \mathrm{SAT}_2, \dots, \mathrm{SAT}_k\}$ is a partition of the assignment space. [sent-111, score-0.193]

57 For $k = 2$, $\mathrm{SAT}_1 = \{x \mid x(i_2) \neq j_2\}$ and $\mathrm{SAT}_2 = \{x \mid x(i_2) = j_2\}$ are mutually disjoint and $\mathrm{SAT}_1 \cup \mathrm{SAT}_2$ covers the assignment space, therefore $\{\mathrm{SAT}_1, \mathrm{SAT}_2\}$ is a partition of the assignment space. [sent-114, score-0.265]

58 Assume that after iteration $k - 1$, $\{\mathrm{SAT}_1, \mathrm{SAT}_2, \dots, \mathrm{SAT}_{k-1}\}$ is a partition of the assignment space. [sent-115, score-0.193]

59 Note that in iteration $k$, we add $\mathrm{CONSTRAINTS}_k = \mathrm{CONSTRAINTS}_{s_k} \cup \{x(i_k) = j_k\}$ and modify $\mathrm{CONSTRAINTS}_{s_k} = \mathrm{CONSTRAINTS}_{s_k} \cup \{x(i_k) \neq j_k\}$, while keeping all other constraint sets unchanged. [sent-116, score-0.305]

60 Since after iteration $k - 1$ $\{\mathrm{SAT}_1, \mathrm{SAT}_2, \dots, \mathrm{SAT}_{k-1}\}$ is a partition of the assignment space, so is $\{\mathrm{SAT}_1, \mathrm{SAT}_2, \dots, \mathrm{SAT}_k\}$. [sent-118, score-0.155]

61 Claim 3: $x_k$, the configuration calculated by the algorithm in iteration $k$, is $m_k$, the $k$th MPC. [sent-119, score-0.408]

62 Proof: First, note that $\mathrm{SCORE}_{s_k}(i_k, j_k) \le \mathrm{SCORE}_{s_{k-1}}(i_{k-1}, j_{k-1})$, otherwise $(i_k, j_k, s_k)$ would have been chosen in iteration $k - 1$. [sent-120, score-0.282]

63 Following the partition lemma, each assignment arises at most once. [sent-121, score-0.155]

64 By the strict order assumption, this means that $\mathrm{SCORE}_{s_k}(i_k, j_k) < \mathrm{SCORE}_{s_{k-1}}(i_{k-1}, j_{k-1})$. [sent-122, score-0.122]

65 We know that $m_k$ differs from all previous $x_s$ in at least one location. [sent-124, score-0.281]

66 In particular, $m_k$ must differ from $x_{s^*}$ in at least one location. [sent-125, score-0.247]

67 We want to show that $\mathrm{SCORE}_{s^*}(i^*, j^*) = \Pr(X = m_k \mid y)$. [sent-127, score-0.247]

68 Now suppose there exists $m_l$, $l \le k - 1$, such that $m_l \in \mathrm{SAT}_{s^*}$ and $m_l(i^*) = j^*$. [sent-130, score-0.112]

69 Since $(i^*, j^*, s^*) \notin \mathrm{USED}_k$ this would mean that $\mathrm{SCORE}_{s_k}(i_k, j_k) \ge \mathrm{SCORE}_{s_{k-1}}(i_{k-1}, j_{k-1})$, which is a contradiction. [sent-131, score-0.122]

70 Therefore $m_k$ is the most probable assignment that satisfies $\mathrm{CONSTRAINTS}_{s^*}$ and has the value $j^*$ at location $i^*$. [sent-132, score-0.46]

71 A consequence of claim 3 is that BMMF will find the top M MPCs using 2M calculations of max-marginals. [sent-134, score-0.165]
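To make the iteration concrete, here is a minimal sketch of BMMF with exact max-marginals obtained by exhaustive enumeration, so it only runs on tiny models. It assumes the strict order assumption holds. All names (`bmmf`, `constrained_max_marginals`, `toy_score`) and the representation of a constraint as a triple `(i, j, must_equal)` are hypothetical choices for this illustration, not the authors' implementation.

```python
import itertools

NEG_INF = float("-inf")

def toy_score(x):
    """Hypothetical unnormalized probability on a 3-node binary chain."""
    unary = (1.5, 1.1, 1.3)          # assumed weights that make all scores distinct
    s = 1.0
    for a, b in zip(x, x[1:]):
        s *= 2.0 if a != b else 1.0
    for xi, u in zip(x, unary):
        s *= u if xi == 1 else 1.0
    return s

def constrained_max_marginals(score, n_vars, n_states, constraints):
    """Exact max-marginals over configurations satisfying all constraints.

    Each constraint (i, j, must_equal) means x[i] == j or x[i] != j.
    """
    mm = [[NEG_INF] * n_states for _ in range(n_vars)]
    for x in itertools.product(range(n_states), repeat=n_vars):
        if all((x[i] == j) == eq for i, j, eq in constraints):
            p = score(x)
            for i, j in enumerate(x):
                mm[i][j] = max(mm[i][j], p)
    return mm

def argmax_config(mm):
    return tuple(max(range(len(row)), key=row.__getitem__) for row in mm)

def bmmf(score, n_vars, n_states, M):
    """Return up to M configurations in decreasing order of score."""
    constraints = [[]]
    scores = [constrained_max_marginals(score, n_vars, n_states, [])]
    configs = [argmax_config(scores[0])]
    while len(configs) < M:
        # Best max-marginal over all (i, j, s) where x_s(i) != j.
        best = None
        for s, mm in enumerate(scores):
            for i in range(n_vars):
                for j in range(n_states):
                    if configs[s][i] != j and (best is None or mm[i][j] > best[0]):
                        best = (mm[i][j], i, j, s)
        if best is None or best[0] == NEG_INF:
            break  # every configuration has already been produced
        _, i, j, s = best
        new_set = constraints[s] + [(i, j, True)]            # lock x(i) = j
        constraints[s] = constraints[s] + [(i, j, False)]    # complementary x(i) != j
        constraints.append(new_set)
        scores.append(constrained_max_marginals(score, n_vars, n_states, new_set))
        configs.append(argmax_config(scores[-1]))
        scores[s] = constrained_max_marginals(score, n_vars, n_states, constraints[s])
    return configs

top4 = bmmf(toy_score, n_vars=3, n_states=2, M=4)
```

Each new configuration costs two max-marginal computations (one for the new constraint set, one for the updated originating set), which matches the 2M count. In this sketch the complementary constraint drives the reused entry to minus infinity, so no explicit record of used triplets is kept.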

72 In real world loopy problems, especially when $N \gg M$, this can lead to drastically different run times. [sent-136, score-0.241]

73 3 Approximate MPCs algorithms using loopy BP We now compare 4 approximate MPCs algorithms: 1. [sent-141, score-0.284]

74 2 with the MMs based on the beliefs computed by loopy max-product BP or max-GBP: $\mathrm{SCORE}_k(i, j) = \Pr(X = x_k \mid y) \, \frac{\mathrm{BEL}(i, j \mid \mathrm{CONSTRAINTS}_k)}{\max_j \mathrm{BEL}(i, j \mid \mathrm{CONSTRAINTS}_k)}$ (14) 2. [sent-144, score-0.268]

75 This is just Nilsson’s SMFP algorithm with the MMs calculated using loopy max-product BP. [sent-146, score-0.364]

76 We collect all configurations encountered during a greedy optimization of the posterior probability (this is just Gibbs sampling at zero temperature) and output the top M of these. [sent-152, score-0.121]

77 All four algorithms were implemented in Matlab and the number of iterations for greedy and Gibbs were chosen so that the run times would be the same as that of loopy BMMF. [sent-153, score-0.299]
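The belief-based score of Eq. (14) used by loopy BMMF can be sketched in a few lines. The function name `approx_scores` and the list-of-lists layout for the beliefs are assumptions made for this illustration; the beliefs themselves would come from a max-product BP or GBP run that is not shown here.

```python
def approx_scores(p_xk, beliefs):
    """Approximate max-marginal scores from beliefs, following Eq. (14).

    p_xk          : probability of the configuration x_k found under the constraints
    beliefs[i][j] : belief of variable i taking state j under the same constraints
    Each row is rescaled so that its largest entry equals p_xk.
    """
    return [[p_xk * b / max(row) for b in row] for row in beliefs]

# Made-up beliefs for two binary variables.
scores = approx_scores(0.5, [[0.2, 0.8], [0.6, 0.3]])
```

The normalization by the row maximum means the state chosen for each variable scores exactly the probability of the configuration, and the other states are scored relative to it.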

78 Gibbs sampling started from $m_1$, the most probable assignment, and the greedy local search algorithm was initialized to an assignment “similar” to $m_1$ (1% of the variables were chosen randomly and their values flipped). [sent-154, score-0.366]
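A greedy local-search baseline of this kind can be sketched as coordinate-wise ascent that records every configuration it visits. This is a simplified, deterministic illustration: `greedy_top_m`, the sweep order, the fixed starting point and the toy model are all assumptions, and the random 1% perturbation of the start is not reproduced.

```python
def toy_score(x):
    """Hypothetical unnormalized probability on a 3-node binary chain."""
    unary = (1.5, 1.1, 1.3)          # assumed weights that make all scores distinct
    s = 1.0
    for a, b in zip(x, x[1:]):
        s *= 2.0 if a != b else 1.0
    for xi, u in zip(x, unary):
        s *= u if xi == 1 else 1.0
    return s

def greedy_top_m(score, start, n_states, M, sweeps=10):
    """Coordinate-wise greedy ascent; returns the top M visited configurations."""
    x = list(start)
    seen = {tuple(x)}
    for _ in range(sweeps):
        changed = False
        for i in range(len(x)):
            # Set variable i to its best value given all the others.
            best_j = max(range(n_states),
                         key=lambda j: score(tuple(x[:i] + [j] + x[i + 1:])))
            if best_j != x[i]:
                x[i] = best_j
                changed = True
            seen.add(tuple(x))
        if not changed:
            break  # local optimum reached
    return sorted(seen, key=score, reverse=True)[:M]

found = greedy_top_m(toy_score, start=(0, 0, 0), n_states=2, M=2)
```

On this toy model the search stops at a local optimum and never visits the most probable configuration, which illustrates why the output of such a baseline depends heavily on its initialization.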

79 For the protein folding problem [17], we used a database consisting of 325 proteins, each gives rise to a graphical model with hundreds of variables and many loops. [sent-155, score-0.182]

80 [Figure 3 plots; horizontal axis in both panels: Configuration Number] Figure 3: The configurations found by loopy-BMMF compared to those obtained using Gibbs sampling and greedy local search for a large toy-QMR model (left) and a 32 × 32 spin glass model (right). [sent-164, score-0.225]

81 compared the top 100 correct configurations obtained by the A∗ heuristic search algorithm [8] to those found by loopy BMMF algorithm, using BP. [sent-165, score-0.349]

82 In all cases where A∗ was feasible, loopy BMMF always found the correct configurations. [sent-166, score-0.241]

83 We then assessed the performance of the BMMF algorithm for a couple of relatively small problems, where exact inference was possible. [sent-170, score-0.173]

84 For both a small toy-QMR model (with 20 diseases and 50 symptoms) and an 8 × 8 spin glass model the BMMF algorithm obtained the correct MPCs. [sent-171, score-0.175]

85 Finally, we compared the performance of the algorithms for a couple of hard problems: a large toy-QMR model (with 100 diseases and 200 symptoms) and a 32 × 32 spin glass model with large pairwise interactions. [sent-172, score-0.168]

86 For the toy-QMR model, the MPCs calculated by the BMMF algorithm were better than those calculated by Gibbs sampling (Figure 3, left). [sent-173, score-0.235]

87 Gibbs results are worse than those of the greedy search and therefore not shown). [sent-177, score-0.092]

88 Note that finding the second MPC using the simple MFP algorithm requires a week, while the loopy BMMF calculated the 25 MPCs in only a few hours. [sent-178, score-0.364]

89 However, in many real-world applications exact inference is impossible and approximate techniques are needed. [sent-180, score-0.151]

90 We have presented a new algorithm, called Best Max-Marginal First, that will provably solve the problem if MMs can be calculated exactly. [sent-182, score-0.087]

91 We have shown that the algorithm continues to perform well when the MMs are approximated using max-product loopy BP or GBP. [sent-183, score-0.277]

92 The success of loopy BMMF suggests that in some cases max-product loopy BP gives a good numerical approximation to the true MMs. [sent-185, score-0.518]

93 Most existing analysis of loopy max-product [16, 15] has focused on the configurations found by the algorithm. [sent-186, score-0.241]

94 It would be interesting to extend the analysis to bound the approximate MMs which in turn would lead to a provable approximate MPCs algorithm. [sent-187, score-0.086]

95 While we have used loopy BP to approximate the MMs, any approximate inference can be used inside BMMF to derive a novel, approximate MPCs algorithm. [sent-188, score-0.439]

96 [14] can be shown to give the MAP assignment when it converges. [sent-190, score-0.11]

97 Applications of a general propagation algorithm for probabilistic expert systems. [sent-207, score-0.122]

98 Exploring the conformational space of protein side chains using dead-end elimination and the A* algorithm. [sent-241, score-0.08]

99 An efficient algorithm for finding the M most probable configurations in probabilistic expert systems. [sent-255, score-0.139]

100 On the optimality of solutions of the max-product belief propagation algorithm in arbitrary graphs. [sent-287, score-0.176]


