nips nips2003 nips2003-32 knowledge-graph by maker-knowledge-mining

32 nips-2003-Approximate Expectation Maximization


Source: pdf

Author: Tom Heskes, Onno Zoeter, Wim Wiegerinck

Abstract: We discuss the integration of the expectation-maximization (EM) algorithm for maximum likelihood learning of Bayesian networks with belief propagation algorithms for approximate inference. Specifically we propose to combine the outer-loop step of convergent belief propagation algorithms with the M-step of the EM algorithm. This then yields an approximate EM algorithm that is essentially still double loop, with the important advantage of an inner loop that is guaranteed to converge. Simulations illustrate the merits of such an approach. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Specifically we propose to combine the outer-loop step of convergent belief propagation algorithms with the M-step of the EM algorithm. [sent-2, score-0.748]

2 This then yields an approximate EM algorithm that is essentially still double loop, with the important advantage of an inner loop that is guaranteed to converge. [sent-3, score-0.415]

3 1 Introduction The EM (expectation-maximization) algorithm [1, 2] is a popular method for maximum likelihood learning in probabilistic models with hidden variables. [sent-5, score-0.141]

4 The E-step boils down to computing probabilities of the hidden variables given the observed variables (evidence) and the current set of parameters. [sent-6, score-0.248]
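
For reference, the textbook EM updates for a model P(e, h; θ) with evidence e and hidden variables h can be written as follows (standard material, stated here for clarity rather than quoted from the paper):

```latex
% E-step: compute the posterior over the hidden variables at the current parameters.
\[
Q^{(t)}(h) = P(h \mid e; \theta^{(t)})
\]
% M-step: maximize the expected complete-data log-likelihood under that posterior.
\[
\theta^{(t+1)} = \arg\max_{\theta} \sum_{h} Q^{(t)}(h)\, \log P(e, h; \theta)
\]
```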

5 A complication may arise in the E-step, when computing the probability of the hidden variables given the evidence becomes intractable. [sent-9, score-0.176]

6 An often used approach is to replace the exact yet intractable inference in the E-step with approximate inference, either through sampling or using a deterministic variational method. [sent-10, score-0.354]

7 The use of a "mean-field" variational method in this context leads to an algorithm known as variational EM and can be given the interpretation of minimizing a free energy with respect to both a tractable approximate distribution (approximate E-step) and the parameters (M-step) [2]. [sent-11, score-0.792]

8 Loopy belief propagation [3] and variants thereof, such as generalized belief propagation [4] and expectation propagation [5], have become popular alternatives to the "mean-field" variational approaches, often yielding somewhat better approximations. [sent-12, score-1.507]

9 And indeed, they can and have been applied for approximate inference in the E-step of the EM algorithm (see e. [sent-13, score-0.232]

10 A possible worry, however, is that standard application of these belief propagation algorithms does not always lead to convergence. [sent-16, score-0.482]

11 So-called double-loop algorithms with convergence guarantees have been derived, such as CCCP [8] and UPS [9], but they tend to be an order of magnitude slower than standard belief propagation. [sent-17, score-0.25]

12 The goal of this article is to integrate expectation-maximization with belief propagation. [sent-18, score-0.268]

13 As for variational EM, this integration relies on the free-energy interpretation of EM that is reviewed in Section 2. [sent-19, score-0.111]

14 In Section 3 we describe how the exact free energy can be approximated with a Kikuchi free energy and how this leads to an approximate EM algorithm. [sent-20, score-0.982]

15 Section 4 contains our main result: integrating the outer-loop of a convergent double-loop algorithm with the M-step, we are left with an overall double-loop algorithm, where the inner loop is now a convex constrained optimization problem with a unique solution. [sent-21, score-0.696]

16 2 The free energy interpretation of EM We consider probabilistic models P(x; θ), with θ the model parameters to be learned and x the variables in the model. [sent-23, score-0.565]

17 We subdivide the variables into hidden variables h and observed, evidenced variables e. [sent-24, score-0.217]

18 In maximum likelihood learning, the goal is to find the parameters θ that maximize the likelihood P(e; θ) or, equivalently, that minimize minus the loglikelihood L(θ) = -log P(e; θ) = -log [Σ_h P(e, h; θ)]. [sent-26, score-0.145]

19 However, in complex probability models P(h|e; θ) can be difficult and even intractable to compute, mainly because of the normalization in the denominator. [sent-32, score-0.102]

20 For later purposes we note that the EM algorithm can be interpreted as a general "bound optimization algorithm" [10]. [sent-33, score-0.161]

21 In this interpretation the free energy F(Q, θ) is an upper bound on the function L(θ) that we try to minimize; the E-step corresponds to a reset of the bound and the M-step to the minimization of the upper bound. [sent-34, score-0.777]
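
The bound-optimization view rests on a standard identity (included here for completeness; the notation is generic rather than copied from the paper): the free energy equals minus the log-likelihood plus a KL term, so it upper-bounds L(θ) and is tight exactly when Q is the posterior.

```latex
\[
F(Q, \theta)
= \underbrace{-\sum_{h} Q(h)\,\log P(e, h; \theta)}_{E(Q,\theta)}
  - \underbrace{\Big(-\sum_{h} Q(h)\,\log Q(h)\Big)}_{S(Q)}
= L(\theta) + \mathrm{KL}\big(Q(h)\,\|\,P(h \mid e; \theta)\big)
\;\ge\; L(\theta),
\]
% with equality when Q(h) = P(h | e; theta), which is exactly the E-step "reset of the bound".
```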

22 Note that this restriction affects both the energy term and the entropy term. [sent-37, score-0.346]

23 By construction the approximate min_Q F(Q, θ), with Q restricted to the tractable family, is an upper bound on L(θ). [sent-38, score-0.218]

24 3 Approximate free energies In several studies, propagation algorithms like loopy belief propagation [6] and expectation propagation [7] have been applied to find approximate solutions for the E-step. [sent-39, score-1.571]

25 As we will see, the corresponding approximate EM-algorithm can be interpreted as alternate minimization of a Bethe or Kikuchi free energy. [sent-40, score-0.45]

26 For the moment, we will consider the case of loopy and generalized belief propagation applied to probability models with just discrete variables. [sent-41, score-0.699]

27 The generalization to expectation propagation is discussed in Section 6. [sent-42, score-0.342]

28 For a Bayesian network, the energy term simplifies into a sum over local terms: E(Q, θ) = -Σ_α Σ_{h_α} Q(h_α) log Ψ_α(h_α, e_α; θ_α). [sent-47, score-0.227]

29 However, the entropy term is as intractable as the normalization in (3) that we try to prevent. [sent-48, score-0.187]
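
As a minimal numerical sketch (hypothetical data structures, not the authors' code), the energy term is just a sum of local expectations of log-potentials under the cluster pseudo-marginals:

```python
import numpy as np

def local_energy(pseudo_marginals, log_potentials):
    """E(Q, theta) = - sum_a sum_{h_a} Q(h_a) log Psi_a(h_a, e_a; theta_a).

    pseudo_marginals: list of arrays, each Q(h_a) over the joint states of cluster a.
    log_potentials:   list of arrays of matching shape, each log Psi_a with the
                      evidence e_a already absorbed into the potential.
    """
    return -sum(float(np.sum(q * log_psi))
                for q, log_psi in zip(pseudo_marginals, log_potentials))

# Tiny example: two clusters with 2 and 4 joint states, uniform pseudo-marginals.
q = [np.full(2, 0.5), np.full(4, 0.25)]
log_psi = [np.log(np.array([0.7, 0.3])), np.log(np.full(4, 0.25))]
print(local_energy(q, log_psi))
```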

30 The subsets indexed by β correspond to intersections between the subsets indexed by α, intersections of intersections, and so on. [sent-51, score-0.238]

31 The parameters c_β are called Moebius or overcounting numbers. [sent-52, score-0.139]

32 The Kikuchi/Bethe approximation is exact if the α-clusters form a singly-connected structure. [sent-59, score-0.107]

33 That is, exact inference is obtained when the α-clusters correspond to cliques in a junction tree. [sent-60, score-0.157]

34 The β subsets then play the role of the separators and have overcounting numbers 1 - n_β, with n_β the number of neighboring cliques. [sent-61, score-0.156]

35 There are different kinds of approximations (Bethe, CVM, junction graphs), each corresponding to a somewhat different choice of α-clusters, β-subsets and overcounting numbers (see [4] for an overview). [sent-63, score-0.134]

36 The important point is that the approximate entropy is, like the energy, a sum of local terms. [sent-65, score-0.191]

37 Furthermore, the Kikuchi free energy as a function of the probability distribution Q only depends on the marginals Q(x_α) and Q(x_β). [sent-66, score-0.471]

38 The minimization of the exact free energy with respect to a probability distribution Q has been turned into the minimization of the Kikuchi free energy F(Q, θ) = E(Q, θ) - S(Q) with respect to a set of pseudo-marginals Q = {Q_α, Q_β}. [sent-67, score-1.008]
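
Written out in the standard Kikuchi/Bethe form (the paper's exact notation may differ slightly), the approximate free energy over the pseudo-marginals Q = {Q_α, Q_β} is:

```latex
\[
F_{\mathrm{Kikuchi}}(Q, \theta)
= -\sum_{\alpha} \sum_{x_\alpha} Q(x_\alpha)\,\log \Psi_\alpha(x_\alpha; \theta_\alpha)
  + \sum_{\alpha} \sum_{x_\alpha} Q(x_\alpha)\,\log Q(x_\alpha)
  + \sum_{\beta} c_\beta \sum_{x_\beta} Q(x_\beta)\,\log Q(x_\beta),
\]
% where the c_beta are the Moebius/overcounting numbers (e.g. 1 - n_beta for Bethe separators).
```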

39 Because the entropy does not depend on the parameters θ, the M-step of the approximate EM algorithm is completely equivalent to the M-step of the exact EM algorithm. [sent-69, score-0.345]

40 In other words, the seemingly naive procedure of using generalized or loopy belief propagation to compute the statistics in the E-step and using them in the M-step can be interpreted as alternate minimization of the Kikuchi approximation of the exact free energy. [sent-71, score-1.282]
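
A schematic of this naive procedure, with `e_step` and `m_step` as hypothetical placeholders for the model at hand (a sketch of the idea, not the authors' implementation):

```python
def approximate_em_naive(e_step, m_step, theta0, num_iters=50):
    """Naive approximate EM loop.

    e_step(theta) -> pseudo-marginals obtained by running loopy or generalized
                     belief propagation at the current parameters.
    m_step(pseudo_marginals) -> updated parameters; identical to the exact M-step,
                     since the entropy term does not depend on theta.
    """
    theta = theta0
    for _ in range(num_iters):
        pseudo_marginals = e_step(theta)   # approximate E-step
        theta = m_step(pseudo_marginals)   # exact M-step on approximate statistics
    return theta
```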

41 That is, algorithm (5) can be interpreted as a bound optimization algorithm for minimizing L(θ) = min_Q F(Q, θ). [sent-72, score-0.332]

42 4 Constrained optimization There are two kinds of approaches for finding the minimum of the Kikuchi free energy. [sent-74, score-0.236]

43 The first one is to run loopy or generalized belief propagation, e. [sent-75, score-0.432]

44 In the following we will refer to the use of standard belief propagation in the E-step as the "naive algorithm". [sent-79, score-0.482]

45 Recently, double-loop algorithms have been derived that explicitly minimize the Kikuchi free energy [8, 9, 11]. [sent-80, score-0.409]

46 Technically, finding the minimum of the Kikuchi free energy with respect to consistent marginals corresponds to a non-convex constrained optimization problem. [sent-81, score-0.565]

47 The consistency and normalization constraints on the marginals are linear in Q and so is the energy term E(Q, θ). [sent-82, score-0.38]
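
In generic form (the precise constraint set in the paper may contain further bookkeeping), the problem reads:

```latex
\[
\min_{\{Q_\alpha,\, Q_\beta\}} \; F_{\mathrm{Kikuchi}}(Q, \theta)
\quad \text{subject to} \quad
\sum_{x_\alpha \setminus x_\beta} Q(x_\alpha) = Q(x_\beta) \;\; \forall\, \beta \subset \alpha, \qquad
\sum_{x_\alpha} Q(x_\alpha) = 1, \qquad Q(x_\alpha) \ge 0 .
\]
```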

48 The non-convexity stems from the entropy terms and specifically those with negative overcounting numbers. [sent-83, score-0.238]

49 Most currently described techniques, such as CCCP [8], UPS [9] and variants thereof, can be understood as general bound optimization algorithms. [sent-84, score-0.202]

50 In CCCP concave terms are bounded with a linear term, yielding a convex bound and thus, in combination with the linear constraints, a convex optimization problem to be solved in the inner loop. [sent-85, score-0.51]
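
The linear bound on a concave entropy term is its tangent at the current estimate R (a generic CCCP-style bound; the exact decomposition used in the paper may differ): for c_β < 0,

```latex
\[
c_\beta\, Q(x_\beta)\log Q(x_\beta)
\;\le\;
c_\beta\, Q(x_\beta)\big(\log R(x_\beta) + 1\big) - c_\beta\, R(x_\beta),
\]
% with equality at Q = R, so resetting R = Q in the outer loop re-tightens the bound.
```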

51 G(Q, R, θ) with G(Q, R, θ) = F(Q, θ) + K(Q, R), R ∈ P (6). [Algorithm 1: Generalized belief propagation.] [sent-87, score-0.215]

52 By construction G(Q, R, θ) is convex in Q (the concave Q_β log Q_β terms in F(Q, θ) cancel with those in K(Q, R)) as well as an upper bound on F(Q, θ), since K(Q, R) ≥ 0. [sent-93, score-0.256]

53 The now convex optimization problem in the inner loop can be solved with a message passing algorithm very similar to standard loopy or generalized belief propagation. [sent-94, score-0.822]

54 In fact, we can use Algorithm 1, with c_β = 0 and after a slight redefinition of the potentials Ψ_α such that they incorporate the linear bound of the concave entropy terms (see [11] for details). [sent-95, score-0.272]

55 The messages in this algorithm are in one-to-one correspondence with the Lagrange multipliers of the concave dual. [sent-96, score-0.134]

56 Most importantly, with the particular scheduling in Algorithm 1, each update is guaranteed to increase the dual and therefore the inner-loop algorithm must converge to its unique solution. [sent-97, score-0.101]

57 The outer loop simply sets R = Q and corresponds to a reset of the bound. [sent-98, score-0.226]

58 In principle (see, however, the next section) the algorithmic complexity of the convergent algorithm is the same as that of the naive algorithm. [sent-102, score-0.457]
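
A sketch of the proposed integration, with `inner_loop_convex` and `m_step` as hypothetical placeholders (this illustrates the scheduling idea from Section 4, not the authors' exact code): the outer-loop bound reset R = Q is interleaved with the M-step, so each iteration solves a single convex inner-loop problem.

```python
def approximate_em_convergent(inner_loop_convex, m_step, theta0, q0, num_iters=50):
    """Convergent approximate EM: convex inner loop, outer loop merged with the M-step.

    inner_loop_convex(R, theta) -> pseudo-marginals Q minimizing the convex bound
                                   G(Q, R, theta); guaranteed to converge.
    m_step(Q) -> updated parameters theta.
    """
    theta, Q = theta0, q0
    for _ in range(num_iters):
        R = Q                               # outer loop: reset of the bound G(., R, theta)
        Q = inner_loop_convex(R, theta)     # inner loop: convex problem, unique solution
        theta = m_step(Q)                   # M-step, combined with the outer-loop step
    return theta, Q
```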

59 [Figure 1; horizontal axis: outer loops.] (a) Coupled hidden Markov model. [sent-124, score-0.224]

60 (a) Architecture for 3 time slices and 4 hidden nodes per time slice. [sent-127, score-0.164]

61 (b) Minus the loglikelihood in the Kikuchi/Bethe approximation as a function of the number of M-steps. [sent-128, score-0.126]

62 Naive algorithm (solid line), convergent algorithm (dashed), convergent algorithm with tighter bound and overrelaxation (dash-dotted), same for a Kikuchi approximation (dotted). [sent-129, score-1.008]

63 5 Simulations For illustration, we compare the naive and convergent approximate EM algorithms for learning in a coupled hidden Markov model. [sent-131, score-0.674]

64 The architecture of coupled hidden Markov models is sketched in Figure 1(a) for T = 3 time slices and M = 4 hidden-variable nodes per time slice. [sent-132, score-0.252]

65 The parameters to be learned are the observation matrix P(e_{m,t} = i | h_{m,t} = j) and two transition matrices: P(h_{1,t+1} = i | h_{1,t} = j, h_{2,t} = k) = P(h_{M,t+1} = i | h_{M,t} = j, h_{M-1,t} = k) for the outer nodes and P(h_{m,t+1} = i | h_{m-1,t} = j, h_{m,t} = k, h_{m+1,t} = l) for the middle nodes. [sent-134, score-0.189]
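
To make the parameterization concrete (hypothetical sizes and random initialization; this is not the experimental setup from the paper), the tables to be learned could be represented as:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(shape):
    """Random conditional probability table, normalized over its first axis."""
    table = rng.random(shape)
    return table / table.sum(axis=0, keepdims=True)

n_obs, n_hid = 5, 2                                    # hypothetical alphabet sizes
obs_matrix  = random_cpt((n_obs, n_hid))               # P(e_{m,t} = i | h_{m,t} = j)
trans_outer = random_cpt((n_hid, n_hid, n_hid))        # P(h_{1,t+1} = i | h_{1,t} = j, h_{2,t} = k)
trans_mid   = random_cpt((n_hid, n_hid, n_hid, n_hid)) # P(h_{m,t+1} = i | h_{m-1,t}, h_{m,t}, h_{m+1,t})
```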

66 In the inner loop of both the naive and the convergent algorithm, Algorithm 1 was run for 10 iterations. [sent-138, score-0.606]

67 Loopy belief propagation, which for dynamic Bayesian networks can be interpreted as an iterative version of the Boyen-Koller algorithm [12], converged just fine for the many instances that we have seen. [sent-139, score-0.365]

68 The naive algorithm nicely minimizes the Bethe approximation of minus the loglikelihood L(θ), as can be seen from the solid line in Figure 1(b). [sent-140, score-0.348]

69 The Bethe approximation is fairly accurate in this model, and plots of the exact loglikelihood, both for parameters learned with exact and with approximate EM, are very similar (not shown). [sent-141, score-0.304]

70 The convergent algorithm also works fine, but takes more time to converge (dashed line). [sent-142, score-0.325]

71 This is to be expected: the additional bound implied by the outer-loop E-step makes G(Q, R, θ) a looser bound of L(θ) than F(Q, θ), and the tighter the bound in a bound optimization algorithm, the faster the convergence. [sent-144, score-0.586]

72 Therefore, it makes sense to use tighter convex bounds on F(Q, θ), for example those derived in [11]. [sent-145, score-0.119]

73 It gives an indication of how close one can bring the convergent algorithm to the naive algorithm (overrelaxation applied to the M-step affects both algorithms in the same way and is therefore not considered here). [sent-155, score-0.491]

74 Another option is to repeat the inner and outer E-steps N times, before updating the parameters in the M-step. [sent-156, score-0.216]

75 Plots for N ≥ 3 are indistinguishable from the solid line for the naive algorithm. [sent-157, score-0.132]

76 The above shows that the price to be paid for an algorithm that is guaranteed to converge is relatively low. [sent-158, score-0.101]

77 Obviously, the true value of the convergent algorithm becomes clear when the naive algorithm fails. [sent-159, score-0.516]

78 Many instances of non-convergence of loopy and especially generalized belief propagation have been reported (see e. [sent-160, score-0.699]

79 [3, 11] and [12] specifically on coupled hidden Markov models). [sent-162, score-0.221]

80 In the context of the coupled hidden Markov models we observed serious problems with generalized belief propagation. [sent-164, score-0.451]

81 For example, with α-clusters of size 12, consisting of 3 neighboring hidden and evidence nodes in two subsequent time slices, we did not manage to get the naive algorithm to converge properly. [sent-165, score-0.361]

82 6 Discussion The main idea of this article, that there is no need to run a converging double-loop algorithm in an approximate E-step until convergence, only applies to directed probabilistic graphical models like Bayesian networks. [sent-167, score-0.216]

83 For this so-called partition function, the bound used in converging double-loop algorithms works in the opposite direction from the bound implicit in the EM algorithm. [sent-170, score-0.275]

84 The convex bound of [13] does work in the right direction, but cannot (yet) handle missing values. [sent-171, score-0.181]

85 In [14] standard loopy belief propagation is used in the inner loop of iterative proportional fitting (IPF). [sent-172, score-0.875]

86 Also here it is not yet clear how to integrate IPF with convergent belief propagation without ending up with a triple-loop algorithm. [sent-173, score-0.748]

87 Following the same line of reasoning, expectation maximization can be combined with expectation propagation (EP) [5]. [sent-174, score-0.417]

88 EP can be understood as a generalization of loopy belief propagation. [sent-175, score-0.402]

89 Besides neglecting possible loops in the graphical structure, expectation propagation can also handle projections onto an exponential family of distributions. [sent-176, score-0.404]

90 The approximate free energy for EP is the same Bethe free energy, only the constraints are different. [sent-177, score-0.739]

91 That is, the "strong" marginalization constraints (4) are replaced by the "weak" marginalization constraints that all subset marginals agree upon their moments. These constraints are still linear [sent-178, score-0.32]

92 in Q_α and Q_β, and we can make the same decomposition (6) of the Bethe free energy into a convex and a concave term to derive a double-loop algorithm with a convex optimization problem in the inner loop. [sent-179, score-0.834]
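
In the EP case the weak constraints only require that moments of the overlap variables agree (stated here in generic exponential-family form; the paper's notation may differ):

```latex
\[
\sum_{x_\alpha} Q(x_\alpha)\, f(x_\beta) \;=\; \sum_{x_\beta} Q(x_\beta)\, f(x_\beta)
\qquad \text{for the sufficient statistics } f \text{ of the chosen family, } x_\beta \subseteq x_\alpha,
\]
% which is linear in Q_alpha and Q_beta, so the same convex/concave split applies.
```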

93 It has been emphasized before [13] that it makes no sense to learn with approximate inference and then apply exact inference given the learned parameters. [sent-183, score-0.225]

94 The intuition is that we tune the parameters to the evidence, incorporating the errors that are made while doing approximate inference. [sent-184, score-0.143]

95 In that context it is important that the results of approximate inference are reproducible and the use of convergent algorithms is a relevant step in that direction. [sent-185, score-0.439]

96 Loopy belief propagation for approximate inference: An empirical study. [sent-203, score-0.588]

97 Constructing free energy approximations and generalized belief propagation algorithms. [sent-210, score-0.957]

98 CCCP algorithms to minimize the Bethe and Kikuchi free energies: Convergent alternatives to belief propagation. [sent-233, score-0.43]

99 The factored frontier algorithm for approximate inference in DBNs. [sent-255, score-0.232]

100 Tree-reweighted belief propagation algorithms and approximate ML estimation via pseudo-moment matching. [sent-261, score-0.588]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('kikuchi', 0.271), ('propagation', 0.267), ('convergent', 0.266), ('em', 0.253), ('fj', 0.24), ('energy', 0.227), ('belief', 0.215), ('free', 0.182), ('bethe', 0.178), ('loopy', 0.151), ('naive', 0.132), ('fix', 0.12), ('bound', 0.112), ('loop', 0.109), ('qij', 0.108), ('approximate', 0.106), ('overcounting', 0.102), ('cccp', 0.102), ('xlj', 0.102), ('inner', 0.099), ('qa', 0.093), ('coupled', 0.088), ('argrnine', 0.088), ('ba', 0.088), ('overrelaxation', 0.088), ('xa', 0.086), ('entropy', 0.085), ('hidden', 0.082), ('outer', 0.08), ('loglikelihood', 0.077), ('boils', 0.076), ('concave', 0.075), ('expectation', 0.075), ('wa', 0.074), ('ep', 0.074), ('variational', 0.07), ('convex', 0.069), ('inference', 0.067), ('generalized', 0.066), ('minimization', 0.066), ('intersections', 0.065), ('loops', 0.062), ('marginals', 0.062), ('algorithm', 0.059), ('argminf', 0.059), ('hle', 0.059), ('ipf', 0.059), ('mii', 0.059), ('nijmegen', 0.059), ('qep', 0.059), ('ups', 0.059), ('exact', 0.058), ('xij', 0.056), ('subsets', 0.054), ('optimization', 0.054), ('article', 0.053), ('intractable', 0.053), ('specifically', 0.051), ('converging', 0.051), ('clj', 0.051), ('heskes', 0.051), ('thereof', 0.051), ('ij', 0.05), ('tighter', 0.05), ('normalization', 0.049), ('approximation', 0.049), ('evidence', 0.049), ('alternate', 0.048), ('interpreted', 0.048), ('bayesian', 0.047), ('variables', 0.045), ('properly', 0.044), ('fine', 0.043), ('slice', 0.043), ('slices', 0.043), ('constraints', 0.042), ('guaranteed', 0.042), ('interpretation', 0.041), ('argmin', 0.041), ('energies', 0.041), ('constrained', 0.04), ('nodes', 0.039), ('marginalization', 0.039), ('solve', 0.038), ('reset', 0.037), ('parameters', 0.037), ('understood', 0.036), ('murphy', 0.036), ('convergence', 0.035), ('affects', 0.034), ('fitting', 0.034), ('implied', 0.034), ('teh', 0.034), ('alternatives', 0.033), ('learned', 0.033), ('yielding', 0.032), ('markov', 0.032), ('junction', 0.032), ('ex', 0.032), ('minus', 0.031)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 32 nips-2003-Approximate Expectation Maximization

Author: Tom Heskes, Onno Zoeter, Wim Wiegerinck

Abstract: We discuss the integration of the expectation-maximization (EM) algorithm for maximum likelihood learning of Bayesian networks with belief propagation algorithms for approximate inference. Specifically we propose to combine the outer-loop step of convergent belief propagation algorithms with the M-step of the EM algorithm. This then yields an approximate EM algorithm that is essentially still double loop, with the important advantage of an inner loop that is guaranteed to converge. Simulations illustrate the merits of such an approach. 1

2 0.20738663 80 nips-2003-Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data

Author: Amos J. Storkey

Abstract: Discrete Fourier transforms and other related Fourier methods have been practically implementable due to the fast Fourier transform (FFT). However there are many situations where doing fast Fourier transforms without complete data would be desirable. In this paper it is recognised that formulating the FFT algorithm as a belief network allows suitable priors to be set for the Fourier coefficients. Furthermore efficient generalised belief propagation methods between clusters of four nodes enable the Fourier coefficients to be inferred and the missing data to be estimated in near to O(n log n) time, where n is the total of the given and missing data points. This method is compared with a number of common approaches such as setting missing data to zero or to interpolation. It is tested on generated data and for a Fourier analysis of a damaged audio signal. 1

3 0.17917728 189 nips-2003-Tree-structured Approximations by Expectation Propagation

Author: Yuan Qi, Tom Minka

Abstract: Approximation structure plays an important role in inference on loopy graphs. As a tractable structure, tree approximations have been utilized in the variational method of Ghahramani & Jordan (1997) and the sequential projection method of Frey et al. (2000). However, belief propagation represents each factor of the graph with a product of single-node messages. In this paper, belief propagation is extended to represent factors with tree approximations, by way of the expectation propagation framework. That is, each factor sends a “message” to all pairs of nodes in a tree structure. The result is more accurate inferences and more frequent convergence than ordinary belief propagation, at a lower cost than variational trees or double-loop algorithms. 1

4 0.14731492 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles

Author: Michael I. Jordan, Martin J. Wainwright

Abstract: We present a new method for calculating approximate marginals for probability distributions defined by graphs with cycles, based on a Gaussian entropy bound combined with a semidefinite outer bound on the marginal polytope. This combination leads to a log-determinant maximization problem that can be solved by efficient interior point methods [8]. As with the Bethe approximation and its generalizations [12], the optimizing arguments of this problem can be taken as approximations to the exact marginals. In contrast to Bethe/Kikuchi approaches, our variational problem is strictly convex and so has a unique global optimum. An additional desirable feature is that the value of the optimal solution is guaranteed to provide an upper bound on the log partition function. In experimental trials, the performance of the log-determinant relaxation is comparable to or better than the sum-product algorithm, and by a substantial margin for certain problem classes. Finally, the zero-temperature limit of our log-determinant relaxation recovers a class of well-known semidefinite relaxations for integer programming [e.g., 3]. 1

5 0.14603689 29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs

Author: Joelle Pineau, Geoffrey J. Gordon, Sebastian Thrun

Abstract: Recent developments in grid-based and point-based approximation algorithms for POMDPs have greatly improved the tractability of POMDP planning. These approaches operate on sets of belief points by individually learning a value function for each point. In reality, belief points exist in a highly-structured metric simplex, but current POMDP algorithms do not exploit this property. This paper presents a new metric-tree algorithm which can be used in the context of POMDP planning to sort belief points spatially, and then perform fast value function updates over groups of points. We present results showing that this approach can reduce computation in point-based POMDP algorithms for a wide range of problems. 1

6 0.14231992 117 nips-2003-Linear Response for Approximate Inference

7 0.12221294 47 nips-2003-Computing Gaussian Mixture Models with EM Using Equivalence Constraints

8 0.11920859 142 nips-2003-On the Concentration of Expectation and Approximate Inference in Layered Networks

9 0.11295025 169 nips-2003-Sample Propagation

10 0.11055621 74 nips-2003-Finding the M Most Probable Configurations using Loopy Belief Propagation

11 0.10958523 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions

12 0.10439552 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation

13 0.10327482 22 nips-2003-An Improved Scheme for Detection and Labelling in Johansson Displays

14 0.10013 100 nips-2003-Laplace Propagation

15 0.095054857 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors

16 0.093024246 193 nips-2003-Variational Linear Response

17 0.090009116 152 nips-2003-Pairwise Clustering and Graphical Models

18 0.087024495 94 nips-2003-Information Maximization in Noisy Channels : A Variational Approach

19 0.083177693 196 nips-2003-Wormholes Improve Contrastive Divergence

20 0.081960119 42 nips-2003-Bounded Finite State Controllers


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.229), (1, -0.022), (2, -0.129), (3, 0.273), (4, 0.112), (5, -0.237), (6, 0.134), (7, 0.014), (8, 0.004), (9, 0.024), (10, -0.003), (11, 0.027), (12, -0.033), (13, 0.0), (14, -0.08), (15, -0.011), (16, 0.025), (17, 0.058), (18, 0.011), (19, -0.025), (20, -0.016), (21, 0.042), (22, 0.013), (23, -0.059), (24, 0.063), (25, 0.055), (26, -0.062), (27, -0.024), (28, -0.087), (29, 0.011), (30, 0.028), (31, 0.024), (32, -0.043), (33, 0.025), (34, 0.038), (35, 0.078), (36, 0.086), (37, -0.072), (38, -0.041), (39, -0.002), (40, 0.007), (41, -0.026), (42, -0.019), (43, 0.029), (44, -0.017), (45, -0.024), (46, 0.066), (47, 0.034), (48, -0.006), (49, -0.022)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97721791 32 nips-2003-Approximate Expectation Maximization

Author: Tom Heskes, Onno Zoeter, Wim Wiegerinck

Abstract: We discuss the integration of the expectation-maximization (EM) algorithm for maximum likelihood learning of Bayesian networks with belief propagation algorithms for approximate inference. Specifically we propose to combine the outer-loop step of convergent belief propagation algorithms with the M-step of the EM algorithm. This then yields an approximate EM algorithm that is essentially still double loop, with the important advantage of an inner loop that is guaranteed to converge. Simulations illustrate the merits of such an approach. 1

2 0.81824273 80 nips-2003-Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data

Author: Amos J. Storkey

Abstract: Discrete Fourier transforms and other related Fourier methods have been practically implementable due to the fast Fourier transform (FFT). However there are many situations where doing fast Fourier transforms without complete data would be desirable. In this paper it is recognised that formulating the FFT algorithm as a belief network allows suitable priors to be set for the Fourier coefficients. Furthermore efficient generalised belief propagation methods between clusters of four nodes enable the Fourier coefficients to be inferred and the missing data to be estimated in near to O(n log n) time, where n is the total of the given and missing data points. This method is compared with a number of common approaches such as setting missing data to zero or to interpolation. It is tested on generated data and for a Fourier analysis of a damaged audio signal. 1

3 0.6950317 189 nips-2003-Tree-structured Approximations by Expectation Propagation

Author: Yuan Qi, Tom Minka

Abstract: Approximation structure plays an important role in inference on loopy graphs. As a tractable structure, tree approximations have been utilized in the variational method of Ghahramani & Jordan (1997) and the sequential projection method of Frey et al. (2000). However, belief propagation represents each factor of the graph with a product of single-node messages. In this paper, belief propagation is extended to represent factors with tree approximations, by way of the expectation propagation framework. That is, each factor sends a “message” to all pairs of nodes in a tree structure. The result is more accurate inferences and more frequent convergence than ordinary belief propagation, at a lower cost than variational trees or double-loop algorithms. 1

4 0.6740334 74 nips-2003-Finding the M Most Probable Configurations using Loopy Belief Propagation

Author: Chen Yanover, Yair Weiss

Abstract: Loopy belief propagation (BP) has been successfully used in a number of difficult graphical models to find the most probable configuration of the hidden variables. In applications ranging from protein folding to image analysis one would like to find not just the best configuration but rather the top M . While this problem has been solved using the junction tree formalism, in many real world problems the clique size in the junction tree is prohibitively large. In this work we address the problem of finding the M best configurations when exact inference is impossible. We start by developing a new exact inference algorithm for calculating the best configurations that uses only max-marginals. For approximate inference, we replace the max-marginals with the beliefs calculated using max-product BP and generalized BP. We show empirically that the algorithm can accurately and rapidly approximate the M best configurations in graphs with hundreds of variables. 1

5 0.5968951 169 nips-2003-Sample Propagation

Author: Mark A. Paskin

Abstract: Rao–Blackwellization is an approximation technique for probabilistic inference that flexibly combines exact inference with sampling. It is useful in models where conditioning on some of the variables leaves a simpler inference problem that can be solved tractably. This paper presents Sample Propagation, an efficient implementation of Rao–Blackwellized approximate inference for a large class of models. Sample Propagation tightly integrates sampling with message passing in a junction tree, and is named for its simple, appealing structure: it walks the clusters of a junction tree, sampling some of the current cluster’s variables and then passing a message to one of its neighbors. We discuss the application of Sample Propagation to conditional Gaussian inference problems such as switching linear dynamical systems. 1

6 0.57375538 29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs

7 0.57043242 142 nips-2003-On the Concentration of Expectation and Approximate Inference in Layered Networks

8 0.55553514 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation

9 0.53580087 22 nips-2003-An Improved Scheme for Detection and Labelling in Johansson Displays

10 0.53484792 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles

11 0.49405101 117 nips-2003-Linear Response for Approximate Inference

12 0.48246303 33 nips-2003-Approximate Planning in POMDPs with Macro-Actions

13 0.47768354 152 nips-2003-Pairwise Clustering and Graphical Models

14 0.46053365 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors

15 0.45805767 100 nips-2003-Laplace Propagation

16 0.4514468 94 nips-2003-Information Maximization in Noisy Channels : A Variational Approach

17 0.44931003 42 nips-2003-Bounded Finite State Controllers

18 0.41920584 30 nips-2003-Approximability of Probability Distributions

19 0.40860105 155 nips-2003-Perspectives on Sparse Bayesian Learning

20 0.40381536 193 nips-2003-Variational Linear Response


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.019), (30, 0.014), (35, 0.581), (53, 0.053), (71, 0.067), (76, 0.043), (85, 0.056), (91, 0.052), (99, 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97835588 32 nips-2003-Approximate Expectation Maximization

Author: Tom Heskes, Onno Zoeter, Wim Wiegerinck

Abstract: We discuss the integration of the expectation-maximization (EM) algorithm for maximum likelihood learning of Bayesian networks with belief propagation algorithms for approximate inference. Specifically we propose to combine the outer-loop step of convergent belief propagation algorithms with the M-step of the EM algorithm. This then yields an approximate EM algorithm that is essentially still double loop, with the important advantage of an inner loop that is guaranteed to converge. Simulations illustrate the merits of such an approach. 1

2 0.95432359 70 nips-2003-Fast Algorithms for Large-State-Space HMMs with Applications to Web Usage Analysis

Author: Pedro F. Felzenszwalb, Daniel P. Huttenlocher, Jon M. Kleinberg

Abstract: In applying Hidden Markov Models to the analysis of massive data streams, it is often necessary to use an artificially reduced set of states; this is due in large part to the fact that the basic HMM estimation algorithms have a quadratic dependence on the size of the state set. We present algorithms that reduce this computational bottleneck to linear or near-linear time, when the states can be embedded in an underlying grid of parameters. This type of state representation arises in many domains; in particular, we show an application to traffic analysis at a high-volume Web site. 1

3 0.94481051 1 nips-2003-1-norm Support Vector Machines

Author: Ji Zhu, Saharon Rosset, Robert Tibshirani, Trevor J. Hastie

Abstract: The standard 2-norm SVM is known for its good performance in twoclass classi£cation. In this paper, we consider the 1-norm SVM. We argue that the 1-norm SVM may have some advantage over the standard 2-norm SVM, especially when there are redundant noise features. We also propose an ef£cient algorithm that computes the whole solution path of the 1-norm SVM, hence facilitates adaptive selection of the tuning parameter for the 1-norm SVM. 1

4 0.67777711 91 nips-2003-Inferring State Sequences for Non-linear Systems with Embedded Hidden Markov Models

Author: Radford M. Neal, Matthew J. Beal, Sam T. Roweis

Abstract: We describe a Markov chain method for sampling from the distribution of the hidden state sequence in a non-linear dynamical system, given a sequence of observations. This method updates all states in the sequence simultaneously using an embedded Hidden Markov Model (HMM). An update begins with the creation of “pools” of candidate states at each time. We then define an embedded HMM whose states are indexes within these pools. Using a forward-backward dynamic programming algorithm, we can efficiently choose a state sequence with the appropriate probabilities from the exponentially large number of state sequences that pass through states in these pools. We illustrate the method in a simple one-dimensional example, and in an example showing how an embedded HMM can be used to in effect discretize the state space without any discretization error. We also compare the embedded HMM to a particle smoother on a more substantial problem of inferring human motion from 2D traces of markers. 1

5 0.61675328 122 nips-2003-Margin Maximizing Loss Functions

Author: Saharon Rosset, Ji Zhu, Trevor J. Hastie

Abstract: Margin maximizing properties play an important role in the analysis of classi£cation models, such as boosting and support vector machines. Margin maximization is theoretically interesting because it facilitates generalization error analysis, and practically interesting because it presents a clear geometric interpretation of the models being built. We formulate and prove a suf£cient condition for the solutions of regularized loss functions to converge to margin maximizing separators, as the regularization vanishes. This condition covers the hinge loss of SVM, the exponential loss of AdaBoost and logistic regression loss. We also generalize it to multi-class classi£cation problems, and present margin maximizing multiclass versions of logistic regression and support vector machines. 1

6 0.6123426 146 nips-2003-Online Learning of Non-stationary Sequences

7 0.58000004 123 nips-2003-Markov Models for Automated ECG Interval Analysis

8 0.56510234 97 nips-2003-Iterative Scaled Trust-Region Learning in Krylov Subspaces via Pearlmutter's Implicit Sparse Hessian

9 0.5510354 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates

10 0.5417884 100 nips-2003-Laplace Propagation

11 0.54134029 72 nips-2003-Fast Feature Selection from Microarray Expression Data via Multiplicative Large Margin Algorithms

12 0.54029584 117 nips-2003-Linear Response for Approximate Inference

13 0.53266001 158 nips-2003-Policy Search by Dynamic Programming

14 0.53006047 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors

15 0.52942926 169 nips-2003-Sample Propagation

16 0.51887655 189 nips-2003-Tree-structured Approximations by Expectation Propagation

17 0.51225942 69 nips-2003-Factorization with Uncertainty and Missing Data: Exploiting Temporal Coherence

18 0.50255203 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes

19 0.50053465 131 nips-2003-Modeling User Rating Profiles For Collaborative Filtering

20 0.49671441 75 nips-2003-From Algorithmic to Subjective Randomness