
187 nips-2009-Particle-based Variational Inference for Continuous Systems


Source: pdf

Author: Andrew Frank, Padhraic Smyth, Alexander T. Ihler

Abstract: Since the development of loopy belief propagation, there has been considerable work on advancing the state of the art for approximate inference over distributions defined on discrete random variables. Improvements include guarantees of convergence, approximations that are provably more accurate, and bounds on the results of exact inference. However, extending these methods to continuous-valued systems has lagged behind. While several methods have been developed to use belief propagation on systems with continuous values, recent advances for discrete variables have not as yet been incorporated. In this context we extend a recently proposed particle-based belief propagation algorithm to provide a general framework for adapting discrete message-passing algorithms to inference in continuous systems. The resulting algorithms behave similarly to their purely discrete counterparts, extending the benefits of these more advanced inference techniques to the continuous domain. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Since the development of loopy belief propagation, there has been considerable work on advancing the state of the art for approximate inference over distributions defined on discrete random variables. [sent-15, score-0.585]

2 While several methods have been developed to use belief propagation on systems with continuous values, recent advances for discrete variables have not as yet been incorporated. [sent-18, score-0.587]

3 In this context we extend a recently proposed particle-based belief propagation algorithm to provide a general framework for adapting discrete message-passing algorithms to inference in continuous systems. [sent-19, score-0.713]

4 The resulting algorithms behave similarly to their purely discrete counterparts, extending the benefits of these more advanced inference techniques to the continuous domain. [sent-20, score-0.338]

5 Early examples of the use of graph structure for inference include join or junction trees [1] for exact inference, Markov chain Monte Carlo (MCMC) methods [2], and variational methods such as mean field and structured mean field approaches [3]. [sent-22, score-0.277]

6 Belief propagation (BP), originally proposed by Pearl [1], has gained in popularity as a method of approximate inference, and in the last decade has led to a number of more sophisticated algorithms based on conjugate dual formulations and free energy approximations [4, 5, 6]. [sent-23, score-0.384]

7 However, the progress on approximate inference in systems with continuous random variables has not kept pace with that for discrete random variables. [sent-24, score-0.376]

8 Some methods, such as MCMC techniques, are directly applicable to continuous domains, while others such as belief propagation have approximate continuous formulations [7, 8]. [sent-25, score-0.625]

9 Our aim is to extend particle methods to take advantage of recent advances in approximate inference algorithms for discrete-valued systems. [sent-27, score-0.319]

10 Several recent algorithms provide significant advantages over loopy belief propagation. [sent-28, score-0.321]

11 More general approximations can be used to provide theoretical bounds on the results of exact inference [5, 3] or are guaranteed to improve the quality of approximation [6], allowing an informed trade-off between computation and accuracy. [sent-30, score-0.298]

12 Like belief propagation, they can be formulated as local message-passing algorithms on the graph, making them amenable to parallel computation [11] or inference in distributed systems [12, 13]. [sent-31, score-0.37]

13 In order to develop particle-based approximations for these algorithms, we focus on one particular technique for concreteness: tree-reweighted belief propagation (TRW) [5]. [sent-34, score-0.441]

14 TRW represents one of the earliest of a recent class of inference algorithms for discrete systems, but as we discuss in Section 2. [sent-35, score-0.249]

15 The basic idea of our algorithm is simple and extends previous particle formulations of exact inference [15] and loopy belief propagation [16]. [sent-37, score-0.765]

16 We use collections of samples drawn from the continuous state space of each variable to define a discrete problem, “lifting” the inference task from the original space to a restricted, discrete domain on which TRW can be performed. [sent-38, score-0.494]

17 At any point, the current results of the discrete inference can be used to re-select the sample points from a variable’s continuous domain. [sent-39, score-0.306]

18 This iterative interaction between the sample locations and the discrete messages produces a dynamic discretization that adapts itself to the inference results. [sent-40, score-0.401]

19 We demonstrate that TRW and similar methods can be naturally incorporated into the lifted, discrete phase of particle belief propagation and that they confer similar benefits on the continuous problem as hold in truly discrete systems. [sent-41, score-0.836]

20 To this end we measure the performance of the algorithm on an Ising grid, an analogous continuous model, and the sensor localization problem. [sent-42, score-0.305]

21 In each case, we show that tree-reweighted particle BP exhibits behavior similar to TRW and produces significantly more robust marginal estimates than ordinary particle BP. [sent-43, score-0.322]

22 This structure can then be applied to organize computations over p(X) and construct efficient algorithms for many inference tasks, including optimization to find a maximum a posteriori (MAP) configuration, marginalization, or computing the likelihood of observed data. [sent-48, score-0.154]

23 1 Factor Graphs Factor graphs [17] are a particular type of graphical model that describe the factorization structure of the distribution p(X) using a bipartite graph consisting of factor nodes and variable nodes. [sent-50, score-0.236]

24 Let Xu ⊆ X denote the neighbors of factor node fu and Fs ⊆ F denote the neighbors of variable node xs . [sent-58, score-0.868]

25 $p(X) \propto \prod_{u} f_u(X_u)$ (1). In a common abuse of notation, we use the same symbols to represent each variable node and its associated variable $x_s$, and similarly for each factor node and its associated function $f_u$. [sent-63, score-0.857]

26 Each factor fu corresponds to a strictly positive function over a subset of the variables. [sent-64, score-0.201]

27 The graph connectivity captures the conditional independence structure of p(X), enabling the development of efficient exact and approximate inference algorithms [1, 17, 18]. [sent-65, score-0.286]

28 A common inference problem is that of computing the marginal distributions of p(X). [sent-67, score-0.206]

29 Specifically, for each variable $x_s$ we are interested in computing the marginal distribution $p_s(x_s) = \int p(X)\,\partial X_{\setminus s}$, integrating over all variables other than $x_s$. [sent-68, score-0.574]

30 When the variables are discrete and the graph G representing p(X) forms a tree (G has no cycles), marginalization can be performed efficiently using the belief propagation or sum-product algorithm [1, 17]. [sent-70, score-0.589]

31 For inference in more general graphs, the junction tree algorithm [19] creates a tree-structured hypergraph of G and then performs inference on this hypergraph. [sent-71, score-0.277]

32 Loopy BP [1] is a popular alternative to exact methods and proceeds by iteratively passing “messages” between variable and factor nodes in the graph as though the graph were a tree (ignoring cycles). [sent-75, score-0.325]

33 The algorithm is exact when the graph is tree-structured and can provide excellent approximations in some cases even when the graph has loops. [sent-76, score-0.204]

34 Many of the more recent varieties of approximate inference are framed explicitly as an optimization of local approximations over locally defined cost functions. [sent-78, score-0.23]

35 Variational or free-energy based approaches convert the problem of exact inference into the optimization of a free energy function over the set of realizable marginal distributions M, called the marginal polytope [18]. [sent-79, score-0.35]

36 Since the solution µ may not correspond to the marginals of any consistent joint distribution, these approximate marginals are typically referred to as pseudomarginals. [sent-82, score-0.158]

37 Belief propagation can be understood in this framework as corresponding to an outer approximation $\hat{M} \supseteq M$ enforcing local consistency and the Bethe approximation to H [4]. [sent-84, score-0.159]

38 Fractional belief propagation [20] corresponds to a more general Bethe-like approximation with additional parameters, which can be modified to ensure that the cost function is convex and used with convergent algorithms [21]. [sent-87, score-0.438]

39 A special case includes tree-reweighted belief propagation [5], which both ensures convexity and provides an upper bound on the partition function Z. [sent-88, score-0.412]

40 Overall, these advances have provided significant improvements in the state of the art for approximate inference in discrete-valued systems. [sent-91, score-0.164]

41 For concreteness, in the rest of the paper we will use tree-reweighted belief propagation (TRW) [5] as our inference method of choice, although the same ideas can be applied to any of the discussed inference algorithms. [sent-94, score-0.619]

42 The fixed-point equations for TRW lead to a message-passing algorithm similar to BP, defined by

$$m_{x_s \to f_u}(x_s) \propto \frac{\prod_{f_v \in F_s} m_{f_v \to x_s}(x_s)^{\rho_v}}{m_{f_u \to x_s}(x_s)}, \qquad m_{f_u \to x_s}(x_s) \propto \sum_{X_u \setminus x_s} f_u(X_u)^{1/\rho_u} \prod_{x_t \in X_u \setminus x_s} m_{x_t \to f_u}(x_t) \quad (2)$$

The parameters $\rho_v$ are called edge weights or appearance probabilities. [sent-96, score-2.837]
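To make the updates in (2) concrete, here is a minimal NumPy sketch (ours, not the paper's implementation) for a pairwise model; the data layout — edge ids, factor tables psi, and message dictionaries keyed by (factor, variable) pairs — is an assumption of this sketch.

```python
import numpy as np

# Sketch of the two TRW message updates in Eq. (2) for a pairwise model.
# rho[u]: edge weight of factor u; psi[u][i, j] = f_u(x_s = i, x_t = j);
# m_f2v / m_v2f: dicts of length-d message arrays keyed by (factor, var).

def var_to_factor(s, u, factors_at, rho, m_f2v, d):
    """m_{x_s -> f_u}: reweighted product of incoming messages, divided by f_u's own."""
    msg = np.ones(d)
    for v in factors_at[s]:            # every factor touching x_s, including u
        msg *= m_f2v[(v, s)] ** rho[v]
    msg /= m_f2v[(u, s)]               # remove the unweighted message from f_u
    return msg / msg.sum()

def factor_to_var(u, s, t, psi, rho, m_v2f):
    """m_{f_u -> x_s}: sum over x_t of f_u^{1/rho_u} times the message from x_t."""
    pot = psi[u] ** (1.0 / rho[u])
    msg = pot @ m_v2f[(t, u)]          # marginalize out x_t
    return msg / msg.sum()
```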

43 In particular, any reasonably fine-grained discretization produces a discrete variable whose domain size d is quite large. [sent-100, score-0.229]

44 1 Particle Representations for Message-Passing Particle-based approximations have been extended to loopy belief propagation as well. [sent-110, score-0.514]

45 For example, in the nonparametric belief propagation (NBP) algorithm [7], the BP messages are represented as Gaussian mixtures and message products are approximated by drawing samples, which are then smoothed to form new Gaussian mixture distributions. [sent-111, score-0.618]

46 Instead, we adapt a recent particle belief propagation (PBP) algorithm [16] to work on the tree-reweighted formulation. [sent-114, score-0.498]

47 In PBP, samples (particles) are drawn for each variable, and each message is represented as a set of weights over the available values of the target variable. [sent-115, score-0.171]

48 At a high level, the procedure iterates between sampling particles from each variable’s domain, performing inference over the resulting discrete problem, and adaptively updating the sampling distributions. [sent-116, score-0.303]

49 Formally, we define a proposal distribution Ws (xs ) for each variable xs such that Ws (xs ) is non-zero over the domain of xs . [sent-118, score-1.083]

50 Note that we may rewrite the factor message computation (2) as an importance-reweighted expectation:

$$m_{f_u \to x_s}(x_s) \propto \mathbb{E}_{X_u \setminus x_s}\!\left[ f_u(X_u)^{1/\rho_u} \prod_{x_t \in X_u \setminus x_s} \frac{m_{x_t \to f_u}(x_t)}{W_t(x_t)} \right] \quad (3)$$

Let us index the variables that are neighbors of factor $f_u$ as $X_u = \{x_{u_1}, \dots\}$. [sent-119, score-0.313]

51 Then, after sampling particles $\{x_s^{(1)}, \dots, x_s^{(N)}\}$ from $W_s(x_s)$, we can index a particular assignment of particle values to the variables in $X_u$ with $X_u^{(j)} = [x_{u_1}^{(j_1)}, \dots]$. [sent-123, score-0.604]

52 Each of the values in the message then represents an estimate of the continuous function (2) evaluated at a single particle. [sent-128, score-0.207]
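As an illustration, the following sketch (ours, with hypothetical names) computes the Monte Carlo estimate of (3) for a pairwise factor f_u(x_s, x_t), producing one message value per particle of x_s; f_u is assumed to be vectorized over its second argument.

```python
import numpy as np

def pbp_factor_msg(f_u, rho_u, parts_s, parts_t, m_t2u, W_t_pdf):
    """Estimate m_{f_u -> x_s} at each of x_s's particles via Eq. (3).

    parts_s, parts_t : (N,) particle arrays for x_s and x_t
    m_t2u            : (N,) values of m_{x_t -> f_u} at x_t's particles
    W_t_pdf          : density of the proposal that generated parts_t
    """
    w = m_t2u / W_t_pdf(parts_t)                       # importance weights
    vals = [np.mean(f_u(xs, parts_t) ** (1.0 / rho_u) * w) for xs in parts_s]
    out = np.array(vals)
    return out / out.sum()                             # normalize for stability
```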

53 (1) Samples for each variable provide a dynamic discretization of the continuous space; (2) inference proceeds by optimization or message-passing in the discrete space; (3) the resulting local functions can be used to change the proposals Ws (·) and choose new sample locations for each variable. [sent-130, score-0.437]
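A skeleton of this three-phase loop might look as follows; sample_W, run_discrete, and adapt_W are hypothetical callbacks of ours, standing in for the proposal sampler, the lifted discrete inference (e.g., TRW), and the proposal-update rule.

```python
def particle_inference(variables, N, num_iters, sample_W, run_discrete, adapt_W):
    """Skeleton of the sample / infer / adapt loop (caller supplies the callbacks)."""
    particles = {s: sample_W[s](N) for s in variables}      # (1) discretize
    beliefs = None
    for _ in range(num_iters):
        beliefs = run_discrete(particles)                   # (2) lifted inference
        for s in variables:
            sample_W[s] = adapt_W(s, beliefs, particles)    # (3) adapt proposals
            particles[s] = sample_W[s](N)
    return beliefs
```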

54 Just as in discrete problems, it is often desirable to obtain estimates of the log partition function for use in goodness-of-fit testing or model comparison. [sent-133, score-0.161]

55 Other message-passing approaches that fit into this framework, such as mean field, can provide a similar lower bound. [sent-135, score-0.154]

56 Quantities about $x_s$, such as expected values under the pseudomarginal, can be computed using the samples $x_s^{(i)}$. [sent-138, score-1.01]

57 However, for any given variable node xs , the incoming messages to xs given in (4) are defined in terms of the importance weights and sampled values of the neighboring variables. [sent-139, score-1.186]

58 Thus, we can compute an estimate of the messages and beliefs defined in (4)–(5) at arbitrary values of xs , simply by evaluating (4) at that point. [sent-140, score-0.688]

59 This allows us to perform Rao-Blackwellization, conditioning on the samples at the neighbors of xs rather than using xs ’s samples directly. [sent-141, score-1.064]
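A sketch of this idea for a pairwise model, under our own data layout: the belief at an arbitrary point x of x_s's domain is assembled from the neighbors' particles and message weights, never from x_s's own samples.

```python
import numpy as np

def rb_belief(x, nbr_factors):
    """Unnormalized TRW belief at an arbitrary point x in x_s's domain.

    nbr_factors: one (f_v, rho_v, parts_t, m_t2v, W_t_pdf) tuple per neighboring
    factor, holding that neighbor's particles, message values, and proposal density.
    """
    b = 1.0
    for f_v, rho_v, parts_t, m_t2v, W_t_pdf in nbr_factors:
        w = m_t2v / W_t_pdf(parts_t)
        m = np.mean(f_v(x, parts_t) ** (1.0 / rho_v) * w)  # m_{f_v -> x_s}(x), Eq. (3)
        b *= m ** rho_v
    return b
```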

60 Using this trick we can often get much higher quality estimates from the inference for small N . [sent-142, score-0.186]

61 This interdependence motivates an attempt to learn the sampling distributions in an online fashion, adaptively updating them based on the results of the partially completed inference procedure. [sent-148, score-0.159]

62 Note that this procedure depends on the same properties as Rao-Blackwellized estimates: that we be able to compute our messages and beliefs at a new set of points given the message weights at the other nodes. [sent-149, score-0.316]

63 Both [15] and [16] suggest using the current belief at each iteration to form a new proposal distribution. [sent-150, score-0.256]

64 In [16], a short Metropolis-Hastings MCMC sequence is run at a single node, using the Rao-Blackwellized belief estimate to compute an acceptance probability. [sent-152, score-0.216]
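A minimal sketch of such a Metropolis-Hastings proposal update, assuming a pointwise belief estimate (e.g., a Rao-Blackwellized one) is available as a callable; the chain length and step size below are arbitrary choices of ours.

```python
import numpy as np

def mh_resample(x0, belief, n_steps=50, step=0.5, seed=0):
    """Short MH chain targeting a pointwise belief estimate (assumes belief(x0) > 0)."""
    rng = np.random.default_rng(seed)
    x, bx = x0, belief(x0)
    for _ in range(n_steps):
        y = x + step * rng.standard_normal()    # symmetric random-walk proposal
        by = belief(y)
        if rng.uniform() < by / bx:             # accept with prob min(1, by/bx)
            x, bx = y, by
    return x
```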

65 We initially demonstrate the behavior of our particle-based algorithms on a small (3 × 3) lattice of binary-valued variables to compare with the exact discrete implementations, then show that the same observed behavior arises in an analogous continuous-valued problem. [sent-188, score-0.197]

66 Each data point represents the median L1 error between the beliefs and the true marginals, across all nodes and 40 randomly initialized trials, after 50 iterations. [sent-201, score-0.156]

67 In both cases, as N increases the particle versions of the algorithms converge to their discrete equivalents. [sent-203, score-0.25]

68 2 Continuous grid model The results for discrete systems, and their corresponding intuition, carry over naturally into continuous systems as well. [sent-205, score-0.256]

69 To illustrate on an interpretable analogue of the Ising model, we use the same graph structure but with real-valued variables, and factors given by:

$$f(x_s) = \exp\left(-\frac{x_s^2}{2\sigma_l^2}\right) + \exp\left(-\frac{(x_s - 1)^2}{2\sigma_l^2}\right), \qquad f(x_s, x_t) = \exp\left(-\frac{|x_s - x_t|^2}{2\sigma_p^2}\right).$$

[sent-206, score-0.2]
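Written out in code, these factors are straightforward; the sketch below uses hypothetical values for the widths σ_l and σ_p.

```python
import numpy as np

sigma_l, sigma_p = 0.5, 0.2   # hypothetical local / pairwise widths

def f_local(xs):
    """Bimodal local factor with modes at 0 and 1."""
    return (np.exp(-xs**2 / (2 * sigma_l**2))
            + np.exp(-(xs - 1)**2 / (2 * sigma_l**2)))

def f_pair(xs, xt):
    """Attractive pairwise factor, analogous to the Ising coupling."""
    return np.exp(-np.abs(xs - xt)**2 / (2 * sigma_p**2))
```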

70 Figure 3 shows the results of running PBP and TRW-PBP on the continuous grid model, demonstrating similar characteristics to the discrete model. [sent-225, score-0.256]

71 The left panel reveals that our continuous grid model also induces a phase shift in PBP, much like that of the Ising model. [sent-226, score-0.195]

72 For sufficiently small values of σp (large values on our transformed axis), the beliefs in PBP collapse to unimodal distributions with an L1 error of 1. [sent-227, score-0.227]

73 The resulting bounds, computed for a continuous grid model in which mean field collapses to a single mode, are shown in Figure 4. [sent-234, score-0.194]

74 Sensor localization considers the task of estimating the position of a collection of sensors in a network given noisy estimates of a subset of the distances between pairs of sensors, along with known positions for a small number of anchor nodes. [sent-240, score-0.197]

75 Typical localization algorithms operate by optimizing to find the most likely joint configuration of sensor positions. [sent-241, score-0.248]

76 In [12], this problem is formulated as a graphical model and an alternative solution is proposed using nonparametric belief propagation to perform approximate marginalization. [sent-243, score-0.477]

77 A significant advantage of this approach is that by providing approximate marginals, we can estimate the degree of uncertainty in the sensor positions. [sent-244, score-0.202]

78 Gauging this uncertainty can be particularly important when the distance information is sufficiently ambiguous that the posterior belief is multi-modal, since in this case the estimated sensor position may be quite far from its true value. [sent-245, score-0.376]

79 Unfortunately, belief propagation is not ideal for identifying multimodality, since the model is essentially attractive. [sent-246, score-0.375]

80 BP may underestimate the degree of uncertainty in the marginal distributions and (as in the case of the Ising-like models in the previous section) collapse into a single mode, providing beliefs which are misleadingly overconfident. [sent-247, score-0.238]

81 Figure 5 shows a set of sensor configurations where this is the case. [sent-248, score-0.16]

82 This induces a bimodal uncertainty about the locations of the remaining nodes. (Figure 5: sensor location belief at the target node; panels (a) Exact, (b) PBP, and (c) TRW-PBP, with anchor, mobile, and target nodes marked.) [sent-251, score-0.372]

83 In this system there is not enough information in the measurements to resolve the sensor positions. [sent-259, score-0.16]

84 Figure 5b displays the Rao-Blackwellized belief estimate for one node after 20 iterations of PBP with each variable represented by 100 particles. [sent-261, score-0.299]

85 Examination of the other nodes’ beliefs (not shown for space) confirms that all are unimodal distributions centered around their reflected locations. [sent-263, score-0.181]

86 It is worth noting that PBP converged to the alternative set of unimodal beliefs (supporting the true locations) in about half of our trials. [sent-264, score-0.169]

87 The corresponding belief estimate generated by TRW-PBP is shown in Figure 5c. [sent-266, score-0.216]

88 Also, each of the two modes is less concentrated than the belief in 5b. [sent-268, score-0.216]

89 As with the continuous grid model we see increased stability at the price of conservative overdispersion. [sent-269, score-0.161]

90 6 Conclusion We propose a framework for extending recent advances in discrete approximate inference for application to continuous systems. [sent-271, score-0.348]

91 The framework directly integrates reweighted message passing algorithms such as TRW into the lifted, discrete phase of PBP. [sent-272, score-0.338]

92 Furthermore, it allows us to iteratively adjust the proposal distributions, providing a discretization that adapts to the results of inference, and allows us to use Rao-Blackwellized estimates to improve our final belief estimates. [sent-273, score-0.356]

93 Using an Ising-like system, we argue that phase transitions exist for particle versions of BP similar to those found in discrete systems, and that TRW significantly improves the quality of the estimate in those regimes. [sent-275, score-0.287]

94 This improvement is highly relevant to approximate marginalization for sensor localization tasks, in which it is important to accurately represent the posterior uncertainty. [sent-276, score-0.301]

95 The flexibility in the choice of message passing algorithm makes it easy to consider several instantiations of the framework and use the one best suited to a particular problem. [sent-277, score-0.154]

96 Furthermore, future improvements in message-passing inference algorithms on discrete systems can be directly incorporated into continuous problems. [sent-278, score-0.363]

97 Constructing free energy approximations and generalized belief propagation algorithms. [sent-301, score-0.496]

98 CCCP algorithms to minimize the Bethe and Kikuchi free energies: convergent alternatives to belief propagation. [sent-330, score-0.307]

99 A general algorithm for approximate inference and its application to hybrid Bayes nets. [sent-380, score-0.164]

100 Convergent message-passing algorithms for inference over general graphs with convex free energies. [sent-415, score-0.209]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('xs', 0.49), ('trw', 0.397), ('pbp', 0.36), ('belief', 0.216), ('sensor', 0.16), ('fu', 0.16), ('propagation', 0.159), ('bp', 0.157), ('particle', 0.123), ('inference', 0.122), ('message', 0.118), ('beliefs', 0.108), ('ising', 0.1), ('discrete', 0.095), ('mfu', 0.095), ('messages', 0.09), ('continuous', 0.089), ('particles', 0.086), ('anchor', 0.085), ('ihler', 0.083), ('xu', 0.077), ('ws', 0.077), ('loopy', 0.073), ('grid', 0.072), ('discretization', 0.071), ('approximations', 0.066), ('xt', 0.062), ('bimodal', 0.062), ('marginals', 0.058), ('mfv', 0.057), ('localization', 0.056), ('fs', 0.051), ('fv', 0.05), ('nodes', 0.048), ('graph', 0.048), ('marginal', 0.047), ('collapse', 0.046), ('node', 0.046), ('mode', 0.043), ('cccp', 0.043), ('marginalization', 0.043), ('exact', 0.042), ('approximate', 0.042), ('factor', 0.041), ('irvine', 0.04), ('mobile', 0.04), ('proposal', 0.04), ('juxtaposed', 0.038), ('mxt', 0.038), ('ndb', 0.038), ('ups', 0.038), ('xull', 0.038), ('variable', 0.037), ('partition', 0.037), ('distributions', 0.037), ('passing', 0.036), ('unimodal', 0.036), ('fractional', 0.036), ('quality', 0.035), ('graphical', 0.035), ('mixtures', 0.035), ('april', 0.034), ('phase', 0.034), ('importance', 0.033), ('junction', 0.033), ('pseudomarginals', 0.033), ('collapses', 0.033), ('bounds', 0.033), ('variational', 0.032), ('algorithms', 0.032), ('convergent', 0.031), ('samples', 0.03), ('formulations', 0.03), ('ected', 0.03), ('estimates', 0.029), ('discretized', 0.029), ('sontag', 0.028), ('lifted', 0.028), ('factors', 0.028), ('variables', 0.028), ('free', 0.028), ('energy', 0.027), ('graphs', 0.027), ('sensors', 0.027), ('bethe', 0.026), ('domain', 0.026), ('alternative', 0.025), ('incorporated', 0.025), ('guration', 0.025), ('february', 0.025), ('carlo', 0.024), ('monte', 0.024), ('neighbors', 0.024), ('concreteness', 0.024), ('eld', 0.023), ('locations', 0.023), ('mcmc', 0.023), ('smyth', 0.023), ('reweighted', 0.023), ('target', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999928 187 nips-2009-Particle-based Variational Inference for Continuous Systems

Author: Andrew Frank, Padhraic Smyth, Alexander T. Ihler

Abstract: Since the development of loopy belief propagation, there has been considerable work on advancing the state of the art for approximate inference over distributions defined on discrete random variables. Improvements include guarantees of convergence, approximations that are provably more accurate, and bounds on the results of exact inference. However, extending these methods to continuous-valued systems has lagged behind. While several methods have been developed to use belief propagation on systems with continuous values, recent advances for discrete variables have not as yet been incorporated. In this context we extend a recently proposed particle-based belief propagation algorithm to provide a general framework for adapting discrete message-passing algorithms to inference in continuous systems. The resulting algorithms behave similarly to their purely discrete counterparts, extending the benefits of these more advanced inference techniques to the continuous domain. 1

2 0.29027179 101 nips-2009-Generalization Errors and Learning Curves for Regression with Multi-task Gaussian Processes

Author: Kian M. Chai

Abstract: We provide some insights into how task correlations in multi-task Gaussian process (GP) regression affect the generalization error and the learning curve. We analyze the asymmetric two-tasks case, where a secondary task is to help the learning of a primary task. Within this setting, we give bounds on the generalization error and the learning curve of the primary task. Our approach admits intuitive understandings of the multi-task GP by relating it to single-task GPs. For the case of one-dimensional input-space under optimal sampling with data only for the secondary task, the limitations of multi-task GP can be quantified explicitly. 1

3 0.12073813 39 nips-2009-Bayesian Belief Polarization

Author: Alan Jern, Kai-min Chang, Charles Kemp

Abstract: Empirical studies have documented cases of belief polarization, where two people with opposing prior beliefs both strengthen their beliefs after observing the same evidence. Belief polarization is frequently offered as evidence of human irrationality, but we demonstrate that this phenomenon is consistent with a fully Bayesian approach to belief revision. Simulation results indicate that belief polarization is not only possible but relatively common within the set of Bayesian models that we consider.

4 0.11652269 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

Author: Ruslan Salakhutdinov

Abstract: Markov random fields (MRF’s), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRF’s is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of the Robbins-Monro type that use Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large, densely-connected MRF’s. Our results on MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data that perform well on digit and object recognition tasks.

5 0.10240141 123 nips-2009-Large Scale Nonparametric Bayesian Inference: Data Parallelisation in the Indian Buffet Process

Author: Finale Doshi-Velez, Shakir Mohamed, Zoubin Ghahramani, David A. Knowles

Abstract: Nonparametric Bayesian models provide a framework for flexible probabilistic modelling of complex datasets. Unfortunately, the high-dimensional averages required for Bayesian methods can be slow, especially with the unbounded representations used by nonparametric models. We address the challenge of scaling Bayesian inference to the increasingly large datasets found in real-world applications. We focus on parallelisation of inference in the Indian Buffet Process (IBP), which allows data points to have an unbounded number of sparse latent features. Our novel MCMC sampler divides a large data set between multiple processors and uses message passing to compute the global likelihoods and posteriors. This algorithm, the first parallel inference scheme for IBP-based models, scales to datasets orders of magnitude larger than have previously been possible. 1

6 0.09694875 10 nips-2009-A Gaussian Tree Approximation for Integer Least-Squares

7 0.093045592 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

8 0.084922664 141 nips-2009-Local Rules for Global MAP: When Do They Work ?

9 0.082537889 31 nips-2009-An LP View of the M-best MAP problem

10 0.080400936 167 nips-2009-Non-Parametric Bayesian Dictionary Learning for Sparse Image Representations

11 0.080288745 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

12 0.08009275 235 nips-2009-Structural inference affects depth perception in the context of potential occlusion

13 0.079093687 35 nips-2009-Approximating MAP by Compensating for Structural Relaxations

14 0.074339546 242 nips-2009-The Infinite Partially Observable Markov Decision Process

15 0.072002538 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization

16 0.071311675 228 nips-2009-Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing

17 0.070862688 256 nips-2009-Which graphical models are difficult to learn?

18 0.068784669 103 nips-2009-Graph Zeta Function in the Bethe Free Energy and Loopy Belief Propagation

19 0.066045821 129 nips-2009-Learning a Small Mixture of Trees

20 0.065882333 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.196), (1, 0.025), (2, 0.04), (3, -0.064), (4, 0.056), (5, -0.062), (6, 0.113), (7, 0.043), (8, -0.084), (9, -0.164), (10, -0.083), (11, 0.09), (12, 0.001), (13, 0.005), (14, 0.123), (15, 0.017), (16, -0.074), (17, 0.037), (18, 0.114), (19, -0.172), (20, -0.06), (21, -0.124), (22, 0.066), (23, 0.022), (24, 0.161), (25, -0.034), (26, 0.077), (27, -0.015), (28, -0.201), (29, 0.048), (30, -0.036), (31, 0.037), (32, 0.01), (33, -0.141), (34, 0.115), (35, 0.055), (36, -0.025), (37, -0.018), (38, 0.0), (39, -0.078), (40, -0.146), (41, 0.09), (42, 0.057), (43, 0.05), (44, -0.01), (45, -0.082), (46, -0.172), (47, 0.024), (48, 0.07), (49, 0.054)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94415092 187 nips-2009-Particle-based Variational Inference for Continuous Systems

Author: Andrew Frank, Padhraic Smyth, Alexander T. Ihler

Abstract: Since the development of loopy belief propagation, there has been considerable work on advancing the state of the art for approximate inference over distributions defined on discrete random variables. Improvements include guarantees of convergence, approximations that are provably more accurate, and bounds on the results of exact inference. However, extending these methods to continuous-valued systems has lagged behind. While several methods have been developed to use belief propagation on systems with continuous values, recent advances for discrete variables have not as yet been incorporated. In this context we extend a recently proposed particle-based belief propagation algorithm to provide a general framework for adapting discrete message-passing algorithms to inference in continuous systems. The resulting algorithms behave similarly to their purely discrete counterparts, extending the benefits of these more advanced inference techniques to the continuous domain. 1

2 0.73005825 101 nips-2009-Generalization Errors and Learning Curves for Regression with Multi-task Gaussian Processes

Author: Kian M. Chai

Abstract: We provide some insights into how task correlations in multi-task Gaussian process (GP) regression affect the generalization error and the learning curve. We analyze the asymmetric two-tasks case, where a secondary task is to help the learning of a primary task. Within this setting, we give bounds on the generalization error and the learning curve of the primary task. Our approach admits intuitive understandings of the multi-task GP by relating it to single-task GPs. For the case of one-dimensional input-space under optimal sampling with data only for the secondary task, the limitations of multi-task GP can be quantified explicitly. 1

3 0.59321344 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

Author: Ruslan Salakhutdinov

Abstract: Markov random fields (MRF’s), or undirected graphical models, provide a powerful framework for modeling complex dependencies among random variables. Maximum likelihood learning in MRF’s is hard due to the presence of the global normalizing constant. In this paper we consider a class of stochastic approximation algorithms of the Robbins-Monro type that use Markov chain Monte Carlo to do approximate maximum likelihood learning. We show that using MCMC operators based on tempered transitions enables the stochastic approximation algorithm to better explore highly multimodal distributions, which considerably improves parameter estimates in large, densely-connected MRF’s. Our results on MNIST and NORB datasets demonstrate that we can successfully learn good generative models of high-dimensional, richly structured data that perform well on digit and object recognition tasks.

4 0.5435558 39 nips-2009-Bayesian Belief Polarization

Author: Alan Jern, Kai-min Chang, Charles Kemp

Abstract: Empirical studies have documented cases of belief polarization, where two people with opposing prior beliefs both strengthen their beliefs after observing the same evidence. Belief polarization is frequently offered as evidence of human irrationality, but we demonstrate that this phenomenon is consistent with a fully Bayesian approach to belief revision. Simulation results indicate that belief polarization is not only possible but relatively common within the set of Bayesian models that we consider. Suppose that Carol has requested a promotion at her company and has received a score of 50 on an aptitude test. Alice, one of the company’s managers, began with a high opinion of Carol and became even more confident of her abilities after seeing her test score. Bob, another manager, began with a low opinion of Carol and became even less confident about her qualifications after seeing her score. On the surface, it may appear that either Alice or Bob is behaving irrationally, since the same piece of evidence has led them to update their beliefs about Carol in opposite directions. This situation is an example of belief polarization [1, 2], a widely studied phenomenon that is often taken as evidence of human irrationality [3, 4]. In some cases, however, belief polarization may appear much more sensible when all the relevant information is taken into account. Suppose, for instance, that Alice was familiar with the aptitude test and knew that it was scored out of 60, but that Bob was less familiar with the test and assumed that the score was a percentage. Even though only one interpretation of the score can be correct, Alice and Bob have both made rational inferences given their assumptions about the test. Some instances of belief polarization are almost certain to qualify as genuine departures from rational inference, but we argue in this paper that others will be entirely compatible with a rational approach. Distinguishing between these cases requires a precise normative standard against which human inferences can be compared. We suggest that Bayesian inference provides this normative standard, and present a set of Bayesian models that includes cases where polarization can and cannot emerge. Our work is in the spirit of previous studies that use careful rational analyses in order to illuminate apparently irrational human behavior (e.g. [5, 6, 7]). Previous studies of belief polarization have occasionally taken a Bayesian approach, but often the goal is to show how belief polarization can emerge as a consequence of approximate inference in a Bayesian model that is subject to memory constraints or processing limitations [8]. In contrast, we demonstrate that some examples of polarization are compatible with a fully Bayesian approach. Other formal accounts of belief polarization have relied on complex versions of utility theory [9], or have focused on continuous hypothesis spaces [10] unlike the discrete hypothesis spaces usually considered by psychological studies of belief polarization. We focus on discrete hypothesis spaces and require no additional machinery beyond the basics of Bayesian inference. We begin by introducing the belief revision phenomena considered in this paper and developing a Bayesian approach that clarifies whether and when these phenomena should be considered irrational. We then consider several Bayesian models that are capable of producing belief polarization and illustrate them with concrete examples. 
Having demonstrated that belief polarization is compatible with a Bayesian approach, we present simulations suggesting that this phenomenon is relatively generic within the space of models that we consider. We finish with some general comments on human rationality and normative models.

[Figure 1: Examples of belief updating behaviors for two individuals, A (solid line) and B (dashed line), plotting P(h1) before and after updating. Panel (a) shows contrary updating, with (i) divergence and (ii) convergence; panel (b) shows parallel updating. The individuals begin with different beliefs about hypothesis h1. After observing the same set of evidence, their beliefs may (a) move in opposite directions or (b) move in the same direction.]

1 Belief revision phenomena

The term “belief polarization” is generally used to describe situations in which two people observe the same evidence and update their respective beliefs in the directions of their priors. A study by Lord et al. [1] provides one classic example in which participants read about two studies, one of which concluded that the death penalty deters crime and another which concluded that the death penalty has no effect on crime. After exposure to this mixed evidence, supporters of the death penalty strengthened their support and opponents strengthened their opposition. We will treat belief polarization as a special case of contrary updating, a phenomenon where two people update their beliefs in opposite directions after observing the same evidence (Figure 1a). We distinguish between two types of contrary updating. Belief divergence refers to cases in which the person with the stronger belief in some hypothesis increases the strength of his or her belief and the person with the weaker belief in the hypothesis decreases the strength of his or her belief (Figure 1a(i)). Divergence therefore includes cases of traditional belief polarization. The opposite of divergence is belief convergence (Figure 1a(ii)), in which the person with the stronger belief decreases the strength of his or her belief and the person with the weaker belief increases the strength of his or her belief. Contrary updating may be contrasted with parallel updating (Figure 1b), in which the two people update their beliefs in the same direction. Throughout this paper, we consider only situations in which both people change their beliefs after observing some evidence. All such situations can be unambiguously classified as instances of parallel or contrary updating. Parallel updating is clearly compatible with a normative approach, but the normative status of divergence and convergence is less clear. Many authors argue that divergence is irrational, and many of the same authors also propose that convergence is rational [2, 3]. For example, Baron [3] writes that “Normatively, we might expect that beliefs move toward the middle of the range when people are presented with mixed evidence” (p. 210). The next section presents a formal analysis that challenges the conventional wisdom about these phenomena and clarifies the cases where they can be considered rational.

2 A Bayesian approach to belief revision

Since belief revision involves inference under uncertainty, Bayesian inference provides the appropriate normative standard. Consider a problem where two people observe data d that bear on some hypothesis h1. Let P1(·) and P2(·) be distributions that capture the two people’s respective beliefs.
Contrary updating occurs whenever one person’s belief in h1 increases and the other person’s belief in h1 decreases, or when

    [P1(h1 | d) − P1(h1)] [P2(h1 | d) − P2(h1)] < 0.    (1)

[Figure 2: (a) A simple Bayesian network that cannot produce either belief divergence or belief convergence. (b)–(h) All possible three-node Bayes nets over V, H, and D subject to the constraints described in the text. Networks in Family 1 can produce only parallel updating, but networks in Family 2 can produce both parallel and contrary updating.]

We will use Bayesian networks to capture the relationships between H, D, and any other variables that are relevant to the situation under consideration. For example, Figure 2a captures the idea that the data D are probabilistically generated from hypothesis H. The remaining networks in Figure 2 show several other ways in which D and H may be related, and will be discussed later. We assume that the two individuals agree on the variables that are relevant to a problem and agree about the relationships between these variables. We can formalize this idea by requiring that both people agree on the structure and the conditional probability distributions (CPDs) of a network N that captures relationships between the relevant variables, and that they differ only in the priors they assign to the root nodes of N. If N is the Bayes net in Figure 2a, then we assume that the two people must agree on the distribution P(D|H), although they may have different priors P1(H) and P2(H). If two people agree on network N but have different priors on the root nodes, we can create a single expanded Bayes net to simulate the inferences of both individuals. The expanded network is created by adding a background knowledge node B that sends directed edges to all root nodes in N, and acts as a switch that sets different root node priors for the two different individuals. Given this expanded network, distributions P1 and P2 in Equation 1 can be recovered by conditioning on the value of the background knowledge node and rewritten as

    [P(h1 | d, b1) − P(h1 | b1)] [P(h1 | d, b2) − P(h1 | b2)] < 0    (2)

where P(·) represents the probability distribution captured by the expanded network. Suppose that there are exactly two mutually exclusive hypotheses. For example, h1 and h0 might state that the death penalty does or does not deter crime. In this case Equation 2 implies that contrary updating occurs when

    [P(d | h1, b1) − P(d | h0, b1)] [P(d | h1, b2) − P(d | h0, b2)] < 0.    (3)

Equation 3 is derived in the supporting material, and leads immediately to the following result:

R1: If H is a binary variable and D and B are conditionally independent given H, then contrary updating is impossible.

Result R1 follows from the observation that if D and B are conditionally independent given H, then the product in Equation 3 is equal to (P(d | h1) − P(d | h0))^2, which cannot be less than zero.
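The paper defers the derivation of Equation 3 to its supporting material; the following sketch is ours rather than the authors’, and recovers the sign argument from Bayes’ rule alone, assuming only that H is binary:

```latex
P(h_1 \mid d, b) - P(h_1 \mid b)
  = \frac{P(h_1 \mid b)\left[P(d \mid h_1, b) - P(d \mid b)\right]}{P(d \mid b)}
  = \frac{P(h_1 \mid b)\,P(h_0 \mid b)\left[P(d \mid h_1, b) - P(d \mid h_0, b)\right]}{P(d \mid b)}
```

The second equality expands P(d | b) = P(d | h1, b) P(h1 | b) + P(d | h0, b) P(h0 | b). The prefactor is strictly positive whenever both hypotheses have nonzero prior probability, so each person’s update has the sign of P(d | h1, b) − P(d | h0, b), and the product in Equation 2 has the sign of the product in Equation 3. When D and B are conditionally independent given H, the conditioning on b drops out of both likelihood terms and the product collapses to the square noted above.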
R1 implies that the simple Bayes net in Figure 2a is incapable of producing contrary updating, an observation previously made by Lopes [11]. Our analysis may help to explain the common intuition that belief divergence is irrational, since many researchers seem to implicitly adopt a model in which H and D are the only relevant variables. Network 2a, however, is too simple to capture the causal relationships that are present in many real world situations. For example, the promotion example at the beginning of this paper is best captured using a network with an additional node that represents the grading scale for the aptitude test. Networks with many nodes may be needed for some real world problems, but here we explore the space of three-node networks. We restrict our attention to connected graphs in which D has no outgoing edges, motivated by the idea that the three variables should be linked and that the data are the final result of some generative process. The seven graphs that meet these conditions are shown in Figures 2b–h, where the additional variable has been labeled V. These Bayes nets illustrate cases in which (b) V is an additional piece of evidence that bears on H, (c) V informs the prior probability of H, (d)–(e) D is generated by an intervening variable V, (f) V is an additional generating factor of D, (g) V informs both the prior probability of H and the likelihood of D, and (h) H and D are both effects of V. The graphs in Figure 2 have been organized into two families. R1 implies that none of the graphs in Family 1 is capable of producing contrary updating. The next section demonstrates by example that all three of the graphs in Family 2 are capable of producing contrary updating. Table 1 compares the two families of Bayes nets to the informal conclusions about normative approaches that are often found in the psychological literature. As previously noted, the conventional wisdom holds that belief divergence is irrational but that convergence and parallel updating are both rational. Our analysis suggests that this position has little support. Depending on the causal structure of the problem under consideration, a rational approach should allow both divergence and convergence or neither.

                          Conventional wisdom   Family 1   Family 2
    Belief divergence             no               no         yes
    Belief convergence            yes              no         yes
    Parallel updating             yes              yes        yes

Table 1: The first column represents the conventional wisdom about which belief revision phenomena are normative. The models in the remaining columns include all three-node Bayes nets. This set of models can be partitioned into those that support both belief divergence and convergence (Family 2) and those that support neither (Family 1).

Although we focus in this paper on Bayes nets with no more than three nodes, the class of all network structures can be partitioned into those that can (Family 2) and cannot (Family 1) produce contrary updating. R1 is true for Bayes nets of any size and characterizes one group of networks that belong to Family 1. Networks where the data provide no information about the hypotheses must also fail to produce contrary updating. Note that if D and H are conditionally independent given B, then the left side of Equation 3 is equal to zero, meaning contrary updating cannot occur. We conjecture that all remaining networks can produce contrary updating if the cardinalities of the nodes and the CPDs are chosen appropriately. Future studies can attempt to verify this conjecture and to precisely characterize the CPDs that lead to contrary updating.

3 Examples of rational belief divergence

We now present four scenarios that can be modeled by the three-node Bayes nets in Family 2. Our purpose in developing these examples is to demonstrate that these networks can produce belief divergence and to provide some everyday examples in which this behavior is both normative and intuitive.

3.1 Example 1: Promotion

We first consider a scenario that can be captured by Bayes net 2f, in which the data depend on two independent factors.
Recall the scenario described at the beginning of this paper: Alice and Bob are responsible for deciding whether to promote Carol. For simplicity, we consider a case where the data represent a binary outcome—whether or not Carol’s résumé indicates that she is included in The Directory of Notable People—rather than her score on an aptitude test. Alice believes that The Directory is a reputable publication but Bob believes it is illegitimate. This situation is represented by the Bayes net and associated CPDs in Figure 3a. In the tables, the hypothesis space H = {‘Unqualified’ = 0, ‘Qualified’ = 1} represents whether or not Carol is qualified for the promotion, the additional factor V = {‘Disreputable’ = 0, ‘Reputable’ = 1} represents whether The Directory is a reputable publication, and the data variable D = {‘Not included’ = 0, ‘Included’ = 1} represents whether Carol is featured in it. The actual probabilities were chosen to reflect the fact that only an unqualified person is likely to pad their résumé by mentioning a disreputable publication, but that only a qualified person is likely to be included in The Directory if it is reputable.

[Figure 3: The Bayes nets and conditional probability distributions used in (a) Example 1: Promotion, (b) Example 2: Religious belief, (c) Example 3: Election polls, (d) Example 4: Political belief. Each panel pairs a network rooted at the background node B with CPD tables specifying Alice’s and Bob’s priors.]

Note that Alice and Bob agree on the conditional probability distribution for D, but assign different priors to V and H. Alice and Bob therefore interpret the meaning of Carol’s presence in The Directory differently, resulting in the belief divergence shown in Figure 4a. This scenario is one instance of a large number of belief divergence cases that can be attributed to two individuals possessing different mental models of how the observed evidence was generated. For instance, suppose now that Alice and Bob are both on an admissions committee and are evaluating a recommendation letter for an applicant. Although the letter is positive, it is not enthusiastic. Alice, who has less experience reading recommendation letters, interprets the letter as a strong endorsement. Bob, however, takes the lack of enthusiasm as an indication that the author has some misgivings [12]. As in the promotion scenario, the differences in Alice’s and Bob’s experience can be effectively represented by the priors they assign to the H and V nodes in a Bayes net of the form in Figure 2f.
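The divergence in this example can be reproduced by exact enumeration. The sketch below is ours, not the authors’ code: the CPD values are read off the garbled Figure 3 tables, and assigning P(V=1) = 0.9 to Alice and 0.01 to Bob is our inference from the narrative, since Alice is the one who trusts The Directory.

```python
from itertools import product

# Example 1 (net 2f: V -> D <- H, all binary).
P_D1 = {(0, 0): 0.5, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.9}  # P(D=1 | V, H)

def posterior_h1(p_v1, p_h1, d=1):
    """Exact P(H=1 | D=d) by enumerating the two latent parents V and H."""
    joint = {}
    for v, h in product([0, 1], repeat=2):
        p_v = p_v1 if v == 1 else 1 - p_v1
        p_h = p_h1 if h == 1 else 1 - p_h1
        p_d = P_D1[(v, h)] if d == 1 else 1 - P_D1[(v, h)]
        joint[(v, h)] = p_v * p_h * p_d
    total = sum(joint.values())
    return (joint[(0, 1)] + joint[(1, 1)]) / total

for name, p_v1, p_h1 in [("Alice", 0.9, 0.6), ("Bob", 0.01, 0.4)]:
    print(f"{name}: prior P(H=1) = {p_h1:.2f}, "
          f"posterior P(H=1 | D=1) = {posterior_h1(p_v1, p_h1):.3f}")
# Alice moves up (0.60 -> ~0.90) while Bob moves down (0.40 -> ~0.13):
# the same observation produces belief divergence, as in Figure 4a.
```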
3.2 Example 2: Religious belief

We now consider a scenario captured by Bayes net 2g. In our example for Bayes net 2f, the status of an additional factor V affected how Alice and Bob interpreted the data D, but did not shape their prior beliefs about H. In many cases, however, the additional factor V will influence both people’s prior beliefs about H as well as their interpretation of the relationship between D and H. Bayes net 2g captures this situation, and we provide a concrete example inspired by an experiment conducted by Batson [13]. Suppose that Alice believes in a “Christian universe”: she believes in the divinity of Jesus Christ and expects that followers of Christ will be persecuted. Bob, on the other hand, believes in a “secular universe.” This belief leads him to doubt Christ’s divinity, but to believe that if Christ were divine, his followers would likely be protected rather than persecuted. Now suppose that both Alice and Bob observe that Christians are, in fact, persecuted, and reassess the probability of Christ’s divinity. This situation is represented by the Bayes net and associated CPDs in Figure 3b. In the tables, the hypothesis space H = {‘Human’ = 0, ‘Divine’ = 1} represents the divinity of Jesus Christ, the additional factor V = {‘Secular’ = 0, ‘Christian’ = 1} represents the nature of the universe, and the data variable D = {‘Not persecuted’ = 0, ‘Persecuted’ = 1} represents whether Christians are subject to persecution. The exact probabilities were chosen to reflect the fact that, regardless of worldview, people will agree on a “base rate” of persecution given that Christ is not divine, but that more persecution is expected if the Christian worldview is correct than if the secular worldview is correct. Unlike in the previous scenario, Alice and Bob agree on the CPDs for both D and H, but differ in the priors they assign to V.

[Figure 4: Belief revision outcomes for (a) Example 1: Promotion, (b) Example 2: Religious belief, (c) Example 3: Election polls, and (d) Example 4: Political belief. In all four plots of P(H = 1), the updated beliefs for Alice (solid line) and Bob (dashed line) are computed after observing the data described in the text. The plots confirm that all four of our example networks can lead to belief divergence.]

As a result, Alice and Bob disagree about whether persecution supports or undermines a Christian worldview, which leads to the divergence shown in Figure 4b. This scenario is analogous to many real world situations in which one person has knowledge that the other does not. For instance, in a police interrogation, someone with little knowledge of the case (V) might take a suspect’s alibi (D) as strong evidence of their innocence (H). However, a detective with detailed knowledge of the case may assign a higher prior probability to the subject’s guilt based on other circumstantial evidence, and may also notice a detail in the suspect’s alibi that only the culprit would know, thus making the statement strong evidence of guilt. In all situations of this kind, although two people possess different background knowledge, their inferences are normative given that knowledge, consistent with the Bayes net in Figure 2g.
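The same enumeration works for net 2g, where V now also feeds the prior on H. Again a sketch of ours: the CPDs are read off the scrambled Figure 3 tables, and the prior order (Alice 0.9, Bob 0.1 on the Christian-universe value V=1) is inferred from the narrative.

```python
# Example 2 (net 2g: V -> H, V -> D, H -> D, all binary).
P_H1_GIVEN_V = {0: 0.1, 1: 0.9}                               # P(H=1 | V)
P_D1 = {(0, 0): 0.4, (0, 1): 0.01, (1, 0): 0.4, (1, 1): 0.6}  # P(D=1 | V, H)

def beliefs(p_v1):
    """Return (prior, posterior) on H=1 after observing persecution D=1."""
    prior = sum((p_v1 if v else 1 - p_v1) * P_H1_GIVEN_V[v] for v in (0, 1))
    joint = {(v, h): (p_v1 if v else 1 - p_v1)
                     * (P_H1_GIVEN_V[v] if h else 1 - P_H1_GIVEN_V[v])
                     * P_D1[(v, h)]
             for v in (0, 1) for h in (0, 1)}
    post = (joint[(0, 1)] + joint[(1, 1)]) / sum(joint.values())
    return prior, post

for name, p_v1 in [("Alice", 0.9), ("Bob", 0.1)]:
    print(name, beliefs(p_v1))
# Alice: 0.82 -> ~0.87 (up); Bob: 0.18 -> ~0.14 (down), matching Figure 4b.
```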
3.3 Example 3: Election polls

We now consider two qualitatively different cases that are both captured by Bayes net 2h. The networks considered so far have all included a direct link between H and D. In our next two examples, we consider cases where the hypotheses and observed data are not directly linked, but are coupled by means of one or more unobserved causal factors. Suppose that an upcoming election will be contested by two Republican candidates, Rogers and Rudolph, and two Democratic candidates, Davis and Daly. Alice and Bob disagree about the various candidates’ chances of winning, with Alice favoring the two Republicans and Bob favoring the two Democrats. Two polls were recently released, one indicating that Rogers was most likely to win the election and the other indicating that Daly was most likely to win. After considering these polls, they both assess the likelihood that a Republican will win the election. This situation is represented by the Bayes net and associated CPDs in Figure 3c. In the tables, the hypothesis space H = {‘Democrat wins’ = 0, ‘Republican wins’ = 1} represents the winning party, the variable V = {‘Rogers’ = 0, ‘Rudolph’ = 1, ‘Davis’ = 2, ‘Daly’ = 3} represents the winning candidate, and the data variables D1 = D2 = {‘Rogers’ = 0, ‘Rudolph’ = 1, ‘Davis’ = 2, ‘Daly’ = 3} represent the results of the two polls. The exact probabilities were chosen to reflect the fact that the polls are likely to reflect the truth with some noise, but whether a Democrat or Republican wins is completely determined by the winning candidate V. In Figure 3c, only a single D node is shown because D1 and D2 have identical CPDs. The resulting belief divergence is shown in Figure 4c. Note that in this scenario, Alice’s and Bob’s different priors cause them to discount the poll that disagrees with their existing beliefs as noise, thus causing their prior beliefs to be reinforced by the mixed data. This scenario was inspired by the death penalty study [1] alluded to earlier, in which a set of mixed results caused supporters and opponents of the death penalty to strengthen their existing beliefs. We do not claim that people’s behavior in this study can be explained with exactly the model employed here, but our analysis does show that selective interpretation of evidence is sometimes consistent with a rational approach.
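A short sketch of the poll scenario, again ours: the candidate priors, the poll noise model P(D | V), and the deterministic V-to-H map are our reconstruction of the scrambled Figure 3c tables, not values confirmed by the authors.

```python
# Example 3 (net 2h with 4-valued V and two conditionally independent polls).
NOISE = [[0.7, 0.1, 0.1, 0.1],   # P(D = col | V = row): each poll names the
         [0.1, 0.7, 0.1, 0.1],   # true winner w.p. 0.7, any other candidate
         [0.1, 0.1, 0.7, 0.1],   # w.p. 0.1 each
         [0.1, 0.1, 0.1, 0.7]]
REPUBLICAN = [1, 1, 0, 0]        # H = 1 iff V is Rogers (0) or Rudolph (1)

def p_republican(prior, polls):
    post = list(prior)
    for d in polls:                              # condition on each poll
        post = [p * NOISE[v][d] for v, p in enumerate(post)]
    z = sum(post)
    return sum(p for v, p in enumerate(post) if REPUBLICAN[v]) / z

alice = [0.6, 0.2, 0.1, 0.1]     # favors the Republican candidates
bob   = [0.1, 0.1, 0.2, 0.6]     # favors the Democratic candidates
polls = [0, 3]                   # one poll says Rogers, the other says Daly
print("Alice:", p_republican(alice, polls))  # 0.8 -> ~0.85 (up)
print("Bob  :", p_republican(bob, polls))    # 0.2 -> ~0.15 (down)
```

Each person effectively attributes the dissonant poll to noise, so the mixed pair of polls pushes the two posteriors apart, as in Figure 4c.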
3.4 Example 4: Political belief

We conclude with a second illustration of Bayes net 2h in which two people agree on the interpretation of an observed piece of evidence but disagree about the implications of that evidence. In this scenario, Alice and Bob are two economists with different philosophies about how the federal government should approach a major recession. Alice believes that the federal government should increase its own spending to stimulate economic activity; Bob believes that the government should decrease its spending and reduce taxes instead, providing taxpayers with more spending money. A new bill has just been proposed and an independent study found that the bill was likely to increase federal spending. Alice and Bob now assess the likelihood that this piece of legislation will improve the economic climate. This scenario can be modeled by the Bayes net and associated CPDs in Figure 3d. In the tables, the hypothesis space H = {‘Bad policy’ = 0, ‘Good policy’ = 1} represents whether the new bill is good for the economy and the data variable D = {‘No spending’ = 0, ‘Spending increase’ = 1} represents the conclusions of the independent study. Unlike in previous scenarios, we introduce two additional factors, V1 = {‘Fiscally conservative’ = 0, ‘Fiscally liberal’ = 1}, which represents the optimal economic philosophy, and V2 = {‘No spending’ = 0, ‘Spending increase’ = 1}, which represents the spending policy of the new bill. The exact probabilities in the tables were chosen to reflect the fact that if the bill does not increase spending, the policy it enacts may still be good for other reasons. A uniform prior was placed on V2 for both people, reflecting the fact that they have no prior expectations about the spending in the bill. However, the priors placed on V1 for Alice and Bob reflect their different beliefs about the best economic policy. The resulting belief divergence behavior is shown in Figure 4d. The model used in this scenario bears a strong resemblance to the probabilogical model of attitude change developed by McGuire [14] in which V1 and V2 might be logical “premises” that entail the “conclusion” H.

4 How common is contrary updating?

We have now described four concrete cases where belief divergence is captured by a normative approach. It is possible, however, that belief divergence is relatively rare within the Bayes nets of Family 2, and that our four examples are exotic special cases that depend on carefully selected CPDs. To rule out this possibility, we ran simulations to explore the space of all possible CPDs for the three networks in Family 2. We initially considered cases where H, D, and V were binary variables, and ran two simulations for each model. In one simulation, the priors and each row of each CPD were sampled from a symmetric Beta distribution with parameter 0.1, resulting in probabilities highly biased toward 0 and 1. In the second simulation, the probabilities were sampled from a uniform distribution. In each trial, a single set of CPDs was generated and then two different priors were generated for each root node in the graph to simulate two individuals, consistent with our assumption that two individuals may have different priors but must agree about the conditional probabilities. 20,000 trials were carried out in each simulation, and the proportion of trials that led to convergence and divergence was computed. Trials were only counted as instances of convergence or divergence if |P(H = 1 | D = 1) − P(H = 1)| > ε for both individuals, with ε = 1 × 10^−5. The results of these simulations are shown in Table 2. The supporting material proves that divergence and convergence are equally common, and therefore the percentages in the table show the frequencies for contrary updating of either type. Our primary question was whether contrary updating is rare or anomalous. In all but the third simulation, contrary updating constituted a substantial proportion of trials, suggesting that the phenomenon is relatively generic. We were also interested in whether this behavior relied on particular settings of the CPDs. The fact that percentages for the uniform distribution are approximately the same or greater than for the biased distribution indicates that contrary updating appears to be a relatively generic behavior for the Bayes nets we considered. More generally, these results directly challenge the suggestion that normative accounts are not suited for modeling belief divergence. The last two columns of Table 2 show results for two simulations with the same Bayes net, the only difference being whether V was treated as 2-valued (binary) or 4-valued. The 4-valued case is included because both Examples 3 and 4 considered multi-valued additional factor variables V.

               Net 2f   Net 2g   Net 2h (2-valued V)   Net 2h (4-valued V)
    Biased      9.6%    12.7%           0%                   23.3%
    Uniform    18.2%    16.0%           0%                   20.0%

Table 2: Simulation results. The percentages indicate the proportion of trials that produced contrary updating using the specified Bayes net (column) and probability distributions (row). The prior and conditional probabilities were either sampled from a Beta(0.1, 0.1) distribution (biased) or a Beta(1, 1) distribution (uniform). The probabilities for the simulation results shown in the last column were sampled from a Dirichlet([0.1, 0.1, 0.1, 0.1]) distribution (biased) or a Dirichlet([1, 1, 1, 1]) distribution (uniform).
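The simulation protocol is simple enough to reconstruct. The sketch below follows the description above for the binary net 2f only; it is our reconstruction rather than the authors’ code, so with a different random number generator the counts will only approximate, not reproduce, the Table 2 entries.

```python
import random

EPS = 1e-5

def posterior(p_v1, p_h1, p_d1):
    """P(H=1 | D=1) for net 2f (V -> D <- H), by enumeration."""
    num = den = 0.0
    for v in (0, 1):
        for h in (0, 1):
            p = (p_v1 if v else 1 - p_v1) * (p_h1 if h else 1 - p_h1) * p_d1[(v, h)]
            den += p
            if h == 1:
                num += p
    return num / den

def run(trials=20000, a=0.1, b=0.1, seed=0):
    rng = random.Random(seed)
    contrary = 0
    for _ in range(trials):
        # One shared set of CPDs per trial; each binary CPD row is a Bernoulli.
        p_d1 = {(v, h): rng.betavariate(a, b) for v in (0, 1) for h in (0, 1)}
        deltas = []
        for _person in range(2):   # two individuals: same CPDs, own root priors
            p_v1, p_h1 = rng.betavariate(a, b), rng.betavariate(a, b)
            deltas.append(posterior(p_v1, p_h1, p_d1) - p_h1)
        if all(abs(d) > EPS for d in deltas) and deltas[0] * deltas[1] < 0:
            contrary += 1                  # divergence or convergence
    return contrary / trials

print("biased :", run(a=0.1, b=0.1))   # Beta(0.1, 0.1) condition
print("uniform:", run(a=1.0, b=1.0))   # Beta(1, 1) condition
```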
In Example 4, we used two binary variables, but we could have equivalently used a single 4-valued variable. Belief convergence and divergence are not possible in the binary case, a result that is proved in the supporting material. We believe, however, that convergence and divergence are fairly common whenever V takes three or more values, and the simulation in the last column of the table confirms this claim for the 4-valued case. Given that belief divergence seems relatively common in the space of all Bayes nets, it is natural to explore whether cases of rational divergence are regularly encountered in the real world. One possible approach is to analyze a large database of networks that capture everyday belief revision problems, and to determine what proportion of networks lead to rational divergence. Future studies can explore this issue, but our simulations suggest that contrary updating is likely to arise in cases where it is necessary to move beyond a simple model like the one in Figure 2a and consider several causal factors.

5 Conclusion

This paper presented a family of Bayes nets that can account for belief divergence, a phenomenon that is typically considered to be incompatible with normative accounts. We provided four concrete examples that illustrate how this family of networks can capture a variety of settings where belief divergence can emerge from rational statistical inference. We also described a series of simulations that suggest that belief divergence is not only possible but relatively common within the family of networks that we considered. Our work suggests that belief polarization should not always be taken as evidence of irrationality, and that researchers who aim to document departures from rationality may wish to consider alternative phenomena instead. One such phenomenon might be called “inevitable belief reinforcement” and occurs when supporters of a hypothesis update their belief in the same direction for all possible data sets d. For example, a gambler will demonstrate inevitable belief reinforcement if he or she becomes increasingly convinced that a roulette wheel is biased towards red regardless of whether the next spin produces red, black, or green. This phenomenon is provably inconsistent with any fully Bayesian approach, and therefore provides strong evidence of irrationality. Although we propose that some instances of polarization are compatible with a Bayesian approach, we do not claim that human inferences are always or even mostly rational. We suggest, however, that characterizing normative behavior can require careful thought, and that formal analyses are invaluable for assessing the rationality of human inferences. In some cases, a formal analysis will provide an appropriate baseline for understanding how human inferences depart from rational norms. In other cases, a formal analysis will suggest that an apparently irrational inference makes sense once all of the relevant information is taken into account.

References

[1] C. G. Lord, L. Ross, and M. R. Lepper. Biased assimilation and attitude polarization: The effects of prior theories on subsequently considered evidence. Journal of Personality and Social Psychology, 37(1):2098–2109, 1979.
[2] L. Ross and M. R. Lepper. The perseverance of beliefs: Empirical and normative considerations.
In New directions for methodology of social and behavioral science: Fallible judgment in behavioral research. Jossey-Bass, San Francisco, 1980.
[3] J. Baron. Thinking and Deciding. Cambridge University Press, Cambridge, 4th edition, 2008.
[4] A. Gerber and D. Green. Misperceptions about perceptual bias. Annual Review of Political Science, 2:189–210, 1999.
[5] M. Oaksford and N. Chater. A rational analysis of the selection task as optimal data selection. Psychological Review, 101(4):608–631, 1994.
[6] U. Hahn and M. Oaksford. The rationality of informal argumentation: A Bayesian approach to reasoning fallacies. Psychological Review, 114(3):704–732, 2007.
[7] S. Sher and C. R. M. McKenzie. Framing effects and rationality. In N. Chater and M. Oaksford, editors, The probabilistic mind: Prospects for Bayesian cognitive science. Oxford University Press, Oxford, 2008.
[8] B. O’Connor. Biased evidence assimilation under bounded Bayesian rationality. Master’s thesis, Stanford University, 2006.
[9] A. Zimper and A. Ludwig. Attitude polarization. Technical report, Mannheim Research Institute for the Economics of Aging, 2007.
[10] A. K. Dixit and J. W. Weibull. Political polarization. Proceedings of the National Academy of Sciences, 104(18):7351–7356, 2007.
[11] L. L. Lopes. Averaging rules and adjustment processes in Bayesian inference. Bulletin of the Psychonomic Society, 23(6):509–512, 1985.
[12] A. Harris, A. Corner, and U. Hahn. “Damned by faint praise”: A Bayesian account. In Proceedings of the 31st Annual Conference of the Cognitive Science Society, Austin, TX, 2009. Cognitive Science Society.
[13] C. D. Batson. Rational processing or rationalization? The effect of disconfirming information on a stated religious belief. Journal of Personality and Social Psychology, 32(1):176–184, 1975.
[14] W. J. McGuire. The probabilogical model of cognitive structure and attitude change. In R. E. Petty, T. M. Ostrom, and T. C. Brock, editors, Cognitive Responses in Persuasion. Lawrence Erlbaum Associates, 1981.

5 0.54339606 10 nips-2009-A Gaussian Tree Approximation for Integer Least-Squares

Author: Jacob Goldberger, Amir Leshem

Abstract: This paper proposes a new algorithm for the linear least squares problem where the unknown variables are constrained to be in a finite set. The factor graph that corresponds to this problem is very loopy; in fact, it is a complete graph. Hence, applying the Belief Propagation (BP) algorithm yields very poor results. The algorithm described here is based on an optimal tree approximation of the Gaussian density of the unconstrained linear system. It is shown that even though the approximation is not directly applied to the exact discrete distribution, applying the BP algorithm to the modified factor graph outperforms current methods in terms of both performance and complexity. The improved performance of the proposed algorithm is demonstrated on the problem of MIMO detection.

6 0.46891674 235 nips-2009-Structural inference affects depth perception in the context of potential occlusion

7 0.44590607 188 nips-2009-Perceptual Multistability as Markov Chain Monte Carlo Inference

8 0.41608176 30 nips-2009-An Integer Projected Fixed Point Method for Graph Matching and MAP Inference

9 0.41446459 141 nips-2009-Local Rules for Global MAP: When Do They Work ?

10 0.40501526 103 nips-2009-Graph Zeta Function in the Bethe Free Energy and Loopy Belief Propagation

11 0.38165152 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

12 0.38080978 123 nips-2009-Large Scale Nonparametric Bayesian Inference: Data Parallelisation in the Indian Buffet Process

13 0.37367123 35 nips-2009-Approximating MAP by Compensating for Structural Relaxations

14 0.34132648 11 nips-2009-A General Projection Property for Distribution Families

15 0.34066975 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization

16 0.33264285 124 nips-2009-Lattice Regression

17 0.33199909 97 nips-2009-Free energy score space

18 0.3319979 228 nips-2009-Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing

19 0.32882488 31 nips-2009-An LP View of the M-best MAP problem

20 0.32112941 129 nips-2009-Learning a Small Mixture of Trees


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(21, 0.019), (24, 0.059), (25, 0.07), (35, 0.094), (36, 0.108), (39, 0.056), (58, 0.06), (61, 0.02), (66, 0.248), (71, 0.062), (81, 0.026), (86, 0.063), (91, 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.91559237 114 nips-2009-Indian Buffet Processes with Power-law Behavior

Author: Yee W. Teh, Dilan Gorur

Abstract: The Indian buffet process (IBP) is an exchangeable distribution over binary matrices used in Bayesian nonparametric featural models. In this paper we propose a three-parameter generalization of the IBP exhibiting power-law behavior. We achieve this by generalizing the beta process (the de Finetti measure of the IBP) to the stable-beta process and deriving the IBP corresponding to it. We find interesting relationships between the stable-beta process and the Pitman-Yor process (another stochastic process used in Bayesian nonparametric models with interesting power-law properties). We derive a stick-breaking construction for the stable-beta process, and find that our power-law IBP is a good model for word occurrences in document corpora. 1

2 0.83425486 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis

Author: Long Zhu, Yuanhao Chen, Bill Freeman, Antonio Torralba

Abstract: We present a nonparametric Bayesian method for texture learning and synthesis. A texture image is represented by a 2D Hidden Markov Model (2DHMM) where the hidden states correspond to the cluster labeling of textons and the transition matrix encodes their spatial layout (the compatibility between adjacent textons). The 2DHMM is coupled with the Hierarchical Dirichlet process (HDP) which allows the number of textons and the complexity of transition matrix grow as the input texture becomes irregular. The HDP makes use of Dirichlet process prior which favors regular textures by penalizing the model complexity. This framework (HDP-2DHMM) learns the texton vocabulary and their spatial layout jointly and automatically. The HDP-2DHMM results in a compact representation of textures which allows fast texture synthesis with comparable rendering quality over the state-of-the-art patch-based rendering methods. We also show that the HDP2DHMM can be applied to perform image segmentation and synthesis. The preliminary results suggest that HDP-2DHMM is generally useful for further applications in low-level vision problems. 1

same-paper 3 0.83094442 187 nips-2009-Particle-based Variational Inference for Continuous Systems

Author: Andrew Frank, Padhraic Smyth, Alexander T. Ihler

Abstract: Since the development of loopy belief propagation, there has been considerable work on advancing the state of the art for approximate inference over distributions defined on discrete random variables. Improvements include guarantees of convergence, approximations that are provably more accurate, and bounds on the results of exact inference. However, extending these methods to continuous-valued systems has lagged behind. While several methods have been developed to use belief propagation on systems with continuous values, recent advances for discrete variables have not as yet been incorporated. In this context we extend a recently proposed particle-based belief propagation algorithm to provide a general framework for adapting discrete message-passing algorithms to inference in continuous systems. The resulting algorithms behave similarly to their purely discrete counterparts, extending the benefits of these more advanced inference techniques to the continuous domain. 1

4 0.8120532 101 nips-2009-Generalization Errors and Learning Curves for Regression with Multi-task Gaussian Processes

Author: Kian M. Chai

Abstract: We provide some insights into how task correlations in multi-task Gaussian process (GP) regression affect the generalization error and the learning curve. We analyze the asymmetric two-tasks case, where a secondary task is to help the learning of a primary task. Within this setting, we give bounds on the generalization error and the learning curve of the primary task. Our approach admits intuitive understandings of the multi-task GP by relating it to single-task GPs. For the case of one-dimensional input-space under optimal sampling with data only for the secondary task, the limitations of multi-task GP can be quantified explicitly. 1

5 0.74276412 162 nips-2009-Neural Implementation of Hierarchical Bayesian Inference by Importance Sampling

Author: Lei Shi, Thomas L. Griffiths

Abstract: The goal of perception is to infer the hidden states in the hierarchical process by which sensory data are generated. Human behavior is consistent with the optimal statistical solution to this problem in many tasks, including cue combination and orientation detection. Understanding the neural mechanisms underlying this behavior is of particular importance, since probabilistic computations are notoriously challenging. Here we propose a simple mechanism for Bayesian inference which involves averaging over a few feature detection neurons which fire at a rate determined by their similarity to a sensory stimulus. This mechanism is based on a Monte Carlo method known as importance sampling, commonly used in computer science and statistics. Moreover, a simple extension to recursive importance sampling can be used to perform hierarchical Bayesian inference. We identify a scheme for implementing importance sampling with spiking neurons, and show that this scheme can account for human behavior in cue combination and the oblique effect. 1

6 0.66649979 174 nips-2009-Nonparametric Latent Feature Models for Link Prediction

7 0.66493571 123 nips-2009-Large Scale Nonparametric Bayesian Inference: Data Parallelisation in the Indian Buffet Process

8 0.66155845 29 nips-2009-An Infinite Factor Model Hierarchy Via a Noisy-Or Mechanism

9 0.65591967 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes

10 0.65120119 158 nips-2009-Multi-Label Prediction via Sparse Infinite CCA

11 0.64533848 40 nips-2009-Bayesian Nonparametric Models on Decomposable Graphs

12 0.64185154 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization

13 0.61998433 250 nips-2009-Training Factor Graphs with Reinforcement Learning for Efficient MAP Inference

14 0.61614925 131 nips-2009-Learning from Neighboring Strokes: Combining Appearance and Context for Multi-Domain Sketch Recognition

15 0.6159336 18 nips-2009-A Stochastic approximation method for inference in probabilistic graphical models

16 0.61523402 132 nips-2009-Learning in Markov Random Fields using Tempered Transitions

17 0.61348087 129 nips-2009-Learning a Small Mixture of Trees

18 0.6129685 41 nips-2009-Bayesian Source Localization with the Multivariate Laplace Prior

19 0.61232471 113 nips-2009-Improving Existing Fault Recovery Policies

20 0.61050344 97 nips-2009-Free energy score space