nips nips2008 nips2008-89 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tom Minka, John Winn
Abstract: Gates are a new notation for representing mixture models and context-sensitive independence in factor graphs. Factor graphs provide a natural representation for message-passing algorithms, such as expectation propagation. However, message passing in mixture models is not well captured by factor graphs unless the entire mixture is represented by one factor, because the message equations have a containment structure. Gates capture this containment structure graphically, allowing both the independences and the message-passing equations for a model to be readily visualized. Different variational approximations for mixture models can be understood as different ways of drawing the gates in a model. We present general equations for expectation propagation and variational message passing in the presence of gates. 1
Reference: text
sentIndex sentText sentNum sentScore
1 However, message passing in mixture models is not well captured by factor graphs unless the entire mixture is represented by one factor, because the message equations have a containment structure. [sent-5, score-0.68]
2 Different variational approximations for mixture models can be understood as different ways of drawing the gates in a model. [sent-7, score-0.498]
3 We present general equations for expectation propagation and variational message passing in the presence of gates. [sent-8, score-0.32]
4 Section 2 describes what a gate is and shows how it can be used to represent context-specific independencies in a number of example models. [sent-17, score-0.719]
5 Section 3 motivates the use of gates for inference and section 4 expands on this by showing how gates can be used within three standard inference algorithms: Expectation Propagation (EP), Variational Message Passing (VMP) and Gibbs sampling. [sent-18, score-0.868]
6 Section 5 shows how the placement of gates can trade off the cost and accuracy of inference. [sent-19, score-0.413]
7 Section 6 discusses the use of gates to implement inference algorithms. [sent-20, score-0.434]
8 Figure 1: Gate examples (a) The dashed rectangle indicates a gate containing a Gaussian factor, with selector variable c. [sent-23, score-0.986]
9 (b) Two gates with different key values used to construct a mixture of two Gaussians. [sent-24, score-0.489]
10 (c) When multiple gates share a selector variable, they can be drawn touching with the selector variable connected to only one of the gates. [sent-25, score-1.034]
11 (d) A mixture of N Gaussians constructed using both a gate and a plate. [sent-26, score-0.72]
12 2 The Gate A gate encloses part of a factor graph and switches it on or off depending on the state of a latent selector variable. [sent-28, score-1.033]
13 The gate is on when the selector variable has a particular value, called the key, and off for all other values. [sent-29, score-0.986]
14 A gate allows context-specific independencies to be made explicit in the graphical model: the dependencies represented by any factors inside the gate are present only in the context of the selector variable having the key value. [sent-30, score-1.959]
15 Mathematically, a gate represents raising the contained factors to the power zero if the gate is off, or one if it is on: (∏_i f_i(x))^δ(c=key), where c is the selector variable. [sent-31, score-1.72]
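As a rough illustration of this definition (not code from the paper), the sketch below evaluates a gate numerically as the product of its contained factors raised to the indicator exponent δ(c=key); the gaussian_pdf helper and the concrete numbers are assumptions made for the example.

```python
import math

def gaussian_pdf(x, mean, precision):
    """Gaussian density N(x; mean, precision^-1)."""
    return math.sqrt(precision / (2 * math.pi)) * math.exp(-0.5 * precision * (x - mean) ** 2)

def gate_value(factor_values, selector, key):
    """Product of the contained factors raised to the power delta(selector == key).

    When the gate is off the exponent is zero, so the gate contributes a factor
    of 1, i.e. it is switched out of the model."""
    exponent = 1.0 if selector == key else 0.0
    product = 1.0
    for f in factor_values:
        product *= f
    return product ** exponent

# Figure 1a: a gate containing one Gaussian factor, keyed on c = True.
x, m, p = 0.3, 0.0, 1.0
print(gate_value([gaussian_pdf(x, m, p)], selector=True, key=True))   # the Gaussian term
print(gate_value([gaussian_pdf(x, m, p)], selector=False, key=True))  # 1.0: gate is off
```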
16 In diagrams, a gate is denoted by a dashed box labelled with the value of key, with the selector variable connected to the box boundary. [sent-32, score-1.009]
17 Whilst the examples in this paper refer to factor graphs, gate notation can also be used in both directed Bayesian networks and undirected graphs. [sent-34, score-0.794]
18 This example represents the term N(x; m, p^−1)^δ(c=true) so that when c is true the gate is on and x has a Gaussian distribution with mean m and precision p. [sent-36, score-0.671]
19 Otherwise, the gate is off and x is uniformly distributed (since it is connected to nothing). [sent-37, score-0.694]
20 By using several gates with different key values, multiple components of a mixture can be represented. [sent-38, score-0.489]
21 Figure 1b shows how a mixture of two Gaussians can be represented using two gates with different key values, true and false. [sent-39, score-0.489]
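A hedged sketch of how summing the selector over its two key values recovers the usual two-component mixture of figure 1b; the helper names and the parameter values are illustrative assumptions, not code from the paper.

```python
import math

def gaussian_pdf(x, mean, precision):
    return math.sqrt(precision / (2 * math.pi)) * math.exp(-0.5 * precision * (x - mean) ** 2)

def mixture_of_two_gaussians(x, pi_true, m1, p1, m2, p2):
    """Marginal p(x): sum the selector c over its two key values (True, False).

    Each gate contributes its Gaussian factor only when c equals its key,
    so the sum over c gives the usual two-component mixture."""
    total = 0.0
    for c, weight in [(True, pi_true), (False, 1.0 - pi_true)]:
        gate_true = gaussian_pdf(x, m1, p1) if c is True else 1.0    # gate keyed on True
        gate_false = gaussian_pdf(x, m2, p2) if c is False else 1.0  # gate keyed on False
        total += weight * gate_true * gate_false
    return total

print(mixture_of_two_gaussians(0.5, pi_true=0.3, m1=0.0, p1=1.0, m2=3.0, p2=4.0))
```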
22 When multiple gates have the same selector variable but different key values, they can be drawn as in figure 1c, with the gate rectangles touching and the selector variable connected to only one of the gates. [sent-41, score-1.788]
23 Notice that in this example, an integer selector variable is used and the key values are the integers 1,2,3. [sent-42, score-0.342]
24 For large homogeneous mixtures, gates can be used in conjunction with plates [6]. [sent-43, score-0.431]
25 For example, figure 1d shows how a mixture of N Gaussians can be represented by placing the gate, Gaussian factor and mean/precision variables inside a plate, so that they are replicated N times. [sent-44, score-0.271]
26 To avoid ambiguities, gates cannot partially overlap, nor can a gate contain its own selector variable. [sent-46, score-1.343]
27 Figure 2: Examples of models which use gates (a) A line process where neighboring pixel intensities are independent if an edge exists between them. [sent-50, score-0.465]
28 The selector variable c encodes whether the linear dependency represented by the structure inside the gate is present or absent. [sent-52, score-1.084]
29 Such variables have the behaviour that, when the gate is off, they revert to having a default value of false or zero, depending on the variable type. [sent-54, score-0.783]
30 Mathematically, a variable inside a gate represents a Dirac delta when the gate is off: δ(x)^(1−δ(c=key)), where δ(x) is one only when x has its default value. [sent-55, score-1.514]
31 Figure 2b shows an example where variables are contained in gates – this example is described in the following section. [sent-56, score-0.452]
32 2.1 Examples of models with gates Figure 2a shows a line process from [7]. [sent-58, score-0.413]
33 The use of gates makes clear the assumption that two neighboring image pixels xi and xj have a dependency between their intensity values, unless there is an edge eij between them. [sent-59, score-0.551]
34 In this case the selector variable is connected only to the gate, as shown in the example of figure 2b. [sent-62, score-0.338]
35 The binary selector variable c switches on or off a linear model of the genetic variant’s contribution yn to the trait xn , across all individuals. [sent-65, score-0.45]
36 When the gate is off, yn reverts to the default value of 0 and so the trait is explained only by a Gaussian-distributed background model zn . [sent-66, score-0.779]
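A minimal generative sketch of this genetic-association example, assuming the trait is the sum of the gated variant contribution and the Gaussian background term; the genotype encoding, effect size, and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1000                               # individuals
g = rng.integers(0, 3, size=N)         # genotype: 0, 1 or 2 copies of the variant (assumed encoding)
c = bool(rng.random() < 0.5)           # binary selector: is the variant associated with the trait?

beta = 0.8                             # assumed effect size of the linear model inside the gate
y = beta * g if c else np.zeros(N)     # gate off => y_n reverts to its default value 0
z = rng.normal(0.0, 1.0, size=N)       # Gaussian-distributed background model z_n
x = y + z                              # observed trait x_n (assumed additive combination)

print("gate on:", c, "  trait variance:", round(float(x.var()), 3))
```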
37 3 How gates arise from message-passing on mixture models Factor graph notation arises naturally when describing message passing algorithms, such as the sum-product algorithm. [sent-68, score-0.744]
38 Similarly, the gate notation arises naturally when considering the behavior of message passing algorithms on mixture models. [sent-69, score-1.002]
39 For example, the update for q(mk ) can be interpreted as (message from prior)×(blurred message from f ). [sent-73, score-0.22]
40 The update for q(x) can be interpreted as (blurred message from m1 )×(blurred message from m2 ). [sent-74, score-0.421]
41 Blurring occurs whenever a message is sent from a factor having a random exponent to a factor without that exponent. [sent-75, score-0.401]
42 Hence, we use a graphical notation where a gate is a container, holding all the factors switched by the gate. [sent-77, score-0.813]
43 Graphically, the blurring operation then happens whenever a message leaves a gate. [sent-78, score-0.274]
44 Messages passed into a gate and within a gate are unchanged. [sent-79, score-1.342]
45 For example, EP on this model will blur the message from f to mk and from f to x, where “blurring” means a linear combination with the constant function 1, followed by KL-projection. [sent-81, score-0.374]
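A hedged sketch of the blurring operation for the simple discrete case, where the KL-projection reduces to renormalization; the choice to weight the combination by a probability p_on that the gate is on is my assumption, as are the function name and the example numbers.

```python
import numpy as np

def blur_discrete_message(message, p_on):
    """Blur a discrete message leaving a gate: mix it with the constant '1' function
    and renormalize (for a discrete distribution the KL-projection is just
    renormalization). p_on is the (assumed) weight given to the original message."""
    message = np.asarray(message, dtype=float)
    blurred = p_on * message + (1.0 - p_on) * np.ones_like(message)
    return blurred / blurred.sum()

msg = np.array([0.7, 0.2, 0.1])
print(blur_discrete_message(msg, p_on=0.9))   # close to the original message
print(blur_discrete_message(msg, p_on=0.1))   # close to uniform
```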
46 3.1 Why gates are not equivalent to ‘pick’ factors It is possible to rewrite this model so that the f factors do not have exponents, and therefore would not be in gates. [sent-83, score-0.561]
47 This is because the blurring effect caused by exponents operates in one direction only, while the blurring effect caused by intermediate factors is always bidirectional. [sent-85, score-0.238]
48 The pick factor will correctly blur the downward messages from (m1 , m2 ) to x. [sent-88, score-0.348]
49 However, the pick factor will also blur the message upward from x before it reaches the factor f , which is incorrect. [sent-89, score-0.459]
50 In this case, the message from x to f is not blurred, and the upward messages to (m1 , m2 ) are blurred, which is correct. [sent-91, score-0.403]
51 However, the downward messages from (m1 , m2 ) to f are blurred before reaching f , which is incorrect. [sent-92, score-0.272]
52 3.2 Variables inside gates Now consider an example where it is natural to consider a variable to be inside a gate. [sent-94, score-0.665]
53 The message from the gate to c can be interpreted as the evidence for the submodel containing f1 , f2 , and y. [sent-98, score-0.962]
4 Inference with gates In the previous section, we explained why the gate notation arises when performing message passing in some example mixture models. [sent-99, score-1.415]
55 In this section, we describe how gate notation can be generally incorporated into Variational Message Passing [10], Expectation Propagation [11] and Gibbs Sampling [7] to allow each of these algorithms to support context-specific independence. [sent-100, score-0.706]
56 Notice that VMP uses different messages to and from deterministic factors, that is, factors which have the form f_a(x_i, x_{a\i}) = δ(x_i − h(x_{a\i})), where x_i is the derived child variable. [sent-102, score-0.486]
57 When performing inference on models with gates, it is useful to employ a normalised form of gate model. [sent-107, score-0.73]
58 In this form, variables inside a gate have no links to factors outside the gate, and a variable outside a gate links to at most one factor inside the gate. [sent-108, score-1.899]
59 Both of these requirements can be achieved by splitting a variable into a copy inside and a copy outside the gate, connected by an equality factor inside the gate. [sent-109, score-0.436]
60 A factor inside a gate should not connect to the selector of the gate; it should be given the key value instead. [sent-110, score-1.143]
61 In addition, gates should be balanced by ensuring that if a variable links to a factor in a gate with selector variable c, the variable also links to factors in gates keyed on all other values of the selector variable c. [sent-111, score-0.49] [sent-119, score-1.953]
Table 1: Messages and evidence computations for EP and VMP. The top part of the table shows the variable-to-factor and factor-to-variable messages m_{i→a}(x_i) and m_{a→i}(x_i) for EP and for VMP (with separate rows for stochastic and deterministic factors); the bottom part shows the evidence quantities s_a. [sent-112, score-0.52] [sent-115, score-1.36]
64 The notation j ∈ a refers to all neighbors of the factor, j ≠ i is all neighbors except i, par is the parent factor of a derived variable, and ch is the child variable of a deterministic factor. [sent-116, score-0.252]
66 This can be achieved by connecting the variable to uniform factors in gates for any missing values of c. [sent-120, score-0.543]
67 After balancing, each gate is part of a gate block – a set of gates activated by different values of the same condition variable. [sent-121, score-1.791]
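A minimal data-structure sketch of gates, balancing, and gate blocks under the normalised form described above; the class and function names are hypothetical, not taken from the paper or from any library.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class Gate:
    selector: str                 # name of the selector variable c
    key: object                   # the gate is on when c == key
    factors: list = field(default_factory=list)
    variables: list = field(default_factory=list)

def balance(gates, selector_domains):
    """Group gates into gate blocks (one block per selector) and add empty gates,
    acting as uniform factors, for any key values not covered by an existing gate."""
    blocks = defaultdict(dict)
    for gate in gates:
        blocks[gate.selector][gate.key] = gate
    for selector, gates_by_key in blocks.items():
        for value in selector_domains[selector]:
            if value not in gates_by_key:
                gates_by_key[value] = Gate(selector, value)   # empty gate = uniform factor
    return blocks

gates = [Gate("c", True, factors=["Gaussian(x; m, 1/p)"])]
blocks = balance(gates, {"c": [True, False]})
print(sorted(blocks["c"]))   # [False, True]: the block now covers every value of c
```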
68 4.1 Variational Message Passing with gates VMP can be augmented to run on a gate model in normalised form, by changing only the messages out of the gate and by introducing messages from the gate to the selector variable. [sent-124, score-3.085]
69 Messages sent between nodes inside the gate and messages into the gate are unchanged from standard VMP. [sent-125, score-1.645]
70 The variational distributions for variables inside gates are implicitly conditioned on the gate selector, as at the end of section 3. [sent-126, score-1.237]
71 In the following, an individual gate is denoted g, its selector variable c and its key kg . [sent-127, score-1.107]
72 The messages out of a gate are modified as follows: • The message from a factor fa inside a gate g with selector c to a variable outside g is the usual VMP message, raised to the power mc→g (c = kg ), except in the following case. [sent-129, score-2.454]
73 • Where a variable xi is the child of a number of deterministic factors inside a gate block G with selector variable c, the variable is treated as derived and the message is a moment-matched average of the individual VMP messages. [sent-130, score-1.644]
74 Then the message to x_i is m_{G→i}(x_i) = proj[ ∑_{g∈G} m_{c→g}(c=k_g) m_{g→i}(x_i) ] (8), where m_{g→i}(x_i) is the usual VMP message from the unique parent factor in g and proj is a moment-matching projection onto the exponential family. [sent-131, score-0.824]
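A hedged sketch of the proj operation in equation (8) for Gaussian messages: moment-matching a weighted mixture of Gaussians onto a single Gaussian. Treating the messages as normalized Gaussians and the weights as the gate probabilities m_{c→g}(c=k_g), with the weights normalized inside the helper, are simplifying assumptions for the example.

```python
def moment_match_gaussian_mixture(weights, means, variances):
    """proj[.] for Gaussian messages: match the mean and variance of the weighted
    mixture sum_g w_g N(x; mu_g, v_g) with a single Gaussian."""
    z = sum(weights)
    w = [wi / z for wi in weights]
    mean = sum(wi * mu for wi, mu in zip(w, means))
    var = sum(wi * (v + mu ** 2) for wi, v, mu in zip(w, variances, means)) - mean ** 2
    return mean, var

# assumed gate weights m_{c->g}(c=k_g) and per-gate parent messages m_{g->i}
print(moment_match_gaussian_mixture([0.7, 0.3], [0.0, 2.0], [1.0, 0.5]))
```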
75 The message from a gate g to its selector variable c is a product of evidence messages from the contained nodes: m_{g→c}(c=k_g) = ∏_{a∈g} s_a ∏_{i∈g} s_i and m_{g→c}(c≠k_g) = 1, (9) where s_a and s_i are the VMP evidence messages from a factor and a variable, respectively (Table 1). [sent-132, score-2.129]
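A minimal sketch of equation (9) and of raising a message to a power, assuming the evidence contributions are available as plain numbers and that exponential-family messages are represented by their natural parameters; the helper names are mine.

```python
import math

def gate_to_selector_message(factor_evidence, variable_evidence):
    """m_{g->c}(c = k_g): product of the evidence messages s_a and s_i from the
    gate's factors and variables (accumulated in log space); for all other values
    of c the message is 1."""
    log_m = sum(math.log(s) for s in factor_evidence)
    log_m += sum(math.log(s) for s in variable_evidence)
    return math.exp(log_m)

def raise_gaussian_message_to_power(natural_params, power):
    """Raising an exponential-family message to a power scales its natural
    parameters; for a Gaussian these can be taken as (precision * mean, precision)."""
    eta1, eta2 = natural_params
    return (power * eta1, power * eta2)

print(gate_to_selector_message([0.5, 0.8], [1.0]))        # approx 0.4
print(raise_gaussian_message_to_power((2.0, 4.0), 0.7))   # (1.4, 2.8)
```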
76 Deterministic variables and factors send evidence messages of 1, except where a deterministic factor fa parents a variable xi outside g. [sent-134, score-0.709]
77 To allow for nested gates, we must also define an evidence message for a gate: s_g = (∏_{a∈g} s_a ∏_{i∈g} s_i)^{q(c=k_g)} (12). [sent-136, score-0.315]
4.2 Expectation Propagation with gates As with VMP, EP can support gate models in normalised form by making small modifications to the message-passing rules. [sent-137, score-1.217]
79 Once again, messages between nodes inside a gate are unchanged. [sent-138, score-0.95]
80 Recall that, following gate balancing, all gates are part of gate blocks. [sent-139, score-1.755]
81 In the following, an individual gate is denoted g, its selector variable c and its key kg . [sent-140, score-1.107]
82 The messages into a gate are as follows: • The message from a selector variable to each gate in a gate block G is the same. [sent-142, score-2.746]
83 It is the product of all messages into the variable excluding messages from gates in G. [sent-143, score-0.831]
84 • The message from a variable to each neighboring factor inside a gate block G is the same. [sent-144, score-1.188]
85 It is the product of all messages into the variable excluding messages from any factor in G. [sent-145, score-0.506]
86 Each gate computes an intermediate evidence-like quantity s_g defined as: s_g = (∏_{a∈g} s_a ∏_{i∈g} s_i) / (∏_{i∈nbrs(g)} s_{ig}), where s_{ig} = ∑_{x_i} m_{i→g}(x_i) m_{g→i}(x_i) (13) and m_{g→i} is the usual EP message to x_i from its (unique) neighboring factor in g. [sent-147, score-1.429]
87 • The message from a gate block G to its selector variable c is: m_{G→c}(c=k_g) = s_g / ∑_{g∈G} s_g (15). Finally, the evidence contribution of a gate block with selector c is: s_c = (∑_{g∈G} s_g) / (∏_{i∈nbrs(g)} ∑_{x_i} m_{i→g}(x_i) m_{G→i}(x_i)) (16). [sent-150, score-2.611]
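A hedged sketch of normalizing the per-gate quantities s_g into the gate-block-to-selector message of equation (15), done in log space for numerical stability; the example values are assumptions.

```python
import math

def gate_block_to_selector(log_s):
    """m_{G->c}: normalize the per-gate quantities s_g across the block, i.e. a
    softmax over their logs (log space avoids underflow for small evidences)."""
    m = max(log_s.values())
    unnorm = {key: math.exp(v - m) for key, v in log_s.items()}
    total = sum(unnorm.values())
    return {key: u / total for key, u in unnorm.items()}

# assumed log s_g values for a gate block keyed on c in {1, 2, 3}
print(gate_block_to_selector({1: -3.2, 2: -1.0, 3: -2.5}))
```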
88 4.3 Gibbs sampling with gates Gibbs sampling can easily extend to gates which contain only factors. [sent-151, score-0.826]
89 If the factor is in a gate that is currently off, replace it with a uniform distribution. [sent-158, score-0.759]
90 For a gate g with selector xi , the conditional distribution is proportional to s for the key value and 1 otherwise, where s is the product of all factors in g. [sent-159, score-1.111]
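A minimal sketch of a Gibbs step for a gate selector, assuming each candidate value's unnormalized weight is the product of the factor values in the gate keyed on that value (and 1 where no gate applies); the function name and the factor values are illustrative.

```python
import random

def sample_selector(domain, gates, rng=random):
    """One Gibbs step for a gate selector: each value's unnormalized weight is the
    product of the factor values in the gate keyed on that value (evaluated at the
    current state of the other variables); values with no gate get weight 1."""
    weights = []
    for value in domain:
        w = 1.0
        for key, factor_values in gates:
            if key == value:            # this gate is on for the candidate value
                for f in factor_values:
                    w *= f
        weights.append(w)
    r = rng.random() * sum(weights)
    for value, w in zip(domain, weights):
        r -= w
        if r <= 0.0:
            return value
    return domain[-1]

# assumed current factor values for gates keyed on c in {1, 2, 3}
print(sample_selector([1, 2, 3], [(1, [0.2]), (2, [0.9, 0.5]), (3, [0.1])]))
```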
91 5 Enlarging gates to increase approximation accuracy Gates induce a structured approximation as in [9], so by moving nodes inside or outside of gates, you can trade off inference accuracy versus cost. [sent-163, score-0.592]
92 Because one gate of a gate block is always on, any node (variable or factor) outside a gate block G can be equivalently placed inside each gate of G. [sent-164, score-2.895]
93 Their modification can be viewed as a gate enlargement (figure 3). [sent-167, score-0.671]
94 By enlarging the gate block to include unm , the blurring between the multiplication factor and unm is removed, increasing accuracy. [sent-168, score-1.003]
95 This comes at no additional cost since unm is only used by one gate and therefore only one message is needed per n and m. [sent-169, score-0.924]
96 Figure 3: Student-t mixture model using gates (a) Model from [13] (b) Structured approximation suggested by [14], which can be interpreted as enlarging the gate. [sent-178, score-0.512]
97 Gates are also used for computing the evidence for a model, by placing the entire model in a gate with binary selector variable b. [sent-181, score-1.05]
98 Similarly, gates are used for model comparison by placing each model in a different gate of a gate block. [sent-183, score-1.772]
99 The marginal over the selector gives the posterior distribution over models. [sent-184, score-0.259]
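A hedged sketch of model comparison via the selector marginal: the posterior over models is proportional to each model's prior times its evidence; the log-evidence values and the uniform-prior default are assumptions for the example.

```python
import math

def posterior_over_models(log_evidence, log_prior=None):
    """Marginal over the gate-block selector: p(model | data) is proportional to
    prior(model) * evidence(model), normalized across the models in the block."""
    if log_prior is None:
        log_prior = {k: 0.0 for k in log_evidence}   # uniform prior assumed
    scores = {k: log_evidence[k] + log_prior[k] for k in log_evidence}
    m = max(scores.values())
    unnorm = {k: math.exp(v - m) for k, v in scores.items()}
    total = sum(unnorm.values())
    return {k: u / total for k, u in unnorm.items()}

# assumed log-evidence values computed with each model placed in its own gate
print(posterior_over_models({"model_A": -102.3, "model_B": -100.1}))
```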
100 We have shown that gates are similarly effective both as a graphical modelling notation and as a construct within an inference algorithm. [sent-186, score-0.502]
wordName wordTfidf (topN-words)
[('gate', 0.671), ('gates', 0.413), ('selector', 0.259), ('message', 0.201), ('messages', 0.181), ('vmp', 0.168), ('mk', 0.144), ('xa', 0.138), ('mg', 0.117), ('inside', 0.098), ('kg', 0.094), ('fa', 0.094), ('factor', 0.088), ('xi', 0.08), ('factors', 0.074), ('blurred', 0.073), ('blurring', 0.073), ('proj', 0.072), ('sg', 0.067), ('sa', 0.065), ('trait', 0.063), ('variable', 0.056), ('mi', 0.053), ('unm', 0.052), ('ep', 0.05), ('mixture', 0.049), ('mj', 0.049), ('independencies', 0.048), ('evidence', 0.047), ('passing', 0.046), ('outside', 0.041), ('normalised', 0.038), ('neighboring', 0.038), ('genetic', 0.037), ('variational', 0.036), ('block', 0.036), ('nbrs', 0.036), ('notation', 0.035), ('graphical', 0.033), ('pick', 0.032), ('enlarging', 0.031), ('si', 0.03), ('blur', 0.029), ('deterministic', 0.029), ('child', 0.028), ('gibbs', 0.028), ('exp', 0.028), ('zn', 0.027), ('key', 0.027), ('log', 0.026), ('raising', 0.025), ('sent', 0.024), ('diagrams', 0.024), ('container', 0.024), ('containment', 0.024), ('contingent', 0.024), ('mpar', 0.024), ('submodel', 0.024), ('touching', 0.024), ('usable', 0.024), ('xch', 0.024), ('gn', 0.023), ('connected', 0.023), ('dependencies', 0.022), ('graphs', 0.022), ('propagation', 0.022), ('gamma', 0.021), ('links', 0.021), ('mc', 0.021), ('upward', 0.021), ('sig', 0.021), ('archambeau', 0.021), ('inference', 0.021), ('xj', 0.02), ('contained', 0.02), ('xn', 0.02), ('variables', 0.019), ('false', 0.019), ('winn', 0.019), ('structured', 0.019), ('interpreted', 0.019), ('sending', 0.018), ('exponents', 0.018), ('downward', 0.018), ('conjunction', 0.018), ('default', 0.018), ('gure', 0.017), ('ma', 0.017), ('microsoft', 0.017), ('placing', 0.017), ('graphically', 0.016), ('balancing', 0.016), ('minka', 0.016), ('bayesian', 0.016), ('parent', 0.016), ('copy', 0.016), ('associations', 0.015), ('switches', 0.015), ('expectation', 0.015), ('intensities', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999863 89 nips-2008-Gates
Author: Tom Minka, John Winn
Abstract: Gates are a new notation for representing mixture models and context-sensitive independence in factor graphs. Factor graphs provide a natural representation for message-passing algorithms, such as expectation propagation. However, message passing in mixture models is not well captured by factor graphs unless the entire mixture is represented by one factor, because the message equations have a containment structure. Gates capture this containment structure graphically, allowing both the independences and the message-passing equations for a model to be readily visualized. Different variational approximations for mixture models can be understood as different ways of drawing the gates in a model. We present general equations for expectation propagation and variational message passing in the presence of gates. 1
2 0.12502657 40 nips-2008-Bounds on marginal probability distributions
Author: Joris M. Mooij, Hilbert J. Kappen
Abstract: We propose a novel bound on single-variable marginal probability distributions in factor graphs with discrete variables. The bound is obtained by propagating local bounds (convex sets of probability distributions) over a subtree of the factor graph, rooted in the variable of interest. By construction, the method not only bounds the exact marginal probability distribution of a variable, but also its approximate Belief Propagation marginal (“belief”). Thus, apart from providing a practical means to calculate bounds on marginals, our contribution also lies in providing a better understanding of the error made by Belief Propagation. We show that our bound outperforms the state-of-the-art on some inference problems arising in medical diagnosis. 1
3 0.098025411 209 nips-2008-Short-Term Depression in VLSI Stochastic Synapse
Author: Peng Xu, Timothy K. Horiuchi, Pamela A. Abshire
Abstract: We report a compact realization of short-term depression (STD) in a VLSI stochastic synapse. The behavior of the circuit is based on a subtractive single release model of STD. Experimental results agree well with simulation and exhibit expected STD behavior: the transmitted spike train has negative autocorrelation and lower power spectral density at low frequencies which can remove redundancy in the input spike train, and the mean transmission probability is inversely proportional to the input spike rate which has been suggested as an automatic gain control mechanism in neural systems. The dynamic stochastic synapse could potentially be a powerful addition to existing deterministic VLSI spiking neural systems. 1
4 0.092522688 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations
Author: David Sontag, Amir Globerson, Tommi S. Jaakkola
Abstract: We propose a new class of consistency constraints for Linear Programming (LP) relaxations for finding the most probable (MAP) configuration in graphical models. Usual cluster-based LP relaxations enforce joint consistency on the beliefs of a cluster of variables, with computational cost increasing exponentially with the size of the clusters. By partitioning the state space of a cluster and enforcing consistency only across partitions, we obtain a class of constraints which, although less tight, are computationally feasible for large clusters. We show how to solve the cluster selection and partitioning problem monotonically in the dual LP, using the current beliefs to guide these choices. We obtain a dual message passing algorithm and apply it to protein design problems where the variables have large state spaces and the usual cluster-based relaxations are very costly. The resulting method solves many of these problems exactly, and significantly faster than a method that does not use partitioning. 1
5 0.09085983 50 nips-2008-Continuously-adaptive discretization for message-passing algorithms
Author: Michael Isard, John MacCormick, Kannan Achan
Abstract: Continuously-Adaptive Discretization for Message-Passing (CAD-MP) is a new message-passing algorithm for approximate inference. Most message-passing algorithms approximate continuous probability distributions using either: a family of continuous distributions such as the exponential family; a particle-set of discrete samples; or a fixed, uniform discretization. In contrast, CAD-MP uses a discretization that is (i) non-uniform, and (ii) adaptive to the structure of the marginal distributions. Non-uniformity allows CAD-MP to localize interesting features (such as sharp peaks) in the marginal belief distributions with time complexity that scales logarithmically with precision, as opposed to uniform discretization which scales at best linearly. We give a principled method for altering the non-uniform discretization according to information-based measures. CAD-MP is shown in experiments to estimate marginal beliefs much more precisely than competing approaches for the same computational expense. 1
6 0.080308184 46 nips-2008-Characterizing response behavior in multisensory perception with conflicting cues
7 0.067701034 235 nips-2008-The Infinite Hierarchical Factor Regression Model
8 0.065323435 223 nips-2008-Structure Learning in Human Sequential Decision-Making
9 0.049449503 105 nips-2008-Improving on Expectation Propagation
10 0.046908088 166 nips-2008-On the asymptotic equivalence between differential Hebbian and temporal difference learning using a local third factor
11 0.045744333 78 nips-2008-Exact Convex Confidence-Weighted Learning
12 0.043936457 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
13 0.04182085 128 nips-2008-Look Ma, No Hands: Analyzing the Monotonic Feature Abstraction for Text Classification
14 0.03786533 216 nips-2008-Sparse probabilistic projections
15 0.035710726 179 nips-2008-Phase transitions for high-dimensional joint support recovery
16 0.032820199 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
17 0.032481343 97 nips-2008-Hierarchical Fisher Kernels for Longitudinal Data
18 0.030240057 138 nips-2008-Modeling human function learning with Gaussian processes
19 0.029136892 201 nips-2008-Robust Near-Isometric Matching via Structured Learning of Graphical Models
20 0.028921505 150 nips-2008-Near-optimal Regret Bounds for Reinforcement Learning
topicId topicWeight
[(0, -0.101), (1, 0.019), (2, 0.014), (3, 0.019), (4, 0.02), (5, -0.088), (6, 0.014), (7, 0.02), (8, -0.003), (9, -0.048), (10, -0.008), (11, 0.075), (12, 0.084), (13, -0.045), (14, -0.09), (15, 0.024), (16, 0.064), (17, -0.048), (18, 0.015), (19, 0.004), (20, 0.086), (21, -0.065), (22, -0.145), (23, 0.005), (24, 0.021), (25, -0.088), (26, 0.077), (27, -0.057), (28, 0.017), (29, 0.02), (30, -0.152), (31, 0.014), (32, 0.007), (33, 0.023), (34, -0.197), (35, 0.065), (36, 0.011), (37, 0.037), (38, -0.037), (39, -0.079), (40, 0.037), (41, 0.108), (42, -0.135), (43, -0.016), (44, 0.046), (45, -0.113), (46, 0.098), (47, -0.076), (48, 0.075), (49, 0.071)]
simIndex simValue paperId paperTitle
same-paper 1 0.96004844 89 nips-2008-Gates
Author: Tom Minka, John Winn
Abstract: Gates are a new notation for representing mixture models and context-sensitive independence in factor graphs. Factor graphs provide a natural representation for message-passing algorithms, such as expectation propagation. However, message passing in mixture models is not well captured by factor graphs unless the entire mixture is represented by one factor, because the message equations have a containment structure. Gates capture this containment structure graphically, allowing both the independences and the message-passing equations for a model to be readily visualized. Different variational approximations for mixture models can be understood as different ways of drawing the gates in a model. We present general equations for expectation propagation and variational message passing in the presence of gates. 1
2 0.69590294 40 nips-2008-Bounds on marginal probability distributions
Author: Joris M. Mooij, Hilbert J. Kappen
Abstract: We propose a novel bound on single-variable marginal probability distributions in factor graphs with discrete variables. The bound is obtained by propagating local bounds (convex sets of probability distributions) over a subtree of the factor graph, rooted in the variable of interest. By construction, the method not only bounds the exact marginal probability distribution of a variable, but also its approximate Belief Propagation marginal (“belief”). Thus, apart from providing a practical means to calculate bounds on marginals, our contribution also lies in providing a better understanding of the error made by Belief Propagation. We show that our bound outperforms the state-of-the-art on some inference problems arising in medical diagnosis. 1
3 0.58005768 50 nips-2008-Continuously-adaptive discretization for message-passing algorithms
Author: Michael Isard, John MacCormick, Kannan Achan
Abstract: Continuously-Adaptive Discretization for Message-Passing (CAD-MP) is a new message-passing algorithm for approximate inference. Most message-passing algorithms approximate continuous probability distributions using either: a family of continuous distributions such as the exponential family; a particle-set of discrete samples; or a fixed, uniform discretization. In contrast, CAD-MP uses a discretization that is (i) non-uniform, and (ii) adaptive to the structure of the marginal distributions. Non-uniformity allows CAD-MP to localize interesting features (such as sharp peaks) in the marginal belief distributions with time complexity that scales logarithmically with precision, as opposed to uniform discretization which scales at best linearly. We give a principled method for altering the non-uniform discretization according to information-based measures. CAD-MP is shown in experiments to estimate marginal beliefs much more precisely than competing approaches for the same computational expense. 1
4 0.54611766 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations
Author: David Sontag, Amir Globerson, Tommi S. Jaakkola
Abstract: We propose a new class of consistency constraints for Linear Programming (LP) relaxations for finding the most probable (MAP) configuration in graphical models. Usual cluster-based LP relaxations enforce joint consistency on the beliefs of a cluster of variables, with computational cost increasing exponentially with the size of the clusters. By partitioning the state space of a cluster and enforcing consistency only across partitions, we obtain a class of constraints which, although less tight, are computationally feasible for large clusters. We show how to solve the cluster selection and partitioning problem monotonically in the dual LP, using the current beliefs to guide these choices. We obtain a dual message passing algorithm and apply it to protein design problems where the variables have large state spaces and the usual cluster-based relaxations are very costly. The resulting method solves many of these problems exactly, and significantly faster than a method that does not use partitioning. 1
5 0.50250608 46 nips-2008-Characterizing response behavior in multisensory perception with conflicting cues
Author: Rama Natarajan, Iain Murray, Ladan Shams, Richard S. Zemel
Abstract: We explore a recently proposed mixture model approach to understanding interactions between conflicting sensory cues. Alternative model formulations, differing in their sensory noise models and inference methods, are compared based on their fit to experimental data. Heavy-tailed sensory likelihoods yield a better description of the subjects’ response behavior than standard Gaussian noise models. We study the underlying cause for this result, and then present several testable predictions of these models. 1
6 0.40935779 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation
7 0.38117123 235 nips-2008-The Infinite Hierarchical Factor Regression Model
8 0.37143356 37 nips-2008-Biasing Approximate Dynamic Programming with a Lower Discount Factor
9 0.36612302 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
10 0.31532145 216 nips-2008-Sparse probabilistic projections
11 0.30171651 78 nips-2008-Exact Convex Confidence-Weighted Learning
12 0.29847455 222 nips-2008-Stress, noradrenaline, and realistic prediction of mouse behaviour using reinforcement learning
13 0.27480701 105 nips-2008-Improving on Expectation Propagation
15 0.26691976 223 nips-2008-Structure Learning in Human Sequential Decision-Making
16 0.26288301 204 nips-2008-Self-organization using synaptic plasticity
17 0.25931731 96 nips-2008-Hebbian Learning of Bayes Optimal Decisions
18 0.24981815 209 nips-2008-Short-Term Depression in VLSI Stochastic Synapse
19 0.24917139 178 nips-2008-Performance analysis for L\ 2 kernel classification
20 0.2478652 2 nips-2008-A Convex Upper Bound on the Log-Partition Function for Binary Distributions
topicId topicWeight
[(6, 0.035), (7, 0.051), (12, 0.019), (15, 0.011), (28, 0.135), (57, 0.075), (59, 0.024), (63, 0.016), (64, 0.011), (77, 0.072), (78, 0.021), (83, 0.023), (85, 0.391)]
simIndex simValue paperId paperTitle
same-paper 1 0.74578708 89 nips-2008-Gates
Author: Tom Minka, John Winn
Abstract: Gates are a new notation for representing mixture models and context-sensitive independence in factor graphs. Factor graphs provide a natural representation for message-passing algorithms, such as expectation propagation. However, message passing in mixture models is not well captured by factor graphs unless the entire mixture is represented by one factor, because the message equations have a containment structure. Gates capture this containment structure graphically, allowing both the independences and the message-passing equations for a model to be readily visualized. Different variational approximations for mixture models can be understood as different ways of drawing the gates in a model. We present general equations for expectation propagation and variational message passing in the presence of gates. 1
2 0.42658871 216 nips-2008-Sparse probabilistic projections
Author: Cédric Archambeau, Francis R. Bach
Abstract: We present a generative model for performing sparse probabilistic projections, which includes sparse principal component analysis and sparse canonical correlation analysis as special cases. Sparsity is enforced by means of automatic relevance determination or by imposing appropriate prior distributions, such as generalised hyperbolic distributions. We derive a variational Expectation-Maximisation algorithm for the estimation of the hyperparameters and show that our novel probabilistic approach compares favourably to existing techniques. We illustrate how the proposed method can be applied in the context of cryptoanalysis as a preprocessing tool for the construction of template attacks. 1
3 0.41302973 200 nips-2008-Robust Kernel Principal Component Analysis
Author: Minh H. Nguyen, Fernando Torre
Abstract: Kernel Principal Component Analysis (KPCA) is a popular generalization of linear PCA that allows non-linear feature extraction. In KPCA, data in the input space is mapped to higher (usually) dimensional feature space where the data can be linearly modeled. The feature space is typically induced implicitly by a kernel function, and linear PCA in the feature space is performed via the kernel trick. However, due to the implicitness of the feature space, some extensions of PCA such as robust PCA cannot be directly generalized to KPCA. This paper presents a technique to overcome this problem, and extends it to a unified framework for treating noise, missing data, and outliers in KPCA. Our method is based on a novel cost function to perform inference in KPCA. Extensive experiments, in both synthetic and real data, show that our algorithm outperforms existing methods. 1
4 0.40902963 118 nips-2008-Learning Transformational Invariants from Natural Movies
Author: Charles Cadieu, Bruno A. Olshausen
Abstract: We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. 1
5 0.40708816 138 nips-2008-Modeling human function learning with Gaussian processes
Author: Thomas L. Griffiths, Chris Lucas, Joseph Williams, Michael L. Kalish
Abstract: Accounts of how people learn functional relationships between continuous variables have tended to focus on two possibilities: that people are estimating explicit functions, or that they are performing associative learning supported by similarity. We provide a rational analysis of function learning, drawing on work on regression in machine learning and statistics. Using the equivalence of Bayesian linear regression and Gaussian processes, we show that learning explicit rules and using similarity can be seen as two views of one solution to this problem. We use this insight to define a Gaussian process model of human function learning that combines the strengths of both approaches. 1
6 0.40480199 31 nips-2008-Bayesian Exponential Family PCA
7 0.40437379 235 nips-2008-The Infinite Hierarchical Factor Regression Model
8 0.40382078 151 nips-2008-Non-parametric Regression Between Manifolds
9 0.40346494 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization
10 0.4024322 215 nips-2008-Sparse Signal Recovery Using Markov Random Fields
11 0.4020673 234 nips-2008-The Infinite Factorial Hidden Markov Model
12 0.40202022 231 nips-2008-Temporal Dynamics of Cognitive Control
13 0.40162465 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations
14 0.40113157 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
15 0.39992854 66 nips-2008-Dynamic visual attention: searching for coding length increments
16 0.39957377 30 nips-2008-Bayesian Experimental Design of Magnetic Resonance Imaging Sequences
17 0.39917213 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
18 0.39891315 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
19 0.39891243 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
20 0.39843851 87 nips-2008-Fitted Q-iteration by Advantage Weighted Regression