nips nips2000 nips2000-15 knowledge-graph by maker-knowledge-mining

15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation


Source: pdf

Author: Brendan J. Frey, Anitha Kannan

Abstract: One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. probability propagation algorithm), while ignoring the fact that the graph has cycles. The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions - including the sigmoid function - direct application of the sum-product algorithm is not possible. We introduce

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Accumulator networks: Suitors of local probability propagation Brendan J. [sent-1, score-0.247]

2 Abstract: One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. [sent-5, score-0.198]

3 probability propagation algorithm), while ignoring the fact that the graph has cycles. [sent-8, score-0.307]

4 The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions - including the sigmoid function - direct application of the sum-product algorithm is not possible. [sent-9, score-0.146]

5 In an accumulator network, the probability of a child given its parents is computed by accumulating the inputs from the parents in a Markov chain or more generally a tree. [sent-11, score-1.021]

6 After giving expressions for inference and learning in accumulator networks, we give results on the "bars problem" and on the problem of extracting translated, overlapping faces from an image. [sent-12, score-0.664]

7 1 Introduction Graphical probability models with hidden variables are capable of representing complex dependencies between variables, filling in missing data and making Bayes-optimal decisions using probabilistic inferences (Hinton and Sejnowski 1986; Pearl 1988; Neal 1992). [sent-13, score-0.158]

8 However, when the number of cycles in the network is large (more precisely, when the cut set size is exponential), exact inference becomes intractable. [sent-15, score-0.317]

9 To cope with the intractability of exact inference, a variety of approximate inference methods have been invented, including Monte Carlo (Hinton and Sejnowski 1986; Neal 1992), Helmholtz machines (Dayan et al. [sent-17, score-0.183]

10 The sum-product algorithm passes messages in both directions along the edges in a graphical model and fuses these messages at each vertex to compute an estimate of P(variable | obs), where obs is the assignment of the observed variables. [sent-26, score-0.774]

11 Figure 1: The sum-product algorithm passes messages in both directions along each edge in a Bayesian network. [sent-27, score-0.283]

12 (a) Incoming messages are fused to compute an estimate of P(y | observations). [sent-29, score-0.376]

13 (b) Messages are combined to produce an outgoing message π_k(y). [sent-30, score-0.241]

14 (c) Messages are combined to produce an outgoing message λ_j(x_j). [sent-31, score-0.241]

15 In a directed graphical model (Bayesian belief network), the message on an edge is a function of the parent of the edge. [sent-34, score-0.264]

16 The messages are initialized to 1 and then the variables are processed in some order or in parallel. [sent-35, score-0.344]

17 Each variable fuses incoming messages and produces outgoing messages, accounting for observations as described below. [sent-36, score-0.467]

18 If z_1, ..., z_K are the children of y, messages are fused at y to produce the function F(y) as follows (see Fig. [sent-43, score-0.422]

19 P(y | x_1, ..., x_J) is the conditional probability function associated with y. [sent-53, score-0.114]

20 If the graph is a tree and if messages are propagated from every variable in the network to y, as described below, the estimate is exact: F(y) = P(y, obs). [sent-54, score-0.482]

21 If the graph has cycles, this inference is approximate. [sent-56, score-0.149]

22 The message π_k(y) passed from y to z_k is computed as follows (see Fig. [sent-57, score-0.137]

23 The message λ_j(x_j) passed from y to x_j is computed as follows (see Fig. [sent-59, score-0.137]

24 π_k(y) = [Σ_{x_1, ..., x_J} P(y | x_1, ..., x_J) Π_j π_j(x_j)] Π_{i≠k} λ_i(y) (2); λ_j(x_j) = Σ_y Σ_{x_i : i≠j} P(y | x_1, ..., x_J) [Π_{i≠j} π_i(x_i)] [Π_k λ_k(y)] (3). Notice that x_j is not summed over and is excluded from the product of the π messages on the right. [sent-70, score-0.283]

25 If y is observed to have the value y*, the fused result at y and the outgoing π messages are modified as follows: F(y) ← F(y) if y = y* and 0 otherwise; π_k(y) ← π_k(y) if y = y* and 0 otherwise (4). The outgoing λ messages are computed as follows: λ_j(x_j) = Σ_{x_i : i≠j} P(y* | x_1, ..., x_J) [Π_{i≠j} π_i(x_i)] [Π_k λ_k(y*)] (5). [sent-71, score-0.859]

26 If the graph is a tree, these formulas can be derived quite easily using the fact that summations distribute over products. [sent-81, score-0.1]
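
As a concrete illustration of the message equations above, the following sketch fuses the incoming messages at a single node and produces the outgoing π and λ messages by brute-force enumeration over parent configurations. It is not the authors' code; the function and variable names are illustrative assumptions, and the observation clamp follows equation (4).

```python
import itertools
import numpy as np

def sum_product_node(cpt, pi_in, lam_in, y_obs=None):
    """Message updates at one node y of a Bayesian network (a sketch).

    cpt[x_1, ..., x_J, y] = P(y | x_1, ..., x_J)
    pi_in[j]  : incoming pi message from parent x_j (array over x_j's values)
    lam_in[k] : incoming lambda message from child z_k (array over y's values)
    Returns F(y), one outgoing pi message per child and one outgoing
    lambda message per parent.
    """
    *parent_card, y_card = cpt.shape
    J, K = len(pi_in), len(lam_in)

    # "Prediction" from the parents: sum_x P(y|x) prod_j pi_j(x_j)
    pred = np.zeros(y_card)
    for x in itertools.product(*(range(c) for c in parent_card)):
        pred += np.prod([pi_in[j][x[j]] for j in range(J)]) * cpt[x]

    lam_stack = np.stack(lam_in) if K else np.ones((0, y_card))
    lam_prod = lam_stack.prod(axis=0)

    # Observation clamp (eq. 4): zero out every value except the observed y*
    clamp = np.ones(y_card)
    if y_obs is not None:
        clamp = np.zeros(y_card)
        clamp[y_obs] = 1.0

    F = pred * lam_prod * clamp                     # fused estimate of P(y, obs)

    # eq. (2): the message to child z_k leaves out lambda_k
    pi_out = [pred * np.prod(np.delete(lam_stack, k, axis=0), axis=0) * clamp
              for k in range(K)]

    # eqs. (3)/(5): the message to parent x_j sums over y and the other parents
    child_evidence = lam_prod * clamp
    lam_out = []
    for j in range(J):
        msg = np.zeros(parent_card[j])
        for x in itertools.product(*(range(c) for c in parent_card)):
            w = np.prod([pi_in[i][x[i]] for i in range(J) if i != j])
            msg[x[j]] += w * np.dot(cpt[x], child_evidence)
        lam_out.append(msg)
    return F, pi_out, lam_out
```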

27 Figure 2: The local complexity of a richly connected directed graphical model such as the one in (a) can be simplified by assuming that the effects of a child's parents are accumulated by a low-complexity Markov chain, as shown in (b). [sent-85, score-0.301]

28 2 Accumulator networks The complexity of the local computations at a variable generally scales exponentially with the number of parents of the variable. [sent-87, score-0.208]

29 However, for certain types of conditional probability function P(y | x_1, ..., x_J), the local computations can be simplified. [sent-89, score-0.114]

30 In an "accumulator network" , the probability of a child given its parents is computed by accumulating the inputs from the parents in a Markov chain or more generally a tree. [sent-112, score-0.511]

31 (For simplicity, we use Markov chains in this paper. [sent-113, score-0.072]

32 Figures 2a and 2b show how a layered Bayesian network can be redrawn as an accumulator network. [sent-115, score-0.683]

33 Figure 2c shows the general form of accumulator network considered in this paper, which corresponds to a fully connected Bayes net on variables x_1, ..., x_N. [sent-118, score-0.672]

34 The accumulation variables for x_i are s_{i,1}, ..., s_{i,i-1}. [sent-125, score-0.293]

35 The effect of variable x_j on child x_i is accumulated by s_{i,j}. The joint distribution over the variables X = {x_i : i = 1, ..., N} [sent-128, score-0.201]

36 and the accumulation variables S = {s_{i,j} : i = 1, ..., N; j < i}. [sent-131, score-0.293]

37 If x_j is not a parent of x_i in the original network, we set P(s_{i,j} | x_j, s_{i,j-1}) = 1 if s_{i,j} = s_{i,j-1} and P(s_{i,j} | x_j, s_{i,j-1}) = 0 if s_{i,j} ≠ s_{i,j-1} (6). A well-known example of an accumulator network is the noisy-OR network (Pearl 1988; Neal 1992). [sent-138, score-0.75]

38 In this case, all variables are binary and we set P(s_{i,j} = 1 | x_j, s_{i,j-1}) = 1 if s_{i,j-1} = 1, ρ_{i,j} if x_j = 1 and s_{i,j-1} = 0, and 0 otherwise (7), where ρ_{i,j} is the probability that x_j = 1 turns on the OR-chain. [sent-139, score-0.127]
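
For concreteness, the sketch below builds the accumulation-chain tables P(s_{i,j} | x_j, s_{i,j-1}) for one binary child under the copy rule of equation (6) and the noisy-OR rule of equation (7). Storing the chain as a list of 2x2x2 tables and the function name are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def noisy_or_chain_tables(parent_mask, rho):
    """Accumulation-chain conditional tables for one binary child x_i (a sketch).

    parent_mask[j] : True if x_j is a parent of x_i in the original network
    rho[j]         : probability that x_j = 1 turns on the OR-chain
    Returns a list of tables cpt[x_j, s_prev, s_cur] = P(s_{i,j} | x_j, s_{i,j-1}).
    """
    tables = []
    for j, is_parent in enumerate(parent_mask):
        cpt = np.zeros((2, 2, 2))
        if not is_parent:
            cpt[:, 0, 0] = 1.0              # non-parents are skipped: chain state copied (eq. 6)
            cpt[:, 1, 1] = 1.0
        else:
            cpt[:, 1, 1] = 1.0              # a chain that is already on stays on
            cpt[0, 0, 0] = 1.0              # x_j = 0 cannot turn the chain on
            cpt[1, 0, 1] = rho[j]           # x_j = 1 turns it on with probability rho_j (eq. 7)
            cpt[1, 0, 0] = 1.0 - rho[j]
        tables.append(cpt)
    return tables

# The child x_i then copies the final chain state; parameters here are illustrative.
chains = noisy_or_chain_tables([True, False, True], rho=[0.9, 0.0, 0.7])
```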

39 Using an accumulation chain whose state space size equals the number of configurations of the parent variables, we can produce an accumulator network that can model the same joint distributions on x_1, ..., x_N as the original network. [sent-140, score-0.98]

40 Inference in an accumulator network is performed by passing messages as described above, either in parallel, at random, or in a regular fashion, such as up the accumulation chains, left to the variables, right to the accumulation chains and down the accumulation chains, iteratively. [sent-144, score-1.662]

41 Later, we give results for an accumulator network that extracts images of translated, overlapping faces from a visual scene. [sent-145, score-0.704]

42 The accumulation variables represent intensities of light rays at different depths in a layered 3-D scene. [sent-146, score-0.469]

43 Learning accumulator networks: To learn the conditional probability functions in an accumulator network, we apply the sum-product algorithm for each training case to compute sufficient statistics. [sent-148, score-0.66]

44 Following Russell and Norvig (1995), the sufficient statistic needed to update the conditional probability function P(s_{i,j} | x_j, s_{i,j-1}) for s_{i,j} in Fig. [sent-149, score-0.184]

45 (This approximation is exact if the graph is a tree.) [sent-153, score-0.114]

46 The sufficient statistics can be used for online learning or batch learning. [sent-154, score-0.07]

47 If batch learning is used, the sufficient statistics are averaged over the training set and then the conditional probability functions are modified. [sent-155, score-0.184]

48 In fact, the conditional probability function P(s_{i,j} | x_j, s_{i,j-1}) can be set equal to the normalized form of the average sufficient statistic, in which case learning performs approximate EM, where the E-step is approximated by the sum-product algorithm. [sent-156, score-0.19]
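
A minimal sketch of this batch update: the expected counts estimated by the approximate E-step are summed over the training set and normalized to give the new conditional probability table. The names in the commented loop (model, run_sum_product, training_set) are hypothetical, used purely for illustration.

```python
import numpy as np

def m_step(expected_counts, eps=1e-12):
    """Batch M-step for one accumulation-chain table (a sketch).

    expected_counts[x_j, s_prev, s_cur] is the sufficient statistic for
    (s_{i,j}, x_j, s_{i,j-1}), summed over the training set by the
    (approximate) sum-product E-step.
    Returns the new table P(s_{i,j} | x_j, s_{i,j-1}).
    """
    counts = expected_counts + eps                      # guard against empty conditioning cases
    return counts / counts.sum(axis=-1, keepdims=True)  # normalize over s_cur

# A typical batch loop (hypothetical interface):
# for epoch in range(num_epochs):
#     counts = {name: np.zeros_like(cpt) for name, cpt in model.tables.items()}
#     for image in training_set:
#         stats = run_sum_product(model, image)         # approximate E-step
#         for name in counts:
#             counts[name] += stats[name]
#     for name in counts:
#         model.tables[name] = m_step(counts[name])
```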

49 Figure 3a shows the network structure for the binary bars problem and Fig. [sent-158, score-0.183]

50 For an N x N binary image, the network has 3 layers of binary variables: 1 top-layer variable (meant to select orientation); 2N middle-layer variables (meant to select bars); and N^2 bottom-layer image variables. [sent-160, score-0.334]
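
To make the setup concrete, here is a small generator for binary bars data consistent with this three-layer structure. The bar probability and the equal prior over orientations are illustrative choices, not values reported in the paper.

```python
import numpy as np

def sample_bars_image(n=6, p_bar=0.3, rng=None):
    """Sample one image from the binary bars model (a sketch).

    A top-level variable picks horizontal or vertical orientation; each of the
    N bars of that orientation is switched on independently; image pixels are
    the OR of the bars covering them.
    """
    rng = np.random.default_rng() if rng is None else rng
    horizontal = rng.random() < 0.5          # top-layer orientation variable
    bars = rng.random(n) < p_bar             # which of the N bars of that orientation are active
    image = np.zeros((n, n), dtype=int)
    for i in np.flatnonzero(bars):
        if horizontal:
            image[i, :] = 1                  # horizontal bar fills row i
        else:
            image[:, i] = 1                  # vertical bar fills column i
    return image, horizontal, bars
```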

51 For large N, performing exact inference is computationally intractable and hence the need for approximate inference. [sent-161, score-0.183]

52 Accumulator networks enable efficient inference using probability propagation since local computations are made feasible. [sent-162, score-0.372]

53 The topology of the accumulator network can be easily tailored to the bars problem, as described above. [sent-163, score-0.693]

54 Given an accumulator network with the proper conditional probability tables, inference computes the probability of each bar and the probability of vertical versus horizontal orientation. [sent-164, score-0.946]

55 Figure 3: (a) Bayesian network for the bars problem. [sent-167, score-0.183]

56 (c) KL divergence between approximate inference and exact inference after each iteration, for an input image. [sent-169, score-0.416]

57 After each iteration of probability propagation, messages are fused to produce estimates of these probabilities. [sent-170, score-0.54]

58 Figure 3c shows the Kullback-Leibler divergence between these approximate probabilities and the exact probabilities after each iteration, for 5 input images. [sent-172, score-0.125]
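
A sketch of the monitoring quantity plotted in Fig. 3c: the KL divergence between the exact posterior marginals and the marginals estimated after each iteration, summed over the queried variables. On small networks the exact marginals would come from exhaustive enumeration; the function name and interface are assumptions.

```python
import numpy as np

def kl_of_marginals(exact, approx, eps=1e-12):
    """KL divergence between exact and approximate posterior marginals (a sketch).

    exact, approx : arrays of shape (num_variables, num_values), rows sum to 1.
    Returns the KL divergence summed over the queried variables.
    """
    p = np.clip(exact, eps, 1.0)
    q = np.clip(approx, eps, 1.0)
    return float(np.sum(p * (np.log(p) - np.log(q))))

# e.g. record kl_of_marginals(exact, approx_after_iteration_t) for t = 1, 2, ...
```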

59 In most cases, we found that probability propagation correctly infers the presence of appropriate bars and the overall orientation of the bars. [sent-174, score-0.359]

60 In cases of multiple interpretations of the image (e.g. [sent-180, score-0.095]

61 Fig. 3c, image 4), probability propagation tended to find appropriate interpretations, [sent-183, score-0.311]

62 although the divergence between the approximate and exact inferences is larger. [sent-184, score-0.156]

63 Starting with an accumulator network with random parameters, we trained the network as described above. [sent-193, score-0.712]

64 Accumulating light rays for layered vision: We give results on an accumulator network that extracts image components from scenes constructed from different types of overlapping faces at random positions. [sent-201, score-0.91]

65 Suppose we divide up a 3-D scene into L layers and assume that one of O objects can sit in each layer in one of P positions. [sent-202, score-0.372]

66 The total number of object-position combinations per layer is K = O x P. [sent-203, score-0.196]

67 For notational convenience, we assume that each object-position pair is a different object modeled by an opaqueness map (probability that each pixel is opaque) and an appearance map (intensity of each pixel). [sent-204, score-0.354]

68 We constrain the opaqueness and appearance maps of the same object in different positions to be the same, up to translation. [sent-205, score-0.341]
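
This translation constraint can be read as tying the K = O x P object-position maps to O base maps shifted to the P allowed positions. The sketch below builds such tied maps with a circular shift; flattened row-major storage and the wrap-around boundary are simplifying assumptions for illustration.

```python
import numpy as np

def translated_copies(base_appearance, base_opaqueness, shifts, width):
    """Build K = O x P object-position maps from O base maps (a sketch).

    base_appearance, base_opaqueness : lists of flattened per-pixel maps
    shifts : list of (dy, dx) pixel offsets, one per allowed position
    width  : image width used to reshape the flattened maps
    """
    apps, opas = [], []
    for w_map, rho_map in zip(base_appearance, base_opaqueness):
        for dy, dx in shifts:
            img_w = np.asarray(w_map).reshape(-1, width)
            img_r = np.asarray(rho_map).reshape(-1, width)
            # circular shift keeps the maps identical up to translation
            apps.append(np.roll(img_w, (dy, dx), axis=(0, 1)).ravel())
            opas.append(np.roll(img_r, (dy, dx), axis=(0, 1)).ravel())
    return np.stack(apps), np.stack(opas)
```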

69 Figure 5a shows the appearance maps of 4 such objects (the first one is a wall). [sent-207, score-0.232]

70 In our model, ρ_{kn} is the probability that the nth pixel of object k is opaque and w_{kn} is the intensity of the nth pixel for object k. [sent-208, score-0.795]

71 The input images are modeled by randomly picking an object in each of L layers, choosing whether each pixel in each layer is transparent or opaque, and accumulating light intensity by imaging the pixels through the layers, and then adding Gaussian noise. [sent-209, score-0.668]
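
A sketch of the generative process just described: pick an object for each of L layers, decide per-pixel opacity, accumulate the light ray from the back layer toward the camera, and add Gaussian noise. Treating index 0 as an empty (fully transparent) layer and the specific noise level are assumptions for illustration.

```python
import numpy as np

def generate_scene(appearance, opaqueness, num_layers=3, noise_std=0.05, rng=None):
    """Sample one image from the layered generative model (a sketch).

    appearance[k] : w_k, per-pixel intensity of object k (flattened, length N)
    opaqueness[k] : rho_k, per-pixel probability that object k is opaque
    Row 0 is treated as "no object", i.e. a fully transparent layer.
    """
    rng = np.random.default_rng() if rng is None else rng
    K, N = appearance.shape
    z = rng.integers(0, K, size=num_layers)        # object index z^l for each layer
    y = np.zeros(N)                                # light arriving from behind layer L
    for l in reversed(range(num_layers)):          # image from the back toward the camera
        k = z[l]
        if k == 0:
            continue                               # z^l = 0: the ray passes straight through
        opaque = rng.random(N) < opaqueness[k]     # per-pixel opacity decision
        y = np.where(opaque, appearance[k], y)     # opaque pixels replace the ray intensity
    return y + rng.normal(0.0, noise_std, size=N), z
```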

72 z^l ∈ {0, 1, ..., K} is the index. Figure 5: (a) Learned appearance maps for a wall (all pixels dark and nearly opaque) and 3 faces. [sent-215, score-0.224]

73 (b) An image produced by combining the maps in (a) and adding noise. [sent-216, score-0.168]

74 The brightness of a pixel in the kth picture corresponds to the probability that the pixel is imaged by object k. [sent-218, score-0.47]

75 of the object in the lth layer, where layer 1 is adjacent to the camera and layer L is farthest from the camera. [sent-219, score-0.526]

76 y^l_n is the accumulated discrete intensity of the light ray for pixel n at layer l. [sent-220, score-0.571]

77 y^l_n depends on the identity of the object in the current layer, z^l, and the intensity of pixel n in the previous layer, y^{l+1}_n. [sent-221, score-0.835]

78 So, y^l_n = y^{l+1}_n if z^l = 0; if z^l > 0, y^l_n = w_{z^l,n} with probability ρ_{z^l,n} (the pixel is opaque) and y^l_n = y^{l+1}_n otherwise (the pixel is transparent). [sent-222, score-0.26]

79 Each condition corresponds to a different imaging operation at layer l for the light ray corresponding to pixel n. [sent-231, score-0.498]

80 x_n is the discretized intensity of pixel n, obtained from the light ray arriving at the camera, y^1_n. [sent-232, score-0.37]

81 After training the network on 200 labeled images, we applied iterative inference to identify and locate image components. [sent-234, score-0.254]

82 After each iteration, the message passed from y^l_n to z^l is an estimate of the probability that the light ray for pixel n is imaged by object z^l at layer l (i. [sent-235, score-1.097]

83 So, for each object at each layer, we have an n-pixel "probabilistic segmentation map". [sent-238, score-0.255]

84 In Fig. 5c we show the 4 maps in layer 1 corresponding to the objects shown in Fig. [sent-240, score-0.371]

85 One such set of segmentation maps can be drawn for each layer. [sent-242, score-0.225]

86 For deeper layers, the maps hopefully segment the part of the scene that sits behind the objects in the shallower layers. [sent-243, score-0.24]

87 Figure 7a shows the sets of segmentation maps corresponding to different layers, after each iteration of probability propagation, for the input image shown on the far right. [sent-245, score-0.407]

88 Figure 6: An accumulator network for layered vision. After 1 iteration, the segmentation in the first layer is quite poor, causing uncertain segmentation [sent-246, score-0.899]

89 in deeper layers (except for the wall, which is mostly segmented properly in layer 2). [sent-247, score-0.296]

90 As the number of iterations increases, the algorithm converges to the correct segmentation, where object 2 is in front, followed by objects 3, 4 and 1 (the wall). [sent-248, score-0.242]

91 Note from Fig. 7a that another possible depth ordering is object 2 in front, followed by objects 4, 3 and 1, i. [sent-250, score-0.233]

92 We added an extremely large amount of noise to the image used above, to see what the algorithm would do when the two depth orders really are equally likely. [sent-254, score-0.092]

93 Figure 7b shows the noisy image and the series of segmentation maps produced at each layer as the number of iterations increases. [sent-256, score-0.522]

94 The segmentation maps for layer 1 show that object 2 is correctly identified as being in the front. [sent-257, score-0.555]

95 Quite surprisingly, the segmentation maps in layer 2 oscillate between the two plausible interpretations of the scene - object 3 in front of object 4 and object 4 in front of object 3. [sent-258, score-1.137]

96 Figure 7: (a) Probabilistic segmentation maps for each layer (column) after each iteration (row) of probability propagation. Reference: Proceedings of the 34th Allerton Conference on Communication, Control and Computing, 1996, University of Illinois at Urbana. [sent-332, score-0.539]

97 A revolution: Belief propagation in graphs with cycles. [sent-344, score-0.213]

98 Turbo decoding as an instance of Pearl's belief propagation algorithm. [sent-369, score-0.243]

99 Loopy belief propagation for approximate inference: An empirical study. [sent-376, score-0.283]

100 Correctness of belief propagation in Gaussian graphical models of arbitrary topology. [sent-391, score-0.312]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('accumulator', 0.51), ('messages', 0.283), ('accumulation', 0.232), ('lill', 0.232), ('layer', 0.196), ('propagation', 0.181), ('xj', 0.163), ('lc', 0.135), ('object', 0.134), ('parents', 0.134), ('zl', 0.13), ('segmentation', 0.121), ('pixel', 0.117), ('frey', 0.11), ('maps', 0.104), ('network', 0.101), ('outgoing', 0.1), ('message', 0.095), ('fused', 0.093), ('obs', 0.093), ('opaque', 0.093), ('ray', 0.093), ('ixj', 0.09), ('inference', 0.089), ('bars', 0.082), ('chains', 0.072), ('layered', 0.072), ('il', 0.071), ('pearl', 0.071), ('objects', 0.071), ('layers', 0.07), ('xor', 0.07), ('graphical', 0.069), ('accumulating', 0.067), ('probability', 0.066), ('image', 0.064), ('wall', 0.063), ('intensity', 0.062), ('belief', 0.062), ('variables', 0.061), ('graph', 0.06), ('light', 0.058), ('appearance', 0.057), ('child', 0.057), ('front', 0.057), ('lxi', 0.054), ('exact', 0.054), ('chain', 0.053), ('iteration', 0.052), ('conditional', 0.048), ('fuses', 0.046), ('norvig', 0.046), ('opaqueness', 0.046), ('rays', 0.046), ('ynj', 0.046), ('produce', 0.046), ('xl', 0.045), ('yl', 0.045), ('accumulated', 0.045), ('cycles', 0.045), ('neal', 0.045), ('jordan', 0.043), ('ak', 0.042), ('passed', 0.042), ('weiss', 0.041), ('arriving', 0.04), ('summations', 0.04), ('approximate', 0.04), ('variable', 0.038), ('parent', 0.038), ('iterations', 0.037), ('sufficient', 0.036), ('computations', 0.036), ('imaged', 0.036), ('kschischang', 0.036), ('mceliece', 0.036), ('nth', 0.036), ('mackay', 0.035), ('scene', 0.035), ('aj', 0.034), ('imaging', 0.034), ('batch', 0.034), ('faces', 0.034), ('statistic', 0.034), ('zk', 0.034), ('graphs', 0.032), ('interpretations', 0.031), ('inferences', 0.031), ('horizontal', 0.031), ('divergence', 0.031), ('overlapping', 0.031), ('hinton', 0.03), ('orientation', 0.03), ('translated', 0.03), ('barber', 0.03), ('deeper', 0.03), ('exponential', 0.03), ('cut', 0.028), ('depth', 0.028), ('extracts', 0.028)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000008 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation

Author: Brendan J. Frey, Anitha Kannan

Abstract: One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. probability propagation algorithm), while ignoring the fact that the graph has cycles. The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions - including the sigmoid function - direct application of the sum-product algorithm is not possible. We introduce

2 0.15800074 62 nips-2000-Generalized Belief Propagation

Author: Jonathan S. Yedidia, William T. Freeman, Yair Weiss

Abstract: Belief propagation (BP) was only supposed to work for tree-like networks but works surprisingly well in many applications involving networks with loops, including turbo codes. However, there has been little understanding of the algorithm or the nature of the solutions it finds for general graphs. We show that BP can only converge to a stationary point of an approximate free energy, known as the Bethe free energy in statistical physics. This result characterizes BP fixed-points and makes connections with variational approaches to approximate inference. More importantly, our analysis lets us build on the progress made in statistical physics since Bethe's approximation was introduced in 1935. Kikuchi and others have shown how to construct more accurate free energy approximations, of which Bethe's approximation is the simplest. Exploiting the insights from our analysis, we derive generalized belief propagation (GBP) versions ofthese Kikuchi approximations. These new message passing algorithms can be significantly more accurate than ordinary BP, at an adjustable increase in complexity. We illustrate such a new GBP algorithm on a grid Markov network and show that it gives much more accurate marginal probabilities than those found using ordinary BP. 1

3 0.1427207 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks

Author: Brendan J. Frey, Relu Patrascu, Tommi Jaakkola, Jodi Moran

Abstract: An important class of problems can be cast as inference in noisyOR Bayesian networks, where the binary state of each variable is a logical OR of noisy versions of the states of the variable's parents. For example, in medical diagnosis, the presence of a symptom can be expressed as a noisy-OR of the diseases that may cause the symptom - on some occasions, a disease may fail to activate the symptom. Inference in richly-connected noisy-OR networks is intractable, but approximate methods (e .g., variational techniques) are showing increasing promise as practical solutions. One problem with most approximations is that they tend to concentrate on a relatively small number of modes in the true posterior, ignoring other plausible configurations of the hidden variables. We introduce a new sequential variational method for bipartite noisyOR networks, that favors including all modes of the true posterior and models the posterior distribution as a tree. We compare this method with other approximations using an ensemble of networks with network statistics that are comparable to the QMR-DT medical diagnostic network. 1 Inclusive variational approximations Approximate algorithms for probabilistic inference are gaining in popularity and are now even being incorporated into VLSI hardware (T. Richardson, personal communication). Approximate methods include variational techniques (Ghahramani and Jordan 1997; Saul et al. 1996; Frey and Hinton 1999; Jordan et al. 1999), local probability propagation (Gallager 1963; Pearl 1988; Frey 1998; MacKay 1999a; Freeman and Weiss 2001) and Markov chain Monte Carlo (Neal 1993; MacKay 1999b). Many algorithms have been proposed in each of these classes. One problem that most of the above algorithms suffer from is a tendency to concentrate on a relatively small number of modes of the target distribution (the distribution being approximated). In the case of medical diagnosis, different modes correspond to different explanations of the symptoms. Markov chain Monte Carlo methods are usually guaranteed to eventually sample from all the modes, but this may take an extremely long time, even when tempered transitions (Neal 1996) are (a) ,,

4 0.12327941 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

Author: Zoubin Ghahramani, Matthew J. Beal

Abstract: Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. 1

5 0.11121 103 nips-2000-Probabilistic Semantic Video Indexing

Author: Milind R. Naphade, Igor Kozintsev, Thomas S. Huang

Abstract: We propose a novel probabilistic framework for semantic video indexing. We define probabilistic multimedia objects (multijects) to map low-level media features to high-level semantic labels. A graphical network of such multijects (multinet) captures scene context by discovering intra-frame as well as inter-frame dependency relations between the concepts. The main contribution is a novel application of a factor graph framework to model this network. We model relations between semantic concepts in terms of their co-occurrence as well as the temporal dependencies between these concepts within video shots. Using the sum-product algorithm [1] for approximate or exact inference in these factor graph multinets, we attempt to correct errors made during isolated concept detection by forcing high-level constraints. This results in a significant improvement in the overall detection performance. 1

6 0.10550999 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach

7 0.10420898 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles

8 0.091696553 114 nips-2000-Second Order Approximations for Probability Models

9 0.083226994 47 nips-2000-Error-correcting Codes on a Bethe-like Lattice

10 0.076337717 135 nips-2000-The Manhattan World Assumption: Regularities in Scene Statistics which Enable Bayesian Inference

11 0.07601691 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition

12 0.069698729 14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks

13 0.066711903 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks

14 0.065547608 79 nips-2000-Learning Segmentation by Random Walks

15 0.064329207 118 nips-2000-Smart Vision Chip Fabricated Using Three Dimensional Integration Technology

16 0.063502371 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics

17 0.063318767 13 nips-2000-A Tighter Bound for Graphical Models

18 0.06076825 10 nips-2000-A Productive, Systematic Framework for the Representation of Visual Structure

19 0.060459688 82 nips-2000-Learning and Tracking Cyclic Human Motion

20 0.059319593 2 nips-2000-A Comparison of Image Processing Techniques for Visual Speech Recognition Applications


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.203), (1, -0.069), (2, 0.148), (3, -0.012), (4, 0.162), (5, 0.0), (6, 0.156), (7, 0.002), (8, 0.018), (9, 0.194), (10, 0.073), (11, 0.063), (12, -0.119), (13, 0.012), (14, -0.019), (15, -0.005), (16, -0.083), (17, 0.225), (18, -0.0), (19, 0.063), (20, -0.061), (21, 0.127), (22, -0.093), (23, 0.081), (24, 0.053), (25, 0.045), (26, -0.01), (27, -0.085), (28, -0.083), (29, 0.07), (30, -0.062), (31, 0.016), (32, 0.104), (33, -0.192), (34, -0.118), (35, -0.08), (36, 0.046), (37, 0.019), (38, -0.009), (39, 0.057), (40, 0.161), (41, 0.047), (42, -0.024), (43, 0.008), (44, 0.051), (45, 0.007), (46, -0.055), (47, 0.075), (48, -0.088), (49, 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95456457 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation

Author: Brendan J. Frey, Anitha Kannan

Abstract: One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. probability propagation algorithm), while ignoring the fact that the graph has cycles. The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions - including the sigmoid function - direct application of the sum-product algorithm is not possible. We introduce

2 0.6412189 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks

Author: Brendan J. Frey, Relu Patrascu, Tommi Jaakkola, Jodi Moran

Abstract: An important class of problems can be cast as inference in noisyOR Bayesian networks, where the binary state of each variable is a logical OR of noisy versions of the states of the variable's parents. For example, in medical diagnosis, the presence of a symptom can be expressed as a noisy-OR of the diseases that may cause the symptom - on some occasions, a disease may fail to activate the symptom. Inference in richly-connected noisy-OR networks is intractable, but approximate methods (e .g., variational techniques) are showing increasing promise as practical solutions. One problem with most approximations is that they tend to concentrate on a relatively small number of modes in the true posterior, ignoring other plausible configurations of the hidden variables. We introduce a new sequential variational method for bipartite noisyOR networks, that favors including all modes of the true posterior and models the posterior distribution as a tree. We compare this method with other approximations using an ensemble of networks with network statistics that are comparable to the QMR-DT medical diagnostic network. 1 Inclusive variational approximations Approximate algorithms for probabilistic inference are gaining in popularity and are now even being incorporated into VLSI hardware (T. Richardson, personal communication). Approximate methods include variational techniques (Ghahramani and Jordan 1997; Saul et al. 1996; Frey and Hinton 1999; Jordan et al. 1999), local probability propagation (Gallager 1963; Pearl 1988; Frey 1998; MacKay 1999a; Freeman and Weiss 2001) and Markov chain Monte Carlo (Neal 1993; MacKay 1999b). Many algorithms have been proposed in each of these classes. One problem that most of the above algorithms suffer from is a tendency to concentrate on a relatively small number of modes of the target distribution (the distribution being approximated). In the case of medical diagnosis, different modes correspond to different explanations of the symptoms. Markov chain Monte Carlo methods are usually guaranteed to eventually sample from all the modes, but this may take an extremely long time, even when tempered transitions (Neal 1996) are (a) ,,

3 0.62022477 62 nips-2000-Generalized Belief Propagation

Author: Jonathan S. Yedidia, William T. Freeman, Yair Weiss

Abstract: Belief propagation (BP) was only supposed to work for tree-like networks but works surprisingly well in many applications involving networks with loops, including turbo codes. However, there has been little understanding of the algorithm or the nature of the solutions it finds for general graphs. We show that BP can only converge to a stationary point of an approximate free energy, known as the Bethe free energy in statistical physics. This result characterizes BP fixed-points and makes connections with variational approaches to approximate inference. More importantly, our analysis lets us build on the progress made in statistical physics since Bethe's approximation was introduced in 1935. Kikuchi and others have shown how to construct more accurate free energy approximations, of which Bethe's approximation is the simplest. Exploiting the insights from our analysis, we derive generalized belief propagation (GBP) versions ofthese Kikuchi approximations. These new message passing algorithms can be significantly more accurate than ordinary BP, at an adjustable increase in complexity. We illustrate such a new GBP algorithm on a grid Markov network and show that it gives much more accurate marginal probabilities than those found using ordinary BP. 1

4 0.49052575 103 nips-2000-Probabilistic Semantic Video Indexing

Author: Milind R. Naphade, Igor Kozintsev, Thomas S. Huang

Abstract: We propose a novel probabilistic framework for semantic video indexing. We define probabilistic multimedia objects (multijects) to map low-level media features to high-level semantic labels. A graphical network of such multijects (multinet) captures scene context by discovering intra-frame as well as inter-frame dependency relations between the concepts. The main contribution is a novel application of a factor graph framework to model this network. We model relations between semantic concepts in terms of their co-occurrence as well as the temporal dependencies between these concepts within video shots. Using the sum-product algorithm [1] for approximate or exact inference in these factor graph multinets, we attempt to correct errors made during isolated concept detection by forcing high-level constraints. This results in a significant improvement in the overall detection performance. 1

5 0.47581765 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles

Author: Martin J. Wainwright, Erik B. Sudderth, Alan S. Willsky

Abstract: We present the embedded trees algorithm, an iterative technique for estimation of Gaussian processes defined on arbitrary graphs. By exactly solving a series of modified problems on embedded spanning trees, it computes the conditional means with an efficiency comparable to or better than other techniques. Unlike other methods, the embedded trees algorithm also computes exact error covariances. The error covariance computation is most efficient for graphs in which removing a small number of edges reveals an embedded tree. In this context, we demonstrate that sparse loopy graphs can provide a significant increase in modeling power relative to trees, with only a minor increase in estimation complexity. 1

6 0.45233533 135 nips-2000-The Manhattan World Assumption: Regularities in Scene Statistics which Enable Bayesian Inference

7 0.42175257 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

8 0.38256285 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach

9 0.35369873 118 nips-2000-Smart Vision Chip Fabricated Using Three Dimensional Integration Technology

10 0.34247077 79 nips-2000-Learning Segmentation by Random Walks

11 0.32344934 47 nips-2000-Error-correcting Codes on a Bethe-like Lattice

12 0.29149204 53 nips-2000-Feature Correspondence: A Markov Chain Monte Carlo Approach

13 0.26764938 114 nips-2000-Second Order Approximations for Probability Models

14 0.25652793 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script

15 0.24267572 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics

16 0.24192154 1 nips-2000-APRICODD: Approximate Policy Construction Using Decision Diagrams

17 0.23728213 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition

18 0.23295893 99 nips-2000-Periodic Component Analysis: An Eigenvalue Method for Representing Periodic Structure in Speech

19 0.23219116 8 nips-2000-A New Model of Spatial Representation in Multimodal Brain Areas

20 0.22945072 30 nips-2000-Bayesian Video Shot Segmentation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.015), (17, 0.096), (32, 0.021), (33, 0.032), (48, 0.039), (54, 0.025), (55, 0.066), (62, 0.03), (65, 0.434), (67, 0.017), (75, 0.011), (76, 0.035), (79, 0.018), (81, 0.022), (90, 0.019), (91, 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94976997 33 nips-2000-Combining ICA and Top-Down Attention for Robust Speech Recognition

Author: Un-Min Bae, Soo-Young Lee

Abstract: We present an algorithm which compensates for the mismatches between characteristics of real-world problems and assumptions of independent component analysis algorithm. To provide additional information to the ICA network, we incorporate top-down selective attention. An MLP classifier is added to the separated signal channel and the error of the classifier is backpropagated to the ICA network. This backpropagation process results in estimation of expected ICA output signal for the top-down attention. Then, the unmixing matrix is retrained according to a new cost function representing the backpropagated error as well as independence. It modifies the density of recovered signals to the density appropriate for classification. For noisy speech signal recorded in real environments, the algorithm improved the recognition performance and showed robustness against parametric changes. 1

same-paper 2 0.92263311 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation

Author: Brendan J. Frey, Anitha Kannan

Abstract: One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. probability propagation algorithm), while ignoring the fact that the graph has cycles. The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions - including the sigmoid function - direct application of the sum-product algorithm is not possible. We introduce

3 0.80572945 68 nips-2000-Improved Output Coding for Classification Using Continuous Relaxation

Author: Koby Crammer, Yoram Singer

Abstract: Output coding is a general method for solving multiclass problems by reducing them to multiple binary classification problems. Previous research on output coding has employed, almost solely, predefined discrete codes. We describe an algorithm that improves the performance of output codes by relaxing them to continuous codes. The relaxation procedure is cast as an optimization problem and is reminiscent of the quadratic program for support vector machines. We describe experiments with the proposed algorithm, comparing it to standard discrete output codes. The experimental results indicate that continuous relaxations of output codes often improve the generalization performance, especially for short codes.

4 0.45059255 47 nips-2000-Error-correcting Codes on a Bethe-like Lattice

Author: Renato Vicente, David Saad, Yoshiyuki Kabashima

Abstract: We analyze Gallager codes by employing a simple mean-field approximation that distorts the model geometry and preserves important interactions between sites. The method naturally recovers the probability propagation decoding algorithm as an extremization of a proper free-energy. We find a thermodynamic phase transition that coincides with information theoretical upper-bounds and explain the practical code performance in terms of the free-energy landscape.

5 0.42953646 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks

Author: Brendan J. Frey, Relu Patrascu, Tommi Jaakkola, Jodi Moran

Abstract: An important class of problems can be cast as inference in noisyOR Bayesian networks, where the binary state of each variable is a logical OR of noisy versions of the states of the variable's parents. For example, in medical diagnosis, the presence of a symptom can be expressed as a noisy-OR of the diseases that may cause the symptom - on some occasions, a disease may fail to activate the symptom. Inference in richly-connected noisy-OR networks is intractable, but approximate methods (e .g., variational techniques) are showing increasing promise as practical solutions. One problem with most approximations is that they tend to concentrate on a relatively small number of modes in the true posterior, ignoring other plausible configurations of the hidden variables. We introduce a new sequential variational method for bipartite noisyOR networks, that favors including all modes of the true posterior and models the posterior distribution as a tree. We compare this method with other approximations using an ensemble of networks with network statistics that are comparable to the QMR-DT medical diagnostic network. 1 Inclusive variational approximations Approximate algorithms for probabilistic inference are gaining in popularity and are now even being incorporated into VLSI hardware (T. Richardson, personal communication). Approximate methods include variational techniques (Ghahramani and Jordan 1997; Saul et al. 1996; Frey and Hinton 1999; Jordan et al. 1999), local probability propagation (Gallager 1963; Pearl 1988; Frey 1998; MacKay 1999a; Freeman and Weiss 2001) and Markov chain Monte Carlo (Neal 1993; MacKay 1999b). Many algorithms have been proposed in each of these classes. One problem that most of the above algorithms suffer from is a tendency to concentrate on a relatively small number of modes of the target distribution (the distribution being approximated). In the case of medical diagnosis, different modes correspond to different explanations of the symptoms. Markov chain Monte Carlo methods are usually guaranteed to eventually sample from all the modes, but this may take an extremely long time, even when tempered transitions (Neal 1996) are (a) ,,

6 0.40123674 36 nips-2000-Constrained Independent Component Analysis

7 0.39969629 118 nips-2000-Smart Vision Chip Fabricated Using Three Dimensional Integration Technology

8 0.39758638 80 nips-2000-Learning Switching Linear Models of Human Motion

9 0.38752332 62 nips-2000-Generalized Belief Propagation

10 0.36846974 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

11 0.36823493 8 nips-2000-A New Model of Spatial Representation in Multimodal Brain Areas

12 0.36567509 45 nips-2000-Emergence of Movement Sensitive Neurons' Properties by Learning a Sparse Code for Natural Moving Images

13 0.36329904 127 nips-2000-Structure Learning in Human Causal Induction

14 0.35920376 10 nips-2000-A Productive, Systematic Framework for the Representation of Visual Structure

15 0.35724428 108 nips-2000-Recognizing Hand-written Digits Using Hierarchical Products of Experts

16 0.35280606 99 nips-2000-Periodic Component Analysis: An Eigenvalue Method for Representing Periodic Structure in Speech

17 0.35235259 14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks

18 0.34507906 103 nips-2000-Probabilistic Semantic Video Indexing

19 0.34468952 138 nips-2000-The Use of Classifiers in Sequential Inference

20 0.34310365 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition