nips nips2000 nips2000-115 knowledge-graph by maker-knowledge-mining

115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks


Source: pdf

Author: Brendan J. Frey, Relu Patrascu, Tommi Jaakkola, Jodi Moran

Abstract: An important class of problems can be cast as inference in noisy-OR Bayesian networks, where the binary state of each variable is a logical OR of noisy versions of the states of the variable's parents. For example, in medical diagnosis, the presence of a symptom can be expressed as a noisy-OR of the diseases that may cause the symptom - on some occasions, a disease may fail to activate the symptom. Inference in richly-connected noisy-OR networks is intractable, but approximate methods (e.g., variational techniques) are showing increasing promise as practical solutions. One problem with most approximations is that they tend to concentrate on a relatively small number of modes in the true posterior, ignoring other plausible configurations of the hidden variables. We introduce a new sequential variational method for bipartite noisy-OR networks that favors including all modes of the true posterior and models the posterior distribution as a tree. We compare this method with other approximations using an ensemble of networks with network statistics that are comparable to the QMR-DT medical diagnostic network. 1 Inclusive variational approximations Approximate algorithms for probabilistic inference are gaining in popularity and are now even being incorporated into VLSI hardware (T. Richardson, personal communication). Approximate methods include variational techniques (Ghahramani and Jordan 1997; Saul et al. 1996; Frey and Hinton 1999; Jordan et al. 1999), local probability propagation (Gallager 1963; Pearl 1988; Frey 1998; MacKay 1999a; Freeman and Weiss 2001) and Markov chain Monte Carlo (Neal 1993; MacKay 1999b). Many algorithms have been proposed in each of these classes. One problem that most of the above algorithms suffer from is a tendency to concentrate on a relatively small number of modes of the target distribution (the distribution being approximated). In the case of medical diagnosis, different modes correspond to different explanations of the symptoms. Markov chain Monte Carlo methods are usually guaranteed to eventually sample from all the modes, but this may take an extremely long time, even when tempered transitions (Neal 1996) are used.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Sequentially fitting "inclusive" trees for inference in noisy-OR networks Brendan J. [sent-1, score-0.25]

2 For example, in medical diagnosis, the presence of a symptom can be expressed as a noisy-OR of the diseases that may cause the symptom - on some occasions, a disease may fail to activate the symptom. [sent-8, score-1.136]

3 Inference in richly-connected noisy-OR networks is intractable, but approximate methods (e.g., [sent-9, score-0.131]

4 variational techniques) are showing increasing promise as practical solutions. [sent-11, score-0.199]

5 One problem with most approximations is that they tend to concentrate on a relatively small number of modes in the true posterior, ignoring other plausible configurations of the hidden variables. [sent-12, score-0.463]

6 We introduce a new sequential variational method for bipartite noisy-OR networks that favors including all modes of the true posterior and models the posterior distribution as a tree. [sent-13, score-0.832]

7 We compare this method with other approximations using an ensemble of networks with network statistics that are comparable to the QMR-DT medical diagnostic network. [sent-14, score-0.206]

8 1 Inclusive variational approximations Approximate algorithms for probabilistic inference are gaining in popularity and are now even being incorporated into VLSI hardware (T. [sent-15, score-0.402]

9 Approximate methods include variational techniques (Ghahramani and Jordan 1997; Saul et al. [sent-17, score-0.243]

10 1999), local probability propagation (Gallager 1963; Pearl 1988; Frey 1998; MacKay 1999a; Freeman and Weiss 2001) and Markov chain Monte Carlo (Neal 1993; MacKay 1999b). [sent-19, score-0.179]

11 One problem that most of the above algorithms suffer from is a tendency to concentrate on a relatively small number of modes of the target distribution (the distribution being approximated). [sent-21, score-0.35]

12 In the case of medical diagnosis, different modes correspond to different explanations of the symptoms. [sent-22, score-0.331]

13 Markov chain Monte Carlo methods are usually guaranteed to eventually sample from all the modes, but this may take an extremely long time, even when tempered transitions (Neal 1996) are used. [sent-23, score-0.071]

14 Figure 1: We approximate P(x) by adjusting the mean and variance of a Gaussian, Q(x). [sent-28, score-0.075]

15 (a) The result of minimizing D(Q||P) = Σ_x Q(x) log(Q(x)/P(x)), as is done for most variational methods. [sent-29, score-0.231]

16 (b) The result of minimizing D(P||Q) = Σ_x P(x) log(P(x)/Q(x)). [sent-30, score-0.032]

17 Preliminary results on local probability propagation in richly connected networks show that it is sometimes able to oscillate between plausible modes (Murphy et al. [sent-32, score-0.556]

18 1999; Frey 2000), but other results also show that it sometimes diverges or oscillates between implausible configurations (McEliece et al. [sent-33, score-0.121]

19 Most variational techniques minimize a cost function that favors finding the single, most massive mode, excluding less probable modes of the target distribution (e. [sent-35, score-0.756]

20 More sophisticated variational techniques capture multiple modes using substructures (Saul and Jordan 1996) or by leaving part of the original network intact and approximating the remainder (Jaakkola and Jordan 1999). [sent-39, score-0.458]

21 However, although these methods increase the number of modes that are captured, they still exclude modes. [sent-40, score-0.349]

22 Variational techniques approximate a target distribution P(x) using a simpler, parameterized distribution Q(x) (or a parameterized bound). [sent-41, score-0.165]

23 A common approach to variational inference is to minimize a relative entropy, D(Q||P) = Σ_x Q(x) log(Q(x)/P(x)). [sent-46, score-0.409]

24 Often D(QIIP) can be minimized with respect to the parameters of Q using iterative optimization or even exact optimization. [sent-49, score-0.061]

25 To see how minimizing D(Q||P) may exclude modes of the target distribution, suppose Q is a Gaussian and P is bimodal with a region of vanishing density between the two modes, as shown in Fig. [sent-50, score-0.449]

26 If we minimize D(Q||P) with respect to the mean and variance of Q, it will cover only one of the two modes, as illustrated in Fig. [sent-52, score-0.044]

27 This is because D(Q||P) will tend to infinity if Q is nonzero in the region where P has vanishing density. [sent-55, score-0.121]

28 In contrast, if we minimize D(P||Q) = Σ_x P(x) log(P(x)/Q(x)) with respect to the mean and variance of Q, it will cover all modes, since D(P||Q) will tend to infinity if Q vanishes in any region where P is nonzero. [sent-56, score-0.127]
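As a numerical companion to Figure 1, the short sketch below (our own illustration; the bimodal target, grid ranges, and discretization are assumed) fits a single Gaussian Q to a two-mode P by grid search under each divergence. Minimizing D(Q||P) locks Q onto one mode, while minimizing D(P||Q) spreads Q across both.

```python
# Grid-search fit of a Gaussian Q to a bimodal P under the two KL directions.
import numpy as np

x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]

def gauss(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Bimodal target P: two well-separated modes with vanishing density in between.
p = 0.5 * gauss(x, -4.0, 0.7) + 0.5 * gauss(x, 4.0, 0.7)
p /= p.sum() * dx

def kl(a, b):
    eps = 1e-300
    return np.sum(a * np.log((a + eps) / (b + eps))) * dx

best = {}
for name, divergence in [("D(Q||P)", lambda q: kl(q, p)),
                         ("D(P||Q)", lambda q: kl(p, q))]:
    scores = []
    for mu in np.linspace(-6.0, 6.0, 61):
        for sigma in np.linspace(0.3, 6.0, 58):
            q = gauss(x, mu, sigma)
            q /= q.sum() * dx
            scores.append((divergence(q), mu, sigma))
    best[name] = min(scores)

# Expect: D(Q||P) picks one mode (mu near +/-4, small sigma);
#         D(P||Q) covers both modes (mu near 0, large sigma).
for name, (score, mu, sigma) in best.items():
    print(f"{name}: mu={mu:.2f}, sigma={sigma:.2f}")
```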

29 For many problems, including medical diagnosis, it is easy to argue that it is more important that our approximation include all modes than that it exclude implausible configurations at the cost of excluding other modes. [sent-59, score-0.66]

30 The former leads to a low number of false negatives, whereas the latter may lead to a large number of false negatives (concluding a disease is not present when it is) . [sent-60, score-0.273]

31 Fig. 2 shows a bipartite noisy-OR Bayesian network with N binary hidden variables d = (d_1, ..., d_N) [sent-64, score-0.129]

32 and K binary observed variables s = (s_1, ..., s_K). [sent-67, score-0.045]

33 Later, we present results on medical diagnosis, where d_n = 1 indicates a disease is active, d_n = 0 indicates a disease is inactive, s_k = 1 indicates a symptom is active and s_k = 0 indicates a symptom is inactive. [sent-71, score-1.582]

34 The joint distribution is P(d, s) = [∏_{k=1}^{K} P(s_k | d)] [∏_{n=1}^{N} P(d_n)]. (2) [sent-72, score-0.03]

35 In the case of medical diagnosis, this form assumes the diseases are independent. [sent-73, score-0.407]

36 Although some diseases probably do depend on other diseases, this form is considered to be a worthwhile representation of the problem (Shwe et al. [sent-74, score-0.338]

37 The probability that symptom s_k fails to be activated (s_k = 0) is the product of the probabilities that each active disease fails to activate s_k: P(s_k = 0 | d) = p_{k0} ∏_{n=1}^{N} p_{kn}^{d_n}. (3) [sent-77, score-0.716]

38 Here, p_{kn} is the probability that an active d_n fails to activate s_k. [sent-78, score-0.411]

39 1 - p_{k0} is the probability that symptom s_k is active when none of the diseases are active. [sent-80, score-0.687]
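As a concrete reading of equation (3), the snippet below evaluates the noisy-OR likelihood of a single symptom for one disease configuration; the parameter values and variable names are toy choices of ours, not taken from the paper.

```python
# Noisy-OR likelihood of one symptom: every active disease must fail for s_k = 0.
import numpy as np

p_k0 = 0.95                            # prob. symptom k stays off when no disease is active
p_kn = np.array([0.3, 0.8, 0.5, 0.9])  # prob. that active disease n fails to activate symptom k
d = np.array([1, 0, 1, 0])             # a disease configuration (N = 4)

p_sk_off = p_k0 * np.prod(p_kn ** d)   # P(s_k = 0 | d) = p_k0 * prod_n p_kn ** d_n
p_sk_on = 1.0 - p_sk_off               # P(s_k = 1 | d)
print(p_sk_off, p_sk_on)
```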

40 Exact inference computes the distribution over d given a subset of observed values in s. [sent-81, score-0.165]

41 However, if Sk is not observed, the corresponding likelihood (node plus edges) may be deleted to give a new network that describes the marginal distribution over d and the remaining variables in s. [sent-82, score-0.075]

42 So, we assume that we are considering a subnetwork where all the variables in s are observed. [sent-83, score-0.045]

43 We reorder the variables in s so that the first J variables are active (s_k = 1, 1 ≤ k ≤ J) and the remaining variables are inactive (s_k = 0, J + 1 ≤ k ≤ K). [sent-84, score-0.315]

44 The posterior distribution can then be written P(d | s) ∝ P(d, s) = [∏_{k=1}^{J} (1 - p_{k0} ∏_{n=1}^{N} p_{kn}^{d_n})] [∏_{k=J+1}^{K} (p_{k0} ∏_{n=1}^{N} p_{kn}^{d_n})] [∏_{n=1}^{N} P(d_n)]. (4) [sent-85, score-0.1]

45 Taken together, the two terms in brackets on the right take a simple, product form over the variables in d. [sent-86, score-0.121]

46 So, the first step in inference is to "absorb" the inactive variables in s (footnote 1: however, the diseases are dependent given that some symptoms are present). [sent-87, score-0.676]

47 The absorption modifies the priors P(d_n) as follows: P'(d_n) = α_n P(d_n) ∏_{k=J+1}^{K} p_{kn}^{d_n}, (5) where α_n is a constant that normalizes P'(d_n). [sent-88, score-0.045]
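Equation (5) is easy to implement directly; the sketch below absorbs the inactive symptoms of a small toy network into the disease priors (network sizes, parameter ranges and the random seed are assumed for illustration only).

```python
# Absorb inactive symptoms into the priors: P'(d_n) propto P(d_n) * prod_{k inactive} p_kn ** d_n.
import numpy as np

N, K = 3, 4                        # 3 diseases, 4 symptoms
rng = np.random.default_rng(0)
p = rng.uniform(0.2, 0.9, (K, N))  # p_kn: active disease n fails to activate symptom k
prior = rng.uniform(0.01, 0.2, N)  # P(d_n = 1)
s = np.array([1, 1, 0, 0])         # first J = 2 symptoms active, rest inactive

prior_on = prior * np.prod(p[s == 0], axis=0)  # unnormalized P'(d_n = 1)
prior_off = 1.0 - prior                        # p_kn ** 0 = 1 when d_n = 0
new_prior = prior_on / (prior_on + prior_off)  # alpha_n is the normalizer
print(new_prior)                               # absorbed priors P'(d_n = 1)
```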

48 Assuming the inactive symptoms have been absorbed, we have P(d | s) ∝ [∏_{k=1}^{J} (1 - p_{k0} ∏_{n=1}^{N} p_{kn}^{d_n})] [∏_{n=1}^{N} P'(d_n)]. (6) [sent-89, score-0.247]

49 The term in brackets on the left does not have a product form. [sent-90, score-0.076]

50 The entire expression can be multiplied out to give a sum of 2^J product forms, and exact "QuickScore" inference can be performed by combining the results of exact inference in each of the 2^J product forms (Heckerman 1989). [sent-91, score-0.468]
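For very small networks the exact posterior can also be obtained by naive enumeration, which is a handy reference when experimenting with approximations. The snippet below is that brute force over all 2^N disease configurations; it is not the QuickScore algorithm (which instead exploits the 2^J product-form decomposition), and all parameter values are toy assumptions.

```python
# Brute-force exact posterior marginals P(d_n = 1 | s) for a tiny noisy-OR network.
import itertools
import numpy as np

N, K = 3, 4
rng = np.random.default_rng(1)
p0 = rng.uniform(0.8, 0.99, K)       # leak terms p_k0
p = rng.uniform(0.2, 0.9, (K, N))    # failure probabilities p_kn
prior = rng.uniform(0.01, 0.2, N)    # P(d_n = 1)
s = np.array([1, 1, 0, 0])           # observed symptoms

post, z = np.zeros(N), 0.0
for d in itertools.product([0, 1], repeat=N):
    d = np.array(d)
    p_off = p0 * np.prod(p ** d, axis=1)                # P(s_k = 0 | d) for every symptom k
    lik = np.prod(np.where(s == 1, 1.0 - p_off, p_off))
    pr = np.prod(np.where(d == 1, prior, 1.0 - prior))
    w = lik * pr                                        # joint P(d, s)
    z += w
    post += w * d
print(post / z)                                         # exact posterior marginals
```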

51 3 Sequential inference using inclusive variational trees As described above, many variational methods minimize D(Q||P), and find approximations that exclude some modes of the posterior distribution. [sent-93, score-1.197]

52 We present a method that minimizes D(P||Q) sequentially - by absorbing one observation at a time - so as not to exclude modes of the posterior. [sent-94, score-0.476]

53 Also, we approximate the posterior distribution with a tree. [sent-95, score-0.175]

54 (Directed and undirected trees are equivalent; we use a directed representation, where each variable has at most one parent. [sent-96, score-0.113]

55 ) The algorithm absorbs one active symptom at a time, producing a new tree by searching for the tree that is closest - in the D(PIIQ) sense - to the product of the previous tree and the likelihood for the next symptom. [sent-97, score-0.945]

56 This search can be performed efficiently in O(N^2) time using probability propagation in two versions of the previous tree to compute weights for edges of a new tree, and then applying a minimum-weight spanning-tree algorithm. [sent-98, score-0.323]

57 Let T_k(d) be the tree approximation obtained after absorbing the kth symptom, s_k = 1. [sent-99, score-0.253]

58 Initially, we take T_0(d) to be a tree that decouples the variables and has marginals equal to the marginals obtained by absorbing the inactive symptoms, as described above. [sent-100, score-0.487]

59 Interpreting the tree T_{k-1}(d) from the previous step as the current "prior" over the diseases, we use the likelihood P(s_k = 1 | d) for the next symptom to obtain a new estimate of the posterior, P̂_k(d | s_1, ..., s_k). [sent-101, score-0.436]

60 Let the new tree be T_k(d) = ∏_n T_k(d_n | d_{π_k(n)}), where π_k(n) is the index of the parent of d_n in the new tree. [sent-105, score-0.216]
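For concreteness, a directed tree over binary variables is just a root marginal plus one conditional table per non-root node. The toy snippet below (parent function and table values are made up) evaluates such a tree-factored T_k(d) and checks that it normalizes.

```python
# Evaluate T(d) = T(d_root) * prod_{n != root} T(d_n | d_parent(n)) for a 3-variable tree.
import itertools
import numpy as np

parent = {0: None, 1: 0, 2: 0}            # d_0 is the root; d_1 and d_2 hang off it
root_marginal = np.array([0.7, 0.3])      # T(d_0)
cpt = {1: np.array([[0.9, 0.4],           # cpt[n][d_n, d_parent]; columns sum to 1
                    [0.1, 0.6]]),
       2: np.array([[0.8, 0.2],
                    [0.2, 0.8]])}

def tree_prob(d):
    prob = root_marginal[d[0]]
    for n in (1, 2):
        prob *= cpt[n][d[n], d[parent[n]]]
    return prob

print(sum(tree_prob(d) for d in itertools.product([0, 1], repeat=3)))  # sums to 1.0
```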

61 The parent function π_k(n) and the conditional probability tables of T_k(d) are found by minimizing D(P̂_k || T_k), equation (8). [sent-106, score-0.167]

62 Ignoring constants, we have D(P̂_k || T_k) = - Σ_d P̂_k(d | s_1, ..., s_k) log T_k(d) = - Σ_d (T_{k-1}(d) - T'_{k-1}(d)) log(∏_n T_k(d_n | d_{π_k(n)})) = - Σ_n Σ_d (T_{k-1}(d) - T'_{k-1}(d)) log T_k(d_n | d_{π_k(n)}) = - Σ_n Σ_{d_n, d_{π_k(n)}} (T_{k-1}(d_n, d_{π_k(n)}) - T'_{k-1}(d_n, d_{π_k(n)})) log T_k(d_n | d_{π_k(n)}), where T'_{k-1} denotes the second of the two versions of the previous tree mentioned above. [sent-109, score-0.062]

63 For a given structure (parent function π_k(n)), the optimal conditional probability tables are given by (9), where β_n is a constant that ensures Σ_{d_n} T_k(d_n | d_{π_k(n)}) = 1. [sent-110, score-0.285]

64 This table is easily computed using probability propagation in the two trees to compute the two marginals needed in the difference. [sent-111, score-0.258]

65 The optimal conditional probability table for a variable is independent of the parent-child relationships in the remainder of the network. [sent-112, score-0.04]

66 So, for the current symptom, we compute the optimal conditional probability tables for all N(N - 1)/2 possible parent-child relationships in O(N^2) time using probability propagation. [sent-113, score-0.133]

67 Then, we use a minimum-weight directed spanning tree algorithm (Bock 1971) to search for the best tree. [sent-114, score-0.266]
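Putting one absorption step together, the sketch below builds the optimal conditional tables from the difference of two sets of pairwise marginals and then searches over parent functions. To stay short and self-contained it computes the marginals by explicit enumeration over all 2^N configurations rather than the O(N^2) tree propagation described above, and it finds the minimum-weight directed spanning tree by brute force rather than with Bock's (1971) algorithm; the parameter values, the per-parent normalization of the tables, and the reading of T'_{k-1} as the second of the two versions of the previous tree are our reconstruction of the derivation, not the authors' code.

```python
# One absorption step: optimal tables from marginal differences + best directed tree.
import itertools
import numpy as np

N = 4
rng = np.random.default_rng(2)
p_k0 = 0.9                               # leak term of the symptom being absorbed
p_kn = rng.uniform(0.2, 0.9, N)          # its failure probabilities

configs = np.array(list(itertools.product([0, 1], repeat=N)))   # all 2^N settings of d

# Previous approximation T_{k-1}(d), here just an arbitrary normalized table.
T_prev = rng.uniform(0.1, 1.0, len(configs))
T_prev /= T_prev.sum()

# "Second version" of the previous tree, so that T_prev - T_fail is proportional
# to P(s_k = 1 | d) * T_{k-1}(d), the unnormalized target of this step.
T_fail = p_k0 * np.prod(p_kn ** configs, axis=1) * T_prev
diff = T_prev - T_fail                   # strictly positive elementwise

def pair_marginal(n, m):
    tab = np.zeros((2, 2))               # tab[d_n, d_m] = sum of diff over matching configs
    for w, d in zip(diff, configs):
        tab[d[n], d[m]] += w
    return tab

def single_marginal(n):
    tab = np.zeros(2)
    for w, d in zip(diff, configs):
        tab[d[n]] += w
    return tab

def edge_cost(n, m):
    """Cost of giving child n the parent m, using the optimal table for that edge."""
    tab = pair_marginal(n, m)
    cpt = tab / tab.sum(axis=0, keepdims=True)   # normalize over d_n for each parent value
    return -np.sum(tab * np.log(cpt))

def root_cost(n):
    tab = single_marginal(n)
    return -np.sum(tab * np.log(tab / tab.sum()))

# Brute-force minimum-weight directed spanning tree (fine only for tiny N).
best = None
for parents in itertools.product([None, *range(N)], repeat=N):
    if sum(q is None for q in parents) != 1:        # exactly one root
        continue
    if any(q == n for n, q in enumerate(parents)):  # no self-loops
        continue
    ok = True
    for n in range(N):                              # reject cycles
        seen, m = set(), n
        while parents[m] is not None:
            if m in seen:
                ok = False
                break
            seen.add(m)
            m = parents[m]
        if not ok:
            break
    if not ok:
        continue
    cost = sum(root_cost(n) if parents[n] is None else edge_cost(n, parents[n])
               for n in range(N))
    if best is None or cost < best[0]:
        best = (cost, parents)

print("best parent function:", best[1])
```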

68 Once all of the symptoms have been absorbed, we use the final tree distribution, T_J(d), to make inferences about d given s. [sent-115, score-0.332]

69 The order in which the symptoms are absorbed will generally affect the quality of the resulting tree (Jaakkola and Jordan 1999), but we used a random ordering in the experiments reported below. [sent-116, score-0.411]

70 4 Results on QMR-DT type networks Using the structural and parameter statistics of the QMR-DT network given in Shwe et al. [sent-117, score-0.1]

71 (1991) we simulated 30 QMR-DT type networks with roughly 600 diseases each. [sent-118, score-0.35]

72 There were 10 networks in each of 3 groups with 5, 10 and 15 instantiated active symptoms. [sent-119, score-0.147]

73 We chose the number of active symptoms to be small enough that we can compare our approximate method with the exact QuickScore method (Heckerman 1989). [sent-120, score-0.385]

74 We also tried two other approximate inference methods: local probability propagation (Murphy et al. [sent-121, score-0.403]

75 1999) and a variational upper bound (Jaakkola and Jordan 1999). [sent-122, score-0.199]

76 For medical diagnosis, an important question is how many most probable diseases n' under the approximate posterior must be examined before the most probable n diseases under the exact posterior are found. [sent-123, score-1.249]

77 An exact inference algorithm will give n' = n, whereas an approximate algorithm that mistakenly ranks the most probable disease last will give n' = N. [sent-125, score-0.539]
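The n-versus-n' measure is straightforward to compute from the two rankings; the snippet below (with made-up marginals) walks down the approximate ranking until every one of the exact top-n diseases has been seen.

```python
# How many top-ranked diseases under the approximation must be examined (n')
# before the top-n diseases under the exact posterior are all found.
import numpy as np

exact = np.array([0.9, 0.7, 0.4, 0.2, 0.05])    # exact P(d_n = 1 | s), toy values
approx = np.array([0.8, 0.3, 0.6, 0.25, 0.1])   # approximate marginals, toy values

exact_rank = np.argsort(-exact)                 # disease indices, most probable first
approx_rank = np.argsort(-approx)

def n_prime(n):
    wanted = set(exact_rank[:n])
    seen = set()
    for i, disease in enumerate(approx_rank, start=1):
        seen.add(disease)
        if wanted <= seen:
            return i
    return len(approx_rank)

for n in range(1, 6):
    print(n, n_prime(n))
```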

78 For each group of networks and each inference method, we averaged the 10 values of n' for each value of n. [sent-126, score-0.191]

79 Fig. 3 shows the average of n' versus n for 5, 10 and 15 active symptoms. [sent-128, score-0.091]

80 The sequential tree-fitting method is closest to optimal (n' = n) in all cases. [sent-129, score-0.082]

81 The right column of plots shows the "extra work" caused by the excess number of diseases n' - n that must be examined for the approximate methods. [sent-130, score-0.419]

82 (Figure 3 residue: plot panels labelled "5 positive findings" and "10 positive findings"; the numeric axis values are not recoverable from the text.) [sent-131, score-0.158]

84 Approximate methods include the sequential tree-fitting method presented in this paper (tree), local probability propagation (pp) and a variational upper bound (ub). [sent-135, score-0.398]

85 5 Summary Noisy-OR networks can be used to model a variety of problems, including medical diagnosis. [sent-136, score-0.201]

86 Exact inference in large, richly connected noisy-OR networks is intractable, and most approximate inference algorithms tend to concentrate on a small number of most probable configurations of the hidden variables under the posterior. [sent-137, score-0.766]

87 We presented an "inclusive" variational method for bipartite noisy-OR networks that favors including all probable configurations, at the cost of including some improbable configurations. [sent-138, score-0.593]

88 The method fits a tree to the posterior distribution sequentially, i.e., by absorbing one observation at a time. [sent-139, score-0.274]

89 Results on an ensemble of QMR-DT type networks show that the method performs better than local probability propagation and a variational upper bound for ranking most probable diseases. [sent-142, score-0.515]

90 An algorithm to construct a minimum directed spanning tree in a directed network. [sent-151, score-0.32]

91 Loopy belief propagation for approximate inference: An empirical study. [sent-237, score-0.214]

92 Probabilistic diagnosis using a reformulation of the INTERNIST-1/QMR knowledge base I. [sent-276, score-0.133]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('sk', 0.307), ('diseases', 0.294), ('symptom', 0.262), ('modes', 0.218), ('variational', 0.199), ('dn', 0.192), ('qiip', 0.181), ('tree', 0.174), ('frey', 0.16), ('symptoms', 0.158), ('disease', 0.157), ('piiq', 0.157), ('pko', 0.157), ('jordan', 0.156), ('inference', 0.135), ('diagnosis', 0.133), ('exclude', 0.131), ('medical', 0.113), ('probable', 0.111), ('propagation', 0.109), ('inclusive', 0.105), ('jaakkola', 0.093), ('active', 0.091), ('inactive', 0.089), ('bipartite', 0.084), ('saul', 0.082), ('findings', 0.079), ('absorbed', 0.079), ('absorbing', 0.079), ('dls', 0.079), ('favors', 0.079), ('shwe', 0.079), ('tll', 0.079), ('configurations', 0.077), ('approximate', 0.075), ('tk', 0.073), ('posterior', 0.07), ('neal', 0.067), ('heckerman', 0.067), ('exact', 0.061), ('trees', 0.059), ('murphy', 0.057), ('ub', 0.057), ('networks', 0.056), ('ghahramani', 0.055), ('directed', 0.054), ('tables', 0.053), ('mackay', 0.053), ('bock', 0.052), ('negatives', 0.052), ('noisyor', 0.052), ('examined', 0.05), ('marginals', 0.05), ('sequential', 0.05), ('sequentially', 0.048), ('activate', 0.048), ('carlo', 0.047), ('monte', 0.047), ('weiss', 0.046), ('tend', 0.045), ('pkn', 0.045), ('richly', 0.045), ('excluding', 0.045), ('gallager', 0.045), ('variables', 0.045), ('plausible', 0.044), ('et', 0.044), ('minimize', 0.044), ('concentrate', 0.042), ('parent', 0.042), ('pp', 0.041), ('tempered', 0.041), ('mceliece', 0.041), ('substructures', 0.041), ('lid', 0.041), ('probability', 0.04), ('fails', 0.04), ('indicates', 0.039), ('graphical', 0.039), ('product', 0.038), ('vanishing', 0.038), ('spanning', 0.038), ('brackets', 0.038), ('infinity', 0.038), ('intractable', 0.037), ('approximations', 0.037), ('attias', 0.036), ('ii', 0.036), ('hinton', 0.034), ('minimizing', 0.032), ('false', 0.032), ('closest', 0.032), ('pearl', 0.032), ('including', 0.032), ('log', 0.031), ('probabilistic', 0.031), ('target', 0.03), ('distribution', 0.03), ('belief', 0.03), ('chain', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks

Author: Brendan J. Frey, Relu Patrascu, Tommi Jaakkola, Jodi Moran

Abstract: An important class of problems can be cast as inference in noisyOR Bayesian networks, where the binary state of each variable is a logical OR of noisy versions of the states of the variable's parents. For example, in medical diagnosis, the presence of a symptom can be expressed as a noisy-OR of the diseases that may cause the symptom - on some occasions, a disease may fail to activate the symptom. Inference in richly-connected noisy-OR networks is intractable, but approximate methods (e .g., variational techniques) are showing increasing promise as practical solutions. One problem with most approximations is that they tend to concentrate on a relatively small number of modes in the true posterior, ignoring other plausible configurations of the hidden variables. We introduce a new sequential variational method for bipartite noisyOR networks, that favors including all modes of the true posterior and models the posterior distribution as a tree. We compare this method with other approximations using an ensemble of networks with network statistics that are comparable to the QMR-DT medical diagnostic network. 1 Inclusive variational approximations Approximate algorithms for probabilistic inference are gaining in popularity and are now even being incorporated into VLSI hardware (T. Richardson, personal communication). Approximate methods include variational techniques (Ghahramani and Jordan 1997; Saul et al. 1996; Frey and Hinton 1999; Jordan et al. 1999), local probability propagation (Gallager 1963; Pearl 1988; Frey 1998; MacKay 1999a; Freeman and Weiss 2001) and Markov chain Monte Carlo (Neal 1993; MacKay 1999b). Many algorithms have been proposed in each of these classes. One problem that most of the above algorithms suffer from is a tendency to concentrate on a relatively small number of modes of the target distribution (the distribution being approximated). In the case of medical diagnosis, different modes correspond to different explanations of the symptoms. Markov chain Monte Carlo methods are usually guaranteed to eventually sample from all the modes, but this may take an extremely long time, even when tempered transitions (Neal 1996) are (a) ,,

2 0.19717777 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

Author: Zoubin Ghahramani, Matthew J. Beal

Abstract: Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. 1

3 0.1427207 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation

Author: Brendan J. Frey, Anitha Kannan

Abstract: One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. probability propagation algorithm), while ignoring the fact that the graph has cycles. The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions - including the sigmoid function - direct application of the sum-product algorithm is not possible. We introduce

4 0.13830864 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles

Author: Martin J. Wainwright, Erik B. Sudderth, Alan S. Willsky

Abstract: We present the embedded trees algorithm, an iterative technique for estimation of Gaussian processes defined on arbitrary graphs. By exactly solving a series of modified problems on embedded spanning trees, it computes the conditional means with an efficiency comparable to or better than other techniques. Unlike other methods, the embedded trees algorithm also computes exact error covariances. The error covariance computation is most efficient for graphs in which removing a small number of edges reveals an embedded tree. In this context, we demonstrate that sparse loopy graphs can provide a significant increase in modeling power relative to trees, with only a minor increase in estimation complexity. 1

5 0.10505985 114 nips-2000-Second Order Approximations for Probability Models

Author: Hilbert J. Kappen, Wim Wiegerinck

Abstract: In this paper, we derive a second order mean field theory for directed graphical probability models. By using an information theoretic argument it is shown how this can be done in the absense of a partition function. This method is a direct generalisation of the well-known TAP approximation for Boltzmann Machines. In a numerical example, it is shown that the method greatly improves the first order mean field approximation. For a restricted class of graphical models, so-called single overlap graphs, the second order method has comparable complexity to the first order method. For sigmoid belief networks, the method is shown to be particularly fast and effective.

6 0.10115688 14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks

7 0.090141006 47 nips-2000-Error-correcting Codes on a Bethe-like Lattice

8 0.088115305 13 nips-2000-A Tighter Bound for Graphical Models

9 0.079071842 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks

10 0.076246418 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach

11 0.065933987 80 nips-2000-Learning Switching Linear Models of Human Motion

12 0.056011084 122 nips-2000-Sparse Representation for Gaussian Process Models

13 0.054992255 142 nips-2000-Using Free Energies to Represent Q-values in a Multiagent Reinforcement Learning Task

14 0.054072075 62 nips-2000-Generalized Belief Propagation

15 0.053506255 146 nips-2000-What Can a Single Neuron Compute?

16 0.05282503 120 nips-2000-Sparse Greedy Gaussian Process Regression

17 0.052334413 64 nips-2000-High-temperature Expansions for Learning Models of Nonnegative Data

18 0.051635958 85 nips-2000-Mixtures of Gaussian Processes

19 0.051052671 94 nips-2000-On Reversing Jensen's Inequality

20 0.050109375 77 nips-2000-Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.185), (1, -0.018), (2, 0.138), (3, -0.084), (4, 0.234), (5, -0.017), (6, 0.039), (7, 0.048), (8, -0.026), (9, 0.1), (10, 0.109), (11, 0.06), (12, -0.142), (13, 0.037), (14, -0.074), (15, -0.085), (16, -0.058), (17, 0.181), (18, -0.063), (19, 0.037), (20, -0.064), (21, 0.085), (22, 0.069), (23, 0.016), (24, -0.008), (25, -0.101), (26, 0.007), (27, -0.048), (28, -0.036), (29, 0.054), (30, -0.066), (31, 0.089), (32, 0.034), (33, 0.043), (34, -0.072), (35, -0.175), (36, -0.01), (37, 0.065), (38, 0.008), (39, 0.046), (40, 0.077), (41, 0.121), (42, 0.125), (43, 0.051), (44, -0.005), (45, -0.073), (46, 0.013), (47, 0.022), (48, 0.018), (49, -0.136)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.95799214 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks

Author: Brendan J. Frey, Relu Patrascu, Tommi Jaakkola, Jodi Moran

Abstract: An important class of problems can be cast as inference in noisyOR Bayesian networks, where the binary state of each variable is a logical OR of noisy versions of the states of the variable's parents. For example, in medical diagnosis, the presence of a symptom can be expressed as a noisy-OR of the diseases that may cause the symptom - on some occasions, a disease may fail to activate the symptom. Inference in richly-connected noisy-OR networks is intractable, but approximate methods (e .g., variational techniques) are showing increasing promise as practical solutions. One problem with most approximations is that they tend to concentrate on a relatively small number of modes in the true posterior, ignoring other plausible configurations of the hidden variables. We introduce a new sequential variational method for bipartite noisyOR networks, that favors including all modes of the true posterior and models the posterior distribution as a tree. We compare this method with other approximations using an ensemble of networks with network statistics that are comparable to the QMR-DT medical diagnostic network. 1 Inclusive variational approximations Approximate algorithms for probabilistic inference are gaining in popularity and are now even being incorporated into VLSI hardware (T. Richardson, personal communication). Approximate methods include variational techniques (Ghahramani and Jordan 1997; Saul et al. 1996; Frey and Hinton 1999; Jordan et al. 1999), local probability propagation (Gallager 1963; Pearl 1988; Frey 1998; MacKay 1999a; Freeman and Weiss 2001) and Markov chain Monte Carlo (Neal 1993; MacKay 1999b). Many algorithms have been proposed in each of these classes. One problem that most of the above algorithms suffer from is a tendency to concentrate on a relatively small number of modes of the target distribution (the distribution being approximated). In the case of medical diagnosis, different modes correspond to different explanations of the symptoms. Markov chain Monte Carlo methods are usually guaranteed to eventually sample from all the modes, but this may take an extremely long time, even when tempered transitions (Neal 1996) are (a) ,,

2 0.6796869 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

Author: Zoubin Ghahramani, Matthew J. Beal

Abstract: Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. 1

3 0.66904992 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation

Author: Brendan J. Frey, Anitha Kannan

Abstract: One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. probability propagation algorithm), while ignoring the fact that the graph has cycles. The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions - including the sigmoid function - direct application of the sum-product algorithm is not possible. We introduce

4 0.52701199 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles

Author: Martin J. Wainwright, Erik B. Sudderth, Alan S. Willsky

Abstract: We present the embedded trees algorithm, an iterative technique for estimation of Gaussian processes defined on arbitrary graphs. By exactly solving a series of modified problems on embedded spanning trees, it computes the conditional means with an efficiency comparable to or better than other techniques. Unlike other methods, the embedded trees algorithm also computes exact error covariances. The error covariance computation is most efficient for graphs in which removing a small number of edges reveals an embedded tree. In this context, we demonstrate that sparse loopy graphs can provide a significant increase in modeling power relative to trees, with only a minor increase in estimation complexity. 1

5 0.42276028 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach

Author: Gal Elidan, Noam Lotner, Nir Friedman, Daphne Koller

Abstract: A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. As such, they induce seemingly complex dependencies among the latter. In recent years, much attention has been devoted to the development of algorithms for learning parameters, and in some cases structure, in the presence of hidden variables. In this paper, we address the related problem of detecting hidden variables that interact with the observed variables. This problem is of interest both for improving our understanding of the domain and as a preliminary step that guides the learning procedure towards promising models. A very natural approach is to search for

6 0.41807565 14 nips-2000-A Variational Mean-Field Theory for Sigmoidal Belief Networks

7 0.41530153 80 nips-2000-Learning Switching Linear Models of Human Motion

8 0.38909891 114 nips-2000-Second Order Approximations for Probability Models

9 0.35784674 47 nips-2000-Error-correcting Codes on a Bethe-like Lattice

10 0.34295195 62 nips-2000-Generalized Belief Propagation

11 0.3327938 94 nips-2000-On Reversing Jensen's Inequality

12 0.31984416 53 nips-2000-Feature Correspondence: A Markov Chain Monte Carlo Approach

13 0.30468702 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks

14 0.30250382 85 nips-2000-Mixtures of Gaussian Processes

15 0.29759648 1 nips-2000-APRICODD: Approximate Policy Construction Using Decision Diagrams

16 0.2729111 13 nips-2000-A Tighter Bound for Graphical Models

17 0.2691057 120 nips-2000-Sparse Greedy Gaussian Process Regression

18 0.25109398 132 nips-2000-The Interplay of Symbolic and Subsymbolic Processes in Anagram Problem Solving

19 0.24960561 103 nips-2000-Probabilistic Semantic Video Indexing

20 0.24729681 27 nips-2000-Automatic Choice of Dimensionality for PCA


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.027), (17, 0.094), (32, 0.022), (33, 0.036), (48, 0.388), (54, 0.032), (55, 0.039), (62, 0.022), (65, 0.045), (67, 0.029), (76, 0.057), (79, 0.019), (81, 0.024), (90, 0.035), (91, 0.03), (97, 0.016)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.85955983 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks

Author: Brendan J. Frey, Relu Patrascu, Tommi Jaakkola, Jodi Moran

Abstract: An important class of problems can be cast as inference in noisyOR Bayesian networks, where the binary state of each variable is a logical OR of noisy versions of the states of the variable's parents. For example, in medical diagnosis, the presence of a symptom can be expressed as a noisy-OR of the diseases that may cause the symptom - on some occasions, a disease may fail to activate the symptom. Inference in richly-connected noisy-OR networks is intractable, but approximate methods (e .g., variational techniques) are showing increasing promise as practical solutions. One problem with most approximations is that they tend to concentrate on a relatively small number of modes in the true posterior, ignoring other plausible configurations of the hidden variables. We introduce a new sequential variational method for bipartite noisyOR networks, that favors including all modes of the true posterior and models the posterior distribution as a tree. We compare this method with other approximations using an ensemble of networks with network statistics that are comparable to the QMR-DT medical diagnostic network. 1 Inclusive variational approximations Approximate algorithms for probabilistic inference are gaining in popularity and are now even being incorporated into VLSI hardware (T. Richardson, personal communication). Approximate methods include variational techniques (Ghahramani and Jordan 1997; Saul et al. 1996; Frey and Hinton 1999; Jordan et al. 1999), local probability propagation (Gallager 1963; Pearl 1988; Frey 1998; MacKay 1999a; Freeman and Weiss 2001) and Markov chain Monte Carlo (Neal 1993; MacKay 1999b). Many algorithms have been proposed in each of these classes. One problem that most of the above algorithms suffer from is a tendency to concentrate on a relatively small number of modes of the target distribution (the distribution being approximated). In the case of medical diagnosis, different modes correspond to different explanations of the symptoms. Markov chain Monte Carlo methods are usually guaranteed to eventually sample from all the modes, but this may take an extremely long time, even when tempered transitions (Neal 1996) are (a) ,,

2 0.71276677 44 nips-2000-Efficient Learning of Linear Perceptrons

Author: Shai Ben-David, Hans-Ulrich Simon

Abstract: We consider the existence of efficient algorithms for learning the class of half-spaces in ~n in the agnostic learning model (Le., making no prior assumptions on the example-generating distribution). The resulting combinatorial problem - finding the best agreement half-space over an input sample - is NP hard to approximate to within some constant factor. We suggest a way to circumvent this theoretical bound by introducing a new measure of success for such algorithms. An algorithm is IL-margin successful if the agreement ratio of the half-space it outputs is as good as that of any half-space once training points that are inside the IL-margins of its separating hyper-plane are disregarded. We prove crisp computational complexity results with respect to this success measure: On one hand, for every positive IL, there exist efficient (poly-time) IL-margin successful learning algorithms. On the other hand, we prove that unless P=NP, there is no algorithm that runs in time polynomial in the sample size and in 1/ IL that is IL-margin successful for all IL> O. 1

3 0.35140204 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation

Author: Brendan J. Frey, Anitha Kannan

Abstract: One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. probability propagation algorithm), while ignoring the fact that the graph has cycles. The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions - including the sigmoid function - direct application of the sum-product algorithm is not possible. We introduce

4 0.33690345 122 nips-2000-Sparse Representation for Gaussian Process Models

Author: Lehel Csató, Manfred Opper

Abstract: We develop an approach for a sparse representation for Gaussian Process (GP) models in order to overcome the limitations of GPs caused by large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the model. Experimental results on toy examples and large real-world data sets indicate the efficiency of the approach.

5 0.32859552 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

Author: Zoubin Ghahramani, Matthew J. Beal

Abstract: Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. 1

6 0.32634154 10 nips-2000-A Productive, Systematic Framework for the Representation of Visual Structure

7 0.31625581 85 nips-2000-Mixtures of Gaussian Processes

8 0.31411597 74 nips-2000-Kernel Expansions with Unlabeled Examples

9 0.31330764 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm

10 0.31316483 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles

11 0.31140697 68 nips-2000-Improved Output Coding for Classification Using Continuous Relaxation

12 0.31100377 95 nips-2000-On a Connection between Kernel PCA and Metric Multidimensional Scaling

13 0.31009573 37 nips-2000-Convergence of Large Margin Separable Linear Classification

14 0.31004903 4 nips-2000-A Linear Programming Approach to Novelty Detection

15 0.3093113 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics

16 0.30835772 60 nips-2000-Gaussianization

17 0.30832651 107 nips-2000-Rate-coded Restricted Boltzmann Machines for Face Recognition

18 0.3068974 119 nips-2000-Some New Bounds on the Generalization Error of Combined Classifiers

19 0.30669439 36 nips-2000-Constrained Independent Component Analysis

20 0.30382058 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks