nips nips2000 nips2000-140 knowledge-graph by maker-knowledge-mining

140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles


Source: pdf

Author: Martin J. Wainwright, Erik B. Sudderth, Alan S. Willsky

Abstract: We present the embedded trees algorithm, an iterative technique for estimation of Gaussian processes defined on arbitrary graphs. By exactly solving a series of modified problems on embedded spanning trees, it computes the conditional means with an efficiency comparable to or better than other techniques. Unlike other methods, the embedded trees algorithm also computes exact error covariances. The error covariance computation is most efficient for graphs in which removing a small number of edges reveals an embedded tree. In this context, we demonstrate that sparse loopy graphs can provide a significant increase in modeling power relative to trees, with only a minor increase in estimation complexity. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract: We present the embedded trees algorithm, an iterative technique for estimation of Gaussian processes defined on arbitrary graphs. [sent-5, score-0.893]

2 By exactly solving a series of modified problems on embedded spanning trees, it computes the conditional means with an efficiency comparable to or better than other techniques. [sent-6, score-0.739]

3 Unlike other methods, the embedded trees algorithm also computes exact error covariances. [sent-7, score-0.848]

4 The error covariance computation is most efficient for graphs in which removing a small number of edges reveals an embedded tree. [sent-8, score-1.026]

5 In this context, we demonstrate that sparse loopy graphs can provide a significant increase in modeling power relative to trees, with only a minor increase in estimation complexity. [sent-9, score-0.819]

6 In modeling stochastic processes with graphical models, two basic problems arise: (i) specifying a class of graphs with which to model or approximate the process; and (ii) determining efficient techniques for statistical inference. [sent-11, score-0.707]

7 At one extreme are tree-structured graphs: although they lead to highly efficient algorithms for estimation [1, 2], their modeling power is often limited. [sent-13, score-0.356]

8 The addition of edges to the graph tends to increase modeling power, but also introduces loops that necessitate the use of more sophisticated and costly techniques for estimation. [sent-14, score-0.49]

9 In this context, Gaussian processes on graphs are of great practical significance. [sent-17, score-0.371]

10 Moreover, the Gaussian case provides a valuable setting for developing an understanding of estimation algorithms [6, 7]. [sent-18, score-0.156]

11 The focus of this paper is the estimation and modeling of Gaussian processes defined on graphs with cycles. [sent-19, score-0.603]

12 We first develop an estimation algorithm that is based on exploiting trees embedded within the loopy graph. [sent-20, score-0.938]

13 Given a set of noisy measurements, this embedded trees (ET) algorithm computes the conditional means with an efficiency comparable to or better than other techniques. [sent-21, score-1.029]

14 Unlike other methods, the ET algorithm also computes exact error covariances at each node. [sent-22, score-0.401]

15 In many applications, these error statistics are as important as the conditional means. [sent-23, score-0.209]

16 We then demonstrate by example that relative to tree models, graphs with a small number of loops can lead to substantial improvements in modeling fidelity without a significant increase in estimation complexity. [sent-24, score-1.023]

17 1 Linear estimation fundamentals. Problem formulation. Consider a Gaussian stochastic process x ∼ N(0, P) that is Markov with respect to an undirected graph G. [sent-26, score-0.315]

18 If the inverse covariance P^{-1} is partitioned into blocks according to the state dimensions, the (i, j)th block can be nonzero only if there is an edge between nodes i and j. [sent-30, score-0.318]

19 For estimation purposes, we are interested in p(x_i | y), the marginal distribution of the state at each node conditioned on the noisy observations. [sent-33, score-0.238]

20 Standard formulas exist for the computation of p(x | y) = N(x̂, P̂): x̂ = P̂ C^T R^{-1} y, P̂ = [P^{-1} + C^T R^{-1} C]^{-1} (1). The conditional error covariances P̂_i are the block diagonal elements of the full error covariance P̂, where the block sizes are equal to the state dimensions. [sent-34, score-0.745]
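
To make equation (1) concrete, here is a minimal NumPy sketch (not from the paper; the function name and arguments are hypothetical) that evaluates the conditional mean and full error covariance by direct inversion; this is the brute-force baseline whose cost the embedded trees algorithm is designed to avoid on large graphs.

```python
import numpy as np

def gaussian_posterior_direct(P_inv, C, R_inv, y):
    """Brute-force evaluation of equation (1):
        P_hat = [P^{-1} + C^T R^{-1} C]^{-1},   x_hat = P_hat C^T R^{-1} y.
    P_inv: graph-structured inverse prior covariance; C: observation matrix;
    R_inv: inverse observation-noise covariance; y: stacked measurements."""
    J = P_inv + C.T @ R_inv @ C          # posterior information matrix
    P_hat = np.linalg.inv(J)             # full error covariance (cubic cost)
    x_hat = P_hat @ (C.T @ (R_inv @ y))  # conditional mean
    return x_hat, P_hat
```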

21 2 Exploiting graph structure. When G is tree-structured, both the conditional means and error covariances can be computed by a direct and very efficient O(d^3 N) algorithm [2]. [sent-36, score-0.985]

22 This algorithm is a generalization of classic Kalman smoothing algorithms for time series, and involves passing means and covariances to and from a node chosen as the root. [sent-38, score-0.425]

23 For graphs with cycles, calculating the full error covariance P̂ by brute-force matrix inversion would, in principle, provide the conditional means and error variances. [sent-39, score-0.874]

24 This motivates the development of iterative techniques for linear estimation on graphs with cycles. [sent-41, score-0.509]

25 Recently, two groups [6, 7] have analyzed Pearl's belief propagation [1] in application to Gaussian processes defined on loopy graphs. [sent-42, score-0.422]

26 For Gaussians on trees, belief propagation produces results equivalent to the Kalman smoother of Chou et al. [sent-43, score-0.271]

27 For graphs with cycles, these groups showed that when belief propagation converges, it computes the correct conditional means, but that error covariances are incorrect. [sent-45, score-0.893]

28 The complexity per iteration of belief propagation on loopy graphs is O(d^3 N), where one iteration corresponds to updating each message once. [sent-46, score-0.714]

29 Embedded trees produced by two different cutting matrices K_i for a nearest-neighbor grid (observation nodes not shown). [sent-48, score-0.725]

30 It is important to note that conditional means can be efficiently calculated using techniques from numerical linear algebra [9]. [sent-49, score-0.296]

31 In particular, it can be seen from equation (1) that computing the conditional mean x̂ is equivalent to computing the product of a matrix inverse and a vector. [sent-50, score-0.256]

32 However, like belief propagation, such techniques compute only the means and not the error covariances. [sent-52, score-0.329]
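
As a brief illustration of this point (a sketch, not part of the paper; SciPy's conjugate gradient solver is used and the wrapper name is hypothetical), the conditional mean alone can be obtained by handing the linear system of equation (1) to an off-the-shelf iterative solver:

```python
import numpy as np
from scipy.sparse.linalg import cg

def conditional_mean_cg(P_inv, C, R_inv, y):
    """Solve (P^{-1} + C^T R^{-1} C) x_hat = C^T R^{-1} y with conjugate gradients.
    Like belief propagation, this yields only the conditional mean, not the
    error covariances."""
    J = P_inv + C.T @ R_inv @ C
    b = C.T @ (R_inv @ y)
    x_hat, info = cg(J, b)   # info == 0 indicates successful convergence
    return x_hat
```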

33 1 Embedded trees algorithm. Calculation of means. In this section, we present an iterative algorithm for computing both the conditional means and error covariances of a Gaussian process defined on any graph. [sent-54, score-1.11]

34 Central to the algorithm is the operation of cutting edges from a loopy graph to reveal an embedded tree. [sent-55, score-1.044]

35 Standard tree algorithms [2] can be used to exactly solve the modified problem, and the results are used in a subsequent iteration. [sent-56, score-0.341]

36 For a Gaussian process on a graph, the operation of removing edges corresponds to modifying the inverse covariance matrix. [sent-57, score-0.3]

37 Specifically, we apply a matrix splitting P^{-1} + C^T R^{-1} C = P_{tree(t)}^{-1} - K_t + C^T R^{-1} C, where K_t is a symmetric cutting matrix chosen to ensure that P_{tree(t)}^{-1} corresponds to a valid tree-structured inverse covariance matrix. [sent-58, score-0.56]
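
As a simplified illustration of how such a splitting might be constructed (scalar nodes, d = 1; this is one valid choice, not necessarily the construction used in the paper, and all names are hypothetical), one can pick a spanning tree, cancel the off-tree entries of P^{-1}, and load the diagonal so that K_t is symmetric positive semidefinite, which keeps P_{tree(t)}^{-1} = P^{-1} + K_t positive definite and tree-structured:

```python
import numpy as np
from collections import deque

def spanning_tree(n_nodes, edges):
    """Breadth-first-search spanning tree of a connected undirected graph."""
    adj = {i: [] for i in range(n_nodes)}
    for i, j in edges:
        adj[i].append(j); adj[j].append(i)
    seen, tree, queue = {0}, set(), deque([0])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v); tree.add((min(u, v), max(u, v))); queue.append(v)
    return tree

def cutting_matrix(P_inv, edges, tree_edges):
    """One valid cutting matrix K_t (scalar nodes): cancel each cut edge of P^{-1}
    and add |P^{-1}_ij| to the two touched diagonal entries, so K_t is symmetric
    positive semidefinite and P_inv + K_t is tree-structured."""
    K = np.zeros_like(P_inv)
    cut = {(min(i, j), max(i, j)) for i, j in edges} - tree_edges
    for i, j in cut:
        K[i, j] = K[j, i] = -P_inv[i, j]
        K[i, i] += abs(P_inv[i, j]); K[j, j] += abs(P_inv[i, j])
    return K
```

The positive semidefinite diagonal loading matches the sufficient condition of Theorem 1 below; as noted there, indefinite cutting matrices (e.g., no diagonal loading) often converge faster in practice.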

38 This matrix splitting allows us to consider defining a sequence of iterates {x̂^n} by the recursion: [P_{tree(t(n))}^{-1} + C^T R^{-1} C] x̂^n = K_{t(n)} x̂^{n-1} + C^T R^{-1} y (2). Here t(n) indexes the embedded tree used in the nth iteration. [sent-59, score-0.834]

39 For example, Figure 1 shows two of the many spanning trees embedded in a nearest-neighbor grid. [sent-60, score-0.649]

40 By comparing equation (2) to equation (1), it can be seen that computing the nth iterate corresponds to a linear-Gaussian problem, which can be solved efficiently and directly with standard tree algorithms [2]. [sent-63, score-0.533]
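
A minimal dense-matrix sketch of the resulting iteration (illustrative only; a real implementation would replace the generic linear solve with the O(d^3 N) tree smoother of [2], and the names below are hypothetical):

```python
import numpy as np

def embedded_trees_means(P_inv, C, R_inv, y, K_list, n_iters=100):
    """Embedded trees recursion of equation (2):
        [P_tree(t(n))^{-1} + C^T R^{-1} C] x^n = K_t(n) x^{n-1} + C^T R^{-1} y,
    with P_tree(t)^{-1} = P^{-1} + K_t and the trees cycled in a fixed order."""
    J = P_inv + C.T @ R_inv @ C                 # shared information matrix
    b = C.T @ (R_inv @ y)
    x = np.zeros(J.shape[0])                    # starting point x^0 = 0
    for n in range(n_iters):
        K = K_list[n % len(K_list)]             # embedded tree used at this step
        x = np.linalg.solve(J + K, K @ x + b)   # stand-in for the exact tree solver
    return x
```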

41 2 Convergence of means. Before stating some convergence results, recall that for any matrix A, the spectral radius is defined as ρ(A) ≜ max_i |λ_i(A)|, the maximum eigenvalue magnitude. [sent-65, score-0.245]

42 Let x̂ be the conditional mean of the original problem on the loopy graph, and consider the sequence of iterates {x̂^n} generated by equation (2). [sent-69, score-0.303]

43 Then for any starting point, x̂ is the unique fixed point of the recursion, and the error e^n ≜ x̂^n − x̂ obeys the dynamics e^n = [∏_{j=1}^{n} M_{tree(t(j))}^{-1} K_{t(j)}] e^0 (3). In a typical implementation of the algorithm, one cycles through the embedded trees in some fixed order, say t = 1, … [sent-70, score-0.944]

44 In this case, the convergence of the algorithm can be analyzed in terms of the product matrix A ≜ ∏_j M_{tree(j)}^{-1} K_j, taken over one full cycle of embedded trees. [sent-74, score-0.211]
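
On small problems the rate can be checked numerically; the sketch below (hypothetical helper, dense eigendecomposition) forms A for one cycle through the trees and returns its spectral radius:

```python
import numpy as np

def et_cycle_spectral_radius(J, K_list):
    """Spectral radius of A = prod_t M_tree(t)^{-1} K_t over one cycle of trees,
    where J = P^{-1} + C^T R^{-1} C and M_tree(t) = J + K_t.  A value below 1
    implies geometric convergence of the errors e^n in equation (3)."""
    A = np.eye(J.shape[0])
    for K in K_list:
        A = np.linalg.solve(J + K, K) @ A   # left-multiply by M_tree(t)^{-1} K_t
    return float(np.max(np.abs(np.linalg.eigvals(A))))
```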

45 Note that the cutting matrices K_t must be chosen in order to guarantee not only that P_{tree(t)}^{-1} is tree-structured but also that M_{tree(t)} ≜ P_{tree(t)}^{-1} + C^T R^{-1} C is positive definite. [sent-78, score-0.316]

46 The following theorem, adapted from results in [10], gives conditions guaranteeing the validity and convergence of the ET algorithm when cutting to a single tree. [sent-79, score-0.435]

47 Suppose the cutting matrix K is symmetric and positive semidefinite. [sent-82, score-0.336]

48 In particular, we have the bounds: λ_max(K) / (λ_max(K) + λ_max(Q)) ≤ ρ(M^{-1} K) ≤ λ_max(K) / (λ_max(K) + λ_min(Q)) (4). It should be noted that the conditions of this theorem are sufficient, but by no means necessary, to guarantee convergence of the ET algorithm. [sent-84, score-0.226]

49 In particular, we find that indefinite cutting matrices often lead to faster convergence. [sent-85, score-0.316]

50 Furthermore, Theorem 1 does not address the superior performance typically achieved by cycling through several embedded trees. [sent-86, score-0.342]

51 3 Calculation of error covariances. Although there exist a variety of iterative algorithms for computing the conditional mean of a linear-Gaussian problem, none of these methods correctly compute error covariances at each node. [sent-89, score-0.775]

52 We show here that the ET algorithm can efficiently compute these covariances in an iterative fashion. [sent-90, score-0.321]

53 In many applications (e.g., oceanography [5]), these error statistics are as important as the estimates. [sent-93, score-0.157]

54 We assume for simplicity in notation that x̂^0 = 0 and then expand equation (2) to yield that for any iteration x̂^n = [F^n + M_{tree(t(n))}^{-1}] C^T R^{-1} y, where the matrix F^n satisfies the recursion F^n = M_{tree(t(n))}^{-1} K_{t(n)} [F^{n-1} + M_{tree(t(n-1))}^{-1}] (5) with the initial condition F^1 = 0. [sent-94, score-0.231]

55 It is straightforward to show that whenever the recursion for the conditional means in equation (2) converges, then the matrix sequence {F^n + M_{tree(t(n))}^{-1}} converges to the full error covariance P̂. [sent-95, score-0.589]
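
A dense sketch of this covariance recursion (again with hypothetical names, and without the low-rank bookkeeping that makes the actual algorithm efficient):

```python
import numpy as np

def embedded_trees_covariance(J, K_list, n_iters=100):
    """Iterate recursion (5), F^n = M_tree(t(n))^{-1} K_t(n) [F^{n-1} + M_tree(t(n-1))^{-1}],
    with F^1 = 0 and M_tree(t) = J + K_t, and return F^n + M_tree(t(n))^{-1},
    which converges to the full error covariance P_hat; its block diagonal
    gives the per-node error variances."""
    T = len(K_list)
    M_inv = [np.linalg.inv(J + K) for K in K_list]
    F = np.zeros_like(J)                                   # F^1 = 0
    for n in range(2, n_iters + 1):
        t, t_prev = (n - 1) % T, (n - 2) % T               # trees cycled in fixed order
        F = M_inv[t] @ K_list[t] @ (F + M_inv[t_prev])
    return F + M_inv[(n_iters - 1) % T]
```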

56 Moreover, the cutting matrices K are typically of low rank, say O(E) where E is the number of cut edges. [sent-96, score-0.35]

57 [Figure 2: plots of error versus iteration number for the compared algorithms; panel (b) is titled "Convergence of error variances".] [sent-100, score-0.159]

58 (a) Convergence rates for computing conditional means (normalized ℓ2 error). [sent-101, score-0.251]

59 (b) Convergence rate of ET algorithm for computing error variances. [sent-102, score-0.21]

60 The diagonal blocks of the low-rank representation may be easily extracted and added to the diagonal blocks of M_{tree(t(n))}^{-1}, which are computed by standard tree smoothers. [sent-106, score-0.512]

61 Altogether, we may obtain these error variances in O(d^3 E N) operations per iteration. [sent-107, score-0.159]

62 Thus, the computation of error variances will be particularly efficient for graphs where the number of edges E that must be cut is small compared to the total number of nodes N. [sent-108, score-0.779]

63 4 Results. We have applied the algorithm to a variety of graphs, ranging from graphs with single loops to densely connected MRFs on grids. [sent-110, score-0.442]

64 Figure 2(a) compares the rates of convergence for three algorithms: conjugate gradient (CG), embedded trees (ET), and belief propagation (BP) on a 20 x 20 nearest-neighbor grid. [sent-111, score-0.878]

65 The ET algorithm employed two embedded trees analogous to those shown in Figure 1. [sent-112, score-0.672]

66 Figure 2(b) shows that in contrast to CG and BP, the ET algorithm can also be used to compute the conditional error variances, where the convergence rate is again geometric. [sent-117, score-0.364]

67 1 Modeling using graphs with cycles. Issues in model design. A variety of graphical structures may be used to approximate a given stochastic process. [sent-119, score-0.477]

68 Here, additional "coarse scale" nodes that are not directly linked to any measurements have been added to the graph. [sent-124, score-0.236]

69 These nodes are auxiliary variables created to explain the "fine scale" stochastic process of interest. [sent-125, score-0.275]

70 If properly designed, the resulting tree structure … [Figure 3 legend: auxiliary nodes, fine-scale nodes, and observation nodes are marked in panels (a)–(c).] [sent-126, score-0.707]

71 (e) Error |P − P_tree| between desired covariance and realized tree covariance. [sent-145, score-0.484]

72 (f) Error |P − P_loop| between desired covariance and covariance realized with loopy graph. [sent-146, score-0.422]

73 In previous work, our group has developed efficient algorithms for estimation and stochastic realization using such multiscale tree models [2, 4, 5, 12]. [sent-148, score-0.831]

74 The gains provided by multiscale models are especially impressive when quadtrees are used to approximate two-dimensional Markov random fields. [sent-149, score-0.252]

75 While statistical inference on MRFs is notoriously difficult, estimation on quadtrees remains extremely efficient. [sent-150, score-0.217]

76 The most significant weakness of tree models is boundary artifacts. [sent-151, score-0.34]

77 That is, leaf nodes that are adjacent in the original process may be widely separated in the tree structure (see Figure 3(b)). [sent-152, score-0.494]

78 Increasing the state dimension d of the hidden nodes will reduce blockiness, but will also reduce estimation efficiency, which is O(d^3 N) in total. [sent-154, score-0.351]

79 One potential solution is to add edges between pairs of fine scale nodes where tree artifacts are likely to arise, as shown in Figure 3(c). [sent-155, score-0.652]

80 Such edges should be able to account for short-range dependency neglected by a tree model. [sent-156, score-0.428]

81 2 Application to multiscale modeling. Consider a one-dimensional process of length 32 with exact covariance P shown in Figure 3(d). [sent-159, score-0.146]

82 We approximate this process using two different graphical models, a multiscale tree and a "near-tree" containing an additional edge between two fine-scale nodes across a tree boundary (see Figure 3(c)). [sent-160, score-1.132]

83 In both models, the state dimension at each node is constrained to be 2; therefore, the finest scale contains 16 nodes to model all 32 process points. [sent-161, score-0.378]

84 Figure 3(e) shows the absolute error |P − P_tree| for the tree model, where realization was performed by the scale-recursive algorithm presented in [12]. [sent-162, score-0.843]

85 The tree model matches the desired process statistics relatively well except at the center, where the tree structure causes a boundary artifact. [sent-163, score-0.73]

86 Figure 3(f) shows the absolute error |P − P_loop| for a graph obtained by adding a single edge across the largest fine-scale tree boundary. [sent-164, score-0.541]

87 The addition reduces the peak error by 60%, a substantial gain in modeling fidelity. [sent-165, score-0.25]

88 If the ET algorithm is implemented by cutting to two different embedded trees, it converges extremely rapidly with rate γ = 0. [sent-166, score-0.772]

89 5 Discussion. This paper makes contributions to both the estimation and modeling of Gaussian processes on graphs. [sent-168, score-0.333]

90 First, we developed the embedded trees algorithm for estimation of Gaussian processes on arbitrary graphs. [sent-169, score-0.895]

91 In contrast to other techniques, our algorithm computes both means and error covariances. [sent-170, score-0.336]

92 Even on densely connected graphs, our algorithm is comparable to or better than other techniques for computing means. [sent-171, score-0.25]

93 The error covariance computation is especially efficient for graphs in which cutting a small number of edges reveals an embedded tree. [sent-172, score-1.306]

94 In this context, we have shown that modeling with sparsely connected loopy graphs can lead to substantial gains in modeling fidelity, with a minor increase in estimation complexity. [sent-173, score-0.883]

95 From the results of this paper arise a number of fundamental questions about the trade-off between modeling fidelity and estimation complexity. [sent-174, score-0.342]

96 In order to address these questions, we are currently working to develop tighter bounds on the convergence rate of the algorithm, and also considering techniques for optimally selecting edges to be removed. [sent-175, score-0.302]

97 On the modeling side, we are expanding on previous work for trees [12] in order to develop a theory of stochastic realization for processes on graphs with cycles. [sent-176, score-0.867]

98 Correctness of belief propagation in Gaussian graphical models of arbitrary topology. [sent-230, score-0.24]

99 Embedded trees for modeling and estimation of Gaussian processes on graphs with cycles. [sent-258, score-0.87]

100 Computationally efficient stochastic realization for internal multiscale autoregressive models. [sent-264, score-0.368]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('embedded', 0.342), ('tree', 0.307), ('cutting', 0.28), ('graphs', 0.27), ('trees', 0.267), ('multiscale', 0.196), ('covariances', 0.162), ('loopy', 0.144), ('nodes', 0.142), ('estimation', 0.122), ('edges', 0.121), ('willsky', 0.112), ('modeling', 0.11), ('conditional', 0.108), ('propagation', 0.102), ('error', 0.101), ('covariance', 0.101), ('processes', 0.101), ('means', 0.097), ('amax', 0.097), ('graph', 0.094), ('et', 0.094), ('convergence', 0.092), ('mt', 0.09), ('cycles', 0.09), ('recursion', 0.078), ('computes', 0.075), ('belief', 0.075), ('node', 0.069), ('ip', 0.068), ('fidelity', 0.066), ('realization', 0.065), ('algorithm', 0.063), ('graphical', 0.063), ('iterative', 0.061), ('loops', 0.061), ('variances', 0.058), ('bp', 0.057), ('cg', 0.057), ('gaussian', 0.056), ('chou', 0.056), ('karl', 0.056), ('mrfs', 0.056), ('oceanography', 0.056), ('quadtrees', 0.056), ('sudderth', 0.056), ('techniques', 0.056), ('matrix', 0.056), ('stochastic', 0.054), ('proposition', 0.054), ('efficient', 0.053), ('xn', 0.052), ('kalman', 0.051), ('iterates', 0.051), ('densely', 0.048), ('converges', 0.048), ('increase', 0.048), ('fine', 0.047), ('state', 0.047), ('computing', 0.046), ('en', 0.046), ('rank', 0.045), ('block', 0.045), ('blocks', 0.045), ('process', 0.045), ('iteration', 0.045), ('pt', 0.044), ('arise', 0.044), ('nth', 0.044), ('wainwright', 0.044), ('spanning', 0.04), ('inversion', 0.04), ('minor', 0.04), ('dimension', 0.04), ('efficiency', 0.04), ('substantial', 0.039), ('edge', 0.039), ('extremely', 0.039), ('desired', 0.038), ('march', 0.038), ('realized', 0.038), ('reveals', 0.038), ('comparable', 0.037), ('theorem', 0.037), ('power', 0.037), ('matrices', 0.036), ('efficiently', 0.035), ('scale', 0.035), ('diagonal', 0.035), ('splitting', 0.034), ('iterate', 0.034), ('cut', 0.034), ('ic', 0.034), ('auxiliary', 0.034), ('siam', 0.034), ('markov', 0.034), ('algorithms', 0.034), ('corresponds', 0.033), ('bounds', 0.033), ('boundary', 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999917 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles

Author: Martin J. Wainwright, Erik B. Sudderth, Alan S. Willsky

Abstract: We present the embedded trees algorithm, an iterative technique for estimation of Gaussian processes defined on arbitrary graphs. By exactly solving a series of modified problems on embedded spanning trees, it computes the conditional means with an efficiency comparable to or better than other techniques. Unlike other methods, the embedded trees algorithm also computes exact error covariances. The error covariance computation is most efficient for graphs in which removing a small number of edges reveals an embedded tree. In this context, we demonstrate that sparse loopy graphs can provide a significant increase in modeling power relative to trees, with only a minor increase in estimation complexity. 1

2 0.13830864 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks

Author: Brendan J. Frey, Relu Patrascu, Tommi Jaakkola, Jodi Moran

Abstract: An important class of problems can be cast as inference in noisyOR Bayesian networks, where the binary state of each variable is a logical OR of noisy versions of the states of the variable's parents. For example, in medical diagnosis, the presence of a symptom can be expressed as a noisy-OR of the diseases that may cause the symptom - on some occasions, a disease may fail to activate the symptom. Inference in richly-connected noisy-OR networks is intractable, but approximate methods (e .g., variational techniques) are showing increasing promise as practical solutions. One problem with most approximations is that they tend to concentrate on a relatively small number of modes in the true posterior, ignoring other plausible configurations of the hidden variables. We introduce a new sequential variational method for bipartite noisyOR networks, that favors including all modes of the true posterior and models the posterior distribution as a tree. We compare this method with other approximations using an ensemble of networks with network statistics that are comparable to the QMR-DT medical diagnostic network. 1 Inclusive variational approximations Approximate algorithms for probabilistic inference are gaining in popularity and are now even being incorporated into VLSI hardware (T. Richardson, personal communication). Approximate methods include variational techniques (Ghahramani and Jordan 1997; Saul et al. 1996; Frey and Hinton 1999; Jordan et al. 1999), local probability propagation (Gallager 1963; Pearl 1988; Frey 1998; MacKay 1999a; Freeman and Weiss 2001) and Markov chain Monte Carlo (Neal 1993; MacKay 1999b). Many algorithms have been proposed in each of these classes. One problem that most of the above algorithms suffer from is a tendency to concentrate on a relatively small number of modes of the target distribution (the distribution being approximated). In the case of medical diagnosis, different modes correspond to different explanations of the symptoms. Markov chain Monte Carlo methods are usually guaranteed to eventually sample from all the modes, but this may take an extremely long time, even when tempered transitions (Neal 1996) are (a) ,,

3 0.12953398 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

Author: Zoubin Ghahramani, Matthew J. Beal

Abstract: Variational approximations are becoming a widespread tool for Bayesian learning of graphical models. We provide some theoretical results for the variational updates in a very general family of conjugate-exponential graphical models. We show how the belief propagation and the junction tree algorithms can be used in the inference step of variational Bayesian learning. Applying these results to the Bayesian analysis of linear-Gaussian state-space models we obtain a learning procedure that exploits the Kalman smoothing propagation, while integrating over all model parameters. We demonstrate how this can be used to infer the hidden state dimensionality of the state-space model in a variety of synthetic problems and one real high-dimensional data set. 1

4 0.11508384 114 nips-2000-Second Order Approximations for Probability Models

Author: Hilbert J. Kappen, Wim Wiegerinck

Abstract: In this paper, we derive a second order mean field theory for directed graphical probability models. By using an information theoretic argument it is shown how this can be done in the absense of a partition function. This method is a direct generalisation of the well-known TAP approximation for Boltzmann Machines. In a numerical example, it is shown that the method greatly improves the first order mean field approximation. For a restricted class of graphical models, so-called single overlap graphs, the second order method has comparable complexity to the first order method. For sigmoid belief networks, the method is shown to be particularly fast and effective.

5 0.11299686 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach

Author: Gal Elidan, Noam Lotner, Nir Friedman, Daphne Koller

Abstract: A serious problem in learning probabilistic models is the presence of hidden variables. These variables are not observed, yet interact with several of the observed variables. As such, they induce seemingly complex dependencies among the latter. In recent years, much attention has been devoted to the development of algorithms for learning parameters, and in some cases structure, in the presence of hidden variables. In this paper, we address the related problem of detecting hidden variables that interact with the observed variables. This problem is of interest both for improving our understanding of the domain and as a preliminary step that guides the learning procedure towards promising models. A very natural approach is to search for

6 0.10999725 62 nips-2000-Generalized Belief Propagation

7 0.10420898 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation

8 0.097753696 47 nips-2000-Error-correcting Codes on a Bethe-like Lattice

9 0.097293399 120 nips-2000-Sparse Greedy Gaussian Process Regression

10 0.09480717 77 nips-2000-Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations

11 0.09034767 103 nips-2000-Probabilistic Semantic Video Indexing

12 0.089327425 122 nips-2000-Sparse Representation for Gaussian Process Models

13 0.082573511 148 nips-2000-`N-Body' Problems in Statistical Learning

14 0.081934348 51 nips-2000-Factored Semi-Tied Covariance Matrices

15 0.075574771 1 nips-2000-APRICODD: Approximate Policy Construction Using Decision Diagrams

16 0.07557223 49 nips-2000-Explaining Away in Weight Space

17 0.071199745 145 nips-2000-Weak Learners and Improved Rates of Convergence in Boosting

18 0.070582472 17 nips-2000-Active Learning for Parameter Estimation in Bayesian Networks

19 0.06820076 65 nips-2000-Higher-Order Statistical Properties Arising from the Non-Stationarity of Natural Signals

20 0.067963071 26 nips-2000-Automated State Abstraction for Options using the U-Tree Algorithm


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.249), (1, -0.004), (2, 0.145), (3, -0.055), (4, 0.155), (5, 0.015), (6, -0.0), (7, 0.028), (8, -0.077), (9, 0.109), (10, -0.018), (11, 0.16), (12, -0.075), (13, 0.031), (14, -0.035), (15, 0.019), (16, -0.067), (17, 0.301), (18, -0.008), (19, 0.081), (20, 0.03), (21, 0.12), (22, 0.033), (23, 0.067), (24, 0.007), (25, -0.045), (26, 0.001), (27, 0.038), (28, -0.075), (29, -0.035), (30, -0.06), (31, 0.025), (32, -0.063), (33, -0.028), (34, 0.011), (35, -0.084), (36, -0.097), (37, 0.11), (38, -0.094), (39, -0.069), (40, -0.09), (41, -0.023), (42, -0.012), (43, 0.049), (44, 0.039), (45, 0.126), (46, 0.025), (47, -0.076), (48, 0.088), (49, 0.119)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96816164 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles

Author: Martin J. Wainwright, Erik B. Sudderth, Alan S. Willsky

Abstract: We present the embedded trees algorithm, an iterative technique for estimation of Gaussian processes defined on arbitrary graphs. By exactly solving a series of modified problems on embedded spanning trees, it computes the conditional means with an efficiency comparable to or better than other techniques. Unlike other methods, the embedded trees algorithm also computes exact error covariances. The error covariance computation is most efficient for graphs in which removing a small number of edges reveals an embedded tree. In this context, we demonstrate that sparse loopy graphs can provide a significant increase in modeling power relative to trees, with only a minor increase in estimation complexity. 1

2 0.5853489 115 nips-2000-Sequentially Fitting ``Inclusive'' Trees for Inference in Noisy-OR Networks

Author: Brendan J. Frey, Relu Patrascu, Tommi Jaakkola, Jodi Moran

Abstract: An important class of problems can be cast as inference in noisyOR Bayesian networks, where the binary state of each variable is a logical OR of noisy versions of the states of the variable's parents. For example, in medical diagnosis, the presence of a symptom can be expressed as a noisy-OR of the diseases that may cause the symptom - on some occasions, a disease may fail to activate the symptom. Inference in richly-connected noisy-OR networks is intractable, but approximate methods (e .g., variational techniques) are showing increasing promise as practical solutions. One problem with most approximations is that they tend to concentrate on a relatively small number of modes in the true posterior, ignoring other plausible configurations of the hidden variables. We introduce a new sequential variational method for bipartite noisyOR networks, that favors including all modes of the true posterior and models the posterior distribution as a tree. We compare this method with other approximations using an ensemble of networks with network statistics that are comparable to the QMR-DT medical diagnostic network. 1 Inclusive variational approximations Approximate algorithms for probabilistic inference are gaining in popularity and are now even being incorporated into VLSI hardware (T. Richardson, personal communication). Approximate methods include variational techniques (Ghahramani and Jordan 1997; Saul et al. 1996; Frey and Hinton 1999; Jordan et al. 1999), local probability propagation (Gallager 1963; Pearl 1988; Frey 1998; MacKay 1999a; Freeman and Weiss 2001) and Markov chain Monte Carlo (Neal 1993; MacKay 1999b). Many algorithms have been proposed in each of these classes. One problem that most of the above algorithms suffer from is a tendency to concentrate on a relatively small number of modes of the target distribution (the distribution being approximated). In the case of medical diagnosis, different modes correspond to different explanations of the symptoms. Markov chain Monte Carlo methods are usually guaranteed to eventually sample from all the modes, but this may take an extremely long time, even when tempered transitions (Neal 1996) are (a) ,,

3 0.54748976 15 nips-2000-Accumulator Networks: Suitors of Local Probability Propagation

Author: Brendan J. Frey, Anitha Kannan

Abstract: One way to approximate inference in richly-connected graphical models is to apply the sum-product algorithm (a.k.a. probability propagation algorithm), while ignoring the fact that the graph has cycles. The sum-product algorithm can be directly applied in Gaussian networks and in graphs for coding, but for many conditional probability functions - including the sigmoid function - direct application of the sum-product algorithm is not possible. We introduce

4 0.5466696 93 nips-2000-On Iterative Krylov-Dogleg Trust-Region Steps for Solving Neural Networks Nonlinear Least Squares Problems

Author: Eiji Mizutani, James Demmel

Abstract: This paper describes a method of dogleg trust-region steps, or restricted Levenberg-Marquardt steps, based on a projection process onto the Krylov subspaces for neural networks nonlinear least squares problems. In particular, the linear conjugate gradient (CG) method works as the inner iterative algorithm for solving the linearized Gauss-Newton normal equation, whereas the outer nonlinear algorithm repeatedly takes so-called

5 0.51973128 62 nips-2000-Generalized Belief Propagation

Author: Jonathan S. Yedidia, William T. Freeman, Yair Weiss

Abstract: Belief propagation (BP) was only supposed to work for tree-like networks but works surprisingly well in many applications involving networks with loops, including turbo codes. However, there has been little understanding of the algorithm or the nature of the solutions it finds for general graphs. We show that BP can only converge to a stationary point of an approximate free energy, known as the Bethe free energy in statistical physics. This result characterizes BP fixed-points and makes connections with variational approaches to approximate inference. More importantly, our analysis lets us build on the progress made in statistical physics since Bethe's approximation was introduced in 1935. Kikuchi and others have shown how to construct more accurate free energy approximations, of which Bethe's approximation is the simplest. Exploiting the insights from our analysis, we derive generalized belief propagation (GBP) versions ofthese Kikuchi approximations. These new message passing algorithms can be significantly more accurate than ordinary BP, at an adjustable increase in complexity. We illustrate such a new GBP algorithm on a grid Markov network and show that it gives much more accurate marginal probabilities than those found using ordinary BP. 1

6 0.47932774 120 nips-2000-Sparse Greedy Gaussian Process Regression

7 0.46009421 148 nips-2000-`N-Body' Problems in Statistical Learning

8 0.44950181 103 nips-2000-Probabilistic Semantic Video Indexing

9 0.41832647 1 nips-2000-APRICODD: Approximate Policy Construction Using Decision Diagrams

10 0.41219604 47 nips-2000-Error-correcting Codes on a Bethe-like Lattice

11 0.39872441 77 nips-2000-Learning Curves for Gaussian Processes Regression: A Framework for Good Approximations

12 0.39084831 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

13 0.3891601 41 nips-2000-Discovering Hidden Variables: A Structure-Based Approach

14 0.37581897 49 nips-2000-Explaining Away in Weight Space

15 0.36845672 85 nips-2000-Mixtures of Gaussian Processes

16 0.36376911 114 nips-2000-Second Order Approximations for Probability Models

17 0.35713527 51 nips-2000-Factored Semi-Tied Covariance Matrices

18 0.34299076 60 nips-2000-Gaussianization

19 0.32761738 122 nips-2000-Sparse Representation for Gaussian Process Models

20 0.31211343 79 nips-2000-Learning Segmentation by Random Walks


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(10, 0.021), (17, 0.112), (32, 0.022), (33, 0.053), (54, 0.342), (55, 0.02), (62, 0.047), (65, 0.033), (67, 0.064), (75, 0.01), (76, 0.06), (79, 0.017), (81, 0.039), (90, 0.041), (91, 0.025), (97, 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.90126735 140 nips-2000-Tree-Based Modeling and Estimation of Gaussian Processes on Graphs with Cycles

Author: Martin J. Wainwright, Erik B. Sudderth, Alan S. Willsky

Abstract: We present the embedded trees algorithm, an iterative technique for estimation of Gaussian processes defined on arbitrary graphs. By exactly solving a series of modified problems on embedded spanning trees, it computes the conditional means with an efficiency comparable to or better than other techniques. Unlike other methods, the embedded trees algorithm also computes exact error covariances. The error covariance computation is most efficient for graphs in which removing a small number of edges reveals an embedded tree. In this context, we demonstrate that sparse loopy graphs can provide a significant increase in modeling power relative to trees, with only a minor increase in estimation complexity. 1

2 0.8818804 35 nips-2000-Computing with Finite and Infinite Networks

Author: Ole Winther

Abstract: Using statistical mechanics results, I calculate learning curves (average generalization error) for Gaussian processes (GPs) and Bayesian neural networks (NNs) used for regression. Applying the results to learning a teacher defined by a two-layer network, I can directly compare GP and Bayesian NN learning. I find that a GP in general requires CJ (d S )-training examples to learn input features of order s (d is the input dimension), whereas a NN can learn the task with order the number of adjustable weights training examples. Since a GP can be considered as an infinite NN, the results show that even in the Bayesian approach, it is important to limit the complexity of the learning machine. The theoretical findings are confirmed in simulations with analytical GP learning and a NN mean field algorithm.

3 0.71302986 10 nips-2000-A Productive, Systematic Framework for the Representation of Visual Structure

Author: Shimon Edelman, Nathan Intrator

Abstract: We describe a unified framework for the understanding of structure representation in primate vision. A model derived from this framework is shown to be effectively systematic in that it has the ability to interpret and associate together objects that are related through a rearrangement of common

4 0.62922442 122 nips-2000-Sparse Representation for Gaussian Process Models

Author: Lehel Csató, Manfred Opper

Abstract: We develop an approach for a sparse representation for Gaussian Process (GP) models in order to overcome the limitations of GPs caused by large data sets. The method is based on a combination of a Bayesian online algorithm together with a sequential construction of a relevant subsample of the data which fully specifies the prediction of the model. Experimental results on toy examples and large real-world data sets indicate the efficiency of the approach.

5 0.50387657 60 nips-2000-Gaussianization

Author: Scott Saobing Chen, Ramesh A. Gopinath

Abstract: High dimensional data modeling is difficult mainly because the so-called

6 0.47376657 93 nips-2000-On Iterative Krylov-Dogleg Trust-Region Steps for Solving Neural Networks Nonlinear Least Squares Problems

7 0.47003552 106 nips-2000-Propagation Algorithms for Variational Bayesian Learning

8 0.46915632 7 nips-2000-A New Approximate Maximal Margin Classification Algorithm

9 0.46857616 104 nips-2000-Processing of Time Series by Neural Circuits with Biologically Realistic Synaptic Dynamics

10 0.45860657 85 nips-2000-Mixtures of Gaussian Processes

11 0.4539603 98 nips-2000-Partially Observable SDE Models for Image Sequence Recognition Tasks

12 0.45142174 92 nips-2000-Occam's Razor

13 0.44858927 46 nips-2000-Ensemble Learning and Linear Response Theory for ICA

14 0.44467533 23 nips-2000-An Adaptive Metric Machine for Pattern Classification

15 0.4435676 94 nips-2000-On Reversing Jensen's Inequality

16 0.44136256 36 nips-2000-Constrained Independent Component Analysis

17 0.44019499 71 nips-2000-Interactive Parts Model: An Application to Recognition of On-line Cursive Script

18 0.43995017 61 nips-2000-Generalizable Singular Value Decomposition for Ill-posed Datasets

19 0.43859977 127 nips-2000-Structure Learning in Human Causal Induction

20 0.43585566 21 nips-2000-Algorithmic Stability and Generalization Performance