nips nips2005 nips2005-154 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: John D. Lafferty, Pradeep K. Ravikumar
Abstract: We present a family of approximation techniques for probabilistic graphical models, based on the use of graphical preconditioners developed in the scientific computing literature. Our framework yields rigorous upper and lower bounds on event probabilities and the log partition function of undirected graphical models, using non-iterative procedures that have low time complexity. As in mean field approaches, the approximations are built upon tractable subgraphs; however, we recast the problem of optimizing the tractable distribution parameters and approximate inference in terms of the well-studied linear systems problem of obtaining a good matrix preconditioner. Experiments are presented that compare the new approximation schemes to variational methods. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Our framework yields rigorous upper and lower bounds on event probabilities and the log partition function of undirected graphical models, using non-iterative procedures that have low time complexity. [sent-2, score-0.945]
2 As in mean field approaches, the approximations are built upon tractable subgraphs; however, we recast the problem of optimizing the tractable distribution parameters and approximate inference in terms of the well-studied linear systems problem of obtaining a good matrix preconditioner. [sent-3, score-0.549]
3 Experiments are presented that compare the new approximation schemes to variational methods. [sent-4, score-0.112]
4 One of the primary uses of approximate inference is to estimate the partition function and event probabilities for undirected graphical models, which are natural tools in many domains, from image processing to social network modeling. [sent-6, score-0.527]
5 A central challenge is to improve the accuracy of existing approximation methods, and to derive rigorous rather than heuristic bounds on probabilities in such graphical models. [sent-7, score-0.52]
6 We follow the variational mean field intuition of focusing on tractable subgraphs; however, we recast the problem of optimizing the tractable distribution parameters as a generalized linear system problem. [sent-9, score-0.508]
7 In this way, the task of deriving a tractable distribution conveniently reduces to the well-studied problem of obtaining a good preconditioner for a matrix (Boman and Hendrickson, 2003). [sent-10, score-0.828]
8 This framework has the added advantage that tighter bounds can be obtained by reducing the sparsity of the preconditioners, at the expense of increasing the time complexity for computing the approximation. [sent-11, score-0.266]
9 In Section 3, we outline the basic idea of our proposed framework, and explain how to use preconditioners for deriving tractable approximate distributions. [sent-13, score-0.548]
10 In Sections 3.1 and 4, we then describe the underlying theory, which we call the generalized support theory for graphical models. [sent-15, score-0.413]
11 In Section 5 we present experiments that compare the new approximation schemes to some of the standard variational and optimization based methods. [sent-16, score-0.147]
12 2 Notation and Background Consider a graph G = (V, E), where V denotes the set of nodes and E denotes the set of edges. [sent-17, score-0.16]
13 For traditional reasons, through connections with statistical physics, Z = exp Ψ(θ) is called the partition function. [sent-25, score-0.156]
14 As shown in (Yedidia et al., 2001), at the expense of increasing the state space one can assume without loss of generality that the graphical model is a pairwise Markov random field. [sent-27, score-0.238]
15 We shall assume a pairwise random field, and thus can express the potential function and parameter vectors in more compact form as matrices: Θ with entries θ11, ..., θnn and Φ(x) with entries φ11(x1, x1), ..., φnn(xn, xn). [sent-30, score-0.204]
16 In the following we will denote the trace of the product of two matrices A and B by the inner product ⟨A, B⟩. [sent-60, score-0.115]
17 Assuming that each Xi is finite-valued, the partition function Z(Θ) is then given by Z(Θ) = Σx∈χ exp⟨Θ, Φ(x)⟩. [sent-61, score-0.156]
18 Our goal is to obtain rigorous upper and lower bounds for this partition function, which can then be used to obtain rigorous upper and lower bounds for general event probabilities; this is discussed further in (Ravikumar and Lafferty, 2004). [sent-63, score-1.193]
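To make the object being bounded concrete, here is a minimal sketch (not code from the paper) that computes Z(Θ) by direct enumeration for a tiny pairwise model; the binary state space and the product-form potentials φij(xi, xj) = xi·xj are assumptions made purely for illustration.

```python
# Minimal sketch: Z(Theta) = sum_x exp <Theta, Phi(x)> for a tiny pairwise model.
# Assumptions (illustration only): binary variables, phi_ij(x_i, x_j) = x_i * x_j.
import itertools
import numpy as np

def log_potential(theta, edges, x):
    """<Theta, Phi(x)> restricted to pairwise terms on the given edges."""
    return sum(t * x[i] * x[j] for (i, j), t in zip(edges, theta))

def partition_function(theta, edges, n):
    """Exact Z by enumerating all 2^n configurations (feasible only for small n)."""
    return sum(np.exp(log_potential(theta, edges, x))
               for x in itertools.product([0, 1], repeat=n))

edges = [(0, 1), (1, 2), (0, 2)]       # a 3-node cycle
theta = [0.5, -1.0, 0.3]               # one parameter per edge
print(partition_function(theta, edges, 3))
```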
19 2.1 Preconditioners in Linear Systems Consider a linear system, Ax = c, where the variable x is n dimensional, and A is an n × n matrix with m non-zero entries. [sent-65, score-0.154]
20 Multiplying both sides of the linear system by the inverse of an invertible matrix B, we get an equivalent “preconditioned” system, B −1 Ax = B −1 c. [sent-67, score-0.184]
21 Such an approximating matrix B is called a preconditioner. [sent-69, score-0.154]
22 Recent developments in the theory of preconditioners are in part based on support graph theory, where the linear system matrix is viewed as the Laplacian of a graph, and graph-based techniques can be used to obtain good approximations. [sent-71, score-0.767]
23 While these methods require diagonally dominant matrices (Aii ≥ Σj≠i |Aij|), they yield “ultra-sparse” (tree plus a constant number of edges) preconditioners with a low condition number. [sent-72, score-0.632]
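As a rough illustration of this linear-algebra setting (not taken from the paper), the sketch below builds a small diagonally dominant matrix A, keeps only a tree subset of its off-diagonal entries as B, and compares the ordinary condition number of A with the generalized condition number λmax(A, B)/λmin(A, B) of the preconditioned pair.

```python
# Illustrative sketch: condition number of A versus the pencil (A, B), where B is a
# tree-sparsified, diagonally dominant approximation of A.  Toy matrices only.
import numpy as np
from scipy.linalg import eigh

A = np.array([[ 3., -1.,  0., -1.],
              [-1.,  3., -1.,  0.],
              [ 0., -1.,  3., -1.],
              [-1.,  0., -1.,  3.]])      # 4-cycle Laplacian plus a diagonal shift

B = A.copy()
B[0, 3] = B[3, 0] = 0.0                   # drop one edge: B's graph is a path (a tree)

gen_eigs = eigh(A, B, eigvals_only=True)  # generalized eigenvalues of (A, B)
print("kappa(A)    =", np.linalg.cond(A))
print("kappa(A, B) =", gen_eigs.max() / gen_eigs.min())
```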
24 In our experiments, we use two elementary tree-based preconditioners in this family: Vaidya's Spanning Tree preconditioner (Vaidya, 1990) and Gremban-Miller's Support Tree preconditioner (Gremban, 1996). [sent-73, score-1.411]
25 3 Graphical Model Preconditioners Our proposed framework follows the generalized mean field intuition of looking at sparse graph approximations of the original graph, but solving a different optimization problem. [sent-74, score-0.485]
26 Consider the graphical model with graph G, potential-function matrix Φ(x), and parameter matrix Θ. [sent-76, score-0.638]
27 For purposes of intuition, think of the graphical model “energy” ⟨Θ, Φ(x)⟩ as the quadratic form x⊤Θx. [sent-77, score-0.317]
28 If B approximates Θ well, then the condition number κ is small: κ(Θ, B) = max_x (x⊤Θx / x⊤Bx) / min_x (x⊤Θx / x⊤Bx) = λmax(Θ, B) / λmin(Θ, B). (4) This suggests the following procedure for approximate inference. [sent-79, score-0.467]
29 First, choose a matrix B that minimizes the condition number with Θ (rather than KL divergence as in mean-field). [sent-80, score-0.241]
30 Finally, use the scaled matrix B as the parameter matrix for approximate inference. [sent-82, score-0.415]
31 3.1 Generalized Eigenvalue Bounds Given a graphical model with graph G, potential-function matrix Φ(x), and parameter matrix Θ, our goal is to obtain parameter matrices ΘU and ΘL, corresponding to sparse graph approximations of G, such that Z(ΘL) ≤ Z(Θ) ≤ Z(ΘU). (5) [sent-85, score-0.996]
32 That is, the partition functions of the sparse graph parameter matrices ΘU and ΘL are upper and lower bounds, respectively, of the partition function of the original graph. [sent-86, score-0.739]
33 By monotonicity of exp, this stronger condition implies condition (5) on the partition function, by summing over the values of X. [sent-88, score-0.31]
34 However, this stronger condition will give us greater flexibility, and rigorous bounds for general event probabilities, since then exp⟨ΘL, Φ(x)⟩ / Z(ΘU) ≤ p(x; Θ) ≤ exp⟨ΘU, Φ(x)⟩ / Z(ΘL). (7) [sent-89, score-0.562]
35 In contrast, while variational methods give bounds on the log partition function, the derived bounds on general event probabilities via the variational parameters are only heuristic. [sent-90, score-0.84]
36 Focusing on the upper bound, for now we would like to obtain a graph G′ ∈ S (the family of candidate sparse graphs) with parameter matrix B, which approximates G, and whose partition function upper bounds the partition function of the original graph. [sent-92, score-1.049]
37 Following (6), we require ⟨Θ, Φ(x)⟩ ≤ ⟨B, Φ(x)⟩, such that G(B) ∈ S, (8) where G(B) denotes the graph corresponding to the parameter matrix B. [sent-93, score-0.321]
38 Now, we would like the distribution corresponding to B to be as close as possible to the distribution corresponding to Θ; that is, ⟨B, Φ(x)⟩ should not only upper bound ⟨Θ, Φ(x)⟩ but should be close to it. [sent-94, score-0.272]
39 In other words, while the upper bound requires that ⟨Θ, Φ(x)⟩ / ⟨B, Φ(x)⟩ ≤ 1, (9) we would like min_x ⟨Θ, Φ(x)⟩ / ⟨B, Φ(x)⟩ (10) to be as high as possible. [sent-96, score-0.272]
40 Expressing these desiderata in the form of an optimization problem, we have B⋆ = arg max_{B: G(B)∈S} min_x ⟨Θ, Φ(x)⟩ / ⟨B, Φ(x)⟩, such that ⟨Θ, Φ(x)⟩ / ⟨B, Φ(x)⟩ ≤ 1. (11) [sent-97, score-0.583]
41 Before solving this problem, we first make some definitions, which are generalized versions of standard concepts in linear systems theory. [sent-98, score-0.185]
42 For a pairwise Markov random field with potential function matrix Φ(x), the generalized eigenvalues of a pair of parameter matrices (A, B) are defined as λΦmax(A, B) = max_{x: ⟨B,Φ(x)⟩≠0} ⟨A, Φ(x)⟩ / ⟨B, Φ(x)⟩ (12) and λΦmin(A, B) = min_{x: ⟨B,Φ(x)⟩≠0} ⟨A, Φ(x)⟩ / ⟨B, Φ(x)⟩. (13) [sent-101, score-1.070]
43 Note that λΦmax(A, αB) = max_{x: ⟨αB,Φ(x)⟩≠0} ⟨A, Φ(x)⟩ / ⟨αB, Φ(x)⟩ = (1/α) max_{x: ⟨B,Φ(x)⟩≠0} ⟨A, Φ(x)⟩ / ⟨B, Φ(x)⟩ = α⁻¹ λΦmax(A, B). (14) [sent-102, score-0.597]
44 We state the basic properties of the generalized eigenvalues in the following lemma. [sent-103, score-0.413]
45 The generalized eigenvalues satisfy λΦmin(A, B) ≤ ⟨A, Φ(x)⟩ / ⟨B, Φ(x)⟩ ≤ λΦmax(A, B), (15) λΦmax(A, αB) = α⁻¹ λΦmax(A, B), (16) λΦmin(A, αB) = α⁻¹ λΦmin(A, B), (17) and λΦmin(A, B) = 1 / λΦmax(B, A). (18) [sent-106, score-1.053]
46 In the following, we will use A to generically denote the parameter matrix Θ of the model. [sent-107, score-0.375]
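Because these generalized eigenvalues range over configurations x rather than over Rⁿ, they can be computed by brute force for a small model. A sketch under toy assumptions that are ours rather than the paper's (binary x and Φ(x) = x x⊤, so that ⟨A, Φ(x)⟩ = x⊤Ax):

```python
# Sketch: brute-force generalized eigenvalues lambda^Phi_max/min(A, B).
# Toy assumptions: x in {0,1}^n and Phi(x) = x x^T, so <A, Phi(x)> = x^T A x.
import itertools
import numpy as np

def generalized_eigenvalues(A, B, n):
    ratios = []
    for bits in itertools.product([0, 1], repeat=n):
        x = np.array(bits, dtype=float)
        num, den = x @ A @ x, x @ B @ x
        if den != 0.0:                    # restrict to <B, Phi(x)> != 0
            ratios.append(num / den)
    return max(ratios), min(ratios)

A = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
B = np.diag(np.diag(A))                   # a trivially sparse approximation of A
lam_max, lam_min = generalized_eigenvalues(A, B, n=3)
print(lam_max, lam_min, "condition number:", lam_max / lam_min)
```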
47 We can now rewrite the optimization problem for the upper bound in equation (11) as (Problem Λ1) max_{B: G(B)∈S} λΦmin(A, B), such that λΦmax(A, B) ≤ 1. (19) We shall express the optimal solution of Problem Λ1 in terms of the optimal solution of a companion problem. [sent-108, score-0.999]
48 Towards that end, consider the optimization problem (Problem Λ2) min_{C: G(C)∈S} λΦmax(A, C) / λΦmin(A, C). [sent-109, score-0.422]
49 The following proposition shows the sense in which these problems are equivalent. [sent-110, score-0.243]
50 If C attains the optimum in Problem Λ2, then the scaled matrix λΦmax(A, C) C attains the optimum of Problem Λ1. [sent-113, score-0.475]
51 Proof. For any feasible solution B of Problem Λ1, we have λΦmin(A, B) ≤ λΦmin(A, B) / λΦmax(A, B) (since λΦmax(A, B) ≤ 1) (21) [sent-114, score-0.783]
52 ≤ λΦmin(A, C) / λΦmax(A, C) (since C is the optimum of Problem Λ2) (22) = λΦmin(A, λΦmax(A, C) C) (from the lemma above). (23) [sent-115, score-0.78]
53 Thus, λΦmax(A, C) C upper bounds all feasible solutions in Problem Λ1. [sent-117, score-0.524]
54 However, it itself is a feasible solution, since λΦmax(A, λΦmax(A, C) C) = (1 / λΦmax(A, C)) λΦmax(A, C) = 1, (25) again from the lemma above. [sent-118, score-0.833]
55 Thus, λΦmax(A, C) C attains the maximum in the upper bound Problem Λ1. [sent-120, score-0.35]
56 The analysis for obtaining an upper bound parameter matrix B for a given parameter matrix A carries over for the lower bound; we need to replace a maximin problem with a minimax problem. [sent-121, score-0.866]
57 For the lower bound, we want a matrix B such that B⋆ = arg min_{B: G(B)∈S} max_{x: ⟨B,Φ(x)⟩≠0} ⟨A, Φ(x)⟩ / ⟨B, Φ(x)⟩, such that ⟨A, Φ(x)⟩ / ⟨B, Φ(x)⟩ ≥ 1. (26) This leads to the following lower bound optimization problem. [sent-122, score-0.836]
58 (Problem Λ3) min_{B: G(B)∈S} λΦmax(A, B), such that λΦmin(A, B) ≥ 1. (27) [sent-123, score-0.161]
59 The proof of the following statement closely parallels the proof of the proposition above. [sent-124, score-0.391]
60 If C attains the optimum in Problem Λ2, then the scaled matrix λΦmin(A, C) C attains the optimum of the lower bound Problem Λ3. [sent-128, score-0.495]
61 For any pair of parameter-matrices (A, B), we have λΦmin(A, B) ⟨B, Φ(x)⟩ ≤ ⟨A, Φ(x)⟩ ≤ λΦmax(A, B) ⟨B, Φ(x)⟩. (28) [sent-132, score-0.161]
62 3.2 Main Procedure We now have in place the machinery necessary to describe the procedure for solving the main problem in equation (6), to obtain upper and lower bound matrices for a graphical model. [sent-134, score-0.706]
63 The proposition above shows how to obtain upper and lower bound parameter matrices with respect to any matrix B, given a parameter matrix A, by solving a generalized eigenvalue problem. [sent-136, score-1.085]
64 The propositions above tell us, in principle, how to obtain the optimal such upper and lower bound matrices. [sent-139, score-0.367]
65 First, obtain a parameter matrix C such that G(C) ∈ S, which minimizes λΦmax(Θ, C) / λΦmin(Θ, C). [sent-141, score-0.268]
66 Then λΦmax(Θ, C) C gives the optimal upper bound parameter matrix and λΦmin(Θ, C) C gives the optimal lower bound parameter matrix. [sent-142, score-1.485]
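Putting this recipe together for a toy model (same illustrative assumptions as before: binary x, Φ(x) = x x⊤, and a diagonal C as the sparse approximation, so that ⟨C, Φ(x)⟩ ≥ 0), the sketch below scales C by λΦmax and λΦmin to get ΘU and ΘL and checks the resulting sandwich on the partition function by enumeration.

```python
# Sketch of the main procedure on a toy model.  Assumptions (ours, for illustration):
# binary x, Phi(x) = x x^T, and C = diag(Theta) so that <C, Phi(x)> >= 0 for all x.
import itertools
import numpy as np

def energies(M, n):
    """<M, Phi(x)> = x^T M x for every x in {0,1}^n."""
    return np.array([np.array(b, float) @ M @ np.array(b, float)
                     for b in itertools.product([0, 1], repeat=n)])

def log_Z(M, n):
    return float(np.log(np.exp(energies(M, n)).sum()))

n = 3
Theta = np.array([[ 0.3, -0.6,  0.2],
                  [-0.6,  0.5, -0.1],
                  [ 0.2, -0.1,  0.4]])
C = np.diag(np.diag(Theta))                # sparse approximating parameter matrix

eT, eC = energies(Theta, n), energies(C, n)
ratios = eT[eC != 0] / eC[eC != 0]
Theta_U = ratios.max() * C                 # upper bound parameter matrix
Theta_L = ratios.min() * C                 # lower bound parameter matrix
print(log_Z(Theta_L, n), "<=", log_Z(Theta, n), "<=", log_Z(Theta_U, n))
```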
67 However, as things stand, this recipe appears to be even more challenging to work with than the generalized mean field procedures. [sent-143, score-0.175]
68 4 Generalized Support Theory for Graphical Models In what follows, we begin by assuming that the potential function matrix is positive semidefinite, Φ(x) ⪰ 0, and later extend our results to general Φ. [sent-146, score-0.201]
69 From the definition of the generalized support number for a graphical model, we have that σ Φ (A, B)B − A, Φ(x) ≥ 0. [sent-155, score-0.413]
70 Therefore, it follows that B,Φ(x) ≤ σ (A, B), and thus A, Φ(x) ≤ σ Φ (A, B) (30) λΦ (A, B) = max max x B, Φ(x) giving the statement of the proposition. [sent-157, score-0.429]
71 This leads to our first relaxation of the generalized eigenvalue bound for a model. [sent-158, score-0.368]
72 For a potential function matrix Φ(x) σ(A, B) = min{τ | (τ B − A) 0}. [sent-164, score-0.201]
73 Therefore, σ Φ (A, B) ≤ σ(A, B) from the definition of generalized support number. [sent-167, score-0.25]
74 The above result reduces the problem of approximating a graphical model to the problem of minimizing classical support numbers, the latter problem being well-studied in the scientific computing literature (Boman and Hendrickson, 2003; Bern et al. [sent-168, score-0.352]
75 , 2001), where the expression σ(A, C)σ(C, A) is called the condition number, and a matrix that minimizes it within a simple family of graphs is called a preconditioner. [sent-169, score-0.363]
76 We can thus plug in any algorithm for finding a sparse preconditioner for Θ, carrying out the optimization B⋆ = arg min_B σ(Θ, B) σ(B, Θ) (33) and then use that matrix B⋆ in our basic procedure. [sent-170, score-0.995]
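For symmetric positive definite A and B, the classical support number σ(A, B) = min{τ | τB − A ⪰ 0} is just the largest generalized eigenvalue of the pencil (A, B), so the objective σ(Θ, B) σ(B, Θ) in (33) can be evaluated directly. A hedged sketch; the matrices below are arbitrary examples, not from the paper:

```python
# Sketch: classical support numbers and the condition number sigma(A,B)*sigma(B,A)
# for symmetric positive definite matrices (sigma(A,B) = largest generalized eigenvalue).
import numpy as np
from scipy.linalg import eigh

def support_number(A, B):
    return float(eigh(A, B, eigvals_only=True).max())

def condition_number(A, B):
    return support_number(A, B) * support_number(B, A)

A = np.array([[ 2., -1.,  0.],
              [-1.,  2., -1.],
              [ 0., -1.,  2.]])
B = np.diag(np.diag(A))                    # one candidate sparse preconditioner
print(condition_number(A, B))
```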
77 One example is Vaidya’s preconditioner Vaidya (1990), which is essentially the maximum spanning tree of the graph. [sent-171, score-0.793]
78 Another is the support tree of Gremban (1996), which introduces Steiner nodes, in this case auxiliary nodes introduced via a recursive partitioning of the graph. [sent-172, score-0.366]
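A Vaidya-style starting point is easy to sketch: extract a maximum-weight spanning tree of the weighted model graph and keep only its edges in the preconditioner. The edge weights below are hypothetical, and networkx is just one convenient way to do the extraction:

```python
# Sketch: maximum-weight spanning tree of a weighted graph, the skeleton of a
# Vaidya-style preconditioner.  Edge weights are made up for illustration.
import networkx as nx
import numpy as np

W = np.array([[0., 4., 1., 0.],
              [4., 0., 3., 2.],
              [1., 3., 0., 5.],
              [0., 2., 5., 0.]])           # symmetric edge-weight matrix

G = nx.from_numpy_array(W)                  # nonzero entries become weighted edges
T = nx.maximum_spanning_tree(G, weight="weight")
print(sorted(T.edges(data="weight")))       # edges retained by the tree preconditioner
```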
79 We present experiments with these basic preconditioners in the following section. [sent-173, score-0.413]
80 Before turning to the experiments, we comment that our generalized support number analysis assumed that the potential function matrix Φ(x) was positive semi-definite. [sent-174, score-0.451]
81 We first add a large positive diagonal matrix D so that Φ′ (x) = Φ(x) + D 0. [sent-176, score-0.154]
82 Then, for a given parameter matrix Θ, we use the above machinery to get an upper bound parameter matrix B such that A, Φ(x) + D ≤ B, Φ(x) + D A, Φ(x) ≤ B, Φ(x) + B − A, D . [sent-177, score-0.733]
83 (34) Exponentiating and summing both sides over x, we then get the required upper bound for the parameter matrix A; the same can be done for the lower bound. [sent-178, score-0.613]
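Our reading of the shift argument above: if ⟨A, Φ(x) + D⟩ ≤ ⟨B, Φ(x) + D⟩ for all x, then ⟨A, Φ(x)⟩ ≤ ⟨B, Φ(x)⟩ + ⟨B − A, D⟩, and exponentiating and summing over x gives log Z(A) ≤ log Z(B) + ⟨B − A, D⟩. A tiny sketch of the additive correction term; the matrices are hypothetical placeholders:

```python
# Sketch: the additive correction <B - A, D> picked up by the log partition function
# bound when a diagonal shift D is used to make the potentials positive semidefinite.
# A, B, D below are hypothetical placeholders, not values from the paper.
import numpy as np

def trace_inner(X, Y):
    """Trace inner product <X, Y> = tr(X^T Y)."""
    return float(np.sum(X * Y))

A = np.array([[ 0.3, -0.6],
              [-0.6,  0.5]])                # original parameter matrix
B = np.array([[0.9, 0.0],
              [0.0, 0.9]])                  # upper-bound matrix w.r.t. Phi(x) + D
D = 2.0 * np.eye(2)                         # diagonal shift making Phi'(x) PSD

print("log Z(A) <= log Z(B) +", trace_inner(B - A, D))
```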
84 In this section we evaluate these bounds on small graphical models for which exact answers can be readily computed, and compare the bounds to variational approximations. [sent-181, score-0.658]
85 We show simulation results averaged over a randomly generated set of graphical models. [sent-182, score-0.163]
86 The graphs used were 2D grid graphs, and the edge potentials were selected according to a uniform distribution Uniform(−2dcoup , 0) for various coupling strengths dcoup . [sent-183, score-0.161]
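A sketch of the experimental setup as we read it: build a 2D grid and draw each edge parameter from Uniform(−2·dcoup, 0). Function and parameter names below are ours, not the paper's.

```python
# Sketch of the experimental setup as we read it: a side x side 2D grid MRF with
# edge parameters drawn from Uniform(-2 * d_coup, 0).  Names are illustrative only.
import numpy as np

def random_grid_model(side, d_coup, seed=0):
    rng = np.random.default_rng(seed)
    index = {(r, c): r * side + c for r in range(side) for c in range(side)}
    edges, theta = [], []
    for (r, c), i in index.items():
        for dr, dc in ((0, 1), (1, 0)):     # right and down neighbours
            if (r + dr, c + dc) in index:
                edges.append((i, index[(r + dr, c + dc)]))
                theta.append(rng.uniform(-2.0 * d_coup, 0.0))
    return edges, np.array(theta)

edges, theta = random_grid_model(side=4, d_coup=1.0)
print(len(edges), "edges; first few parameters:", theta[:4])
```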
87 As a baseline, we use the mean field and structured mean field methods for the lower bound, and the Wainwright et al. [sent-185, score-0.174]
88 (2003) tree-reweighted belief propagation approximation for the upper bound. [sent-186, score-0.244]
89 To compute bounds over these larger graphs with Steiner nodes we average an internal node over its children; this is the technique used with such preconditioners for solving linear systems. [sent-188, score-0.757]
90 We note that these preconditioners are quite basic, and the use of better preconditioners (yielding a better condition number) has the potential to achieve much better bounds, as shown in Propositions 3. [sent-189, score-0.849]
91 We also reiterate that while our approach can be used to derive bounds on event probabilities, the variational methods yield bounds only for the partition function, and only apply heuristically to estimating simple event probabilities such as marginals. [sent-192, score-0.835]
92 As the plots in Figure 1 show, even for the simple preconditioners used, the new bounds are quite close to the actual values, outperforming the mean field method and giving comparable results to the tree-reweighted belief propagation method. [sent-193, score-0.705]
93 The spanning tree preconditioner provides a good lower bound, while the support tree preconditioner provides a good upper bound, however not as tight as the bound obtained using tree-reweighted belief propagation. [sent-194, score-1.988]
94 Figure 1: Comparison of lower bounds (top left), and upper bounds (top right) for small grid graphs, and lower bounds for grid graphs of increasing size (left). [sent-216, score-1.029]
95 [Figure 1 legend and axis residue: Spanning Tree Preconditioner, Structured Mean Field, Support Tree Preconditioner, Mean Field; x-axis of the size plot: number of nodes in graph.]
97 The bottom plot of Figure 1 compares lower bounds for graphs with up to 900 nodes; a larger bound is necessarily tighter, and the preconditioner bounds are seen to outperform mean field. [sent-220, score-1.267]
98 Combinatorial preconditioners for sparse, symmetric, diagonally dominant linear systems. [sent-243, score-0.499]
99 Solving linear equations with symmetric diagonally dominant matrices by constructing good preconditioners. [sent-255, score-0.201]
100 Tree-reweighted belief propagation and approximate ML estimation by pseudo-moment matching. [sent-264, score-0.143]
wordName wordTfidf (topN-words)
[('preconditioner', 0.52), ('preconditioners', 0.371), ('bounds', 0.205), ('max', 0.199), ('tree', 0.173), ('vaidya', 0.171), ('graphical', 0.163), ('min', 0.161), ('matrix', 0.154), ('bound', 0.151), ('generalized', 0.142), ('partition', 0.127), ('upper', 0.121), ('support', 0.108), ('graph', 0.107), ('spanning', 0.1), ('boman', 0.085), ('gremban', 0.085), ('ravikumar', 0.085), ('variational', 0.085), ('graphs', 0.085), ('proposition', 0.082), ('event', 0.08), ('attains', 0.078), ('preconditioned', 0.074), ('hendrickson', 0.074), ('diagonally', 0.074), ('steiner', 0.074), ('matrices', 0.073), ('rigorous', 0.072), ('eld', 0.07), ('lemma', 0.07), ('lower', 0.068), ('tractable', 0.061), ('parameter', 0.06), ('condition', 0.06), ('optimum', 0.06), ('subgraphs', 0.057), ('sparse', 0.056), ('belief', 0.054), ('dominant', 0.054), ('nodes', 0.053), ('probabilities', 0.053), ('bx', 0.05), ('bern', 0.05), ('field', 0.049), ('approximate', 0.047), ('potential', 0.047), ('propositions', 0.045), ('relaxation', 0.043), ('solving', 0.043), ('pairwise', 0.043), ('basic', 0.042), ('yedidia', 0.042), ('propagation', 0.042), ('xn', 0.042), ('coupling', 0.04), ('structured', 0.04), ('cliques', 0.04), ('obtaining', 0.039), ('family', 0.037), ('feasible', 0.037), ('wainwright', 0.036), ('grid', 0.036), ('approximations', 0.035), ('optimization', 0.035), ('nn', 0.035), ('ax', 0.035), ('recast', 0.035), ('intuition', 0.034), ('stronger', 0.034), ('machinery', 0.033), ('mean', 0.033), ('auxiliary', 0.032), ('minimax', 0.032), ('expense', 0.032), ('eigenvalue', 0.032), ('nition', 0.032), ('statement', 0.031), ('sides', 0.03), ('focusing', 0.03), ('eigenvalues', 0.03), ('inference', 0.03), ('summing', 0.029), ('exp', 0.029), ('tighter', 0.029), ('procedures', 0.029), ('obtain', 0.027), ('deriving', 0.027), ('express', 0.027), ('lafferty', 0.027), ('approximation', 0.027), ('problem', 0.027), ('shall', 0.027), ('yielding', 0.027), ('undirected', 0.027), ('arg', 0.027), ('minimizes', 0.027), ('solution', 0.026), ('carnegie', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
Author: John D. Lafferty, Pradeep K. Ravikumar
Abstract: We present a family of approximation techniques for probabilistic graphical models, based on the use of graphical preconditioners developed in the scientific computing literature. Our framework yields rigorous upper and lower bounds on event probabilities and the log partition function of undirected graphical models, using non-iterative procedures that have low time complexity. As in mean field approaches, the approximations are built upon tractable subgraphs; however, we recast the problem of optimizing the tractable distribution parameters and approximate inference in terms of the well-studied linear systems problem of obtaining a good matrix preconditioner. Experiments are presented that compare the new approximation schemes to variational methods. 1
2 0.1583807 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery
Author: Jeremy Kubica, Joseph Masiero, Robert Jedicke, Andrew Connolly, Andrew W. Moore
Abstract: In this paper we consider the problem of finding sets of points that conform to a given underlying model from within a dense, noisy set of observations. This problem is motivated by the task of efficiently linking faint asteroid detections, but is applicable to a range of spatial queries. We survey current tree-based approaches, showing a trade-off exists between single tree and multiple tree algorithms. To this end, we present a new type of multiple tree algorithm that uses a variable number of trees to exploit the advantages of both approaches. We empirically show that this algorithm performs well using both simulated and astronomical data.
3 0.12990962 139 nips-2005-Non-iterative Estimation with Perturbed Gaussian Markov Processes
Author: Yunsong Huang, B. Keith Jenkins
Abstract: We develop an approach for estimation with Gaussian Markov processes that imposes a smoothness prior while allowing for discontinuities. Instead of propagating information laterally between neighboring nodes in a graph, we study the posterior distribution of the hidden nodes as a whole—how it is perturbed by invoking discontinuities, or weakening the edges, in the graph. We show that the resulting computation amounts to feed-forward fan-in operations reminiscent of V1 neurons. Moreover, using suitable matrix preconditioners, the incurred matrix inverse and determinant can be approximated, without iteration, in the same computational style. Simulation results illustrate the merits of this approach.
4 0.11696743 85 nips-2005-Generalization to Unseen Cases
Author: Teemu Roos, Peter Grünwald, Petri Myllymäki, Henry Tirri
Abstract: We analyze classification error on unseen cases, i.e. cases that are different from those in the training set. Unlike standard generalization error, this off-training-set error may differ significantly from the empirical error with high probability even with large sample sizes. We derive a datadependent bound on the difference between off-training-set and standard generalization error. Our result is based on a new bound on the missing mass, which for small samples is stronger than existing bounds based on Good-Turing estimators. As we demonstrate on UCI data-sets, our bound gives nontrivial generalization guarantees in many practical cases. In light of these results, we show that certain claims made in the No Free Lunch literature are overly pessimistic. 1
5 0.11528821 65 nips-2005-Estimating the wrong Markov random field: Benefits in the computation-limited setting
Author: Martin J. Wainwright
Abstract: Consider the problem of joint parameter estimation and prediction in a Markov random field: i.e., the model parameters are estimated on the basis of an initial set of data, and then the fitted model is used to perform prediction (e.g., smoothing, denoising, interpolation) on a new noisy observation. Working in the computation-limited setting, we analyze a joint method in which the same convex variational relaxation is used to construct an M-estimator for fitting parameters, and to perform approximate marginalization for the prediction step. The key result of this paper is that in the computation-limited setting, using an inconsistent parameter estimator (i.e., an estimator that returns the “wrong” model even in the infinite data limit) is provably beneficial, since the resulting errors can partially compensate for errors made by using an approximate prediction technique. En route to this result, we analyze the asymptotic properties of M-estimators based on convex variational relaxations, and establish a Lipschitz stability property that holds for a broad class of variational methods. We show that joint estimation/prediction based on the reweighted sum-product algorithm substantially outperforms a commonly used heuristic based on ordinary sum-product. 1 Keywords: Markov random fields; variational method; message-passing algorithms; sum-product; belief propagation; parameter estimation; learning. 1
6 0.10845663 202 nips-2005-Variational EM Algorithms for Non-Gaussian Latent Variable Models
7 0.095308416 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs
8 0.093653657 201 nips-2005-Variational Bayesian Stochastic Complexity of Mixture Models
9 0.092227392 186 nips-2005-TD(0) Leads to Better Policies than Approximate Value Iteration
10 0.085675277 42 nips-2005-Combining Graph Laplacians for Semi--Supervised Learning
11 0.083246432 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach
12 0.082099386 105 nips-2005-Large-Scale Multiclass Transduction
13 0.080182485 204 nips-2005-Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation
14 0.078471154 180 nips-2005-Spectral Bounds for Sparse PCA: Exact and Greedy Algorithms
15 0.077150159 46 nips-2005-Consensus Propagation
16 0.076193571 2 nips-2005-A Bayes Rule for Density Matrices
17 0.074194163 95 nips-2005-Improved risk tail bounds for on-line algorithms
18 0.071985908 82 nips-2005-Generalization Error Bounds for Aggregation by Mirror Descent with Averaging
19 0.071391895 83 nips-2005-Generalization error bounds for classifiers trained with interdependent data
20 0.070631988 113 nips-2005-Learning Multiple Related Tasks using Latent Independent Component Analysis
topicId topicWeight
[(0, 0.221), (1, 0.109), (2, 0.008), (3, -0.096), (4, 0.003), (5, -0.083), (6, -0.258), (7, 0.083), (8, -0.024), (9, 0.02), (10, 0.074), (11, -0.052), (12, -0.044), (13, -0.016), (14, -0.016), (15, -0.081), (16, -0.065), (17, -0.102), (18, 0.012), (19, -0.016), (20, 0.153), (21, 0.017), (22, -0.015), (23, -0.078), (24, 0.101), (25, -0.026), (26, -0.066), (27, -0.058), (28, 0.007), (29, 0.108), (30, 0.067), (31, -0.005), (32, 0.101), (33, 0.058), (34, 0.123), (35, 0.185), (36, -0.027), (37, 0.076), (38, -0.094), (39, 0.064), (40, 0.068), (41, 0.111), (42, -0.045), (43, 0.029), (44, -0.012), (45, 0.003), (46, 0.042), (47, 0.058), (48, 0.014), (49, 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.97304553 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
Author: John D. Lafferty, Pradeep K. Ravikumar
Abstract: We present a family of approximation techniques for probabilistic graphical models, based on the use of graphical preconditioners developed in the scientific computing literature. Our framework yields rigorous upper and lower bounds on event probabilities and the log partition function of undirected graphical models, using non-iterative procedures that have low time complexity. As in mean field approaches, the approximations are built upon tractable subgraphs; however, we recast the problem of optimizing the tractable distribution parameters and approximate inference in terms of the well-studied linear systems problem of obtaining a good matrix preconditioner. Experiments are presented that compare the new approximation schemes to variational methods. 1
2 0.58470619 85 nips-2005-Generalization to Unseen Cases
Author: Teemu Roos, Peter Grünwald, Petri Myllymäki, Henry Tirri
Abstract: We analyze classification error on unseen cases, i.e. cases that are different from those in the training set. Unlike standard generalization error, this off-training-set error may differ significantly from the empirical error with high probability even with large sample sizes. We derive a datadependent bound on the difference between off-training-set and standard generalization error. Our result is based on a new bound on the missing mass, which for small samples is stronger than existing bounds based on Good-Turing estimators. As we demonstrate on UCI data-sets, our bound gives nontrivial generalization guarantees in many practical cases. In light of these results, we show that certain claims made in the No Free Lunch literature are overly pessimistic. 1
3 0.58325309 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery
Author: Jeremy Kubica, Joseph Masiero, Robert Jedicke, Andrew Connolly, Andrew W. Moore
Abstract: In this paper we consider the problem of finding sets of points that conform to a given underlying model from within a dense, noisy set of observations. This problem is motivated by the task of efficiently linking faint asteroid detections, but is applicable to a range of spatial queries. We survey current tree-based approaches, showing a trade-off exists between single tree and multiple tree algorithms. To this end, we present a new type of multiple tree algorithm that uses a variable number of trees to exploit the advantages of both approaches. We empirically show that this algorithm performs well using both simulated and astronomical data.
4 0.55457366 204 nips-2005-Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation
Author: Dmitry Malioutov, Alan S. Willsky, Jason K. Johnson
Abstract: This paper presents a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose correlations between variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlations. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. This perspective leads to a better understanding of Gaussian belief propagation and of its convergence in loopy graphs. 1
5 0.5488351 180 nips-2005-Spectral Bounds for Sparse PCA: Exact and Greedy Algorithms
Author: Baback Moghaddam, Yair Weiss, Shai Avidan
Abstract: Sparse PCA seeks approximate sparse “eigenvectors” whose projections capture the maximal variance of data. As a cardinality-constrained and non-convex optimization problem, it is NP-hard and is encountered in a wide range of applied fields, from bio-informatics to finance. Recent progress has focused mainly on continuous approximation and convex relaxation of the hard cardinality constraint. In contrast, we consider an alternative discrete spectral formulation based on variational eigenvalue bounds and provide an effective greedy strategy as well as provably optimal solutions using branch-and-bound search. Moreover, the exact methodology used reveals a simple renormalization step that improves approximate solutions obtained by any continuous method. The resulting performance gain of discrete algorithms is demonstrated on real-world benchmark data and in extensive Monte Carlo evaluation trials. 1
6 0.49595973 117 nips-2005-Learning from Data of Variable Quality
7 0.48888743 139 nips-2005-Non-iterative Estimation with Perturbed Gaussian Markov Processes
8 0.47635916 186 nips-2005-TD(0) Leads to Better Policies than Approximate Value Iteration
9 0.47029984 65 nips-2005-Estimating the wrong Markov random field: Benefits in the computation-limited setting
10 0.45450282 112 nips-2005-Learning Minimum Volume Sets
11 0.45310697 125 nips-2005-Message passing for task redistribution on sparse graphs
12 0.4462527 105 nips-2005-Large-Scale Multiclass Transduction
13 0.4448581 71 nips-2005-Fast Krylov Methods for N-Body Learning
14 0.43533403 201 nips-2005-Variational Bayesian Stochastic Complexity of Mixture Models
15 0.42105114 202 nips-2005-Variational EM Algorithms for Non-Gaussian Latent Variable Models
16 0.41543314 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs
17 0.40075898 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach
18 0.39801404 82 nips-2005-Generalization Error Bounds for Aggregation by Mirror Descent with Averaging
19 0.39692315 46 nips-2005-Consensus Propagation
20 0.38690406 75 nips-2005-Fixing two weaknesses of the Spectral Method
topicId topicWeight
[(3, 0.09), (10, 0.057), (27, 0.035), (31, 0.114), (34, 0.119), (50, 0.016), (55, 0.041), (68, 0.192), (69, 0.063), (72, 0.011), (73, 0.042), (88, 0.058), (91, 0.062)]
simIndex simValue paperId paperTitle
same-paper 1 0.83854359 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models
Author: John D. Lafferty, Pradeep K. Ravikumar
Abstract: We present a family of approximation techniques for probabilistic graphical models, based on the use of graphical preconditioners developed in the scientific computing literature. Our framework yields rigorous upper and lower bounds on event probabilities and the log partition function of undirected graphical models, using non-iterative procedures that have low time complexity. As in mean field approaches, the approximations are built upon tractable subgraphs; however, we recast the problem of optimizing the tractable distribution parameters and approximate inference in terms of the well-studied linear systems problem of obtaining a good matrix preconditioner. Experiments are presented that compare the new approximation schemes to variational methods. 1
2 0.82116604 203 nips-2005-Visual Encoding with Jittering Eyes
Author: Michele Rucci
Abstract: Under natural viewing conditions, small movements of the eye and body prevent the maintenance of a steady direction of gaze. It is known that stimuli tend to fade when they are stabilized on the retina for several seconds. However, it is unclear whether the physiological self-motion of the retinal image serves a visual purpose during the brief periods of natural visual fixation. This study examines the impact of fixational instability on the statistics of visual input to the retina and on the structure of neural activity in the early visual system. Fixational instability introduces fluctuations in the retinal input signals that, in the presence of natural images, lack spatial correlations. These input fluctuations strongly influence neural activity in a model of the LGN. They decorrelate cell responses, even if the contrast sensitivity functions of simulated cells are not perfectly tuned to counter-balance the power-law spectrum of natural images. A decorrelation of neural activity has been proposed to be beneficial for discarding statistical redundancies in the input signals. Fixational instability might, therefore, contribute to establishing efficient representations of natural stimuli. 1
3 0.72807699 78 nips-2005-From Weighted Classification to Policy Search
Author: Doron Blatt, Alfred O. Hero
Abstract: This paper proposes an algorithm to convert a T -stage stochastic decision problem with a continuous state space to a sequence of supervised learning problems. The optimization problem associated with the trajectory tree and random trajectory methods of Kearns, Mansour, and Ng, 2000, is solved using the Gauss-Seidel method. The algorithm breaks a multistage reinforcement learning problem into a sequence of single-stage reinforcement learning subproblems, each of which is solved via an exact reduction to a weighted-classification problem that can be solved using off-the-self methods. Thus the algorithm converts a reinforcement learning problem into simpler supervised learning subproblems. It is shown that the method converges in a finite number of steps to a solution that cannot be further improved by componentwise optimization. The implication of the proposed algorithm is that a plethora of classification methods can be applied to find policies in the reinforcement learning problem. 1
4 0.70382625 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach
Author: O. P. Kreidl, Alan S. Willsky
Abstract: Given a directed graphical model with binary-valued hidden nodes and real-valued noisy observations, consider deciding upon the maximum a-posteriori (MAP) or the maximum posterior-marginal (MPM) assignment under the restriction that each node broadcasts only to its children exactly one single-bit message. We present a variational formulation, viewing the processing rules local to all nodes as degrees-of-freedom, that minimizes the loss in expected (MAP or MPM) performance subject to such online communication constraints. The approach leads to a novel message-passing algorithm to be executed offline, or before observations are realized, which mitigates the performance loss by iteratively coupling all rules in a manner implicitly driven by global statistics. We also provide (i) illustrative examples, (ii) assumptions that guarantee convergence and efficiency and (iii) connections to active research areas. 1
5 0.70368677 46 nips-2005-Consensus Propagation
Author: Benjamin V. Roy, Ciamac C. Moallemi
Abstract: We propose consensus propagation, an asynchronous distributed protocol for averaging numbers across a network. We establish convergence, characterize the convergence rate for regular graphs, and demonstrate that the protocol exhibits better scaling properties than pairwise averaging, an alternative that has received much recent attention. Consensus propagation can be viewed as a special case of belief propagation, and our results contribute to the belief propagation literature. In particular, beyond singly-connected graphs, there are very few classes of relevant problems for which belief propagation is known to converge. 1
6 0.69978887 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations
7 0.69719291 142 nips-2005-Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games
8 0.69652587 58 nips-2005-Divergences, surrogate loss functions and experimental design
9 0.69469249 47 nips-2005-Consistency of one-class SVM and related algorithms
10 0.6907441 145 nips-2005-On Local Rewards and Scaling Distributed Reinforcement Learning
11 0.69012451 184 nips-2005-Structured Prediction via the Extragradient Method
12 0.68469536 144 nips-2005-Off-policy Learning with Options and Recognizers
13 0.68088585 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs
14 0.68054867 9 nips-2005-A Domain Decomposition Method for Fast Manifold Learning
15 0.67930919 177 nips-2005-Size Regularized Cut for Data Clustering
16 0.67551821 190 nips-2005-The Curse of Highly Variable Functions for Local Kernel Machines
17 0.67503941 178 nips-2005-Soft Clustering on Graphs
18 0.67415583 112 nips-2005-Learning Minimum Volume Sets
19 0.67340708 65 nips-2005-Estimating the wrong Markov random field: Benefits in the computation-limited setting
20 0.67321366 49 nips-2005-Convergence and Consistency of Regularized Boosting Algorithms with Stationary B-Mixing Observations