nips nips2003 nips2003-174 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Michael I. Jordan, Martin J. Wainwright
Abstract: We present a new method for calculating approximate marginals for probability distributions defined by graphs with cycles, based on a Gaussian entropy bound combined with a semidefinite outer bound on the marginal polytope. This combination leads to a log-determinant maximization problem that can be solved by efficient interior point methods [8]. As with the Bethe approximation and its generalizations [12], the optimizing arguments of this problem can be taken as approximations to the exact marginals. In contrast to Bethe/Kikuchi approaches, our variational problem is strictly convex and so has a unique global optimum. An additional desirable feature is that the value of the optimal solution is guaranteed to provide an upper bound on the log partition function. In experimental trials, the performance of the log-determinant relaxation is comparable to or better than the sum-product algorithm, and by a substantial margin for certain problem classes. Finally, the zero-temperature limit of our log-determinant relaxation recovers a class of well-known semidefinite relaxations for integer programming [e.g., 3]. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Semidefinite relaxations for approximate inference on graphs with cycles Martin J. Wainwright and Michael I. Jordan [sent-1, score-0.236]
2 Abstract We present a new method for calculating approximate marginals for probability distributions defined by graphs with cycles, based on a Gaussian entropy bound combined with a semidefinite outer bound on the marginal polytope. [sent-7, score-0.92]
3 This combination leads to a log-determinant maximization problem that can be solved by efficient interior point methods [8]. [sent-8, score-0.178]
4 As with the Bethe approximation and its generalizations [12], the optimizing arguments of this problem can be taken as approximations to the exact marginals. [sent-9, score-0.202]
5 In contrast to Bethe/Kikuchi approaches, our variational problem is strictly convex and so has a unique global optimum. [sent-10, score-0.227]
6 An additional desirable feature is that the value of the optimal solution is guaranteed to provide an upper bound on the log partition function. [sent-11, score-0.332]
7 In experimental trials, the performance of the log-determinant relaxation is comparable to or better than the sum-product algorithm, and by a substantial margin for certain problem classes. [sent-12, score-0.159]
8 Finally, the zero-temperature limit of our log-determinant relaxation recovers a class of well-known semidefinite relaxations for integer programming [e.g., 3]. [sent-13, score-0.28]
9 Given a probability distribution defined by a graphical model (e.g., Markov random field, factor graph), a key problem is the computation of marginal distributions. [sent-18, score-0.211]
10 Although highly efficient algorithms exist for trees, exact solutions are prohibitively complex for more general graphs of any substantial size. [sent-19, score-0.115]
11 This difficulty motivates the use of algorithms for computing approximations to marginal distributions, a problem to which we refer as approximate inference. [sent-20, score-0.327]
12 As shown by Yedidia et al. [12], the sum-product algorithm can be interpreted as a method for attempting to solve a variational problem wherein the exact entropy is replaced by the Bethe approximation. [sent-23, score-0.376]
13 For the Bethe approximation and its generalizations [e.g., 6], the associated variational problems are typically not convex, thus leading to algorithmic complications, and also raising the possibility of multiple local optima. [sent-28, score-0.196]
14 Moreover, in contrast to certain other variational methods (e.g., mean field [4]), the optimal values of Bethe-type variational problems fail to provide bounds on the log partition function. [sent-31, score-0.363]
15 This paper introduces a new class of variational problems that are convex and that provide upper bounds. [sent-33, score-0.286]
16 Our derivation relies on a Gaussian upper bound on the discrete entropy of a suitably regularized random vector, and a semidefinite outer bound on the set of valid marginal distributions. [sent-34, score-0.894]
17 The combination leads to a log-determinant maximization problem with a unique optimum that can be found by efficient interior point methods [8]. [sent-35, score-0.237]
18 As with the Bethe/Kikuchi approximations and sum-product algorithms, the optimizing arguments of this problem can be taken as approximations to the marginal distributions of the underlying graphical model. [sent-36, score-0.45]
19 Moreover, taking the “zero-temperature” limit recovers a class of well-known semidefinite programming relaxations for integer programming problems [e.g., 3]. [sent-37, score-0.181]
20 2 Problem set-up We consider an undirected graph G = (V, E) with n = |V | nodes. [sent-40, score-0.108]
21 Associated with each vertex s ∈ V is a random variable xs taking values in the discrete space X = {0, 1, . . . , m − 1}. [sent-41, score-0.259]
22 For some index set I, we let φ = {φα | α ∈ I} denote a collection of potential functions associated with the cliques of G, and let θ = {θα | α ∈ I} be a vector of parameters associated with these potential functions. [sent-47, score-0.244]
23 The distributions of interest take the exponential form p(x; θ) = exp{ Σα θα φα(x) − Φ(θ) }. Here Φ(θ) is the log partition function that serves to normalize the distribution. [sent-49, score-0.157]
24 For example, one minimal representation of a binary-valued random vector on a graph with pairwise cliques is the standard Ising model, in which φ = {xs | s ∈ V } ∪ { xs xt | (s, t) ∈ E}. [sent-51, score-0.572]
25 It is also possible to add higher order interactions (e.g., xs xt xu for a third order interaction) to the collection of potential functions. [sent-55, score-0.434]
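To make the setup concrete, the following minimal sketch (our own, not from the paper; function and variable names are assumptions) computes the log partition function Φ(θ) and the mean parameters by brute-force enumeration for a small Ising model with xs ∈ {−1, +1}:

```python
import itertools
import numpy as np

def log_partition_and_means(theta_s, theta_st):
    """Brute-force Phi(theta) and mean parameters for a small Ising model.

    theta_s: (n,) singleton parameters; theta_st: dict {(s, t): coupling}.
    Only feasible for small n, since it enumerates all 2^n configurations.
    """
    n = len(theta_s)
    configs = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
    energies = configs @ theta_s                       # sum_s theta_s * x_s
    for (s, t), w in theta_st.items():
        energies += w * configs[:, s] * configs[:, t]  # pairwise terms
    m = energies.max()
    log_phi = m + np.log(np.exp(energies - m).sum())   # stable log-sum-exp
    p = np.exp(energies - log_phi)                     # exact p(x; theta)
    mu_s = p @ configs                                 # E[x_s]
    mu_st = {(s, t): float((p * configs[:, s] * configs[:, t]).sum())
             for (s, t) in theta_st}                   # E[x_s x_t]
    return log_phi, mu_s, mu_st
```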
26 2.1 Duality and marginal polytopes It is well known that Φ is convex in terms of θ, and strictly so for a minimal representation. [sent-58, score-0.336]
27 We refer to this set of realizable mean parameters, denoted MARG(G; φ), as the marginal polytope¹ associated with the graph G and the potentials φ. [sent-70, score-0.351]
28 Φ∗(µ) = Σ_{x ∈ X^n} p(x; θ(µ)) log p(x; θ(µ)). (6) This relation establishes that for µ in the relative interior of MARG(G; φ), the value of the conjugate dual Φ∗(µ) is given by the negative entropy of the distribution p(x; θ(µ)), where the pair θ(µ) and µ are dually coupled. [sent-73, score-0.482]
29 Φ(θ) = max_{µ ∈ MARG(G; φ)} { ⟨θ, µ⟩ − Φ∗(µ) }. (7) Moreover, we are guaranteed that the optimum is attained uniquely at the exact marginals µ = {µα} of p(x; θ). [sent-77, score-0.236]
30 This variational formulation plays a central role in our development in the sequel. [sent-78, score-0.135]
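As a numeric illustration (ours) of the duality relation behind Eqn. (7): at the exact marginals µ of p(x; θ), the objective ⟨θ, µ⟩ − Φ∗(µ) equals Φ(θ), with −Φ∗(µ) the entropy of p(x; θ). The check below reuses log_partition_and_means from the sketch above; the particular parameter values are arbitrary:

```python
import itertools
import numpy as np

# Check Phi(theta) = <theta, mu> + H(p) at the exact marginals mu of p(x; theta).
theta_s = np.array([0.2, -0.1, 0.3])
theta_st = {(0, 1): 0.5, (1, 2): -0.4, (0, 2): 0.1}
log_phi, mu_s, mu_st = log_partition_and_means(theta_s, theta_st)
inner = theta_s @ mu_s + sum(theta_st[e] * v for e, v in mu_st.items())

configs = np.array(list(itertools.product([-1.0, 1.0], repeat=3)))
energies = configs @ theta_s
for (s, t), w in theta_st.items():
    energies += w * configs[:, s] * configs[:, t]
p = np.exp(energies - log_phi)
entropy = -(p * np.log(p)).sum()              # -Phi*(mu) at the optimum
assert abs(inner + entropy - log_phi) < 1e-9  # Eqn. (7) holds with equality
```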
31 2.2 Challenges with the variational formulation There are two difficulties associated with the variational formulation (7). [sent-80, score-0.302]
32 First of all, observe that the (negative) entropy Φ∗ , as a function of only the mean parameters µ, is implicitly defined; indeed, it is typically impossible to specify an explicit form for Φ ∗ . [sent-81, score-0.189]
33 Key exceptions are trees and hypertrees, for which the entropy is well-known to decompose into a sum of local entropies defined by local marginals on the (hyper)edges [1]. [sent-82, score-0.421]
34 Secondly, for a general graph with cycles, the marginal polytope MARG(G; φ) is defined by a number of inequalities that grows rapidly in graph size [e.g., Deza and Laurent]. [sent-83, score-0.613]
35 Trees and hypertrees again are important exceptions: in this case, the junction tree theorem [e.g., 1] provides a compact representation of the associated marginal polytopes. [sent-88, score-0.243]
37 The Bethe approach (and its generalizations) can be understood as consisting of two steps: (a) replacing the exact entropy −Φ∗ with a tree (or hypertree) approximation; and (b) replacing the marginal polytope MARG(G; φ) with constraint sets defined by tree (or hypertree) consistency conditions. [sent-89, score-0.791]
38 However, since the (hyper)tree approximations used do not bound the exact entropy, the optimal values of Bethe-type variational problems do not provide a bound on the value of the log partition function Φ(θ). [sent-90, score-0.685]
39 Bounding Φ requires both an outer bound on the marginal polytope and an upper bound on the entropy −Φ∗. [sent-91, score-0.862]
40 ¹When φα corresponds to an indicator function, µα is a marginal probability; otherwise, this choice entails a minor abuse of terminology. [sent-92, score-0.211]
41 3 Log-determinant relaxation In this section, we state and prove a set of upper bounds based on the solution of a variational problem involving determinant maximization and semidefinite constraints. [sent-93, score-0.497]
42 It is also convenient to define all problems with respect to the complete graph Kn (i.e., the fully connected graph on n nodes). [sent-95, score-0.184]
43 We use the standard (minimal) Ising representation for a binary problem, in terms of the potential functions φ = {xs | s ∈ V} ∪ {xs xt | s < t}. [sent-98, score-0.15]
44 Of course, any problem can be embedded into the complete graph by setting to zero a subset of the {θst} parameters. [sent-100, score-0.155]
45 (In particular, for a graph G = (V, E), we simply set θst = 0 for all pairs (s, t) ∉ E.) [sent-101, score-0.108]
46 3.1 Outer bounds on the marginal polytope We first focus on the marginal polytope MARG(Kn) ≡ MARG(Kn; φ) of valid dual variables {µs, µst}, as defined above. [sent-103, score-0.948]
47 In this section, we describe a set of semidefinite and linear constraints that any valid dual vector µ ∈ MARG(Kn ) must satisfy. [sent-105, score-0.245]
48 3.1.1 Semidefinite constraints Given an arbitrary vector µ ∈ Rd, consider the following (n + 1) × (n + 1) matrix:
M1[µ] :=
[ 1    µ1    µ2    ···   µn  ]
[ µ1   1     µ12   ···   µ1n ]
[ µ2   µ21   1     ···   µ2n ]
[ ···                    ··· ]
[ µn   µn1   µn2   ···   1   ]    (8) [sent-108, score-0.133]
49 The motivation underlying this definition is the following: suppose that the given dual vector µ actually belongs to MARG(Kn), in which case there exists some distribution p(x; θ) such that µs = Σx p(x; θ) xs and µst = Σx p(x; θ) xs xt. [sent-138, score-0.622]
50 (Note in particular that the diagonal elements are all one, since xs² = 1 when xs ∈ {−1, +1}.) [sent-140, score-0.228]
51 Since any such moment matrix must be positive semidefinite,² we have established the following: Lemma 1 (Semidefinite outer bound). [sent-141, score-0.128]
52 The binary marginal polytope MARG(Kn) is contained within the semidefinite constraint set: SDEF1 := { µ ∈ Rd | M1[µ] ⪰ 0 } (9) This semidefinite relaxation can be further strengthened by including higher order terms in the moment matrices [5]. [sent-142, score-0.57]
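A small sketch (ours; the function names are assumptions, not from the paper) that assembles M1[µ] from Eqn. (8) and tests membership in SDEF1 via an eigenvalue check:

```python
import numpy as np

def moment_matrix(mu_s, mu_st):
    """Assemble the (n+1) x (n+1) matrix M1[mu] of Eqn. (8).

    mu_s: (n,) singleton means E[x_s]; mu_st: dict over all pairs (s, t),
    s < t, of pairwise means E[x_s x_t] (the complete graph K_n).
    """
    n = len(mu_s)
    M = np.eye(n + 1)                        # all diagonal entries equal 1
    M[0, 1:] = M[1:, 0] = mu_s
    for (s, t), v in mu_st.items():
        M[1 + s, 1 + t] = M[1 + t, 1 + s] = v
    return M

def in_sdef1(mu_s, mu_st, tol=1e-9):
    """Test the semidefinite outer bound of Lemma 1: M1[mu] is PSD."""
    return np.linalg.eigvalsh(moment_matrix(mu_s, mu_st)).min() >= -tol
```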
53 3.1.2 Additional linear constraints It is straightforward to augment these semidefinite constraints with additional linear constraints. [sent-145, score-0.202]
54 Pairwise edge constraints: It is natural to require that the subset of mean parameters associated with each pair of random variables (xs, xt) — namely, µs, µt and µst — specify a valid pairwise marginal distribution. [sent-148, score-0.43]
55 1 + µs + µt + µst ≥ 0, 1 − µs − µt + µst ≥ 0, 1 + µs − µt − µst ≥ 0, 1 − µs + µt − µst ≥ 0. (10) It can be shown [11, 10] that these constraints are necessary and sufficient to guarantee the existence of a consistent pairwise marginal. [sent-150, score-0.176]
56 By the junction tree theorem [1], this pairwise consistency guarantees that the constraints of Eqn. (10) provide a complete description of the binary marginal polytope for any tree-structured graph. [sent-152, score-0.489]
58 Moreover, for a general graph with cycles, they are equivalent to the tree-consistent constraint set used in the Bethe approach [12] when applied to a binary vector x ∈ {−1, +1}n . [sent-153, score-0.185]
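In code, the pairwise consistency test amounts to checking that each entry of the implied 2×2 marginal table is nonnegative; as above, this is our own reconstruction of the constraints of Eqn. (10), whose body the extraction drops:

```python
def pairwise_consistent(mu_s, mu_t, mu_st, tol=1e-9):
    """Check Eqn. (10): for x_s, x_t in {-1, +1}, the implied pairwise
    marginal p(x_s=a, x_t=b) = (1 + a*mu_s + b*mu_t + a*b*mu_st) / 4
    must be nonnegative for all four sign patterns."""
    return all((1 + a * mu_s + b * mu_t + a * b * mu_st) / 4.0 >= -tol
               for a in (-1, 1) for b in (-1, 1))
```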
59 Triplet constraints: Local consistency can be extended to triplets {xs , xt , xu }, and even more generally to higher order subsets. [sent-154, score-0.154]
60 For the triplet case, consider the following set of constraints (and permutations thereof) among the pairwise mean parameters {µst, µsu, µtu}: µst + µsu + µtu ≥ −1, µst − µsu − µtu ≥ −1. (11) [sent-155, score-0.244]
61 It can be shown [11, 10] that these constraints, in conjunction with the pairwise constraints (10), are necessary and sufficient to ensure that the collection of mean parameters {µs, µt, µu, µst, µsu, µtu} uniquely determines a valid marginal over the triplet (xs, xt, xu). [sent-156, score-0.668]
62 Once again, by applying the junction tree theorem [1], we conclude that the constraints (10) and (11) provide a complete characterization of the binary marginal polytope for hypertrees of width two. [sent-157, score-0.834]
63 It is worthwhile observing that this set of constraints is equivalent to those that are implicitly enforced by any Kikuchi approximation [12] with clusters of size three (when applied to a binary problem). [sent-158, score-0.146]
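A corresponding sketch (ours) of the triplet constraints of Eqn. (11), with the stated inequalities and their sign permutations written out; flipping the sign of one variable flips two of the three pairwise means, which generates the four cases below:

```python
def triplet_consistent(mu_st, mu_su, mu_tu, tol=1e-9):
    """Check Eqn. (11) and its permutations: each sign pattern that flips
    an even number of the three pairwise means is bounded below by -1."""
    return all(a * mu_st + b * mu_su + a * b * mu_tu >= -1 - tol
               for a in (-1, 1) for b in (-1, 1))
```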
64 3.2 Gaussian entropy bound We now turn to the task of upper bounding the entropy. [sent-160, score-0.408]
65 Our starting point is the familiar interpretation of the Gaussian as the maximum entropy distribution subject to covariance constraints: Lemma 2. [sent-161, score-0.189]
66 The (differential) entropy h(x) := −∫ p(x) log p(x) dx is upper bounded by the entropy (1/2) log det cov(x) + (n/2) log(2πe) of a Gaussian with matched covariance. [sent-162, score-0.71]
67 Of interest to us is the discrete entropy of a discrete-valued random vector x ∈ {−1, +1}^n, whereas the Gaussian bound of Lemma 2 applies to the differential entropy of a continuous-valued random vector. [sent-163, score-0.599]
68 Lemma 3: Define x̃ = (x + 1)/2 + u, where u is independent of x and uniformly distributed over [−1/2, 1/2]^n. We have h(x̃) = H(x), where h and H denote the differential and discrete entropies of x̃ and x respectively. [sent-168, score-0.11]
69 By construction, the differential entropy can be decomposed as a sum of integrals over hyperboxes of unit volume, one for each configuration, over which the probability density of x̃ is constant. [sent-170, score-0.232]
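The following sanity check (our own, under the reconstruction of Lemma 3 above) verifies numerically the resulting Gaussian bound H(x) ≤ (1/2) log det(cov(x) + (1/3) In) + (n/2) log(πe/2) for random distributions on {−1, +1}^n, which is the entropy term appearing in Theorem 1 below:

```python
import itertools
import numpy as np

def discrete_entropy(p):
    p = p[p > 0]
    return -(p * np.log(p)).sum()

def gaussian_entropy_bound(p, configs):
    """Gaussian bound derived from Lemmas 2 and 3 for x in {-1, +1}^n."""
    n = configs.shape[1]
    mean = p @ configs
    centered = configs - mean
    cov = centered.T @ (centered * p[:, None])        # cov(x)
    sign, logdet = np.linalg.slogdet(cov + np.eye(n) / 3.0)
    return 0.5 * logdet + 0.5 * n * np.log(np.pi * np.e / 2.0)

n = 3
configs = np.array(list(itertools.product([-1.0, 1.0], repeat=n)))
p = np.random.dirichlet(np.ones(2 ** n))              # a random distribution
assert discrete_entropy(p) <= gaussian_entropy_bound(p, configs) + 1e-9
```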
70 3.3 Log-determinant relaxation Equipped with these building blocks, we are now ready to state and prove a log-determinant relaxation for the log partition function. [sent-173, score-0.413]
71 Theorem 1: Let x ∈ {−1, +1}^n, and let OUT(Kn) be any convex outer bound on MARG(Kn) that is contained within SDEF1. [sent-175, score-0.305]
72 Then there holds
Φ(θ) ≤ max_{µ ∈ OUT(Kn)} { ⟨θ, µ⟩ + (1/2) log det( M1(µ) + (1/3) blkdiag[0, In] ) } + (n/2) log(πe/2)    (12)
where blkdiag[0, In] is the (n + 1) × (n + 1) block-diagonal matrix with a scalar zero in the upper left block and the identity In in the lower right block. [sent-176, score-0.175]
73 From Lemma 3, we have H(x) = h(x̃); combining this equality with Lemma 2, we obtain the upper bound H(x) ≤ (1/2) log det cov(x̃) + (n/2) log(2πe). [sent-181, score-0.35]
74 Next we use the Schur complement formula [8] to express the log determinant as follows: log det cov(x̃) = log det( M1[µ] + (1/3) blkdiag[0, In] ) + n log(1/4). (13) [sent-183, score-0.263]
75 Combining Eqn. (13) with the Gaussian upper bound leads to the following expression: H(x) ≤ (1/2) log det( M1[µ] + (1/3) blkdiag[0, In] ) + (n/2) log(πe/2). Substituting this upper bound into the variational representation of Eqn. (7) and using the fact that OUT(Kn) is an outer bound on MARG(G) yields Eqn. (12). [sent-186, score-0.243]
77 By construction, the cost function is strictly convex so that the optimum is unique. [sent-188, score-0.151]
78 The optimization problem in Eqn. (12) is a determinant maximization problem, for which efficient interior point methods have been developed [e.g., 8]. [sent-191, score-0.247]
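The following hedged sketch (ours, not the authors' implementation) solves Eqn. (12) with CVXPY, taking OUT(Kn) to be SDEF1 intersected with the pairwise constraints of Eqn. (10); the function and argument names are our own:

```python
import numpy as np
import cvxpy as cp

def logdet_relaxation(theta_s, Theta):
    """Solve the log-determinant relaxation of Theorem 1 (Eqn. (12)).

    theta_s: (n,) singleton parameters; Theta: symmetric zero-diagonal
    (n, n) matrix of pairwise couplings theta_st.
    """
    n = len(theta_s)
    Z = cp.Variable((n + 1, n + 1), symmetric=True)   # candidate M1(mu)
    constraints = [Z >> 0, cp.diag(Z) == 1]           # SDEF1 + unit diagonal
    # Pairwise consistency, Eqn. (10): 2x2 marginal tables are nonnegative.
    for s in range(n):
        for t in range(s + 1, n):
            for a in (-1, 1):
                for b in (-1, 1):
                    constraints.append(
                        1 + a * Z[0, s + 1] + b * Z[0, t + 1]
                        + a * b * Z[s + 1, t + 1] >= 0)
    linear = theta_s @ Z[0, 1:] + 0.5 * cp.sum(cp.multiply(Theta, Z[1:, 1:]))
    gauss = 0.5 * cp.log_det(Z + np.diag([0.0] + [1.0 / 3.0] * n))
    const = 0.5 * n * np.log(np.pi * np.e / 2.0)
    prob = cp.Problem(cp.Maximize(linear + gauss + const), constraints)
    upper_bound = prob.solve()
    mu_s = Z.value[0, 1:]                   # approximate singleton means E[x_s]
    return upper_bound, mu_s
```

The optimal value is the upper bound of Eqn. (12) on Φ(θ), and p(xs = 1) ≈ (1 + µs)/2 converts the returned means into approximate marginals.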
79 4 Experimental results The relevance of the log-determinant relaxation for applications is two-fold: it provides upper bounds on the log partition function, and the maximizing arguments µ ∈ OUT(Kn) of Eqn. (12) can be taken as approximations to the exact marginals of the distribution p(x; θ). [sent-195, score-0.23]
81 So as to test its performance in computing approximate marginals, we performed extensive experiments on the complete graph (fully connected) and the 2-D nearest-neighbor lattice model. [sent-196, score-0.189]
82 For a given coupling strength dcoup > 0, we investigated three possible types of coupling: (a) for repulsive interactions, θst ∼ U[−2dcoup , 0]; (b) for mixed interactions, θst ∼ U[−dcoup , +dcoup ]; (c) for attractive interactions, θst ∼ U[0, 2dcoup ]. [sent-202, score-0.277]
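A sketch (ours) of this sampling scheme; the extraction specifies only the coupling distributions, so the draw θs ∼ U[−dobs, dobs] for the observation potentials is an assumption, as are the function and argument names:

```python
import numpy as np

def sample_potentials(n, edges, d_obs, d_coup, kind, rng=None):
    """Draw random potentials for one experimental trial.

    kind is one of 'repulsive', 'mixed', 'attractive' as in the text.
    We assume theta_s ~ U[-d_obs, d_obs] for the singleton parameters.
    """
    rng = rng or np.random.default_rng()
    theta_s = rng.uniform(-d_obs, d_obs, size=n)
    lo, hi = {"repulsive": (-2 * d_coup, 0.0),
              "mixed": (-d_coup, d_coup),
              "attractive": (0.0, 2 * d_coup)}[kind]
    theta_st = {(s, t): rng.uniform(lo, hi) for (s, t) in edges}
    return theta_s, theta_st
```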
83 For each distribution p(x; θ), we performed the following computations: (a) the exact marginal probability p(xs = 1; θ) at each node; and (b) approximate marginals computed⁴ (Footnote 4: Here U[a, b] denotes the uniform distribution on [a, b].) [sent-203, score-0.393]
84 from the Bethe approximation with the sum-product algorithm, or (c) log-determinant approximate marginals from Theorem 1 using the outer bound OUT(Kn) given by the first semidefinite relaxation SDEF1 in conjunction with the pairwise linear constraints in Eqn. (10). [sent-204, score-0.677]
85 We computed the exact marginal values either by exhaustive summation (complete graph), or by the junction tree algorithm (lattices). [sent-206, score-0.382]
86 The log-determinant problem of Theorem 1 was solved using interior point methods [8]. [sent-209, score-0.115]
87 For each graph (fully connected or grid), we examined a total of 6 conditions: 2 different potential strengths (weak or strong) for each of the 3 types of coupling (attractive, mixed, and repulsive). [sent-210, score-0.231]
88 We computed the ℓ1-error (1/n) Σ_{s=1}^{n} |p(xs = 1; θ) − µs|, where µs was the approximate marginal computed either by SP or by LD. [sent-211, score-0.245]
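A one-line sketch (ours) of this error metric:

```python
import numpy as np

def l1_error(p_exact, p_approx):
    """(1/n) * sum_s |p(x_s = 1; theta) - mu_s| over the n nodes.

    For +-1 variables, a mean parameter E[x_s] converts to a marginal
    probability via p(x_s = 1) = (1 + E[x_s]) / 2.
    """
    return float(np.mean(np.abs(np.asarray(p_exact) - np.asarray(p_approx))))
```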
89 Table 1: Statistics of the ℓ1-approximation error for the sum-product (SP) and log-determinant (LD) methods for the fully connected graph K16, as well as the 4-nearest neighbor grid with 16 nodes, with varying coupling and potential strengths. [sent-309, score-0.259]
90 The potential strength is given as the pair (dobs , dcoup ); note that dobs = 0. [sent-311, score-0.23]
91 Footnote 5: More precisely, we updated messages in the log domain as γ log M_st^new + (1 − γ) log M_st^old. [sent-322, score-0.449]
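A hedged sketch (ours, assuming numpy and scipy) of sum-product for the pairwise Ising model with the footnote's log-domain damping; the schedule and function names are our own choices:

```python
import numpy as np
from scipy.special import logsumexp

def damped_sum_product(theta_s, Theta, gamma=0.5, iters=200):
    """Damped sum-product (SP) for a pairwise Ising model.

    theta_s: (n,) singleton parameters; Theta: symmetric zero-diagonal
    (n, n) coupling matrix; returns approximate p(x_s = +1).
    """
    n = len(theta_s)
    vals = np.array([-1.0, 1.0])
    log_m = np.zeros((n, n, 2))                       # log m_{s->t}(x_t)
    for _ in range(iters):
        for s in range(n):
            for t in range(n):
                if s == t or Theta[s, t] == 0.0:
                    continue
                # cavity field at s: all incoming messages except t's
                cav = theta_s[s] * vals + log_m[:, s].sum(axis=0) - log_m[t, s]
                new = np.array([logsumexp(cav + Theta[s, t] * vals * xt)
                                for xt in vals])
                new -= logsumexp(new)                 # normalize for stability
                # footnote 5: damp in the log domain
                log_m[s, t] = gamma * new + (1 - gamma) * log_m[s, t]
    node = theta_s[:, None] * vals[None, :] + log_m.sum(axis=0)
    beliefs = np.exp(node - logsumexp(node, axis=1, keepdims=True))
    return beliefs[:, 1]                              # approx. p(x_s = +1)
```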
92 5 Discussion In this paper, we developed a new method for approximate inference based on the combination of a Gaussian entropy bound with semidefinite constraints on the marginal polytope. [sent-323, score-0.65]
93 The resultant log-determinant maximization problem can be solved by efficient interior point methods [8]. [sent-324, score-0.178]
94 It can be shown [11, 10] that in the zero-temperature limit, the log-determinant relaxation (12) reduces to a class of semidefinite relaxations that are widely used in combinatorial optimization. [sent-327, score-0.237]
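As our own illustration of this limit (the exact correspondence is established in [11, 10]): dropping the entropy term of Eqn. (12) leaves the classical SDP over the unit-diagonal PSD set, of the form used e.g. for MAX-CUT-type problems:

```python
import numpy as np
import cvxpy as cp

def zero_temperature_sdp(theta_s, Theta):
    """Zero-temperature sketch: maximize <theta, mu> subject to
    M1(mu) PSD with unit diagonal (no entropy term)."""
    n = len(theta_s)
    Z = cp.Variable((n + 1, n + 1), symmetric=True)
    objective = theta_s @ Z[0, 1:] + 0.5 * cp.sum(cp.multiply(Theta, Z[1:, 1:]))
    problem = cp.Problem(cp.Maximize(objective), [Z >> 0, cp.diag(Z) == 1])
    return problem.solve(), Z.value
```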
95 One open question is whether techniques for bounding the performance of such semidefinite relaxations [e.g., 3] can be adapted to the approximate inference setting. [sent-328, score-0.153]
96 It remains to develop a deeper understanding of the interaction between the two components of these approximations (i.e., the entropy bound and the outer bound on the marginal polytope), as well as how to tailor approximations to particular graph structures. [sent-334, score-0.833]
98 Finally, semidefinite constraints can be combined with entropy approximations (preferably convex) other than the Gaussian bound used in this paper, among them “convexified” Bethe/Kikuchi entropy approximations [9]. [sent-335, score-0.758]
99 A new class of upper bounds on the log partition function. [sent-397, score-0.259]
100 Semidefinite relaxations for approximate inference on graphs with cycles. [sent-412, score-0.175]
wordName wordTfidf (topN-words)
[('marg', 0.417), ('semide', 0.331), ('xs', 0.228), ('kn', 0.214), ('marginal', 0.211), ('entropy', 0.189), ('polytope', 0.186), ('st', 0.158), ('variational', 0.135), ('outer', 0.128), ('relaxation', 0.128), ('sp', 0.122), ('interior', 0.115), ('bound', 0.115), ('bethe', 0.11), ('relaxations', 0.109), ('graph', 0.108), ('blkdiag', 0.104), ('dcoup', 0.104), ('ld', 0.103), ('constraints', 0.101), ('log', 0.097), ('marginals', 0.096), ('coupling', 0.085), ('approximations', 0.082), ('cov', 0.082), ('det', 0.078), ('hypertrees', 0.078), ('pairwise', 0.075), ('nite', 0.075), ('tu', 0.072), ('berkeley', 0.069), ('determinant', 0.069), ('exceptions', 0.068), ('triplet', 0.068), ('dual', 0.067), ('xt', 0.067), ('maximization', 0.063), ('rd', 0.062), ('convex', 0.062), ('tree', 0.062), ('cycles', 0.061), ('upper', 0.06), ('partition', 0.06), ('optimum', 0.059), ('wainwright', 0.059), ('xu', 0.058), ('junction', 0.057), ('uc', 0.057), ('lemma', 0.053), ('dobs', 0.052), ('repulsive', 0.052), ('exact', 0.052), ('trials', 0.05), ('su', 0.049), ('complete', 0.047), ('theorem', 0.047), ('hypertree', 0.045), ('deza', 0.045), ('binary', 0.045), ('valid', 0.045), ('bounding', 0.044), ('collection', 0.043), ('differential', 0.043), ('interactions', 0.043), ('recovers', 0.043), ('bounds', 0.042), ('median', 0.041), ('yedidia', 0.041), ('laurent', 0.041), ('ising', 0.041), ('hyper', 0.041), ('coupled', 0.039), ('potential', 0.038), ('arguments', 0.038), ('graphical', 0.037), ('relation', 0.037), ('entropies', 0.036), ('strength', 0.036), ('conjugate', 0.035), ('approximate', 0.034), ('minimal', 0.033), ('moreover', 0.033), ('graphs', 0.032), ('vector', 0.032), ('associated', 0.032), ('trees', 0.032), ('discrete', 0.031), ('substantial', 0.031), ('generalizations', 0.03), ('strictly', 0.03), ('attained', 0.029), ('cliques', 0.029), ('princeton', 0.029), ('january', 0.029), ('consistency', 0.029), ('problems', 0.029), ('ranges', 0.028), ('interaction', 0.028), ('fully', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999988 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles
Author: Michael I. Jordan, Martin J. Wainwright
Abstract: We present a new method for calculating approximate marginals for probability distributions defined by graphs with cycles, based on a Gaussian entropy bound combined with a semidefinite outer bound on the marginal polytope. This combination leads to a log-determinant maximization problem that can be solved by efficient interior point methods [8]. As with the Bethe approximation and its generalizations [12], the optimizing arguments of this problem can be taken as approximations to the exact marginals. In contrast to Bethe/Kikuchi approaches, our variational problem is strictly convex and so has a unique global optimum. An additional desirable feature is that the value of the optimal solution is guaranteed to provide an upper bound on the log partition function. In experimental trials, the performance of the log-determinant relaxation is comparable to or better than the sum-product algorithm, and by a substantial margin for certain problem classes. Finally, the zero-temperature limit of our log-determinant relaxation recovers a class of well-known semidefinite relaxations for integer programming [e.g., 3]. 1
2 0.2309486 171 nips-2003-Semi-Definite Programming by Perceptron Learning
Author: Thore Graepel, Ralf Herbrich, Andriy Kharechko, John S. Shawe-taylor
Abstract: We present a modified version of the perceptron learning algorithm (PLA) which solves semidefinite programs (SDPs) in polynomial time. The algorithm is based on the following three observations: (i) Semidefinite programs are linear programs with infinitely many (linear) constraints; (ii) every linear program can be solved by a sequence of constraint satisfaction problems with linear constraints; (iii) in general, the perceptron learning algorithm solves a constraint satisfaction problem with linear constraints in finitely many updates. Combining the PLA with a probabilistic rescaling algorithm (which, on average, increases the size of the feasable region) results in a probabilistic algorithm for solving SDPs that runs in polynomial time. We present preliminary results which demonstrate that the algorithm works, but is not competitive with state-of-the-art interior point methods. 1
3 0.14737134 189 nips-2003-Tree-structured Approximations by Expectation Propagation
Author: Yuan Qi, Tom Minka
Abstract: Approximation structure plays an important role in inference on loopy graphs. As a tractable structure, tree approximations have been utilized in the variational method of Ghahramani & Jordan (1997) and the sequential projection method of Frey et al. (2000). However, belief propagation represents each factor of the graph with a product of single-node messages. In this paper, belief propagation is extended to represent factors with tree approximations, by way of the expectation propagation framework. That is, each factor sends a “message” to all pairs of nodes in a tree structure. The result is more accurate inferences and more frequent convergence than ordinary belief propagation, at a lower cost than variational trees or double-loop algorithms. 1
4 0.14731492 32 nips-2003-Approximate Expectation Maximization
Author: Tom Heskes, Onno Zoeter, Wim Wiegerinck
Abstract: We discuss the integration of the expectation-maximization (EM) algorithm for maximum likelihood learning of Bayesian networks with belief propagation algorithms for approximate inference. Specifically we propose to combine the outer-loop step of convergent belief propagation algorithms with the M-step of the EM algorithm. This then yields an approximate EM algorithm that is essentially still double loop, with the important advantage of an inner loop that is guaranteed to converge. Simulations illustrate the merits of such an approach. 1
5 0.13005349 193 nips-2003-Variational Linear Response
Author: Manfred Opper, Ole Winther
Abstract: A general linear response method for deriving improved estimates of correlations in the variational Bayes framework is presented. Three applications are given and it is discussed how to use linear response as a general principle for improving mean field approximations.
6 0.12206578 96 nips-2003-Invariant Pattern Recognition by Semi-Definite Programming Machines
7 0.11308224 155 nips-2003-Perspectives on Sparse Bayesian Learning
8 0.10556773 94 nips-2003-Information Maximization in Noisy Channels : A Variational Approach
9 0.097616047 142 nips-2003-On the Concentration of Expectation and Approximate Inference in Layered Networks
10 0.089030102 64 nips-2003-Estimating Internal Variables and Paramters of a Learning Agent by a Particle Filter
11 0.088562056 177 nips-2003-Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles
12 0.08677955 152 nips-2003-Pairwise Clustering and Graphical Models
13 0.084868424 117 nips-2003-Linear Response for Approximate Inference
14 0.081364155 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices
15 0.081145048 124 nips-2003-Max-Margin Markov Networks
16 0.080805793 51 nips-2003-Design of Experiments via Information Theory
17 0.078047037 169 nips-2003-Sample Propagation
18 0.077406339 107 nips-2003-Learning Spectral Clustering
19 0.073883787 30 nips-2003-Approximability of Probability Distributions
20 0.071871608 48 nips-2003-Convex Methods for Transduction
topicId topicWeight
[(0, -0.241), (1, -0.058), (2, -0.115), (3, 0.122), (4, 0.176), (5, -0.09), (6, 0.093), (7, 0.021), (8, -0.003), (9, -0.015), (10, -0.009), (11, -0.1), (12, 0.012), (13, 0.046), (14, -0.126), (15, 0.018), (16, -0.097), (17, 0.024), (18, -0.078), (19, -0.111), (20, -0.049), (21, -0.026), (22, -0.134), (23, -0.28), (24, 0.048), (25, -0.03), (26, 0.193), (27, 0.099), (28, -0.04), (29, 0.042), (30, -0.036), (31, 0.073), (32, 0.011), (33, 0.002), (34, -0.051), (35, 0.006), (36, -0.094), (37, 0.121), (38, -0.058), (39, 0.001), (40, -0.009), (41, -0.11), (42, 0.075), (43, -0.017), (44, -0.049), (45, 0.051), (46, 0.027), (47, 0.039), (48, 0.072), (49, -0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.95970869 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles
Author: Michael I. Jordan, Martin J. Wainwright
Abstract: We present a new method for calculating approximate marginals for probability distributions defined by graphs with cycles, based on a Gaussian entropy bound combined with a semidefinite outer bound on the marginal polytope. This combination leads to a log-determinant maximization problem that can be solved by efficient interior point methods [8]. As with the Bethe approximation and its generalizations [12], the optimizing arguments of this problem can be taken as approximations to the exact marginals. In contrast to Bethe/Kikuchi approaches, our variational problem is strictly convex and so has a unique global optimum. An additional desirable feature is that the value of the optimal solution is guaranteed to provide an upper bound on the log partition function. In experimental trials, the performance of the log-determinant relaxation is comparable to or better than the sum-product algorithm, and by a substantial margin for certain problem classes. Finally, the zero-temperature limit of our log-determinant relaxation recovers a class of well-known semidefinite relaxations for integer programming [e.g., 3]. 1
2 0.77643514 171 nips-2003-Semi-Definite Programming by Perceptron Learning
Author: Thore Graepel, Ralf Herbrich, Andriy Kharechko, John S. Shawe-taylor
Abstract: We present a modified version of the perceptron learning algorithm (PLA) which solves semidefinite programs (SDPs) in polynomial time. The algorithm is based on the following three observations: (i) Semidefinite programs are linear programs with infinitely many (linear) constraints; (ii) every linear program can be solved by a sequence of constraint satisfaction problems with linear constraints; (iii) in general, the perceptron learning algorithm solves a constraint satisfaction problem with linear constraints in finitely many updates. Combining the PLA with a probabilistic rescaling algorithm (which, on average, increases the size of the feasable region) results in a probabilistic algorithm for solving SDPs that runs in polynomial time. We present preliminary results which demonstrate that the algorithm works, but is not competitive with state-of-the-art interior point methods. 1
3 0.50493139 32 nips-2003-Approximate Expectation Maximization
Author: Tom Heskes, Onno Zoeter, Wim Wiegerinck
Abstract: We discuss the integration of the expectation-maximization (EM) algorithm for maximum likelihood learning of Bayesian networks with belief propagation algorithms for approximate inference. Specifically we propose to combine the outer-loop step of convergent belief propagation algorithms with the M-step of the EM algorithm. This then yields an approximate EM algorithm that is essentially still double loop, with the important advantage of an inner loop that is guaranteed to converge. Simulations illustrate the merits of such an approach. 1
4 0.50112027 189 nips-2003-Tree-structured Approximations by Expectation Propagation
Author: Yuan Qi, Tom Minka
Abstract: Approximation structure plays an important role in inference on loopy graphs. As a tractable structure, tree approximations have been utilized in the variational method of Ghahramani & Jordan (1997) and the sequential projection method of Frey et al. (2000). However, belief propagation represents each factor of the graph with a product of single-node messages. In this paper, belief propagation is extended to represent factors with tree approximations, by way of the expectation propagation framework. That is, each factor sends a “message” to all pairs of nodes in a tree structure. The result is more accurate inferences and more frequent convergence than ordinary belief propagation, at a lower cost than variational trees or double-loop algorithms. 1
5 0.49684796 96 nips-2003-Invariant Pattern Recognition by Semi-Definite Programming Machines
Author: Thore Graepel, Ralf Herbrich
Abstract: Knowledge about local invariances with respect to given pattern transformations can greatly improve the accuracy of classification. Previous approaches are either based on regularisation or on the generation of virtual (transformed) examples. We develop a new framework for learning linear classifiers under known transformations based on semidefinite programming. We present a new learning algorithm— the Semidefinite Programming Machine (SDPM)—which is able to find a maximum margin hyperplane when the training examples are polynomial trajectories instead of single points. The solution is found to be sparse in dual variables and allows to identify those points on the trajectory with minimal real-valued output as virtual support vectors. Extensions to segments of trajectories, to more than one transformation parameter, and to learning with kernels are discussed. In experiments we use a Taylor expansion to locally approximate rotational invariance in pixel images from USPS and find improvements over known methods. 1
6 0.49424246 155 nips-2003-Perspectives on Sparse Bayesian Learning
7 0.46221116 74 nips-2003-Finding the M Most Probable Configurations using Loopy Belief Propagation
8 0.44795051 48 nips-2003-Convex Methods for Transduction
9 0.43945083 94 nips-2003-Information Maximization in Noisy Channels : A Variational Approach
10 0.43815595 30 nips-2003-Approximability of Probability Distributions
11 0.38821504 193 nips-2003-Variational Linear Response
12 0.36550948 142 nips-2003-On the Concentration of Expectation and Approximate Inference in Layered Networks
13 0.36378357 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes
14 0.35701269 152 nips-2003-Pairwise Clustering and Graphical Models
15 0.35392493 169 nips-2003-Sample Propagation
16 0.34343952 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices
17 0.34317562 134 nips-2003-Near-Minimax Optimal Classification with Dyadic Classification Trees
18 0.32595107 168 nips-2003-Salient Boundary Detection using Ratio Contour
19 0.31870189 177 nips-2003-Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles
20 0.30062905 44 nips-2003-Can We Learn to Beat the Best Stock
topicId topicWeight
[(0, 0.073), (11, 0.011), (29, 0.011), (30, 0.018), (35, 0.072), (53, 0.074), (69, 0.014), (71, 0.077), (76, 0.04), (85, 0.092), (91, 0.083), (99, 0.356)]
simIndex simValue paperId paperTitle
1 0.93397498 167 nips-2003-Robustness in Markov Decision Problems with Uncertain Transition Matrices
Author: Arnab Nilim, Laurent El Ghaoui
Abstract: Optimal solutions to Markov Decision Problems (MDPs) are very sensitive with respect to the state transition probabilities. In many practical problems, the estimation of those probabilities is far from accurate. Hence, estimation errors are limiting factors in applying MDPs to realworld problems. We propose an algorithm for solving finite-state and finite-action MDPs, where the solution is guaranteed to be robust with respect to estimation errors on the state transition probabilities. Our algorithm involves a statistically accurate yet numerically efficient representation of uncertainty, via Kullback-Leibler divergence bounds. The worst-case complexity of the robust algorithm is the same as the original Bellman recursion. Hence, robustness can be added at practically no extra computing cost.
same-paper 2 0.89077532 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles
Author: Michael I. Jordan, Martin J. Wainwright
Abstract: We present a new method for calculating approximate marginals for probability distributions defined by graphs with cycles, based on a Gaussian entropy bound combined with a semidefinite outer bound on the marginal polytope. This combination leads to a log-determinant maximization problem that can be solved by efficient interior point methods [8]. As with the Bethe approximation and its generalizations [12], the optimizing arguments of this problem can be taken as approximations to the exact marginals. In contrast to Bethe/Kikuchi approaches, our variational problem is strictly convex and so has a unique global optimum. An additional desirable feature is that the value of the optimal solution is guaranteed to provide an upper bound on the log partition function. In experimental trials, the performance of the log-determinant relaxation is comparable to or better than the sum-product algorithm, and by a substantial margin for certain problem classes. Finally, the zero-temperature limit of our log-determinant relaxation recovers a class of well-known semidefinite relaxations for integer programming [e.g., 3]. 1
3 0.77613437 191 nips-2003-Unsupervised Context Sensitive Language Acquisition from a Large Corpus
Author: Zach Solan, David Horn, Eytan Ruppin, Shimon Edelman
Abstract: We describe a pattern acquisition algorithm that learns, in an unsupervised fashion, a streamlined representation of linguistic structures from a plain natural-language corpus. This paper addresses the issues of learning structured knowledge from a large-scale natural language data set, and of generalization to unseen text. The implemented algorithm represents sentences as paths on a graph whose vertices are words (or parts of words). Significant patterns, determined by recursive context-sensitive statistical inference, form new vertices. Linguistic constructions are represented by trees composed of significant patterns and their associated equivalence classes. An input module allows the algorithm to be subjected to a standard test of English as a Second Language (ESL) proficiency. The results are encouraging: the model attains a level of performance considered to be “intermediate” for 9th-grade students, despite having been trained on a corpus (CHILDES) containing transcribed speech of parents directed to small children. 1
4 0.6722225 158 nips-2003-Policy Search by Dynamic Programming
Author: J. A. Bagnell, Sham M. Kakade, Jeff G. Schneider, Andrew Y. Ng
Abstract: We consider the policy search approach to reinforcement learning. We show that if a “baseline distribution” is given (indicating roughly how often we expect a good policy to visit each state), then we can derive a policy search algorithm that terminates in a finite number of steps, and for which we can provide non-trivial performance guarantees. We also demonstrate this algorithm on several grid-world POMDPs, a planar biped walking robot, and a double-pole balancing problem. 1
5 0.61767364 34 nips-2003-Approximate Policy Iteration with a Policy Language Bias
Author: Alan Fern, Sungwook Yoon, Robert Givan
Abstract: We explore approximate policy iteration, replacing the usual costfunction learning step with a learning step in policy space. We give policy-language biases that enable solution of very large relational Markov decision processes (MDPs) that no previous technique can solve. In particular, we induce high-quality domain-specific planners for classical planning domains (both deterministic and stochastic variants) by solving such domains as extremely large MDPs. 1
6 0.5885309 78 nips-2003-Gaussian Processes in Reinforcement Learning
7 0.58844733 116 nips-2003-Linear Program Approximations for Factored Continuous-State Markov Decision Processes
8 0.58411801 42 nips-2003-Bounded Finite State Controllers
9 0.57228458 121 nips-2003-Log-Linear Models for Label Ranking
10 0.57185352 189 nips-2003-Tree-structured Approximations by Expectation Propagation
11 0.56882018 101 nips-2003-Large Margin Classifiers: Convex Loss, Low Noise, and Convergence Rates
12 0.56134862 103 nips-2003-Learning Bounds for a Generalized Family of Bayesian Posterior Distributions
13 0.55591542 151 nips-2003-PAC-Bayesian Generic Chaining
14 0.54926246 30 nips-2003-Approximability of Probability Distributions
15 0.54244721 177 nips-2003-Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles
16 0.53851449 134 nips-2003-Near-Minimax Optimal Classification with Dyadic Classification Trees
17 0.53752476 48 nips-2003-Convex Methods for Transduction
18 0.53560525 63 nips-2003-Error Bounds for Transductive Learning via Compression and Clustering
19 0.53524172 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors
20 0.53463233 146 nips-2003-Online Learning of Non-stationary Sequences