nips nips2012 nips2012-335 knowledge-graph by maker-knowledge-mining

335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models


Source: pdf

Author: Nicholas Ruozzi

Abstract: Sudderth, Wainwright, and Willsky conjectured that the Bethe approximation corresponding to any fixed point of the belief propagation algorithm over an attractive, pairwise binary graphical model provides a lower bound on the true partition function. In this work, we resolve this conjecture in the affirmative by demonstrating that, for any graphical model with binary variables whose potential functions (not necessarily pairwise) are all log-supermodular, the Bethe partition function always lower bounds the true partition function. The proof of this result follows from a new variant of the “four functions” theorem that may be of independent interest. 1

Reference: text


Summary: the most important sentenses genereted by tfidf model

sentIndex sentText sentNum sentScore

1 ch Abstract Sudderth, Wainwright, and Willsky conjectured that the Bethe approximation corresponding to any fixed point of the belief propagation algorithm over an attractive, pairwise binary graphical model provides a lower bound on the true partition function. [sent-3, score-0.737]

2 In this work, we resolve this conjecture in the affirmative by demonstrating that, for any graphical model with binary variables whose potential functions (not necessarily pairwise) are all log-supermodular, the Bethe partition function always lower bounds the true partition function. [sent-4, score-0.927]

3 The proof of this result follows from a new variant of the “four functions” theorem that may be of independent interest. [sent-5, score-0.26]

4 Computing the partition function of a given graphical model, a typical inference problem, is an NP-hard problem in general. [sent-7, score-0.321]

5 However, the Bethe partition function is only an approximation to the true partition function and need not provide an upper or lower bound. [sent-10, score-0.481]

6 In certain special cases, the Bethe approximation is conjectured to provide a lower bound on the true partition function. [sent-11, score-0.387]

7 One such example is the class of attractive pairwise graphical models: models in which the interaction between any two neighboring variables places a greater weight on assignments in which the two variables agree. [sent-12, score-0.335]

8 Many applications in computer vision and statistical physics can be expressed as attractive pairwise graphical models (e. [sent-13, score-0.376]

9 Sudderth, Wainwright, and Willsky [2] used a loop series expansion of Chertkov and Chernyak [3, 4] in order to study the fixed points of BP over attractive graphical models. [sent-16, score-0.358]

10 They provided conditions on the fixed points of BP under which the stationary points of the Bethe free energy function corresponding to these fixed points are a lower bound on the true partition function. [sent-17, score-0.403]

11 Empirically, they observed that, even when their conditions were not satisfied, the Bethe partition function appeared to lower bound the true partition function, and they conjectured that this is always the case for attractive pairwise binary graphical models. [sent-18, score-0.872]

12 Recent work on the relationship between the Bethe partition function and the graph covers of a given graphical model has suggested a new approach to resolving this conjecture. [sent-19, score-0.657]

13 Vontobel [5] demonstrated that the Bethe partition function can be precisely characterized by the average of the 1 true partition functions corresponding to graph covers of the base graphical model. [sent-20, score-0.992]

14 The primary contribution of the present work is to show that, for graphical models with log-supermodular potentials, the partition function associated with any graph cover of the base graph, appropriately normalized, must lower bound the true partition function. [sent-21, score-0.9]

15 As pairwise binary graphical models are log-supermodular if and only if they are attractive, combining our result with the observations of [5] resolves the conjecture of [2]. [sent-22, score-0.37]

16 The key element in our proof, and the second contribution of this work, is a new variant of the “four functions” theorem that is specific to log-supermodular functions. [sent-23, score-0.225]

17 As a final contribution, we demonstrate that our variant of the “four functions” theorem has applications beyond log-supermodular functions: as an example, we use it to show that the Bethe partition function can also provide a lower bound on the number of independent sets in a bipartite graph. [sent-27, score-0.601]

18 We say that f factors with respect to a hypergraph G = (V, A) where A ⊆ 2V , if there exist potential functions φi : {0, 1} → R≥0 for each i ∈ V and ψα : {0, 1}|α| → R≥0 for each α ∈ A such that f (x) = φi (xi ) i∈V ψα (xα ) α∈A where xα is the subvector of the vector x indexed by the set α. [sent-29, score-0.187]

19 We will express the hypergraph G as a bipartite graph that consists of a variable node for each i ∈ V , a factor node for each α ∈ A, and an edge joining the factor node corresponding to α to the variable node representing i if i ∈ α. [sent-30, score-0.751]

20 This is typically referred to as the factor graph representation of G. [sent-31, score-0.192]

21 A factorization of a function f : {0, 1}n → R≥0 over G = (V, A) is logsupermodular if for all α ∈ A, ψα (xα ) is log-supermodular. [sent-38, score-0.119]

22 Every function that admits a log-supermodular factorization is necessarily log-supermodular, products of log-supermodular functions are easily seen to be log-supermodular, but the converse may not be true outside of special cases. [sent-39, score-0.369]

23 If |α| ≤ 2 for each α ∈ A, then we call the factorization pairwise. [sent-40, score-0.119]

24 For any pairwise factorization, f is log-supermodular if and only if ψij is log-supermodular for each i and j. [sent-41, score-0.093]

25 Pairwise graphical models such that ψα (xα ) is log-supermodular for all α ∈ A are referred to as attractive graphical models. [sent-42, score-0.375]

26 A generalization of attractive interactions to the non-pairwise case is presented in [2]: for all α ∈ A, ψα , when appropriately normalized, has non-negative central moments. [sent-43, score-0.137]

27 1 Graph Covers Graph covers have played an important role in our understanding of graphical models [5, 6]. [sent-46, score-0.285]

28 Roughly, if a graph H covers a graph G, then H looks locally the same as G. [sent-47, score-0.462]

29 A graph H covers a graph G = (V, E) if there exists a graph homomorphism h : H → G such that for all vertices v ∈ G and all w ∈ h−1 (v), h maps the neighborhood ∂w of w in H bijectively to the neighborhood ∂v of v in G. [sent-50, score-0.75]

30 If h(w) = v, then we say that w ∈ H is a copy of v ∈ G. [sent-51, score-0.11]

31 The nodes in the cover are labeled for the node that they copy in the base graph. [sent-56, score-0.296]

32 For the factor graph corresponding to G = (V, A), each k-cover consists of a variable node for each of the k|V | variables, a factor node for each of the k|A| factors, and an edge joining each copy of α ∈ A to a distinct copy of each i ∈ α. [sent-58, score-0.688]

33 To any k-cover H = (VH , AH ) of G given by the homomorphism h, we can associate a collection of potentials: the potential at node i ∈ VH is equal to φh(i) , the potential at node h(i) ∈ G, and for each α ∈ AH , we associate the potential ψh(α) . [sent-59, score-0.325]

34 Notice that if f G admits a log-supermodular factorization over G and H is a k-cover of G, then f H admits a log-supermodular factorization over H. [sent-61, score-0.486]

35 2 Bethe Approximations For a function f : {0, 1}n → R≥0 that factorizes over G = (V, A), we are interested computing the partition function Z(G) = x f (x). [sent-63, score-0.241]

36 In general, this is an NP-hard problem, but in practice, algorithms, such as belief propagation, based on variational approximations produce reasonable estimates in many settings. [sent-64, score-0.09]

37 xi The fixed points of the belief propagation algorithm correspond to stationary points of log ZB (G, τ ) over T , the set of pseudomarginals [1], and the Bethe partition function is defined to be the maximum value achieved by this approximation over T : ZB (G) = max ZB (G, τ ). [sent-66, score-0.52]

38 τ ∈T For a fixed factor graph G, we are interested in the relationship between the true partition function, Z(G), and the Bethe approximation corresponding to G, ZB (G). [sent-67, score-0.483]

39 While, in general, ZB (G) can be either an upper or a lower bound on the true partition function, in this work, we address the following conjecture of [2]: Conjecture 2. [sent-68, score-0.428]

40 If f : {0, 1}n → R≥0 admits a pairwise, log-supermodular factorization over G = (V, A), then ZB (G) ≤ Z(G). [sent-70, score-0.243]

41 We resolve this conjecture in the affirmative, and show that it continues to hold for a larger class of log-supermodular functions. [sent-71, score-0.252]

42 Our results are based, primarily, on two observations: a variant of the “four functions” theorem [7] and the following, recent theorem of Vontobel [5]: 3 Theorem 2. [sent-72, score-0.374]

43 3 The “Four Functions” Theorem and Related Results The “four functions” theorem [7] is a general result concerning nonnegative functions over distributive lattices. [sent-80, score-0.401]

44 Many correlation inequalities from statistical physics, such as the FKG inequality, can be seen as special cases of this theorem [8]. [sent-81, score-0.184]

45 Let f1 , f2 , f3 , f4 : {0, 1}n → R≥0 be nonnegative real-valued functions. [sent-84, score-0.075]

46 x∈{0,1}n The following lemma is a direct consequence of the four functions theorem: Lemma 3. [sent-86, score-0.269]

47 The four functions theorem can be extended to more than four functions, by generalizing ∧ and ∨. [sent-89, score-0.389]

48 , xk ) be the vector whose j th component is the ith largest element of x1 , . [sent-96, score-0.315]

49 , xk )j = { a=1 xa ≥ i} where {· ≥ ·} is one if the j inequality is satisfied and zero otherwise. [sent-109, score-0.363]

50 The “four functions” theorem is then a special case of the more general “2k functions” theorem [9, 10, 11]: Theorem 3. [sent-110, score-0.298]

51 , gk : {0, 1}n → R≥0 be nonnegative real-valued functions. [sent-118, score-0.113]

52 , xk ∈ {0, 1}n , k k gi (xi ) ≤ i=1 fi (z i (x1 , . [sent-122, score-0.531]

53 3 would be to replace the product of functions on the left-hand side of Equation 1 with an arbitrary function over x1 , . [sent-128, score-0.09]

54 , xk : we will show that we can replace this product with an arbitrary log-supermodular function while preserving the conclusion of the theorem. [sent-131, score-0.286]

55 The key property of log-supermodular functions that makes this possible is the following lemma: 1 The proof of the theorem is demonstrated for “normal” factor graphs, but it easily extends to the factor graphs described above by replacing variable nodes with equality constraints. [sent-132, score-0.391]

56 The proof of our variant of the “2k functions theorem” uses the properties of weak majorizations: Definition 3. [sent-144, score-0.201]

57 For x, y ∈ Rn , x w y if and only if increasing, and convex functions g : R → R. [sent-158, score-0.09]

58 We now state and prove our variant of the 2k functions theorem in two pieces. [sent-166, score-0.315]

59 , fk : {0, 1} → R≥0 and g : {0, 1}k → R≥0 be nonnegative real-valued functions such that g is log-supermodular. [sent-172, score-0.227]

60 , v ∈ X c , we will show how to construct distinct vectors v 1 , . [sent-258, score-0.082]

61 c As our construction will work for any choice of distinct vectors v 1 , . [sent-266, score-0.082]

62 , v T ∈ X c , it will work, T in particular, for the T distinct vectors in X c that maximize t=1 g(v t ), and the lemma will then follow as a consequence of our previous arguments. [sent-269, score-0.186]

63 Intuitively, the first row of A corresponds to the row of A with the most nonzero elements, the second row of A corresponds to the row of A with the second largest number of nonzero elements, and so on. [sent-282, score-0.078]

64 , v T are distinct vectors in X c and that, by construction, z j (z t (v 1 , . [sent-290, score-0.082]

65 , v T )) = t=1 f (v t ) t=1 where the equality follows from the definition of f as a product of the fi . [sent-315, score-0.196]

66 , v T )) = t=1 f (v t ) t=1 and the lemma follows as a consequence . [sent-347, score-0.104]

67 In the case that n = 1 and k ≥ 1, this lemma is a more general result than the 2k functions theorem: if g(x1 , . [sent-349, score-0.167]

68 As in the proof of the 2k functions theorem, the general theorem for n ≥ 1 follows by induction on n. [sent-356, score-0.318]

69 This inductive proof closely follows the inductive argument in the proof of the “four functions” theorem described in [8] with the added observation that marginals of log-supermodular functions continue to be log-supermodular. [sent-357, score-0.377]

70 , fk : {0, 1}n → R≥0 and g : {0, 1}kn → R≥0 be nonnegative real-valued functions such that g is log-supermodular. [sent-363, score-0.227]

71 , fk : {0, 1}n → R≥0 and g : {0, 1}kn → R≥0 be nonnegative real-valued functions such that g is log-supermodular. [sent-387, score-0.227]

72 Define f : {0, 1}n−1 → R≥0 and g : {0, 1}k(n−1) → R≥0 as fi (y) = fi (y, 0) + fi (y, 1) 1 g (y , . [sent-388, score-0.588]

73 , y k ∈ {0, 1}n−1 and define f : {0, 1} → R≥0 and g : {0, 1}k → R≥0 as f i (s) = fi (z i (y 1 , . [sent-412, score-0.196]

74 In addition, we show that the theorem can be applied more generally to yield similar results for a class of functions that can be converted into log-supermodular functions by a change of variables. [sent-452, score-0.357]

75 1 Log-supermodularity and Graph Covers The following theorem follows easily from Theorem 3. [sent-454, score-0.149]

76 If f G : {0, 1}n → R≥0 admits a log-supermodular factorization over G = (V, A), then for any k-cover, H, of G, Z(H) ≤ Z(G)k . [sent-457, score-0.243]

77 , Sk such that each set contains exactly one copy of each vertex i ∈ V . [sent-463, score-0.147]

78 Let the assignments to the variables in the set Si be denoted by the vector xi . [sent-464, score-0.08]

79 i For each α ∈ A, let yα denote the assignment to the ith copy of α by the elements of x1 , . [sent-465, score-0.139]

80 ,xk xi This theorem settles the conjecture of [2] for any log-supermodular function that admits a pairwise binary factorization, and the conjecture continues to hold for any graphical model that admits a log-supermodular factorization. [sent-495, score-1.022]

81 If f : {0, 1}n → R≥0 admits a log-supermodular factorization over G = (V, A), then ZB (G) ≤ Z(G). [sent-498, score-0.243]

82 As the value of the Bethe approximation at any of the fixed points of BP is always a lower bound on ZB (G), the conclusion of the corollary holds for any fixed point of the BP algorithm as well. [sent-503, score-0.127]

83 An independent set, I ⊆ V , in G is a subset of the vertices such that no two adjacent vertices are in I. [sent-508, score-0.094]

84 , x|V | ) = (1 − xi xj ) (i,j)∈E which is equal to one if the nonzero xi ’s define an independent set and zero otherwise. [sent-512, score-0.199]

85 As every potential function depends on at most two variables, I G factorizes over the graph G = (V, E). [sent-513, score-0.248]

86 In this section, we will focus on bipartite graphs: G = (V, E) is bipartite if we can partition the vertex set into two sets A ⊆ V and B = V \ A such that A and B are independent sets. [sent-515, score-0.481]

87 Examples of bipartite graphs include single cycles, trees, and grid graphs. [sent-516, score-0.171]

88 We will denote bipartite graphs as G = (A, B, E). [sent-517, score-0.171]

89 For any bipartite graph G = (A, B, E), I G can be converted into a log-supermodular graphical model by a simple change of variables. [sent-518, score-0.444]

90 Define ya = xa for all a ∈ A and yb = 1 − xb for all b ∈ B. [sent-519, score-0.141]

91 , x|V | ) = (1 − xi xj ) (i,j)∈E (1 − ya (1 − yb )) = (a,b)∈E,a∈A,b∈B G I (y1 , . [sent-523, score-0.18]

92 G I admits a log-supermodular factorization over G and G y I (y) = x I G (x). [sent-527, score-0.243]

93 Similarly, for any H graph cover H of G, we have y I (y) = x I H (x). [sent-528, score-0.217]

94 Similar observations can be used to show that the Bethe partition function provides a lower bound on the true partition function for other problems that factor over pairwise bipartite graphical models (e. [sent-531, score-0.863]

95 1 shows that the maximum value of the objective function on any graph cover is achieved by a lift of a maximizing assignment on the base graph. [sent-537, score-0.267]

96 Related work on the Bethe approximation for permanents suggests that conjectures similar to those discussed above can be made for other classes of functions [13, 14]. [sent-539, score-0.128]

97 While, like the “four functions” theorem, many of the above results can be extended to general distributive lattices, understanding when similar results may hold for non-binary problems may be of interest for graphical models that arise in certain application areas such as computer vision. [sent-540, score-0.186]

98 Loop series and Bethe variational bounds in attractive graphical models. [sent-558, score-0.307]

99 Counting in graph covers: A combinatorial characterization of the Bethe entropy function. [sent-581, score-0.155]

100 Unleashing the power of Schrijver’s permanental inequality with the help of the Bethe approximation. [sent-633, score-0.101]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('bethe', 0.542), ('xk', 0.286), ('zb', 0.283), ('fi', 0.196), ('partition', 0.188), ('graph', 0.155), ('covers', 0.152), ('theorem', 0.149), ('conjecture', 0.144), ('graphical', 0.133), ('vontobel', 0.13), ('bipartite', 0.128), ('admits', 0.124), ('factorization', 0.119), ('copy', 0.11), ('attractive', 0.109), ('pairwise', 0.093), ('functions', 0.09), ('sk', 0.083), ('bp', 0.083), ('xi', 0.08), ('lemma', 0.077), ('resolve', 0.077), ('variant', 0.076), ('nonnegative', 0.075), ('four', 0.075), ('gc', 0.074), ('node', 0.074), ('propagation', 0.067), ('conjectured', 0.065), ('chertkov', 0.065), ('majorizations', 0.065), ('permanental', 0.065), ('rinott', 0.065), ('cover', 0.062), ('fk', 0.062), ('sudderth', 0.061), ('hypergraph', 0.057), ('homomorphism', 0.057), ('belief', 0.057), ('kn', 0.056), ('loop', 0.055), ('vh', 0.053), ('distributive', 0.053), ('willsky', 0.053), ('yb', 0.053), ('factorizes', 0.053), ('distinct', 0.05), ('base', 0.05), ('gi', 0.049), ('ya', 0.047), ('vertices', 0.047), ('induction', 0.044), ('ah', 0.043), ('rmative', 0.043), ('graphs', 0.043), ('joining', 0.041), ('xa', 0.041), ('allerton', 0.041), ('physics', 0.041), ('potential', 0.04), ('nonzero', 0.039), ('wainwright', 0.039), ('approximation', 0.038), ('gk', 0.038), ('factor', 0.037), ('vertex', 0.037), ('ising', 0.037), ('true', 0.036), ('inequality', 0.036), ('notice', 0.036), ('inequalities', 0.035), ('proof', 0.035), ('temperature', 0.034), ('concerning', 0.034), ('inductive', 0.034), ('variational', 0.033), ('vectors', 0.032), ('series', 0.032), ('stationary', 0.032), ('continues', 0.031), ('lower', 0.031), ('wj', 0.03), ('corr', 0.03), ('potentials', 0.029), ('relationship', 0.029), ('rn', 0.029), ('points', 0.029), ('ith', 0.029), ('ldpc', 0.029), ('permanent', 0.029), ('unleashing', 0.029), ('marshall', 0.029), ('lattices', 0.029), ('bijectively', 0.029), ('exponentiation', 0.029), ('bound', 0.029), ('converted', 0.028), ('appropriately', 0.028), ('consequence', 0.027)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

Author: Nicholas Ruozzi

Abstract: Sudderth, Wainwright, and Willsky conjectured that the Bethe approximation corresponding to any fixed point of the belief propagation algorithm over an attractive, pairwise binary graphical model provides a lower bound on the true partition function. In this work, we resolve this conjecture in the affirmative by demonstrating that, for any graphical model with binary variables whose potential functions (not necessarily pairwise) are all log-supermodular, the Bethe partition function always lower bounds the true partition function. The proof of this result follows from a new variant of the “four functions” theorem that may be of independent interest. 1

2 0.31441045 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation

Author: Jason Pacheco, Erik B. Sudderth

Abstract: We develop convergent minimization algorithms for Bethe variational approximations which explicitly constrain marginal estimates to families of valid distributions. While existing message passing algorithms define fixed point iterations corresponding to stationary points of the Bethe free energy, their greedy dynamics do not distinguish between local minima and maxima, and can fail to converge. For continuous estimation problems, this instability is linked to the creation of invalid marginal estimates, such as Gaussians with negative variance. Conversely, our approach leverages multiplier methods with well-understood convergence properties, and uses bound projection methods to ensure that marginal approximations are valid at all iterations. We derive general algorithms for discrete and Gaussian pairwise Markov random fields, showing improvements over standard loopy belief propagation. We also apply our method to a hybrid model with both discrete and continuous variables, showing improvements over expectation propagation. 1

3 0.26324677 300 nips-2012-Scalable nonconvex inexact proximal splitting

Author: Suvrit Sra

Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is first to develop and analyze incremental nonconvex proximalsplitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization. 1

4 0.16168031 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

Author: Demba Ba, Behtash Babadi, Patrick Purdon, Emery Brown

Abstract: We consider the problem of recovering a sequence of vectors, (xk )K , for which k=0 the increments xk − xk−1 are Sk -sparse (with Sk typically smaller than S1 ), based on linear measurements (yk = Ak xk + ek )K , where Ak and ek denote the meak=1 surement matrix and noise, respectively. Assuming each Ak obeys the restricted isometry property (RIP) of a certain order—depending only on Sk —we show that in the absence of noise a convex program, which minimizes the weighted sum of the ℓ1 -norm of successive differences subject to the linear measurement constraints, recovers the sequence (xk )K exactly. This is an interesting result bek=1 cause this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. We supplement our theoretical analysis with simulations and an application to real video data. These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity.

5 0.14866747 282 nips-2012-Proximal Newton-type methods for convex optimization

Author: Jason Lee, Yuekai Sun, Michael Saunders

Abstract: We seek to solve convex optimization problems in composite form: minimize f (x) := g(x) + h(x), n x∈R where g is convex and continuously differentiable and h : Rn → R is a convex but not necessarily differentiable function whose proximal mapping can be evaluated efficiently. We derive a generalization of Newton-type methods to handle such convex but nonsmooth objective functions. We prove such methods are globally convergent and achieve superlinear rates of convergence in the vicinity of an optimal solution. We also demonstrate the performance of these methods using problems of relevance in machine learning and statistics. 1

6 0.13362944 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

7 0.11816145 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

8 0.1158009 145 nips-2012-Gradient Weights help Nonparametric Regressors

9 0.1122321 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)

10 0.11171574 180 nips-2012-Learning Mixtures of Tree Graphical Models

11 0.1065333 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

12 0.10459798 27 nips-2012-A quasi-Newton proximal splitting method

13 0.10275952 206 nips-2012-Majorization for CRFs and Latent Likelihoods

14 0.09158282 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition

15 0.090698257 147 nips-2012-Graphical Models via Generalized Linear Models

16 0.084594168 115 nips-2012-Efficient high dimensional maximum entropy modeling via symmetric partition functions

17 0.081677534 260 nips-2012-Online Sum-Product Computation Over Trees

18 0.081657022 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

19 0.076168872 69 nips-2012-Clustering Sparse Graphs

20 0.074140303 20 nips-2012-A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.205), (1, 0.066), (2, 0.153), (3, -0.116), (4, -0.009), (5, 0.023), (6, -0.01), (7, -0.188), (8, 0.044), (9, 0.125), (10, -0.14), (11, 0.046), (12, 0.102), (13, -0.032), (14, -0.172), (15, -0.146), (16, 0.004), (17, 0.018), (18, -0.076), (19, 0.147), (20, -0.113), (21, -0.122), (22, -0.096), (23, -0.04), (24, 0.016), (25, 0.138), (26, 0.078), (27, 0.048), (28, -0.009), (29, -0.015), (30, -0.009), (31, 0.019), (32, 0.016), (33, 0.023), (34, 0.055), (35, 0.007), (36, -0.026), (37, -0.054), (38, -0.094), (39, 0.043), (40, 0.039), (41, -0.021), (42, -0.021), (43, -0.013), (44, 0.067), (45, 0.139), (46, -0.057), (47, -0.015), (48, 0.031), (49, -0.007)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9478789 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

Author: Nicholas Ruozzi

Abstract: Sudderth, Wainwright, and Willsky conjectured that the Bethe approximation corresponding to any fixed point of the belief propagation algorithm over an attractive, pairwise binary graphical model provides a lower bound on the true partition function. In this work, we resolve this conjecture in the affirmative by demonstrating that, for any graphical model with binary variables whose potential functions (not necessarily pairwise) are all log-supermodular, the Bethe partition function always lower bounds the true partition function. The proof of this result follows from a new variant of the “four functions” theorem that may be of independent interest. 1

2 0.73100394 213 nips-2012-Minimization of Continuous Bethe Approximations: A Positive Variation

Author: Jason Pacheco, Erik B. Sudderth

Abstract: We develop convergent minimization algorithms for Bethe variational approximations which explicitly constrain marginal estimates to families of valid distributions. While existing message passing algorithms define fixed point iterations corresponding to stationary points of the Bethe free energy, their greedy dynamics do not distinguish between local minima and maxima, and can fail to converge. For continuous estimation problems, this instability is linked to the creation of invalid marginal estimates, such as Gaussians with negative variance. Conversely, our approach leverages multiplier methods with well-understood convergence properties, and uses bound projection methods to ensure that marginal approximations are valid at all iterations. We derive general algorithms for discrete and Gaussian pairwise Markov random fields, showing improvements over standard loopy belief propagation. We also apply our method to a hybrid model with both discrete and continuous variables, showing improvements over expectation propagation. 1

3 0.65528685 300 nips-2012-Scalable nonconvex inexact proximal splitting

Author: Suvrit Sra

Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is first to develop and analyze incremental nonconvex proximalsplitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization. 1

4 0.63424271 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

Author: Stefano Ermon, Ashish Sabharwal, Bart Selman, Carla P. Gomes

Abstract: Given a probabilistic graphical model, its density of states is a distribution that, for any likelihood value, gives the number of configurations with that probability. We introduce a novel message-passing algorithm called Density Propagation (DP) for estimating this distribution. We show that DP is exact for tree-structured graphical models and is, in general, a strict generalization of both sum-product and max-product algorithms. Further, we use density of states and tree decomposition to introduce a new family of upper and lower bounds on the partition function. For any tree decomposition, the new upper bound based on finer-grained density of state information is provably at least as tight as previously known bounds based on convexity of the log-partition function, and strictly stronger if a general condition holds. We conclude with empirical evidence of improvement over convex relaxations and mean-field based bounds. 1

5 0.59124082 120 nips-2012-Exact and Stable Recovery of Sequences of Signals with Sparse Increments via Differential 1-Minimization

Author: Demba Ba, Behtash Babadi, Patrick Purdon, Emery Brown

Abstract: We consider the problem of recovering a sequence of vectors, (xk )K , for which k=0 the increments xk − xk−1 are Sk -sparse (with Sk typically smaller than S1 ), based on linear measurements (yk = Ak xk + ek )K , where Ak and ek denote the meak=1 surement matrix and noise, respectively. Assuming each Ak obeys the restricted isometry property (RIP) of a certain order—depending only on Sk —we show that in the absence of noise a convex program, which minimizes the weighted sum of the ℓ1 -norm of successive differences subject to the linear measurement constraints, recovers the sequence (xk )K exactly. This is an interesting result bek=1 cause this convex program is equivalent to a standard compressive sensing problem with a highly-structured aggregate measurement matrix which does not satisfy the RIP requirements in the standard sense, and yet we can achieve exact recovery. In the presence of bounded noise, we propose a quadratically-constrained convex program for recovery and derive bounds on the reconstruction error of the sequence. We supplement our theoretical analysis with simulations and an application to real video data. These further support the validity of the proposed approach for acquisition and recovery of signals with time-varying sparsity.

6 0.56023222 248 nips-2012-Nonparanormal Belief Propagation (NPNBP)

7 0.53351629 282 nips-2012-Proximal Newton-type methods for convex optimization

8 0.53030252 27 nips-2012-A quasi-Newton proximal splitting method

9 0.52566844 147 nips-2012-Graphical Models via Generalized Linear Models

10 0.51392484 85 nips-2012-Convergence and Energy Landscape for Cheeger Cut Clustering

11 0.51303267 340 nips-2012-The representer theorem for Hilbert spaces: a necessary and sufficient condition

12 0.45045865 206 nips-2012-Majorization for CRFs and Latent Likelihoods

13 0.43189096 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

14 0.42782488 123 nips-2012-Exponential Concentration for Mutual Information Estimation with Application to Forests

15 0.42364413 331 nips-2012-Symbolic Dynamic Programming for Continuous State and Observation POMDPs

16 0.41446483 20 nips-2012-A Stochastic Gradient Method with an Exponential Convergence Rate for Finite Training Sets

17 0.41390899 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

18 0.40734634 145 nips-2012-Gradient Weights help Nonparametric Regressors

19 0.40394074 302 nips-2012-Scaling MPE Inference for Constrained Continuous Markov Random Fields with Consensus Optimization

20 0.38611236 180 nips-2012-Learning Mixtures of Tree Graphical Models


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.05), (21, 0.042), (38, 0.174), (39, 0.033), (42, 0.01), (54, 0.029), (55, 0.012), (74, 0.06), (76, 0.14), (80, 0.073), (83, 0.202), (92, 0.065), (98, 0.011)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.86352974 146 nips-2012-Graphical Gaussian Vector for Image Categorization

Author: Tatsuya Harada, Yasuo Kuniyoshi

Abstract: This paper proposes a novel image representation called a Graphical Gaussian Vector (GGV), which is a counterpart of the codebook and local feature matching approaches. We model the distribution of local features as a Gaussian Markov Random Field (GMRF) which can efficiently represent the spatial relationship among local features. Using concepts of information geometry, proper parameters and a metric from the GMRF can be obtained. Then we define a new image feature by embedding the proper metric into the parameters, which can be directly applied to scalable linear classifiers. We show that the GGV obtains better performance over the state-of-the-art methods in the standard object recognition datasets and comparable performance in the scene dataset. 1

same-paper 2 0.84481031 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

Author: Nicholas Ruozzi

Abstract: Sudderth, Wainwright, and Willsky conjectured that the Bethe approximation corresponding to any fixed point of the belief propagation algorithm over an attractive, pairwise binary graphical model provides a lower bound on the true partition function. In this work, we resolve this conjecture in the affirmative by demonstrating that, for any graphical model with binary variables whose potential functions (not necessarily pairwise) are all log-supermodular, the Bethe partition function always lower bounds the true partition function. The proof of this result follows from a new variant of the “four functions” theorem that may be of independent interest. 1

3 0.77221805 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

Author: Andriy Mnih, Yee W. Teh

Abstract: User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect it is important to develop effective models that take advantage of the more widely available implicit feedback. We introduce a probabilistic approach to collaborative filtering with implicit feedback based on modelling the user’s item selection process. In the interests of scalability, we restrict our attention to treestructured distributions over items and develop a principled and efficient algorithm for learning item trees from data. We also identify a problem with a widely used protocol for evaluating implicit feedback models and propose a way of addressing it using a small quantity of explicit feedback data. 1

4 0.76944649 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

Author: Odalric-ambrym Maillard

Abstract: This paper aims to take a step forwards making the term “intrinsic motivation” from reinforcement learning theoretically well founded, focusing on curiositydriven learning. To that end, we consider the setting where, a fixed partition P of a continuous space X being given, and a process ν defined on X being unknown, we are asked to sequentially decide which cell of the partition to select as well as where to sample ν in that cell, in order to minimize a loss function that is inspired from previous work on curiosity-driven learning. The loss on each cell consists of one term measuring a simple worst case quadratic sampling error, and a penalty term proportional to the range of the variance in that cell. The corresponding problem formulation extends the setting known as active learning for multi-armed bandits to the case when each arm is a continuous region, and we show how an adaptation of recent algorithms for that problem and of hierarchical optimistic sampling algorithms for optimization can be used in order to solve this problem. The resulting procedure, called Hierarchical Optimistic Region SElection driven by Curiosity (HORSE.C) is provided together with a finite-time regret analysis. 1

5 0.7685371 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

Author: Vasiliy Karasev, Alessandro Chiuso, Stefano Soatto

Abstract: We describe the tradeoff between the performance in a visual recognition problem and the control authority that the agent can exercise on the sensing process. We focus on the problem of “visual search” of an object in an otherwise known and static scene, propose a measure of control authority, and relate it to the expected risk and its proxy (conditional entropy of the posterior density). We show this analytically, as well as empirically by simulation using the simplest known model that captures the phenomenology of image formation, including scaling and occlusions. We show that a “passive” agent given a training set can provide no guarantees on performance beyond what is afforded by the priors, and that an “omnipotent” agent, capable of infinite control authority, can achieve arbitrarily good performance (asymptotically). In between these limiting cases, the tradeoff can be characterized empirically. 1

6 0.76744252 13 nips-2012-A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function

7 0.764332 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

8 0.76310378 114 nips-2012-Efficient coding provides a direct link between prior and likelihood in perceptual Bayesian inference

9 0.76260942 260 nips-2012-Online Sum-Product Computation Over Trees

10 0.76224583 333 nips-2012-Synchronization can Control Regularization in Neural Systems via Correlated Noise Processes

11 0.760647 113 nips-2012-Efficient and direct estimation of a neural subunit model for sensory coding

12 0.75962782 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

13 0.75954342 125 nips-2012-Factoring nonnegative matrices with linear programs

14 0.75886041 187 nips-2012-Learning curves for multi-task Gaussian process regression

15 0.75857913 236 nips-2012-Near-Optimal MAP Inference for Determinantal Point Processes

16 0.75802082 111 nips-2012-Efficient Sampling for Bipartite Matching Problems

17 0.75777149 96 nips-2012-Density Propagation and Improved Bounds on the Partition Function

18 0.75746453 38 nips-2012-Algorithms for Learning Markov Field Policies

19 0.7574383 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

20 0.75720096 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models