nips nips2008 nips2008-69 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nicol N. Schraudolph, Dmitry Kamenetsky
Abstract: We give polynomial-time algorithms for the exact computation of lowest-energy states, worst margin violators, partition functions, and marginals in certain binary undirected graphical models. Our approach provides an interesting alternative to the well-known graph cut paradigm in that it does not impose any submodularity constraints; instead we require planarity to establish a correspondence with perfect matchings in an expanded dual graph. Maximum-margin parameter estimation for a boundary detection task shows our approach to be efficient and effective. A C++ implementation is available from http://nic.schraudolph.org/isinf/. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Our approach provides an interesting alternative to the well-known graph cut paradigm in that it does not impose any submodularity constraints; instead we require planarity to establish a correspondence with perfect matchings in an expanded dual graph. [sent-7, score-1.039]
2 1 Introduction Undirected graphical models are a popular tool in machine learning; they represent real-valued energy functions of the form E(y) := Σ_{i∈V} Ei(yi) + Σ_{(i,j)∈E} Eij(yi, yj), (1) where the terms in the first sum range over the nodes V = {1, 2, . . . , n}, [sent-12, score-0.292]
3 and those in the second sum over the edges E ⊆ V × V of an undirected graph G(V, E). [sent-15, score-0.352]
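To make the notation concrete, here is a minimal Python sketch (not the authors' C++ code) that evaluates an energy function of form (1) on a toy three-node model; the data layout, with per-node energy pairs and 2x2 edge tables, is an illustrative assumption.

# Evaluate E(y) = sum_i E_i(y_i) + sum_(i,j) E_ij(y_i, y_j) on a toy model.
def energy(y, node_E, edge_E):
    e = sum(Ei[y[i]] for i, Ei in node_E.items())          # node terms
    e += sum(Eij[y[i]][y[j]] for (i, j), Eij in edge_E.items())  # edge terms
    return e

# Node energies E_i as [E_i(0), E_i(1)]; edge energies as 2x2 tables.
node_E = {0: [0.0, 1.5], 1: [0.2, 0.0], 2: [0.0, 0.0]}
edge_E = {(0, 1): [[0.0, 1.0], [1.0, 0.0]],
          (1, 2): [[0.5, 0.0], [0.0, 0.5]]}

print(energy((0, 1, 1), node_E, edge_E))   # energy of state y = (0, 1, 1)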
4 The junction tree decomposition provides an efficient framework for exact statistical inference in graphs that are (or can be turned into) trees of small cliques. [sent-16, score-0.13]
5 1 The Ising Model Efficient exact inference is possible in certain graphical models with binary node labels. [sent-23, score-0.195]
6 Here we focus on Ising models, whose energy functions have the form E : {0, 1}^n → R with E(y) := Σ_{(i,j)∈E} [yi ≠ yj] Eij, (2) where [·] denotes the indicator function, i.e. [P] = 1 if the predicate P holds and 0 otherwise. [sent-24, score-0.186]
7 Compared to the general model (1) for binary nodes, (2) imposes two additional restrictions: zero node energies, and edge energies in the form of disagreement costs. [sent-27, score-0.404]
8 Proof by construction: Two energy functions are equivalent if they differ only by a constant. [sent-32, score-0.136]
9 Given an energy function of the form (1), construct an Ising model with disagreement costs as follows: [sent-34, score-0.358]
10 1. For each node energy function Ei(yi), add a disagreement cost term E0i := Ei(1) − Ei(0); [sent-35, score-0.399]
11 2. For each edge energy function Eij(yi, yj), add the three disagreement cost terms Eij := ½ [(Eij(0,1) + Eij(1,0)) − (Eij(0,0) + Eij(1,1))], E0i := Eij(1,0) − Eij(0,0) − Eij, and E0j := Eij(0,1) − Eij(0,0) − Eij. (3) [sent-36, score-0.44]
12 Summing the above terms, the total bias of node i (i.e. [sent-37, score-0.129]
13 its disagreement cost with the bias node) is E0i = Ei(1) − Ei(0) + Σ_{j:(i,j)∈E} [Eij(1,0) − Eij(0,0) − Eij]. (4) [sent-39, score-0.224]
14 This construction defines an Ising model whose energy in every configuration y is shifted, relative to that of the general model we started with, by the same constant amount, namely E(0): ∀y ∈ {0,1}^n : E([0, y]) = E(y) − Σ_{i∈V} Ei(0) − Σ_{(i,j)∈E} Eij(0,0) = E(y) − E(0). (5) [sent-40, score-0.163]
15 The two models' energy functions are therefore equivalent. [sent-41, score-0.136]
16 2 Energy Minimization via Graph Cuts Definition 2 The cut C of a binary graphical model G(V, E) induced by state y ∈ {0, 1}^n is the set C(y) := {(i, j) ∈ E : yi ≠ yj}; its weight |C(y)| is the sum of the weights of its edges. [sent-44, score-0.354]
17 Any given state y partitions the nodes of a binary graphical model into two sets: those labeled ‘0’, and those labeled ‘1’. [sent-45, score-0.186]
18 The corresponding graph cut is the set of edges crossing the partition; since only they contribute disagreement costs to the Ising model (2), we have ∀y : |C(y)| = E(y). [sent-46, score-0.683]
19 Minimum-weight cuts can be computed in polynomial time in graphs whose edge weights are all non-negative. [sent-48, score-0.248]
20 Introducing one more node, with the constraint yn+1 := 1, allows us to construct an equivalent energy function by replacing each negatively weighted bias edge E0i < 0 by an edge to the new node n + 1 with the positive weight Ei,n+1 := −E0i > 0 (Figure 1c). [sent-49, score-0.415]
21 This submodularity constraint implies that agreement between nodes must be locally preferable to disagreement — a severe limitation. [sent-51, score-0.345]
22 (In directed graphs, the weight of a cut is the sum of the weights of only those edges crossing the cut in a given direction.) [sent-54, score-0.383]
23 2 Planarity Unlike graph cut methods, the inference algorithms we describe below do not depend on submodularity; instead they require that the model graph be planar, and that a planar embedding be provided. [sent-56, score-1.2]
24 For each vertex i ∈ V, let Ei denote the set of edges in E incident upon i, considered as being oriented away from i, and let πi be a cyclic permutation of Ei . [sent-59, score-0.143]
25 Rotation systems [2] directly correspond to topological graph embeddings in orientable surfaces: Theorem 4 (White and Beineke [2], p. [sent-61, score-0.299]
26 Note that while in graph visualisation “embedding” is often used as a synonym for “drawing”, in modern topological graph theory it stands for “rotation system”. [sent-63, score-0.485]
27 We adopt the latter usage, which views embeddings as equivalence classes of graph drawings characterized by identical cyclic ordering of the edges incident upon each vertex. [sent-64, score-0.359]
28 The genus g of the embedding surface S can be determined from the Euler characteristic |V| − |E| + |F| = 2 − 2g, (6) where |F| is found by counting the orbits of the rotation system, as described in Theorem 4. [sent-67, score-0.34]
29 Since planar graphs are exactly those that can be embedded on a surface of genus g = 0 (a topological sphere), we arrive at a purely combinatorial definition of planarity: Definition 5 A graph G(V, E) is planar iff it has a rotation system Π producing exactly 2 + |E| − |V| orbits. [sent-68, score-1.494]
30 Such a system is called a planar embedding of G, and G(V, E, Π) is called a plane graph. [sent-69, score-0.684]
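Definition 5 suggests a purely combinatorial planarity check; the sketch below (an illustration, with an assumed clockwise convention for the rotations and for face tracing) counts the orbits of a rotation system and tests the Euler characteristic (6) for genus 0, using K4 as the example.

# Count faces of a rotation system pi (node -> neighbors in clockwise order)
# by tracing orbits of directed edges, then check |V| - |E| + |F| = 2.
def count_faces(pi):
    darts = {(u, v) for u in pi for v in pi[u]}   # directed edges
    faces = 0
    while darts:
        start = u, v = next(iter(darts))
        while True:                               # trace one face orbit
            darts.discard((u, v))
            nbrs = pi[v]
            u, v = v, nbrs[(nbrs.index(u) + 1) % len(nbrs)]
            if (u, v) == start:
                break
        faces += 1
    return faces

def is_planar_embedding(pi):
    V = len(pi)
    E = sum(len(n) for n in pi.values()) // 2
    return count_faces(pi) == 2 + E - V           # Definition 5, genus 0

# K4 drawn with one vertex inside the outer triangle: 4 - 6 + F = 2 needs F = 4.
pi = {0: [1, 2, 3], 1: [0, 3, 2], 2: [1, 3, 0], 3: [1, 0, 2]}
print(count_faces(pi), is_planar_embedding(pi))   # -> 4 True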
31 Our inference algorithms require a plane graph as input. [sent-70, score-0.344]
32 In some applications (e.g. when working with geographic information) a plane drawing of the graph (from which the corresponding embedding is readily determined) may be available. [sent-73, score-0.485]
33 Where it is not, we employ the algorithm of Boyer and Myrvold [3] which, given any connected graph G as input, produces in linear time either a planar embedding for G or a proof that G is non-planar. [sent-74, score-0.809]
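In practice an embedding can also be obtained from an off-the-shelf routine. A small sketch using networkx, whose planarity test implements the left-right algorithm rather than Boyer and Myrvold's, but which likewise returns either an embedding or a certificate of non-planarity:

import networkx as nx

G = nx.grid_2d_graph(3, 3)                 # a 3x3 grid graph, certainly planar
is_planar, emb = nx.check_planarity(G)
assert is_planar
rotation = emb.get_data()    # vertex -> neighbors in clockwise order
print(rotation[(1, 1)])      # rotation at the grid's center vertex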
34 In Section 1.1 we have mapped a general binary graphical model to an Ising model with an additional bias node; now we require that this Ising model be planar. [sent-78, score-0.119]
35 If all nodes of the graph are to be connected to the bias node without violating planarity, the graph has to be outerplanar, i.e. [sent-80, score-0.624]
36 have a planar embedding in which all its nodes lie on the external face — a very severe restriction. [sent-82, score-0.725]
37 Figure 3: Possible cuts (bold blue dashes) of a square face of the model graph (dashed) and complementary perfect matchings (bold red lines) of its expanded dual (solid lines). [sent-83, score-0.881]
38 The situation improves, however, if only a subset B ⊂ V of nodes have non-zero bias (4): then the graph only has to be B-outerplanar, i.e. [sent-84, score-0.324]
39 have a planar embedding in which all nodes in B lie on the same face. [sent-86, score-0.656]
40 In image processing, for instance, where it is common to operate on a square grid of pixels, we can permit bias for all nodes on the perimeter of the grid. [sent-87, score-0.142]
41 They then take time exponential in the genus of the embedding, though still polynomial in the size of the graph; for graphs of low genus this may well be preferable to current approximate methods. [sent-96, score-0.428]
42 3 Computing Optimal States via Maximum-Weight Perfect Matching The relationship between the states of a planar Ising model and perfect matchings (“dimer coverings” to physicists) was first discovered by Kasteleyn [6] and Fisher [7]. [sent-97, score-0.727]
43 1 The Expanded Dual Graph Definition 6 The dual G∗(F, E) of an embedded graph G(V, E, Π) has a vertex for each face of G, with edges connecting vertices corresponding to faces that are adjacent (i.e. share an edge). [sent-100, score-0.513]
44 Each edge of the dual crosses exactly one edge of the original graph; due to this one-to-one relationship we will consider the dual to have the same set of edges E (with the same energies) as the original. [sent-103, score-0.417]
45 We now expand the dual graph by replacing each node with a q-clique, where q is the degree of the node, as shown in Figure 3 for q = 4. [sent-104, score-0.38]
46 The additional edges internal to each q-clique are given zero energy so as to leave the model unaffected. [sent-105, score-0.243]
47 For large q the introduction of these O(q 2 ) internal edges slows down subsequent computations (solid line in Figure 4, left); this can be avoided by subdividing the offending q-gonal face with chords (which are also given zero energy) before constructing the dual. [sent-106, score-0.148]
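As a concrete illustration of the construction above, the hedged sketch below builds an expanded dual from a list of faces, each given as its boundary edges: a face of degree q becomes a q-clique with zero-energy interior edges, and the two clique copies of each original edge are joined by a dual edge carrying that edge's energy. The face-list input format and helper names are assumptions for illustration.

# Build the expanded dual: one clique node per (face, boundary-edge) incidence.
from itertools import combinations

def expanded_dual(faces, edge_energy):
    nodes = [(f, e) for f, face in enumerate(faces) for e in face]
    w = {}
    for f, face in enumerate(faces):
        for a, b in combinations(face, 2):            # clique interior: zero energy
            w[((f, a), (f, b))] = 0.0
    for e, E in edge_energy.items():                  # each edge borders two faces
        fa, fb = [f for f, face in enumerate(faces) if e in face]
        w[((fa, e), (fb, e))] = E                     # "external" dual edge
    return nodes, w

# Triangle graph: inner and outer face are both bounded by all three edges.
faces = [[(0, 1), (1, 2), (0, 2)], [(0, 1), (1, 2), (0, 2)]]
edge_energy = {(0, 1): 1.0, (1, 2): -0.5, (0, 2): 2.0}
nodes, w = expanded_dual(faces, edge_energy)
print(len(nodes), len(w))   # -> 6 clique nodes, 9 dual edges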
48 2 Complementary Perfect Matchings Definition 7 A perfect matching of a graph G(V, E) is a subset M ⊆ E of edges wherein exactly one edge is incident upon each vertex in V; its weight |M| is the sum of the weights of its edges. [sent-111, score-0.639]
49 Theorem 8 For every cut C of an embedded graph G(V, E, Π) there exists at least one (if G is triangulated: exactly one) perfect matching M of its expanded dual complementary to C, i.e. with M ∩ C = ∅. [sent-112, score-0.867]
50 Proof sketch Consider the complement E\C of the cut as a partial matching of the expanded dual. [sent-115, score-0.382]
51 In each clique of the expanded dual, C’s complement thus leaves an even number of nodes unmatched; M can therefore be completed using only edges interior to the cliques. [sent-117, score-0.424]
52 In other words, there exists a surjection from perfect matchings in the expanded dual of G to cuts in G. [sent-119, score-0.554]
53 Furthermore, since we have given edges interior to the cliques of the expanded dual zero energy, every perfect matching M complementary to a cut C of our Ising model (2) obeys the relation |M| + |C| = Σ_{(i,j)∈E} Eij = const. (7) [sent-120, score-0.759]
54 This means that instead of a minimum-weight cut in a graph we can look for a maximum-weight perfect matching in its expanded dual. [sent-121, score-0.688]
55 But will that matching always be complementary to a cut? [sent-122, score-0.125]
56 Theorem 9 Every perfect matching M of the expanded dual of a plane graph G(V, E, Π) is complementary to a cut C of G, i.e. its complement E\M in G is a cut. [sent-123, score-0.929]
57 Proof sketch In each clique of the expanded dual, an even number of nodes is matched by edges interior to the clique. [sent-126, score-0.364]
58 The complement E\M of the matching in G thus contains an even number of edges around the perimeter of each face of the embedding. [sent-127, score-0.297]
59 By induction over faces, this holds for every contractible (on the embedding surface) cycle of G. [sent-128, score-0.165]
60 Because a plane is simply connected, all cycles in a plane graph are contractible; thus E\M is a cut. [sent-129, score-0.431]
61 For planar graphs, however, the above theorems allow us to leverage known polynomial-time algorithms for perfect matchings into inference methods for Ising models. [sent-131, score-0.764]
62 3 The Lowest-Energy (MAP or Ground) State The blossom-shrinking algorithm [9, 10] is a sophisticated method to efficiently compute the maximum-weight perfect matching of a graph. [sent-133, score-0.205]
63 We can now efficiently compute the lowest-energy state of a planar Ising model as follows: Find a planar embedding of the model graph (Section 2. [sent-136, score-1.32]
64 1), and run the blossom-shrinking algorithm on the expanded dual to compute its maximum-weight perfect matching. [sent-138, score-0.15]
65 Its complement in the original model is the minimum-weight graph cut (Section 3.2). [sent-139, score-0.414]
66 We can identify the state which induces this cut via a depth-first graph traversal that labels nodes as it encounters them, starting by labeling the bias node y0 := 0; this is shown below as Algorithm 1. [sent-141, score-0.595]
67 Algorithm 1. Input: Ising model graph G(V, E) and graph cut C(y) ⊆ E; output: state vector y. [sent-145, score-0.57]
68 ∀ i ∈ {0, 1, 2, . . . , n} : yi := unknown; dfs state(0, 0); procedure dfs state(i ∈ {0, 1, 2, . . . [sent-148, score-0.212]
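A runnable Python version of Algorithm 1 might look as follows; the iterative stack (rather than recursion) is an implementation choice to avoid recursion limits, and the toy graph is illustrative. Crossing a cut edge flips the label; node 0 is the bias node, labeled y0 := 0.

def dfs_state(n, edges, cut):
    adj = {i: [] for i in range(n + 1)}
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    cut = {frozenset(e) for e in cut}
    y = [None] * (n + 1)
    stack = [(0, 0)]                    # dfs_state(0, 0): bias node gets label 0
    while stack:
        i, label = stack.pop()
        if y[i] is not None:
            continue                    # already labeled on another branch
        y[i] = label
        for j in adj[i]:
            if y[j] is None:
                flip = frozenset((i, j)) in cut
                stack.append((j, label ^ 1 if flip else label))
    return y

edges = [(0, 1), (0, 2), (1, 2), (2, 3)]           # toy graph with bias node 0
print(dfs_state(3, edges, cut=[(1, 2), (2, 0)]))   # -> [0, 0, 1, 1]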
69 If d(·|·) is the weighted Hamming distance d(y|y∗) := Σ_{(i,j)∈E} [[yi ≠ yj] ≠ [y∗i ≠ y∗j]] vij, (9) [sent-152, score-0.208]
70 3 to efficiently find the worst margin violator, argminy M (y|y ∗ ), for maximum-margin parameter estimation. [sent-168, score-0.118]
71 For planar graphs, however, the generating function for perfect matchings can be calculated in polynomial time via the determinant of a skew-symmetric matrix [6, 7]. [sent-171, score-0.727]
72 Due to the close relationship with graph cuts (Section 3. [sent-172, score-0.296]
73 Elaborating on work of Globerson and Jaakkola [8], we first convert the Ising model graph into a Boolean “half-Kasteleyn” matrix H: [sent-174, score-0.216]
74 1. plane triangulate the embedded graph so as to make the relationship between cuts and complementary perfect matchings a bijection (cf. Theorem 8); [sent-175, score-0.751]
75 2. orient the edges of the graph such that the in-degree of every node is odd; 3. [sent-178, score-0.407]
76 prefactor the triangulation edges (added in Step 1) out of H. [sent-180, score-0.134]
77 For a given set of disagreement edge costs Ek, k ∈ {1, 2, . . . [sent-184, score-0.297]
78 The partition function for perfect matchings is |K|^(1/2) [6–8], so we factor K and use (7) to compute the log partition function for (11) as ln Z = ½ ln|K| − Σ_{k∈E} Ek. [sent-194, score-0.415]
79 Its derivative yields the marginal probability of disagreement on the k-th edge, and is computed via the inverse of K: P(k ∈ C) := −∂ln Z/∂Ek = 1 − (1/(2|K|)) ∂|K|/∂Ek = 1 − ½ tr(K⁻¹ ∂K/∂Ek) = 1 + (K⁻¹)_{2k−1,2k} K_{2k−1,2k}. (12) [sent-195, score-0.213]
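Numerically, once a Kasteleyn matrix K has been assembled (the involved part, which is not shown here), both quantities reduce to standard linear algebra. A hedged numpy sketch, with an arbitrary skew-symmetric toy matrix standing in for a real model's K and an assumed weight convention:

# ln Z and per-edge disagreement marginals from a skew-symmetric matrix K.
import numpy as np

def log_Z_and_marginals(K, edge_costs):
    sign, logdet = np.linalg.slogdet(K)
    assert sign > 0          # det of a Kasteleyn matrix is a squared Pfaffian
    log_Z = 0.5 * logdet - sum(edge_costs)    # ln Z = 1/2 ln|K| - sum_k E_k
    Kinv = np.linalg.inv(K)
    # eq. (12) with 1-based indices (2k-1, 2k) mapped to 0-based (2k, 2k+1)
    P = np.array([1.0 + Kinv[2*k, 2*k + 1] * K[2*k, 2*k + 1]
                  for k in range(len(edge_costs))])
    return log_Z, P

A = np.array([[ 0.,  2., -1.,  0.],
              [-2.,  0.,  0.,  1.],
              [ 1.,  0.,  0.,  3.],
              [ 0., -1., -3.,  0.]])     # toy skew-symmetric example only
print(log_Z_and_marginals(A, edge_costs=[0.5, 1.0]))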
80 (12) Figure 5: Boundary detection by maximum-margin training of planar Ising grids; from left to right: Ising model (100 × 100 grid), original image, noisy mask, and MAP segmentation of the Ising grid. [sent-197, score-0.499]
81 In a linear planar Ising CRF, the disagreement costs Eij in (2) are computed as inner products between features (sufficient statistics) x of the modeled data and corresponding parameters θ of the model, and (11) is used to model the conditional distribution P(y|x, θ). [sent-200, score-0.684]
82 For maximum margin (MM) parameter estimation [14] we instead minimize LMM(θ) := ½ λ‖θ‖² + E(y∗|x, θ) − min_y M(y|y∗, x, θ) = ½ λ‖θ‖² + E(y∗|x, θ) − E(ŷ|x, θ) + d(ŷ|y∗), (14) where ŷ := argmin_y M(y|y∗, x, θ) is the worst margin violator, i.e. [sent-204, score-0.175]
83 the state that minimizes the margin energy (8). [sent-206, score-0.242]
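On a model small enough to enumerate, the objective (14) can be evaluated by brute force, which makes the role of the worst margin violator explicit; the paper instead finds ŷ with the efficient planar machinery of Section 3. The toy energies and unit-weight distance below are illustrative stand-ins.

# Brute-force evaluation of L_MM (14) on a tiny Ising model.
from itertools import product
import numpy as np

edges = [(0, 1), (1, 2)]

def energy(y, theta):                    # Ising disagreement costs = theta[k]
    return sum(theta[k] for k, (i, j) in enumerate(edges) if y[i] != y[j])

def d(y, y_star):                        # unit-weight Hamming distance, eq. (9)
    return sum((y[i] != y[j]) != (y_star[i] != y_star[j]) for i, j in edges)

def L_MM(y_star, theta, lam=1.0):
    states = list(product((0, 1), repeat=len(y_star)))
    y_hat = min(states, key=lambda y: energy(y, theta) - d(y, y_star))
    return (0.5 * lam * float(theta @ theta)
            + energy(y_star, theta) - energy(y_hat, theta) + d(y_hat, y_star))

print(L_MM(y_star=(0, 0, 1), theta=np.array([1.0, -0.5])))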
84 To demonstrate the scalability of planar Ising models, we designed a simple boundary detection task based on images from the GrabCut Ground Truth image segmentation database [16]. [sent-208, score-0.547]
85 We then employed a planar Ising model to recover the original boundary — namely, a 100 × 100 square grid with one additional edge pegged to a high energy, encoding prior knowledge that two opposing corners of the grid depict different regions (Figure 5, left). [sent-213, score-0.585]
86 The energy of the other edges was Eij := ⟨[1, |xi − xj|], θ⟩, where xi is the pixel intensity at node i. [sent-214, score-0.327]
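A small sketch of this feature map; the image array, grid traversal, and θ values are illustrative assumptions. Each grid edge receives the inner product of a two-dimensional feature vector with the parameters θ.

# Edge energies E_ij = <[1, |x_i - x_j|], theta> on a pixel grid.
import numpy as np

def grid_edge_energies(x, theta):
    """x: 2-D array of pixel intensities; returns {((r,c),(r2,c2)): E_ij}."""
    E = {}
    rows, cols = x.shape
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0)):        # right and down neighbors
                r2, c2 = r + dr, c + dc
                if r2 < rows and c2 < cols:
                    feat = np.array([1.0, abs(x[r, c] - x[r2, c2])])
                    E[((r, c), (r2, c2))] = float(feat @ theta)
    return E

x = np.array([[0.1, 0.9], [0.2, 0.8]])
print(grid_edge_energies(x, theta=np.array([0.5, -2.0])))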
87 We did not employ a bias node for this task, and simply set λ = 1. [sent-215, score-0.129]
88 For S/N ratios of 1:9 and lower the system was unable to locate the boundary; for S/N ratios of 1:7 and higher we obtained perfect reconstruction. [sent-227, score-0.15]
89 marginals for prediction, to our knowledge for the first time on graphs intractable for the junction tree approach, such as the grids often used in image processing. [sent-232, score-0.17]
90 6 Discussion We have proposed an alternative algorithmic framework for efficient exact inference in binary graphical models, which replaces the submodularity constraint of graph cut methods with a planarity constraint. [sent-234, score-0.676]
91 Besides proving efficient and effective in first experiments, our approach opens up a number of interesting research directions to be explored: Our algorithms can all be extended to nonplanar graphs, at a cost exponential in the genus of the embedding. [sent-235, score-0.153]
92 We are currently developing these extensions, which may prove of great practical value for graphs that are “almost” planar; examples include road networks (where edge crossings arise from overpasses without on-ramps) and graphs describing the tertiary structure of proteins [17]. [sent-236, score-0.261]
93 Our method for calculating the ground state (Section 3) actually works for nonplanar graphs whose ground state does not contain frustrated non-contractible cycles. [sent-238, score-0.401]
94 The QPBO graph cut method [18] finds ground states that do not contain any frustrated cycles, and otherwise yields a partial labeling. [sent-239, score-0.456]
95 The existence of two distinct tractable frameworks for inference in binary graphical models implies a yet more powerful hybrid: Consider a graph each of whose biconnected components is either planar or submodular. [sent-241, score-0.789]
96 As a whole, this graph may be neither planar nor submodular, yet efficient exact inference in it is clearly possible by applying the appropriate framework to each component. [sent-242, score-0.715]
97 What energy functions can be minimized via graph cuts? [sent-247, score-0.352]
98 On the cutting edge: Simplified O(n) planarity by edge addition. [sent-267, score-0.211]
99 Graph embedding with minimum depth and maximum external face. [sent-283, score-0.131]
100 Minimizing nonsubmodular functions with graph cuts – a review. [sent-387, score-0.296]
wordName wordTfidf (topN-words)
[('planar', 0.462), ('eij', 0.423), ('ising', 0.357), ('graph', 0.216), ('disagreement', 0.179), ('perfect', 0.15), ('cut', 0.138), ('planarity', 0.136), ('energy', 0.136), ('embedding', 0.131), ('expanded', 0.129), ('ei', 0.12), ('matchings', 0.115), ('edges', 0.107), ('genus', 0.102), ('graphs', 0.093), ('plane', 0.091), ('node', 0.084), ('dual', 0.08), ('cuts', 0.08), ('edge', 0.075), ('submodularity', 0.075), ('complementary', 0.07), ('ek', 0.066), ('nodes', 0.063), ('kolmogorov', 0.06), ('dfs', 0.06), ('violator', 0.06), ('complement', 0.06), ('margin', 0.057), ('ground', 0.057), ('matching', 0.055), ('topological', 0.053), ('kasteleyn', 0.051), ('lmm', 0.051), ('nonplanar', 0.051), ('schraudolph', 0.051), ('yj', 0.05), ('state', 0.049), ('boundary', 0.048), ('drawing', 0.047), ('bias', 0.045), ('frustrated', 0.045), ('triangulated', 0.045), ('vij', 0.045), ('graphical', 0.043), ('yi', 0.043), ('costs', 0.043), ('rotation', 0.042), ('marginals', 0.042), ('face', 0.041), ('partition', 0.041), ('faces', 0.04), ('eds', 0.038), ('boolean', 0.038), ('segmentation', 0.037), ('inference', 0.037), ('incident', 0.036), ('surface', 0.035), ('globerson', 0.035), ('clique', 0.035), ('energies', 0.035), ('grids', 0.035), ('worst', 0.034), ('ln', 0.034), ('beineke', 0.034), ('blossom', 0.034), ('boyer', 0.034), ('contractible', 0.034), ('dimer', 0.034), ('dimers', 0.034), ('grabcut', 0.034), ('macbook', 0.034), ('perimeter', 0.034), ('prefactored', 0.034), ('cycles', 0.033), ('bundle', 0.033), ('ik', 0.032), ('binary', 0.031), ('australia', 0.031), ('mm', 0.031), ('ij', 0.03), ('orbits', 0.03), ('orientable', 0.03), ('interior', 0.03), ('embedded', 0.029), ('code', 0.029), ('undirected', 0.029), ('cpu', 0.029), ('jk', 0.029), ('severe', 0.028), ('crf', 0.028), ('ml', 0.028), ('submodular', 0.027), ('argminy', 0.027), ('duo', 0.027), ('triangulation', 0.027), ('zabih', 0.027), ('construction', 0.027), ('apple', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 69 nips-2008-Efficient Exact Inference in Planar Ising Models
Author: Nicol N. Schraudolph, Dmitry Kamenetsky
Abstract: We give polynomial-time algorithms for the exact computation of lowest-energy states, worst margin violators, partition functions, and marginals in certain binary undirected graphical models. Our approach provides an interesting alternative to the well-known graph cut paradigm in that it does not impose any submodularity constraints; instead we require planarity to establish a correspondence with perfect matchings in an expanded dual graph. Maximum-margin parameter estimation for a boundary detection task shows our approach to be efficient and effective. A C++ implementation is available from http://nic.schraudolph.org/isinf/. 1
2 0.12589261 107 nips-2008-Influence of graph construction on graph-based clustering measures
Author: Markus Maier, Ulrike V. Luxburg, Matthias Hein
Abstract: Graph clustering methods such as spectral clustering are defined for general weighted graphs. In machine learning, however, data often is not given in form of a graph, but in terms of similarity (or distance) values between points. In this case, first a neighborhood graph is constructed using the similarities between the points and then a graph clustering algorithm is applied to this graph. In this paper we investigate the influence of the construction of the similarity graph on the clustering results. We first study the convergence of graph clustering criteria such as the normalized cut (Ncut) as the sample size tends to infinity. We find that the limit expressions are different for different types of graph, for example the r-neighborhood graph or the k-nearest neighbor graph. In plain words: Ncut on a kNN graph does something systematically different than Ncut on an r-neighborhood graph! This finding shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to. We also provide examples which show that these differences can be observed for toy and real data already for rather small sample sizes. 1
3 0.12014078 2 nips-2008-A Convex Upper Bound on the Log-Partition Function for Binary Distributions
Author: Laurent E. Ghaoui, Assane Gueye
Abstract: We consider the problem of bounding from above the log-partition function corresponding to second-order Ising models for binary distributions. We introduce a new bound, the cardinality bound, which can be computed via convex optimization. The corresponding error on the logpartition function is bounded above by twice the distance, in model parameter space, to a class of “standard” Ising models, for which variable inter-dependence is described via a simple mean field term. In the context of maximum-likelihood, using the new bound instead of the exact log-partition function, while constraining the distance to the class of standard Ising models, leads not only to a good approximation to the log-partition function, but also to a model that is parsimonious, and easily interpretable. We compare our bound with the log-determinant bound introduced by Wainwright and Jordan (2006), and show that when the l1 -norm of the model parameter vector is small enough, the latter is outperformed by the new bound. 1 1.1
Author: Garvesh Raskutti, Bin Yu, Martin J. Wainwright, Pradeep K. Ravikumar
Abstract: We consider the problem of estimating the graph structure associated with a Gaussian Markov random field (GMRF) from i.i.d. samples. We study the performance of the ℓ1 -regularized maximum likelihood estimator in the high-dimensional setting, where the number of nodes in the graph p, the number of edges in the graph s and the maximum node degree d, are allowed to grow as a function of the number of samples n. Our main result provides sufficient conditions on (n, p, d) for the ℓ1 -regularized MLE estimator to recover all the edges of the graph with high probability. Under some conditions on the model covariance, we show that model selection can be achieved for sample sizes n = Ω(d2 log(p)), with the error decaying as O(exp(−c log(p))) for some constant c. We illustrate our theoretical results via simulations and show good correspondences between the theoretical predictions and behavior in simulations.
5 0.1125536 171 nips-2008-Online Prediction on Large Diameter Graphs
Author: Mark Herbster, Guy Lever, Massimiliano Pontil
Abstract: We continue our study of online prediction of the labelling of a graph. We show a fundamental limitation of Laplacian-based algorithms: if the graph has a large diameter then the number of mistakes made by such algorithms may be proportional to the square root of the number of vertices, even when tackling simple problems. We overcome this drawback by means of an efficient algorithm which achieves a logarithmic mistake bound. It is based on the notion of a spine, a path graph which provides a linear embedding of the original graph. In practice, graphs may exhibit cluster structure; thus in the last part, we present a modified algorithm which achieves the “best of both worlds”: it performs well locally in the presence of cluster structure, and globally on large diameter graphs. 1
6 0.10981161 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
7 0.09869545 84 nips-2008-Fast Prediction on a Tree
8 0.095874041 215 nips-2008-Sparse Signal Recovery Using Markov Random Fields
9 0.093160816 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
10 0.091192447 115 nips-2008-Learning Bounded Treewidth Bayesian Networks
11 0.085462615 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation
12 0.084995724 194 nips-2008-Regularized Learning with Networks of Features
13 0.075757354 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
14 0.074518345 40 nips-2008-Bounds on marginal probability distributions
15 0.073392421 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations
16 0.072074324 104 nips-2008-Improved Moves for Truncated Convex Models
17 0.05711307 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
18 0.055021107 151 nips-2008-Non-parametric Regression Between Manifolds
19 0.054457329 201 nips-2008-Robust Near-Isometric Matching via Structured Learning of Graphical Models
20 0.050551768 224 nips-2008-Structured ranking learning using cumulative distribution networks
topicId topicWeight
[(0, -0.174), (1, -0.047), (2, -0.029), (3, 0.036), (4, 0.063), (5, -0.123), (6, 0.069), (7, -0.059), (8, 0.05), (9, -0.234), (10, -0.11), (11, 0.021), (12, -0.012), (13, -0.054), (14, -0.038), (15, -0.043), (16, 0.047), (17, 0.018), (18, -0.053), (19, -0.063), (20, 0.034), (21, -0.099), (22, 0.023), (23, -0.083), (24, 0.0), (25, -0.014), (26, -0.038), (27, 0.017), (28, 0.036), (29, -0.06), (30, -0.072), (31, -0.069), (32, -0.012), (33, -0.025), (34, 0.023), (35, 0.084), (36, -0.002), (37, 0.045), (38, -0.018), (39, -0.06), (40, 0.058), (41, -0.141), (42, 0.05), (43, -0.027), (44, 0.064), (45, -0.026), (46, -0.157), (47, 0.188), (48, -0.064), (49, 0.049)]
simIndex simValue paperId paperTitle
same-paper 1 0.9527629 69 nips-2008-Efficient Exact Inference in Planar Ising Models
Author: Nicol N. Schraudolph, Dmitry Kamenetsky
Abstract: We give polynomial-time algorithms for the exact computation of lowest-energy states, worst margin violators, partition functions, and marginals in certain binary undirected graphical models. Our approach provides an interesting alternative to the well-known graph cut paradigm in that it does not impose any submodularity constraints; instead we require planarity to establish a correspondence with perfect matchings in an expanded dual graph. Maximum-margin parameter estimation for a boundary detection task shows our approach to be efficient and effective. A C++ implementation is available from http://nic.schraudolph.org/isinf/. 1
2 0.7057001 115 nips-2008-Learning Bounded Treewidth Bayesian Networks
Author: Gal Elidan, Stephen Gould
Abstract: With the increased availability of data for complex domains, it is desirable to learn Bayesian network structures that are sufficiently expressive for generalization while also allowing for tractable inference. While the method of thin junction trees can, in principle, be used for this purpose, its fully greedy nature makes it prone to overfitting, particularly when data is scarce. In this work we present a novel method for learning Bayesian networks of bounded treewidth that employs global structure modifications and that is polynomial in the size of the graph and the treewidth bound. At the heart of our method is a triangulated graph that we dynamically update in a way that facilitates the addition of chain structures that increase the bound on the model’s treewidth by at most one. We demonstrate the effectiveness of our “treewidth-friendly” method on several real-life datasets. Importantly, we also show that by using global operators, we are able to achieve better generalization even when learning Bayesian networks of unbounded treewidth. 1
3 0.68954253 107 nips-2008-Influence of graph construction on graph-based clustering measures
Author: Markus Maier, Ulrike V. Luxburg, Matthias Hein
Abstract: Graph clustering methods such as spectral clustering are defined for general weighted graphs. In machine learning, however, data often is not given in form of a graph, but in terms of similarity (or distance) values between points. In this case, first a neighborhood graph is constructed using the similarities between the points and then a graph clustering algorithm is applied to this graph. In this paper we investigate the influence of the construction of the similarity graph on the clustering results. We first study the convergence of graph clustering criteria such as the normalized cut (Ncut) as the sample size tends to infinity. We find that the limit expressions are different for different types of graph, for example the r-neighborhood graph or the k-nearest neighbor graph. In plain words: Ncut on a kNN graph does something systematically different than Ncut on an r-neighborhood graph! This finding shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to. We also provide examples which show that these differences can be observed for toy and real data already for rather small sample sizes. 1
4 0.64768386 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
Author: Benjamin Yackley, Eduardo Corona, Terran Lane
Abstract: Many interesting problems, including Bayesian network structure-search, can be cast in terms of finding the optimum value of a function over the space of graphs. However, this function is often expensive to compute exactly. We here present a method derived from the study of Reproducing Kernel Hilbert Spaces which takes advantage of the regular structure of the space of all graphs on a fixed number of nodes to obtain approximations to the desired function quickly and with reasonable accuracy. We then test this method on both a small testing set and a real-world Bayesian network; the results suggest that not only is this method reasonably accurate, but that the BDe score itself varies quadratically over the space of all graphs. 1
5 0.6070078 84 nips-2008-Fast Prediction on a Tree
Author: Mark Herbster, Massimiliano Pontil, Sergio R. Galeano
Abstract: Given an n-vertex weighted tree with structural diameter S and a subset of m vertices, we present a technique to compute a corresponding m × m Gram matrix of the pseudoinverse of the graph Laplacian in O(n + m2 + mS) time. We discuss the application of this technique to fast label prediction on a generic graph. We approximate the graph with a spanning tree and then we predict with the kernel perceptron. We address the approximation of the graph with either a minimum spanning tree or a shortest path tree. The fast computation of the pseudoinverse enables us to address prediction problems on large graphs. We present experiments on two web-spam classification tasks, one of which includes a graph with 400,000 vertices and more than 10,000,000 edges. The results indicate that the accuracy of our technique is competitive with previous methods using the full graph information. 1
6 0.60114843 171 nips-2008-Online Prediction on Large Diameter Graphs
7 0.5569216 104 nips-2008-Improved Moves for Truncated Convex Models
8 0.52295464 212 nips-2008-Skill Characterization Based on Betweenness
9 0.51825017 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation
10 0.49499959 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
11 0.47786063 2 nips-2008-A Convex Upper Bound on the Log-Partition Function for Binary Distributions
12 0.47086057 225 nips-2008-Supervised Bipartite Graph Inference
13 0.43299118 50 nips-2008-Continuously-adaptive discretization for message-passing algorithms
14 0.42911151 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE
15 0.42220175 40 nips-2008-Bounds on marginal probability distributions
16 0.40051451 215 nips-2008-Sparse Signal Recovery Using Markov Random Fields
17 0.38998777 236 nips-2008-The Mondrian Process
18 0.36817539 105 nips-2008-Improving on Expectation Propagation
19 0.36364597 134 nips-2008-Mixed Membership Stochastic Blockmodels
20 0.35054299 194 nips-2008-Regularized Learning with Networks of Features
topicId topicWeight
[(6, 0.049), (7, 0.083), (9, 0.279), (12, 0.04), (28, 0.171), (57, 0.092), (59, 0.024), (63, 0.031), (71, 0.015), (77, 0.057), (83, 0.052)]
simIndex simValue paperId paperTitle
same-paper 1 0.80441242 69 nips-2008-Efficient Exact Inference in Planar Ising Models
Author: Nicol N. Schraudolph, Dmitry Kamenetsky
Abstract: We give polynomial-time algorithms for the exact computation of lowest-energy states, worst margin violators, partition functions, and marginals in certain binary undirected graphical models. Our approach provides an interesting alternative to the well-known graph cut paradigm in that it does not impose any submodularity constraints; instead we require planarity to establish a correspondence with perfect matchings in an expanded dual graph. Maximum-margin parameter estimation for a boundary detection task shows our approach to be efficient and effective. A C++ implementation is available from http://nic.schraudolph.org/isinf/. 1
2 0.79446518 240 nips-2008-Tracking Changing Stimuli in Continuous Attractor Neural Networks
Author: K. Wong, Si Wu, Chi Fung
Abstract: Continuous attractor neural networks (CANNs) are emerging as promising models for describing the encoding of continuous stimuli in neural systems. Due to the translational invariance of their neuronal interactions, CANNs can hold a continuous family of neutrally stable states. In this study, we systematically explore how neutral stability of a CANN facilitates its tracking performance, a capacity believed to have wide applications in brain functions. We develop a perturbative approach that utilizes the dominant movement of the network stationary states in the state space. We quantify the distortions of the bump shape during tracking, and study their effects on the tracking performance. Results are obtained on the maximum speed for a moving stimulus to be trackable, and the reaction time to catch up an abrupt change in stimulus. 1
3 0.71116799 162 nips-2008-On the Design of Loss Functions for Classification: theory, robustness to outliers, and SavageBoost
Author: Hamed Masnadi-shirazi, Nuno Vasconcelos
Abstract: The machine learning problem of classifier design is studied from the perspective of probability elicitation, in statistics. This shows that the standard approach of proceeding from the specification of a loss, to the minimization of conditional risk is overly restrictive. It is shown that a better alternative is to start from the specification of a functional form for the minimum conditional risk, and derive the loss function. This has various consequences of practical interest, such as showing that 1) the widely adopted practice of relying on convex loss functions is unnecessary, and 2) many new losses can be derived for classification problems. These points are illustrated by the derivation of a new loss which is not convex, but does not compromise the computational tractability of classifier design, and is robust to the contamination of data with outliers. A new boosting algorithm, SavageBoost, is derived for the minimization of this loss. Experimental results show that it is indeed less sensitive to outliers than conventional methods, such as Ada, Real, or LogitBoost, and converges in fewer iterations. 1
4 0.62899309 200 nips-2008-Robust Kernel Principal Component Analysis
Author: Minh H. Nguyen, Fernando Torre
Abstract: Kernel Principal Component Analysis (KPCA) is a popular generalization of linear PCA that allows non-linear feature extraction. In KPCA, data in the input space is mapped to higher (usually) dimensional feature space where the data can be linearly modeled. The feature space is typically induced implicitly by a kernel function, and linear PCA in the feature space is performed via the kernel trick. However, due to the implicitness of the feature space, some extensions of PCA such as robust PCA cannot be directly generalized to KPCA. This paper presents a technique to overcome this problem, and extends it to a unified framework for treating noise, missing data, and outliers in KPCA. Our method is based on a novel cost function to perform inference in KPCA. Extensive experiments, in both synthetic and real data, show that our algorithm outperforms existing methods. 1
5 0.62831068 118 nips-2008-Learning Transformational Invariants from Natural Movies
Author: Charles Cadieu, Bruno A. Olshausen
Abstract: We describe a hierarchical, probabilistic model that learns to extract complex motion from movies of the natural environment. The model consists of two hidden layers: the first layer produces a sparse representation of the image that is expressed in terms of local amplitude and phase variables. The second layer learns the higher-order structure among the time-varying phase variables. After training on natural movies, the top layer units discover the structure of phase-shifts within the first layer. We show that the top layer units encode transformational invariants: they are selective for the speed and direction of a moving pattern, but are invariant to its spatial structure (orientation/spatial-frequency). The diversity of units in both the intermediate and top layers of the model provides a set of testable predictions for representations that might be found in V1 and MT. In addition, the model demonstrates how feedback from higher levels can influence representations at lower levels as a by-product of inference in a graphical model. 1
6 0.62676692 138 nips-2008-Modeling human function learning with Gaussian processes
7 0.62604225 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
8 0.62564397 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
9 0.62382221 62 nips-2008-Differentiable Sparse Coding
10 0.62350893 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization
11 0.62325484 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
12 0.62261838 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
13 0.6222989 66 nips-2008-Dynamic visual attention: searching for coding length increments
14 0.62186182 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks
15 0.62162435 4 nips-2008-A Scalable Hierarchical Distributed Language Model
16 0.62018216 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
18 0.61852956 194 nips-2008-Regularized Learning with Networks of Features
19 0.61843979 235 nips-2008-The Infinite Hierarchical Factor Regression Model
20 0.61836684 64 nips-2008-DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification