nips nips2008 nips2008-115 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Gal Elidan, Stephen Gould
Abstract: With the increased availability of data for complex domains, it is desirable to learn Bayesian network structures that are sufficiently expressive for generalization while also allowing for tractable inference. While the method of thin junction trees can, in principle, be used for this purpose, its fully greedy nature makes it prone to overfitting, particularly when data is scarce. In this work we present a novel method for learning Bayesian networks of bounded treewidth that employs global structure modifications and that is polynomial in the size of the graph and the treewidth bound. At the heart of our method is a triangulated graph that we dynamically update in a way that facilitates the addition of chain structures that increase the bound on the model’s treewidth by at most one. We demonstrate the effectiveness of our “treewidth-friendly” method on several real-life datasets. Importantly, we also show that by using global operators, we are able to achieve better generalization even when learning Bayesian networks of unbounded treewidth. 1
Reference: text
sentIndex sentText sentNum sentScore
1 While the method of thin junction trees can, in principle, be used for this purpose, its fully greedy nature makes it prone to overfitting, particularly when data is scarce. [sent-5, score-0.339]
2 In this work we present a novel method for learning Bayesian networks of bounded treewidth that employs global structure modifications and that is polynomial in the size of the graph and the treewidth bound. [sent-6, score-1.918]
3 At the heart of our method is a triangulated graph that we dynamically update in a way that facilitates the addition of chain structures that increase the bound on the model’s treewidth by at most one. [sent-7, score-1.345]
4 With the goal of making predictions or providing probabilistic explanations, it is desirable to learn models that generalize well and at the same time have low inference complexity or a small treewidth [23]. [sent-12, score-0.861]
5 , [1, 21]), by searching for a local maximum over tree mixtures [20], or by approximate methods that are polynomial in the size of the graph but exponential in the treewidth bound (e. [sent-18, score-1.093]
6 In the context of general Bayesian networks, the thin junction tree approach of Bach and Jordan [2] is a local greedy search procedure that relies at each step on tree-decomposition heuristic techniques for computing an upper bound on the true treewidth of the model. [sent-21, score-1.314]
7 Like any local search approach, this method does not provide performance guarantees but is appealing in its ability to efficiently learn models with an arbitrary treewidth bound. [sent-22, score-0.911]
8 The thin junction tree method, however, suffers from two important limitations. [sent-23, score-0.357]
9 First, while useful on average, even the best of the tree-decomposition heuristics exhibit some variance in the treewidth estimate [16]. [sent-24, score-0.834]
10 As a result, a single edge addition can lead to a jump in the treewidth estimate despite the fact that it can increase the true treewidth by at most one. [sent-25, score-1.809]
11 Intuitively, to generalize well, we want to learn bounded treewidth Bayesian networks where structure modifications are globally beneficial (i. [sent-29, score-0.943]
12 In this work we propose a novel method for efficiently learning Bayesian networks of bounded treewidth that addresses these concerns. [sent-32, score-0.92]
13 At the heart of our method is a dynamic update of the triangulation of the model in a way that is treewidth-friendly: the treewidth of the triangulated graph (an upper bound on the model’s true treewidth) is guaranteed to increase by at most one when an edge is added to the network. [sent-33, score-1.602]
14 Finally, we learn a bounded treewidth Bayesian network by iteratively augmenting the model with such chains. [sent-36, score-0.923]
15 We are also able to guarantee that the bound on the model’s treewidth grows by at most one at each iteration. [sent-38, score-0.884]
16 Thus, our method resembles the global nature of Chow and Liu [5] more closely than the thin junction tree approach of Bach and Jordan [2], while being applicable in practice to any desired treewidth. [sent-39, score-0.405]
17 We evaluate our method on several challenging real-life datasets and show that our method is able to learn richer models that generalize better than the thin junction tree approach as well as an unbounded aggressive search strategy. [sent-40, score-0.604]
18 The actual complexity of inference in a Bayesian network is exponential in its treewidth [23] which, roughly speaking, measures how closely the network resembles a tree. [sent-62, score-0.902]
19 The notions of tree-decompositions and treewidth were introduced by Robertson and Seymour [23]: Definition 2. [sent-63, score-0.834]
20 The treewidth of a tree-decomposition is defined to be max_{i∈T} |Ci| − 1. [sent-66, score-0.834]
21 The treewidth TW(H) of an undirected graph H is the minimum treewidth over all possible tree-decompositions of H. [sent-67, score-1.807]
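To make these definitions concrete, here is a minimal Python sketch (our own illustration, assuming the networkx package; none of these names come from the paper) that computes the width of a given tree-decomposition as the size of its largest bag minus one.

```python
import networkx as nx

def decomposition_width(bags):
    """Width of a tree-decomposition: size of the largest bag Ci minus one."""
    return max(len(bag) for bag in bags) - 1

# The 4-cycle a-b-c-d-a has treewidth 2; one valid tree-decomposition
# uses two bags {a,b,c} and {a,c,d} joined by a single tree edge.
bags = [{'a', 'b', 'c'}, {'a', 'c', 'd'}]
decomposition_tree = nx.Graph([(0, 1)])  # the tree T over bag indices
print(decomposition_width(bags))  # -> 2
```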
22 An equivalent notion of treewidth can be phrased in terms of a graph that is a triangulation of H. [sent-68, score-1.151]
23 2: An induced path P in an undirected graph H is a path such that for every pair of nonadjacent vertices pi, pj ∈ P there is no edge (pi—pj) ∈ H. [sent-70, score-0.402]
24 A triangulated (chordal) graph is an undirected graph with no induced cycles of length greater than three. [sent-71, score-0.394]
25 It can be easily shown that the treewidth of a triangulated graph is the size of the maximal clique of the graph minus one [23]. [sent-73, score-1.227]
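As a sanity check of this fact, the following sketch (ours, assuming networkx; nx.find_cliques enumerates maximal cliques, which is cheap for the small chordal graphs used here) computes the treewidth of a triangulated graph directly.

```python
import networkx as nx

def chordal_treewidth(G):
    """Treewidth of a triangulated (chordal) graph: largest clique size minus one."""
    assert nx.is_chordal(G), "input must be triangulated"
    return max(len(c) for c in nx.find_cliques(G)) - 1

# A triangle with a pendant edge is chordal; its largest clique has 3 nodes.
G = nx.Graph([('a', 'b'), ('b', 'c'), ('a', 'c'), ('c', 'd')])
print(chordal_treewidth(G))  # -> 2
```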
26 The treewidth of an undirected graph H is then the minimum treewidth of all triangulations of H. [sent-74, score-1.828]
27 For the underlying directed acyclic graph of a Bayesian network, the treewidth can be characterized via a triangulation of the moralized graph. [sent-75, score-1.263]
28 3: A moralized graph M of a directed acyclic graph G is an undirected graph that has an edge (i—j) for every (i → j) ∈ G and an edge (p—q) for every pair (p → i), (q → i) ∈ G. [sent-77, score-0.647]
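Definition 3 translates directly into code; the sketch below (ours, not the authors' implementation) moralizes a DAG by "marrying" all co-parents and dropping edge directions.

```python
import itertools
import networkx as nx

def moralize(G):
    """Moralized graph of a DAG G: the undirected skeleton plus an edge
    between every pair of parents that share a child."""
    M = G.to_undirected()
    for child in G.nodes:
        for p, q in itertools.combinations(G.predecessors(child), 2):
            M.add_edge(p, q)  # marry the co-parents
    return M

G = nx.DiGraph([('p', 'i'), ('q', 'i')])  # v-structure p -> i <- q
print(sorted(tuple(sorted(e)) for e in moralize(G).edges))
# -> [('i', 'p'), ('i', 'q'), ('p', 'q')]  (includes the moral edge p--q)
```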
29 (right) An example of the different steps of our triangulation procedure (b)-(e) when (s → t) is added to the graph in (a). [sent-80, score-0.367]
30 The augmented graph (e) has a treewidth of three (maximal clique of size four). [sent-82, score-0.974]
31 The treewidth of a Bayesian network graph G is defined as the treewidth of its moralized graph M. [sent-84, score-1.976]
32 It follows that the size of the maximal clique of any moralized triangulation of G, minus one, is an upper bound on the treewidth of the model, and thus on its inference complexity. [sent-85, score-1.281]
33 3 Learning Bounded Treewidth Bayesian Networks In this section we outline our approach for learning Bayesian networks given an arbitrary treewidth bound that is polynomial in both the number of variables and the desired treewidth. [sent-86, score-0.972]
34 At the heart of our method is the idea of using a dynamically maintained triangulated graph to upper bound the treewidth of the current model. [sent-88, score-1.174]
35 When an edge is added to the Bayesian network we update this triangulated graph in a way that is not only guaranteed to produce a valid triangulation, but that is also treewidth-friendly. [sent-89, score-0.467]
36 That is, our update is guaranteed to increase the size of the maximal clique of the triangulated graph, and hence the treewidth bound, by at most one. [sent-90, score-1.134]
37 Briefly, we learn a Bayesian network with bounded treewidth K by starting from a Chow-Liu tree and iteratively augmenting the current structure with an optimal treewidth-friendly chain. [sent-95, score-1.013]
38 During each iteration (below the treewidth bound) we apply our treewidth-friendly edge update procedure that maintains a moralized and triangulated graph for the model at hand. [sent-96, score-1.339]
39 Appealingly, as each global modification can increase the treewidth by at most one, at least K such chains will be added before we face the problem of local maxima. [sent-97, score-0.991]
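The outer loop just described can be summarized by the following sketch; it is our paraphrase, with chow_liu_tree, best_treewidth_friendly_chain, apply_chain, and treewidth_bound as hypothetical stand-ins for the paper's subroutines.

```python
def learn_bounded_treewidth_network(data, K, chow_liu_tree,
                                    best_treewidth_friendly_chain,
                                    apply_chain, treewidth_bound):
    """Sketch of the outer loop: start from a Chow-Liu tree (treewidth 1) and
    repeatedly add an optimal treewidth-friendly chain while the dynamically
    maintained bound on the model's treewidth stays below K."""
    model = chow_liu_tree(data)
    while treewidth_bound(model) < K:
        chain = best_treewidth_friendly_chain(model, data)
        if chain is None:  # no score-improving chain remains
            break
        # Each chain raises the treewidth bound by at most one, so at least
        # K chains are added before the search turns fully greedy.
        model = apply_chain(model, chain)
    return model
```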
40 1: Given a treewidth bound K and a dataset over N variables, the algorithm outlined in Figure 1 runs in time polynomial in N and K. [sent-100, score-0.914]
41 This result relies on the efficiency of each step of the algorithm and on the fact that there can be at most N · K iterations (≤ |edges|) before exceeding the treewidth bound. [sent-101, score-0.834]
42 4 Treewidth-Friendly Edge Update The basic building block of our method is a procedure for maintaining a valid triangulation of the Bayesian network. [sent-103, score-0.357]
43 An appealing feature of this procedure is that the treewidth bound is guaranteed to grow by at most one after the update. [sent-104, score-0.924]
44 For clarity of exposition, we start with a simple variant of our procedure, and later refine this to allow for multiple edge additions while maintaining our guarantee on the treewidth bound. [sent-106, score-0.969]
45 1: Let G be a Bayesian network structure and let M+ be a moralized triangulation of G. [sent-108, score-0.376]
46 Stated simply, if the graph was triangulated before the addition of (s → t) to the Bayesian network, then we only need to triangulate cycles created by the addition of the new edge or those forced by moralization. [sent-111, score-0.353]
47 This observation immediately suggests a straightforward single-source triangulation whereby we simply add an edge (s—v) for every node v on an induced path between s and t or its parents before the edge update. [sent-112, score-0.644]
48 Clearly, this naive method results in a valid moralized triangulation of G ∪ (s → t). [sent-113, score-0.362]
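A brute-force rendering of this naive single-source triangulation is sketched below (our illustration, assuming networkx; it enumerates induced paths explicitly, which is exponential in general, and it ignores the extra edges to t's parents required by moralization).

```python
import networkx as nx

def nodes_on_induced_paths(M, s, t):
    """Interior nodes lying on some induced (chordless) path from s to t.
    Brute force over all simple paths -- exponential, for illustration only."""
    nodes = set()
    for path in nx.all_simple_paths(M, s, t):
        chordless = all(not M.has_edge(path[i], path[j])
                        for i in range(len(path))
                        for j in range(i + 2, len(path)))
        if chordless:
            nodes.update(path[1:-1])
    return nodes

def single_source_triangulate(M, s, t):
    """Add (s--t) plus an edge (s--v) for every node v on an induced s-t path."""
    fan = nodes_on_induced_paths(M, s, t)
    M = M.copy()
    M.add_edge(s, t)
    M.add_edges_from((s, v) for v in fan)
    return M
```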
49 2: The treewidth of the graph produced by the single-source triangulation procedure is greater than the treewidth of the input graph M+ by at most one. [sent-116, score-2.097]
50 Proof: (outline) For the treewidth to increase by more than one, some maximal clique C in M+ needs to connect to two new nodes. [sent-117, score-0.893]
51 Since all edges are being added from s, this can only happen in one of two ways: (i) either t, a parent p of t, or a node v on an induced path between s and t is also connected to C, but not part of C, or (ii) two such (non-adjacent) nodes exist and s is in C. [sent-118, score-0.335]
52 4: The (unique) block tree B of an undirected graph H is a graph with nodes that correspond both to cut-vertices and to blocks of H. [sent-132, score-0.449]
53 The edges in the block tree connect any block node Bi with a cut-vertex node vj if and only if vj ∈ Bi in H. [sent-133, score-0.438]
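Definition 4 can be realized with standard routines; this sketch (ours, using networkx's biconnected-components and articulation-points utilities) links each block to the cut-vertices it contains.

```python
import networkx as nx

def block_tree(H):
    """Block tree of H: a node per block (biconnected component), a node per
    cut-vertex, and an edge (Bi, vj) whenever cut-vertex vj lies in block Bi."""
    B = nx.Graph()
    cuts = set(nx.articulation_points(H))
    for i, block in enumerate(nx.biconnected_components(H)):
        B.add_node(('block', i), members=frozenset(block))
        for v in block & cuts:
            B.add_edge(('block', i), ('cut', v))
    return B

# Two triangles sharing the single cut-vertex c form two blocks joined via c.
H = nx.Graph([('a','b'), ('b','c'), ('a','c'), ('c','d'), ('d','e'), ('c','e')])
T = block_tree(H)
print(T.number_of_nodes(), T.number_of_edges())  # -> 3 2
```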
54 An important consequence that follows from Dirac [11] is that an undirected graph whose blocks are triangulated is overall triangulated. [sent-135, score-0.316]
55 First, the triangulated graph is augmented with the edge (s—t) and any edges needed for moralization (Figure 1(b) and (c)). [sent-137, score-0.445]
56 Second, a block level triangulation is carried out by zig-zagging across cut-vertices along the unique path between the blocks containing s and t and its parents (Figure 1(d)). [sent-138, score-0.418]
57 The block containing t and its parents is treated differently: we add chords directly from s to any node v within the block that is on an induced path between s and t (or parents of t) (Figure 1(e)). [sent-142, score-0.382]
58 This is required to prevent moralization and triangulation edges from interacting in a way that will increase the treewidth by more than one (e. [sent-143, score-1.204]
59 5: Our revised edge update procedure results in a triangulated graph with a treewidth at most one greater than that of the input graph. [sent-151, score-1.227]
60 Since each block along the block path between s and t is triangulated separately, we only need to consider the zig-zag triangulation between blocks. [sent-154, score-0.553]
61 To see that the treewidth 4 increases by at most one, we use similar arguments to those used in the proof of Theorem 4. [sent-156, score-0.834]
62 Below are several examples of contaminated sets (solid nodes) incident to edges added (dashed) by our edge update procedure for different candidate edge additions (s → t) to the Bayesian network on the left. [sent-162, score-0.528]
63 In all examples except the last, the treewidth is increased by one. [sent-163, score-0.834]
64 We will use this to learn optimal treewidth-friendly chains given an ordering in Section 5. [sent-165, score-1.05]
65 5 Learning Optimal Treewidth-Friendly Chains In the previous section we described our edge update procedure and characterized edge chains that jointly increase the treewidth bound by at most one. [sent-173, score-1.264]
66 6, and are thus treewidth-friendly, given a topological node ordering. [sent-175, score-0.907]
67 On the surface, one might question the need for a specific node ordering altogether if global chain operators are to be used—given the result of Chow and Liu [5], one might expect that learning the optimal chain with respect to any ordering can be carried out efficiently. [sent-176, score-0.439]
68 Instead, we commit to a single node ordering that is topologically consistent (each node appears after its parents in the network) and learn the optimal treewidth-friendly chain with respect to that order (we briefly discuss the details of our ordering below). [sent-179, score-0.443]
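A topologically consistent ordering is cheap to obtain; the minimal sketch below (ours) captures only the consistency requirement, not the paper's additional DFS-over-blocks and within-block distance criteria discussed next.

```python
import networkx as nx

def topologically_consistent_order(G):
    """An ordering in which every node appears after all of its parents;
    any topological sort of the DAG qualifies."""
    return list(nx.topological_sort(G))

G = nx.DiGraph([('a', 'b'), ('a', 'c'), ('b', 'd'), ('c', 'd')])
print(topologically_consistent_order(G))  # e.g. ['a', 'b', 'c', 'd']
```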
69 Our method (solid blue squares) is compared to the thin junction tree method (dashed red circles), and an unbounded aggressive search (dotted black). [sent-185, score-0.577]
70 (middle) the treewidth estimate and the number of edges in the chain during the iterations of a typical run with the bound set to 10. [sent-186, score-1.059]
71 Intuitively, since edges contaminate nodes along the block path between the edge’s endpoints (see Section 4), we want to adopt a DFS ordering over the blocks so as to facilitate as many edges as possible between different branches of the block tree. [sent-194, score-0.549]
72 We order nodes within a block by their distance from the “entry” vertex, as motivated by the following result on the distance dM^min(u, v) between nodes u, v in the triangulated graph M+ (proof not shown for lack of space): Theorem 5. [sent-195, score-0.382]
73 1: Let r, s, t be nodes in a block B in the triangulated graph M+ with dM^min(r, s) ≤ dM^min(r, t). [sent-196, score-0.338]
74 The efficiency of our method outlined in Figure 1 in the number of variables and the treewidth bound (Theorem 3. [sent-198, score-0.909]
75 The first is an improved variant of the thin junction tree method [2]. [sent-201, score-0.382]
76 We start (as in our method) with a Chow-Liu forest and iteratively add the single best scoring edge as long as the treewidth bound is not exceeded. [sent-202, score-1.008]
77 To make the comparison independent of the choice of triangulation method, at each iteration we replace the heuristic triangulation (best of maximum cardinality search or minimum fill-in [16], which in practice had negligible differences) with our triangulation if it results in a lower treewidth. [sent-203, score-0.724]
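For contrast, the greedy baseline can be sketched as follows (our paraphrase; candidate_edges, score_gain, estimated_treewidth, and add_edge are hypothetical stand-ins for the scoring and triangulation-heuristic subroutines).

```python
def thin_junction_tree_baseline(model, K, candidate_edges,
                                score_gain, estimated_treewidth, add_edge):
    """Greedy single-edge search: repeatedly add the best-scoring candidate
    edge whose addition keeps the estimated treewidth within the bound K."""
    while True:
        admissible = [e for e in candidate_edges(model)
                      if estimated_treewidth(model, e) <= K]
        if not admissible:
            break
        best = max(admissible, key=lambda e: score_gain(model, e))
        if score_gain(model, best) <= 0:  # no admissible edge improves the score
            break
        model = add_edge(model, best)
    return model
```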
78 , [13]) and random moves and that is not constrained by a treewidth bound. [sent-206, score-0.834]
79 Figure 2(a) shows test log-loss as a function of treewidth bound. [sent-212, score-0.834]
80 The first obvious phenomenon is that both our method and the thin junction tree approach are superior to the aggressive baseline. [sent-213, score-0.47]
81 The consistent superiority of our method over thin junction trees demonstrates that a better choice of edges, i. [sent-215, score-0.335]
82 Compared are our approach (solid blue squares), the thin junction tree method (dashed red circles), an aggressive unbounded search (dotted black), and the method of Chechetka and Guestrin [3] (dash-dot magenta diamonds). [sent-220, score-0.577]
83 Compared are an unbounded aggressive search (dotted) and unconstrained (thin) and constrained by the DNA order (thick) variants of ours and the thin junction tree method. [sent-223, score-0.527]
84 To qualitatively illustrate the progression of our algorithm, in Figure 2(b) we plot the number of edges in the chain and the treewidth estimate at the end of each iteration for a typical run. [sent-226, score-1.027]
85 Our algorithm aggressively adds multi-edge chains until the treewidth bound is reached, at which point (iteration 24) it becomes fully greedy. [sent-227, score-0.957]
86 It is also worth noting that despite their complexity, some chains do not increase the treewidth estimate and we typically have more than K iterations where chains with more than one edge are added. [sent-229, score-1.121]
87 The number of such iterations is still polynomially bounded since, for a Bayesian network with N variables, adding more than K · N edges will necessarily result in a treewidth greater than K. [sent-230, score-0.985]
88 To evaluate the efficiency of our method we measured its running time as a function of the treewidth bound. [sent-231, score-0.859]
89 Observe that our method and the greedy thin junction tree approach are both approximately linear in the treewidth bound. [sent-233, score-1.242]
90 Furthermore, as their method can only be applied to a small treewidth bound, we also limited our model to a treewidth of two. [sent-246, score-1.693]
91 Both our method and the thin junction tree approach significantly outperform the LPACJT on small sample sizes. [sent-248, score-0.382]
92 The performance of our method is comparable to the greedy thin junction tree approach with no obvious superiority of either method. [sent-250, score-0.43]
93 In fact, Chechetka and Guestrin [3] show that even a Chow-Liu tree does rather well on these datasets (compare this to the gene expression dataset where the aggressive variant was superior even at a treewidth of five). [sent-252, score-1.018]
94 The superiority of our method when the ordering is used is obvious while the performance of the thin junction tree method degrades. [sent-258, score-0.503]
95 7 Discussion and Future Work In this work we presented a novel method for learning Bayesian networks of bounded treewidth in time that is polynomial in both the number of variables and the treewidth bound. [sent-261, score-1.784]
96 Our method builds on an edge update algorithm that dynamically maintains a valid moralized triangulation in a way that facilitates the addition of chains that are guaranteed to increase the treewidth bound by at most one. [sent-262, score-1.536]
97 We demonstrated the effectiveness of our treewidth-friendly method on real-life datasets, and showed that by utilizing global structure modification operators, we are able to learn better models than competing methods, even when the treewidth of the models learned is not constrained. [sent-263, score-0.93]
98 In addition, unlike the thin junction trees approach of Bach and Jordan [2], we provide a guarantee that our estimate of the treewidth bound will not increase by more than one at each iteration. [sent-265, score-1.205]
99 To our knowledge, ours is the first method for efficiently learning Bayesian networks with an arbitrary treewidth bound that is not fully greedy. [sent-267, score-0.942]
100 Finally, we are most interested in exploring whether tools similar to the ones employed in this work could be used to dynamically update the bounded treewidth structure that is the approximating distribution in a variational approximate inference setting. [sent-272, score-0.941]
wordName wordTfidf (topN-words)
[('treewidth', 0.834), ('triangulation', 0.227), ('thin', 0.147), ('junction', 0.141), ('triangulated', 0.137), ('edge', 0.108), ('chechetka', 0.094), ('moralized', 0.094), ('graph', 0.09), ('edges', 0.089), ('aggressive', 0.088), ('chain', 0.086), ('contaminated', 0.076), ('ordering', 0.074), ('node', 0.073), ('chains', 0.073), ('tree', 0.069), ('block', 0.067), ('contamination', 0.064), ('guestrin', 0.059), ('unbounded', 0.057), ('path', 0.055), ('cm', 0.053), ('chow', 0.05), ('bound', 0.05), ('clique', 0.05), ('undirected', 0.049), ('bayesian', 0.048), ('nodes', 0.044), ('friendly', 0.042), ('hapmap', 0.042), ('blocks', 0.04), ('bic', 0.037), ('update', 0.036), ('network', 0.034), ('os', 0.034), ('networks', 0.033), ('increase', 0.033), ('lpacjt', 0.031), ('polynomial', 0.03), ('ot', 0.03), ('traf', 0.03), ('ci', 0.029), ('parents', 0.029), ('dm', 0.029), ('induced', 0.028), ('bounded', 0.028), ('added', 0.028), ('additions', 0.027), ('learn', 0.027), ('paths', 0.027), ('gene', 0.027), ('maximal', 0.026), ('greedy', 0.026), ('outline', 0.025), ('method', 0.025), ('search', 0.025), ('liu', 0.024), ('endpoints', 0.024), ('global', 0.023), ('operators', 0.023), ('procedure', 0.022), ('superiority', 0.022), ('dynamically', 0.022), ('structure', 0.021), ('appealingly', 0.021), ('haplotype', 0.021), ('moralization', 0.021), ('pai', 0.021), ('robertson', 0.021), ('snps', 0.021), ('tabu', 0.021), ('treewidths', 0.021), ('triangulations', 0.021), ('cations', 0.021), ('maxima', 0.02), ('score', 0.019), ('bach', 0.018), ('chromosome', 0.018), ('exit', 0.018), ('chords', 0.018), ('triangulate', 0.018), ('commit', 0.018), ('cpds', 0.018), ('ol', 0.018), ('surpasses', 0.018), ('iteration', 0.018), ('guaranteed', 0.018), ('parent', 0.018), ('acyclic', 0.018), ('temperature', 0.017), ('vertices', 0.017), ('yeast', 0.017), ('lauritzen', 0.017), ('add', 0.016), ('modi', 0.016), ('characterize', 0.016), ('valid', 0.016), ('heart', 0.016), ('structures', 0.016)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 115 nips-2008-Learning Bounded Treewidth Bayesian Networks
Author: Gal Elidan, Stephen Gould
Abstract: With the increased availability of data for complex domains, it is desirable to learn Bayesian network structures that are sufficiently expressive for generalization while also allowing for tractable inference. While the method of thin junction trees can, in principle, be used for this purpose, its fully greedy nature makes it prone to overfitting, particularly when data is scarce. In this work we present a novel method for learning Bayesian networks of bounded treewidth that employs global structure modifications and that is polynomial in the size of the graph and the treewidth bound. At the heart of our method is a triangulated graph that we dynamically update in a way that facilitates the addition of chain structures that increase the bound on the model’s treewidth by at most one. We demonstrate the effectiveness of our “treewidth-friendly” method on several real-life datasets. Importantly, we also show that by using global operators, we are able to achieve better generalization even when learning Bayesian networks of unbounded treewidth. 1
2 0.091192447 69 nips-2008-Efficient Exact Inference in Planar Ising Models
Author: Nicol N. Schraudolph, Dmitry Kamenetsky
Abstract: We give polynomial-time algorithms for the exact computation of lowest-energy states, worst margin violators, partition functions, and marginals in certain binary undirected graphical models. Our approach provides an interesting alternative to the well-known graph cut paradigm in that it does not impose any submodularity constraints; instead we require planarity to establish a correspondence with perfect matchings in an expanded dual graph. Maximum-margin parameter estimation for a boundary detection task shows our approach to be efficient and effective. A C++ implementation is available from http://nic.schraudolph.org/isinf/. 1
3 0.078351982 84 nips-2008-Fast Prediction on a Tree
Author: Mark Herbster, Massimiliano Pontil, Sergio R. Galeano
Abstract: Given an n-vertex weighted tree with structural diameter S and a subset of m vertices, we present a technique to compute a corresponding m × m Gram matrix of the pseudoinverse of the graph Laplacian in O(n + m^2 + mS) time. We discuss the application of this technique to fast label prediction on a generic graph. We approximate the graph with a spanning tree and then we predict with the kernel perceptron. We address the approximation of the graph with either a minimum spanning tree or a shortest path tree. The fast computation of the pseudoinverse enables us to address prediction problems on large graphs. We present experiments on two web-spam classification tasks, one of which includes a graph with 400,000 vertices and more than 10,000,000 edges. The results indicate that the accuracy of our technique is competitive with previous methods using the full graph information. 1
4 0.077867381 152 nips-2008-Non-stationary dynamic Bayesian networks
Author: Joshua W. Robinson, Alexander J. Hartemink
Abstract: A principled mechanism for identifying conditional dependencies in time-series data is provided through structure learning of dynamic Bayesian networks (DBNs). An important assumption of DBN structure learning is that the data are generated by a stationary process—an assumption that is not true in many important settings. In this paper, we introduce a new class of graphical models called non-stationary dynamic Bayesian networks, in which the conditional dependence structure of the underlying data-generation process is permitted to change over time. Non-stationary dynamic Bayesian networks represent a new framework for studying problems in which the structure of a network is evolving over time. We define the non-stationary DBN model, present an MCMC sampling algorithm for learning the structure of the model from time-series data under different assumptions, and demonstrate the effectiveness of the algorithm on both simulated and biological data. 1
5 0.069154583 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
Author: Benjamin Yackley, Eduardo Corona, Terran Lane
Abstract: Many interesting problems, including Bayesian network structure-search, can be cast in terms of finding the optimum value of a function over the space of graphs. However, this function is often expensive to compute exactly. We here present a method derived from the study of Reproducing Kernel Hilbert Spaces which takes advantage of the regular structure of the space of all graphs on a fixed number of nodes to obtain approximations to the desired function quickly and with reasonable accuracy. We then test this method on both a small testing set and a real-world Bayesian network; the results suggest that not only is this method reasonably accurate, but that the BDe score itself varies quadratically over the space of all graphs. 1
6 0.068321817 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE
7 0.059593227 171 nips-2008-Online Prediction on Large Diameter Graphs
8 0.053135864 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees
9 0.051592026 108 nips-2008-Integrating Locally Learned Causal Structures with Overlapping Variables
10 0.049503371 194 nips-2008-Regularized Learning with Networks of Features
11 0.048393127 40 nips-2008-Bounds on marginal probability distributions
12 0.045674872 107 nips-2008-Influence of graph construction on graph-based clustering measures
13 0.044146884 224 nips-2008-Structured ranking learning using cumulative distribution networks
14 0.043502025 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
15 0.043431979 170 nips-2008-Online Optimization in X-Armed Bandits
16 0.041778762 104 nips-2008-Improved Moves for Truncated Convex Models
17 0.039842673 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models
18 0.038514715 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation
19 0.037684333 4 nips-2008-A Scalable Hierarchical Distributed Language Model
20 0.034494393 50 nips-2008-Continuously-adaptive discretization for message-passing algorithms
topicId topicWeight
[(0, -0.114), (1, -0.004), (2, -0.014), (3, 0.025), (4, 0.066), (5, -0.098), (6, 0.042), (7, 0.013), (8, 0.031), (9, -0.123), (10, -0.095), (11, 0.045), (12, -0.015), (13, -0.066), (14, 0.014), (15, -0.021), (16, 0.033), (17, -0.043), (18, -0.035), (19, -0.05), (20, 0.007), (21, 0.019), (22, 0.003), (23, -0.082), (24, -0.048), (25, 0.003), (26, -0.058), (27, 0.014), (28, -0.002), (29, -0.003), (30, 0.034), (31, 0.029), (32, -0.007), (33, -0.009), (34, 0.038), (35, -0.01), (36, -0.014), (37, -0.036), (38, 0.015), (39, -0.002), (40, -0.045), (41, -0.052), (42, 0.016), (43, 0.0), (44, -0.004), (45, -0.025), (46, -0.025), (47, 0.06), (48, -0.044), (49, 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.94135982 115 nips-2008-Learning Bounded Treewidth Bayesian Networks
Author: Gal Elidan, Stephen Gould
Abstract: With the increased availability of data for complex domains, it is desirable to learn Bayesian network structures that are sufficiently expressive for generalization while also allowing for tractable inference. While the method of thin junction trees can, in principle, be used for this purpose, its fully greedy nature makes it prone to overfitting, particularly when data is scarce. In this work we present a novel method for learning Bayesian networks of bounded treewidth that employs global structure modifications and that is polynomial in the size of the graph and the treewidth bound. At the heart of our method is a triangulated graph that we dynamically update in a way that facilitates the addition of chain structures that increase the bound on the model’s treewidth by at most one. We demonstrate the effectiveness of our “treewidth-friendly” method on several real-life datasets. Importantly, we also show that by using global operators, we are able to achieve better generalization even when learning Bayesian networks of unbounded treewidth. 1
2 0.75983351 69 nips-2008-Efficient Exact Inference in Planar Ising Models
Author: Nicol N. Schraudolph, Dmitry Kamenetsky
Abstract: We give polynomial-time algorithms for the exact computation of lowest-energy states, worst margin violators, partition functions, and marginals in certain binary undirected graphical models. Our approach provides an interesting alternative to the well-known graph cut paradigm in that it does not impose any submodularity constraints; instead we require planarity to establish a correspondence with perfect matchings in an expanded dual graph. Maximum-margin parameter estimation for a boundary detection task shows our approach to be efficient and effective. A C++ implementation is available from http://nic.schraudolph.org/isinf/. 1
3 0.71017206 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel
Author: Benjamin Yackley, Eduardo Corona, Terran Lane
Abstract: Many interesting problems, including Bayesian network structure-search, can be cast in terms of finding the optimum value of a function over the space of graphs. However, this function is often expensive to compute exactly. We here present a method derived from the study of Reproducing Kernel Hilbert Spaces which takes advantage of the regular structure of the space of all graphs on a fixed number of nodes to obtain approximations to the desired function quickly and with reasonable accuracy. We then test this method on both a small testing set and a real-world Bayesian network; the results suggest that not only is this method reasonably accurate, but that the BDe score itself varies quadratically over the space of all graphs. 1
4 0.70081419 84 nips-2008-Fast Prediction on a Tree
Author: Mark Herbster, Massimiliano Pontil, Sergio R. Galeano
Abstract: Given an n-vertex weighted tree with structural diameter S and a subset of m vertices, we present a technique to compute a corresponding m × m Gram matrix of the pseudoinverse of the graph Laplacian in O(n + m^2 + mS) time. We discuss the application of this technique to fast label prediction on a generic graph. We approximate the graph with a spanning tree and then we predict with the kernel perceptron. We address the approximation of the graph with either a minimum spanning tree or a shortest path tree. The fast computation of the pseudoinverse enables us to address prediction problems on large graphs. We present experiments on two web-spam classification tasks, one of which includes a graph with 400,000 vertices and more than 10,000,000 edges. The results indicate that the accuracy of our technique is competitive with previous methods using the full graph information. 1
5 0.64176583 171 nips-2008-Online Prediction on Large Diameter Graphs
Author: Mark Herbster, Guy Lever, Massimiliano Pontil
Abstract: We continue our study of online prediction of the labelling of a graph. We show a fundamental limitation of Laplacian-based algorithms: if the graph has a large diameter then the number of mistakes made by such algorithms may be proportional to the square root of the number of vertices, even when tackling simple problems. We overcome this drawback by means of an efficient algorithm which achieves a logarithmic mistake bound. It is based on the notion of a spine, a path graph which provides a linear embedding of the original graph. In practice, graphs may exhibit cluster structure; thus in the last part, we present a modified algorithm which achieves the “best of both worlds”: it performs well locally in the presence of cluster structure, and globally on large diameter graphs. 1
6 0.6320346 152 nips-2008-Non-stationary dynamic Bayesian networks
7 0.56649524 70 nips-2008-Efficient Inference in Phylogenetic InDel Trees
8 0.56499195 104 nips-2008-Improved Moves for Truncated Convex Models
9 0.54124063 107 nips-2008-Influence of graph construction on graph-based clustering measures
10 0.51334339 236 nips-2008-The Mondrian Process
11 0.50629938 108 nips-2008-Integrating Locally Learned Causal Structures with Overlapping Variables
12 0.48951918 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE
13 0.48283488 212 nips-2008-Skill Characterization Based on Betweenness
14 0.48199052 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation
15 0.48060244 86 nips-2008-Finding Latent Causes in Causal Networks: an Efficient Approach Based on Markov Blankets
16 0.4603315 129 nips-2008-MAS: a multiplicative approximation scheme for probabilistic inference
17 0.45729434 225 nips-2008-Supervised Bipartite Graph Inference
18 0.45472145 11 nips-2008-A spatially varying two-sample recombinant coalescent, with applications to HIV escape response
19 0.4452804 82 nips-2008-Fast Computation of Posterior Mode in Multi-Level Hierarchical Models
20 0.44078946 18 nips-2008-An Efficient Sequential Monte Carlo Algorithm for Coalescent Clustering
topicId topicWeight
[(6, 0.042), (7, 0.039), (12, 0.032), (28, 0.618), (57, 0.039), (59, 0.019), (63, 0.018), (71, 0.016), (77, 0.029), (83, 0.03)]
simIndex simValue paperId paperTitle
1 0.99885398 72 nips-2008-Empirical performance maximization for linear rank statistics
Author: Stéphan J. Clémençcon, Nicolas Vayatis
Abstract: The ROC curve is known to be the golden standard for measuring performance of a test/scoring statistic regarding its capacity of discrimination between two populations in a wide variety of applications, ranging from anomaly detection in signal processing to information retrieval, through medical diagnosis. Most practical performance measures used in scoring applications such as the AUC, the local AUC, the p-norm push, the DCG and others, can be seen as summaries of the ROC curve. This paper highlights the fact that many of these empirical criteria can be expressed as (conditional) linear rank statistics. We investigate the properties of empirical maximizers of such performance criteria and provide preliminary results for the concentration properties of a novel class of random variables that we will call a linear rank process. 1
2 0.99872184 190 nips-2008-Reconciling Real Scores with Binary Comparisons: A New Logistic Based Model for Ranking
Author: Nir Ailon
Abstract: The problem of ranking arises ubiquitously in almost every aspect of life, and in particular in Machine Learning/Information Retrieval. A statistical model for ranking predicts how humans rank subsets V of some universe U . In this work we define a statistical model for ranking that satisfies certain desirable properties. The model automatically gives rise to a logistic regression based approach to learning how to rank, for which the score and comparison based approaches are dual views. This offers a new generative approach to ranking which can be used for IR. There are two main contexts for this work. The first is the theory of econometrics and study of statistical models explaining human choice of alternatives. In this context, we will compare our model with other well known models. The second context is the problem of ranking in machine learning, usually arising in the context of information retrieval. Here, much work has been done in the discriminative setting, where different heuristics are used to define ranking risk functions. Our model is built rigorously and axiomatically based on very simple desirable properties defined locally for comparisons, and automatically implies the existence of a global score function serving as a natural model parameter which can be efficiently fitted to pairwise comparison judgment data by solving a convex optimization problem. 1
3 0.99774528 206 nips-2008-Sequential effects: Superstition or rational behavior?
Author: Angela J. Yu, Jonathan D. Cohen
Abstract: In a variety of behavioral tasks, subjects exhibit an automatic and apparently suboptimal sequential effect: they respond more rapidly and accurately to a stimulus if it reinforces a local pattern in stimulus history, such as a string of repetitions or alternations, compared to when it violates such a pattern. This is often the case even if the local trends arise by chance in the context of a randomized design, such that stimulus history has no real predictive power. In this work, we use a normative Bayesian framework to examine the hypothesis that such idiosyncrasies may reflect the inadvertent engagement of mechanisms critical for adapting to a changing environment. We show that prior belief in non-stationarity can induce experimentally observed sequential effects in an otherwise Bayes-optimal algorithm. The Bayesian algorithm is shown to be well approximated by linear-exponential filtering of past observations, a feature also apparent in the behavioral data. We derive an explicit relationship between the parameters and computations of the exact Bayesian algorithm and those of the approximate linear-exponential filter. Since the latter is equivalent to a leaky-integration process, a commonly used model of neuronal dynamics underlying perceptual decision-making and trial-to-trial dependencies, our model provides a principled account of why such dynamics are useful. We also show that parameter-tuning of the leaky-integration process is possible, using stochastic gradient descent based only on the noisy binary inputs. This is a proof of concept that not only can neurons implement near-optimal prediction based on standard neuronal dynamics, but that they can also learn to tune the processing parameters without explicitly representing probabilities. 1
same-paper 4 0.99759156 115 nips-2008-Learning Bounded Treewidth Bayesian Networks
Author: Gal Elidan, Stephen Gould
Abstract: With the increased availability of data for complex domains, it is desirable to learn Bayesian network structures that are sufficiently expressive for generalization while also allowing for tractable inference. While the method of thin junction trees can, in principle, be used for this purpose, its fully greedy nature makes it prone to overfitting, particularly when data is scarce. In this work we present a novel method for learning Bayesian networks of bounded treewidth that employs global structure modifications and that is polynomial in the size of the graph and the treewidth bound. At the heart of our method is a triangulated graph that we dynamically update in a way that facilitates the addition of chain structures that increase the bound on the model’s treewidth by at most one. We demonstrate the effectiveness of our “treewidth-friendly” method on several real-life datasets. Importantly, we also show that by using global operators, we are able to achieve better generalization even when learning Bayesian networks of unbounded treewidth. 1
5 0.99096143 117 nips-2008-Learning Taxonomies by Dependence Maximization
Author: Matthew Blaschko, Arthur Gretton
Abstract: We introduce a family of unsupervised algorithms, numerical taxonomy clustering, to simultaneously cluster data, and to learn a taxonomy that encodes the relationship between the clusters. The algorithms work by maximizing the dependence between the taxonomy and the original data. The resulting taxonomy is a more informative visualization of complex data than simple clustering; in addition, taking into account the relations between different clusters is shown to substantially improve the quality of the clustering, when compared with state-of-the-art algorithms in the literature (both spectral clustering and a previous dependence maximization approach). We demonstrate our algorithm on image and text data. 1
6 0.9903909 110 nips-2008-Kernel-ARMA for Hand Tracking and Brain-Machine interfacing During 3D Motor Control
7 0.9902873 174 nips-2008-Overlaying classifiers: a practical approach for optimal ranking
8 0.98587722 126 nips-2008-Localized Sliced Inverse Regression
9 0.98310971 222 nips-2008-Stress, noradrenaline, and realistic prediction of mouse behaviour using reinforcement learning
10 0.97366422 77 nips-2008-Evaluating probabilities under high-dimensional latent variable models
11 0.96703148 159 nips-2008-On Bootstrapping the ROC Curve
12 0.9520784 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields
13 0.94480455 223 nips-2008-Structure Learning in Human Sequential Decision-Making
14 0.94456869 101 nips-2008-Human Active Learning
15 0.94388765 211 nips-2008-Simple Local Models for Complex Dynamical Systems
16 0.94304049 107 nips-2008-Influence of graph construction on graph-based clustering measures
17 0.93972713 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation
18 0.93810171 112 nips-2008-Kernel Measures of Independence for non-iid Data
19 0.93808961 132 nips-2008-Measures of Clustering Quality: A Working Set of Axioms for Clustering
20 0.93486196 34 nips-2008-Bayesian Network Score Approximation using a Metagraph Kernel