nips nips2007 nips2007-75 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1
Reference: text
sentIndex sentText sentNum sentScore
1 We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. [sent-19, score-0.294]
2 We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. [sent-20, score-0.339]
3 [6] identify dynamic Bayesian inference as the problem of performing Bayesian inference on a dynamically changing graphical model. [sent-26, score-0.352]
4 Dynamic changes to the graphical model may include changes to the observed evidence, to the structure of the graph itself (such as edge or node insertions/deletions), and changes to the conditional relationships among variables. [sent-27, score-0.769]
5 As an example of where adaptive inference can be effective, consider a computational biology application that requires computing the state of the active site in a protein as the user modifies the protein. [sent-43, score-0.362]
6 In this application, we can represent the protein with a graphical model and use marginal computations to determine the state of the active site. [sent-46, score-0.375]
7 We reflect the modifications to the protein by updating the graphical model representation and performing marginal queries to obtain the state of the active site. [sent-47, score-0.491]
8 Our approach achieves logarithmic update and query times by mapping an arbitrary tree-structured graphical model into a balanced representation that we call a cluster tree (Sec. [sent-50, score-1.029]
9 We perform an O(n)-time preprocessing step to compute the cluster tree using a technique known as tree contraction [13]. [sent-52, score-1.191]
10 We ensure that for an input network with n nodes, the cluster tree has an expected depth of O(log n) and expected size O(n). [sent-53, score-0.907]
11 We show that the nodes in the cluster tree can be tagged with partial computations (corresponding to marginalizations of subtrees of the input network) in a way that allows marginal computations and changes to the network to be performed in O(log n) expected time. [sent-54, score-1.304]
12 In particular, we can still support changes to the parameters of the distribution (evidence and conditional relationships), although changes to the underlying graph structure become more difficult. [sent-59, score-0.357]
13 Additionally, a number of more sophisticated graphical models require efficient inference over trees at their core, including learning mixtures of trees [12] and tree-reparameterized max-product [15]. [sent-60, score-0.328]
14 It is important to contrast our notion of adapting to dynamic updates to the graphical model (due to changes in the evidence, or alterations of the structure and distribution) with the potentially more general definition of dynamic Bayes’ nets (DBNs) [14]. [sent-63, score-0.487]
15 Our work builds on previous work by Delcher, Grove, Kasif and Pearl [6]; they give an algorithm to update Bayesian networks dynamically as the observed variables in the network change and compute belief queries of hidden variables in logarithmic time. [sent-67, score-0.293]
16 The key difference between their work and ours is that their algorithm only supports updates to observed evidence, and does not support dynamic changes to the graph structure. [sent-68, score-0.386]
17 In many applications it is important to consider the effect of changes to conditional relationships between variables; for example, to study protein structure (see Sec. [sent-71, score-0.316]
18 Another difference is the use of tree contraction: they use tree contractions to answer queries and perform updates. [sent-75, score-0.779]
19 We use tree contractions to construct a cluster tree, which we then use to perform queries and all other updates (except for insertions/deletions). [sent-76, score-1.115]
20 Our approach to clustering factor graphs using tree contractions is based on previous results that show that tree contractions can be updated in expected logarithmic time under certain dynamic changes by using a general-purpose change-propagation algorithm [2]. [sent-78, score-1.112]
21 We write the operation of marginalizing over xi as ∑_{xi}, and let Xj represent an index-ordered subset of variables and f(Xj) a function defined over those variables, so that, for example, if Xj = {x2, x3, x5}, then f(Xj) = f(x2, x3, x5). [sent-89, score-0.373]
22 A factor graph [10] is one type of graphical model, similar to a Bayes’ net [14] or Markov random field [5], used to represent the factorization structure of a function g(x1, ..., xn). [sent-95, score-0.345]
23 In particular, suppose that g decomposes into a product of simpler functions, g(X) = ∏_j fj(Xj), for some collection of real-valued functions fj, called factors, whose arguments are (index-ordered) sets Xj ⊆ X. [sent-99, score-0.495]
24 A factor graph consists of a graph-theoretic abstraction of g’s factorization, with vertices of the graph representing variables xi and factors fj . [sent-100, score-0.751]
25 Because of the close correspondence between these quantities, we abuse notation slightly and use xi to indicate both the variable and its associated vertex, and fj to indicate both the factor and its vertex. [sent-101, score-0.454]
26 A factor graph is a bipartite graph G = (X ∪ F, E) where X = {x1, x2, ..., xn} are the variable vertices and F = {f1, ..., fm} are the factor vertices. [sent-104, score-0.324]
27 A factor tree is a factor graph G where G is a tree. [sent-111, score-0.62]
28 The graph G represents the function g(X) = ∏_j fj(Xj) if, for each factor fj, the arguments of fj are its neighbors in G, i.e., Xj = N(fj). [sent-113, score-0.98]
29 Other types of graphical models, such as Bayes’ nets [14], can be easily converted into a factor graph representation. [sent-116, score-0.313]
30 When the Bayes’ net is a polytree (singly connected directed acyclic graph), the resulting factor graph is a factor tree. [sent-117, score-0.333]
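To make this representation concrete, here is a minimal sketch (not from the paper) of a factor tree stored as a bipartite adjacency structure; the class and method names (FactorTree, add_factor, evaluate) are our own illustrative choices.

```python
class FactorTree:
    """Bipartite factor graph G = (X u F, E); assumed to be tree-structured."""

    def __init__(self):
        self.domain = {}    # variable name -> list of possible values
        self.factors = {}   # factor name -> (tuple of argument variables, table dict)
        self.adj = {}       # vertex name -> set of neighboring vertex names

    def add_variable(self, x, values):
        self.domain[x] = list(values)
        self.adj.setdefault(x, set())

    def add_factor(self, name, args, table):
        # table maps a tuple of values (one per argument, in order) to a nonnegative real
        self.factors[name] = (tuple(args), dict(table))
        self.adj.setdefault(name, set())
        for x in args:  # edges connect a factor vertex to each of its argument variables
            self.adj[name].add(x)
            self.adj[x].add(name)

    def evaluate(self, assignment):
        """Evaluate g(X) = prod_j fj(Xj) at a complete assignment {variable: value}."""
        g = 1.0
        for args, table in self.factors.values():
            g *= table[tuple(assignment[x] for x in args)]
        return g
```

For example, G = FactorTree(); G.add_variable('x1', [0, 1]); G.add_variable('x2', [0, 1]); G.add_factor('f1', ('x1', 'x2'), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}) builds a small two-variable factor tree.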
31 For every pair of neighboring vertices (xi, fj) ∈ E, the vertex xi sends a message µ_{xi→fj} as soon as it receives the messages from all of its neighbors except for fj, and similarly for the message from fj to xi. [sent-122, score-1.285]
32 The messages between these vertices take the form of a real-valued function of the variable xi ; for discrete-valued xi this can be represented as a vector of length |Ai |. [sent-123, score-0.383]
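For reference, a single sum-product message on the FactorTree sketch above could be computed as follows; this is the standard recursion that the paper's cluster-tree method avoids recomputing from scratch, and the function name and memoization are our own.

```python
from itertools import product

def message(G, src, dst, memo=None):
    """Sum-product message from vertex src to neighboring vertex dst in a FactorTree,
    returned as a dict over the values of the variable shared by the (xi, fj) edge."""
    memo = {} if memo is None else memo
    if (src, dst) in memo:
        return memo[(src, dst)]
    var = src if src in G.domain else dst
    if src in G.domain:
        # variable -> factor: product of messages from all other neighboring factors
        out = {v: 1.0 for v in G.domain[var]}
        for nb in G.adj[src]:
            if nb != dst:
                m = message(G, nb, src, memo)
                for v in out:
                    out[v] *= m[v]
    else:
        # factor -> variable: multiply incoming messages into the factor table, sum out the rest
        args, table = G.factors[src]
        incoming = {x: message(G, x, src, memo) for x in args if x != var}
        out = {v: 0.0 for v in G.domain[var]}
        for combo in product(*[G.domain[x] for x in args]):
            asg = dict(zip(args, combo))
            term = table[combo]
            for x, m in incoming.items():
                term *= m[asg[x]]
            out[asg[var]] += term
    memo[(src, dst)] = out
    return out
```

An (unnormalized) marginal at xi is then the entrywise product of the messages from all of xi's neighboring factors.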
33 3 Constructing the Cluster Tree In this section, we describe an algorithm for constructing a balanced representation of the input graphical model that we call a cluster tree. [sent-136, score-0.635]
34 Given the input graphical model, we first apply a clustering algorithm that hierarchically clusters the graphical model, and then apply a labeling algorithm that labels the clusters with cluster functions that can be used to compute marginal queries. [sent-137, score-1.077]
35 Given a factor graph as input, we first tag each node v with a unary cluster that consists of v and each edge (u, v) with a binary cluster that consists of the edge (u, v). [sent-139, score-1.737]
36 We then cluster the tree hierarchically by applying the rake, compress, and finalize operations. [sent-140, score-0.828]
37 When applied to a leaf node v with neighbor u, the rake operation deletes v and the edge (u, v), and forms a unary cluster by combining the clusters that tag either v or (u, v); u is tagged with the resulting cluster. [sent-141, score-1.388]
38 When applied to a degree-two node v with neighbors u and w, a compress operation deletes v and the edges (u, v) and (v, w), inserts the edge (u, w), and forms a binary cluster by combining the clusters which tag the deleted node and edges; (u, w) is then tagged with the resulting cluster. [sent-142, score-1.601]
39 A finalize operation is applied when the tree consists of a single node (when no edges remain); it constructs a final cluster that consists of all the clusters with which the final node is tagged. [sent-143, score-1.362]
40 We cluster a tree T by applying rake and compress operations in rounds. [sent-144, score-1.193]
41 Rounds are applied until no more edges remain; each round consists of the following two steps: (1) apply the rake operation to each leaf; (2) apply the compress operation to an independent set of degree-two nodes. [sent-145, score-0.67]
42 We choose a random independent set: we flip a coin for each node in each round and apply compress to a degree-two node only if it flips heads and both of its neighbors flip tails. [sent-146, score-0.595]
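A minimal sketch of these randomized rake-and-compress rounds on an undirected tree is shown below; it records only which vertex or edge each deleted vertex's cluster is tagged to, deferring the cluster functions to the labeling pass described later. The function name and bookkeeping are our own simplification.

```python
import random

def cluster_rounds(adj):
    """adj: dict vertex -> set of neighbors describing a tree (vertex labels orderable,
    e.g. strings). Returns a log of (round, operation, deleted vertex, tagged target)
    records produced by the rake/compress/finalize rounds."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    log, rnd = [], 0
    while sum(len(nbrs) for nbrs in adj.values()) > 0:   # edges remain
        rnd += 1
        # (1) rake every leaf; its cluster is tagged onto the surviving neighbor
        for v in [v for v, nbrs in adj.items() if len(nbrs) == 1]:
            if len(adj[v]) != 1:        # its neighbor may already have been raked this round
                continue
            (u,) = adj[v]
            adj[u].discard(v)
            del adj[v]
            log.append((rnd, "rake", v, u))
        # (2) compress a random independent set of degree-two vertices
        flips = {v: random.random() < 0.5 for v in adj}   # heads = True
        for v in list(adj):
            if len(adj[v]) == 2:
                u, w = sorted(adj[v])
                if flips[v] and not flips[u] and not flips[w]:
                    adj[u].discard(v); adj[w].discard(v)
                    adj[u].add(w); adj[w].add(u)
                    del adj[v]
                    log.append((rnd, "compress", v, (u, w)))
    log.append((rnd, "finalize", next(iter(adj)), None))
    return log
```

Every round removes all current leaves and, in expectation, a constant fraction of the degree-two vertices, which is the standard tree-contraction argument behind the expected O(log n) number of rounds and cluster-tree depth.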
43 Figure 2: A cluster tree. [sent-149, score-0.541]
44 The leaves of the cluster tree correspond to the nodes and the edges of the factor graph. [sent-157, score-1.142]
45 Each internal node v̄ corresponds to a unary or a binary cluster formed by deleting v. [sent-158, score-0.829]
46 The children of an internal node are the edges and the nodes deleted during the operation that forms the cluster. [sent-159, score-0.662]
47 For example, the cluster f̄1 is formed by the rake operation applied to f1 in round 1. [sent-160, score-0.893]
48 The children of f̄1 are the node f1 and the edge (f1, x1), which are deleted during that operation. [sent-161, score-0.399]
49 After building the cluster tree, we compute cluster functions along with a notion of orientation for neighboring clusters in a second pass, which we call labeling. [sent-163, score-1.157]
50 The cluster function at a node v̄ in the tree is computed recursively using the cluster functions at v̄'s child clusters, which we denote Sv̄ = {v̄1, ..., v̄k}. [sent-164, score-1.514]
51 Intuitively, each cluster function corresponds to a partial marginalization of the factors contained in the cluster v̄. [sent-168, score-1.151]
52 Since each cluster function is defined over a subset of the variables in the original graph, we require some additional notation to represent these sets. [sent-169, score-0.541]
53 Specifically, for a cluster v̄, let A(v̄) be the arguments of its cluster function, and let V(v̄) be the set of all arguments of its children, V(v̄) = ∪_i A(v̄i). [sent-170, score-1.204]
54 In a slight abuse of notation, we let A(v) be the arguments of the node v in the original graph, so that if v is a variable node A(v) = v and if v is a factor node A(v) = N(v). [sent-171, score-0.61]
55 The cluster functions cv̄(·) and their arguments are then defined recursively, as follows. [sent-172, score-0.691]
56 For the base case, the leaf nodes of the cluster tree correspond to nodes v in the original graph, and we define cv using the original variables and factors. [sent-173, score-1.22]
57 If v is a factor node fj, we take cv(A(v)) = fj(Xj), and if v is a variable node xi, A(v) = xi and cv = 1. [sent-174, score-1.262]
58 For nodes of the cluster tree corresponding to edges (u, v) of the original graph, we simply take A(u, v) = ∅ and cu,v = 1. [sent-175, score-1.028]
59 The cluster function at an internal node of the cluster tree is then given by combining the cluster functions of its children and marginalizing over as many variables as possible. [sent-176, score-2.252]
60 If v̄ is a binary cluster (u, w), that is, at v's removal it had two neighbors u and w, then cv̄ is given by cv̄(A(v̄)) = ∑_{V(v̄)\A(v̄)} ∏_{v̄i ∈ Sv̄} cv̄i(A(v̄i)), where the arguments A(v̄) = V(v̄) ∩ (A(u) ∪ A(w)). [sent-178, score-0.916]
61 For a unary cluster v̄, where v had a single neighbor u at its removal, cv̄(·) is calculated in the same way with A(w) = ∅. [sent-179, score-0.723]
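The labeling rule can be sketched as a single table operation, assuming discrete variables with explicit tables: multiply the children's cluster functions over V(v̄) and sum out every variable not in A(v̄). The function name and table layout are ours; a real implementation would exploit the bounded argument sets established in Theorem 1.

```python
from itertools import product

def combine_and_marginalize(child_tables, keep_vars, domain):
    """child_tables: list of (args, table) cluster functions of the children of a cluster.
    keep_vars: the argument set A(v-bar) of the new cluster.
    domain: dict variable -> list of values. Returns (args, table) for the new cluster."""
    all_vars = sorted({x for args, _ in child_tables for x in args})   # V(v-bar)
    kept = tuple(x for x in all_vars if x in keep_vars)                # A(v-bar)
    out = {}
    for combo in product(*[domain[x] for x in all_vars]):
        asg = dict(zip(all_vars, combo))
        val = 1.0
        for args, table in child_tables:
            val *= table[tuple(asg[x] for x in args)]
        key = tuple(asg[x] for x in kept)
        out[key] = out.get(key, 0.0) + val      # sum out everything in V(v-bar) \ A(v-bar)
    return kept, out
```

For the binary-cluster case above, keep_vars would be V(v̄) ∩ (A(u) ∪ A(w)); for a unary cluster, A(w) = ∅.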
62 We also compute an orientation for each cluster’s neighbors based on their proximity to the cluster tree’s root. [sent-180, score-0.59]
63 For a unary cluster v̄ with parent ū in the cluster tree, we define in(v̄) = ū. [sent-182, score-1.175]
64 For a binary cluster v̄ with neighbors u, w at v's removal, define in(v̄) = w̄ and out(v̄) = ū if w̄ = in(ū); otherwise in(v̄) = ū and out(v̄) = w̄. [sent-183, score-0.59]
65 We now describe the efficiency of our clustering and labeling algorithms and show that the resulting cluster tree is linear in the size of the input factor graph. [sent-184, score-1.069]
66 A factor tree of n nodes with maximum degree k can be clustered and labeled in expected O(d^{k+2} n) time, where d is the domain size of each variable in the factor tree. [sent-186, score-0.639]
67 The resulting cluster tree has exactly 2n − 1 leaves and n internal clusters (nodes), and expected O(log n) depth, where the expectation is taken over internal randomization (over the coin flips). [sent-187, score-1.04]
68 Furthermore, the cluster tree has the following properties: (1) each cluster has at most k + 1 children, and (2) if v̄ = (u, w) is a binary cluster, then ū and w̄ are ancestors of v̄, and one of them is the parent of v̄. [sent-188, score-1.407]
69 The bound on the number of nodes holds because the leaves of the cluster tree correspond to the n − 1 edges and n nodes of the factor graph. [sent-192, score-1.266]
70 To see that each cluster has at most k + 1 children, note that a rake or compress operation deletes one node and at most k edges incident on that node. [sent-193, score-1.232]
71 Every edge appearing in any level of the tree contraction algorithm is represented as a binary cluster v̄ = (u, w) in the cluster tree. [sent-194, score-1.525]
72 To bound the time for computing a cluster function, observe that A(v̄) is always a singleton set if v̄ is a unary cluster, and A(v̄) has at most two variables if v̄ is a binary cluster. [sent-199, score-0.634]
73 The time to compute the cluster function at v̄ is bounded by O(|Sv̄| d^{|V(v̄)|}), where Sv̄ are the children of v̄. [sent-202, score-0.654]
74 Since each cluster can appear only once as a child, ∑_v̄ |Sv̄| is O(n), and thus the labeling step takes O(d^{k+2} n) time. [sent-203, score-0.603]
75 4 Queries and Dynamic Changes We give algorithms for computing marginal queries on the cluster trees and restructuring the cluster tree with respect to changes in the underlying graphical model. [sent-205, score-1.887]
76 We answer marginal queries at a vertex v of the graphical model by using the cluster tree. [sent-208, score-1.05]
77 At a high level, the idea is to find the leaf of the cluster tree corresponding to v and compute the messages along the path from the root of the cluster tree to v. [sent-209, score-1.797]
78 Messages are computed only at the ancestors of the query node xi and downward along the path to xi; there are at most O(log n) nodes in this path by Theorem 1. [sent-214, score-0.62]
79 Given a factor graph and its cluster tree, we can change the function of a factor and update the cluster tree by starting at the leaf of the cluster tree that corresponds to the factor and relabeling all the clusters on the path to the root. [sent-217, score-2.838]
80 Updating these labels suffices, because the label of a cluster is a function of its children only. [sent-218, score-0.654]
81 Since relabeling a cluster takes O(k d^{k+2}) time and the cluster tree has expected O(log n) depth, any update requires O(k d^{k+2} log n) time. [sent-219, score-1.402]
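Assuming the cluster tree is stored with explicit parent and child pointers and per-node cluster-function tables, a factor update could be sketched as below; it reuses the combine_and_marginalize routine sketched earlier and relabels only the ancestors of the changed leaf. The dictionary layout is hypothetical.

```python
def update_factor(cluster_tree, leaf, new_table, domain):
    """cluster_tree: dict node -> {'parent': node or None, 'children': list of nodes,
    'args': tuple of variables, 'table': dict}, as built during clustering and labeling.
    Replaces the factor table stored at `leaf` and relabels its ancestors up to the root."""
    cluster_tree[leaf]['table'] = dict(new_table)
    node = cluster_tree[leaf]['parent']
    while node is not None:                     # expected O(log n) ancestors
        entry = cluster_tree[node]
        child_tables = [(cluster_tree[c]['args'], cluster_tree[c]['table'])
                        for c in entry['children']]
        # the argument set A(v-bar) is unchanged; only the table values are recomputed
        entry['args'], entry['table'] = combine_and_marginalize(
            child_tables, set(entry['args']), domain)
        node = entry['parent']
    return cluster_tree
```

Marginal queries traverse a similar O(log n) path, but from the root down to the query variable's leaf, using the stored in/out orientation to decide which cluster functions to combine.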
82 To allow changes to the factor graph itself by insertion/deletion of edges, we maintain a forest of factor trees and the corresponding cluster trees (obtained by clustering the trees one by one). [sent-220, score-1.351]
83 We also maintain the sequence of operations used to construct each cluster tree, i.e., the rake and compress operations performed in each round. [sent-221, score-0.605]
84 It turns out that both operations have only a limited effect on the sequence of clustering operations performed during construction, affecting only a constant number of nodes at each round of the process. [sent-226, score-0.424]
85 Using a general-purpose change propagation technique (detailed in previous work [2, 1]), the necessary alterations can be made to the cluster tree in expected O(log n) time. [sent-227, score-0.897]
86 Change propagation gives us a new cluster tree that corresponds to the cluster tree that we would have obtained by re-clustering from scratch, conditioned on the same internal randomization process. [sent-228, score-1.706]
87 In addition to changing the structure of the cluster tree via change propagation, we must also change the labeling information (cluster functions and orientation) of the affected nodes, which can be done using the same process described in Sec. [sent-229, score-1.021]
88 It is a property of the tree contraction process that all such affected clusters form a subtree of the cluster tree that includes the root. [sent-231, score-1.266]
89 Since change propagation affects an expected O(log n) clusters, and since each cluster can be labeled in O(k d^{k+2}) time, the new labels can be computed in O(k d^{k+2} log n) time. [sent-232, score-0.572]
90 For a factor forest F of n vertices with maximum degree k, and domain size d, the forest of cluster trees can be updated in expected O(k d^{k+2} log n) time under edge insertions/deletions, and changes to factors. [sent-235, score-1.052]
91 Fig. 3 shows the results of a simulation experiment in which we considered randomly generated factor trees between 100 and 1000 nodes, with each variable having 5^1, 5^2, or 5^3 states, so that each factor has size between 5^2 and 5^6. [sent-238, score-0.321]
92 These factor trees correspond roughly to the junction trees of models with between 200 and 6000 nodes, where each node has up to 5 states. [sent-239, score-0.677]
93 Our results show that the time required to build the cluster tree is comparable to one run of sum-product. [sent-240, score-0.828]
94 Furthermore, the query and update operations in the cluster tree incur relatively small constant factors in their asymptotic running time, and are between one and two orders of magnitude faster than recomputing from scratch. [sent-241, score-1.038]
95 These models are typically constructed by taking each node as an amino acid whose states represent its most common conformations, known as rotamers [7], and basing conditional probabilities on proximity and a physical energy function. [sent-244, score-0.348]
96 One such application is computational mutagenesis, in which functional amino acids in a protein structure are identified by examining systematic amino acid mutations in the protein structure. [sent-248, score-0.675]
97 Thus, our algorithm has the potential to substantially speed up computational studies that examine each of a large number of local changes to protein structure, such as in the study of protein flexibility and dynamics. [sent-256, score-0.407]
98 Applying our algorithm to the junction tree representation of a graph yields an inference algorithm that can handle updates on conditional distributions and observed evidence in general Bayesian networks. [sent-267, score-0.645]
99 Figure 3: Log-log plot of run time (sec) versus number of nodes for naive sum-product, building the cluster tree, computing queries, updating factors, and restructuring (adding and deleting edges). [sent-276, score-0.743]
100 Although building the cluster tree is slightly more expensive than sum-product, each subsequent update and query is between 10 and 100 times more efficient than recomputing from scratch. [sent-277, score-0.895]
wordName wordTfidf (topN-words)
[('cluster', 0.541), ('tree', 0.287), ('fj', 0.217), ('protein', 0.157), ('kdk', 0.152), ('rake', 0.152), ('compress', 0.149), ('node', 0.145), ('queries', 0.129), ('nodes', 0.124), ('xi', 0.123), ('factor', 0.114), ('children', 0.113), ('round', 0.107), ('graph', 0.105), ('acar', 0.095), ('umut', 0.095), ('graphical', 0.094), ('unary', 0.093), ('changes', 0.093), ('trees', 0.093), ('operation', 0.093), ('cv', 0.089), ('messages', 0.086), ('updates', 0.082), ('amino', 0.08), ('edge', 0.08), ('edges', 0.076), ('contractions', 0.076), ('delcher', 0.076), ('deletes', 0.076), ('nalize', 0.076), ('contraction', 0.076), ('clusters', 0.075), ('dynamic', 0.074), ('marginal', 0.071), ('vertex', 0.07), ('query', 0.067), ('message', 0.066), ('clustering', 0.065), ('operations', 0.064), ('labeling', 0.062), ('arguments', 0.061), ('sv', 0.061), ('deleted', 0.061), ('xj', 0.058), ('blelloch', 0.057), ('mettu', 0.057), ('leaf', 0.055), ('acids', 0.053), ('computations', 0.053), ('dynamically', 0.051), ('evidence', 0.051), ('acid', 0.051), ('vertices', 0.051), ('internal', 0.05), ('neighbors', 0.049), ('removal', 0.049), ('inference', 0.048), ('ips', 0.045), ('guy', 0.045), ('orders', 0.043), ('network', 0.042), ('logarithmic', 0.04), ('sent', 0.04), ('tagged', 0.04), ('forest', 0.04), ('updating', 0.04), ('alterations', 0.038), ('cvi', 0.038), ('distributive', 0.038), ('kasif', 0.038), ('mout', 0.038), ('mutagenesis', 0.038), ('raked', 0.038), ('restructuring', 0.038), ('rotameric', 0.038), ('rotamers', 0.038), ('sxi', 0.038), ('tag', 0.038), ('junction', 0.038), ('ancestors', 0.038), ('changing', 0.037), ('depth', 0.037), ('chicago', 0.036), ('factors', 0.036), ('marginalizing', 0.034), ('conditional', 0.034), ('mutations', 0.033), ('inserts', 0.033), ('cui', 0.033), ('jorge', 0.033), ('recompute', 0.033), ('relabeling', 0.033), ('rotamer', 0.033), ('marginalization', 0.033), ('marginals', 0.032), ('structure', 0.032), ('su', 0.032), ('change', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1
2 0.18548432 80 nips-2007-Ensemble Clustering using Semidefinite Programming
Author: Vikas Singh, Lopamudra Mukherjee, Jiming Peng, Jinhui Xu
Abstract: We consider the ensemble clustering problem where the task is to ‘aggregate’ multiple clustering solutions into a single consolidated clustering that maximizes the shared information among given clustering solutions. We obtain several new results for this problem. First, we note that the notion of agreement under such circumstances can be better captured using an agreement measure based on a 2D string encoding rather than voting strategy based methods proposed in literature. Using this generalization, we first derive a nonlinear optimization model to maximize the new agreement measure. We then show that our optimization problem can be transformed into a strict 0-1 Semidefinite Program (SDP) via novel convexification techniques which can subsequently be relaxed to a polynomial time solvable SDP. Our experiments indicate improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. We discuss evaluations on clustering and image segmentation databases. 1
3 0.17875537 61 nips-2007-Convex Clustering with Exemplar-Based Models
Author: Danial Lashkari, Polina Golland
Abstract: Clustering is often formulated as the maximum likelihood estimation of a mixture model that explains the data. The EM algorithm widely used to solve the resulting optimization problem is inherently a gradient-descent method and is sensitive to initialization. The resulting solution is a local optimum in the neighborhood of the initial guess. This sensitivity to initialization presents a significant challenge in clustering large data sets into many clusters. In this paper, we present a different approach to approximate mixture fitting for clustering. We introduce an exemplar-based likelihood function that approximates the exact likelihood. This formulation leads to a convex minimization problem and an efficient algorithm with guaranteed convergence to the globally optimal solution. The resulting clustering can be thought of as a probabilistic mapping of the data points to the set of exemplars that minimizes the average distance and the information-theoretic cost of mapping. We present experimental results illustrating the performance of our algorithm and its comparison with the conventional approach to mixture model clustering. 1
4 0.14403895 116 nips-2007-Learning the structure of manifolds using random projections
Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma
Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1
5 0.13705847 157 nips-2007-Privacy-Preserving Belief Propagation and Sampling
Author: Michael Kearns, Jinsong Tan, Jennifer Wortman
Abstract: We provide provably privacy-preserving versions of belief propagation, Gibbs sampling, and other local algorithms — distributed multiparty protocols in which each party or vertex learns only its final local value, and absolutely nothing else. 1
6 0.13207538 197 nips-2007-The Infinite Markov Model
7 0.12416242 92 nips-2007-Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations
8 0.12402464 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis
9 0.12071918 78 nips-2007-Efficient Principled Learning of Thin Junction Trees
10 0.11859813 46 nips-2007-Cluster Stability for Finite Samples
11 0.11373898 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs
12 0.11166268 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
13 0.10775231 143 nips-2007-Object Recognition by Scene Alignment
14 0.10145524 141 nips-2007-New Outer Bounds on the Marginal Polytope
15 0.094294816 120 nips-2007-Linear programming analysis of loopy belief propagation for weighted matching
16 0.093708336 187 nips-2007-Structured Learning with Approximate Inference
17 0.092900209 121 nips-2007-Local Algorithms for Approximate Inference in Minor-Excluded Graphs
18 0.091773853 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process
19 0.091383658 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents
20 0.091289528 128 nips-2007-Message Passing for Max-weight Independent Set
topicId topicWeight
[(0, -0.237), (1, 0.018), (2, -0.151), (3, -0.019), (4, -0.063), (5, -0.252), (6, 0.017), (7, 0.136), (8, 0.118), (9, -0.112), (10, -0.215), (11, 0.11), (12, 0.066), (13, -0.157), (14, -0.072), (15, 0.142), (16, 0.008), (17, -0.069), (18, -0.015), (19, 0.028), (20, 0.089), (21, 0.022), (22, 0.018), (23, 0.084), (24, 0.163), (25, -0.032), (26, -0.05), (27, -0.099), (28, -0.077), (29, 0.046), (30, 0.03), (31, 0.014), (32, -0.023), (33, -0.057), (34, -0.057), (35, -0.048), (36, -0.089), (37, -0.016), (38, 0.012), (39, 0.011), (40, 0.013), (41, 0.047), (42, 0.048), (43, -0.069), (44, 0.037), (45, 0.021), (46, 0.001), (47, -0.092), (48, -0.057), (49, 0.096)]
simIndex simValue paperId paperTitle
same-paper 1 0.98093927 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1
2 0.70411038 78 nips-2007-Efficient Principled Learning of Thin Junction Trees
Author: Anton Chechetka, Carlos Guestrin
Abstract: We present the first truly polynomial algorithm for PAC-learning the structure of bounded-treewidth junction trees – an attractive subclass of probabilistic graphical models that permits both the compact representation of probability distributions and efficient exact inference. For a constant treewidth, our algorithm has polynomial time and sample complexity. If a junction tree with sufficiently strong intraclique dependencies exists, we provide strong theoretical guarantees in terms of KL divergence of the result from the true distribution. We also present a lazy extension of our approach that leads to very significant speed ups in practice, and demonstrate the viability of our method empirically, on several real world datasets. One of our key new theoretical insights is a method for bounding the conditional mutual information of arbitrarily large sets of variables with only polynomially many mutual information computations on fixed-size subsets of variables, if the underlying distribution can be approximated by a bounded-treewidth junction tree. 1
3 0.62122947 61 nips-2007-Convex Clustering with Exemplar-Based Models
Author: Danial Lashkari, Polina Golland
Abstract: Clustering is often formulated as the maximum likelihood estimation of a mixture model that explains the data. The EM algorithm widely used to solve the resulting optimization problem is inherently a gradient-descent method and is sensitive to initialization. The resulting solution is a local optimum in the neighborhood of the initial guess. This sensitivity to initialization presents a significant challenge in clustering large data sets into many clusters. In this paper, we present a different approach to approximate mixture fitting for clustering. We introduce an exemplar-based likelihood function that approximates the exact likelihood. This formulation leads to a convex minimization problem and an efficient algorithm with guaranteed convergence to the globally optimal solution. The resulting clustering can be thought of as a probabilistic mapping of the data points to the set of exemplars that minimizes the average distance and the information-theoretic cost of mapping. We present experimental results illustrating the performance of our algorithm and its comparison with the conventional approach to mixture model clustering. 1
4 0.61617607 27 nips-2007-Anytime Induction of Cost-sensitive Trees
Author: Saher Esmeir, Shaul Markovitch
Abstract: Machine learning techniques are increasingly being used to produce a wide-range of classifiers for complex real-world applications that involve nonuniform testing costs and misclassification costs. As the complexity of these applications grows, the management of resources during the learning and classification processes becomes a challenging task. In this work we introduce ACT (Anytime Cost-sensitive Trees), a novel framework for operating in such environments. ACT is an anytime algorithm that allows trading computation time for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques ACT approximates for each candidate split the cost of the subtree under it and favors the one with a minimal cost. Due to its stochastic nature ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare the performance of ACT to that of the state of the art cost-sensitive tree learners. The results show that for most domains ACT produces trees of significantly lower costs. ACT is also shown to exhibit good anytime behavior with diminishing returns.
5 0.60327101 80 nips-2007-Ensemble Clustering using Semidefinite Programming
Author: Vikas Singh, Lopamudra Mukherjee, Jiming Peng, Jinhui Xu
Abstract: We consider the ensemble clustering problem where the task is to ‘aggregate’ multiple clustering solutions into a single consolidated clustering that maximizes the shared information among given clustering solutions. We obtain several new results for this problem. First, we note that the notion of agreement under such circumstances can be better captured using an agreement measure based on a 2D string encoding rather than voting strategy based methods proposed in literature. Using this generalization, we first derive a nonlinear optimization model to maximize the new agreement measure. We then show that our optimization problem can be transformed into a strict 0-1 Semidefinite Program (SDP) via novel convexification techniques which can subsequently be relaxed to a polynomial time solvable SDP. Our experiments indicate improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. We discuss evaluations on clustering and image segmentation databases. 1
6 0.5395273 116 nips-2007-Learning the structure of manifolds using random projections
7 0.53341061 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents
8 0.52846825 89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta
9 0.49841249 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis
10 0.49133238 121 nips-2007-Local Algorithms for Approximate Inference in Minor-Excluded Graphs
11 0.49038926 141 nips-2007-New Outer Bounds on the Marginal Polytope
12 0.47609419 157 nips-2007-Privacy-Preserving Belief Propagation and Sampling
13 0.46550003 46 nips-2007-Cluster Stability for Finite Samples
14 0.45828521 16 nips-2007-A learning framework for nearest neighbor search
15 0.45225814 92 nips-2007-Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations
16 0.4342711 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process
17 0.42192522 197 nips-2007-The Infinite Markov Model
18 0.41000602 64 nips-2007-Cooled and Relaxed Survey Propagation for MRFs
19 0.40551674 187 nips-2007-Structured Learning with Approximate Inference
20 0.4053753 128 nips-2007-Message Passing for Max-weight Independent Set
topicId topicWeight
[(5, 0.061), (13, 0.033), (16, 0.037), (18, 0.023), (21, 0.062), (31, 0.029), (34, 0.03), (35, 0.03), (47, 0.074), (49, 0.011), (72, 0.224), (83, 0.211), (85, 0.056), (87, 0.01), (90, 0.047)]
simIndex simValue paperId paperTitle
1 0.95927066 29 nips-2007-Automatic Generation of Social Tags for Music Recommendation
Author: Douglas Eck, Paul Lamere, Thierry Bertin-mahieux, Stephen Green
Abstract: Social tags are user-generated keywords associated with some resource on the Web. In the case of music, social tags have become an important component of “Web2.0” recommender systems, allowing users to generate playlists based on use-dependent terms such as chill or jogging that have been applied to particular songs. In this paper, we propose a method for predicting these social tags directly from MP3 files. Using a set of boosted classifiers, we map audio features onto social tags collected from the Web. The resulting automatic tags (or autotags) furnish information about music that is otherwise untagged or poorly tagged, allowing for insertion of previously unheard music into a social recommender. This avoids the ”cold-start problem” common in such systems. Autotags can also be used to smooth the tag space from which similarities and recommendations are made by providing a set of comparable baseline tags for all tracks in a recommender system. 1
2 0.88408148 98 nips-2007-Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion
Author: J. Z. Kolter, Pieter Abbeel, Andrew Y. Ng
Abstract: We consider apprenticeship learning—learning from expert demonstrations—in the setting of large, complex domains. Past work in apprenticeship learning requires that the expert demonstrate complete trajectories through the domain. However, in many problems even an expert has difficulty controlling the system, which makes this approach infeasible. For example, consider the task of teaching a quadruped robot to navigate over extreme terrain; demonstrating an optimal policy (i.e., an optimal set of foot locations over the entire terrain) is a highly non-trivial task, even for an expert. In this paper we propose a method for hierarchical apprenticeship learning, which allows the algorithm to accept isolated advice at different hierarchical levels of the control task. This type of advice is often feasible for experts to give, even if the expert is unable to demonstrate complete trajectories. This allows us to extend the apprenticeship learning paradigm to much larger, more challenging domains. In particular, in this paper we apply the hierarchical apprenticeship learning algorithm to the task of quadruped locomotion over extreme terrain, and achieve, to the best of our knowledge, results superior to any previously published work. 1
same-paper 3 0.85196513 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1
4 0.73345292 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels
Author: Ronny Luss, Alexandre D'aspremont
Abstract: In this paper, we propose a method for support vector machine classification using indefinite kernels. Instead of directly minimizing or stabilizing a nonconvex loss function, our method simultaneously finds the support vectors and a proxy kernel matrix used in computing the loss. This can be interpreted as a robust classification problem where the indefinite kernel matrix is treated as a noisy observation of the true positive semidefinite kernel. Our formulation keeps the problem convex and relatively large problems can be solved efficiently using the analytic center cutting plane method. We compare the performance of our technique with other methods on several data sets.
5 0.73342395 187 nips-2007-Structured Learning with Approximate Inference
Author: Alex Kulesza, Fernando Pereira
Abstract: In many structured prediction problems, the highest-scoring labeling is hard to compute exactly, leading to the use of approximate inference methods. However, when inference is used in a learning algorithm, a good approximation of the score may not be sufficient. We show in particular that learning can fail even with an approximate inference method with rigorous approximation guarantees. There are two reasons for this. First, approximate methods can effectively reduce the expressivity of an underlying model by making it impossible to choose parameters that reliably give good predictions. Second, approximations can respond to parameter changes in such a way that standard learning algorithms are misled. In contrast, we give two positive results in the form of learning bounds for the use of LP-relaxed inference in structured perceptron and empirical risk minimization settings. We argue that without understanding combinations of inference and learning, such as these, that are appropriately compatible, learning performance under approximate inference cannot be guaranteed. 1
6 0.73217452 63 nips-2007-Convex Relaxations of Latent Variable Training
7 0.73099571 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
8 0.73037469 141 nips-2007-New Outer Bounds on the Marginal Polytope
9 0.72911853 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering
10 0.72898227 120 nips-2007-Linear programming analysis of loopy belief propagation for weighted matching
11 0.72793549 116 nips-2007-Learning the structure of manifolds using random projections
12 0.72517842 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine
13 0.72516072 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations
14 0.72378862 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs
15 0.7237438 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
16 0.72265726 115 nips-2007-Learning the 2-D Topology of Images
17 0.72239876 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes
18 0.71991503 38 nips-2007-Boosting Algorithms for Maximizing the Soft Margin
19 0.71987867 128 nips-2007-Message Passing for Max-weight Independent Set
20 0.7196539 40 nips-2007-Bundle Methods for Machine Learning