nips nips2007 nips2007-75 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1
Reference: text
sentIndex sentText sentNum sentScore
1 We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. [sent-19, score-0.294]
2 We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. [sent-20, score-0.339]
3 [6] identify dynamic Bayesian inference as the problem of performing Bayesian inference on a dynamically changing graphical model. [sent-26, score-0.352]
4 Dynamic changes to the graphical model may include changes to the observed evidence, to the structure of the graph itself (such as edge or node insertions/deletions), and changes to the conditional relationships among variables. [sent-27, score-0.769]
5 As an example of where adaptive inference can be effective, consider a computational biology application that requires computing the state of the active site in a protein as the user modifies the protein. [sent-43, score-0.362]
6 In this application, we can represent the protein with a graphical model and use marginal computations to determine the state of the active site. [sent-46, score-0.375]
7 We reflect the modifications to the protein by updating the graphical model representation and performing marginal queries to obtain the state of the active site. [sent-47, score-0.491]
8 Our approach achieves logarithmic update and query times by mapping an arbitrary tree-structured graphical model into a balanced representation that we call a cluster tree (Sec. [sent-50, score-1.029]
9 We perform an O(n)-time preprocessing step to compute the cluster tree using a technique known as tree contraction [13]. [sent-52, score-1.191]
10 We ensure that for an input network with n nodes, the cluster tree has an expected depth of O(log n) and expected size O(n). [sent-53, score-0.907]
11 We show that the nodes in the cluster tree can be tagged with partial computations (corresponding to marginalizations of subtrees of the input network) in a way that allows marginal computations and changes to the network to be performed in O(log n) expected time. [sent-54, score-1.304]
12 In particular, we can still support changes to the parameters of the distribution (evidence and conditional relationships), although changes to the underlying graph structure become more difficult. [sent-59, score-0.357]
13 Additionally, a number of more sophisticated graphical models require efficient inference over trees at their core, including learning mixtures of trees [12] and tree-reparameterized max-product [15]. [sent-60, score-0.328]
14 It is important to contrast our notion of adapting to dynamic updates to the graphical model (due to changes in the evidence, or alterations of the structure and distribution) with the potentially more general definition of dynamic Bayes’ nets (DBNs) [14]. [sent-63, score-0.487]
15 Our work builds on previous work by Delcher, Grove, Kasif and Pearl [6]; they give an algorithm to update Bayesian networks dynamically as the observed variables in the network change and compute belief queries of hidden variables in logarithmic time. [sent-67, score-0.293]
16 The key difference between their work and ours is that their algorithm only supports updates to observed evidence, and does not support dynamic changes to the graph structure. [sent-68, score-0.386]
17 In many applications it is important to consider the effect of changes to conditional relationships between variables; for example, to study protein structure (see Sec. [sent-71, score-0.316]
18 Another difference is the use of tree contraction: they use tree contractions to answer queries and perform updates. [sent-75, score-0.779]
19 We use tree contractions to construct a cluster tree, which we then use to perform queries and all other updates (except for insertions/deletions). [sent-76, score-1.115]
20 Our approach to clustering factor graphs using tree contractions is based on previous results that show that tree contractions can be updated in expected logarithmic time under certain dynamic changes by using a general-purpose change-propagation algorithm [2]. [sent-78, score-1.112]
21 We write the operation of marginalizing over xi as ∑_{xi}, and let Xj represent an index-ordered subset of variables and f(Xj) a function defined over those variables, so that, for example, if Xj = {x2, x3, x5}, then f(Xj) = f(x2, x3, x5). [sent-89, score-0.373]
22 A factor graph [10] is one type of graphical model, similar to a Bayes’ net [14] or Markov random field [5], used to represent the factorization structure of a function g(x1, ..., xn). [sent-95, score-0.345]
23 In particular, suppose that g decomposes into a product of simpler functions, g(X) = ∏_j fj(Xj), for some collection of real-valued functions fj, called factors, whose arguments are (index-ordered) sets Xj ⊆ X. [sent-99, score-0.495]
24 A factor graph consists of a graph-theoretic abstraction of g’s factorization, with vertices of the graph representing variables xi and factors fj . [sent-100, score-0.751]
25 Because of the close correspondence between these quantities, we abuse notation slightly and use xi to indicate both the variable and its associated vertex, and fj to indicate both the factor and its vertex. [sent-101, score-0.454]
26 A factor graph is a bipartite graph G = (X ∪ F, E) where X = {x1, x2, ..., xn} are the variable vertices and F = {f1, ..., fm} are the factor vertices. [sent-104, score-0.324]
27 A factor tree is a factor graph G where G is a tree. [sent-111, score-0.62]
28 The graph G represents the function g(X) = ∏_j fj(Xj) if, for each factor fj, the arguments of fj are its neighbors in G, i.e., Xj = N(fj). [sent-113, score-0.98]
29 Other types of graphical models, such as Bayes’ nets [14], can be easily converted into a factor graph representation. [sent-116, score-0.313]
30 When the Bayes’ net is a polytree (singly connected directed acyclic graph), the resulting factor graph is a factor tree. [sent-117, score-0.333]
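To make this representation concrete, here is a minimal sketch (not from the paper) of a factor tree stored as a bipartite adjacency structure; the class and method names (FactorTree, add_factor, evaluate) are our own illustrative choices.

```python
class FactorTree:
    """Bipartite factor graph G = (X u F, E); assumed to be tree-structured."""

    def __init__(self):
        self.domain = {}    # variable name -> list of possible values
        self.factors = {}   # factor name -> (tuple of argument variables, table dict)
        self.adj = {}       # vertex name -> set of neighboring vertex names

    def add_variable(self, x, values):
        self.domain[x] = list(values)
        self.adj.setdefault(x, set())

    def add_factor(self, name, args, table):
        # table maps a tuple of values (one per argument, in order) to a nonnegative real
        self.factors[name] = (tuple(args), dict(table))
        self.adj.setdefault(name, set())
        for x in args:  # edges connect a factor vertex to each of its argument variables
            self.adj[name].add(x)
            self.adj[x].add(name)

    def evaluate(self, assignment):
        """Evaluate g(X) = prod_j fj(Xj) at a complete assignment {variable: value}."""
        g = 1.0
        for args, table in self.factors.values():
            g *= table[tuple(assignment[x] for x in args)]
        return g
```

For example, G = FactorTree(); G.add_variable('x1', [0, 1]); G.add_variable('x2', [0, 1]); G.add_factor('f1', ('x1', 'x2'), {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.8}) builds a small two-variable factor tree.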
31 For every pair of neighboring vertices (xi, fj) ∈ E, the vertex xi sends a message µ_{xi→fj} as soon as it receives the messages from all of its neighbors except for fj, and similarly for the message from fj to xi. [sent-122, score-1.285]
32 The messages between these vertices take the form of a real-valued function of the variable xi ; for discrete-valued xi this can be represented as a vector of length |Ai |. [sent-123, score-0.383]
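For reference, a single sum-product message on the FactorTree sketch above could be computed as follows; this is the standard recursion that the paper's cluster-tree method avoids recomputing from scratch, and the function name and memoization are our own.

```python
from itertools import product

def message(G, src, dst, memo=None):
    """Sum-product message from vertex src to neighboring vertex dst in a FactorTree,
    returned as a dict over the values of the variable shared by the (xi, fj) edge."""
    memo = {} if memo is None else memo
    if (src, dst) in memo:
        return memo[(src, dst)]
    var = src if src in G.domain else dst
    if src in G.domain:
        # variable -> factor: product of messages from all other neighboring factors
        out = {v: 1.0 for v in G.domain[var]}
        for nb in G.adj[src]:
            if nb != dst:
                m = message(G, nb, src, memo)
                for v in out:
                    out[v] *= m[v]
    else:
        # factor -> variable: multiply incoming messages into the factor table, sum out the rest
        args, table = G.factors[src]
        incoming = {x: message(G, x, src, memo) for x in args if x != var}
        out = {v: 0.0 for v in G.domain[var]}
        for combo in product(*[G.domain[x] for x in args]):
            asg = dict(zip(args, combo))
            term = table[combo]
            for x, m in incoming.items():
                term *= m[asg[x]]
            out[asg[var]] += term
    memo[(src, dst)] = out
    return out
```

An (unnormalized) marginal at xi is then the entrywise product of the messages from all of xi's neighboring factors.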
33 3 Constructing the Cluster Tree In this section, we describe an algorithm for constructing a balanced representation of the input graphical model that we call a cluster tree. [sent-136, score-0.635]
34 Given the input graphical model, we first apply a clustering algorithm that hierarchically clusters the graphical model, and then apply a labeling algorithm that labels the clusters with cluster functions that can be used to compute marginal queries. [sent-137, score-1.077]
35 Given a factor graph as input, we first tag each node v with a unary cluster that consists of v and each edge (u, v) with a binary cluster that consists of the edge (u, v). [sent-139, score-1.737]
36 We then cluster the tree hierarchically by applying the rake, compress, and finalize operations. [sent-140, score-0.828]
37 When applied to a leaf node v with neighbor u, the rake operation deletes v and the edge (u, v), and forms a unary cluster by combining the clusters that tag either v or (u, v); u is tagged with the resulting cluster. [sent-141, score-1.388]
38 When applied to a degree-two node v with neighbors u and w, a compress operation deletes v and the edges (u, v) and (v, w), inserts the edge (u, w), and forms a binary cluster by combining the clusters which tag the deleted node and edges; (u, w) is then tagged with the resulting cluster. [sent-142, score-1.601]
39 A finalize operation is applied when the tree consists of a single node (when no edges remain); it constructs a final cluster that consists of all the clusters with which the final node is tagged. [sent-143, score-1.362]
40 We cluster a tree T by applying rake and compress operations in rounds. [sent-144, score-1.193]
41 Rounds are applied until no more edges remain; each round consists of the following two steps: (1) apply the rake operation to each leaf; (2) apply the compress operation to an independent set of degree-two nodes. [sent-145, score-0.67]
42 We choose a random independent set: we flip a coin for each node in each round and apply compress to a degree-two node only if it flips heads and both of its neighbors flip tails. [sent-146, score-0.595]
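A minimal sketch of these randomized rake-and-compress rounds on an undirected tree is shown below; it records only which vertex or edge each deleted vertex's cluster is tagged to, deferring the cluster functions to the labeling pass described later. The function name and bookkeeping are our own simplification.

```python
import random

def cluster_rounds(adj):
    """adj: dict vertex -> set of neighbors describing a tree (vertex labels orderable,
    e.g. strings). Returns a log of (round, operation, deleted vertex, tagged target)
    records produced by the rake/compress/finalize rounds."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}
    log, rnd = [], 0
    while sum(len(nbrs) for nbrs in adj.values()) > 0:   # edges remain
        rnd += 1
        # (1) rake every leaf; its cluster is tagged onto the surviving neighbor
        for v in [v for v, nbrs in adj.items() if len(nbrs) == 1]:
            if len(adj[v]) != 1:        # its neighbor may already have been raked this round
                continue
            (u,) = adj[v]
            adj[u].discard(v)
            del adj[v]
            log.append((rnd, "rake", v, u))
        # (2) compress a random independent set of degree-two vertices
        flips = {v: random.random() < 0.5 for v in adj}   # heads = True
        for v in list(adj):
            if len(adj[v]) == 2:
                u, w = sorted(adj[v])
                if flips[v] and not flips[u] and not flips[w]:
                    adj[u].discard(v); adj[w].discard(v)
                    adj[u].add(w); adj[w].add(u)
                    del adj[v]
                    log.append((rnd, "compress", v, (u, w)))
    log.append((rnd, "finalize", next(iter(adj)), None))
    return log
```

Every round removes all current leaves and, in expectation, a constant fraction of the degree-two vertices, which is the standard tree-contraction argument behind the expected O(log n) number of rounds and cluster-tree depth.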
43 Figure 2: A cluster tree. [sent-149, score-0.541]
44 The leaves of the cluster tree correspond to the nodes and the edges of the factor graph. [sent-157, score-1.142]
45 Each internal node v̄ corresponds to a unary or a binary cluster formed by deleting v. [sent-158, score-0.829]
46 The children of an internal node are the edges and the nodes deleted during the operation that forms the cluster. [sent-159, score-0.662]
47 For example, the cluster f̄1 is formed by the rake operation applied to f1 in round 1. [sent-160, score-0.893]
48 The children of f̄1 are the node f1 and the edge (f1, x1), which are deleted during that operation. [sent-161, score-0.399]
49 After building the cluster tree, we compute cluster functions along with a notion of orientation for neighboring clusters in a second pass, which we call labeling. [sent-163, score-1.157]
50 The cluster function at a node v̄ in the tree is computed recursively using the cluster functions at v̄'s child clusters, which we denote Sv̄ = {v̄1, ..., v̄k}. [sent-164, score-1.514]
51 Intuitively, each cluster function corresponds to a partial marginalization of the factors contained in the cluster v̄. [sent-168, score-1.151]
52 Since each cluster function is defined over a subset of the variables in the original graph, we require some additional notation to represent these sets. [sent-169, score-0.541]
53 Specifically, for a cluster v̄, let A(v̄) be the arguments of its cluster function, and let V(v̄) be the set of all arguments of its children, V(v̄) = ∪_i A(v̄i). [sent-170, score-1.204]
54 In a slight abuse of notation, we let A(v) be the arguments of the node v in the original graph, so that if v is a variable node A(v) = v and if v is a factor node A(v) = N(v). [sent-171, score-0.61]
55 The cluster functions cv̄(·) and their arguments are then defined recursively, as follows. [sent-172, score-0.691]
56 For the base case, the leaf nodes of the cluster tree correspond to nodes v in the original graph, and we define cv using the original variables and factors. [sent-173, score-1.22]
57 If v is a factor node fj, we take cv(A(v)) = fj(Xj), and if v is a variable node xi, A(v) = xi and cv = 1. [sent-174, score-1.262]
58 For nodes of the cluster tree corresponding to edges (u, v) of the original graph, we simply take A(u, v) = ∅ and cu,v = 1. [sent-175, score-1.028]
59 The cluster function at an internal node of the cluster tree is then given by combining the cluster functions of its children and marginalizing over as many variables as possible. [sent-176, score-2.252]
60 If v̄ is a binary cluster (u, w), that is, at v's removal it had two neighbors u and w, then cv̄ is given by cv̄(A(v̄)) = ∑_{V(v̄)\A(v̄)} ∏_{v̄i ∈ Sv̄} cv̄i(A(v̄i)), where the arguments A(v̄) = V(v̄) ∩ (A(u) ∪ A(w)). [sent-178, score-0.916]
61 For a unary cluster v̄, where v had a single neighbor u at its removal, cv̄(·) is calculated in the same way with A(w) = ∅. [sent-179, score-0.723]
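The labeling rule can be sketched as a single table operation, assuming discrete variables with explicit tables: multiply the children's cluster functions over V(v̄) and sum out every variable not in A(v̄). The function name and table layout are ours; a real implementation would exploit the bounded argument sets established in Theorem 1.

```python
from itertools import product

def combine_and_marginalize(child_tables, keep_vars, domain):
    """child_tables: list of (args, table) cluster functions of the children of a cluster.
    keep_vars: the argument set A(v-bar) of the new cluster.
    domain: dict variable -> list of values. Returns (args, table) for the new cluster."""
    all_vars = sorted({x for args, _ in child_tables for x in args})   # V(v-bar)
    kept = tuple(x for x in all_vars if x in keep_vars)                # A(v-bar)
    out = {}
    for combo in product(*[domain[x] for x in all_vars]):
        asg = dict(zip(all_vars, combo))
        val = 1.0
        for args, table in child_tables:
            val *= table[tuple(asg[x] for x in args)]
        key = tuple(asg[x] for x in kept)
        out[key] = out.get(key, 0.0) + val      # sum out everything in V(v-bar) \ A(v-bar)
    return kept, out
```

For the binary-cluster case above, keep_vars would be V(v̄) ∩ (A(u) ∪ A(w)); for a unary cluster, A(w) = ∅.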
62 We also compute an orientation for each cluster’s neighbors based on their proximity to the cluster tree’s root. [sent-180, score-0.59]
63 For a unary cluster v̄ with parent ū in the cluster tree, we define in(v̄) = ū. [sent-182, score-1.175]
64 For a binary cluster v̄ with neighbors u, w at v's removal, define in(v̄) = w̄ and out(v̄) = ū if w̄ = in(ū); otherwise in(v̄) = ū and out(v̄) = w̄. [sent-183, score-0.59]
65 We now describe the efficiency of our clustering and labeling algorithms and show that the resulting cluster tree is linear in the size of the input factor graph. [sent-184, score-1.069]
66 A factor tree of n nodes with maximum degree k can be clustered and labeled in expected O(d^{k+2} n) time, where d is the domain size of each variable in the factor tree. [sent-186, score-0.639]
67 The resulting cluster tree has exactly 2n − 1 leaves and n internal clusters (nodes), and expected O(log n) depth, where the expectation is taken over internal randomization (over the coin flips). [sent-187, score-1.04]
68 Furthermore, the cluster tree has the following properties: (1) each cluster has at most k + 1 children, and (2) if v̄ = (u, w) is a binary cluster, then ū and w̄ are ancestors of v̄, and one of them is the parent of v̄. [sent-188, score-1.407]
69 The bound on the number of nodes holds because the leaves of the cluster tree correspond to the n − 1 edges and n nodes of the factor graph. [sent-192, score-1.266]
70 To see that each cluster has at most k + 1 children, note that a rake or compress operation deletes one node and at most k edges incident on that node. [sent-193, score-1.232]
71 Every edge appearing in any level of the tree contraction algorithm is represented as a binary cluster v̄ = (u, w) in the cluster tree. [sent-194, score-1.525]
72 To bound the time for computing a cluster function, observe that A(v̄) is always a singleton set if v̄ is a unary cluster, and A(v̄) has at most two variables if v̄ is a binary cluster. [sent-199, score-0.634]
73 The time to compute the cluster function at v̄ is bounded by O(|Sv̄| d^{|V(v̄)|}), where Sv̄ are the children of v̄. [sent-202, score-0.654]
74 Since each cluster can appear only once as a child, ∑_v̄ |Sv̄| is O(n), and thus the labeling step takes O(d^{k+2} n) time. [sent-203, score-0.603]
75 4 Queries and Dynamic Changes We give algorithms for computing marginal queries on the cluster trees and restructuring the cluster tree with respect to changes in the underlying graphical model. [sent-205, score-1.887]
76 We answer marginal queries at a vertex v of the graphical model by using the cluster tree. [sent-208, score-1.05]
77 At a high level, the idea is to find the leaf of the cluster tree corresponding to v and compute the messages along the path from the root of the cluster tree to v. [sent-209, score-1.797]
78 Messages are computed only at the ancestors of the query node xi and downward along the path to xi; there are at most O(log n) nodes in this path by Theorem 1. [sent-214, score-0.62]
79 Given a factor graph and its cluster tree, we can change the function of a factor and update the cluster tree by starting at the leaf of the cluster tree that corresponds to the factor and relabeling all the clusters on the path to the root. [sent-217, score-2.838]
80 Updating these labels suffices, because the label of a cluster is a function of its children only. [sent-218, score-0.654]
81 Since relabeling a cluster takes O(k d^{k+2}) time and the cluster tree has expected O(log n) depth, any update requires O(k d^{k+2} log n) time. [sent-219, score-1.402]
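Assuming the cluster tree is stored with explicit parent and child pointers and per-node cluster-function tables, a factor update could be sketched as below; it reuses the combine_and_marginalize routine sketched earlier and relabels only the ancestors of the changed leaf. The dictionary layout is hypothetical.

```python
def update_factor(cluster_tree, leaf, new_table, domain):
    """cluster_tree: dict node -> {'parent': node or None, 'children': list of nodes,
    'args': tuple of variables, 'table': dict}, as built during clustering and labeling.
    Replaces the factor table stored at `leaf` and relabels its ancestors up to the root."""
    cluster_tree[leaf]['table'] = dict(new_table)
    node = cluster_tree[leaf]['parent']
    while node is not None:                     # expected O(log n) ancestors
        entry = cluster_tree[node]
        child_tables = [(cluster_tree[c]['args'], cluster_tree[c]['table'])
                        for c in entry['children']]
        # the argument set A(v-bar) is unchanged; only the table values are recomputed
        entry['args'], entry['table'] = combine_and_marginalize(
            child_tables, set(entry['args']), domain)
        node = entry['parent']
    return cluster_tree
```

Marginal queries traverse a similar O(log n) path, but from the root down to the query variable's leaf, using the stored in/out orientation to decide which cluster functions to combine.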
82 To allow changes to the factor graph itself by insertion/deletion of edges, we maintain a forest of factor trees and the corresponding cluster trees (obtained by clustering the trees one by one). [sent-220, score-1.351]
83 We also maintain the sequence of operations used to construct each cluster tree, i.e., the rake and compress operations performed in each round. [sent-221, score-0.605]
84 It turns out that both operations have only a limited effect on the sequence of clustering operations performed during construction, affecting only a constant number of nodes at each round of the process. [sent-226, score-0.424]
85 Using a general-purpose change propagation technique (detailed in previous work [2, 1]), the necessary alterations can be made to the cluster tree in expected O(log n) time. [sent-227, score-0.897]
86 Change propagation gives us a new cluster tree that corresponds to the cluster tree that we would have obtained by re-clustering from scratch, conditioned on the same internal randomization process. [sent-228, score-1.706]
87 In addition to changing the structure of the cluster tree via change propagation, we must also change the labeling information (cluster functions and orientation) of the affected nodes, which can be done using the same process described in Sec. [sent-229, score-1.021]
88 It is a property of the tree contraction process that all such affected clusters form a subtree of the cluster tree that includes the root. [sent-231, score-1.266]
89 Since change propagation affects an expected O(log n) clusters, and since each cluster can be labeled in O(k d^{k+2}) time, the new labels can be computed in O(k d^{k+2} log n) time. [sent-232, score-0.572]
90 For a factor forest F of n vertices with maximum degree k, and domain size d, the forest of cluster trees can be updated in expected O(k d^{k+2} log n) time under edge insertions/deletions, and changes to factors. [sent-235, score-1.052]
91 Fig. 3 shows the results of a simulation experiment in which we considered randomly generated factor trees between 100 and 1000 nodes, with each variable having 5^1, 5^2, or 5^3 states, so that each factor has size between 5^2 and 5^6. [sent-238, score-0.321]
92 These factor trees correspond roughly to the junction trees of models with between 200 and 6000 nodes, where each node has up to 5 states. [sent-239, score-0.677]
93 Our results show that the time required to build the cluster tree is comparable to one run of sum-product. [sent-240, score-0.828]
94 Furthermore, the query and update operations in the cluster tree incur relatively small constant factors in their asymptotic running time, and are between one and two orders of magnitude faster than recomputing from scratch. [sent-241, score-1.038]
95 These models are typically constructed by taking each node as an amino acid whose states represent its most common conformations, known as rotamers [7], and basing conditional probabilities on proximity and a physical energy function. [sent-244, score-0.348]
96 One such application is computational mutagenesis, in which functional amino acids in a protein structure are identified by examining systematic amino acid mutations in the protein structure. [sent-248, score-0.675]
97 Thus, our algorithm has the potential to substantially speed up computational studies that examine each of a large number of local changes to protein structure, such as in the study of protein flexibility and dynamics. [sent-256, score-0.407]
98 Applying our algorithm to the junction tree representation of a graph yields an inference algorithm that can handle updates on conditional distributions and observed evidence in general Bayesian networks. [sent-267, score-0.645]
99 Figure 3: Log-log plot of run time (sec) versus number of nodes for naive sum-product, building the cluster tree, computing queries, updating factors, and restructuring (adding and deleting edges). [sent-276, score-0.743]
100 Although building the cluster tree is slightly more expensive than sum-product, each subsequent update and query is between 10 and 100 times more efficient than recomputing from scratch. [sent-277, score-0.895]
wordName wordTfidf (topN-words)
[('cluster', 0.541), ('tree', 0.287), ('fj', 0.217), ('protein', 0.157), ('kdk', 0.152), ('rake', 0.152), ('compress', 0.149), ('node', 0.145), ('queries', 0.129), ('nodes', 0.124), ('xi', 0.123), ('factor', 0.114), ('children', 0.113), ('round', 0.107), ('graph', 0.105), ('acar', 0.095), ('umut', 0.095), ('graphical', 0.094), ('unary', 0.093), ('changes', 0.093), ('trees', 0.093), ('operation', 0.093), ('cv', 0.089), ('messages', 0.086), ('updates', 0.082), ('amino', 0.08), ('edge', 0.08), ('edges', 0.076), ('contractions', 0.076), ('delcher', 0.076), ('deletes', 0.076), ('nalize', 0.076), ('contraction', 0.076), ('clusters', 0.075), ('dynamic', 0.074), ('marginal', 0.071), ('vertex', 0.07), ('query', 0.067), ('message', 0.066), ('clustering', 0.065), ('operations', 0.064), ('labeling', 0.062), ('arguments', 0.061), ('sv', 0.061), ('deleted', 0.061), ('xj', 0.058), ('blelloch', 0.057), ('mettu', 0.057), ('leaf', 0.055), ('acids', 0.053), ('computations', 0.053), ('dynamically', 0.051), ('evidence', 0.051), ('acid', 0.051), ('vertices', 0.051), ('internal', 0.05), ('neighbors', 0.049), ('removal', 0.049), ('inference', 0.048), ('ips', 0.045), ('guy', 0.045), ('orders', 0.043), ('network', 0.042), ('logarithmic', 0.04), ('sent', 0.04), ('tagged', 0.04), ('forest', 0.04), ('updating', 0.04), ('alterations', 0.038), ('cvi', 0.038), ('distributive', 0.038), ('kasif', 0.038), ('mout', 0.038), ('mutagenesis', 0.038), ('raked', 0.038), ('restructuring', 0.038), ('rotameric', 0.038), ('rotamers', 0.038), ('sxi', 0.038), ('tag', 0.038), ('junction', 0.038), ('ancestors', 0.038), ('changing', 0.037), ('depth', 0.037), ('chicago', 0.036), ('factors', 0.036), ('marginalizing', 0.034), ('conditional', 0.034), ('mutations', 0.033), ('inserts', 0.033), ('cui', 0.033), ('jorge', 0.033), ('recompute', 0.033), ('relabeling', 0.033), ('rotamer', 0.033), ('marginalization', 0.033), ('marginals', 0.032), ('structure', 0.032), ('su', 0.032), ('change', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1
2 0.18548432 80 nips-2007-Ensemble Clustering using Semidefinite Programming
Author: Vikas Singh, Lopamudra Mukherjee, Jiming Peng, Jinhui Xu
Abstract: We consider the ensemble clustering problem where the task is to ‘aggregate’ multiple clustering solutions into a single consolidated clustering that maximizes the shared information among given clustering solutions. We obtain several new results for this problem. First, we note that the notion of agreement under such circumstances can be better captured using an agreement measure based on a 2D string encoding rather than voting strategy based methods proposed in literature. Using this generalization, we first derive a nonlinear optimization model to maximize the new agreement measure. We then show that our optimization problem can be transformed into a strict 0-1 Semidefinite Program (SDP) via novel convexification techniques which can subsequently be relaxed to a polynomial time solvable SDP. Our experiments indicate improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. We discuss evaluations on clustering and image segmentation databases. 1
3 0.17875537 61 nips-2007-Convex Clustering with Exemplar-Based Models
Author: Danial Lashkari, Polina Golland
Abstract: Clustering is often formulated as the maximum likelihood estimation of a mixture model that explains the data. The EM algorithm widely used to solve the resulting optimization problem is inherently a gradient-descent method and is sensitive to initialization. The resulting solution is a local optimum in the neighborhood of the initial guess. This sensitivity to initialization presents a significant challenge in clustering large data sets into many clusters. In this paper, we present a different approach to approximate mixture fitting for clustering. We introduce an exemplar-based likelihood function that approximates the exact likelihood. This formulation leads to a convex minimization problem and an efficient algorithm with guaranteed convergence to the globally optimal solution. The resulting clustering can be thought of as a probabilistic mapping of the data points to the set of exemplars that minimizes the average distance and the information-theoretic cost of mapping. We present experimental results illustrating the performance of our algorithm and its comparison with the conventional approach to mixture model clustering. 1
4 0.14403895 116 nips-2007-Learning the structure of manifolds using random projections
Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma
Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1
5 0.13705847 157 nips-2007-Privacy-Preserving Belief Propagation and Sampling
Author: Michael Kearns, Jinsong Tan, Jennifer Wortman
Abstract: We provide provably privacy-preserving versions of belief propagation, Gibbs sampling, and other local algorithms — distributed multiparty protocols in which each party or vertex learns only its final local value, and absolutely nothing else. 1
6 0.13207538 197 nips-2007-The Infinite Markov Model
7 0.12416242 92 nips-2007-Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations
8 0.12402464 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis
9 0.12071918 78 nips-2007-Efficient Principled Learning of Thin Junction Trees
10 0.11859813 46 nips-2007-Cluster Stability for Finite Samples
11 0.11373898 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs
12 0.11166268 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
13 0.10775231 143 nips-2007-Object Recognition by Scene Alignment
14 0.10145524 141 nips-2007-New Outer Bounds on the Marginal Polytope
15 0.094294816 120 nips-2007-Linear programming analysis of loopy belief propagation for weighted matching
16 0.093708336 187 nips-2007-Structured Learning with Approximate Inference
17 0.092900209 121 nips-2007-Local Algorithms for Approximate Inference in Minor-Excluded Graphs
18 0.091773853 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process
19 0.091383658 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents
20 0.091289528 128 nips-2007-Message Passing for Max-weight Independent Set
topicId topicWeight
[(0, -0.237), (1, 0.018), (2, -0.151), (3, -0.019), (4, -0.063), (5, -0.252), (6, 0.017), (7, 0.136), (8, 0.118), (9, -0.112), (10, -0.215), (11, 0.11), (12, 0.066), (13, -0.157), (14, -0.072), (15, 0.142), (16, 0.008), (17, -0.069), (18, -0.015), (19, 0.028), (20, 0.089), (21, 0.022), (22, 0.018), (23, 0.084), (24, 0.163), (25, -0.032), (26, -0.05), (27, -0.099), (28, -0.077), (29, 0.046), (30, 0.03), (31, 0.014), (32, -0.023), (33, -0.057), (34, -0.057), (35, -0.048), (36, -0.089), (37, -0.016), (38, 0.012), (39, 0.011), (40, 0.013), (41, 0.047), (42, 0.048), (43, -0.069), (44, 0.037), (45, 0.021), (46, 0.001), (47, -0.092), (48, -0.057), (49, 0.096)]
simIndex simValue paperId paperTitle
same-paper 1 0.98093927 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1
2 0.70411038 78 nips-2007-Efficient Principled Learning of Thin Junction Trees
Author: Anton Chechetka, Carlos Guestrin
Abstract: We present the first truly polynomial algorithm for PAC-learning the structure of bounded-treewidth junction trees – an attractive subclass of probabilistic graphical models that permits both the compact representation of probability distributions and efficient exact inference. For a constant treewidth, our algorithm has polynomial time and sample complexity. If a junction tree with sufficiently strong intraclique dependencies exists, we provide strong theoretical guarantees in terms of KL divergence of the result from the true distribution. We also present a lazy extension of our approach that leads to very significant speed ups in practice, and demonstrate the viability of our method empirically, on several real world datasets. One of our key new theoretical insights is a method for bounding the conditional mutual information of arbitrarily large sets of variables with only polynomially many mutual information computations on fixed-size subsets of variables, if the underlying distribution can be approximated by a bounded-treewidth junction tree. 1
3 0.62122947 61 nips-2007-Convex Clustering with Exemplar-Based Models
Author: Danial Lashkari, Polina Golland
Abstract: Clustering is often formulated as the maximum likelihood estimation of a mixture model that explains the data. The EM algorithm widely used to solve the resulting optimization problem is inherently a gradient-descent method and is sensitive to initialization. The resulting solution is a local optimum in the neighborhood of the initial guess. This sensitivity to initialization presents a significant challenge in clustering large data sets into many clusters. In this paper, we present a different approach to approximate mixture fitting for clustering. We introduce an exemplar-based likelihood function that approximates the exact likelihood. This formulation leads to a convex minimization problem and an efficient algorithm with guaranteed convergence to the globally optimal solution. The resulting clustering can be thought of as a probabilistic mapping of the data points to the set of exemplars that minimizes the average distance and the information-theoretic cost of mapping. We present experimental results illustrating the performance of our algorithm and its comparison with the conventional approach to mixture model clustering. 1
4 0.61617607 27 nips-2007-Anytime Induction of Cost-sensitive Trees
Author: Saher Esmeir, Shaul Markovitch
Abstract: Machine learning techniques are increasingly being used to produce a wide-range of classifiers for complex real-world applications that involve nonuniform testing costs and misclassification costs. As the complexity of these applications grows, the management of resources during the learning and classification processes becomes a challenging task. In this work we introduce ACT (Anytime Cost-sensitive Trees), a novel framework for operating in such environments. ACT is an anytime algorithm that allows trading computation time for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques ACT approximates for each candidate split the cost of the subtree under it and favors the one with a minimal cost. Due to its stochastic nature ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare the performance of ACT to that of the state of the art cost-sensitive tree learners. The results show that for most domains ACT produces trees of significantly lower costs. ACT is also shown to exhibit good anytime behavior with diminishing returns.
5 0.60327101 80 nips-2007-Ensemble Clustering using Semidefinite Programming
Author: Vikas Singh, Lopamudra Mukherjee, Jiming Peng, Jinhui Xu
Abstract: We consider the ensemble clustering problem where the task is to ‘aggregate’ multiple clustering solutions into a single consolidated clustering that maximizes the shared information among given clustering solutions. We obtain several new results for this problem. First, we note that the notion of agreement under such circumstances can be better captured using an agreement measure based on a 2D string encoding rather than voting strategy based methods proposed in literature. Using this generalization, we first derive a nonlinear optimization model to maximize the new agreement measure. We then show that our optimization problem can be transformed into a strict 0-1 Semidefinite Program (SDP) via novel convexification techniques which can subsequently be relaxed to a polynomial time solvable SDP. Our experiments indicate improvements not only in terms of the proposed agreement measure but also the existing agreement measures based on voting strategies. We discuss evaluations on clustering and image segmentation databases. 1
6 0.5395273 116 nips-2007-Learning the structure of manifolds using random projections
7 0.53341061 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents
8 0.52846825 89 nips-2007-Feature Selection Methods for Improving Protein Structure Prediction with Rosetta
9 0.49841249 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis
10 0.49133238 121 nips-2007-Local Algorithms for Approximate Inference in Minor-Excluded Graphs
11 0.49038926 141 nips-2007-New Outer Bounds on the Marginal Polytope
12 0.47609419 157 nips-2007-Privacy-Preserving Belief Propagation and Sampling
13 0.46550003 46 nips-2007-Cluster Stability for Finite Samples
14 0.45828521 16 nips-2007-A learning framework for nearest neighbor search
15 0.45225814 92 nips-2007-Fixing Max-Product: Convergent Message Passing Algorithms for MAP LP-Relaxations
16 0.4342711 68 nips-2007-Discovering Weakly-Interacting Factors in a Complex Stochastic Process
17 0.42192522 197 nips-2007-The Infinite Markov Model
18 0.41000602 64 nips-2007-Cooled and Relaxed Survey Propagation for MRFs
19 0.40551674 187 nips-2007-Structured Learning with Approximate Inference
20 0.4053753 128 nips-2007-Message Passing for Max-weight Independent Set
topicId topicWeight
[(5, 0.061), (13, 0.033), (16, 0.037), (18, 0.023), (21, 0.062), (31, 0.029), (34, 0.03), (35, 0.03), (47, 0.074), (49, 0.011), (72, 0.224), (83, 0.211), (85, 0.056), (87, 0.01), (90, 0.047)]
simIndex simValue paperId paperTitle
1 0.95927066 29 nips-2007-Automatic Generation of Social Tags for Music Recommendation
Author: Douglas Eck, Paul Lamere, Thierry Bertin-mahieux, Stephen Green
Abstract: Social tags are user-generated keywords associated with some resource on the Web. In the case of music, social tags have become an important component of “Web2.0” recommender systems, allowing users to generate playlists based on use-dependent terms such as chill or jogging that have been applied to particular songs. In this paper, we propose a method for predicting these social tags directly from MP3 files. Using a set of boosted classifiers, we map audio features onto social tags collected from the Web. The resulting automatic tags (or autotags) furnish information about music that is otherwise untagged or poorly tagged, allowing for insertion of previously unheard music into a social recommender. This avoids the ”cold-start problem” common in such systems. Autotags can also be used to smooth the tag space from which similarities and recommendations are made by providing a set of comparable baseline tags for all tracks in a recommender system. 1
2 0.88408148 98 nips-2007-Hierarchical Apprenticeship Learning with Application to Quadruped Locomotion
Author: J. Z. Kolter, Pieter Abbeel, Andrew Y. Ng
Abstract: We consider apprenticeship learning—learning from expert demonstrations—in the setting of large, complex domains. Past work in apprenticeship learning requires that the expert demonstrate complete trajectories through the domain. However, in many problems even an expert has difficulty controlling the system, which makes this approach infeasible. For example, consider the task of teaching a quadruped robot to navigate over extreme terrain; demonstrating an optimal policy (i.e., an optimal set of foot locations over the entire terrain) is a highly non-trivial task, even for an expert. In this paper we propose a method for hierarchical apprenticeship learning, which allows the algorithm to accept isolated advice at different hierarchical levels of the control task. This type of advice is often feasible for experts to give, even if the expert is unable to demonstrate complete trajectories. This allows us to extend the apprenticeship learning paradigm to much larger, more challenging domains. In particular, in this paper we apply the hierarchical apprenticeship learning algorithm to the task of quadruped locomotion over extreme terrain, and achieve, to the best of our knowledge, results superior to any previously published work. 1
same-paper 3 0.85196513 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1
4 0.73345292 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels
Author: Ronny Luss, Alexandre D'aspremont
Abstract: In this paper, we propose a method for support vector machine classification using indefinite kernels. Instead of directly minimizing or stabilizing a nonconvex loss function, our method simultaneously finds the support vectors and a proxy kernel matrix used in computing the loss. This can be interpreted as a robust classification problem where the indefinite kernel matrix is treated as a noisy observation of the true positive semidefinite kernel. Our formulation keeps the problem convex and relatively large problems can be solved efficiently using the analytic center cutting plane method. We compare the performance of our technique with other methods on several data sets.
5 0.73342395 187 nips-2007-Structured Learning with Approximate Inference
Author: Alex Kulesza, Fernando Pereira
Abstract: In many structured prediction problems, the highest-scoring labeling is hard to compute exactly, leading to the use of approximate inference methods. However, when inference is used in a learning algorithm, a good approximation of the score may not be sufficient. We show in particular that learning can fail even with an approximate inference method with rigorous approximation guarantees. There are two reasons for this. First, approximate methods can effectively reduce the expressivity of an underlying model by making it impossible to choose parameters that reliably give good predictions. Second, approximations can respond to parameter changes in such a way that standard learning algorithms are misled. In contrast, we give two positive results in the form of learning bounds for the use of LP-relaxed inference in structured perceptron and empirical risk minimization settings. We argue that without understanding combinations of inference and learning, such as these, that are appropriately compatible, learning performance under approximate inference cannot be guaranteed. 1
6 0.73217452 63 nips-2007-Convex Relaxations of Latent Variable Training
7 0.73099571 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
8 0.73037469 141 nips-2007-New Outer Bounds on the Marginal Polytope
9 0.72911853 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering
10 0.72898227 120 nips-2007-Linear programming analysis of loopy belief propagation for weighted matching
11 0.72793549 116 nips-2007-Learning the structure of manifolds using random projections
12 0.72517842 76 nips-2007-Efficient Convex Relaxation for Transductive Support Vector Machine
13 0.72516072 209 nips-2007-Ultrafast Monte Carlo for Statistical Summations
14 0.72378862 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs
15 0.7237438 180 nips-2007-Sparse Feature Learning for Deep Belief Networks
16 0.72265726 115 nips-2007-Learning the 2-D Topology of Images
17 0.72239876 212 nips-2007-Using Deep Belief Nets to Learn Covariance Kernels for Gaussian Processes
18 0.71991503 38 nips-2007-Boosting Algorithms for Maximizing the Soft Margin
19 0.71987867 128 nips-2007-Message Passing for Max-weight Independent Set
20 0.7196539 40 nips-2007-Bundle Methods for Machine Learning