nips nips2008 nips2008-49 knowledge-graph by maker-knowledge-mining

49 nips-2008-Clusters and Coarse Partitions in LP Relaxations


Source: pdf

Author: David Sontag, Amir Globerson, Tommi S. Jaakkola

Abstract: We propose a new class of consistency constraints for Linear Programming (LP) relaxations for finding the most probable (MAP) configuration in graphical models. Usual cluster-based LP relaxations enforce joint consistency on the beliefs of a cluster of variables, with computational cost increasing exponentially with the size of the clusters. By partitioning the state space of a cluster and enforcing consistency only across partitions, we obtain a class of constraints which, although less tight, are computationally feasible for large clusters. We show how to solve the cluster selection and partitioning problem monotonically in the dual LP, using the current beliefs to guide these choices. We obtain a dual message passing algorithm and apply it to protein design problems where the variables have large state spaces and the usual cluster-based relaxations are very costly. The resulting method solves many of these problems exactly, and significantly faster than a method that does not use partitioning. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We propose a new class of consistency constraints for Linear Programming (LP) relaxations for finding the most probable (MAP) configuration in graphical models. [sent-8, score-0.215]

2 Usual cluster-based LP relaxations enforce joint consistency on the beliefs of a cluster of variables, with computational cost increasing exponentially with the size of the clusters. [sent-9, score-0.447]

3 By partitioning the state space of a cluster and enforcing consistency only across partitions, we obtain a class of constraints which, although less tight, are computationally feasible for large clusters. [sent-10, score-0.383]

4 We show how to solve the cluster selection and partitioning problem monotonically in the dual LP, using the current beliefs to guide these choices. [sent-11, score-0.475]

5 We obtain a dual message passing algorithm and apply it to protein design problems where the variables have large state spaces and the usual cluster-based relaxations are very costly. [sent-12, score-0.774]

6 One promising approach is based on linear programming relaxations, solved via message passing algorithms akin to belief propagation [2, 3]. [sent-23, score-0.464]

7 Relaxations can be made increasingly tight by introducing LP variables that correspond to clusters of variables in the original model. [sent-27, score-0.333]

8 In fact, in recent work [6] we have shown that by adding a set of clusters over three variables, complex problems such as protein-design and stereo-vision may be solved exactly. [sent-28, score-0.202]

9 The problem with adding clusters over variables is that computational cost scales exponentially with the cluster size. [sent-29, score-0.434]

10 Using clusters of s variables means adding 100^s LP variables, which is computationally demanding even for clusters of size three. [sent-32, score-0.352]
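To make the cost argument concrete, the following minimal sketch (not from the paper) simply counts LP variables for one full triplet cluster versus a partitioned one; the number of coarse states per variable is a hypothetical choice.

```python
# Counting LP variables for one cluster: full joint versus coarse partition.
k = 100   # states per variable (typical of protein side-chain design)
s = 3     # cluster size (a triplet)

full_cluster_vars = k ** s        # 100^3 = 1,000,000 joint states for the full cluster
m = 5                             # hypothetical number of coarse states per variable
coarse_cluster_vars = m ** s      # 5^3 = 125 joint coarse states

print(full_cluster_vars, coarse_cluster_vars)
```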

11 The key observation is that it may not be necessary to represent all the possible joint states of a cluster of variables. [sent-35, score-0.255]

12 Following the approach of [2], we formulate a dual LP for the partition-based LP relaxations and derive a message passing algorithm for optimizing the dual LP based on block coordinate descent. [sent-38, score-0.74]

13 Unlike standard message passing algorithms, the algorithm we derive involves passing messages between coarse and fine representations of the same set of variables. [sent-39, score-0.662]

14 This is equivalent to finding the assignment x^M that maximizes the function f(x; θ) = Σ_{ij∈E} θ_ij(x_i, x_j). [sent-43, score-0.218]
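For a small model, the MAP objective defined above can be evaluated by brute force; the toy triangle graph and random potentials below are illustrative assumptions, not the paper's benchmark.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
k = 3                                     # states per variable
edges = [(0, 1), (1, 2), (0, 2)]          # a toy triangle graph
theta = {e: rng.normal(size=(k, k)) for e in edges}   # edge potentials theta_ij(x_i, x_j)

def f(x):
    """f(x; theta) = sum over edges ij of theta_ij(x_i, x_j)."""
    return sum(theta[(i, j)][x[i], x[j]] for (i, j) in edges)

# Brute-force MAP assignment x^M over all k^3 joint states.
x_map = max(product(range(k), repeat=3), key=f)
print(x_map, f(x_map))
```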

15 Define µ to be a vector of marginal probabilities associated with the interacting pairs of variables (edges), {µ_ij(x_i, x_j)}_{ij∈E}, as well as {µ_i(x_i)}_{i∈V} for the nodes. [sent-45, score-0.308]

16 The MAP problem is then equivalent to the following linear program: max_x f(x; θ) = max_{µ∈M(G)} µ · θ, (2) where µ · θ = Σ_{ij∈E} Σ_{x_i,x_j} θ_ij(x_i, x_j) µ_ij(x_i, x_j). [sent-47, score-0.635]

17 The extreme points of the marginal polytope are integral and correspond one-to-one with assignments x. [sent-48, score-0.256]

18 Although the number of variables in this LP is only O(|E| + |V |), the difficulty comes from an exponential number of linear inequalities typically required to describe the marginal polytope M(G). [sent-50, score-0.219]

19 LP relaxations replace the difficult global constraint that the marginals in µ must arise from some common joint distribution by ensuring only that the marginals are locally consistent with one another. [sent-51, score-0.305]

20 The most common such relaxation, pairwise consistency, enforces that the edge marginals are consistent with the node marginals: {µ | Σ_{x_j} µ_ij(x_i, x_j) = µ_i(x_i)}. [sent-52, score-0.617]
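As an illustration of the pairwise-consistency relaxation, here is a minimal sketch that builds the local-consistency LP for a toy triangle model and solves it with scipy; the variable layout and random potentials are illustrative choices, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import linprog

k = 3                                    # states per variable
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2), (0, 2)]         # a toy triangle graph
rng = np.random.default_rng(0)
theta = {e: rng.normal(size=(k, k)) for e in edges}   # edge potentials theta_ij(x_i, x_j)

n_node = len(nodes) * k                  # variables mu_i(x_i)
n_edge = len(edges) * k * k              # variables mu_ij(x_i, x_j)

def idx_node(i, xi):
    return i * k + xi

def idx_edge(e, xi, xj):
    return n_node + edges.index(e) * k * k + xi * k + xj

# Objective: maximize sum_ij sum_{x_i,x_j} theta_ij(x_i,x_j) mu_ij(x_i,x_j); linprog minimizes.
c = np.zeros(n_node + n_edge)
for e in edges:
    for xi in range(k):
        for xj in range(k):
            c[idx_edge(e, xi, xj)] = -theta[e][xi, xj]

A_eq, b_eq = [], []
for i in nodes:                          # normalization: sum_{x_i} mu_i(x_i) = 1
    row = np.zeros_like(c)
    row[[idx_node(i, xi) for xi in range(k)]] = 1.0
    A_eq.append(row); b_eq.append(1.0)
for (i, j) in edges:                     # pairwise consistency with both endpoints
    for xi in range(k):                  # sum_{x_j} mu_ij(x_i, x_j) = mu_i(x_i)
        row = np.zeros_like(c)
        for xj in range(k):
            row[idx_edge((i, j), xi, xj)] = 1.0
        row[idx_node(i, xi)] = -1.0
        A_eq.append(row); b_eq.append(0.0)
    for xj in range(k):                  # sum_{x_i} mu_ij(x_i, x_j) = mu_j(x_j)
        row = np.zeros_like(c)
        for xi in range(k):
            row[idx_edge((i, j), xi, xj)] = 1.0
        row[idx_node(j, xj)] = -1.0
        A_eq.append(row); b_eq.append(0.0)

res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, 1), method="highs")
print("pairwise-relaxation upper bound on the MAP value:", -res.fun)
```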

21 The integral extreme points of this local marginal polytope also correspond to assignments. [sent-53, score-0.256]

22 However, the local marginal polytope also contains fractional extreme points, and, as a relaxation, will in general not be tight. [sent-55, score-0.209]

23 However, perhaps the most straightforward approach corresponds to lifting the relaxation by adding marginals over clusters of nodes to the model (cf. [sent-58, score-0.342]

24 However, each cluster comes with a computational cost that grows as k^s, where s is the number of variables in the cluster and k is the number of states for each variable. [sent-60, score-0.525]

25 We seek to offset this exponential cost by introducing coarsened clusters, as we show next. [sent-61, score-0.347]

26 Section 2, Coarsened clusters and consistency constraints: We begin with an illustrative example. [sent-62, score-0.258]

27 We can recover the exact marginal polytope in this case by forcing the pairwise marginals µ_ij(x_i, x_j) to be consistent with some distribution µ_123(x_1, x_2, x_3). [sent-64, score-0.464]

28 However, when k is large, introducing the corresponding k^3 variables to our LP may be too costly, and perhaps unnecessary if a weaker consistency constraint would already lead to an integral extreme point. [sent-65, score-0.287]

29 Figure 1: A graphical illustration of the consistency constraint between the original (fine-granularity) edge (x_i, x_k) and the coarsened triplet (z_i, z_j, z_k). [sent-69, score-1.681]

30 Given such a partitioning scheme, we can introduce a distribution over the coarsened variables, µ_123(z_1, z_2, z_3), and constrain it to agree with µ_ik(x_i, x_k) in the sense that they both yield the same marginals for (z_i, z_k). [sent-75, score-0.885]
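The agreement condition can be written down directly: coarsening a fine edge marginal with the partition maps must reproduce the corresponding coarse marginal of the cluster distribution. The sketch below, with hypothetical partition maps and a random edge marginal, just computes the coarse table that the cluster distribution would have to match.

```python
import numpy as np

k = 6
rng = np.random.default_rng(1)

# Hypothetical partition maps: z_i[x_i] assigns each of the k fine states to a coarse state.
z_i = np.array([0, 0, 1, 1, 2, 2])
z_k = np.array([0, 1, 1, 2, 2, 2])
n_zi, n_zk = z_i.max() + 1, z_k.max() + 1

# A toy fine-grained edge marginal mu_ik(x_i, x_k); any normalized nonnegative table works.
mu_ik = rng.random((k, k))
mu_ik /= mu_ik.sum()

def coarsen(mu, zi, zk, n_zi, n_zk):
    """Aggregate a fine edge marginal onto the coarse states (z_i, z_k)."""
    out = np.zeros((n_zi, n_zk))
    for xi in range(mu.shape[0]):
        for xk in range(mu.shape[1]):
            out[zi[xi], zk[xk]] += mu[xi, xk]
    return out

# The coarse cluster marginal mu_123(z_1, z_2, z_3), summed over its third coordinate,
# would have to reproduce this table; here we only compute the target it must match.
target = coarsen(mu_ik, z_i, z_k, n_zi, n_zk)
print(target, target.sum())   # still a distribution: sums to 1
```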

31 When every coarse state contains a single original state (e.g., the partition {1}, {2}, {3}, {4}), we recover the usual cluster consistency constraint. [sent-80, score-0.276]

32 We use the above idea to construct tighter outer bounds on the marginal polytope and incorporate them into the MAP-LP relaxation. [sent-81, score-0.202]

33 For each cluster c ∈ C and variable i ∈ c we also have a partition Z_i^c as in the above example (the choice of clusters and partitions will be discussed later). [sent-83, score-0.409]

34 Our LP optimizes over the following marginal variables: µ_ij(x_i, x_j) and µ_i(x_i) for the edges and nodes of the original graph, and µ_c(z_c) for the coarse-grained clusters. [sent-88, score-0.73]

35 However, a more efficient and scalable approach is to solve it via message passing in the dual LP, which we show how to do in the next section. [sent-91, score-0.492]

36 Our approach for choosing the coarsenings is to iteratively solve the LP using an initial relaxation (beginning with the pairwise consistency constraints), then to introduce additional cluster constraints, letting the current solution guide how to coarsen the variables. [sent-93, score-0.463]

37 As we showed in earlier work [6], solving with the dual LP gives us a simple method for “warm starting” the new LP (the tighter relaxation) using the previous solution, and also results in an algorithm for which every step monotonically decreases an upper bound on the MAP assignment. [sent-94, score-0.205]

38 Section 3, Dual linear program and a message passing algorithm: In this section we give the dual of the partition-based LP from Eq. 5. [sent-98, score-0.492]

39 We use this dual to obtain a message passing algorithm that efficiently optimizes the relaxation. [sent-99, score-0.345]

40 Our approach extends earlier work by Globerson and Jaakkola [2] who gave the generalized max-product linear programming (MPLP) algorithm to solve the usual (non-coarsened) cluster LP relaxation in the dual. [sent-100, score-0.3]

41 The dual formulation in [2] was derived by adding auxiliary variables to the primal. [sent-101, score-0.255]

42 The dual variables are as follows: β_{ij→i}(x_i, x_j), β_{ij→j}(x_i, x_j), and β_{ij→ij}(x_i, x_j) for every edge ij ∈ E, and β_{c→ij}(z_c) for every coarsened cluster c and edge ij ∈ c. [sent-104, score-2.341]

43 Thus λ_{ij→i}(x_i) should be read as the message sent from edge ij to node i, and λ_{c→ij}(z_i^c, z_j^c) is the message from the coarsened cluster to one of its intersection edges. [sent-106, score-1.654]

44 Finally, λ_{ij→ij}(x_i, x_j) is the message sent from an edge to itself. [sent-107, score-0.506]

45 By convex duality, the dual objective evaluated at a dual feasible point upper bounds the primal LP optimum, which in turn upper bounds the value of the MAP assignment. [sent-112, score-0.355]

46 It is illustrative to compare this dual LP with [2], where the cluster dual variables were β_{c→ij}(x_c). [sent-113, score-0.563]

47 Our dual corresponds to introducing the additional constraint that β_{c→ij}(x_c) = β_{c→ij}(x'_c) whenever z_c[x_c] = z_c[x'_c]. [sent-114, score-0.798]

48 The advantage of the above dual is that it can be optimized via a simple message passing algorithm that corresponds to block coordinate descent. [sent-115, score-0.492]

49 It then turns out that one does not need to work with β variables directly, but can keep only the λ message variables. [sent-117, score-0.26]

50 Figure 2 provides the form of the updates for all three message types. [sent-119, score-0.194]

51 Importantly, all messages outgoing from a cluster or edge must be sent simultaneously. [sent-123, score-0.372]

52 Here we derive the cluster to edge updates, which differ from [2]. [sent-124, score-0.258]

53 Assume that all values of β are fixed except for β_{c→ij}(z_i^c, z_j^c) for all ij ∈ c in some cluster c. [sent-125, score-0.806]

54 The term in the dual objective that depends on β_{c→ij}(z_i^c, z_j^c) can be written equivalently as max_{x_i,x_j} [ λ_{ij→ij}(x_i, x_j) + Σ_{c': c'≠c, ij∈c'} λ_{c'→ij}(z_i^{c'}[x_i], z_j^{c'}[x_j]) + λ_{c→ij}(z_i^c[x_i], z_j^c[x_j]) ] = max_{z_i^c, z_j^c} [ b_ij(z_i^c, z_j^c) + λ_{c→ij}(z_i^c, z_j^c) ]. (11) [sent-126, score-1.526]

55 Due to the constraint Σ_{ij∈c} β_{c→ij}(z_c) = 0, all of the β_{c→ij} need to be updated simultaneously. [sent-127, score-0.282]

56 It can be easily shown (using an equalization argument as in [2]) that the β_{c→ij}(z_c) that satisfy the constraint and minimize the objective are given by β_{c→ij}(z_c) = −b_ij(z_i^c, z_j^c) + (1/|S(c)|) Σ_{st∈c} b_st(z_s^c, z_t^c). (12) [sent-128, score-0.324]

57 The message update given in Fig. 2 follows from this minimization. [sent-129, score-0.194]

58 Note that none of the cluster messages involve the original cluster variables x_c, but rather only z_c. [sent-131, score-0.869]
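A minimal numpy sketch of the simultaneous cluster-to-edge updates for one triplet of coarse variables follows; the b_ij tables are random stand-ins for the reparametrized edge terms (λ_{ij→ij} plus messages from the other clusters, already maximized within each coarse block), and the max over the remaining coarse variable follows the usual MPLP pattern. This is a reconstruction, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(2)

# Coarse state-space sizes for the three partitioned variables of one triplet cluster c.
m1, m2, m3 = 4, 3, 5

# b_ij(z_i, z_j): random stand-ins for the terms defined in Eq. 11.
b = {
    (0, 1): rng.normal(size=(m1, m2)),
    (1, 2): rng.normal(size=(m2, m3)),
    (0, 2): rng.normal(size=(m1, m3)),
}

def cluster_to_edge_updates(b):
    """Simultaneous lambda_{c->ij} updates for one triplet of coarse variables:
    lambda_{c->ij}(z_i, z_j) = -b_ij(z_i, z_j)
        + (1/|S(c)|) * max over the remaining coarse variable of sum_{st in c} b_st."""
    # Total cluster score for every joint coarse assignment (z1, z2, z3).
    total = b[(0, 1)][:, :, None] + b[(1, 2)][None, :, :] + b[(0, 2)][:, None, :]
    n = len(b)                            # |S(c)| = 3 intersection edges for a triplet
    return {
        (0, 1): -b[(0, 1)] + total.max(axis=2) / n,
        (1, 2): -b[(1, 2)] + total.max(axis=0) / n,
        (0, 2): -b[(0, 2)] + total.max(axis=1) / n,
    }

lam = cluster_to_edge_updates(b)
print({e: v.shape for e, v in lam.items()})
```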

59 • Edge to Node: For every edge ij ∈ E and node i (or j) in the edge: λ_{ij→i}(x_i) ← −(2/3) λ_i^{−j}(x_i) + (1/3) max_{x_j} [ Σ_{c: ij∈c} λ_{c→ij}(z_i^c[x_i], z_j^c[x_j]) + λ_{ij→ij}(x_i, x_j) + λ_j^{−i}(x_j) + θ_ij(x_i, x_j) ], where λ_i^{−j}(x_i) = Σ_{k∈N(i)\j} λ_{ik→i}(x_i). [sent-133, score-1.348]
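Assuming the −2/3 and 1/3 coefficients as reconstructed above, a single edge-to-node update can be sketched as follows; all message tables are random placeholders with made-up sizes.

```python
import numpy as np

rng = np.random.default_rng(3)
k = 4                                   # states of x_i and x_j

theta_ij = rng.normal(size=(k, k))      # edge potential theta_ij(x_i, x_j)
lam_edge_self = rng.normal(size=(k, k)) # lambda_{ij->ij}(x_i, x_j)
lam_clusters = rng.normal(size=(k, k))  # sum over clusters c of lambda_{c->ij}(z_i[x_i], z_j[x_j]),
                                        # already expanded back to the fine states
lam_minus_j_i = rng.normal(size=k)      # lambda_i^{-j}(x_i): messages into i from its other edges
lam_minus_i_j = rng.normal(size=k)      # lambda_j^{-i}(x_j)

def edge_to_node_update(theta_ij, lam_edge_self, lam_clusters, lam_minus_j_i, lam_minus_i_j):
    """One edge-to-node update lambda_{ij->i}(x_i), following the reconstructed rule:
    lambda_{ij->i}(x_i) <- -(2/3) lam_i^{-j}(x_i)
        + (1/3) max_{x_j} [ lam_clusters + lam_{ij->ij} + lam_j^{-i}(x_j) + theta_ij ]."""
    inner = lam_clusters + lam_edge_self + lam_minus_i_j[None, :] + theta_ij
    return -(2.0 / 3.0) * lam_minus_j_i + (1.0 / 3.0) * inner.max(axis=1)

print(edge_to_node_update(theta_ij, lam_edge_self, lam_clusters, lam_minus_j_i, lam_minus_i_j))
```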

60 The message passing algorithm in Fig. 2 solves the dual for a given choice of coarsened clusters. [sent-137, score-0.446]

61 Our overall algorithm is thus similar in structure to [6] and proceeds as follows (we denote the message passing algorithm from Fig. 2 by MPLP). [sent-140, score-0.345]

62 Add a new coarsened cluster c using the strategy given in Sec. 4. [sent-146, score-0.503]

63 Initialize messages going out of the new cluster c to zero, and keep all the previous message values (this will not change the bound value). [sent-148, score-0.477]

64 Section 4, Choosing coarse partitions: Until now we have not discussed how to choose the clusters to add and their partitionings. [sent-150, score-0.334]

65 Given a set of candidate clusters (e.g., the set of all triplets in the graph as in [6]), we would like to add a cluster that would result in the maximum decrease of the dual bound on the MAP. [sent-154, score-0.584]

66 In principle such a cluster could be found by optimizing the dual for each candidate cluster, then choosing the best one. [sent-155, score-0.351]

67 However, this is computationally costly, so in [6] we instead use the bound decrease resulting from sending messages just once from the candidate cluster to its intersection edges. [sent-156, score-0.36]

68 If we were to add the full (un-coarsened) cluster, this bound decrease would be: d(c) = Σ_{ij∈c} max_{x_i,x_j} b_ij(x_i, x_j) − max_{x_c} Σ_{ij∈c} b_ij(x_i, x_j), (13) where b_ij(x_i, x_j) = λ_{ij→ij}(x_i, x_j) + Σ_{c: ij∈c} λ_{c→ij}(z_i^c[x_i], z_j^c[x_j]). [sent-157, score-1.625]
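The guaranteed decrease d(c) of Eq. 13 is cheap to compute for a candidate triplet; the sketch below uses random b_ij tables as stand-ins and is only meant to show the two maximizations involved.

```python
import numpy as np

rng = np.random.default_rng(4)
k = 5   # fine states per variable in the candidate triplet

# b_ij(x_i, x_j) = lambda_{ij->ij} + sum of existing cluster messages; random stand-ins here.
b = {
    (0, 1): rng.normal(size=(k, k)),
    (1, 2): rng.normal(size=(k, k)),
    (0, 2): rng.normal(size=(k, k)),
}

def bound_decrease(b):
    """d(c): independent edge maxima minus the joint maximum over the triplet (Eq. 13)."""
    independent = sum(tbl.max() for tbl in b.values())
    joint = (b[(0, 1)][:, :, None] + b[(1, 2)][None, :, :] + b[(0, 2)][:, None, :]).max()
    return independent - joint

print("guaranteed decrease from adding this (un-coarsened) triplet:", bound_decrease(b))
```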

69 Our strategy now is as follows: we add the cluster c that maximizes d(c), and then choose a partitioning Z_i^c for all i ∈ c that is guaranteed to achieve a decrease that is close to d(c). [sent-158, score-0.306]

70 We will therefore consider partitions where the states with the lowest belief values are put into the same "catch-all" coarse state s_i^c, and all other states of x_i get their own coarse state. [sent-164, score-0.768]

71 Formally, a partition Z_i^c is characterized by a value κ_i such that s_i^c is the set of all x_i with b_i(x_i) < κ_i. [sent-165, score-0.342]
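The threshold-based catch-all partition described here is easy to construct from the single-node beliefs; the sketch below, with a made-up belief vector and threshold κ, illustrates only that step, not the paper's full greedy search over κ_i.

```python
import numpy as np

def catch_all_partition(node_belief, kappa):
    """Build the partition Z_i described above: states with belief below kappa share one
    catch-all coarse state; every other state keeps its own coarse state.

    Returns an array z such that z[x_i] is the coarse state of fine state x_i."""
    k = len(node_belief)
    z = np.empty(k, dtype=int)
    in_catch_all = node_belief < kappa
    z[in_catch_all] = 0                                   # coarse state 0 = catch-all s_i
    kept = np.flatnonzero(~in_catch_all)
    z[kept] = 1 + np.arange(len(kept))                    # one coarse state per remaining x_i
    return z

beliefs = np.array([0.02, 0.91, 0.05, 0.88, 0.01])        # toy single-node beliefs b_i(x_i)
print(catch_all_partition(beliefs, kappa=0.1))            # -> [0 1 0 2 0]
```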

72 Since each subsequent value of s_i^c differs by one additional state x_i, we can re-use the maximizations over z_{c\i} for the previous value of s_i^c in evaluating the constraint for the current s_i^c. [sent-172, score-0.546]

73 Setting γ < 0 would give a partitioning with fewer coarse states at the cost of a smaller guaranteed bound decrease. [sent-174, score-0.312]

74 On the other hand, setting γ > 0 results in a margin between the value of the dual objective (after sending the coarsened cluster message) and its value if we were to fix x_i in the max terms of Eq. [sent-175, score-0.902]

75 This makes it less likely that a state x_i in s_i^c will become important again in subsequent message passing iterations. [sent-177, score-0.474]

76 Note that this greedy algorithm does not necessarily find the partitioning with the fewest coarse states that achieves the bound decrease. [sent-179, score-0.26]

77 Recently we showed [6] that all but one of the problems described in [9] can be solved exactly by using an LP relaxation with clusters on three variables. [sent-187, score-0.257]

78 In both experimental setups we first run the standard edge-based message passing algorithm for 1000 iterations. [sent-191, score-0.345]

79 In the first experiment, we add all triplets that correspond to variables whose single-node beliefs are tied (within 10^−5) at the maximum after running the edge-based algorithm. [sent-192, score-0.481]

80 Since tied beliefs correspond to fractional LP solutions, it is natural to consider these in tighter relaxations. [sent-193, score-0.207]

81 The triplets correspond to partitioned variables, as explained in Sec. [sent-194, score-0.196]

82 Specifically, for each variable X_i we find the states whose single-node beliefs are tied at the maximum. [sent-197, score-0.249]

83 The triplets are then constructed over the coarsened variables Z_i^c, and the message passing algorithm of Sec. 3 is applied. [sent-207, score-0.86]

84 We add new triplets corresponding to the new variables and re-run. [sent-211, score-0.26]

85 We repeat until the dual-LP bound is sufficiently close to the value of the integral assignment obtained from the messages (note that these values would not coincide if the relaxation were not tight; in these experiments they do, so the final relaxation is tight). [sent-212, score-0.373]

86 We applied the above scheme to the ten smallest proteins in the dataset used in [6] (for the larger proteins we used a different strategy described next). [sent-213, score-0.206]

87 This big factor comes about because very few states are tied per variable, which increases the efficiency of our method, where the number of partitions equals the number of tied states. [sent-217, score-0.279]

88 While running on full triplets was completely impractical, the coarsened message passing algorithm is very practical and achieves the exact MAP assignments. [sent-218, score-0.818]

89 We alternate between adding 5 triplets to the relaxation and running MPLP for 20 more iterations. [sent-220, score-0.313]

90 To compare the running times on all 15 proteins, we checked how long it took for the difference between the dual and primal objectives to be less than . [sent-230, score-0.2]

91 The cost per iteration increases very little after adding each triplet, showing that our algorithm significantly coarsened the clusters. [sent-237, score-0.364]

92 Two triplet clusters were added twice using different coarsenings, but otherwise each triplet only needed to be added once, demonstrating that our algorithm chose the right coarsenings. [sent-239, score-0.24]

93 In applying the method, we chose to cluster variables' states based on a bound minimization criterion after solving using a looser constraint on the polytope. [sent-245, score-0.33]

94 One of the key technical differences is that in our formulation the settings of the coarse and fine variables are refined iteratively, whereas in [1], once a coarse MRF has been solved, it is not revisited. [sent-252, score-0.26]

95 Using the same ideas as in this paper, one can introduce coarsened pairwise consistency constraints in addition to the full pairwise consistency constraints. [sent-254, score-0.566]

96 Although this would not tighten the relaxation, by passing messages more frequently in the coarsened space, and only occasionally revisiting the full edges, this could give significant computational benefits when the nodes have large numbers of states. [sent-255, score-0.519]

97 With the coarsening strategy used here, the number of variables still grows exponentially with the cluster size, albeit at a lower rate. [sent-257, score-0.343]

98 One way to avoid the exponential growth is to partition the states of a cluster into a fixed number of states (e. [sent-258, score-0.364]

99 Such a process may be repeated recursively, generating a hierarchy of coarsened variables. [sent-261, score-0.299]

100 Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. [sent-279, score-0.345]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ij', 0.462), ('coarsened', 0.299), ('zc', 0.292), ('zi', 0.24), ('lp', 0.223), ('message', 0.194), ('xj', 0.19), ('xi', 0.181), ('cluster', 0.181), ('zj', 0.163), ('passing', 0.151), ('triplets', 0.15), ('dual', 0.147), ('clusters', 0.122), ('polytope', 0.101), ('relaxations', 0.101), ('coarse', 0.097), ('relaxation', 0.097), ('sc', 0.097), ('bij', 0.087), ('mplp', 0.083), ('marginals', 0.081), ('xc', 0.08), ('edge', 0.077), ('proteins', 0.075), ('states', 0.074), ('consistency', 0.073), ('coarsening', 0.073), ('map', 0.072), ('partitions', 0.071), ('beliefs', 0.069), ('messages', 0.069), ('tied', 0.067), ('sontag', 0.067), ('variables', 0.066), ('protein', 0.061), ('triplet', 0.059), ('bst', 0.058), ('partitioning', 0.056), ('marginal', 0.052), ('coarsenings', 0.05), ('integral', 0.049), ('zk', 0.048), ('belief', 0.045), ('sent', 0.045), ('globerson', 0.045), ('add', 0.044), ('constraint', 0.042), ('adding', 0.042), ('constraints', 0.041), ('pairwise', 0.04), ('node', 0.039), ('solved', 0.038), ('max', 0.037), ('propagation', 0.036), ('mrf', 0.036), ('partition', 0.035), ('constrain', 0.034), ('coarser', 0.034), ('zs', 0.034), ('meltzer', 0.033), ('tightening', 0.033), ('bound', 0.033), ('scheme', 0.033), ('tight', 0.032), ('extreme', 0.032), ('state', 0.032), ('objective', 0.032), ('ik', 0.031), ('agree', 0.03), ('primal', 0.029), ('xm', 0.029), ('tommi', 0.029), ('decrease', 0.029), ('zt', 0.029), ('guaranteed', 0.029), ('bi', 0.029), ('assignment', 0.028), ('edges', 0.026), ('granularity', 0.025), ('sending', 0.025), ('partitionings', 0.025), ('introducing', 0.025), ('tighter', 0.025), ('outer', 0.024), ('running', 0.024), ('partitioned', 0.024), ('fractional', 0.024), ('csail', 0.024), ('st', 0.023), ('candidate', 0.023), ('strategy', 0.023), ('cost', 0.023), ('illustrative', 0.022), ('mrfs', 0.022), ('usual', 0.022), ('guide', 0.022), ('correspond', 0.022), ('ner', 0.022), ('uai', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000008 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations

Author: David Sontag, Amir Globerson, Tommi S. Jaakkola

Abstract: We propose a new class of consistency constraints for Linear Programming (LP) relaxations for finding the most probable (MAP) configuration in graphical models. Usual cluster-based LP relaxations enforce joint consistency on the beliefs of a cluster of variables, with computational cost increasing exponentially with the size of the clusters. By partitioning the state space of a cluster and enforcing consistency only across partitions, we obtain a class of constraints which, although less tight, are computationally feasible for large clusters. We show how to solve the cluster selection and partitioning problem monotonically in the dual LP, using the current beliefs to guide these choices. We obtain a dual message passing algorithm and apply it to protein design problems where the variables have large state spaces and the usual cluster-based relaxations are very costly. The resulting method solves many of these problems exactly, and significantly faster than a method that does not use partitioning. 1

2 0.17276636 199 nips-2008-Risk Bounds for Randomized Sample Compressed Classifiers

Author: Mohak Shah

Abstract: We derive risk bounds for the randomized classifiers in Sample Compression setting where the classifier-specification utilizes two sources of information viz. the compression set and the message string. By extending the recently proposed Occam’s Hammer principle to the data-dependent settings, we derive point-wise versions of the bounds on the stochastic sample compressed classifiers and also recover the corresponding classical PAC-Bayes bound. We further show how these compare favorably to the existing results.

3 0.15730946 216 nips-2008-Sparse probabilistic projections

Author: Cédric Archambeau, Francis R. Bach

Abstract: We present a generative model for performing sparse probabilistic projections, which includes sparse principal component analysis and sparse canonical correlation analysis as special cases. Sparsity is enforced by means of automatic relevance determination or by imposing appropriate prior distributions, such as generalised hyperbolic distributions. We derive a variational Expectation-Maximisation algorithm for the estimation of the hyperparameters and show that our novel probabilistic approach compares favourably to existing techniques. We illustrate how the proposed method can be applied in the context of cryptoanalysis as a preprocessing tool for the construction of template attacks. 1

4 0.15194163 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE

Author: Garvesh Raskutti, Bin Yu, Martin J. Wainwright, Pradeep K. Ravikumar

Abstract: We consider the problem of estimating the graph structure associated with a Gaussian Markov random field (GMRF) from i.i.d. samples. We study the performance of the ℓ1-regularized maximum likelihood estimator in the high-dimensional setting, where the number of nodes in the graph p, the number of edges in the graph s and the maximum node degree d, are allowed to grow as a function of the number of samples n. Our main result provides sufficient conditions on (n, p, d) for the ℓ1-regularized MLE estimator to recover all the edges of the graph with high probability. Under some conditions on the model covariance, we show that model selection can be achieved for sample sizes n = Ω(d^2 log(p)), with the error decaying as O(exp(−c log(p))) for some constant c. We illustrate our theoretical results via simulations and show good correspondences between the theoretical predictions and behavior in simulations.

5 0.14594682 40 nips-2008-Bounds on marginal probability distributions

Author: Joris M. Mooij, Hilbert J. Kappen

Abstract: We propose a novel bound on single-variable marginal probability distributions in factor graphs with discrete variables. The bound is obtained by propagating local bounds (convex sets of probability distributions) over a subtree of the factor graph, rooted in the variable of interest. By construction, the method not only bounds the exact marginal probability distribution of a variable, but also its approximate Belief Propagation marginal (“belief”). Thus, apart from providing a practical means to calculate bounds on marginals, our contribution also lies in providing a better understanding of the error made by Belief Propagation. We show that our bound outperforms the state-of-the-art on some inference problems arising in medical diagnosis. 1

6 0.12518138 48 nips-2008-Clustering via LP-based Stabilities

7 0.11998736 47 nips-2008-Clustered Multi-Task Learning: A Convex Formulation

8 0.11583425 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning

9 0.11520814 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation

10 0.11203903 50 nips-2008-Continuously-adaptive discretization for message-passing algorithms

11 0.10422265 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

12 0.10334872 145 nips-2008-Multi-stage Convex Relaxation for Learning with Sparse Regularization

13 0.095943607 104 nips-2008-Improved Moves for Truncated Convex Models

14 0.092522688 89 nips-2008-Gates

15 0.088734612 29 nips-2008-Automatic online tuning for fast Gaussian summation

16 0.086520508 161 nips-2008-On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization

17 0.085810237 245 nips-2008-Unlabeled data: Now it helps, now it doesn't

18 0.079388306 78 nips-2008-Exact Convex Confidence-Weighted Learning

19 0.079242319 204 nips-2008-Self-organization using synaptic plasticity

20 0.077459112 193 nips-2008-Regularized Co-Clustering with Dual Supervision


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.234), (1, -0.055), (2, -0.088), (3, 0.078), (4, 0.038), (5, -0.065), (6, 0.012), (7, -0.085), (8, -0.067), (9, -0.144), (10, -0.05), (11, 0.055), (12, 0.174), (13, 0.024), (14, -0.104), (15, 0.124), (16, -0.005), (17, 0.04), (18, 0.043), (19, -0.097), (20, 0.132), (21, -0.116), (22, -0.171), (23, 0.076), (24, 0.076), (25, -0.231), (26, 0.055), (27, 0.111), (28, 0.045), (29, 0.006), (30, -0.14), (31, 0.051), (32, -0.088), (33, 0.025), (34, 0.044), (35, 0.012), (36, -0.121), (37, 0.066), (38, 0.003), (39, 0.057), (40, -0.048), (41, 0.141), (42, -0.063), (43, -0.114), (44, 0.008), (45, -0.031), (46, 0.016), (47, -0.086), (48, -0.012), (49, -0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96987021 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations

Author: David Sontag, Amir Globerson, Tommi S. Jaakkola

Abstract: We propose a new class of consistency constraints for Linear Programming (LP) relaxations for finding the most probable (MAP) configuration in graphical models. Usual cluster-based LP relaxations enforce joint consistency on the beliefs of a cluster of variables, with computational cost increasing exponentially with the size of the clusters. By partitioning the state space of a cluster and enforcing consistency only across partitions, we obtain a class of constraints which, although less tight, are computationally feasible for large clusters. We show how to solve the cluster selection and partitioning problem monotonically in the dual LP, using the current beliefs to guide these choices. We obtain a dual message passing algorithm and apply it to protein design problems where the variables have large state spaces and the usual cluster-based relaxations are very costly. The resulting method solves many of these problems exactly, and significantly faster than a method that does not use partitioning. 1

2 0.56800425 89 nips-2008-Gates

Author: Tom Minka, John Winn

Abstract: Gates are a new notation for representing mixture models and context-sensitive independence in factor graphs. Factor graphs provide a natural representation for message-passing algorithms, such as expectation propagation. However, message passing in mixture models is not well captured by factor graphs unless the entire mixture is represented by one factor, because the message equations have a containment structure. Gates capture this containment structure graphically, allowing both the independences and the message-passing equations for a model to be readily visualized. Different variational approximations for mixture models can be understood as different ways of drawing the gates in a model. We present general equations for expectation propagation and variational message passing in the presence of gates. 1

3 0.54672951 40 nips-2008-Bounds on marginal probability distributions

Author: Joris M. Mooij, Hilbert J. Kappen

Abstract: We propose a novel bound on single-variable marginal probability distributions in factor graphs with discrete variables. The bound is obtained by propagating local bounds (convex sets of probability distributions) over a subtree of the factor graph, rooted in the variable of interest. By construction, the method not only bounds the exact marginal probability distribution of a variable, but also its approximate Belief Propagation marginal (“belief”). Thus, apart from providing a practical means to calculate bounds on marginals, our contribution also lies in providing a better understanding of the error made by Belief Propagation. We show that our bound outperforms the state-of-the-art on some inference problems arising in medical diagnosis. 1

4 0.54317242 53 nips-2008-Counting Solution Clusters in Graph Coloring Problems Using Belief Propagation

Author: Lukas Kroc, Ashish Sabharwal, Bart Selman

Abstract: We show that an important and computationally challenging solution space feature of the graph coloring problem (COL), namely the number of clusters of solutions, can be accurately estimated by a technique very similar to one for counting the number of solutions. This cluster counting approach can be naturally written in terms of a new factor graph derived from the factor graph representing the COL instance. Using a variant of the Belief Propagation inference framework, we can efficiently approximate cluster counts in random COL problems over a large range of graph densities. We illustrate the algorithm on instances with up to 100, 000 vertices. Moreover, we supply a methodology for computing the number of clusters exactly using advanced techniques from the knowledge compilation literature. This methodology scales up to several hundred variables. 1

5 0.53600717 50 nips-2008-Continuously-adaptive discretization for message-passing algorithms

Author: Michael Isard, John MacCormick, Kannan Achan

Abstract: Continuously-Adaptive Discretization for Message-Passing (CAD-MP) is a new message-passing algorithm for approximate inference. Most message-passing algorithms approximate continuous probability distributions using either: a family of continuous distributions such as the exponential family; a particle-set of discrete samples; or a fixed, uniform discretization. In contrast, CAD-MP uses a discretization that is (i) non-uniform, and (ii) adaptive to the structure of the marginal distributions. Non-uniformity allows CAD-MP to localize interesting features (such as sharp peaks) in the marginal belief distributions with time complexity that scales logarithmically with precision, as opposed to uniform discretization which scales at best linearly. We give a principled method for altering the non-uniform discretization according to information-based measures. CAD-MP is shown in experiments to estimate marginal beliefs much more precisely than competing approaches for the same computational expense. 1

6 0.4992457 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning

7 0.498216 104 nips-2008-Improved Moves for Truncated Convex Models

8 0.4882417 199 nips-2008-Risk Bounds for Randomized Sample Compressed Classifiers

9 0.45510072 48 nips-2008-Clustering via LP-based Stabilities

10 0.44183809 216 nips-2008-Sparse probabilistic projections

11 0.42099896 47 nips-2008-Clustered Multi-Task Learning: A Convex Formulation

12 0.42050546 29 nips-2008-Automatic online tuning for fast Gaussian summation

13 0.40926403 2 nips-2008-A Convex Upper Bound on the Log-Partition Function for Binary Distributions

14 0.40541324 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE

15 0.37436149 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data

16 0.37104616 183 nips-2008-Predicting the Geometry of Metal Binding Sites from Protein Sequence

17 0.35640264 176 nips-2008-Partially Observed Maximum Entropy Discrimination Markov Networks

18 0.34976935 117 nips-2008-Learning Taxonomies by Dependence Maximization

19 0.34637263 55 nips-2008-Cyclizing Clusters via Zeta Function of a Graph

20 0.34265175 245 nips-2008-Unlabeled data: Now it helps, now it doesn't


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(6, 0.069), (7, 0.068), (12, 0.044), (28, 0.197), (34, 0.214), (57, 0.064), (59, 0.012), (63, 0.032), (71, 0.02), (77, 0.109), (78, 0.017), (83, 0.057)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.84649962 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations

Author: David Sontag, Amir Globerson, Tommi S. Jaakkola

Abstract: We propose a new class of consistency constraints for Linear Programming (LP) relaxations for finding the most probable (MAP) configuration in graphical models. Usual cluster-based LP relaxations enforce joint consistency on the beliefs of a cluster of variables, with computational cost increasing exponentially with the size of the clusters. By partitioning the state space of a cluster and enforcing consistency only across partitions, we obtain a class of constraints which, although less tight, are computationally feasible for large clusters. We show how to solve the cluster selection and partitioning problem monotonically in the dual LP, using the current beliefs to guide these choices. We obtain a dual message passing algorithm and apply it to protein design problems where the variables have large state spaces and the usual cluster-based relaxations are very costly. The resulting method solves many of these problems exactly, and significantly faster than a method that does not use partitioning. 1

2 0.83921933 125 nips-2008-Local Gaussian Process Regression for Real Time Online Model Learning

Author: Duy Nguyen-tuong, Jan R. Peters, Matthias Seeger

Abstract: Learning in real-time applications, e.g., online approximation of the inverse dynamics model for model-based robot control, requires fast online regression techniques. Inspired by local learning, we propose a method to speed up standard Gaussian process regression (GPR) with local GP models (LGP). The training data is partitioned in local regions, for each an individual GP model is trained. The prediction for a query point is performed by weighted estimation using nearby local models. Unlike other GP approximations, such as mixtures of experts, we use a distance based measure for partitioning of the data and weighted prediction. The proposed method achieves online learning and prediction in real-time. Comparisons with other non-parametric regression methods show that LGP has higher accuracy than LWPR and close to the performance of standard GPR and ν-SVR. 1

3 0.80788285 202 nips-2008-Robust Regression and Lasso

Author: Huan Xu, Constantine Caramanis, Shie Mannor

Abstract: We consider robust least-squares regression with feature-wise disturbance. We show that this formulation leads to tractable convex optimization problems, and we exhibit a particular uncertainty set for which the robust problem is equivalent to 1 regularized regression (Lasso). This provides an interpretation of Lasso from a robust optimization perspective. We generalize this robust formulation to consider more general uncertainty sets, which all lead to tractable convex optimization problems. Therefore, we provide a new methodology for designing regression algorithms, which generalize known formulations. The advantage is that robustness to disturbance is a physical property that can be exploited: in addition to obtaining new formulations, we use it directly to show sparsity properties of Lasso, as well as to prove a general consistency result for robust regression problems, including Lasso, from a unified robustness perspective. 1

4 0.76729977 216 nips-2008-Sparse probabilistic projections

Author: Cédric Archambeau, Francis R. Bach

Abstract: We present a generative model for performing sparse probabilistic projections, which includes sparse principal component analysis and sparse canonical correlation analysis as special cases. Sparsity is enforced by means of automatic relevance determination or by imposing appropriate prior distributions, such as generalised hyperbolic distributions. We derive a variational Expectation-Maximisation algorithm for the estimation of the hyperparameters and show that our novel probabilistic approach compares favourably to existing techniques. We illustrate how the proposed method can be applied in the context of cryptoanalysis as a preprocessing tool for the construction of template attacks. 1

5 0.74818164 87 nips-2008-Fitted Q-iteration by Advantage Weighted Regression

Author: Gerhard Neumann, Jan R. Peters

Abstract: Recently, fitted Q-iteration (FQI) based methods have become more popular due to their increased sample efficiency, a more stable learning process and the higher quality of the resulting policy. However, these methods remain hard to use for continuous action spaces which frequently occur in real-world tasks, e.g., in robotics and other technical applications. The greedy action selection commonly used for the policy improvement step is particularly problematic as it is expensive for continuous actions, can cause an unstable learning process, introduces an optimization bias and results in highly non-smooth policies unsuitable for real-world systems. In this paper, we show that by using a soft-greedy action selection the policy improvement step used in FQI can be simplified to an inexpensive advantageweighted regression. With this result, we are able to derive a new, computationally efficient FQI algorithm which can even deal with high dimensional action spaces. 1

6 0.74478209 195 nips-2008-Regularized Policy Iteration

7 0.74439305 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning

8 0.74408752 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE

9 0.7398349 37 nips-2008-Biasing Approximate Dynamic Programming with a Lower Discount Factor

10 0.73921669 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization

11 0.73916042 200 nips-2008-Robust Kernel Principal Component Analysis

12 0.73855937 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization

13 0.73760909 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations

14 0.73700625 231 nips-2008-Temporal Dynamics of Cognitive Control

15 0.73659372 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning

16 0.73618037 245 nips-2008-Unlabeled data: Now it helps, now it doesn't

17 0.73547709 81 nips-2008-Extracting State Transition Dynamics from Multiple Spike Trains with Correlated Poisson HMM

18 0.73526007 238 nips-2008-Theory of matching pursuit

19 0.73510158 173 nips-2008-Optimization on a Budget: A Reinforcement Learning Approach

20 0.73475033 50 nips-2008-Continuously-adaptive discretization for message-passing algorithms