nips nips2011 nips2011-170 knowledge-graph by maker-knowledge-mining

170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables


Source: pdf

Author: Jiarong Jiang, Piyush Rai, Hal Daume

Abstract: We consider a general inference setting for discrete probabilistic graphical models where we seek maximum a posteriori (MAP) estimates for a subset of the random variables (max nodes), marginalizing over the rest (sum nodes). We present a hybrid message-passing algorithm to accomplish this. The hybrid algorithm passes a mix of sum and max messages depending on the type of source node (sum or max). We derive our algorithm by showing that it falls out as the solution of a particular relaxation of a variational framework. We further show that the Expectation Maximization algorithm can be seen as an approximation to our algorithm. Experimental results on synthetic and real-world datasets, against several baselines, demonstrate the efficacy of our proposed algorithm. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We present a hybrid message-passing algorithm to accomplish this. [sent-10, score-0.385]

2 The hybrid algorithm passes a mix of sum and max messages depending on the type of source node (sum or max). [sent-11, score-1.063]

3 A typical example is the minimum Bayes risk (MBR) problem [1], where the goal is to find an assignment $\hat{x}$ which optimizes a loss $\ell(\hat{x}, x)$ with regard to some (usually unknown) truth x. [sent-18, score-0.197]

4 Although the specific problems of estimating marginals and estimating MAP individually have been studied extensively [2, 3, 4], similar developments for the more general problem of simultaneous marginal and MAP estimation are lacking. [sent-20, score-0.176]

5 More recently, [5] proposed a method based on optimizing a variational objective on specific graph structures; it was developed concurrently with the method we propose in this paper (please refer to the supplementary material for further details and other related work). [sent-21, score-0.177]

6 Motivated by this problem, we propose a hybrid message passing algorithm which is both intuitive and justified according to variational principles. [sent-26, score-0.633]

7 Our hybrid message passing algorithm uses a mix of sum and max messages with the message type depending on the source node type. [sent-27, score-1.373]

8 Our estimates can be further improved by a few steps of local search [6]. [sent-29, score-0.127]

9 Therefore, using the solution found by our hybrid algorithm to initialize a local search algorithm substantially improves both accuracy and convergence speed, compared to the greedy stochastic search method described in [6]. [sent-30, score-0.566]

10 2 Problem Setting In our setting, the nodes in a graphical model with discrete random variables are divided into two sets: max and sum nodes. [sent-34, score-0.555]

11 We denote a graph G = (V, E), V = X ∪ Z where X is the set of nodes for which we want to compute the MAP assignment (max nodes), and Z is the set of nodes for which we need the marginals (sum nodes). [sent-35, score-0.799]

12 , zn } (zs ∈ Zs ) be the random variables associated with the nodes in X and Z respectively. [sent-42, score-0.231]

13 The exponential family distribution p over these random variables is defined as follows: $p_\theta(x, z) = \exp[\langle\theta, \phi(x, z)\rangle - A(\theta)]$, where $\phi(x, z)$ are the sufficient statistics of the enumeration of all node assignments, and θ is the vector of canonical or exponential parameters. [sent-43, score-0.183]

14 In this paper, we consider only pairwise node interactions and use standard overcomplete representation of the sufficient statistics [10] (defined by indicator function I later). [sent-45, score-0.156]
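
To make the overcomplete representation concrete, the inner product in the exponential family above expands into node and edge indicator terms; this is the standard form (cf. [10]), written out here only for clarity:

```latex
% Standard overcomplete representation for pairwise models (cf. [10]);
% v_s denotes the assignment of node s (an x or z variable).
\langle \theta, \phi(x, z) \rangle
   = \sum_{s \in V} \sum_{i} \theta_{s;i}\, \mathbb{I}[v_s = i]
   + \sum_{(s,t) \in E} \sum_{i, j} \theta_{st;ij}\, \mathbb{I}[v_s = i,\, v_t = j]
```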

15 The general MAP problem can be formalized as the following maximization problem: $x^* = \arg\max_x \sum_z p_\theta(x, z)$ (1), with corresponding marginal probabilities of the z nodes given $x^*$. [sent-46, score-0.284]

16 zs , xs are sum and max random variables respectively, associated with some node s. [sent-52, score-1.019]

17 vs can be either a sum (zs) or a max (xs) random variable, associated with some node s. [sent-53, score-0.736]

18 Xs , Zs , Vs are the state spaces from which xs , zs , vs take values. [sent-55, score-0.865]

19 1 Message Passing Algorithms The sum-product and max-product algorithms are standard message-passing algorithms for inferring marginal and MAP estimates respectively in probabilistic graphical models. [sent-57, score-0.119]

20 Their idea is to store a belief state associated with each node and to iteratively pass messages between adjacent nodes, which are used to update the belief states. [sent-58, score-0.359]

21 In the standard sum-product algorithm, the message $M_{ts}$ passed from node t to one of its neighbors s is as follows: $M_{ts}(v_s) \leftarrow \kappa \sum_{v'_t \in V_t} \Big\{ \exp[\theta_{st}(v_s, v'_t) + \theta_t(v'_t)] \prod_{u \in N(t)\setminus s} M_{ut}(v'_t) \Big\}$ (3), where κ is a normalization constant. [sent-61, score-0.833]

22 Once {Mts, Mst} no longer change for any pair of nodes s and t, the belief (pseudomarginal distribution) for node s is given by $\mu_s(v_s) = \kappa \exp\{\theta_s(v_s)\} \prod_{t \in N(s)} M_{ts}(v_s)$. [sent-64, score-0.434]

23 The outgoing messages for the max-product algorithm have the same form, but with a maximization instead of a summation in Eq. (3). [sent-65, score-0.486]

24 After convergence, the MAP assignment for each node is the assignment with the highest max-marginal probability. [sent-67, score-0.42]
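
As an illustrative sketch of the update in Eq. (3) and the belief computation (not code from the paper; the array-shape conventions and the helper names pass_message/belief are assumptions):

```python
import numpy as np

def pass_message(t, s, theta_t, theta_st, incoming, mode="sum"):
    """Message M_ts(v_s) from source node t to destination node s, as in Eq. (3).

    theta_t:  shape (|V_t|,), node potential of t
    theta_st: shape (|V_s|, |V_t|), edge potential between s and t
    incoming: messages M_ut(v_t) from u in N(t)\\{s}, each of shape (|V_t|,)
    mode:     "sum" for sum-product, "max" for max-product
    """
    contrib = np.exp(theta_st + theta_t[None, :])        # exp[theta_st + theta_t]
    for M_ut in incoming:
        contrib = contrib * M_ut[None, :]                # product of incoming messages
    msg = contrib.sum(axis=1) if mode == "sum" else contrib.max(axis=1)
    return msg / msg.sum()                               # kappa: normalization

def belief(theta_s, msgs_to_s):
    """Pseudomarginal mu_s(v_s) = kappa * exp(theta_s(v_s)) * prod_t M_ts(v_s)."""
    b = np.exp(theta_s)
    for M_ts in msgs_to_s:
        b = b * M_ts
    return b / b.sum()
```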

25 On loopy graphs, tree-reweighted sum- and max-product [13, 14] can help find upper bounds for the marginal or MAP problem. [sent-68, score-0.458]

26 They decompose the loopy graph into several spanning trees and reweight the messages by the edge appearance probability. [sent-69, score-0.335]

27 2 Local Search Algorithm Eq (1) can be viewed as performing variable elimination over the z nodes first, followed by a maximization over x. [sent-71, score-0.3]

28 Its maximization step may be performed using heuristic search techniques [7, 6]. [sent-72, score-0.128]

29 In [6], the assignments for the MAP nodes are found by greedily searching for the best neighboring assignment, which differs in only one node. [sent-74, score-0.434]

30 However, the hybrid algorithm we propose allows simultaneously approximating both Eq (1) and Eq (2). [sent-75, score-0.385]

31 3 HYBRID MESSAGE PASSING In our setting, we wish to compute MAP estimates for one set of nodes and marginals for the rest. [sent-76, score-0.379]

32 One possible approach is to run standard sum/max product algorithms over the graph, and find the most-likely assignment for each max node according to the maximum of sum or max marginals1 . [sent-77, score-0.772]

33 These naïve approaches have their own shortcomings; for example, although using standard max-product may perform reasonably when there are many max nodes, it inevitably ignores the effect of sum nodes, which should ideally be summed over. [sent-78, score-0.581]

34 1 ALGORITHM We now present a hybrid message-passing algorithm which passes sum-style or max-style messages based on the type of nodes from which the message originates. [sent-82, score-0.988]

35 In the hybrid message-passing algorithm, a sum node sends sum messages to its neighbors and a max node sends max messages. [sent-83, score-1.549]

36 The type of message passed depends on the type of source node, not the destination node. [sent-84, score-0.167]

37 Algo 1 shows the procedure to do hybrid message-passing. [sent-86, score-0.385]

38 For each node s ∈ V in G, do the following until messages converge (or maximum number of iterations reached) • If s ∈ X, update messages by Eq. [sent-91, score-0.552]

39 When there is only a single type of node in the graph, the hybrid algorithm reduces to the standard max or sum-product algorithm. [sent-100, score-0.675]

40 Otherwise, it passes different messages simultaneously and gives an approximation to the MAP assignment on max nodes as well as the marginals on sum nodes. [sent-101, score-1.005]
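
A minimal sketch of this hybrid schedule (Algo 1), reusing the illustrative pass_message and belief helpers above; the dictionary-based inputs and the convention that theta_edge[(a, b)] has shape (|V_a|, |V_b|) with both orderings stored are assumptions, not the authors' implementation:

```python
def hybrid_message_passing(nodes, neighbors, theta, theta_edge, max_nodes,
                           n_iters=50):
    """Hybrid schedule: each node sends messages of its own type,
    max-style from max nodes and sum-style from sum nodes."""
    # initialize all directed messages uniformly
    M = {(t, s): np.ones(len(theta[s])) / len(theta[s])
         for t in nodes for s in neighbors[t]}
    for _ in range(n_iters):
        for t in nodes:                                   # t = source node
            mode = "max" if t in max_nodes else "sum"     # type depends on the source
            for s in neighbors[t]:                        # s = destination node
                incoming = [M[(u, t)] for u in neighbors[t] if u != s]
                M[(t, s)] = pass_message(t, s, theta[t],
                                         theta_edge[(s, t)], incoming, mode)
    # decode: MAP assignment on max nodes, pseudomarginals on sum nodes
    beliefs = {s: belief(theta[s], [M[(t, s)] for t in neighbors[s]])
               for s in nodes}
    x_map = {s: int(np.argmax(beliefs[s])) for s in max_nodes}
    return x_map, beliefs
```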

41 On loopy graphs, we can also apply this scheme and pass hybrid tree-reweighted messages between nodes to obtain marginal and MAP estimates. [sent-102, score-0.922]

42 (See Appendix C of the supplementary material) 1 Running the standard sum-product algorithm and choosing the maximum likelihood assignment for the max nodes is also called maximum marginal decoding [15, 16]. [sent-103, score-0.604]

43 2 VARIATIONAL DERIVATION In this section, we show that the Marginal-MAP problem can be framed under a variational framework, and the hybrid message passing algorithm turns out to be a solution of it. [sent-105, score-0.633]

44 Solving the Marginal-MAP problem is therefore equivalent to solving the following optimization problem: $\max_{\bar{x} \in X} \sup_{\mu_{\mathrm{other}} \in M(G_{\bar{x}})} \langle\theta, \mu\rangle + H_{\mathrm{Bethe}}(\mu) \approx \sup_{\mu_{\max} \in M_x} \sup_{\mu_{\mathrm{other}} \in M(G_{\bar{x}})} \langle\theta, \mu\rangle + H_{\mathrm{Bethe}}(\mu)$ (10), where $\mu_{\mathrm{other}}$ contains all other node/pairwise marginals except $\mu_{\max}$. [sent-112, score-0.377]

45 For the mutual information between different types of nodes, we can either force xs to have integral solutions, relax xs to have non-integral solutions, or relax xs in one direction2. [sent-114, score-1.026]

46 Finally, we only require the sum nodes to satisfy normalization and marginalization conditions; the entropy of the sum nodes, the mutual information between sum nodes, and the mutual information from sum nodes to max nodes can be nonzero. [sent-116, score-1.384]

47 2 This results in four different relaxations for different combinations of message types; the hybrid algorithm performed empirically the best. [sent-118, score-0.769]

48 $\sup_{\mu_{\max} \in M_x} \sup_{\mu_{\mathrm{others}} \in M(G_{\bar{x}})} \langle\theta, \mu\rangle + H(\mu_{\mathrm{sum}}) - I(\mu_{\mathrm{sum}\to\mathrm{sum}}) - I(\mu_{\mathrm{sum}\to\mathrm{max}})$. Further relaxing the $\mu_{x_s}$ to have non-integral solutions, define $L(G) = \{\mu \ge 0 \mid \sum_{v_s} \mu_s(v_s) = 1,\ \sum_{v_t} \mu_{st}(v_s, v_t) = \mu_s(v_s),\ \sum_{v_s} \mu_{st}(v_s, v_t) = \mu_t(v_t)\}$. Finally we get $\sup_{\mu \in L(G)} \ldots$ [sent-120, score-1.071]

49 This can be done by fixing the x values at their MAP assignments and running the sum-product algorithm over the resulting graph. The M-step works by maximizing $E_{p_\theta(z \mid \bar{x})} \log p_\theta(x, z)$, where $\bar{x}$ is the assignment given by the previous M-step. [sent-137, score-0.267]

50 This is equivalent to maximizing $E_{z \sim p_\theta(z \mid \bar{x})} \log p_\theta(x \mid z)$, as the $\log p_\theta(z)$ term in the maximization is independent of x. [sent-138, score-0.165]

51 Then, the M-step amounts to running the max-product algorithm with the potentials on the x nodes modified according to Eq. [sent-140, score-0.45]

52 Summarizing, the EM algorithm for solving marginal-MAP estimation can be interpreted as follows: • E-step: Fix xs to be the MAP assignment value from iteration (k − 1) and run sum-product to get beliefs on the sum nodes zs, say µt, t ∈ Z. [sent-142, score-1.063]

53 • M-step: Build a new graph $\tilde{G} = (\tilde{V}, \tilde{E})$ containing only the max nodes. [sent-145, score-0.269]

54 For each max node s in the graph, set its potential to $\tilde{\theta}_{s;i} = \theta_{s;i} + \sum_j \theta_{st;ij}\, \mu_{t;j}$, where $t \in Z$ and $(s, t) \in E$. [sent-147, score-0.29]

55 Run max product over this new graph and update the MAP assignment. [sent-149, score-0.28]
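
Putting these E- and M-steps together, a rough sketch of this EM procedure is given below; it reuses the illustrative hybrid_message_passing routine above as a generic sum-/max-product subroutine on subgraphs and is only an illustration of the summarized steps, not the authors' code:

```python
def em_marginal_map(neighbors, theta, theta_edge, max_nodes, sum_nodes,
                    n_em_iters=20):
    """EM-style marginal-MAP: alternate sum-product on the sum nodes (x clamped)
    with max-product on a reduced graph of max nodes only."""
    x = {s: 0 for s in max_nodes}                        # arbitrary initial assignment
    sum_nbrs = {t: [u for u in neighbors[t] if u in sum_nodes] for t in sum_nodes}
    max_nbrs = {s: [u for u in neighbors[s] if u in max_nodes] for s in max_nodes}
    mu = {}
    for _ in range(n_em_iters):
        # E-step: clamp the max nodes at x, fold them into adjacent sum-node
        # potentials, then run plain sum-product over the sum-node subgraph
        theta_z = {t: np.array(theta[t], dtype=float) for t in sum_nodes}
        for t in sum_nodes:
            for s in neighbors[t]:
                if s in max_nodes:
                    theta_z[t] = theta_z[t] + theta_edge[(t, s)][:, x[s]]
        _, mu = hybrid_message_passing(list(sum_nodes), sum_nbrs, theta_z,
                                       theta_edge, max_nodes=set())
        # M-step: theta_s;i <- theta_s;i + sum_j theta_st;ij * mu_t;j, drop the
        # sum nodes, then run max-product (every remaining node is a max node)
        theta_x = {s: np.array(theta[s], dtype=float) for s in max_nodes}
        for s in max_nodes:
            for t in neighbors[s]:
                if t in sum_nodes:
                    theta_x[s] = theta_x[s] + theta_edge[(s, t)] @ mu[t]
        x, _ = hybrid_message_passing(list(max_nodes), max_nbrs, theta_x,
                                      theta_edge, max_nodes=set(max_nodes))
    return x, mu
```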

56 2 Relationship with the Hybrid Algorithm Apart from the fact that the hybrid algorithm passes different messages simultaneously while EM does so iteratively, to see the connection with the hybrid algorithm let us first consider the message passed in the E-step at iteration k. [sent-151, score-1.17]

57 The xs are fixed at the last assignment which maximizes the messages at iteration (k − 1), denoted as $x^*$ here. [sent-152, score-0.6]

58 The Mut are the messages computed at iteration k − 1. [sent-153, score-0.198]

59 $M_{ts}^{(k)}(z_s) = \kappa_1 \Big\{\exp[\theta_{st}(z_s, x_t^*) + \theta_t(x_t^*)] \prod_{u \in N(t)\setminus s} M_{ut}^{(k-1)}(x_t^*)\Big\}$ (13). Now assume there exists an iterative algorithm which, at each iteration, computes the messages used in both steps of the message-passing variant of the EM algorithm, denoted $\tilde{M}_{ts}$. [sent-154, score-0.198]

60 Eq (13) then becomes $\tilde{M}_{ts}^{(k)}(z_s) = \kappa_1 \max_{x'_t} \Big\{\exp[\theta_{st}(z_s, x'_t) + \theta_t(x'_t)] \prod_{u \in N(t)\setminus s} \tilde{M}_{ut}^{(k-1)}(x'_t)\Big\}$. So the max nodes (the x's) should pass max messages to their neighbors (the z's), which is what the hybrid message-passing algorithm does. [sent-155, score-1.108]

61 In the M-step, all the sum nodes t are removed from the graph and the parameters of the adjacent max nodes are modified as $\theta_{s;i} = \theta_{s;i} + \sum_j \theta_{st;ij}\, \mu_{t;j}$. [sent-157, score-0.836]

62 µt is computed by the sum product at the E-step of iteration k, and these sum messages are used (in form of the marginals µt ) in the subsequent M-step (with the sum nodes removed). [sent-158, score-1.075]

63 However, a max node may prefer different assignments according to different neighboring nodes. [sent-159, score-0.361]

64 In comparison, the hybrid message passing algorithm passes mixed messages instead of making deterministic assignments in each iteration. [sent-161, score-0.895]

65 Given this loss function, the minimum Bayes risk solution is the minimizer of Eq (14): $\mathrm{MBR}_\theta = \arg\min_{\hat{x}} E_{x \sim p}[\ell(\hat{x}, x)] = \arg\min_{\hat{x}} \sum_x p(x)\, \ell(\hat{x}, x)$ (14). We now assume that ℓ decomposes over the structure of x. [sent-167, score-0.115]

66 In particular, suppose that $\ell(\hat{x}, x) = \sum_{c \in C} \ell(\hat{x}_c, x_c)$, where C is some set of cliques in x, and $x_c$ denotes the variables associated with that clique. [sent-168, score-0.12]

67 Therefore, we can apply our hybrid algorithm to solve the MBR problem. [sent-176, score-0.385]
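
As a standard special case (not taken from the paper), if ℓ is the node-wise Hamming loss, the MBR objective in Eq. (14) decomposes into per-node decisions under the marginals, i.e., maximum marginal decoding:

```latex
% Node-wise Hamming loss:  \ell(\hat{x}, x) = \sum_{s} \mathbb{I}[\hat{x}_s \neq x_s]
\mathbb{E}_{x \sim p}\big[\ell(\hat{x}, x)\big]
   = \sum_{s}\big(1 - p(x_s = \hat{x}_s)\big)
   = \sum_{s}\big(1 - \mu_s(\hat{x}_s)\big),
\qquad
\hat{x}_s^{\mathrm{MBR}} = \arg\max_{i}\, \mu_s(i).
```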

68 1 Synthetic Data For synthetic data, we first take a 10-node chain graph with varying splits of sum vs max nodes, and random potentials. [sent-189, score-0.723]

69 The node and the edge potentials are drawn from Uniform(0,1), and we randomly pick nodes in the graph to be sum or max nodes. [sent-191, score-0.785]

70 For this small graph, the true assignment is computable by explicitly maximizing $p(x) = \sum_z p(x, z) = \frac{1}{Z} \sum_z \prod_{s \in V} \psi_s(v_s) \prod_{(s,t) \in E} \psi_{st}(v_s, v_t)$, where Z is a normalization constant and $\psi_s(v_s) = \exp \theta_s(v_s)$. [sent-192, score-0.337]
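
On a graph this small, the ground truth can be computed by brute force; a sketch follows, where joint(x, z) is a hypothetical helper returning the unnormalized product of potentials (the normalizer Z cancels inside the argmax):

```python
from itertools import product as cartesian

def true_marginal_map(x_states, z_states, joint):
    """Exact ground truth on a small graph: x* = argmax_x sum_z p(x, z).

    x_states, z_states: per-node state spaces (lists of lists) for max / sum nodes
    joint(x, z):        unnormalized joint, prod_s psi_s * prod_(s,t) psi_st
    """
    best_x, best_score = None, float("-inf")
    for x in cartesian(*x_states):
        score = sum(joint(x, z) for z in cartesian(*z_states))
        if score > best_score:
            best_x, best_score = x, score
    return best_x
```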

71 Assume that the aforementioned maximization gives the assignment $x^* = (x_1^*, \ldots)$. [sent-194, score-0.201]

72–74 [Figure residue: 0/1 loss and Hamming loss on a 10-node chain graph; legend: max, sum, hybrid, EM, max+local search, sum+local search, hybrid+local search.] [sent-202–209]

75 The figure shows the loss on the assignment of the max nodes. [sent-233, score-0.304]

76 In the figure, as the number of sum nodes goes up, the accuracy of the standard sum-product based estimation (sum) gets better, whereas the accuracy of standard max-product based estimation (max) worsens. [sent-234, score-0.386]

77 However, our hybrid message-passing algorithm (hybrid) on average results in the lowest loss compared to the other baselines, with running times similar to the sum/max-product algorithms. [sent-235, score-0.519]

78 As shown in [6], local search with sum product initialization empirically performs better than with max product, so later on, we only compare the results with local search using sum product initialization (LS). [sent-237, score-0.764]

79 Starting from the hybrid algorithm's results (the best of the three initialization methods), the search algorithm needs only very few steps. [Figure residue: log-likelihood of the assignments normalized by the hybrid algorithm.] [sent-238, score-1.44]

80 In particular, it only takes 1 or 2 steps of search in the 10-node chain case and 1 to 3 steps in the 50-node tree case. [sent-263, score-0.093]

81 Fig 2 shows the mean KL-divergence of the marginals for the three message-passing algorithms (each averaged over 100 random experiments) compared to the true marginals of p(z|x). [sent-265, score-0.24]

82 The x-axis shows the percentage of sum nodes in the graph. [sent-267, score-0.386]

83 Just like in the MAP case, our hybrid method consistently produces the smallest KL-divergence compared to others. [sent-268, score-0.385]

84 When the computation of the truth is intractable, the log-likelihood of the assignment can be approximated by the log-partition function with the Bethe approximation according to Sec. [sent-269, score-0.132]

85 Here, we use a 50-node tree with binary node states and an 8 × 10 grid with varying numbers of states, 1 ≤ |Ys| ≤ 20. [sent-273, score-0.182]

86 On the grid graph, we apply tree-reweighted sum or max product [14, 13], and our hybrid version based on TRBP. [sent-274, score-0.761]

87 For the edge appearance probabilities in TRBP, we apply a common approach that uses a greedy algorithm to find spanning trees with as many uncovered edges as possible, until all the edges in the graph are covered at least once. [sent-275, score-0.108]
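
A sketch of this greedy tree-cover heuristic (an assumed rendering of the common approach, not the authors' exact procedure): repeatedly build a spanning tree that prefers not-yet-covered edges, then set each edge's appearance probability to the fraction of trees containing it.

```python
def greedy_tree_cover(nodes, edges):
    """Cover all edges with greedily chosen spanning trees and return the
    per-edge appearance probability (fraction of trees containing the edge)."""
    uncovered, trees = set(edges), []
    while uncovered:
        parent = {v: v for v in nodes}          # union-find for Kruskal

        def find(v):
            while parent[v] != v:
                parent[v] = parent[parent[v]]
                v = parent[v]
            return v

        tree = []
        # process not-yet-covered edges first so the tree covers as many as possible
        for (u, v) in sorted(edges, key=lambda e: e not in uncovered):
            ru, rv = find(u), find(v)
            if ru != rv:
                parent[ru] = rv
                tree.append((u, v))
        trees.append(tree)
        uncovered -= set(tree)
    appearance = {e: sum(e in t for t in trees) / len(trees) for e in edges}
    return trees, appearance
```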

88 Even if the message-passing algorithms are not guaranteed to converge on loopy graphs, we can still compare the best result they provide after a certain number of iterations. [sent-276, score-0.087]

89 In the tree case, as expected, using hybrid message-passing algorithms’s result to initialize the local search algorithm performs the best. [sent-278, score-0.484]

90 On the grid graph, the local search algorithm initialized with the sum-product results works well when there are few max nodes, but since the search space grows exponentially with the number of max nodes, it can take hundreds of steps to find the optimum. [sent-279, score-0.668]

91 On the other hand, because the hybrid TRBP starts in a good area, it consistently achieves the highest likelihood among all four algorithms with fewer extra steps. [sent-280, score-0.411]

92 2 Real-world Data We then experiment with the protein side-chain prediction dataset [23, 24], which consists of a set of protein structures for which we need to find the lowest-energy assignment of rotamer residues. [sent-282, score-0.132]

93–98 [Table 1 residue was interleaved with the following sentences: accuracy on the 1st and the 1st & 2nd angles (χ1 ∧ χ2), columns ALL / SURFACE / CORE, rows sum product, max product, TRBP, hybrid, hybrid TRBP.] The core residues are the residues which are con[…]. Since the MAP results are usually lower on the surface residues than on the core residues [24], we choose the surface residues to be max nodes and the core nodes to be sum nodes. The ground truth is given by the maximum likelihood assignment of the residues, so we do not expect better results on the core nodes, but we hope that any improvement in the accuracy of the surface nodes can make up for the loss on the core nodes and thus give a better performance overall. [sent-290–316]

99 As shown in Table 2, the improvements of the hybrid methods on the surface nodes are more than the loss on the core nodes, thus improving the overall performance. [sent-318, score-0.829]

100 Fixing max-product: Convergent message passing algorithms for MAP LP-relaxations. [sent-326, score-0.335]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hybrid', 0.385), ('xs', 0.329), ('vs', 0.291), ('zs', 0.245), ('nodes', 0.231), ('messages', 0.198), ('gx', 0.189), ('residues', 0.157), ('node', 0.156), ('sum', 0.155), ('mts', 0.154), ('st', 0.145), ('mbr', 0.14), ('message', 0.139), ('zt', 0.136), ('max', 0.134), ('assignment', 0.132), ('map', 0.129), ('trbp', 0.122), ('vt', 0.122), ('marginals', 0.12), ('hbethe', 0.105), ('mut', 0.105), ('eq', 0.087), ('graph', 0.085), ('em', 0.077), ('mx', 0.075), ('assignments', 0.071), ('maximization', 0.069), ('passing', 0.067), ('product', 0.061), ('xc', 0.06), ('core', 0.06), ('search', 0.059), ('marginal', 0.056), ('tommi', 0.053), ('ist', 0.052), ('loopy', 0.052), ('surface', 0.051), ('hamming', 0.051), ('ls', 0.051), ('belief', 0.047), ('variational', 0.042), ('sup', 0.041), ('local', 0.04), ('lz', 0.04), ('mutual', 0.039), ('loss', 0.038), ('bethe', 0.038), ('yair', 0.038), ('passes', 0.035), ('jiarong', 0.035), ('messagepassing', 0.035), ('graphical', 0.035), ('ij', 0.034), ('chain', 0.034), ('william', 0.033), ('hs', 0.032), ('log', 0.032), ('martin', 0.032), ('maximizing', 0.032), ('appendix', 0.031), ('maxx', 0.031), ('argmaxx', 0.031), ('yanover', 0.031), ('wainwright', 0.029), ('xt', 0.029), ('derivation', 0.028), ('piyush', 0.028), ('maryland', 0.028), ('passed', 0.028), ('iff', 0.028), ('estimates', 0.028), ('exp', 0.027), ('speech', 0.027), ('risk', 0.027), ('mz', 0.027), ('neighbors', 0.026), ('grid', 0.026), ('likelihood', 0.026), ('material', 0.025), ('ys', 0.025), ('alekh', 0.025), ('sends', 0.025), ('polytope', 0.025), ('supplementary', 0.025), ('baselines', 0.025), ('propagation', 0.025), ('arg', 0.025), ('pradeep', 0.024), ('outgoing', 0.024), ('potentials', 0.024), ('ep', 0.024), ('entropy', 0.024), ('normalization', 0.024), ('synthetic', 0.024), ('greedy', 0.023), ('algo', 0.023), ('hal', 0.023), ('latent', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999952 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables

Author: Jiarong Jiang, Piyush Rai, Hal Daume

Abstract: We consider a general inference setting for discrete probabilistic graphical models where we seek maximum a posteriori (MAP) estimates for a subset of the random variables (max nodes), marginalizing over the rest (sum nodes). We present a hybrid message-passing algorithm to accomplish this. The hybrid algorithm passes a mix of sum and max messages depending on the type of source node (sum or max). We derive our algorithm by showing that it falls out as the solution of a particular relaxation of a variational framework. We further show that the Expectation Maximization algorithm can be seen as an approximation to our algorithm. Experimental results on synthetic and real-world datasets, against several baselines, demonstrate the efficacy of our proposed algorithm. 1

2 0.24215803 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models

Author: Le Song, Eric P. Xing, Ankur P. Parikh

Abstract: Latent tree graphical models are natural tools for expressing long range and hierarchical dependencies among many variables which are common in computer vision, bioinformatics and natural language processing problems. However, existing models are largely restricted to discrete and Gaussian variables due to computational constraints; furthermore, algorithms for estimating the latent tree structure and learning the model parameters are largely restricted to heuristic local search. We present a method based on kernel embeddings of distributions for latent tree graphical models with continuous and non-Gaussian variables. Our method can recover the latent tree structures with provable guarantees and perform local-minimum free parameter learning and efficient inference. Experiments on simulated and real data show the advantage of our proposed approach. 1

3 0.15690976 265 nips-2011-Sparse recovery by thresholded non-negative least squares

Author: Martin Slawski, Matthias Hein

Abstract: Non-negative data are commonly encountered in numerous fields, making nonnegative least squares regression (NNLS) a frequently used tool. At least relative to its simplicity, it often performs rather well in practice. Serious doubts about its usefulness arise for modern high-dimensional linear models. Even in this setting − unlike first intuition may suggest − we show that for a broad class of designs, NNLS is resistant to overfitting and works excellently for sparse recovery when combined with thresholding, experimentally even outperforming 1 regularization. Since NNLS also circumvents the delicate choice of a regularization parameter, our findings suggest that NNLS may be the method of choice. 1

4 0.15490539 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions

Author: Rina Foygel, Ohad Shamir, Nati Srebro, Ruslan Salakhutdinov

Abstract: We provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions. We show that the standard weighted-trace norm might fail when the sampling distribution is not a product distribution (i.e. when row and column indexes are not selected independently), present a corrected variant for which we establish strong learning guarantees, and demonstrate that it works better in practice. We provide guarantees when weighting by either the true or empirical sampling distribution, and suggest that even if the true distribution is known (or is uniform), weighting by the empirical distribution may be beneficial. 1

5 0.1192392 158 nips-2011-Learning unbelievable probabilities

Author: Xaq Pitkow, Yashar Ahmadian, Ken D. Miller

Abstract: Loopy belief propagation performs approximate inference on graphical models with loops. One might hope to compensate for the approximation by adjusting model parameters. Learning algorithms for this purpose have been explored previously, and the claim has been made that every set of locally consistent marginals can arise from belief propagation run on a graphical model. On the contrary, here we show that many probability distributions have marginals that cannot be reached by belief propagation using any set of model parameters or any learning algorithm. We call such marginals ‘unbelievable.’ This problem occurs whenever the Hessian of the Bethe free energy is not positive-definite at the target marginals. All learning algorithms for belief propagation necessarily fail in these cases, producing beliefs or sets of beliefs that may even be worse than the pre-learning approximation. We then show that averaging inaccurate beliefs, each obtained from belief propagation using model parameters perturbed about some learned mean values, can achieve the unbelievable marginals. 1

6 0.1124691 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations

7 0.10184041 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions

8 0.08957728 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries

9 0.089536302 188 nips-2011-Non-conjugate Variational Message Passing for Multinomial and Binary Regression

10 0.088119902 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning

11 0.082180843 57 nips-2011-Comparative Analysis of Viterbi Training and Maximum Likelihood Estimation for HMMs

12 0.080623344 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm

13 0.076007247 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition

14 0.0755511 58 nips-2011-Complexity of Inference in Latent Dirichlet Allocation

15 0.075316571 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound

16 0.074685402 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty

17 0.073631875 145 nips-2011-Learning Eigenvectors for Free

18 0.071428448 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods

19 0.070730783 274 nips-2011-Structure Learning for Optimization

20 0.069437206 296 nips-2011-Uniqueness of Belief Propagation on Signed Graphs


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.2), (1, -0.034), (2, -0.046), (3, -0.047), (4, -0.037), (5, -0.148), (6, -0.052), (7, -0.141), (8, 0.096), (9, -0.232), (10, 0.038), (11, -0.064), (12, -0.09), (13, 0.129), (14, -0.085), (15, -0.011), (16, 0.095), (17, -0.095), (18, 0.046), (19, -0.045), (20, -0.12), (21, -0.018), (22, 0.038), (23, -0.034), (24, 0.166), (25, 0.003), (26, 0.019), (27, 0.086), (28, -0.038), (29, 0.067), (30, -0.022), (31, 0.008), (32, 0.069), (33, -0.033), (34, 0.044), (35, 0.132), (36, -0.038), (37, -0.088), (38, -0.022), (39, -0.048), (40, 0.005), (41, 0.06), (42, -0.07), (43, 0.022), (44, -0.008), (45, 0.001), (46, 0.013), (47, 0.165), (48, -0.007), (49, 0.032)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96278781 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables

Author: Jiarong Jiang, Piyush Rai, Hal Daume

Abstract: We consider a general inference setting for discrete probabilistic graphical models where we seek maximum a posteriori (MAP) estimates for a subset of the random variables (max nodes), marginalizing over the rest (sum nodes). We present a hybrid message-passing algorithm to accomplish this. The hybrid algorithm passes a mix of sum and max messages depending on the type of source node (sum or max). We derive our algorithm by showing that it falls out as the solution of a particular relaxation of a variational framework. We further show that the Expectation Maximization algorithm can be seen as an approximation to our algorithm. Experimental results on synthetic and real-world datasets, against several baselines, demonstrate the efficacy of our proposed algorithm. 1

2 0.68672013 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models

Author: Le Song, Eric P. Xing, Ankur P. Parikh

Abstract: Latent tree graphical models are natural tools for expressing long range and hierarchical dependencies among many variables which are common in computer vision, bioinformatics and natural language processing problems. However, existing models are largely restricted to discrete and Gaussian variables due to computational constraints; furthermore, algorithms for estimating the latent tree structure and learning the model parameters are largely restricted to heuristic local search. We present a method based on kernel embeddings of distributions for latent tree graphical models with continuous and non-Gaussian variables. Our method can recover the latent tree structures with provable guarantees and perform local-minimum free parameter learning and efficient inference. Experiments on simulated and real data show the advantage of our proposed approach. 1

3 0.56973934 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions

Author: Rina Foygel, Ohad Shamir, Nati Srebro, Ruslan Salakhutdinov

Abstract: We provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions. We show that the standard weighted-trace norm might fail when the sampling distribution is not a product distribution (i.e. when row and column indexes are not selected independently), present a corrected variant for which we establish strong learning guarantees, and demonstrate that it works better in practice. We provide guarantees when weighting by either the true or empirical sampling distribution, and suggest that even if the true distribution is known (or is uniform), weighting by the empirical distribution may be beneficial. 1

4 0.55487925 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions

Author: Animashree Anandkumar, Vincent Tan, Alan S. Willsky

Abstract: We consider the problem of Ising and Gaussian graphical model selection given n i.i.d. samples from the model. We propose an efficient threshold-based algorithm for structure estimation based on conditional mutual information thresholding. This simple local algorithm requires only loworder statistics of the data and decides whether two nodes are neighbors in the unknown graph. We identify graph families for which the proposed algorithm has low sample and computational complexities. Under some transparent assumptions, we establish that the proposed algorithm is −4 structurally consistent (or sparsistent) when the number of samples scales as n = Ω(Jmin log p), where p is the number of nodes and Jmin is the minimum edge potential. We also develop novel non-asymptotic techniques for obtaining necessary conditions for graphical model selection. Keywords: Graphical model selection, high-dimensional learning, local-separation property, necessary conditions, typical sets, Fano’s inequality.

5 0.54686701 265 nips-2011-Sparse recovery by thresholded non-negative least squares

Author: Martin Slawski, Matthias Hein

Abstract: Non-negative data are commonly encountered in numerous fields, making nonnegative least squares regression (NNLS) a frequently used tool. At least relative to its simplicity, it often performs rather well in practice. Serious doubts about its usefulness arise for modern high-dimensional linear models. Even in this setting − unlike first intuition may suggest − we show that for a broad class of designs, NNLS is resistant to overfitting and works excellently for sparse recovery when combined with thresholding, experimentally even outperforming 1 regularization. Since NNLS also circumvents the delicate choice of a regularization parameter, our findings suggest that NNLS may be the method of choice. 1

6 0.53901356 274 nips-2011-Structure Learning for Optimization

7 0.53798175 31 nips-2011-An Application of Tree-Structured Expectation Propagation for Channel Decoding

8 0.53062332 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm

9 0.51449907 158 nips-2011-Learning unbelievable probabilities

10 0.48983589 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations

11 0.47363347 6 nips-2011-A Global Structural EM Algorithm for a Model of Cancer Progression

12 0.45524669 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound

13 0.44802397 296 nips-2011-Uniqueness of Belief Propagation on Signed Graphs

14 0.43351135 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning

15 0.42110124 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty

16 0.39166567 188 nips-2011-Non-conjugate Variational Message Passing for Multinomial and Binary Regression

17 0.38781434 267 nips-2011-Spectral Methods for Learning Multivariate Latent Tree Structure

18 0.38279542 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries

19 0.37955707 226 nips-2011-Projection onto A Nonnegative Max-Heap

20 0.36847854 256 nips-2011-Solving Decision Problems with Limited Information


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.015), (4, 0.04), (20, 0.033), (26, 0.012), (31, 0.111), (34, 0.01), (43, 0.059), (45, 0.088), (57, 0.431), (74, 0.046), (83, 0.031), (99, 0.031)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.96114647 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations

Author: Flavio Chierichetti, David Liben-nowell, Jon M. Kleinberg

Abstract: Motivated by the spread of on-line information in general and on-line petitions in particular, recent research has raised the following combinatorial estimation problem. There is a tree T that we cannot observe directly (representing the structure along which the information has spread), and certain nodes randomly decide to make their copy of the information public. In the case of a petition, the list of names on each public copy of the petition also reveals a path leading back to the root of the tree. What can we conclude about the properties of the tree we observe from these revealed paths, and can we use the structure of the observed tree to estimate the size of the full unobserved tree T ? Here we provide the first algorithm for this size estimation task, together with provable guarantees on its performance. We also establish structural properties of the observed tree, providing the first rigorous explanation for some of the unusual structural phenomena present in the spread of real chain-letter petitions on the Internet. 1

2 0.95950371 224 nips-2011-Probabilistic Modeling of Dependencies Among Visual Short-Term Memory Representations

Author: Emin Orhan, Robert A. Jacobs

Abstract: Extensive evidence suggests that items are not encoded independently in visual short-term memory (VSTM). However, previous research has not quantitatively considered how the encoding of an item influences the encoding of other items. Here, we model the dependencies among VSTM representations using a multivariate Gaussian distribution with a stimulus-dependent mean and covariance matrix. We report the results of an experiment designed to determine the specific form of the stimulus-dependence of the mean and the covariance matrix. We find that the magnitude of the covariance between the representations of two items is a monotonically decreasing function of the difference between the items’ feature values, similar to a Gaussian process with a distance-dependent, stationary kernel function. We further show that this type of covariance function can be explained as a natural consequence of encoding multiple stimuli in a population of neurons with correlated responses. 1

same-paper 3 0.93036622 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables

Author: Jiarong Jiang, Piyush Rai, Hal Daume

Abstract: We consider a general inference setting for discrete probabilistic graphical models where we seek maximum a posteriori (MAP) estimates for a subset of the random variables (max nodes), marginalizing over the rest (sum nodes). We present a hybrid message-passing algorithm to accomplish this. The hybrid algorithm passes a mix of sum and max messages depending on the type of source node (sum or max). We derive our algorithm by showing that it falls out as the solution of a particular relaxation of a variational framework. We further show that the Expectation Maximization algorithm can be seen as an approximation to our algorithm. Experimental results on synthetic and real-world datasets, against several baselines, demonstrate the efficacy of our proposed algorithm. 1

4 0.90967172 100 nips-2011-Gaussian Process Training with Input Noise

Author: Andrew Mchutchon, Carl E. Rasmussen

Abstract: In standard Gaussian Process regression input locations are assumed to be noise free. We present a simple yet effective GP model for training on input points corrupted by i.i.d. Gaussian noise. To make computations tractable we use a local linear expansion about each input point. This allows the input noise to be recast as output noise proportional to the squared gradient of the GP posterior mean. The input noise variances are inferred from the data as extra hyperparameters. They are trained alongside other hyperparameters by the usual method of maximisation of the marginal likelihood. Training uses an iterative scheme, which alternates between optimising the hyperparameters and calculating the posterior gradient. Analytic predictive moments can then be found for Gaussian distributed test points. We compare our model to others over a range of different regression problems and show that it improves over current methods. 1

5 0.88935697 111 nips-2011-Hashing Algorithms for Large-Scale Learning

Author: Ping Li, Anshumali Shrivastava, Joshua L. Moore, Arnd C. König

Abstract: Minwise hashing is a standard technique in the context of search for efficiently computing set similarities. The recent development of b-bit minwise hashing provides a substantial improvement by storing only the lowest b bits of each hashed value. In this paper, we demonstrate that b-bit minwise hashing can be naturally integrated with linear learning algorithms such as linear SVM and logistic regression, to solve large-scale and high-dimensional statistical learning tasks, especially when the data do not fit in memory. We compare b-bit minwise hashing with the Count-Min (CM) and Vowpal Wabbit (VW) algorithms, which have essentially the same variances as random projections. Our theoretical and empirical comparisons illustrate that b-bit minwise hashing is significantly more accurate (at the same storage cost) than VW (and random projections) for binary data.

6 0.73593682 171 nips-2011-Metric Learning with Multiple Kernels

7 0.66853523 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm

8 0.65258712 157 nips-2011-Learning to Search Efficiently in High Dimensions

9 0.65155464 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models

10 0.63453752 89 nips-2011-Estimating time-varying input signals and ion channel states from a single voltage trace of a neuron

11 0.62431955 226 nips-2011-Projection onto A Nonnegative Max-Heap

12 0.60619438 177 nips-2011-Multi-armed bandits on implicit metric spaces

13 0.60457015 135 nips-2011-Information Rates and Optimal Decoding in Large Neural Populations

14 0.60423541 219 nips-2011-Predicting response time and error rates in visual search

15 0.60368013 31 nips-2011-An Application of Tree-Structured Expectation Propagation for Channel Decoding

16 0.59385985 292 nips-2011-Two is better than one: distinct roles for familiarity and recollection in retrieving palimpsest memories

17 0.59254903 82 nips-2011-Efficient coding of natural images with a population of noisy Linear-Nonlinear neurons

18 0.59010416 30 nips-2011-Algorithms for Hyper-Parameter Optimization

19 0.58018088 24 nips-2011-Active learning of neural response functions with Gaussian processes

20 0.57611609 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound