nips nips2003 nips2003-74 knowledge-graph by maker-knowledge-mining

74 nips-2003-Finding the M Most Probable Configurations using Loopy Belief Propagation


Source: pdf

Author: Chen Yanover, Yair Weiss

Abstract: Loopy belief propagation (BP) has been successfully used in a number of difficult graphical models to find the most probable configuration of the hidden variables. In applications ranging from protein folding to image analysis one would like to find not just the best configuration but rather the top M . While this problem has been solved using the junction tree formalism, in many real world problems the clique size in the junction tree is prohibitively large. In this work we address the problem of finding the M best configurations when exact inference is impossible. We start by developing a new exact inference algorithm for calculating the best configurations that uses only max-marginals. For approximate inference, we replace the max-marginals with the beliefs calculated using max-product BP and generalized BP. We show empirically that the algorithm can accurately and rapidly approximate the M best configurations in graphs with hundreds of variables. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 Abstract Loopy belief propagation (BP) has been successfully used in a number of difficult graphical models to find the most probable configuration of the hidden variables. [sent-4, score-0.282]

2 In applications ranging from protein folding to image analysis one would like to find not just the best configuration but rather the top M . [sent-5, score-0.173]

3 While this problem has been solved using the junction tree formalism, in many real world problems the clique size in the junction tree is prohibitively large. [sent-6, score-0.362]

4 In this work we address the problem of finding the M best configurations when exact inference is impossible. [sent-7, score-0.135]

5 We start by developing a new exact inference algorithm for calculating the best configurations that uses only max-marginals. [sent-8, score-0.256]

6 For approximate inference, we replace the max-marginals with the beliefs calculated using max-product BP and generalized BP. [sent-9, score-0.13]

7 We show empirically that the algorithm can accurately and rapidly approximate the M best configurations in graphs with hundreds of variables. [sent-10, score-0.171]

8 1 Introduction Considerable progress has been made in the field of approximate inference using techniques such as variational methods [7], Monte-Carlo methods [5], mini-bucket elimination [4] and belief propagation (BP) [6]. [sent-11, score-0.275]

9 These techniques allow approximate solutions to various inference tasks in graphical models where building a junction tree is infeasible due to the exponentially large clique size. [sent-12, score-0.349]

10 The inference tasks that have been considered include calculating marginal probabilities, finding the most likely configuration, and evaluating or bounding the log likelihood. [sent-13, score-0.154]

11 In this paper we consider an inference task that has not been tackled with the same tools of approximate inference: calculating the M most probable configurations (MPCs). [sent-14, score-0.3]

12 As a motivating example, consider the protein folding task known as the side-chain prediction problem. [sent-16, score-0.108]

13 In our previous work [17], we showed how to find the minimal-energy side-chain configuration using approximate inference in a graphical model. [sent-17, score-0.151]

14 The graph has 300 nodes and the clique size in a junction tree calculated using standard software [10] can be up to an order of 10^42, so that exact inference is obviously impossible. [sent-18, score-0.393]

15 We showed that loopy max-product belief propagation (BP) achieved excellent results in finding the first MPC for this graph. [sent-19, score-0.381]

16 But we are also interested in finding the second best configuration, the third best or, more generally, the top M configurations. [sent-21, score-0.092]

17 The problem of finding the M MPCs has been successfully solved within the junction tree (JT) framework. [sent-23, score-0.164]

18 However, to the best of our knowledge, there has been no equivalent solution when building a junction tree is infeasible. [sent-24, score-0.191]

19 A simple solution would be outputting the top M configurations that are generated by a Monte-Carlo simulation or by a local search algorithm from multiple initializations. [sent-25, score-0.108]

20 Alternatively, one can attempt to use more sophisticated heuristically guided search methods (such as A∗ ) or use exact MPCs algorithms on an approximated, reduced size junction tree [4, 1]. [sent-27, score-0.237]

21 We start by showing why the standard algorithm [11] for calculating the top M MPCs cannot be used in graphs with cycles. [sent-30, score-0.189]

22 We then introduce a novel algorithm called Best Max-Marginal First (BMMF) and show that when the max-marginals are exact it provably finds the M MPCs. [sent-31, score-0.099]

23 We show simulation results of BMMF in graphs where exact inference is impossible, with excellent performance on challenging graphical models with hundreds of variables. [sent-32, score-0.212]

24 Let mk = (mk(1), mk(2), ..., mk(N)) denote the k-th MPC. [sent-34, score-0.741]

25 Pearl, Dawid and others [12, 3, 11] have shown that this configuration can be calculated using a quantity known as max-marginals (MMs): max_marginal(i, j) = max_{x: x(i)=j} Pr(X = x|y) (1). Max-marginal lemma: If there exists a unique MAP assignment m1 (i. [sent-36, score-0.294]

26 Pr(X = m1|y) > Pr(X = x|y), ∀x ≠ m1), then x1 defined by x1(i) = arg max_j max_marginal(i, j) will recover the MAP assignment, m1 = x1. [sent-38, score-0.105]
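
To make the max-marginal lemma concrete, here is a minimal brute-force sketch (not the paper's code) on a hypothetical three-variable toy model: it enumerates the joint to build the max-marginal table and checks that the per-variable argmax recovers the MAP assignment when that assignment is unique.

```python
import itertools
import numpy as np

# Hypothetical toy model: an explicit normalized joint over three binary
# variables, standing in for Pr(X = x | y).
rng = np.random.default_rng(0)
N, K = 3, 2
joint = rng.random((K,) * N)
joint /= joint.sum()

def max_marginals(p):
    """Brute force: mm[i, j] = max over x with x(i) = j of p(x)."""
    mm = np.zeros((N, K))
    for x in itertools.product(range(K), repeat=N):
        for i in range(N):
            mm[i, x[i]] = max(mm[i, x[i]], p[x])
    return mm

mm = max_marginals(joint)
x1 = tuple(int(j) for j in mm.argmax(axis=1))    # max-marginal lemma: per-variable argmax
x_map = max(itertools.product(range(K), repeat=N), key=lambda x: joint[x])
assert x1 == x_map                               # holds because the MAP is (almost surely) unique here
```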

27 When the graph is a tree, the MMs can be calculated exactly by max-product belief propagation [16, 15, 12] in two passes: one up the tree and the other down the tree. [sent-41, score-0.333]

28 Similarly, for an arbitrary graph they can be calculated exactly using two passes of max-propagation in the junction tree [2, 11, 3]. [sent-42, score-0.305]
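
As a sketch of this two-pass computation on a chain (a tree), the snippet below runs forward and backward max-product messages with hypothetical potentials that favor disagreeing neighbors, as in Figure 1a, plus a small tie-breaking unary term, and combines the two passes into exact, unnormalized max-marginals. It is an illustration under those assumptions, not the paper's implementation.

```python
import numpy as np

K, N = 2, 3
phi = np.array([[1.0, 1.2], [1.0, 1.0], [1.0, 1.0]])   # unary potentials; node 0 breaks ties
psi = np.array([[0.1, 0.9], [0.9, 0.1]])               # pairwise potential favoring disagreement

fwd = np.zeros((N, K))
bwd = np.zeros((N, K))
fwd[0] = phi[0]
bwd[-1] = phi[-1]
for t in range(1, N):
    # forward message into node t, backward message into node N-1-t
    fwd[t] = phi[t] * (fwd[t - 1][:, None] * psi).max(axis=0)
    s = N - 1 - t
    bwd[s] = phi[s] * (psi * bwd[s + 1][None, :]).max(axis=1)

# Combine both passes; divide out the local potential counted in both directions.
mm = fwd * bwd / phi                   # unnormalized max-marginals
print(mm.argmax(axis=1))               # per-node argmax recovers the MAP (1, 0, 1) here
```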

29 A more efficient algorithm for calculating m1 requires only one pass of max-propagation. [sent-43, score-0.121]

30 After calculating the max-marginal exactly at the root node, the MAP assignment m1 can be calculated by tracing back the pointers that were used during the max-propagation [11]. [sent-44, score-0.309]

31 Figure 1a illustrates this traceback operation in the Viterbi algorithm in HMMs [13] (the pairwise potentials favor configurations where neighboring nodes have different values). [sent-45, score-0.231]
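
For contrast with the two-pass computation above, here is a minimal sketch of the one-pass Viterbi-style procedure with traceback on the same kind of chain (hypothetical potentials, not the paper's code). Even though the forward scores are tied here, following the stored pointers returns one consistent MAP assignment, which is the advantage of traceback noted below.

```python
import numpy as np

K, N = 2, 3
phi = np.ones((N, K))                        # uniform unary potentials (creates ties)
psi = np.array([[0.1, 0.9], [0.9, 0.1]])     # pairwise potential favoring disagreement

msg = np.zeros((N, K))                       # msg[t, j] = best score of a prefix ending in state j
ptr = np.zeros((N, K), dtype=int)            # backpointers used during traceback
msg[0] = phi[0]
for t in range(1, N):
    scores = msg[t - 1][:, None] * psi * phi[t][None, :]   # K x K table of extensions
    msg[t] = scores.max(axis=0)
    ptr[t] = scores.argmax(axis=0)

# Traceback: pick the best final state, then follow the pointers backwards.
x = np.zeros(N, dtype=int)
x[-1] = msg[-1].argmax()
for t in range(N - 1, 0, -1):
    x[t - 1] = ptr[t, x[t]]
print(x)   # an alternating assignment, e.g. [0, 1, 0]; a valid MAP despite the ties
```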

32 After calculating messages from left to right [Figure 1: (a) a chain and (b) a loop over the nodes x(1), x(2), x(3), showing the candidate assignments considered during traceback] Figure 1: a. [sent-46, score-0.085]

33 The MAP configuration can be calculated by a forward message passing scheme followed by a backward “traceback”. [sent-48, score-0.087]

34 The same traceback operation applied to a loopy graph may give inconsistent results. [sent-50, score-0.461]

35 These traceback operations, however, are problematic in loopy graphs. [sent-54, score-0.436]

36 After setting x1 (3) = 1 we traceback and find x1 (2) = 0, x1 (1) = 1 and finally x1 (3) = 0, which is obviously inconsistent with our initial choice. [sent-56, score-0.22]

37 One advantage of using traceback is that it can recover m1 even if there are “ties” in the MMs, i. [sent-57, score-0.195]

38 The proof is a special case of the proof we present for claim 2 in the next section. [sent-64, score-0.116]

39 1 The Simplified Max-Flow Propagation Algorithm Nilsson’s Simplified Max-Flow Propagation (SMFP) [11] starts by calculating the MMs and using the max-marginal lemma to find m1 . [sent-69, score-0.157]

40 Since m2 must differ from m1 in at least one variable, the algorithm defines N conditioning sets, Ci = (x(1) = m1(1), x(2) = m1(2), ..., x(i−1) = m1(i−1), x(i) ≠ m1(i)). [sent-70, score-0.086]

41 It then uses the max-marginal lemma to find the most probable configuration given each conditioning set, xi = arg max_x Pr(X = x|y, Ci), and finally m2 = arg max_{x ∈ {xi}} Pr(X = x|y). [sent-71, score-0.41]

42 Since the conditioning sets form a partition, it is easy to show that the algorithm finds m2 after N calculations of the MMs. [sent-72, score-0.125]

43 Similarly, to find mk the algorithm uses the fact that mk must differ from m1 , m2 , · · · , mk−1 in at least one variable and forms a new set of up to N conditioning sets. [sent-73, score-0.58]

44 Using the max-marginal lemma one can find the MPC given each of these new conditioning sets. [sent-74, score-0.122]

45 This gives up to N new candidates, in addition to (k − 1)(N − 1) previously calculated candidates. [sent-75, score-0.087]

46 The most probable candidate out of these k(N − 1) + 1 candidates is guaranteed to be mk. [sent-77, score-0.35]
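
The following brute-force stand-in (hypothetical toy joint; exhaustive maximization in place of max-propagation) illustrates the conditioning-set construction for m2: each Ci fixes the first i − 1 variables to their values in m1 and forces variable i to differ.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
N, K = 4, 2
joint = rng.random((K,) * N)
joint /= joint.sum()

def best_given(constraints):
    """Most probable assignment satisfying every constraint predicate (brute force)."""
    best, best_p = None, -1.0
    for x in itertools.product(range(K), repeat=N):
        if all(ok(x) for ok in constraints) and joint[x] > best_p:
            best, best_p = x, joint[x]
    return best, best_p

m1, _ = best_given([])
candidates = []
for i in range(N):
    # C_i: agree with m1 on variables before i, differ from m1 at variable i.
    cons = [lambda x, a=a, v=m1[a]: x[a] == v for a in range(i)]
    cons.append(lambda x, i=i, v=m1[i]: x[i] != v)
    candidates.append(best_given(cons))

m2 = max(candidates, key=lambda c: c[1])[0]   # best candidate across the partition
```

Because the sets Ci partition the assignments that differ from m1, the best of the N candidates is m2, matching the argument above.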

47 He suggested an algorithm that uses traceback operations to reduce the computation significantly. [sent-79, score-0.231]

48 Since traceback operations are problematic in loopy graphs, we now present a novel algorithm that does not use traceback but may require far less calculation of the MMs compared to SMFP. [sent-80, score-0.691]

49 2 A novel algorithm: Best Max-Marginal First For simplicity of exposition, we will describe the BMMF algorithm under what we call the strict order assumption, that no two configurations have exactly the same probability. [sent-82, score-0.087]

50 In the first iteration, t = 1, we start by calculating the MMs, and using the max-marginal lemma we find m1 . [sent-86, score-0.157]

51 We now search the max-marginal table for the next best max-marginal value. [sent-87, score-0.1]

52 We then add the complementary constraint x(3) ≠ 1 to the originating constraints set and calculate the MMs. [sent-93, score-0.092]

53 We now add the constraint x(1) = 0 to the constraints set from t = 1, calculate the MMs and use the max-marginal lemma to find x3 = 0001. [sent-96, score-0.121]

54 Finally, we add the complementary constraint x(1) ≠ 0 to the originating constraints set and calculate the MMs. [sent-97, score-0.092]
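
Below is a compact sketch of the loop just illustrated, again with brute-force exact max-marginals standing in for max-propagation. The toy joint, the constraint encoding, and the variable names are hypothetical, but the control flow follows the BMMF steps above: two max-marginal computations per configuration, one for the new constraint set and one for the originating set with the complementary constraint added.

```python
import itertools
import numpy as np

rng = np.random.default_rng(2)
N, K, M = 4, 2, 5
joint = rng.random((K,) * N)
joint /= joint.sum()

def max_marginals(constraints):
    """mm[i, j] = max of p(x) over x consistent with the constraints and x(i) = j."""
    mm = np.zeros((N, K))
    for x in itertools.product(range(K), repeat=N):
        if all((x[i] == v) == eq for (i, v, eq) in constraints):
            for i in range(N):
                mm[i, x[i]] = max(mm[i, x[i]], joint[x])
    return mm

constraints = {1: []}                        # CONSTRAINTS_s for each iteration s
score = {1: max_marginals([])}               # SCORE_s tables (exact MMs in this sketch)
xs = {1: tuple(int(j) for j in score[1].argmax(axis=1))}   # x_1 = MAP, by the lemma
used = set()

for t in range(2, M + 1):
    # Best unused max-marginal entry that disagrees with the stored configuration x_s.
    i_t, j_t, s_t = max(
        ((i, j, s) for s in list(score) for i in range(N) for j in range(K)
         if (i, j, s) not in used and xs[s][i] != j),
        key=lambda ijs: score[ijs[2]][ijs[0], ijs[1]])
    constraints[t] = constraints[s_t] + [(i_t, j_t, True)]      # add x(i_t) = j_t
    score[t] = max_marginals(constraints[t])
    xs[t] = tuple(int(j) for j in score[t].argmax(axis=1))
    # Complementary constraint x(i_t) != j_t goes back to the originating set.
    constraints[s_t] = constraints[s_t] + [(i_t, j_t, False)]
    score[s_t] = max_marginals(constraints[s_t])
    used.add((i_t, j_t, s_t))

# With exact max-marginals and distinct configuration probabilities,
# xs[1], ..., xs[M] should be the M most probable configurations, in order.
```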

55 Claim 2: x2 calculated by the BMMF algorithm is equal to the second MPC m2 . [sent-101, score-0.123]

56 Then, after iteration k, the collection {SAT1 , SAT2 , · · · , SATk } is a partition of the assignment space. [sent-111, score-0.193]

57 For k = 2, SAT1 = {x | x(i2) = j2} and SAT2 = {x | x(i2) ≠ j2} are mutually disjoint and SAT1 ∪ SAT2 covers the assignment space, therefore {SAT1, SAT2} is a partition of the assignment space. [sent-114, score-0.265]

58 Assume that after iteration k − 1, {SAT1 , SAT2 , · · · , SATk−1 } is a partition of the assignment space. [sent-115, score-0.193]

59 Note that in iteration k, we add CONSTRAINTSk = CONSTRAINTSsk ∪ {(x(ik) = jk)} and modify CONSTRAINTSsk = CONSTRAINTSsk ∪ {(x(ik) ≠ jk)}, while keeping all other constraints sets unchanged. [sent-116, score-0.305]

60 Since after iteration k − 1 {SAT1, SAT2, ..., SATk−1} is a partition of the assignment space, so is {SAT1, SAT2, ..., SATk}. [sent-118, score-0.155]

61 Claim 3: xk , the configuration calculated by the algorithm in iteration k, is mk , the kth MPC. [sent-119, score-0.408]

62 Proof: First, note that SCOREsk(ik, jk) ≤ SCOREsk−1(ik−1, jk−1), otherwise (ik, jk, sk) would have been chosen in iteration k − 1. [sent-120, score-0.282]

63 Following the partition lemma, each assignment arises at most once. [sent-121, score-0.155]

64 By the strict order assumption, this means that SCOREsk(ik, jk) < SCOREsk−1(ik−1, jk−1). [sent-122, score-0.122]

65 We know that mk differs from all previous xs in at least one location. [sent-124, score-0.281]

66 In particular, mk must differ from xs∗ in at least one location. [sent-125, score-0.247]

67 We want to show that SCOREs∗ (i∗ , j∗ ) = Pr(X = mk |y). [sent-127, score-0.247]

68 Now suppose there exists ml , l ≤ k − 1 such that ml ∈ SATs∗ and ml (i∗ ) = j ∗ . [sent-130, score-0.112]

69 Since (i∗, j∗, s∗) ∉ USEDk, this would mean that SCOREsk(ik, jk) ≥ SCOREsk−1(ik−1, jk−1), which is a contradiction. [sent-131, score-0.122]

70 Therefore mk is the most probable assignment that satisfies CONSTRAINTSs∗ and has the value j ∗ at location i∗ . [sent-132, score-0.46]

71 A consequence of claim 3 is that BMMF will find the top M MPCs using 2M calculations of max marginals. [sent-134, score-0.165]

72 In real-world loopy problems, especially when N ≫ M, this can lead to drastically different run times. [sent-136, score-0.241]

73 3 Approximate MPCs algorithms using loopy BP We now compare 4 approximate MPCs algorithms: 1. [sent-141, score-0.284]

74 2 with the MMs based on the beliefs computed by loopy max-product BP or max-GBP: SCOREk(i, j) = Pr(X = xk|y) · BEL(i, j|CONSTRAINTSk) / max_j BEL(i, j|CONSTRAINTSk) (14) 2. [sent-144, score-0.268]
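
As a minimal rendering of Eq. (14), assuming one already has an N × K array of loopy max-product (or GBP) beliefs for the current constraint set (the BP run itself is not shown), the approximate score table could be formed as:

```python
import numpy as np

def approximate_score(bel, p_xk):
    """SCORE_k(i, j) = Pr(X = x_k | y) * BEL(i, j | CONSTRAINTS_k) / max_j BEL(i, j | CONSTRAINTS_k).

    bel  : hypothetical N x K array of max-product beliefs under CONSTRAINTS_k
    p_xk : the probability (or an estimate) of the configuration x_k
    """
    return p_xk * bel / bel.max(axis=1, keepdims=True)
```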

75 This is just Nilsson’s SMFP algorithm with the MMs calculated using loopy max-product BP. [sent-146, score-0.364]

76 We collect all configurations encountered during a greedy optimization of the posterior probability (this is just Gibbs sampling at zero temperature) and output the top M of these. [sent-152, score-0.121]
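
A minimal sketch of this baseline on a hypothetical toy joint (coordinate-wise moves to the locally best value, i.e. zero-temperature Gibbs, starting from a random assignment rather than from m1 as in the experiments):

```python
import numpy as np

rng = np.random.default_rng(3)
N, K, M = 4, 2, 5
joint = rng.random((K,) * N)
joint /= joint.sum()

x = [int(v) for v in rng.integers(0, K, size=N)]
visited = {}
for _ in range(50):                              # fixed budget of sweeps
    for i in rng.permutation(N):
        vals = [joint[tuple(x[:i] + [j] + x[i + 1:])] for j in range(K)]
        x[i] = int(np.argmax(vals))              # zero temperature: always take the best value
        visited[tuple(x)] = joint[tuple(x)]

# Top M of the configurations visited (for tiny problems there may be fewer than M).
top_M = sorted(visited, key=visited.get, reverse=True)[:M]
```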

77 All four algorithms were implemented in Matlab and the number of iterations for greedy and Gibbs were chosen so that the run times would be the same as that of loopy BMMF. [sent-153, score-0.299]

78 Gibbs sampling started from m1, the most probable assignment, and the greedy local search algorithm was initialized to an assignment “similar” to m1 (1% of the variables were chosen randomly and their values flipped). [sent-154, score-0.366]

79 For the protein folding problem [17], we used a database consisting of 325 proteins, each giving rise to a graphical model with hundreds of variables and many loops. [sent-155, score-0.182]

80 [Figure 3; x-axis: Configuration Number] Figure 3: The configurations found by loopy-BMMF compared to those obtained using Gibbs sampling and greedy local search for a large toy-QMR model (left) and a 32 × 32 spin glass model (right). [sent-164, score-0.225]

81 We compared the top 100 correct configurations obtained by the A∗ heuristic search algorithm [8] to those found by the loopy BMMF algorithm, using BP. [sent-165, score-0.349]

82 In all cases where A∗ was feasible, loopy BMMF always found the correct configurations. [sent-166, score-0.241]

83 We then assessed the performance of the BMMF algorithm for a couple of relatively small problems, where exact inference was possible. [sent-170, score-0.173]

84 For both a small toy-QMR model (with 20 diseases and 50 symptoms) and an 8 × 8 spin glass model, the BMMF algorithm obtained the correct MPCs. [sent-171, score-0.175]

85 Finally, we compared the performance of the algorithms for a couple of hard problems: a large toy-QMR model (with 100 diseases and 200 symptoms) and a 32 × 32 spin glass model with large pairwise interactions. [sent-172, score-0.168]

86 For the toy-QMR model, the MPCs calculated by the BMMF algorithm were better than those calculated by Gibbs sampling (Figure 3, left). [sent-173, score-0.235]

87 (Gibbs results are worse than those of the greedy search and are therefore not shown.) [sent-177, score-0.092]

88 Note that finding the second MPC using the simple MFP algorithm requires a week, while loopy BMMF calculated the 25 MPCs in only a few hours. [sent-178, score-0.364]

89 However, in many real-world applications exact inference is impossible and approximate techniques are needed. [sent-180, score-0.151]

90 We have presented a new algorithm, called Best Max-Marginal First that will provably solve the problem if MMs can be calculated exactly. [sent-182, score-0.087]

91 We have shown that the algorithm continues to perform well when the MMs are approximated using max-product loopy BP or GBP. [sent-183, score-0.277]

92 The success of loopy BMMF suggests that in some cases max-product loopy BP gives a good numerical approximation to the true MMs. [sent-185, score-0.518]

93 Most existing analysis of loopy max-product [16, 15] has focused on the configurations found by the algorithm. [sent-186, score-0.241]

94 It would be interesting to extend the analysis to bound the approximate MMs which in turn would lead to a provable approximate MPCs algorithm. [sent-187, score-0.086]

95 While we have used loopy BP to approximate the MMs, any approximate inference can be used inside BMMF to derive a novel, approximate MPCs algorithm. [sent-188, score-0.439]

96 [14] can be shown to give the MAP assignment when it converges. [sent-190, score-0.11]

97 Applications of a general propagation algorithm for probabilistic expert systems. [sent-207, score-0.122]

98 Exploring the conformational space of protein side chains using dead-end elimination and the A* algorithm. [sent-241, score-0.08]

99 An efficient algorithm for finding the M most probable configurations in probabilistic expert systems. [sent-255, score-0.139]

100 On the optimality of solutions of the max-product belief propagation algorithm in arbitrary graphs. [sent-287, score-0.176]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('mms', 0.41), ('bmmf', 0.391), ('mpcs', 0.332), ('mk', 0.247), ('loopy', 0.241), ('traceback', 0.195), ('bp', 0.151), ('gurations', 0.143), ('satk', 0.137), ('pr', 0.133), ('jk', 0.122), ('mpc', 0.117), ('scoresk', 0.117), ('con', 0.112), ('assignment', 0.11), ('gibbs', 0.103), ('ik', 0.103), ('probable', 0.103), ('guration', 0.099), ('calculated', 0.087), ('jt', 0.086), ('propagation', 0.086), ('calculating', 0.085), ('junction', 0.085), ('tree', 0.079), ('constraintsk', 0.078), ('lemma', 0.072), ('gbp', 0.072), ('inference', 0.069), ('maxx', 0.062), ('di', 0.061), ('constraintssk', 0.059), ('constraintsst', 0.059), ('nilsson', 0.059), ('satsk', 0.059), ('smfp', 0.059), ('usedt', 0.059), ('greedy', 0.058), ('protein', 0.057), ('glass', 0.054), ('spin', 0.054), ('belief', 0.054), ('claim', 0.052), ('folding', 0.051), ('conditioning', 0.05), ('ties', 0.047), ('partition', 0.045), ('originating', 0.043), ('approximate', 0.043), ('arg', 0.042), ('graphical', 0.039), ('calculations', 0.039), ('constraintss', 0.039), ('maxmarginal', 0.039), ('sats', 0.039), ('scoret', 0.039), ('searcht', 0.039), ('symptoms', 0.039), ('usedk', 0.039), ('yanover', 0.039), ('exact', 0.039), ('top', 0.038), ('iteration', 0.038), ('max', 0.036), ('algorithm', 0.036), ('hundreds', 0.035), ('clique', 0.034), ('xs', 0.034), ('search', 0.034), ('bel', 0.034), ('wainwright', 0.033), ('proof', 0.032), ('nding', 0.032), ('assignments', 0.031), ('diseases', 0.031), ('graphs', 0.03), ('nd', 0.029), ('couple', 0.029), ('proteins', 0.029), ('ml', 0.029), ('best', 0.027), ('speedup', 0.027), ('passes', 0.027), ('maxj', 0.027), ('jaakkola', 0.027), ('exactly', 0.027), ('map', 0.027), ('configuration', 0.026), ('calculate', 0.026), ('sampling', 0.025), ('exists', 0.025), ('inconsistent', 0.025), ('kaufmann', 0.024), ('scores', 0.024), ('uai', 0.024), ('viterbi', 0.024), ('novel', 0.024), ('add', 0.023), ('elimination', 0.023), ('node', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 74 nips-2003-Finding the M Most Probable Configurations using Loopy Belief Propagation

Author: Chen Yanover, Yair Weiss

Abstract: Loopy belief propagation (BP) has been successfully used in a number of difficult graphical models to find the most probable configuration of the hidden variables. In applications ranging from protein folding to image analysis one would like to find not just the best configuration but rather the top M . While this problem has been solved using the junction tree formalism, in many real world problems the clique size in the junction tree is prohibitively large. In this work we address the problem of finding the M best configurations when exact inference is impossible. We start by developing a new exact inference algorithm for calculating the best configurations that uses only max-marginals. For approximate inference, we replace the max-marginals with the beliefs calculated using max-product BP and generalized BP. We show empirically that the algorithm can accurately and rapidly approximate the M best configurations in graphs with hundreds of variables. 1

2 0.19347258 189 nips-2003-Tree-structured Approximations by Expectation Propagation

Author: Yuan Qi, Tom Minka

Abstract: Approximation structure plays an important role in inference on loopy graphs. As a tractable structure, tree approximations have been utilized in the variational method of Ghahramani & Jordan (1997) and the sequential projection method of Frey et al. (2000). However, belief propagation represents each factor of the graph with a product of single-node messages. In this paper, belief propagation is extended to represent factors with tree approximations, by way of the expectation propagation framework. That is, each factor sends a “message” to all pairs of nodes in a tree structure. The result is more accurate inferences and more frequent convergence than ordinary belief propagation, at a lower cost than variational trees or double-loop algorithms. 1

3 0.15221249 152 nips-2003-Pairwise Clustering and Graphical Models

Author: Noam Shental, Assaf Zomet, Tomer Hertz, Yair Weiss

Abstract: Significant progress in clustering has been achieved by algorithms that are based on pairwise affinities between the datapoints. In particular, spectral clustering methods have the advantage of being able to divide arbitrarily shaped clusters and are based on efficient eigenvector calculations. However, spectral methods lack a straightforward probabilistic interpretation which makes it difficult to automatically set parameters using training data. In this paper we use the previously proposed typical cut framework for pairwise clustering. We show an equivalence between calculating the typical cut and inference in an undirected graphical model. We show that for clustering problems with hundreds of datapoints exact inference may still be possible. For more complicated datasets, we show that loopy belief propagation (BP) and generalized belief propagation (GBP) can give excellent results on challenging clustering problems. We also use graphical models to derive a learning algorithm for affinity matrices based on labeled data. 1

4 0.12277301 169 nips-2003-Sample Propagation

Author: Mark A. Paskin

Abstract: Rao–Blackwellization is an approximation technique for probabilistic inference that flexibly combines exact inference with sampling. It is useful in models where conditioning on some of the variables leaves a simpler inference problem that can be solved tractably. This paper presents Sample Propagation, an efficient implementation of Rao–Blackwellized approximate inference for a large class of models. Sample Propagation tightly integrates sampling with message passing in a junction tree, and is named for its simple, appealing structure: it walks the clusters of a junction tree, sampling some of the current cluster’s variables and then passing a message to one of its neighbors. We discuss the application of Sample Propagation to conditional Gaussian inference problems such as switching linear dynamical systems. 1

5 0.11055621 32 nips-2003-Approximate Expectation Maximization

Author: Tom Heskes, Onno Zoeter, Wim Wiegerinck

Abstract: We discuss the integration of the expectation-maximization (EM) algorithm for maximum likelihood learning of Bayesian networks with belief propagation algorithms for approximate inference. Specifically we propose to combine the outer-loop step of convergent belief propagation algorithms with the M-step of the EM algorithm. This then yields an approximate EM algorithm that is essentially still double loop, with the important advantage of an inner loop that is guaranteed to converge. Simulations illustrate the merits of such an approach. 1

6 0.10142174 117 nips-2003-Linear Response for Approximate Inference

7 0.096147612 80 nips-2003-Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data

8 0.086741678 22 nips-2003-An Improved Scheme for Detection and Labelling in Johansson Displays

9 0.076827869 29 nips-2003-Applying Metric-Trees to Belief-Point POMDPs

10 0.071403019 58 nips-2003-Efficient Multiscale Sampling from Products of Gaussian Mixtures

11 0.068607442 162 nips-2003-Probabilistic Inference of Speech Signals from Phaseless Spectrograms

12 0.068544589 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles

13 0.066257969 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors

14 0.065799847 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation

15 0.061575584 142 nips-2003-On the Concentration of Expectation and Approximate Inference in Layered Networks

16 0.054938551 91 nips-2003-Inferring State Sequences for Non-linear Systems with Embedded Hidden Markov Models

17 0.05180927 84 nips-2003-How to Combine Expert (and Novice) Advice when Actions Impact the Environment?

18 0.04742486 97 nips-2003-Iterative Scaled Trust-Region Learning in Krylov Subspaces via Pearlmutter's Implicit Sparse Hessian

19 0.04448545 180 nips-2003-Sparseness of Support Vector Machines---Some Asymptotically Sharp Bounds

20 0.043284487 30 nips-2003-Approximability of Probability Distributions


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.157), (1, -0.033), (2, -0.093), (3, 0.198), (4, 0.043), (5, -0.207), (6, 0.083), (7, 0.078), (8, 0.007), (9, 0.044), (10, 0.028), (11, 0.099), (12, -0.015), (13, -0.09), (14, -0.052), (15, -0.067), (16, 0.053), (17, 0.028), (18, -0.019), (19, -0.013), (20, 0.019), (21, 0.046), (22, 0.036), (23, -0.069), (24, 0.014), (25, -0.083), (26, -0.023), (27, 0.008), (28, 0.041), (29, -0.091), (30, 0.046), (31, -0.035), (32, -0.033), (33, -0.037), (34, 0.043), (35, 0.006), (36, -0.159), (37, 0.086), (38, -0.03), (39, -0.111), (40, -0.057), (41, 0.101), (42, 0.017), (43, -0.017), (44, -0.059), (45, -0.002), (46, -0.042), (47, -0.068), (48, 0.045), (49, 0.102)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94738257 74 nips-2003-Finding the M Most Probable Configurations using Loopy Belief Propagation

Author: Chen Yanover, Yair Weiss

Abstract: Loopy belief propagation (BP) has been successfully used in a number of difficult graphical models to find the most probable configuration of the hidden variables. In applications ranging from protein folding to image analysis one would like to find not just the best configuration but rather the top M . While this problem has been solved using the junction tree formalism, in many real world problems the clique size in the junction tree is prohibitively large. In this work we address the problem of finding the M best configurations when exact inference is impossible. We start by developing a new exact inference algorithm for calculating the best configurations that uses only max-marginals. For approximate inference, we replace the max-marginals with the beliefs calculated using max-product BP and generalized BP. We show empirically that the algorithm can accurately and rapidly approximate the M best configurations in graphs with hundreds of variables. 1

2 0.82617205 189 nips-2003-Tree-structured Approximations by Expectation Propagation

Author: Yuan Qi, Tom Minka

Abstract: Approximation structure plays an important role in inference on loopy graphs. As a tractable structure, tree approximations have been utilized in the variational method of Ghahramani & Jordan (1997) and the sequential projection method of Frey et al. (2000). However, belief propagation represents each factor of the graph with a product of single-node messages. In this paper, belief propagation is extended to represent factors with tree approximations, by way of the expectation propagation framework. That is, each factor sends a “message” to all pairs of nodes in a tree structure. The result is more accurate inferences and more frequent convergence than ordinary belief propagation, at a lower cost than variational trees or double-loop algorithms. 1

3 0.71227938 169 nips-2003-Sample Propagation

Author: Mark A. Paskin

Abstract: Rao–Blackwellization is an approximation technique for probabilistic inference that flexibly combines exact inference with sampling. It is useful in models where conditioning on some of the variables leaves a simpler inference problem that can be solved tractably. This paper presents Sample Propagation, an efficient implementation of Rao–Blackwellized approximate inference for a large class of models. Sample Propagation tightly integrates sampling with message passing in a junction tree, and is named for its simple, appealing structure: it walks the clusters of a junction tree, sampling some of the current cluster’s variables and then passing a message to one of its neighbors. We discuss the application of Sample Propagation to conditional Gaussian inference problems such as switching linear dynamical systems. 1

4 0.62223834 152 nips-2003-Pairwise Clustering and Graphical Models

Author: Noam Shental, Assaf Zomet, Tomer Hertz, Yair Weiss

Abstract: Significant progress in clustering has been achieved by algorithms that are based on pairwise affinities between the datapoints. In particular, spectral clustering methods have the advantage of being able to divide arbitrarily shaped clusters and are based on efficient eigenvector calculations. However, spectral methods lack a straightforward probabilistic interpretation which makes it difficult to automatically set parameters using training data. In this paper we use the previously proposed typical cut framework for pairwise clustering. We show an equivalence between calculating the typical cut and inference in an undirected graphical model. We show that for clustering problems with hundreds of datapoints exact inference may still be possible. For more complicated datasets, we show that loopy belief propagation (BP) and generalized belief propagation (GBP) can give excellent results on challenging clustering problems. We also use graphical models to derive a learning algorithm for affinity matrices based on labeled data. 1

5 0.52719492 32 nips-2003-Approximate Expectation Maximization

Author: Tom Heskes, Onno Zoeter, Wim Wiegerinck

Abstract: We discuss the integration of the expectation-maximization (EM) algorithm for maximum likelihood learning of Bayesian networks with belief propagation algorithms for approximate inference. Specifically we propose to combine the outer-loop step of convergent belief propagation algorithms with the M-step of the EM algorithm. This then yields an approximate EM algorithm that is essentially still double loop, with the important advantage of an inner loop that is guaranteed to converge. Simulations illustrate the merits of such an approach. 1

6 0.44340804 30 nips-2003-Approximability of Probability Distributions

7 0.42525601 22 nips-2003-An Improved Scheme for Detection and Labelling in Johansson Displays

8 0.41780233 80 nips-2003-Generalised Propagation for Fast Fourier Transforms with Partial or Missing Data

9 0.39323029 35 nips-2003-Attractive People: Assembling Loose-Limbed Models using Non-parametric Belief Propagation

10 0.38268185 117 nips-2003-Linear Response for Approximate Inference

11 0.37744102 174 nips-2003-Semidefinite Relaxations for Approximate Inference on Graphs with Cycles

12 0.35763097 180 nips-2003-Sparseness of Support Vector Machines---Some Asymptotically Sharp Bounds

13 0.34790385 58 nips-2003-Efficient Multiscale Sampling from Products of Gaussian Mixtures

14 0.30447558 50 nips-2003-Denoising and Untangling Graphs Using Degree Priors

15 0.28670371 142 nips-2003-On the Concentration of Expectation and Approximate Inference in Layered Networks

16 0.27845755 100 nips-2003-Laplace Propagation

17 0.27548903 162 nips-2003-Probabilistic Inference of Speech Signals from Phaseless Spectrograms

18 0.26987693 44 nips-2003-Can We Learn to Beat the Best Stock

19 0.26917633 97 nips-2003-Iterative Scaled Trust-Region Learning in Krylov Subspaces via Pearlmutter's Implicit Sparse Hessian

20 0.24246815 172 nips-2003-Semi-Supervised Learning with Trees


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.052), (11, 0.014), (29, 0.012), (30, 0.014), (35, 0.06), (53, 0.051), (59, 0.014), (68, 0.017), (71, 0.043), (76, 0.497), (85, 0.053), (91, 0.074), (99, 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.94111902 155 nips-2003-Perspectives on Sparse Bayesian Learning

Author: Jason Palmer, Bhaskar D. Rao, David P. Wipf

Abstract: Recently, relevance vector machines (RVM) have been fashioned from a sparse Bayesian learning (SBL) framework to perform supervised learning using a weight prior that encourages sparsity of representation. The methodology incorporates an additional set of hyperparameters governing the prior, one for each weight, and then adopts a specific approximation to the full marginalization over all weights and hyperparameters. Despite its empirical success however, no rigorous motivation for this particular approximation is currently available. To address this issue, we demonstrate that SBL can be recast as the application of a rigorous variational approximation to the full model by expressing the prior in a dual form. This formulation obviates the necessity of assuming any hyperpriors and leads to natural, intuitive explanations of why sparsity is achieved in practice. 1

same-paper 2 0.93302006 74 nips-2003-Finding the M Most Probable Configurations using Loopy Belief Propagation

Author: Chen Yanover, Yair Weiss

Abstract: Loopy belief propagation (BP) has been successfully used in a number of difficult graphical models to find the most probable configuration of the hidden variables. In applications ranging from protein folding to image analysis one would like to find not just the best configuration but rather the top M . While this problem has been solved using the junction tree formalism, in many real world problems the clique size in the junction tree is prohibitively large. In this work we address the problem of finding the M best configurations when exact inference is impossible. We start by developing a new exact inference algorithm for calculating the best configurations that uses only max-marginals. For approximate inference, we replace the max-marginals with the beliefs calculated using max-product BP and generalized BP. We show empirically that the algorithm can accurately and rapidly approximate the M best configurations in graphs with hundreds of variables. 1

3 0.89861631 160 nips-2003-Prediction on Spike Data Using Kernel Algorithms

Author: Jan Eichhorn, Andreas Tolias, Alexander Zien, Malte Kuss, Jason Weston, Nikos Logothetis, Bernhard Schölkopf, Carl E. Rasmussen

Abstract: We report and compare the performance of different learning algorithms based on data from cortical recordings. The task is to predict the orientation of visual stimuli from the activity of a population of simultaneously recorded neurons. We compare several ways of improving the coding of the input (i.e., the spike data) as well as of the output (i.e., the orientation), and report the results obtained using different kernel algorithms. 1

4 0.87602597 178 nips-2003-Sparse Greedy Minimax Probability Machine Classification

Author: Thomas R. Strohmann, Andrei Belitski, Gregory Z. Grudic, Dennis DeCoste

Abstract: The Minimax Probability Machine Classification (MPMC) framework [Lanckriet et al., 2002] builds classifiers by minimizing the maximum probability of misclassification, and gives direct estimates of the probabilistic accuracy bound Ω. The only assumptions that MPMC makes is that good estimates of means and covariance matrices of the classes exist. However, as with Support Vector Machines, MPMC is computationally expensive and requires extensive cross validation experiments to choose kernels and kernel parameters that give good performance. In this paper we address the computational cost of MPMC by proposing an algorithm that constructs nonlinear sparse MPMC (SMPMC) models by incrementally adding basis functions (i.e. kernels) one at a time – greedily selecting the next one that maximizes the accuracy bound Ω. SMPMC automatically chooses both kernel parameters and feature weights without using computationally expensive cross validation. Therefore the SMPMC algorithm simultaneously addresses the problem of kernel selection and feature selection (i.e. feature weighting), based solely on maximizing the accuracy bound Ω. Experimental results indicate that we can obtain reliable bounds Ω, as well as test set accuracies that are comparable to state of the art classification algorithms.

5 0.57267535 112 nips-2003-Learning to Find Pre-Images

Author: Jason Weston, Bernhard Schölkopf, Gökhan H. Bakir

Abstract: We consider the problem of reconstructing patterns from a feature map. Learning algorithms using kernels to operate in a reproducing kernel Hilbert space (RKHS) express their solutions in terms of input points mapped into the RKHS. We introduce a technique based on kernel principal component analysis and regression to reconstruct corresponding patterns in the input space (aka pre-images) and review its performance in several applications requiring the construction of pre-images. The introduced technique avoids difficult and/or unstable numerical optimization, is easy to implement and, unlike previous methods, permits the computation of pre-images in discrete input spaces. 1

6 0.55045325 189 nips-2003-Tree-structured Approximations by Expectation Propagation

7 0.54114902 176 nips-2003-Sequential Bayesian Kernel Regression

8 0.53462601 103 nips-2003-Learning Bounds for a Generalized Family of Bayesian Posterior Distributions

9 0.50271451 9 nips-2003-A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications

10 0.49514669 152 nips-2003-Pairwise Clustering and Graphical Models

11 0.49017441 57 nips-2003-Dynamical Modeling with Kernels for Nonlinear Time Series Prediction

12 0.48873195 122 nips-2003-Margin Maximizing Loss Functions

13 0.48416358 49 nips-2003-Decoding V1 Neuronal Activity using Particle Filtering with Volterra Kernels

14 0.48305234 54 nips-2003-Discriminative Fields for Modeling Spatial Dependencies in Natural Images

15 0.48158312 180 nips-2003-Sparseness of Support Vector Machines---Some Asymptotically Sharp Bounds

16 0.47980639 173 nips-2003-Semi-supervised Protein Classification Using Cluster Kernels

17 0.47921124 17 nips-2003-A Sampled Texture Prior for Image Super-Resolution

18 0.4786301 43 nips-2003-Bounded Invariance and the Formation of Place Fields

19 0.47742954 94 nips-2003-Information Maximization in Noisy Channels : A Variational Approach

20 0.47725412 107 nips-2003-Learning Spectral Clustering