jmlr jmlr2013 jmlr2013-13 knowledge-graph by maker-knowledge-mining

13 jmlr-2013-Approximating the Permanent with Fractional Belief Propagation


Source: pdf

Author: Michael Chertkov, Adam B. Yedidia

Abstract: We discuss schemes for exact and approximate computations of permanents, and compare them with each other. Specifically, we analyze the belief propagation (BP) approach and its fractional belief propagation (FBP) generalization for computing the permanent of a non-negative matrix. Known bounds and conjectures are verified in experiments, and some new theoretical relations, bounds and conjectures are proposed. The fractional free energy (FFE) function is parameterized by a scalar parameter γ ∈ [−1; 1], where γ = −1 corresponds to the BP limit and γ = 1 corresponds to the exclusion-principle (but ignoring perfect matching constraints) mean-field (MF) limit. The FFE shows monotonicity and continuity with respect to γ. For every non-negative matrix, we define its special value γ∗ ∈ [−1; 0] to be the γ for which the minimum of the γ-parameterized FFE function is equal to the permanent of the matrix, where the lower and upper bounds of the γ-interval correspond to respective bounds for the permanent. Our experimental analysis suggests that the distribution of γ∗ varies for different ensembles but γ∗ always lies within the [−1; −1/2] interval. Moreover, for all ensembles considered, the behavior of γ∗ is highly distinctive, offering empirical practical guidance for estimating permanents of non-negative matrices via the FFE approach. Keywords: permanent, graphical models, belief propagation, exact and approximate algorithms, learning flows

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Specifically, we analyze the belief propagation (BP) approach and its fractional belief propagation (FBP) generalization for computing the permanent of a non-negative matrix. [sent-5, score-0.726]

2 The fractional free energy (FFE) function is parameterized by a scalar parameter γ ∈ [−1; 1], where γ = −1 corresponds to the BP limit and γ = 1 corresponds to the exclusion principle (but ignoring perfect matching constraints) mean-field (MF) limit. [sent-7, score-0.26]

3 For every non-negative matrix, we define its special value γ∗ ∈ [−1; 0] to be the γ for which the minimum of the γ-parameterized FFE function is equal to the permanent of the matrix, where the lower and upper bounds of the γ-interval correspond to respective bounds for the permanent. [sent-9, score-0.613]

4 Moreover, for all ensembles considered, the behavior of γ∗ is highly distinctive, offering an empirical practical guidance for estimating permanents of non-negative matrices via the FFE approach. [sent-11, score-0.227]

5 The task of learning consists in maximizing the permanent of an n × n matrix, with elements constructed of probabilities for a particle in the first image to correspond to a particle in the second image, over the low-dimensional parametrization of the reconstructed flow. [sent-16, score-0.597]

6 The permanents in this enabling application are nothing but a weighted number of perfect matchings relating particles in the two images. [sent-17, score-0.335]

7 In this manuscript we continue the thread of Watanabe and Chertkov (2010) and focus on computations of positive permanents of non-negative matrices constructed from probabilities. [sent-18, score-0.201]

8 Computation of the permanent is difficult, that is, it is a problem of likely exponential complexity, with the fastest known general algorithm for computing the permanent of a full n × n matrix, based on the formula from Ryser (1963), requiring O(n 2^n) operations. [sent-23, score-1.13]
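Ryser's inclusion-exclusion formula is short enough to sketch directly. The following is a minimal Python transcription (names are ours); this plain-loop version costs O(n^2 2^n), and reaching the O(n 2^n) count quoted above requires the standard Gray-code enumeration of column subsets, omitted here for brevity.

    def permanent_ryser(a):
        # Ryser (1963): perm(a) = (-1)^n * sum over nonempty column subsets S
        # of (-1)^{|S|} * prod_i (sum_{j in S} a[i][j]).
        n = len(a)
        total = 0.0
        for mask in range(1, 1 << n):   # bits of `mask` encode the subset S
            prod = 1.0
            for i in range(n):
                prod *= sum(a[i][j] for j in range(n) if mask >> j & 1)
            total += (-1) ** bin(mask).count("1") * prod
        return (-1) ** n * total

As a quick check, permanent_ryser([[1, 1], [1, 1]]) returns 2.0, the number of perfect matchings of the complete 2 × 2 bipartite graph.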

9 In fact, the task of computing the permanent of a non-negative matrix was one of the first problems established to be in the #P complexity class, and the task is also complete in the class (Valiant, 1979). [sent-24, score-0.587]

10 The focus of the mathematical literature on the permanent was on establishing rigorous lower and upper bounds for permanents. [sent-27, score-0.543]

11 Many significant results in this line of research are related to the conjecture of van der Waerden (1926) that the minimum of the permanent over doubly stochastic matrices is n!/n^n, attained at the matrix with all entries equal to 1/n. [sent-28, score-0.797]

12 A very significant breakthrough in Monte-Carlo sampling was achieved with the invention of a fully polynomial randomized approximation scheme (FPRAS) for the permanent problem (Jerrum et al. [sent-33, score-0.543]

13 , 2004): the permanent is approximated in polynomial time, provably with high probability and within an arbitrarily small relative error. [sent-34, score-0.543]

14 Belief propagation (BP) heuristics applied to the permanent showed surprisingly good performance (Chertkov et al. [sent-38, score-0.567]

15 To address this challenge, Watanabe and Chertkov (2010) established a theoretical link between the exact permanent and its BP approximation. [sent-49, score-0.572]

16 The permanent of the original non-negative matrix was expressed as a product of terms, including the BP estimate and another permanent of an auxiliary matrix, β. ∗ (1 − β). [sent-50, score-1.13]

17 Here β is the doubly stochastic matrix of the marginal probabilities of the links between the particles in the two images (edges in the underlying GM), calculated using the BP approach. [sent-51, score-0.276]

18 We modify the BFE approach of Yedidia et al. (2005) in the entropy term, and then derive new exact relations between the original permanent and the results of the FFE-based approach (see Theorem 13). [sent-65, score-0.592]

19 The case of γ = −1 corresponds to BP, and the case of γ = 1 corresponds to the so-called exclusion-principle (Fermi) mean-field (MF) approximation, which ignores perfect matching constraints and was discussed earlier by Chertkov et al. [sent-69, score-0.194]

20 The material in the manuscript is organized as follows: the technical introduction, stating the computation of the permanent as a GM, is explained in Section 2 and Appendix A. [sent-74, score-0.565]

21 Technical Introduction: The permanent of a square matrix p, p = (p_ij | i, j = 1, . . . , n), is defined as perm(p) = ∑_σ ∏_i p_{i,σ(i)}, where the sum runs over all permutations σ of the n column indices. [sent-80, score-0.587]
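The completed definition above is the standard one; a brute-force Python check, usable only for small n but handy for validating the Ryser sketch earlier, reads:

    from itertools import permutations

    def permanent_bruteforce(p):
        # perm(p) = sum over permutations sigma of prod_i p[i][sigma(i)];
        # O(n * n!), so use only as a reference for small matrices.
        n = len(p)
        total = 0.0
        for sigma in permutations(range(n)):
            term = 1.0
            for i in range(n):
                term *= p[i][sigma[i]]
            total += term
        return total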

22 An example of a physics problem where computations of permanents are important is given by particle tracking experiments and measurement techniques of the type discussed in Chertkov et al. [sent-95, score-0.196]

23 Computing the permanent for a given set of values of the parameters constitutes an important subtask, the one we are focusing on in this manuscript. [sent-103, score-0.543]

24 The relation between the problem of computing the permanent and the problem of finding the most probable (maximum) perfect matching is discussed in Appendix A. [sent-113, score-0.712]

25 Exact Methods for Computing Permanents: Computing the permanent of a matrix is a #P-hard problem, that is, a problem which most likely requires a number of operations exponential in the size of the matrix. [sent-116, score-0.587]

26 Note that in most practical cases many entries of p are very small and they do not affect the permanent of p significantly. [sent-126, score-0.543]

27 We also verify some of our results against randomized computations of the permanent using the FPRAS from Jerrum et al. [sent-129, score-0.543]

28 Approximate Methods and Exact Relations We perform an approximate computation of the permanent by following the general BFE approach of Yedidia et al. [sent-133, score-0.543]

29 (2005) and the associated belief propagation/Bethe-Peierls (BP) algorithm, discussed in detail for the case of permanents of non-negative matrices in Chertkov et al. [sent-134, score-0.234]

30 (2008) with a special type of initialization corresponding to the best perfect matching of p. [sent-141, score-0.18]

31 Even though these optimization approaches and the respective algorithms can be thought of as approximating the permanent, we will show that they also generate some exact relations for the permanent. [sent-144, score-0.63]

32 In English, an interior solution means that all elements of the doubly stochastic matrix β are non-integer, with the exception of the case when p_ij = 0 and, respectively, β_ij = 0. [sent-149, score-0.428]

33 Schematically, the logic, extended to the case with variables associated with edges of the graph and leading to Equation (4) for the permanent, is as follows. [sent-153, score-0.543]

34 In the following, and whenever the Bethe, MF, or fractional FE are mentioned, we will drop the clarifying "for the permanent," as only permanents are discussed in this manuscript. [sent-164, score-0.236]

35 According to the Loop Calculus approach of Chertkov and Chernyak (2006a,b), extended to the case of the permanent in Chertkov et al. [sent-169, score-0.543]

36 An iterative heuristic algorithm that efficiently solves the BP Equations (6) for the doubly stochastic matrix β is discussed in Appendix B. [sent-175, score-0.264]
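The algorithm of Appendix B itself is not reproduced in these extracts. The following is only a plausible damped fixed-point sketch, assuming (consistently with the exact relations quoted later) that BP Equations (6) take the form β_ij(1 − β_ij) = p_ij u_i v_j with β doubly stochastic; each sweep rescales q = p./(1 − β) toward double stochasticity Sinkhorn-style, with the scalings playing the role of u and v.

    import numpy as np

    def bp_doubly_stochastic(p, sweeps=500, damping=0.5, tol=1e-10):
        # Damped fixed point for the assumed BP form
        # beta_ij * (1 - beta_ij) = p_ij * u_i * v_j.
        p = np.asarray(p, dtype=float)
        n = p.shape[0]
        beta = np.full((n, n), 1.0 / n)      # start from the uniform DS matrix
        for _ in range(sweeps):
            q = p / np.clip(1.0 - beta, 1e-12, None)
            for _ in range(50):              # Sinkhorn scaling toward DS
                q = q / q.sum(axis=1, keepdims=True)
                q = q / q.sum(axis=0, keepdims=True)
            beta_new = (1.0 - damping) * beta + damping * q
            if np.max(np.abs(beta_new - beta)) < tol:
                return beta_new
            beta = beta_new
        return beta

At a converged fixed point, β equals a row/column rescaling of p./(1 − β), which is exactly the assumed form of Equations (6).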

37 We call such a solution of the BP Equations (6) a partially resolved solution, emphasizing that a part of the solution forms a partial perfect matching, and any other perfect matching over this subset is excluded by the solution (in view of the probabilistic interpretation of β). [sent-188, score-0.365]

38 A doubly stochastic matrix β corresponding to a full perfect matching is called a fully resolved solution of the BP Equations (6). [sent-189, score-0.458]

39 Therefore, we can exclude the possibility of achieving the minimum of the Bethe FE anywhere but at an interior solution, partially resolved solution or a fully resolved solution (corresponding to a perfect matching) of the BP equations. [sent-194, score-0.282]

40 Note that an example where the minimum in Equation (8) is achieved at the boundary of the β-polytope (in fact, at the most probable perfect matching corner of the polytope) was discussed in Watanabe and Chertkov (2010). [sent-195, score-0.204]

41 We first observe that, regardless of p, for n = 2 the entropy contributions to the Bethe FE are identically zero for any doubly stochastic 2 × 2 matrix: ∑_{i,j=1,2} (β_ij log β_ij − (1 − β_ij) log(1 − β_ij)) = 0. [sent-202, score-0.253]

42 Because of how the perfect matching problem is defined, the two states of an individual variable, σ_ij = 0 and σ_ij = 1, are in the exclusion relation, and so one can also associate the special form of Equation (9) with the exclusion or Fermi (for the Fermi statistics of physics) principle. [sent-216, score-0.23]

43 Direct examination shows that (unlike in the BP case) a β with a single element equal to unity or zero (when the respective p element is nonzero) cannot be a solution of the MF Equations (11) over the doubly stochastic matrix β, fully consistent with Proposition 10 above. [sent-222, score-0.338]

44 Moreover, − log(Z_{o−MF}(p)), defined as the minimum of the MF FE (9), is simply equal to − log(Z_MF(p)), defined as F_MF(β) evaluated at the (only) doubly stochastic matrix solution of Equation (11). [sent-223, score-0.275]

45 Finally and most importantly (for the MF discussion of this manuscript), the MF approximation for the permanent, Z_MF, can be related to the permanent itself as follows: Theorem 11 (Permanent and MF) Z(p) = perm(p) = Z_MF(p) perm(β./(1 − β)) ∏_{(i,j)} (1 − β_ij). [sent-231, score-0.543]

46 However, for all but degenerate p, that is, p reducible by permutations to a diagonal matrix, the perfect matching solution is an isolated point. [sent-258, score-0.18]

47 Moreover, one also finds that a solution β is ε-close to a perfect matching only if p is ε^{1+γ}-close to a diagonal matrix. [sent-261, score-0.18]

48 Proof Observe that for any non-negative p and doubly stochastic matrix β, ∑_{(i,j)∈E} (1 − β_ij) log(1 − β_ij) < 0, so for any γ_1, γ_2 ∈ [−1; 1] such that γ_1 > γ_2, F_f^{(γ_1)}(β|p) ≤ F_f^{(γ_2)}(β|p) for any doubly stochastic matrix β, in particular for the β which is optimal for γ_2. [sent-266, score-0.243]

49 Then, according to the definition of F_{o−f}^{(γ)}(p), one finds F_{o−f}^{(γ_1)}(p) ≤ F_{o−f}^{(γ_2)}(p), proving monotonicity. [sent-268, score-0.243]
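The precise FFE display is not reproduced in these extracts; a form consistent with the proof above and with the two quoted limits (the (1 − β) log(1 − β) term is negative, so larger γ lowers the free energy) is

    \[
    F^{(\gamma)}_f(\beta \mid p) = \sum_{(i,j)} \beta_{ij}\,\ln\frac{\beta_{ij}}{p_{ij}}
    \;+\; \gamma \sum_{(i,j)} (1-\beta_{ij})\,\ln(1-\beta_{ij}),
    \]

with γ = −1 recovering the Bethe (BP) free energy of Equation (4) and γ = 1 the MF free energy of Equation (9).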

50 . . . the recently derived permanent inequalities related to BP and MF analysis. [sent-273, score-0.543]

51 perm(β. ∗ (1 − β)) ≥ ∏_{(i,j)} (1 − β_ij), (17) stated for any doubly stochastic matrix β, with some (gauge) manipulations/transformations of the type discussed above in Sections 3. [sent-294, score-0.264]

52 One direct corollary of the bound (16), discussed in Gurvits (2011), is: Corollary 19 For an arbitrary doubly stochastic matrix φ, perm(φ) ≥ Z_{o−BP}(φ) ≥ ∏_{(i,j)} (1 − φ_ij)^{1−φ_ij}. [sent-298, score-0.264]

53 Proposition 20 (BP lower bound #1) For any non-negative p and doubly stochastic matrix β ∈ B_p^{(int)}, solving Equations (6) (if the solution exists) results in perm(p) ≥ Z_BP(β|p) ∏_{(i,j)} (1 − β_ij)^{β_ij − 1} n!/n^n. [sent-300, score-0.275]

54 Proposition 23 The following is true for any doubly stochastic matrix β and any γ ∈ [−1; 1] perm β. [sent-311, score-0.459]

55 Indeed, combining Equation (16) with Propositions 17 and 26, one arrives at Proposition 27 (Special γ∗) For any non-negative p there exists a special γ∗ ∈ [−1; 0] such that perm(p) = Z_{o−f}^{(γ∗)}(p), and the minimal FFE-based solution upper (lower) bounds the permanent at 0 ≥ γ > γ∗ (−1 ≤ γ < γ∗). [sent-335, score-0.655]

56 Note also that due to the monotonicity stated in Proposition 17, the γ = 0 upper bound on the permanent is tighter than the MF, γ = 1, upper bound. [sent-337, score-0.543]

57 However, and as discussed in more detail in the next subsection, even the γ = 0 upper bound on the permanent is not expected to be tight. [sent-338, score-0.564]

58 The inequality in Equation (21) turns into an equality, with f(n) = √2^n = 2^{n/2}, when p is a doubly stochastic, block-diagonal matrix with all the elements in the 2 × 2 blocks equal to 1/2. [sent-342, score-0.243]

59 Conjecture 29 The following inequality holds for any doubly stochastic n × n matrix φ: perm(φ) ≤ √2^n ∏_{(i,j)} (1 − φ_ij)^{(1−φ_ij)}. [sent-346, score-0.243]
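A quick numeric sanity check of the lower bound from Corollary 19 and this conjectured √2^n upper bound, on a small random doubly stochastic matrix (all helper names are ours; the permanent is computed by brute force, so keep n small):

    import numpy as np
    from itertools import permutations

    def perm_brute(a):
        n = a.shape[0]
        return sum(np.prod(a[np.arange(n), list(s)])
                   for s in permutations(range(n)))

    def random_doubly_stochastic(n, rng, sweeps=300):
        m = rng.random((n, n))
        for _ in range(sweeps):              # Sinkhorn scaling
            m = m / m.sum(axis=1, keepdims=True)
            m = m / m.sum(axis=0, keepdims=True)
        return m

    rng = np.random.default_rng(0)
    n = 5
    phi = random_doubly_stochastic(n, rng)
    lower = np.prod((1.0 - phi) ** (1.0 - phi))   # Corollary 19 product
    upper = 2.0 ** (n / 2.0) * lower              # Conjecture 29 factor
    print(lower <= perm_brute(phi) <= upper)      # expected: True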

60 (2000) a deterministic polynomial-time algorithm to approximate the permanent of n × n nonnegative matrices within the relative factor √2^n. [sent-348, score-0.574]

61 The second one, λout , corresponds to an instance of the guessed values of the parameters in the learning problem, where computation of the permanent is an auxiliary step. [sent-364, score-0.543]

62 (Actual optimal learning consists in computing the maximum of the permanent over λ_out.) [sent-365, score-0.543]

63 In our simulations we test the quality of the permanent approximations in the special case when λ_in = λ_out, and also in other cases when the guessed values of the parameters do not coincide with the input ones, λ_in ≠ λ_out. [sent-366, score-0.575]

64 Figure (a) shows the gap between the exact permanent and its lower-bound estimate from Equation (24). [sent-385, score-0.597]

65 • [0; ρ]-shifted: We generate the block diagonal matrix with . . . To make the task of the exact computation of the permanent of a random matrix tractable, we consider sparsified versions of the ensembles defined above. [sent-390, score-0.708]

66 The permanent of the matrix p with elements p_ij = w^{1/T} for i = j and p_ij = 1 for i ≠ j, where w > 1 and T > 0, can be evaluated through the recursion perm(p) = ∑_{k=0}^{n} w^{(n−k)/T} C(n, k) D_k, where C(n, k) is the binomial coefficient, D_0 = 1, D_1 = 0, and, ∀k ≥ 2, D_k = (k − 1)(D_{k−1} + D_{k−2}). [sent-396, score-0.639]
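The D_k above are the derangement numbers, so the recursion is a direct Python one-pager (the function name is ours). As a hand check, for n = 2, w = 2, T = 1 it returns w^2 + 1 = 5, the permanent of [[2, 1], [1, 2]].

    from math import comb

    def perm_diagonal_shifted(w, T, n):
        # Derangement numbers: D_0 = 1, D_1 = 0,
        # D_k = (k - 1) * (D_{k-1} + D_{k-2}) for k >= 2.
        d = [1, 0] + [0] * max(0, n - 1)
        for k in range(2, n + 1):
            d[k] = (k - 1) * (d[k - 1] + d[k - 2])
        # perm(p) = sum_k w^{(n-k)/T} * C(n, k) * D_k
        # for p_ii = w^{1/T} and p_ij = 1 (i != j).
        return sum(w ** ((n - k) / T) * comb(n, k) * d[k]
                   for k in range(n + 1))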

67 To test the gap between the exact expression for the permanent and the FFE-based lower bound of Corollary 24, we fix w = 2, n = 20 and vary the temperature parameter, T . [sent-401, score-0.572]

68 We search for the special γ = γ∗, defined in Proposition 27, by calculating the permanent of a full matrix, p, of size n × n, with n = 3, . . . [sent-410, score-0.575]

69 . . . , 40, and then comparing it with the FFE-based value Z_f^{(γ)}(β, p), where the doubly stochastic matrix β solves Equations (14) for the given p, for different γ. [sent-416, score-0.243]

70 We also observed, by estimating or extrapolating the approximate value of the special γ∗ for a given matrix, that it might be possible to estimate the permanent of a matrix efficiently and very accurately for some ensembles. [sent-419, score-0.619]
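A sketch of such an estimation loop, assuming a caller-supplied routine log_z_f(gamma) that returns the minimized log FFE-based estimate, which by Proposition 17 is nondecreasing in γ and by Proposition 27 crosses log perm(p) once inside [−1; 0]; all names here are hypothetical.

    def find_gamma_star(log_perm, log_z_f, lo=-1.0, hi=0.0, tol=1e-6):
        # Bisection; assumes log_z_f(lo) <= log_perm <= log_z_f(hi).
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            if log_z_f(mid) < log_perm:
                lo = mid
            else:
                hi = mid
        return 0.5 * (lo + hi)

In an extrapolation setting one would instead fit γ∗ on small matrices, where the exact permanent is computable, and reuse it on larger matrices from the same ensemble.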

71 To make a data point, 100 instances, each corresponding to a new matrix, are drawn, and the log of the ratio of the bound to the actual permanent is recorded. [sent-485, score-0.641]

72 Figure 12 is related to discussions of Corollary 19 for the permanent of a doubly stochastic matrix. [sent-489, score-0.742]

73 (The hierarchical relation obviously holds as well for any individual instance of the doubly stochastic matrix β from the generated ensemble.) [sent-498, score-0.243]

74 • The discovery of the exact relation between the permanent of a non-negative matrix, perm(p), and the respective FFE-based expression, Z_f^{(γ)}(p), where the latter is computationally tractable. [sent-532, score-0.61]

75 • The extension of the list of known BP-based upper and lower bounds for the permanent by their FFE-based counterparts. [sent-534, score-0.543]

76 • The experimental analysis of permanents of different ensembles of interest, including those expressing relations between consecutive images of stochastic flows visualized with particles. [sent-535, score-0.265]

77 • Addressing other GM problems of the permanental type, for example, counting matchings (and not only perfect matchings) on arbitrary graphs, drawing inspiration from Sanghavi et al. [sent-552, score-0.208]

78 Most Probable Perfect Matching over Bipartite Graphs: According to Equation (2), the permanent can be interpreted as the partition function of a GM. [sent-565, score-0.543]

79 Using "physics terminology," one says that this perfect matching representation allows one to interpret the permanent as the statistical mechanics of perfect matchings (called dimers in the physics literature) over the bipartite graph. [sent-568, score-0.845]

80 Note also that Equation (24) is a linear programming (LP) equation, but one which at first sight appears intractable, giving an optimization defined over a huge polytope and spanning all the perfect matchings with nonzero probability. [sent-579, score-0.189]

81 Consider an interior minimum of the Bethe FE function (4) achieved with a strictly nonzero (for elements with positive p_ij) doubly stochastic matrix β. [sent-592, score-0.357]

82 On the other hand, applying the permanent to both sides of Equation (6), one arrives at perm(p) = perm(β. ∗ (1 − β)) Z_BP(p) / ∏_{(i,j)} (1 − β_ij). [sent-595, score-0.591]
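Taking the reconstruction above at face value, the relation can be checked numerically on small matrices by reusing perm_brute and bp_doubly_stochastic from the earlier sketches; this verifies our hedged reading of the formula, not the paper's exact display.

    import numpy as np

    def check_bp_identity(p, beta):
        # Bethe FE at the solution:
        # F_B = sum_ij [beta*ln(beta/p) - (1-beta)*ln(1-beta)], Z_BP = exp(-F_B);
        # terms with p_ij = 0 (hence beta_ij = 0) vanish and are masked out.
        p = np.asarray(p, dtype=float)
        m = p > 0
        f_b = np.sum(beta[m] * np.log(beta[m] / p[m])
                     - (1.0 - beta[m]) * np.log(1.0 - beta[m]))
        z_bp = np.exp(-f_b)
        rhs = perm_brute(beta * (1.0 - beta)) * z_bp / np.prod(1.0 - beta)
        return np.isclose(perm_brute(p), rhs, rtol=1e-4)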

83 Note that the algorithm presented above is certainly not the only option one can use to find a doubly stochastic matrix solution of the BP Equations (6). [sent-614, score-0.275]

84 (2005), stated for the problem of computing the permanent in Chertkov et al. [sent-616, score-0.543]

85 On the other hand, applying the permanent to both sides of Equation (11), one arrives at perm(p) = perm(β./(1 − β)) Z_MF(p) ∏_{(i,j)} (1 − β_ij). [sent-625, score-0.591]

86 On the other hand, applying the permanent to both sides of Equation (14), one arrives at perm(p) = perm(β. ∗ (1 − β)^{−γ}) Z_f^{(γ)}(p) ∏_{(i,j)} (1 − β_ij)^γ. [sent-637, score-0.591]

87 Then Z_{o−BP} for such a partially resolved solution is split into the product of two contributions, Z_{o−BP} = Z_pm · Z_int, where Z_pm corresponds to the perfect matching block and Z_int corresponds to the interior block. [sent-663, score-0.277]

88 In fact, Z_pm is equal to the weighted perfect matching block of p, and − log(Z_int) corresponds to the minimum of the Bethe FE computed for the interior block of p. [sent-664, score-0.21]

89 On the other hand, the full partition function, Z, can be bounded from below by the product Z ≥ Z_1 · Z_2, where Z_1 and Z_2 are the permanents of the first and second blocks of the original matrix p. [sent-665, score-0.192]

90 However, Z_1 ≥ Z_pm, as it counts only one perfect matching (and ignores others), and Z_2 ≥ Z_int, in accordance with what was already shown above for any minimum of the Bethe FE achieved in the interior of the respective domain. [sent-667, score-0.277]

91 Pruning of the Matrix: Computing the permanent of sufficiently dense matrices exactly with the ZDD approach explained in Appendix G is infeasible for n > 30. [sent-669, score-0.574]

92 To avoid getting a zero permanent in the result, we include all components of the maximum perfect matching permutation in the pruned matrix. [sent-673, score-0.72]
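A sketch of this pruning step, assuming the rule "zero out all but the largest entries in each row, then restore the entries of the most probable perfect matching" (the keep parameter is ours; SciPy's assignment solver finds the max-weight matching on log-weights):

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def prune_keep_matching(p, keep=5):
        p = np.asarray(p, dtype=float)
        n = p.shape[0]
        pruned = np.zeros_like(p)
        for i in range(n):           # keep the `keep` largest entries per row
            top = np.argsort(p[i])[-keep:]
            pruned[i, top] = p[i, top]
        # most probable perfect matching = min-cost assignment on -log p
        cost = -np.log(np.clip(p, 1e-300, None))
        rows, cols = linear_sum_assignment(cost)
        pruned[rows, cols] = p[rows, cols]   # keeps the permanent nonzero
        return pruned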

93 Once the basic concept of ZDDs is introduced, one can use it for solving various combinatorial problems, for example, to represent a permanent as a ZDD in order to use the method. [sent-713, score-0.543]

94 The solution count of a ZDD representing a matrix is equal to the permanent of the corresponding 0-1 matrix, with each 1 corresponding to a nonzero entry. [sent-724, score-0.587]

95 In order to find the permanent of matrices that are not 0-1 matrices, only a small modification is necessary. [sent-725, score-0.574]

96 The WeightedCount of the root node of the ZDD will be equal to the permanent of the corresponding matrix. [sent-728, score-0.566]
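Not a ZDD implementation, but the WeightedCount quantity described above can be reproduced with a memoized row-by-row expansion over bitmasks of used columns; a ZDD computes the same number while sharing subproblems far more aggressively on structured sparsity patterns.

    from functools import lru_cache

    def permanent_by_expansion(p):
        n = len(p)

        @lru_cache(maxsize=None)
        def grow(i, used):
            # Weighted count of ways to match rows i..n-1 to unused columns.
            if i == n:
                return 1.0
            return sum(p[i][j] * grow(i + 1, used | (1 << j))
                       for j in range(n) if p[i][j] and not used >> j & 1)

        return grow(0, 0)

For a 0-1 matrix this returns the number of perfect matchings; with general non-negative weights it returns the permanent, matching the WeightedCount semantics quoted above.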

97 Proof of the van der Waerden conjecture regarding the permanent of a doubly stochastic matrix. [sent-863, score-0.802]

98 Fast approximation of the permanent for very dense problems. [sent-913, score-0.543]

99 A polynomial-time approximation algorithm for the permanent of a matrix with nonnegative entries. [sent-921, score-0.587]

100 Belief propagation and loop calculus for the permanent of a nonnegative matrix. [sent-1051, score-0.567]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('permanent', 0.543), ('chertkov', 0.327), ('bp', 0.315), ('perm', 0.216), ('fe', 0.204), ('mf', 0.155), ('doubly', 0.15), ('permanents', 0.148), ('zdd', 0.148), ('bethe', 0.136), ('gurvits', 0.13), ('edidia', 0.117), ('hertkov', 0.117), ('elief', 0.111), ('ermanent', 0.111), ('ractional', 0.111), ('zbp', 0.105), ('watanabe', 0.098), ('pproximating', 0.095), ('zo', 0.09), ('perfect', 0.086), ('ropagation', 0.086), ('ryser', 0.08), ('vontobel', 0.074), ('matchings', 0.068), ('fractional', 0.067), ('matching', 0.062), ('interior', 0.062), ('yedidia', 0.062), ('equations', 0.06), ('bfe', 0.056), ('zmf', 0.056), ('scatter', 0.054), ('log', 0.054), ('ff', 0.053), ('lo', 0.053), ('int', 0.052), ('pi', 0.052), ('gm', 0.05), ('fbp', 0.049), ('ensemble', 0.049), ('stochastic', 0.049), ('ensembles', 0.048), ('arrives', 0.048), ('conjectures', 0.047), ('fo', 0.047), ('matrix', 0.044), ('ffe', 0.043), ('zdds', 0.043), ('proposition', 0.04), ('equation', 0.039), ('respective', 0.038), ('bayati', 0.037), ('conjecture', 0.036), ('polytope', 0.035), ('resolved', 0.035), ('knuth', 0.035), ('belief', 0.034), ('hi', 0.033), ('jebara', 0.033), ('particles', 0.033), ('special', 0.032), ('solution', 0.032), ('schrijver', 0.032), ('matrices', 0.031), ('waerden', 0.031), ('lp', 0.031), ('pruned', 0.029), ('ows', 0.029), ('exact', 0.029), ('counting', 0.029), ('branch', 0.028), ('particle', 0.027), ('alamos', 0.026), ('doi', 0.026), ('statement', 0.026), ('pruning', 0.026), ('exclusion', 0.025), ('permanental', 0.025), ('weightedcount', 0.025), ('zint', 0.025), ('zml', 0.025), ('unity', 0.025), ('der', 0.024), ('propagation', 0.024), ('degeneracy', 0.024), ('diagrams', 0.024), ('continuity', 0.023), ('node', 0.023), ('ml', 0.022), ('los', 0.022), ('stochasticity', 0.022), ('manuscript', 0.022), ('discussed', 0.021), ('accesses', 0.021), ('lanl', 0.021), ('nn', 0.021), ('sink', 0.02), ('relations', 0.02), ('energy', 0.02)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999934 13 jmlr-2013-Approximating the Permanent with Fractional Belief Propagation

Author: Michael Chertkov, Adam B. Yedidia

Abstract: We discuss schemes for exact and approximate computations of permanents, and compare them with each other. Specifically, we analyze the belief propagation (BP) approach and its fractional belief propagation (FBP) generalization for computing the permanent of a non-negative matrix. Known bounds and Conjectures are verified in experiments, and some new theoretical relations, bounds and Conjectures are proposed. The fractional free energy (FFE) function is parameterized by a scalar parameter γ ∈ [−1; 1], where γ = −1 corresponds to the BP limit and γ = 1 corresponds to the exclusion principle (but ignoring perfect matching constraints) mean-field (MF) limit. FFE shows monotonicity and continuity with respect to γ. For every non-negative matrix, we define its special value γ∗ ∈ [−1; 0] to be the γ for which the minimum of the γ-parameterized FFE function is equal to the permanent of the matrix, where the lower and upper bounds of the γ-interval corresponds to respective bounds for the permanent. Our experimental analysis suggests that the distribution of γ∗ varies for different ensembles but γ∗ always lies within the [−1; −1/2] interval. Moreover, for all ensembles considered, the behavior of γ∗ is highly distinctive, offering an empirical practical guidance for estimating permanents of non-negative matrices via the FFE approach. Keywords: permanent, graphical models, belief propagation, exact and approximate algorithms, learning flows

2 0.14479707 120 jmlr-2013-Variational Algorithms for Marginal MAP

Author: Qiang Liu, Alexander Ihler

Abstract: The marginal maximum a posteriori probability (MAP) estimation problem, which calculates the mode of the marginal posterior distribution of a subset of variables with the remaining variables marginalized, is an important inference problem in many models, such as those with hidden variables or uncertain parameters. Unfortunately, marginal MAP can be NP-hard even on trees, and has attracted less attention in the literature compared to the joint MAP (maximization) and marginalization problems. We derive a general dual representation for marginal MAP that naturally integrates the marginalization and maximization operations into a joint variational optimization problem, making it possible to easily extend most or all variational-based algorithms to marginal MAP. In particular, we derive a set of “mixed-product” message passing algorithms for marginal MAP, whose form is a hybrid of max-product, sum-product and a novel “argmax-product” message updates. We also derive a class of convergent algorithms based on proximal point methods, including one that transforms the marginal MAP problem into a sequence of standard marginalization problems. Theoretically, we provide guarantees under which our algorithms give globally or locally optimal solutions, and provide novel upper bounds on the optimal objectives. Empirically, we demonstrate that our algorithms significantly outperform the existing approaches, including a state-of-the-art algorithm based on local search methods. Keywords: graphical models, message passing, belief propagation, variational methods, maximum a posteriori, marginal-MAP, hidden variable models

3 0.12030347 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees

Author: Nima Noorshams, Martin J. Wainwright

Abstract: The sum-product or belief propagation (BP) algorithm is a widely used message-passing technique for computing approximate marginals in graphical models. We introduce a new technique, called stochastic orthogonal series message-passing (SOSMP), for computing the BP fixed point in models with continuous random variables. It is based on a deterministic approximation of the messages via orthogonal series basis expansion, and a stochastic estimation of the basis coefficients via Monte Carlo techniques and damped updates. We prove that the SOSMP iterates converge to a δ-neighborhood of the unique BP fixed point for any tree-structured graph, and for any graphs with cycles in which the BP updates satisfy a contractivity condition. In addition, we demonstrate how to choose the number of basis coefficients as a function of the desired approximation accuracy δ and smoothness of the compatibility functions. We illustrate our theory with both simulated examples and in application to optical flow estimation. Keywords: graphical models, sum-product for continuous state spaces, low-complexity belief propagation, stochastic approximation, Monte Carlo methods, orthogonal basis expansion

4 0.073711008 114 jmlr-2013-The Rate of Convergence of AdaBoost

Author: Indraneel Mukherjee, Cynthia Rudin, Robert E. Schapire

Abstract: The AdaBoost algorithm was designed to combine many “weak” hypotheses that perform slightly better than random guessing into a “strong” hypothesis that has very low error. We study the rate at which AdaBoost iteratively converges to the minimum of the “exponential loss.” Unlike previous work, our proofs do not require a weak-learning assumption, nor do they require that minimizers of the exponential loss are finite. Our first result shows that the exponential loss of AdaBoost’s computed parameter vector will be at most ε more than that of any parameter vector of ℓ1 -norm bounded by B in a number of rounds that is at most a polynomial in B and 1/ε. We also provide lower bounds showing that a polynomial dependence is necessary. Our second result is that within C/ε iterations, AdaBoost achieves a value of the exponential loss that is at most ε more than the best possible value, where C depends on the data set. We show that this dependence of the rate on ε is optimal up to constant factors, that is, at least Ω(1/ε) rounds are necessary to achieve within ε of the optimal exponential loss. Keywords: AdaBoost, optimization, coordinate descent, convergence rate

5 0.052144684 107 jmlr-2013-Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization

Author: Shai Shalev-Shwartz, Tong Zhang

Abstract: Stochastic Gradient Descent (SGD) has become popular for solving large scale supervised machine learning optimization problems such as SVM, due to their strong theoretical guarantees. While the closely related Dual Coordinate Ascent (DCA) method has been implemented in various software packages, it has so far lacked good convergence analysis. This paper presents a new analysis of Stochastic Dual Coordinate Ascent (SDCA) showing that this class of methods enjoy strong theoretical guarantees that are comparable or better than SGD. This analysis justifies the effectiveness of SDCA for practical applications. Keywords: stochastic dual coordinate ascent, optimization, computational complexity, regularized loss minimization, support vector machines, ridge regression, logistic regression

6 0.047439359 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering

7 0.041855957 9 jmlr-2013-A Widely Applicable Bayesian Information Criterion

8 0.036779575 71 jmlr-2013-Message-Passing Algorithms for Quadratic Minimization

9 0.03505395 111 jmlr-2013-Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows

10 0.03131872 49 jmlr-2013-Global Analytic Solution of Fully-observed Variational Bayesian Matrix Factorization

11 0.028818307 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs

12 0.02852701 92 jmlr-2013-Random Spanning Trees and the Prediction of Weighted Graphs

13 0.027551785 88 jmlr-2013-Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models

14 0.026496068 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut

15 0.026149843 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion

16 0.026064394 90 jmlr-2013-Quasi-Newton Method: A New Direction

17 0.024709122 91 jmlr-2013-Query Induction with Schema-Guided Pruning Strategies

18 0.024343381 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference

19 0.023026362 108 jmlr-2013-Stochastic Variational Inference

20 0.021515001 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.148), (1, -0.011), (2, 0.03), (3, 0.059), (4, 0.071), (5, 0.135), (6, 0.081), (7, 0.009), (8, -0.062), (9, 0.085), (10, -0.019), (11, 0.163), (12, 0.204), (13, 0.12), (14, 0.091), (15, -0.183), (16, -0.287), (17, 0.003), (18, -0.014), (19, 0.061), (20, 0.121), (21, 0.143), (22, 0.023), (23, -0.01), (24, -0.061), (25, -0.037), (26, 0.12), (27, -0.077), (28, -0.012), (29, -0.054), (30, -0.055), (31, 0.137), (32, -0.188), (33, -0.028), (34, 0.05), (35, 0.186), (36, -0.046), (37, 0.034), (38, -0.092), (39, 0.02), (40, -0.018), (41, 0.147), (42, -0.013), (43, -0.065), (44, 0.06), (45, -0.115), (46, -0.118), (47, -0.011), (48, 0.07), (49, -0.043)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91842806 13 jmlr-2013-Approximating the Permanent with Fractional Belief Propagation

Author: Michael Chertkov, Adam B. Yedidia

Abstract: We discuss schemes for exact and approximate computations of permanents, and compare them with each other. Specifically, we analyze the belief propagation (BP) approach and its fractional belief propagation (FBP) generalization for computing the permanent of a non-negative matrix. Known bounds and Conjectures are verified in experiments, and some new theoretical relations, bounds and Conjectures are proposed. The fractional free energy (FFE) function is parameterized by a scalar parameter γ ∈ [−1; 1], where γ = −1 corresponds to the BP limit and γ = 1 corresponds to the exclusion principle (but ignoring perfect matching constraints) mean-field (MF) limit. FFE shows monotonicity and continuity with respect to γ. For every non-negative matrix, we define its special value γ∗ ∈ [−1; 0] to be the γ for which the minimum of the γ-parameterized FFE function is equal to the permanent of the matrix, where the lower and upper bounds of the γ-interval corresponds to respective bounds for the permanent. Our experimental analysis suggests that the distribution of γ∗ varies for different ensembles but γ∗ always lies within the [−1; −1/2] interval. Moreover, for all ensembles considered, the behavior of γ∗ is highly distinctive, offering an empirical practical guidance for estimating permanents of non-negative matrices via the FFE approach. Keywords: permanent, graphical models, belief propagation, exact and approximate algorithms, learning flows

2 0.62252772 120 jmlr-2013-Variational Algorithms for Marginal MAP

Author: Qiang Liu, Alexander Ihler

Abstract: The marginal maximum a posteriori probability (MAP) estimation problem, which calculates the mode of the marginal posterior distribution of a subset of variables with the remaining variables marginalized, is an important inference problem in many models, such as those with hidden variables or uncertain parameters. Unfortunately, marginal MAP can be NP-hard even on trees, and has attracted less attention in the literature compared to the joint MAP (maximization) and marginalization problems. We derive a general dual representation for marginal MAP that naturally integrates the marginalization and maximization operations into a joint variational optimization problem, making it possible to easily extend most or all variational-based algorithms to marginal MAP. In particular, we derive a set of “mixed-product” message passing algorithms for marginal MAP, whose form is a hybrid of max-product, sum-product and a novel “argmax-product” message updates. We also derive a class of convergent algorithms based on proximal point methods, including one that transforms the marginal MAP problem into a sequence of standard marginalization problems. Theoretically, we provide guarantees under which our algorithms give globally or locally optimal solutions, and provide novel upper bounds on the optimal objectives. Empirically, we demonstrate that our algorithms significantly outperform the existing approaches, including a state-of-the-art algorithm based on local search methods. Keywords: graphical models, message passing, belief propagation, variational methods, maximum a posteriori, marginal-MAP, hidden variable models

3 0.43702763 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees

Author: Nima Noorshams, Martin J. Wainwright

Abstract: The sum-product or belief propagation (BP) algorithm is a widely used message-passing technique for computing approximate marginals in graphical models. We introduce a new technique, called stochastic orthogonal series message-passing (SOSMP), for computing the BP fixed point in models with continuous random variables. It is based on a deterministic approximation of the messages via orthogonal series basis expansion, and a stochastic estimation of the basis coefficients via Monte Carlo techniques and damped updates. We prove that the SOSMP iterates converge to a δ-neighborhood of the unique BP fixed point for any tree-structured graph, and for any graphs with cycles in which the BP updates satisfy a contractivity condition. In addition, we demonstrate how to choose the number of basis coefficients as a function of the desired approximation accuracy δ and smoothness of the compatibility functions. We illustrate our theory with both simulated examples and in application to optical flow estimation. Keywords: graphical models, sum-product for continuous state spaces, low-complexity belief propagation, stochastic approximation, Monte Carlo methods, orthogonal basis expansion

4 0.35748446 107 jmlr-2013-Stochastic Dual Coordinate Ascent Methods for Regularized Loss Minimization

Author: Shai Shalev-Shwartz, Tong Zhang

Abstract: Stochastic Gradient Descent (SGD) has become popular for solving large scale supervised machine learning optimization problems such as SVM, due to their strong theoretical guarantees. While the closely related Dual Coordinate Ascent (DCA) method has been implemented in various software packages, it has so far lacked good convergence analysis. This paper presents a new analysis of Stochastic Dual Coordinate Ascent (SDCA) showing that this class of methods enjoy strong theoretical guarantees that are comparable or better than SGD. This analysis justifies the effectiveness of SDCA for practical applications. Keywords: stochastic dual coordinate ascent, optimization, computational complexity, regularized loss minimization, support vector machines, ridge regression, logistic regression

5 0.34125745 9 jmlr-2013-A Widely Applicable Bayesian Information Criterion

Author: Sumio Watanabe

Abstract: A statistical model or a learning machine is called regular if the map taking a parameter to a probability distribution is one-to-one and if its Fisher information matrix is always positive definite. If otherwise, it is called singular. In regular statistical models, the Bayes free energy, which is defined by the minus logarithm of Bayes marginal likelihood, can be asymptotically approximated by the Schwarz Bayes information criterion (BIC), whereas in singular models such approximation does not hold. Recently, it was proved that the Bayes free energy of a singular model is asymptotically given by a generalized formula using a birational invariant, the real log canonical threshold (RLCT), instead of half the number of parameters in BIC. Theoretical values of RLCTs in several statistical models are now being discovered based on algebraic geometrical methodology. However, it has been difficult to estimate the Bayes free energy using only training samples, because an RLCT depends on an unknown true distribution. In the present paper, we define a widely applicable Bayesian information criterion (WBIC) by the average log likelihood function over the posterior distribution with the inverse temperature 1/ log n, where n is the number of training samples. We mathematically prove that WBIC has the same asymptotic expansion as the Bayes free energy, even if a statistical model is singular for or unrealizable by a statistical model. Since WBIC can be numerically calculated without any information about a true distribution, it is a generalized version of BIC onto singular statistical models. Keywords: Bayes marginal likelihood, widely applicable Bayes information criterion

6 0.28535467 49 jmlr-2013-Global Analytic Solution of Fully-observed Variational Bayesian Matrix Factorization

7 0.26799515 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering

8 0.23078412 114 jmlr-2013-The Rate of Convergence of AdaBoost

9 0.20218721 71 jmlr-2013-Message-Passing Algorithms for Quadratic Minimization

10 0.19524613 88 jmlr-2013-Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models

11 0.17866954 55 jmlr-2013-Joint Harmonic Functions and Their Supervised Connections

12 0.15861453 18 jmlr-2013-Beyond Fano's Inequality: Bounds on the Optimal F-Score, BER, and Cost-Sensitive Risk and Their Implications

13 0.15602544 36 jmlr-2013-Distributions of Angles in Random Packing on Spheres

14 0.15247808 115 jmlr-2013-Training Energy-Based Models for Time-Series Imputation

15 0.14630915 5 jmlr-2013-A Near-Optimal Algorithm for Differentially-Private Principal Components

16 0.14537764 68 jmlr-2013-Machine Learning with Operational Costs

17 0.14315289 113 jmlr-2013-The CAM Software for Nonnegative Blind Source Separation in R-Java

18 0.1420393 104 jmlr-2013-Sparse Single-Index Model

19 0.14152855 81 jmlr-2013-Optimal Discovery with Probabilistic Expert Advice: Finite Time Analysis and Macroscopic Optimality

20 0.13313638 89 jmlr-2013-QuantMiner for Mining Quantitative Association Rules


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.023), (5, 0.108), (6, 0.04), (9, 0.012), (10, 0.043), (14, 0.498), (23, 0.03), (68, 0.039), (70, 0.025), (75, 0.037), (85, 0.013), (87, 0.014)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.72369045 13 jmlr-2013-Approximating the Permanent with Fractional Belief Propagation

Author: Michael Chertkov, Adam B. Yedidia

Abstract: We discuss schemes for exact and approximate computations of permanents, and compare them with each other. Specifically, we analyze the belief propagation (BP) approach and its fractional belief propagation (FBP) generalization for computing the permanent of a non-negative matrix. Known bounds and Conjectures are verified in experiments, and some new theoretical relations, bounds and Conjectures are proposed. The fractional free energy (FFE) function is parameterized by a scalar parameter γ ∈ [−1; 1], where γ = −1 corresponds to the BP limit and γ = 1 corresponds to the exclusion principle (but ignoring perfect matching constraints) mean-field (MF) limit. FFE shows monotonicity and continuity with respect to γ. For every non-negative matrix, we define its special value γ∗ ∈ [−1; 0] to be the γ for which the minimum of the γ-parameterized FFE function is equal to the permanent of the matrix, where the lower and upper bounds of the γ-interval corresponds to respective bounds for the permanent. Our experimental analysis suggests that the distribution of γ∗ varies for different ensembles but γ∗ always lies within the [−1; −1/2] interval. Moreover, for all ensembles considered, the behavior of γ∗ is highly distinctive, offering an empirical practical guidance for estimating permanents of non-negative matrices via the FFE approach. Keywords: permanent, graphical models, belief propagation, exact and approximate algorithms, learning flows

2 0.58001137 119 jmlr-2013-Variable Selection in High-Dimension with Random Designs and Orthogonal Matching Pursuit

Author: Antony Joseph

Abstract: The performance of orthogonal matching pursuit (OMP) for variable selection is analyzed for random designs. When contrasted with the deterministic case, since the performance is here measured after averaging over the distribution of the design matrix, one can have far less stringent sparsity constraints on the coefficient vector. We demonstrate that for exact sparse vectors, the performance of the OMP is similar to known results on the Lasso algorithm (Wainwright, 2009). Moreover, variable selection under a more relaxed sparsity assumption on the coefficient vector, whereby one has only control on the ℓ1 norm of the smaller coefficients, is also analyzed. As consequence of these results, we also show that the coefficient estimate satisfies strong oracle type inequalities. Keywords: high dimensional regression, greedy algorithms, Lasso, compressed sensing

3 0.32314157 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering

Author: Eva L. Dyer, Aswin C. Sankaranarayanan, Richard G. Baraniuk

Abstract: Unions of subspaces provide a powerful generalization of single subspace models for collections of high-dimensional data; however, learning multiple subspaces from data is challenging due to the fact that segmentation—the identification of points that live in the same subspace—and subspace estimation must be performed simultaneously. Recently, sparse recovery methods were shown to provide a provable and robust strategy for exact feature selection (EFS)—recovering subsets of points from the ensemble that live in the same subspace. In parallel with recent studies of EFS with ℓ1 -minimization, in this paper, we develop sufficient conditions for EFS with a greedy method for sparse signal recovery known as orthogonal matching pursuit (OMP). Following our analysis, we provide an empirical study of feature selection strategies for signals living on unions of subspaces and characterize the gap between sparse recovery methods and nearest neighbor (NN)-based approaches. In particular, we demonstrate that sparse recovery methods provide significant advantages over NN methods and that the gap between the two approaches is particularly pronounced when the sampling of subspaces in the data set is sparse. Our results suggest that OMP may be employed to reliably recover exact feature sets in a number of regimes where NN approaches fail to reveal the subspace membership of points in the ensemble. Keywords: subspace clustering, unions of subspaces, hybrid linear models, sparse approximation, structured sparsity, nearest neighbors, low-rank approximation

4 0.28483593 53 jmlr-2013-Improving CUR Matrix Decomposition and the Nystrom Approximation via Adaptive Sampling

Author: Shusen Wang, Zhihua Zhang

Abstract: The CUR matrix decomposition and the Nystr¨ m approximation are two important low-rank matrix o approximation techniques. The Nystr¨ m method approximates a symmetric positive semidefinite o matrix in terms of a small number of its columns, while CUR approximates an arbitrary data matrix by a small number of its columns and rows. Thus, CUR decomposition can be regarded as an extension of the Nystr¨ m approximation. o In this paper we establish a more general error bound for the adaptive column/row sampling algorithm, based on which we propose more accurate CUR and Nystr¨ m algorithms with expected o relative-error bounds. The proposed CUR and Nystr¨ m algorithms also have low time complexity o and can avoid maintaining the whole data matrix in RAM. In addition, we give theoretical analysis for the lower error bounds of the standard Nystr¨ m method and the ensemble Nystr¨ m method. o o The main theoretical results established in this paper are novel, and our analysis makes no special assumption on the data matrices. Keywords: large-scale matrix computation, CUR matrix decomposition, the Nystr¨ m method, o randomized algorithms, adaptive sampling

5 0.28190324 64 jmlr-2013-Lovasz theta function, SVMs and Finding Dense Subgraphs

Author: Vinay Jethava, Anders Martinsson, Chiranjib Bhattacharyya, Devdatt Dubhashi

Abstract: In this paper we establish that the Lov´ sz ϑ function on a graph can be restated as a kernel learning a problem. We introduce the notion of SVM − ϑ graphs, on which Lov´ sz ϑ function can be apa proximated well by a Support vector machine (SVM). We show that Erd¨ s-R´ nyi random G(n, p) o e log4 n np graphs are SVM − ϑ graphs for n ≤ p < 1. Even if we embed a large clique of size Θ 1−p in a G(n, p) graph the resultant graph still remains a SVM − ϑ graph. This immediately suggests an SVM based algorithm for recovering a large planted clique in random graphs. Associated with the ϑ function is the notion of orthogonal labellings. We introduce common orthogonal labellings which extends the idea of orthogonal labellings to multiple graphs. This allows us to propose a Multiple Kernel learning (MKL) based solution which is capable of identifying a large common dense subgraph in multiple graphs. Both in the planted clique case and common subgraph detection problem the proposed solutions beat the state of the art by an order of magnitude. Keywords: orthogonal labellings of graphs, planted cliques, random graphs, common dense subgraph

6 0.27560264 88 jmlr-2013-Perturbative Corrections for Approximate Inference in Gaussian Latent Variable Models

7 0.27239272 114 jmlr-2013-The Rate of Convergence of AdaBoost

8 0.27013466 100 jmlr-2013-Similarity-based Clustering by Left-Stochastic Matrix Factorization

9 0.26216811 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization

10 0.25994468 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation

11 0.25785047 2 jmlr-2013-A Binary-Classification-Based Metric between Time-Series Distributions and Its Use in Statistical and Learning Problems

12 0.25712064 26 jmlr-2013-Conjugate Relation between Loss Functions and Uncertainty Sets in Classification Problems

13 0.25678149 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees

14 0.2566615 52 jmlr-2013-How to Solve Classification and Regression Problems on High-Dimensional Data with a Supervised Extension of Slow Feature Analysis

15 0.2556774 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion

16 0.25495005 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning

17 0.25407684 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning

18 0.2537078 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning

19 0.25343522 9 jmlr-2013-A Widely Applicable Bayesian Information Criterion

20 0.25198478 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut