jmlr jmlr2013 jmlr2013-71 knowledge-graph by maker-knowledge-mining

71 jmlr-2013-Message-Passing Algorithms for Quadratic Minimization


Source: pdf

Author: Nicholas Ruozzi, Sekhar Tatikonda

Abstract: Gaussian belief propagation (GaBP) is an iterative algorithm for computing the mean (and variances) of a multivariate Gaussian distribution, or equivalently, the minimum of a multivariate positive definite quadratic function. Sufficient conditions, such as walk-summability, that guarantee the convergence and correctness of GaBP are known, but GaBP may fail to converge to the correct solution given an arbitrary positive definite covariance matrix. As was observed by Malioutov et al. (2006), the GaBP algorithm fails to converge if the computation trees produced by the algorithm are not positive definite. In this work, we will show that the failure modes of the GaBP algorithm can be understood via graph covers, and we prove that a parameterized generalization of the min-sum algorithm can be used to ensure that the computation trees remain positive definite whenever the input matrix is positive definite. We demonstrate that the resulting algorithm is closely related to other iterative schemes for quadratic minimization such as the Gauss-Seidel and Jacobi algorithms. Finally, we observe, empirically, that there always exists a choice of parameters such that the above generalization of the GaBP algorithm converges. Keywords: belief propagation, Gaussian graphical models, graph covers

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Keywords: belief propagation, Gaussian graphical models, graph covers 1. [sent-12, score-0.417]

2 Moallemi and Van Roy (2009, 2010) showed that scaled diagonal dominance was a sufficient condition for convergence and also characterized the rate of convergence via a computation tree analysis. [sent-25, score-0.482]

3 Such behavior is guaranteed if the original matrix is scaled diagonally dominant, but arbitrary positive definite matrices can produce computation trees that are not positive definite (Malioutov et al. [sent-31, score-0.565]

4 The key insight of diagonal loading is that scaled diagonal dominance can be achieved by sufficiently weighting the diagonal elements of the inverse covariance matrix. [sent-38, score-0.382]

5 The performance is determined, in part, by the amount of diagonal loading needed to make the matrix scaled diagonally dominant. [sent-41, score-0.366]
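
To make the diagonal-loading idea concrete, here is a minimal numerical sketch (not the authors' implementation): γ is chosen just large enough to make Γ + γI diagonally dominant, and the loaded system is solved repeatedly with the previous estimate fed back in. A plain direct solve stands in for the inner GaBP run, and the helper name and iteration count are illustrative assumptions.

```python
import numpy as np

def diagonally_loaded_solve(Gamma, h, n_outer=50):
    # Pick the smallest gamma making every row of Gamma + gamma*I diagonally
    # dominant (scaled diagonal dominance would also suffice).
    off = np.abs(Gamma) - np.diag(np.abs(np.diag(Gamma)))
    gamma = max(0.0, float(np.max(off.sum(axis=1) - np.diag(Gamma)))) + 1e-6
    loaded = Gamma + gamma * np.eye(len(h))
    x = np.zeros_like(h, dtype=float)
    for _ in range(n_outer):
        # Each outer step solves the well-behaved loaded system; a GaBP run
        # on the loaded model could stand in for this direct solve.
        x = np.linalg.solve(loaded, h + gamma * x)
    return x
```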

6 In this work, we investigate the behavior of reweighted message-passing algorithms for the quadratic minimization problem. [sent-53, score-0.404]

7 The motivation for this study comes from the observation that belief propagation style algorithms typically do not explore all nodes in the factor graph with the same frequency (Frey et al. [sent-54, score-0.379]

8 We will show that there exists a choice of parameters for the reweighted algorithms that guarantees monotone convergence of the variance estimates on all positive definite models, even those for which the GaBP algorithm fails to converge. [sent-58, score-0.365]

9 In addition, we show that our graph cover analysis extends to other iterative algorithms for the quadratic minimization problem and that similar ideas can be used to reason about the min-sum algorithm for general convex minimization. [sent-60, score-0.48]

10 In Section 2 we review the min-sum algorithm, its reweighted generalizations, and the quadratic minimization problem. [sent-62, score-0.404]

11 Figure 1: The factor graph corresponding to f (x1 , x2 , x3 ) = φ1 + φ2 + φ3 + ψ12 + ψ23 + ψ13 and the graph G. [sent-79, score-0.39]

12 The factor graph corresponding to f has a node for each of the variables x1, . . . , xn, a node for each of the ψi j, and for all (i, j) ∈ E, an edge joining the node corresponding to xi to the node corresponding to ψi j. [sent-85, score-0.414]

13 When the factor graph is a tree, these updates are guaranteed to converge, but understanding when these updates converge to the correct solution for an arbitrary graph is a central question underlying the study of the min-sum algorithm. [sent-92, score-0.589]

14 We will think of the messages as a vector of functions indexed by the edge over which the message is passed. [sent-97, score-0.381]

15 Any vector of real-valued messages is a valid choice for the vector of initial messages m0 , and the choice of initial messages can greatly affect the behavior of the algorithm. [sent-98, score-0.699]
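
As a concrete illustration of this bookkeeping, the sketch below stores one quadratic message per directed edge of the graph defined by the nonzero off-diagonal entries of Γ, with the all-zero initialization m0 = 0. The pair-of-scalars representation is an assumption for the quadratic case, not the authors' data structure.

```python
def init_messages(Gamma):
    # One quadratic message m_{i->j}(x_j) = 0.5*a*x_j**2 + b*x_j per directed
    # edge, stored as the pair (a, b); m^0 = 0 is one valid initialization.
    n = Gamma.shape[0]
    return {(i, j): (0.0, 0.0)
            for i in range(n) for j in range(n)
            if i != j and Gamma[i, j] != 0}
```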

16 Definition 1 A vector, τ = ({τi }, {τi j }), of beliefs is locally decodable to x∗ if τi (xi∗ ) < τi (xi ) for all i and all xi ≠ xi∗ . [sent-106, score-0.45]

17 If the algorithm converges to a vector of beliefs that are locally decodable to x∗ , then we hope that the vector x∗ is a global minimum of the objective function. [sent-108, score-0.441]

18 Intuitively, the computation tree is an unrolled version of the original graph that captures the evolution of the messages passed by the min-sum algorithm needed to compute the belief at time t at a particular node of the factor graph. [sent-114, score-0.85]

19 Computation trees describe the evolution of the beliefs over time, which, in some cases, can help us prove correctness and/or convergence of the message-passing updates. [sent-115, score-0.385]

20 The depth t computation tree rooted at node i contains all of the length t non-backtracking walks in the factor graph starting at node i. [sent-117, score-0.654]
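
The construction can be made concrete for the pairwise quadratic model: the sketch below unrolls the depth-t computation tree of non-backtracking walks rooted at a chosen variable and assembles the corresponding matrix, whose smallest eigenvalue indicates whether the tree model is still positive definite. It covers the standard (unweighted) tree; the reweighted trees discussed later additionally scale potentials by the parameters c, and the function name is illustrative.

```python
import numpy as np

def computation_tree_matrix(Gamma, root, depth):
    # Tree nodes are copies of variables; children follow non-backtracking
    # steps in the graph of Gamma; node/edge potentials are copied from Gamma.
    n = Gamma.shape[0]
    nodes, parent, frontier = [root], [-1], [0]
    for _ in range(depth):
        new_frontier = []
        for u in frontier:
            v = nodes[u]
            came_from = nodes[parent[u]] if parent[u] >= 0 else -1
            for w in range(n):
                if w != v and w != came_from and Gamma[v, w] != 0:
                    nodes.append(w)
                    parent.append(u)
                    new_frontier.append(len(nodes) - 1)
        frontier = new_frontier
    T = np.zeros((len(nodes), len(nodes)))
    for u, v in enumerate(nodes):
        T[u, u] = Gamma[v, v]
        if parent[u] >= 0:
            T[u, parent[u]] = T[parent[u], u] = Gamma[v, nodes[parent[u]]]
    return T

# The tree model is still positive definite at this depth iff
# np.linalg.eigvalsh(computation_tree_matrix(Gamma, 0, 5)).min() > 0.
```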

21 For any node v in the factor graph, the computation tree at time t rooted at i, denoted by Ti (t), is defined recursively as follows. [sent-119, score-0.379]

22 Each node of Ti (t) is a copy of a node in G, and the potentials on the nodes in Ti (t), which operate on a subset of the variables in Ti (t), are copies of the potentials of the corresponding nodes in G. [sent-122, score-0.45]

23 The construction of a computation tree for the graph in Figure 1 is pictured in Figure 2. [sent-123, score-0.343]

24 These round one messages depend only on the initial messages, so the tree terminates at this point. [sent-128, score-0.344]

25 Lemma 2 The belief at node i produced by the min-sum algorithm at time t corresponds to the exact min-marginal at the root of Ti (t) whose boundary messages are given by the initial messages. [sent-130, score-0.445]

26 Figure 2: The computation tree at time t = 2 rooted at the variable node x1 of the graph in Figure 1. [sent-131, score-0.502]

27 The beliefs for the reweighted algorithm are defined analogously to those for the standard minsum algorithm. [sent-150, score-0.439]

28 update the messages as follows: m^t_{i→j}(x_j) := κ + min_{x_i} [ ψ_{ij}(x_i, x_j)/c_{ij} + φ_i(x_i) + (c_{ij} − 1)·m^{t−1}_{j→i}(x_i) + ∑_{k∈∂i\j} c_{ki}·m^{t−1}_{k→i}(x_i) ]. [sent-155, score-0.497]

29 The vector of messages at any fixed point of the message update equations has two important properties. [sent-156, score-0.568]

30 First, the beliefs corresponding to these messages provide an alternative factorization of the objective function f . [sent-157, score-0.438]

31 Lemma 3 For any vector of messages mt with corresponding beliefs τt , f (x1 , . [sent-159, score-0.441]

32 Lemma 4 If τ is a set of beliefs corresponding to a fixed point of the message updates in Algorithm 1, then min_{x_j} τ_{ij}(x_i, x_j) = κ + τ_i(x_i) for all (i, j) ∈ G and all x_i. [sent-163, score-0.416]

33 Again, the computation tree captures the messages that would need to be passed in order to compute τti (xi ). [sent-170, score-0.458]

34 As a result, the potential at a node u in the computation tree corresponds to some potential in the original graph multiplied by a constant that depends on all of the nodes above u in the computation tree. [sent-172, score-0.586]

35 As such, we form the time t + 1 computation tree from the time t computation tree by taking any leaf u of the time t tree, which is a copy of some node v in the factor graph, creating a new node for every w ∈ ∂v, and connecting u to these new nodes. [sent-176, score-0.703]

36 As a result, the new computation tree rooted at node u of depth t contains at least all of the non-backtracking walks of length t in the factor graph starting from u and, at most, all walks of length t in the factor graph starting at u. [sent-177, score-0.769]

37 For example, suppose the computation tree was rooted at variable node i and that τi depends on the message from j to i. [sent-181, score-0.449]

38 One can check that setting ci j = 1 for all (i, j) ∈ E reduces the above computation tree to that of Figure 2. [sent-184, score-0.35]

39 every potential along this branch of the computation tree is multiplied by ci j . [sent-185, score-0.387]

40 To make this concrete, we can associate a weight to every edge of the computation tree that corresponds to the constant that multiplies the message passed across that edge. [sent-186, score-0.373]

41 The computation tree produced by Algorithm 1 at time t = 2 for the factor graph in Figure 1 is pictured in Figure 3. [sent-189, score-0.43]

42 3 Quadratic Minimization We now address the quadratic minimization problem in the context of the reweighted min-sum algorithm. [sent-194, score-0.404]

43 We note that we will abusively write min in the reweighted update equations even though the appropriate notion of minimization for the real numbers is inf. [sent-202, score-0.36]

44 We can explicitly compute the minimization required by the reweighted min-sum algorithm at each time step: the synchronous message update m^t_{i→j}(x_j) can be parameterized as a quadratic function of the form (1/2)·a^t_{i→j}·x_j² + b^t_{i→j}·x_j. [sent-203, score-0.742]

45 If we define A^t_{i\j} ≜ Γ_ii + ∑_{k∈∂i} c_ki·a^{t−1}_{k→i} − a^{t−1}_{j→i} and B^t_{i\j} ≜ h_i − ∑_{k∈∂i} c_ki·b^{t−1}_{k→i} − b^{t−1}_{j→i}, then the updates at time t are given by a^t_{i→j} := (Γ_ij²/c_ij) / A^t_{i\j} and b^t_{i→j} := (Γ_ij/c_ij)·B^t_{i\j} / A^t_{i\j}. [sent-204, score-0.537]
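
For reference, here is a minimal sketch of the un-reweighted special case (all ci j = 1), i.e., the standard GaBP/min-sum recursions for minimizing (1/2)xᵀΓx − hᵀx; the reweighted algorithm inserts the ci j factors into the same recursions as in the display above. The function name, the fixed iteration count, and the assumption that the run converges are illustrative simplifications.

```python
import numpy as np

def gabp_quadratic(Gamma, h, n_iters=100):
    # Standard min-sum / GaBP recursions (the c_ij = 1 case) for a symmetric
    # Gamma: estimates the minimizer of 0.5*x'Gamma*x - h'x, i.e. Gamma x = h.
    n = len(h)
    nbrs = [[j for j in range(n) if j != i and Gamma[i, j] != 0] for i in range(n)]
    a = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
    b = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
    for _ in range(n_iters):
        a_new, b_new = {}, {}
        for i in range(n):
            for j in nbrs[i]:
                A = Gamma[i, i] + sum(a[(k, i)] for k in nbrs[i] if k != j)
                B = h[i] + sum(b[(k, i)] for k in nbrs[i] if k != j)
                a_new[(i, j)] = -Gamma[i, j] ** 2 / A
                b_new[(i, j)] = -Gamma[i, j] * B / A
        a, b = a_new, b_new
    prec = np.array([Gamma[i, i] + sum(a[(k, i)] for k in nbrs[i]) for i in range(n)])
    mean = np.array([(h[i] + sum(b[(k, i)] for k in nbrs[i])) / prec[i] for i in range(n)])
    return mean, 1.0 / prec   # estimated means and variances
```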

46 Suppose that the beliefs generated from a fixed point of Algorithm 1 are locally decodable to x∗ . [sent-208, score-0.378]

47 If τ is locally decodable to x∗ , then for each i ∈ V , τi (xi ) must be a positive definite quadratic function that is minimized at xi∗ . [sent-220, score-0.346]

48 As a consequence of Theorem 5, even if Γ is not positive definite, if some fixed point of the reweighted algorithm is locally decodable to a vector x∗ then x∗ solves the system Γx = h. [sent-233, score-0.529]

49 For any matrix Γ with strictly positive diagonal, Weiss and Freeman (2001b) demonstrated that GaBP converges when Γ is diagonally dominant (scaled diagonally dominant with all scale factors equal to one), Malioutov et al. [sent-239, score-0.57]
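
Both conditions are easy to test numerically; the sketch below checks ordinary diagonal dominance directly and walk-summability via the spectral radius of the entrywise absolute value of the normalized off-diagonal part, in the sense of Malioutov et al. (2006). The function names and tolerance are illustrative.

```python
import numpy as np

def is_diagonally_dominant(Gamma):
    # Gamma_ii >= sum_{j != i} |Gamma_ij| for every row.
    d = np.diag(Gamma)
    off = np.abs(Gamma).sum(axis=1) - np.abs(d)
    return bool(np.all(d >= off))

def is_walk_summable(Gamma, tol=1e-12):
    # Spectral radius of |R| < 1, where R is the off-diagonal part of the
    # diagonally normalized matrix D^{-1/2} Gamma D^{-1/2}.
    d = np.sqrt(np.diag(Gamma))
    R = Gamma / np.outer(d, d)
    np.fill_diagonal(R, 0.0)
    return bool(np.max(np.abs(np.linalg.eigvals(np.abs(R)))) < 1 - tol)
```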

50 We would like to understand how to choose the parameters of the reweighted algorithm in order to extend the convergence results for GaBP to all positive definite matrices. [sent-241, score-0.365]

51 Graph Covers In this section, we will explore graph covers and their relationship to iterative message-passing algorithms for the quadratic minimization problem. [sent-243, score-0.495]

52 Definition 8 A graph H covers a graph G if there exists a graph homomorphism π : H → G such that π is an isomorphism on neighborhoods (i. [sent-249, score-0.691]

53 However, when a graph cover is disconnected, all of the connected components of the cover must themselves be covers of the original graph. [sent-258, score-0.51]

54 For every base graph G, there exists a graph, possibly infinite, which covers all finite, connected covers of the base graph. [sent-261, score-0.457]

55 To any finite cover, H, of a factor graph G we can associate a collection of potentials derived from the base graph; the potential at node i ∈ H is equal to the potential at node π(i) ∈ G. [sent-263, score-0.486]

56 Specifically, the fixed points of the reweighted algorithm on the base graph are also fixed [sent-313, score-0.437]

57 points of the reweighted algorithm on any graph cover. [sent-332, score-0.437]

58 As such, the reweighted algorithm may not converge to the correct minimizing assignment when the matrix corresponding to some cover of G is not positive definite. [sent-333, score-0.603]

59 More importantly, we can use Theorem 11 to conclude that MPLP, tree-reweighted max-product, and other message-passing algorithms that guarantee the correctness of locally decodable beliefs cannot converge to the correct solution when Γ is positive definite but not walk-summable. [sent-351, score-0.585]

60 From the discussion in Section 3, every collection of locally decodable beliefs on the base graph can be lifted to locally decodable beliefs on any graph cover. [sent-352, score-1.098]

61 As an example, if ci j ≤ maxi∈V |∂i| for all (i, j) ∈ E, then one can show that the reweighted algorithm cannot converge to locally decodable beliefs unless all of the graph covers are convex. [sent-358, score-1.195]

62 Given this observation, in order to produce convergent message-passing schemes for the quadratic minimization problem, we will need to study choices of the parameters that do not guarantee correctness over all graph covers. [sent-360, score-0.409]

63 Convergence Properties of Reweighted Message-Passing Algorithms Recall that the GaBP algorithm can converge to the correct minimizer of the objective function even if the original matrix is not scaled diagonally dominant. [sent-362, score-0.398]

64 The most significant problem when the original matrix is positive definite but not scaled diagonally dominant is that the computation trees may eventually possess negative eigenvalues due to the existence of some 2-cover with at least one non-positive eigenvalue. [sent-363, score-0.568]
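
The simplest such cover to inspect is the Kronecker (bipartite) double cover discussed below in Figure 7; a sketch of its lifted matrix follows. Writing Γ = D + R with D diagonal, the lift has block form [[D, R], [R, D]], so its spectrum is the eigenvalues of D + R together with those of D − R, and a non-positive eigenvalue of D − R is exactly a positive definite Γ whose 2-cover fails to be positive definite. The helper name is illustrative.

```python
import numpy as np

def kronecker_double_cover(Gamma):
    # Diagonal potentials are copied onto both copies of each variable, and
    # every pairwise potential connects the two copies of its endpoints.
    D = np.diag(np.diag(Gamma))
    R = Gamma - D
    return np.block([[D, R], [R, D]])

# A positive definite Gamma whose double cover has a non-positive eigenvalue
# exhibits the failure mode discussed above:
# np.linalg.eigvalsh(kronecker_double_cover(Gamma)).min() <= 0
```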

65 1 Convergence of the Variances First, we will provide conditions on the choice of the parameter vector such that all of the computation trees produced by the reweighted algorithm remain positive definite throughout the course of the algorithm. [sent-368, score-0.539]

66 We will consider two different choices for the parameter vector: one in which ci j ≥ 1 for all i and j and one in which ci j < 0 for all i and j. [sent-374, score-0.356]

67 Such a choice is unusual among reweighted algorithms, but it does guarantee that the computation trees will remain positive definite throughout the algorithm. [sent-376, score-0.5]

68 Lemma 13 If ci j ≥ 1 for all i and j and all of the computation trees are positive definite, then the sequence a^0_{i→j}, a^1_{i→j}, . [sent-398, score-0.412]

69 Because all of the computation trees are positive definite, we must have that for each i, Γ_ii + ∑_{k∈∂i\j} c_ki·a^{t−1}_{k→i} + c_ji·a^{t−1}_{j→i} > 0. [sent-404, score-0.431]

70 Our strategy will be to ensure that all of the computation trees are positive definite by leveraging the choice of parameters, ci j . [sent-414, score-0.412]

71 Specifically, we want to use these parameters to weight the diagonal elements of the computation tree much more than the off-diagonal elements in order to force the computation trees to be positive definite. [sent-415, score-0.467]

72 If we can show that there is a choice of each ci j = c ji that will cause all of the computation trees to be positive definite, then Algorithm 1 should behave almost as if the original matrix were scaled diagonally dominant. [sent-416, score-0.729]

73 Theorem 14 For any symmetric matrix Γ with strictly positive diagonal, ∃r ≥ 1 and an ε > 0 such that the eigenvalues of the computation trees are bounded from below by ε when generated by Algorithm 1 with ci j = r for all i and j. [sent-418, score-0.489]

74 The proof of this theorem exploits the Geršgorin disc theorem in order to show that there exists a choice of r such that each computation tree is scaled diagonally dominant. [sent-419, score-0.404]

75 2 Negative Parameters For the case in which ci j < 0 for all i and j, we also have that the computation trees are always positive definite when the initial messages are uniformly equal to zero, as characterized by the following lemmas. [sent-423, score-0.645]

76 Otherwise, we have a^t_{i→j} = (Γ_ij²/c_ij) / (Γ_ii + ∑_{k∈∂i\j} c_ki·a^{t−1}_{k→i} + (c_ji − 1)·a^{t−1}_{j→i}) ≤ 0, where the inequality follows from the induction hypothesis. [sent-428, score-0.375]

77 Lemma 16 For any symmetric matrix Γ with strictly positive diagonal, if ci j < 0 for all i and j, then all of the computation trees are positive definite. [sent-429, score-0.547]

78 Proof The computation trees are all positive definite if and only if Γ_ii + ∑_{k∈∂i} c_ki·a^t_{k→i} > 0 for all t. [sent-430, score-0.387]

79 As when ci j ≥ 1 for all (i, j) ∈ E, the eigenvalues on each computation tree are again bounded away from zero, but the a^t_{i→j} no longer form a monotonically decreasing sequence when ci j < 0 for all (i, j) ∈ E. [sent-432, score-0.528]

80 If all of the computation trees remain positive definite in the limit, then the beliefs will all be positive definite upon convergence. [sent-433, score-0.465]

81 If the estimates for the means converge as well, then the converged beliefs must be locally decodable to the correct minimizing assignment. [sent-434, score-0.471]

82 We can view the synchronous algorithm described in Algorithm 1 as a specific message-update schedule on the Kronecker double cover, where we perform the update in Algorithm 2 for every variable in the same partition on alternating iterations (see Algorithm 3). [sent-451, score-0.63]

83 Figure 7: The Kronecker double cover (b) of a pairwise factor graph (a). [sent-454, score-0.748]

84 By construction, the message vector produced by Algorithm 3 is simply a concatenation of two consecutive time steps of the synchronous algorithm. [sent-458, score-0.516]

85 Therefore, the messages passed by Algorithm 1 are identical to those passed by a specific ordering of the updates in Algorithm 2 on the Kronecker double cover. [sent-460, score-0.426]

86 If the Kronecker double cover is such a “bad” cover, then we might expect that the synchronous reweighted algorithm may not converge to the correct solution. [sent-462, score-0.672]

87 In the same way that Algorithm 1 is a synchronous version of Algorithm 2, the Jacobi algorithm is a synchronous version of the Gauss-Seidel algorithm. [sent-486, score-0.362]
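
For comparison, minimal sketches of the two classical sweeps for Γx = h are given below (standard textbook forms, not tied to the paper's notation): the Jacobi step updates every coordinate from the previous iterate, while the Gauss-Seidel step updates coordinates in place.

```python
import numpy as np

def jacobi_step(Gamma, h, x):
    # One synchronous (Jacobi) sweep: every coordinate uses the old iterate.
    D = np.diag(Gamma)
    return (h - Gamma @ x + D * x) / D

def gauss_seidel_step(Gamma, h, x):
    # One in-place (Gauss-Seidel) sweep: each coordinate uses the
    # already-updated values of the coordinates before it.
    x = x.copy()
    for i in range(len(h)):
        x[i] = (h[i] - Gamma[i] @ x + Gamma[i, i] * x[i]) / Gamma[i, i]
    return x
```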

88 Here, A∗ is constructed from the vector of converged variances, a∗ . [sent-512, score-0.375]

89 When the matrix is not scaled diagonally dominant, the min-sum algorithm converges more slowly or does not converge at all. [sent-536, score-0.363]

90 Figure 8 illustrates the behavior of the min-sum algorithm, Algorithm 2 with ci j = 2 for all i ≠ j, and the synchronous algorithm with ci j = 2 for all i ≠ j, for different choices of the constant p. [sent-551, score-0.537]

91 If we set ci j = 3 for all i ≠ j, then empirically, both the synchronous algorithm and Algorithm 2 converge for all p ∈ (−. [sent-556, score-0.418]
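
A small self-contained probe in this spirit is sketched below; the test family Γ(p) with unit diagonal and constant off-diagonal p is a hypothetical stand-in rather than the matrices actually used in the paper, and the snippet only contrasts positive definiteness with walk-summability instead of running the reweighted algorithm itself.

```python
import numpy as np

def test_matrix(p, n=4):
    # Hypothetical test family: unit diagonal, constant off-diagonal p.
    # For n = 4 this is positive definite for p in (-1/3, 1) but
    # walk-summable only for |p| < 1/3, the regime where plain GaBP
    # is guaranteed to converge.
    return (1 - p) * np.eye(n) + p * np.ones((n, n))

for p in (-0.4, -0.3, 0.1, 0.4, 0.9):
    G = test_matrix(p)
    pos_def = np.linalg.eigvalsh(G).min() > 0
    rho = np.abs(np.linalg.eigvals(np.abs(G - np.eye(len(G))))).max()
    print(f"p={p:5.2f}  positive definite: {pos_def}  walk-summable: {rho < 1}")
```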

92 Conclusions and Future Research In this work, we explored the properties of reweighted message-passing algorithms for the quadratic minimization problem. [sent-572, score-0.404]

93 To this end, we employed graph covers to prove that standard approaches to convergence and correctness that exploit duality and coordinate ascent/descent such as MPLP (Globerson and Jaakkola, 2007), tree-reweighted max-product (Wainwright et al. [sent-574, score-0.411]

94 The damped min-sum algorithm with damping factor δ = 1/2 empirically seems to converge if Γ is positive definite and all of the computation trees remain positive definite (Malioutov et al. [sent-585, score-0.449]

95 In practice, the damped synchronous algorithm with δ = 1/2 and Algorithm 2 appear to converge for all sufficiently large choices of the parameter vector as long as Γ is positive definite. [sent-588, score-0.348]

96 2 General Convex Minimization The reweighted algorithm can, in theory, be applied to minimize general convex functions, but in practice, computing and storing the message vector for these more general models may be inefficient. [sent-595, score-0.414]

97 Recall that the existence of graph covers that are not bounded from below can be problematic for the reweighted message-passing algorithm. [sent-598, score-0.58]

98 For quadratic functions, this cannot occur if the matrix is scaled diagonally dominant or, equivalently, if the objective function corresponding to every finite graph cover is positive definite. [sent-599, score-0.776]

99 This equivalence suggests a generalization of scaled diagonal dominance for arbitrary convex functions based on the convexity of their graph covers. [sent-600, score-0.429]

100 Scaled diagonal dominance of a symmetric matrix with a positive diagonal implies that the matrix is symmetric positive definite. [sent-618, score-0.486]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('gabp', 0.353), ('reweighted', 0.266), ('messages', 0.233), ('ruozzi', 0.224), ('ati', 0.188), ('tatikonda', 0.181), ('synchronous', 0.181), ('ci', 0.178), ('beliefs', 0.173), ('graph', 0.171), ('decodable', 0.165), ('diagonally', 0.159), ('cki', 0.153), ('covers', 0.143), ('jacobi', 0.141), ('essage', 0.131), ('uadratic', 0.131), ('inimization', 0.131), ('message', 0.118), ('trees', 0.115), ('tree', 0.111), ('ti', 0.108), ('node', 0.104), ('lgorithms', 0.102), ('cover', 0.098), ('dominance', 0.094), ('malioutov', 0.091), ('quadratic', 0.083), ('mti', 0.082), ('vg', 0.082), ('scaled', 0.073), ('xi', 0.072), ('moallemi', 0.071), ('belief', 0.069), ('tv', 0.067), ('wainwright', 0.061), ('dominant', 0.061), ('computation', 0.061), ('diagonal', 0.061), ('dxi', 0.06), ('converge', 0.059), ('potentials', 0.059), ('lifth', 0.059), ('projg', 0.059), ('positive', 0.058), ('ht', 0.057), ('correctness', 0.056), ('minimization', 0.055), ('rooted', 0.055), ('passed', 0.053), ('updates', 0.053), ('jaakkola', 0.051), ('propagation', 0.05), ('damped', 0.05), ('factor', 0.048), ('assignment', 0.047), ('bti', 0.047), ('mtj', 0.047), ('trmp', 0.047), ('kronecker', 0.047), ('lift', 0.045), ('schedule', 0.045), ('ji', 0.044), ('convergent', 0.044), ('iterative', 0.043), ('copy', 0.042), ('ip', 0.041), ('convergence', 0.041), ('matrix', 0.041), ('nodes', 0.041), ('xh', 0.04), ('locally', 0.04), ('variances', 0.04), ('weiss', 0.04), ('update', 0.039), ('produced', 0.039), ('multiplied', 0.037), ('nite', 0.037), ('reweighting', 0.036), ('symmetric', 0.036), ('critical', 0.036), ('mt', 0.035), ('homomorphism', 0.035), ('mplp', 0.035), ('sekhar', 0.035), ('vh', 0.035), ('ii', 0.035), ('graphical', 0.034), ('correct', 0.034), ('double', 0.034), ('roy', 0.033), ('loading', 0.032), ('objective', 0.032), ('globerson', 0.031), ('converges', 0.031), ('edge', 0.03), ('messagepassing', 0.03), ('nicholas', 0.03), ('eigenvectors', 0.03), ('convex', 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 71 jmlr-2013-Message-Passing Algorithms for Quadratic Minimization

Author: Nicholas Ruozzi, Sekhar Tatikonda

Abstract: Gaussian belief propagation (GaBP) is an iterative algorithm for computing the mean (and variances) of a multivariate Gaussian distribution, or equivalently, the minimum of a multivariate positive definite quadratic function. Sufficient conditions, such as walk-summability, that guarantee the convergence and correctness of GaBP are known, but GaBP may fail to converge to the correct solution given an arbitrary positive definite covariance matrix. As was observed by Malioutov et al. (2006), the GaBP algorithm fails to converge if the computation trees produced by the algorithm are not positive definite. In this work, we will show that the failure modes of the GaBP algorithm can be understood via graph covers, and we prove that a parameterized generalization of the min-sum algorithm can be used to ensure that the computation trees remain positive definite whenever the input matrix is positive definite. We demonstrate that the resulting algorithm is closely related to other iterative schemes for quadratic minimization such as the Gauss-Seidel and Jacobi algorithms. Finally, we observe, empirically, that there always exists a choice of parameters such that the above generalization of the GaBP algorithm converges. Keywords: belief propagation, Gaussian graphical models, graph covers

2 0.13566007 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees

Author: Nima Noorshams, Martin J. Wainwright

Abstract: The sum-product or belief propagation (BP) algorithm is a widely used message-passing technique for computing approximate marginals in graphical models. We introduce a new technique, called stochastic orthogonal series message-passing (SOSMP), for computing the BP fixed point in models with continuous random variables. It is based on a deterministic approximation of the messages via orthogonal series basis expansion, and a stochastic estimation of the basis coefficients via Monte Carlo techniques and damped updates. We prove that the SOSMP iterates converge to a δ-neighborhood of the unique BP fixed point for any tree-structured graph, and for any graphs with cycles in which the BP updates satisfy a contractivity condition. In addition, we demonstrate how to choose the number of basis coefficients as a function of the desired approximation accuracy δ and smoothness of the compatibility functions. We illustrate our theory with both simulated examples and in application to optical flow estimation. Keywords: graphical models, sum-product for continuous state spaces, low-complexity belief propagation, stochastic approximation, Monte Carlo methods, orthogonal basis expansion

3 0.11719876 120 jmlr-2013-Variational Algorithms for Marginal MAP

Author: Qiang Liu, Alexander Ihler

Abstract: The marginal maximum a posteriori probability (MAP) estimation problem, which calculates the mode of the marginal posterior distribution of a subset of variables with the remaining variables marginalized, is an important inference problem in many models, such as those with hidden variables or uncertain parameters. Unfortunately, marginal MAP can be NP-hard even on trees, and has attracted less attention in the literature compared to the joint MAP (maximization) and marginalization problems. We derive a general dual representation for marginal MAP that naturally integrates the marginalization and maximization operations into a joint variational optimization problem, making it possible to easily extend most or all variational-based algorithms to marginal MAP. In particular, we derive a set of “mixed-product” message passing algorithms for marginal MAP, whose form is a hybrid of max-product, sum-product and a novel “argmax-product” message updates. We also derive a class of convergent algorithms based on proximal point methods, including one that transforms the marginal MAP problem into a sequence of standard marginalization problems. Theoretically, we provide guarantees under which our algorithms give globally or locally optimal solutions, and provide novel upper bounds on the optimal objectives. Empirically, we demonstrate that our algorithms significantly outperform the existing approaches, including a state-of-the-art algorithm based on local search methods. Keywords: graphical models, message passing, belief propagation, variational methods, maximum a posteriori, marginal-MAP, hidden variable models

4 0.091043726 92 jmlr-2013-Random Spanning Trees and the Prediction of Weighted Graphs

Author: Nicolò Cesa-Bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

Abstract: We investigate the problem of sequentially predicting the binary labels on the nodes of an arbitrary weighted graph. We show that, under a suitable parametrization of the problem, the optimal number of prediction mistakes can be characterized (up to logarithmic factors) by the cutsize of a random spanning tree of the graph. The cutsize is induced by the unknown adversarial labeling of the graph nodes. In deriving our characterization, we obtain a simple randomized algorithm achieving in expectation the optimal mistake bound on any polynomially connected weighted graph. Our algorithm draws a random spanning tree of the original graph and then predicts the nodes of this tree in constant expected amortized time and linear space. Experiments on real-world data sets show that our method compares well to both global (Perceptron) and local (label propagation) methods, while being generally faster in practice. Keywords: online learning, learning on graphs, graph prediction, random spanning trees

5 0.078295283 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut

Author: Jun Wang, Tony Jebara, Shih-Fu Chang

Abstract: Graph-based semi-supervised learning (SSL) methods play an increasingly important role in practical machine learning systems, particularly in agnostic settings when no parametric information or other prior knowledge is available about the data distribution. Given the constructed graph represented by a weight matrix, transductive inference is used to propagate known labels to predict the values of all unlabeled vertices. Designing a robust label diffusion algorithm for such graphs is a widely studied problem and various methods have recently been suggested. Many of these can be formalized as regularized function estimation through the minimization of a quadratic cost. However, most existing label diffusion methods minimize a univariate cost with the classification function as the only variable of interest. Since the observed labels seed the diffusion process, such univariate frameworks are extremely sensitive to the initial label choice and any label noise. To alleviate the dependency on the initial observed labels, this article proposes a bivariate formulation for graph-based SSL, where both the binary label information and a continuous classification function are arguments of the optimization. This bivariate formulation is shown to be equivalent to a linearly constrained Max-Cut problem. Finally an efficient solution via greedy gradient Max-Cut (GGMC) is derived which gradually assigns unlabeled vertices to each class with minimum connectivity. Once convergence guarantees are established, this greedy Max-Cut based SSL is applied on both artificial and standard benchmark data sets where it obtains superior classification accuracy compared to existing state-of-the-art SSL methods. Moreover, GGMC shows robustness with respect to the graph construction method and maintains high accuracy over extensive experiments with various edge linking and weighting schemes. Keywords: graph transduction, semi-supervised learning, bivariate formulation, mixed integer programming, greedy

6 0.071795836 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs

7 0.063811623 111 jmlr-2013-Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows

8 0.057287991 91 jmlr-2013-Query Induction with Schema-Guided Pruning Strategies

9 0.056872658 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference

10 0.056112073 64 jmlr-2013-Lovasz theta function, SVMs and Finding Dense Subgraphs

11 0.049717121 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso

12 0.04900923 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning

13 0.04873706 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning

14 0.045094665 90 jmlr-2013-Quasi-Newton Method: A New Direction

15 0.044874325 84 jmlr-2013-PC Algorithm for Nonparanormal Graphical Models

16 0.040908236 8 jmlr-2013-A Theory of Multiclass Boosting

17 0.039285634 44 jmlr-2013-Finding Optimal Bayesian Networks Using Precedence Constraints

18 0.038796913 103 jmlr-2013-Sparse Robust Estimation and Kalman Smoothing with Nonsmooth Log-Concave Densities: Modeling, Computation, and Theory

19 0.036779575 13 jmlr-2013-Approximating the Permanent with Fractional Belief Propagation

20 0.036463432 116 jmlr-2013-Truncated Power Method for Sparse Eigenvalue Problems


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.218), (1, 0.009), (2, 0.024), (3, 0.047), (4, 0.143), (5, 0.228), (6, 0.076), (7, 0.007), (8, -0.085), (9, 0.033), (10, -0.024), (11, 0.149), (12, 0.069), (13, 0.16), (14, 0.105), (15, 0.001), (16, -0.112), (17, -0.01), (18, 0.097), (19, 0.027), (20, 0.006), (21, 0.054), (22, 0.065), (23, -0.168), (24, 0.02), (25, -0.169), (26, -0.076), (27, 0.083), (28, -0.111), (29, -0.038), (30, 0.056), (31, -0.068), (32, 0.073), (33, -0.006), (34, 0.006), (35, -0.191), (36, -0.008), (37, 0.019), (38, 0.069), (39, -0.036), (40, -0.035), (41, -0.092), (42, 0.027), (43, 0.023), (44, -0.049), (45, 0.136), (46, 0.131), (47, 0.068), (48, -0.029), (49, 0.099)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.93830967 71 jmlr-2013-Message-Passing Algorithms for Quadratic Minimization

Author: Nicholas Ruozzi, Sekhar Tatikonda

Abstract: Gaussian belief propagation (GaBP) is an iterative algorithm for computing the mean (and variances) of a multivariate Gaussian distribution, or equivalently, the minimum of a multivariate positive definite quadratic function. Sufficient conditions, such as walk-summability, that guarantee the convergence and correctness of GaBP are known, but GaBP may fail to converge to the correct solution given an arbitrary positive definite covariance matrix. As was observed by Malioutov et al. (2006), the GaBP algorithm fails to converge if the computation trees produced by the algorithm are not positive definite. In this work, we will show that the failure modes of the GaBP algorithm can be understood via graph covers, and we prove that a parameterized generalization of the min-sum algorithm can be used to ensure that the computation trees remain positive definite whenever the input matrix is positive definite. We demonstrate that the resulting algorithm is closely related to other iterative schemes for quadratic minimization such as the Gauss-Seidel and Jacobi algorithms. Finally, we observe, empirically, that there always exists a choice of parameters such that the above generalization of the GaBP algorithm converges. Keywords: belief propagation, Gaussian graphical models, graph covers

2 0.6823265 92 jmlr-2013-Random Spanning Trees and the Prediction of Weighted Graphs

Author: Nicolò Cesa-Bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella

Abstract: We investigate the problem of sequentially predicting the binary labels on the nodes of an arbitrary weighted graph. We show that, under a suitable parametrization of the problem, the optimal number of prediction mistakes can be characterized (up to logarithmic factors) by the cutsize of a random spanning tree of the graph. The cutsize is induced by the unknown adversarial labeling of the graph nodes. In deriving our characterization, we obtain a simple randomized algorithm achieving in expectation the optimal mistake bound on any polynomially connected weighted graph. Our algorithm draws a random spanning tree of the original graph and then predicts the nodes of this tree in constant expected amortized time and linear space. Experiments on real-world data sets show that our method compares well to both global (Perceptron) and local (label propagation) methods, while being generally faster in practice. Keywords: online learning, learning on graphs, graph prediction, random spanning trees

3 0.59737182 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees

Author: Nima Noorshams, Martin J. Wainwright

Abstract: The sum-product or belief propagation (BP) algorithm is a widely used message-passing technique for computing approximate marginals in graphical models. We introduce a new technique, called stochastic orthogonal series message-passing (SOSMP), for computing the BP fixed point in models with continuous random variables. It is based on a deterministic approximation of the messages via orthogonal series basis expansion, and a stochastic estimation of the basis coefficients via Monte Carlo techniques and damped updates. We prove that the SOSMP iterates converge to a δ-neighborhood of the unique BP fixed point for any tree-structured graph, and for any graphs with cycles in which the BP updates satisfy a contractivity condition. In addition, we demonstrate how to choose the number of basis coefficients as a function of the desired approximation accuracy δ and smoothness of the compatibility functions. We illustrate our theory with both simulated examples and in application to optical flow estimation. Keywords: graphical models, sum-product for continuous state spaces, low-complexity belief propagation, stochastic approximation, Monte Carlo methods, orthogonal basis expansion

4 0.50154305 120 jmlr-2013-Variational Algorithms for Marginal MAP

Author: Qiang Liu, Alexander Ihler

Abstract: The marginal maximum a posteriori probability (MAP) estimation problem, which calculates the mode of the marginal posterior distribution of a subset of variables with the remaining variables marginalized, is an important inference problem in many models, such as those with hidden variables or uncertain parameters. Unfortunately, marginal MAP can be NP-hard even on trees, and has attracted less attention in the literature compared to the joint MAP (maximization) and marginalization problems. We derive a general dual representation for marginal MAP that naturally integrates the marginalization and maximization operations into a joint variational optimization problem, making it possible to easily extend most or all variational-based algorithms to marginal MAP. In particular, we derive a set of “mixed-product” message passing algorithms for marginal MAP, whose form is a hybrid of max-product, sum-product and a novel “argmax-product” message updates. We also derive a class of convergent algorithms based on proximal point methods, including one that transforms the marginal MAP problem into a sequence of standard marginalization problems. Theoretically, we provide guarantees under which our algorithms give globally or locally optimal solutions, and provide novel upper bounds on the optimal objectives. Empirically, we demonstrate that our algorithms significantly outperform the existing approaches, including a state-of-the-art algorithm based on local search methods. Keywords: graphical models, message passing, belief propagation, variational methods, maximum a posteriori, marginal-MAP, hidden variable models

5 0.44624475 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut

Author: Jun Wang, Tony Jebara, Shih-Fu Chang

Abstract: Graph-based semi-supervised learning (SSL) methods play an increasingly important role in practical machine learning systems, particularly in agnostic settings when no parametric information or other prior knowledge is available about the data distribution. Given the constructed graph represented by a weight matrix, transductive inference is used to propagate known labels to predict the values of all unlabeled vertices. Designing a robust label diffusion algorithm for such graphs is a widely studied problem and various methods have recently been suggested. Many of these can be formalized as regularized function estimation through the minimization of a quadratic cost. However, most existing label diffusion methods minimize a univariate cost with the classification function as the only variable of interest. Since the observed labels seed the diffusion process, such univariate frameworks are extremely sensitive to the initial label choice and any label noise. To alleviate the dependency on the initial observed labels, this article proposes a bivariate formulation for graph-based SSL, where both the binary label information and a continuous classification function are arguments of the optimization. This bivariate formulation is shown to be equivalent to a linearly constrained Max-Cut problem. Finally an efficient solution via greedy gradient Max-Cut (GGMC) is derived which gradually assigns unlabeled vertices to each class with minimum connectivity. Once convergence guarantees are established, this greedy Max-Cut based SSL is applied on both artificial and standard benchmark data sets where it obtains superior classification accuracy compared to existing state-of-the-art SSL methods. Moreover, GGMC shows robustness with respect to the graph construction method and maintains high accuracy over extensive experiments with various edge linking and weighting schemes. Keywords: graph transduction, semi-supervised learning, bivariate formulation, mixed integer programming, greedy

6 0.41402239 91 jmlr-2013-Query Induction with Schema-Guided Pruning Strategies

7 0.41131765 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs

8 0.3551268 64 jmlr-2013-Lovasz theta function, SVMs and Finding Dense Subgraphs

9 0.32671446 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning

10 0.32528391 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization

11 0.31576508 101 jmlr-2013-Sparse Activity and Sparse Connectivity in Supervised Learning

12 0.3046076 103 jmlr-2013-Sparse Robust Estimation and Kalman Smoothing with Nonsmooth Log-Concave Densities: Modeling, Computation, and Theory

13 0.29137757 90 jmlr-2013-Quasi-Newton Method: A New Direction

14 0.28180653 13 jmlr-2013-Approximating the Permanent with Fractional Belief Propagation

15 0.26337656 111 jmlr-2013-Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows

16 0.26225168 44 jmlr-2013-Finding Optimal Bayesian Networks Using Precedence Constraints

17 0.23978606 116 jmlr-2013-Truncated Power Method for Sparse Eigenvalue Problems

18 0.23888059 109 jmlr-2013-Stress Functions for Nonlinear Dimension Reduction, Proximity Analysis, and Graph Drawing

19 0.23429178 100 jmlr-2013-Similarity-based Clustering by Left-Stochastic Matrix Factorization

20 0.2319707 47 jmlr-2013-Gaussian Kullback-Leibler Approximate Inference


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.018), (5, 0.099), (6, 0.616), (10, 0.054), (14, 0.01), (23, 0.014), (68, 0.023), (70, 0.015), (75, 0.029), (85, 0.017), (87, 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.945494 71 jmlr-2013-Message-Passing Algorithms for Quadratic Minimization

Author: Nicholas Ruozzi, Sekhar Tatikonda

Abstract: Gaussian belief propagation (GaBP) is an iterative algorithm for computing the mean (and variances) of a multivariate Gaussian distribution, or equivalently, the minimum of a multivariate positive definite quadratic function. Sufficient conditions, such as walk-summability, that guarantee the convergence and correctness of GaBP are known, but GaBP may fail to converge to the correct solution given an arbitrary positive definite covariance matrix. As was observed by Malioutov et al. (2006), the GaBP algorithm fails to converge if the computation trees produced by the algorithm are not positive definite. In this work, we will show that the failure modes of the GaBP algorithm can be understood via graph covers, and we prove that a parameterized generalization of the min-sum algorithm can be used to ensure that the computation trees remain positive definite whenever the input matrix is positive definite. We demonstrate that the resulting algorithm is closely related to other iterative schemes for quadratic minimization such as the Gauss-Seidel and Jacobi algorithms. Finally, we observe, empirically, that there always exists a choice of parameters such that the above generalization of the GaBP algorithm converges. Keywords: belief propagation, Gaussian graphical models, graph covers

2 0.90273255 98 jmlr-2013-Segregating Event Streams and Noise with a Markov Renewal Process Model

Author: Dan Stowell, Mark D. Plumbley

Abstract: We describe an inference task in which a set of timestamped event observations must be clustered into an unknown number of temporal sequences with independent and varying rates of observations. Various existing approaches to multi-object tracking assume a fixed number of sources and/or a fixed observation rate; we develop an approach to inferring structure in timestamped data produced by a mixture of an unknown and varying number of similar Markov renewal processes, plus independent clutter noise. The inference simultaneously distinguishes signal from noise as well as clustering signal observations into separate source streams. We illustrate the technique via synthetic experiments as well as an experiment to track a mixture of singing birds. Source code is available. Keywords: multi-target tracking, clustering, point processes, flow network, sound

3 0.86630154 106 jmlr-2013-Stationary-Sparse Causality Network Learning

Author: Yuejia He, Yiyuan She, Dapeng Wu

Abstract: Recently, researchers have proposed penalized maximum likelihood to identify network topology underlying a dynamical system modeled by multivariate time series. The time series of interest are assumed to be stationary, but this restriction is never taken into consideration by existing estimation methods. Moreover, practical problems of interest may have ultra-high dimensionality and obvious node collinearity. In addition, none of the available algorithms provides a probabilistic measure of the uncertainty for the obtained network topology which is informative in reliable network identification. The main purpose of this paper is to tackle these challenging issues. We propose the S2 learning framework, which stands for stationary-sparse network learning. We propose a novel algorithm referred to as the Berhu iterative sparsity pursuit with stationarity (BISPS), where the Berhu regularization can improve the Lasso in detection and estimation. The algorithm is extremely easy to implement, efficient in computation and has a theoretical guarantee to converge to a global optimum. We also incorporate a screening technique into BISPS to tackle ultra-high dimensional problems and enhance computational efficiency. Furthermore, a stationary bootstrap technique is applied to provide connection occurring frequency for reliable topology learning. Experiments show that our method can achieve stationary and sparse causality network learning and is scalable for high-dimensional problems. Keywords: stationarity, sparsity, Berhu, screening, bootstrap

4 0.52268767 82 jmlr-2013-Optimally Fuzzy Temporal Memory

Author: Karthik H. Shankar, Marc W. Howard

Abstract: Any learner with the ability to predict the future of a structured time-varying signal must maintain a memory of the recent past. If the signal has a characteristic timescale relevant to future prediction, the memory can be a simple shift register—a moving window extending into the past, requiring storage resources that linearly grows with the timescale to be represented. However, an independent general purpose learner cannot a priori know the characteristic prediction-relevant timescale of the signal. Moreover, many naturally occurring signals show scale-free long range correlations implying that the natural prediction-relevant timescale is essentially unbounded. Hence the learner should maintain information from the longest possible timescale allowed by resource availability. Here we construct a fuzzy memory system that optimally sacrifices the temporal accuracy of information in a scale-free fashion in order to represent prediction-relevant information from exponentially long timescales. Using several illustrative examples, we demonstrate the advantage of the fuzzy memory system over a shift register in time series forecasting of natural signals. When the available storage resources are limited, we suggest that a general purpose learner would be better off committing to such a fuzzy memory system. Keywords: temporal information compression, forecasting long range correlated time series

5 0.51243353 120 jmlr-2013-Variational Algorithms for Marginal MAP

Author: Qiang Liu, Alexander Ihler

Abstract: The marginal maximum a posteriori probability (MAP) estimation problem, which calculates the mode of the marginal posterior distribution of a subset of variables with the remaining variables marginalized, is an important inference problem in many models, such as those with hidden variables or uncertain parameters. Unfortunately, marginal MAP can be NP-hard even on trees, and has attracted less attention in the literature compared to the joint MAP (maximization) and marginalization problems. We derive a general dual representation for marginal MAP that naturally integrates the marginalization and maximization operations into a joint variational optimization problem, making it possible to easily extend most or all variational-based algorithms to marginal MAP. In particular, we derive a set of “mixed-product” message passing algorithms for marginal MAP, whose form is a hybrid of max-product, sum-product and a novel “argmax-product” message updates. We also derive a class of convergent algorithms based on proximal point methods, including one that transforms the marginal MAP problem into a sequence of standard marginalization problems. Theoretically, we provide guarantees under which our algorithms give globally or locally optimal solutions, and provide novel upper bounds on the optimal objectives. Empirically, we demonstrate that our algorithms significantly outperform the existing approaches, including a state-of-the-art algorithm based on local search methods. Keywords: graphical models, message passing, belief propagation, variational methods, maximum a posteriori, marginal-MAP, hidden variable models

6 0.47211394 92 jmlr-2013-Random Spanning Trees and the Prediction of Weighted Graphs

7 0.45108324 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees

8 0.43479732 43 jmlr-2013-Fast MCMC Sampling for Markov Jump Processes and Extensions

9 0.43289244 118 jmlr-2013-Using Symmetry and Evolutionary Search to Minimize Sorting Networks

10 0.42411843 74 jmlr-2013-Multivariate Convex Regression with Adaptive Partitioning

11 0.4122591 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut

12 0.40119174 91 jmlr-2013-Query Induction with Schema-Guided Pruning Strategies

13 0.39853314 22 jmlr-2013-Classifying With Confidence From Incomplete Information

14 0.39407998 110 jmlr-2013-Sub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion

15 0.38317144 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning

16 0.38187775 2 jmlr-2013-A Binary-Classification-Based Metric between Time-Series Distributions and Its Use in Statistical and Learning Problems

17 0.37167579 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning

18 0.36883149 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs

19 0.36691701 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization

20 0.36334294 52 jmlr-2013-How to Solve Classification and Regression Problems on High-Dimensional Data with a Supervised Extension of Slow Feature Analysis