jmlr jmlr2011 jmlr2011-34 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Julian J. McAuley, Tibério S. Caetano
Abstract: Maximum A Posteriori inference in graphical models is often solved via message-passing algorithms, such as the junction-tree algorithm or loopy belief-propagation. The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). This is a common case in a wide variety of applications that deal with grid-, tree-, and ring-structured models. In such cases, we are able to decrease the exponent of complexity for message-passing by 0.5 for both exact and approximate inference. We demonstrate that message-passing operations in such models are equivalent to some variant of matrix multiplication in the tropical semiring, for which we offer an O(N^2.5) expected-case solution. Keywords: graphical models, belief-propagation, tropical matrix multiplication
Reference: text
sentIndex sentText sentNum sentScore
1 The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. [sent-10, score-0.508]
2 In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). [sent-11, score-0.559]
3 Keywords: graphical models, belief-propagation, tropical matrix multiplication [sent-19, score-0.22]
4 Efficiency is determined by the size of the maximal cliques after triangulation, a quantity related to the tree-width of the graph. [sent-24, score-0.4]
5 Despite the fact that the new models have larger maximal cliques, the corresponding potentials are still factored over pairs of nodes only. [sent-37, score-0.356]
6 (a) (b) (c) (d) Figure 2: Some graphical models to which our results apply: factors conditioned upon observations have fewer latent variables than purely latent factors. [sent-39, score-0.318]
7 In other words, factors containing a gray node encode the data likelihood, whereas factors containing only white nodes encode priors. [sent-41, score-0.301]
8 Hence approximate solutions in the original graph (such as loopy belief-propagation, or inference in a loopy factor-graph) are often preferred over an exact solution via the junction-tree algorithm. [sent-44, score-0.289]
9 Even when the model’s factors are the same size as its maximal cliques, neither exact nor approximate inference algorithms take advantage of the fact that many factors consist only of latent variables. [sent-45, score-0.351]
10 In many models, those factors that are conditioned upon the observation contain fewer latent variables than the purely latent factors. [sent-46, score-0.28]
11 In this paper, we exploit the fact that the maximal cliques (after triangulation) often have potentials that factor over subcliques, as illustrated in Figure 1. [sent-50, score-0.663]
12 We will show that whenever this is the case, the expected computational complexity of message-passing between such cliques can be improved (both the asymptotic upper-bound and the actual runtime). [sent-51, score-0.358]
13 A core operation encountered in the junction-tree algorithm is that of computing the inner-product of two vectors va and vb. [sent-56, score-0.588]
14 In the max-product semiring (used for MAP inference), the ‘inner-product’ becomes max_{i ∈ {1...N}} va[i] × vb[i]. [sent-57, score-0.335]
15 Our results stem from the realization that while (Equation 1) appears to be a linear time operation, it can be decreased to O(√N) (in the expected case) if we know the permutations that sort va and vb (i.e., their order statistics). [sent-61, score-0.688]
16 • We are able to lower the asymptotic expected running time of max-product message-passing for any discrete graphical model whose cliques factorize into lower-order terms. [sent-70, score-0.551]
17 • Our algorithm also applies whenever factors that are conditioned upon an observation contain fewer latent variables than those factors that are not conditioned upon an observation, as in Figure 2 (in which case certain computations can be taken off-line). [sent-72, score-0.325]
18 For example, in models with third-order cliques containing pairwise terms, message-passing is reduced from Θ(N^3) to O(N^2 √N), as in Figure 1(d). [sent-75, score-0.421]
19 • For cliques composed of K-ary factors, the expected speed-up generalizes to at least Ω((1/K) N^(1/K)), though it is never asymptotically slower than the original solution. [sent-77, score-0.358]
20 We shall initially present our algorithm in terms of pairwise graphical models such as those shown in Figure 2. [sent-87, score-0.22]
21 Finally we shall explore other applications besides message-passing that make use of tropical matrix multiplication as a subroutine, such as all-pairs shortest-path problems. [sent-90, score-0.301]
22 (2009) study the case where different cliques share the same potential function. [sent-94, score-0.358]
23 In Kumar and Torr (2006), the authors provide faster algorithms for the case in which the potentials are truncated, whereas in Petersen et al. [sent-96, score-0.308]
24 Kjærulff (1998) also exploits factorization within cliques of junction-trees, albeit a different type of factorization than that studied here. [sent-105, score-0.358]
25 x̂ = argmax_x ∏_{C∈C} Φ_C(x_C), where C is the set of maximal cliques in G. [sent-118, score-0.4]
26 In each case, the triangulated model has third-order cliques, but the potentials are only pairwise. [sent-124, score-0.327]
27 Φ_C(x_C|y) = ∏_{F⊆C} Φ_F(x_F) × ∏_{Q⊆C} Φ_Q(x_Q|y), where the first product (over F⊆C) collects the data-independent factors and the second (over Q⊆C) the data-dependent ones. We shall say that those factors that are not conditioned on the observation are ‘data-independent’. [sent-132, score-0.225]
28 Our results shall apply to message-passing equations in those cliques C where for each data-independent factor F we have F ⊂ C, or for each data-dependent factor Q we have Q ⊂ C, that is, when all F or all Q in C are proper subsets of C. [sent-133, score-0.477]
29 The message from a clique X to an intersecting clique Y (both sets of latent variables) is defined as [sent-137, score-0.249]
30 m_{X→Y}(x_{X∩Y}) = max_{x_{X\Y}} Φ_X(x_X) ∏_{Z∈Γ(X)\Y} m_{Z→X}(x_{X∩Z}), (4) where Γ(X) is the set of neighbors of the clique X, that is, the set of cliques that intersect with X. [sent-139, score-0.456]
31 After all messages have been passed, the MAP-state for a set of latent variables M (assumed to be a subset of a single clique X) is computed using m_M(x_M) = max_{x_{X\M}} Φ_X(x_X) ∏_{Z∈Γ(X)} m_{Z→X}(x_{X∩Z}). (5) [sent-145, score-0.204]
32 For cliques that are factorizable (according to our previous definition)... [sent-147, score-0.522]
33 Although the (triangulated) models have cliques of size three, their potentials factorize into pairwise terms. [sent-152, score-0.729]
34 First, in Section 3, we shall consider cliques X whose messages take the form m_M(x_M) = max_{x_{X\M}} Φ_X(x_X) ∏_{Q⊂X} Φ_Q(x_Q|y). [sent-158, score-0.53]
35 We say that such cliques are conditionally factorizable (since all conditional terms factorize); examples are shown in Figure 2. [sent-160, score-0.551]
36 Next, in Section 4, we consider cliques whose messages take the form m_M(x_M) = max_{x_{X\M}} ∏_{F⊂X} Φ_F(x_F). [sent-161, score-0.411]
37 We say that such cliques are latently factorizable (since terms containing only latent variables factorize); examples are shown in Figure 1. [sent-163, score-0.648]
38 To obtain an efficient version of Algorithm 1, we begin by considering the simplest nontrivial conditionally factorizable model: a pairwise model in which each latent variable depends upon the observation, that is, x̂(y) = argmax_x ∏_{i∈N} Φ_i(x_i|y) × ∏_{(i,j)∈E} Φ_{i,j}(x_i, x_j). (7) [sent-166, score-0.339]
39 By these definitions, we say that the node potentials are ‘data-dependent’, whereas the edge potentials are ‘data-independent’. [sent-171, score-0.564]
40 In many models, such as grids and rings, (Equation 7) shall be solved approximately by means of either loopy belief-propagation, or inference in a factor-graph, which consists of solving (Equation 8) according to protocols other than the optimal junction-tree protocol. [sent-184, score-0.268]
41 This can be solved efficiently if we know the order statistics of va and vb, that is, if we know the permutations that sort Φ_j and every row of Φ_{i,j} in (Equation 8). [sent-188, score-0.688]
42 The index p maximizing max_{i∈{1...N}} va[i] × vb[i] must have va[p] ≥ va[q] or vb[p] ≥ vb[q] for any q. [sent-196, score-1.429]
43 Therefore, having computed va[q] × vb[q], we can [sent-197, score-0.588]
44 find ‘p’ by computing only those products va[i] × vb[i] where either va[i] > va[q] or vb[i] > vb[q]. [sent-198, score-1.764]
45 Here we iterate through the indices starting from the largest values of va and vb , stopping once both indices are ‘behind’ the maximum value found so far (which we then know is the maximum). [sent-200, score-0.588]
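The traversal just described can be sketched directly. The following Python is our own illustrative reconstruction of the idea behind Algorithm 2, not the paper's listing: the function name, the optional precomputed permutations, and the assumption of non-negative entries (as for unnormalized potentials) are ours.

```python
import numpy as np

def tropical_inner_product(va, vb, pa=None, pb=None):
    """Sketch of the sorted-traversal inner product (cf. Algorithm 2).

    Computes max_i va[i] * vb[i], given permutations pa, pb that sort
    va, vb in decreasing order (computed here when not supplied).
    Assumes non-negative entries.
    """
    pa = np.argsort(-va) if pa is None else pa
    pb = np.argsort(-vb) if pb is None else pb
    best, seen = -np.inf, set()
    for start in range(len(va)):
        # Both frontiers are 'behind' the best product found so far:
        # any unread index i has va[i] <= va[pa[start]] and
        # vb[i] <= vb[pb[start]], so it cannot beat `best`.
        if va[pa[start]] * vb[pb[start]] <= best:
            break
        for i in (pa[start], pb[start]):
            if i not in seen:
                seen.add(i)
                best = max(best, va[i] * vb[i])
    return best
```

Only the entries read before the break are ever touched, which is why the cost depends on the order statistics of va and vb rather than on their actual values.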
46 Note that Lemma 1 only depends upon the relative values of elements in va and vb, meaning that the number of computations that must be performed is purely a function of their order statistics [sent-202, score-0.699]
47 (i.e., it does not depend on the actual values of va or vb). [sent-204, score-0.588]
48 At this stage we shall state an upper-bound on the true complexity in the following theorem: Theorem 2 The expected running time of Algorithm 2 is O(√N), yielding a speed-up of at least Ω(√N) in cliques containing pairwise factors. [sent-207, score-0.65]
49 This expectation is derived under the assumption that va and vb have independent order statistics. [sent-208, score-0.588]
50 The arrows begin at pa[start] and pb[start]; the red dashed line connects end_a and end_b, behind which we need not search; a dashed arrow is used when a new maximum is found. [sent-290, score-0.206]
51 Note that in the event that va and vb contain repeated elements, they can be sorted arbitrarily. [sent-291, score-0.645]
52 1: compute the permutation function pa by sorting Ψ_j {takes Θ(N log N)}; 2: for q ∈ {1 . . . [sent-295, score-0.254]
53 Latently Factorizable Models Just as we considered the simplest conditionally factorizable model in Section 3, we now consider the simplest nontrivial latently factorizable model: a clique of size three containing pairwise factors. [sent-304, score-0.591]
54 For a particular value of (x_i, x_j) = (a, b), we must solve m_{i,j}(a, b) = Φ_{i,j}(a, b) × max_{x_k} Φ_{i,k}(a, x_k) × Φ_{j,k}(b, x_k), (11) where the two terms under the max play the roles of va and vb; this is again precisely of the form shown in (Equation 1). [sent-306, score-0.824]
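Assembling a full message of the form (11) then amounts to one such inner product per entry, with every row of the factor tables sorted once up front. This is our own sketch (the names Phi_ij, Phi_ik, Phi_jk and pairwise_message are hypothetical), reusing tropical_inner_product from above:

```python
def pairwise_message(Phi_ij, Phi_ik, Phi_jk):
    """Sketch of Equation 11 for a third-order clique with pairwise
    factors: m[a, b] = Phi_ij[a, b] * max_k Phi_ik[a, k] * Phi_jk[b, k].
    """
    N = Phi_ij.shape[0]
    # Sort every row once, Theta(N^2 log N); the permutations are then
    # reused across all N^2 tropical inner products.
    Pa = np.argsort(-Phi_ik, axis=1)
    Pb = np.argsort(-Phi_jk, axis=1)
    m = np.empty((N, N))
    for a in range(N):
        for b in range(N):
            m[a, b] = Phi_ij[a, b] * tropical_inner_product(
                Phi_ik[a], Phi_jk[b], Pa[a], Pb[b])
    return m
```

With each inner product running in O(√N) expected time, the whole message costs O(N^2 √N) expected rather than Θ(N^3).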
55 Extensions So far we have only considered the case of pairwise graphical models, though as mentioned our results can in principle be applied to any conditionally or latently factorizable models, no matter the size of the factors. [sent-323, score-0.367]
56 We first treat latently factorizable models, after which the same ideas can be applied to conditionally factorizable models. [sent-326, score-0.43]
57 The same approach can be adapted to solve m_{i,j}(x_i, x_j) = max_{x_k, x_m} Φ_{i,j}(x_i, x_j) × Φ_{i,k,m}(x_i, x_k, x_m) × Φ_{j,k,m}(x_j, x_k, x_m), (12) and similar variants containing three factors. [sent-345, score-0.281]
58 For a fixed value of x_i, we are now sorting an array rather than a vector (Algorithm 4, lines 2 and 3); in this case, the permutation functions pa and pb in Algorithm 2 simply return pairs of indices. [sent-348, score-0.4]
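A minimal sketch of this array-sorting variant for (Equation 12), under the same assumptions and hypothetical names as before: the tables over (x_k, x_m) are flattened so the two-list routine applies unchanged, each sorted position now standing for a pair of indices.

```python
def triplet_message(Phi_ij, Phi_ikm, Phi_jkm):
    """Sketch of Equation 12: for fixed (x_i, x_j) the max runs over
    pairs (x_k, x_m), so each 'list' is a flattened N x N array."""
    N = Phi_ij.shape[0]
    A = Phi_ikm.reshape(N, -1)   # row a: values over all (x_k, x_m)
    B = Phi_jkm.reshape(N, -1)
    Pa = np.argsort(-A, axis=1)  # sorted once per value of x_i
    Pb = np.argsort(-B, axis=1)  # sorted once per value of x_j
    m = np.empty((N, N))
    for a in range(N):
        for b in range(N):
            m[a, b] = Phi_ij[a, b] * tropical_inner_product(
                A[a], B[b], Pa[a], Pb[b])
    return m
```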
59 For example, consider the clique in Figure 6(a), which we shall call G (the entire graph is a clique, but for clarity we only draw an edge when the corresponding nodes belong to a common factor). [sent-356, score-0.303]
60 Each of the factors in this graph has been labeled using either differently colored edges (for factors of size larger than two) or dotted edges (for factors of size two), and the max-marginal we wish to compute has been labeled using colored nodes. [sent-357, score-0.458]
61 In order to simplify the analysis of this algorithm, we shall express the running time in terms of the size of the largest group, S = max(|X|, |Y |, |Z|), and the largest difference, S\ = max(|Y \ X|, |Z \ X|). [sent-363, score-0.229]
62 The running times shown in Algorithm 5 are loose upper-bounds, given for the sake of expressing the running time in simple terms. [sent-365, score-0.22]
63 Again, we shall discuss the running time of this extension in Appendix A. [sent-439, score-0.229]
64 For the moment, we state the following theorem: Theorem 3 Algorithm 6 generalizes Algorithm 2 to K lists with an expected running time of O(K N^((K−1)/K)), yielding a speed-up of at least Ω((1/K) N^(1/K)) in cliques containing K-ary factors. [sent-440, score-0.551]
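A direct K-list generalization of the two-list sketch above, in the spirit of Algorithm 6 (again our own illustrative code; non-negative entries assumed, numpy imported as np):

```python
def tropical_inner_product_k(vs):
    """max_i of vs[0][i] * ... * vs[K-1][i] over K equal-length lists."""
    ps = [np.argsort(-v) for v in vs]
    best, seen = -np.inf, set()
    for start in range(len(vs[0])):
        bound = 1.0
        for v, p in zip(vs, ps):
            bound *= v[p[start]]
        if bound <= best:  # no unread index can beat the frontier bound
            break
        for v, p in zip(vs, ps):
            i = p[start]
            if i not in seen:
                seen.add(i)
                prod = 1.0
                for u in vs:
                    prod *= u[i]
                best = max(best, prod)
    return best
```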
65 Likewise, we can extend Algorithm 3 to factors of any size, so long as the purely latent cliques contain more latent variables than those cliques that depend upon the observation. [sent-485, score-0.996]
66 As we have mentioned, our results apply to any model whose cliques decompose into lower-order terms. [sent-497, score-0.358]
67 In each case, third-order cliques factorize into second-order terms; hence we can apply Algorithm 4 to achieve a speed-up of Ω(√N). [sent-507, score-0.403]
68 Maximal cliques grow with the ... [table residue: ‘Reference — McAuley et al.’] [sent-522, score-0.358]
69 K is the number of lists in (Equation 14); we can observe this number of lists only if we are working in cliques of size K + 1, and then only if the factors are of size K (e. [sent-556, score-0.63]
70 The performance reported is simply the number of elements read from the lists (which is at most K × start). [sent-567, score-0.225]
71 Figure 11 shows how the order statistics of va and vb can affect the performance of our algorithm. [sent-578, score-0.588]
72 (for Algorithm 2), where each (va[i], vb[i]) is an independent sample from a 2-dimensional Gaussian with covariance matrix Σ = [[1, c], [c, 1]]. [sent-583, score-0.253]
73 Each permutation matrix transforms the sorted values of one list into the sorted values of the other, that is, it transforms va as sorted by pa into vb as sorted by pb . [sent-585, score-1.083]
74 3 Message-Passing in Latently Factorizable Models In this section we present experiments in models whose cliques factorize into smaller terms, as discussed in Section 4. [sent-594, score-0.403]
75 (2008), which performs 2-dimensional graph-matching, using a loopy graph with cliques of size three, containing only second-order potentials (as described in Section 6); the Θ(NM^3) performance of McAuley et al. [sent-601, score-0.761]
76 For instance ... [figure residue: panel ‘Random potentials (5 iterations)’, legend: naïve method vs. our method] [sent-643, score-0.263]
77 This amounts to defining the node potentials Φ_i(x_i|y_i), the edge potentials Φ_{i,j}(x_i, x_j), and the topology (N, E) of the graph. [sent-691, score-0.564]
78 Furthermore we assume that the edge potentials are homogeneous, that is, that the potential for each edge is the same, or rather that they have the same order statistics (for example, they may differ by a multiplicative constant). [sent-692, score-0.263]
79 When subject to heterogeneous potentials we need merely sort them off-line. [sent-694, score-0.297]
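As an illustration of this off-line sorting, here is a hedged sketch of max-product on a chain with a single homogeneous N × N edge potential (Equations 7–8). The function and argument names are ours, and it reuses tropical_inner_product from the earlier sketch; only the incoming message needs sorting at run time.

```python
def chain_map_value(node_pot, edge_pot):
    """MAP value of a chain; node_pot is a length-T list of
    data-dependent vectors, edge_pot the shared N x N edge table."""
    N = edge_pot.shape[0]
    col_perm = np.argsort(-edge_pot, axis=0)  # off-line, once per column
    m = np.asarray(node_pot[0], dtype=float)
    for t in range(1, len(node_pot)):
        pm = np.argsort(-m)  # sort the incoming message, Theta(N log N)
        m = np.array([
            tropical_inner_product(m, edge_pot[:, b], pm, col_perm[:, b])
            for b in range(N)]) * node_pot[t]
    return m.max()  # backpointers for the argmax omitted for brevity
```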
80 [page header and plot-axis residue; panel: ‘Random potentials (2500 node chain)’; axes: ‘N (number of states)’, ‘N (alphabet size)’]
81 Figure 17: Running time of inference in chain-structured models: random potentials (left), and text denoising (right). [sent-773, score-0.38]
82 Figure 18 (left) shows the performance of our method on a grid with random potentials (similar to the experiment in Section 7. [sent-778, score-0.263]
83 The pairwise potentials simply encode the Euclidean distance between two flow vectors. [sent-790, score-0.326]
84 Problems of this kind (such as optical flow) are studied in Felzenszwalb and Huttenlocher (2006), where the highly structured nature of the potentials in question often allows for efficient inference. [sent-793, score-0.301]
85 [page header and plot residue; panel: ‘Random potentials (50 × 50 grid, 5 iterations)’; legend: naïve method vs. our method]
86 Figure 18: Running time of inference in grid-structured models: random potentials (left), and optical flow (right). [sent-817, score-0.347]
87 This behavior is observed for certain types of concave potentials (or convex potentials in a min-sum formulation). [sent-820, score-0.526]
88 In these applications, the permutation matrices that transform the sorted values of va to the sorted values of vb are block-off-diagonal (see the sixth permutation in Figure 11). [sent-823, score-0.898]
89 We similarly perform an experiment on image denoising, where the unary potentials are again convex functions of the input (see Geman and Geman, 1984; Lan et al. [sent-828, score-0.263]
90 Instead of using a pairwise potential that merely encodes smoothness, we extract the pairwise statistics from image data (similar to our experiment on text denoising); thus the potentials are no longer concave. [sent-830, score-0.419]
91 Figure 19: Two experiments whose potentials and messages have highly dependent order statistics: stereo disparity (left), and image denoising (right). [sent-844, score-0.449]
92 Namely, we terminate the algorithm if va[pa[start]] × vb[pb[start]] < max. [sent-878, score-0.588]
93 This may be due to significant variation in the variable size (note that N only shows the average variable size), but it may also suggest that there is a complicated structure in the potentials which violates our assumption of independent order statistics. [sent-888, score-0.263]
94 Given the strong relationship between the two problems, it remains a promising open problem to assess whether the analysis from these solutions to all-pairs shortest-path can be applied to produce max-product matrix multiplication algorithms with similar asymptotic running times. [sent-943, score-0.241]
95 L∞ Distances: The problem of computing an inner product in the max-sum semiring is closely related to computing the L∞ distance between two vectors, ||va − vb||_∞ = max_i |va[i] − vb[i]|. [sent-955, score-0.923]
96 This is handled by first considering the largest values of va and −vb, before re-running the algorithm starting from the smallest values. [sent-961, score-0.335]
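A minimal sketch of this reduction, with our own function names; note that the additive frontier bound stays valid for entries of any sign, unlike the multiplicative one:

```python
def max_sum(va, vb):
    """max_i va[i] + vb[i] by the same sorted traversal."""
    pa, pb = np.argsort(-va), np.argsort(-vb)
    best, seen = -np.inf, set()
    for start in range(len(va)):
        if va[pa[start]] + vb[pb[start]] <= best:
            break
        for i in (pa[start], pb[start]):
            if i not in seen:
                seen.add(i)
                best = max(best, va[i] + vb[i])
    return best

def linf_distance(va, vb):
    """||va - vb||_inf via two max-sum passes: one for
    max_i va[i] - vb[i], one for max_i vb[i] - va[i]."""
    return max(max_sum(va, -vb), max_sum(-va, vb))
```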
97 A similar trick can be applied to compute messages in the max-product semiring even for potentials that contain negative values, though this may require up to four executions of Algorithm 2, so it is unlikely to be practical. [sent-969, score-0.345]
98 In factors with a high dynamic range, or when different factors have different scales, it may be possible to identify the maximum value very quickly, as we attempted to do in Section 7. [sent-1015, score-0.212]
99 Asymptotic Performance of Algorithm 2 and Extensions In this section we shall determine the expected-case running times of Algorithm 2 and Algorithm 6. [sent-1028, score-0.229]
100 Algorithm 2 traverses va and vb until it reaches the smallest value of m for which there is some j ≤ m with m ≥ p_b^(−1)[pa[j]]. [sent-1029, score-0.588]
wordName wordTfidf (topN-words)
[('cliques', 0.358), ('va', 0.335), ('potentials', 0.263), ('vb', 0.253), ('aetano', 0.183), ('auley', 0.183), ('roduct', 0.183), ('essage', 0.174), ('factorizable', 0.164), ('read', 0.142), ('mcauley', 0.137), ('na', 0.134), ('multiplication', 0.131), ('shall', 0.119), ('running', 0.11), ('mn', 0.107), ('factors', 0.106), ('loopy', 0.105), ('pb', 0.102), ('permutation', 0.098), ('clique', 0.098), ('karger', 0.091), ('xx', 0.091), ('sorting', 0.089), ('ax', 0.085), ('dom', 0.084), ('lists', 0.083), ('semiring', 0.082), ('wall', 0.08), ('sontag', 0.077), ('deformable', 0.073), ('latently', 0.073), ('lgorithms', 0.073), ('pa', 0.067), ('ve', 0.066), ('permutations', 0.066), ('subcubic', 0.064), ('triangulated', 0.064), ('pairwise', 0.063), ('stereo', 0.062), ('xk', 0.059), ('sorted', 0.057), ('felzenszwalb', 0.056), ('tib', 0.055), ('julian', 0.054), ('messages', 0.053), ('latent', 0.053), ('xm', 0.052), ('ow', 0.051), ('tropical', 0.051), ('nodes', 0.051), ('xe', 0.048), ('loop', 0.047), ('aho', 0.047), ('rio', 0.047), ('coughlan', 0.046), ('sigal', 0.046), ('xy', 0.045), ('faster', 0.045), ('factorize', 0.045), ('array', 0.044), ('tted', 0.044), ('inference', 0.044), ('meaning', 0.043), ('denoising', 0.043), ('caetano', 0.042), ('maximal', 0.042), ('mm', 0.04), ('optical', 0.04), ('xz', 0.039), ('xf', 0.039), ('ine', 0.039), ('colored', 0.038), ('start', 0.038), ('purely', 0.038), ('node', 0.038), ('graphical', 0.038), ('aji', 0.037), ('endb', 0.037), ('ferreira', 0.037), ('galley', 0.037), ('equation', 0.036), ('pk', 0.035), ('vk', 0.035), ('marginalization', 0.035), ('graph', 0.035), ('seconds', 0.035), ('huttenlocher', 0.035), ('takes', 0.035), ('groups', 0.035), ('sort', 0.034), ('alon', 0.031), ('fitted', 0.031), ('nicta', 0.031), ('text', 0.03), ('upon', 0.03), ('wish', 0.029), ('conditionally', 0.029), ('xc', 0.029), ('disparity', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999911 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing
Author: Julian J. McAuley, Tibério S. Caetano
Abstract: Maximum A Posteriori inference in graphical models is often solved via message-passing algorithms, such as the junction-tree algorithm or loopy belief-propagation. The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). This is a common case in a wide variety of applications that deal with grid-, tree-, and ring-structured models. In such cases, we are able to decrease the exponent of complexity for message-passing by 0.5 for both exact and approximate inference. We demonstrate that message-passing operations in such models are equivalent to some variant of matrix multiplication in the tropical semiring, for which we offer an O(N^2.5) expected-case solution. Keywords: graphical models, belief-propagation, tropical matrix multiplication
2 0.14494702 41 jmlr-2011-Improved Moves for Truncated Convex Models
Author: M. Pawan Kumar, Olga Veksler, Philip H.S. Torr
Abstract: We consider the problem of obtaining an approximate maximum a posteriori estimate of a discrete random field characterized by pairwise potentials that form a truncated convex model. For this problem, we propose two st-MINCUT based move making algorithms that we call Range Swap and Range Expansion. Our algorithms can be thought of as extensions of αβ-Swap and α-Expansion respectively that fully exploit the form of the pairwise potentials. Specifically, instead of dealing with one or two labels at each iteration, our methods explore a large search space by considering a range of labels (that is, an interval of consecutive labels). Furthermore, we show that Range Expansion provides the same multiplicative bounds as the standard linear programming (LP) relaxation in polynomial time. Compared to previous approaches based on the LP relaxation, for example interior-point algorithms or tree-reweighted message passing (TRW), our methods are faster as they use only the efficient st-MINCUT algorithm in their design. We demonstrate the usefulness of the proposed approaches on both synthetic and standard real data problems. Keywords: truncated convex models, move making algorithms, range moves, multiplicative bounds, linear programming relaxation
3 0.072354361 7 jmlr-2011-Adaptive Exact Inference in Graphical Models
Author: Özgür Sümer, Umut A. Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Many algorithms and applications involve repeatedly solving variations of the same inference problem, for example to introduce new evidence to the model or to change conditional dependencies. As the model is updated, the goal of adaptive inference is to take advantage of previously computed quantities to perform inference more rapidly than from scratch. In this paper, we present algorithms for adaptive exact inference on general graphs that can be used to efficiently compute marginals and update MAP configurations under arbitrary changes to the input factor graph and its associated elimination tree. After a linear time preprocessing step, our approach enables updates to the model and the computation of any marginal in time that is logarithmic in the size of the input model. Moreover, in contrast to max-product our approach can also be used to update MAP configurations in time that is roughly proportional to the number of updated entries, rather than the size of the input model. To evaluate the practical effectiveness of our algorithms, we implement and test them using synthetic data as well as for two real-world computational biology applications. Our experiments show that adaptive inference can achieve substantial speedups over performing complete inference as the model undergoes small changes over time. Keywords: exact inference, factor graphs, factor elimination, marginalization, dynamic programming, MAP computation, model updates, parallel tree contraction
4 0.069298014 61 jmlr-2011-Logistic Stick-Breaking Process
Author: Lu Ren, Lan Du, Lawrence Carin, David Dunson
Abstract: A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation
5 0.055166766 103 jmlr-2011-Weisfeiler-Lehman Graph Kernels
Author: Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, Karsten M. Borgwardt
Abstract: In this article, we propose a family of efficient kernels for large graphs with discrete node labels. Key to our method is a rapid feature extraction scheme based on the Weisfeiler-Lehman test of isomorphism on graphs. It maps the original graph to a sequence of graphs, whose node attributes capture topological and label information. A family of kernels can be defined based on this Weisfeiler-Lehman sequence of graphs, including a highly efficient kernel comparing subtree-like patterns. Its runtime scales only linearly in the number of edges of the graphs and the length of the Weisfeiler-Lehman graph sequence. In our experimental evaluation, our kernels outperform state-of-the-art graph kernels on several graph classification benchmark data sets in terms of accuracy and runtime. Our kernels open the door to large-scale applications of graph kernels in various disciplines such as computational biology and social network analysis. Keywords: graph kernels, graph classification, similarity measures for graphs, Weisfeiler-Lehman algorithm
6 0.053476024 54 jmlr-2011-Learning Latent Tree Graphical Models
7 0.052506309 12 jmlr-2011-Bayesian Co-Training
9 0.051169764 94 jmlr-2011-Theoretical Analysis of Bayesian Matrix Factorization
10 0.045623854 85 jmlr-2011-Smoothness, Disagreement Coefficient, and the Label Complexity of Agnostic Active Learning
11 0.045461643 82 jmlr-2011-Robust Gaussian Process Regression with a Student-tLikelihood
12 0.044859927 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
13 0.041656815 20 jmlr-2011-Convex and Network Flow Optimization for Structured Sparsity
14 0.040901881 75 jmlr-2011-Parallel Algorithm for Learning Optimal Bayesian Network Structure
15 0.037747644 55 jmlr-2011-Learning Multi-modal Similarity
16 0.036855329 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review
17 0.035050552 49 jmlr-2011-Kernel Regression in the Presence of Correlated Errors
18 0.034553315 79 jmlr-2011-Proximal Methods for Hierarchical Sparse Coding
19 0.03355867 35 jmlr-2011-Forest Density Estimation
20 0.032031573 11 jmlr-2011-Approximate Marginals in Latent Gaussian Models
topicId topicWeight
[(0, 0.184), (1, -0.076), (2, -0.037), (3, 0.014), (4, 0.014), (5, 0.015), (6, 0.003), (7, 0.056), (8, 0.097), (9, 0.069), (10, -0.177), (11, 0.15), (12, 0.268), (13, -0.079), (14, 0.245), (15, -0.142), (16, 0.062), (17, 0.113), (18, -0.226), (19, 0.063), (20, 0.068), (21, -0.091), (22, 0.027), (23, 0.132), (24, 0.088), (25, 0.044), (26, 0.152), (27, 0.05), (28, 0.02), (29, 0.054), (30, -0.093), (31, -0.001), (32, 0.111), (33, -0.016), (34, 0.064), (35, 0.069), (36, -0.006), (37, 0.068), (38, -0.044), (39, 0.011), (40, 0.003), (41, 0.03), (42, -0.084), (43, -0.038), (44, 0.085), (45, 0.004), (46, 0.043), (47, -0.046), (48, 0.076), (49, -0.05)]
simIndex simValue paperId paperTitle
same-paper 1 0.93584394 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing
Author: Julian J. McAuley, Tibério S. Caetano
Abstract: Maximum A Posteriori inference in graphical models is often solved via message-passing algorithms, such as the junction-tree algorithm or loopy belief-propagation. The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). This is a common case in a wide variety of applications that deal with grid-, tree-, and ring-structured models. In such cases, we are able to decrease the exponent of complexity for message-passing by 0.5 for both exact and approximate inference. We demonstrate that message-passing operations in such models are equivalent to some variant of matrix multiplication in the tropical semiring, for which we offer an O(N^2.5) expected-case solution. Keywords: graphical models, belief-propagation, tropical matrix multiplication
2 0.67801344 41 jmlr-2011-Improved Moves for Truncated Convex Models
Author: M. Pawan Kumar, Olga Veksler, Philip H.S. Torr
Abstract: We consider the problem of obtaining an approximate maximum a posteriori estimate of a discrete random field characterized by pairwise potentials that form a truncated convex model. For this problem, we propose two st-MINCUT based move making algorithms that we call Range Swap and Range Expansion. Our algorithms can be thought of as extensions of αβ-Swap and α-Expansion respectively that fully exploit the form of the pairwise potentials. Specifically, instead of dealing with one or two labels at each iteration, our methods explore a large search space by considering a range of labels (that is, an interval of consecutive labels). Furthermore, we show that Range Expansion provides the same multiplicative bounds as the standard linear programming (LP) relaxation in polynomial time. Compared to previous approaches based on the LP relaxation, for example interior-point algorithms or tree-reweighted message passing (TRW), our methods are faster as they use only the efficient st-MINCUT algorithm in their design. We demonstrate the usefulness of the proposed approaches on both synthetic and standard real data problems. Keywords: truncated convex models, move making algorithms, range moves, multiplicative bounds, linear programming relaxation
3 0.44732824 94 jmlr-2011-Theoretical Analysis of Bayesian Matrix Factorization
Author: Shinichi Nakajima, Masashi Sugiyama
Abstract: Recently, variational Bayesian (VB) techniques have been applied to probabilistic matrix factorization and shown to perform very well in experiments. In this paper, we theoretically elucidate properties of the VB matrix factorization (VBMF) method. Through finite-sample analysis of the VBMF estimator, we show that two types of shrinkage factors exist in the VBMF estimator: the positive-part James-Stein (PJS) shrinkage and the trace-norm shrinkage, both acting on each singular component separately for producing low-rank solutions. The trace-norm shrinkage is simply induced by non-flat prior information, similarly to the maximum a posteriori (MAP) approach. Thus, no trace-norm shrinkage remains when priors are non-informative. On the other hand, we show a counter-intuitive fact that the PJS shrinkage factor is kept activated even with flat priors. This is shown to be induced by the non-identifiability of the matrix factorization model, that is, the mapping between the target matrix and factorized matrices is not one-to-one. We call this model-induced regularization. We further extend our analysis to empirical Bayes scenarios where hyperparameters are also learned based on the VB free energy. Throughout the paper, we assume no missing entry in the observed matrix, and therefore collaborative filtering is out of scope. Keywords: matrix factorization, variational Bayes, empirical Bayes, positive-part James-Stein shrinkage, non-identifiable model, model-induced regularization
4 0.40583947 61 jmlr-2011-Logistic Stick-Breaking Process
Author: Lu Ren, Lan Du, Lawrence Carin, David Dunson
Abstract: A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation
5 0.39646801 7 jmlr-2011-Adaptive Exact Inference in Graphical Models
Author: Özgür Sümer, Umut A. Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Many algorithms and applications involve repeatedly solving variations of the same inference problem, for example to introduce new evidence to the model or to change conditional dependencies. As the model is updated, the goal of adaptive inference is to take advantage of previously computed quantities to perform inference more rapidly than from scratch. In this paper, we present algorithms for adaptive exact inference on general graphs that can be used to efficiently compute marginals and update MAP configurations under arbitrary changes to the input factor graph and its associated elimination tree. After a linear time preprocessing step, our approach enables updates to the model and the computation of any marginal in time that is logarithmic in the size of the input model. Moreover, in contrast to max-product our approach can also be used to update MAP configurations in time that is roughly proportional to the number of updated entries, rather than the size of the input model. To evaluate the practical effectiveness of our algorithms, we implement and test them using synthetic data as well as for two real-world computational biology applications. Our experiments show that adaptive inference can achieve substantial speedups over performing complete inference as the model undergoes small changes over time. Keywords: exact inference, factor graphs, factor elimination, marginalization, dynamic programming, MAP computation, model updates, parallel tree contraction
6 0.29290763 103 jmlr-2011-Weisfeiler-Lehman Graph Kernels
7 0.28681135 12 jmlr-2011-Bayesian Co-Training
8 0.2637057 54 jmlr-2011-Learning Latent Tree Graphical Models
9 0.24639656 49 jmlr-2011-Kernel Regression in the Presence of Correlated Errors
10 0.24420524 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
11 0.24368177 75 jmlr-2011-Parallel Algorithm for Learning Optimal Bayesian Network Structure
13 0.21490106 20 jmlr-2011-Convex and Network Flow Optimization for Structured Sparsity
14 0.21126163 85 jmlr-2011-Smoothness, Disagreement Coefficient, and the Label Complexity of Agnostic Active Learning
15 0.20542464 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review
16 0.20504248 71 jmlr-2011-On Equivalence Relationships Between Classification and Ranking Algorithms
17 0.20197177 104 jmlr-2011-X-Armed Bandits
18 0.20152451 76 jmlr-2011-Parameter Screening and Optimisation for ILP using Designed Experiments
19 0.18764409 84 jmlr-2011-Semi-Supervised Learning with Measure Propagation
20 0.1874246 88 jmlr-2011-Structured Variable Selection with Sparsity-Inducing Norms
topicId topicWeight
[(4, 0.042), (9, 0.026), (10, 0.587), (24, 0.033), (31, 0.052), (32, 0.02), (41, 0.015), (60, 0.017), (71, 0.016), (73, 0.029), (78, 0.046), (90, 0.028)]
simIndex simValue paperId paperTitle
1 0.98038697 75 jmlr-2011-Parallel Algorithm for Learning Optimal Bayesian Network Structure
Author: Yoshinori Tamada, Seiya Imoto, Satoru Miyano
Abstract: We present a parallel algorithm for the score-based optimal structure search of Bayesian networks. This algorithm is based on a dynamic programming (DP) algorithm having O(n · 2^n) time and space complexity, which is known to be the fastest algorithm for the optimal structure search of networks with n nodes. The bottleneck of the problem is the memory requirement, and therefore, the algorithm is currently applicable for up to a few tens of nodes. While the recently proposed algorithm overcomes this limitation by a space-time trade-off, our proposed algorithm realizes direct parallelization of the original DP algorithm with O(n^σ) time and space overhead calculations, where σ > 0 controls the communication-space trade-off. The overall time and space complexity is O(n^(σ+1) 2^n). This algorithm splits the search space so that the required communication between independent calculations is minimal. Because of this advantage, our algorithm can run on distributed memory supercomputers. Through computational experiments, we confirmed that our algorithm can run in parallel using up to 256 processors with a parallelization efficiency of 0.74, compared to the original DP algorithm with a single processor. We also demonstrate optimal structure search for a 32-node network without any constraints, which is the largest network search presented in literature. Keywords: optimal Bayesian network structure, parallel algorithm
same-paper 2 0.94006383 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing
Author: Julian J. McAuley, Tibério S. Caetano
Abstract: Maximum A Posteriori inference in graphical models is often solved via message-passing algorithms, such as the junction-tree algorithm or loopy belief-propagation. The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). This is a common case in a wide variety of applications that deal with grid-, tree-, and ring-structured models. In such cases, we are able to decrease the exponent of complexity for message-passing by 0.5 for both exact and approximate inference. We demonstrate that message-passing operations in such models are equivalent to some variant of matrix multiplication in the tropical semiring, for which we offer an O(N^2.5) expected-case solution. Keywords: graphical models, belief-propagation, tropical matrix multiplication
3 0.49811628 7 jmlr-2011-Adaptive Exact Inference in Graphical Models
Author: Özgür Sümer, Umut A. Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Many algorithms and applications involve repeatedly solving variations of the same inference problem, for example to introduce new evidence to the model or to change conditional dependencies. As the model is updated, the goal of adaptive inference is to take advantage of previously computed quantities to perform inference more rapidly than from scratch. In this paper, we present algorithms for adaptive exact inference on general graphs that can be used to efficiently compute marginals and update MAP configurations under arbitrary changes to the input factor graph and its associated elimination tree. After a linear time preprocessing step, our approach enables updates to the model and the computation of any marginal in time that is logarithmic in the size of the input model. Moreover, in contrast to max-product our approach can also be used to update MAP configurations in time that is roughly proportional to the number of updated entries, rather than the size of the input model. To evaluate the practical effectiveness of our algorithms, we implement and test them using synthetic data as well as for two real-world computational biology applications. Our experiments show that adaptive inference can achieve substantial speedups over performing complete inference as the model undergoes small changes over time. Keywords: exact inference, factor graphs, factor elimination, marginalization, dynamic programming, MAP computation, model updates, parallel tree contraction
4 0.49511245 30 jmlr-2011-Efficient Structure Learning of Bayesian Networks using Constraints
Author: Cassio P. de Campos, Qiang Ji
Abstract: This paper addresses the problem of learning Bayesian network structures from data based on score functions that are decomposable. It describes properties that strongly reduce the time and memory costs of many known methods without losing global optimality guarantees. These properties are derived for different score criteria such as Minimum Description Length (or Bayesian Information Criterion), Akaike Information Criterion and Bayesian Dirichlet Criterion. Then a branch-andbound algorithm is presented that integrates structural constraints with data in a way to guarantee global optimality. As an example, structural constraints are used to map the problem of structure learning in Dynamic Bayesian networks into a corresponding augmented Bayesian network. Finally, we show empirically the benefits of using the properties with state-of-the-art methods and with the new algorithm, which is able to handle larger data sets than before. Keywords: Bayesian networks, structure learning, properties of decomposable scores, structural constraints, branch-and-bound technique
5 0.40764236 16 jmlr-2011-Clustering Algorithms for Chains
Author: Antti Ukkonen
Abstract: We consider the problem of clustering a set of chains to k clusters. A chain is a totally ordered subset of a finite set of items. Chains are an intuitive way to express preferences over a set of alternatives, as well as a useful representation of ratings in situations where the item-specific scores are either difficult to obtain, too noisy due to measurement error, or simply not as relevant as the order that they induce over the items. First we adapt the classical k-means for chains by proposing a suitable distance function and a centroid structure. We also present two different approaches for mapping chains to a vector space. The first one is related to the planted partition model, while the second one has an intuitive geometrical interpretation. Finally we discuss a randomization test for assessing the significance of a clustering. To this end we present an MCMC algorithm for sampling random sets of chains that share certain properties with the original data. The methods are studied in a series of experiments using real and artificial data. Results indicate that the methods produce interesting clusterings, and for certain types of inputs improve upon previous work on clustering algorithms for orders. Keywords: Lloyd’s algorithm, orders, preference statements, planted partition model, randomization testing
6 0.38851216 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
7 0.38661271 27 jmlr-2011-Domain Decomposition Approach for Fast Gaussian Process Regression of Large Spatial Data Sets
8 0.38028088 59 jmlr-2011-Learning with Structured Sparsity
9 0.37915057 61 jmlr-2011-Logistic Stick-Breaking Process
10 0.36725602 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms
11 0.36669222 25 jmlr-2011-Discriminative Learning of Bayesian Networks via Factorized Conditional Log-Likelihood
12 0.36266088 47 jmlr-2011-Inverse Reinforcement Learning in Partially Observable Environments
13 0.36254823 76 jmlr-2011-Parameter Screening and Optimisation for ILP using Designed Experiments
14 0.35746863 12 jmlr-2011-Bayesian Co-Training
15 0.35724047 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models
16 0.35386023 48 jmlr-2011-Kernel Analysis of Deep Networks
17 0.35326281 19 jmlr-2011-Convergence of Distributed Asynchronous Learning Vector Quantization Algorithms
18 0.35031343 89 jmlr-2011-Super-Linear Convergence of Dual Augmented Lagrangian Algorithm for Sparsity Regularized Estimation
20 0.34387711 20 jmlr-2011-Convex and Network Flow Optimization for Structured Sparsity