jmlr jmlr2011 jmlr2011-34 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Julian J. McAuley, Tibério S. Caetano
Abstract: Maximum A Posteriori inference in graphical models is often solved via message-passing algorithms, such as the junction-tree algorithm or loopy belief-propagation. The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). This is a common case in a wide variety of applications that deal with grid-, tree-, and ring-structured models. In such cases, we are able to decrease the exponent of complexity for message-passing by 0.5 for both exact and approximate inference. We demonstrate that message-passing operations in such models are equivalent to some variant of matrix multiplication in the tropical semiring, for which we offer an O(N^2.5) expected-case solution. Keywords: graphical models, belief-propagation, tropical matrix multiplication
Reference: text
sentIndex sentText sentNum sentScore
1 The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. [sent-10, score-0.508]
2 In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). [sent-11, score-0.559]
3 Keywords: graphical models, belief-propagation, tropical matrix multiplication [sent-19, score-0.22]
4 Efficiency is determined by the size of the maximal cliques after triangulation, a quantity related to the tree-width of the graph. [sent-24, score-0.4]
5 Despite the fact that the new models have larger maximal cliques, the corresponding potentials are still factored over pairs of nodes only. [sent-37, score-0.356]
6 (a) (b) (c) (d) Figure 2: Some graphical models to which our results apply: factors conditioned upon observations have fewer latent variables than purely latent factors. [sent-39, score-0.318]
7 In other words, factors containing a gray node encode the data likelihood, whereas factors containing only white nodes encode priors. [sent-41, score-0.301]
8 Hence approximate solutions in the original graph (such as loopy belief-propagation, or inference in a loopy factor-graph) are often preferred over an exact solution via the junction-tree algorithm. [sent-44, score-0.289]
9 Even when the model’s factors are the same size as its maximal cliques, neither exact nor approximate inference algorithms take advantage of the fact that many factors consist only of latent variables. [sent-45, score-0.351]
10 In many models, those factors that are conditioned upon the observation contain fewer latent variables than the purely latent factors. [sent-46, score-0.28]
11 In this paper, we exploit the fact that the maximal cliques (after triangulation) often have potentials that factor over subcliques, as illustrated in Figure 1. [sent-50, score-0.663]
12 We will show that whenever this is the case, the expected computational complexity of message-passing between such cliques can be improved (both the asymptotic upper-bound and the actual runtime). [sent-51, score-0.358]
13 A core operation encountered in the junction-tree algorithm is that of computing the inner-product of two vectors va and vb. [sent-56, score-0.588]
14 In the max-product semiring (used for MAP inference), the ‘inner-product’ becomes max_{i ∈ {1...N}} va[i] × vb[i]. [sent-57, score-0.335]
15 Our results stem from the realization that while (Equation 1) appears to be a linear time operation, it can be decreased to O(√N) (in the expected case) if we know the permutations that sort va and vb (i.e., their order statistics). [sent-61, score-0.688]
16 • We are able to lower the asymptotic expected running time of max-product message-passing for any discrete graphical model whose cliques factorize into lower-order terms. [sent-70, score-0.551]
17 • Our algorithm also applies whenever factors that are conditioned upon an observation contain fewer latent variables than those factors that are not conditioned upon an observation, as in Figure 2 (in which case certain computations can be taken off-line). [sent-72, score-0.325]
18 For example, in models with third-order cliques containing pairwise terms, message-passing is reduced from Θ(N^3) to O(N^2 √N), as in Figure 1(d). [sent-75, score-0.421]
19 • For cliques composed of K-ary factors, the expected speed-up generalizes to at least Ω((1/K) N^(1/K)), though it is never asymptotically slower than the original solution. [sent-77, score-0.358]
20 We shall initially present our algorithm in terms of pairwise graphical models such as those shown in Figure 2. [sent-87, score-0.22]
21 Finally we shall explore other applications besides message-passing that make use of tropical matrix multiplication as a subroutine, such as all-pairs shortest-path problems. [sent-90, score-0.301]
22 (2009) study the case where different cliques share the same potential function. [sent-94, score-0.358]
23 In Kumar and Torr (2006), the authors provide faster algorithms for the case in which the potentials are truncated, whereas in Petersen et al. [sent-96, score-0.308]
24 Kjærulff (1998) also exploits factorization within cliques of junction-trees, albeit a different type of factorization than that studied here. [sent-105, score-0.358]
25 x̂ = argmax_x ∏_{C∈C} Φ_C(x_C), where C is the set of maximal cliques in G. [sent-118, score-0.4]
26 In each case, the triangulated model has third-order cliques, but the potentials are only pairwise. [sent-124, score-0.327]
27 Φ_C(x_C|y) = ∏_{F⊆C} Φ_F(x_F) × ∏_{Q⊆C} Φ_Q(x_Q|y), where the first product (over F⊆C) collects the data-independent factors and the second (over Q⊆C) the data-dependent ones. We shall say that those factors that are not conditioned on the observation are ‘data-independent’. [sent-132, score-0.225]
28 Our results shall apply to message-passing equations in those cliques C where for each data-independent factor F we have F ⊂ C, or for each data-dependent factor Q we have Q ⊂ C, that is, when all F or all Q in C are proper subsets of C. [sent-133, score-0.477]
29 The message from a clique X to an intersecting clique Y (both sets of latent variables) is defined as [sent-137, score-0.249]
30 m_{X→Y}(x_{X∩Y}) = max_{x_{X\Y}} Φ_X(x_X) ∏_{Z∈Γ(X)\Y} m_{Z→X}(x_{X∩Z}), (4) where Γ(X) is the set of neighbors of the clique X, that is, the set of cliques that intersect with X. [sent-139, score-0.456]
31 After all messages have been passed, the MAP-state for a set of latent variables M (assumed to be a subset of a single clique X) is computed using m_M(x_M) = max_{x_{X\M}} Φ_X(x_X) ∏_{Z∈Γ(X)} m_{Z→X}(x_{X∩Z}). (5) [sent-145, score-0.204]
32 For cliques that are factorizable (according to our previous definition)... [sent-147, score-0.522]
33 Although the (triangulated) models have cliques of size three, their potentials factorize into pairwise terms. [sent-152, score-0.729]
34 First, in Section 3, we shall consider cliques X whose messages take the form m_M(x_M) = max_{x_{X\M}} Φ_X(x_X) ∏_{Q⊂X} Φ_Q(x_Q|y). [sent-158, score-0.53]
35 We say that such cliques are conditionally factorizable (since all conditional terms factorize); examples are shown in Figure 2. [sent-160, score-0.551]
36 Next, in Section 4, we consider cliques whose messages take the form m_M(x_M) = max_{x_{X\M}} ∏_{F⊂X} Φ_F(x_F). [sent-161, score-0.411]
37 We say that such cliques are latently factorizable (since terms containing only latent variables factorize); examples are shown in Figure 1. [sent-163, score-0.648]
38 To obtain an efficient version of Algorithm 1, we begin by considering the simplest nontrivial conditionally factorizable model: a pairwise model in which each latent variable depends upon the observation, that is, x̂(y) = argmax_x ∏_{i∈N} Φ_i(x_i|y) × ∏_{(i,j)∈E} Φ_{i,j}(x_i, x_j). (7) [sent-166, score-0.339]
39 By these definitions, we say that the node potentials are ‘data-dependent’, whereas the edge potentials are ‘data-independent’. [sent-171, score-0.564]
40 In many models, such as grids and rings, (Equation 7) shall be solved approximately by means of either loopy belief-propagation, or inference in a factor-graph, which consists of solving (Equation 8) according to protocols other than the optimal junction-tree protocol. [sent-184, score-0.268]
41 This can be solved efficiently if we know the order statistics of va and vb, that is, if we know the permutations that sort Φ_j and every row of Φ_{i,j} in (Equation 8). [sent-188, score-0.688]
42 The index p maximizing max_{i∈{1...N}} va[i] × vb[i] must have va[p] ≥ va[q] or vb[p] ≥ vb[q] for any q. [sent-196, score-1.429]
43 Therefore, having computed va[q] × vb[q], we can [sent-197, score-0.588]
44 find ‘p’ by computing only those products va[i] × vb[i] where either va[i] > va[q] or vb[i] > vb[q]. [sent-198, score-1.764]
45 Here we iterate through the indices starting from the largest values of va and vb , stopping once both indices are ‘behind’ the maximum value found so far (which we then know is the maximum). [sent-200, score-0.588]
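The traversal just described can be sketched directly. The following Python is our own illustrative reconstruction of the idea behind Algorithm 2, not the paper's listing: the function name, the optional precomputed permutations, and the assumption of non-negative entries (as for unnormalized potentials) are ours.

```python
import numpy as np

def tropical_inner_product(va, vb, pa=None, pb=None):
    """Sketch of the sorted-traversal inner product (cf. Algorithm 2).

    Computes max_i va[i] * vb[i], given permutations pa, pb that sort
    va, vb in decreasing order (computed here when not supplied).
    Assumes non-negative entries.
    """
    pa = np.argsort(-va) if pa is None else pa
    pb = np.argsort(-vb) if pb is None else pb
    best, seen = -np.inf, set()
    for start in range(len(va)):
        # Both frontiers are 'behind' the best product found so far:
        # any unread index i has va[i] <= va[pa[start]] and
        # vb[i] <= vb[pb[start]], so it cannot beat `best`.
        if va[pa[start]] * vb[pb[start]] <= best:
            break
        for i in (pa[start], pb[start]):
            if i not in seen:
                seen.add(i)
                best = max(best, va[i] * vb[i])
    return best
```

Only the entries read before the break are ever touched, which is why the cost depends on the order statistics of va and vb rather than on their actual values.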
46 Note that Lemma 1 only depends upon the relative values of elements in va and vb, meaning that the number of computations that must be performed is purely a function of their order statistics [sent-202, score-0.699]
47 (i.e., it does not depend on the actual values of va or vb). [sent-204, score-0.588]
48 At this stage we shall state an upper-bound on the true complexity in the following theorem: Theorem 2 The expected running time of Algorithm 2 is O(√N), yielding a speed-up of at least Ω(√N) in cliques containing pairwise factors. [sent-207, score-0.65]
49 This expectation is derived under the assumption that va and vb have independent order statistics. [sent-208, score-0.588]
50 The arrows begin at pa[start] and pb[start]; the red dashed line connects end_a and end_b, behind which we need not search; a dashed arrow is used when a new maximum is found. [sent-290, score-0.206]
51 Note that in the event that va and vb contain repeated elements, they can be sorted arbitrarily. [sent-291, score-0.645]
52 1: compute the permutation function pa by sorting Ψ_j {takes Θ(N log N)}; 2: for q ∈ {1 . . . [sent-295, score-0.254]
53 Latently Factorizable Models Just as we considered the simplest conditionally factorizable model in Section 3, we now consider the simplest nontrivial latently factorizable model: a clique of size three containing pairwise factors. [sent-304, score-0.591]
54 For a particular value of (x_i, x_j) = (a, b), we must solve m_{i,j}(a, b) = Φ_{i,j}(a, b) × max_{x_k} Φ_{i,k}(a, x_k) × Φ_{j,k}(b, x_k), (11) where the two terms under the max play the roles of va and vb; this is again precisely of the form shown in (Equation 1). [sent-306, score-0.824]
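Assembling a full message of the form (11) then amounts to one such inner product per entry, with every row of the factor tables sorted once up front. This is our own sketch (the names Phi_ij, Phi_ik, Phi_jk and pairwise_message are hypothetical), reusing tropical_inner_product from above:

```python
def pairwise_message(Phi_ij, Phi_ik, Phi_jk):
    """Sketch of Equation 11 for a third-order clique with pairwise
    factors: m[a, b] = Phi_ij[a, b] * max_k Phi_ik[a, k] * Phi_jk[b, k].
    """
    N = Phi_ij.shape[0]
    # Sort every row once, Theta(N^2 log N); the permutations are then
    # reused across all N^2 tropical inner products.
    Pa = np.argsort(-Phi_ik, axis=1)
    Pb = np.argsort(-Phi_jk, axis=1)
    m = np.empty((N, N))
    for a in range(N):
        for b in range(N):
            m[a, b] = Phi_ij[a, b] * tropical_inner_product(
                Phi_ik[a], Phi_jk[b], Pa[a], Pb[b])
    return m
```

With each inner product running in O(√N) expected time, the whole message costs O(N^2 √N) expected rather than Θ(N^3).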
55 Extensions So far we have only considered the case of pairwise graphical models, though as mentioned our results can in principle be applied to any conditionally or latently factorizable models, no matter the size of the factors. [sent-323, score-0.367]
56 We first treat latently factorizable models, after which the same ideas can be applied to conditionally factorizable models. [sent-326, score-0.43]
57 The same approach can be adapted to solve m_{i,j}(x_i, x_j) = max_{x_k, x_m} Φ_{i,j}(x_i, x_j) × Φ_{i,k,m}(x_i, x_k, x_m) × Φ_{j,k,m}(x_j, x_k, x_m), (12) and similar variants containing three factors. [sent-345, score-0.281]
58 For a fixed value of x_i, we are now sorting an array rather than a vector (Algorithm 4, lines 2 and 3); in this case, the permutation functions pa and pb in Algorithm 2 simply return pairs of indices. [sent-348, score-0.4]
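A minimal sketch of this array-sorting variant for (Equation 12), under the same assumptions and hypothetical names as before: the tables over (x_k, x_m) are flattened so the two-list routine applies unchanged, each sorted position now standing for a pair of indices.

```python
def triplet_message(Phi_ij, Phi_ikm, Phi_jkm):
    """Sketch of Equation 12: for fixed (x_i, x_j) the max runs over
    pairs (x_k, x_m), so each 'list' is a flattened N x N array."""
    N = Phi_ij.shape[0]
    A = Phi_ikm.reshape(N, -1)   # row a: values over all (x_k, x_m)
    B = Phi_jkm.reshape(N, -1)
    Pa = np.argsort(-A, axis=1)  # sorted once per value of x_i
    Pb = np.argsort(-B, axis=1)  # sorted once per value of x_j
    m = np.empty((N, N))
    for a in range(N):
        for b in range(N):
            m[a, b] = Phi_ij[a, b] * tropical_inner_product(
                A[a], B[b], Pa[a], Pb[b])
    return m
```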
59 For example, consider the clique in Figure 6(a), which we shall call G (the entire graph is a clique, but for clarity we only draw an edge when the corresponding nodes belong to a common factor). [sent-356, score-0.303]
60 Each of the factors in this graph has been labeled using either differently colored edges (for factors of size larger than two) or dotted edges (for factors of size two), and the max-marginal we wish to compute has been labeled using colored nodes. [sent-357, score-0.458]
61 In order to simplify the analysis of this algorithm, we shall express the running time in terms of the size of the largest group, S = max(|X|, |Y |, |Z|), and the largest difference, S\ = max(|Y \ X|, |Z \ X|). [sent-363, score-0.229]
62 The running times shown in Algorithm 5 are loose upper-bounds, given for the sake of expressing the running time in simple terms. [sent-365, score-0.22]
63 Again, we shall discuss the running time of this extension in Appendix A. [sent-439, score-0.229]
64 For the moment, we state the following theorem: Theorem 3 Algorithm 6 generalizes Algorithm 2 to K lists with an expected running time of O(K N^((K−1)/K)), yielding a speed-up of at least Ω((1/K) N^(1/K)) in cliques containing K-ary factors. [sent-440, score-0.551]
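A direct K-list generalization of the two-list sketch above, in the spirit of Algorithm 6 (again our own illustrative code; non-negative entries assumed, numpy imported as np):

```python
def tropical_inner_product_k(vs):
    """max_i of vs[0][i] * ... * vs[K-1][i] over K equal-length lists."""
    ps = [np.argsort(-v) for v in vs]
    best, seen = -np.inf, set()
    for start in range(len(vs[0])):
        bound = 1.0
        for v, p in zip(vs, ps):
            bound *= v[p[start]]
        if bound <= best:  # no unread index can beat the frontier bound
            break
        for v, p in zip(vs, ps):
            i = p[start]
            if i not in seen:
                seen.add(i)
                prod = 1.0
                for u in vs:
                    prod *= u[i]
                best = max(best, prod)
    return best
```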
65 Likewise, we can extend Algorithm 3 to factors of any size, so long as the purely latent cliques contain more latent variables than those cliques that depend upon the observation. [sent-485, score-0.996]
66 As we have mentioned, our results apply to any model whose cliques decompose into lower-order terms. [sent-497, score-0.358]
67 In each case, third-order cliques factorize into second-order terms; hence we can apply Algorithm 4 to achieve a speed-up of Ω(√N). [sent-507, score-0.403]
68 Maximal cliques grow with the ... [table residue: ‘Reference — McAuley et al.’] [sent-522, score-0.358]
69 K is the number of lists in (Equation 14); we can observe this number of lists only if we are working in cliques of size K + 1, and then only if the factors are of size K (e. [sent-556, score-0.63]
70 The performance reported is simply the number of elements read from the lists (which is at most K × start). [sent-567, score-0.225]
71 Figure 11 shows how the order statistics of va and vb can affect the performance of our algorithm. [sent-578, score-0.588]
72 (for Algorithm 2), where each (va[i], vb[i]) is an independent sample from a 2-dimensional Gaussian with covariance matrix Σ = [[1, c], [c, 1]]. [sent-583, score-0.253]
73 Each permutation matrix transforms the sorted values of one list into the sorted values of the other, that is, it transforms va as sorted by pa into vb as sorted by pb . [sent-585, score-1.083]
74 3 Message-Passing in Latently Factorizable Models In this section we present experiments in models whose cliques factorize into smaller terms, as discussed in Section 4. [sent-594, score-0.403]
75 (2008), which performs 2-dimensional graph-matching, using a loopy graph with cliques of size three, containing only second-order potentials (as described in Section 6); the Θ(NM^3) performance of McAuley et al. [sent-601, score-0.761]
76 For instance ... [figure residue: panel ‘Random potentials (5 iterations)’, legend: naïve method vs. our method] [sent-643, score-0.263]
77 This amounts to defining the node potentials Φ_i(x_i|y_i), the edge potentials Φ_{i,j}(x_i, x_j), and the topology (N, E) of the graph. [sent-691, score-0.564]
78 Furthermore we assume that the edge potentials are homogeneous, that is, that the potential for each edge is the same, or rather that they have the same order statistics (for example, they may differ by a multiplicative constant). [sent-692, score-0.263]
79 When subject to heterogeneous potentials we need merely sort them off-line. [sent-694, score-0.297]
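As an illustration of this off-line sorting, here is a hedged sketch of max-product on a chain with a single homogeneous N × N edge potential (Equations 7–8). The function and argument names are ours, and it reuses tropical_inner_product from the earlier sketch; only the incoming message needs sorting at run time.

```python
def chain_map_value(node_pot, edge_pot):
    """MAP value of a chain; node_pot is a length-T list of
    data-dependent vectors, edge_pot the shared N x N edge table."""
    N = edge_pot.shape[0]
    col_perm = np.argsort(-edge_pot, axis=0)  # off-line, once per column
    m = np.asarray(node_pot[0], dtype=float)
    for t in range(1, len(node_pot)):
        pm = np.argsort(-m)  # sort the incoming message, Theta(N log N)
        m = np.array([
            tropical_inner_product(m, edge_pot[:, b], pm, col_perm[:, b])
            for b in range(N)]) * node_pot[t]
    return m.max()  # backpointers for the argmax omitted for brevity
```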
80 [page header and plot-axis residue; panel: ‘Random potentials (2500 node chain)’; axes: ‘N (number of states)’, ‘N (alphabet size)’]
81 Figure 17: Running time of inference in chain-structured models: random potentials (left), and text denoising (right). [sent-773, score-0.38]
82 Figure 18 (left) shows the performance of our method on a grid with random potentials (similar to the experiment in Section 7. [sent-778, score-0.263]
83 The pairwise potentials simply encode the Euclidean distance between two flow vectors. [sent-790, score-0.326]
84 Problems of this kind (such as optical flow) are studied in Felzenszwalb and Huttenlocher (2006), where the highly structured nature of the potentials in question often allows for efficient inference. [sent-793, score-0.301]
85 [page header and plot residue; panel: ‘Random potentials (50 × 50 grid, 5 iterations)’; legend: naïve method vs. our method]
86 Figure 18: Running time of inference in grid-structured models: random potentials (left), and optical flow (right). [sent-817, score-0.347]
87 This behavior is observed for certain types of concave potentials (or convex potentials in a min-sum formulation). [sent-820, score-0.526]
88 In these applications, the permutation matrices that transform the sorted values of va to the sorted values of vb are block-off-diagonal (see the sixth permutation in Figure 11). [sent-823, score-0.898]
89 We similarly perform an experiment on image denoising, where the unary potentials are again convex functions of the input (see Geman and Geman, 1984; Lan et al. [sent-828, score-0.263]
90 Instead of using a pairwise potential that merely encodes smoothness, we extract the pairwise statistics from image data (similar to our experiment on text denoising); thus the potentials are no longer concave. [sent-830, score-0.419]
91 Figure 19: Two experiments whose potentials and messages have highly dependent order statistics: stereo disparity (left), and image denoising (right). [sent-844, score-0.449]
92 Namely, we terminate the algorithm if va[pa[start]] × vb[pb[start]] < max. [sent-878, score-0.588]
93 This may be due to significant variation in the variable size (note that N only shows the average variable size), but it may also suggest that there is a complicated structure in the potentials which violates our assumption of independent order statistics. [sent-888, score-0.263]
94 Given the strong relationship between the two problems, it remains a promising open problem to assess whether the analysis from these solutions to all-pairs shortest-path can be applied to produce max-product matrix multiplication algorithms with similar asymptotic running times. [sent-943, score-0.241]
95 L∞ Distances: The problem of computing an inner product in the max-sum semiring is closely related to computing the L∞ distance between two vectors, ||va − vb||_∞ = max_i |va[i] − vb[i]|. [sent-955, score-0.923]
96 This is handled by first considering the largest values of va and −vb, before re-running the algorithm starting from the smallest values. [sent-961, score-0.335]
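A minimal sketch of this reduction, with our own function names; note that the additive frontier bound stays valid for entries of any sign, unlike the multiplicative one:

```python
def max_sum(va, vb):
    """max_i va[i] + vb[i] by the same sorted traversal."""
    pa, pb = np.argsort(-va), np.argsort(-vb)
    best, seen = -np.inf, set()
    for start in range(len(va)):
        if va[pa[start]] + vb[pb[start]] <= best:
            break
        for i in (pa[start], pb[start]):
            if i not in seen:
                seen.add(i)
                best = max(best, va[i] + vb[i])
    return best

def linf_distance(va, vb):
    """||va - vb||_inf via two max-sum passes: one for
    max_i va[i] - vb[i], one for max_i vb[i] - va[i]."""
    return max(max_sum(va, -vb), max_sum(-va, vb))
```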
97 A similar trick can be applied to compute messages in the max-product semiring even for potentials that contain negative values, though this may require up to four executions of Algorithm 2, so it is unlikely to be practical. [sent-969, score-0.345]
98 In factors with a high dynamic range, or when different factors have different scales, it may be possible to identify the maximum value very quickly, as we attempted to do in Section 7. [sent-1015, score-0.212]
99 Asymptotic Performance of Algorithm 2 and Extensions In this section we shall determine the expected-case running times of Algorithm 2 and Algorithm 6. [sent-1028, score-0.229]
100 Algorithm 2 traverses va and vb until it reaches the smallest value of m for which there is some j ≤ m with m ≥ p_b^(−1)[pa[j]]. [sent-1029, score-0.588]
wordName wordTfidf (topN-words)
[('cliques', 0.358), ('va', 0.335), ('potentials', 0.263), ('vb', 0.253), ('aetano', 0.183), ('auley', 0.183), ('roduct', 0.183), ('essage', 0.174), ('factorizable', 0.164), ('read', 0.142), ('mcauley', 0.137), ('na', 0.134), ('multiplication', 0.131), ('shall', 0.119), ('running', 0.11), ('mn', 0.107), ('factors', 0.106), ('loopy', 0.105), ('pb', 0.102), ('permutation', 0.098), ('clique', 0.098), ('karger', 0.091), ('xx', 0.091), ('sorting', 0.089), ('ax', 0.085), ('dom', 0.084), ('lists', 0.083), ('semiring', 0.082), ('wall', 0.08), ('sontag', 0.077), ('deformable', 0.073), ('latently', 0.073), ('lgorithms', 0.073), ('pa', 0.067), ('ve', 0.066), ('permutations', 0.066), ('subcubic', 0.064), ('triangulated', 0.064), ('pairwise', 0.063), ('stereo', 0.062), ('xk', 0.059), ('sorted', 0.057), ('felzenszwalb', 0.056), ('tib', 0.055), ('julian', 0.054), ('messages', 0.053), ('latent', 0.053), ('xm', 0.052), ('ow', 0.051), ('tropical', 0.051), ('nodes', 0.051), ('xe', 0.048), ('loop', 0.047), ('aho', 0.047), ('rio', 0.047), ('coughlan', 0.046), ('sigal', 0.046), ('xy', 0.045), ('faster', 0.045), ('factorize', 0.045), ('array', 0.044), ('tted', 0.044), ('inference', 0.044), ('meaning', 0.043), ('denoising', 0.043), ('caetano', 0.042), ('maximal', 0.042), ('mm', 0.04), ('optical', 0.04), ('xz', 0.039), ('xf', 0.039), ('ine', 0.039), ('colored', 0.038), ('start', 0.038), ('purely', 0.038), ('node', 0.038), ('graphical', 0.038), ('aji', 0.037), ('endb', 0.037), ('ferreira', 0.037), ('galley', 0.037), ('equation', 0.036), ('pk', 0.035), ('vk', 0.035), ('marginalization', 0.035), ('graph', 0.035), ('seconds', 0.035), ('huttenlocher', 0.035), ('takes', 0.035), ('groups', 0.035), ('sort', 0.034), ('alon', 0.031), ('fitted', 0.031), ('nicta', 0.031), ('text', 0.03), ('upon', 0.03), ('wish', 0.029), ('conditionally', 0.029), ('xc', 0.029), ('disparity', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999911 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing
Author: Julian J. McAuley, Tibério S. Caetano
Abstract: Maximum A Posteriori inference in graphical models is often solved via message-passing algorithms, such as the junction-tree algorithm or loopy belief-propagation. The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). This is a common case in a wide variety of applications that deal with grid-, tree-, and ring-structured models. In such cases, we are able to decrease the exponent of complexity for message-passing by 0.5 for both exact and approximate inference. We demonstrate that message-passing operations in such models are equivalent to some variant of matrix multiplication in the tropical semiring, for which we offer an O(N^2.5) expected-case solution. Keywords: graphical models, belief-propagation, tropical matrix multiplication
2 0.14494702 41 jmlr-2011-Improved Moves for Truncated Convex Models
Author: M. Pawan Kumar, Olga Veksler, Philip H.S. Torr
Abstract: We consider the problem of obtaining an approximate maximum a posteriori estimate of a discrete random field characterized by pairwise potentials that form a truncated convex model. For this problem, we propose two st-MINCUT based move making algorithms that we call Range Swap and Range Expansion. Our algorithms can be thought of as extensions of αβ-Swap and α-Expansion respectively that fully exploit the form of the pairwise potentials. Specifically, instead of dealing with one or two labels at each iteration, our methods explore a large search space by considering a range of labels (that is, an interval of consecutive labels). Furthermore, we show that Range Expansion provides the same multiplicative bounds as the standard linear programming (LP) relaxation in polynomial time. Compared to previous approaches based on the LP relaxation, for example interior-point algorithms or tree-reweighted message passing (TRW), our methods are faster as they use only the efficient st-MINCUT algorithm in their design. We demonstrate the usefulness of the proposed approaches on both synthetic and standard real data problems. Keywords: truncated convex models, move making algorithms, range moves, multiplicative bounds, linear programming relaxation
3 0.072354361 7 jmlr-2011-Adaptive Exact Inference in Graphical Models
Author: Özgür Sümer, Umut A. Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Many algorithms and applications involve repeatedly solving variations of the same inference problem, for example to introduce new evidence to the model or to change conditional dependencies. As the model is updated, the goal of adaptive inference is to take advantage of previously computed quantities to perform inference more rapidly than from scratch. In this paper, we present algorithms for adaptive exact inference on general graphs that can be used to efficiently compute marginals and update MAP configurations under arbitrary changes to the input factor graph and its associated elimination tree. After a linear time preprocessing step, our approach enables updates to the model and the computation of any marginal in time that is logarithmic in the size of the input model. Moreover, in contrast to max-product our approach can also be used to update MAP configurations in time that is roughly proportional to the number of updated entries, rather than the size of the input model. To evaluate the practical effectiveness of our algorithms, we implement and test them using synthetic data as well as for two real-world computational biology applications. Our experiments show that adaptive inference can achieve substantial speedups over performing complete inference as the model undergoes small changes over time. Keywords: exact inference, factor graphs, factor elimination, marginalization, dynamic programming, MAP computation, model updates, parallel tree contraction
4 0.069298014 61 jmlr-2011-Logistic Stick-Breaking Process
Author: Lu Ren, Lan Du, Lawrence Carin, David Dunson
Abstract: A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation
5 0.055166766 103 jmlr-2011-Weisfeiler-Lehman Graph Kernels
Author: Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, Karsten M. Borgwardt
Abstract: In this article, we propose a family of efficient kernels for large graphs with discrete node labels. Key to our method is a rapid feature extraction scheme based on the Weisfeiler-Lehman test of isomorphism on graphs. It maps the original graph to a sequence of graphs, whose node attributes capture topological and label information. A family of kernels can be defined based on this Weisfeiler-Lehman sequence of graphs, including a highly efficient kernel comparing subtree-like patterns. Its runtime scales only linearly in the number of edges of the graphs and the length of the Weisfeiler-Lehman graph sequence. In our experimental evaluation, our kernels outperform state-of-the-art graph kernels on several graph classification benchmark data sets in terms of accuracy and runtime. Our kernels open the door to large-scale applications of graph kernels in various disciplines such as computational biology and social network analysis. Keywords: graph kernels, graph classification, similarity measures for graphs, Weisfeiler-Lehman algorithm
6 0.053476024 54 jmlr-2011-Learning Latent Tree Graphical Models
7 0.052506309 12 jmlr-2011-Bayesian Co-Training
9 0.051169764 94 jmlr-2011-Theoretical Analysis of Bayesian Matrix Factorization
10 0.045623854 85 jmlr-2011-Smoothness, Disagreement Coefficient, and the Label Complexity of Agnostic Active Learning
11 0.045461643 82 jmlr-2011-Robust Gaussian Process Regression with a Student-tLikelihood
12 0.044859927 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
13 0.041656815 20 jmlr-2011-Convex and Network Flow Optimization for Structured Sparsity
14 0.040901881 75 jmlr-2011-Parallel Algorithm for Learning Optimal Bayesian Network Structure
15 0.037747644 55 jmlr-2011-Learning Multi-modal Similarity
16 0.036855329 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review
17 0.035050552 49 jmlr-2011-Kernel Regression in the Presence of Correlated Errors
18 0.034553315 79 jmlr-2011-Proximal Methods for Hierarchical Sparse Coding
19 0.03355867 35 jmlr-2011-Forest Density Estimation
20 0.032031573 11 jmlr-2011-Approximate Marginals in Latent Gaussian Models
topicId topicWeight
[(0, 0.184), (1, -0.076), (2, -0.037), (3, 0.014), (4, 0.014), (5, 0.015), (6, 0.003), (7, 0.056), (8, 0.097), (9, 0.069), (10, -0.177), (11, 0.15), (12, 0.268), (13, -0.079), (14, 0.245), (15, -0.142), (16, 0.062), (17, 0.113), (18, -0.226), (19, 0.063), (20, 0.068), (21, -0.091), (22, 0.027), (23, 0.132), (24, 0.088), (25, 0.044), (26, 0.152), (27, 0.05), (28, 0.02), (29, 0.054), (30, -0.093), (31, -0.001), (32, 0.111), (33, -0.016), (34, 0.064), (35, 0.069), (36, -0.006), (37, 0.068), (38, -0.044), (39, 0.011), (40, 0.003), (41, 0.03), (42, -0.084), (43, -0.038), (44, 0.085), (45, 0.004), (46, 0.043), (47, -0.046), (48, 0.076), (49, -0.05)]
simIndex simValue paperId paperTitle
same-paper 1 0.93584394 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing
Author: Julian J. McAuley, Tibério S. Caetano
Abstract: Maximum A Posteriori inference in graphical models is often solved via message-passing algorithms, such as the junction-tree algorithm or loopy belief-propagation. The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). This is a common case in a wide variety of applications that deal with grid-, tree-, and ring-structured models. In such cases, we are able to decrease the exponent of complexity for message-passing by 0.5 for both exact and approximate inference. We demonstrate that message-passing operations in such models are equivalent to some variant of matrix multiplication in the tropical semiring, for which we offer an O(N^2.5) expected-case solution. Keywords: graphical models, belief-propagation, tropical matrix multiplication
2 0.67801344 41 jmlr-2011-Improved Moves for Truncated Convex Models
Author: M. Pawan Kumar, Olga Veksler, Philip H.S. Torr
Abstract: We consider the problem of obtaining an approximate maximum a posteriori estimate of a discrete random field characterized by pairwise potentials that form a truncated convex model. For this problem, we propose two st-MINCUT based move making algorithms that we call Range Swap and Range Expansion. Our algorithms can be thought of as extensions of αβ-Swap and α-Expansion respectively that fully exploit the form of the pairwise potentials. Specifically, instead of dealing with one or two labels at each iteration, our methods explore a large search space by considering a range of labels (that is, an interval of consecutive labels). Furthermore, we show that Range Expansion provides the same multiplicative bounds as the standard linear programming (LP) relaxation in polynomial time. Compared to previous approaches based on the LP relaxation, for example interior-point algorithms or tree-reweighted message passing (TRW), our methods are faster as they use only the efficient st-MINCUT algorithm in their design. We demonstrate the usefulness of the proposed approaches on both synthetic and standard real data problems. Keywords: truncated convex models, move making algorithms, range moves, multiplicative bounds, linear programming relaxation
3 0.44732824 94 jmlr-2011-Theoretical Analysis of Bayesian Matrix Factorization
Author: Shinichi Nakajima, Masashi Sugiyama
Abstract: Recently, variational Bayesian (VB) techniques have been applied to probabilistic matrix factorization and shown to perform very well in experiments. In this paper, we theoretically elucidate properties of the VB matrix factorization (VBMF) method. Through finite-sample analysis of the VBMF estimator, we show that two types of shrinkage factors exist in the VBMF estimator: the positive-part James-Stein (PJS) shrinkage and the trace-norm shrinkage, both acting on each singular component separately for producing low-rank solutions. The trace-norm shrinkage is simply induced by non-flat prior information, similarly to the maximum a posteriori (MAP) approach. Thus, no trace-norm shrinkage remains when priors are non-informative. On the other hand, we show a counter-intuitive fact that the PJS shrinkage factor is kept activated even with flat priors. This is shown to be induced by the non-identifiability of the matrix factorization model, that is, the mapping between the target matrix and factorized matrices is not one-to-one. We call this model-induced regularization. We further extend our analysis to empirical Bayes scenarios where hyperparameters are also learned based on the VB free energy. Throughout the paper, we assume no missing entry in the observed matrix, and therefore collaborative filtering is out of scope. Keywords: matrix factorization, variational Bayes, empirical Bayes, positive-part James-Stein shrinkage, non-identifiable model, model-induced regularization
4 0.40583947 61 jmlr-2011-Logistic Stick-Breaking Process
Author: Lu Ren, Lan Du, Lawrence Carin, David Dunson
Abstract: A logistic stick-breaking process (LSBP) is proposed for non-parametric clustering of general spatially- or temporally-dependent data, imposing the belief that proximate data are more likely to be clustered together. The sticks in the LSBP are realized via multiple logistic regression functions, with shrinkage priors employed to favor contiguous and spatially localized segments. The LSBP is also extended for the simultaneous processing of multiple data sets, yielding a hierarchical logistic stick-breaking process (H-LSBP). The model parameters (atoms) within the H-LSBP are shared across the multiple learning tasks. Efficient variational Bayesian inference is derived, and comparisons are made to related techniques in the literature. Experimental analysis is performed for audio waveforms and images, and it is demonstrated that for segmentation applications the LSBP yields generally homogeneous segments with sharp boundaries. Keywords: Bayesian, nonparametric, dependent, hierarchical models, segmentation
5 0.39646801 7 jmlr-2011-Adaptive Exact Inference in Graphical Models
Author: Özgür Sümer, Umut A. Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Many algorithms and applications involve repeatedly solving variations of the same inference problem, for example to introduce new evidence to the model or to change conditional dependencies. As the model is updated, the goal of adaptive inference is to take advantage of previously computed quantities to perform inference more rapidly than from scratch. In this paper, we present algorithms for adaptive exact inference on general graphs that can be used to efficiently compute marginals and update MAP configurations under arbitrary changes to the input factor graph and its associated elimination tree. After a linear time preprocessing step, our approach enables updates to the model and the computation of any marginal in time that is logarithmic in the size of the input model. Moreover, in contrast to max-product our approach can also be used to update MAP configurations in time that is roughly proportional to the number of updated entries, rather than the size of the input model. To evaluate the practical effectiveness of our algorithms, we implement and test them using synthetic data as well as for two real-world computational biology applications. Our experiments show that adaptive inference can achieve substantial speedups over performing complete inference as the model undergoes small changes over time. Keywords: exact inference, factor graphs, factor elimination, marginalization, dynamic programming, MAP computation, model updates, parallel tree contraction
6 0.29290763 103 jmlr-2011-Weisfeiler-Lehman Graph Kernels
7 0.28681135 12 jmlr-2011-Bayesian Co-Training
8 0.2637057 54 jmlr-2011-Learning Latent Tree Graphical Models
9 0.24639656 49 jmlr-2011-Kernel Regression in the Presence of Correlated Errors
10 0.24420524 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
11 0.24368177 75 jmlr-2011-Parallel Algorithm for Learning Optimal Bayesian Network Structure
13 0.21490106 20 jmlr-2011-Convex and Network Flow Optimization for Structured Sparsity
14 0.21126163 85 jmlr-2011-Smoothness, Disagreement Coefficient, and the Label Complexity of Agnostic Active Learning
15 0.20542464 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review
16 0.20504248 71 jmlr-2011-On Equivalence Relationships Between Classification and Ranking Algorithms
17 0.20197177 104 jmlr-2011-X-Armed Bandits
18 0.20152451 76 jmlr-2011-Parameter Screening and Optimisation for ILP using Designed Experiments
19 0.18764409 84 jmlr-2011-Semi-Supervised Learning with Measure Propagation
20 0.1874246 88 jmlr-2011-Structured Variable Selection with Sparsity-Inducing Norms
topicId topicWeight
[(4, 0.042), (9, 0.026), (10, 0.587), (24, 0.033), (31, 0.052), (32, 0.02), (41, 0.015), (60, 0.017), (71, 0.016), (73, 0.029), (78, 0.046), (90, 0.028)]
simIndex simValue paperId paperTitle
1 0.98038697 75 jmlr-2011-Parallel Algorithm for Learning Optimal Bayesian Network Structure
Author: Yoshinori Tamada, Seiya Imoto, Satoru Miyano
Abstract: We present a parallel algorithm for the score-based optimal structure search of Bayesian networks. This algorithm is based on a dynamic programming (DP) algorithm having O(n · 2^n) time and space complexity, which is known to be the fastest algorithm for the optimal structure search of networks with n nodes. The bottleneck of the problem is the memory requirement, and therefore, the algorithm is currently applicable for up to a few tens of nodes. While the recently proposed algorithm overcomes this limitation by a space-time trade-off, our proposed algorithm realizes direct parallelization of the original DP algorithm with O(n^σ) time and space overhead calculations, where σ > 0 controls the communication-space trade-off. The overall time and space complexity is O(n^(σ+1) 2^n). This algorithm splits the search space so that the required communication between independent calculations is minimal. Because of this advantage, our algorithm can run on distributed memory supercomputers. Through computational experiments, we confirmed that our algorithm can run in parallel using up to 256 processors with a parallelization efficiency of 0.74, compared to the original DP algorithm with a single processor. We also demonstrate optimal structure search for a 32-node network without any constraints, which is the largest network search presented in literature. Keywords: optimal Bayesian network structure, parallel algorithm
same-paper 2 0.94006383 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing
Author: Julian J. McAuley, Tibério S. Caetano
Abstract: Maximum A Posteriori inference in graphical models is often solved via message-passing algorithms, such as the junction-tree algorithm or loopy belief-propagation. The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). This is a common case in a wide variety of applications that deal with grid-, tree-, and ring-structured models. In such cases, we are able to decrease the exponent of complexity for message-passing by 0.5 for both exact and approximate inference. We demonstrate that message-passing operations in such models are equivalent to some variant of matrix multiplication in the tropical semiring, for which we offer an O(N^2.5) expected-case solution. Keywords: graphical models, belief-propagation, tropical matrix multiplication
3 0.49811628 7 jmlr-2011-Adaptive Exact Inference in Graphical Models
Author: Özgür Sümer, Umut A. Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Many algorithms and applications involve repeatedly solving variations of the same inference problem, for example to introduce new evidence to the model or to change conditional dependencies. As the model is updated, the goal of adaptive inference is to take advantage of previously computed quantities to perform inference more rapidly than from scratch. In this paper, we present algorithms for adaptive exact inference on general graphs that can be used to efficiently compute marginals and update MAP configurations under arbitrary changes to the input factor graph and its associated elimination tree. After a linear time preprocessing step, our approach enables updates to the model and the computation of any marginal in time that is logarithmic in the size of the input model. Moreover, in contrast to max-product our approach can also be used to update MAP configurations in time that is roughly proportional to the number of updated entries, rather than the size of the input model. To evaluate the practical effectiveness of our algorithms, we implement and test them using synthetic data as well as for two real-world computational biology applications. Our experiments show that adaptive inference can achieve substantial speedups over performing complete inference as the model undergoes small changes over time. Keywords: exact inference, factor graphs, factor elimination, marginalization, dynamic programming, MAP computation, model updates, parallel tree contraction
4 0.49511245 30 jmlr-2011-Efficient Structure Learning of Bayesian Networks using Constraints
Author: Cassio P. de Campos, Qiang Ji
Abstract: This paper addresses the problem of learning Bayesian network structures from data based on score functions that are decomposable. It describes properties that strongly reduce the time and memory costs of many known methods without losing global optimality guarantees. These properties are derived for different score criteria such as Minimum Description Length (or Bayesian Information Criterion), Akaike Information Criterion and Bayesian Dirichlet Criterion. Then a branch-andbound algorithm is presented that integrates structural constraints with data in a way to guarantee global optimality. As an example, structural constraints are used to map the problem of structure learning in Dynamic Bayesian networks into a corresponding augmented Bayesian network. Finally, we show empirically the benefits of using the properties with state-of-the-art methods and with the new algorithm, which is able to handle larger data sets than before. Keywords: Bayesian networks, structure learning, properties of decomposable scores, structural constraints, branch-and-bound technique
5 0.40764236 16 jmlr-2011-Clustering Algorithms for Chains
Author: Antti Ukkonen
Abstract: We consider the problem of clustering a set of chains to k clusters. A chain is a totally ordered subset of a finite set of items. Chains are an intuitive way to express preferences over a set of alternatives, as well as a useful representation of ratings in situations where the item-specific scores are either difficult to obtain, too noisy due to measurement error, or simply not as relevant as the order that they induce over the items. First we adapt the classical k-means for chains by proposing a suitable distance function and a centroid structure. We also present two different approaches for mapping chains to a vector space. The first one is related to the planted partition model, while the second one has an intuitive geometrical interpretation. Finally we discuss a randomization test for assessing the significance of a clustering. To this end we present an MCMC algorithm for sampling random sets of chains that share certain properties with the original data. The methods are studied in a series of experiments using real and artificial data. Results indicate that the methods produce interesting clusterings, and for certain types of inputs improve upon previous work on clustering algorithms for orders. Keywords: Lloyd’s algorithm, orders, preference statements, planted partition model, randomization testing
6 0.38851216 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling
7 0.38661271 27 jmlr-2011-Domain Decomposition Approach for Fast Gaussian Process Regression of Large Spatial Data Sets
8 0.38028088 59 jmlr-2011-Learning with Structured Sparsity
9 0.37915057 61 jmlr-2011-Logistic Stick-Breaking Process
10 0.36725602 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms
11 0.36669222 25 jmlr-2011-Discriminative Learning of Bayesian Networks via Factorized Conditional Log-Likelihood
12 0.36266088 47 jmlr-2011-Inverse Reinforcement Learning in Partially Observable Environments
13 0.36254823 76 jmlr-2011-Parameter Screening and Optimisation for ILP using Designed Experiments
14 0.35746863 12 jmlr-2011-Bayesian Co-Training
15 0.35724047 13 jmlr-2011-Bayesian Generalized Kernel Mixed Models
16 0.35386023 48 jmlr-2011-Kernel Analysis of Deep Networks
17 0.35326281 19 jmlr-2011-Convergence of Distributed Asynchronous Learning Vector Quantization Algorithms
18 0.35031343 89 jmlr-2011-Super-Linear Convergence of Dual Augmented Lagrangian Algorithm for Sparsity Regularized Estimation
20 0.34387711 20 jmlr-2011-Convex and Network Flow Optimization for Structured Sparsity