jmlr jmlr2011 jmlr2011-7 knowledge-graph by maker-knowledge-mining

7 jmlr-2011-Adaptive Exact Inference in Graphical Models


Source: pdf

Author: Özgür Sümer, Umut A. Acar, Alexander T. Ihler, Ramgopal R. Mettu

Abstract: Many algorithms and applications involve repeatedly solving variations of the same inference problem, for example to introduce new evidence to the model or to change conditional dependencies. As the model is updated, the goal of adaptive inference is to take advantage of previously computed quantities to perform inference more rapidly than from scratch. In this paper, we present algorithms for adaptive exact inference on general graphs that can be used to efficiently compute marginals and update MAP configurations under arbitrary changes to the input factor graph and its associated elimination tree. After a linear time preprocessing step, our approach enables updates to the model and the computation of any marginal in time that is logarithmic in the size of the input model. Moreover, in contrast to max-product our approach can also be used to update MAP configurations in time that is roughly proportional to the number of updated entries, rather than the size of the input model. To evaluate the practical effectiveness of our algorithms, we implement and test them using synthetic data as well as for two real-world computational biology applications. Our experiments show that adaptive inference can achieve substantial speedups over performing complete inference as the model undergoes small changes over time. Keywords: exact inference, factor graphs, factor elimination, marginalization, dynamic programming, MAP computation, model updates, parallel tree contraction

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 In this paper, we present algorithms for adaptive exact inference on general graphs that can be used to efficiently compute marginals and update MAP configurations under arbitrary changes to the input factor graph and its associated elimination tree. [sent-14, score-1.191]

2 We use a process based on factor elimination (Darwiche, 2009) that we call hierarchical clustering that takes as input a graph and elimination tree (equivalent to a tree-decomposition of the graphical model), and produces an alternative, balanced elimination sequence. [sent-42, score-2.474]

3 Given a factor graph G with n nodes, and domain size d (each variable can take d different values), we require the user to specify an elimination tree T on factors. [sent-79, score-1.066]

4 Our framework for adaptive inference requires a preprocessing step in which we build a balanced representation of the input elimination tree in O(d^(3w) n) time where w is the width of the input elimination tree T. [sent-80, score-2.122]

5 The dependence in our case, however, is stronger: if the input elimination tree has width w, our balanced representation is guaranteed to have width at most 3w. [sent-85, score-1.141]

6 In Section 2, we give the definitions and notation used throughout this paper, along with some background on the factor elimination algorithm and tree decompositions. [sent-114, score-1.015]

7 Factor elimination takes a factor graph G1 and an elimination tree T1 as input and sequentially eliminates the leaf factors in the elimination tree. [sent-146, score-2.563]
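As a concrete illustration of this loop, the sketch below runs plain factor elimination on a toy model. It is only a sketch under assumed conventions (factors as variable-tuple/numpy-table pairs, hypothetical helper names and toy data), not the authors' implementation.

# Illustrative sketch of factor elimination (assumed representation, not the paper's code).
# A factor is a pair (vars, table): a tuple of variable names and a numpy array
# with one axis per variable, in the same order.
import numpy as np

def align(f_vars, table, all_vars):
    # Broadcast a factor's table over the ordered variable set all_vars.
    order = [v for v in all_vars if v in f_vars]
    table = np.moveaxis(table, [f_vars.index(v) for v in order], list(range(len(order))))
    shape = [table.shape[order.index(v)] if v in order else 1 for v in all_vars]
    return table.reshape(shape)

def multiply(f, g):
    # Pointwise product of two factors.
    all_vars = tuple(sorted(set(f[0]) | set(g[0])))
    return all_vars, align(f[0], f[1], all_vars) * align(g[0], g[1], all_vars)

def sum_out(f, var):
    # Marginalize one variable out of a factor.
    f_vars, table = f
    return tuple(v for v in f_vars if v != var), table.sum(axis=f_vars.index(var))

def factor_elimination(factors, etree_edges, root):
    # Repeatedly eliminate a leaf of the elimination tree until only `root` remains.
    adj = {name: set() for name in factors}
    for a, b in etree_edges:
        adj[a].add(b); adj[b].add(a)
    live = dict(factors)
    while len(live) > 1:
        leaf = next(n for n in live if len(adj[n]) == 1 and n != root)
        (nbr,) = adj.pop(leaf)
        lam = live.pop(leaf)
        for v in list(lam[0]):                          # sum out variables that no
            if not any(v in live[g][0] for g in live):  # remaining factor touches
                lam = sum_out(lam, v)
        live[nbr] = multiply(live[nbr], lam)            # f_nbr' = f_nbr * lambda_leaf
        adj[nbr].discard(leaf)
    return live[root]

# Toy chain model over x0, x1, x2; rooting at f2 leaves the marginal of x2.
rng = np.random.default_rng(0)
factors = {"f0": (("x0", "x1"), rng.random((2, 2))),
           "f1": (("x1", "x2"), rng.random((2, 2))),
           "f2": (("x2",), rng.random(2))}
vars_left, table = factor_elimination(factors, [("f0", "f1"), ("f1", "f2")], root="f2")
print(vars_left, table / table.sum())                   # ('x2',) and p(x2)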

8 , 2001) for tree-structured graphs, or more generally bucket elimination (Dechter, 1998), recursive conditioning (Darwiche and Hopkins, 2001), junction-trees (Lauritzen and Spiegelhalter, 1988) and factor elimination (Darwiche, 2009). [sent-150, score-1.406]

9 In contrast, factor elimination uses an elimination tree T on the factors and eliminates factors starting at leaves of T ; an example elimination tree is shown in Figure 1b. [sent-154, score-2.818]

10 At iteration t, we pick a leaf factor f j in Tt and eliminate it from the elimination tree forming Tt+1 . [sent-156, score-1.099]

11 We use the notation λi to represent the partially marginalized functions; for standard factor elimination these operations are typically combined into a single update to fk′. [sent-161, score-0.977]

12 Figure 1 gives an example where we apply factor elimination to a leaf factor f1 in the elimination tree. [sent-163, score-1.703]

13 … and update f1's neighbor f2 in the elimination tree with f2′ = f2 ∑V1 f1. [sent-168, score-1.005]

14 Suppose we wish to compute a particular marginal gi. We root the elimination tree at a factor fj such that xi ∼G fj, then eliminate leaves of the elimination tree one at a time, until only one factor remains. [sent-171, score-2.186]

15 Factor elimination is equivalent to bucket (or variable) elimination (Kask et al. [sent-175, score-1.292]

16 In particular, the factor elimination algorithm marginalizes out a variable xi when there is no factor left in the factor graph that is adjacent to xi . [sent-177, score-1.12]

17 With a similar argument, one can also interpret any bucket elimination procedure as a factor elimination sequence. [sent-179, score-1.406]

18 Viewing Elimination Trees as Tree-decompositions: For tree-structured factor graphs, the typical choice for the elimination tree is based on the factor graph itself. [sent-183, score-1.18]

19 However, when the input factor graph is not tree-structured, we must choose an elimination ordering that ensures that the propagation of variables over the course of elimination is not too costly. [sent-184, score-1.493]

20 In this section, we outline how a particular elimination tree can be related to a tree decomposition on the input graph. [sent-185, score-1.243]

21 For a factor graph G, we can construct a tree decomposition (χ, ψ, D ) that corresponds to an elimination tree T = (F, E) on G. [sent-206, score-1.321]

22 First, we set ψi = { fi } and D = (χ, E ′ ) where (χi , χ j ) ∈ E ′ is an edge in the tree-decomposition if and only if ( fi , f j ) ∈ E is an edge in the elimination tree T . [sent-207, score-1.193]
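One standard way to fill in the rest of this correspondence is sketched below (the data layout and helper names are illustrative assumptions, not the paper's code): D copies the elimination-tree edges, ψi = {fi}, and each χi collects fi's own variables plus the separator of every incident edge, that is, the variables held by factors on both sides of that edge.

# Sketch of building a tree-decomposition (chi, psi, D) from an elimination tree T.
from collections import defaultdict

def tree_decomposition(factor_vars, etree_edges):
    # factor_vars: {factor_name: set of variable names};
    # etree_edges: list of (f_i, f_j) pairs forming a tree on the factors.
    adj = defaultdict(set)
    for a, b in etree_edges:
        adj[a].add(b); adj[b].add(a)

    def side_vars(start, blocked):
        # All variables held by factors reachable from `start` without crossing `blocked`.
        seen, stack = {blocked, start}, [start]
        out = set(factor_vars[start])
        while stack:
            for nxt in adj[stack.pop()] - seen:
                seen.add(nxt); stack.append(nxt); out |= factor_vars[nxt]
        return out

    chi = {f: set(vs) for f, vs in factor_vars.items()}   # hyper-node contents
    for a, b in etree_edges:
        sep = side_vars(a, b) & side_vars(b, a)            # separator of edge (a, b)
        chi[a] |= sep; chi[b] |= sep
    psi = {f: {f} for f in factor_vars}                    # psi_i = {f_i}
    tree_D = list(etree_edges)                             # D mirrors T's edges
    return chi, psi, tree_D

# Example: the chain f0 - f1 - f2 over variables x0, x1, x2.
chi, psi, D = tree_decomposition(
    {"f0": {"x0", "x1"}, "f1": {"x1", "x2"}, "f2": {"x2"}},
    [("f0", "f1"), ("f1", "f2")])
print(chi)   # the width of the decomposition is max(|chi_i|) - 1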

23 The factor graph (light edges) and its elimination tree (bold edges) on the left is equivalent to the tree-decomposition on the right. [sent-214, score-1.066]

24 The partial marginalization function λi computed during the elimination of fi is identical to the message µχi →χ j where f j is the parent of fi in the elimination tree. [sent-227, score-1.602]

25 For an elimination tree T , suppose that the corresponding tree decomposition is (χ, ψ, D ). [sent-230, score-1.156]

26 Inference performed using T incurs a constant-factor overhead that is exponential in its width; for example, computing marginals using an elimination tree T of width w takes O(d^(w+1) · n) time and space where n is the number of variables and d is the domain size. [sent-232, score-1.029]

27 Computing Marginals with Deferred Factor Elimination: When performing inference with factor elimination, one typically attempts to select an elimination tree to minimize its associated width. [sent-234, score-1.071]

28 For the chain factor graph in (a), the elimination tree in (b) has width 1 but requires O(n) steps to propagate information from leaves to the root. [sent-236, score-1.174]

29 The balanced elimination tree in (c), for the same factor graph, has width 2 but takes only O(log n) steps to propagate information from a leaf to the root, since f3 and f5 are eliminated earlier. [sent-237, score-1.295]

30 If f1 is modified, then using a balanced elimination tree, we only need to update O(log n) elimination steps, while an unbalanced tree requires potentially O(n) updates. [sent-238, score-1.678]

31 While this elimination tree is optimal for a single computation, suppose that we now modify the leaf factor f1 . [sent-242, score-1.058]

32 However, if we use the balanced elimination tree shown in Figure 3c, we can compute the marginalization for f7 in time that is logarithmic in the size of the model. [sent-244, score-1.081]

33 While the latter elimination tree increases the width by one (increasing the dependence on d), for fixed d and as n grows large we can achieve a significant speedup over the unbalanced ordering if we wish to make changes to the model. [sent-245, score-1.093]

34 Our primary technique, which we call deferred factor elimination, generalizes factor elimination so that it can be applied to non-leaf nodes in the input elimination tree. [sent-247, score-1.713]

35 We refer to the local information resulting from each deferred factor elimination as a cluster function (or, more succinctly, as a cluster), and store this information along with the balanced elimination tree. [sent-249, score-1.807]

36 For our algorithm, we assume that the user provides both an input factor graph G and an associated elimination tree T . [sent-252, score-1.102]

37 While the elimination tree is traditionally computed from an input model, in an adaptive setting it may be desirable to change the elimination tree to take advantage of changes made to the factors (see Figure 9 for an example). [sent-253, score-2.039]

38 Deferred Factor Elimination and Cluster Functions: Consider the elimination of a degree-two factor fj, with neighbors fi and fk in the given elimination tree. [sent-261, score-1.681]

39 In this section, we show how deferred factor elimination can be performed on the elimination tree, and how the intermediate cluster information can be saved and also used to efficiently compute marginals. [sent-264, score-1.753]

40 In a particular round t (1 ≤ t ≤ n), we begin with a factor graph Gt and an elimination tree Tt , and after performing some set of deferred factor eliminations, we obtain a resulting factor graph Gt+1 and elimination tree Tt+1 for the next round. [sent-266, score-2.466]

41 When a degree-two factor f j is removed, we attach λ j to a newly created edge ( fi , fk ) where fi and fk are f j ’s neighbors in elimination tree T . [sent-271, score-1.576]

42 To eliminate a leaf node f1 , we sum out variables that are not attached to any other factors (shaded), resulting in the cluster function λ1 and new elimination tree T2 in (b). [sent-274, score-1.366]

43 The cluster λj is referred to as a root cluster if degTt(fj) = 0, a degree-one cluster if degTt(fj) = 1, and a degree-two cluster if degTt(fj) = 2. [sent-279, score-1.038]

44 Figure 5 illustrates the creation of degree-one and degree-two clusters, and the associated changes to the elimination tree and factor graph. [sent-280, score-1.08]
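A minimal structural sketch of a single deferred-elimination step follows, assuming a dictionary-based adjacency for the elimination tree (the names and bookkeeping are illustrative, and the numeric computation of λj is omitted):

# Remove a factor of degree <= 2 from the elimination tree and record where
# its cluster function lambda_j would attach (structure only, no tables).
def deferred_eliminate(adj, clusters_on_edge, f_j):
    # adj: {factor: set of neighbors} for the current elimination tree T_t
    # clusters_on_edge: {frozenset(edge): list of cluster labels}
    nbrs = sorted(adj.pop(f_j))
    for n in nbrs:
        adj[n].discard(f_j)
    if len(nbrs) == 0:
        return "root cluster"                    # nothing left to attach to
    if len(nbrs) == 1:
        return "degree-one cluster"              # lambda_j is absorbed at the single neighbor
    f_i, f_k = nbrs                              # degree-two: bridge the neighbors and
    adj[f_i].add(f_k); adj[f_k].add(f_i)         # attach lambda_j to the new edge (f_i, f_k)
    clusters_on_edge.setdefault(frozenset((f_i, f_k)), []).append("lambda_" + f_j)
    return "degree-two cluster"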

45 In Section 2.2, we established an equivalence between clusters and messages in the tree-decomposition in the case where only leaf factors in the elimination tree are eliminated. [sent-289, score-1.159]

46 As described in Section 2.2, the equivalent tree-decomposition (χ, ψ, D) of an elimination tree T = (F, E) consists of a tree D on hyper-nodes χ = {χ1, . . . }. [sent-292, score-1.156]

47 Let fk be f j ’s unique neighbor in the elimination tree when it is eliminated. [sent-300, score-1.068]

48 Using the relationships established above between cluster functions and messages in a tree decomposition, we give the running time of deferred factor elimination on a given elimination tree and input factor graph. [sent-311, score-2.489]

49 Lemma 1 For an elimination tree with width w, the elimination of leaf factors takes Θ(d^(2w)) time and produces a cluster of size Θ(d^w), where d is the domain size of the variables in the input factor graph. [sent-312, score-2.165]
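For example, with binary variables (d = 2) and width w = 3, a single leaf elimination touches on the order of d^(2w) = 64 table entries and leaves behind a cluster with d^w = 8 entries.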

50 This assumption is valid because for any given elimination tree, we can construct an equivalent elimination tree with degree 3 by adding dummy factors. [sent-330, score-1.547]

51 Constructing a Balanced Cluster Tree: In this section, we show how performing deferred factor elimination in rounds can be used to create a data structure we call a cluster tree. [sent-339, score-1.135]

52 As variables and factors are eliminated through deferred factor elimination, we build the cluster tree using the dependency relationships among clusters (see Figure 6). [sent-340, score-1.018]

53 The cluster tree can then be used to compute marginals efficiently, and as we will see, it can also be used to efficiently update the original factor graph or elimination tree. [sent-341, score-1.449]

54 For a factor graph G = (X, F) and an elimination tree T , a cluster tree H = (X ∪C, E) is a rooted tree on variables and clusters X ∪C where C is the set of clusters. [sent-342, score-1.902]

55 To obtain the cluster tree in (b), eliminations are performed in the factor graph G (a) in the following order: f1 , f3 , f5 and f6 in round 1, f4 in round 2 and f2 in round 3. [sent-358, score-1.051]

56 Theorem 2 Let G = (X, F) be a factor graph with n nodes and T be an elimination tree on G with width w. [sent-366, score-1.204]

57 For our purposes it is desirable to perform deferred factor elimination so that we obtain a cluster tree with logarithmic depth. [sent-370, score-1.402]

58 We start with T1 = T and at each round i we identify a set K of degree-one or degree-two factors in Ti and apply deferred factor elimination to this independent set of factors to construct Ti+1. [sent-372, score-1.174]

59 The sequence of elimination trees created during the hierarchical clustering process will prove to be useful in Section 4, when we show how to perform structural updates to the elimination tree. [sent-376, score-1.361]

60 As an example, a factor graph G, along with its associated elimination tree T = T1 , is given in Figure 7a. [sent-377, score-1.066]

61 Lemma 3 For any factor graph G = (X, F) with n nodes and any elimination tree T , the cluster tree obtained by hierarchical clustering has depth O(log n). [sent-388, score-1.663]
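A structural sketch of this round-based contraction is given below, under assumed data layouts (the greedy independent-set choice is only one possible rule and the cluster bookkeeping is omitted): each round removes a maximal independent set of factors of degree at most two, so on a chain the number of rounds, and hence the depth, grows roughly as log n.

# Hierarchical clustering sketch: eliminate independent sets of low-degree
# factors round by round (structure only; illustrative, not the paper's code).
def hierarchical_clustering(etree_edges, factor_names):
    adj = {f: set() for f in factor_names}
    for a, b in etree_edges:
        adj[a].add(b); adj[b].add(a)
    alive, rounds = set(factor_names), []
    while len(alive) > 1:
        chosen, blocked = [], set()
        for f in sorted(alive):                       # greedy maximal independent set
            if f not in blocked and len(adj[f]) <= 2 and len(alive) - len(chosen) > 1:
                chosen.append(f)
                blocked.add(f); blocked |= adj[f]
        for f in chosen:                              # eliminate the whole set this round
            nbrs = sorted(adj.pop(f))
            alive.discard(f)
            for n in nbrs:
                adj[n].discard(f)
            if len(nbrs) == 2:                        # degree-two: bridge the gap left by f
                a, b = nbrs
                adj[a].add(b); adj[b].add(a)
        rounds.append(chosen)
    return rounds

# A chain of 64 factors contracts in ~log2(64) rounds rather than 64 steps.
names = ["f%02d" % i for i in range(64)]
chain_edges = list(zip(names, names[1:]))
print(len(hierarchical_clustering(chain_edges, names)))   # prints 6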

62 Computing Marginals: Once a balanced cluster tree H has been constructed from the input factor graph and elimination tree, as in standard approaches we can compute the marginal distribution of any variable xi by propagating information. [sent-395, score-1.465]

63 Theorem 4 Consider a factor graph G with n nodes and let T be an elimination tree with width w. [sent-415, score-1.204]

64 However, by caching the downward marginalization functions in the cluster tree, we can compute all marginals in O(d^(2w) · n) time, which is optimal given the elimination ordering. [sent-428, score-1.072]
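For intuition about the caching, the sketch below shows the generic two-pass pattern on an ordinary tree-structured pairwise model (a standard sum-product illustration under assumed names and data layout, not the paper's cluster-tree code): upward messages are computed once and cached, and the downward pass reuses them, so all marginals together cost the same order as a single query.

# Two-pass sum-product on a tree-structured pairwise model (illustrative only).
import numpy as np

def all_marginals(tree_edges, node_pot, edge_pot, root=0):
    # tree_edges: list of (parent, child); node_pot[i]: (d,) array;
    # edge_pot[(i, j)]: (d, d) array indexed as [x_i, x_j].
    children = {i: [] for i in node_pot}
    for p, c in tree_edges:
        children[p].append(c)

    up = {}                                            # cached upward messages
    def upward(i):
        msg = node_pot[i].copy()
        for c in children[i]:
            msg *= upward(c)
        if i == root:
            return msg
        p = next(a for a, b in tree_edges if b == i)   # i's parent
        up[i] = edge_pot[(p, i)] @ msg                 # sum over x_i
        return up[i]

    belief = {root: upward(root)}
    down = {root: np.ones_like(node_pot[root])}        # cached downward messages
    def downward(i):
        for c in children[i]:
            other = node_pot[i] * down[i]              # everything at i except c's branch
            for s in children[i]:
                if s != c:
                    other *= up[s]
            down[c] = edge_pot[(i, c)].T @ other       # sum over x_i
            belief[c] = node_pot[c] * down[c]
            for g in children[c]:
                belief[c] *= up[g]
            downward(c)
    downward(root)
    return {i: b / b.sum() for i, b in belief.items()}

# Tiny chain 0 - 1 - 2 with binary states.
rng = np.random.default_rng(1)
edges = [(0, 1), (1, 2)]
print(all_marginals(edges, {i: rng.random(2) for i in range(3)},
                    {e: rng.random((2, 2)) for e in edges}))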

65 Updates: The preceding sections described the process of constructing a balanced cluster tree elimination ordering from a given elimination tree, and how to use the resulting cluster tree to compute marginal distributions. [sent-433, score-2.339]

66 In this section, we describe how to efficiently update the cluster tree data structure after changes are made to the input factor graph or elimination tree. [sent-435, score-1.525]

67 We first describe how to make changes to the factors, whether changing the parameters of the factor or its arguments (and thus the structure of the factor graph), but leaving the original elimination tree (and thus the cluster tree) fixed. [sent-437, score-1.475]

68 We then describe how to make changes to the elimination tree and efficiently update the cluster tree. [sent-438, score-1.296]

69 In practice these two operations may be combined; for example, when modifying a tree-structured graph such that it remains a tree, we are likely to change the elimination tree to reflect the new structure. [sent-439, score-1.232]

70 If the factor graph in (a) is modified by removing the edge (y, f1 ), we can reduce the width of the elimination tree (from 3 to 2) by replacing the edge ( f1 , f2 ) by ( f1 , f5 ). [sent-441, score-1.239]

71 Updating Factors with a Fixed Elimination Tree: For a fixed elimination tree, suppose that we change the parameters of a factor fj (but not its arguments), and consider the new cluster tree created for the resulting graph. [sent-448, score-1.268]
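For a parameter-only change, the clusters that need recomputation are exactly those on the path from the changed factor's cluster to the root of the cluster tree, which is O(log n) clusters when the tree is balanced. A minimal sketch, with hypothetical parent and recompute arguments:

# Sketch: re-evaluate clusters along the path from the changed factor's
# cluster to the root (`parent` and `recompute` are hypothetical hooks).
def update_after_parameter_change(parent, recompute, changed_cluster):
    # parent: {cluster: parent cluster, or None for the root}
    # recompute(c): rebuild cluster c from its factor and its (still valid) children
    c = changed_cluster
    while c is not None:
        recompute(c)          # children of c are untouched, so bottom-up order is safe
        c = parent[c]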

72 To update the cluster tree as a result of this change, we must update all clusters affected by the change to f j , and we must also identify and update the clusters affected by earlier, or later, removal of xi from the factor graph. [sent-459, score-1.326]

73 Observe that the original cluster λk at which xi is eliminated is the topmost cluster in the cluster tree with the property that either fk , the associated factor, depends on xi , or λk has two children clusters that both depend on xi . [sent-462, score-1.445]

74 Figure 10 illustrates how the cluster tree is updated after deleting an edge in a factor graph while keeping the elimination tree fixed. [sent-474, score-1.647]

75 Suppose we modify the input factor graph by removing (y, f1 ) from the factor graph and replacing ( f1 , f2 ) by ( f1 , f5 ) in the elimination tree as shown in (a). [sent-484, score-1.267]

76 As stated earlier, any elimination point λk for xi satisfies the condition that it is the topmost cluster in the cluster tree with the property that either fk depends on xi , or λk has two children clusters that both depend on xi . [sent-491, score-1.73]

77 Theorem 5 Let G = (X, F) be a factor graph with n nodes and H be the cluster tree obtained using an elimination tree T with width w. [sent-500, score-1.712]

78 However, changing the input elimination order also requires modifying the cluster tree constructed from it. [sent-506, score-1.215]

79 In this section we prove that it is possible to efficiently reorganize the cluster tree after a change to the elimination tree. [sent-508, score-1.154]

80 … marking some nodes as affected if we need to revisit the deferred elimination decision we made in constructing the cluster tree, and leaving the rest unchanged. [sent-513, score-1.231]

81 Since the elimination tree can be arbitrarily modified by performing edge deletions and insertions successively, for ease of exposition we first focus on how the cluster tree can be efficiently updated when a single edge in the elimination tree is inserted or deleted. [sent-515, score-2.408]

82 If Ci(fk) and Ci′(fk) are the sets of clusters around fk in Ti and Ti′, respectively, then fk is affected at round i if the sets Ci(fk) and Ci′(fk) are different. [sent-526, score-1.19]

83 For a factor f j , if σi ( f j ) = 1, then f j is either eliminated or a candidate for elimination at round i in one or both of the previous and new hierarchical clusterings. [sent-539, score-0.994]

84 We first insert or remove the edge (fi, fj) in the original elimination tree and obtain T1′ = (V1′, E1′), where E1′ = E1 ∪ (fi, fj) if the edge is inserted or E1′ = E1 \ (fi, fj) if it is deleted. [sent-545, score-1.29]

85 • We obtain the new elimination tree Ti′ = (Fi′, Ei′) by eliminating the factors in Mi−1 from Ti−1 via the deferred factor elimination subroutine. [sent-558, score-1.88]

86 Then, T′ is a valid hierarchical clustering, that is: (i) the set Mi = Fi′ \ Fi+1′ is a maximal independent set containing vertices of degree at most two, and (ii) Ti+1′ is obtained from Ti′ by applying deferred factor elimination to the factors in Mi. [sent-619, score-1.0]

87 The update algorithm applies the deferred factor elimination subroutine on the set Mi′, so what remains to be shown is that the saved values for Ri are the same as if we had eliminated them explicitly. [sent-624, score-0.972]

88 By Lemma 6, the factors in Ri have the same set of clusters around them in Ti and Ti′, which means that the deferred factor elimination procedure will produce the same result in both elimination trees when unaffected factors are eliminated. [sent-625, score-1.861]

89 For i = 1, . . . , l, let Ai be the set of affected nodes computed by our algorithm after inserting or deleting edge (fi, fj) in the elimination tree. [sent-634, score-1.03]

90 Since an unaffected node becomes affected only if it is adjacent to an affected factor, the set of affected nodes forms a connected sub-tree throughout the elimination procedure. [sent-637, score-1.318]

91 Combining the above arguments, we now conclude that a cluster tree can be efficiently updated if the elimination tree is modified. [sent-656, score-1.409]

92 Theorem 9 Let G = (X, F) be a factor graph with n nodes and H be the cluster tree obtained using an elimination tree T . [sent-657, score-1.637]

93 Theorem 10 Let G = (X, F) be a factor graph with n nodes and H be the cluster tree obtained using an elimination tree T . [sent-664, score-1.637]

94 Theorem 11 Let G be a factor graph with n nodes and T be an elimination tree on G with treewidth w. [sent-689, score-1.129]

95 In our implementation, all algorithms take the elimination tree as input; when it is not possible to compute the optimal elimination tree for a given input, we use a simple greedy method to construct it (the algorithm grows the tree incrementally while minimizing width). [sent-745, score-2.057]
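The greedy construction itself is not spelled out here, so the sketch below substitutes a standard min-degree greedy elimination ordering over the variable interaction graph as a stand-in heuristic; it is an assumption for illustration, not necessarily the authors' method.

# Standard min-degree greedy ordering (a stand-in for the unspecified heuristic).
def min_degree_order(var_neighbors):
    # var_neighbors: {variable: set of variables sharing some factor with it}
    adj = {v: set(ns) for v, ns in var_neighbors.items()}
    order = []
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))   # pick the smallest current degree
        nbrs = adj.pop(v)
        order.append(v)
        for a in nbrs:                            # connect v's remaining neighbors
            adj[a] |= (nbrs - {a})
            adj[a].discard(v)
    return order

# Example: a 4-cycle x0-x1-x2-x3; any ordering here induces width 2.
print(min_degree_order({"x0": {"x1", "x3"}, "x1": {"x0", "x2"},
                        "x2": {"x1", "x3"}, "x3": {"x0", "x2"}}))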

96 We measure the running times for naive sum-product, building the cluster tree, computing marginal queries, updating factors, and restructuring (adding and deleting edges to the elimination tree) for tree-structured and loopy factor graphs. [sent-808, score-1.189]

97 Conclusion: In this paper, we have presented an adaptive framework for performing exact inference that efficiently handles changes to the input factor graph and its associated elimination tree. [sent-1029, score-1.007]

98 We can then make arbitrary changes to the factor graph or elimination tree, and update the cluster tree in time that is logarithmic in the size of the input factor graph. [sent-1031, score-1.651]

99 While deferred factor elimination gives rise to a balanced elimination tree, it also incurs a larger constant factor dependent on the tree width. [sent-1049, score-1.923]

100 It would be interesting to incorporate additional information into the deferred elimination procedure used to build the cluster tree to reduce this constant factor. [sent-1051, score-1.272]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('elimination', 0.646), ('tree', 0.255), ('cluster', 0.253), ('acar', 0.189), ('affected', 0.151), ('fk', 0.14), ('round', 0.126), ('factor', 0.114), ('ettu', 0.111), ('hler', 0.111), ('umer', 0.111), ('eliminated', 0.108), ('nference', 0.106), ('xact', 0.106), ('protein', 0.102), ('fi', 0.097), ('factors', 0.097), ('guration', 0.097), ('deferred', 0.094), ('unaffected', 0.094), ('map', 0.091), ('raphical', 0.09), ('marginalization', 0.086), ('speedups', 0.085), ('daptive', 0.081), ('ti', 0.08), ('update', 0.077), ('secondary', 0.076), ('width', 0.075), ('clusters', 0.073), ('changes', 0.065), ('loopy', 0.064), ('nodes', 0.063), ('tt', 0.059), ('inference', 0.056), ('graphs', 0.054), ('balanced', 0.054), ('marginals', 0.053), ('gurations', 0.052), ('speedup', 0.052), ('graph', 0.051), ('mi', 0.049), ('edge', 0.049), ('changed', 0.048), ('odels', 0.047), ('con', 0.047), ('amino', 0.047), ('messages', 0.045), ('queries', 0.044), ('updates', 0.043), ('leaf', 0.043), ('eliminate', 0.041), ('logarithmic', 0.04), ('fa', 0.04), ('adaptive', 0.039), ('fb', 0.039), ('neighbors', 0.038), ('input', 0.036), ('fj', 0.036), ('ci', 0.035), ('children', 0.035), ('downward', 0.034), ('conformation', 0.033), ('degti', 0.033), ('ihler', 0.033), ('sidechain', 0.033), ('leaves', 0.033), ('gt', 0.032), ('running', 0.031), ('adjacent', 0.031), ('marginal', 0.031), ('query', 0.031), ('node', 0.031), ('message', 0.03), ('eliminates', 0.029), ('eliminating', 0.028), ('endfor', 0.028), ('mutations', 0.028), ('mettu', 0.028), ('upwards', 0.028), ('structure', 0.028), ('acid', 0.027), ('contraction', 0.027), ('neighbor', 0.027), ('root', 0.026), ('clustering', 0.026), ('ri', 0.026), ('updating', 0.026), ('ai', 0.026), ('darwiche', 0.025), ('recompute', 0.025), ('maximal', 0.025), ('modifying', 0.025), ('xi', 0.025), ('build', 0.024), ('marked', 0.024), ('vertices', 0.024), ('deleting', 0.024), ('marking', 0.024), ('ni', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999809 7 jmlr-2011-Adaptive Exact Inference in Graphical Models

Author: Özgür Sümer, Umut A. Acar, Alexander T. Ihler, Ramgopal R. Mettu

Abstract: Many algorithms and applications involve repeatedly solving variations of the same inference problem, for example to introduce new evidence to the model or to change conditional dependencies. As the model is updated, the goal of adaptive inference is to take advantage of previously computed quantities to perform inference more rapidly than from scratch. In this paper, we present algorithms for adaptive exact inference on general graphs that can be used to efficiently compute marginals and update MAP configurations under arbitrary changes to the input factor graph and its associated elimination tree. After a linear time preprocessing step, our approach enables updates to the model and the computation of any marginal in time that is logarithmic in the size of the input model. Moreover, in contrast to max-product our approach can also be used to update MAP configurations in time that is roughly proportional to the number of updated entries, rather than the size of the input model. To evaluate the practical effectiveness of our algorithms, we implement and test them using synthetic data as well as for two real-world computational biology applications. Our experiments show that adaptive inference can achieve substantial speedups over performing complete inference as the model undergoes small changes over time. Keywords: exact inference, factor graphs, factor elimination, marginalization, dynamic programming, MAP computation, model updates, parallel tree contraction

2 0.13359809 54 jmlr-2011-Learning Latent Tree Graphical Models

Author: Myung Jin Choi, Vincent Y. F. Tan, Animashree Anandkumar, Alan S. Willsky

Abstract: We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our algorithms can be applied to both discrete and Gaussian random variables and our learned models are such that all the observed and latent variables have the same domain (state space). Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a pre-processing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures such as neighbor-joining) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare the proposed algorithms to other methods by performing extensive numerical experiments on various latent tree graphical models such as hidden Markov models and star graphs. In addition, we demonstrate the applicability of our methods on real-world data sets by modeling the dependency structure of monthly stock returns in the S&P index and of the words in the 20 newsgroups data set. Keywords: graphical models, Markov random fields, hidden variables, latent tree models, structure learning

3 0.072354361 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing

Author: Julian J. McAuley, Tibério S. Caetano

Abstract: Maximum A Posteriori inference in graphical models is often solved via message-passing algorithms, such as the junction-tree algorithm or loopy belief-propagation. The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). This is a common case in a wide variety of applications that deal with grid-, tree-, and ring-structured models. In such cases, we are able to decrease the exponent of complexity for message-passing by 0.5 for both exact and approximate inference. We demonstrate that message-passing operations in such models are equivalent to some variant of matrix multiplication in the tropical semiring, for which we offer an O(N^2.5) expected-case solution. Keywords: graphical models, belief-propagation, tropical matrix multiplication

4 0.066933356 104 jmlr-2011-X-Armed Bandits

Author: Sébastien Bubeck, Rémi Munos, Gilles Stoltz, Csaba Szepesvári

Abstract: We consider a generalization of stochastic bandits where the set of arms, X , is allowed to be a generic measurable space and the mean-payoff function is “locally Lipschitz” with respect to a dissimilarity function that is known to the decision maker. Under this condition we construct an arm selection policy, called HOO (hierarchical optimistic optimization), with improved regret bounds compared to previous results for a large class of problems. In particular, our results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally continuous with a known smoothness degree, then the expected regret of HOO is bounded up to a logarithmic factor by √n, that is, the rate of growth of the regret is independent of the dimension of the space. We also prove the minimax optimality of our algorithm when the dissimilarity is a metric. Our basic strategy has quadratic computational complexity as a function of the number of time steps and does not rely on the doubling trick. We also introduce a modified strategy, which relies on the doubling trick but runs in linearithmic time. Both results are improvements with respect to previous approaches. Keywords: bandits with infinitely many arms, optimistic online optimization, regret bounds, minimax rates

5 0.05902731 103 jmlr-2011-Weisfeiler-Lehman Graph Kernels

Author: Nino Shervashidze, Pascal Schweitzer, Erik Jan van Leeuwen, Kurt Mehlhorn, Karsten M. Borgwardt

Abstract: In this article, we propose a family of efficient kernels for large graphs with discrete node labels. Key to our method is a rapid feature extraction scheme based on the Weisfeiler-Lehman test of isomorphism on graphs. It maps the original graph to a sequence of graphs, whose node attributes capture topological and label information. A family of kernels can be defined based on this Weisfeiler-Lehman sequence of graphs, including a highly efficient kernel comparing subtree-like patterns. Its runtime scales only linearly in the number of edges of the graphs and the length of the Weisfeiler-Lehman graph sequence. In our experimental evaluation, our kernels outperform state-of-the-art graph kernels on several graph classification benchmark data sets in terms of accuracy and runtime. Our kernels open the door to large-scale applications of graph kernels in various disciplines such as computational biology and social network analysis. Keywords: graph kernels, graph classification, similarity measures for graphs, Weisfeiler-Lehman algorithm

6 0.057155233 82 jmlr-2011-Robust Gaussian Process Regression with a Student-tLikelihood

7 0.054441825 79 jmlr-2011-Proximal Methods for Hierarchical Sparse Coding

8 0.053355969 35 jmlr-2011-Forest Density Estimation

9 0.047019362 9 jmlr-2011-An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models

10 0.04594294 30 jmlr-2011-Efficient Structure Learning of Bayesian Networks using Constraints

11 0.045672823 11 jmlr-2011-Approximate Marginals in Latent Gaussian Models

12 0.042612907 8 jmlr-2011-Adaptive Subgradient Methods for Online Learning and Stochastic Optimization

13 0.037023276 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review

14 0.036827926 21 jmlr-2011-Cumulative Distribution Networks and the Derivative-sum-product Algorithm: Models and Inference for Cumulative Distribution Functions on Graphs

15 0.03574441 52 jmlr-2011-Large Margin Hierarchical Classification with Mutually Exclusive Class Membership

16 0.034143046 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling

17 0.033572745 53 jmlr-2011-Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates

18 0.032425467 16 jmlr-2011-Clustering Algorithms for Chains

19 0.032047853 12 jmlr-2011-Bayesian Co-Training

20 0.031013642 24 jmlr-2011-Dirichlet Process Mixtures of Generalized Linear Models


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.177), (1, -0.067), (2, -0.064), (3, 0.016), (4, 0.041), (5, 0.021), (6, 0.046), (7, 0.057), (8, 0.18), (9, 0.16), (10, -0.134), (11, 0.166), (12, 0.002), (13, -0.058), (14, 0.037), (15, -0.198), (16, -0.01), (17, 0.061), (18, -0.006), (19, -0.069), (20, -0.165), (21, 0.038), (22, 0.11), (23, 0.159), (24, 0.02), (25, 0.002), (26, -0.101), (27, 0.08), (28, 0.065), (29, -0.047), (30, 0.189), (31, -0.008), (32, 0.061), (33, -0.057), (34, -0.096), (35, -0.015), (36, 0.197), (37, 0.024), (38, 0.044), (39, 0.147), (40, 0.136), (41, -0.041), (42, 0.013), (43, -0.008), (44, 0.019), (45, -0.047), (46, 0.167), (47, 0.045), (48, 0.148), (49, 0.037)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97709531 7 jmlr-2011-Adaptive Exact Inference in Graphical Models

Author: Özgür Sümer, Umut A. Acar, Alexander T. Ihler, Ramgopal R. Mettu

Abstract: Many algorithms and applications involve repeatedly solving variations of the same inference problem, for example to introduce new evidence to the model or to change conditional dependencies. As the model is updated, the goal of adaptive inference is to take advantage of previously computed quantities to perform inference more rapidly than from scratch. In this paper, we present algorithms for adaptive exact inference on general graphs that can be used to efficiently compute marginals and update MAP configurations under arbitrary changes to the input factor graph and its associated elimination tree. After a linear time preprocessing step, our approach enables updates to the model and the computation of any marginal in time that is logarithmic in the size of the input model. Moreover, in contrast to max-product our approach can also be used to update MAP configurations in time that is roughly proportional to the number of updated entries, rather than the size of the input model. To evaluate the practical effectiveness of our algorithms, we implement and test them using synthetic data as well as for two real-world computational biology applications. Our experiments show that adaptive inference can achieve substantial speedups over performing complete inference as the model undergoes small changes over time. Keywords: exact inference, factor graphs, factor elimination, marginalization, dynamic programming, MAP computation, model updates, parallel tree contraction ¨ u u c 2011 Ozg¨ r S¨ mer, Umut A. Acar, Alexander T. Ihler and Ramgopal R. Mettu. ¨ S UMER , ACAR , I HLER AND M ETTU

2 0.73733968 54 jmlr-2011-Learning Latent Tree Graphical Models

Author: Myung Jin Choi, Vincent Y. F. Tan, Animashree Anandkumar, Alan S. Willsky

Abstract: We study the problem of learning a latent tree graphical model where samples are available only from a subset of variables. We propose two consistent and computationally efficient algorithms for learning minimal latent trees, that is, trees without any redundant hidden nodes. Unlike many existing methods, the observed nodes (or variables) are not constrained to be leaf nodes. Our algorithms can be applied to both discrete and Gaussian random variables and our learned models are such that all the observed and latent variables have the same domain (state space). Our first algorithm, recursive grouping, builds the latent tree recursively by identifying sibling groups using so-called information distances. One of the main contributions of this work is our second algorithm, which we refer to as CLGrouping. CLGrouping starts with a pre-processing procedure in which a tree over the observed variables is constructed. This global step groups the observed nodes that are likely to be close to each other in the true latent tree, thereby guiding subsequent recursive grouping (or equivalent procedures such as neighbor-joining) on much smaller subsets of variables. This results in more accurate and efficient learning of latent trees. We also present regularized versions of our algorithms that learn latent tree approximations of arbitrary distributions. We compare the proposed algorithms to other methods by performing extensive numerical experiments on various latent tree graphical models such as hidden Markov models and star graphs. In addition, we demonstrate the applicability of our methods on real-world data sets by modeling the dependency structure of monthly stock returns in the S&P index and of the words in the 20 newsgroups data set. Keywords: graphical models, Markov random fields, hidden variables, latent tree models, structure learning

3 0.43843946 104 jmlr-2011-X-Armed Bandits

Author: Sébastien Bubeck, Rémi Munos, Gilles Stoltz, Csaba Szepesvári

Abstract: We consider a generalization of stochastic bandits where the set of arms, X , is allowed to be a generic measurable space and the mean-payoff function is “locally Lipschitz” with respect to a dissimilarity function that is known to the decision maker. Under this condition we construct an arm selection policy, called HOO (hierarchical optimistic optimization), with improved regret bounds compared to previous results for a large class of problems. In particular, our results imply that if X is the unit hypercube in a Euclidean space and the mean-payoff function has a finite number of global maxima around which the behavior of the function is locally continuous with a known smoothness degree, then the expected regret of HOO is bounded up to a logarithmic factor by √n, that is, the rate of growth of the regret is independent of the dimension of the space. We also prove the minimax optimality of our algorithm when the dissimilarity is a metric. Our basic strategy has quadratic computational complexity as a function of the number of time steps and does not rely on the doubling trick. We also introduce a modified strategy, which relies on the doubling trick but runs in linearithmic time. Both results are improvements with respect to previous approaches. Keywords: bandits with infinitely many arms, optimistic online optimization, regret bounds, minimax rates

4 0.43710122 34 jmlr-2011-Faster Algorithms for Max-Product Message-Passing

Author: Julian J. McAuley, TibĂŠrio S. Caetano

Abstract: Maximum A Posteriori inference in graphical models is often solved via message-passing algorithms, such as the junction-tree algorithm or loopy belief-propagation. The exact solution to this problem is well-known to be exponential in the size of the maximal cliques of the triangulated model, while approximate inference is typically exponential in the size of the model’s factors. In this paper, we take advantage of the fact that many models have maximal cliques that are larger than their constituent factors, and also of the fact that many factors consist only of latent variables (i.e., they do not depend on an observation). This is a common case in a wide variety of applications that deal with grid-, tree-, and ring-structured models. In such cases, we are able to decrease the exponent of complexity for message-passing by 0.5 for both exact and approximate inference. We demonstrate that message-passing operations in such models are equivalent to some variant of matrix multiplication in the tropical semiring, for which we offer an O(N^2.5) expected-case solution. Keywords: graphical models, belief-propagation, tropical matrix multiplication

5 0.36305323 9 jmlr-2011-An Asymptotic Behaviour of the Marginal Likelihood for General Markov Models

Author: Piotr Zwiernik

Abstract: The standard Bayesian Information Criterion (BIC) is derived under regularity conditions which are not always satisfied in the case of graphical models with hidden variables. In this paper we derive the BIC for the binary graphical tree models where all the inner nodes of a tree represent binary hidden variables. This provides an extension of a similar formula given by Rusakov and Geiger for naive Bayes models. The main tool used in this paper is the connection between the growth behavior of marginal likelihood integrals and the real log-canonical threshold. Keywords: BIC, marginal likelihood, singular models, tree models, Bayesian networks, real logcanonical threshold

6 0.35362086 103 jmlr-2011-Weisfeiler-Lehman Graph Kernels

7 0.27353743 16 jmlr-2011-Clustering Algorithms for Chains

8 0.25040212 52 jmlr-2011-Large Margin Hierarchical Classification with Mutually Exclusive Class Membership

9 0.2480123 21 jmlr-2011-Cumulative Distribution Networks and the Derivative-sum-product Algorithm: Models and Inference for Cumulative Distribution Functions on Graphs

10 0.24196264 76 jmlr-2011-Parameter Screening and Optimisation for ILP using Designed Experiments

11 0.22886522 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling

12 0.20531546 96 jmlr-2011-Two Distributed-State Models For Generating High-Dimensional Time Series

13 0.20306115 90 jmlr-2011-The Indian Buffet Process: An Introduction and Review

14 0.20124876 79 jmlr-2011-Proximal Methods for Hierarchical Sparse Coding

15 0.19889264 12 jmlr-2011-Bayesian Co-Training

16 0.19519836 11 jmlr-2011-Approximate Marginals in Latent Gaussian Models

17 0.19273028 82 jmlr-2011-Robust Gaussian Process Regression with a Student-tLikelihood

18 0.18873848 84 jmlr-2011-Semi-Supervised Learning with Measure Propagation

19 0.17986417 17 jmlr-2011-Computationally Efficient Convolved Multiple Output Gaussian Processes

20 0.17694445 33 jmlr-2011-Exploiting Best-Match Equations for Efficient Reinforcement Learning


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(4, 0.08), (9, 0.031), (10, 0.069), (24, 0.046), (31, 0.074), (32, 0.019), (40, 0.374), (41, 0.023), (60, 0.016), (70, 0.022), (71, 0.021), (73, 0.042), (78, 0.061), (90, 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.74811882 7 jmlr-2011-Adaptive Exact Inference in Graphical Models

Author: Özgür Sümer, Umut A. Acar, Alexander T. Ihler, Ramgopal R. Mettu

Abstract: Many algorithms and applications involve repeatedly solving variations of the same inference problem, for example to introduce new evidence to the model or to change conditional dependencies. As the model is updated, the goal of adaptive inference is to take advantage of previously computed quantities to perform inference more rapidly than from scratch. In this paper, we present algorithms for adaptive exact inference on general graphs that can be used to efficiently compute marginals and update MAP configurations under arbitrary changes to the input factor graph and its associated elimination tree. After a linear time preprocessing step, our approach enables updates to the model and the computation of any marginal in time that is logarithmic in the size of the input model. Moreover, in contrast to max-product our approach can also be used to update MAP configurations in time that is roughly proportional to the number of updated entries, rather than the size of the input model. To evaluate the practical effectiveness of our algorithms, we implement and test them using synthetic data as well as for two real-world computational biology applications. Our experiments show that adaptive inference can achieve substantial speedups over performing complete inference as the model undergoes small changes over time. Keywords: exact inference, factor graphs, factor elimination, marginalization, dynamic programming, MAP computation, model updates, parallel tree contraction ¨ u u c 2011 Ozg¨ r S¨ mer, Umut A. Acar, Alexander T. Ihler and Ramgopal R. Mettu. ¨ S UMER , ACAR , I HLER AND M ETTU

2 0.45473269 17 jmlr-2011-Computationally Efficient Convolved Multiple Output Gaussian Processes

Author: Mauricio A. Álvarez, Neil D. Lawrence

Abstract: Recently there has been an increasing interest in regression methods that deal with multiple outputs. This has been motivated partly by frameworks like multitask learning, multisensor networks or structured output data. From a Gaussian processes perspective, the problem reduces to specifying an appropriate covariance function that, whilst being positive semi-definite, captures the dependencies between all the data points and across all the outputs. One approach to account for non-trivial correlations between outputs employs convolution processes. Under a latent function interpretation of the convolution transform we establish dependencies between output variables. The main drawbacks of this approach are the associated computational and storage demands. In this paper we address these issues. We present different efficient approximations for dependent output Gaussian processes constructed through the convolution formalism. We exploit the conditional independencies present naturally in the model. This leads to a form of the covariance similar in spirit to the so called PITC and FITC approximations for a single output. We show experimental results with synthetic and real data, in particular, we show results in school exams score prediction, pollution prediction and gene expression data. Keywords: Gaussian processes, convolution processes, efficient approximations, multitask learning, structured outputs, multivariate processes

3 0.36327559 59 jmlr-2011-Learning with Structured Sparsity

Author: Junzhou Huang, Tong Zhang, Dimitris Metaxas

Abstract: This paper investigates a learning formulation called structured sparsity, which is a natural extension of the standard sparsity concept in statistical learning and compressive sensing. By allowing arbitrary structures on the feature set, this concept generalizes the group sparsity idea that has become popular in recent years. A general theory is developed for learning with structured sparsity, based on the notion of coding complexity associated with the structure. It is shown that if the coding complexity of the target signal is small, then one can achieve improved performance by using coding complexity regularization methods, which generalize the standard sparse regularization. Moreover, a structured greedy algorithm is proposed to efficiently solve the structured sparsity problem. It is shown that the greedy algorithm approximately solves the coding complexity optimization problem under appropriate conditions. Experiments are included to demonstrate the advantage of structured sparsity over standard sparsity on some real applications. Keywords: structured sparsity, standard sparsity, group sparsity, tree sparsity, graph sparsity, sparse learning, feature selection, compressive sensing

4 0.35413161 53 jmlr-2011-Learning High-Dimensional Markov Forest Distributions: Analysis of Error Rates

Author: Vincent Y.F. Tan, Animashree Anandkumar, Alan S. Willsky

Abstract: The problem of learning forest-structured discrete graphical models from i.i.d. samples is considered. An algorithm based on pruning of the Chow-Liu tree through adaptive thresholding is proposed. It is shown that this algorithm is both structurally consistent and risk consistent and the error probability of structure learning decays faster than any polynomial in the number of samples under fixed model size. For the high-dimensional scenario where the size of the model d and the number of edges k scale with the number of samples n, sufficient conditions on (n, d, k) are given for the algorithm to satisfy structural and risk consistencies. In addition, the extremal structures for learning are identified; we prove that the independent (resp., tree) model is the hardest (resp., easiest) to learn using the proposed algorithm in terms of error rates for structure learning. Keywords: graphical models, forest distributions, structural consistency, risk consistency, method of types

5 0.35159627 91 jmlr-2011-The Sample Complexity of Dictionary Learning

Author: Daniel Vainsencher, Shie Mannor, Alfred M. Bruckstein

Abstract: A large set of signals can sometimes be described sparsely using a dictionary, that is, every element can be represented as a linear combination of few elements from the dictionary. Algorithms for various signal processing applications, including classification, denoising and signal separation, learn a dictionary from a given set of signals to be represented. Can we expect that the error in representing by such a dictionary a previously unseen signal from the same source will be of similar magnitude as those for the given examples? We assume signals are generated from a fixed distribution, and study these questions from a statistical learning theory perspective. We develop generalization bounds on the quality of the learned dictionary for two types of constraints on the coefficient selection, as measured by the expected L2 error in representation when the dictionary is used. For the case of l1 regularized coefficient selection we provide a generalization bound of the order of O(√(np ln(mλ)/m)), where n is the dimension, p is the number of elements in the dictionary, λ is a bound on the l1 norm of the coefficient vector and m is the number of samples, which complements existing results. For the case of representing a new signal as a combination of at most k dictionary elements, we provide a bound of the order O(√(np ln(mk)/m)) under an assumption on the closeness to orthogonality of the dictionary (low Babel function). We further show that this assumption holds for most dictionaries in high dimensions in a strong probabilistic sense. Our results also include bounds that converge as 1/m, not previously known for this problem. We provide similar results in a general setting using kernels with weak smoothness requirements. Keywords: dictionary learning, generalization bound, sparse representation

6 0.35106894 86 jmlr-2011-Sparse Linear Identifiable Multivariate Modeling

7 0.34889039 4 jmlr-2011-A Family of Simple Non-Parametric Kernel Learning Algorithms

8 0.34806135 21 jmlr-2011-Cumulative Distribution Networks and the Derivative-sum-product Algorithm: Models and Inference for Cumulative Distribution Functions on Graphs

9 0.34649369 35 jmlr-2011-Forest Density Estimation

10 0.34586459 77 jmlr-2011-Posterior Sparsity in Unsupervised Dependency Parsing

11 0.34456429 16 jmlr-2011-Clustering Algorithms for Chains

12 0.34394032 67 jmlr-2011-Multitask Sparsity via Maximum Entropy Discrimination

13 0.34390756 12 jmlr-2011-Bayesian Co-Training

14 0.34190184 104 jmlr-2011-X-Armed Bandits

15 0.34147006 64 jmlr-2011-Minimum Description Length Penalization for Group and Multi-Task Sparse Learning

16 0.33877605 29 jmlr-2011-Efficient Learning with Partially Observed Attributes

17 0.33745962 74 jmlr-2011-Operator Norm Convergence of Spectral Clustering on Level Sets

18 0.33682507 30 jmlr-2011-Efficient Structure Learning of Bayesian Networks using Constraints

19 0.33673871 48 jmlr-2011-Kernel Analysis of Deep Networks

20 0.33462617 54 jmlr-2011-Learning Latent Tree Graphical Models