jmlr jmlr2010 jmlr2010-13 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Vicenç Gómez, Hilbert J. Kappen, Michael Chertkov
Abstract: We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006a) allows one to express the exact partition function of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in Chertkov et al. (2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyze the performance of the algorithm for models with binary variables and pairwise interactions on grids and other planar graphs. We study in detail both the loop series and the equivalent Pfaffian series and show that the first term of the Pfaffian series for the general, intractable planar model can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state-of-the-art methods for approximate inference. Keywords: belief propagation, loop calculus, approximate inference, partition function, planar graphs, Ising model
Reference: text
sentIndex sentText sentNum sentScore
1 We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. [sent-9, score-0.652]
2 The loop calculus (Chertkov and Chernyak, 2006a) allows one to express the exact partition function of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. [sent-10, score-0.518]
3 We develop an algorithm for the approach presented in Chertkov et al. (2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. [sent-13, score-0.433]
4 We analyze the performance of the algorithm for models with binary variables and pairwise interactions on grids and other planar graphs. [sent-14, score-0.495]
5 We study in detail both the loop series and the equivalent Pfaffian series and show that the first term of the Pfaffian series for the general, intractable planar model can provide very accurate approximations. [sent-15, score-0.651]
6 The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state-of-the-art methods for approximate inference. [sent-16, score-0.277]
7 Keywords: belief propagation, loop calculus, approximate inference, partition function, planar graphs, Ising model. 1. Introduction [sent-17, score-0.666]
8 The exact partition function is explicitly related to the BP approximation through the loop calculus framework introduced by Chertkov and Chernyak (2006a). [sent-29, score-0.395]
9 Summation of the entire loop series is a hard combinatorial task since the number of generalized loops is typically exponential in the size of the graph (see also Watanabe and Fukumizu, 2009). [sent-32, score-0.505]
10 Nevertheless, a formal characterization of the classes of problems which are tractable via loop calculus remains an open question. [sent-38, score-0.303]
11 A step toward this goal, with a focus on the class of planar graphs, was taken in Chertkov et al. [sent-39, score-0.374]
12 A graph is said to be planar if it can be embedded in the plane without crossing edges. [sent-41, score-0.382]
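As a concrete aside, planarity can be tested in linear time with standard tools; the following is a minimal sketch using the networkx library (the library choice and example graphs are illustrative, not from the paper):

    import networkx as nx

    # Grid graphs are planar; K5 is the smallest non-planar graph.
    grid = nx.grid_2d_graph(3, 3)
    k5 = nx.complete_graph(5)

    is_planar, embedding = nx.check_planarity(grid)
    print(is_planar)                   # True; 'embedding' is a planar embedding
    print(nx.check_planarity(k5)[0])   # False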
13 Furthermore, the full loop series can be expressed as a sum over certain Pfaffians (or determinants), where each Pfaffian may account for a large number of loops and is solvable in polynomial time as well. [sent-45, score-0.436]
14 The approach of Chertkov et al. (2008) builds upon classical results from Fisher (1961), Kasteleyn (1961) and Temperley and Fisher (1961), who addressed the question of counting the number of perfect matchings on a planar grid, also known as the dimer covering problem in statistical physics. [sent-47, score-0.499]
15 Such a model is known in statistical physics as the planar Ising model without external field and in the machine learning community as a planar, binary MRF with pure interaction potentials. [sent-49, score-0.379]
16 Notice that exact inference for a general binary graphical model on a planar graph, namely the Ising model with external field, is intractable; in particular, it is #P-hard (Barahona, 1982). [sent-50, score-0.437]
17 A gentle overview of counting perfect matchings in planar graphs can be found in Jerrum (2003). [sent-51, score-0.519]
18 Globerson and Jaakkola (2007) obtained upper bounds on the partition function for non-planar graphs with binary variables by decomposition of the partition function into a weighted sum over partition functions of spanning tractable (zero field) planar models. [sent-53, score-0.653]
19 The resulting problem is a convex optimization problem and, since exact inference can be done in each planar sub-model via the FTK method, the bound can be calculated in polynomial time. [sent-54, score-0.365]
20 Another example is the work of Schraudolph and Kamenetsky (2008), which provides a framework for exact inference on a restricted class of planar graphs by directly using the FTK approach. [sent-55, score-0.426]
21 Since a planar Ising model can be transformed to a model without external field at the cost of introducing additional edges, one can allow local fields for a subset B of variables. [sent-56, score-0.39]
22 If the graphical model is B-outerplanar, which means that there exists a planar embedding in which the subset B of the nodes lies on the same face, the FTK technique can still be applied. [sent-57, score-0.39]
23 Contrary to these two approaches, which rely on exact inference on a tractable planar model, the loop calculus directly leads to a framework for approximate inference on general planar graphs. [sent-58, score-1.01]
24 Truncating the loop series according to Chertkov et al. (2008) recovers the exact partition function in the tractable (zero-field) planar case. [sent-59, score-0.258]
25 In the general planar case, however, this truncation can still yield an accurate approximation that can be incrementally corrected by considering subsequent terms in the series. [sent-61, score-0.332]
26 In the next Section we review the main theoretical results of the loop calculus approach for planar graphs and introduce the proposed algorithm for approximating the partition function and single-variable marginals. [sent-62, score-0.746]
27 In Section 3 we provide experimental results for regular grids and other types of planar graphs. [sent-63, score-0.389]
28 A binary Forney graph G := (V, E) consists of a set of nodes V, where each node a ∈ V represents an interaction and each edge (a, b) ∈ E represents a binary variable σab taking values σab ∈ {±1}. [sent-71, score-0.436]
29 If the graph contains loops, Z BP is just an approximation critically dependent on how strong the influence of the loops is. [sent-88, score-0.266]
30 We introduce now some convenient definitions related to the loop calculus framework. [sent-89, score-0.285]
31 Definition 1 A generalized loop in a graph G := (V, E) is any subgraph C := (V′, E′), V′ ⊆ V, E′ ⊆ (V′ × V′) ∩ E, such that each node in V′ has degree two or larger. [sent-90, score-0.375]
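For intuition, Definition 1 can be checked mechanically; a brute-force sketch (exponential in the number of edges, so suitable only for toy graphs; all names are illustrative):

    from itertools import combinations
    import networkx as nx

    def generalized_loops(G):
        """All generalized loops of G: nonempty edge subsets in which every
        incident node has degree two or larger within the subset."""
        edges = list(G.edges())
        loops = []
        for k in range(1, len(edges) + 1):
            for subset in combinations(edges, k):
                H = nx.Graph(list(subset))
                if all(d >= 2 for _, d in H.degree()):
                    loops.append(subset)
        return loops

    G = nx.cycle_graph(4)   # a single 4-cycle
    G.add_edge(0, 2)        # a chord creates three more generalized loops
    print(len(generalized_loops(G)))   # 4: two triangles, the 4-cycle, all edges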
32 Loop calculus allows one to represent Z explicitly in terms of the BP approximation via the loop series expansion: Z = Z BP · z, z = 1 + ∑C∈C rC, rC = ∏a∈C µa;āC, (3) where C is the set of all generalized loops within the graph and āC is the set of neighbors of node a within the loop C. [sent-92, score-0.846]
33 Each term rC is a product of terms µa;āC associated with every node a of the loop C: µa;āC = Eτa[∏b∈āC (σab − mab)] / ∏b∈āC Varτab(σab), with mab = Eτab[σab], (4) where we have used Eτ[·] to denote expectation with respect to the pseudo-marginal distribution τ. [sent-93, score-0.356]
34 For binary variables the variance in (4) is Varτab(σab) = 1 − m²ab. (5) In this work we consider planar graphs where nodes have degree at most three, that is, |āC| ≤ 3. [sent-96, score-0.692]
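To make Equations (4)-(5) concrete, here is a small sketch of our own (hypothetical names) computing a node term µ and a loop weight rC from given pseudo-marginals:

    def node_loop_term(tau_a, means, loop_vars):
        """mu_{a;a_C} of Equation (4): E_{tau_a}[prod_b (sigma_b - m_b)]
        divided by prod_b Var(sigma_b), with Var(sigma_b) = 1 - m_b^2 (Eq. 5).
        tau_a maps +/-1 assignments of node a's variables to probabilities."""
        num = 0.0
        for assignment, p in tau_a.items():
            term = 1.0
            for b in loop_vars:
                term *= assignment[b] - means[b]
            num += p * term
        den = 1.0
        for b in loop_vars:
            den *= 1.0 - means[b] ** 2
        return num / den

    def loop_weight(mu_terms):
        """r_C of Equation (3): product of node terms along the loop."""
        r = 1.0
        for mu in mu_terms:
            r *= mu
        return r

    # Toy pseudo-marginal of a degree-2 node over its two variables (0, 1):
    tau = {(+1, +1): 0.4, (+1, -1): 0.1, (-1, +1): 0.1, (-1, -1): 0.4}
    means = {0: 0.0, 1: 0.0}
    print(node_loop_term(tau, means, loop_vars=(0, 1)))   # 0.6 = E[s0*s1]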
35 In Appendix A.2 we show how a graphical model stated in a bipartite factor graph representation can be converted to a Forney graph which satisfies this constraint at the cost of introducing auxiliary nodes. [sent-99, score-0.255]
36 Definition 2 A 2-regular loop is a loop in which all nodes have degree exactly two. [sent-100, score-0.52]
37 Definition 3 The 2-regular partition function Z0 is the truncated form of (3) which sums all 2-regular loops only: Z0 = Z BP · z0, z0 = 1 + ∑C∈C: |āC|=2 ∀a∈C rC. (6) [sent-101, score-0.283]
38 As an example, Figure 1a shows a small Forney graph and Figure 1c shows seven loops found in the corresponding 2-regular partition function. [sent-102, score-0.334]
39 We use the term 2-regular partition function instead because loops with more than one connected component are also included in this part of the series. [sent-106, score-0.265]
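Combining Definition 3 with the loop-enumeration and loop-term sketches above gives a brute-force version of (6), again for tiny graphs only (illustrative; assumes generalized_loops from the earlier sketch and a caller-supplied loop_term implementing Equation (4)):

    import networkx as nx

    def z_2regular(G, loop_term):
        """z0 = 1 + sum of r_C over all 2-regular loops (Equation 6)."""
        z = 1.0
        for subset in generalized_loops(G):
            H = nx.Graph(list(subset))
            if all(d == 2 for _, d in H.degree()):  # every node degree exactly 2
                z += loop_term(subset)              # r_C supplied by the caller
        return z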
40 In Chertkov et al. (2008) it was shown that computation of Z0 in a Forney graph G can be mapped to the computation of the sum of all weighted perfect matchings within another extended weighted graph Gext := (VGext, EGext). [sent-139, score-0.283]
41 A perfect matching is defined as a subset of edges such that each node is incident to exactly one edge from the subset; its weight is the product of the weights of the edges in it. [sent-140, score-0.211]
42 The key idea of this mapping is that 2-regular loops in G are in one-to-one correspondence with perfect matchings in Gext (see Figures 1b and c for an illustration). [sent-141, score-0.323]
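The quantity computed on the Gext side is easy to state directly; a brute-force sketch of the weighted perfect-matching sum (exponential, for validating tiny cases only; names are ours):

    from itertools import combinations
    import networkx as nx

    def perfect_matching_sum(G, weight="weight"):
        """Sum over all perfect matchings of the product of edge weights."""
        nodes = list(G.nodes())
        if len(nodes) % 2:
            return 0.0
        edges = list(G.edges(data=True))
        total = 0.0
        for subset in combinations(edges, len(nodes) // 2):
            covered = [x for (u, v, _) in subset for x in (u, v)]
            if len(set(covered)) == len(nodes):   # every node matched exactly once
                w = 1.0
                for (_, _, d) in subset:
                    w *= d.get(weight, 1.0)
                total += w
        return total

    G = nx.cycle_graph(4)            # two perfect matchings: {01,23} and {12,30}
    print(perfect_matching_sum(G))   # 2.0 with unit weights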
43 To see that each 2-regular loop in G is associated with a perfect matching in Gext consider, for instance, the vertex of degree three in the bottom of Figure 2. [sent-149, score-0.337]
44 Given a 2-regular loop C, vertex a can appear in four different configurations: either node a does not appear in C, or C contains one of the following three paths: -b-a-c-, -b-a-d- or -c-a-d-. [sent-150, score-0.297]
45 These four cases correspond to node terms in a loop with values 1, µa;{b,c}, µa;{b,d} and µa;{c,d} respectively, and coincide with the matchings in Gext shown within the box on the bottom-right. [sent-151, score-0.355]
46 Kasteleyn (1963) provided a method to compute this sum in polynomial time for planar graphs. [sent-155, score-0.313]
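The polynomial-time computation rests on the identity Pf(A)² = det(A) for a skew-symmetric matrix A built from the (suitably oriented) edge weights; a minimal numerical sketch of the magnitude only (the sign bookkeeping, discussed below, is the delicate part):

    import numpy as np

    def pfaffian_magnitude(A):
        """|Pf(A)| = sqrt(det(A)) for an even-dimensional skew-symmetric A."""
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        assert A.shape == (n, n) and n % 2 == 0
        assert np.allclose(A, -A.T), "A must be skew-symmetric"
        return np.sqrt(max(np.linalg.det(A), 0.0))

    # 4x4 check against the closed form Pf = a*f - b*e + c*d:
    a, b, c, d, e, f = 1.0, 2.0, 3.0, 4.0, 5.0, 6.0
    A = np.array([[0, a, b, c],
                  [-a, 0, d, e],
                  [-b, -d, 0, f],
                  [-c, -e, -f, 0]])
    print(pfaffian_magnitude(A), abs(a * f - b * e + c * d))   # both 8.0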
47 For the tractable case, namely the Ising model without external field or a pairwise MRF with pure interaction potentials, the 2-regular partition function coincides with the exact partition function, Z = Z0 = Z BP · z0, since other terms in the loop series vanish. [sent-175, score-0.552]
48 We briefly reproduce their results here and provide an algorithm for computing the full loop series as a Pfaffian series. [sent-186, score-0.258]
49 For each possible subset Ψ ⊆ T containing an even number of triplets, there exists a one-to-one correspondence between loops in G that include the triplets in Ψ and perfect matchings in another extended graph GextΨ constructed after removal of the triplets Ψ from G. [sent-188, score-0.53]
50 The correction sign(Pfaffian(B̂Ψ)) is necessary because loop terms can be negative, and evaluating Pfaffian(ÂΨ) alone would only give the contribution up to a sign. [sent-191, score-0.298]
51 However, the key advantage of the Pfaffian representation is that Z0 is always the term that accounts for the largest number of loop terms in the series. [sent-206, score-0.238]
52 The main loop of the algorithm can be interrupted at any time, thus leading to a sequence of algorithms producing corrections incrementally. [sent-211, score-0.243]
53 Generally, for Θ = 0 the planar problem is tractable (zero field). [sent-218, score-0.331]
54 We omit the loop index in the triplet term µa;ā because nodes have degree at most three and therefore the set ā coincides in all loops which contain that triplet. [sent-220, score-0.503]
55 In the next Subsection we analyze the full Pfaffian series using a small example and compare it with the original representation based on the loop series. [sent-230, score-0.258]
56 TLSBP (Gómez et al., 2007), which accounts for a certain number of loops by performing depth-first search on the factor graph and then merging the found loops iteratively. [sent-232, score-0.478]
57 We adapted TLSBP as an any-time algorithm (anyTLSBP) in which the loop length is used as the only parameter, instead of the two parameters S and M (see Gómez et al., 2007). [sent-233, score-0.27]
58 We use as outer clusters all (maximal) factors together with loops of four (k=4) or six (k=6) variables in the factor graph. [sent-241, score-0.211]
59 Planar graph decomposition (PDC) (Globerson and Jaakkola, 2007), which decomposes the parameterization of a probabilistic graphical model as a mixture of tractable planar graphs (with zero local field). [sent-247, score-0.516]
60 A complex loop is defined as a loop which cannot be expressed as the union of two or more circuits or simple loops. [sent-251, score-0.436]
61 As shown in Gómez et al. (2007), errors in Z obtained from a truncated form of the loop series are very similar to errors in single-variable marginal probabilities, which can be obtained by conditioning on the variables of interest. [sent-254, score-0.268]
62 3.1 Full Pfaffian Series. In the previous Section we described two equivalent representations for Z in terms of the loop series and the Pfaffian series. [sent-263, score-0.258]
63 Figure 3: Planar bipartite factor graph used to compare the full Pfaffian series with the loop series. [sent-270, score-0.419]
64 Figure 4: Comparison between the full loop series and the full Pfaffian series. [sent-275, score-0.258]
65 The left column shows the error when considering loop terms, Z TLSBP(l), in log-log scale. [sent-277, score-0.218]
66 Shaded regions include all loop terms (not necessarily 2-regular loops) required to achieve the same accuracy as, or better than, the 2-regular partition function Z0. [sent-278, score-0.305]
67 Then Z TLSBP(l) accounts for the l most contributing loops and Z Pf(p) accounts for the p most contributing Pfaffian terms. [sent-291, score-0.256]
68 Summation of fewer than 50 loop terms (top-left panel) is enough to obtain machine-precision accuracy. [sent-296, score-0.218]
69 The number of loop corrections required to correct the BP error then increases. [sent-300, score-0.243]
70 For intermediate couplings (β = 0.5), the first Pfaffian term z0 improves considerably, by more than five orders of magnitude, on the BP error, whereas approximately 100 loop terms are required to achieve a similar correction (gray region of middle-left panel). [sent-302, score-0.28]
71 In this scenario, a larger proportion of loop terms (bottom-left panel) is also necessary to correct the BP result up to machine precision. [sent-305, score-0.218]
72 Looking at the bottom-left panel, we find that approximately 200 loop terms are required to achieve the same correction as the one obtained by Z0, which is quite accurate (bottom-middle panel). [sent-306, score-0.303]
73 These oscillations are also present in the loop series expansion. [sent-308, score-0.258]
74 Notice again that calculating Z0, the most contributing term in the Pfaffian series, does not require explicit search for loops or Pfaffian terms. [sent-310, score-0.237]
75 Sudderth et al. (2007) showed that, under some additional conditions, the BP approximation gives a lower bound on the exact partition function and all loops (and therefore Pfaffian terms too) have the same sign. [sent-318, score-0.288]
76 We generate grids with positive interactions and local fields, that is, |φa| ∼ N(0, β/2) and |θab| ∼ N(0, βΘ), and study the performance of the algorithms for various values of β, as well as for strong (Θ = 1) and weak (Θ = 0.1) local fields. [sent-320, score-0.233]
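A sketch of this instance generation, with the variable roles read off the sentence above (treating the second argument of N(·,·) as the standard deviation is our assumption):

    import numpy as np

    def random_grid_instance(n, beta, theta, seed=0):
        """n x n Ising grid: |interactions| ~ N(0, beta/2) and
        |local fields| ~ N(0, beta*theta), taken positive (attractive case)."""
        rng = np.random.default_rng(seed)
        horiz = np.abs(rng.normal(0.0, beta / 2, size=(n, n - 1)))
        vert = np.abs(rng.normal(0.0, beta / 2, size=(n - 1, n)))
        fields = np.abs(rng.normal(0.0, beta * theta, size=(n, n)))
        return horiz, vert, fields

    horiz, vert, fields = random_grid_instance(n=10, beta=0.5, theta=1.0)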
77 In that case, Z BP and Z0 are no longer lower bounds on Z, and loop terms can be positive or negative. [sent-354, score-0.218]
78 A similar behaviour was reported in Gómez et al. (2007) when loop terms, instead of Pfaffian terms, were considered. [sent-367, score-0.218]
79 We also compare Z0 with anyTLSBP, a variant of our previous algorithm for truncating the loop series. [sent-377, score-0.218]
80 Interestingly, although Z0 and anyTLSBP use different ways to truncate the loop series, both methods show similar scaling behaviour for large graphs. [sent-398, score-0.218]
81 Here, we continue the analysis of the Z0 correction using planar graphs that consist of concentric polygons with a variable number of sides. [sent-413, score-0.436]
82 We generate them as factor graphs with pairwise interactions which we subsequently convert to an equivalent Forney graph using the procedure described in Appendix A. [sent-415, score-0.287]
83 Discussion. We have presented an approximate algorithm based on the exact loop calculus framework for inference on planar graphical models defined in terms of binary variables. [sent-443, score-0.675]
84 The algorithm is illustrated on the example of the ordered and disordered Ising model on a planar graph. [sent-445, score-0.313]
85 We tested our algorithm on regular grids and planar graphs with different structures. [sent-448, score-0.45]
86 We have also shown that for the case of grids with attractive symmetric interactions and positive local fields, the lower bound provided by Z0 is the most accurate. [sent-454, score-0.205]
87 Finally, we have presented a comparison of Z0 with TLSBP, which is another algorithm based on the loop series expansion for BP that uses the loop length as the truncation parameter. [sent-458, score-0.495]
88 On one hand, the calculation of Z0 involves a re-summation of many loop terms which in the case of TLSBP are summed individually. [sent-459, score-0.218]
89 On the other hand, Z0 is restricted to the class of 2-regular loops, whereas TLSBP also accounts for terms corresponding to more complex loop structures in which nodes can have degree larger than two. [sent-461, score-0.5]
90 Overall, for planar graphs, we have shown evidence that the Z0 approach is better than TLSBP when the size of the graphs is not very small. [sent-462, score-0.374]
91 Although planarity is a severe restriction, we emphasize that planar graphs appear in many contexts such as computer vision and image processing, magnetic and optical recording, or network routing and logistics. [sent-464, score-0.374]
92 We have focused on inference problems defined on planar graphs with symmetric pairwise interactions and, to make the problems difficult, we have introduced local field potentials. [sent-465, score-0.557]
93 Summarizing, among the compared methods and models, the introduced approach based on Z0 represents the best estimate of the partition function as long as the given graph does not deviate too much from perfect planarity, that is, as long as we are in the small local field regime. [sent-469, score-0.25]
94 Appendix A. Bipartite Factor Graph Representation. The Forney graph representation is convenient because each loop decomposes naturally into terms associated with nodes, which have the same analytical form as Equation (4). [sent-488, score-0.339]
95 In this Appendix we first show how the presented approach differs when it is directly applied to a bipartite factor graph and second, how to convert a bipartite factor graph to a Forney graph. [sent-490, score-0.322]
96 On a bipartite factor graph Gfg := (Vfg, Efg), the set Vfg consists of variable nodes i ∈ I and factor nodes a ∈ A, with an edge between i and a iff i ∈ a, that is, i appears in the factor function ψa. [sent-492, score-0.356]
97 Clearly, the one-to-one correspondence from 2-regular loops in G to perfect matchings in Gext is independent of whether G is expressed as a bipartite factor graph or a Forney graph. [sent-502, score-0.484]
98 We show here how to convert a bipartite factor graph defined in terms of binary variables to a more general Forney graph representation, to which the presented algorithm can be directly applied. [sent-507, score-0.23]
99 Given Gfg := (Vfg, Efg), a direct way to obtain an equivalent Forney graph G := (V, E) is: first, to create a node δi ∈ V for each variable node i ∈ Vfg, and second, to associate a new binary variable σδi,a taking values in {±1} with each edge (δi, a) ∈ E. [sent-516, score-0.214]
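A structural sketch of this conversion (graph skeleton only; potentials are omitted and all names are illustrative):

    import networkx as nx

    def factor_graph_to_forney(variables, factors):
        """Each variable i becomes an equality node delta_i; each factor stays a
        node; each factor-graph edge (i, a) becomes a Forney edge carrying a
        fresh binary variable sigma_{delta_i, a} in {+1, -1}."""
        G = nx.Graph()
        for i in variables:
            G.add_node(("delta", i))
        for a, scope in factors.items():
            G.add_node(("factor", a))
            for i in scope:
                G.add_edge(("delta", i), ("factor", a), var=("sigma", i, a))
        return G

    # Tiny example: two pairwise factors over three variables.
    G = factor_graph_to_forney([0, 1, 2], {"a": [0, 1], "b": [1, 2]})
    print(G.number_of_nodes(), G.number_of_edges())   # 5 4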
100 Truncating the loop series expansion for belief propagation. [sent-604, score-0.306]
wordName wordTfidf (topN-words)
[('bp', 0.456), ('pfaf', 0.423), ('planar', 0.313), ('gext', 0.285), ('ab', 0.234), ('loop', 0.218), ('loops', 0.178), ('chertkov', 0.155), ('forney', 0.146), ('appen', 0.098), ('hertkov', 0.098), ('omez', 0.098), ('interactions', 0.092), ('alculus', 0.09), ('anytlsbp', 0.09), ('lanar', 0.09), ('oop', 0.09), ('treeep', 0.09), ('partition', 0.087), ('matchings', 0.081), ('tlsbp', 0.073), ('graph', 0.069), ('raphs', 0.069), ('triplets', 0.069), ('calculus', 0.067), ('perfect', 0.064), ('pproximate', 0.064), ('correction', 0.062), ('graphs', 0.061), ('ising', 0.061), ('mi', 0.059), ('bipartite', 0.059), ('grids', 0.058), ('pdc', 0.057), ('node', 0.056), ('nference', 0.056), ('fa', 0.056), ('elds', 0.055), ('mez', 0.052), ('nodes', 0.052), ('xa', 0.051), ('cvm', 0.05), ('propagation', 0.05), ('chernyak', 0.049), ('pab', 0.049), ('belief', 0.048), ('external', 0.047), ('marginals', 0.045), ('dimer', 0.041), ('egext', 0.041), ('ftk', 0.041), ('mab', 0.041), ('series', 0.04), ('trw', 0.038), ('ac', 0.035), ('couplings', 0.035), ('mooij', 0.035), ('weak', 0.034), ('factor', 0.033), ('edges', 0.033), ('ans', 0.033), ('lsbp', 0.033), ('degree', 0.032), ('pairwise', 0.032), ('globerson', 0.03), ('local', 0.03), ('inference', 0.029), ('di', 0.028), ('eld', 0.027), ('bethe', 0.027), ('yedidia', 0.027), ('errors', 0.025), ('edge', 0.025), ('kappen', 0.025), ('corrections', 0.025), ('junction', 0.025), ('attractive', 0.025), ('graphical', 0.025), ('kasteleyn', 0.024), ('lanl', 0.024), ('sudderth', 0.024), ('vgext', 0.024), ('exact', 0.023), ('panel', 0.023), ('triplet', 0.023), ('vertex', 0.023), ('mechanics', 0.022), ('cpu', 0.021), ('accounts', 0.02), ('grid', 0.02), ('schraudolph', 0.019), ('truncation', 0.019), ('jaakkola', 0.019), ('strong', 0.019), ('physics', 0.019), ('contributing', 0.019), ('rc', 0.018), ('sign', 0.018), ('tractable', 0.018), ('regular', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999994 13 jmlr-2010-Approximate Inference on Planar Graphs using Loop Calculus and Belief Propagation
Author: Vicenç Gómez, Hilbert J. Kappen, Michael Chertkov
Abstract: We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006a) allows one to express the exact partition function of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in Chertkov et al. (2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyze the performance of the algorithm for models with binary variables and pairwise interactions on grids and other planar graphs. We study in detail both the loop series and the equivalent Pfaffian series and show that the first term of the Pfaffian series for the general, intractable planar model can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state-of-the-art methods for approximate inference. Keywords: belief propagation, loop calculus, approximate inference, partition function, planar graphs, Ising model
Author: Joris M. Mooij
Abstract: This paper describes the software package libDAI, a free & open source C++ library that provides implementations of various exact and approximate inference methods for graphical models with discrete-valued variables. libDAI supports directed graphical models (Bayesian networks) as well as undirected ones (Markov random fields and factor graphs). It offers various approximations of the partition sum, marginal probability distributions and maximum probability states. Parameter learning is also supported. A feature comparison with other open source software packages for approximate inference is given. libDAI is licensed under the GPL v2+ license and is available at http://www.libdai.org. Keywords: probabilistic graphical models, approximate inference, open source software, factor graphs, Markov random fields, Bayesian networks
3 0.065536067 39 jmlr-2010-FastInf: An Efficient Approximate Inference Library
Author: Ariel Jaimovich, Ofer Meshi, Ian McGraw, Gal Elidan
Abstract: The FastInf C++ library is designed to perform memory and time efficient approximate inference in large-scale discrete undirected graphical models. The focus of the library is propagation based approximate inference methods, ranging from the basic loopy belief propagation algorithm to propagation based on convex free energies. Various message scheduling schemes that improve on the standard synchronous or asynchronous approaches are included. Also implemented are a clique tree based exact inference, Gibbs sampling, and the mean field algorithm. In addition to inference, FastInf provides parameter estimation capabilities as well as representation and learning of shared parameters. It offers a rich interface that facilitates extension of the basic classes to other inference and learning methods. Keywords: graphical models, Markov random field, loopy belief propagation, approximate inference
4 0.057908069 58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks
Author: Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, Zoubin Ghahramani
Abstract: How can we generate realistic networks? In addition, how can we do so with a mathematically tractable model that allows for rigorous analysis of network properties? Real networks exhibit a long list of surprising properties: heavy tails for the in- and out-degree distribution, heavy tails for the eigenvalues and eigenvectors, small diameters, and densification and shrinking diameters over time. Current network models and generators either fail to match several of the above properties, are complicated to analyze mathematically, or both. Here we propose a generative model for networks that is both mathematically tractable and can generate networks that have all the above mentioned structural properties. Our main idea here is to use a non-standard matrix operation, the Kronecker product, to generate graphs which we refer to as “Kronecker graphs”. First, we show that Kronecker graphs naturally obey common network properties. In fact, we rigorously prove that they do so. We also provide empirical evidence showing that Kronecker graphs can effectively model the structure of real networks. We then present KronFit, a fast and scalable algorithm for fitting the Kronecker graph generation model to large real networks. A naive approach to fitting would take super-exponential time. In contrast, KronFit takes linear time, by exploiting the structure of Kronecker matrix multiplication and by using statistical simulation techniques. Experiments on a wide range of large real and synthetic networks show that KronFit finds accurate parameters that very well mimic the properties of target networks. In fact, using just four parameters we can accurately model several aspects of global network structure. Once fitted, the model parameters can be used to gain insights about the network structure, and the resulting synt
5 0.050461952 44 jmlr-2010-Graph Kernels
Author: S.V.N. Vishwanathan, Nicol N. Schraudolph, Risi Kondor, Karsten M. Borgwardt
Abstract: We present a unified framework to study graph kernels, special cases of which include the random walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004; Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time complexity of kernel computation between unlabeled graphs with n vertices from O(n^6) to O(n^3). We find a spectral decomposition approach even more efficient when computing entire kernel matrices. For labeled graphs we develop conjugate gradient and fixed-point methods that take O(dn^3) time per iteration, where d is the size of the label set. By extending the necessary linear algebra to Reproducing Kernel Hilbert Spaces (RKHS) we obtain the same result for d-dimensional edge kernels, and O(n^4) in the infinite-dimensional case; on sparse graphs these algorithms only take O(n^2) time per iteration in all cases. Experiments on graphs from bioinformatics and other application domains show that these techniques can speed up computation of the kernel by an order of magnitude or more. We also show that certain rational kernels (Cortes et al., 2002, 2003, 2004) when specialized to graphs reduce to our random walk graph kernel. Finally, we relate our framework to R-convolution kernels (Haussler, 1999) and provide a kernel that is close to the optimal assignment kernel of Fröhlich et al. (2006) yet provably positive semi-definite. Keywords: linear algebra in RKHS, Sylvester equations, spectral decomposition, bioinformatics, rational kernels, transducers, semirings, random walks
6 0.03978223 24 jmlr-2010-Collective Inference for Extraction MRFs Coupled with Symmetric Clique Potentials
7 0.034814339 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks
8 0.031911481 32 jmlr-2010-Efficient Algorithms for Conditional Independence Inference
9 0.030609172 15 jmlr-2010-Approximate Tree Kernels
10 0.030493885 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond
11 0.028836748 69 jmlr-2010-Lp-Nested Symmetric Distributions
12 0.028601207 26 jmlr-2010-Consensus-Based Distributed Support Vector Machines
13 0.028008088 115 jmlr-2010-Using Contextual Representations to Efficiently Learn Context-Free Languages
14 0.02698227 63 jmlr-2010-Learning Instance-Specific Predictive Models
15 0.026752187 76 jmlr-2010-Message-passing for Graph-structured Linear Programs: Proximal Methods and Rounding Schemes
16 0.025865726 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning
17 0.025645256 7 jmlr-2010-A Streaming Parallel Decision Tree Algorithm
18 0.025465565 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks
19 0.024645763 109 jmlr-2010-Stochastic Composite Likelihood
20 0.024109278 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
topicId topicWeight
[(0, -0.111), (1, 0.042), (2, -0.081), (3, -0.071), (4, -0.005), (5, 0.016), (6, -0.048), (7, -0.11), (8, -0.156), (9, 0.322), (10, -0.078), (11, -0.014), (12, 0.061), (13, 0.103), (14, 0.062), (15, 0.02), (16, -0.016), (17, -0.047), (18, -0.188), (19, 0.091), (20, 0.119), (21, -0.082), (22, -0.074), (23, 0.144), (24, 0.034), (25, -0.023), (26, 0.029), (27, -0.058), (28, -0.063), (29, 0.064), (30, -0.093), (31, -0.036), (32, 0.024), (33, 0.07), (34, 0.041), (35, 0.089), (36, 0.017), (37, -0.074), (38, 0.14), (39, -0.04), (40, 0.011), (41, 0.029), (42, -0.092), (43, 0.063), (44, 0.057), (45, -0.061), (46, 0.025), (47, -0.167), (48, 0.063), (49, 0.042)]
simIndex simValue paperId paperTitle
same-paper 1 0.95220417 13 jmlr-2010-Approximate Inference on Planar Graphs using Loop Calculus and Belief Propagation
Author: Vicenç Gómez, Hilbert J. Kappen, Michael Chertkov
Abstract: We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006a) allows one to express the exact partition function of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in Chertkov et al. (2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyze the performance of the algorithm for models with binary variables and pairwise interactions on grids and other planar graphs. We study in detail both the loop series and the equivalent Pfaffian series and show that the first term of the Pfaffian series for the general, intractable planar model can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state-of-the-art methods for approximate inference. Keywords: belief propagation, loop calculus, approximate inference, partition function, planar graphs, Ising model
2 0.78555745 118 jmlr-2010-libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models
Author: Joris M. Mooij
Abstract: This paper describes the software package libDAI, a free & open source C++ library that provides implementations of various exact and approximate inference methods for graphical models with discrete-valued variables. libDAI supports directed graphical models (Bayesian networks) as well as undirected ones (Markov random fields and factor graphs). It offers various approximations of the partition sum, marginal probability distributions and maximum probability states. Parameter learning is also supported. A feature comparison with other open source software packages for approximate inference is given. libDAI is licensed under the GPL v2+ license and is available at http://www.libdai.org. Keywords: probabilistic graphical models, approximate inference, open source software, factor graphs, Markov random fields, Bayesian networks
3 0.60772085 39 jmlr-2010-FastInf: An Efficient Approximate Inference Library
Author: Ariel Jaimovich, Ofer Meshi, Ian McGraw, Gal Elidan
Abstract: The FastInf C++ library is designed to perform memory and time efficient approximate inference in large-scale discrete undirected graphical models. The focus of the library is propagation based approximate inference methods, ranging from the basic loopy belief propagation algorithm to propagation based on convex free energies. Various message scheduling schemes that improve on the standard synchronous or asynchronous approaches are included. Also implemented are a clique tree based exact inference, Gibbs sampling, and the mean field algorithm. In addition to inference, FastInf provides parameter estimation capabilities as well as representation and learning of shared parameters. It offers a rich interface that facilitates extension of the basic classes to other inference and learning methods. Keywords: graphical models, Markov random field, loopy belief propagation, approximate inference
4 0.38617525 58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks
Author: Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, Zoubin Ghahramani
Abstract: How can we generate realistic networks? In addition, how can we do so with a mathematically tractable model that allows for rigorous analysis of network properties? Real networks exhibit a long list of surprising properties: heavy tails for the in- and out-degree distribution, heavy tails for the eigenvalues and eigenvectors, small diameters, and densification and shrinking diameters over time. Current network models and generators either fail to match several of the above properties, are complicated to analyze mathematically, or both. Here we propose a generative model for networks that is both mathematically tractable and can generate networks that have all the above mentioned structural properties. Our main idea here is to use a non-standard matrix operation, the Kronecker product, to generate graphs which we refer to as “Kronecker graphs”. First, we show that Kronecker graphs naturally obey common network properties. In fact, we rigorously prove that they do so. We also provide empirical evidence showing that Kronecker graphs can effectively model the structure of real networks. We then present KronFit, a fast and scalable algorithm for fitting the Kronecker graph generation model to large real networks. A naive approach to fitting would take super-exponential time. In contrast, KronFit takes linear time, by exploiting the structure of Kronecker matrix multiplication and by using statistical simulation techniques. Experiments on a wide range of large real and synthetic networks show that KronFit finds accurate parameters that very well mimic the properties of target networks. In fact, using just four parameters we can accurately model several aspects of global network structure. Once fitted, the model parameters can be used to gain insights about the network structure, and the resulting synt
5 0.2778933 44 jmlr-2010-Graph Kernels
Author: S.V.N. Vishwanathan, Nicol N. Schraudolph, Risi Kondor, Karsten M. Borgwardt
Abstract: We present a unified framework to study graph kernels, special cases of which include the random walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004; Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time complexity of kernel computation between unlabeled graphs with n vertices from O(n^6) to O(n^3). We find a spectral decomposition approach even more efficient when computing entire kernel matrices. For labeled graphs we develop conjugate gradient and fixed-point methods that take O(dn^3) time per iteration, where d is the size of the label set. By extending the necessary linear algebra to Reproducing Kernel Hilbert Spaces (RKHS) we obtain the same result for d-dimensional edge kernels, and O(n^4) in the infinite-dimensional case; on sparse graphs these algorithms only take O(n^2) time per iteration in all cases. Experiments on graphs from bioinformatics and other application domains show that these techniques can speed up computation of the kernel by an order of magnitude or more. We also show that certain rational kernels (Cortes et al., 2002, 2003, 2004) when specialized to graphs reduce to our random walk graph kernel. Finally, we relate our framework to R-convolution kernels (Haussler, 1999) and provide a kernel that is close to the optimal assignment kernel of Fröhlich et al. (2006) yet provably positive semi-definite. Keywords: linear algebra in RKHS, Sylvester equations, spectral decomposition, bioinformatics, rational kernels, transducers, semirings, random walks
6 0.20656753 26 jmlr-2010-Consensus-Based Distributed Support Vector Machines
7 0.184156 24 jmlr-2010-Collective Inference for Extraction MRFs Coupled with Symmetric Clique Potentials
8 0.17875101 109 jmlr-2010-Stochastic Composite Likelihood
9 0.17069434 69 jmlr-2010-Lp-Nested Symmetric Distributions
10 0.16388322 38 jmlr-2010-Expectation Truncation and the Benefits of Preselection In Training Generative Models
11 0.16038147 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing
12 0.15807872 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning
13 0.1566602 115 jmlr-2010-Using Contextual Representations to Efficiently Learn Context-Free Languages
14 0.14927433 77 jmlr-2010-Model-based Boosting 2.0
15 0.13558304 32 jmlr-2010-Efficient Algorithms for Conditional Independence Inference
16 0.13272589 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks
17 0.13192801 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond
18 0.12580918 63 jmlr-2010-Learning Instance-Specific Predictive Models
19 0.12388044 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks
20 0.12140659 53 jmlr-2010-Inducing Tree-Substitution Grammars
topicId topicWeight
[(3, 0.013), (4, 0.027), (8, 0.013), (21, 0.018), (24, 0.014), (32, 0.055), (36, 0.049), (37, 0.034), (56, 0.018), (66, 0.441), (75, 0.114), (81, 0.014), (85, 0.051), (96, 0.045)]
simIndex simValue paperId paperTitle
same-paper 1 0.65909457 13 jmlr-2010-Approximate Inference on Planar Graphs using Loop Calculus and Belief Propagation
Author: Vicenç Gómez, Hilbert J. Kappen, Michael Chertkov
Abstract: We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006a) allows one to express the exact partition function of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in Chertkov et al. (2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyze the performance of the algorithm for models with binary variables and pairwise interactions on grids and other planar graphs. We study in detail both the loop series and the equivalent Pfaffian series and show that the first term of the Pfaffian series for the general, intractable planar model can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state-of-the-art methods for approximate inference. Keywords: belief propagation, loop calculus, approximate inference, partition function, planar graphs, Ising model
2 0.46834984 103 jmlr-2010-Sparse Semi-supervised Learning Using Conjugate Functions
Author: Shiliang Sun, John Shawe-Taylor
Abstract: In this paper, we propose a general framework for sparse semi-supervised learning, which concerns using a small portion of unlabeled data and a few labeled data to represent target functions and thus has the merit of accelerating function evaluations when predicting the output of a new example. This framework makes use of Fenchel-Legendre conjugates to rewrite a convex insensitive loss involving a regularization with unlabeled data, and is applicable to a family of semi-supervised learning methods such as multi-view co-regularized least squares and single-view Laplacian support vector machines (SVMs). As an instantiation of this framework, we propose sparse multi-view SVMs which use a squared ε-insensitive loss. The resultant optimization is an inf-sup problem and the optimal solutions have arguably saddle-point properties. We present a globally optimal iterative algorithm to optimize the problem. We give the margin bound on the generalization error of the sparse multi-view SVMs, and derive the empirical Rademacher complexity for the induced function class. Experiments on artificial and real-world data show their effectiveness. We further give a sequential training approach to show their possibility and potential for uses in large-scale problems and provide encouraging experimental results indicating the efficacy of the margin bound and empirical Rademacher complexity on characterizing the roles of unlabeled data for semi-supervised learning. Keywords: semi-supervised learning, Fenchel-Legendre conjugate, representer theorem, multiview regularization, support vector machine, statistical learning theory
3 0.32205322 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond
Author: Yevgeny Seldin, Naftali Tishby
Abstract: We derive PAC-Bayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as co-clustering, matrix tri-factorization, graphical models, graph clustering, and pairwise clustering.1 We begin with the analysis of co-clustering, which is a widely used approach to the analysis of data matrices. We distinguish among two tasks in matrix data analysis: discriminative prediction of the missing entries in data matrices and estimation of the joint probability distribution of row and column variables in co-occurrence matrices. We derive PAC-Bayesian generalization bounds for the expected out-of-sample performance of co-clustering-based solutions for these two tasks. The analysis yields regularization terms that were absent in the previous formulations of co-clustering. The bounds suggest that the expected performance of co-clustering is governed by a trade-off between its empirical performance and the mutual information preserved by the cluster variables on row and column IDs. We derive an iterative projection algorithm for finding a local optimum of this trade-off for discriminative prediction tasks. This algorithm achieved stateof-the-art performance in the MovieLens collaborative filtering task. Our co-clustering model can also be seen as matrix tri-factorization and the results provide generalization bounds, regularization terms, and new algorithms for this form of matrix factorization. The analysis of co-clustering is extended to tree-shaped graphical models, which can be used to analyze high dimensional tensors. According to the bounds, the generalization abilities of treeshaped graphical models depend on a trade-off between their empirical data fit and the mutual information that is propagated up the tree levels. We also formulate weighted graph clustering as a prediction problem: given a subset of edge weights we analyze the ability of graph clustering to predict the remaining edge weights. The analysis of co-clustering easily
4 0.31982011 91 jmlr-2010-Posterior Regularization for Structured Latent Variable Models
Author: Kuzman Ganchev, João Graça, Jennifer Gillenwater, Ben Taskar
Abstract: We present posterior regularization, a probabilistic framework for structured, weakly supervised learning. Our framework efficiently incorporates indirect supervision via constraints on posterior distributions of probabilistic models with latent variables. Posterior regularization separates model complexity from the complexity of structural constraints it is desired to satisfy. By directly imposing decomposable regularization on the posterior moments of latent variables during learning, we retain the computational efficiency of the unconstrained model while ensuring desired constraints hold in expectation. We present an efficient algorithm for learning with posterior regularization and illustrate its versatility on a diverse set of structural constraints such as bijectivity, symmetry and group sparsity in several large scale experiments, including multi-view learning, cross-lingual dependency grammar induction, unsupervised part-of-speech induction, and bitext word alignment.1 Keywords: posterior regularization framework, unsupervised learning, latent variables models, prior knowledge, natural language processing
Author: Pinar Donmez, Guy Lebanon, Krishnakumar Balasubramanian
Abstract: Estimating the error rates of classifiers or regression models is a fundamental task in machine learning which has thus far been studied exclusively using supervised learning techniques. We propose a novel unsupervised framework for estimating these error rates using only unlabeled data and mild assumptions. We prove consistency results for the framework and demonstrate its practical applicability on both synthetic and real world data. Keywords: classification and regression, maximum likelihood, latent variable models
6 0.31143004 74 jmlr-2010-Maximum Relative Margin and Data-Dependent Regularization
7 0.311369 46 jmlr-2010-High Dimensional Inverse Covariance Matrix Estimation via Linear Programming
8 0.30912328 63 jmlr-2010-Learning Instance-Specific Predictive Models
9 0.30894879 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
10 0.30848894 59 jmlr-2010-Large Scale Online Learning of Image Similarity Through Ranking
11 0.30805352 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing
12 0.30657673 109 jmlr-2010-Stochastic Composite Likelihood
13 0.30641234 92 jmlr-2010-Practical Approaches to Principal Component Analysis in the Presence of Missing Values
14 0.30619967 49 jmlr-2010-Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data
15 0.30369565 69 jmlr-2010-Lp-Nested Symmetric Distributions
16 0.30278856 102 jmlr-2010-Semi-Supervised Novelty Detection
17 0.30272654 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
19 0.30261132 87 jmlr-2010-Online Learning for Matrix Factorization and Sparse Coding
20 0.30113187 43 jmlr-2010-Generalized Power Method for Sparse Principal Component Analysis