jmlr jmlr2010 jmlr2010-13 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Vicenç Gómez, Hilbert J. Kappen, Michael Chertkov
Abstract: We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006a) allows one to express the exact partition function of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in Chertkov et al. (2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyze the performance of the algorithm for models with binary variables and pairwise interactions on grids and other planar graphs. We study in detail both the loop series and the equivalent Pfaffian series and show that the first term of the Pfaffian series for the general, intractable planar model can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state-of-the-art methods for approximate inference. Keywords: belief propagation, loop calculus, approximate inference, partition function, planar graphs, Ising model
Reference: text
sentIndex sentText sentNum sentScore
1 We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. [sent-9, score-0.652]
2 The loop calculus (Chertkov and Chernyak, 2006a) allows one to express the exact partition function of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. [sent-10, score-0.518]
3 We develop an algorithm for the approach presented in Chertkov et al. (2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. [sent-13, score-0.433]
4 We analyze the performance of the algorithm for models with binary variables and pairwise interactions on grids and other planar graphs. [sent-14, score-0.495]
5 We study in detail both the loop series and the equivalent Pfaffian series and show that the first term of the Pfaffian series for the general, intractable planar model can provide very accurate approximations. [sent-15, score-0.651]
6 The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state-of-the-art methods for approximate inference. [sent-16, score-0.277]
7 Keywords: belief propagation, loop calculus, approximate inference, partition function, planar graphs, Ising model. 1. Introduction [sent-17, score-0.666]
8 The exact partition function is explicitly related to the BP approximation through the loop calculus framework introduced by Chertkov and Chernyak (2006a). [sent-29, score-0.395]
9 Summation of the entire loop series is a hard combinatorial task since the number of generalized loops is typically exponential in the size of the graph (see also Watanabe and Fukumizu, 2009). [sent-32, score-0.505]
10 Nevertheless, a formal characterization of the classes of problems which are tractable via loop calculus remains an open question. [sent-38, score-0.303]
11 A step toward this goal, with a focus on the class of planar graphs, was taken in Chertkov et al. [sent-39, score-0.374]
12 A graph is said to be planar if it can be embedded in the plane without crossing edges. [sent-41, score-0.382]
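As a concrete aside, planarity can be tested in linear time with standard tools; the following is a minimal sketch using the networkx library (the library choice and example graphs are illustrative, not from the paper):

    import networkx as nx

    # Grid graphs are planar; K5 is the smallest non-planar graph.
    grid = nx.grid_2d_graph(3, 3)
    k5 = nx.complete_graph(5)

    is_planar, embedding = nx.check_planarity(grid)
    print(is_planar)                   # True; 'embedding' is a planar embedding
    print(nx.check_planarity(k5)[0])   # False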
13 Furthermore, the full loop series can be expressed as a sum over certain Pfaffians (or determinants), where each Pfaffian may account for a large number of loops and is solvable in polynomial time as well. [sent-45, score-0.436]
14 The approach of Chertkov et al. (2008) builds upon classical results from Fisher (1961), Kasteleyn (1961) and Temperley and Fisher (1961), who addressed the question of counting the number of perfect matchings on a planar grid, also known as the dimer covering problem in statistical physics. [sent-47, score-0.499]
15 Such a model is known in statistical physics as the planar Ising model without external field and in the machine learning community as a planar, binary MRF with pure interaction potentials. [sent-49, score-0.379]
16 Notice that exact inference for a general binary graphical model on a planar graph, namely the Ising model with external field, is intractable; in particular, it is #P-hard (Barahona, 1982). [sent-50, score-0.437]
17 A gentle overview of counting perfect matchings in planar graphs can be found in Jerrum (2003). [sent-51, score-0.519]
18 Globerson and Jaakkola (2007) obtained upper bounds on the partition function for non-planar graphs with binary variables by decomposition of the partition function into a weighted sum over partition functions of spanning tractable (zero field) planar models. [sent-53, score-0.653]
19 The resulting problem is a convex optimization problem and, since exact inference can be done in each planar sub-model via the FTK method, the bound can be calculated in polynomial time. [sent-54, score-0.365]
20 Another example is the work of Schraudolph and Kamenetsky (2008), which provides a framework for exact inference on a restricted class of planar graphs by directly using the FTK approach. [sent-55, score-0.426]
21 Since a planar Ising model can be transformed to a model without external field at the cost of introducing additional edges, one can allow local fields for a subset B of variables. [sent-56, score-0.39]
22 If the graphical model is B-outerplanar, which means that there exists a planar embedding in which the subset B of the nodes lies on the same face, the FTK technique can still be applied. [sent-57, score-0.39]
23 Contrary to these two approaches, which rely on exact inference on a tractable planar model, the loop calculus directly leads to a framework for approximate inference on general planar graphs. [sent-58, score-1.01]
24 Truncating the loop series according to Chertkov et al. (2008) recovers the exact partition function in the tractable (zero-field) planar case. [sent-59, score-0.258]
25 In the general planar case, however, this truncation can still yield an accurate approximation that can be incrementally corrected by considering subsequent terms in the series. [sent-61, score-0.332]
26 In the next Section we review the main theoretical results of the loop calculus approach for planar graphs and introduce the proposed algorithm for approximating the partition function and single-variable marginals. [sent-62, score-0.746]
27 In Section 3 we provide experimental results for regular grids and other types of planar graphs. [sent-63, score-0.389]
28 A binary Forney graph G := (V, E) consists of a set of nodes V, where each node a ∈ V represents an interaction and each edge (a, b) ∈ E represents a binary variable σab taking values σab ∈ {±1}. [sent-71, score-0.436]
29 If the graph contains loops, Z BP is just an approximation critically dependent on how strong the influence of the loops is. [sent-88, score-0.266]
30 We introduce now some convenient definitions related to the loop calculus framework. [sent-89, score-0.285]
31 Definition 1 A generalized loop in a graph G := (V, E) is any subgraph C := (V′, E′), V′ ⊆ V, E′ ⊆ (V′ × V′) ∩ E, such that each node in V′ has degree two or larger. [sent-90, score-0.375]
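For intuition, Definition 1 can be checked mechanically; a brute-force sketch (exponential in the number of edges, so suitable only for toy graphs; all names are illustrative):

    from itertools import combinations
    import networkx as nx

    def generalized_loops(G):
        """All generalized loops of G: nonempty edge subsets in which every
        incident node has degree two or larger within the subset."""
        edges = list(G.edges())
        loops = []
        for k in range(1, len(edges) + 1):
            for subset in combinations(edges, k):
                H = nx.Graph(list(subset))
                if all(d >= 2 for _, d in H.degree()):
                    loops.append(subset)
        return loops

    G = nx.cycle_graph(4)   # a single 4-cycle
    G.add_edge(0, 2)        # a chord creates three more generalized loops
    print(len(generalized_loops(G)))   # 4: two triangles, the 4-cycle, all edges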
32 Loop calculus allows one to represent Z explicitly in terms of the BP approximation via the loop series expansion: Z = Z BP · z, z = 1 + ∑C∈C rC, rC = ∏a∈C µa;āC, (3) where C is the set of all generalized loops within the graph and āC is the set of neighbors of node a within the loop C. [sent-92, score-0.846]
33 Each term rC is a product of terms µa;āC associated with every node a of the loop C: µa;āC = Eτa[∏b∈āC (σab − mab)] / ∏b∈āC Varτab(σab), with mab = Eτab[σab], (4) where we have used Eτ[·] to denote expectation with respect to the pseudo-marginal distribution τ. [sent-93, score-0.356]
34 For binary variables the variance in (4) is Varτab(σab) = 1 − m²ab. (5) In this work we consider planar graphs where nodes have degree at most three, that is, |āC| ≤ 3. [sent-96, score-0.692]
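To make Equations (4)-(5) concrete, here is a small sketch of our own (hypothetical names) computing a node term µ and a loop weight rC from given pseudo-marginals:

    def node_loop_term(tau_a, means, loop_vars):
        """mu_{a;a_C} of Equation (4): E_{tau_a}[prod_b (sigma_b - m_b)]
        divided by prod_b Var(sigma_b), with Var(sigma_b) = 1 - m_b^2 (Eq. 5).
        tau_a maps +/-1 assignments of node a's variables to probabilities."""
        num = 0.0
        for assignment, p in tau_a.items():
            term = 1.0
            for b in loop_vars:
                term *= assignment[b] - means[b]
            num += p * term
        den = 1.0
        for b in loop_vars:
            den *= 1.0 - means[b] ** 2
        return num / den

    def loop_weight(mu_terms):
        """r_C of Equation (3): product of node terms along the loop."""
        r = 1.0
        for mu in mu_terms:
            r *= mu
        return r

    # Toy pseudo-marginal of a degree-2 node over its two variables (0, 1):
    tau = {(+1, +1): 0.4, (+1, -1): 0.1, (-1, +1): 0.1, (-1, -1): 0.4}
    means = {0: 0.0, 1: 0.0}
    print(node_loop_term(tau, means, loop_vars=(0, 1)))   # 0.6 = E[s0*s1]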
35 In Appendix A.2 we show how a graphical model stated in a bipartite factor graph representation can be converted to a Forney graph which satisfies this constraint at the cost of introducing auxiliary nodes. [sent-99, score-0.255]
36 Definition 2 A 2-regular loop is a loop in which all nodes have degree exactly two. [sent-100, score-0.52]
37 Definition 3 The 2-regular partition function Z0 is the truncated form of (3) which sums all 2-regular loops only: Z0 = Z BP · z0, z0 = 1 + ∑C∈C: |āC|=2 ∀a∈C rC. (6) [sent-101, score-0.283]
38 As an example, Figure 1a shows a small Forney graph and Figure 1c shows seven loops found in the corresponding 2-regular partition function. [sent-102, score-0.334]
39 We use the term 2-regular partition function instead because loops with more than one connected component are also included in this part of the series. [sent-106, score-0.265]
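Combining Definition 3 with the loop-enumeration and loop-term sketches above gives a brute-force version of (6), again for tiny graphs only (illustrative; assumes generalized_loops from the earlier sketch and a caller-supplied loop_term implementing Equation (4)):

    import networkx as nx

    def z_2regular(G, loop_term):
        """z0 = 1 + sum of r_C over all 2-regular loops (Equation 6)."""
        z = 1.0
        for subset in generalized_loops(G):
            H = nx.Graph(list(subset))
            if all(d == 2 for _, d in H.degree()):  # every node degree exactly 2
                z += loop_term(subset)              # r_C supplied by the caller
        return z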
40 In Chertkov et al. (2008) it was shown that computation of Z0 in a Forney graph G can be mapped to the computation of the sum of all weighted perfect matchings within another extended weighted graph Gext := (VGext, EGext). [sent-139, score-0.283]
41 A perfect matching is defined as a subset of edges such that each node is incident to exactly one edge from the subset; its weight is the product of the weights of the edges in it. [sent-140, score-0.211]
42 The key idea of this mapping is that 2-regular loops in G are in one-to-one correspondence with perfect matchings in Gext (see Figures 1b and c for an illustration). [sent-141, score-0.323]
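The quantity computed on the Gext side is easy to state directly; a brute-force sketch of the weighted perfect-matching sum (exponential, for validating tiny cases only; names are ours):

    from itertools import combinations
    import networkx as nx

    def perfect_matching_sum(G, weight="weight"):
        """Sum over all perfect matchings of the product of edge weights."""
        nodes = list(G.nodes())
        if len(nodes) % 2:
            return 0.0
        edges = list(G.edges(data=True))
        total = 0.0
        for subset in combinations(edges, len(nodes) // 2):
            covered = [x for (u, v, _) in subset for x in (u, v)]
            if len(set(covered)) == len(nodes):   # every node matched exactly once
                w = 1.0
                for (_, _, d) in subset:
                    w *= d.get(weight, 1.0)
                total += w
        return total

    G = nx.cycle_graph(4)            # two perfect matchings: {01,23} and {12,30}
    print(perfect_matching_sum(G))   # 2.0 with unit weights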
43 To see that each 2-regular loop in G is associated with a perfect matching in Gext consider, for instance, the vertex of degree three in the bottom of Figure 2. [sent-149, score-0.337]
44 Given a 2-regular loop C, vertex a can appear in four different configurations: either node a does not appear in C, or C contains one of the following three paths: -b-a-c-, -b-a-d- or -c-a-d-. [sent-150, score-0.297]
45 These four cases correspond to node terms in a loop with values 1, µa;{b,c}, µa;{b,d} and µa;{c,d} respectively, and coincide with the matchings in Gext shown within the box on the bottom-right. [sent-151, score-0.355]
46 Kasteleyn (1963) provided a method to compute this sum in polynomial time for planar graphs. [sent-155, score-0.313]
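The polynomial-time computation rests on the identity Pf(A)² = det(A) for a skew-symmetric matrix A built from the (suitably oriented) edge weights; a minimal numerical sketch of the magnitude only (the sign bookkeeping, discussed below, is the delicate part):

    import numpy as np

    def pfaffian_magnitude(A):
        """|Pf(A)| = sqrt(det(A)) for an even-dimensional skew-symmetric A."""
        A = np.asarray(A, dtype=float)
        n = A.shape[0]
        assert A.shape == (n, n) and n % 2 == 0
        assert np.allclose(A, -A.T), "A must be skew-symmetric"
        return np.sqrt(max(np.linalg.det(A), 0.0))

    # 4x4 check against the closed form Pf = a*f - b*e + c*d:
    a, b, c, d, e, f = 1.0, 2.0, 3.0, 4.0, 5.0, 6.0
    A = np.array([[0, a, b, c],
                  [-a, 0, d, e],
                  [-b, -d, 0, f],
                  [-c, -e, -f, 0]])
    print(pfaffian_magnitude(A), abs(a * f - b * e + c * d))   # both 8.0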
47 For the tractable case, namely the Ising model without external field or a pairwise MRF with pure interaction potentials, the 2-regular partition function coincides with the exact partition function, Z = Z0 = Z BP · z0, since other terms in the loop series vanish. [sent-175, score-0.552]
48 We briefly reproduce their results here and provide an algorithm for computing the full loop series as a Pfaffian series. [sent-186, score-0.258]
49 For each possible subset Ψ ⊆ T containing an even number of triplets, there exists a one-to-one correspondence between loops in G that include the triplets in Ψ and perfect matchings in another extended graph GextΨ constructed after removal of the triplets Ψ from G. [sent-188, score-0.53]
50 The correction sign(Pfaffian(B̂Ψ)) is necessary because loop terms can be negative, and evaluating Pfaffian(ÂΨ) alone would only give the contribution up to a sign. [sent-191, score-0.298]
51 However, the key advantage of the Pfaffian representation is that Z0 is always the term that accounts for the largest number of loop terms in the series. [sent-206, score-0.238]
52 The main loop of the algorithm can be interrupted at any time, thus leading to a sequence of algorithms producing corrections incrementally. [sent-211, score-0.243]
53 Generally, for Θ = 0 the planar problem is tractable (zero field). [sent-218, score-0.331]
54 We omit the loop index in the triplet term µa;ā because nodes have degree at most three and therefore the set ā coincides in all loops which contain that triplet. [sent-220, score-0.503]
55 In the next Subsection we analyze the full Pfaffian series using a small example and compare it with the original representation based on the loop series. [sent-230, score-0.258]
56 TLSBP (Gómez et al., 2007), which accounts for a certain number of loops by performing depth-first search on the factor graph and then merging the found loops iteratively. [sent-232, score-0.478]
57 We adapted TLSBP as an any-time algorithm (anyTLSBP) in which the loop length is used as the only parameter, instead of the two parameters S and M (see Gómez et al., 2007). [sent-233, score-0.27]
58 We use as outer clusters all (maximal) factors together with loops of four (k=4) or six (k=6) variables in the factor graph. [sent-241, score-0.211]
59 Planar graph decomposition (PDC) (Globerson and Jaakkola, 2007), which decomposes the parameterization of a probabilistic graphical model as a mixture of tractable planar graphs (with zero local field). [sent-247, score-0.516]
60 A complex loop is defined as a loop which cannot be expressed as the union of two or more circuits or simple loops. [sent-251, score-0.436]
61 As shown in Gómez et al. (2007), errors in Z obtained from a truncated form of the loop series are very similar to errors in single-variable marginal probabilities, which can be obtained by conditioning on the variables of interest. [sent-254, score-0.268]
62 3.1 Full Pfaffian Series. In the previous Section we described two equivalent representations for Z in terms of the loop series and the Pfaffian series. [sent-263, score-0.258]
63 Figure 3: Planar bipartite factor graph used to compare the full Pfaffian series with the loop series. [sent-270, score-0.419]
64 Figure 4: Comparison between the full loop series and the full Pfaffian series. [sent-275, score-0.258]
65 The left column shows the error when considering loop terms, Z TLSBP(l), in log-log scale. [sent-277, score-0.218]
66 Shaded regions include all loop terms (not necessarily 2-regular loops) required to achieve the same accuracy as, or better than, the 2-regular partition function Z0. [sent-278, score-0.305]
67 Then Z TLSBP(l) accounts for the l most contributing loops and Z Pf(p) accounts for the p most contributing Pfaffian terms. [sent-291, score-0.256]
68 Summation of fewer than 50 loop terms (top-left panel) is enough to obtain machine-precision accuracy. [sent-296, score-0.218]
69 The number of loop corrections required to correct the BP error then increases. [sent-300, score-0.243]
70 For intermediate couplings (β = 0.5), the first Pfaffian term z0 improves considerably, by more than five orders of magnitude, on the BP error, whereas approximately 100 loop terms are required to achieve a similar correction (gray region of middle-left panel). [sent-302, score-0.28]
71 In this scenario, a larger proportion of loop terms (bottom-left panel) is also necessary to correct the BP result up to machine precision. [sent-305, score-0.218]
72 Looking at the bottom-left panel, we find that approximately 200 loop terms are required to achieve the same correction as the one obtained by Z0, which is quite accurate (bottom-middle panel). [sent-306, score-0.303]
73 These oscillations are also present in the loop series expansion. [sent-308, score-0.258]
74 Notice again that calculating Z0, the most contributing term in the Pfaffian series, does not require explicit search for loops or Pfaffian terms. [sent-310, score-0.237]
75 Sudderth et al. (2007) showed that, under some additional conditions, the BP approximation gives a lower bound on the exact partition function and all loops (and therefore Pfaffian terms too) have the same sign. [sent-318, score-0.288]
76 We generate grids with positive interactions and local fields, that is, |φa| ∼ N(0, β/2) and |θab| ∼ N(0, βΘ), and study the performance of the algorithms for various values of β, as well as for strong (Θ = 1) and weak (Θ = 0.1) local fields. [sent-320, score-0.233]
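A sketch of this instance generation, with the variable roles read off the sentence above (treating the second argument of N(·,·) as the standard deviation is our assumption):

    import numpy as np

    def random_grid_instance(n, beta, theta, seed=0):
        """n x n Ising grid: |interactions| ~ N(0, beta/2) and
        |local fields| ~ N(0, beta*theta), taken positive (attractive case)."""
        rng = np.random.default_rng(seed)
        horiz = np.abs(rng.normal(0.0, beta / 2, size=(n, n - 1)))
        vert = np.abs(rng.normal(0.0, beta / 2, size=(n - 1, n)))
        fields = np.abs(rng.normal(0.0, beta * theta, size=(n, n)))
        return horiz, vert, fields

    horiz, vert, fields = random_grid_instance(n=10, beta=0.5, theta=1.0)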
77 In that case, Z BP and Z0 are no longer lower bounds on Z, and loop terms can be positive or negative. [sent-354, score-0.218]
78 A similar behaviour was reported in Gómez et al. (2007) when loop terms, instead of Pfaffian terms, were considered. [sent-367, score-0.218]
79 We also compare Z0 with anyTLSBP, a variant of our previous algorithm for truncating the loop series. [sent-377, score-0.218]
80 Interestingly, although Z0 and anyTLSBP use different ways to truncate the loop series, both methods show similar scaling behaviour for large graphs. [sent-398, score-0.218]
81 Here, we continue the analysis of the Z0 correction using planar graphs that consist of concentric polygons with a variable number of sides. [sent-413, score-0.436]
82 We generate them as factor graphs with pairwise interactions which we subsequently convert to an equivalent Forney graph using the procedure described in Appendix A. [sent-415, score-0.287]
83 Discussion. We have presented an approximate algorithm based on the exact loop calculus framework for inference on planar graphical models defined in terms of binary variables. [sent-443, score-0.675]
84 The algorithm is illustrated on the example of the ordered and disordered Ising model on a planar graph. [sent-445, score-0.313]
85 We tested our algorithm on regular grids and planar graphs with different structures. [sent-448, score-0.45]
86 We have also shown that for the case of grids with attractive symmetric interactions and positive local fields, the lower bound provided by Z0 is the most accurate. [sent-454, score-0.205]
87 Finally, we have presented a comparison of Z0 with TLSBP, which is another algorithm based on the loop series expansion for BP that uses the loop length as the truncation parameter. [sent-458, score-0.495]
88 On one hand, the calculation of Z0 involves a re-summation of many loop terms which in the case of TLSBP are summed individually. [sent-459, score-0.218]
89 On the other hand, Z0 is restricted to the class of 2-regular loops, whereas TLSBP also accounts for terms corresponding to more complex loop structures in which nodes can have degree larger than two. [sent-461, score-0.5]
90 Overall, for planar graphs, we have shown evidence that the Z0 approach is better than TLSBP when the size of the graphs is not very small. [sent-462, score-0.374]
91 Although planarity is a severe restriction, we emphasize that planar graphs appear in many contexts such as computer vision and image processing, magnetic and optical recording, or network routing and logistics. [sent-464, score-0.374]
92 We have focused on inference problems defined on planar graphs with symmetric pairwise interactions and, to make the problems difficult, we have introduced local field potentials. [sent-465, score-0.557]
93 Summarizing, among the compared methods and models, the introduced approach based on Z0 represents the best estimate of the partition function as long as the given graph does not deviate too much from perfect planarity, that is, as long as we are in the small local field regime. [sent-469, score-0.25]
94 Appendix A. Bipartite Factor Graph Representation. The Forney graph representation is convenient because each loop decomposes naturally into terms associated with nodes, which have the same analytical form as Equation (4). [sent-488, score-0.339]
95 In this Appendix we first show how the presented approach differs when it is directly applied to a bipartite factor graph and second, how to convert a bipartite factor graph to a Forney graph. [sent-490, score-0.322]
96 On a bipartite factor graph Gfg := (Vfg, Efg), the set Vfg consists of variable nodes i ∈ I and factor nodes a ∈ A, with an edge between i and a iff i ∈ a, that is, i appears in the factor function ψa. [sent-492, score-0.356]
97 Clearly, the one-to-one correspondence from 2-regular loops in G to perfect matchings in Gext is independent of whether G is expressed as a bipartite factor graph or a Forney graph. [sent-502, score-0.484]
98 We show here how to convert a bipartite factor graph defined in terms of binary variables to a more general Forney graph representation, to which the presented algorithm can be directly applied. [sent-507, score-0.23]
99 Given Gfg := (Vfg, Efg), a direct way to obtain an equivalent Forney graph G := (V, E) is: first, to create a node δi ∈ V for each variable node i ∈ Vfg, and second, to associate a new binary variable σδi,a taking values in {±1} with each edge (δi, a) ∈ E. [sent-516, score-0.214]
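A structural sketch of this conversion (graph skeleton only; potentials are omitted and all names are illustrative):

    import networkx as nx

    def factor_graph_to_forney(variables, factors):
        """Each variable i becomes an equality node delta_i; each factor stays a
        node; each factor-graph edge (i, a) becomes a Forney edge carrying a
        fresh binary variable sigma_{delta_i, a} in {+1, -1}."""
        G = nx.Graph()
        for i in variables:
            G.add_node(("delta", i))
        for a, scope in factors.items():
            G.add_node(("factor", a))
            for i in scope:
                G.add_edge(("delta", i), ("factor", a), var=("sigma", i, a))
        return G

    # Tiny example: two pairwise factors over three variables.
    G = factor_graph_to_forney([0, 1, 2], {"a": [0, 1], "b": [1, 2]})
    print(G.number_of_nodes(), G.number_of_edges())   # 5 4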
100 Truncating the loop series expansion for belief propagation. [sent-604, score-0.306]
wordName wordTfidf (topN-words)
[('bp', 0.456), ('pfaf', 0.423), ('planar', 0.313), ('gext', 0.285), ('ab', 0.234), ('loop', 0.218), ('loops', 0.178), ('chertkov', 0.155), ('forney', 0.146), ('appen', 0.098), ('hertkov', 0.098), ('omez', 0.098), ('interactions', 0.092), ('alculus', 0.09), ('anytlsbp', 0.09), ('lanar', 0.09), ('oop', 0.09), ('treeep', 0.09), ('partition', 0.087), ('matchings', 0.081), ('tlsbp', 0.073), ('graph', 0.069), ('raphs', 0.069), ('triplets', 0.069), ('calculus', 0.067), ('perfect', 0.064), ('pproximate', 0.064), ('correction', 0.062), ('graphs', 0.061), ('ising', 0.061), ('mi', 0.059), ('bipartite', 0.059), ('grids', 0.058), ('pdc', 0.057), ('node', 0.056), ('nference', 0.056), ('fa', 0.056), ('elds', 0.055), ('mez', 0.052), ('nodes', 0.052), ('xa', 0.051), ('cvm', 0.05), ('propagation', 0.05), ('chernyak', 0.049), ('pab', 0.049), ('belief', 0.048), ('external', 0.047), ('marginals', 0.045), ('dimer', 0.041), ('egext', 0.041), ('ftk', 0.041), ('mab', 0.041), ('series', 0.04), ('trw', 0.038), ('ac', 0.035), ('couplings', 0.035), ('mooij', 0.035), ('weak', 0.034), ('factor', 0.033), ('edges', 0.033), ('ans', 0.033), ('lsbp', 0.033), ('degree', 0.032), ('pairwise', 0.032), ('globerson', 0.03), ('local', 0.03), ('inference', 0.029), ('di', 0.028), ('eld', 0.027), ('bethe', 0.027), ('yedidia', 0.027), ('errors', 0.025), ('edge', 0.025), ('kappen', 0.025), ('corrections', 0.025), ('junction', 0.025), ('attractive', 0.025), ('graphical', 0.025), ('kasteleyn', 0.024), ('lanl', 0.024), ('sudderth', 0.024), ('vgext', 0.024), ('exact', 0.023), ('panel', 0.023), ('triplet', 0.023), ('vertex', 0.023), ('mechanics', 0.022), ('cpu', 0.021), ('accounts', 0.02), ('grid', 0.02), ('schraudolph', 0.019), ('truncation', 0.019), ('jaakkola', 0.019), ('strong', 0.019), ('physics', 0.019), ('contributing', 0.019), ('rc', 0.018), ('sign', 0.018), ('tractable', 0.018), ('regular', 0.018)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999994 13 jmlr-2010-Approximate Inference on Planar Graphs using Loop Calculus and Belief Propagation
Author: Vicenç Gómez, Hilbert J. Kappen, Michael Chertkov
Abstract: We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006a) allows one to express the exact partition function of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in Chertkov et al. (2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyze the performance of the algorithm for models with binary variables and pairwise interactions on grids and other planar graphs. We study in detail both the loop series and the equivalent Pfaffian series and show that the first term of the Pfaffian series for the general, intractable planar model can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state-of-the-art methods for approximate inference. Keywords: belief propagation, loop calculus, approximate inference, partition function, planar graphs, Ising model
Author: Joris M. Mooij
Abstract: This paper describes the software package libDAI, a free & open source C++ library that provides implementations of various exact and approximate inference methods for graphical models with discrete-valued variables. libDAI supports directed graphical models (Bayesian networks) as well as undirected ones (Markov random fields and factor graphs). It offers various approximations of the partition sum, marginal probability distributions and maximum probability states. Parameter learning is also supported. A feature comparison with other open source software packages for approximate inference is given. libDAI is licensed under the GPL v2+ license and is available at http://www.libdai.org. Keywords: probabilistic graphical models, approximate inference, open source software, factor graphs, Markov random fields, Bayesian networks
3 0.065536067 39 jmlr-2010-FastInf: An Efficient Approximate Inference Library
Author: Ariel Jaimovich, Ofer Meshi, Ian McGraw, Gal Elidan
Abstract: The FastInf C++ library is designed to perform memory and time efficient approximate inference in large-scale discrete undirected graphical models. The focus of the library is propagation based approximate inference methods, ranging from the basic loopy belief propagation algorithm to propagation based on convex free energies. Various message scheduling schemes that improve on the standard synchronous or asynchronous approaches are included. Also implemented are a clique tree based exact inference, Gibbs sampling, and the mean field algorithm. In addition to inference, FastInf provides parameter estimation capabilities as well as representation and learning of shared parameters. It offers a rich interface that facilitates extension of the basic classes to other inference and learning methods. Keywords: graphical models, Markov random field, loopy belief propagation, approximate inference
4 0.057908069 58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks
Author: Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, Zoubin Ghahramani
Abstract: How can we generate realistic networks? In addition, how can we do so with a mathematically tractable model that allows for rigorous analysis of network properties? Real networks exhibit a long list of surprising properties: heavy tails for the in- and out-degree distribution, heavy tails for the eigenvalues and eigenvectors, small diameters, and densification and shrinking diameters over time. Current network models and generators either fail to match several of the above properties, are complicated to analyze mathematically, or both. Here we propose a generative model for networks that is both mathematically tractable and can generate networks that have all the above mentioned structural properties. Our main idea here is to use a non-standard matrix operation, the Kronecker product, to generate graphs which we refer to as “Kronecker graphs”. First, we show that Kronecker graphs naturally obey common network properties. In fact, we rigorously prove that they do so. We also provide empirical evidence showing that Kronecker graphs can effectively model the structure of real networks. We then present KronFit, a fast and scalable algorithm for fitting the Kronecker graph generation model to large real networks. A naive approach to fitting would take super-exponential time. In contrast, KronFit takes linear time, by exploiting the structure of Kronecker matrix multiplication and by using statistical simulation techniques. Experiments on a wide range of large real and synthetic networks show that KronFit finds accurate parameters that very well mimic the properties of target networks. In fact, using just four parameters we can accurately model several aspects of global network structure. Once fitted, the model parameters can be used to gain insights about the network structure, and the resulting synt
5 0.050461952 44 jmlr-2010-Graph Kernels
Author: S.V.N. Vishwanathan, Nicol N. Schraudolph, Risi Kondor, Karsten M. Borgwardt
Abstract: We present a unified framework to study graph kernels, special cases of which include the random walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004; Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time complexity of kernel computation between unlabeled graphs with n vertices from O(n^6) to O(n^3). We find a spectral decomposition approach even more efficient when computing entire kernel matrices. For labeled graphs we develop conjugate gradient and fixed-point methods that take O(dn^3) time per iteration, where d is the size of the label set. By extending the necessary linear algebra to Reproducing Kernel Hilbert Spaces (RKHS) we obtain the same result for d-dimensional edge kernels, and O(n^4) in the infinite-dimensional case; on sparse graphs these algorithms only take O(n^2) time per iteration in all cases. Experiments on graphs from bioinformatics and other application domains show that these techniques can speed up computation of the kernel by an order of magnitude or more. We also show that certain rational kernels (Cortes et al., 2002, 2003, 2004) when specialized to graphs reduce to our random walk graph kernel. Finally, we relate our framework to R-convolution kernels (Haussler, 1999) and provide a kernel that is close to the optimal assignment kernel of Fröhlich et al. (2006) yet provably positive semi-definite. Keywords: linear algebra in RKHS, Sylvester equations, spectral decomposition, bioinformatics, rational kernels, transducers, semirings, random walks
6 0.03978223 24 jmlr-2010-Collective Inference for Extraction MRFs Coupled with Symmetric Clique Potentials
7 0.034814339 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks
8 0.031911481 32 jmlr-2010-Efficient Algorithms for Conditional Independence Inference
9 0.030609172 15 jmlr-2010-Approximate Tree Kernels
10 0.030493885 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond
11 0.028836748 69 jmlr-2010-Lp-Nested Symmetric Distributions
12 0.028601207 26 jmlr-2010-Consensus-Based Distributed Support Vector Machines
13 0.028008088 115 jmlr-2010-Using Contextual Representations to Efficiently Learn Context-Free Languages
14 0.02698227 63 jmlr-2010-Learning Instance-Specific Predictive Models
15 0.026752187 76 jmlr-2010-Message-passing for Graph-structured Linear Programs: Proximal Methods and Rounding Schemes
16 0.025865726 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning
17 0.025645256 7 jmlr-2010-A Streaming Parallel Decision Tree Algorithm
18 0.025465565 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks
19 0.024645763 109 jmlr-2010-Stochastic Composite Likelihood
20 0.024109278 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
topicId topicWeight
[(0, -0.111), (1, 0.042), (2, -0.081), (3, -0.071), (4, -0.005), (5, 0.016), (6, -0.048), (7, -0.11), (8, -0.156), (9, 0.322), (10, -0.078), (11, -0.014), (12, 0.061), (13, 0.103), (14, 0.062), (15, 0.02), (16, -0.016), (17, -0.047), (18, -0.188), (19, 0.091), (20, 0.119), (21, -0.082), (22, -0.074), (23, 0.144), (24, 0.034), (25, -0.023), (26, 0.029), (27, -0.058), (28, -0.063), (29, 0.064), (30, -0.093), (31, -0.036), (32, 0.024), (33, 0.07), (34, 0.041), (35, 0.089), (36, 0.017), (37, -0.074), (38, 0.14), (39, -0.04), (40, 0.011), (41, 0.029), (42, -0.092), (43, 0.063), (44, 0.057), (45, -0.061), (46, 0.025), (47, -0.167), (48, 0.063), (49, 0.042)]
simIndex simValue paperId paperTitle
same-paper 1 0.95220417 13 jmlr-2010-Approximate Inference on Planar Graphs using Loop Calculus and Belief Propagation
Author: Vicenç Gómez, Hilbert J. Kappen, Michael Chertkov
Abstract: We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006a) allows one to express the exact partition function of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in Chertkov et al. (2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyze the performance of the algorithm for models with binary variables and pairwise interactions on grids and other planar graphs. We study in detail both the loop series and the equivalent Pfaffian series and show that the first term of the Pfaffian series for the general, intractable planar model can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state-of-the-art methods for approximate inference. Keywords: belief propagation, loop calculus, approximate inference, partition function, planar graphs, Ising model
2 0.78555745 118 jmlr-2010-libDAI: A Free and Open Source C++ Library for Discrete Approximate Inference in Graphical Models
Author: Joris M. Mooij
Abstract: This paper describes the software package libDAI, a free & open source C++ library that provides implementations of various exact and approximate inference methods for graphical models with discrete-valued variables. libDAI supports directed graphical models (Bayesian networks) as well as undirected ones (Markov random fields and factor graphs). It offers various approximations of the partition sum, marginal probability distributions and maximum probability states. Parameter learning is also supported. A feature comparison with other open source software packages for approximate inference is given. libDAI is licensed under the GPL v2+ license and is available at http://www.libdai.org. Keywords: probabilistic graphical models, approximate inference, open source software, factor graphs, Markov random fields, Bayesian networks
3 0.60772085 39 jmlr-2010-FastInf: An Efficient Approximate Inference Library
Author: Ariel Jaimovich, Ofer Meshi, Ian McGraw, Gal Elidan
Abstract: The FastInf C++ library is designed to perform memory and time efficient approximate inference in large-scale discrete undirected graphical models. The focus of the library is propagation based approximate inference methods, ranging from the basic loopy belief propagation algorithm to propagation based on convex free energies. Various message scheduling schemes that improve on the standard synchronous or asynchronous approaches are included. Also implemented are a clique tree based exact inference, Gibbs sampling, and the mean field algorithm. In addition to inference, FastInf provides parameter estimation capabilities as well as representation and learning of shared parameters. It offers a rich interface that facilitates extension of the basic classes to other inference and learning methods. Keywords: graphical models, Markov random field, loopy belief propagation, approximate inference
4 0.38617525 58 jmlr-2010-Kronecker Graphs: An Approach to Modeling Networks
Author: Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, Zoubin Ghahramani
Abstract: How can we generate realistic networks? In addition, how can we do so with a mathematically tractable model that allows for rigorous analysis of network properties? Real networks exhibit a long list of surprising properties: heavy tails for the in- and out-degree distribution, heavy tails for the eigenvalues and eigenvectors, small diameters, and densification and shrinking diameters over time. Current network models and generators either fail to match several of the above properties, are complicated to analyze mathematically, or both. Here we propose a generative model for networks that is both mathematically tractable and can generate networks that have all the above mentioned structural properties. Our main idea here is to use a non-standard matrix operation, the Kronecker product, to generate graphs which we refer to as “Kronecker graphs”. First, we show that Kronecker graphs naturally obey common network properties. In fact, we rigorously prove that they do so. We also provide empirical evidence showing that Kronecker graphs can effectively model the structure of real networks. We then present KronFit, a fast and scalable algorithm for fitting the Kronecker graph generation model to large real networks. A naive approach to fitting would take super-exponential time. In contrast, KronFit takes linear time, by exploiting the structure of Kronecker matrix multiplication and by using statistical simulation techniques. Experiments on a wide range of large real and synthetic networks show that KronFit finds accurate parameters that very well mimic the properties of target networks. In fact, using just four parameters we can accurately model several aspects of global network structure. Once fitted, the model parameters can be used to gain insights about the network structure, and the resulting synt
5 0.2778933 44 jmlr-2010-Graph Kernels
Author: S.V.N. Vishwanathan, Nicol N. Schraudolph, Risi Kondor, Karsten M. Borgwardt
Abstract: We present a unified framework to study graph kernels, special cases of which include the random walk (Gärtner et al., 2003; Borgwardt et al., 2005) and marginalized (Kashima et al., 2003, 2004; Mahé et al., 2004) graph kernels. Through reduction to a Sylvester equation we improve the time complexity of kernel computation between unlabeled graphs with n vertices from O(n^6) to O(n^3). We find a spectral decomposition approach even more efficient when computing entire kernel matrices. For labeled graphs we develop conjugate gradient and fixed-point methods that take O(dn^3) time per iteration, where d is the size of the label set. By extending the necessary linear algebra to Reproducing Kernel Hilbert Spaces (RKHS) we obtain the same result for d-dimensional edge kernels, and O(n^4) in the infinite-dimensional case; on sparse graphs these algorithms only take O(n^2) time per iteration in all cases. Experiments on graphs from bioinformatics and other application domains show that these techniques can speed up computation of the kernel by an order of magnitude or more. We also show that certain rational kernels (Cortes et al., 2002, 2003, 2004) when specialized to graphs reduce to our random walk graph kernel. Finally, we relate our framework to R-convolution kernels (Haussler, 1999) and provide a kernel that is close to the optimal assignment kernel of Fröhlich et al. (2006) yet provably positive semi-definite. Keywords: linear algebra in RKHS, Sylvester equations, spectral decomposition, bioinformatics, rational kernels, transducers, semirings, random walks
6 0.20656753 26 jmlr-2010-Consensus-Based Distributed Support Vector Machines
7 0.184156 24 jmlr-2010-Collective Inference for Extraction MRFs Coupled with Symmetric Clique Potentials
8 0.17875101 109 jmlr-2010-Stochastic Composite Likelihood
9 0.17069434 69 jmlr-2010-Lp-Nested Symmetric Distributions
10 0.16388322 38 jmlr-2010-Expectation Truncation and the Benefits of Preselection In Training Generative Models
11 0.16038147 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing
12 0.15807872 52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning
13 0.1566602 115 jmlr-2010-Using Contextual Representations to Efficiently Learn Context-Free Languages
14 0.14927433 77 jmlr-2010-Model-based Boosting 2.0
15 0.13558304 32 jmlr-2010-Efficient Algorithms for Conditional Independence Inference
16 0.13272589 64 jmlr-2010-Learning Non-Stationary Dynamic Bayesian Networks
17 0.13192801 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond
18 0.12580918 63 jmlr-2010-Learning Instance-Specific Predictive Models
19 0.12388044 75 jmlr-2010-Mean Field Variational Approximation for Continuous-Time Bayesian Networks
20 0.12140659 53 jmlr-2010-Inducing Tree-Substitution Grammars
topicId topicWeight
[(3, 0.013), (4, 0.027), (8, 0.013), (21, 0.018), (24, 0.014), (32, 0.055), (36, 0.049), (37, 0.034), (56, 0.018), (66, 0.441), (75, 0.114), (81, 0.014), (85, 0.051), (96, 0.045)]
simIndex simValue paperId paperTitle
same-paper 1 0.65909457 13 jmlr-2010-Approximate Inference on Planar Graphs using Loop Calculus and Belief Propagation
Author: Vicenç Gómez, Hilbert J. Kappen, Michael Chertkov
Abstract: We introduce novel results for approximate inference on planar graphical models using the loop calculus framework. The loop calculus (Chertkov and Chernyak, 2006a) allows one to express the exact partition function of a graphical model as a finite sum of terms that can be evaluated once the belief propagation (BP) solution is known. In general, full summation over all correction terms is intractable. We develop an algorithm for the approach presented in Chertkov et al. (2008) which represents an efficient truncation scheme on planar graphs and a new representation of the series in terms of Pfaffians of matrices. We analyze the performance of the algorithm for models with binary variables and pairwise interactions on grids and other planar graphs. We study in detail both the loop series and the equivalent Pfaffian series and show that the first term of the Pfaffian series for the general, intractable planar model can provide very accurate approximations. The algorithm outperforms previous truncation schemes of the loop series and is competitive with other state-of-the-art methods for approximate inference. Keywords: belief propagation, loop calculus, approximate inference, partition function, planar graphs, Ising model
2 0.46834984 103 jmlr-2010-Sparse Semi-supervised Learning Using Conjugate Functions
Author: Shiliang Sun, John Shawe-Taylor
Abstract: In this paper, we propose a general framework for sparse semi-supervised learning, which concerns using a small portion of unlabeled data and a few labeled data to represent target functions and thus has the merit of accelerating function evaluations when predicting the output of a new example. This framework makes use of Fenchel-Legendre conjugates to rewrite a convex insensitive loss involving a regularization with unlabeled data, and is applicable to a family of semi-supervised learning methods such as multi-view co-regularized least squares and single-view Laplacian support vector machines (SVMs). As an instantiation of this framework, we propose sparse multi-view SVMs which use a squared ε-insensitive loss. The resultant optimization is an inf-sup problem and the optimal solutions have arguably saddle-point properties. We present a globally optimal iterative algorithm to optimize the problem. We give the margin bound on the generalization error of the sparse multi-view SVMs, and derive the empirical Rademacher complexity for the induced function class. Experiments on artificial and real-world data show their effectiveness. We further give a sequential training approach to show their possibility and potential for uses in large-scale problems and provide encouraging experimental results indicating the efficacy of the margin bound and empirical Rademacher complexity on characterizing the roles of unlabeled data for semi-supervised learning. Keywords: semi-supervised learning, Fenchel-Legendre conjugate, representer theorem, multiview regularization, support vector machine, statistical learning theory
3 0.32205322 89 jmlr-2010-PAC-Bayesian Analysis of Co-clustering and Beyond
Author: Yevgeny Seldin, Naftali Tishby
Abstract: We derive PAC-Bayesian generalization bounds for supervised and unsupervised learning models based on clustering, such as co-clustering, matrix tri-factorization, graphical models, graph clustering, and pairwise clustering.1 We begin with the analysis of co-clustering, which is a widely used approach to the analysis of data matrices. We distinguish among two tasks in matrix data analysis: discriminative prediction of the missing entries in data matrices and estimation of the joint probability distribution of row and column variables in co-occurrence matrices. We derive PAC-Bayesian generalization bounds for the expected out-of-sample performance of co-clustering-based solutions for these two tasks. The analysis yields regularization terms that were absent in the previous formulations of co-clustering. The bounds suggest that the expected performance of co-clustering is governed by a trade-off between its empirical performance and the mutual information preserved by the cluster variables on row and column IDs. We derive an iterative projection algorithm for finding a local optimum of this trade-off for discriminative prediction tasks. This algorithm achieved stateof-the-art performance in the MovieLens collaborative filtering task. Our co-clustering model can also be seen as matrix tri-factorization and the results provide generalization bounds, regularization terms, and new algorithms for this form of matrix factorization. The analysis of co-clustering is extended to tree-shaped graphical models, which can be used to analyze high dimensional tensors. According to the bounds, the generalization abilities of treeshaped graphical models depend on a trade-off between their empirical data fit and the mutual information that is propagated up the tree levels. We also formulate weighted graph clustering as a prediction problem: given a subset of edge weights we analyze the ability of graph clustering to predict the remaining edge weights. The analysis of co-clustering easily
4 0.31982011 91 jmlr-2010-Posterior Regularization for Structured Latent Variable Models
Author: Kuzman Ganchev, João Graça, Jennifer Gillenwater, Ben Taskar
Abstract: We present posterior regularization, a probabilistic framework for structured, weakly supervised learning. Our framework efficiently incorporates indirect supervision via constraints on posterior distributions of probabilistic models with latent variables. Posterior regularization separates model complexity from the complexity of structural constraints it is desired to satisfy. By directly imposing decomposable regularization on the posterior moments of latent variables during learning, we retain the computational efficiency of the unconstrained model while ensuring desired constraints hold in expectation. We present an efficient algorithm for learning with posterior regularization and illustrate its versatility on a diverse set of structural constraints such as bijectivity, symmetry and group sparsity in several large scale experiments, including multi-view learning, cross-lingual dependency grammar induction, unsupervised part-of-speech induction, and bitext word alignment.1 Keywords: posterior regularization framework, unsupervised learning, latent variables models, prior knowledge, natural language processing
Author: Pinar Donmez, Guy Lebanon, Krishnakumar Balasubramanian
Abstract: Estimating the error rates of classifiers or regression models is a fundamental task in machine learning which has thus far been studied exclusively using supervised learning techniques. We propose a novel unsupervised framework for estimating these error rates using only unlabeled data and mild assumptions. We prove consistency results for the framework and demonstrate its practical applicability on both synthetic and real world data. Keywords: classification and regression, maximum likelihood, latent variable models
6 0.31143004 74 jmlr-2010-Maximum Relative Margin and Data-Dependent Regularization
7 0.311369 46 jmlr-2010-High Dimensional Inverse Covariance Matrix Estimation via Linear Programming
8 0.30912328 63 jmlr-2010-Learning Instance-Specific Predictive Models
9 0.30894879 111 jmlr-2010-Topology Selection in Graphical Models of Autoregressive Processes
10 0.30848894 59 jmlr-2010-Large Scale Online Learning of Image Similarity Through Ranking
11 0.30805352 17 jmlr-2010-Bayesian Learning in Sparse Graphical Factor Models via Variational Mean-Field Annealing
12 0.30657673 109 jmlr-2010-Stochastic Composite Likelihood
13 0.30641234 92 jmlr-2010-Practical Approaches to Principal Component Analysis in the Presence of Missing Values
14 0.30619967 49 jmlr-2010-Hubs in Space: Popular Nearest Neighbors in High-Dimensional Data
15 0.30369565 69 jmlr-2010-Lp-Nested Symmetric Distributions
16 0.30278856 102 jmlr-2010-Semi-Supervised Novelty Detection
17 0.30272654 51 jmlr-2010-Importance Sampling for Continuous Time Bayesian Networks
19 0.30261132 87 jmlr-2010-Online Learning for Matrix Factorization and Sparse Coding
20 0.30113187 43 jmlr-2010-Generalized Power Method for Sparse Principal Component Analysis