jmlr jmlr2007 jmlr2007-86 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Vicenç Gómez, Joris M. Mooij, Hilbert J. Kappen
Abstract: Recently, Chertkov and Chernyak (2006b) derived an exact expression for the partition sum (normalization constant) corresponding to a graphical model, which is an expansion around the belief propagation (BP) solution. By adding correction terms to the BP free energy, one for each “generalized loop” in the factor graph, the exact partition sum is obtained. However, the usually enormous number of generalized loops generally prohibits summation over all correction terms. In this article we introduce truncated loop series BP (TLSBP), a particular way of truncating the loop series of Chertkov & Chernyak by considering generalized loops as compositions of simple loops. We analyze the performance of TLSBP in different scenarios, including the Ising model on square grids and regular random graphs, and on PROMEDAS, a large probabilistic medical diagnostic system. We show that TLSBP often improves upon the accuracy of the BP solution, at the expense of increased computation time. We also show that the performance of TLSBP strongly depends on the degree of interaction between the variables. For weak interactions, truncating the series leads to significant improvements, whereas for strong interactions it can be ineffective, even if a high number of terms is considered. Keywords: belief propagation, loop calculus, approximate inference, partition function, Ising grid, random regular graphs, medical diagnosis
Reference: text
sentIndex sentText sentNum sentScore
1 However, the usually enormous number of generalized loops generally prohibits summation over all correction terms. [sent-13, score-0.643]
2 In this article we introduce truncated loop series BP (TLSBP), a particular way of truncating the loop series of Chertkov & Chernyak by considering generalized loops as compositions of simple loops. [sent-14, score-1.179]
3 , 1999) is a popular inference method that yields exact marginal probabilities on graphs without loops and can yield surprisingly accurate results on graphs with loops. [sent-21, score-0.658]
4 The series consists of a sum over all so-called generalized loops in the graph. [sent-38, score-0.645]
5 Since the number of generalized loops in a graphical model easily exceeds the number of configurations of the model, one could argue that the method is of little practical value. [sent-40, score-0.641]
6 For instance, the junction-tree algorithm (Lauritzen and Spiegelhalter, 1988) transforms the original graph into a region tree such that the influence of all loops in the original graph is implicitly captured and the exact result is obtained. [sent-43, score-0.685]
7 In this work we propose TLSBP, an algorithm to compute generalized loops in a graph, which are then used for the approximate computation of the partition sum and the single-node marginals. [sent-52, score-0.715]
8 For large enough values of these parameters, all generalized loops present in a graph are retrieved and the exact result is obtained. [sent-54, score-0.684]
9 For cases where exhaustive computation of all loops is not feasible, the search can be pruned, and the result is a truncated approximation of the exact solution. [sent-56, score-0.633]
10 Throughout the paper we show evidence that for those cases where BP has difficulty converging, loop corrections are of little use, since loops of all lengths tend to have contributions of similar order of magnitude. [sent-60, score-0.888]
11 In Section 3 we provide a formal characterization of the different types of generalized loops that can be present in an arbitrary graph. [sent-63, score-0.626]
12 Note that the number of generalized loops in a finite graph is finite and, strictly speaking, the term series denotes an infinite sequence of terms. [sent-67, score-0.682]
13 Concerning grids and regular graphs, we show that the success of restricting the loop series expansion to a reduced quantity of loops depends on the type of interactions between the variables in the network. [sent-69, score-1.001]
14 For weak interactions, the largest correction terms come from the small elementary loops and therefore truncation of the series at some maximal loop length can be effective. [sent-70, score-0.89]
15 For strong interactions, loops of all lengths contribute significantly and truncation is of limited use. [sent-71, score-0.603]
16 We numerically show that when more loops are taken into account, the error of the partition sum decreases, and when all loops are taken into account, the method is correct up to machine precision. [sent-72, score-1.227]
17 Then Z_BP is related to the exact partition sum Z as: Z = Z_BP (1 + ∑_{C ∈ C} r(C)), with r(C) = ∏_{i ∈ C} µ_i(C) ∏_{α ∈ C} µ_α(C), (5) where the summation is over the set C of all generalized loops in the factor graph. [sent-115, score-0.720]
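Once BP has been run and a set of generalized loops has been collected, truncating Equation (5) amounts to summing the corresponding terms r(C). The sketch below, in Python, only illustrates how these quantities combine; the function names and the data layout (one pair of term lists per loop) are assumptions of this sketch, not the authors' implementation.

def loop_correction(mu_node_terms, mu_factor_terms):
    # r(C) for one generalized loop C: the product of its node terms mu_i(C)
    # and factor terms mu_alpha(C), as in Equation (5).
    r = 1.0
    for m in mu_node_terms:
        r *= m
    for m in mu_factor_terms:
        r *= m
    return r

def truncated_partition_sum(Z_bp, loops):
    # Z_TLSBP = Z_BP * (1 + sum_C r(C)), where the sum runs only over the
    # generalized loops actually found by the search (the truncation).
    # `loops` is an iterable of (mu_node_terms, mu_factor_terms) pairs.
    correction = 1.0 + sum(loop_correction(mn, mf) for mn, mf in loops)
    return Z_bp * correction

# With an empty loop set the estimate reduces to plain BP:
# truncated_partition_sum(Z_bp, []) == Z_bp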
18 Note that, although the series is finite, the number of generalized loops in the factor graph can be enormous and easily exceed the number of configurations 2^n. [sent-121, score-0.703]
19 On the other hand, restricting the sum in (5) to a subset of all generalized loops may capture the most important corrections and yield a significant improvement in comparison to the BP estimate. [sent-123, score-0.678]
20 Loop Characterization In this section we characterize different types of generalized loops that can be present in a graph. [sent-128, score-0.626]
21 Note that the union of two simple loops is never a simple loop except for the trivial case in which both loops are equal. [sent-137, score-1.397]
22 Any generalized loop can be categorized according to these three different categories: a simple loop can be neither a disconnected loop nor a complex loop. [sent-147, score-0.641]
23 On the other hand, since Definitions 3 and 4 are not mutually exclusive, a disconnected loop can be a complex loop and vice versa; there are also generalized loops which are neither disconnected nor complex, for instance the example in Figure 1b. [sent-148, score-1.309]
24 Usually, the smallest subset contains the simple loops, and the sets of disconnected loops and complex loops have a nonempty intersection. [sent-153, score-1.847]
25 There is another subset of all generalized loops which are neither simple, disconnected, nor complex. [sent-154, score-0.626]
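Whether a given generalized loop is simple or disconnected can be checked mechanically from the degrees and connectivity of its subgraph. The sketch below does this for loops represented as sets of undirected edges (sorted vertex pairs); the test for complex loops, which would require checking expressibility as a union of simple loops (Definition 4), is deliberately left out, and all names are illustrative.

from collections import defaultdict

def degrees_and_components(edges):
    # Degrees and number of connected components of the subgraph induced by
    # a set of undirected edges (each edge a sorted vertex pair).
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        stack = [start]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            stack.extend(adj[node] - seen)
    return {n: len(adj[n]) for n in adj}, components

def is_generalized_loop(edges):
    # Every vertex of the subgraph has at least two neighbours within it.
    deg, _ = degrees_and_components(edges)
    return bool(edges) and all(d >= 2 for d in deg.values())

def is_simple_loop(edges):
    # Connected and every vertex has exactly two neighbours.
    deg, comps = degrees_and_components(edges)
    return bool(edges) and comps == 1 and all(d == 2 for d in deg.values())

def is_disconnected_loop(edges):
    # A generalized loop made of more than one connected component (Definition 3).
    _, comps = degrees_and_components(edges)
    return is_generalized_loop(edges) and comps > 1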
26 Figure 1: Examples of generalized loops (panels a–f) in a factor graph with lattice structure. [sent-155, score-0.684]
27 Figure 2: Diagrammatic representation of the different types of generalized loops (simple, disconnected, and complex) present in any graph. [sent-165, score-0.751]
28 The Truncated Loop Series Algorithm In this section we describe the TLSBP algorithm to compute generalized loops in a factor graph, and use them to correct the BP solution. [sent-168, score-0.647]
29 The general idea is to search first for a subset of the simple loops and, after that, merge all of them iteratively until no new loops are produced. [sent-170, score-1.205]
30 Eventually, a large number of generalized loops, which presumably account for the major contributions in the loop series expansion, will be obtained. [sent-173, score-1.268]
31 We show that the algorithm is complete, or equivalently, that all generalized loops are obtained by the proposed approach when the constraints expressed by the two arguments are relaxed. [sent-174, score-0.626]
32 On the other hand, if a nonempty graph remains after this preprocessing, it will have loops and the BP solution can be improved using the proposed approach. [sent-182, score-0.611]
33 The result of this search will be the initial set of loops for the next step and will also provide a bound b which will be used to truncate the search for new generalized loops. [sent-185, score-0.656]
34 Once S simple loops have been found in order of increasing length, the length of the largest simple loop is used as the bound b for the remaining steps. [sent-189, score-0.824]
35 The third step of the algorithm consists of obtaining all non-simple loops that the set of S simple loops can “generate”. [sent-191, score-1.148]
36 According to Definition 4, complex loops cannot be expressed as a union of simple loops. [sent-192, score-0.622]
37 To develop a complete method, in the sense that all existing loops can be obtained, we define the operation merge loops, which extends the simple union in such a way that complex loops are retrieved as well. [sent-193, score-1.238]
38 Given two generalized loops, l1 , l2 , merge loops returns a set of generalized loops. [sent-194, score-0.72]
39 One can observe that for each disconnected loop, a set of complex loops can be generated by connecting two (or more) components of the disconnected loop. [sent-195, score-0.793]
40 In other words, complex loops can be expressed as the union of disjoint loops with a path connecting two vertices of different components. [sent-196, score-1.196]
41 Therefore the set computed by merge loops will have only one element l = {l1 ∪ l2 } if l1 ∪ l2 is not disconnected. [sent-197, score-0.616]
42 Otherwise, all the possible complex loops in which l1 ∪ l2 appears are included in the resulting set. [sent-198, score-0.605]
43 We use the following procedure to compute all complex loops associated with the disconnected loop l: we start at a vertex of a connected component of l and perform a depth-first search (DFS) until a vertex of a different component has been reached. [sent-199, score-0.95]
44 Moreover, given a disconnected loop, the number of associated complex loops can be enormous. [sent-205, score-0.699]
45 Second, the DFS search for complex loops is limited using b, so very large complex loops will not be retrieved. [sent-208, score-1.225]
46 However, restricting the DFS search for complex loops using the bound b could result in too deep searches. [sent-209, score-0.62]
47 The maximum depth of the DFS search for complex loops is d = b − 2L_s. [sent-211, score-0.62]
48 Then the computational complexity of the merge loops operation depends exponentially on d. [sent-212, score-0.616]
49 To overcome this problem we define another parameter M, the maximum depth of the DFS search in the merge loops operation. [sent-214, score-0.631]
50 For small values of M, the operation merge loops will be fast, but only a few (if any) complex loops will be obtained. [sent-215, score-1.221]
51 Conversely, for higher values of M the operation merge loops will find more complex loops at the cost of more time. [sent-216, score-1.221]
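Putting this description together, a merge loops operation could be organized roughly as follows: the plain union is returned when it is connected, and otherwise a DFS of depth at most M through the full graph adds paths connecting two different components, which is where complex loops come from. This is a simplified illustrative reconstruction, not the authors' code; in particular, it only connects one pair of components per added path.

from collections import defaultdict

def connected_components(edges):
    # Connected components (as vertex sets) of the subgraph given by `edges`.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    comps, seen = [], set()
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in comp:
                continue
            comp.add(node)
            stack.extend(adj[node] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def merge_loops(l1, l2, graph_adj, M):
    # Merge two loops given as sets of undirected edges (sorted vertex pairs).
    # If the union is connected it is the only result; otherwise a DFS of depth
    # at most M through the full graph (`graph_adj`: vertex -> neighbours) adds
    # paths connecting two different components, yielding complex loops.
    union = frozenset(l1) | frozenset(l2)
    comps = connected_components(union)
    if len(comps) <= 1:
        return [union]

    results = [union]                  # the disconnected union is itself a generalized loop
    first, others = comps[0], comps[1:]

    def dfs(node, path_edges, depth):
        if path_edges and any(node in comp for comp in others):
            results.append(union | path_edges)   # reached a different component
            return
        if depth == M:
            return
        for nxt in graph_adj[node]:
            edge = tuple(sorted((node, nxt)))
            if edge not in path_edges and edge not in union:
                dfs(nxt, path_edges | {edge}, depth + 1)

    for start in first:
        dfs(start, frozenset(), 0)
    return list(set(results))          # drop duplicates found from different DFS starts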
52 Once the problem of expressing all generalized loops as compositions of simple loops has been solved using the merge loops operation, we need to define an efficient procedure to merge them. [sent-219, score-1.858]
53 Nevertheless, we can avoid redundant combinations by merging pairs of loops iteratively: in a first iteration, all pairs of simple loops are merged, which produces new generalized loops. [sent-228, score-1.216]
54 In the next iteration i, instead of performing all possible mergings again, only the new generalized loops obtained in iteration i − 1 are merged with the initial set of S simple loops. [sent-229, score-0.626]
55 Summarizing, the third step applies the merge loops operation iteratively until no new generalized loops are obtained. [sent-232, score-1.242]
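A sketch of this third step as a fixed-point iteration is given below: only the loops produced in the previous iteration are merged with the initial simple loops, and anything longer than the bound b is discarded. The merge_loops argument stands for an operation like the one sketched earlier, and the loop "length" is taken to be its number of edges, which is an assumption of this sketch.

def tlsbp_loops(simple_loops, graph_adj, b, M, merge_loops):
    # Step 3 of TLSBP: iteratively merge loops until no new generalized loops
    # appear. `simple_loops` are the S simple loops found in order of
    # increasing length, `b` is the length bound (length of the largest simple
    # loop found), and `M` is the DFS depth used inside merge_loops.
    all_loops = {frozenset(l) for l in simple_loops}
    new_loops = set(all_loops)
    while new_loops:
        produced = set()
        for loop in new_loops:                  # only loops from the previous iteration...
            for simple in simple_loops:         # ...are merged with the initial simple loops
                for merged in merge_loops(loop, frozenset(simple), graph_adj, M):
                    # "length" taken here as the number of edges (an assumption)
                    if len(merged) <= b and merged not in all_loops:
                        produced.add(merged)
        all_loops |= produced
        new_loops = produced
    return all_loops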
56 To show that this process produces all the generalized loops, we first assume that S is sufficiently large to account for all the simple loops in the graph, and that M is greater than or equal to the number of edges of the graph. [sent-236, score-1.2]
57 Then C is produced in iteration s, during the DFS for complex loops within the merging of one of the simple loops contained in C. [sent-241, score-1.195]
58 The obtained collection of loops can be used for the approximation of the single-node marginals as well, as described in Equation (9). [sent-242, score-0.643]
59 This requires running BP in each clamped network and reusing the set of loops, replacing with zero those terms in which the clamped variable appears. [sent-244, score-0.658]
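The clamping procedure can be organized as in the following sketch. The two helpers, clamp_and_run_bp and loop_term, are hypothetical placeholders for running BP on the clamped network and for evaluating r(C) with the terms involving the clamped variable set to zero; only the way the pieces combine is meant to mirror the description above.

def tlsbp_marginal(i, states, clamp_and_run_bp, loops, loop_term):
    # Approximate P(x_i) by clamping variable i to each of its states and
    # reusing the loop set found for the unclamped network.
    #
    # clamp_and_run_bp(i, s): runs BP with x_i clamped to state s and returns
    #                         the corresponding partition sum estimate Z_BP.
    # loop_term(C, i, s):     returns r(C) in the clamped network, or 0.0 when
    #                         variable i appears in loop C (those terms are dropped).
    # Both helpers are hypothetical placeholders.
    weights = []
    for s in states:
        Z_bp = clamp_and_run_bp(i, s)
        correction = 1.0 + sum(loop_term(C, i, s) for C in loops)
        weights.append(Z_bp * correction)
    total = sum(weights)
    return [w / total for w in weights]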
60 Note that generalized loops can be expressed as the composition of other loops in many different ways. [sent-248, score-1.2]
61 We then study the performance of the algorithm in a 10x10 grid, where complete enumeration of all generalized loops is not feasible. [sent-295, score-0.656]
62 It is the smallest size where complex loops are present. [sent-298, score-0.605]
63 At the same time, the problem is still tractable and exhaustive enumeration of all the loops can be done. [sent-299, score-0.619]
64 (right) Number of generalized loops as a function of the length using the factor graph representation. [sent-305, score-0.702]
65 Figure 3 (right) shows the histogram of all generalized loops for this small grid. [sent-307, score-0.626]
66 To analyze how the error changes as more loops are considered, it is useful to sort all the terms r(C) by their absolute value in descending order such that |r(C_i)| ≥ |r(C_{i+1})|. [sent-312, score-0.601]
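This ordering is simple to reproduce once the individual terms r(C) are available; the lines below, which are purely illustrative, sort them by magnitude and return the partial corrections 1 + Σ_{j≤i} r(C_j), from which error curves of the kind discussed next can be computed.

def sorted_partial_corrections(r_values):
    # Sort loop terms so that |r(C_i)| >= |r(C_{i+1})| and return, for each i,
    # the partial correction 1 + sum_{j <= i} r(C_j).
    ordered = sorted(r_values, key=abs, reverse=True)
    partials, total = [], 1.0
    for r in ordered:
        total += r
        partials.append(total)
    return ordered, partials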
67 1, first row) we can see that truncating the series after a small number of loops (around 10) is enough to achieve machine precision. [sent-332, score-0.621]
68 As the right column illustrates, loop contributions decrease exponentially with the size, and loops with the same length correspond to very similar contributions. [sent-334, score-0.84]
69 Larger loops give negligible contributions and can thus be ignored by truncating the series. [sent-335, score-0.618]
70 As interactions are strengthened, however, more loops have to be considered to achieve maximum accuracy, and contributions show more variability for a given length (see middle row). [sent-336, score-0.714]
71 Eventually, for large couplings (σ ≥ 2), where BP often fails to converge, loops of all lengths give significant contributions. [sent-338, score-0.639]
72 This occurs for very strong interactions only, and when a small fraction of the total number of loops is considered. [sent-341, score-0.695]
73 After analyzing a small grid, we now address the case of the 10x10 Ising grid, where exhaustive enumeration of all the loops is not computationally feasible. [sent-354, score-0.604]
74 For S = 10 and S = 100, only simple loops were obtained whereas for S = 1000, a total of 44590 generalized loops was used to compute the truncated partition sum. [sent-378, score-1.275]
75 Although in both types of interactions the BP error (solid line with dots) is progressively reduced as more loops are considered, the picture differs significantly between the two cases. [sent-381, score-0.721]
76 Consequently, improvements in the BP result are monotonic as more loops are considered, and in this case, the TLSBP estimate can be considered a lower bound on the exact solution. [sent-384, score-0.61]
77 This indicates that for intermediate and strong couplings one would need more than the selected 44590 generalized loops to improve on the CVM result. [sent-390, score-0.692]
78 Since the number of generalized loops grows very fast with the size of the grid, we choose increasing values of S as well. [sent-404, score-0.626]
79 This simple linear increment in S means that, as N is increased, the proportion of simple loops captured by TLSBP out of the total number of existing simple loops decreases. [sent-406, score-1.148]
80 For simplicity, M is fixed to zero, so no complex loops are considered. [sent-408, score-0.605]
81 Actually, for large instances the latter choice does not modify the final set of loops, since generalized loops which can only be expressed as compositions of three or more simple loops are pruned using the bound b. [sent-410, score-1.228]
82 One has to keep in mind that the number of loops obtained using the TLSBP algorithm grows much faster, but still remains much smaller than the total number of existing loops in the model. [sent-423, score-1.148]
83 For weak couplings (bottom left), the BP error is always decreased significantly for this choice of parameters and the improvement remains almost constant as N increases, meaning that, in this case, the number of loops which contribute most to the series expansion does not grow significantly with N. [sent-435, score-0.734]
84 Errors in the partition function for weak interactions (a), marginals for weak interactions (b), partition function for strong interactions (c), and marginals for strong interactions (d). [sent-482, score-0.756]
85 Consequently, the cost of the merging step grows fast, since many loops with length smaller than b are produced. [sent-500, score-0.608]
86 On the other hand, for d ≥ 7 simple loops have similar lengths and, therefore, fewer combinations result in additional loops with length larger than the bound b. [sent-501, score-1.18]
87 Without bounding the length of the loops in the merging step, we would expect the first scaling tendency (d < 7) also for values of d ≥ 7. [sent-502, score-0.608]
88 Note that the number of disease nodes in this graph can be large and hence loops can be present. [sent-537, score-0.652]
89 In this subsection, in addition to comparing errors of single-node marginals obtained using TLSBP against CVM, we also use another loop correction approach, loop-corrected belief propagation (LCBP) (Mooij and Kappen, 2007), which is based on the cavity method and also improves over BP estimates. [sent-538, score-0.675]
90 To analyze the TLSBP results in more depth we plot the ratio between the error obtained by TLSBP and the BP error versus the number of generalized loops found and the CPU time. [sent-567, score-0.68]
91 From Figure 12b we can deduce that cases where the BP error was most improved often correspond to graphs with a small number of generalized loops found, whereas instances with the highest number of loops considered show only minor improvements. [sent-568, score-1.277]
92 This is explained by the fact that some instances which contained a few loops were easy to solve and thus the BP error was significantly reduced. [sent-569, score-0.629]
93 The performance of the algorithm does not depend directly on the size of the problem, but on how loopy the original graph is, although for larger instances it is more likely that more loops are present. [sent-584, score-0.662]
94 For weak couplings, errors of BP are most prominently caused by small simple loops and truncating the series is very useful, whereas for strongly coupled variables, loop terms of different sizes tend to be of the same order of magnitude, and truncating the series can be useless. [sent-586, score-0.93]
95 One approach consisted of modifying the third step so that, instead of applying blind mergings, generalized loops with larger contributions (largest |r(C)|) were merged preferentially. [sent-605, score-0.642]
96 In practice, this approach tended to check all combinations of small loops which produced the same generalized loop, causing many redundant mergings. [sent-606, score-0.626]
97 Moreover, the cost of keeping the “best” generalized loops sorted caused a significant increase in the computational complexity. [sent-607, score-0.626]
98 In this context, the partition sum or marginals can be computed incrementally as more generalized loops are being produced. [sent-617, score-0.747]
99 Another way to extend the approach is to consider the search for loops as a compilation stage of the inference task, which can be done offline. [sent-620, score-0.608]
100 During the development of this work, another way of selecting generalized loops was proposed (Chertkov and Chernyak, 2006a) in the context of Low-Density Parity-Check codes. [sent-622, score-0.626]
wordName wordTfidf (topN-words)
[('loops', 0.574), ('tlsbp', 0.539), ('bp', 0.376), ('loop', 0.232), ('cvm', 0.187), ('interactions', 0.106), ('disconnected', 0.094), ('ising', 0.072), ('eries', 0.07), ('xpansion', 0.07), ('marginals', 0.069), ('appen', 0.064), ('omez', 0.064), ('ooij', 0.064), ('oop', 0.06), ('runcating', 0.06), ('coupling', 0.057), ('newloops', 0.055), ('partition', 0.052), ('generalized', 0.052), ('corrections', 0.052), ('chertkov', 0.051), ('couplings', 0.051), ('mooij', 0.051), ('corrected', 0.046), ('patient', 0.043), ('chernyak', 0.043), ('kappen', 0.043), ('lcbp', 0.043), ('dfs', 0.043), ('merge', 0.042), ('clamped', 0.042), ('belief', 0.041), ('ndings', 0.04), ('propagation', 0.038), ('graph', 0.037), ('cpu', 0.035), ('promedas', 0.034), ('expansion', 0.033), ('complex', 0.031), ('bethe', 0.03), ('mergings', 0.03), ('weak', 0.03), ('enumeration', 0.03), ('instances', 0.028), ('truncating', 0.028), ('error', 0.027), ('fbp', 0.025), ('zbp', 0.025), ('nodes', 0.024), ('truncated', 0.023), ('loopy', 0.023), ('yedidia', 0.023), ('diagnoses', 0.023), ('junction', 0.023), ('graphs', 0.022), ('grids', 0.022), ('factor', 0.021), ('strength', 0.021), ('grid', 0.021), ('exact', 0.021), ('oldloops', 0.02), ('sloops', 0.02), ('ztilsbp', 0.02), ('parametrized', 0.019), ('series', 0.019), ('clamping', 0.019), ('connected', 0.019), ('inference', 0.019), ('length', 0.018), ('medical', 0.017), ('bi', 0.017), ('beliefs', 0.017), ('correction', 0.017), ('disease', 0.017), ('union', 0.017), ('diagnosis', 0.017), ('contributions', 0.016), ('merging', 0.016), ('degree', 0.016), ('tree', 0.016), ('interaction', 0.016), ('tractable', 0.015), ('converged', 0.015), ('improvements', 0.015), ('graphical', 0.015), ('ferromagnetic', 0.015), ('strong', 0.015), ('search', 0.015), ('diagrammatic', 0.015), ('hash', 0.015), ('hbp', 0.015), ('lsbp', 0.015), ('ubp', 0.015), ('regular', 0.015), ('diagnostic', 0.014), ('lengths', 0.014), ('progressively', 0.014), ('mechanics', 0.014), ('heskes', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 0.9999997 86 jmlr-2007-Truncating the Loop Series Expansion for Belief Propagation
Author: Vicenç Gómez, Joris M. Mooij, Hilbert J. Kappen
Abstract: Recently, Chertkov and Chernyak (2006b) derived an exact expression for the partition sum (normalization constant) corresponding to a graphical model, which is an expansion around the belief propagation (BP) solution. By adding correction terms to the BP free energy, one for each “generalized loop” in the factor graph, the exact partition sum is obtained. However, the usually enormous number of generalized loops generally prohibits summation over all correction terms. In this article we introduce truncated loop series BP (TLSBP), a particular way of truncating the loop series of Chertkov & Chernyak by considering generalized loops as compositions of simple loops. We analyze the performance of TLSBP in different scenarios, including the Ising model on square grids and regular random graphs, and on PROMEDAS, a large probabilistic medical diagnostic system. We show that TLSBP often improves upon the accuracy of the BP solution, at the expense of increased computation time. We also show that the performance of TLSBP strongly depends on the degree of interaction between the variables. For weak interactions, truncating the series leads to significant improvements, whereas for strong interactions it can be ineffective, even if a high number of terms is considered. Keywords: belief propagation, loop calculus, approximate inference, partition function, Ising grid, random regular graphs, medical diagnosis
2 0.42689797 51 jmlr-2007-Loop Corrections for Approximate Inference on Factor Graphs
Author: Joris M. Mooij, Hilbert J. Kappen
Abstract: We propose a method to improve approximate inference methods by correcting for the influence of loops in the graphical model. The method is a generalization and alternative implementation of a recent idea from Montanari and Rizzo (2005). It is applicable to arbitrary factor graphs, provided that the size of the Markov blankets is not too large. It consists of two steps: (i) an approximate inference method, for example, belief propagation, is used to approximate cavity distributions for each variable (i.e., probability distributions on the Markov blanket of a variable for a modified graphical model in which the factors involving that variable have been removed); (ii) all cavity distributions are improved by a message-passing algorithm that cancels out approximation errors by imposing certain consistency constraints. This loop correction (LC) method usually gives significantly better results than the original, uncorrected, approximate inference algorithm that is used to estimate the effect of loops. Indeed, we often observe that the loop-corrected error is approximately the square of the error of the uncorrected approximate inference method. In this article, we compare different variants of the loop correction method with other approximate inference methods on a variety of graphical models, including “real world” networks, and conclude that the LC method generally obtains the most accurate results. Keywords: loop corrections, approximate inference, graphical models, factor graphs, belief propagation
3 0.15433276 21 jmlr-2007-Comments on the "Core Vector Machines: Fast SVM Training on Very Large Data Sets"
Author: Gaëlle Loosli, Stéphane Canu
Abstract: In a recently published paper in JMLR, Tsang et al. (2005) present an algorithm for SVM called Core Vector Machines (CVM) and illustrate its performances through comparisons with other SVM solvers. After reading the CVM paper we were surprised by some of the reported results. In order to clarify the matter, we decided to reproduce some of the experiments. It turns out that to some extent, our results contradict those reported. Reasons of these different behaviors are given through the analysis of the stopping criterion. Keywords: SVM, CVM, large scale, KKT gap, stopping condition, stopping criteria
4 0.062821127 28 jmlr-2007-Dynamic Conditional Random Fields: Factorized Probabilistic Models for Labeling and Segmenting Sequence Data
Author: Charles Sutton, Andrew McCallum, Khashayar Rohanimanesh
Abstract: In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges—a distributed state representation as in dynamic Bayesian networks (DBNs)—and parameters are tied across slices. Since exact inference can be intractable in such models, we perform approximate inference using several schedules for belief propagation, including tree-based reparameterization (TRP). On a natural-language chunking task, we show that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data. In addition to maximum conditional likelihood, we present two alternative approaches for training DCRFs: marginal likelihood training, for when we are primarily interested in predicting only a subset of the variables, and cascaded training, for when we have a distinct data set for each state variable, as in transfer learning. We evaluate marginal training and cascaded training on both synthetic data and real-world text data, finding that marginal training can improve accuracy when uncertainty exists over the latent variables, and that for transfer learning, a DCRF trained in a cascaded fashion performs better than a linear-chain CRF that predicts the final task directly. Keywords: conditional random fields, graphical models, sequence labeling
5 0.023974026 5 jmlr-2007-A Nonparametric Statistical Approach to Clustering via Mode Identification
Author: Jia Li, Surajit Ray, Bruce G. Lindsay
Abstract: A new clustering approach based on mode identification is developed by applying new optimization techniques to a nonparametric density estimator. A cluster is formed by those sample points that ascend to the same local maximum (mode) of the density function. The path from a point to its associated mode is efficiently solved by an EM-style algorithm, namely, the Modal EM (MEM). This method is then extended for hierarchical clustering by recursively locating modes of kernel density estimators with increasing bandwidths. Without model fitting, the mode-based clustering yields a density description for every cluster, a major advantage of mixture-model-based clustering. Moreover, it ensures that every cluster corresponds to a bump of the density. The issue of diagnosing clustering results is also investigated. Specifically, a pairwise separability measure for clusters is defined using the ridgeline between the density bumps of two clusters. The ridgeline is solved for by the Ridgeline EM (REM) algorithm, an extension of MEM. Based upon this new measure, a cluster merging procedure is created to enforce strong separation. Experiments on simulated and real data demonstrate that the mode-based clustering approach tends to combine the strengths of linkage and mixture-model-based clustering. In addition, the approach is robust in high dimensions and when clusters deviate substantially from Gaussian distributions. Both of these cases pose difficulty for parametric mixture modeling. A C package on the new algorithms is developed for public access at http://www.stat.psu.edu/∼jiali/hmac. Keywords: modal clustering, mode-based clustering, mixture modeling, modal EM, ridgeline EM, nonparametric density
6 0.023372835 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models
7 0.021780049 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study
8 0.021609038 62 jmlr-2007-On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning
9 0.019314742 11 jmlr-2007-Anytime Learning of Decision Trees
10 0.017715471 4 jmlr-2007-A New Probabilistic Approach in Rank Regression with Optimal Bayesian Partitioning (Special Topic on Model Selection)
11 0.017327266 89 jmlr-2007-VC Theory of Large Margin Multi-Category Classifiers (Special Topic on Model Selection)
12 0.016958814 52 jmlr-2007-Margin Trees for High-dimensional Classification
13 0.015875183 69 jmlr-2007-Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes
14 0.014996481 1 jmlr-2007-"Ideal Parent" Structure Learning for Continuous Variable Bayesian Networks
15 0.014939454 73 jmlr-2007-Revised Loss Bounds for the Set Covering Machine and Sample-Compression Loss Bounds for Imbalanced Data
16 0.014617445 24 jmlr-2007-Consistent Feature Selection for Pattern Recognition in Polynomial Time
17 0.01450224 31 jmlr-2007-Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm
18 0.014389399 10 jmlr-2007-An Interior-Point Method for Large-Scalel1-Regularized Logistic Regression
19 0.013984902 72 jmlr-2007-Relational Dependency Networks
topicId topicWeight
[(0, 0.12), (1, 0.279), (2, 0.499), (3, -0.556), (4, 0.029), (5, -0.041), (6, 0.001), (7, 0.082), (8, -0.036), (9, 0.045), (10, -0.064), (11, 0.02), (12, -0.018), (13, -0.0), (14, -0.037), (15, 0.005), (16, 0.001), (17, 0.001), (18, 0.041), (19, -0.023), (20, 0.01), (21, 0.028), (22, -0.015), (23, -0.016), (24, -0.004), (25, -0.006), (26, 0.015), (27, -0.042), (28, -0.01), (29, 0.016), (30, -0.004), (31, 0.012), (32, -0.004), (33, 0.002), (34, -0.03), (35, 0.003), (36, 0.029), (37, 0.014), (38, 0.01), (39, 0.006), (40, -0.013), (41, 0.003), (42, 0.009), (43, -0.019), (44, 0.004), (45, 0.021), (46, 0.003), (47, -0.04), (48, 0.049), (49, -0.006)]
simIndex simValue paperId paperTitle
same-paper 1 0.98254085 86 jmlr-2007-Truncating the Loop Series Expansion for Belief Propagation
Author: Vicenç Gómez, Joris M. Mooij, Hilbert J. Kappen
Abstract: Recently, Chertkov and Chernyak (2006b) derived an exact expression for the partition sum (normalization constant) corresponding to a graphical model, which is an expansion around the belief propagation (BP) solution. By adding correction terms to the BP free energy, one for each “generalized loop” in the factor graph, the exact partition sum is obtained. However, the usually enormous number of generalized loops generally prohibits summation over all correction terms. In this article we introduce truncated loop series BP (TLSBP), a particular way of truncating the loop series of Chertkov & Chernyak by considering generalized loops as compositions of simple loops. We analyze the performance of TLSBP in different scenarios, including the Ising model on square grids and regular random graphs, and on PROMEDAS, a large probabilistic medical diagnostic system. We show that TLSBP often improves upon the accuracy of the BP solution, at the expense of increased computation time. We also show that the performance of TLSBP strongly depends on the degree of interaction between the variables. For weak interactions, truncating the series leads to significant improvements, whereas for strong interactions it can be ineffective, even if a high number of terms is considered. Keywords: belief propagation, loop calculus, approximate inference, partition function, Ising grid, random regular graphs, medical diagnosis
2 0.94300044 51 jmlr-2007-Loop Corrections for Approximate Inference on Factor Graphs
Author: Joris M. Mooij, Hilbert J. Kappen
Abstract: We propose a method to improve approximate inference methods by correcting for the influence of loops in the graphical model. The method is a generalization and alternative implementation of a recent idea from Montanari and Rizzo (2005). It is applicable to arbitrary factor graphs, provided that the size of the Markov blankets is not too large. It consists of two steps: (i) an approximate inference method, for example, belief propagation, is used to approximate cavity distributions for each variable (i.e., probability distributions on the Markov blanket of a variable for a modified graphical model in which the factors involving that variable have been removed); (ii) all cavity distributions are improved by a message-passing algorithm that cancels out approximation errors by imposing certain consistency constraints. This loop correction (LC) method usually gives significantly better results than the original, uncorrected, approximate inference algorithm that is used to estimate the effect of loops. Indeed, we often observe that the loop-corrected error is approximately the square of the error of the uncorrected approximate inference method. In this article, we compare different variants of the loop correction method with other approximate inference methods on a variety of graphical models, including “real world” networks, and conclude that the LC method generally obtains the most accurate results. Keywords: loop corrections, approximate inference, graphical models, factor graphs, belief propagation
3 0.45673642 21 jmlr-2007-Comments on the "Core Vector Machines: Fast SVM Training on Very Large Data Sets"
Author: Gaëlle Loosli, Stéphane Canu
Abstract: In a recently published paper in JMLR, Tsang et al. (2005) present an algorithm for SVM called Core Vector Machines (CVM) and illustrate its performances through comparisons with other SVM solvers. After reading the CVM paper we were surprised by some of the reported results. In order to clarify the matter, we decided to reproduce some of the experiments. It turns out that to some extent, our results contradict those reported. Reasons of these different behaviors are given through the analysis of the stopping criterion. Keywords: SVM, CVM, large scale, KKT gap, stopping condition, stopping criteria
Author: Charles Sutton, Andrew McCallum, Khashayar Rohanimanesh
Abstract: In sequence modeling, we often wish to represent complex interaction between labels, such as when performing multiple, cascaded labeling tasks on the same sequence, or when long-range dependencies exist. We present dynamic conditional random fields (DCRFs), a generalization of linear-chain conditional random fields (CRFs) in which each time slice contains a set of state variables and edges—a distributed state representation as in dynamic Bayesian networks (DBNs)—and parameters are tied across slices. Since exact inference can be intractable in such models, we perform approximate inference using several schedules for belief propagation, including tree-based reparameterization (TRP). On a natural-language chunking task, we show that a DCRF performs better than a series of linear-chain CRFs, achieving comparable performance using only half the training data. In addition to maximum conditional likelihood, we present two alternative approaches for training DCRFs: marginal likelihood training, for when we are primarily interested in predicting only a subset of the variables, and cascaded training, for when we have a distinct data set for each state variable, as in transfer learning. We evaluate marginal training and cascaded training on both synthetic data and real-world text data, finding that marginal training can improve accuracy when uncertainty exists over the latent variables, and that for transfer learning, a DCRF trained in a cascaded fashion performs better than a linear-chain CRF that predicts the final task directly. Keywords: conditional random fields, graphical models, sequence labeling
5 0.095199898 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models
Author: Tapani Raiko, Harri Valpola, Markus Harva, Juha Karhunen
Abstract: We introduce standardised building blocks designed to be used with variational Bayesian learning. The blocks include Gaussian variables, summation, multiplication, nonlinearity, and delay. A large variety of latent variable models can be constructed from these blocks, including nonlinear and variance models, which are lacking from most existing variational systems. The introduced blocks are designed to fit together and to yield efficient update rules. Practical implementation of various models is easy thanks to an associated software package which derives the learning formulas automatically once a specific model structure has been fixed. Variational Bayesian learning provides a cost function which is used both for updating the variables of the model and for optimising the model structure. All the computations can be carried out locally, resulting in linear computational complexity. We present experimental results on several structures, including a new hierarchical nonlinear model for variances and means. The test results demonstrate the good performance and usefulness of the introduced method. Keywords: latent variable models, variational Bayesian learning, graphical models, building blocks, Bayesian modelling, local computation
6 0.08244469 89 jmlr-2007-VC Theory of Large Margin Multi-Category Classifiers (Special Topic on Model Selection)
7 0.082419328 11 jmlr-2007-Anytime Learning of Decision Trees
8 0.079815529 62 jmlr-2007-On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning
9 0.077593833 24 jmlr-2007-Consistent Feature Selection for Pattern Recognition in Polynomial Time
10 0.077269167 4 jmlr-2007-A New Probabilistic Approach in Rank Regression with Optimal Bayesian Partitioning (Special Topic on Model Selection)
11 0.075027302 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study
12 0.073902667 5 jmlr-2007-A Nonparametric Statistical Approach to Clustering via Mode Identification
14 0.070973277 69 jmlr-2007-Proto-value Functions: A Laplacian Framework for Learning Representation and Control in Markov Decision Processes
15 0.069104314 79 jmlr-2007-Structure and Majority Classes in Decision Tree Learning
17 0.063798025 63 jmlr-2007-On the Representer Theorem and Equivalent Degrees of Freedom of SVR
18 0.063338824 23 jmlr-2007-Concave Learners for Rankboost
19 0.060684305 47 jmlr-2007-Learning Horn Expressions with LOGAN-H
20 0.060243502 31 jmlr-2007-Estimating High-Dimensional Directed Acyclic Graphs with the PC-Algorithm
topicId topicWeight
[(4, 0.019), (8, 0.013), (10, 0.016), (12, 0.033), (15, 0.023), (28, 0.038), (35, 0.139), (40, 0.026), (45, 0.02), (48, 0.038), (60, 0.019), (80, 0.019), (85, 0.068), (89, 0.326), (98, 0.087)]
simIndex simValue paperId paperTitle
same-paper 1 0.74571222 86 jmlr-2007-Truncating the Loop Series Expansion for Belief Propagation
Author: Vicenç Gómez, Joris M. Mooij, Hilbert J. Kappen
Abstract: Recently, Chertkov and Chernyak (2006b) derived an exact expression for the partition sum (normalization constant) corresponding to a graphical model, which is an expansion around the belief propagation (BP) solution. By adding correction terms to the BP free energy, one for each “generalized loop” in the factor graph, the exact partition sum is obtained. However, the usually enormous number of generalized loops generally prohibits summation over all correction terms. In this article we introduce truncated loop series BP (TLSBP), a particular way of truncating the loop series of Chertkov & Chernyak by considering generalized loops as compositions of simple loops. We analyze the performance of TLSBP in different scenarios, including the Ising model on square grids and regular random graphs, and on PROMEDAS, a large probabilistic medical diagnostic system. We show that TLSBP often improves upon the accuracy of the BP solution, at the expense of increased computation time. We also show that the performance of TLSBP strongly depends on the degree of interaction between the variables. For weak interactions, truncating the series leads to significant improvements, whereas for strong interactions it can be ineffective, even if a high number of terms is considered. Keywords: belief propagation, loop calculus, approximate inference, partition function, Ising grid, random regular graphs, medical diagnosis
2 0.68081069 72 jmlr-2007-Relational Dependency Networks
Author: Jennifer Neville, David Jensen
Abstract: Recent work on graphical models for relational data has demonstrated significant improvements in classification and inference when models represent the dependencies among instances. Despite its use in conventional statistical models, the assumption of instance independence is contradicted by most relational data sets. For example, in citation data there are dependencies among the topics of a paper’s references, and in genomic data there are dependencies among the functions of interacting proteins. In this paper, we present relational dependency networks (RDNs), graphical models that are capable of expressing and reasoning with such dependencies in a relational setting. We discuss RDNs in the context of relational Bayes networks and relational Markov networks and outline the relative strengths of RDNs—namely, the ability to represent cyclic dependencies, simple methods for parameter estimation, and efficient structure learning techniques. The strengths of RDNs are due to the use of pseudolikelihood learning techniques, which estimate an efficient approximation of the full joint distribution. We present learned RDNs for a number of real-world data sets and evaluate the models in a prediction context, showing that RDNs identify and exploit cyclic relational dependencies to achieve significant performance gains over conventional conditional models. In addition, we use synthetic data to explore model performance under various relational data characteristics, showing that RDN learning and inference techniques are accurate over a wide range of conditions. Keywords: relational learning, probabilistic relational models, knowledge discovery, graphical models, dependency networks, pseudolikelihood estimation
3 0.55606806 51 jmlr-2007-Loop Corrections for Approximate Inference on Factor Graphs
Author: Joris M. Mooij, Hilbert J. Kappen
Abstract: We propose a method to improve approximate inference methods by correcting for the influence of loops in the graphical model. The method is a generalization and alternative implementation of a recent idea from Montanari and Rizzo (2005). It is applicable to arbitrary factor graphs, provided that the size of the Markov blankets is not too large. It consists of two steps: (i) an approximate inference method, for example, belief propagation, is used to approximate cavity distributions for each variable (i.e., probability distributions on the Markov blanket of a variable for a modified graphical model in which the factors involving that variable have been removed); (ii) all cavity distributions are improved by a message-passing algorithm that cancels out approximation errors by imposing certain consistency constraints. This loop correction (LC) method usually gives significantly better results than the original, uncorrected, approximate inference algorithm that is used to estimate the effect of loops. Indeed, we often observe that the loop-corrected error is approximately the square of the error of the uncorrected approximate inference method. In this article, we compare different variants of the loop correction method with other approximate inference methods on a variety of graphical models, including “real world” networks, and conclude that the LC method generally obtains the most accurate results. Keywords: loop corrections, approximate inference, graphical models, factor graphs, belief propagation
4 0.3373878 81 jmlr-2007-The Locally Weighted Bag of Words Framework for Document Representation
Author: Guy Lebanon, Yi Mao, Joshua Dillon
Abstract: The popular bag of words assumption represents a document as a histogram of word occurrences. While computationally efficient, such a representation is unable to maintain any sequential information. We present an effective sequential document representation that goes beyond the bag of words representation and its n-gram extensions. This representation uses local smoothing to embed documents as smooth curves in the multinomial simplex thereby preserving valuable sequential information. In contrast to bag of words or n-grams, the new representation is able to robustly capture medium and long range sequential trends in the document. We discuss the representation and its geometric properties and demonstrate its applicability for various text processing tasks. Keywords: text processing, local smoothing
5 0.33200979 7 jmlr-2007-A Stochastic Algorithm for Feature Selection in Pattern Recognition
Author: Sébastien Gadat, Laurent Younes
Abstract: We introduce a new model addressing feature selection from a large dictionary of variables that can be computed from a signal or an image. Features are extracted according to an efficiency criterion, on the basis of specified classification or recognition tasks. This is done by estimating a probability distribution P on the complete dictionary, which distributes its mass over the more efficient, or informative, components. We implement a stochastic gradient descent algorithm, using the probability as a state variable and optimizing a multi-task goodness of fit criterion for classifiers based on variable randomly chosen according to P. We then generate classifiers from the optimal distribution of weights learned on the training set. The method is first tested on several pattern recognition problems including face detection, handwritten digit recognition, spam classification and micro-array analysis. We then compare our approach with other step-wise algorithms like random forests or recursive feature elimination. Keywords: stochastic learning algorithms, Robbins-Monro application, pattern recognition, classification algorithm, feature selection
6 0.33119947 32 jmlr-2007-Euclidean Embedding of Co-occurrence Data
7 0.32991737 62 jmlr-2007-On the Effectiveness of Laplacian Normalization for Graph Semi-supervised Learning
8 0.32546639 49 jmlr-2007-Learning to Classify Ordinal Data: The Data Replication Method
9 0.32362628 66 jmlr-2007-Penalized Model-Based Clustering with Application to Variable Selection
10 0.32339862 33 jmlr-2007-Fast Iterative Kernel Principal Component Analysis
11 0.32252252 17 jmlr-2007-Building Blocks for Variational Bayesian Learning of Latent Variable Models
12 0.32225695 19 jmlr-2007-Classification in Networked Data: A Toolkit and a Univariate Case Study
15 0.32159531 5 jmlr-2007-A Nonparametric Statistical Approach to Clustering via Mode Identification
17 0.31857726 58 jmlr-2007-Noise Tolerant Variants of the Perceptron Algorithm
18 0.31822944 76 jmlr-2007-Spherical-Homoscedastic Distributions: The Equivalency of Spherical and Normal Distributions in Classification
19 0.31774056 39 jmlr-2007-Handling Missing Values when Applying Classification Models
20 0.31756371 37 jmlr-2007-GiniSupport Vector Machine: Quadratic Entropy Based Robust Multi-Class Probability Regression