jmlr jmlr2006 jmlr2006-95 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dmitry M. Malioutov, Jason K. Johnson, Alan S. Willsky
Abstract: We present a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose the correlation between each pair of variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlation coefficients. This representation holds for a large class of Gaussian graphical models which we call walk-summable. We give a precise characterization of this class of models, and relate it to other classes including diagonally dominant, attractive, nonfrustrated, and pairwise-normalizable. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. The walk-sum perspective leads to a better understanding of Gaussian belief propagation and to stronger results for its convergence in loopy graphs. Keywords: Gaussian graphical models, walk-sum analysis, convergence of loopy belief propagation
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. [sent-8, score-0.566]
2 The key idea is to decompose the correlation between each pair of variables as a sum over all walks between those variables in the graph. [sent-9, score-0.463]
3 We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. [sent-13, score-0.456]
4 We develop a “walk-sum” formulation for computation of means, variances and correlations as sums over certain sets of weighted walks in a graph. [sent-46, score-0.599]
5 The reason is that LBP captures all of the walks needed to compute the means but only computes a subset of the walks needed for the variances. [sent-56, score-0.926]
6 Similarly to our work, the decomposition is connected to the Neumann series expansion of the matrix inverse, but in addition to products of edge weights, the weight of a walk in their formulation includes a complicated multi-dimensional integral. [sent-60, score-0.379]
7 (Unlike paths, walks can cross an edge multiple times, and the weight of a path is rather hard to calculate, as opposed to our walk-weights.) [sent-65, score-0.515]
8 Ultimately each node j must receive information from each of its neighbors, where the message m_{i→j}(x_j) from neighbor i to j represents the result of eliminating all of the variables in the subtree rooted at node i and including all of its neighbors other than j (see Figure 1). [sent-141, score-0.369]
9 Since each of these messages is itself made up of variable elimination steps corresponding to the subtrees rooted at the other neighbors of node i, there is a set of fixed-point equations that relate messages throughout the tree: m_{i→j}(x_j) = ∫ ψ_{ij}(x_i, x_j) ψ_i(x_i) ∏_{k∈N(i)\j} m_{k→i}(x_i) dx_i. [sent-142, score-0.35]
10 The first step corresponds to preparing the message to be sent from node i to node j by collecting information from all of the other neighbors of i: Ĵ_{i\j} = J_{ii} + ∑_{k∈N(i)\j} ΔJ_{k→i} and ĥ_{i\j} = h_i + ∑_{k∈N(i)\j} Δh_{k→i}. (7) [sent-156, score-0.473]
11 The second step produces the information quantities to be propagated to node j: ΔJ_{i→j} = −J_{ji} Ĵ_{i\j}^{−1} J_{ji} and Δh_{i→j} = −J_{ji} Ĵ_{i\j}^{−1} ĥ_{i\j}. [sent-157, score-0.414]
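For concreteness, the two-step update in (7)-(8) can be sketched in a few lines of code for the scalar-variable case. This is a minimal illustration, not the paper's implementation; the function and container names (send_message, delta_J, delta_h, neighbors) are ours.

```python
# A minimal sketch of the two-step Gaussian BP message update, Eqs. (7)-(8),
# assuming scalar variables. J is a NumPy information matrix, h a potential
# vector, delta_J/delta_h are dicts keyed by directed edges (k, i).
def send_message(J, h, delta_J, delta_h, neighbors, i, j):
    """Compute the information-form message (dJ, dh) from node i to node j."""
    # Step 1: collect information at i from all neighbors of i other than j.
    J_hat = J[i, i] + sum(delta_J[(k, i)] for k in neighbors[i] if k != j)
    h_hat = h[i] + sum(delta_h[(k, i)] for k in neighbors[i] if k != j)
    # Step 2: produce the quantities propagated to node j.
    dJ = -J[j, i] * (1.0 / J_hat) * J[j, i]
    dh = -J[j, i] * (1.0 / J_hat) * h_hat
    return dJ, dh
```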
12 A message m_{i→j} passed from node i to node j ∈ N(i) captures the effect of eliminating the subtree rooted at i. [sent-167, score-0.449]
13 In (c), we show how, after 3 iterations, messages link up to form the computation tree T_1^(3) of node 1 (the subtree T_{4→1}^(3), associated with message m_{4→1}^(3), is also indicated within the dotted outline). [sent-195, score-0.422]
14 A walk w of length l is a sequence (w_0, w_1, . . . , w_l) of nodes w_k ∈ V such that each step of the walk (w_k, w_{k+1}) corresponds to an edge of the graph {w_k, w_{k+1}} ∈ E. [sent-223, score-0.553]
15 We also allow zero-length “self” walks w = (v) at each node v for which we define φ(w) = 1. [sent-227, score-0.598]
16 To make a connection between these walks and Gaussian inference, we decompose the covariance matrix using the Neumann power series for the matrix inverse: P = J^{−1} = (I − R)^{−1} = ∑_{k=0}^{∞} R^k, for ρ(R) < 1. [sent-228, score-0.463]
17 The (i, j)-th element of R^l can be expressed as a sum of weights of walks w that go from i to j and have length l (denoted w : i →(l) j): (R^l)_{ij} = ∑_{w_1,…,w_{l−1}} r_{i,w_1} r_{w_1,w_2} ⋯ r_{w_{l−1},j} = ∑_{w: i →(l) j} φ(w). [sent-231, score-0.463]
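As a hedged numerical illustration of this decomposition, the small check below builds R = I − J for a toy unit-diagonal model and verifies that summing powers of R recovers the covariance J^{−1}; the specific matrix is ours, not from the paper.

```python
# Numerical sketch of P = J^{-1} = sum_k R^k for a small walk-summable model.
# The example matrix is illustrative; any unit-diagonal J with rho(R) < 1 works.
import numpy as np

J = np.array([[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.4, 1.0]])            # information matrix, unit diagonal
R = np.eye(3) - J                           # off-diagonals are the partial correlations
assert max(abs(np.linalg.eigvals(R))) < 1   # rho(R) < 1, so the series converges

P_series = sum(np.linalg.matrix_power(R, k) for k in range(200))
print(np.allclose(P_series, np.linalg.inv(J)))   # True: covariances are walk-sums
```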
18 The last equality holds because only the terms that correspond to walks in the graph have non-zero contributions: for all other terms at least one of the partial correlation coefficients r_{w_k,w_{k+1}} is zero. [sent-246, score-0.622]
19 The set of walks from i to j of length l is finite, and the sum of weights of these walks (the walk-sum) is well-defined. [sent-247, score-0.926]
20 However, care must be taken, as walk-sums over countably many walks may or may not converge, and convergence may depend on the order of summation. [sent-249, score-0.49]
21 This motivates the following definition: We say that a Gaussian distribution is walk-summable (WS) if for all i, j ∈ V the unordered sum over all walks w from i to j (denoted w : i → j), ∑_{w:i→j} φ(w), is well-defined (i.e., it converges to the same value for every possible summation order). [sent-250, score-0.463]
22 It uses absolute convergence to rearrange walks in order of increasing length, and the Perron-Frobenius theorem for part (iv). [sent-265, score-0.49]
23 The latter is sufficient for the convergence of the walks ordered by increasing length, whereas walk-summability enables convergence to the same answer in arbitrary order of summation. [sent-267, score-0.517]
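A minimal test of walk-summability, using the characterization via the spectral radius of the entry-wise absolute matrix R̄, might look as follows; the helper name is ours. For the 3-node example above, it returns True (ρ(R̄) is roughly 0.6).

```python
# Walk-summability check: rescale J to unit diagonal, form R = I - D J D, and
# test rho(|R|) < 1, where |R| is the entry-wise absolute value. Sketch only.
import numpy as np

def is_walk_summable(J):
    d = 1.0 / np.sqrt(np.diag(J))
    R = np.eye(len(J)) - d[:, None] * J * d[None, :]   # I - D J D
    return max(abs(np.linalg.eigvals(np.abs(R)))) < 1.0
```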
24 Also, the means are walk-sums reweighted by the value of h at the start of each walk: µ_i = ∑_{w:∗→i} h_∗ φ(w), where the sum is over all walks which end at node i (with arbitrary starting node), and where ∗ denotes the starting node of the walk w. [sent-340, score-1.015]
25 Set of walks W(∗ → 1): {(1), (2, 1), (3, 1), (2, 3, 1), (3, 2, 1), (1, 2, 1), (1, 3, 1), . . . }. [sent-352, score-0.463]
26 In general, given a set of walks W we define the walk-sum φ(W) = ∑_{w∈W} φ(w) and the reweighted walk-sum φ_h(W) = ∑_{w∈W} h_{w_0} φ(w), where w_0 denotes the initial node in the walk w. [sent-361, score-0.88]
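Continuing the small numerical sketch above (same J and R, numpy already imported there), the reweighted walk-sum for the means can be checked the same way; the potential vector h is an arbitrary illustrative choice.

```python
# Means as reweighted walk-sums: mu = (sum_l R^l) h = J^{-1} h, continuing the
# 3-node example above.
h = np.array([1.0, -2.0, 0.5])
mu_walks = sum(np.linalg.matrix_power(R, l) for l in range(200)) @ h
print(np.allclose(mu_walks, np.linalg.solve(J, h)))   # True: mu_i = phi_h(* -> i)
```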
27 W(·) denotes the set of all walks having some property. [sent-365, score-0.463]
28 For instance, W (i → j) denotes the set of all walks from i to j and φ(i → j) is the corresponding walk-sum. [sent-375, score-0.463]
29 Also, W(∗ → i) denotes the set of all walks that end at node i and φ_h(∗ → i) is the corresponding reweighted walk-sum. [sent-376, score-0.598]
30 An illustration of walk-sums and their connection to inference appears in Figure 5 where we list some walks and walk-sums for a 3-cycle graph. [sent-378, score-0.491]
31 For walks u = (u_0, . . . , u_n) and v = (v_0, . . . , v_m) with u_n = v_0 (walk v begins where walk u ends) we define the product of walks uv = (u_0, . . . , u_n, v_1, . . . , v_m). [sent-398, score-0.823]
32 Let U and V be two countable sets of walks such that every walk in U ends at a given node i and every walk in V begins at this node. [sent-405, score-1.192]
33 For individual walks it is evident that φ(uv) = φ(u)φ(v). [sent-411, score-0.463]
34 Note that W(i → i) is the set of self-return walks at node i, that is, walks which begin and end at node i. [sent-414, score-1.196]
35 These self-return walks include walks which return to i many times. [sent-415, score-0.926]
36 Let W^{\i}(i → i) be the set of all walks with non-zero length that begin and end at i but do not visit i at any other point in between. [sent-416, score-0.463]
37 We call these the single-revisit self-return walks at node i. [sent-417, score-0.598]
38 The set of self-return walks that return exactly k times is generated by taking the product of k copies of W^{\i}(i → i), denoted by W_k^{\i}(i → i). [sent-418, score-0.463]
39 Thus, we obtain all self-return walks as W(i → i) = ∪_{k≥0} W_k^{\i}(i → i), (12) where W_0^{\i}(i → i) ≜ {(i)}. [sent-419, score-0.463]
40 Similarly, recall that W(∗ → i) denotes the set of all walks which end at node i. [sent-420, score-0.598]
41 Let W^{\i}(∗ → i) denote the set of walks with non-zero length which end at node i and do not visit i previously (we call them single-visit walks). [sent-421, score-0.598]
42 Thus, all walks which end at i are obtained as: W(∗ → i) = {(i)} ∪ W^{\i}(∗ → i) W(i → i), (13) which is a valid decomposition. [sent-422, score-0.497]
43 First note that the decomposition of W_k^{\i}(i → i) into products of k single-revisit self-return walks is a valid decomposition. [sent-430, score-0.519]
44 The basic idea here is to separate out the walks with positive and negative weights. [sent-439, score-0.463]
45 For each edge {i, j} ∈ E with r_{ij} > 0 we define two edges {i+, j+}, {i−, j−} ∈ Ê, and set the partial correlations on these edges to be equal to r_{ij}. [sent-442, score-0.389]
46 In essence, this is accomplished by summing walks with positive and negative weights separately and then taking the difference, which is only possible for walk-summable models. [sent-459, score-0.463]
47 Moreover, for these models we show that exact walk-sums over infinite sets of walks for means and variances can be computed efficiently in a recursive fashion. [sent-495, score-0.601]
48 The ingredients for these results are decompositions of the variance and mean walk-sums in terms of sums over walks on subtrees, together with the decomposition in terms of single-revisit and single-visit walks provided in Proposition 9. [sent-511, score-0.979]
49 This latter walk-sum can be further decomposed into sums over disjoint subsets of walks each of which corresponds to single-revisit self-return walks that exit node j via a specific one of its neighbors, say i. [sent-514, score-1.061]
50 In particular, as illustrated in Figure 7, the single-revisit self-return walks that do this correspond to walks that live in the subtree T_{i→j}. [sent-515, score-0.964]
51 Using the notation W^{\j}(j → j | T_{i→j}) for the set of all single-revisit walks which are restricted to stay in subtree T_{i→j}, we see that α_j = φ^{\j}(j → j) = ∑_{i∈N(j)} φ^{\j}(j → j | T_{i→j}) = ∑_{i∈N(j)} α_{i→j}. [sent-516, score-0.501]
52 Mean µ_j is the reweighted walk-sum over walks that start anywhere and end at node j, µ_j = φ_h(∗ → j). [sent-524, score-0.598]
53 Any walk that ends at node j can be expressed as a single-visit walk to node j followed by a multiple-revisit self-return walk from node j: φ_h(∗ → j) = h_j + φ_h^{\j}(∗ → j) φ(j → j), where the term h_j corresponds to the length-zero walk that starts and ends at node j. [sent-525, score-1.728]
54 In this subsection we use the LBP computation tree to show that LBP includes all the walks for the means, but only a subset of the walks for the variances. [sent-531, score-1.013]
55 As we have discussed, running LBP for some number of iterations yields identical calculations at any particular node i to the exact inference calculations on the corresponding computation tree rooted at node i. [sent-535, score-0.42]
56 We use the notation T_i^(n) for the nth computation tree at node i, T_i for the full computation tree (as n → ∞), and we assign the label 0 to the root node. [sent-536, score-0.363]
57 Then, P_0(T_i^(n)) denotes the variance at the root node of the nth computation tree rooted at node i in G. [sent-537, score-0.477]
58 We say that a message schedule is proper if every message is updated infinitely often, that is, if for every m > 0 and every directed edge (i, j) in the graph there exists n > m such that (i, j) ∈ M^(n). [sent-541, score-0.421]
59 Lemma 18 (Walks in G and in T_i) There is a one-to-one correspondence between finite-length walks in G that end at i, and walks in T_i that end at the root node. [sent-547, score-0.953]
60 In particular, for each such walk in G there is a corresponding walk in T_i^(n) for n large enough. [sent-548, score-0.564]
61 Now, recall that to compute the mean µi we need to gather walk-sums over all walks that start anywhere and end at i. [sent-549, score-0.463]
62 We have just shown that LBP gathers all of these walks as the computation tree grows to infinity. [sent-550, score-0.55]
63 The true variance P_ii is a walk-sum over all self-return walks that start and end at i in G. [sent-552, score-0.494]
64 However, walks in G that start and end at i may map to walks that start at the root node of T_i^(n), but end at a replica of the root node instead of the root. [sent-553, score-1.28]
65 These walks are not captured by the LBP variance estimate. [sent-554, score-0.494]
66 The walks for the variance estimate P_0(T_i^(n)) are self-return walks W(0 → 0 | T_i^(n)) that start and end at the root node of the computation tree. [sent-555, score-0.926]
67 Recall that the computation tree is a representation of the computations seen at the root node of the tree, and it is only the computation at this node, that is, at this replica of node i, that corresponds to the LBP computation at node i in G. [sent-556, score-0.449]
68 The walk (1, 2, 3, 1) is a self-return walk in the original graph G but is not a self-return walk in the computation tree shown in Figure 2(d). [sent-559, score-0.982]
69 LBP variances capture only those self-return walks of the original graph G that are also self-return walks in the computation tree—for example, the walk (1, 3, 2, 3, 4, 3, 1) is a self-return walk in both Figures 2(a) and (d). [sent-560, score-1.642]
70 Hence we have Lemma 19 (Self-return walks in G and in T_i): The LBP variance estimate at each node is a sum over the backtracking self-return walks in G, a subset of all self-return walks needed to calculate the correct variance. [sent-562, score-1.555]
71 Note that back-tracking walks for the variances have positive weights, since each edge in the walk is traversed an even number of times. [sent-563, score-0.9]
72 With each LBP step the computation tree grows and new back-tracking walks are included, hence variance estimates grow monotonically. [sent-564, score-0.581]
73 We have shown which walks LBP gathers based on the computation tree. [sent-565, score-0.463]
74 Intuitively, walks in the computation tree T_i^(n) are subsets of the walks in G, and hence they converge. [sent-568, score-1.013]
75 Proposition 21 (Convergence of LBP for walk-summable models) If a model on a graph G is walk-summable, then LBP is well-posed, the means converge to the true means and the LBP variances converge to walk-sums over the backtracking self-return walks at each node. [sent-572, score-0.707]
76 Let W (i → i) denote the back-tracking self-return walks at node i. [sent-574, score-0.598]
77 In attractive and non-frustrated models LBP variance estimates are less than or equal to the true variances (the missing non-backtracking walks all have positive weights). [sent-594, score-0.676]
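The behavior described in Proposition 21 and in the preceding remark can be reproduced with a short synchronous Gaussian LBP sketch on a single cycle; the model, schedule, and names below are ours, chosen so that the graph is attractive and walk-summable. Both printed checks come out True here: the means match the exact solution, and the LBP variances sit below the true variances because the non-backtracking cycle walks are missing.

```python
# Minimal synchronous Gaussian LBP (information form) on an attractive,
# walk-summable 4-cycle: means converge to the exact values, while the
# variance estimates converge to the backtracking walk-sums (underestimates).
import numpy as np

n = 4
J = np.eye(n)
for a, b in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    J[a, b] = J[b, a] = -0.4                 # r_ij = 0.4 > 0 on every edge
h = np.array([1.0, 0.0, -1.0, 0.5])
nbrs = {i: [j for j in range(n) if j != i and J[i, j] != 0.0] for i in range(n)}

dJ = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}   # messages on directed edges
dh = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
for _ in range(200):                          # synchronous (parallel) schedule
    dJ_new, dh_new = {}, {}
    for (i, j) in dJ:
        J_hat = J[i, i] + sum(dJ[(k, i)] for k in nbrs[i] if k != j)
        h_hat = h[i] + sum(dh[(k, i)] for k in nbrs[i] if k != j)
        dJ_new[(i, j)] = -J[j, i] ** 2 / J_hat
        dh_new[(i, j)] = -J[j, i] * h_hat / J_hat
    dJ, dh = dJ_new, dh_new

J_marg = np.array([J[i, i] + sum(dJ[(k, i)] for k in nbrs[i]) for i in range(n)])
h_marg = np.array([h[i] + sum(dh[(k, i)] for k in nbrs[i]) for i in range(n)])
print(np.allclose(h_marg / J_marg, np.linalg.solve(J, h)))        # means: exact
print((1.0 / J_marg <= np.diag(np.linalg.inv(J))).all())          # variances: under
```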
78 In our framework, each such correlation is a walk-sum over the subset of non-backtracking self-return walks in G that, in the computation tree, begin at a particular replica of the root. [sent-600, score-0.493]
79 This implies that all the computation trees T^(n) are walk-summable and that variances monotonically increase (since weights of backtracking walks are positive, see the discussion after Lemma 19). [sent-654, score-0.651]
80 The reason is that LBP only computes a subset of the self-return walks for the variances but captures all the walks for the means. [sent-672, score-1.029]
81 One involves developing improved walk-sum algorithms that gather more walks than LBP does, to yield better variance estimates. [sent-774, score-0.494]
82 However, related expansions of correlations in terms of walks have been investigated for other models. [sent-780, score-0.496]
83 (1983) use walk-sums for non-Gaussian classical and quantum spin-systems, where the weights of walks involve complicated multi-dimensional integrals. [sent-783, score-0.463]
84 First note that (R̄^l)_{ij} is an absolute walk-sum over all walks of length l from i to j: (R̄^l)_{ij} = ∑_{w: i →(l) j} |φ(w)| (there are a finite number of these walks so the sum is well-defined). [sent-788, score-0.926]
85 If we order walks by their length and then group terms for walks of equal lengths (each group has a finite number of terms) we obtain: ∑_{w:i→j} |φ(w)| = ∑_l ∑_{w: i →(l) j} |φ(w)| = ∑_l (R̄^l)_{ij}. [sent-790, score-0.926]
86 As shown in (17), ∑_l (R̄^l)_{ij} corresponds to one particular ordering of the walks which converges by (ii). [sent-794, score-0.463]
87 Proof of Proposition 11 We partition walk-sums into sums over “even” and “odd” walks according to the number of negative edges crossed by the walk. [sent-839, score-0.507]
88 The graph Ĝ is defined so that every walk from i+ to j+ is even and every walk from i+ to j− is odd. [sent-841, score-0.613]
89 Note also that setting h = (h+; h−) has the effect that all walks with h_{w_0} > 0 begin in V+ and all walks with h_{w_0} < 0 begin in V−. [sent-845, score-0.926]
90 Consequently, every even walk ends in V+ and every odd walk ends in V− . [sent-846, score-0.651]
91 Proof of Proposition 16 To calculate the walk-sum for multiple-revisit self-return walks in T_{i\j}, we can use the single-revisit counterpart: γ_{i\j} = φ(i → i | T_{i\j}) = 1 / (1 − φ^{\i}(i → i | T_{i\j})). (18) [sent-878, score-0.463]
92 Now, we decompose the single-revisit walks in the subtree T_{i\j} in terms of the possible first step of the walk (i, k), where k ∈ N(i)\j. [sent-879, score-0.353]
93 Proof of Proposition 17 A multiple-revisit walk in T_{i\j} can be written in terms of single-visit walks: φ_h(∗ → i | T_{i\j}) = h_i + φ_h^{\i}(∗ → i | T_{i\j}) φ(i → i | T_{i\j}). [sent-889, score-0.353]
94 Proof of Lemma 18 First, we note that for every walk w which ends at the root node of T_i^(n) there is a corresponding walk in G which ends at i. [sent-896, score-0.786]
95 Then for any walk in G that ends at w_l and has length n there is a walk in T_{w_l}^(n) that ends at the root. [sent-904, score-0.653]
96 The intuition for other message schedules is that every step (i, j) of the walk will appear eventually in any proper message schedule M . [sent-905, score-0.602]
97 First we unwrap the walk w into a tree T_w rooted at w_l in the following way: start at w_l, the end of the walk, and traverse the walk in reverse. [sent-907, score-0.744]
98 Now the computation tree T_{w_l}^(n) (which is created by splicing together T_{i→j}^(n) for all edges (i, j) coming into the root of T_w) contains T_w, and hence it contains the walk w. [sent-923, score-0.44]
99 Proof of Lemma 24 Let T_i^(n)(M) denote the nth computation tree under a proper message schedule M rooted at node i. [sent-932, score-0.411]
100 We use the following simple extension of Lemma 18: Let T_i^(n)(M_1) be the nth computation tree rooted at i under message schedule M_1. [sent-933, score-0.342]
wordName wordTfidf (topN-words)
[('lbp', 0.563), ('walks', 0.463), ('walk', 0.282), ('ti', 0.234), ('je', 0.169), ('node', 0.135), ('wk', 0.11), ('ri', 0.108), ('message', 0.106), ('variances', 0.103), ('alioutov', 0.103), ('illsky', 0.103), ('ohnson', 0.103), ('tw', 0.097), ('ums', 0.097), ('ji', 0.093), ('jii', 0.091), ('tree', 0.087), ('schedule', 0.087), ('trees', 0.085), ('belief', 0.079), ('uv', 0.078), ('hi', 0.071), ('propagation', 0.064), ('raphical', 0.063), ('proposition', 0.06), ('diagonally', 0.06), ('bp', 0.059), ('odels', 0.059), ('messages', 0.056), ('aussian', 0.055), ('loopy', 0.055), ('edge', 0.052), ('ws', 0.05), ('graph', 0.049), ('walksummable', 0.048), ('edges', 0.044), ('attractive', 0.044), ('frustrated', 0.042), ('elimination', 0.042), ('gaussian', 0.041), ('gk', 0.04), ('dominant', 0.04), ('subtree', 0.038), ('rl', 0.036), ('converge', 0.036), ('moallemi', 0.036), ('rk', 0.036), ('critical', 0.036), ('models', 0.035), ('rooted', 0.035), ('valid', 0.034), ('rx', 0.034), ('limn', 0.034), ('spectral', 0.033), ('roy', 0.033), ('correlations', 0.033), ('johnson', 0.031), ('variance', 0.031), ('nodes', 0.031), ('drd', 0.03), ('replica', 0.03), ('radius', 0.03), ('ends', 0.03), ('validity', 0.03), ('graphs', 0.03), ('wl', 0.029), ('mi', 0.028), ('inference', 0.028), ('nth', 0.027), ('odd', 0.027), ('pi', 0.027), ('root', 0.027), ('convergence', 0.027), ('graphical', 0.026), ('freeman', 0.026), ('neighbors', 0.026), ('twl', 0.026), ('rusmevichientong', 0.026), ('tk', 0.025), ('deg', 0.024), ('gmrfs', 0.024), ('jpn', 0.024), ('malioutov', 0.024), ('varga', 0.024), ('lemma', 0.024), ('iv', 0.024), ('marginal', 0.024), ('connected', 0.023), ('decomposition', 0.022), ('van', 0.021), ('proper', 0.021), ('potentials', 0.021), ('chord', 0.02), ('radii', 0.02), ('cycles', 0.02), ('leaf', 0.02), ('model', 0.02), ('diagonal', 0.02), ('weiss', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 95 jmlr-2006-Walk-Sums and Belief Propagation in Gaussian Graphical Models
Author: Dmitry M. Malioutov, Jason K. Johnson, Alan S. Willsky
Abstract: We present a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose the correlation between each pair of variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlation coefficients. This representation holds for a large class of Gaussian graphical models which we call walk-summable. We give a precise characterization of this class of models, and relate it to other classes including diagonally dominant, attractive, nonfrustrated, and pairwise-normalizable. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. The walk-sum perspective leads to a better understanding of Gaussian belief propagation and to stronger results for its convergence in loopy graphs. Keywords: Gaussian graphical models, walk-sum analysis, convergence of loopy belief propagation
Author: Robert Castelo, Alberto Roverato
Abstract: Learning of large-scale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a consequence, the prime objects of inference are full-order partial correlations which are partial correlations between two variables given the remaining ones. In the context of microarray data the number of variables exceed the sample size and this precludes the application of traditional structure learning procedures because a sampling version of full-order partial correlations does not exist. In this paper we consider limited-order partial correlations, these are partial correlations computed on marginal distributions of manageable size, and provide a set of rules that allow one to assess the usefulness of these quantities to derive the independence structure of the underlying Gaussian graphical model. Furthermore, we introduce a novel structure learning procedure based on a quantity, obtained from limited-order partial correlations, that we call the non-rejection rate. The applicability and usefulness of the procedure are demonstrated by both simulated and real data. Keywords: Gaussian distribution, gene network, graphical model, microarray data, non-rejection rate, partial correlation, small-sample inference
3 0.073913619 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting
Author: Martin J. Wainwright
Abstract: Consider the problem of joint parameter estimation and prediction in a Markov random field: that is, the model parameters are estimated on the basis of an initial set of data, and then the fitted model is used to perform prediction (e.g., smoothing, denoising, interpolation) on a new noisy observation. Working under the restriction of limited computation, we analyze a joint method in which the same convex variational relaxation is used to construct an M-estimator for fitting parameters, and to perform approximate marginalization for the prediction step. The key result of this paper is that in the computation-limited setting, using an inconsistent parameter estimator (i.e., an estimator that returns the “wrong” model even in the infinite data limit) is provably beneficial, since the resulting errors can partially compensate for errors made by using an approximate prediction technique. En route to this result, we analyze the asymptotic properties of M-estimators based on convex variational relaxations, and establish a Lipschitz stability property that holds for a broad class of convex variational methods. This stability result provides additional incentive, apart from the obvious benefit of unique global optima, for using message-passing methods based on convex variational relaxations. We show that joint estimation/prediction based on the reweighted sum-product algorithm substantially outperforms a commonly used heuristic based on ordinary sum-product. Keywords: graphical model, Markov random field, belief propagation, variational method, parameter estimation, prediction error, algorithmic stability
4 0.066200495 34 jmlr-2006-Generalized Bradley-Terry Models and Multi-Class Probability Estimates
Author: Tzu-Kuo Huang, Ruby C. Weng, Chih-Jen Lin
Abstract: The Bradley-Terry model for obtaining individual skill from paired comparisons has been popular in many areas. In machine learning, this model is related to multi-class probability estimates by coupling all pairwise classification results. Error correcting output codes (ECOC) are a general framework to decompose a multi-class problem to several binary problems. To obtain probability estimates under this framework, this paper introduces a generalized Bradley-Terry model in which paired individual comparisons are extended to paired team comparisons. We propose a simple algorithm with convergence proofs to solve the model and obtain individual skill. Experiments on synthetic and real data demonstrate that the algorithm is useful for obtaining multi-class probability estimates. Moreover, we discuss four extensions of the proposed model: 1) weighted individual skill, 2) home-field advantage, 3) ties, and 4) comparisons with more than two teams. Keywords: Bradley-Terry model, probability estimates, error correcting output codes, support vector machines
5 0.054950647 25 jmlr-2006-Distance Patterns in Structural Similarity
Author: Thomas Kämpke
Abstract: Similarity of edge labeled graphs is considered in the sense of minimum squared distance between corresponding values. Vertex correspondences are established by isomorphisms if both graphs are of equal size and by subisomorphisms if one graph has fewer vertices than the other. Best fit isomorphisms and subisomorphisms amount to solutions of quadratic assignment problems and are computed exactly as well as approximately by minimum cost flow, linear assignment relaxations and related graph algorithms. Keywords: assignment problem, best approximation, branch and bound, inexact graph matching, model data base
6 0.052096114 64 jmlr-2006-Noisy-OR Component Analysis and its Application to Link Analysis
8 0.051764708 37 jmlr-2006-Incremental Algorithms for Hierarchical Classification
9 0.045556199 74 jmlr-2006-Point-Based Value Iteration for Continuous POMDPs
12 0.039123595 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation
13 0.038386773 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification
14 0.037189227 20 jmlr-2006-Collaborative Multiagent Reinforcement Learning by Payoff Propagation
15 0.033897318 89 jmlr-2006-Structured Prediction, Dual Extragradient and Bregman Projections (Special Topic on Machine Learning and Optimization)
16 0.032353017 90 jmlr-2006-Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition
17 0.031671505 54 jmlr-2006-Learning the Structure of Linear Latent Variable Models
18 0.030325748 6 jmlr-2006-A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests
19 0.030037522 2 jmlr-2006-A Graphical Representation of Equivalence Classes of AMP Chain Graphs
20 0.029916124 19 jmlr-2006-Causal Graph Based Decomposition of Factored MDPs
topicId topicWeight
[(0, 0.169), (1, -0.0), (2, -0.152), (3, 0.048), (4, -0.069), (5, 0.03), (6, 0.207), (7, -0.001), (8, -0.014), (9, -0.041), (10, 0.003), (11, -0.047), (12, -0.045), (13, -0.056), (14, 0.054), (15, -0.043), (16, 0.053), (17, -0.175), (18, -0.036), (19, 0.078), (20, -0.059), (21, 0.069), (22, -0.164), (23, 0.166), (24, -0.139), (25, 0.127), (26, 0.027), (27, -0.129), (28, -0.161), (29, -0.053), (30, -0.104), (31, 0.053), (32, 0.122), (33, -0.123), (34, 0.138), (35, -0.092), (36, -0.009), (37, 0.21), (38, -0.06), (39, -0.055), (40, -0.028), (41, -0.16), (42, 0.184), (43, 0.042), (44, -0.126), (45, -0.168), (46, 0.189), (47, -0.093), (48, -0.25), (49, 0.005)]
simIndex simValue paperId paperTitle
same-paper 1 0.96209466 95 jmlr-2006-Walk-Sums and Belief Propagation in Gaussian Graphical Models
Author: Dmitry M. Malioutov, Jason K. Johnson, Alan S. Willsky
Abstract: We present a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose the correlation between each pair of variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlation coefficients. This representation holds for a large class of Gaussian graphical models which we call walk-summable. We give a precise characterization of this class of models, and relate it to other classes including diagonally dominant, attractive, nonfrustrated, and pairwise-normalizable. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. The walk-sum perspective leads to a better understanding of Gaussian belief propagation and to stronger results for its convergence in loopy graphs. Keywords: Gaussian graphical models, walk-sum analysis, convergence of loopy belief propagation
2 0.39605579 5 jmlr-2006-A Robust Procedure For Gaussian Graphical Model Search From Microarray Data With p Larger Than n
Author: Robert Castelo, Alberto Roverato
Abstract: Learning of large-scale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a consequence, the prime objects of inference are full-order partial correlations which are partial correlations between two variables given the remaining ones. In the context of microarray data the number of variables exceed the sample size and this precludes the application of traditional structure learning procedures because a sampling version of full-order partial correlations does not exist. In this paper we consider limited-order partial correlations, these are partial correlations computed on marginal distributions of manageable size, and provide a set of rules that allow one to assess the usefulness of these quantities to derive the independence structure of the underlying Gaussian graphical model. Furthermore, we introduce a novel structure learning procedure based on a quantity, obtained from limited-order partial correlations, that we call the non-rejection rate. The applicability and usefulness of the procedure are demonstrated by both simulated and real data. Keywords: Gaussian distribution, gene network, graphical model, microarray data, non-rejection rate, partial correlation, small-sample inference
Author: Chen Yanover, Talya Meltzer, Yair Weiss
Abstract: The problem of finding the most probable (MAP) configuration in graphical models comes up in a wide range of applications. In a general graphical model this problem is NP hard, but various approximate algorithms have been developed. Linear programming (LP) relaxations are a standard method in computer science for approximating combinatorial problems and have been used for finding the most probable assignment in small graphical models. However, applying this powerful method to real-world problems is extremely challenging due to the large numbers of variables and constraints in the linear program. Tree-Reweighted Belief Propagation is a promising recent algorithm for solving LP relaxations, but little is known about its running time on large problems. In this paper we compare tree-reweighted belief propagation (TRBP) and powerful generalpurpose LP solvers (CPLEX) on relaxations of real-world graphical models from the fields of computer vision and computational biology. We find that TRBP almost always finds the solution significantly faster than all the solvers in CPLEX and more importantly, TRBP can be applied to large scale problems for which the solvers in CPLEX cannot be applied. Using TRBP we can find the MAP configurations in a matter of minutes for a large range of real world problems.
4 0.26201889 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting
Author: Martin J. Wainwright
Abstract: Consider the problem of joint parameter estimation and prediction in a Markov random field: that is, the model parameters are estimated on the basis of an initial set of data, and then the fitted model is used to perform prediction (e.g., smoothing, denoising, interpolation) on a new noisy observation. Working under the restriction of limited computation, we analyze a joint method in which the same convex variational relaxation is used to construct an M-estimator for fitting parameters, and to perform approximate marginalization for the prediction step. The key result of this paper is that in the computation-limited setting, using an inconsistent parameter estimator (i.e., an estimator that returns the “wrong” model even in the infinite data limit) is provably beneficial, since the resulting errors can partially compensate for errors made by using an approximate prediction technique. En route to this result, we analyze the asymptotic properties of M-estimators based on convex variational relaxations, and establish a Lipschitz stability property that holds for a broad class of convex variational methods. This stability result provides additional incentive, apart from the obvious benefit of unique global optima, for using message-passing methods based on convex variational relaxations. We show that joint estimation/prediction based on the reweighted sum-product algorithm substantially outperforms a commonly used heuristic based on ordinary sum-product. Keywords: graphical model, Markov random field, belief propagation, variational method, parameter estimation, prediction error, algorithmic stability
5 0.24181932 34 jmlr-2006-Generalized Bradley-Terry Models and Multi-Class Probability Estimates
Author: Tzu-Kuo Huang, Ruby C. Weng, Chih-Jen Lin
Abstract: The Bradley-Terry model for obtaining individual skill from paired comparisons has been popular in many areas. In machine learning, this model is related to multi-class probability estimates by coupling all pairwise classification results. Error correcting output codes (ECOC) are a general framework to decompose a multi-class problem to several binary problems. To obtain probability estimates under this framework, this paper introduces a generalized Bradley-Terry model in which paired individual comparisons are extended to paired team comparisons. We propose a simple algorithm with convergence proofs to solve the model and obtain individual skill. Experiments on synthetic and real data demonstrate that the algorithm is useful for obtaining multi-class probability estimates. Moreover, we discuss four extensions of the proposed model: 1) weighted individual skill, 2) home-field advantage, 3) ties, and 4) comparisons with more than two teams. Keywords: Bradley-Terry model, probability estimates, error correcting output codes, support vector machines
6 0.23929365 19 jmlr-2006-Causal Graph Based Decomposition of Factored MDPs
7 0.22433308 25 jmlr-2006-Distance Patterns in Structural Similarity
8 0.22279236 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification
9 0.21535553 90 jmlr-2006-Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition
10 0.20925343 64 jmlr-2006-Noisy-OR Component Analysis and its Application to Link Analysis
11 0.19982627 2 jmlr-2006-A Graphical Representation of Equivalence Classes of AMP Chain Graphs
12 0.19946957 37 jmlr-2006-Incremental Algorithms for Hierarchical Classification
13 0.19087026 74 jmlr-2006-Point-Based Value Iteration for Continuous POMDPs
14 0.18490475 48 jmlr-2006-Learning Minimum Volume Sets
15 0.1777472 40 jmlr-2006-Infinite-σ Limits For Tikhonov Regularization
16 0.17155221 83 jmlr-2006-Sparse Boosting
17 0.16926676 43 jmlr-2006-Large Scale Multiple Kernel Learning (Special Topic on Machine Learning and Optimization)
20 0.15663159 20 jmlr-2006-Collaborative Multiagent Reinforcement Learning by Payoff Propagation
topicId topicWeight
[(8, 0.025), (36, 0.048), (45, 0.038), (50, 0.061), (61, 0.018), (63, 0.032), (70, 0.011), (71, 0.329), (76, 0.051), (78, 0.015), (79, 0.04), (81, 0.033), (84, 0.027), (89, 0.049), (90, 0.033), (91, 0.043), (96, 0.042)]
simIndex simValue paperId paperTitle
same-paper 1 0.72478664 95 jmlr-2006-Walk-Sums and Belief Propagation in Gaussian Graphical Models
Author: Dmitry M. Malioutov, Jason K. Johnson, Alan S. Willsky
Abstract: We present a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose the correlation between each pair of variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlation coefficients. This representation holds for a large class of Gaussian graphical models which we call walk-summable. We give a precise characterization of this class of models, and relate it to other classes including diagonally dominant, attractive, nonfrustrated, and pairwise-normalizable. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. The walk-sum perspective leads to a better understanding of Gaussian belief propagation and to stronger results for its convergence in loopy graphs. Keywords: Gaussian graphical models, walk-sum analysis, convergence of loopy belief propagation
2 0.66320515 27 jmlr-2006-Ensemble Pruning Via Semi-definite Programming (Special Topic on Machine Learning and Optimization)
Author: Yi Zhang, Samuel Burer, W. Nick Street
Abstract: An ensemble is a group of learning models that jointly solve a problem. However, the ensembles generated by existing techniques are sometimes unnecessarily large, which can lead to extra memory usage, computational costs, and occasional decreases in effectiveness. The purpose of ensemble pruning is to search for a good subset of ensemble members that performs as well as, or better than, the original ensemble. This subset selection problem is a combinatorial optimization problem and thus finding the exact optimal solution is computationally prohibitive. Various heuristic methods have been developed to obtain an approximate solution. However, most of the existing heuristics use simple greedy search as the optimization method, which lacks either theoretical or empirical quality guarantees. In this paper, the ensemble subset selection problem is formulated as a quadratic integer programming problem. By applying semi-definite programming (SDP) as a solution technique, we are able to get better approximate solutions. Computational experiments show that this SDP-based pruning algorithm outperforms other heuristics in the literature. Its application in a classifier-sharing study also demonstrates the effectiveness of the method. Keywords: ensemble pruning, semi-definite programming, heuristics, knowledge sharing
3 0.37728354 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting
Author: Martin J. Wainwright
Abstract: Consider the problem of joint parameter estimation and prediction in a Markov random field: that is, the model parameters are estimated on the basis of an initial set of data, and then the fitted model is used to perform prediction (e.g., smoothing, denoising, interpolation) on a new noisy observation. Working under the restriction of limited computation, we analyze a joint method in which the same convex variational relaxation is used to construct an M-estimator for fitting parameters, and to perform approximate marginalization for the prediction step. The key result of this paper is that in the computation-limited setting, using an inconsistent parameter estimator (i.e., an estimator that returns the “wrong” model even in the infinite data limit) is provably beneficial, since the resulting errors can partially compensate for errors made by using an approximate prediction technique. En route to this result, we analyze the asymptotic properties of M-estimators based on convex variational relaxations, and establish a Lipschitz stability property that holds for a broad class of convex variational methods. This stability result provides additional incentive, apart from the obvious benefit of unique global optima, for using message-passing methods based on convex variational relaxations. We show that joint estimation/prediction based on the reweighted sum-product algorithm substantially outperforms a commonly used heuristic based on ordinary sum-product. Keywords: graphical model, Markov random field, belief propagation, variational method, parameter estimation, prediction error, algorithmic stability
4 0.35970041 9 jmlr-2006-Accurate Error Bounds for the Eigenvalues of the Kernel Matrix
Author: Mikio L. Braun
Abstract: The eigenvalues of the kernel matrix play an important role in a number of kernel methods, in particular, in kernel principal component analysis. It is well known that the eigenvalues of the kernel matrix converge as the number of samples tends to infinity. We derive probabilistic finite sample size bounds on the approximation error of individual eigenvalues which have the important property that the bounds scale with the eigenvalue under consideration, reflecting the actual behavior of the approximation errors as predicted by asymptotic results and observed in numerical simulations. Such scaling bounds have so far only been known for tail sums of eigenvalues. Asymptotically, the bounds presented here have a slower than stochastic rate, but the number of sample points necessary to make this disadvantage noticeable is often unrealistically large. Therefore, under practical conditions, and for all but the largest few eigenvalues, the bounds presented here form a significant improvement over existing non-scaling bounds. Keywords: kernel matrix, eigenvalues, relative perturbation bounds
5 0.33330125 5 jmlr-2006-A Robust Procedure For Gaussian Graphical Model Search From Microarray Data With p Larger Than n
Author: Robert Castelo, Alberto Roverato
Abstract: Learning of large-scale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a consequence, the prime objects of inference are full-order partial correlations which are partial correlations between two variables given the remaining ones. In the context of microarray data the number of variables exceed the sample size and this precludes the application of traditional structure learning procedures because a sampling version of full-order partial correlations does not exist. In this paper we consider limited-order partial correlations, these are partial correlations computed on marginal distributions of manageable size, and provide a set of rules that allow one to assess the usefulness of these quantities to derive the independence structure of the underlying Gaussian graphical model. Furthermore, we introduce a novel structure learning procedure based on a quantity, obtained from limited-order partial correlations, that we call the non-rejection rate. The applicability and usefulness of the procedure are demonstrated by both simulated and real data. Keywords: Gaussian distribution, gene network, graphical model, microarray data, non-rejection rate, partial correlation, small-sample inference
7 0.32517856 48 jmlr-2006-Learning Minimum Volume Sets
8 0.32222944 29 jmlr-2006-Estimation of Gradients and Coordinate Covariation in Classification
9 0.31882504 53 jmlr-2006-Learning a Hidden Hypergraph
10 0.31796739 66 jmlr-2006-On Model Selection Consistency of Lasso
11 0.31675598 70 jmlr-2006-Online Passive-Aggressive Algorithms
12 0.31604668 25 jmlr-2006-Distance Patterns in Structural Similarity
13 0.3159124 87 jmlr-2006-Stochastic Complexities of Gaussian Mixtures in Variational Bayesian Approximation
14 0.31581149 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation
15 0.31483886 17 jmlr-2006-Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation
16 0.31443864 46 jmlr-2006-Learning Factor Graphs in Polynomial Time and Sample Complexity
17 0.3132776 90 jmlr-2006-Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition
18 0.3123863 73 jmlr-2006-Pattern Recognition for Conditionally Independent Data
19 0.31008413 54 jmlr-2006-Learning the Structure of Linear Latent Variable Models