jmlr jmlr2006 jmlr2006-95 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Dmitry M. Malioutov, Jason K. Johnson, Alan S. Willsky
Abstract: We present a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose the correlation between each pair of variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlation coefficients. This representation holds for a large class of Gaussian graphical models which we call walk-summable. We give a precise characterization of this class of models, and relate it to other classes including diagonally dominant, attractive, nonfrustrated, and pairwise-normalizable. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. The walk-sum perspective leads to a better understanding of Gaussian belief propagation and to stronger results for its convergence in loopy graphs. Keywords: Gaussian graphical models, walk-sum analysis, convergence of loopy belief propagation
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We present a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. [sent-8, score-0.566]
2 The key idea is to decompose the correlation between each pair of variables as a sum over all walks between those variables in the graph. [sent-9, score-0.463]
3 We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. [sent-13, score-0.456]
4 We develop a “walk-sum” formulation for computation of means, variances and correlations as sums over certain sets of weighted walks in a graph. [sent-46, score-0.599]
5 The reason is that LBP captures all of the walks needed to compute the means but only computes a subset of the walks needed for the variances. [sent-56, score-0.926]
6 Similarly to our work, the decomposition is connected to the Neumann series expansion of the matrix inverse, but in addition to products of edge weights, the weight of a walk in their formulation includes a complicated multi-dimensional integral. [sent-60, score-0.379]
7 (Unlike paths, walks can cross an edge multiple times, and the weight of a path is rather hard to calculate, as opposed to our walk-weights.) [sent-65, score-0.515]
8 Ultimately each node j must receive information from each of its neighbors, where the message m_{i→j}(x_j) from neighbor i to j represents the result of eliminating all of the variables in the subtree rooted at node i and including all of its neighbors other than j (see Figure 1). [sent-141, score-0.369]
9 Since each of these messages is itself made up of variable elimination steps corresponding to the subtrees rooted at the other neighbors of node i, there is a set of fixed-point equations that relate messages throughout the tree: m_{i→j}(x_j) = ∫ ψ_{ij}(x_i, x_j) ψ_i(x_i) ∏_{k∈N(i)\j} m_{k→i}(x_i) dx_i. [sent-142, score-0.35]
10 The first step corresponds to preparing the message to be sent from node i to node j by collecting information from all of the other neighbors of i: Ĵ_{i\j} = J_{ii} + ∑_{k∈N(i)\j} ΔJ_{k→i} and ĥ_{i\j} = h_i + ∑_{k∈N(i)\j} Δh_{k→i}. (7) [sent-156, score-0.473]
11 The second step produces the information quantities to be propagated to node j: ΔJ_{i→j} = −J_{ji} Ĵ_{i\j}^{−1} J_{ji} and Δh_{i→j} = −J_{ji} Ĵ_{i\j}^{−1} ĥ_{i\j}. [sent-157, score-0.414]
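For concreteness, the two-step update in (7)-(8) can be sketched in a few lines of code for the scalar-variable case. This is a minimal illustration, not the paper's implementation; the function and container names (send_message, delta_J, delta_h, neighbors) are ours.

```python
# A minimal sketch of the two-step Gaussian BP message update, Eqs. (7)-(8),
# assuming scalar variables. J is a NumPy information matrix, h a potential
# vector, delta_J/delta_h are dicts keyed by directed edges (k, i).
def send_message(J, h, delta_J, delta_h, neighbors, i, j):
    """Compute the information-form message (dJ, dh) from node i to node j."""
    # Step 1: collect information at i from all neighbors of i other than j.
    J_hat = J[i, i] + sum(delta_J[(k, i)] for k in neighbors[i] if k != j)
    h_hat = h[i] + sum(delta_h[(k, i)] for k in neighbors[i] if k != j)
    # Step 2: produce the quantities propagated to node j.
    dJ = -J[j, i] * (1.0 / J_hat) * J[j, i]
    dh = -J[j, i] * (1.0 / J_hat) * h_hat
    return dJ, dh
```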
12 A message m_{i→j} passed from node i to node j ∈ N(i) captures the effect of eliminating the subtree rooted at i. [sent-167, score-0.449]
13 In (c), we show how, after 3 iterations, messages link up to form the computation tree T_1^(3) of node 1 (the subtree T_{4→1}^(3), associated with message m_{4→1}^(3), is also indicated within the dotted outline). [sent-195, score-0.422]
14 A walk w of length l is a sequence (w_0, w_1, . . . , w_l) of nodes w_k ∈ V such that each step of the walk (w_k, w_{k+1}) corresponds to an edge of the graph {w_k, w_{k+1}} ∈ E. [sent-223, score-0.553]
15 We also allow zero-length “self” walks w = (v) at each node v for which we define φ(w) = 1. [sent-227, score-0.598]
16 To make a connection between these walks and Gaussian inference, we decompose the covariance matrix using the Neumann power series for the matrix inverse: P = J^{−1} = (I − R)^{−1} = ∑_{k=0}^{∞} R^k, for ρ(R) < 1. [sent-228, score-0.463]
17 The (i, j)-th element of R^l can be expressed as a sum of weights of walks w that go from i to j and have length l (denoted w : i →(l) j): (R^l)_{ij} = ∑_{w_1,…,w_{l−1}} r_{i,w_1} r_{w_1,w_2} ⋯ r_{w_{l−1},j} = ∑_{w: i →(l) j} φ(w). [sent-231, score-0.463]
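As a hedged numerical illustration of this decomposition, the small check below builds R = I − J for a toy unit-diagonal model and verifies that summing powers of R recovers the covariance J^{−1}; the specific matrix is ours, not from the paper.

```python
# Numerical sketch of P = J^{-1} = sum_k R^k for a small walk-summable model.
# The example matrix is illustrative; any unit-diagonal J with rho(R) < 1 works.
import numpy as np

J = np.array([[1.0, 0.3, 0.2],
              [0.3, 1.0, 0.4],
              [0.2, 0.4, 1.0]])            # information matrix, unit diagonal
R = np.eye(3) - J                           # off-diagonals are the partial correlations
assert max(abs(np.linalg.eigvals(R))) < 1   # rho(R) < 1, so the series converges

P_series = sum(np.linalg.matrix_power(R, k) for k in range(200))
print(np.allclose(P_series, np.linalg.inv(J)))   # True: covariances are walk-sums
```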
18 The last equality holds because only the terms that correspond to walks in the graph have non-zero contributions: for all other terms at least one of the partial correlation coefficients r_{w_k,w_{k+1}} is zero. [sent-246, score-0.622]
19 The set of walks from i to j of length l is finite, and the sum of weights of these walks (the walk-sum) is well-defined. [sent-247, score-0.926]
20 However, care must be taken, as walk-sums over countably many walks may or may not converge, and convergence may depend on the order of summation. [sent-249, score-0.49]
21 This motivates the following definition: We say that a Gaussian distribution is walk-summable (WS) if for all i, j ∈ V the unordered sum over all walks w from i to j (denoted w : i → j), ∑_{w:i→j} φ(w), is well-defined (i.e., it converges to the same value for every possible summation order). [sent-250, score-0.463]
22 It uses absolute convergence to rearrange walks in order of increasing length, and the Perron-Frobenius theorem for part (iv). [sent-265, score-0.49]
23 The latter is sufficient for the convergence of the walks ordered by increasing length, whereas walk-summability enables convergence to the same answer in arbitrary order of summation. [sent-267, score-0.517]
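A minimal test of walk-summability, using the characterization via the spectral radius of the entry-wise absolute matrix R̄, might look as follows; the helper name is ours. For the 3-node example above, it returns True (ρ(R̄) is roughly 0.6).

```python
# Walk-summability check: rescale J to unit diagonal, form R = I - D J D, and
# test rho(|R|) < 1, where |R| is the entry-wise absolute value. Sketch only.
import numpy as np

def is_walk_summable(J):
    d = 1.0 / np.sqrt(np.diag(J))
    R = np.eye(len(J)) - d[:, None] * J * d[None, :]   # I - D J D
    return max(abs(np.linalg.eigvals(np.abs(R)))) < 1.0
```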
24 Also, the means are walk-sums reweighted by the value of h at the start of each walk: µ_i = ∑_{w:∗→i} h_∗ φ(w), where the sum is over all walks which end at node i (with arbitrary starting node), and where ∗ denotes the starting node of the walk w. [sent-340, score-1.015]
25 Set of walks W(∗ → 1): {(1), (2, 1), (3, 1), (2, 3, 1), (3, 2, 1), (1, 2, 1), (1, 3, 1), . . . }. [sent-352, score-0.463]
26 In general, given a set of walks W we define the walk-sum φ(W) = ∑_{w∈W} φ(w) and the reweighted walk-sum φ_h(W) = ∑_{w∈W} h_{w_0} φ(w), where w_0 denotes the initial node in the walk w. [sent-361, score-0.88]
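Continuing the small numerical sketch above (same J and R, numpy already imported there), the reweighted walk-sum for the means can be checked the same way; the potential vector h is an arbitrary illustrative choice.

```python
# Means as reweighted walk-sums: mu = (sum_l R^l) h = J^{-1} h, continuing the
# 3-node example above.
h = np.array([1.0, -2.0, 0.5])
mu_walks = sum(np.linalg.matrix_power(R, l) for l in range(200)) @ h
print(np.allclose(mu_walks, np.linalg.solve(J, h)))   # True: mu_i = phi_h(* -> i)
```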
27 W(·) denotes the set of all walks having some property. [sent-365, score-0.463]
28 For instance, W (i → j) denotes the set of all walks from i to j and φ(i → j) is the corresponding walk-sum. [sent-375, score-0.463]
29 Also, W(∗ → i) denotes the set of all walks that end at node i and φ_h(∗ → i) is the corresponding reweighted walk-sum. [sent-376, score-0.598]
30 An illustration of walk-sums and their connection to inference appears in Figure 5 where we list some walks and walk-sums for a 3-cycle graph. [sent-378, score-0.491]
31 For walks u = (u_0, . . . , u_n) and v = (v_0, . . . , v_m) with u_n = v_0 (walk v begins where walk u ends) we define the product of walks uv = (u_0, . . . , u_n, v_1, . . . , v_m). [sent-398, score-0.823]
32 Let U and V be two countable sets of walks such that every walk in U ends at a given node i and every walk in V begins at this node. [sent-405, score-1.192]
33 For individual walks it is evident that φ(uv) = φ(u)φ(v). [sent-411, score-0.463]
34 Note that W(i → i) is the set of self-return walks at node i, that is, walks which begin and end at node i. [sent-414, score-1.196]
35 These self-return walks include walks which return to i many times. [sent-415, score-0.926]
36 Let W^{\i}(i → i) be the set of all walks with non-zero length that begin and end at i but do not visit i at any other point in between. [sent-416, score-0.463]
37 We call these the single-revisit self-return walks at node i. [sent-417, score-0.598]
38 The set of self-return walks that return exactly k times is generated by taking the product of k copies of W^{\i}(i → i), denoted by W_k^{\i}(i → i). [sent-418, score-0.463]
39 Thus, we obtain all self-return walks as W(i → i) = ∪_{k≥0} W_k^{\i}(i → i), (12) where W_0^{\i}(i → i) ≜ {(i)}. [sent-419, score-0.463]
40 Similarly, recall that W(∗ → i) denotes the set of all walks which end at node i. [sent-420, score-0.598]
41 Let W^{\i}(∗ → i) denote the set of walks with non-zero length which end at node i and do not visit i previously (we call them single-visit walks). [sent-421, score-0.598]
42 Thus, all walks which end at i are obtained as: W(∗ → i) = {(i)} ∪ W^{\i}(∗ → i) W(i → i), (13) which is a valid decomposition. [sent-422, score-0.497]
43 First note that the decomposition of W_k^{\i}(i → i) into products of k single-revisit self-return walks is a valid decomposition. [sent-430, score-0.519]
44 The basic idea here is to separate out the walks with positive and negative weights. [sent-439, score-0.463]
45 For each edge {i, j} ∈ E with r_{ij} > 0 we define two edges {i+, j+}, {i−, j−} ∈ Ê, and set the partial correlations on these edges to be equal to r_{ij}. [sent-442, score-0.389]
46 In essence, this is accomplished by summing walks with positive and negative weights separately and then taking the difference, which is only possible for walk-summable models. [sent-459, score-0.463]
47 Moreover, for these models we show that exact walk-sums over infinite sets of walks for means and variances can be computed efficiently in a recursive fashion. [sent-495, score-0.601]
48 The ingredients for these results are decompositions of the variance and mean walk-sums in terms of sums over walks on subtrees, together with the decomposition in terms of single-revisit and single-visit walks provided in Proposition 9. [sent-511, score-0.979]
49 This latter walk-sum can be further decomposed into sums over disjoint subsets of walks each of which corresponds to single-revisit self-return walks that exit node j via a specific one of its neighbors, say i. [sent-514, score-1.061]
50 In particular, as illustrated in Figure 7, the single-revisit self-return walks that do this correspond to walks that live in the subtree T_{i→j}. [sent-515, score-0.964]
51 Using the notation W^{\j}(j → j | T_{i→j}) for the set of all single-revisit walks which are restricted to stay in subtree T_{i→j}, we see that α_j = φ^{\j}(j → j) = ∑_{i∈N(j)} φ^{\j}(j → j | T_{i→j}) = ∑_{i∈N(j)} α_{i→j}. [sent-516, score-0.501]
52 Mean µ_j is the reweighted walk-sum over walks that start anywhere and end at node j, µ_j = φ_h(∗ → j). [sent-524, score-0.598]
53 Any walk that ends at node j can be expressed as a single-visit walk to node j followed by a multiple-revisit self-return walk from node j: φ_h(∗ → j) = h_j + φ_h^{\j}(∗ → j) φ(j → j), where the term h_j corresponds to the length-zero walk that starts and ends at node j. [sent-525, score-1.728]
54 In this subsection we use the LBP computation tree to show that LBP includes all the walks for the means, but only a subset of the walks for the variances. [sent-531, score-1.013]
55 As we have discussed, running LBP for some number of iterations yields identical calculations at any particular node i to the exact inference calculations on the corresponding computation tree rooted at node i. [sent-535, score-0.42]
56 We use the notation T_i^(n) for the nth computation tree at node i, T_i for the full computation tree (as n → ∞), and we assign the label 0 to the root node. [sent-536, score-0.363]
57 Then, P_0(T_i^(n)) denotes the variance at the root node of the nth computation tree rooted at node i in G. [sent-537, score-0.477]
58 We say that a message schedule is proper if every message is updated infinitely often, that is, if for every m > 0 and every directed edge (i, j) in the graph there exists n > m such that (i, j) ∈ M^(n). [sent-541, score-0.421]
59 Lemma 18 (Walks in G and in T_i) There is a one-to-one correspondence between finite-length walks in G that end at i, and walks in T_i that end at the root node. [sent-547, score-0.953]
60 In particular, for each such walk in G there is a corresponding walk in T_i^(n) for n large enough. [sent-548, score-0.564]
61 Now, recall that to compute the mean µi we need to gather walk-sums over all walks that start anywhere and end at i. [sent-549, score-0.463]
62 We have just shown that LBP gathers all of these walks as the computation tree grows to infinity. [sent-550, score-0.55]
63 The true variance P_ii is a walk-sum over all self-return walks that start and end at i in G. [sent-552, score-0.494]
64 However, walks in G that start and end at i may map to walks that start at the root node of T_i^(n), but end at a replica of the root node instead of the root. [sent-553, score-1.28]
65 These walks are not captured by the LBP variance estimate. [sent-554, score-0.494]
66 The walks for the variance estimate P_0(T_i^(n)) are self-return walks W(0 → 0 | T_i^(n)) that start and end at the root node of the computation tree. [sent-555, score-0.926]
67 Recall that the computation tree is a representation of the computations seen at the root node of the tree, and it is only the computation at this node, that is, at this replica of node i, that corresponds to the LBP computation at node i in G. [sent-556, score-0.449]
68 The walk (1, 2, 3, 1) is a self-return walk in the original graph G but is not a self-return walk in the computation tree shown in Figure 2(d). [sent-559, score-0.982]
69 LBP variances capture only those self-return walks of the original graph G that are also self-return walks in the computation tree—for example, the walk (1, 3, 2, 3, 4, 3, 1) is a self-return walk in both Figures 2(a) and (d). [sent-560, score-1.642]
70 Hence we have Lemma 19 (Self-return walks in G and in T_i): The LBP variance estimate at each node is a sum over the backtracking self-return walks in G, a subset of all self-return walks needed to calculate the correct variance. [sent-562, score-1.555]
71 Note that back-tracking walks for the variances have positive weights, since each edge in the walk is traversed an even number of times. [sent-563, score-0.9]
72 With each LBP step the computation tree grows and new back-tracking walks are included, hence variance estimates grow monotonically. [sent-564, score-0.581]
73 We have shown which walks LBP gathers based on the computation tree. [sent-565, score-0.463]
74 Intuitively, walks in the computation tree T_i^(n) are subsets of the walks in G, and hence they converge. [sent-568, score-1.013]
75 Proposition 21 (Convergence of LBP for walk-summable models) If a model on a graph G is walk-summable, then LBP is well-posed, the means converge to the true means and the LBP variances converge to walk-sums over the backtracking self-return walks at each node. [sent-572, score-0.707]
76 Let W (i → i) denote the back-tracking self-return walks at node i. [sent-574, score-0.598]
77 In attractive and non-frustrated models LBP variance estimates are less than or equal to the true variances (the missing non-backtracking walks all have positive weights). [sent-594, score-0.676]
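The behavior described in Proposition 21 and in the preceding remark can be reproduced with a short synchronous Gaussian LBP sketch on a single cycle; the model, schedule, and names below are ours, chosen so that the graph is attractive and walk-summable. Both printed checks come out True here: the means match the exact solution, and the LBP variances sit below the true variances because the non-backtracking cycle walks are missing.

```python
# Minimal synchronous Gaussian LBP (information form) on an attractive,
# walk-summable 4-cycle: means converge to the exact values, while the
# variance estimates converge to the backtracking walk-sums (underestimates).
import numpy as np

n = 4
J = np.eye(n)
for a, b in [(0, 1), (1, 2), (2, 3), (3, 0)]:
    J[a, b] = J[b, a] = -0.4                 # r_ij = 0.4 > 0 on every edge
h = np.array([1.0, 0.0, -1.0, 0.5])
nbrs = {i: [j for j in range(n) if j != i and J[i, j] != 0.0] for i in range(n)}

dJ = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}   # messages on directed edges
dh = {(i, j): 0.0 for i in range(n) for j in nbrs[i]}
for _ in range(200):                          # synchronous (parallel) schedule
    dJ_new, dh_new = {}, {}
    for (i, j) in dJ:
        J_hat = J[i, i] + sum(dJ[(k, i)] for k in nbrs[i] if k != j)
        h_hat = h[i] + sum(dh[(k, i)] for k in nbrs[i] if k != j)
        dJ_new[(i, j)] = -J[j, i] ** 2 / J_hat
        dh_new[(i, j)] = -J[j, i] * h_hat / J_hat
    dJ, dh = dJ_new, dh_new

J_marg = np.array([J[i, i] + sum(dJ[(k, i)] for k in nbrs[i]) for i in range(n)])
h_marg = np.array([h[i] + sum(dh[(k, i)] for k in nbrs[i]) for i in range(n)])
print(np.allclose(h_marg / J_marg, np.linalg.solve(J, h)))        # means: exact
print((1.0 / J_marg <= np.diag(np.linalg.inv(J))).all())          # variances: under
```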
78 In our framework, each such correlation is a walk-sum over the subset of non-backtracking self-return walks in G that, in the computation tree, begin at a particular replica of the root. [sent-600, score-0.493]
79 This implies that all the computation trees T^(n) are walk-summable and that variances monotonically increase (since weights of backtracking walks are positive, see the discussion after Lemma 19). [sent-654, score-0.651]
80 The reason is that LBP only computes a subset of the self-return walks for the variances but captures all the walks for the means. [sent-672, score-1.029]
81 One involves developing improved walk-sum algorithms that gather more walks than LBP does, to yield better variance estimates. [sent-774, score-0.494]
82 However, related expansions of correlations in terms of walks have been investigated for other models. [sent-780, score-0.496]
83 (1983) use walk-sums for non-Gaussian classical and quantum spin-systems, where the weights of walks involve complicated multi-dimensional integrals. [sent-783, score-0.463]
84 First note that (R̄^l)_{ij} is an absolute walk-sum over all walks of length l from i to j: (R̄^l)_{ij} = ∑_{w: i →(l) j} |φ(w)| (there are a finite number of these walks so the sum is well-defined). [sent-788, score-0.926]
85 If we order walks by their length and then group terms for walks of equal lengths (each group has a finite number of terms) we obtain: ∑_{w:i→j} |φ(w)| = ∑_l ∑_{w: i →(l) j} |φ(w)| = ∑_l (R̄^l)_{ij}. [sent-790, score-0.926]
86 As shown in (17), ∑_l (R̄^l)_{ij} corresponds to one particular ordering of the walks which converges by (ii). [sent-794, score-0.463]
87 Proof of Proposition 11 We partition walk-sums into sums over “even” and “odd” walks according to the number of negative edges crossed by the walk. [sent-839, score-0.507]
88 The graph Ĝ is defined so that every walk from i+ to j+ is even and every walk from i+ to j− is odd. [sent-841, score-0.613]
89 Note also that setting h = (h+; h−) has the effect that all walks with h_{w_0} > 0 begin in V+ and all walks with h_{w_0} < 0 begin in V−. [sent-845, score-0.926]
90 Consequently, every even walk ends in V+ and every odd walk ends in V− . [sent-846, score-0.651]
91 Proof of Proposition 16 To calculate the walk-sum for multiple-revisit self-return walks in T_{i\j}, we can use the single-revisit counterpart: γ_{i\j} = φ(i → i | T_{i\j}) = 1 / (1 − φ^{\i}(i → i | T_{i\j})). (18) [sent-878, score-0.463]
92 Now, we decompose the single-revisit walks in the subtree T_{i\j} in terms of the possible first step of the walk (i, k), where k ∈ N(i)\j. [sent-879, score-0.353]
93 Proof of Proposition 17 A multiple-revisit walk in T_{i\j} can be written in terms of single-visit walks: φ_h(∗ → i | T_{i\j}) = h_i + φ_h^{\i}(∗ → i | T_{i\j}) φ(i → i | T_{i\j}). [sent-889, score-0.353]
94 Proof of Lemma 18 First, we note that for every walk w which ends at the root node of T_i^(n) there is a corresponding walk in G which ends at i. [sent-896, score-0.786]
95 Then for any walk in G that ends at w_l and has length n there is a walk in T_{w_l}^(n) that ends at the root. [sent-904, score-0.653]
96 The intuition for other message schedules is that every step (i, j) of the walk will appear eventually in any proper message schedule M . [sent-905, score-0.602]
97 First we unwrap the walk w into a tree T_w rooted at w_l in the following way: start at w_l, the end of the walk, and traverse the walk in reverse. [sent-907, score-0.744]
98 Now the computation tree T_{w_l}^(n) (which is created by splicing together T_{i→j}^(n) for all edges (i, j) coming into the root of T_w) contains T_w, and hence it contains the walk w. [sent-923, score-0.44]
99 Proof of Lemma 24 Let T_i^(n)(M) denote the nth computation tree under a proper message schedule M rooted at node i. [sent-932, score-0.411]
100 We use the following simple extension of Lemma 18: Let T_i^(n)(M_1) be the nth computation tree rooted at i under message schedule M_1. [sent-933, score-0.342]
wordName wordTfidf (topN-words)
[('lbp', 0.563), ('walks', 0.463), ('walk', 0.282), ('ti', 0.234), ('je', 0.169), ('node', 0.135), ('wk', 0.11), ('ri', 0.108), ('message', 0.106), ('variances', 0.103), ('alioutov', 0.103), ('illsky', 0.103), ('ohnson', 0.103), ('tw', 0.097), ('ums', 0.097), ('ji', 0.093), ('jii', 0.091), ('tree', 0.087), ('schedule', 0.087), ('trees', 0.085), ('belief', 0.079), ('uv', 0.078), ('hi', 0.071), ('propagation', 0.064), ('raphical', 0.063), ('proposition', 0.06), ('diagonally', 0.06), ('bp', 0.059), ('odels', 0.059), ('messages', 0.056), ('aussian', 0.055), ('loopy', 0.055), ('edge', 0.052), ('ws', 0.05), ('graph', 0.049), ('walksummable', 0.048), ('edges', 0.044), ('attractive', 0.044), ('frustrated', 0.042), ('elimination', 0.042), ('gaussian', 0.041), ('gk', 0.04), ('dominant', 0.04), ('subtree', 0.038), ('rl', 0.036), ('converge', 0.036), ('moallemi', 0.036), ('rk', 0.036), ('critical', 0.036), ('models', 0.035), ('rooted', 0.035), ('valid', 0.034), ('rx', 0.034), ('limn', 0.034), ('spectral', 0.033), ('roy', 0.033), ('correlations', 0.033), ('johnson', 0.031), ('variance', 0.031), ('nodes', 0.031), ('drd', 0.03), ('replica', 0.03), ('radius', 0.03), ('ends', 0.03), ('validity', 0.03), ('graphs', 0.03), ('wl', 0.029), ('mi', 0.028), ('inference', 0.028), ('nth', 0.027), ('odd', 0.027), ('pi', 0.027), ('root', 0.027), ('convergence', 0.027), ('graphical', 0.026), ('freeman', 0.026), ('neighbors', 0.026), ('twl', 0.026), ('rusmevichientong', 0.026), ('tk', 0.025), ('deg', 0.024), ('gmrfs', 0.024), ('jpn', 0.024), ('malioutov', 0.024), ('varga', 0.024), ('lemma', 0.024), ('iv', 0.024), ('marginal', 0.024), ('connected', 0.023), ('decomposition', 0.022), ('van', 0.021), ('proper', 0.021), ('potentials', 0.021), ('chord', 0.02), ('radii', 0.02), ('cycles', 0.02), ('leaf', 0.02), ('model', 0.02), ('diagonal', 0.02), ('weiss', 0.02)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 95 jmlr-2006-Walk-Sums and Belief Propagation in Gaussian Graphical Models
Author: Dmitry M. Malioutov, Jason K. Johnson, Alan S. Willsky
Abstract: We present a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose the correlation between each pair of variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlation coefficients. This representation holds for a large class of Gaussian graphical models which we call walk-summable. We give a precise characterization of this class of models, and relate it to other classes including diagonally dominant, attractive, nonfrustrated, and pairwise-normalizable. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. The walk-sum perspective leads to a better understanding of Gaussian belief propagation and to stronger results for its convergence in loopy graphs. Keywords: Gaussian graphical models, walk-sum analysis, convergence of loopy belief propagation
Author: Robert Castelo, Alberto Roverato
Abstract: Learning of large-scale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a consequence, the prime objects of inference are full-order partial correlations which are partial correlations between two variables given the remaining ones. In the context of microarray data the number of variables exceed the sample size and this precludes the application of traditional structure learning procedures because a sampling version of full-order partial correlations does not exist. In this paper we consider limited-order partial correlations, these are partial correlations computed on marginal distributions of manageable size, and provide a set of rules that allow one to assess the usefulness of these quantities to derive the independence structure of the underlying Gaussian graphical model. Furthermore, we introduce a novel structure learning procedure based on a quantity, obtained from limited-order partial correlations, that we call the non-rejection rate. The applicability and usefulness of the procedure are demonstrated by both simulated and real data. Keywords: Gaussian distribution, gene network, graphical model, microarray data, non-rejection rate, partial correlation, small-sample inference
3 0.073913619 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting
Author: Martin J. Wainwright
Abstract: Consider the problem of joint parameter estimation and prediction in a Markov random field: that is, the model parameters are estimated on the basis of an initial set of data, and then the fitted model is used to perform prediction (e.g., smoothing, denoising, interpolation) on a new noisy observation. Working under the restriction of limited computation, we analyze a joint method in which the same convex variational relaxation is used to construct an M-estimator for fitting parameters, and to perform approximate marginalization for the prediction step. The key result of this paper is that in the computation-limited setting, using an inconsistent parameter estimator (i.e., an estimator that returns the “wrong” model even in the infinite data limit) is provably beneficial, since the resulting errors can partially compensate for errors made by using an approximate prediction technique. En route to this result, we analyze the asymptotic properties of M-estimators based on convex variational relaxations, and establish a Lipschitz stability property that holds for a broad class of convex variational methods. This stability result provides additional incentive, apart from the obvious benefit of unique global optima, for using message-passing methods based on convex variational relaxations. We show that joint estimation/prediction based on the reweighted sum-product algorithm substantially outperforms a commonly used heuristic based on ordinary sum-product. Keywords: graphical model, Markov random field, belief propagation, variational method, parameter estimation, prediction error, algorithmic stability
4 0.066200495 34 jmlr-2006-Generalized Bradley-Terry Models and Multi-Class Probability Estimates
Author: Tzu-Kuo Huang, Ruby C. Weng, Chih-Jen Lin
Abstract: The Bradley-Terry model for obtaining individual skill from paired comparisons has been popular in many areas. In machine learning, this model is related to multi-class probability estimates by coupling all pairwise classification results. Error correcting output codes (ECOC) are a general framework to decompose a multi-class problem to several binary problems. To obtain probability estimates under this framework, this paper introduces a generalized Bradley-Terry model in which paired individual comparisons are extended to paired team comparisons. We propose a simple algorithm with convergence proofs to solve the model and obtain individual skill. Experiments on synthetic and real data demonstrate that the algorithm is useful for obtaining multi-class probability estimates. Moreover, we discuss four extensions of the proposed model: 1) weighted individual skill, 2) home-field advantage, 3) ties, and 4) comparisons with more than two teams. Keywords: Bradley-Terry model, probability estimates, error correcting output codes, support vector machines
5 0.054950647 25 jmlr-2006-Distance Patterns in Structural Similarity
Author: Thomas Kämpke
Abstract: Similarity of edge labeled graphs is considered in the sense of minimum squared distance between corresponding values. Vertex correspondences are established by isomorphisms if both graphs are of equal size and by subisomorphisms if one graph has fewer vertices than the other. Best fit isomorphisms and subisomorphisms amount to solutions of quadratic assignment problems and are computed exactly as well as approximately by minimum cost flow, linear assignment relaxations and related graph algorithms. Keywords: assignment problem, best approximation, branch and bound, inexact graph matching, model data base
6 0.052096114 64 jmlr-2006-Noisy-OR Component Analysis and its Application to Link Analysis
8 0.051764708 37 jmlr-2006-Incremental Algorithms for Hierarchical Classification
9 0.045556199 74 jmlr-2006-Point-Based Value Iteration for Continuous POMDPs
12 0.039123595 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation
13 0.038386773 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification
14 0.037189227 20 jmlr-2006-Collaborative Multiagent Reinforcement Learning by Payoff Propagation
15 0.033897318 89 jmlr-2006-Structured Prediction, Dual Extragradient and Bregman Projections (Special Topic on Machine Learning and Optimization)
16 0.032353017 90 jmlr-2006-Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition
17 0.031671505 54 jmlr-2006-Learning the Structure of Linear Latent Variable Models
18 0.030325748 6 jmlr-2006-A Scoring Function for Learning Bayesian Networks based on Mutual Information and Conditional Independence Tests
19 0.030037522 2 jmlr-2006-A Graphical Representation of Equivalence Classes of AMP Chain Graphs
20 0.029916124 19 jmlr-2006-Causal Graph Based Decomposition of Factored MDPs
topicId topicWeight
[(0, 0.169), (1, -0.0), (2, -0.152), (3, 0.048), (4, -0.069), (5, 0.03), (6, 0.207), (7, -0.001), (8, -0.014), (9, -0.041), (10, 0.003), (11, -0.047), (12, -0.045), (13, -0.056), (14, 0.054), (15, -0.043), (16, 0.053), (17, -0.175), (18, -0.036), (19, 0.078), (20, -0.059), (21, 0.069), (22, -0.164), (23, 0.166), (24, -0.139), (25, 0.127), (26, 0.027), (27, -0.129), (28, -0.161), (29, -0.053), (30, -0.104), (31, 0.053), (32, 0.122), (33, -0.123), (34, 0.138), (35, -0.092), (36, -0.009), (37, 0.21), (38, -0.06), (39, -0.055), (40, -0.028), (41, -0.16), (42, 0.184), (43, 0.042), (44, -0.126), (45, -0.168), (46, 0.189), (47, -0.093), (48, -0.25), (49, 0.005)]
simIndex simValue paperId paperTitle
same-paper 1 0.96209466 95 jmlr-2006-Walk-Sums and Belief Propagation in Gaussian Graphical Models
Author: Dmitry M. Malioutov, Jason K. Johnson, Alan S. Willsky
Abstract: We present a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose the correlation between each pair of variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlation coefficients. This representation holds for a large class of Gaussian graphical models which we call walk-summable. We give a precise characterization of this class of models, and relate it to other classes including diagonally dominant, attractive, nonfrustrated, and pairwise-normalizable. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. The walk-sum perspective leads to a better understanding of Gaussian belief propagation and to stronger results for its convergence in loopy graphs. Keywords: Gaussian graphical models, walk-sum analysis, convergence of loopy belief propagation
2 0.39605579 5 jmlr-2006-A Robust Procedure For Gaussian Graphical Model Search From Microarray Data With p Larger Than n
Author: Robert Castelo, Alberto Roverato
Abstract: Learning of large-scale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a consequence, the prime objects of inference are full-order partial correlations which are partial correlations between two variables given the remaining ones. In the context of microarray data the number of variables exceed the sample size and this precludes the application of traditional structure learning procedures because a sampling version of full-order partial correlations does not exist. In this paper we consider limited-order partial correlations, these are partial correlations computed on marginal distributions of manageable size, and provide a set of rules that allow one to assess the usefulness of these quantities to derive the independence structure of the underlying Gaussian graphical model. Furthermore, we introduce a novel structure learning procedure based on a quantity, obtained from limited-order partial correlations, that we call the non-rejection rate. The applicability and usefulness of the procedure are demonstrated by both simulated and real data. Keywords: Gaussian distribution, gene network, graphical model, microarray data, non-rejection rate, partial correlation, small-sample inference
Author: Chen Yanover, Talya Meltzer, Yair Weiss
Abstract: The problem of finding the most probable (MAP) configuration in graphical models comes up in a wide range of applications. In a general graphical model this problem is NP hard, but various approximate algorithms have been developed. Linear programming (LP) relaxations are a standard method in computer science for approximating combinatorial problems and have been used for finding the most probable assignment in small graphical models. However, applying this powerful method to real-world problems is extremely challenging due to the large numbers of variables and constraints in the linear program. Tree-Reweighted Belief Propagation is a promising recent algorithm for solving LP relaxations, but little is known about its running time on large problems. In this paper we compare tree-reweighted belief propagation (TRBP) and powerful generalpurpose LP solvers (CPLEX) on relaxations of real-world graphical models from the fields of computer vision and computational biology. We find that TRBP almost always finds the solution significantly faster than all the solvers in CPLEX and more importantly, TRBP can be applied to large scale problems for which the solvers in CPLEX cannot be applied. Using TRBP we can find the MAP configurations in a matter of minutes for a large range of real world problems.
4 0.26201889 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting
Author: Martin J. Wainwright
Abstract: Consider the problem of joint parameter estimation and prediction in a Markov random field: that is, the model parameters are estimated on the basis of an initial set of data, and then the fitted model is used to perform prediction (e.g., smoothing, denoising, interpolation) on a new noisy observation. Working under the restriction of limited computation, we analyze a joint method in which the same convex variational relaxation is used to construct an M-estimator for fitting parameters, and to perform approximate marginalization for the prediction step. The key result of this paper is that in the computation-limited setting, using an inconsistent parameter estimator (i.e., an estimator that returns the “wrong” model even in the infinite data limit) is provably beneficial, since the resulting errors can partially compensate for errors made by using an approximate prediction technique. En route to this result, we analyze the asymptotic properties of M-estimators based on convex variational relaxations, and establish a Lipschitz stability property that holds for a broad class of convex variational methods. This stability result provides additional incentive, apart from the obvious benefit of unique global optima, for using message-passing methods based on convex variational relaxations. We show that joint estimation/prediction based on the reweighted sum-product algorithm substantially outperforms a commonly used heuristic based on ordinary sum-product. Keywords: graphical model, Markov random field, belief propagation, variational method, parameter estimation, prediction error, algorithmic stability
5 0.24181932 34 jmlr-2006-Generalized Bradley-Terry Models and Multi-Class Probability Estimates
Author: Tzu-Kuo Huang, Ruby C. Weng, Chih-Jen Lin
Abstract: The Bradley-Terry model for obtaining individual skill from paired comparisons has been popular in many areas. In machine learning, this model is related to multi-class probability estimates by coupling all pairwise classification results. Error correcting output codes (ECOC) are a general framework to decompose a multi-class problem to several binary problems. To obtain probability estimates under this framework, this paper introduces a generalized Bradley-Terry model in which paired individual comparisons are extended to paired team comparisons. We propose a simple algorithm with convergence proofs to solve the model and obtain individual skill. Experiments on synthetic and real data demonstrate that the algorithm is useful for obtaining multi-class probability estimates. Moreover, we discuss four extensions of the proposed model: 1) weighted individual skill, 2) home-field advantage, 3) ties, and 4) comparisons with more than two teams. Keywords: Bradley-Terry model, probability estimates, error correcting output codes, support vector machines
6 0.23929365 19 jmlr-2006-Causal Graph Based Decomposition of Factored MDPs
7 0.22433308 25 jmlr-2006-Distance Patterns in Structural Similarity
8 0.22279236 63 jmlr-2006-New Algorithms for Efficient High-Dimensional Nonparametric Classification
9 0.21535553 90 jmlr-2006-Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition
10 0.20925343 64 jmlr-2006-Noisy-OR Component Analysis and its Application to Link Analysis
11 0.19982627 2 jmlr-2006-A Graphical Representation of Equivalence Classes of AMP Chain Graphs
12 0.19946957 37 jmlr-2006-Incremental Algorithms for Hierarchical Classification
13 0.19087026 74 jmlr-2006-Point-Based Value Iteration for Continuous POMDPs
14 0.18490475 48 jmlr-2006-Learning Minimum Volume Sets
15 0.1777472 40 jmlr-2006-Infinite-σ Limits For Tikhonov Regularization
16 0.17155221 83 jmlr-2006-Sparse Boosting
17 0.16926676 43 jmlr-2006-Large Scale Multiple Kernel Learning (Special Topic on Machine Learning and Optimization)
20 0.15663159 20 jmlr-2006-Collaborative Multiagent Reinforcement Learning by Payoff Propagation
topicId topicWeight
[(8, 0.025), (36, 0.048), (45, 0.038), (50, 0.061), (61, 0.018), (63, 0.032), (70, 0.011), (71, 0.329), (76, 0.051), (78, 0.015), (79, 0.04), (81, 0.033), (84, 0.027), (89, 0.049), (90, 0.033), (91, 0.043), (96, 0.042)]
simIndex simValue paperId paperTitle
same-paper 1 0.72478664 95 jmlr-2006-Walk-Sums and Belief Propagation in Gaussian Graphical Models
Author: Dmitry M. Malioutov, Jason K. Johnson, Alan S. Willsky
Abstract: We present a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose the correlation between each pair of variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlation coefficients. This representation holds for a large class of Gaussian graphical models which we call walk-summable. We give a precise characterization of this class of models, and relate it to other classes including diagonally dominant, attractive, nonfrustrated, and pairwise-normalizable. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. The walk-sum perspective leads to a better understanding of Gaussian belief propagation and to stronger results for its convergence in loopy graphs. Keywords: Gaussian graphical models, walk-sum analysis, convergence of loopy belief propagation
2 0.66320515 27 jmlr-2006-Ensemble Pruning Via Semi-definite Programming (Special Topic on Machine Learning and Optimization)
Author: Yi Zhang, Samuel Burer, W. Nick Street
Abstract: An ensemble is a group of learning models that jointly solve a problem. However, the ensembles generated by existing techniques are sometimes unnecessarily large, which can lead to extra memory usage, computational costs, and occasional decreases in effectiveness. The purpose of ensemble pruning is to search for a good subset of ensemble members that performs as well as, or better than, the original ensemble. This subset selection problem is a combinatorial optimization problem and thus finding the exact optimal solution is computationally prohibitive. Various heuristic methods have been developed to obtain an approximate solution. However, most of the existing heuristics use simple greedy search as the optimization method, which lacks either theoretical or empirical quality guarantees. In this paper, the ensemble subset selection problem is formulated as a quadratic integer programming problem. By applying semi-definite programming (SDP) as a solution technique, we are able to get better approximate solutions. Computational experiments show that this SDP-based pruning algorithm outperforms other heuristics in the literature. Its application in a classifier-sharing study also demonstrates the effectiveness of the method. Keywords: ensemble pruning, semi-definite programming, heuristics, knowledge sharing
3 0.37728354 28 jmlr-2006-Estimating the "Wrong" Graphical Model: Benefits in the Computation-Limited Setting
Author: Martin J. Wainwright
Abstract: Consider the problem of joint parameter estimation and prediction in a Markov random field: that is, the model parameters are estimated on the basis of an initial set of data, and then the fitted model is used to perform prediction (e.g., smoothing, denoising, interpolation) on a new noisy observation. Working under the restriction of limited computation, we analyze a joint method in which the same convex variational relaxation is used to construct an M-estimator for fitting parameters, and to perform approximate marginalization for the prediction step. The key result of this paper is that in the computation-limited setting, using an inconsistent parameter estimator (i.e., an estimator that returns the “wrong” model even in the infinite data limit) is provably beneficial, since the resulting errors can partially compensate for errors made by using an approximate prediction technique. En route to this result, we analyze the asymptotic properties of M-estimators based on convex variational relaxations, and establish a Lipschitz stability property that holds for a broad class of convex variational methods. This stability result provides additional incentive, apart from the obvious benefit of unique global optima, for using message-passing methods based on convex variational relaxations. We show that joint estimation/prediction based on the reweighted sum-product algorithm substantially outperforms a commonly used heuristic based on ordinary sum-product. Keywords: graphical model, Markov random field, belief propagation, variational method, parameter estimation, prediction error, algorithmic stability
4 0.35970041 9 jmlr-2006-Accurate Error Bounds for the Eigenvalues of the Kernel Matrix
Author: Mikio L. Braun
Abstract: The eigenvalues of the kernel matrix play an important role in a number of kernel methods, in particular, in kernel principal component analysis. It is well known that the eigenvalues of the kernel matrix converge as the number of samples tends to infinity. We derive probabilistic finite sample size bounds on the approximation error of individual eigenvalues which have the important property that the bounds scale with the eigenvalue under consideration, reflecting the actual behavior of the approximation errors as predicted by asymptotic results and observed in numerical simulations. Such scaling bounds have so far only been known for tail sums of eigenvalues. Asymptotically, the bounds presented here have a slower than stochastic rate, but the number of sample points necessary to make this disadvantage noticeable is often unrealistically large. Therefore, under practical conditions, and for all but the largest few eigenvalues, the bounds presented here form a significant improvement over existing non-scaling bounds. Keywords: kernel matrix, eigenvalues, relative perturbation bounds
5 0.33330125 5 jmlr-2006-A Robust Procedure For Gaussian Graphical Model Search From Microarray Data With p Larger Than n
Author: Robert Castelo, Alberto Roverato
Abstract: Learning of large-scale networks of interactions from microarray data is an important and challenging problem in bioinformatics. A widely used approach is to assume that the available data constitute a random sample from a multivariate distribution belonging to a Gaussian graphical model. As a consequence, the prime objects of inference are full-order partial correlations which are partial correlations between two variables given the remaining ones. In the context of microarray data the number of variables exceed the sample size and this precludes the application of traditional structure learning procedures because a sampling version of full-order partial correlations does not exist. In this paper we consider limited-order partial correlations, these are partial correlations computed on marginal distributions of manageable size, and provide a set of rules that allow one to assess the usefulness of these quantities to derive the independence structure of the underlying Gaussian graphical model. Furthermore, we introduce a novel structure learning procedure based on a quantity, obtained from limited-order partial correlations, that we call the non-rejection rate. The applicability and usefulness of the procedure are demonstrated by both simulated and real data. Keywords: Gaussian distribution, gene network, graphical model, microarray data, non-rejection rate, partial correlation, small-sample inference
7 0.32517856 48 jmlr-2006-Learning Minimum Volume Sets
8 0.32222944 29 jmlr-2006-Estimation of Gradients and Coordinate Covariation in Classification
9 0.31882504 53 jmlr-2006-Learning a Hidden Hypergraph
10 0.31796739 66 jmlr-2006-On Model Selection Consistency of Lasso
11 0.31675598 70 jmlr-2006-Online Passive-Aggressive Algorithms
12 0.31604668 25 jmlr-2006-Distance Patterns in Structural Similarity
13 0.3159124 87 jmlr-2006-Stochastic Complexities of Gaussian Mixtures in Variational Bayesian Approximation
14 0.31581149 52 jmlr-2006-Learning Spectral Clustering, With Application To Speech Separation
15 0.31483886 17 jmlr-2006-Bounds for the Loss in Probability of Correct Classification Under Model Based Approximation
16 0.31443864 46 jmlr-2006-Learning Factor Graphs in Polynomial Time and Sample Complexity
17 0.3132776 90 jmlr-2006-Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition
18 0.3123863 73 jmlr-2006-Pattern Recognition for Conditionally Independent Data
19 0.31008413 54 jmlr-2006-Learning the Structure of Linear Latent Variable Models