nips nips2007 nips2007-20 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Venkat Chandrasekaran, Alan S. Willsky, Jason K. Johnson
Abstract: We consider the estimation problem in Gaussian graphical models with arbitrary structure. We analyze the Embedded Trees algorithm, which solves a sequence of problems on tractable subgraphs thereby leading to the solution of the estimation problem on an intractable graph. Our analysis is based on the recently developed walk-sum interpretation of Gaussian estimation. We show that non-stationary iterations of the Embedded Trees algorithm using any sequence of subgraphs converge in walk-summable models. Based on walk-sum calculations, we develop adaptive methods that optimize the choice of subgraphs used at each iteration with a view to achieving maximum reduction in error. These adaptive procedures provide a significant speedup in convergence over stationary iterative methods, and also appear to converge in a larger class of models. 1
Reference: text
sentIndex sentText sentNum sentScore
1 We analyze the Embedded Trees algorithm, which solves a sequence of problems on tractable subgraphs thereby leading to the solution of the estimation problem on an intractable graph. [sent-7, score-0.468]
2 We show that non-stationary iterations of the Embedded Trees algorithm using any sequence of subgraphs converge in walk-summable models. [sent-9, score-0.461]
3 Based on walk-sum calculations, we develop adaptive methods that optimize the choice of subgraphs used at each iteration with a view to achieving maximum reduction in error. [sent-10, score-0.619]
4 These adaptive procedures provide a significant speedup in convergence over stationary iterative methods, and also appear to converge in a larger class of models. [sent-11, score-0.427]
5 Algorithms such as Belief Propagation and the junction-tree method are effective for computing exact estimates in graphical models that are tree-structured or have low treewidth [3], but for graphs with high treewidth the junction-tree approach is intractable. [sent-16, score-0.251]
6 Specifically, we discuss the Embedded Trees (ET) iteration [4] that solves a sequence of estimation problems on trees, or more generally tractable subgraphs, leading to the solution of the original problem on the intractable graph. [sent-18, score-0.266]
7 We show that in the broad class of so-called walk-summable models, the ET algorithm converges for any sequence of subgraphs used. [sent-21, score-0.456]
8 Previous convergence results [6, 7] analyzed stationary or “cyclo-stationary” iterations that use the same subgraph at each iteration or cycle through a fixed sequence of subgraphs. [sent-23, score-0.642]
9 Given this great flexibility in choosing successive iterative steps, we develop algorithms that adaptively optimize the choice of subgraphs to achieve maximum reduction in error. [sent-25, score-0.497]
10 The first method finds the best tree at each iteration by solving an appropriately formulated maximum-weight spanning tree problem, with the weight of each edge being a function of the partial correlation coefficient of the edge and the residual errors at the nodes that compose the edge. [sent-28, score-1.017]
11 The second method, building on this first method, adds extra edges in a greedy manner to the tree resulting from the first method to form a thin hypertree. [sent-29, score-0.316]
12 Simulation results demonstrate that these non-stationary algorithms provide a significant speedup in convergence over stationary and cyclic iterative methods. [sent-30, score-0.302]
13 We also provide empirical evidence to show that our adaptive methods (with a minor modification) converge in many non-walk-summable models when stationary iterations diverge. [sent-32, score-0.367]
14 2 Background Let G = (V, E) be a graph with vertices V , and edges E ⊂ V × V that link pairs of vertices together. [sent-38, score-0.254]
15 A walk is a sequence of vertices {w_i}, i = 0, . . . , ℓ, such that each step {w_i, w_{i+1}} ∈ E, 0 ≤ i ≤ ℓ − 1, with no restriction on crossing the same vertex or traversing the same edge multiple times. [sent-57, score-0.25]
16 The weight of a walk is the product of the edge-wise partial correlation coefficients of the edges composing the walk: φ(w) ≜ ∏_{i=0}^{ℓ−1} R_{w_i, w_{i+1}}. [sent-58, score-0.258]
17 We then have that (R^ℓ)_{s,t} is the sum of the weights of all length-ℓ walks from s to t in G. [sent-59, score-0.567]
18 where φ_ℓ(s → t) represents the sum of the weights of all the length-ℓ walks from s to t (the set of all such walks is finite). [sent-61, score-1.134]
19 Thus, (J −1 )s,t is the length-ordered sum over all walks in G from s to t. [sent-62, score-0.567]
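To make this length-ordered series concrete, here is a minimal numerical sketch (not from the paper; the 3-node matrix R and the truncation length are illustrative assumptions): summing powers of the partial-correlation matrix reproduces J^{−1} = Σ_ℓ R^ℓ when the series converges.

```python
import numpy as np

# Hypothetical walk-summable model J = I - R on three nodes; the entries of R
# are the edge-wise partial correlation coefficients.
R = np.array([[0.0, 0.3, 0.2],
              [0.3, 0.0, 0.1],
              [0.2, 0.1, 0.0]])
J = np.eye(3) - R

# (R^l)[s, t] sums the weights of all length-l walks from s to t, so the
# length-ordered series sum_l R^l is the walk-sum matrix (J^{-1})_{s,t}.
series = np.zeros_like(R)
power = np.eye(3)                 # R^0
for _ in range(200):              # truncate the infinite series
    series += power
    power = power @ R

print(np.allclose(series, np.linalg.inv(J)))   # True for this toy model
```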
20 Other algorithms may compute walks according to different orders (rather than length-based orders). [sent-64, score-0.593]
21 A model is said to be walk-summable if for each pair of vertices s, t ∈ V , the absolute sum over all walks from s to t in G converges: φ̄(s → t) ≜ Σ_{w ∈ W(s→t)} |φ(w)| < ∞. (3) [sent-66, score-0.666]
22 Here, W(s → t) represents the set of all walks from s to t, and φ̄(s → t) denotes the absolute walk-sum over this set. [sent-67, score-0.643]
23 Based on the absolute convergence condition, walk-summability implies that walk-sums over a countable set of walks in G can be computed in any order. [sent-68, score-0.684]
24 Note that in (4) we relax the constraint that the sum is ordered by length, and do not explicitly specify an ordering on the walks (such as in (2)). [sent-70, score-0.567]
25 In words, (J −1 )s,t is the walk-sum over the set of all walks from s to t, and xt is the walk-sum over all walks ending at t, re-weighted by h. [sent-71, score-1.24]
26 Concatenation of walks We briefly describe the concatenation operation for walks and walk-sets, which plays a key role in walk-sum analysis. [sent-79, score-1.211]
27 Let u = u_0 · · · u_end and v = v_start v_1 · · · v_{ℓ(v)} be walks with u_end = v_start. [sent-80, score-0.941]
28 The concatenation of these walks is defined to be u · v ≜ u_0 · · · u_end v_1 · · · v_{ℓ(v)}. [sent-81, score-0.748]
29 Now consider a walk-set U with all walks ending at uend and another walk-set V with all walks beginning at vstart . [sent-82, score-1.394]
30 If u_end = v_start, then the concatenation of U and V is defined: U ⊗ V ≜ {u · v : u ∈ U, v ∈ V}. [sent-83, score-0.264]
31 Embedded Trees algorithm We describe the Embedded Trees iteration that performs a sequence of updates on trees, or more generally tractable subgraphs, leading to the solution of (1) on an intractable graph. [sent-85, score-0.234]
32 Each iteration involves an inference calculation on a subgraph of all the variables V . [sent-86, score-0.317]
33 If the subgraph used changes with each iteration, then we obtain the following non-stationary ET iteration: x^{(n)} = J_{S_n}^{−1}(K_{S_n} x^{(n−1)} + h), (6) where {S_n}_{n=1}^{∞} is an arbitrary sequence of subgraphs. [sent-94, score-0.264]
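A minimal sketch of one way to implement iteration (6) (not the authors' implementation: a dense solve stands in for the fast tree solver, and the subgraph S_n is assumed to be given as a list of edges):

```python
import numpy as np

def et_step(J, h, x_prev, subgraph_edges):
    """One Embedded Trees update x = J_S^{-1} (K_S x_prev + h).

    J_S keeps the diagonal of J plus the off-diagonal entries on the chosen
    subgraph S; the splitting J = J_S - K_S gives K_S = J_S - J, which
    collects the cut edges in E \\ S.  In practice J_S would be solved with a
    fast tree / junction-tree solver rather than a dense solve.
    """
    J_S = np.diag(np.diag(J))
    for (u, v) in subgraph_edges:
        J_S[u, v] = J[u, v]
        J_S[v, u] = J[v, u]
    K_S = J_S - J
    return np.linalg.solve(J_S, K_S @ x_prev + h)

def embedded_trees(J, h, subgraph_sequence, n_iter=50):
    """Non-stationary ET iteration over an arbitrary sequence of subgraphs."""
    x = np.zeros_like(h)
    for n in range(n_iter):
        x = et_step(J, h, x, subgraph_sequence[n % len(subgraph_sequence)])
    return x
```

At a fixed point, J_S x = (J_S − J)x + h reduces to Jx = h, so whenever the iterates converge they converge to the desired estimate.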
34 An important degree of freedom is the choice of the subgraph Sn at iteration n, which forms the focus of Section 4 of this paper. [sent-95, score-0.317]
35 Using this analysis, we show that the non-stationary ET iteration (6) converges in walk-summable models for an arbitrary choice of subgraphs {S_n}_{n=1}^{∞}. [sent-101, score-0.564]
36 This is easily seen because walks in the subgraph S are a subset of the walks in G, and thus if absolute walk-sums in G are well-defined, then so are absolute walk-sums in S. [sent-104, score-1.402]
37 Consider the following recursively defined set of walks for s, t ∈ V : W_n(s → t) = [∪_{u,v ∈ V} W_{n−1}(s → u) ⊗ W(u −E\S_n(1)→ v) ⊗ W(v −S_n→ t)] ∪ W(s −S_n→ t) = [W_{n−1}(s → ∗) ⊗ W(∗ −E\S_n(1)→ •) ⊗ W(• −S_n→ t)] ∪ W(s −S_n→ t), (7) with W_0(s → t) = ∅. [sent-106, score-0.567]
38 The set Wn−1 (s → ∗) denotes walks that start at node s computed at the previous iteration. [sent-108, score-0.633]
39 Finally, W(• −S_n→ t) denotes walks in S_n that end at node t. [sent-110, score-0.633]
40 Thus, the first term in (7) refers to previously computed walks starting at s, which hop across an edge in E\Sn , and then finally propagate only in Sn (ending at t). [sent-111, score-0.691]
41 The second term W(s −S_n→ t) denotes walks from s to t that only live within S_n. [sent-112, score-0.637]
42 The following proposition (proved in [10]) shows that the walks contained in these walk-sets are precisely those computed by the ET algorithm at iteration n. [sent-113, score-0.729]
43 We note that the classic Gauss-Jacobi algorithm [6], a stationary iteration with J_S = I and K_S = R, can be interpreted as a walk-sum algorithm: x_t^{(n)} in this method computes all walks up to length n ending at t. [sent-117, score-0.911]
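As a self-contained illustration of this special case (a toy model under the assumed normalization J = I − R, not from the paper): with J_S = I the update is simply x ← Rx + h, and after n sweeps x_t holds the re-weighted sum over all walks of length at most n ending at t.

```python
import numpy as np

# Gauss-Jacobi on a normalized model J = I - R: the stationary ET iteration
# with J_S = I and K_S = R.  (Illustrative 2-node model.)
R = np.array([[0.0, 0.4],
              [0.4, 0.0]])
J = np.eye(2) - R
h = np.array([1.0, -0.5])

x = np.zeros(2)
for _ in range(100):
    x = R @ x + h                 # accumulates walks of length <= n

print(np.allclose(x, np.linalg.solve(J, h)))   # True: converges to J^{-1} h
```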
44 Figure 1 gives an example of a walk-sum diagram, which provides a graphical representation of the walks accumulated by the walk-sets (7). [sent-118, score-0.62]
45 The diagram is the three-level graph on the right, and corresponds to an ET iteration based on the subgraphs S1 , S2 , S3 of the 3 × 3 grid G (on the left). [sent-119, score-0.611]
46 Each level n in the diagram consists of the subgraph Sn used at iteration n (solid edges), and information from the previous level (iteration) n − 1 is transmitted through the dashed edges in E\Sn . [sent-120, score-0.463]
47 The directed nature of these dashed edges is critical as they capture the one-directional flow of computations from iteration to iteration, while the undirected edges within each level capture the inference computation at each iteration. [sent-121, score-0.372]
48 Walks in the diagram that start at any node and end at v at level n, re-weighted by h, are exactly the walks computed by the ET algorithm in x_v^{(n)}. [sent-123, score-0.644]
49 Given this walk-sum interpretation of the ET algorithm, we can analyze the walk-sets (7) to prove the convergence of ET in walk-summable models by showing that the walk-sets eventually contain all the walks required for the computation of J −1 h in (5). [sent-125, score-0.783]
50 Validity: The walks in W_n are valid walks in G, i.e., they traverse only edges of G. [sent-130, score-1.16]
51 Thus, the ET algorithm converges to the correct solution of (1) in walk-summable models for any sequence of subgraphs with x(0) = 0. [sent-142, score-0.441]
52 Note that we have taken advantage of the absolute convergence property in walk-summable models (3) by not focusing on the order in which walks are computed, but only that they are eventually computed. [sent-144, score-0.729]
53 In [10], we prove that walk-summability is also a necessary condition for the complete flexibility in the choice of subgraphs — there exists at least one sequence of subgraphs that results in a divergent ET iteration in non-walk-summable models. [sent-145, score-0.802]
54 4 Adaptive algorithms Let e(n) = x−x(n) be the error at iteration n and let h(n) = Je(n) = h−J x(n) be the corresponding residual error (which is tractable to compute). [sent-146, score-0.324]
55 We begin by describing an algorithm to choose the “next-best” tree Sn in the ET iteration (6). [sent-147, score-0.294]
56 Thus, we have the walk-sum interpretation e_t^{(n)} = φ(h^{(n−1)}; ∗ −G\S_n→ t), where G\S_n denotes walks that do not live entirely within S_n. [sent-149, score-0.783]
57 Hence, minimizing the error at iteration n corresponds to finding the tree S_n that maximizes the second term φ̄(|h^{(n−1)}|; S_n) in (8). [sent-151, score-0.294]
58 This leads us to the following maximum walk-sum tree problem: arg max_{S_n a tree} φ̄(|h^{(n−1)}|; S_n). (9) Finding the optimal such tree is combinatorially complex. [sent-152, score-0.471]
59 Specifically, consider an edge {u, v} and all the walks that live on this single edge: W({u, v}) = {uv, vu, uvu, vuv, uvuv, vuvu, . . . }. [sent-154, score-0.791]
60 One can check that the contribution based on these single-edge walks can be computed as: σ_{u,v} ≜ Σ_{w ∈ W({u,v})} φ̄(|h^{(n−1)}|; w) = (|h_u^{(n−1)}| + |h_v^{(n−1)}|) |R_{u,v}| / (1 − |R_{u,v}|). (10) [sent-158, score-0.615]
61 These single-edge walks for edges in S_n are a subset of all the walks in S_n, and consequently provide a lower bound on φ̄(|h^{(n−1)}|; S_n). [sent-160, score-1.237]
62 Therefore, we relax (9) to the maximization arg max_{S_n a tree} Σ_{{u,v} ∈ S_n} σ_{u,v}. (11) Figure 2: Grayscale images of residual errors in an 8 × 8 grid at successive iterations, and corresponding trees chosen by adaptive method. [sent-161, score-0.661]
63 Figure 3: Grayscale images of residual errors in an 8 × 8 grid at successive iterations, and corresponding hypertrees chosen by adaptive method. [sent-162, score-0.381]
64 This relaxed problem can be solved efficiently using a maximum-weight spanning tree algorithm that has complexity O(|E| log log |V |) for sparse graphs [13]. [sent-164, score-0.297]
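A sketch of this tree-selection step (not the authors' code: a simple Kruskal-style maximum spanning tree replaces the O(|E| log log |V|) routine cited, and the single-edge weight follows the reconstruction of σ_{u,v} in (10) above):

```python
import numpy as np

def edge_weights(R, h_res, edges):
    """Single-edge walk-sum weights sigma_{u,v} built from the partial
    correlations R and the current residual error h_res = h - J x."""
    sigma = {}
    for (u, v) in edges:
        r = abs(R[u, v])
        sigma[(u, v)] = (abs(h_res[u]) + abs(h_res[v])) * r / (1.0 - r)
    return sigma

def max_weight_spanning_tree(num_nodes, sigma):
    """Kruskal's algorithm run on decreasing weights: the relaxation (11)."""
    parent = list(range(num_nodes))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for (u, v) in sorted(sigma, key=sigma.get, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:              # the edge joins two components: no cycle
            parent[ru] = rv
            tree.append((u, v))
    return tree

# Hypothetical use inside the ET loop:
#   h_res = h - J @ x_prev
#   S_n   = max_weight_spanning_tree(len(h_res), edge_weights(R, h_res, graph_edges))
```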
65 Given the maximum-weight spanning tree of the graph, a natural extension is to build a thin hypertree by adding extra “strong” edges to the tree, subject to the constraint that the resulting graph has low treewidth. [sent-165, score-0.539]
66 For each edge not included in the tree, in order of decreasing edge weight, we add the edge to the graph if two conditions are met: first, we are able to easily verify that the treewidth stays less than M , and second, the length of the unique path in Sn between the endpoints is less than L. [sent-168, score-0.396]
67 In order to bound the tree width, we maintain a counter at each node of the total number of added edges that result in a path through that node. [sent-169, score-0.32]
68 Comparing to another method for constructing junction trees from spanning trees [15], one can check that the maximum node count is an upper-bound on the treewidth. [sent-170, score-0.415]
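The sketch below is one plausible reading of this greedy augmentation (an illustration under stated assumptions, not the authors' implementation: the per-node counter is used directly as the width proxy, and the unique tree path is found by breadth-first search):

```python
def augment_to_hypertree(tree_edges, sigma, M, L):
    """Greedily add non-tree edges in order of decreasing weight sigma,
    keeping the per-node count of added edges whose tree path passes through
    that node at most M (proxy bound on treewidth) and requiring the unique
    tree path between the endpoints to be shorter than L."""
    adj = {}
    for (u, v) in tree_edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)

    def tree_path(s, t):
        # Breadth-first search in the spanning tree; the s-t path is unique.
        prev, frontier = {s: None}, [s]
        while frontier and t not in prev:
            nxt = []
            for a in frontier:
                for b in adj[a]:
                    if b not in prev:
                        prev[b] = a
                        nxt.append(b)
            frontier = nxt
        path, a = [], t
        while a is not None:
            path.append(a)
            a = prev[a]
        return path

    tree_set = {frozenset(e) for e in tree_edges}
    counts = {u: 0 for u in adj}
    hyper_edges = list(tree_edges)
    for (u, v) in sorted(sigma, key=sigma.get, reverse=True):
        if frozenset((u, v)) in tree_set:
            continue                                   # already a tree edge
        path = tree_path(u, v)
        if len(path) - 1 >= L:
            continue                                   # tree path too long
        if any(counts[a] + 1 > M for a in path):
            continue                                   # width proxy would exceed M
        for a in path:
            counts[a] += 1
        hyper_edges.append((u, v))
    return hyper_edges
```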
69 In Figure 2 and Figure 3 we present a simple demonstration of the tree and hypertree selection procedures respectively, and the corresponding change in error achieved. [sent-173, score-0.309]
70 Notice that the first tree in Figure 2 tries to include as many edges as possible that are incident on the nodes with high residual error. [sent-175, score-0.445]
71 Such edges are useful for capturing walks ending at the high-error nodes, which contribute to the set of walks in (5). [sent-176, score-1.31]
72 The first hypertree in Figure 3 actually includes all the edges incident on the high-error nodes. [sent-177, score-0.257]
73 The residual errors after inference on these subgraphs are shown next in Figure 2 and Figure 3. [sent-178, score-0.456]
74 As expected, the hypertree seems to achieve greater reduction in error compared to the spanning tree. [sent-179, score-0.239]
75 Again, at this iteration, the subgraphs chosen by our methods adapt based on the errors at the various nodes. [sent-180, score-0.343]
76 1 Experimental illustration Walk-summable models We test the adaptive algorithms on densely connected nearest-neighbor grid-structured models (similar to G in Figure 1). [sent-182, score-0.233]
77 We generate random grid models — the grid edge partial correlation coefficients are chosen randomly. (Footnote 4: One sets two pointers into the tree starting from any two nodes and then iteratively walks up the tree, always advancing from the point that is deeper in the tree, until the nearest ancestor of the two nodes is reached.) [sent-183, score-1.214]
78 Figure 5: (Left) 16-node graphical model; (Right) two embedded spanning trees T1 , T2 . [sent-185, score-0.355]
79 The plot in Figure 4 shows the decrease in the normalized residual error as a function of the number of iterations on a randomly generated 511 × 511 grid model. [sent-191, score-0.266]
80 The stationary one-tree iteration uses a tree similar to S1 in Figure 1, and the two-tree iteration alternates between trees similar to S1 and S3 in Figure 1 [4]. [sent-194, score-0.655]
81 The adaptive hypertree method uses M = 6 and L = 8. [sent-195, score-0.242]
82 We also note that in practice the per-iteration costs of the adaptive tree and hypertree algorithms are roughly comparable. [sent-196, score-0.425]
83 These results show that our adaptive algorithms demonstrate significantly superior convergence properties compared to stationary methods, thus providing a convergent, computationally attractive method for estimation in walk-summable models. [sent-197, score-0.383]
84 2 Non-walk-summable models Next, we give empirical evidence that our adaptive methods provide convergence over a broader range of models than stationary iterations. [sent-201, score-0.436]
85 One potential complication in non-walk-summable models is that the subgraph models chosen by the stationary and adaptive algorithms may be indefinite or singular even though J is positive-definite. [sent-202, score-0.54]
86 In order to avoid this problem in the adaptive ET algorithm, the trees S_n chosen at each iteration must be valid (i.e., positive-definite). [sent-203, score-0.403]
87 A simple modification to the maximum-weight spanning tree algorithm achieves this goal — we add an extra condition to the algorithm to test for diagonal dominance of the chosen tree model (as all symmetric, diagonally-dominant models are positive definite [6]). [sent-206, score-0.483]
88 That is, at each step of the maximum-weight spanning tree algorithm, we only add an edge if it does not create a cycle and maintains a diagonally-dominant tractable subgraph model. [sent-207, score-0.596]
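A sketch of this modification (an assumption-laden illustration, not the authors' code: with the normalization J = I − R, the tree model I − R_S stays diagonally dominant as long as each node's accumulated |R_{u,v}| over its chosen tree edges stays below 1):

```python
def dd_max_weight_spanning_tree(num_nodes, sigma, R):
    """Maximum-weight spanning tree with a diagonal-dominance guard: an edge
    is added only if it creates no cycle and both endpoints keep the sum of
    |R[u, v]| over their incident tree edges strictly below 1.  If the guard
    blocks edges, the result may be a forest, which is still a valid
    tractable subgraph."""
    parent = list(range(num_nodes))

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    mass = [0.0] * num_nodes      # accumulated |R| per node within the tree
    tree = []
    for (u, v) in sorted(sigma, key=sigma.get, reverse=True):
        r = abs(R[u, v])
        if find(u) == find(v):
            continue              # would create a cycle
        if mass[u] + r >= 1.0 or mass[v] + r >= 1.0:
            continue              # would break diagonal dominance
        parent[find(u)] = find(v)
        mass[u] += r
        mass[v] += r
        tree.append((u, v))
    return tree
```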
89 For the one-tree iteration we use tree T1, and for the two-tree iteration we alternate between trees T1 and T2 (see Figure 5). [sent-215, score-0.554]
90 As the table on the right in Figure 4 demonstrates, the adaptive tree algorithm without the diagonal-dominance (DD) check provides convergence over a much broader range of models than the one-tree and two-tree iterations, but not for all valid models. [sent-216, score-0.521]
91 However, the modified adaptive tree algorithm with the DD check appears to converge almost up to the validity threshold. [sent-217, score-0.381]
92 We have also observed such behavior empirically in many other (though not all) non-walk-summable models where the adaptive ET algorithm with the DD condition converges while stationary methods diverge. [sent-218, score-0.302]
93 Thus, our adaptive methods, compared to stationary iterations, not only provide faster convergence rates in walk-summable models but also converge for a broader class of models. [sent-219, score-0.424]
94 6 Discussion We analyze non-stationary iterations of the ET algorithm that use any sequence of subgraphs for estimation in Gaussian graphical models. [sent-220, score-0.544]
95 Our analysis is based on the recently developed walk-sum interpretation of inference in Gaussian models, and we show that the ET algorithm converges for any sequence of subgraphs used in walk-summable models. [sent-221, score-0.491]
96 These convergence results motivate the development of methods to choose subgraphs adaptively at each iteration to achieve maximum reduction in error. [sent-222, score-0.583]
97 Our simulation results show that the adaptive algorithms provide a significant speedup in convergence over stationary methods. [sent-224, score-0.359]
98 Moreover, these adaptive methods also seem to provide convergence over a broader class of models than stationary algorithms. [sent-225, score-0.391]
99 An interesting question is to develop tractable methods to compute the next K best subgraphs jointly to achieve maximum reduction in error after K iterations. [sent-227, score-0.413]
100 Finally, subgraph preconditioners have been shown to improve the convergence rate of the conjugate-gradient method; using walk-sum analysis to select such preconditioners is of clear interest. [sent-229, score-0.337]
wordName wordTfidf (topN-words)
[('walks', 0.567), ('sn', 0.324), ('subgraphs', 0.308), ('js', 0.198), ('subgraph', 0.18), ('wn', 0.165), ('tree', 0.157), ('iteration', 0.137), ('hypertree', 0.125), ('ks', 0.123), ('trees', 0.123), ('adaptive', 0.117), ('residual', 0.113), ('uend', 0.104), ('edges', 0.103), ('stationary', 0.101), ('edge', 0.093), ('embedded', 0.092), ('spanning', 0.087), ('vstart', 0.083), ('grid', 0.082), ('et', 0.079), ('concatenation', 0.077), ('convergence', 0.073), ('ending', 0.073), ('iterations', 0.071), ('interpretation', 0.067), ('jsn', 0.062), ('correlation', 0.056), ('broader', 0.055), ('vertices', 0.055), ('graphical', 0.053), ('walk', 0.053), ('graphs', 0.053), ('treewidth', 0.05), ('sequence', 0.049), ('johnson', 0.049), ('tractable', 0.048), ('check', 0.048), ('partial', 0.046), ('models', 0.045), ('absolute', 0.044), ('dd', 0.044), ('convergent', 0.043), ('diagram', 0.043), ('nodes', 0.043), ('chandrasekaran', 0.042), ('diagrams', 0.042), ('jx', 0.042), ('preconditioners', 0.042), ('grayscale', 0.042), ('speedup', 0.042), ('graph', 0.041), ('calculations', 0.04), ('guess', 0.04), ('converges', 0.039), ('gaussian', 0.038), ('adaptively', 0.038), ('live', 0.038), ('diagonal', 0.037), ('wildcard', 0.036), ('nesting', 0.036), ('arbitrary', 0.035), ('errors', 0.035), ('successive', 0.034), ('coef', 0.034), ('attractive', 0.034), ('node', 0.034), ('iterative', 0.034), ('converge', 0.033), ('xt', 0.033), ('denotes', 0.032), ('estimation', 0.032), ('cycle', 0.031), ('looser', 0.031), ('hop', 0.031), ('inde', 0.031), ('zhou', 0.031), ('analyze', 0.031), ('develop', 0.03), ('greedy', 0.03), ('malioutov', 0.029), ('incident', 0.029), ('directed', 0.029), ('developed', 0.028), ('procedures', 0.027), ('reduction', 0.027), ('cyclic', 0.026), ('validity', 0.026), ('complication', 0.026), ('thin', 0.026), ('path', 0.026), ('algorithms', 0.026), ('valid', 0.026), ('hs', 0.025), ('broad', 0.025), ('proposition', 0.025), ('willsky', 0.025), ('nested', 0.025), ('entries', 0.025)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000004 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis
Author: Venkat Chandrasekaran, Alan S. Willsky, Jason K. Johnson
Abstract: We consider the estimation problem in Gaussian graphical models with arbitrary structure. We analyze the Embedded Trees algorithm, which solves a sequence of problems on tractable subgraphs thereby leading to the solution of the estimation problem on an intractable graph. Our analysis is based on the recently developed walk-sum interpretation of Gaussian estimation. We show that non-stationary iterations of the Embedded Trees algorithm using any sequence of subgraphs converge in walk-summable models. Based on walk-sum calculations, we develop adaptive methods that optimize the choice of subgraphs used at each iteration with a view to achieving maximum reduction in error. These adaptive procedures provide a significant speedup in convergence over stationary iterative methods, and also appear to converge in a larger class of models. 1
2 0.13277359 142 nips-2007-Non-parametric Modeling of Partially Ranked Data
Author: Guy Lebanon, Yi Mao
Abstract: Statistical models on full and partial rankings of n items are often of limited practical use for large n due to computational consideration. We explore the use of non-parametric models for partially ranked data and derive efficient procedures for their use for large n. The derivations are largely possible through combinatorial and algebraic manipulations based on the lattice of partial rankings. In particular, we demonstrate for the first time a non-parametric coherent and consistent model capable of efficiently aggregating partially ranked data of different types. 1
3 0.12402464 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
Author: Ozgur Sumer, Umut Acar, Alexander T. Ihler, Ramgopal R. Mettu
Abstract: Motivated by stochastic systems in which observed evidence and conditional dependencies between states of the network change over time, and certain quantities of interest (marginal distributions, likelihood estimates etc.) must be updated, we study the problem of adaptive inference in tree-structured Bayesian networks. We describe an algorithm for adaptive inference that handles a broad range of changes to the network and is able to maintain marginal distributions, MAP estimates, and data likelihoods in all expected logarithmic time. We give an implementation of our algorithm and provide experiments that show that the algorithm can yield up to two orders of magnitude speedups on answering queries and responding to dynamic changes over the sum-product algorithm. 1
4 0.098741151 116 nips-2007-Learning the structure of manifolds using random projections
Author: Yoav Freund, Sanjoy Dasgupta, Mayank Kabra, Nakul Verma
Abstract: We present a simple variant of the k-d tree which automatically adapts to intrinsic low dimensional structure in data. 1
5 0.096820094 120 nips-2007-Linear programming analysis of loopy belief propagation for weighted matching
Author: Sujay Sanghavi, Dmitry Malioutov, Alan S. Willsky
Abstract: Loopy belief propagation has been employed in a wide variety of applications with great empirical success, but it comes with few theoretical guarantees. In this paper we investigate the use of the max-product form of belief propagation for weighted matching problems on general graphs. We show that max-product converges to the correct answer if the linear programming (LP) relaxation of the weighted matching problem is tight and does not converge if the LP relaxation is loose. This provides an exact characterization of max-product performance and reveals connections to the widely used optimization technique of LP relaxation. In addition, we demonstrate that max-product is effective in solving practical weighted matching problems in a distributed fashion by applying it to the problem of self-organization in sensor networks. 1
6 0.092054531 78 nips-2007-Efficient Principled Learning of Thin Junction Trees
7 0.077326007 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels
8 0.074049711 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs
9 0.073275246 197 nips-2007-The Infinite Markov Model
10 0.072799392 15 nips-2007-A general agnostic active learning algorithm
11 0.070932455 121 nips-2007-Local Algorithms for Approximate Inference in Minor-Excluded Graphs
12 0.06757006 119 nips-2007-Learning with Tree-Averaged Densities and Distributions
13 0.05896946 123 nips-2007-Loop Series and Bethe Variational Bounds in Attractive Graphical Models
14 0.058772013 141 nips-2007-New Outer Bounds on the Marginal Polytope
15 0.058768965 77 nips-2007-Efficient Inference for Distributions on Permutations
16 0.056464747 159 nips-2007-Progressive mixture rules are deviation suboptimal
17 0.05596799 17 nips-2007-A neural network implementing optimal state estimation based on dynamic spike train decoding
18 0.05593431 48 nips-2007-Collective Inference on Markov Models for Modeling Bird Migration
19 0.05504569 94 nips-2007-Gaussian Process Models for Link Analysis and Transfer Learning
20 0.052707266 27 nips-2007-Anytime Induction of Cost-sensitive Trees
topicId topicWeight
[(0, -0.179), (1, -0.013), (2, -0.054), (3, 0.007), (4, -0.027), (5, -0.121), (6, 0.02), (7, 0.065), (8, 0.021), (9, -0.085), (10, -0.054), (11, 0.108), (12, 0.07), (13, -0.035), (14, -0.01), (15, -0.07), (16, 0.026), (17, -0.03), (18, -0.004), (19, -0.017), (20, -0.021), (21, 0.058), (22, 0.096), (23, -0.016), (24, 0.172), (25, -0.033), (26, -0.065), (27, 0.016), (28, 0.003), (29, -0.16), (30, -0.146), (31, 0.121), (32, -0.052), (33, 0.176), (34, -0.166), (35, 0.018), (36, 0.12), (37, 0.052), (38, -0.029), (39, 0.015), (40, -0.022), (41, 0.182), (42, -0.03), (43, 0.033), (44, -0.027), (45, 0.065), (46, 0.073), (47, 0.011), (48, 0.007), (49, 0.227)]
simIndex simValue paperId paperTitle
same-paper 1 0.95717752 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis
Author: Venkat Chandrasekaran, Alan S. Willsky, Jason K. Johnson
Abstract: We consider the estimation problem in Gaussian graphical models with arbitrary structure. We analyze the Embedded Trees algorithm, which solves a sequence of problems on tractable subgraphs thereby leading to the solution of the estimation problem on an intractable graph. Our analysis is based on the recently developed walk-sum interpretation of Gaussian estimation. We show that non-stationary iterations of the Embedded Trees algorithm using any sequence of subgraphs converge in walk-summable models. Based on walk-sum calculations, we develop adaptive methods that optimize the choice of subgraphs used at each iteration with a view to achieving maximum reduction in error. These adaptive procedures provide a significant speedup in convergence over stationary iterative methods, and also appear to converge in a larger class of models. 1
2 0.63267255 142 nips-2007-Non-parametric Modeling of Partially Ranked Data
Author: Guy Lebanon, Yi Mao
Abstract: Statistical models on full and partial rankings of n items are often of limited practical use for large n due to computational consideration. We explore the use of non-parametric models for partially ranked data and derive efficient procedures for their use for large n. The derivations are largely possible through combinatorial and algebraic manipulations based on the lattice of partial rankings. In particular, we demonstrate for the first time a non-parametric coherent and consistent model capable of efficiently aggregating partially ranked data of different types. 1
3 0.62566137 78 nips-2007-Efficient Principled Learning of Thin Junction Trees
Author: Anton Chechetka, Carlos Guestrin
Abstract: We present the first truly polynomial algorithm for PAC-learning the structure of bounded-treewidth junction trees – an attractive subclass of probabilistic graphical models that permits both the compact representation of probability distributions and efficient exact inference. For a constant treewidth, our algorithm has polynomial time and sample complexity. If a junction tree with sufficiently strong intraclique dependencies exists, we provide strong theoretical guarantees in terms of KL divergence of the result from the true distribution. We also present a lazy extension of our approach that leads to very significant speed ups in practice, and demonstrate the viability of our method empirically, on several real world datasets. One of our key new theoretical insights is a method for bounding the conditional mutual information of arbitrarily large sets of variables with only polynomially many mutual information computations on fixed-size subsets of variables, if the underlying distribution can be approximated by a bounded-treewidth junction tree. 1
4 0.51055735 15 nips-2007-A general agnostic active learning algorithm
Author: Sanjoy Dasgupta, Claire Monteleoni, Daniel J. Hsu
Abstract: We present an agnostic active learning algorithm for any hypothesis class of bounded VC dimension under arbitrary data distributions. Most previous work on active learning either makes strong distributional assumptions, or else is computationally prohibitive. Our algorithm extends the simple scheme of Cohn, Atlas, and Ladner [1] to the agnostic setting, using reductions to supervised learning that harness generalization bounds in a simple but subtle manner. We provide a fall-back guarantee that bounds the algorithm’s label complexity by the agnostic PAC sample complexity. Our analysis yields asymptotic label complexity improvements for certain hypothesis classes and distributions. We also demonstrate improvements experimentally. 1
5 0.47093353 119 nips-2007-Learning with Tree-Averaged Densities and Distributions
Author: Sergey Kirshner
Abstract: We utilize the ensemble of trees framework, a tractable mixture over superexponential number of tree-structured distributions [1], to develop a new model for multivariate density estimation. The model is based on a construction of treestructured copulas – multivariate distributions with uniform on [0, 1] marginals. By averaging over all possible tree structures, the new model can approximate distributions with complex variable dependencies. We propose an EM algorithm to estimate the parameters for these tree-averaged models for both the real-valued and the categorical case. Based on the tree-averaged framework, we propose a new model for joint precipitation amounts data on networks of rain stations. 1
6 0.4626891 116 nips-2007-Learning the structure of manifolds using random projections
7 0.43240821 27 nips-2007-Anytime Induction of Cost-sensitive Trees
8 0.43187952 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
9 0.41547424 159 nips-2007-Progressive mixture rules are deviation suboptimal
10 0.40515819 197 nips-2007-The Infinite Markov Model
11 0.40311491 121 nips-2007-Local Algorithms for Approximate Inference in Minor-Excluded Graphs
12 0.3687273 141 nips-2007-New Outer Bounds on the Marginal Polytope
13 0.36246055 77 nips-2007-Efficient Inference for Distributions on Permutations
14 0.34462333 120 nips-2007-Linear programming analysis of loopy belief propagation for weighted matching
15 0.32689244 48 nips-2007-Collective Inference on Markov Models for Modeling Bird Migration
16 0.31972843 196 nips-2007-The Infinite Gamma-Poisson Feature Model
17 0.31455612 31 nips-2007-Bayesian Agglomerative Clustering with Coalescents
18 0.31301737 99 nips-2007-Hierarchical Penalization
19 0.30025887 123 nips-2007-Loop Series and Bethe Variational Bounds in Attractive Graphical Models
20 0.30008221 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs
topicId topicWeight
[(5, 0.031), (13, 0.022), (16, 0.015), (18, 0.01), (21, 0.038), (31, 0.013), (34, 0.018), (35, 0.027), (47, 0.059), (49, 0.011), (83, 0.598), (85, 0.022), (90, 0.05)]
simIndex simValue paperId paperTitle
1 0.99757928 159 nips-2007-Progressive mixture rules are deviation suboptimal
Author: Jean-yves Audibert
Abstract: We consider the learning task consisting in predicting as well as the best function in a finite reference set G up to the smallest possible additive term. If R(g) denotes the generalization error of a prediction function g, under reasonable assumptions on the loss function (typically satisfied by the least square loss when the output is bounded), it is known that the progressive mixture rule ĝ satisfies ER(ĝ) ≤ min_{g ∈ G} R(g) + Cst log|G| / n, (1) where n denotes the size of the training set, and E denotes the expectation w.r.t. the training set distribution. This work shows that, surprisingly, for appropriate reference sets G, the deviation convergence rate of the progressive mixture rule is no better than Cst / √n: it fails to achieve the expected Cst / n. We also provide an algorithm which does not suffer from this drawback, and which is optimal in both deviation and expectation convergence rates. 1
2 0.99620301 110 nips-2007-Learning Bounds for Domain Adaptation
Author: John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, Jennifer Wortman
Abstract: Empirical risk minimization offers well-known learning guarantees when training and test data come from the same domain. In the real world, though, we often wish to adapt a classifier from a source domain with a large amount of training data to different target domain with very little training data. In this work we give uniform convergence bounds for algorithms that minimize a convex combination of source and target empirical risk. The bounds explicitly model the inherent trade-off between training on a large but inaccurate source data set and a small but accurate target training set. Our theory also gives results when we have multiple source domains, each of which may have a different number of instances, and we exhibit cases in which minimizing a non-uniform combination of source risks can achieve much lower target error than standard empirical risk minimization. 1
3 0.99613786 109 nips-2007-Kernels on Attributed Pointsets with Applications
Author: Mehul Parsana, Sourangshu Bhattacharya, Chiru Bhattacharya, K. Ramakrishnan
Abstract: This paper introduces kernels on attributed pointsets, which are sets of vectors embedded in an euclidean space. The embedding gives the notion of neighborhood, which is used to define positive semidefinite kernels on pointsets. Two novel kernels on neighborhoods are proposed, one evaluating the attribute similarity and the other evaluating shape similarity. Shape similarity function is motivated from spectral graph matching techniques. The kernels are tested on three real life applications: face recognition, photo album tagging, and shot annotation in video sequences, with encouraging results. 1
4 0.99612665 61 nips-2007-Convex Clustering with Exemplar-Based Models
Author: Danial Lashkari, Polina Golland
Abstract: Clustering is often formulated as the maximum likelihood estimation of a mixture model that explains the data. The EM algorithm widely used to solve the resulting optimization problem is inherently a gradient-descent method and is sensitive to initialization. The resulting solution is a local optimum in the neighborhood of the initial guess. This sensitivity to initialization presents a significant challenge in clustering large data sets into many clusters. In this paper, we present a different approach to approximate mixture fitting for clustering. We introduce an exemplar-based likelihood function that approximates the exact likelihood. This formulation leads to a convex minimization problem and an efficient algorithm with guaranteed convergence to the globally optimal solution. The resulting clustering can be thought of as a probabilistic mapping of the data points to the set of exemplars that minimizes the average distance and the information-theoretic cost of mapping. We present experimental results illustrating the performance of our algorithm and its comparison with the conventional approach to mixture model clustering. 1
5 0.99572563 132 nips-2007-Modeling image patches with a directed hierarchy of Markov random fields
Author: Simon Osindero, Geoffrey E. Hinton
Abstract: We describe an efficient learning procedure for multilayer generative models that combine the best aspects of Markov random fields and deep, directed belief nets. The generative models can be learned one layer at a time and when learning is complete they have a very fast inference procedure for computing a good approximation to the posterior distribution in all of the hidden layers. Each hidden layer has its own MRF whose energy function is modulated by the top-down directed connections from the layer above. To generate from the model, each layer in turn must settle to equilibrium given its top-down input. We show that this type of model is good at capturing the statistics of patches of natural images. 1
same-paper 6 0.99494463 20 nips-2007-Adaptive Embedded Subgraph Algorithms using Walk-Sum Analysis
7 0.94671363 190 nips-2007-Support Vector Machine Classification with Indefinite Kernels
8 0.93426675 38 nips-2007-Boosting Algorithms for Maximizing the Soft Margin
9 0.91888452 120 nips-2007-Linear programming analysis of loopy belief propagation for weighted matching
10 0.91694182 116 nips-2007-Learning the structure of manifolds using random projections
11 0.91685891 40 nips-2007-Bundle Methods for Machine Learning
12 0.91636539 54 nips-2007-Computational Equivalence of Fixed Points and No Regret Algorithms, and Convergence to Equilibria
13 0.91487163 21 nips-2007-Adaptive Online Gradient Descent
14 0.91084248 65 nips-2007-DIFFRAC: a discriminative and flexible framework for clustering
15 0.90404272 75 nips-2007-Efficient Bayesian Inference for Dynamically Changing Graphs
16 0.90364981 204 nips-2007-Theoretical Analysis of Heuristic Search Methods for Online POMDPs
17 0.90115404 178 nips-2007-Simulated Annealing: Rigorous finite-time guarantees for optimization on continuous domains
18 0.89808077 10 nips-2007-A Randomized Algorithm for Large Scale Support Vector Learning
19 0.89669037 46 nips-2007-Cluster Stability for Finite Samples
20 0.89636803 101 nips-2007-How SVMs can estimate quantiles and the median