nips nips2005 nips2005-125 knowledge-graph by maker-knowledge-mining

125 nips-2005-Message passing for task redistribution on sparse graphs


Source: pdf

Author: K. Wong, Zhuo Gao, David Saad

Abstract: The problem of resource allocation in sparse graphs with real variables is studied using methods of statistical physics. An efficient distributed algorithm is devised on the basis of insight gained from the analysis and is examined using numerical simulations, showing excellent performance and full agreement with the theoretical results.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Message passing for task redistribution on sparse graphs K. [sent-1, score-0.228]

2 Abstract The problem of resource allocation in sparse graphs with real variables is studied using methods of statistical physics. [sent-13, score-0.218]

3 An efficient distributed algorithm is devised on the basis of insight gained from the analysis and is examined using numerical simulations, showing excellent performance and full agreement with the theoretical results. [sent-14, score-0.139]

4 1 Introduction Optimal resource allocation is a well known problem in the area of distributed computing [1, 2] to which significant effort has been dedicated within the computer science community. [sent-15, score-0.189]

5 The problem itself is quite general and is applicable to other areas as well where a large number of nodes are required to balance loads/resources and redistribute tasks, such as reducing internet traffic congestion [3]. [sent-16, score-0.262]

6 The problem has many flavors and usually refers, in the computer science literature, to finding practical heuristic solutions to the distribution of computational load between computers connected in a predetermined manner. [sent-17, score-0.164]

7 The problem we are addressing here is more generic and is represented by nodes of some computational power that should carry out tasks. [sent-18, score-0.233]

8 The nodes are located on a randomly chosen sparse graph of some given connectivity. [sent-20, score-0.336]

9 We focus here on the satisfiable case where the total computing power is greater than the demand, and where the number of nodes involved is very large. [sent-23, score-0.233]

10 We analyze the problem using the Bethe approximation of statistical mechanics in Section 2, and alternatively a new variant of the replica method [4, 5] in Section 3. [sent-25, score-0.169]

11 We then present numerical results in Section 4, and derive a new message passing distributed algorithm on the basis of the analysis (in Section 5). [sent-26, score-0.402]

12 2 The statistical physics framework: Bethe approximation We consider a typical resource allocation task on a sparse graph of $N$ nodes, labelled $i = 1, \dots, N$. [sent-28, score-0.331]

13 Each node is randomly connected to $c$ other nodes$^1$, and has a capacity $\Lambda_i$ randomly drawn from a distribution $\rho(\Lambda_i)$. [sent-29, score-0.733]

14 The objective is to migrate tasks between nodes such that each node will be capable of carrying out its tasks. [sent-30, score-0.559]

15 The current $y_{ij} = -y_{ji}$ drawn from node $j$ to $i$ is aimed at satisfying the constraint $\Lambda_i + \sum_j A_{ij} y_{ij} \ge 0$ (1), representing the 'revised' assignment for node $i$, where $A_{ij} = 1, 0$ for connected/unconnected node pairs $i$ and $j$, respectively. [sent-31, score-0.839]
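To make the setup just described concrete, here is a minimal sketch of one instance of the problem; the use of networkx, the Gaussian capacity parameters and the function names are illustrative choices of this sketch, not details from the paper.

```python
import numpy as np
import networkx as nx

# Illustrative setup (an assumption of this sketch): N nodes on a random c-regular graph,
# capacities Lambda_i ~ Normal(mean_capacity, 1), antisymmetric currents y_ij = -y_ji.
N, c, mean_capacity = 1000, 3, 0.5
rng = np.random.default_rng(0)

graph = nx.random_regular_graph(c, N, seed=0)
capacity = rng.normal(mean_capacity, 1.0, size=N)        # Lambda_i
current = {(i, j): 0.0 for i, j in graph.edges()}        # one y value per link

def resource(i):
    """Revised assignment Lambda_i + sum_j A_ij y_ij for node i; constraint (1) requires >= 0."""
    total = capacity[i]
    for j in graph.neighbors(i):
        total += current[(i, j)] if (i, j) in current else -current[(j, i)]
    return total

violations = sum(resource(i) < 0 for i in graph.nodes())
print(f"nodes violating constraint (1) with all currents zero: {violations}")
```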

16 To illustrate the statistical mechanics approach to resource allocation, we consider the load balancing task of minimizing the energy function (cost) $E = \sum_{(ij)} A_{ij}\,\phi(y_{ij})$, where the summation runs over all pairs $(ij)$ of nodes, subject to the constraints (1); $\phi(y_{ij})$ is a general function of the current $y_{ij}$. [sent-32, score-0.425]

17 For load balancing tasks, $\phi(y_{ij})$ is typically a convex function, which will be assumed in our study. [sent-33, score-0.22]

18 The analysis of the graph is done by introducing the free energy $F = -T \ln Z_y$ for a temperature $T = \beta^{-1}$, where $Z_y$ is the partition function $Z_y = \int \prod_{(ij)} dy_{ij}\, \prod_i \Theta\bigl(\Lambda_i + \sum_j A_{ij} y_{ij}\bigr) \exp\bigl[-\beta \sum_{(ij)} A_{ij}\,\phi(y_{ij})\bigr]$ (2). The $\Theta$ function returns 1 for a non-negative argument and 0 otherwise. [sent-34, score-0.448]
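In display form, the constraint (1) and the partition function (2) read as follows; writing the adjacency as $A_{ij}$ is a notational choice of this reconstruction.

```latex
% Constraint (1): the 'revised' assignment of node i must stay non-negative
\Lambda_i + \sum_j A_{ij}\, y_{ij} \;\ge\; 0, \qquad y_{ij} = -y_{ji}.
% Partition function (2) at temperature T = \beta^{-1}
Z_y = \int \Bigl(\prod_{(ij)} dy_{ij}\Bigr)
      \prod_i \Theta\!\Bigl(\Lambda_i + \sum_j A_{ij}\, y_{ij}\Bigr)
      \exp\Bigl[-\beta \sum_{(ij)} A_{ij}\, \phi(y_{ij})\Bigr].
```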

19 When the connectivity is low, the probability of finding a loop of finite length on the graph is low, and the Bethe approximation well describes the local environment of a node. [sent-35, score-0.141]

20 In the approximation, a node is connected to $c$ branches in a tree structure, and the correlations among the branches of the tree are neglected. [sent-36, score-0.506]

21 A node is connected to an ancestor node of the previous generation, and another $c-1$ descendent nodes of the next generation. [sent-38, score-0.888]

22 Consider a vertex $V(T)$ of capacity $\Lambda_{V(T)}$, and a current $y$ is drawn from the vertex. [sent-39, score-0.727]

23 The free energy can be considered as the sum of two parts, $F(y|T) = N_T F_{av} + F_V(y|T)$, where $N_T$ is the number of nodes in the tree $T$, $F_{av}$ is the average free energy per node, and $F_V(y|T)$ is referred to as the vertex free energy$^2$. [sent-41, score-1.551]

24 Note that when a vertex is added to a tree, there is a (Footnote 1: Although we focus here on graphs of fixed connectivity, one can easily accommodate any connectivity profile within the same framework; the algorithms presented later are completely general.) [sent-42, score-0.417]

25 Footnote 2: This term is marginalized over all inputs to the current vertex, leaving the difference in chemical potentials, $y$, as its sole argument, hence the terminology used. [sent-43, score-0.338]

26 change in the free energy due to the added vertex. [sent-44, score-0.347]

27 Since the number of nodes increases by 1, the vertex free energy is obtained by subtracting the average free energy from the free-energy change. [sent-45, score-1.512]
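The recursion for the vertex free energy, referred to later as Eq. (6), is not written out in the sentences collected here; as a hedged sketch under the conventions already introduced (a vertex of capacity $\Lambda_{V(T)}$ fed by $c-1$ descendent trees $T_k$, with the average free energy per node subtracted), its zero-temperature form can plausibly be written as

```latex
% Hedged zero-temperature sketch of the vertex free-energy recursion (cf. Eq. (6)):
F_V(y\,|\,T) \;=\;
  \min_{\{y_k\}:\;\Lambda_{V(T)} + \sum_{k=1}^{c-1} y_k - y \,\ge\, 0}
  \;\sum_{k=1}^{c-1} \Bigl[\phi(y_k) + F_V(y_k\,|\,T_k)\Bigr] \;-\; F_{av}.
```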

28 Optimizing $L$ with respect to $y_{ij}$, one obtains $y_{ij}$ as a function of $\mu_j - \mu_i$, where $\mu_i$ is referred to as the chemical potential of node $i$, and the current is driven by the potential difference. [sent-54, score-0.624]

29 Although the analysis has also been carried out in the space of currents, we focus here on the optimization problem in the space of the chemical potentials. [sent-55, score-0.267]

30 Since the energy function is invariant under the addition of an arbitrary global constant to the chemical potentials of all nodes, we introduce an extra regularization term $(\varepsilon/2) \sum_i \mu_i^2$ to break the translational symmetry, where $\varepsilon \to 0$. [sent-56, score-0.497]

31 The replicated partition function [5] is averaged over all network configurations with connectivity and capacity distributions $\rho(\Lambda_i)$. [sent-58, score-0.482]

32 This is a result of the specific interaction considered which entangles nodes of different indices. [sent-62, score-0.233]

33 Assuming replica symmetry, the saddle point equations yield a recursion relation for a two-component function $R$, which is related to the order parameters via the generating function given in Eq. (9). In Eq. [sent-69, score-0.499]

34 (9), $T$ represents the tree terminated at the vertex node with chemical potential $\mu$, providing input to the ancestor node with chemical potential $z$, and $\langle\cdots\rangle_\Lambda$ represents the average over the distribution $\rho(\Lambda)$. [sent-70, score-1.624]

35 The resultant recursion relation for $R(z|T)$ is independent of the replica indices, and is given by Eq. (10), where the vertex node has a capacity $\Lambda_{V(T)}$ and the prefactor is a constant. [sent-71, score-1.337]

36 This algebraic structure is typical of the Bethe lattice tree-like representation of networks of connectivity $c$, where a node obtains input from its $c-1$ descendent nodes of the next generation, and $T_k$ represents the tree terminated at the $k$-th descendent. [sent-73, score-0.792]

37 Except for the regularization factor $\exp(-\beta\varepsilon\mu^2/2)$, $R$ turns out to be a function of $y = \mu - z$, which is interpreted as the current drawn from a node with chemical potential $\mu$ by its ancestor with chemical potential $z$. [sent-74, score-0.987]

38 One can then express the function $R$ as the product of a vertex partition function $Z_V$ and a normalization factor $W$, that is, $R(z|T) = W\, Z_V(y|T)$. [sent-75, score-0.322]

39 In the limit $n \to 0$, the dependences on $\mu$ and $y$ become separable, providing a recursion relation for $Z_V(y|T)$. [sent-76, score-0.354]

40 This gives rise to the vertex free energy $F_V(y|T) = -T \ln Z_V(y|T)$ when a current $y$ is drawn from the vertex of a tree $T$. [sent-77, score-1.126]

41 The recursive equation and the average free energy expression agree with the results in the Bethe approximation. [sent-78, score-0.453]

42 These iterative equations can be directly linked to those obtained from a principled Bayesian approximation, where the logarithms of the messages passed between nodes are proportional to the vertex free energies. [sent-79, score-0.78]

43 Since the vertex free energy of a node depends on its own capacity and the disordered configuration of its descendants, we generate 1000 nodes at each iteration of Eq. [sent-82, score-1.47]

44 (6), with capacities randomly drawn from the distribution $\rho(\Lambda)$, each being fed by $c-1$ nodes randomly drawn from the previous iteration. [sent-83, score-0.668]

45 We have discretized the vertex free energy function $F_V(y|T)$ into a vector, whose components are the values of the function at the corresponding discretized currents $y$. [sent-84, score-0.561]

46 To speed up the optimization search at each node, we first find the vertex saturation current drawn from a node such that: (a) the capacity of the node is just used up; (b) the current drawn by each of its descendant nodes is just enough to saturate its own capacity constraint. [sent-85, score-2.003]

47 When these conditions are satisfied, we can separately optimize the current drawn by each descendant node, and the vertex saturation current is equal to the node capacity minus the current drawn by its descendants. [sent-86, score-1.241]
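A minimal sketch of the population-dynamics iteration described in the last few sentences, using the hedged zero-temperature recursion sketched earlier; the grid range, population size, number of iterations and the min-shift normalisation used in place of subtracting the average free energy are choices of this sketch, and the saturation-current speedup is not implemented.

```python
import numpy as np

# Population-dynamics sketch for the discretized vertex free energies F_V(y|T).
# Assumptions of this sketch (not from the paper): grid range, population size,
# quadratic cost phi(y) = y^2/2, and a min-shift normalisation instead of F_av.
rng = np.random.default_rng(0)
c, mean_capacity = 3, 0.8                    # connectivity and average capacity <Lambda>
y_grid = np.linspace(-4.0, 4.0, 41)          # discretized values of the current y
pop = np.zeros((200, y_grid.size))           # population of vertex free-energy vectors

def phi(y):
    return 0.5 * y ** 2                      # quadratic load-balancing cost

for step in range(20):
    new_pop = np.empty_like(pop)
    for n in range(pop.shape[0]):
        lam = rng.normal(mean_capacity, 1.0)                        # capacity of the new vertex
        parents = pop[rng.integers(0, pop.shape[0], size=c - 1)]    # c-1 descendent trees
        desc_cost = phi(y_grid)[None, :] + parents                  # phi(y_k) + F_V(y_k|T_k)
        # total cost of drawing (y_1, y_2) from the two descendants (c = 3 here)
        cost = desc_cost[0][:, None] + desc_cost[1][None, :]
        in_sum = y_grid[:, None] + y_grid[None, :]                  # y_1 + y_2 flowing in
        for a, y_out in enumerate(y_grid):
            feasible = in_sum >= y_out - lam                        # capacity constraint
            new_pop[n, a] = cost[feasible].min() if feasible.any() else np.inf
        finite = np.isfinite(new_pop[n])
        if finite.any():
            new_pop[n, finite] -= new_pop[n, finite].min()          # crude normalisation
    pop = new_pop

print("sample vertex free-energy vector:", np.round(pop[0], 2))
```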

48 To compute the average energy, we randomly draw 2 nodes, compute the optimal current flowing between them, and repeat the sampling to obtain the average. [sent-89, score-0.203]

49 Figure 1(a) shows the results as a function of the iteration step $t$, for a Gaussian capacity distribution $\rho(\Lambda)$ with variance 1 and average $\langle\Lambda\rangle$. [sent-90, score-0.474]

50 Each iteration corresponds to adding one extra generation to the tree structure, such that the iterative process corresponds to approximating the network by an increasingly extensive tree. [sent-91, score-0.284]

51 We observe that after an initial rise with iteration steps, the average energies converge to steady-state values, at a rate which increases with the average capacity. [sent-92, score-0.373]

52 To study the convergence rate of the iterations, we fit the average energy at iteration step $t$ to $\langle E(t)\rangle - \langle E(\infty)\rangle \sim \exp(-\gamma t)$ in the asymptotic regime. [sent-93, score-0.394]
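A small illustration of this kind of fit: given a recorded trace of average energies, the asymptotic relaxation rate $\gamma$ can be read off the slope of the log of the energy gap; the trace below is synthetic.

```python
import numpy as np

# Synthetic example of fitting <E>(t) - <E>(infinity) ~ exp(-gamma * t).
t = np.arange(30)
energy = 1.3 + 0.6 * np.exp(-0.25 * t)        # stand-in for the recorded average energies
energy_inf = energy[-1]                        # steady-state value (approximation)
tail = slice(5, 25)                            # asymptotic window, excluding the initial rise
log_gap = np.log(energy[tail] - energy_inf + 1e-12)
gamma = -np.polyfit(t[tail], log_gap, 1)[0]    # relaxation rate from the slope
print(f"estimated relaxation rate gamma ~ {gamma:.3f}")
```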

53 1(a), the relaxation rate $\gamma$ increases with the average capacity. [sent-95, score-0.175]

54 It is interesting to note that a cusp exists at the average capacity of about 0.45. [sent-96, score-0.473]

55 Below that value, convergence of the iteration is slow, since the average energy curve starts to develop a plateau before the final convergence. [sent-98, score-0.447]

56 Figure 1(b) illustrates the current distribution for various average capacities. [sent-101, score-0.165]

57 The distribution $P(y)$ consists of a delta function component at $y = 0$ and a continuous component whose breadth decreases with average capacity. [sent-102, score-0.139]

58 The fraction of links with zero currents increases with the average capacity. [sent-103, score-0.389]

59 Hence at a low average capacity, links with nonzero currents form a percolating cluster, whereas at a high average capacity, it breaks into isolated clusters. [sent-104, score-0.488]
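Both observations can be checked directly on a solved instance: the fraction of links carrying (numerically) zero current, and the size of the largest connected cluster formed by the nonzero-current links. The sketch below assumes a networkx graph and a dictionary of optimised currents; the tolerance and the toy data are placeholders of this sketch.

```python
import networkx as nx
import numpy as np

def current_cluster_stats(graph, current, tol=1e-6):
    """current: dict mapping each edge (i, j) of `graph` to the optimised y_ij."""
    zero_links = [e for e in graph.edges()
                  if abs(current.get(e, current.get(e[::-1], 0.0))) < tol]
    active = graph.edge_subgraph(set(graph.edges()) - set(zero_links))
    frac_zero = len(zero_links) / graph.number_of_edges()
    largest = max((len(comp) for comp in nx.connected_components(active)), default=0)
    return frac_zero, largest / graph.number_of_nodes()

# Toy usage with random currents as a placeholder for the optimised solution:
g = nx.random_regular_graph(3, 100, seed=1)
rng = np.random.default_rng(1)
y = {e: rng.normal() * (rng.random() > 0.3) for e in g.edges()}   # ~30% of links forced to zero
print(current_cluster_stats(g, y))
```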

60 5 Distributed algorithms The local nature of the recursion relation Eq. [sent-105, score-0.328]

61 (6) points to the possibility that the network optimization can be solved by message passing approaches, which have been successful in problems such as error-correcting codes [8] and probabilistic inference [9]. [sent-106, score-0.438]

62 The major advantage of message passing is its potential to solve a global optimization problem via local updates, thereby reducing the computational complexity. [sent-107, score-0.461]

63 For example, the computational complexity of quadratic programming for the load balancing task typically scales as $N^3$, whereas capitalizing on the network topology underlying the connectivity of the variables, message passing scales as $N$. [sent-108, score-0.857]

64 (6) to steady states for the same parameters and average capacities as in (a), from right to left. [sent-149, score-0.313]

65 (14) to steady states for the same parameters and average capacities as in (b), from left to right. [sent-161, score-0.313]

66 practical implementation, is its distributive nature; it does not require a global optimizer, and is particularly suitable for distributive control in evolving networks. [sent-164, score-0.16]

67 However, in contrast to other message passing algorithms which pass conditional probability estimates of discrete variables to neighboring nodes, the messages in the present context are more complex, since they are functions $F_V(y|T)$ of the current $y$. [sent-165, score-0.499]

68 We simplify the message to 2 parameters, namely, the first and second derivatives of the vertex free energies. [sent-166, score-0.666]

69 For the quadratic load balancing task, it can be shown that a self-consistent solution of the recursion relation, Eq. [sent-167, score-0.515]

70 (6), consists of vertex free energies which are piecewise quadratic with continuous slopes. [sent-168, score-0.555]

71 This makes the 2-parameter message a very precise approximation. [sent-169, score-0.229]

72 Let the message passed from node $j$ to $i$ be the pair $\bigl(\partial F_V(y_{ij}|T_j)/\partial y_{ij},\ \partial^2 F_V(y_{ij}|T_j)/\partial y_{ij}^2\bigr)$; using Eq. [sent-170, score-0.511]

73 (6), the recursion relation of the messages becomes Eqs. (11) and (12), which involve a minimization subject to the capacity constraint, with $\phi'$ and $\phi''$ representing the first and second derivatives of $\phi(y)$ at $y = y_{ij}$, respectively. [sent-171, score-0.393]

74 The forward passing of the message from node $j$ to $i$ is then followed by a backward message from node $i$ to $j$ for updating the currents $y_{ij}$ according to Eq. (13). We simulate networks with $c = 3$, $\phi(y) = y^2/2$ and compute their average energies. [sent-172, score-1.312]

75 1(c), the simulation results of the message passing algorithm have an excellent agreement with those obtained by the recursion relation Eq. [sent-176, score-0.815]

76 For the quadratic load balancing task considered here, an independent exact optimization is available for comparison. [sent-178, score-0.336]
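As a sketch of such an exact baseline for small instances, the quadratic load-balancing task can be posed directly as a quadratic program; the use of scipy's SLSQP solver and the instance sizes are choices of this sketch (the paper's own exact method iterates chemical potentials, Eq. (14), which is not reproduced here).

```python
import numpy as np
import networkx as nx
from scipy.optimize import minimize

# Exact quadratic load balancing on a small instance, for comparison with message passing:
# minimise sum_links y_l^2 / 2  subject to  Lambda_i + sum_l S_il y_l >= 0 for every node i,
# where S is a signed node-link incidence matrix (a positive y_l flows from the link's
# second endpoint to its first, matching y_ij = -y_ji).
g = nx.random_regular_graph(3, 30, seed=2)
rng = np.random.default_rng(2)
lam = rng.normal(0.5, 1.0, g.number_of_nodes())

edges = list(g.edges())
S = np.zeros((g.number_of_nodes(), len(edges)))
for l, (i, j) in enumerate(edges):
    S[i, l], S[j, l] = 1.0, -1.0

res = minimize(
    fun=lambda y: 0.5 * np.dot(y, y),
    x0=np.zeros(len(edges)),
    jac=lambda y: y,
    constraints=[{"type": "ineq", "fun": lambda y: lam + S @ y, "jac": lambda y: S}],
    method="SLSQP",
)
print("average energy per node:", res.fun / g.number_of_nodes())
```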

77 (11) yield excellent agreement with the iteration of chemical potentials Eq. [sent-183, score-0.468]

78 (11) and (14) allow us to study the distribution $P(\mu)$ of the chemical potentials $\mu_i$. [sent-186, score-0.302]

79 Nodes with zero chemical potentials correspond to those with unsaturated capacity constraints. [sent-189, score-0.75]

80 The fraction of unsaturated nodes increases with the average capacity, as shown in the inset of Fig. [sent-190, score-0.698]

81 Hence at a low average capacity, saturated nodes form a percolating cluster, whereas at a high average capacity, it breaks into isolated clusters. [sent-192, score-0.533]

82 It is interesting to note that at the average capacity of 0. [sent-193, score-0.42]

83 45, below which a plateau starts to develop in the relaxation rate of the recursion relation Eq. [sent-194, score-0.408]

84 (6), the fraction of unsaturated nodes is about 0. [sent-195, score-0.42]

85 1(c) also shows the simulation results of the average energy $\langle E \rangle$, using both Eqs. [sent-199, score-0.301]

86 We see that the average energy decreases when the connectivity increases. [sent-201, score-0.406]

87 This is because the increase in links connecting a node provides more freedom to allocate resources. [sent-202, score-0.297]

88 Remarkably, multiplying by a factor of $(c-2)$, we find that the 3 curves collapse in this regime of average capacity, showing that the average energy scales as $(c-2)^{-1}$ in this regime, as shown in the inset of Fig. [sent-207, score-0.59]

89 1, the energy rises above the exponential fit applicable to the average capacity above 0. [sent-212, score-0.644]

90 (b) The fraction of links with zero currents increases with the average capacity, and is rather insensitive to the connectivity. [sent-214, score-0.417]

91 (c) The fraction of unsaturated nodes increases with the average capacity, and is rather insensitive to the connectivity. [sent-217, score-0.596]

92 In the limit of large average capacities, it approaches the upper bound of $\int_0^{\infty} d\Lambda\, \rho(\Lambda)$, which is the probability that the capacity of a node is non-negative. [sent-218, score-0.683]
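For the Gaussian capacity distribution with unit variance used in the figures, this upper bound has a simple closed form:

```latex
\int_0^{\infty} \rho(\Lambda)\, d\Lambda \;=\; \Pr[\Lambda \ge 0] \;=\; \Phi\bigl(\langle\Lambda\rangle\bigr),
\qquad
\Phi(x) = \tfrac{1}{2}\bigl[1 + \operatorname{erf}(x/\sqrt{2})\bigr].
```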

93 of the changes in the chemical potentials to fall below a threshold. [sent-223, score-0.302]

94 of the sums of the currents in both message directions of a link to fall below a threshold. [sent-228, score-0.39]
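The two fragments above describe the stopping criteria used for the exact iteration and for message passing, respectively; a minimal sketch of both checks follows (the threshold value and the data structures are placeholders of this sketch):

```python
import numpy as np

TOL = 1e-10  # placeholder convergence threshold

def potentials_converged(mu_old, mu_new, tol=TOL):
    """Exact iteration (Eq. (14)): stop when every chemical-potential change is below tol."""
    return np.max(np.abs(np.asarray(mu_new) - np.asarray(mu_old))) < tol

def messages_converged(y, tol=TOL):
    """Message passing: stop when, on every link (i, j), the currents obtained from the two
    message directions sum to (numerically) zero, i.e. y_ij = -y_ji holds."""
    return all(abs(y[i][j] + y[j][i]) < tol for i in y for j in y[i] if j > i)
```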

95 When the average capacity decreases further, the convergence time deviates above the power laws. [sent-234, score-0.459]

96 6 Summary We have studied a prototype problem of resource allocation on sparsely connected networks using the replica method, resulting in recursion relations interpretable using the Bethe approximation. [sent-235, score-0.579]

97 The resultant recursion relation leads to a message passing algorithm for optimizing the average energy, which significantly reduces the computational complexity of the global optimization task and is suitable for online distributive control. [sent-236, score-0.987]

98 The suggested 2-parameter approximation produces results with excellent agreement with the original recursion relation. [sent-237, score-0.354]

99 For the simple but illustrative example in this letter, we have considered a quadratic cost function, resulting in an exact algorithm based on local iterations of chemical potentials, and the message passing algorithm shows remarkable agreement with the exact result. [sent-238, score-0.739]

100 The suggested simple message passing algorithm can be generalized to more realistic cases of nonlinear cost functions and additional constraints on the capacities of nodes and links. [sent-239, score-0.804]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('capacity', 0.314), ('vertex', 0.285), ('recursion', 0.242), ('node', 0.237), ('nodes', 0.233), ('chemical', 0.23), ('message', 0.229), ('energy', 0.195), ('capacities', 0.17), ('bethe', 0.158), ('free', 0.152), ('passing', 0.146), ('replica', 0.138), ('kong', 0.134), ('unsaturated', 0.134), ('inset', 0.13), ('currents', 0.128), ('load', 0.127), ('hong', 0.106), ('average', 0.106), ('connectivity', 0.105), ('balancing', 0.093), ('resource', 0.089), ('relation', 0.086), ('tree', 0.081), ('descendent', 0.08), ('distributive', 0.08), ('allocation', 0.073), ('potentials', 0.072), ('drawn', 0.069), ('energies', 0.065), ('messages', 0.065), ('ancestor', 0.064), ('links', 0.06), ('current', 0.059), ('excellent', 0.057), ('iterating', 0.056), ('terminated', 0.056), ('increasingly', 0.055), ('agreement', 0.055), ('iteration', 0.054), ('bay', 0.053), ('cusp', 0.053), ('descendant', 0.053), ('descendants', 0.053), ('migrate', 0.053), ('percolating', 0.053), ('sherrington', 0.053), ('quadratic', 0.053), ('fraction', 0.053), ('china', 0.053), ('plateau', 0.053), ('fed', 0.051), ('physics', 0.05), ('potential', 0.049), ('beijing', 0.046), ('passed', 0.045), ('symbols', 0.044), ('saad', 0.042), ('increases', 0.042), ('wong', 0.04), ('convergence', 0.039), ('randomly', 0.038), ('extensive', 0.037), ('optimization', 0.037), ('saturation', 0.037), ('steady', 0.037), ('connected', 0.037), ('partition', 0.037), ('tasks', 0.036), ('graph', 0.036), ('branches', 0.035), ('breaks', 0.035), ('spin', 0.035), ('resultant', 0.035), ('link', 0.033), ('delta', 0.033), ('saddle', 0.033), ('uk', 0.032), ('eu', 0.031), ('mechanics', 0.031), ('generation', 0.031), ('remarkably', 0.03), ('branch', 0.03), ('symmetry', 0.029), ('sparse', 0.029), ('applicable', 0.029), ('insensitive', 0.028), ('water', 0.028), ('labelled', 0.028), ('temperature', 0.028), ('distributed', 0.027), ('relaxation', 0.027), ('regime', 0.027), ('graphs', 0.027), ('task', 0.026), ('scales', 0.026), ('network', 0.026), ('cost', 0.026), ('limit', 0.026)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999964 125 nips-2005-Message passing for task redistribution on sparse graphs

Author: K. Wong, Zhuo Gao, David Saad

Abstract: The problem of resource allocation in sparse graphs with real variables is studied using methods of statistical physics. An efficient distributed algorithm is devised on the basis of insight gained from the analysis and is examined using numerical simulations, showing excellent performance and full agreement with the theoretical results.

2 0.15754187 70 nips-2005-Fast Information Value for Graphical Models

Author: Brigham S. Anderson, Andrew W. Moore

Abstract: Calculations that quantify the dependencies between variables are vital to many operations with graphical models, e.g., active learning and sensitivity analysis. Previously, pairwise information gain calculation has involved a cost quadratic in network size. In this work, we show how to perform a similar computation with cost linear in network size. The loss function that allows this is of a form amenable to computation by dynamic programming. The message-passing algorithm that results is described and empirical results demonstrate large speedups without decrease in accuracy. In the cost-sensitive domains examined, superior accuracy is achieved.

3 0.14544286 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery

Author: Jeremy Kubica, Joseph Masiero, Robert Jedicke, Andrew Connolly, Andrew W. Moore

Abstract: In this paper we consider the problem of finding sets of points that conform to a given underlying model from within a dense, noisy set of observations. This problem is motivated by the task of efficiently linking faint asteroid detections, but is applicable to a range of spatial queries. We survey current tree-based approaches, showing a trade-off exists between single tree and multiple tree algorithms. To this end, we present a new type of multiple tree algorithm that uses a variable number of trees to exploit the advantages of both approaches. We empirically show that this algorithm performs well using both simulated and astronomical data.

4 0.12835592 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach

Author: O. P. Kreidl, Alan S. Willsky

Abstract: Given a directed graphical model with binary-valued hidden nodes and real-valued noisy observations, consider deciding upon the maximum a-posteriori (MAP) or the maximum posterior-marginal (MPM) assignment under the restriction that each node broadcasts only to its children exactly one single-bit message. We present a variational formulation, viewing the processing rules local to all nodes as degrees-of-freedom, that minimizes the loss in expected (MAP or MPM) performance subject to such online communication constraints. The approach leads to a novel message-passing algorithm to be executed offline, or before observations are realized, which mitigates the performance loss by iteratively coupling all rules in a manner implicitly driven by global statistics. We also provide (i) illustrative examples, (ii) assumptions that guarantee convergence and efficiency and (iii) connections to active research areas. 1

5 0.12482742 46 nips-2005-Consensus Propagation

Author: Benjamin V. Roy, Ciamac C. Moallemi

Abstract: We propose consensus propagation, an asynchronous distributed protocol for averaging numbers across a network. We establish convergence, characterize the convergence rate for regular graphs, and demonstrate that the protocol exhibits better scaling properties than pairwise averaging, an alternative that has received much recent attention. Consensus propagation can be viewed as a special case of belief propagation, and our results contribute to the belief propagation literature. In particular, beyond singly-connected graphs, there are very few classes of relevant problems for which belief propagation is known to converge. 1

6 0.097987168 204 nips-2005-Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation

7 0.096585982 121 nips-2005-Location-based activity recognition

8 0.095706485 15 nips-2005-A Theoretical Analysis of Robust Coding over Noisy Overcomplete Channels

9 0.089878067 107 nips-2005-Large scale networks fingerprinting and visualization using the k-core decomposition

10 0.087086745 139 nips-2005-Non-iterative Estimation with Perturbed Gaussian Markov Processes

11 0.085179664 122 nips-2005-Logic and MRF Circuitry for Labeling Occluding and Thinline Visual Contours

12 0.075465038 153 nips-2005-Policy-Gradient Methods for Planning

13 0.075323716 187 nips-2005-Temporal Abstraction in Temporal-difference Networks

14 0.072395474 69 nips-2005-Fast Gaussian Process Regression using KD-Trees

15 0.069375128 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models

16 0.065657198 184 nips-2005-Structured Prediction via the Extragradient Method

17 0.063620657 75 nips-2005-Fixing two weaknesses of the Spectral Method

18 0.063206621 24 nips-2005-An Approximate Inference Approach for the PCA Reconstruction Error

19 0.059577655 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs

20 0.052134369 178 nips-2005-Soft Clustering on Graphs


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.177), (1, 0.019), (2, 0.039), (3, -0.035), (4, -0.082), (5, -0.057), (6, -0.254), (7, 0.179), (8, 0.081), (9, 0.21), (10, 0.126), (11, -0.107), (12, -0.112), (13, -0.007), (14, 0.058), (15, 0.013), (16, -0.05), (17, -0.005), (18, -0.0), (19, 0.023), (20, -0.01), (21, -0.016), (22, -0.033), (23, -0.018), (24, -0.009), (25, 0.073), (26, -0.029), (27, 0.018), (28, 0.113), (29, -0.007), (30, -0.068), (31, 0.046), (32, -0.015), (33, -0.078), (34, -0.058), (35, -0.023), (36, -0.011), (37, 0.04), (38, 0.101), (39, -0.012), (40, -0.009), (41, 0.017), (42, 0.045), (43, 0.006), (44, 0.008), (45, 0.011), (46, -0.04), (47, -0.043), (48, 0.013), (49, -0.041)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97665769 125 nips-2005-Message passing for task redistribution on sparse graphs

Author: K. Wong, Zhuo Gao, David Saad

Abstract: The problem of resource allocation in sparse graphs with real variables is studied using methods of statistical physics. An efficient distributed algorithm is devised on the basis of insight gained from the analysis and is examined using numerical simulations, showing excellent performance and full agreement with the theoretical results.

2 0.80560374 204 nips-2005-Walk-Sum Interpretation and Analysis of Gaussian Belief Propagation

Author: Dmitry Malioutov, Alan S. Willsky, Jason K. Johnson

Abstract: This paper presents a new framework based on walks in a graph for analysis and inference in Gaussian graphical models. The key idea is to decompose correlations between variables as a sum over all walks between those variables in the graph. The weight of each walk is given by a product of edgewise partial correlations. We provide a walk-sum interpretation of Gaussian belief propagation in trees and of the approximate method of loopy belief propagation in graphs with cycles. This perspective leads to a better understanding of Gaussian belief propagation and of its convergence in loopy graphs. 1

3 0.72191793 70 nips-2005-Fast Information Value for Graphical Models

Author: Brigham S. Anderson, Andrew W. Moore

Abstract: Calculations that quantify the dependencies between variables are vital to many operations with graphical models, e.g., active learning and sensitivity analysis. Previously, pairwise information gain calculation has involved a cost quadratic in network size. In this work, we show how to perform a similar computation with cost linear in network size. The loss function that allows this is of a form amenable to computation by dynamic programming. The message-passing algorithm that results is described and empirical results demonstrate large speedups without decrease in accuracy. In the cost-sensitive domains examined, superior accuracy is achieved.

4 0.68336684 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach

Author: O. P. Kreidl, Alan S. Willsky

Abstract: Given a directed graphical model with binary-valued hidden nodes and real-valued noisy observations, consider deciding upon the maximum a-posteriori (MAP) or the maximum posterior-marginal (MPM) assignment under the restriction that each node broadcasts only to its children exactly one single-bit message. We present a variational formulation, viewing the processing rules local to all nodes as degrees-of-freedom, that minimizes the loss in expected (MAP or MPM) performance subject to such online communication constraints. The approach leads to a novel message-passing algorithm to be executed offline, or before observations are realized, which mitigates the performance loss by iteratively coupling all rules in a manner implicitly driven by global statistics. We also provide (i) illustrative examples, (ii) assumptions that guarantee convergence and efficiency and (iii) connections to active research areas. 1

5 0.6820581 46 nips-2005-Consensus Propagation

Author: Benjamin V. Roy, Ciamac C. Moallemi

Abstract: We propose consensus propagation, an asynchronous distributed protocol for averaging numbers across a network. We establish convergence, characterize the convergence rate for regular graphs, and demonstrate that the protocol exhibits better scaling properties than pairwise averaging, an alternative that has received much recent attention. Consensus propagation can be viewed as a special case of belief propagation, and our results contribute to the belief propagation literature. In particular, beyond singly-connected graphs, there are very few classes of relevant problems for which belief propagation is known to converge. 1

6 0.60899043 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery

7 0.54865652 121 nips-2005-Location-based activity recognition

8 0.48014545 107 nips-2005-Large scale networks fingerprinting and visualization using the k-core decomposition

9 0.47537598 139 nips-2005-Non-iterative Estimation with Perturbed Gaussian Markov Processes

10 0.42704213 122 nips-2005-Logic and MRF Circuitry for Labeling Occluding and Thinline Visual Contours

11 0.39118397 75 nips-2005-Fixing two weaknesses of the Spectral Method

12 0.37991446 90 nips-2005-Hot Coupling: A Particle Approach to Inference and Normalization on Pairwise Undirected Graphs

13 0.37872162 69 nips-2005-Fast Gaussian Process Regression using KD-Trees

14 0.37355599 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models

15 0.37046641 11 nips-2005-A Hierarchical Compositional System for Rapid Object Detection

16 0.35740092 187 nips-2005-Temporal Abstraction in Temporal-difference Networks

17 0.31432217 153 nips-2005-Policy-Gradient Methods for Planning

18 0.31314954 175 nips-2005-Sequence and Tree Kernels with Statistical Feature Mining

19 0.30738682 15 nips-2005-A Theoretical Analysis of Robust Coding over Noisy Overcomplete Channels

20 0.30261281 184 nips-2005-Structured Prediction via the Extragradient Method


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.036), (10, 0.054), (18, 0.39), (27, 0.036), (31, 0.062), (34, 0.076), (39, 0.013), (41, 0.02), (55, 0.077), (69, 0.05), (73, 0.019), (88, 0.052), (91, 0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.87234885 125 nips-2005-Message passing for task redistribution on sparse graphs

Author: K. Wong, Zhuo Gao, David Saad

Abstract: The problem of resource allocation in sparse graphs with real variables is studied using methods of statistical physics. An efficient distributed algorithm is devised on the basis of insight gained from the analysis and is examined using numerical simulations, showing excellent performance and full agreement with the theoretical results.

2 0.73699403 147 nips-2005-On the Convergence of Eigenspaces in Kernel Principal Component Analysis

Author: Laurent Zwald, Gilles Blanchard

Abstract: This paper presents a non-asymptotic statistical analysis of Kernel-PCA with a focus different from the one proposed in previous work on this topic. Here instead of considering the reconstruction error of KPCA we are interested in approximation error bounds for the eigenspaces themselves. We prove an upper bound depending on the spacing between eigenvalues but not on the dimensionality of the eigenspace. As a consequence this allows to infer stability results for these estimated spaces.

3 0.52301675 179 nips-2005-Sparse Gaussian Processes using Pseudo-inputs

Author: Edward Snelson, Zoubin Ghahramani

Abstract: We present a new Gaussian process (GP) regression model whose covariance is parameterized by the the locations of M pseudo-input points, which we learn by a gradient based optimization. We take M N, where N is the number of real data points, and hence obtain a sparse regression method which has O(M 2 N ) training cost and O(M 2 ) prediction cost per test case. We also find hyperparameters of the covariance function in the same joint optimization. The method can be viewed as a Bayesian regression model with particular input dependent noise. The method turns out to be closely related to several other sparse GP approaches, and we discuss the relation in detail. We finally demonstrate its performance on some large data sets, and make a direct comparison to other sparse GP methods. We show that our method can match full GP performance with small M , i.e. very sparse solutions, and it significantly outperforms other approaches in this regime. 1

4 0.3846505 24 nips-2005-An Approximate Inference Approach for the PCA Reconstruction Error

Author: Manfred Opper

Abstract: The problem of computing a resample estimate for the reconstruction error in PCA is reformulated as an inference problem with the help of the replica method. Using the expectation consistent (EC) approximation, the intractable inference problem can be solved efficiently using only two variational parameters. A perturbative correction to the result is computed and an alternative simplified derivation is also presented. 1

5 0.36022228 139 nips-2005-Non-iterative Estimation with Perturbed Gaussian Markov Processes

Author: Yunsong Huang, B. Keith Jenkins

Abstract: We develop an approach for estimation with Gaussian Markov processes that imposes a smoothness prior while allowing for discontinuities. Instead of propagating information laterally between neighboring nodes in a graph, we study the posterior distribution of the hidden nodes as a whole—how it is perturbed by invoking discontinuities, or weakening the edges, in the graph. We show that the resulting computation amounts to feed-forward fan-in operations reminiscent of V1 neurons. Moreover, using suitable matrix preconditioners, the incurred matrix inverse and determinant can be approximated, without iteration, in the same computational style. Simulation results illustrate the merits of this approach.

6 0.35971352 16 nips-2005-A matching pursuit approach to sparse Gaussian process regression

7 0.35953081 96 nips-2005-Inference with Minimal Communication: a Decision-Theoretic Variational Approach

8 0.35807332 182 nips-2005-Statistical Convergence of Kernel CCA

9 0.35694677 72 nips-2005-Fast Online Policy Gradient Learning with SMD Gain Vector Adaptation

10 0.35582483 62 nips-2005-Efficient Estimation of OOMs

11 0.35147199 46 nips-2005-Consensus Propagation

12 0.34734532 56 nips-2005-Diffusion Maps, Spectral Clustering and Eigenfunctions of Fokker-Planck Operators

13 0.34618372 154 nips-2005-Preconditioner Approximations for Probabilistic Graphical Models

14 0.3451581 78 nips-2005-From Weighted Classification to Policy Search

15 0.34175199 153 nips-2005-Policy-Gradient Methods for Planning

16 0.34100166 144 nips-2005-Off-policy Learning with Options and Recognizers

17 0.34088197 192 nips-2005-The Information-Form Data Association Filter

18 0.33855486 43 nips-2005-Comparing the Effects of Different Weight Distributions on Finding Sparse Representations

19 0.33803204 142 nips-2005-Oblivious Equilibrium: A Mean Field Approximation for Large-Scale Dynamic Games

20 0.33777213 200 nips-2005-Variable KD-Tree Algorithms for Spatial Pattern Search and Discovery