nips nips2010 nips2010-105 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Ulrike V. Luxburg, Agnes Radl, Matthias Hein
Abstract: The commute distance between two vertices in a graph is the expected time it takes a random walk to travel from the first to the second vertex and back. We study the behavior of the commute distance as the size of the underlying graph increases. We prove that the commute distance converges to an expression that does not take into account the structure of the graph at all and that is completely meaningless as a distance function on the graph. Consequently, the use of the raw commute distance for machine learning purposes is strongly discouraged for large graphs and in high dimensions. As an alternative we introduce the amplified commute distance that corrects for the undesired large sample effects. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Getting lost in space: Large sample analysis of the commute distance Ulrike von Luxburg Agnes Radl Max Planck Institute for Biological Cybernetics, Tübingen, Germany {ulrike. [sent-1, score-0.96]
2 de Abstract The commute distance between two vertices in a graph is the expected time it takes a random walk to travel from the first to the second vertex and back. [sent-7, score-1.379]
3 We study the behavior of the commute distance as the size of the underlying graph increases. [sent-8, score-1.104]
4 We prove that the commute distance converges to an expression that does not take into account the structure of the graph at all and that is completely meaningless as a distance function on the graph. [sent-9, score-1.336]
5 Consequently, the use of the raw commute distance for machine learning purposes is strongly discouraged for large graphs and in high dimensions. [sent-10, score-1.095]
6 As an alternative we introduce the amplified commute distance that corrects for the undesired large sample effects. [sent-11, score-0.924]
7 1 Introduction Given an undirected, weighted graph, the commute distance between two vertices u and v is defined as the expected time it takes a random walk starting in vertex u to travel to vertex v and back to u. [sent-12, score-1.302]
8 As a rule of thumb, the more paths connect u with v, the smaller the commute distance becomes. [sent-14, score-0.974]
9 As a consequence, it supposedly satisfies the following, highly desirable property: Property ( ): Vertices in the same cluster of the graph have a small commute distance, whereas two vertices in different clusters of the graph have a “large” commute distance. [sent-15, score-1.83]
10 It is because of this property that the commute distance has become a popular choice and is widely used, for example in clustering (Yen et al. [sent-16, score-0.949]
11 In this paper we study how the commute distance (up to a constant factor equivalent to the resistance distance, see below for exact definitions) behaves when the size of the graph increases. [sent-24, score-1.344]
12 We focus on the case of random geometric graphs as this is most relevant to machine learning, but similar results hold for very general classes of graphs under mild assumptions. [sent-25, score-0.238]
13 Cij ≈ vol(G)(1/di + 1/dj). The intuitive reason for this behavior is that if the graph is large, the random walk “gets lost” in the sheer size of the graph. [sent-27, score-0.374]
14 It takes so long to travel through a substantial part of the graph that by the time the random walk comes close to its goal it has already “forgotten” where it started from. [sent-28, score-0.307]
15 It only depends on the inverse degree of the target vertex vj , which intuitively represents the likelihood that the random walk exactly hits vj once it is in its neighborhood. [sent-30, score-0.334]
16 Our findings have very strong implications: The raw commute distance is not a useful distance function on large graphs. [sent-32, score-1.192]
17 On the negative side, our approximation result shows that contrary to popular belief, the commute distance does not take into account any global properties of the data, at least if the graph is “large enough”. [sent-33, score-1.08]
18 The resulting large sample commute distance dist(vi , vj ) = 1/di + 1/dj is completely meaningless as a distance on a graph. [sent-35, score-1.226]
19 For example, all data points have the same nearest neighbor (namely, the vertex with the largest degree), the same second-nearest neighbor (the vertex with the second-largest degree), and so on. [sent-36, score-0.248]
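A minimal numerical sketch of this effect (hypothetical degree values, not data from the paper): under the limit expression 1/di + 1/dj, neighbor rankings depend on the degrees only.

```python
import numpy as np

# Hypothetical vertex degrees; any positive values show the same effect.
d = np.array([5.0, 12.0, 7.0, 30.0, 9.0])

# Large-sample limit of the (rescaled) commute/resistance distance.
dist = 1.0 / d[:, None] + 1.0 / d[None, :]
np.fill_diagonal(dist, np.inf)        # exclude self-distances

nearest = dist.argmin(axis=1)
print(nearest)  # all vertices except vertex 3 pick vertex 3 (largest degree); vertex 3 picks vertex 1
```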
20 In particular, the main motivation to use the commute distance, Property ( ), no longer holds when the graph becomes “large enough”. [sent-37, score-0.866]
21 Often, n on the order of 1000 is already enough to make the commute distance very close to its approximation expression (see Section 5 for details). [sent-39, score-0.924]
22 Consequently, even on moderate-sized graphs, the use of the raw commute distance as a basis for machine learning algorithms should be discouraged. [sent-41, score-0.978]
23 It has been reported in the literature that hitting times and commute times can be observed to be quite small if the vertices under consideration have a high degree, and that the spread of the commute distance values can be quite large (Liben-Nowell and Kleinberg, 2003, Brand, 2005, Yen et al. [sent-43, score-1.852]
24 In the light of our theoretical results we can see immediately why the undesired behavior of the commute distance occurs. [sent-46, score-0.948]
25 Based on our theory we suggest a new correction, the amplified commute distance. [sent-48, score-0.71]
26 This is a new distance function that is derived from the commute distance, but avoids its artifacts. [sent-49, score-0.924]
27 This distance function is Euclidean, making it well-suited for machine learning purposes and kernel methods. [sent-50, score-0.307]
28 In some applications the commute distance is not used as a distance function, but for other reasons, for example in graph sparsification (Spielman and Srivastava, 2008) or when computing bounds on mixing or cover times (Aleliunas et al. [sent-52, score-1.319]
29 To obtain the commute distance between all points in a graph one has to compute the pseudo-inverse of the graph Laplacian matrix, an operation of time complexity O(n^3). [sent-56, score-1.264]
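For reference, a short sketch of this computation via the pseudo-inverse of the graph Laplacian, using the standard identity Cij = vol(G)(L+_ii + L+_jj − 2 L+_ij); the function name and the dense-matrix setup are illustrative choices, not the paper's code.

```python
import numpy as np

def commute_distances(W):
    """All pairwise commute distances for a symmetric weighted adjacency matrix W.

    Uses C_ij = vol(G) * (L+_ii + L+_jj - 2 L+_ij), where L+ is the
    Moore-Penrose pseudo-inverse of the graph Laplacian; computing the
    pseudo-inverse is the O(n^3) step mentioned above.
    """
    d = W.sum(axis=1)                    # degrees d_i
    vol = d.sum()                        # vol(G)
    L = np.diag(d) - W                   # unnormalized graph Laplacian
    Lp = np.linalg.pinv(L)               # Moore-Penrose pseudo-inverse
    diag = np.diag(Lp)
    R = diag[:, None] + diag[None, :] - 2.0 * Lp   # resistance distances R_ij
    return vol * R                       # commute distances C_ij
```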
30 To circumvent the matrix inversion, several approximations of the commute distance have been suggested in the literature (Spielman and Srivastava, 2008, Sarkar and Moore, 2007, Brand, 2005). [sent-58, score-0.924]
31 Our results lead to a much simpler and well-justified way of approximating the commute distance on large random geometric graphs. [sent-59, score-0.978]
32 By di := Σ_{j=1}^{n} wij we denote the degree of vertex vi, and vol(G) := Σ_{j=1}^{n} dj is the volume of the graph. [sent-66, score-0.248]
33 The edges in the graph are defined such that “neighboring points” are connected: In the ε-graph we connect two points whenever their Euclidean distance is less than or equal to ε. [sent-83, score-0.462]
34 In the undirected, symmetric k-nearest neighbor graph we connect vi to vj if Xi is among the k nearest neighbors of Xj or vice versa. [sent-84, score-0.299]
35 In the mutual k-nearest neighbor graph we connect vi to vj if Xi is among the k nearest neighbors of Xj and vice versa. [sent-85, score-0.299]
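A rough numpy sketch of these three constructions (dense matrices and brute-force distances, fine for small data sets and purely illustrative):

```python
import numpy as np

def pairwise_dists(X):
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

def eps_graph(X, eps):
    """ε-graph: connect two points whenever their Euclidean distance is at most ε."""
    D = pairwise_dists(X)
    W = (D <= eps).astype(float)
    np.fill_diagonal(W, 0.0)
    return W

def knn_graphs(X, k):
    """Unweighted symmetric and mutual kNN graphs."""
    D = pairwise_dists(X)
    np.fill_diagonal(D, np.inf)
    nn = np.argsort(D, axis=1)[:, :k]            # indices of the k nearest neighbors
    A = np.zeros_like(D)
    rows = np.repeat(np.arange(len(X)), k)
    A[rows, nn.ravel()] = 1.0                    # directed kNN relation
    symmetric = np.maximum(A, A.T)               # Xi in kNN(Xj) OR vice versa
    mutual = np.minimum(A, A.T)                  # Xi in kNN(Xj) AND vice versa
    return symmetric, mutual
```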
36 The hitting time Hij is defined as the expected time it takes a random walk starting in vertex vi to travel to vertex vj (with Hii := 0 by definition). [sent-89, score-0.485]
37 The commute distance (also called commute time) between vi and vj is defined as Cij := Hij + Hji . [sent-90, score-1.72]
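These expectations can also be estimated directly by simulating the walk; a small Monte Carlo sketch (toy graphs only; the function name and sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_commute_time(W, i, j, n_runs=500):
    """Estimate C_ij = H_ij + H_ji by simulating random walks on the graph W."""
    P = W / W.sum(axis=1, keepdims=True)     # row-stochastic transition matrix
    n = W.shape[0]

    def hitting_time(start, target):
        v, steps = start, 0
        while v != target:                   # H_ii = 0 by definition: loop is skipped
            v = rng.choice(n, p=P[v])
            steps += 1
        return steps

    return np.mean([hitting_time(i, j) + hitting_time(j, i) for _ in range(n_runs)])
```

On small connected graphs this agrees, up to Monte Carlo error, with the closed-form computation via the pseudo-inverse Laplacian sketched above.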
38 Some readers might also know the commute distance under the name resistance distance. [sent-91, score-1.188]
39 The resistance distance Rij between i and j is defined as the effective resistance between the vertices i and j in the network. [sent-94, score-0.811]
40 It is well known that the resistance distance coincides with the commute distance up to a constant: Cij = vol(G)·Rij . [sent-95, score-1.402]
41 We want to study the behavior of the commute distance between two fixed points s and t. [sent-99, score-0.976]
42 For notational convenience, we will formulate all the following results in terms of the resistance distance. [sent-118, score-0.264]
43 To obtain the results for the commute distance one just has to multiply by the factor vol(G). [sent-119, score-0.924]
44 3 Convergence of the resistance distance on random geometric graphs In this section we present our theoretical main results for random geometric graphs. [sent-120, score-0.678]
45 We show that on this type of graph, the resistance distance Rij converges to the trivial limit 1/di + 1/dj . [sent-121, score-0.507]
46 Theorem 2 (Resistance distance on kNN-graphs) Fix two points Xi and Xj . [sent-124, score-0.242]
47 Assume that Xi and Xj have distance at least h from the boundary of X and that (k/n)^{1/d}/(2 pmax) ≤ h. [sent-126, score-0.214]
48 There exist constants c1, . . . , c5 > 0 such that with probability at least 1 − c1 n exp(−c2 k) the resistance distance on both the symmetric and the mutual kNN-graph satisfies |k Rij − (k/di + k/dj)| ≤ (c4/k)(log(n/k) + (k/n)^{1/3} + 1) if d = 3 and |k Rij − (k/di + k/dj)| ≤ c5/k if d > 3. The probability converges to 1 if n → ∞ and k/log(n) → ∞. [sent-130, score-0.57]
49 Theorem 3 (Resistance distance on ε-graphs) Fix two points Xi and Xj . [sent-133, score-0.242]
50 There exist constants c1, . . . , c6 > 0 such that with probability at least 1 − c1 n exp(−c2 nε^d) − c3 exp(−c4 nε^d)/ε^d the resistance distance on the ε-graph satisfies |nε^d Rij − (nε^d/di + nε^d/dj)| ≤ c5 (log(1/ε) + ε + 1)/(nε^3) if d = 3 and |nε^d Rij − (nε^d/di + nε^d/dj)| ≤ c6/(nε^d) if d > 3. The probability converges to 1 if n → ∞ and nε^d/log(n) → ∞. [sent-139, score-0.57]
51 Note that to achieve the convergence of the resistance distance we have to rescale it appropriately (for example, in the ε-graph we scale by a factor of nε^d). [sent-144, score-0.478]
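A quick way to see this rescaled convergence numerically (illustrative sample size and ε, not the settings used in the paper; the construction assumes the resulting graph is connected):

```python
import numpy as np

rng = np.random.default_rng(1)
n, dim, eps = 1000, 3, 0.25               # illustrative values only

X = rng.uniform(size=(n, dim))            # points drawn uniformly from [0, 1]^dim
diff = X[:, None, :] - X[None, :, :]
D = np.sqrt((diff ** 2).sum(-1))
W = (D <= eps).astype(float)              # ε-graph
np.fill_diagonal(W, 0.0)

deg = W.sum(axis=1)
L = np.diag(deg) - W
Lp = np.linalg.pinv(L)
i, j = 0, 1                               # two fixed vertices
R_ij = Lp[i, i] + Lp[j, j] - 2 * Lp[i, j] # resistance distance between them

scale = n * eps ** dim
print(scale * R_ij, scale / deg[i] + scale / deg[j])   # the two numbers should be of similar size
```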
52 More generally, results about the convergence of the commute distance to 1/di + 1/dj can also be proved for other kinds of graphs such as graphs with given expected degrees and even for power law graphs, under the assumption that the minimal degree in the graph slowly increases with n. [sent-159, score-1.301]
53 Consider two fixed vertices s and t in a connected graph and consider the graph as an electrical network where each edge has resistance 1. [sent-162, score-0.675]
54 By the electrical laws, resistances in series add up; that is, for two resistances R1 and R2 in series we get the overall resistance R = R1 + R2. [sent-163, score-0.374]
55 The resistance “spanned” by these ds parallel edges satisfies 1/R = Σ_{i=1}^{ds} 1 = ds, that is, R = 1/ds. [sent-167, score-0.362]
56 It turns out that the contribution of these paths to the resistance is negligible (essentially, we have so many wires between the two neighborhoods that electricity can flow nearly freely). [sent-170, score-0.288]
57 So the overall effective resistance between s and t is dominated by the edges adjacent to s and t, with contributions 1/ds + 1/dt. [sent-171, score-0.302]
58 2 of Bollobás (1998) that states that the resistance distance between two [Figure 1: Intuition for the proof of Theorems 2 and 3. Labels: s has ds outgoing edges, each with resistance 1; t has dt outgoing edges, each with resistance 1; very many paths connect their neighborhoods.] [sent-174, score-1.06]
59 4 Correcting the resistance distance Obviously, the large sample resistance distance Rij ≈ 1/di + 1/dj is completely meaningless as a distance on a graph. [sent-180, score-1.212]
60 The question we want to discuss in this section is whether there is a way to correct the commute distance such that this unpleasant large sample effect does not occur. [sent-181, score-0.948]
61 It has been observed in several empirical studies that the commute distances are quite small if the vertices under consideration have a high degree, and that the spread of the commute distance values can be quite large. [sent-183, score-1.774]
62 For the commute distance, this leads to the suggested correction C_LNK(i, j) := dj Hij + di Hji. [sent-188, score-0.878]
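One way to compute this correction in practice is via the pseudo-inverse Laplacian, using the closed-form hitting times given e.g. by Fouss et al. (2007); a sketch under that assumption:

```python
import numpy as np

def lnk_corrected_commute(W):
    """LNK correction C_LNK(i, j) = d_j * H_ij + d_i * H_ji.

    Hitting times from the closed form H_ij = sum_k d_k (L+_ik - L+_ij - L+_jk + L+_jj)
    (see Fouss et al., 2007), vectorized over all pairs.
    """
    d = W.sum(axis=1)
    vol = d.sum()
    L = np.diag(d) - W
    Lp = np.linalg.pinv(L)
    Lpd = Lp @ d
    H = Lpd[:, None] - Lpd[None, :] - vol * Lp + vol * np.diag(Lp)[None, :]
    return d[None, :] * H + d[:, None] * H.T
```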
63 Even though we did not prove it explicitly in our paper, the convergence results for the commute time also hold for the individual hitting times. [sent-189, score-0.8]
64 (2009) exploit the well-known fact that the commute distance is Euclidean and its kernel matrix coincides with the Moore-Penrose inverse L+ of the graph Laplacian matrix. [sent-196, score-1.148]
65 The idea is that the sigmoid transformation reduces the spread of the distance (or similarity) values. [sent-198, score-0.248]
66 (2009) he considers the kernel matrix that corresponds to the commute distance. [sent-202, score-0.778]
67 At first glance it is surprising that using the centered and normalized kernel instead of the commute distance should make any difference. [sent-205, score-0.992]
68 However, whenever one takes a Euclidean distance function of the form dist(i, j) = sij + ui + uj − 2δij ui and computes the corresponding centered kernel matrix, one obtains Kij = K^s_ij + 2δij ui − (2/n)(ui + uj) + (2/n^2) Σ_{r=1}^{n} ur, (1) where K^s is the kernel matrix induced by s. [sent-206, score-0.515]
69 The proof of our main theorems shows that the edges adjacent to i and j completely dominate the behavior of the resistance distance: they are the “bottleneck” of the flow, and their contribution 1/di + 1/dj dominates all the other terms. [sent-210, score-0.326]
70 We believe that the key to obtaining a good distance function is to remove the influence of the 1/di terms and “amplify” the influence of the general graph term Sij . [sent-212, score-0.37]
71 This can be achieved by either using the off-diagonal terms of the pseudo-inverse graph Laplacian L† while ignoring its diagonal, or by building a distance function based on the remainder terms Sij directly. [sent-213, score-0.37]
72 We define the amplified commute distance as Camp(i, j) = Sij + uij with Sij = Rij − 1/di − 1/dj and uij = 2wij/(di dj) − wii/di^2 − wjj/dj^2. [sent-215, score-1.055]
73 Proposition 4 (Amplified commute distance is Euclidean) The matrix D with entries dij = Camp(i, j)^{1/2} is a Euclidean distance matrix. [sent-217, score-1.138]
74 Additionally to being a Euclidean distance, the amplified commute distance has a nice limit behavior. [sent-222, score-0.953]
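A direct sketch of this definition (the zero diagonal is an assumption made here so that the resulting matrix behaves like a distance matrix):

```python
import numpy as np

def amplified_commute(W):
    """Amplified commute distance C_amp(i, j) = S_ij + u_ij as defined above."""
    d = W.sum(axis=1)
    L = np.diag(d) - W
    Lp = np.linalg.pinv(L)
    diag = np.diag(Lp)
    R = diag[:, None] + diag[None, :] - 2.0 * Lp      # resistance distances R_ij
    S = R - 1.0 / d[:, None] - 1.0 / d[None, :]       # S_ij = R_ij - 1/d_i - 1/d_j
    u = 2.0 * W / np.outer(d, d) \
        - (np.diag(W) / d ** 2)[:, None] \
        - (np.diag(W) / d ** 2)[None, :]              # u_ij = 2w_ij/(d_i d_j) - w_ii/d_i^2 - w_jj/d_j^2
    C_amp = S + u
    np.fill_diagonal(C_amp, 0.0)                      # assumption: zero self-distance
    return C_amp
```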
75 For all practical purposes, one should use the kernel induced by the amplified commute distance and center and normalize it. [sent-224, score-0.992]
76 In formulas, the amplified commute kernel is Kamp(i, j) := K̄ij / sqrt(K̄ii K̄jj) with K̄ = (I − (1/n) 11^T) Camp (I − (1/n) 11^T) (2) (where I is the identity matrix, 1 the vector of all ones, and Camp the amplified commute distance matrix). [sent-225, score-1.702]
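A sketch of the centering and normalization step; whether equation (2) carries the usual factor −1/2 that converts squared distances into a Gram matrix could not be recovered from the extracted text, so that factor is exposed as a parameter here.

```python
import numpy as np

def amplified_commute_kernel(C_amp, scale=-0.5):
    """Centered, normalized kernel built from the amplified commute distance matrix.

    scale=-0.5 follows the standard squared-distance-to-Gram conversion;
    use scale=1.0 for a literal reading of the reconstructed equation (2).
    """
    n = C_amp.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n         # centering matrix I - (1/n) 1 1^T
    K_bar = scale * (J @ C_amp @ J)
    norm = np.sqrt(np.abs(np.diag(K_bar)))      # abs() guards against tiny negative diagonal entries
    return K_bar / np.outer(norm, norm)
```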
77 Note that the correction by Brand and our amplified commute kernel are very similar, but not identical to each other. [sent-227, score-0.854]
78 5 Experiments Our first set of experiments considers the question of how fast the convergence of the commute distance takes place in practice. [sent-231, score-0.924]
79 This means that the problems of the raw commute distance already occur for small sample size. [sent-233, score-0.978]
80 We show the results for ε-graphs, unweighted kNN graphs and Gaussian similarity graphs (fully connected weighted graphs with edge weights exp(−‖xi − xj‖^2/σ^2)). [sent-236, score-0.336]
81 Given some value k for the kNN-graph we thus set the values of ε for the ε-graph and σ for the Gaussian graph to be equal to the maximal k-nearest neighbor distance in the data set. [sent-238, score-0.43]
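A short sketch of that parameter choice (brute-force distances, illustrative only):

```python
import numpy as np

def matched_scale_parameters(X, k):
    """Return ε (ε-graph) and σ (Gaussian graph) set to the maximal k-nearest-neighbor distance."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(-1))
    np.fill_diagonal(D, np.inf)
    kth = np.sort(D, axis=1)[:, k - 1]     # distance of each point to its k-th nearest neighbor
    eps = sigma = kth.max()                # maximal kNN distance over the whole data set
    return eps, sigma
```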
82 In a second set of experiments we compare the different corrections of the raw commute distance. [sent-268, score-0.902]
83 To this end, we built a kNN graph of the whole USPS data set (all 9298 points, k = 10), computed the commute distance matrix and the various corrections. [sent-269, score-1.08]
84 We can see that as predicted by theory, the raw commute distance does not identify the cluster structure. [sent-272, score-1.007]
85 However, the cluster structure is still visible in the kernel corresponding to the commute distance, the pseudoinverse graph Laplacian L† . [sent-273, score-0.963]
86 In our heat plots, all four corrections of the graph Laplacian show the cluster structure to a certain extent (the correction by LNK to a small extent, the corrections by Brand, Yen and us to a greater extent). [sent-279, score-0.567]
87 As predicted by theory, we can see that the raw commute distance performs extremely poorly. [sent-290, score-0.978]
88 The Euclidean distance behaves reasonably, but is outperformed by all corrections of the commute distance. [sent-291, score-1.062]
89 We believe that the correction by LNK is “a bit too naive”, whereas the corrections by Brand, Yen and us “tend to work” in a ranking-based setting. [sent-302, score-0.214]
90 6 Discussion In this paper we have proved that the commute distance on random geometric graphs can be approximated by a very simple limit expression. [sent-308, score-1.099]
91 This shows that the use of the raw commute distance for machine learning purposes can be problematic. [sent-311, score-1.003]
92 However, the structure of the graph can be recovered by certain corrections of the commute distance. [sent-312, score-1.004]
93 We suggest using either the correction by Brand (2005) or our own amplified commute kernel from Section 4. [sent-313, score-0.854]
94 The intuitive explanation for our result is that as the sample size increases, the random walk on the sample graph “gets lost” in the sheer size of the graph. [sent-315, score-0.258]
95 It takes so long to travel through a substantial part of the graph that by the time the random walk comes close to its goal it has already “forgotten” where it started from. [sent-316, score-0.307]
96 Stated differently: the random walk on the graph has mixed before it hits the desired target vertex. [sent-317, score-0.282]
97 On a higher level, we expect that the problem of “getting lost” affects not only the commute distance but also many other methods where random walks are used in a naive way to explore global properties of a graph. [sent-318, score-0.744]
98 In general, we believe that one has to be particularly careful when using random walk based methods for extracting global properties of graphs in order to avoid getting lost and converging to meaningless results. [sent-321, score-0.272]
99 The electrical resistance of a graph captures its commute and cover times. [sent-357, score-1.13]
100 Hitting times, commute distances and the spectral gap in large random geometric graphs. [sent-462, score-0.801]
wordName wordTfidf (topN-words)
[('commute', 0.71), ('resistance', 0.264), ('distance', 0.214), ('brand', 0.189), ('yen', 0.169), ('graph', 0.156), ('corrections', 0.138), ('ampli', 0.128), ('vol', 0.114), ('walk', 0.102), ('rij', 0.099), ('graphs', 0.092), ('hitting', 0.09), ('sij', 0.081), ('vertex', 0.079), ('hij', 0.078), ('correction', 0.076), ('knn', 0.072), ('vertices', 0.069), ('camp', 0.069), ('lnk', 0.069), ('kernel', 0.068), ('dist', 0.062), ('fouss', 0.06), ('resistances', 0.055), ('geometric', 0.054), ('raw', 0.054), ('dj', 0.053), ('euclidean', 0.052), ('travel', 0.049), ('qiu', 0.048), ('epsilon', 0.048), ('pmin', 0.048), ('rel', 0.048), ('sarkar', 0.048), ('vj', 0.046), ('meaningless', 0.042), ('bollob', 0.041), ('kamp', 0.041), ('pmax', 0.041), ('usps', 0.041), ('vi', 0.04), ('di', 0.039), ('uij', 0.039), ('edges', 0.038), ('degree', 0.037), ('distances', 0.037), ('laplacian', 0.037), ('herbster', 0.036), ('cln', 0.036), ('spielman', 0.036), ('kij', 0.036), ('lost', 0.036), ('bottleneck', 0.035), ('cij', 0.035), ('spread', 0.034), ('walks', 0.034), ('neighbor', 0.031), ('deviation', 0.03), ('ds', 0.03), ('xj', 0.03), ('connected', 0.03), ('kleinberg', 0.03), ('heat', 0.03), ('limit', 0.029), ('maximal', 0.029), ('diagonal', 0.029), ('cluster', 0.029), ('ui', 0.028), ('points', 0.028), ('lines', 0.028), ('aleliunas', 0.028), ('avin', 0.028), ('ham', 0.028), ('hancock', 0.028), ('hji', 0.028), ('krij', 0.028), ('kyen', 0.028), ('srivastava', 0.028), ('wittmann', 0.028), ('ow', 0.027), ('valid', 0.027), ('dim', 0.027), ('density', 0.027), ('luxburg', 0.026), ('connect', 0.026), ('constants', 0.026), ('purposes', 0.025), ('et', 0.025), ('paths', 0.024), ('radl', 0.024), ('chandra', 0.024), ('doyle', 0.024), ('hits', 0.024), ('kii', 0.024), ('kjj', 0.024), ('lyons', 0.024), ('saerens', 0.024), ('unpleasant', 0.024), ('behavior', 0.024)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000006 105 nips-2010-Getting lost in space: Large sample analysis of the resistance distance
Author: Ulrike V. Luxburg, Agnes Radl, Matthias Hein
Abstract: The commute distance between two vertices in a graph is the expected time it takes a random walk to travel from the first to the second vertex and back. We study the behavior of the commute distance as the size of the underlying graph increases. We prove that the commute distance converges to an expression that does not take into account the structure of the graph at all and that is completely meaningless as a distance function on the graph. Consequently, the use of the raw commute distance for machine learning purposes is strongly discouraged for large graphs and in high dimensions. As an alternative we introduce the amplified commute distance that corrects for the undesired large sample effects. 1
2 0.088569358 162 nips-2010-Link Discovery using Graph Feature Tracking
Author: Emile Richard, Nicolas Baskiotis, Theodoros Evgeniou, Nicolas Vayatis
Abstract: We consider the problem of discovering links of an evolving undirected graph given a series of past snapshots of that graph. The graph is observed through the time sequence of its adjacency matrix and only the presence of edges is observed. The absence of an edge on a certain snapshot cannot be distinguished from a missing entry in the adjacency matrix. Additional information can be provided by examining the dynamics of the graph through a set of topological features, such as the degrees of the vertices. We develop a novel methodology by building on both static matrix completion methods and the estimation of the future state of relevant graph features. Our procedure relies on the formulation of an optimization problem which can be approximately solved by a fast alternating linearized algorithm whose properties are examined. We show experiments with both simulated and real data which reveal the interest of our methodology. 1
3 0.08714363 63 nips-2010-Distributed Dual Averaging In Networks
Author: Alekh Agarwal, Martin J. Wainwright, John C. Duchi
Abstract: The goal of decentralized optimization over a network is to optimize a global objective formed by a sum of local (possibly nonsmooth) convex functions using only local computation and communication. We develop and analyze distributed algorithms based on dual averaging of subgradients, and provide sharp bounds on their convergence rates as a function of the network size and topology. Our analysis clearly separates the convergence of the optimization algorithm itself from the effects of communication constraints arising from the network structure. We show that the number of iterations required by our algorithm scales inversely in the spectral gap of the network. The sharpness of this prediction is confirmed both by theoretical lower bounds and simulations for various networks. 1
4 0.076129943 85 nips-2010-Exact learning curves for Gaussian process regression on large random graphs
Author: Matthew Urry, Peter Sollich
Abstract: We study learning curves for Gaussian process regression which characterise performance in terms of the Bayes error averaged over datasets of a given size. Whilst learning curves are in general very difficult to calculate we show that for discrete input domains, where similarity between input points is characterised in terms of a graph, accurate predictions can be obtained. These should in fact become exact for large graphs drawn from a broad range of random graph ensembles with arbitrary degree distributions where each input (node) is connected only to a finite number of others. Our approach is based on translating the appropriate belief propagation equations to the graph ensemble. We demonstrate the accuracy of the predictions for Poisson (Erdos-Renyi) and regular random graphs, and discuss when and why previous approximations of the learning curve fail. 1
5 0.07494764 254 nips-2010-Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models
Author: Han Liu, Kathryn Roeder, Larry Wasserman
Abstract: A challenging problem in estimating high-dimensional graphical models is to choose the regularization parameter in a data-dependent way. The standard techniques include K-fold cross-validation (K-CV), Akaike information criterion (AIC), and Bayesian information criterion (BIC). Though these methods work well for low-dimensional problems, they are not suitable in high dimensional settings. In this paper, we present StARS: a new stability-based method for choosing the regularization parameter in high dimensional inference for undirected graphs. The method has a clear interpretation: we use the least amount of regularization that simultaneously makes a graph sparse and replicable under random sampling. This interpretation requires essentially no conditions. Under mild conditions, we show that StARS is partially sparsistent in terms of graph estimation: i.e. with high probability, all the true edges will be included in the selected model even when the graph size diverges with the sample size. Empirically, the performance of StARS is compared with the state-of-the-art model selection procedures, including K-CV, AIC, and BIC, on both synthetic data and a real microarray dataset. StARS outperforms all these competing procedures.
6 0.074634679 117 nips-2010-Identifying graph-structured activation patterns in networks
7 0.070308901 204 nips-2010-Penalized Principal Component Regression on Graphs for Analysis of Subnetworks
8 0.069943443 259 nips-2010-Subgraph Detection Using Eigenvector L1 Norms
9 0.062891722 222 nips-2010-Random Walk Approach to Regret Minimization
10 0.061761152 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations
11 0.061108772 223 nips-2010-Rates of convergence for the cluster tree
12 0.060007017 104 nips-2010-Generative Local Metric Learning for Nearest Neighbor Classification
13 0.057608273 249 nips-2010-Spatial and anatomical regularization of SVM for brain image analysis
14 0.057280362 138 nips-2010-Large Margin Multi-Task Metric Learning
15 0.056727387 150 nips-2010-Learning concept graphs from text with stick-breaking priors
16 0.05672225 181 nips-2010-Network Flow Algorithms for Structured Sparsity
17 0.054755442 148 nips-2010-Learning Networks of Stochastic Differential Equations
18 0.054436449 80 nips-2010-Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs
19 0.052864846 108 nips-2010-Graph-Valued Regression
20 0.052421071 124 nips-2010-Inductive Regularized Learning of Kernel Functions
topicId topicWeight
[(0, 0.137), (1, 0.046), (2, 0.046), (3, 0.048), (4, 0.013), (5, -0.057), (6, 0.066), (7, -0.071), (8, 0.065), (9, 0.038), (10, -0.042), (11, -0.039), (12, -0.007), (13, -0.014), (14, 0.069), (15, -0.06), (16, 0.033), (17, -0.107), (18, -0.039), (19, 0.032), (20, 0.032), (21, -0.008), (22, -0.129), (23, 0.071), (24, 0.035), (25, -0.04), (26, 0.076), (27, 0.026), (28, -0.009), (29, 0.015), (30, 0.024), (31, 0.045), (32, 0.006), (33, -0.047), (34, -0.034), (35, -0.015), (36, -0.04), (37, -0.012), (38, 0.086), (39, 0.041), (40, -0.048), (41, -0.004), (42, 0.036), (43, -0.05), (44, 0.079), (45, 0.098), (46, 0.016), (47, 0.061), (48, 0.001), (49, -0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.94859678 105 nips-2010-Getting lost in space: Large sample analysis of the resistance distance
Author: Ulrike V. Luxburg, Agnes Radl, Matthias Hein
Abstract: The commute distance between two vertices in a graph is the expected time it takes a random walk to travel from the first to the second vertex and back. We study the behavior of the commute distance as the size of the underlying graph increases. We prove that the commute distance converges to an expression that does not take into account the structure of the graph at all and that is completely meaningless as a distance function on the graph. Consequently, the use of the raw commute distance for machine learning purposes is strongly discouraged for large graphs and in high dimensions. As an alternative we introduce the amplified commute distance that corrects for the undesired large sample effects. 1
2 0.76818484 259 nips-2010-Subgraph Detection Using Eigenvector L1 Norms
Author: Benjamin Miller, Nadya Bliss, Patrick J. Wolfe
Abstract: When working with network datasets, the theoretical framework of detection theory for Euclidean vector spaces no longer applies. Nevertheless, it is desirable to determine the detectability of small, anomalous graphs embedded into background networks with known statistical properties. Casting the problem of subgraph detection in a signal processing context, this article provides a framework and empirical results that elucidate a “detection theory” for graph-valued data. Its focus is the detection of anomalies in unweighted, undirected graphs through L1 properties of the eigenvectors of the graph’s so-called modularity matrix. This metric is observed to have relatively low variance for certain categories of randomly-generated graphs, and to reveal the presence of an anomalous subgraph with reasonable reliability when the anomaly is not well-correlated with stronger portions of the background graph. An analysis of subgraphs in real network datasets confirms the efficacy of this approach. 1
3 0.68263072 204 nips-2010-Penalized Principal Component Regression on Graphs for Analysis of Subnetworks
Author: Ali Shojaie, George Michailidis
Abstract: Network models are widely used to capture interactions among component of complex systems, such as social and biological. To understand their behavior, it is often necessary to analyze functionally related components of the system, corresponding to subsystems. Therefore, the analysis of subnetworks may provide additional insight into the behavior of the system, not evident from individual components. We propose a novel approach for incorporating available network information into the analysis of arbitrary subnetworks. The proposed method offers an efficient dimension reduction strategy using Laplacian eigenmaps with Neumann boundary conditions, and provides a flexible inference framework for analysis of subnetworks, based on a group-penalized principal component regression model on graphs. Asymptotic properties of the proposed inference method, as well as the choice of the tuning parameter for control of the false positive rate are discussed in high dimensional settings. The performance of the proposed methodology is illustrated using simulated and real data examples from biology. 1
4 0.67466956 117 nips-2010-Identifying graph-structured activation patterns in networks
Author: James Sharpnack, Aarti Singh
Abstract: We consider the problem of identifying an activation pattern in a complex, largescale network that is embedded in very noisy measurements. This problem is relevant to several applications, such as identifying traces of a biochemical spread by a sensor network, expression levels of genes, and anomalous activity or congestion in the Internet. Extracting such patterns is a challenging task specially if the network is large (pattern is very high-dimensional) and the noise is so excessive that it masks the activity at any single node. However, typically there are statistical dependencies in the network activation process that can be leveraged to fuse the measurements of multiple nodes and enable reliable extraction of highdimensional noisy patterns. In this paper, we analyze an estimator based on the graph Laplacian eigenbasis, and establish the limits of mean square error recovery of noisy patterns arising from a probabilistic (Gaussian or Ising) model based on an arbitrary graph structure. We consider both deterministic and probabilistic network evolution models, and our results indicate that by leveraging the network interaction structure, it is possible to consistently recover high-dimensional patterns even when the noise variance increases with network size. 1
5 0.60764962 85 nips-2010-Exact learning curves for Gaussian process regression on large random graphs
Author: Matthew Urry, Peter Sollich
Abstract: We study learning curves for Gaussian process regression which characterise performance in terms of the Bayes error averaged over datasets of a given size. Whilst learning curves are in general very difficult to calculate we show that for discrete input domains, where similarity between input points is characterised in terms of a graph, accurate predictions can be obtained. These should in fact become exact for large graphs drawn from a broad range of random graph ensembles with arbitrary degree distributions where each input (node) is connected only to a finite number of others. Our approach is based on translating the appropriate belief propagation equations to the graph ensemble. We demonstrate the accuracy of the predictions for Poisson (Erdos-Renyi) and regular random graphs, and discuss when and why previous approximations of the learning curve fail. 1
7 0.58621055 254 nips-2010-Stability Approach to Regularization Selection (StARS) for High Dimensional Graphical Models
8 0.51645815 162 nips-2010-Link Discovery using Graph Feature Tracking
9 0.50466931 63 nips-2010-Distributed Dual Averaging In Networks
10 0.50103843 234 nips-2010-Segmentation as Maximum-Weight Independent Set
11 0.4695785 289 nips-2010-b-Bit Minwise Hashing for Estimating Three-Way Similarities
12 0.45741355 190 nips-2010-On the Convexity of Latent Social Network Inference
13 0.45310298 80 nips-2010-Estimation of Renyi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs
14 0.45023739 280 nips-2010-Unsupervised Kernel Dimension Reduction
15 0.42942885 150 nips-2010-Learning concept graphs from text with stick-breaking priors
16 0.41791049 250 nips-2010-Spectral Regularization for Support Estimation
17 0.41464475 230 nips-2010-Robust Clustering as Ensembles of Affinity Relations
18 0.41030973 108 nips-2010-Graph-Valued Regression
19 0.4045378 210 nips-2010-Practical Large-Scale Optimization for Max-norm Regularization
20 0.39983389 104 nips-2010-Generative Local Metric Learning for Nearest Neighbor Classification
topicId topicWeight
[(13, 0.045), (17, 0.016), (27, 0.046), (30, 0.038), (35, 0.023), (39, 0.266), (45, 0.19), (50, 0.029), (52, 0.037), (59, 0.014), (60, 0.053), (77, 0.064), (78, 0.029), (79, 0.015), (90, 0.026)]
simIndex simValue paperId paperTitle
1 0.7816202 130 nips-2010-Interval Estimation for Reinforcement-Learning Algorithms in Continuous-State Domains
Author: Martha White, Adam White
Abstract: The reinforcement learning community has explored many approaches to obtaining value estimates and models to guide decision making; these approaches, however, do not usually provide a measure of confidence in the estimate. Accurate estimates of an agent’s confidence are useful for many applications, such as biasing exploration and automatically adjusting parameters to reduce dependence on parameter-tuning. Computing confidence intervals on reinforcement learning value estimates, however, is challenging because data generated by the agentenvironment interaction rarely satisfies traditional assumptions. Samples of valueestimates are dependent, likely non-normally distributed and often limited, particularly in early learning when confidence estimates are pivotal. In this work, we investigate how to compute robust confidences for value estimates in continuous Markov decision processes. We illustrate how to use bootstrapping to compute confidence intervals online under a changing policy (previously not possible) and prove validity under a few reasonable assumptions. We demonstrate the applicability of our confidence estimation algorithms with experiments on exploration, parameter estimation and tracking. 1
same-paper 2 0.77833217 105 nips-2010-Getting lost in space: Large sample analysis of the resistance distance
Author: Ulrike V. Luxburg, Agnes Radl, Matthias Hein
Abstract: The commute distance between two vertices in a graph is the expected time it takes a random walk to travel from the first to the second vertex and back. We study the behavior of the commute distance as the size of the underlying graph increases. We prove that the commute distance converges to an expression that does not take into account the structure of the graph at all and that is completely meaningless as a distance function on the graph. Consequently, the use of the raw commute distance for machine learning purposes is strongly discouraged for large graphs and in high dimensions. As an alternative we introduce the amplified commute distance that corrects for the undesired large sample effects. 1
3 0.76854587 182 nips-2010-New Adaptive Algorithms for Online Classification
Author: Francesco Orabona, Koby Crammer
Abstract: We propose a general framework to online learning for classification problems with time-varying potential functions in the adversarial setting. This framework allows to design and prove relative mistake bounds for any generic loss function. The mistake bounds can be specialized for the hinge loss, allowing to recover and improve the bounds of known online classification algorithms. By optimizing the general bound we derive a new online classification algorithm, called NAROW, that hybridly uses adaptive- and fixed- second order information. We analyze the properties of the algorithm and illustrate its performance using synthetic dataset. 1
4 0.76250201 172 nips-2010-Multi-Stage Dantzig Selector
Author: Ji Liu, Peter Wonka, Jieping Ye
Abstract: We consider the following sparse signal recovery (or feature selection) problem: given a design matrix X ∈ Rn×m (m n) and a noisy observation vector y ∈ Rn satisfying y = Xβ ∗ + where is the noise vector following a Gaussian distribution N (0, σ 2 I), how to recover the signal (or parameter vector) β ∗ when the signal is sparse? The Dantzig selector has been proposed for sparse signal recovery with strong theoretical guarantees. In this paper, we propose a multi-stage Dantzig selector method, which iteratively refines the target signal β ∗ . We show that if X obeys a certain condition, then with a large probability the difference between the solution ˆ β estimated by the proposed method and the true solution β ∗ measured in terms of the lp norm (p ≥ 1) is bounded as ˆ β − β∗ p ≤ C(s − N )1/p log m + ∆ σ, where C is a constant, s is the number of nonzero entries in β ∗ , ∆ is independent of m and is much smaller than the first term, and N is the number of entries of √ β ∗ larger than a certain value in the order of O(σ log m). The proposed method improves the estimation bound of the standard Dantzig selector approximately √ √ from Cs1/p log mσ to C(s − N )1/p log mσ where the value N depends on the number of large entries in β ∗ . When N = s, the proposed algorithm achieves the oracle solution with a high probability. In addition, with a large probability, the proposed method can select the same number of correct features under a milder condition than the Dantzig selector. 1
5 0.74058181 193 nips-2010-Online Learning: Random Averages, Combinatorial Parameters, and Learnability
Author: Alexander Rakhlin, Karthik Sridharan, Ambuj Tewari
Abstract: We develop a theory of online learning by defining several complexity measures. Among them are analogues of Rademacher complexity, covering numbers and fatshattering dimension from statistical learning theory. Relationship among these complexity measures, their connection to online learning, and tools for bounding them are provided. We apply these results to various learning problems. We provide a complete characterization of online learnability in the supervised setting. 1
6 0.63596773 63 nips-2010-Distributed Dual Averaging In Networks
7 0.63301432 92 nips-2010-Fast global convergence rates of gradient methods for high-dimensional statistical recovery
8 0.63167942 31 nips-2010-An analysis on negative curvature induced by singularity in multi-layer neural-network learning
9 0.63107246 179 nips-2010-Natural Policy Gradient Methods with Parameter-based Exploration for Control Tasks
10 0.6309889 7 nips-2010-A Family of Penalty Functions for Structured Sparsity
11 0.63090378 265 nips-2010-The LASSO risk: asymptotic results and real world examples
12 0.63078171 10 nips-2010-A Novel Kernel for Learning a Neuron Model from Spike Train Data
13 0.6285277 70 nips-2010-Efficient Optimization for Discriminative Latent Class Models
14 0.62793183 238 nips-2010-Short-term memory in neuronal networks through dynamical compressed sensing
15 0.62714058 282 nips-2010-Variable margin losses for classifier design
16 0.6271078 196 nips-2010-Online Markov Decision Processes under Bandit Feedback
17 0.62702054 12 nips-2010-A Primal-Dual Algorithm for Group Sparse Regularization with Overlapping Groups
18 0.62680423 87 nips-2010-Extended Bayesian Information Criteria for Gaussian Graphical Models
19 0.62647301 66 nips-2010-Double Q-learning
20 0.62623763 117 nips-2010-Identifying graph-structured activation patterns in networks