nips nips2013 nips2013-289 knowledge-graph by maker-knowledge-mining

289 nips-2013-Scalable kernels for graphs with continuous attributes


Source: pdf

Author: Aasa Feragen, Niklas Kasenburg, Jens Petersen, Marleen de Bruijne, Karsten Borgwardt

Abstract: While graphs with continuous node attributes arise in many applications, state-of-the-art graph kernels for comparing continuous-attributed graphs suffer from a high runtime complexity. For instance, the popular shortest path kernel scales as O(n^4), where n is the number of nodes. In this paper, we present a class of graph kernels with computational complexity O(n^2(m + log n + δ^2 + d)), where δ is the graph diameter, m is the number of edges, and d is the dimension of the node attributes. Due to the sparsity and small diameter of real-world graphs, these kernels typically scale comfortably to large graphs. In our experiments, the presented kernels outperform state-of-the-art kernels in terms of speed and accuracy on classification benchmark datasets.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Scalable kernels for graphs with continuous attributes. Aasa Feragen, Niklas Kasenburg; Machine Learning and Computational Biology Group, Max Planck Institutes Tübingen, and DIKU, University of Copenhagen. [sent-1, score-0.608]

2 Abstract: While graphs with continuous node attributes arise in many applications, state-of-the-art graph kernels for comparing continuous-attributed graphs suffer from a high runtime complexity. [sent-7, score-1.238]

3 For instance, the popular shortest path kernel scales as O(n^4), where n is the number of nodes. [sent-8, score-0.688]

4 In this paper, we present a class of graph kernels with computational complexity O(n^2(m + log n + δ^2 + d)), where δ is the graph diameter, m is the number of edges, and d is the dimension of the node attributes. [sent-9, score-0.786]

5 Due to the sparsity and small diameter of real-world graphs, these kernels typically scale comfortably to large graphs. [sent-10, score-0.392]

6 In our experiments, the presented kernels outperform state-of-the-art kernels in terms of speed and accuracy on classification benchmark datasets. [sent-11, score-0.604]

7 Comparing graphs to each other is a fundamental problem in learning on graphs, and graph kernels have become an efficient and widely-used method for measuring similarity between graphs. [sent-13, score-0.561]

8 Highly scalable graph kernels have been proposed for graphs with thousands or even millions of nodes, both for graphs without node labels [1] and for graphs with discrete node labels [2]. [sent-14, score-1.457]

9 An open challenge, which is receiving increased attention, is to develop a scalable kernel on graphs with continuous-valued node attributes. [sent-17, score-0.619]

10 We present the GraphHopper kernel between graphs with real-valued edge lengths and any type of node attribute, including vectors. [sent-18, score-0.709]

11 This kernel is a convolution kernel counting sub-path similarities. [sent-19, score-0.46]

12 The computational complexity of this kernel is O(n^2(m + log n + δ^2 + d)), where n and m are the numbers of nodes and edges, respectively; δ is the graph diameter; and d is the dimension of the node attributes. [sent-20, score-0.694]

13 We show that our GraphHopper kernel tends to scale quadratically with the number of nodes on real data. [sent-23, score-0.328]

14 1.1 Related work. Many popular kernels for structured data are sums of substructure kernels: k(G, G') = Σ k_sub(s, s'), where the sum runs over substructures s of G and s' of G'. [sent-25, score-0.364]

15 A large variety of kernels exist for structures such as strings [4, 5], finite state transducers [6] and trees [5, 7]. [sent-28, score-0.358]

16 For graphs in general, kernels can be sorted into categories based on the types of attributes they can handle. [sent-29, score-0.578]

17 The graphlet kernel [1] compares unlabeled graphs, whereas several kernels allow node labels from a finite alphabet [2, 8]. [sent-30, score-0.868]

18 Unfortunately, this does not generalize to graphs with vector-valued node attributes, which are typically all distinct samples from an infinite alphabet. [sent-32, score-0.389]

19 The first kernel to take advantage of non-discrete node labels was the random walk kernel [9–11]. [sent-33, score-0.744]

20 It incorporates edge probabilities and geometric node attributes [12], but suffers from tottering [13] and is empirically slow. [sent-34, score-0.426]

21 Other kernels handling non-discrete attributes use edit-distance and subtree enumeration [15]. [sent-38, score-0.437]

22 While none of these kernels scale well to large graphs, the propagation kernel [16] is fast asymptotically and empirically. [sent-39, score-0.532]

23 It translates the problem of continuous-valued attributes to a problem of discrete-valued labels by hashing node attributes. [sent-40, score-0.48]
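
For illustration only, a minimal sketch of discretizing continuous node attributes by hashing, using a random-hyperplane (locality-sensitive) hash; this is one common choice, not necessarily the hashing scheme of the propagation kernel [16], and all names are ours:

    import numpy as np

    def lsh_labels(attributes, n_planes=8, seed=0):
        """Map continuous node attributes (n, d) to discrete integer labels.

        Nearby attribute vectors tend to receive the same label, which is what
        makes kernels for discrete labels applicable afterwards."""
        rng = np.random.default_rng(seed)
        planes = rng.normal(size=(n_planes, attributes.shape[1]))
        bits = (attributes @ planes.T) > 0        # sign pattern w.r.t. random hyperplanes
        powers = 1 << np.arange(n_planes)         # read the bit pattern as one integer
        return (bits * powers).sum(axis=1)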

24 Nevertheless, its performance depends strongly on the hashing function and in our experiments it is outperformed in classification accuracy by kernels which do not discretize the attributes. [sent-41, score-0.386]

25 In problems where continuous-valued node attributes and inter-node distances d_G(v, w) along the graph G are important features, the shortest path kernel [17], defined as k_SP(G, G') = Σ_{v,w ∈ V} Σ_{v',w' ∈ V'} k_n(v, v') · k_l(d_G(v, w), d_G'(v', w')) · k_n(w, w'), performs well in classification. [sent-42, score-1.361]
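
For concreteness, a naive sketch of k_SP exactly as written above, assuming precomputed all-pairs shortest-path distance matrices and user-supplied node and length kernels; the quadruple loop over node pairs is what makes the kernel O(n^4) (this illustrates the formula, it is not the SP implementation benchmarked later):

    def sp_kernel(X, D, X2, D2, k_node, k_len):
        """Naive shortest path kernel between two attributed graphs.

        X, X2: (n, d) and (n2, d) node attribute arrays.
        D, D2: (n, n) and (n2, n2) shortest-path distance matrices.
        k_node, k_len: scalar kernels on attribute pairs and on length pairs."""
        value = 0.0
        n, n2 = len(X), len(X2)
        for v in range(n):
            for w in range(n):
                for v2 in range(n2):
                    for w2 in range(n2):
                        value += (k_node(X[v], X2[v2])
                                  * k_len(D[v, w], D2[v2, w2])
                                  * k_node(X[w], X2[w2]))
        return value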

26 In particular, k_SP allows the user to choose any kernels k_n and k_l on nodes and shortest path lengths. [sent-43, score-0.944]

27 1.2 Our contribution. In this paper we present a kernel which also compares shortest paths between node pairs from the two graphs, but with a different path kernel. [sent-46, score-1.02]

28 Instead of comparing paths via products of kernels on their lengths and endpoints, we compare paths through kernels on the nodes encountered while "hopping" along shortest paths. [sent-47, score-1.241]

29 This particular path kernel allows us to decompose the graph kernel as a weighted sum of node kernels, initially suggesting a potential runtime as low as O(n^2 d). [sent-48, score-1.138]

30 The graph structure is encoded in the node kernel weights, and the main algorithmic challenge becomes to efficiently compute these weights. [sent-49, score-0.596]

31 Note, moreover, that the GraphHopper kernel is parameter-free except for the choice of node kernels. [sent-51, score-0.478]

32 Section 3 presents experimental classification results on different datasets in comparison to state-of-the-art kernels as well as empirical runtime studies, before we conclude with a discussion of our findings in Section 4. [sent-54, score-0.478]

33 2 Graphs, paths and GraphHoppers. We shall compare undirected graphs G = (V, E) with edge lengths l : E → R_+ and node attributes A : V → X from a set X, which can be any set with a kernel k_n; in our data, X = R^d. [sent-55, score-1.014]

34 Such subtrees inherit node attributes and edge lengths from G by restricting the attribute and length maps A and l to the new node and edge sets, respectively. [sent-58, score-0.866]

35 For a tree T = (V, E, r) with a root node r, let p(v) and c(v) denote the parent and the children of any v ∈ V . [sent-59, score-0.329]

36 Given nodes v_a, v_b ∈ V, a path π from v_a to v_b in G is defined as a sequence of nodes π = [v_1, v_2, v_3, ..., v_n], [sent-60, score-0.515]

37 where v_1 = v_a, v_n = v_b and [v_i, v_{i+1}] ∈ E for all i = 1, ..., n−1. [sent-63, score-0.458]

38 Let π(i) = v_i denote the i-th node encountered when "hopping" along the path. [sent-67, score-0.345]

39 Denote by l(π) the weighted length of π, given by the sum of the lengths l(v_i, v_{i+1}) of the edges traversed along the path, and denote by |π| the discrete length of π, defined as the number of nodes in π. [sent-69, score-0.359]
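
As a small illustration of the two notions of length, assuming a path given as a node sequence and a mapping from traversed edges to their lengths (names are ours):

    def weighted_length(path, edge_length):
        """l(pi): sum of edge lengths along the path; edge_length maps (u, w) pairs to lengths."""
        return sum(edge_length[(path[i], path[i + 1])] for i in range(len(path) - 1))

    def discrete_length(path):
        """|pi|: the number of nodes in the path."""
        return len(path)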

40 The shortest path π_ab from v_a to v_b is defined in terms of weighted length; if no edge length function is given, set l(e) = 1 for all e ∈ E as the default. [sent-70, score-0.629]

41 The diameter δ(G) of G is the maximal number of nodes in a shortest path in G, with respect to weighted path length. [sent-71, score-0.805]

42 In the next few lemmas we shall prove that, for a fixed source node v ∈ V, the directed edges along shortest paths from v to other nodes of G form a well-defined directed acyclic graph (DAG), that is, a directed graph with no cycles. [sent-72, score-1.199]

43 First of all, subpaths of shortest paths π_vw with source node v are shortest paths as well (Lemma 1). [sent-73, score-1.014]

44 If π_1n = [v_1, ..., v_n] is a shortest path from v_1 = v to v_n, then the path π_1n(1 : i) consisting of the first i nodes of π_1n is a shortest path from v_1 = v to v_i. [sent-78, score-1.595]

45 Given a source node v ∈ G, construct the directed graph G_v = (V_v, E_v) consisting of all nodes V_v from the connected component of v in G and the set E_v of all directed edges found in any shortest path from v to any given node w in G_v. [sent-79, score-1.35]
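
A sketch of one way to build G_v with a single Dijkstra run from v, keeping exactly the edges that are tight for the shortest-path distances; this is our illustration of the construction, not the BGL-based routine used in the experiments:

    import heapq

    def shortest_path_dag(adj, source):
        """adj: {u: [(w, length), ...]} with positive edge lengths.

        Returns the distance map from source and the edges (u, w) of the DAG G_source,
        i.e. the directed edges lying on some shortest path from source."""
        dist = {source: 0.0}
        heap = [(0.0, source)]
        while heap:
            d, u = heapq.heappop(heap)
            if d > dist.get(u, float("inf")):
                continue                      # stale heap entry
            for w, l in adj.get(u, []):
                nd = d + l
                if nd < dist.get(w, float("inf")):
                    dist[w] = nd
                    heapq.heappush(heap, (nd, w))
        # an edge (u, w) belongs to E_v exactly when dist[u] + l == dist[w]
        dag_edges = [(u, w)
                     for u, nbrs in adj.items() if u in dist
                     for w, l in nbrs
                     if w in dist and abs(dist[u] + l - dist[w]) < 1e-12]
        return dist, dag_edges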

46 Any directed walk from v in G_v is a shortest path in G (Lemma 2): if π_1n is a shortest path from v_1 = v to v_n and (v_n, v_{n+1}) ∈ E_v, then [π_1n, [v_n, v_{n+1}]] is a shortest path from v_1 = v to v_{n+1}. [sent-80, score-1.617]

47 Since (vn , vn+1 ) ∈ Ev , there is a shortest path π1(n+1) = [v1 , . [sent-82, score-0.458]

48 If this path is shorter than [π_1n, [v_n, v_{n+1}]], then π_{1(n+1)}(1 : n) is a shortest path from v_1 = v to v_n by Lemma 1, and it must be shorter than π_1n. [sent-86, score-0.806]

49 Proposition 3: The shortest path graph G_v is a DAG. [sent-88, score-0.576]

50 Using Lemma 2 repeatedly, we see that the path [π_{v1}, c] is a shortest path from v to v_n = v_1, which is impossible since the new path must be longer than the shortest path π_{v1}. [sent-98, score-1.423]

51 It is clear from the definition that k(G, G') decomposes as a sum of node kernels: k(G, G') = Σ_{v∈V} Σ_{v'∈V'} w(v, v') k_n(v, v'), (5) where w(v, v') counts the number of times v and v' appear at the same hop, or coordinate, i of shortest paths π, π' of equal discrete length |π| = |π'|. [sent-101, score-0.764]
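
Given the weight matrix W = [w(v, v')] and the node kernel matrix K_n = [k_n(v, v')] over all node pairs, evaluating (5) is just an entrywise product followed by a sum; a minimal numpy sketch:

    import numpy as np

    def graphhopper_value(W, Kn):
        """k(G, G') = sum over v, v' of w(v, v') * k_n(v, v'); W and Kn are (n, n') arrays."""
        return float(np.sum(W * Kn))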

52 Figure 1 (bottom middle and right): recursive computation of the d_v^r in a rooted tree as in Algorithm 2, and of the d_v^ṽ on a DAG G_ṽ as in Algorithm 3. [sent-104, score-0.628]

53 Here M(v) is a δ × δ matrix whose entry [M(v)]_ij counts how many times v appears at the i-th coordinate of a shortest path in G of discrete length j, and δ = max{δ(G), δ(G')}. [sent-105, score-0.619]

54 More precisely, [M(v)]_ij = (number of times v appears as the i-th node on a shortest path of discrete length j) = Σ_{ṽ∈V} (number of times v appears as the i-th node on a shortest path from ṽ of discrete length j) = Σ_{ṽ∈V} D_ṽ(v, j − i + 1) · O_ṽ(v, i). (6) [sent-106, score-1.656]

55 Here D_ṽ is an n × δ matrix whose (v, i)-coordinate counts the number of directed walks with i nodes starting at v in the shortest path DAG G_ṽ. [sent-107, score-0.649]

56 Here, V_ṽ^j consists of the nodes v ∈ V for which the shortest paths π_ṽv of highest discrete length have j nodes. [sent-111, score-0.575]
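
A sketch of assembling the δ × δ matrices M(v) from the per-source count matrices in (6); here D[s] and O[s] stand for D_ṽ and O_ṽ with source ṽ = s, stored as (n, δ) arrays, and 0-based indices replace the paper's 1-based ones (the representation is ours):

    import numpy as np

    def hop_count_matrices(D, O, delta):
        """Returns M of shape (n, delta, delta) with
        M[v, i, j] = sum over sources s of D[s][v, j - i] * O[s][v, i],
        the 0-based version of [M(v)]_ij = sum_s D_s(v, j - i + 1) * O_s(v, i)."""
        n = next(iter(D.values())).shape[0]
        M = np.zeros((n, delta, delta))
        for s in D:
            for i in range(delta):
                for j in range(i, delta):   # v can only be hop i of a path with at least i + 1 nodes
                    M[:, i, j] += D[s][:, j - i] * O[s][:, i]
        return M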

57 To compute the v-th row of D_ṽ, denoted d_v^ṽ, we draw inspiration from [19], where the vectors d_v^r are computed easily for trees using a message-passing algorithm as follows. [sent-113, score-0.583]

58 Let T = (V, E, r) be a tree with a designated root node r. [sent-114, score-0.329]

59 The i-th coefficient of d_v^r counts the number of paths from v in T of discrete length i, directed from the root. [sent-115, score-0.574]

60 Using ⊕, the d_v^r can be expressed recursively: d_v^r = [1] ⊕ (⊕_{p(w)=v} [0, d_w^r]). (7) [sent-120, score-0.55]

61 Algorithm 1 (message-passing computation of o_v^ṽ for all v on G_ṽ). 1: Initialize: o_ṽ^ṽ = [1]; o_v^ṽ = [0] for all v ∈ V \ {ṽ}. [sent-121, score-0.528]

62 2: for j = 1, ..., δ do; 3: for v ∈ V_ṽ^j do; 4: for (v, w) ∈ E_ṽ do; 5: o_w^ṽ = o_w^ṽ ⊕ [0, o_v^ṽ]; 6: end for; 7: end for; 8: end for. Algorithm 2 (recursive computation of d_v^r for all v on T = (V, E, r)). [sent-125, score-0.451]

63 2: for e = (v, c(v)) ∈ E do; 3: d_v^r = d_v^r ⊕ [0, d_{c(v)}^r]; 4: end for. Algorithm 3 (recursive computation of d_v^ṽ for all v on G_ṽ). 1: Initialize: d_v^ṽ = [1] for all v ∈ V. [sent-127, score-1.1]

64 2: for e = (v, c(v)) ∈ E_{G_ṽ} do; 3: d_v^ṽ = d_v^ṽ ⊕ [0, d_{c(v)}^ṽ]; 4: end for. The d_v^r for all v ∈ V are computed recursively, sending counters along the edges from the leaf nodes towards the root, recording the number of descendants of any node at any level; see Algorithm 2 and Figure 1. [sent-128, score-1.542]
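
A small sketch of the ⊕ operation (coordinate-wise addition after zero-padding to a common length) and of the recursion in Algorithms 2-3, run over the DAG in reverse topological order so that every child's vector is final before its parents use it; this is our reading of the recursion, not the authors' code:

    def oplus(a, b):
        """Coordinate-wise sum of two count vectors of possibly different lengths."""
        if len(a) < len(b):
            a, b = b, a
        return [x + (b[i] if i < len(b) else 0) for i, x in enumerate(a)]

    def descendant_counts(children, topo_order):
        """d[v][i] = number of directed walks with i + 1 nodes starting at v in the DAG.

        children: {v: [child, ...]}; topo_order: all nodes, parents before children."""
        d = {v: [1] for v in topo_order}          # the trivial walk [v]
        for v in reversed(topo_order):            # children are finished before their parents
            for c in children.get(v, []):
                d[v] = oplus(d[v], [0] + d[c])    # a leading 0: walks through c are one hop longer
        return d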

65 The d_v^r for all v ∈ V are computed in O(nh) time, where h is the tree height, since each edge passes exactly one message of size ≤ h. [sent-129, score-0.367]

66 Note that the DAG G_ṽ generated by all shortest paths from ṽ ∈ V can be expanded into a rooted tree S_ṽ by duplicating any node with several incoming edges; see Figure 1. [sent-131, score-0.709]

67 The tree S_ṽ contains, as a path from the root ṽ to one of the nodes labeled v in S_ṽ, any shortest path from ṽ to v in G. [sent-132, score-0.796]

68 However, the number of nodes in S_ṽ could, in theory, be exponential in n, making computation of the d_v^ṽ by message-passing on S_ṽ intractable. [sent-133, score-0.373]

69 As on trees, the d_v^ṽ in S_ṽ are given by d_v^ṽ = [1] ⊕ (⊕_{(w,v)∈E_ṽ} [0, d_w^ṽ]), where ⊕ is defined in (7). [sent-135, score-0.55]

70 2.2 Computational complexity analysis. Given the w(v, v') and the k_n(v, v') for all v ∈ V and v' ∈ V', the kernel can be computed in O(n^2) time. [sent-140, score-0.316]

71 If we assume that each node kernel k_n(v, v') can be computed in O(d) time (as is the case for many standard kernels, including Gaussian and linear kernels), then all the k_n(v, v') can be precomputed in O(n^2 d) time. [sent-141, score-0.952]

72 When computing the kernel matrix K_ij = k(G_i, G_j) for a set {G_i}_{i=1}^N of graphs with N > m + n + δ^2, note that Algorithm 4 only needs to be run once for every graph G_i. [sent-147, score-0.489]
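
Schematically, the per-graph quantities are computed once and reused for all N^2 Gram matrix entries; a sketch assuming the hop-count matrices Ms[i] (e.g. from a routine like hop_count_matrices above) are padded to a common δ, and assuming, since the corresponding formula was lost in extraction, that w(v, v') is the entrywise inner product of M_i(v) and M_j(v'):

    import numpy as np

    def gram_matrix(Ms, Xs, node_kernel):
        """Ms[i]: (n_i, delta, delta) hop-count matrices of graph i, precomputed once.
        Xs[i]: (n_i, d) node attributes; node_kernel(Xi, Xj) -> (n_i, n_j) matrix."""
        N = len(Ms)
        K = np.zeros((N, N))
        for i in range(N):
            for j in range(i, N):
                # W[v, v'] = <M_i(v), M_j(v')>, computed for all node pairs at once
                W = np.einsum('vab,wab->vw', Ms[i], Ms[j])
                K[i, j] = K[j, i] = float(np.sum(W * node_kernel(Xs[i], Xs[j])))
        return K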

73 3 Experiments. Classification experiments were made with the proposed GraphHopper kernel and several alternatives: the propagation kernel PROP [16], the connected subgraph matching kernel CSM [14] and the shortest path kernel SP [17] all use continuous-valued attributes. [sent-149, score-1.403]

74 In addition, we benchmark against the Weisfeiler-Lehman kernel WL [2], which only uses discrete node attributes. [sent-150, score-0.524]

75 For the GraphHopper and SP kernels, shortest paths were computed using the BGL package [21] implemented in C++. [sent-154, score-0.383]

76 For PROP-diff, labels were propagated with the diffusion scheme, whereas in PROP-WL labels were first discretised via hashing and then the WL kernel [2] update was used. [sent-156, score-0.363]

77 ENZYMES comes with the task of classifying each enzyme into one of 6 EC top-level classes, whereas PROTEINS comes with the task of classifying into enzymes and non-enzymes. [sent-169, score-0.412]

78 Each node represents an airway branch, attributed with its length. [sent-171, score-0.363]

79 SYNTHETIC is a set of synthetic graphs based on a random graph G with 100 nodes and 196 edges, whose nodes are endowed with normally distributed scalar attributes sampled from N (0, 1). [sent-174, score-0.626]

80 Two classes A and B each with 150 attributed graphs were generated from G by randomly rewiring edges and permuting node attributes. [sent-175, score-0.572]

81 Each graph in A was generated by rewiring 5 edges and permuting 10 node attributes, and each graph in B was generated by rewiring 10 edges and permuting 5 node attributes, after which noise from N (0, 0. [sent-176, score-1.034]

82 GraphHopper, SP and CSM all depend on freely selected node kernels for continuous attributes, giving modeling flexibility. [sent-179, score-0.55]

83 For the ENZYMES, AIRWAYS and SYNTHETIC datasets, a Gaussian node kernel k_n(v, v') = e^{−λ ||A(v) − A(v')||^2} was used on the continuous-valued attribute, with λ = 1/d. [sent-180, score-0.564]
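
A vectorized sketch of this Gaussian node kernel between the attribute arrays of two graphs, with λ defaulting to 1/d as in the experiments:

    import numpy as np

    def gaussian_node_kernel(A1, A2, lam=None):
        """k_n(v, v') = exp(-lam * ||A(v) - A(v')||^2) for all node pairs.

        A1: (n1, d) and A2: (n2, d) attribute arrays; lam defaults to 1/d."""
        if lam is None:
            lam = 1.0 / A1.shape[1]
        sq = np.sum(A1**2, 1)[:, None] + np.sum(A2**2, 1)[None, :] - 2.0 * A1 @ A2.T
        return np.exp(-lam * np.maximum(sq, 0.0))

This matrix (or its elementwise product with a Dirac kernel on discrete labels, as for PROTEINS) could be passed as node_kernel to the Gram matrix sketch above.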

84 For the PROTEINS dataset, the node kernel was a product of a Gaussian kernel with λ = 1/d and a Dirac kernel on the continuous- and discrete-valued node attributes, respectively. [sent-181, score-1.186]

85 For the WL kernel, discrete node labels were used when available (in ENZYMES and PROTEINS); otherwise node degree was used as node label. [sent-182, score-0.826]

86 Table 1: Data statistics — average node and edge counts, graph diameter, and dataset and class sizes; the visible fragment includes the SYNTHETIC row (100 nodes, 196 edges, diameter 7, 300 graphs, 150/150 per class). [sent-191, score-0.448]

87 For each kernel and dataset, runtime is given in parentheses in Table 2. [sent-244, score-0.383]

88 In the top left panel, average kernel evaluation runtime was measured on datasets of 10 random graphs with 10, 20, 30, ... [sent-248, score-0.547]

89 Development of both average kernel evaluation runtime and graph diameter is shown. [sent-260, score-0.591]

90 In the bottom panels, the relationship between runtime and graph diameter is shown on subsets of 100 and 200 of the real AIRWAYS and PROTEINS datasets, respectively, for each diameter. [sent-261, score-0.361]

91 The CSM kernel [14] has asymptotic runtime O(kn^{k+1}), where k is a parameter bounding the size of the subgraphs considered by the kernel; thus, in order to study subgraphs of relevant size, its runtime will be at least as high as that of the shortest path kernel. [sent-266, score-1.058]

92 Moreover, the CSM kernel requires the computation of a product graph which, for graphs with hundreds of nodes, can cause memory problems, which we also find in our experiments. [sent-267, score-0.516]

93 The PROP kernel is fast; however, the reason for the computational efficiency of PROP is that it is not really a kernel for continuous-valued features – it is a kernel for discrete features combined with a hashing scheme to discretize continuous-valued features. [sent-268, score-0.82]

94 In our experiments, these hashing schemes do not prove powerful enough to compete in classification accuracy with the kernels that really do use the continuous-valued features. [sent-269, score-0.363]

95 Using both continuous and discrete node features gives classification performance equal to that of the more efficient WL kernel using only discrete attributes. [sent-295, score-0.57]

96 Recall that the GraphHopper kernel has asymptotic runtime O(n^2(d + m + log n + δ^2)), and that the average runtime for one kernel evaluation in a Gram matrix is O(n^2 d) when the number of graphs exceeds m + n + δ^2. [sent-297, score-0.907]

97 4 Conclusion We have defined the GraphHopper kernel for graphs with any type of node attributes, presented an efficient algorithm for computing it, and demonstrated that it outperforms state-of-the-art graph kernels on real and synthetic data in terms of classification accuracy and/or speed. [sent-306, score-1.075]

98 The kernels are able to take advantage of any kind of node attributes, as they can integrate any user-defined node kernel. [sent-307, score-0.798]

99 Moreover, the kernel is parameter-free except for the node kernels. [sent-308, score-0.478]

100 This kernel opens the door to new application domains such as computer vision or medical imaging, in which kernels that work solely on graphs with discrete attributes have so far been too restrictive. [sent-309, score-0.854]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('kernels', 0.302), ('shortest', 0.299), ('dv', 0.275), ('gv', 0.25), ('node', 0.248), ('kernel', 0.23), ('airways', 0.216), ('vn', 0.189), ('enzymes', 0.183), ('graphhopper', 0.183), ('ov', 0.176), ('csm', 0.166), ('path', 0.159), ('runtime', 0.153), ('proteins', 0.149), ('graphs', 0.141), ('attributes', 0.135), ('graph', 0.118), ('ev', 0.1), ('nodes', 0.098), ('wl', 0.097), ('sv', 0.093), ('diameter', 0.09), ('kn', 0.086), ('paths', 0.084), ('airway', 0.083), ('prop', 0.083), ('dag', 0.083), ('edges', 0.072), ('dirksen', 0.067), ('hashing', 0.061), ('attribute', 0.054), ('vishwanathan', 0.054), ('directed', 0.054), ('borgwardt', 0.052), ('kriege', 0.05), ('ksp', 0.05), ('pedersen', 0.05), ('shervashidze', 0.05), ('vv', 0.05), ('tree', 0.049), ('length', 0.048), ('sp', 0.047), ('lengths', 0.047), ('discrete', 0.046), ('vi', 0.044), ('rewiring', 0.044), ('hopping', 0.044), ('danish', 0.044), ('va', 0.043), ('edge', 0.043), ('mv', 0.042), ('kp', 0.041), ('counts', 0.039), ('vb', 0.037), ('classi', 0.037), ('labels', 0.036), ('synthetic', 0.036), ('permuting', 0.035), ('dg', 0.033), ('brenda', 0.033), ('chemoinformatics', 0.033), ('copd', 0.033), ('diku', 0.033), ('dobson', 0.033), ('feragen', 0.033), ('ipmi', 0.033), ('ksub', 0.033), ('mah', 0.033), ('mehlhorn', 0.033), ('vvj', 0.033), ('trees', 0.033), ('root', 0.032), ('subgraphs', 0.032), ('attributed', 0.032), ('bingen', 0.03), ('copenhagen', 0.029), ('institutes', 0.029), ('substructure', 0.029), ('graphlet', 0.029), ('rooted', 0.029), ('ith', 0.028), ('enzyme', 0.027), ('alfried', 0.027), ('krupp', 0.027), ('memory', 0.027), ('petersen', 0.025), ('karsten', 0.025), ('lncs', 0.025), ('subgraph', 0.025), ('rarely', 0.025), ('encountered', 0.025), ('recursive', 0.025), ('descendants', 0.024), ('datasets', 0.023), ('classifying', 0.023), ('discretize', 0.023), ('strings', 0.023), ('alphabet', 0.023), ('accuracies', 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999923 289 nips-2013-Scalable kernels for graphs with continuous attributes

Author: Aasa Feragen, Niklas Kasenburg, Jens Petersen, Marleen de Bruijne, Karsten Borgwardt

Abstract: While graphs with continuous node attributes arise in many applications, state-of-the-art graph kernels for comparing continuous-attributed graphs suffer from a high runtime complexity. For instance, the popular shortest path kernel scales as O(n^4), where n is the number of nodes. In this paper, we present a class of graph kernels with computational complexity O(n^2(m + log n + δ^2 + d)), where δ is the graph diameter, m is the number of edges, and d is the dimension of the node attributes. Due to the sparsity and small diameter of real-world graphs, these kernels typically scale comfortably to large graphs. In our experiments, the presented kernels outperform state-of-the-art kernels in terms of speed and accuracy on classification benchmark datasets.

2 0.18106702 227 nips-2013-Online Learning in Markov Decision Processes with Adversarially Chosen Transition Probability Distributions

Author: Yasin Abbasi, Peter Bartlett, Varun Kanade, Yevgeny Seldin, Csaba Szepesvari

Abstract: We study the problem of online learning Markov Decision Processes (MDPs) when both the transition distributions and loss functions are chosen by an adversary. We present an algorithm that, under a mixing assumption, achieves p O( T log |⇧| + log |⇧|) regret with respect to a comparison set of policies ⇧. The regret is independent of the size of the state and action spaces. When expectations over sample paths can be computed efficiently and the comparison set ⇧ has polynomial size, this algorithm is efficient. We also consider the episodic adversarial online shortest path problem. Here, in each episode an adversary may choose a weighted directed acyclic graph with an identified start and finish node. The goal of the learning algorithm is to choose a path that minimizes the loss while traversing from the start to finish node. At the end of each episode the loss function (given by weights on the edges) is revealed to the learning algorithm. The goal is to minimize regret with respect to a fixed policy for selecting paths. This problem is a special case of the online MDP problem. It was shown that for randomly chosen graphs and adversarial losses, the problem can be efficiently solved. We show that it also can be efficiently solved for adversarial graphs and randomly chosen losses. When both graphs and losses are adversarially chosen, we show that designing efficient algorithms for the adversarial online shortest path problem (and hence for the adversarial MDP problem) is as hard as learning parity with noise, a notoriously difficult problem that has been used to design efficient cryptographic schemes. Finally, we present an efficient algorithm whose regret scales linearly with the number of distinct graphs. 1

3 0.15817037 156 nips-2013-Learning Kernels Using Local Rademacher Complexity

Author: Corinna Cortes, Marius Kloft, Mehryar Mohri

Abstract: We use the notion of local Rademacher complexity to design new algorithms for learning kernels. Our algorithms thereby benefit from the sharper learning bounds based on that notion which, under certain general conditions, guarantee a faster convergence rate. We devise two new learning kernel algorithms: one based on a convex optimization problem for which we give an efficient solution using existing learning kernel techniques, and another one that can be formulated as a DC-programming problem for which we describe a solution in detail. We also report the results of experiments with both algorithms in both binary and multi-class classification tasks. 1

4 0.1568778 87 nips-2013-Density estimation from unweighted k-nearest neighbor graphs: a roadmap

Author: Ulrike Von Luxburg, Morteza Alamgir

Abstract: Consider an unweighted k-nearest neighbor graph on n points that have been sampled i.i.d. from some unknown density p on Rd . We prove how one can estimate the density p just from the unweighted adjacency matrix of the graph, without knowing the points themselves or any distance or similarity scores. The key insights are that local differences in link numbers can be used to estimate a local function of the gradient of p, and that integrating this function along shortest paths leads to an estimate of the underlying density. 1

5 0.1161232 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks

Author: Nan Du, Le Song, Manuel Gomez-Rodriguez, Hongyuan Zha

Abstract: If a piece of information is released from a media site, can we predict whether it may spread to one million web pages, in a month? This influence estimation problem is very challenging since both the time-sensitive nature of the task and the requirement of scalability need to be addressed simultaneously. In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks. Our algorithm can estimate the influence of every node in a network with |V| nodes and |E| edges to an accuracy of ε using n = O(1/ε^2) randomizations and, up to logarithmic factors, O(n|E| + n|V|) computations. When used as a subroutine in a greedy influence maximization approach, our proposed algorithm is guaranteed to find a set of C nodes with the influence of at least (1 − 1/e) OPT − 2Cε, where OPT is the optimal value. Experiments on both synthetic and real-world data show that the proposed algorithm can easily scale up to networks of millions of nodes while significantly improves over previous state-of-the-arts in terms of the accuracy of the estimated influence and the quality of the selected nodes in maximizing the influence.

6 0.10953279 324 nips-2013-The Fast Convergence of Incremental PCA

7 0.10703218 281 nips-2013-Robust Low Rank Kernel Embeddings of Multivariate Distributions

8 0.10403206 66 nips-2013-Computing the Stationary Distribution Locally

9 0.10147551 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching

10 0.097395048 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction

11 0.091452919 144 nips-2013-Inverse Density as an Inverse Problem: the Fredholm Equation Approach

12 0.091305099 7 nips-2013-A Gang of Bandits

13 0.090892494 44 nips-2013-B-test: A Non-parametric, Low Variance Kernel Two-sample Test

14 0.089334808 242 nips-2013-PAC-Bayes-Empirical-Bernstein Inequality

15 0.08803051 82 nips-2013-Decision Jungles: Compact and Rich Models for Classification

16 0.085707821 216 nips-2013-On Flat versus Hierarchical Classification in Large-Scale Taxonomies

17 0.085097976 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel

18 0.084740035 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic

19 0.083498813 196 nips-2013-Modeling Overlapping Communities with Node Popularities

20 0.081785195 40 nips-2013-Approximate Inference in Continuous Determinantal Processes


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.165), (1, 0.04), (2, -0.005), (3, -0.01), (4, 0.09), (5, 0.072), (6, 0.067), (7, -0.08), (8, -0.061), (9, 0.043), (10, -0.017), (11, -0.169), (12, 0.173), (13, -0.126), (14, 0.225), (15, 0.033), (16, 0.025), (17, 0.089), (18, -0.089), (19, -0.106), (20, 0.136), (21, -0.071), (22, 0.05), (23, -0.012), (24, -0.045), (25, 0.017), (26, 0.064), (27, 0.107), (28, 0.03), (29, -0.026), (30, -0.02), (31, 0.043), (32, -0.085), (33, 0.042), (34, -0.016), (35, -0.004), (36, -0.004), (37, -0.02), (38, -0.115), (39, -0.141), (40, 0.046), (41, 0.032), (42, 0.165), (43, 0.009), (44, 0.024), (45, -0.003), (46, 0.038), (47, 0.05), (48, 0.074), (49, -0.046)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.97777706 289 nips-2013-Scalable kernels for graphs with continuous attributes

Author: Aasa Feragen, Niklas Kasenburg, Jens Petersen, Marleen de Bruijne, Karsten Borgwardt

Abstract: While graphs with continuous node attributes arise in many applications, state-of-the-art graph kernels for comparing continuous-attributed graphs suffer from a high runtime complexity. For instance, the popular shortest path kernel scales as O(n^4), where n is the number of nodes. In this paper, we present a class of graph kernels with computational complexity O(n^2(m + log n + δ^2 + d)), where δ is the graph diameter, m is the number of edges, and d is the dimension of the node attributes. Due to the sparsity and small diameter of real-world graphs, these kernels typically scale comfortably to large graphs. In our experiments, the presented kernels outperform state-of-the-art kernels in terms of speed and accuracy on classification benchmark datasets.

2 0.58458817 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation

Author: Masayuki Karasuyama, Hiroshi Mamitsuka

Abstract: Label propagation is one of the state-of-the-art methods for semi-supervised learning, which estimates labels by propagating label information through a graph. Label propagation assumes that data points (nodes) connected in a graph should have similar labels. Consequently, the label estimation heavily depends on edge weights in a graph which represent similarity of each node pair. We propose a method for a graph to capture the manifold structure of input features using edge weights parameterized by a similarity function. In this approach, edge weights represent both similarity and local reconstruction weight simultaneously, both being reasonable for label propagation. For further justification, we provide analytical considerations including an interpretation as a cross-validation of a propagation model in the feature space, and an error analysis based on a low dimensional manifold model. Experimental results demonstrated the effectiveness of our approach both in synthetic and real datasets. 1

3 0.58243585 288 nips-2013-Scalable Influence Estimation in Continuous-Time Diffusion Networks

Author: Nan Du, Le Song, Manuel Gomez-Rodriguez, Hongyuan Zha

Abstract: If a piece of information is released from a media site, can we predict whether it may spread to one million web pages, in a month? This influence estimation problem is very challenging since both the time-sensitive nature of the task and the requirement of scalability need to be addressed simultaneously. In this paper, we propose a randomized algorithm for influence estimation in continuous-time diffusion networks. Our algorithm can estimate the influence of every node in a network with |V| nodes and |E| edges to an accuracy of ε using n = O(1/ε^2) randomizations and, up to logarithmic factors, O(n|E| + n|V|) computations. When used as a subroutine in a greedy influence maximization approach, our proposed algorithm is guaranteed to find a set of C nodes with the influence of at least (1 − 1/e) OPT − 2Cε, where OPT is the optimal value. Experiments on both synthetic and real-world data show that the proposed algorithm can easily scale up to networks of millions of nodes while significantly improves over previous state-of-the-arts in terms of the accuracy of the estimated influence and the quality of the selected nodes in maximizing the influence.

4 0.58010292 156 nips-2013-Learning Kernels Using Local Rademacher Complexity

Author: Corinna Cortes, Marius Kloft, Mehryar Mohri

Abstract: We use the notion of local Rademacher complexity to design new algorithms for learning kernels. Our algorithms thereby benefit from the sharper learning bounds based on that notion which, under certain general conditions, guarantee a faster convergence rate. We devise two new learning kernel algorithms: one based on a convex optimization problem for which we give an efficient solution using existing learning kernel techniques, and another one that can be formulated as a DC-programming problem for which we describe a solution in detail. We also report the results of experiments with both algorithms in both binary and multi-class classification tasks. 1

5 0.57043016 87 nips-2013-Density estimation from unweighted k-nearest neighbor graphs: a roadmap

Author: Ulrike Von Luxburg, Morteza Alamgir

Abstract: Consider an unweighted k-nearest neighbor graph on n points that have been sampled i.i.d. from some unknown density p on Rd . We prove how one can estimate the density p just from the unweighted adjacency matrix of the graph, without knowing the points themselves or any distance or similarity scores. The key insights are that local differences in link numbers can be used to estimate a local function of the gradient of p, and that integrating this function along shortest paths leads to an estimate of the underlying density. 1

6 0.52626944 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction

7 0.51873797 7 nips-2013-A Gang of Bandits

8 0.50240844 310 nips-2013-Statistical analysis of coupled time series with Kernel Cross-Spectral Density operators.

9 0.49735782 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel

10 0.47409233 44 nips-2013-B-test: A Non-parametric, Low Variance Kernel Two-sample Test

11 0.46014896 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic

12 0.44463459 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances

13 0.44235051 82 nips-2013-Decision Jungles: Compact and Rich Models for Classification

14 0.44123685 66 nips-2013-Computing the Stationary Distribution Locally

15 0.43605658 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields

16 0.42084709 144 nips-2013-Inverse Density as an Inverse Problem: the Fredholm Equation Approach

17 0.42053655 281 nips-2013-Robust Low Rank Kernel Embeddings of Multivariate Distributions

18 0.41798508 216 nips-2013-On Flat versus Hierarchical Classification in Large-Scale Taxonomies

19 0.41719493 196 nips-2013-Modeling Overlapping Communities with Node Popularities

20 0.41498151 9 nips-2013-A Kernel Test for Three-Variable Interactions


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(2, 0.013), (16, 0.069), (33, 0.075), (34, 0.087), (41, 0.029), (49, 0.029), (56, 0.104), (60, 0.334), (70, 0.019), (85, 0.081), (89, 0.033), (93, 0.026), (99, 0.01)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.76370293 289 nips-2013-Scalable kernels for graphs with continuous attributes

Author: Aasa Feragen, Niklas Kasenburg, Jens Petersen, Marleen de Bruijne, Karsten Borgwardt

Abstract: While graphs with continuous node attributes arise in many applications, state-of-the-art graph kernels for comparing continuous-attributed graphs suffer from a high runtime complexity. For instance, the popular shortest path kernel scales as O(n^4), where n is the number of nodes. In this paper, we present a class of graph kernels with computational complexity O(n^2(m + log n + δ^2 + d)), where δ is the graph diameter, m is the number of edges, and d is the dimension of the node attributes. Due to the sparsity and small diameter of real-world graphs, these kernels typically scale comfortably to large graphs. In our experiments, the presented kernels outperform state-of-the-art kernels in terms of speed and accuracy on classification benchmark datasets.

2 0.58238554 287 nips-2013-Scalable Inference for Logistic-Normal Topic Models

Author: Jianfei Chen, June Zhu, Zi Wang, Xun Zheng, Bo Zhang

Abstract: Logistic-normal topic models can effectively discover correlation structures among latent topics. However, their inference remains a challenge because of the non-conjugacy between the logistic-normal prior and multinomial topic mixing proportions. Existing algorithms either make restricting mean-field assumptions or are not scalable to large-scale applications. This paper presents a partially collapsed Gibbs sampling algorithm that approaches the provably correct distribution by exploring the ideas of data augmentation. To improve time efficiency, we further present a parallel implementation that can deal with large-scale applications and learn the correlation structures of thousands of topics from millions of documents. Extensive empirical results demonstrate the promise. 1

3 0.57863039 228 nips-2013-Online Learning of Dynamic Parameters in Social Networks

Author: Shahin Shahrampour, Sasha Rakhlin, Ali Jadbabaie

Abstract: This paper addresses the problem of online learning in a dynamic setting. We consider a social network in which each individual observes a private signal about the underlying state of the world and communicates with her neighbors at each time period. Unlike many existing approaches, the underlying state is dynamic, and evolves according to a geometric random walk. We view the scenario as an optimization problem where agents aim to learn the true state while suffering the smallest possible loss. Based on the decomposition of the global loss function, we introduce two update mechanisms, each of which generates an estimate of the true state. We establish a tight bound on the rate of change of the underlying state, under which individuals can track the parameter with a bounded variance. Then, we characterize explicit expressions for the steady state mean-square deviation(MSD) of the estimates from the truth, per individual. We observe that only one of the estimators recovers the optimal MSD, which underscores the impact of the objective function decomposition on the learning quality. Finally, we provide an upper bound on the regret of the proposed methods, measured as an average of errors in estimating the parameter in a finite time. 1

4 0.50852931 173 nips-2013-Least Informative Dimensions

Author: Fabian Sinz, Anna Stockl, January Grewe, January Benda

Abstract: We present a novel non-parametric method for finding a subspace of stimulus features that contains all information about the response of a system. Our method generalizes similar approaches to this problem such as spike triggered average, spike triggered covariance, or maximally informative dimensions. Instead of maximizing the mutual information between features and responses directly, we use integral probability metrics in kernel Hilbert spaces to minimize the information between uninformative features and the combination of informative features and responses. Since estimators of these metrics access the data via kernels, are easy to compute, and exhibit good theoretical convergence properties, our method can easily be generalized to populations of neurons or spike patterns. By using a particular expansion of the mutual information, we can show that the informative features must contain all information if we can make the uninformative features independent of the rest. 1

5 0.47038364 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models

Author: Dae Il Kim, Prem Gopalan, David Blei, Erik Sudderth

Abstract: Stochastic block models characterize observed network relationships via latent community memberships. In large social networks, we expect entities to participate in multiple communities, and the number of communities to grow with the network size. We introduce a new model for these phenomena, the hierarchical Dirichlet process relational model, which allows nodes to have mixed membership in an unbounded set of communities. To allow scalable learning, we derive an online stochastic variational inference algorithm. Focusing on assortative models of undirected networks, we also propose an efficient structured mean field variational bound, and online methods for automatically pruning unused communities. Compared to state-of-the-art online learning methods for parametric relational models, we show significantly improved perplexity and link prediction accuracy for sparse networks with tens of thousands of nodes. We also showcase an analysis of LittleSis, a large network of who-knows-who at the heights of business and government. 1

6 0.46972057 332 nips-2013-Tracking Time-varying Graphical Structure

7 0.46817601 232 nips-2013-Online PCA for Contaminated Data

8 0.46788839 79 nips-2013-DESPOT: Online POMDP Planning with Regularization

9 0.4606334 140 nips-2013-Improved and Generalized Upper Bounds on the Complexity of Policy Iteration

10 0.46004474 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning

11 0.45978931 7 nips-2013-A Gang of Bandits

12 0.45908058 357 nips-2013-k-Prototype Learning for 3D Rigid Structures

13 0.45770296 125 nips-2013-From Bandits to Experts: A Tale of Domination and Independence

14 0.45754945 196 nips-2013-Modeling Overlapping Communities with Node Popularities

15 0.45692092 252 nips-2013-Predictive PAC Learning and Process Decompositions

16 0.4564881 355 nips-2013-Which Space Partitioning Tree to Use for Search?

17 0.45596868 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching

18 0.45589364 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA

19 0.45482841 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions

20 0.454108 278 nips-2013-Reward Mapping for Transfer in Long-Lived Agents