nips nips2011 nips2011-242 knowledge-graph by maker-knowledge-mining

242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm


Source: pdf

Author: Fabio Vitale, Nicolò Cesa-Bianchi, Claudio Gentile, Giovanni Zappella

Abstract: Predicting the nodes of a given graph is a fascinating theoretical problem with applications in several domains. Since graph sparsification via spanning trees retains enough information while making the task much easier, trees are an important special case of this problem. Although it is known how to predict the nodes of an unweighted tree in a nearly optimal way, in the weighted case a fully satisfactory algorithm is not available yet. We fill this hole and introduce an efficient node predictor, SHAZOO, which is nearly optimal on any weighted tree. Moreover, we show that SHAZOO can be viewed as a common nontrivial generalization of both previous approaches for unweighted trees and weighted lines. Experiments on real-world datasets confirm that SHAZOO performs well in that it fully exploits the structure of the input tree, and gets very close to (and sometimes better than) less scalable energy minimization methods. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Since graph sparsification via spanning trees retains enough information while making the task much easier, trees are an important special case of this problem. [sent-12, score-0.432]

2 Although it is known how to predict the nodes of an unweighted tree in a nearly optimal way, in the weighted case a fully satisfactory algorithm is not available yet. [sent-13, score-0.533]

3 We fill this hole and introduce an efficient node predictor, SHAZOO, which is nearly optimal on any weighted tree. [sent-14, score-0.311]

4 Moreover, we show that SHAZOO can be viewed as a common nontrivial generalization of both previous approaches for unweighted trees and weighted lines. [sent-15, score-0.36]

5 In this work we view networked data as weighted graphs, and focus on the task of node classification in the transductive setting. [sent-18, score-0.356]

6 Inspired by [5, 6], this paper reduces the problem of node classification from graphs to trees by extracting suitable spanning trees of the graph, which can be done quickly in many cases. [sent-26, score-0.584]

7 The advantage of performing this reduction is that node prediction is much easier on trees than on graphs. [sent-27, score-0.318]

8 Yet, the current results in node classification on trees are not satisfactory. [sent-29, score-0.28]

9 In fact, WTA can still be applied to weighted trees by exploiting an idea contained in [9]. [sent-33, score-0.25]

10 This theoretical drawback is also confirmed by empirical performance: throwing away the tree structure negatively affects the practical behavior of the algorithm on real-world weighted graphs. [sent-39, score-0.254]

11 The importance of weighted graphs, as opposed to unweighted ones, is suggested by many practical scenarios where the nodes carry more information than just labels. [sent-40, score-0.379]

12 In this work, we bridge the gap between the weighted and unweighted cases by proposing a new prediction strategy, called SHAZOO, achieving a mistake bound that depends on the detailed structure of the weighted tree. [sent-44, score-0.54]

13 More precisely, we measure the regularity of the unknown node labeling via the weighted cutsize induced by the labeling on the tree (see Section 3 for a precise definition). [sent-46, score-0.633]

14 This replaces the unweighted cutsize that was used in the analysis of WTA. [sent-47, score-0.258]

15 When the weighted cutsize is used, a cut edge violates this inductive bias in proportion to its weight. [sent-48, score-0.363]

16 This modified bias does not prevent a fair comparison between the old algorithms and the new one: SHAZOO specializes to TREEOPT in the unweighted case, and to WTA when the input tree is a weighted line. [sent-49, score-0.4]

17 By specializing SHAZOO’s analysis to the unweighted case we recover TREEOPT’s optimal mistake bound. [sent-50, score-0.251]

18 When the input tree is a weighted line, we recover WTA’s mistake bound expressed through the weighted cutsize instead of the unweighted one. [sent-51, score-0.765]

19 Recall that label propagation has a running time per prediction which is proportional to |E|, where E is the graph edge set. [sent-64, score-0.229]

20 On the contrary, SHAZOO can typically be run in constant amortized time per prediction by using Wilson’s algorithm for sampling random spanning trees [17]. [sent-65, score-0.329]

21 By disregarding edge weights in the initial sampling phase, this algorithm is able to draw a random (unweighted) spanning tree in time proportional to |V | on most graphs. [sent-66, score-0.36]
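
For concreteness, the following is a minimal Python sketch of Wilson's algorithm in its unweighted form (the variant used for NWRST later in the text); the adjacency-dict interface and function name are illustrative assumptions, not code from the paper, and the graph is assumed connected.

    import random

    def wilson_rst(adj, root=0):
        """Sample a uniform random spanning tree of a connected undirected
        graph via Wilson's loop-erased random walks (edge weights ignored).
        adj: dict mapping each node to a list of its neighbours."""
        in_tree = {root}
        parent = {}
        for start in adj:
            if start in in_tree:
                continue
            # Random walk from `start` until the current tree is hit;
            # overwriting nxt[u] implicitly erases loops.
            nxt = {}
            u = start
            while u not in in_tree:
                nxt[u] = random.choice(adj[u])
                u = nxt[u]
            # Retrace the loop-erased walk and attach it to the tree.
            u = start
            while u not in in_tree:
                parent[u] = nxt[u]
                in_tree.add(u)
                u = nxt[u]
        return parent  # tree edges as child -> parent pairs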

22 2 Preliminaries and basic notation. Let T = (V, E, W ) be an undirected and weighted tree with |V | = n nodes, positive edge weights Wi,j > 0 for (i, j) ∈ E, and Wi,j = 0 for (i, j) ∉ E. [sent-68, score-0.316]

23 At each time step t = 1, ..., n, node it is presented and the learner must issue a prediction ŷit ∈ {−1, +1} for the label yit. [sent-83, score-0.48]

24 Then yit is revealed and the learner knows whether a mistake occurred. [sent-84, score-0.267]

25 The overall amount of irregularity in a labeled tree (T, y) is the weighted cutsize ΦW = Σ_{(i,j)∈Eφ} Wi,j, where Eφ ⊆ E is the subset of φ-edges in the tree. [sent-87, score-0.435]

26 We use the weighted cutsize as our learning bias, that is, we want to design algorithms whose predictive performance scales with ΦW . [sent-88, score-0.251]

27 Unlike the φ-edge count Φ = |Eφ|, which is a good measure of regularity for unweighted graphs, the weighted cutsize takes the edge weight Wi,j into account when measuring the irregularity of a φ-edge (i, j). [sent-89, score-0.498]
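
Both regularity measures are straightforward to compute from an edge list. The sketch below is a minimal illustration; the (i, j, w) edge format and function names are assumptions made here.

    def weighted_cutsize(edges, y):
        """Phi^W: total weight of the phi-edges, i.e. edges (i, j) with y[i] != y[j].
        edges: iterable of (i, j, w) triples; y: dict of +/-1 labels."""
        return sum(w for i, j, w in edges if y[i] != y[j])

    def phi_edge_count(edges, y):
        """Phi = |E^phi|: the unweighted phi-edge count used in WTA's analysis."""
        return sum(1 for i, j, w in edges if y[i] != y[j])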

28 In the sequel, when we measure the distance between any pair of nodes i and j on the input tree T we always use the resistance distance metric d, that is, d(i, j) = Σ_{(r,s)∈π(i,j)} 1/Wr,s, where π(i, j) is the unique path connecting i to j. [sent-90, score-0.312]
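
A small sketch of the resistance distance on a rooted tree; the parent/weight encoding of the tree is an assumption made for illustration.

    def resistance_distance(parent, weight, i, j):
        """d(i, j) = sum of 1/W_{r,s} over the unique path pi(i, j).
        parent[u] is u's parent; weight[u] is the weight of edge (u, parent[u])."""
        def path_to_root(u):
            path = [u]
            while u in parent:
                u = parent[u]
                path.append(u)
            return path
        pi, pj = path_to_root(i), path_to_root(j)
        common = set(pi) & set(pj)  # the first common node is the LCA
        d = 0.0
        for path in (pi, pj):
            for u in path:
                if u in common:
                    break
                d += 1.0 / weight[u]
        return d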

29 3 A lower bound for weighted trees. In this section we show that the weighted cutsize can be used as a lower bound on the number of online mistakes made by any algorithm on any tree. [sent-91, score-0.612]

30 Theorem 1 For any weighted tree T = (V, E, W ) there exists a randomized label assignment to V such that any algorithm can be forced to make at least ξ(M )/2 online mistakes in expectation, while ΦW ≤ M . [sent-97, score-0.38]

31 In the next section we describe an algorithm whose mistake bound nearly matches the above lower bound on any weighted tree when using ξ(ΦW ) as the measure of label regularity. [sent-104, score-0.494]

32 4 The Shazoo algorithm. In this section we introduce the SHAZOO algorithm, and relate it to previously proposed methods for online prediction on unweighted trees (TREEOPT from [5]) and weighted line graphs (WTA from [6]). [sent-105, score-0.513]

33 In fact, SHAZOO is optimal on any weighted tree, and reduces to TREEOPT on unweighted trees and to WTA on weighted line graphs. [sent-106, score-0.512]

34 Since TREEOPT and WTA are optimal on any unweighted tree and any weighted line graph, respectively, SHAZOO necessarily contains elements of both of these algorithms. [sent-107, score-0.413]

35 First, we call revealed a node whose label has already been observed by the online learner; otherwise, a node is unrevealed. [sent-111, score-0.473]

36 A fork is any unrevealed node connected to at least three different revealed nodes by edge-disjoint paths. [sent-112, score-0.508]

37 A hinge node is either a revealed node or a fork. [sent-113, score-0.505]

38 A hinge tree is any component of the forest obtained by removing from T all edges incident to hinge nodes; hence any fork or labeled node forms a 1-node hinge tree. [sent-114, score-0.9]

39 When a hinge tree H contains only one hinge node, a connection node for H is the node contained in H. [sent-115, score-0.792]

40 In all other cases, we call a connection node for H any node outside H which is adjacent to a node in H. [sent-116, score-0.562]

41 A connection fork is a connection node which is also a fork. [sent-117, score-0.409]

42 Finally, a hinge line is any path connecting two hinge nodes such that no internal node is a hinge node. [sent-118, score-0.702]

43 Given an unrevealed node i and a label value y ∈ {−1, +1}, the cut function cut(i, y) is the value of the minimum weighted cutsize of T over all labelings in {−1, +1}^n that are consistent with the labels seen so far and assign label y to node i. [sent-119, score-0.641]

44 At time t, in order to predict the label yit of node it, SHAZOO calculates ∆(i) for all connection nodes i of H(it), where H(it) is the hinge tree containing it. [sent-122, score-0.762]

45 Then the algorithm predicts yit using the label of the connection node i of H(it) which is closest to it and such that ∆(i) ≠ 0 (recall from Section 2 that all distances/lengths are measured using the resistance metric). [sent-123, score-0.467]
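
A hedged sketch of this prediction step, assuming ∆(i) = cut(i, −1) − cut(i, +1) (the sign convention here is our assumption) and a cut oracle such as the dynamic program sketched further below; the interface is illustrative.

    def shazoo_predict(conn_nodes, dist, cut):
        """Predict the label of the current node i_t from the connection
        nodes of its hinge tree H(i_t).
        conn_nodes: connection nodes of H(i_t);
        dist[i]: resistance distance from i_t to i;
        cut(i, y): minimum weighted cutsize consistent with the revealed
        labels and with node i labeled y."""
        best, best_d = None, float("inf")
        for i in conn_nodes:
            delta = cut(i, -1) - cut(i, +1)
            # Use the closest connection node whose Delta is nonzero.
            if delta != 0 and dist[i] < best_d:
                best, best_d = delta, dist[i]
        # If every Delta is zero, fall back to +1 (tie-breaking assumed).
        return 1 if best is None else (1 if best > 0 else -1)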

46 Revealed nodes are dark grey, forks are doubly circled, and hinge lines have thick black edges. [sent-126, score-0.337]

47 Middle: The predictions of SHAZOO on the nodes of a hinge tree. [sent-131, score-0.254]

48 At a given time t, SHAZOO uses the value of ∆ on the two hinge nodes (the doubly circled ones, which are also forks in this case), and is required to issue a prediction on node it (the black node in this figure). [sent-133, score-0.732]

49 Since it is between a positive ∆ hinge node and a negative ∆ hinge node, SHAZOO goes with the one which is closer in resistance distance, hence predicting yit = −1. [sent-134, score-0.598]

50 If it is a fork (which is also a hinge node), then H(it ) = {it }. [sent-140, score-0.254]

51 In this case, it is a connection node of H(it ), and obviously the one closest to itself. [sent-141, score-0.245]

52 On the other hand, predicting with the label of the connection node closest to it in resistance distance is reminiscent of the nearest-neighbor prediction of WTA on weighted line graphs [6]. [sent-148, score-0.628]

53 The TREEOPT algorithm, instead, performs better when the unweighted input tree is very different from a line graph (more precisely, when the input tree cannot be decomposed into long edge-disjoint paths). [sent-152, score-0.491]

54 Similar to TREEOPT, SHAZOO does not linearize the input tree, and it extends TREEOPT’s superior performance to the weighted case, as also confirmed by the experimental comparison reported in Section 6. [sent-156, score-0.254]

55 Since ∆ assigns a fork it the label that minimizes the weighted cutsize of T consistent with the revealed labels, one may wonder whether computing ∆ through a mincut based on the number of φ-edges (rather than their weighted sum) could be an effective prediction strategy. [sent-158, score-0.722]

56 Remark 1 We would like to stress that SHAZOO can also be used to predict the nodes of an arbitrary graph by first drawing a random spanning tree T of the graph, and then predicting optimally on T. [sent-160, score-0.511]

57 The resulting mistake bound is simply the expected value of SHAZOO’s mistake bound over the random draw of T. [sent-163, score-0.264]

58 By using a fast spanning tree sampler [17], the involved computational overhead amounts to constant amortized time per node prediction on “most” graphs. [sent-164, score-0.52]

59 Remark 2 In certain real-world input graphs, the presence of an edge linking two nodes may also carry information about the extent to which the two nodes are dissimilar, rather than similar. [sent-165, score-0.303]

60 The regularity measure is naturally extended to signed graphs by counting the weight of frustrated edges. [sent-167, score-0.214]

61 The prediction rule has to be re-defined as well: We count the parity of the number z of negative-weighted edges along the path connecting it to the closest node j ∈ C(H(it)). [sent-179, score-0.273]
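
A minimal sketch of this parity rule; the inputs are assumptions made for illustration.

    def signed_prediction(path_weights, y_j):
        """Signed-graph rule: flip the closest connection node's label y_j
        once per negative-weighted edge on the path pi(i_t, j).
        path_weights: weights of the edges along that path."""
        z = sum(1 for w in path_weights if w < 0)  # count of negative edges
        return y_j if z % 2 == 0 else -y_j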

62 This generalized Halving gives a weight to each labeling consistent with the labels seen so far and with the sign of ∆(f ) for each fork f . [sent-184, score-0.228]

63 5 Mistake bound analysis and implementation. We now show that SHAZOO is nearly optimal on every weighted tree T. [sent-187, score-0.325]

64 Given a labeled tree (T, y), a cluster is any maximal subtree whose nodes have the same label. [sent-190, score-0.314]

65 i,j Theorem 2 For any labeled and weighted tree (T, y), there exists a set LT of O ξ(ΦW ) edgedisjoint in-cluster line graphs such that the number of mistakes made by S HAZOO is at most of the order of W min |L|, 1 + log 1 + ΦW RL . [sent-195, score-0.395]

66 The above mistake bound depends on the tree structure through LT. [sent-196, score-0.265]

67 For example, if the input tree T is a weighted line graph, then it is likely to contain long in-cluster lines. [sent-200, score-0.285]

68 As for the implementation, we start by describing a method for calculating cut(v, y) for any unlabeled node v and label value y. [sent-206, score-0.26]

69 Let Φ_i^v(y) be the minimum weighted cutsize of T_i^v consistent with the revealed nodes and such that yi = y. [sent-209, score-0.424]

70 In all backtracking steps of this visit the algorithm uses (1) to compute Φ_i^v(y) for each node i, the values Φ_j^v(y) for all children j of i being calculated during the previous backtracking steps. [sent-217, score-0.291]
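
Recurrence (1) itself is not among the extracted sentences; a standard tree min-cut recurrence consistent with the surrounding text is Φ_i^v(y) = Σ_{j child of i} min{ Φ_j^v(y), Φ_j^v(−y) + Wi,j }, with a revealed node admitting only its own label. The Python sketch below implements that assumed recurrence; the data layout is illustrative.

    import math

    def min_cutsize(children, weight, revealed, v):
        """Bottom-up DP over T^v (the tree rooted at v), assumed to match
        recurrence (1).  children[i]: children of i in T^v; weight[i, j]:
        edge weight; revealed: dict of observed +/-1 labels.
        Returns phi with phi[i][y] = Phi_i^v(y); cut(v, y) = phi[v][y]."""
        phi = {}
        def visit(i):  # post-order: children first (the backtracking steps)
            phi[i] = {-1: 0.0, +1: 0.0}
            for j in children[i]:
                visit(j)
                for y in (-1, +1):
                    phi[i][y] += min(phi[j][y], phi[j][-y] + weight[i, j])
            if i in revealed:  # a revealed node forbids the opposite label
                phi[i][-revealed[i]] = math.inf
        visit(v)
        return phi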

71 Observe that a fork is incident to at least three nodes lying on different hinge lines. [sent-225, score-0.408]

72 Hence, in this step we perform a depth-first visit of T^it, marking each node lying on a hinge line. [sent-226, score-0.362]

73 In order to accomplish this task, it suffices to single out all forks, marking each labeled node and, recursively, each parent of a marked node of T^it. [sent-227, score-0.469]

74 At the end of this process we are able to single out the forks by counting the number of edges (i, j) of each marked node i such that j has been marked, too. [sent-228, score-0.297]

75 The remaining hinge nodes are the leaves of T^it whose labels have currently been revealed. [sent-229, score-0.308]

76 We perform a visit of H(it ) starting from every node r ∈ C(H(it )). [sent-236, score-0.217]

77 During these visits, we mark each node j of H(it ) with the label of r computed in the previous step, together with the length of π(r, j), which is what we need for predicting any label of H(it ) at the current time step. [sent-237, score-0.321]
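
A sketch of this marking step as a multi-source Dijkstra over the hinge tree in the resistance metric (each edge contributes 1/W to the path length); the data layout is an assumption made here.

    import heapq

    def propagate_hinge_labels(adj, conn, label):
        """For each node j of a hinge tree, record the label of the nearest
        connection node r and the resistance length of pi(r, j).
        adj[j]: list of (k, W_{j,k}) pairs; conn: connection nodes;
        label[r]: the +/-1 label assigned to r in the previous step."""
        dist, mark = {}, {}
        heap = [(0.0, r, label[r]) for r in conn]
        heapq.heapify(heap)
        while heap:
            d, j, lab = heapq.heappop(heap)
            if j in dist:  # already settled with a shorter path
                continue
            dist[j], mark[j] = d, lab
            for k, w in adj[j]:
                if k not in dist:
                    heapq.heappush(heap, (d + 1.0 / w, k, lab))
        return mark, dist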

78 In many real-world scenarios, one is interested in the more standard problem of predicting the labels of a given subset of test nodes based on the available labels of another subset of training nodes. [sent-243, score-0.276]

79 Given any unlabeled node i, we perform a visit of T^i starting at i. [sent-249, score-0.251]

80 During the backtracking steps of this visit we use (1) to calculate Φ_j^i(y) for each node j in T^i and label y ∈ {−1, +1}. [sent-250, score-0.301]

81 Observe now that for any pair i, j of adjacent unlabeled nodes and any label y ∈ {−1, +1}, once we have obtained Φ_i^i(y), Φ_j^i(+1) and Φ_j^i(−1), we can compute Φ_i^j(y) in constant time, as Φ_i^j(y) = Φ_i^i(y) − min_{y′∈{−1,+1}} ( Φ_j^i(y′) + I{y′ ≠ y} wi,j ). [sent-251, score-0.221]
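
A sketch of this constant-time rerooting step, under the identity as written above; the interfaces are illustrative assumptions.

    def reroot(phi_i_i, phi_j_under_i, w_ij, y):
        """Phi_i^j(y) = Phi_i^i(y) - min over y' of
        (Phi_j^i(y') + [y' != y] * w_ij): subtract subtree j's best
        contribution from the full-tree value at i.
        phi_i_i[y] = Phi_i^i(y); phi_j_under_i[y] = Phi_j^i(y)."""
        best = min(phi_j_under_i[yp] + (w_ij if yp != y else 0.0)
                   for yp in (-1, +1))
        return phi_i_i[y] - best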

82 The time for computing Φ_s^s(y) for all nodes s of T^i and any label y is therefore linear in the time of performing a breadth-first (or depth-first) visit of T^i. [sent-256, score-0.235]

83 Since each labeled node with degree d is part of at most d trees T^i for some i, we have that the total number of nodes of all distinct (edge-disjoint) trees T^i across i ∈ V is linear in |V |. [sent-259, score-0.548]

84 Finally, we need to propagate the connection node labels of each hinge tree as in the third step of the online implementation. [sent-260, score-0.57]

85 An intuitive and fast algorithm for sequentially predicting the node labels is via a weighted majority vote over the labels of the adjacent nodes seen so far. [sent-263, score-0.566]
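
A minimal sketch of such a weighted majority vote predictor; the data layout and the tie-break toward +1 are assumptions made here.

    def majority_vote(adj_weights, revealed):
        """Predict a node's label by a weighted majority vote over its
        revealed neighbours.  adj_weights: dict j -> W_{i,j} for the
        neighbours j of the node; revealed: dict of observed +/-1 labels."""
        score = sum(w * revealed[j] for j, w in adj_weights.items()
                    if j in revealed)
        return 1 if score >= 0 else -1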

86 When the input graph is not a line, WTA turns it into a line by first extracting a spanning tree of the graph, and then linearizing it. [sent-272, score-0.393]

87 The implementation described in [6] runs in constant amortized time per prediction whenever the spanning tree sampler runs in time Θ(|V |). [sent-273, score-0.371]

88 In our experiments, we combined SHAZOO and WTA with spanning trees generated in different ways (note that OMV and LABPROP do not need to extract spanning trees from the input graph). [sent-278, score-0.51]

89 Following Section 4 of [12], we draw a weighted spanning tree with probability proportional to the product of its edge weights. [sent-281, score-0.441]

90 We also tested our algorithms combined with random spanning trees generated uniformly at random, ignoring the edge weights. [sent-282, score-0.317]

91 (That is, the weights were only used to compute predictions on the randomly generated tree.) We call these spanning trees NWRST (no-weight RST). [sent-284, score-0.274]

92 This is just the minimal weight spanning tree, where the weight of a spanning tree is the sum of its edge weights. [sent-288, score-0.504]

93 The resulting algorithms are denoted by k*SHAZOO and k*WTA, where k is the number of spanning trees in the aggregation. [sent-294, score-0.255]
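
One plausible reading of this k-tree aggregation, sketched below with an unweighted majority over the base predictions (an assumption; the exact aggregation rule is not in these extracts).

    def committee_predict(node, predictors):
        """Aggregate the +/-1 predictions of k base predictors, each run on
        its own random spanning tree, by a simple majority vote."""
        votes = sum(p(node) for p in predictors)
        return 1 if votes >= 0 else -1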

94 KROGAN (2,169 nodes and 6,102 edges) and COMBINED (2,871 nodes and 6,407 edges) are high-throughput protein-protein interaction networks of budding yeast taken from [14] —see [6] for a more complete description. [sent-298, score-0.26]

95 WEBSPAM is natively a binary classification problem, and we used the same train/test split provided with the dataset: 3,897 training nodes and 1,993 test nodes (the remaining nodes being unlabeled). [sent-306, score-0.39]

96 SHAZOO and WTA use a single minimum spanning tree (the best performing tree type for both algorithms). [sent-505, score-0.41]

97 SHAZOO and WTA use only non-weighted random spanning trees (NWRST) to optimize scalability. [sent-508, score-0.255]

98 Without using committees, SHAZOO outperforms WTA on all datasets, irrespective of the type of spanning tree being used. [sent-520, score-0.277]

99 Random spanning trees and the prediction of weighted graphs. [sent-577, score-0.414]

100 Generating random spanning trees more quickly than the cover time. [sent-656, score-0.255]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hazoo', 0.663), ('wta', 0.426), ('node', 0.169), ('spanning', 0.144), ('nwrst', 0.142), ('tree', 0.133), ('cutsize', 0.13), ('fork', 0.13), ('nodes', 0.13), ('unweighted', 0.128), ('ree', 0.125), ('hinge', 0.124), ('weighted', 0.121), ('trees', 0.111), ('rop', 0.107), ('mistake', 0.102), ('yit', 0.094), ('forks', 0.083), ('omv', 0.083), ('ab', 0.067), ('graph', 0.066), ('sgn', 0.06), ('label', 0.057), ('pt', 0.056), ('connection', 0.055), ('labels', 0.054), ('cut', 0.051), ('resistance', 0.049), ('graphs', 0.049), ('visit', 0.048), ('milan', 0.047), ('edges', 0.045), ('revealed', 0.043), ('edge', 0.043), ('transductive', 0.042), ('mincut', 0.042), ('mst', 0.042), ('signed', 0.039), ('predicting', 0.038), ('prediction', 0.038), ('halving', 0.036), ('amortized', 0.036), ('committees', 0.036), ('fcut', 0.036), ('krogan', 0.036), ('unrevealed', 0.036), ('online', 0.035), ('usps', 0.034), ('mistakes', 0.034), ('unlabeled', 0.034), ('yj', 0.033), ('rl', 0.033), ('regularity', 0.032), ('italy', 0.031), ('spam', 0.031), ('webspam', 0.031), ('herbster', 0.031), ('line', 0.031), ('bound', 0.03), ('frustrated', 0.029), ('learner', 0.028), ('labeled', 0.027), ('backtracking', 0.027), ('propagation', 0.025), ('lt', 0.024), ('subtree', 0.024), ('gentile', 0.024), ('labeling', 0.024), ('batch', 0.024), ('incident', 0.024), ('irregularity', 0.024), ('logaritmic', 0.024), ('mispredicts', 0.024), ('networked', 0.024), ('shazoo', 0.024), ('slacks', 0.024), ('tiv', 0.024), ('vitale', 0.024), ('labelings', 0.023), ('predicts', 0.022), ('nearly', 0.021), ('star', 0.021), ('dsi', 0.021), ('disregarding', 0.021), ('marking', 0.021), ('specializing', 0.021), ('closest', 0.021), ('sparsi', 0.02), ('children', 0.02), ('implementation', 0.02), ('weight', 0.02), ('weights', 0.019), ('circled', 0.019), ('linearizing', 0.019), ('ten', 0.018), ('contained', 0.018), ('bias', 0.018), ('wonder', 0.018), ('web', 0.017), ('rooted', 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999994 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm

Author: Fabio Vitale, Nicolò Cesa-Bianchi, Claudio Gentile, Giovanni Zappella

Abstract: Predicting the nodes of a given graph is a fascinating theoretical problem with applications in several domains. Since graph sparsification via spanning trees retains enough information while making the task much easier, trees are an important special case of this problem. Although it is known how to predict the nodes of an unweighted tree in a nearly optimal way, in the weighted case a fully satisfactory algorithm is not available yet. We fill this hole and introduce an efficient node predictor, SHAZOO, which is nearly optimal on any weighted tree. Moreover, we show that SHAZOO can be viewed as a common nontrivial generalization of both previous approaches for unweighted trees and weighted lines. Experiments on real-world datasets confirm that SHAZOO performs well in that it fully exploits the structure of the input tree, and gets very close to (and sometimes better than) less scalable energy minimization methods. 1

2 0.12833381 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition

Author: Jia Deng, Sanjeev Satheesh, Alexander C. Berg, Fei Li

Abstract: We present a novel approach to efficiently learn a label tree for large scale classification with many classes. The key contribution of the approach is a technique to simultaneously determine the structure of the tree and learn the classifiers for each node in the tree. This approach also allows fine grained control over the efficiency vs accuracy trade-off in designing a label tree, leading to more balanced trees. Experiments are performed on large scale image classification with 10184 classes and 9 million images. We demonstrate significant improvements in test accuracy and efficiency with less training time and more balanced trees compared to the previous state of the art by Bengio et al. 1

3 0.12658225 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations

Author: Flavio Chierichetti, David Liben-nowell, Jon M. Kleinberg

Abstract: Motivated by the spread of on-line information in general and on-line petitions in particular, recent research has raised the following combinatorial estimation problem. There is a tree T that we cannot observe directly (representing the structure along which the information has spread), and certain nodes randomly decide to make their copy of the information public. In the case of a petition, the list of names on each public copy of the petition also reveals a path leading back to the root of the tree. What can we conclude about the properties of the tree we observe from these revealed paths, and can we use the structure of the observed tree to estimate the size of the full unobserved tree T ? Here we provide the first algorithm for this size estimation task, together with provable guarantees on its performance. We also establish structural properties of the observed tree, providing the first rigorous explanation for some of the unusual structural phenomena present in the spread of real chain-letter petitions on the Internet. 1

4 0.087771095 213 nips-2011-Phase transition in the family of p-resistances

Author: Morteza Alamgir, Ulrike V. Luxburg

Abstract: We study the family of p-resistances on graphs for p ≥ 1. This family generalizes the standard resistance distance. We prove that for any fixed graph, for p = 1 the p-resistance coincides with the shortest path distance, for p = 2 it coincides with the standard resistance distance, and for p → ∞ it converges to the inverse of the minimal s-t-cut in the graph. Secondly, we consider the special case of random geometric graphs (such as k-nearest neighbor graphs) when the number n of vertices in the graph tends to infinity. We prove that an interesting phase transition takes place. There exist two critical thresholds p* and p** such that if p < p*, then the p-resistance depends on meaningful global properties of the graph, whereas if p > p**, it only depends on trivial local quantities and does not convey any useful information. We can explicitly compute the critical values: p* = 1 + 1/(d − 1) and p** = 1 + 1/(d − 2) where d is the dimension of the underlying space (we believe that the fact that there is a small gap between p* and p** is an artifact of our proofs). We also relate our findings to Laplacian regularization and suggest to use q-Laplacians as regularizers, where q satisfies 1/p* + 1/q = 1. 1

5 0.08412724 226 nips-2011-Projection onto A Nonnegative Max-Heap

Author: Jun Liu, Liang Sun, Jieping Ye

Abstract: We consider the problem of computing the Euclidean projection of a vector of length p onto a non-negative max-heap—an ordered tree where the values of the nodes are all nonnegative and the value of any parent node is no less than the value(s) of its child node(s). This Euclidean projection plays a building block role in the optimization problem with a non-negative max-heap constraint. Such a constraint is desirable when the features follow an ordered tree structure, that is, a given feature is selected for the given regression/classification task only if its parent node is selected. In this paper, we show that such Euclidean projection problem admits an analytical solution and we develop a top-down algorithm where the key operation is to find the so-called maximal root-tree of the subtree rooted at each node. A naive approach for finding the maximal root-tree is to enumerate all the possible root-trees, which, however, does not scale well. We reveal several important properties of the maximal root-tree, based on which we design a bottom-up algorithm with merge for efficiently finding the maximal root-tree. The proposed algorithm has a (worst-case) linear time complexity for a sequential list, and O(p^2) for a general tree. We report simulation results showing the effectiveness of the max-heap for regression with an ordered tree structure. Empirical results show that the proposed algorithm has an expected linear time complexity for many special cases including a sequential list, a full binary tree, and a tree with depth 1. 1

6 0.080623344 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables

7 0.07204891 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models

8 0.07038115 177 nips-2011-Multi-armed bandits on implicit metric spaces

9 0.060189545 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices

10 0.058968231 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions

11 0.057584453 157 nips-2011-Learning to Search Efficiently in High Dimensions

12 0.055019788 296 nips-2011-Uniqueness of Belief Propagation on Signed Graphs

13 0.054919157 199 nips-2011-On fast approximate submodular minimization

14 0.051524803 80 nips-2011-Efficient Online Learning via Randomized Rounding

15 0.051149376 267 nips-2011-Spectral Methods for Learning Multivariate Latent Tree Structure

16 0.051062629 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound

17 0.049955402 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty

18 0.049591631 114 nips-2011-Hierarchical Multitask Structured Output Learning for Large-scale Sequence Segmentation

19 0.048852503 150 nips-2011-Learning a Distance Metric from a Network

20 0.04880359 274 nips-2011-Structure Learning for Optimization


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.131), (1, -0.012), (2, -0.076), (3, 0.001), (4, 0.021), (5, -0.078), (6, -0.098), (7, -0.075), (8, -0.006), (9, -0.201), (10, -0.008), (11, 0.022), (12, -0.124), (13, 0.084), (14, -0.042), (15, 0.062), (16, 0.024), (17, -0.062), (18, 0.05), (19, -0.088), (20, -0.018), (21, -0.063), (22, -0.03), (23, -0.005), (24, -0.011), (25, 0.01), (26, -0.035), (27, 0.026), (28, -0.054), (29, -0.022), (30, -0.022), (31, 0.029), (32, 0.018), (33, 0.06), (34, 0.053), (35, -0.078), (36, -0.004), (37, 0.046), (38, 0.04), (39, 0.016), (40, 0.052), (41, -0.053), (42, 0.054), (43, -0.062), (44, 0.003), (45, -0.079), (46, -0.009), (47, -0.058), (48, -0.056), (49, -0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94836318 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm

Author: Fabio Vitale, Nicolò Cesa-Bianchi, Claudio Gentile, Giovanni Zappella

Abstract: Predicting the nodes of a given graph is a fascinating theoretical problem with applications in several domains. Since graph sparsification via spanning trees retains enough information while making the task much easier, trees are an important special case of this problem. Although it is known how to predict the nodes of an unweighted tree in a nearly optimal way, in the weighted case a fully satisfactory algorithm is not available yet. We fill this hole and introduce an efficient node predictor, SHAZOO, which is nearly optimal on any weighted tree. Moreover, we show that SHAZOO can be viewed as a common nontrivial generalization of both previous approaches for unweighted trees and weighted lines. Experiments on real-world datasets confirm that SHAZOO performs well in that it fully exploits the structure of the input tree, and gets very close to (and sometimes better than) less scalable energy minimization methods. 1

2 0.82549614 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations

Author: Flavio Chierichetti, David Liben-nowell, Jon M. Kleinberg

Abstract: Motivated by the spread of on-line information in general and on-line petitions in particular, recent research has raised the following combinatorial estimation problem. There is a tree T that we cannot observe directly (representing the structure along which the information has spread), and certain nodes randomly decide to make their copy of the information public. In the case of a petition, the list of names on each public copy of the petition also reveals a path leading back to the root of the tree. What can we conclude about the properties of the tree we observe from these revealed paths, and can we use the structure of the observed tree to estimate the size of the full unobserved tree T ? Here we provide the first algorithm for this size estimation task, together with provable guarantees on its performance. We also establish structural properties of the observed tree, providing the first rigorous explanation for some of the unusual structural phenomena present in the spread of real chain-letter petitions on the Internet. 1

3 0.68559957 226 nips-2011-Projection onto A Nonnegative Max-Heap

Author: Jun Liu, Liang Sun, Jieping Ye

Abstract: We consider the problem of computing the Euclidean projection of a vector of length p onto a non-negative max-heap—an ordered tree where the values of the nodes are all nonnegative and the value of any parent node is no less than the value(s) of its child node(s). This Euclidean projection plays a building block role in the optimization problem with a non-negative max-heap constraint. Such a constraint is desirable when the features follow an ordered tree structure, that is, a given feature is selected for the given regression/classification task only if its parent node is selected. In this paper, we show that such Euclidean projection problem admits an analytical solution and we develop a top-down algorithm where the key operation is to find the so-called maximal root-tree of the subtree rooted at each node. A naive approach for finding the maximal root-tree is to enumerate all the possible root-trees, which, however, does not scale well. We reveal several important properties of the maximal root-tree, based on which we design a bottom-up algorithm with merge for efficiently finding the maximal root-tree. The proposed algorithm has a (worst-case) linear time complexity for a sequential list, and O(p^2) for a general tree. We report simulation results showing the effectiveness of the max-heap for regression with an ordered tree structure. Empirical results show that the proposed algorithm has an expected linear time complexity for many special cases including a sequential list, a full binary tree, and a tree with depth 1. 1

4 0.68206328 177 nips-2011-Multi-armed bandits on implicit metric spaces

Author: Aleksandrs Slivkins

Abstract: The multi-armed bandit (MAB) setting is a useful abstraction of many online learning tasks which focuses on the trade-off between exploration and exploitation. In this setting, an online algorithm has a fixed set of alternatives (“arms”), and in each round it selects one arm and then observes the corresponding reward. While the case of small number of arms is by now well-understood, a lot of recent work has focused on multi-armed bandits with (infinitely) many arms, where one needs to assume extra structure in order to make the problem tractable. In particular, in the Lipschitz MAB problem there is an underlying similarity metric space, known to the algorithm, such that any two arms that are close in this metric space have similar payoffs. In this paper we consider the more realistic scenario in which the metric space is implicit – it is defined by the available structure but not revealed to the algorithm directly. Specifically, we assume that an algorithm is given a tree-based classification of arms. For any given problem instance such a classification implicitly defines a similarity metric space, but the numerical similarity information is not available to the algorithm. We provide an algorithm for this setting, whose performance guarantees (almost) match the best known guarantees for the corresponding instance of the Lipschitz MAB problem. 1

5 0.65811354 274 nips-2011-Structure Learning for Optimization

Author: Shulin Yang, Ali Rahimi

Abstract: We describe a family of global optimization procedures that automatically decompose optimization problems into smaller loosely coupled problems. The solutions of these are subsequently combined with message passing algorithms. We show empirically that these methods produce better solutions with fewer function evaluations than existing global optimization methods. To develop these methods, we introduce a notion of coupling between variables of optimization. This notion of coupling generalizes the notion of independence between random variables in statistics, sparseness of the Hessian in nonlinear optimization, and the generalized distributive law. Despite its generality, this notion of coupling is easier to verify empirically, making structure estimation easy, while allowing us to migrate well-established inference methods on graphical models to the setting of global optimization. 1

6 0.6561116 208 nips-2011-Optimistic Optimization of a Deterministic Function without the Knowledge of its Smoothness

7 0.60995603 96 nips-2011-Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition

8 0.60230881 267 nips-2011-Spectral Methods for Learning Multivariate Latent Tree Structure

9 0.59485471 31 nips-2011-An Application of Tree-Structured Expectation Propagation for Channel Decoding

10 0.54833895 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables

11 0.50760108 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions

12 0.49951848 115 nips-2011-Hierarchical Topic Modeling for Analysis of Time-Evolving Personal Choices

13 0.48034155 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models

14 0.46910825 150 nips-2011-Learning a Distance Metric from a Network

15 0.44617119 6 nips-2011-A Global Structural EM Algorithm for a Model of Cancer Progression

16 0.44605774 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty

17 0.43600512 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound

18 0.42032641 213 nips-2011-Phase transition in the family of p-resistances

19 0.4122079 67 nips-2011-Data Skeletonization via Reeb Graphs

20 0.40744454 296 nips-2011-Uniqueness of Belief Propagation on Signed Graphs


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.025), (4, 0.103), (20, 0.028), (26, 0.033), (31, 0.057), (33, 0.032), (42, 0.277), (43, 0.054), (45, 0.077), (57, 0.106), (74, 0.034), (83, 0.014), (99, 0.035)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.75734061 242 nips-2011-See the Tree Through the Lines: The Shazoo Algorithm

Author: Fabio Vitale, Nicolò Cesa-Bianchi, Claudio Gentile, Giovanni Zappella

Abstract: Predicting the nodes of a given graph is a fascinating theoretical problem with applications in several domains. Since graph sparsification via spanning trees retains enough information while making the task much easier, trees are an important special case of this problem. Although it is known how to predict the nodes of an unweighted tree in a nearly optimal way, in the weighted case a fully satisfactory algorithm is not available yet. We fill this hole and introduce an efficient node predictor, SHAZOO, which is nearly optimal on any weighted tree. Moreover, we show that SHAZOO can be viewed as a common nontrivial generalization of both previous approaches for unweighted trees and weighted lines. Experiments on real-world datasets confirm that SHAZOO performs well in that it fully exploits the structure of the input tree, and gets very close to (and sometimes better than) less scalable energy minimization methods. 1

2 0.66910613 102 nips-2011-Generalised Coupled Tensor Factorisation

Author: Kenan Y. Yılmaz, Ali T. Cemgil, Umut Simsekli

Abstract: We derive algorithms for generalised tensor factorisation (GTF) by building upon the well-established theory of Generalised Linear Models. Our algorithms are general in the sense that we can compute arbitrary factorisations in a message passing framework, derived for a broad class of exponential family distributions including special cases such as Tweedie’s distributions corresponding to β-divergences. By bounding the step size of the Fisher Scoring iteration of the GLM, we obtain general updates for real data and multiplicative updates for non-negative data. The GTF framework is, then extended easily to address the problems when multiple observed tensors are factorised simultaneously. We illustrate our coupled factorisation approach on synthetic data as well as on a musical audio restoration problem. 1

3 0.52262515 127 nips-2011-Image Parsing with Stochastic Scene Grammar

Author: Yibiao Zhao, Song-chun Zhu

Abstract: This paper proposes a parsing algorithm for scene understanding which includes four aspects: computing 3D scene layout, detecting 3D objects (e.g. furniture), detecting 2D faces (windows, doors etc.), and segmenting background. In contrast to previous scene labeling work that applied discriminative classifiers to pixels (or super-pixels), we use a generative Stochastic Scene Grammar (SSG). This grammar represents the compositional structures of visual entities from scene categories, 3D foreground/background, 2D faces, to 1D lines. The grammar includes three types of production rules and two types of contextual relations. Production rules: (i) AND rules represent the decomposition of an entity into sub-parts; (ii) OR rules represent the switching among sub-types of an entity; (iii) SET rules represent an ensemble of visual entities. Contextual relations: (i) Cooperative “+” relations represent positive links between binding entities, such as hinged faces of a object or aligned boxes; (ii) Competitive “-” relations represents negative links between competing entities, such as mutually exclusive boxes. We design an efficient MCMC inference algorithm, namely Hierarchical cluster sampling, to search in the large solution space of scene configurations. The algorithm has two stages: (i) Clustering: It forms all possible higher-level structures (clusters) from lower-level entities by production rules and contextual relations. (ii) Sampling: It jumps between alternative structures (clusters) in each layer of the hierarchy to find the most probable configuration (represented by a parse tree). In our experiment, we demonstrate the superiority of our algorithm over existing methods on public dataset. In addition, our approach achieves richer structures in the parse tree. 1

4 0.52226615 139 nips-2011-Kernel Bayes' Rule

Author: Kenji Fukumizu, Le Song, Arthur Gretton

Abstract: A nonparametric kernel-based method for realizing Bayes’ rule is proposed, based on kernel representations of probabilities in reproducing kernel Hilbert spaces. The prior and conditional probabilities are expressed as empirical kernel mean and covariance operators, respectively, and the kernel mean of the posterior distribution is computed in the form of a weighted sample. The kernel Bayes’ rule can be applied to a wide variety of Bayesian inference problems: we demonstrate Bayesian computation without likelihood, and filtering with a nonparametric statespace model. A consistency rate for the posterior estimate is established. 1

5 0.51985985 171 nips-2011-Metric Learning with Multiple Kernels

Author: Jun Wang, Huyen T. Do, Adam Woznica, Alexandros Kalousis

Abstract: Metric learning has become a very active research field. The most popular representative–Mahalanobis metric learning–can be seen as learning a linear transformation and then computing the Euclidean metric in the transformed space. Since a linear transformation might not always be appropriate for a given learning problem, kernelized versions of various metric learning algorithms exist. However, the problem then becomes finding the appropriate kernel function. Multiple kernel learning addresses this limitation by learning a linear combination of a number of predefined kernels; this approach can be also readily used in the context of multiple-source learning to fuse different data sources. Surprisingly, and despite the extensive work on multiple kernel learning for SVMs, there has been no work in the area of metric learning with multiple kernel learning. In this paper we fill this gap and present a general approach for metric learning with multiple kernel learning. Our approach can be instantiated with different metric learning algorithms provided that they satisfy some constraints. Experimental evidence suggests that our approach outperforms metric learning with an unweighted kernel combination and metric learning with cross-validation based kernel selection. 1

6 0.51985031 213 nips-2011-Phase transition in the family of p-resistances

7 0.5113492 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound

8 0.51022053 193 nips-2011-Object Detection with Grammar Models

9 0.50868911 234 nips-2011-Reconstructing Patterns of Information Diffusion from Incomplete Observations

10 0.50868517 64 nips-2011-Convergent Bounds on the Euclidean Distance

11 0.50457126 111 nips-2011-Hashing Algorithms for Large-Scale Learning

12 0.5041824 170 nips-2011-Message-Passing for Approximate MAP Inference with Latent Variables

13 0.49715498 140 nips-2011-Kernel Embeddings of Latent Tree Graphical Models

14 0.49689603 130 nips-2011-Inductive reasoning about chimeric creatures

15 0.4954499 231 nips-2011-Randomized Algorithms for Comparison-based Search

16 0.49512118 226 nips-2011-Projection onto A Nonnegative Max-Heap

17 0.49294952 100 nips-2011-Gaussian Process Training with Input Noise

18 0.49283919 157 nips-2011-Learning to Search Efficiently in High Dimensions

19 0.49044362 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning

20 0.48935202 150 nips-2011-Learning a Distance Metric from a Network