nips nips2013 nips2013-328 knowledge-graph by maker-knowledge-mining

328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited


Source: pdf

Author: Matthias Hein, Simon Setzer, Leonardo Jost, Syama Sundar Rangapuram

Abstract: Hypergraphs allow one to encode higher-order relationships in data and are thus a very flexible modeling tool. Current learning methods are either based on approximations of the hypergraphs via graphs or on tensor methods which are only applicable under special conditions. In this paper, we present a new learning framework on hypergraphs which fully uses the hypergraph structure. The key element is a family of regularization functionals based on the total variation on hypergraphs. 1

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Current learning methods are either based on approximations of the hypergraphs via graphs or on tensor methods which are only applicable under special conditions. [sent-2, score-0.436]

2 In this paper, we present a new learning framework on hypergraphs which fully uses the hypergraph structure. [sent-3, score-1.072]

3 The key element is a family of regularization functionals based on the total variation on hypergraphs. [sent-4, score-0.221]

4 The first one uses tensor methods for clustering as the higher-order extension of matrix (spectral) methods for graphs [7, 8, 9]. [sent-9, score-0.19]

5 While tensor methods are mathematically quite appealing, they are limited to so-called k-uniform hypergraphs, that is, each hyperedge contains exactly k vertices. [sent-10, score-0.246]

6 The second main approach can deal with arbitrary hypergraphs [10, 11]. [sent-12, score-0.343]

7 The basic idea of this line of work is to approximate the hypergraph via a standard weighted graph. [sent-13, score-0.73]

8 The two main ways of approximating the hypergraph by a standard graph are the clique and the star expansion which were compared in [12]. [sent-15, score-1.053]

9 One can summarize [12] by stating that no approximation fully encodes the hypergraph structure. [sent-16, score-0.729]

10 Earlier, [13] have proven that an exact representation of the hypergraph via a graph retaining its cut properties is impossible. [sent-17, score-1.039]

11 For both clustering and semisupervised learning the key element, either explicitly or implicitly, is the cut functional. [sent-19, score-0.356]

12 Our aim is to directly work with the cut defined on the hypergraph. [sent-20, score-0.273]

13 We discuss in detail the differences between the hypergraph cut and the cut induced by the clique and star expansion in Section 2. [sent-21, score-1.546]

14 In Section 2.2, we introduce the total variation on a hypergraph as the Lovasz extension of the hypergraph cut. [sent-24, score-1.571]

15 Based on this, we propose a family of regularization functionals which interpolate between the total variation and a regularization functional enforcing smoother functions on the hypergraph corresponding to Laplacian-type regularization on graphs. [sent-25, score-1.027]

16 In Section 4, we show in line with recent research [14, 15, 16, 17] that there exists a tight relaxation of the normalized hypergraph cut. [sent-27, score-0.801]

17 In the experimental section 6, we show that fully incorporating hypergraph structure is beneficial. [sent-30, score-0.729]

18 In Section 2.2, we introduce, in analogy to graphs, the total variation on hypergraphs as the Lovasz extension of the hypergraph cut. [sent-36, score-1.215]

19 In this paper, we consider weighted undirected hypergraphs H = (V, E, w), where V is the vertex set with |V| = n and E the set of hyperedges with |E| = m. [sent-39, score-0.479]

20 Each hyperedge e ∈ E corresponds to a subset of vertices, i.e., e ⊆ V. [sent-40, score-0.224]

21 The vector w ∈ R^m contains for each hyperedge e its non-negative weight w_e. [sent-43, score-0.246]

22 We would like to emphasize that we do not impose the restriction that the hypergraph is k-uniform, i.e., that every hyperedge contains the same number of vertices. [sent-47, score-0.713]

23 The considered class of hypergraphs contains the set of undirected, weighted graphs which is equivalent to the set of 2-uniform hypergraphs. [sent-50, score-0.431]

24 The motivation for the total variation on hypergraphs comes from the correspondence between the cut on a graph and the total variation functional. [sent-51, score-0.897]

25 Thus, we recall the definition of the cut on weighted graphs G = (V, W ) with weight matrix W . [sent-52, score-0.383]

26 Then, for a partition (C, C̄), the cut is defined as cut_G(C, C̄) = Σ_{i∈C, j∈C̄} w_ij. [sent-54, score-0.311]

27 This standard definition of the cut carries over naturally to a hypergraph H: cut_H(C, C̄) = Σ_{e∈E: e∩C≠∅, e∩C̄≠∅} w_e.  (1) [sent-55, score-1.012]

28 Thus, the cut functional on a hypergraph is just the sum of the weights of the hyperedges which have vertices both in C and C̄. [sent-56, score-1.136]
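
As a minimal illustration on made-up data (not code from the paper), the hypergraph cut (1) can be evaluated directly from a list of hyperedges: a hyperedge contributes its weight exactly when it has vertices on both sides of the partition.

```python
# Toy example of the hypergraph cut (1); hyperedges, weights and the partition are made up.
def hypergraph_cut(hyperedges, weights, C):
    """Sum the weights of all hyperedges with vertices both in C and in its complement."""
    C = set(C)
    return sum(w for e, w in zip(hyperedges, weights) if e & C and e - C)

hyperedges = [{0, 1, 2}, {2, 3, 4}, {4, 5}]
weights = [1.0, 2.0, 0.5]
print(hypergraph_cut(hyperedges, weights, C={0, 1, 2}))  # only {2, 3, 4} is cut -> 2.0
```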

29 It is not biased towards a particular way the hyperedge is cut, that is, how many vertices of the hyperedge are in C resp. C̄. [sent-57, score-0.479]

30 This emphasizes that the vertices in a hyperedge belong together and we penalize every cut of a hyperedge with the same value. [sent-59, score-0.752]

31 In order to handle hypergraphs with existing methods developed for graphs, the focus in previous works [11, 12] has been on transforming the hypergraph into a graph. [sent-60, score-1.056]

32 In [11], they suggest using the clique expansion (CE). [sent-61, score-0.254]

33 That is, every hyperedge e ∈ E is replaced with a fully connected subgraph in which every edge has weight w_e. [sent-63, score-0.316]

34 This leads to the cut functional cut_CE: cut_CE(C, C̄) := Σ_{e∈E: e∩C≠∅, e∩C̄≠∅} (w_e/|e|) |e∩C| |e∩C̄|.  (2) [sent-64, score-0.319]

35 Note that in contrast to the hypergraph cut (1), the value of cut_CE depends on the way each hyperedge is cut, since the term |e∩C| |e∩C̄| makes the weights dependent on the partition. [sent-65, score-1.483]
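
For comparison, a sketch of the clique expansion cut (2) on made-up data; the factor |e∩C| |e∩C̄| / |e| is what makes the penalty depend on how a hyperedge is split.

```python
# Toy example of the clique expansion cut (2); data are made up.
def clique_expansion_cut(hyperedges, weights, C):
    C = set(C)
    return sum(w * len(e & C) * len(e - C) / len(e)
               for e, w in zip(hyperedges, weights))

hyperedges = [{0, 1, 2, 3}]
weights = [1.0]
print(clique_expansion_cut(hyperedges, weights, C={0}))     # 1*3/4 = 0.75 (single vertex split off)
print(clique_expansion_cut(hyperedges, weights, C={0, 1}))  # 2*2/4 = 1.00 (balanced split is penalized more)
# The hypergraph cut (1) would be 1.0 for both partitions.
```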

36 In particular, the smallest weight is attained if only a single vertex is split off, whereas the largest weight is attained if the partition of the hyperedge is most balanced. [sent-66, score-0.343]

37 In comparison to the hypergraph cut, this leads to a bias towards cuts that favor splitting off single vertices from a hyperedge, which in our view is an undesirable property for most applications. [sent-67, score-1.074]

38 We illustrate this with an example in Figure 1, where the minimum hypergraph cut (cut_H) leads to a balanced partition, whereas the minimum clique expansion cut (cut_CE) not only cuts an additional hyperedge but is also unbalanced. [sent-68, score-1.896]

39 Another argument against the clique expansion is computational complexity. [sent-70, score-0.254]

40 For large hyperedges the clique expansion leads to (almost) fully connected graphs which makes computations slow and is prohibitive for large hypergraphs. [sent-71, score-0.466]

41 We omit the discussion of the star graph approximation of hypergraphs discussed in [12] as it is shown there that the star graph expansion is very similar to the clique expansion. [sent-72, score-0.769]

42 This is in line with [13], which states that in general there exists no graph with the same vertex set V which has for every partition (C, C̄) the same cut value as the hypergraph cut. [sent-74, score-1.086]

43 (From the caption of Figure 1, which compares the minimum hypergraph cut cut_H with the minimum cut of the clique expansion cut_CE.) [sent-76, score-0.555]

44 For edge weights w_1 = w_4 = 10 and w_2 = w_5 = 0.6, the minimum hypergraph cut is (C_1, C̄_1), which is perfectly balanced. [sent-78, score-1.0]

45 Although it cuts one more hyperedge and is unbalanced, (C_2, C̄_2) is the optimal cut for the clique expansion approximation. [sent-79, score-0.767]

46 Finally, note that for weighted 3-uniform hypergraphs it is always possible to find a corresponding graph such that any cut of the graph is equal to the corresponding cut of the hypergraph. [sent-80, score-1.012]

47 Then, W ∈ R^{|V|×|V|} defined as W = (1/2) H diag(w) H^T defines the weight matrix of a graph G = (V, W) where each cut of G has the same value as the corresponding hypergraph cut of H. [sent-84, score-1.334]
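
A numpy sketch of this construction on a made-up 3-uniform hypergraph; the check at the end confirms that every cut of the resulting graph G = (V, W) with W = (1/2) H diag(w) H^T equals the corresponding hypergraph cut.

```python
import numpy as np

# Made-up 3-uniform hypergraph on 4 vertices with hyperedges {0,1,2} and {1,2,3}.
H = np.array([[1, 0],
              [1, 1],
              [1, 1],
              [0, 1]], dtype=float)  # incidence matrix, H[i, e] = 1 if vertex i is in hyperedge e
w = np.array([1.0, 2.0])             # hyperedge weights

W = 0.5 * H @ np.diag(w) @ H.T       # candidate graph weight matrix
np.fill_diagonal(W, 0.0)             # self-loops never cross a cut, so they can be dropped

def graph_cut(W, C):
    C = np.asarray(C)
    Cbar = np.setdiff1d(np.arange(W.shape[0]), C)
    return W[np.ix_(C, Cbar)].sum()

print(graph_cut(W, [0]))     # 1.0 = weight of the single hyperedge cut by ({0}, {1,2,3})
print(graph_cut(W, [0, 1]))  # 3.0 = both hyperedges are cut
```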

48 For graphs G = (V, W), the total variation on graphs is defined as the Lovasz extension of the graph cut [18], given as TV_G : R^{|V|} → R, TV_G(f) = (1/2) Σ_{i,j=1}^n w_ij |f_i − f_j|. [sent-98, score-0.689]

49 The total variation TV_H : R^{|V|} → R on a hypergraph H = (V, E, w), defined as the Lovasz extension of the hypergraph cut Ŝ(C) = cut_H(C, C̄), is a convex function given by TV_H(f) = Σ_{e∈E} w_e ( max_{i∈e} f_i − min_{j∈e} f_j ) = Σ_{e∈E} w_e max_{i,j∈e} |f_i − f_j|. [sent-101, score-1.801]

50 Note that the total variation of a hypergraph cut reduces to the total variation on graphs if H is 2-uniform (a standard graph). [sent-102, score-1.285]
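
As a quick sanity check on made-up data, TV_H evaluated on the indicator vector of a set C reproduces the hypergraph cut cut_H(C, C̄), as one expects from a Lovasz extension.

```python
import numpy as np

# Toy check: TV_H of an indicator vector equals the hypergraph cut (1).
hyperedges = [{0, 1, 2}, {2, 3, 4}, {4, 5}]
weights = [1.0, 2.0, 0.5]

def tv_h(f):
    f = np.asarray(f, dtype=float)
    return sum(w * (f[list(e)].max() - f[list(e)].min())
               for e, w in zip(hyperedges, weights))

def cut_h(C):
    C = set(C)
    return sum(w for e, w in zip(hyperedges, weights) if e & C and e - C)

C = {0, 1, 2}
indicator = np.array([1.0 if i in C else 0.0 for i in range(6)])
print(tv_h(indicator), cut_h(C))  # both are 2.0
```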

51 There is an interesting relation of the total variation on hypergraphs to sparsity-inducing group norms. [sent-103, score-0.457]

52 Namely, defining for each edge e ∈ E the difference operator D_e : R^{|V|} → R^{|V|×|V|} by (D_e f)_{ij} = f_i − f_j if i, j ∈ e and 0 otherwise, TV_H can be written as TV_H(f) = Σ_{e∈E} w_e ||D_e f||_∞, which can be seen as inducing a group sparse structure on the gradient level. [sent-104, score-0.148]

53 It is known that using the total variation on graphs as a regularization functional in semi-supervised learning (SSL) leads to very spiky solutions for small numbers of labeled points. [sent-107, score-0.307]

54 For graphs this is achieved by using the family of regularization functionals Ω_{G,p} : R^{|V|} → R, Ω_{G,p}(f) = (1/2) Σ_{i,j=1}^n w_ij |f_i − f_j|^p. [sent-109, score-0.254]

55 The regularization functionals Ω_{H,p} : R^{|V|} → R for a hypergraph H = (V, E, w) are defined for p ≥ 1 as Ω_{H,p}(f) = Σ_{e∈E} w_e ( max_{i∈e} f_i − min_{j∈e} f_j )^p. [sent-114, score-0.961]
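
A small sketch of Ω_{H,p} on the same kind of made-up data; p = 1 recovers TV_H, while larger p penalizes a large spread of f within a hyperedge more strongly.

```python
import numpy as np

def omega_hp(hyperedges, weights, f, p=2):
    """Omega_{H,p}(f) = sum_e w_e * (max_{i in e} f_i - min_{j in e} f_j)**p, for p >= 1."""
    f = np.asarray(f, dtype=float)
    return sum(w * (f[list(e)].max() - f[list(e)].min()) ** p
               for e, w in zip(hyperedges, weights))

hyperedges = [{0, 1, 2}, {2, 3, 4}]
weights = [1.0, 2.0]
f = [0.0, 0.2, 1.0, 1.0, 0.9]
print(omega_hp(hyperedges, weights, f, p=1))  # 1*1.0 + 2*0.1 = 1.2, equals TV_H(f)
print(omega_hp(hyperedges, weights, f, p=2))  # 1*1.0**2 + 2*0.1**2 = 1.02
```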

56 Thus, the difference between the hypergraph cut and its approximations such as the clique and star expansion carries over to Ω_{H,p} and Ω_{G_CE,p}, respectively. [sent-121, score-1.313]

57 (Section 3: Semi-supervised Learning.) With the regularization functionals derived in the last section, we can immediately write down a formulation for two-class semi-supervised learning on hypergraphs similar to the well-known approaches of [19, 20]. [sent-122, score-0.45]
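
The extracted text does not spell out the objective, so the following is only an illustrative stand-in in the spirit of [19, 20]: a quadratic data term on the labeled vertices plus Ω_{H,2} as regularizer, minimized by plain subgradient descent. It is not the paper's formulation or its PDHG-based solver, and the hypergraph, labels, weight mu and step size are made up.

```python
import numpy as np

# Schematic two-class SSL sketch: minimize Omega_{H,2}(f) + mu * sum_{i in L} (f_i - y_i)^2.
hyperedges = [{0, 1, 2}, {2, 3, 4}, {4, 5}]
weights = [1.0, 2.0, 0.5]
n, mu = 6, 10.0
labeled = {0: +1.0, 5: -1.0}  # vertex index -> class label

f = np.zeros(n)
for _ in range(500):          # plain subgradient descent with a fixed step size
    g = np.zeros(n)
    for e, w in zip(hyperedges, weights):
        idx = list(e)
        i_max = idx[int(np.argmax(f[idx]))]
        i_min = idx[int(np.argmin(f[idx]))]
        spread = f[i_max] - f[i_min]
        g[i_max] += 2.0 * w * spread   # subgradient of w_e * (max - min)^2
        g[i_min] -= 2.0 * w * spread
    for i, y in labeled.items():
        g[i] += 2.0 * mu * (f[i] - y)
    f -= 0.01 * g

print(np.sign(f))             # predicted classes, anchored by the two labeled vertices
```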

58 This is due to the effect that cutting “out” the labeled points leads to a much smaller cut than, e. [sent-131, score-0.328]

59 In Section 2.1, we discussed the difference between the hypergraph cut (1) and the graph cut of the clique expansion (2) of the hypergraph and gave a simple example in Figure 1 where these cuts yield quite different results. [sent-136, score-2.363]

60 Clearly, this difference carries over to the famous normalized cut criterion introduced in [21, 22] for clustering of graphs with applications in image segmentation. [sent-137, score-0.491]

61 The ratio cut resp. normalized cut can be formulated as RCut(C, C̄) = cut_H(C, C̄) / (|C| |C̄|) and NCut(C, C̄) = cut_H(C, C̄) / (vol(C) vol(C̄)), which incorporate different balancing criteria. [sent-139, score-0.345]

62 Note that, in contrast to the normalized cut for graphs, the normalized hypergraph cut allows no relaxation into a linear eigenproblem (spectral relaxation). [sent-140, score-1.468]

63 Thus, we follow a recent line of research [14, 15, 16, 17] where it has been shown that the standard spectral relaxation of the normalized cut used in spectral clustering [22] is loose and that a tight, in fact exact, relaxation can be formulated in terms of a nonlinear eigenproblem. [sent-141, score-0.545]

64 In this section, we extend their approach to hypergraphs and consider general balanced hypergraph cuts Bcut(C, C̄) of the form Bcut(C, C̄) = cut_H(C, C̄) / Ŝ(C), where Ŝ : 2^V → R_+ is a non-negative, symmetric set function (that is, Ŝ(C) = Ŝ(C̄)). [sent-144, score-1.168]

65 For the normalized cut one has Ŝ(C) = vol(C) vol(C̄), whereas for the Cheeger cut one has Ŝ(C) = min{vol(C), vol(C̄)}. [sent-145, score-0.657]

66 Our following result shows that the balanced hypergraph cut also has an exact relaxation into a continuous nonlinear eigenproblem [14]. [sent-147, score-1.1]

67 Let H = (V, E, w) be a finite, weighted hypergraph and S : R^{|V|} → R be the Lovasz extension of the symmetric, non-negative set function Ŝ : 2^V → R. [sent-150, score-0.761]

68 Then, it holds that min_{f ∈ R^{|V|}} ( Σ_{e∈E} w_e ( max_{i∈e} f_i − min_{j∈e} f_j ) ) / S(f) = min_{C⊂V} cut_H(C, C̄) / Ŝ(C). [sent-151, score-0.183]

69 Then, for every f ∈ R^{|V|} and level sets C_t = {i ∈ V : f_i > t}, it holds that min_{t∈R} cut_H(C_t, C̄_t) / Ŝ(C_t) ≤ ( Σ_{e∈E} w_e ( max_{i∈e} f_i − min_{j∈e} f_j ) ) / S(f). [sent-153, score-0.162]

70 The last part of the theorem shows that "optimal thresholding" (turning f ∈ R^{|V|} into a partition) among all level sets of any f ∈ R^{|V|} can only lead to a better or equal balanced hypergraph cut. [sent-154, score-0.755]
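
A toy sketch of this thresholding step: sweep over the level sets C_t = {i : f_i > t} induced by the entries of f and keep the one with the smallest balanced cut. The Cheeger-type balancing Ŝ(C) = min{vol(C), vol(C̄)} with vol(C) = Σ_{i∈C} d_i and d_i = Σ_{e: i∈e} w_e is one concrete choice made here, and the hypergraph and f are made up.

```python
import numpy as np

hyperedges = [{0, 1, 2}, {2, 3, 4}, {4, 5}]
weights = [1.0, 2.0, 0.5]
n = 6
degrees = np.array([sum(w for e, w in zip(hyperedges, weights) if i in e) for i in range(n)])

def cut_h(C):
    return sum(w for e, w in zip(hyperedges, weights) if e & C and e - C)

def balance(C):
    # Cheeger-type balancing: min{vol(C), vol(complement)}
    vol_C = degrees[list(C)].sum()
    return min(vol_C, degrees.sum() - vol_C)

def best_level_set(f):
    best_val, best_C = np.inf, None
    for t in np.unique(f)[:-1]:   # thresholds between distinct values keep C_t non-trivial
        C = {i for i in range(n) if f[i] > t}
        val = float(cut_h(C) / balance(C))
        if val < best_val:
            best_val, best_C = val, C
    return best_val, best_C

print(best_level_set(np.array([0.9, 0.8, 0.7, 0.2, 0.1, 0.0])))  # -> (0.4, {0, 1, 2})
```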

71 Note that the balancing functions of ratio/normalized cut and Cheeger cut are submodular. [sent-172, score-0.577]

72 Let m_e be the number of vertices in hyperedge e ∈ E. [sent-193, score-0.255]

73 The value c_i = Σ_{e∈E} H_{i,e} is the number of hyperedges the vertex i lies in. [sent-204, score-0.149]

74 Moreover, we introduce for every hyperedge e ∈ E the functional F_e(α_e) = λ w_e (max(α_e) − min(α_e))^2. [sent-208, score-0.251]

75 As we show in the supplementary material, the conjugate functions F_e^* are not indicator functions and we thus solve the corresponding proximal problems via proximal problems for F_e. [sent-210, score-0.16]

76 Note that the clique expansion leads for all datasets to a graph which is close to being fully connected as all datasets contain large hyperedges. [sent-215, score-0.384]

77 For covertype (6,7), the weight matrix needs over 10 GB of memory, whereas the original hypergraph needs only 4 MB. [sent-216, score-0.828]

78 For any σ > 0 and any α̃_e ∈ R^{m_e}, the proximal map prox_{(1/σ) F_e}(α̃_e) = argmin_{α_e ∈ R^{m_e}} { (1/2) ||α_e − α̃_e||_2^2 + (1/σ) λ w_e (max(α_e) − min(α_e))^2 } can be computed with O(m_e log m_e) arithmetic operations. [sent-219, score-0.14]
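
A crude numerical check of this proximal map, using a generic derivative-free optimizer rather than the O(m_e log m_e) routine stated above; λ, w_e, σ and the input α̃_e are made-up values.

```python
import numpy as np
from scipy.optimize import minimize

# Crude numerical reference for prox_{(1/sigma) F_e}; NOT the O(m_e log m_e) algorithm.
lam, w_e, sigma = 0.5, 2.0, 1.0                     # made-up parameters
alpha_tilde = np.array([3.0, -1.0, 0.5, 0.2, 2.5])  # made-up input point

def objective(alpha):
    return (0.5 * np.sum((alpha - alpha_tilde) ** 2)
            + (lam * w_e / sigma) * (alpha.max() - alpha.min()) ** 2)

res = minimize(objective, alpha_tilde, method="Nelder-Mead",
               options={"xatol": 1e-9, "fatol": 1e-9, "maxiter": 50000, "maxfev": 50000})
# The prox pulls the entries towards each other, shrinking max(alpha) - min(alpha):
print(alpha_tilde.max() - alpha_tilde.min(), res.x.max() - res.x.min())
```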

79 As in [11], a hyperedge of weight one is created by all data points which have the same value of a categorical feature. [sent-227, score-0.278]

80 In [11], they suggest using a regularizer induced by the normalized Laplacian L_CE arising from the clique expansion, L_CE = I − D_CE^{-1/2} H W H^T D_CE^{-1/2}, where D_CE is a diagonal matrix with entries d_CE(i) = Σ_{e∈E} H_{i,e} w_e/|e| and W ∈ R^{|E|×|E|} is a diagonal matrix with entries w(e) = w_e/|e|. [sent-231, score-0.309]
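
A small numpy sketch of L_CE assembled from an incidence matrix; the hypergraph below is made up, and the vertex degrees d_CE(i) = Σ_{e∈E} H_{i,e} w_e/|e| follow the formula as reconstructed above, so treat that definition as an assumption.

```python
import numpy as np

# Made-up hypergraph: incidence matrix H[i, e] = 1 if vertex i is in hyperedge e.
H = np.array([[1, 0],
              [1, 1],
              [1, 1],
              [0, 1]], dtype=float)
w_e = np.array([1.0, 2.0])        # hyperedge weights
edge_sizes = H.sum(axis=0)        # |e|

W = np.diag(w_e / edge_sizes)     # |E| x |E| diagonal matrix with entries w_e / |e|
d_ce = H @ (w_e / edge_sizes)     # assumed degree d_CE(i)
D_inv_sqrt = np.diag(1.0 / np.sqrt(d_ce))

L_CE = np.eye(H.shape[0]) - D_inv_sqrt @ H @ W @ H.T @ D_inv_sqrt
print(np.round(L_CE, 3))
```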

81 For the largest example (covertype 6,7), where the clique expansion fails due to memory problems, our method takes 30-100s (depending on λ). [sent-236, score-0.254]

82 Our SSL methods based on Ω_{H,p}, p = 1, 2, consistently outperform the clique expansion technique of Zhou et al. [11] on all datasets except 20newsgroups. [sent-243, score-0.268]

83 A method based on pairwise interaction such as the clique expansion can better deal with such label noise, as the large hyperedges for this dataset accumulate the label noise. (The 20newsgroups data used here is a modified version by Sam Roweis of the original 20 newsgroups dataset, available at http://www.) [sent-246, score-0.27]

84 Communications with the authors of [11] could not clarify the difference to their results on 20newsgroups. [sent-251, score-0.136]

85 On all other datasets we observe that incorporating hypergraph structure leads to much better results. [sent-252, score-0.746]

86 As expected, our squared TV functional (p = 2) slightly outperforms the total variation (p = 1), even though the difference is small. [sent-253, score-0.155]

87 We use the normalized hypergraph cut as clustering objective. [sent-480, score-1.093]

88 For more than two clusters we recursively partition the hypergraph until the desired number of clusters is reached. [sent-481, score-0.733]
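
The recursion can be sketched as the loop below; `bipartition` is a placeholder for any routine that splits a vertex set into two non-empty parts (for instance a minimizer of the normalized hypergraph cut), and the rule of always splitting the largest cluster is just one simple heuristic, not necessarily the paper's.

```python
def recursive_partition(vertices, k, bipartition):
    """Split `vertices` into k clusters by repeated two-way cuts.

    `bipartition` is a placeholder: any callable mapping a set of vertices to
    two non-empty subsets (e.g. a normalized hypergraph cut minimizer).
    """
    clusters = [set(vertices)]
    while len(clusters) < k:
        clusters.sort(key=len, reverse=True)   # heuristic: split the largest cluster next
        target = clusters.pop(0)
        A, B = bipartition(target)
        clusters.extend([set(A), set(B)])
    return clusters

# Example with a dummy bipartition that simply halves the vertex set:
dummy = lambda S: (list(S)[: len(S) // 2], list(S)[len(S) // 2:])
print(recursive_partition(range(8), 3, dummy))
```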

89 For comparison we use the normalized spectral clustering approach based on the Laplacian LCE [11](clique expansion). [sent-482, score-0.141]

90 The first part (first 6 columns) of the following table shows the clustering errors (majority vote on each cluster) of both methods as well as the normalized cuts achieved by these methods on the hypergraph and on the graph resulting from the clique expansion. [sent-483, score-1.103]

91 The number k is chosen as the smallest number for which the graph becomes connected and we compare results of normalized 1-spectral clustering [14] and the standard spectral clustering [22]. [sent-485, score-0.274]

92 Note that the employed hypergraph construction has no free parameter. [sent-486, score-0.713]

93 (Table, first columns: clustering error % of our method and of [11] on Mushrooms, Zoo, 20-newsgroup, covertype (4,5) and covertype (6,7).) [sent-487, score-0.186]

94 0041 First, we observe that our approach optimizing the normalized hypergraph cut yields better or similar results in terms of clustering errors compared to the clique expansion (except for 20-newsgroup for the same reason given in the previous paragraph). [sent-532, score-1.347]

95 Moreover, our method sometimes has even smaller cuts on the graphs resulting from the clique expansion, although it does not directly optimize this objective in contrast to [11]. [sent-535, score-0.301]

96 Second, the comparison to a standard graph-based approach where the similarity structure is obtained using the Hamming distance on the categorical features shows that using hypergraph structure is indeed useful. [sent-537, score-0.745]

97 Nevertheless, we think that there is room for improvement regarding the construction of the hypergraph which is a topic for future research. [sent-538, score-0.713]

98 Modeling hypergraphs by graphs with the same mincut properties. [sent-629, score-0.414]

99 An inverse power method for nonlinear eigenproblems with applications in 1-spectral clustering and sparse PCA. [sent-634, score-0.139]

100 Beyond spectral clustering - tight relaxations of balanced graph cuts. [sent-644, score-0.209]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('hypergraph', 0.713), ('hypergraphs', 0.343), ('cut', 0.273), ('hyperedge', 0.224), ('clique', 0.16), ('cuth', 0.119), ('lovasz', 0.119), ('tvh', 0.106), ('ssl', 0.097), ('expansion', 0.094), ('covertype', 0.093), ('hyperedges', 0.092), ('pdhg', 0.092), ('variation', 0.088), ('functionals', 0.074), ('proximal', 0.073), ('graphs', 0.071), ('fe', 0.071), ('vol', 0.07), ('cuts', 0.07), ('ke', 0.066), ('cutce', 0.066), ('lce', 0.066), ('clustering', 0.066), ('fi', 0.062), ('fj', 0.058), ('zoo', 0.058), ('prox', 0.053), ('graph', 0.053), ('mushrooms', 0.053), ('ratiodca', 0.053), ('rme', 0.053), ('ncut', 0.047), ('zhou', 0.046), ('balanced', 0.042), ('normalized', 0.041), ('dce', 0.04), ('spectral', 0.034), ('relaxation', 0.033), ('star', 0.033), ('regularization', 0.033), ('categorical', 0.032), ('cheeger', 0.032), ('laplacian', 0.031), ('extension', 0.031), ('convex', 0.031), ('vertices', 0.031), ('balancing', 0.031), ('ci', 0.03), ('hein', 0.029), ('vertex', 0.027), ('functional', 0.027), ('bcut', 0.026), ('setzer', 0.026), ('total', 0.026), ('carries', 0.026), ('proxg', 0.023), ('spiky', 0.023), ('eigenproblems', 0.023), ('rangapuram', 0.023), ('eigenproblem', 0.023), ('submodular', 0.022), ('tensor', 0.022), ('weight', 0.022), ('ct', 0.021), ('min', 0.021), ('partition', 0.02), ('labeled', 0.02), ('leads', 0.019), ('wij', 0.018), ('weighted', 0.017), ('inner', 0.017), ('splitting', 0.017), ('ihler', 0.017), ('semisupervised', 0.017), ('music', 0.016), ('nonlinear', 0.016), ('fully', 0.016), ('cutting', 0.016), ('pairwise', 0.016), ('relations', 0.015), ('loose', 0.015), ('positively', 0.015), ('label', 0.015), ('recommend', 0.015), ('datasets', 0.014), ('regularizer', 0.014), ('ps', 0.014), ('minimum', 0.014), ('arg', 0.014), ('connected', 0.014), ('imaging', 0.014), ('duality', 0.014), ('attained', 0.014), ('analogy', 0.014), ('supplementary', 0.014), ('difference', 0.014), ('edge', 0.014), ('tight', 0.014), ('subgraph', 0.013)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000001 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited

Author: Matthias Hein, Simon Setzer, Leonardo Jost, Syama Sundar Rangapuram

Abstract: Hypergraphs allow one to encode higher-order relationships in data and are thus a very flexible modeling tool. Current learning methods are either based on approximations of the hypergraphs via graphs or on tensor methods which are only applicable under special conditions. In this paper, we present a new learning framework on hypergraphs which fully uses the hypergraph structure. The key element is a family of regularization functionals based on the total variation on hypergraphs. 1

2 0.097404532 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning

Author: Xiao-Ming Wu, Zhenguo Li, Shih-Fu Chang

Abstract: We find that various well-known graph-based models exhibit a common important harmonic structure in its target function – the value of a vertex is approximately the weighted average of the values of its adjacent neighbors. Understanding of such structure and analysis of the loss defined over such structure help reveal important properties of the target function over a graph. In this paper, we show that the variation of the target function across a cut can be upper and lower bounded by the ratio of its harmonic loss and the cut cost. We use this to develop an analytical tool and analyze five popular graph-based models: absorbing random walks, partially absorbing random walks, hitting times, pseudo-inverse of the graph Laplacian, and eigenvectors of the Laplacian matrices. Our analysis sheds new insights into several open questions related to these models, and provides theoretical justifications and guidelines for their practical use. Simulations on synthetic and real datasets confirm the potential of the proposed theory and tool.

3 0.09636759 202 nips-2013-Multiclass Total Variation Clustering

Author: Xavier Bresson, Thomas Laurent, David Uminsky, James von Brecht

Abstract: Ideas from the image processing literature have recently motivated a new set of clustering algorithms that rely on the concept of total variation. While these algorithms perform well for bi-partitioning tasks, their recursive extensions yield unimpressive results for multiclass clustering tasks. This paper presents a general framework for multiclass total variation clustering that does not rely on recursion. The results greatly outperform previous total variation algorithms and compare well with state-of-the-art NMF approaches. 1

4 0.083679378 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction

Author: Jukka Corander, Tomi Janhunen, Jussi Rintanen, Henrik Nyman, Johan Pensar

Abstract: We investigate the problem of learning the structure of a Markov network from data. It is shown that the structure of such networks can be described in terms of constraints which enables the use of existing solver technology with optimization capabilities to compute optimal networks starting from initial scores computed from the data. To achieve efficient encodings, we develop a novel characterization of Markov network structure using a balancing condition on the separators between cliques forming the network. The resulting translations into propositional satisfiability and its extensions such as maximum satisfiability, satisfiability modulo theories, and answer set programming, enable us to prove optimal certain networks which have been previously found by stochastic search. 1

5 0.076409139 268 nips-2013-Reflection methods for user-friendly submodular optimization

Author: Stefanie Jegelka, Francis Bach, Suvrit Sra

Abstract: Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. Consequently, there is need for efficient optimization procedures for submodular functions, especially for minimization problems. While general submodular minimization is challenging, we propose a new method that exploits existing decomposability of submodular functions. In contrast to previous approaches, our method is neither approximate, nor impractical, nor does it need any cumbersome parameter tuning. Moreover, it is easy to implement and parallelize. A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution. This method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction. In our experiments, we illustrate the benefits of our method on two image segmentation tasks. 1

6 0.065632097 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic

7 0.061324697 56 nips-2013-Better Approximation and Faster Algorithm Using the Proximal Average

8 0.053059634 73 nips-2013-Convex Relaxations for Permutation Problems

9 0.04653598 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching

10 0.04638236 11 nips-2013-A New Convex Relaxation for Tensor Completion

11 0.045059975 170 nips-2013-Learning with Invariance via Linear Functionals on Reproducing Kernel Hilbert Space

12 0.041899126 158 nips-2013-Learning Multiple Models via Regularized Weighting

13 0.041499771 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization

14 0.03794086 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel

15 0.037643857 78 nips-2013-Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions

16 0.034635741 313 nips-2013-Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization

17 0.030364772 192 nips-2013-Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation

18 0.029737648 75 nips-2013-Convex Two-Layer Modeling

19 0.029144783 87 nips-2013-Density estimation from unweighted k-nearest neighbor graphs: a roadmap

20 0.02903148 7 nips-2013-A Gang of Bandits


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.086), (1, 0.033), (2, 0.029), (3, 0.027), (4, 0.041), (5, 0.018), (6, -0.041), (7, -0.074), (8, 0.012), (9, -0.012), (10, -0.003), (11, -0.016), (12, 0.065), (13, -0.039), (14, 0.02), (15, 0.013), (16, -0.015), (17, 0.006), (18, -0.019), (19, -0.014), (20, 0.008), (21, 0.021), (22, 0.072), (23, -0.033), (24, -0.023), (25, -0.065), (26, 0.026), (27, 0.01), (28, -0.033), (29, 0.092), (30, -0.104), (31, 0.028), (32, 0.057), (33, 0.028), (34, -0.025), (35, -0.027), (36, 0.03), (37, -0.014), (38, -0.032), (39, 0.044), (40, 0.044), (41, 0.059), (42, 0.001), (43, 0.036), (44, -0.034), (45, 0.008), (46, -0.069), (47, -0.062), (48, 0.01), (49, 0.023)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.91017532 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited

Author: Matthias Hein, Simon Setzer, Leonardo Jost, Syama Sundar Rangapuram

Abstract: Hypergraphs allow one to encode higher-order relationships in data and are thus a very flexible modeling tool. Current learning methods are either based on approximations of the hypergraphs via graphs or on tensor methods which are only applicable under special conditions. In this paper, we present a new learning framework on hypergraphs which fully uses the hypergraph structure. The key element is a family of regularization functionals based on the total variation on hypergraphs. 1

2 0.76965702 202 nips-2013-Multiclass Total Variation Clustering

Author: Xavier Bresson, Thomas Laurent, David Uminsky, James von Brecht

Abstract: Ideas from the image processing literature have recently motivated a new set of clustering algorithms that rely on the concept of total variation. While these algorithms perform well for bi-partitioning tasks, their recursive extensions yield unimpressive results for multiclass clustering tasks. This paper presents a general framework for multiclass total variation clustering that does not rely on recursion. The results greatly outperform previous total variation algorithms and compare well with state-of-the-art NMF approaches. 1

3 0.6690622 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning

Author: Xiao-Ming Wu, Zhenguo Li, Shih-Fu Chang

Abstract: We find that various well-known graph-based models exhibit a common important harmonic structure in its target function – the value of a vertex is approximately the weighted average of the values of its adjacent neighbors. Understanding of such structure and analysis of the loss defined over such structure help reveal important properties of the target function over a graph. In this paper, we show that the variation of the target function across a cut can be upper and lower bounded by the ratio of its harmonic loss and the cut cost. We use this to develop an analytical tool and analyze five popular graph-based models: absorbing random walks, partially absorbing random walks, hitting times, pseudo-inverse of the graph Laplacian, and eigenvectors of the Laplacian matrices. Our analysis sheds new insights into several open questions related to these models, and provides theoretical justifications and guidelines for their practical use. Simulations on synthetic and real datasets confirm the potential of the proposed theory and tool.

4 0.62987709 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic

Author: James L. Sharpnack, Akshay Krishnamurthy, Aarti Singh

Abstract: The detection of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. Because this test is computationally infeasible, we provide a relaxation, called the Lov´ sz extended scan statistic (LESS) that uses submodularity to approximate the a intractable generalized likelihood ratio. We demonstrate a connection between LESS and maximum a-posteriori inference in Markov random fields, which provides us with a poly-time algorithm for LESS. Using electrical network theory, we are able to control type 1 error for LESS and prove conditions under which LESS is risk consistent. Finally, we consider specific graph models, the torus, knearest neighbor graphs, and ǫ-random graphs. We show that on these graphs our results provide near-optimal performance by matching our results to known lower bounds. 1

5 0.56088412 249 nips-2013-Polar Operators for Structured Sparse Estimation

Author: Xinhua Zhang, Yao-Liang Yu, Dale Schuurmans

Abstract: Structured sparse estimation has become an important technique in many areas of data analysis. Unfortunately, these estimators normally create computational difficulties that entail sophisticated algorithms. Our first contribution is to uncover a rich class of structured sparse regularizers whose polar operator can be evaluated efficiently. With such an operator, a simple conditional gradient method can then be developed that, when combined with smoothing and local optimization, significantly reduces training time vs. the state of the art. We also demonstrate a new reduction of polar to proximal maps that enables more efficient latent fused lasso. 1

6 0.55763978 87 nips-2013-Density estimation from unweighted k-nearest neighbor graphs: a roadmap

7 0.55568969 350 nips-2013-Wavelets on Graphs via Deep Learning

8 0.55544287 56 nips-2013-Better Approximation and Faster Algorithm Using the Proximal Average

9 0.54946226 215 nips-2013-On Decomposing the Proximal Map

10 0.52568364 73 nips-2013-Convex Relaxations for Permutation Problems

11 0.52368248 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching

12 0.51586711 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation

13 0.48558518 268 nips-2013-Reflection methods for user-friendly submodular optimization

14 0.46197852 25 nips-2013-Adaptive Anonymity via $b$-Matching

15 0.43957773 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation

16 0.42627558 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction

17 0.41951594 214 nips-2013-On Algorithms for Sparse Multi-factor NMF

18 0.41915476 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel

19 0.4162125 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization

20 0.41403642 170 nips-2013-Learning with Invariance via Linear Functionals on Reproducing Kernel Hilbert Space


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(1, 0.296), (13, 0.01), (16, 0.023), (33, 0.127), (34, 0.14), (41, 0.033), (49, 0.016), (56, 0.059), (70, 0.026), (85, 0.038), (89, 0.023), (93, 0.034), (95, 0.033), (96, 0.011), (99, 0.032)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.74780357 112 nips-2013-Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising

Author: Min Xu, Tao Qin, Tie-Yan Liu

Abstract: In search advertising, the search engine needs to select the most profitable advertisements to display, which can be formulated as an instance of online learning with partial feedback, also known as the stochastic multi-armed bandit (MAB) problem. In this paper, we show that the naive application of MAB algorithms to search advertising for advertisement selection will produce sample selection bias that harms the search engine by decreasing expected revenue and “estimation of the largest mean” (ELM) bias that harms the advertisers by increasing game-theoretic player-regret. We then propose simple bias-correction methods with benefits to both the search engine and the advertisers. 1

same-paper 2 0.73923069 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited

Author: Matthias Hein, Simon Setzer, Leonardo Jost, Syama Sundar Rangapuram

Abstract: Hypergraphs allow one to encode higher-order relationships in data and are thus a very flexible modeling tool. Current learning methods are either based on approximations of the hypergraphs via graphs or on tensor methods which are only applicable under special conditions. In this paper, we present a new learning framework on hypergraphs which fully uses the hypergraph structure. The key element is a family of regularization functionals based on the total variation on hypergraphs. 1

3 0.69502819 324 nips-2013-The Fast Convergence of Incremental PCA

Author: Akshay Balsubramani, Sanjoy Dasgupta, Yoav Freund

Abstract: We consider a situation in which we see samples Xn ∈ Rd drawn i.i.d. from some distribution with mean zero and unknown covariance A. We wish to compute the top eigenvector of A in an incremental fashion - with an algorithm that maintains an estimate of the top eigenvector in O(d) space, and incrementally adjusts the estimate with each new data point that arrives. Two classical such schemes are due to Krasulina (1969) and Oja (1983). We give finite-sample convergence rates for both. 1

4 0.62902981 235 nips-2013-Online learning in episodic Markovian decision processes by relative entropy policy search

Author: Alexander Zimin, Gergely Neu

Abstract: We study the problem of online learning in finite episodic Markov decision processes (MDPs) where the loss function is allowed to change between episodes. The natural performance measure in this learning problem is the regret defined as the difference between the total loss of the best stationary policy and the total loss suffered by the learner. We assume that the learner is given access to a finite action space A and the state space X has a layered structure with L layers, so that state transitions are only possible between consecutive layers. We describe a variant of the recently proposed Relative Entropy Policy Search algorithm and show that its regret after T episodes is 2 L|X ||A|T log(|X ||A|/L) in the bandit setting and 2L T log(|X ||A|/L) in the full information setting, given that the learner has perfect knowledge of the transition probabilities of the underlying MDP. These guarantees largely improve previously known results under much milder assumptions and cannot be significantly improved under general assumptions. 1

5 0.62709337 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent

Author: Tianbao Yang

Abstract: We present and study a distributed optimization algorithm by employing a stochastic dual coordinate ascent method. Stochastic dual coordinate ascent methods enjoy strong theoretical guarantees and often have better performances than stochastic gradient descent methods in optimizing regularized loss minimization problems. It still lacks of efforts in studying them in a distributed framework. We make a progress along the line by presenting a distributed stochastic dual coordinate ascent algorithm in a star network, with an analysis of the tradeoff between computation and communication. We verify our analysis by experiments on real data sets. Moreover, we compare the proposed algorithm with distributed stochastic gradient descent methods and distributed alternating direction methods of multipliers for optimizing SVMs in the same distributed framework, and observe competitive performances. 1

6 0.56327707 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture

7 0.56170803 173 nips-2013-Least Informative Dimensions

8 0.55957443 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles

9 0.55842435 201 nips-2013-Multi-Task Bayesian Optimization

10 0.55733943 346 nips-2013-Variational Inference for Mahalanobis Distance Metrics in Gaussian Process Regression

11 0.55727202 239 nips-2013-Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result

12 0.55674452 143 nips-2013-Integrated Non-Factorized Variational Inference

13 0.55521226 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields

14 0.55508202 294 nips-2013-Similarity Component Analysis

15 0.55388796 169 nips-2013-Learning to Prune in Metric and Non-Metric Spaces

16 0.55356163 312 nips-2013-Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex

17 0.55335313 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data

18 0.55158478 152 nips-2013-Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions

19 0.55152375 115 nips-2013-Factorized Asymptotic Bayesian Inference for Latent Feature Models

20 0.55024439 105 nips-2013-Efficient Optimization for Sparse Gaussian Process Regression