nips nips2013 nips2013-328 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Matthias Hein, Simon Setzer, Leonardo Jost, Syama Sundar Rangapuram
Abstract: Hypergraphs allow one to encode higher-order relationships in data and are thus a very flexible modeling tool. Current learning methods are either based on approximations of the hypergraphs via graphs or on tensor methods which are only applicable under special conditions. In this paper, we present a new learning framework on hypergraphs which fully uses the hypergraph structure. The key element is a family of regularization functionals based on the total variation on hypergraphs. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Current learning methods are either based on approximations of the hypergraphs via graphs or on tensor methods which are only applicable under special conditions. [sent-2, score-0.436]
2 In this paper, we present a new learning framework on hypergraphs which fully uses the hypergraph structure. [sent-3, score-1.072]
3 The key element is a family of regularization functionals based on the total variation on hypergraphs. [sent-4, score-0.221]
4 The first one uses tensor methods for clustering as the higher-order extension of matrix (spectral) methods for graphs [7, 8, 9]. [sent-9, score-0.19]
5 While tensor methods are mathematically quite appealing, they are limited to so-called k-uniform hypergraphs, that is, each hyperedge contains exactly k vertices. [sent-10, score-0.246]
6 The second main approach can deal with arbitrary hypergraphs [10, 11]. [sent-12, score-0.343]
7 The basic idea of this line of work is to approximate the hypergraph via a standard weighted graph. [sent-13, score-0.73]
8 The two main ways of approximating the hypergraph by a standard graph are the clique and the star expansion which were compared in [12]. [sent-15, score-1.053]
9 One can summarize [12] by stating that no approximation fully encodes the hypergraph structure. [sent-16, score-0.729]
10 Earlier, [13] have proven that an exact representation of the hypergraph via a graph retaining its cut properties is impossible. [sent-17, score-1.039]
11 For both clustering and semi-supervised learning the key element, either explicitly or implicitly, is the cut functional. [sent-19, score-0.356]
12 Our aim is to directly work with the cut defined on the hypergraph. [sent-20, score-0.273]
13 We discuss in detail the differences between the hypergraph cut and the cut induced by the clique and star expansion in Section 2.1. [sent-21, score-1.546]
14 In Section 2.2, we introduce the total variation on a hypergraph as the Lovasz extension of the hypergraph cut. [sent-24, score-1.571]
15 Based on this, we propose a family of regularization functionals which interpolate between the total variation and a regularization functional enforcing smoother functions on the hypergraph corresponding to Laplacian-type regularization on graphs. [sent-25, score-1.027]
16 In Section 4, we show in line with recent research [14, 15, 16, 17] that there exists a tight relaxation of the normalized hypergraph cut. [sent-27, score-0.801]
17 In the experimental section 6, we show that fully incorporating hypergraph structure is beneficial. [sent-30, score-0.729]
18 In Section 2.2, we introduce, in analogy to graphs, the total variation on hypergraphs as the Lovasz extension of the hypergraph cut. [sent-36, score-1.215]
19 In this paper, we consider weighted undirected hypergraphs H = (V, E, w) where V is the vertex set with |V | = n and E the set of hyperedges with |E| = m. [sent-39, score-0.479]
20 Each hyperedge e ∈ E corresponds to a subset of vertices, i.e., e ⊆ V. [sent-40, score-0.224]
21 The vector w ∈ Rm contains for each hyperedge e its non-negative weight we . [sent-43, score-0.246]
22 We would like to emphasize that we do not impose the restriction that the hypergraph is k-uniform, i.e., we do not require all hyperedges to contain the same number k of vertices. [sent-47, score-0.713]
23 The considered class of hypergraphs contains the set of undirected, weighted graphs which is equivalent to the set of 2-uniform hypergraphs. [sent-50, score-0.431]
24 The motivation for the total variation on hypergraphs comes from the correspondence between the cut on a graph and the total variation functional. [sent-51, score-0.897]
25 Thus, we recall the definition of the cut on weighted graphs G = (V, W ) with weight matrix W . [sent-52, score-0.383]
26 Then, for a partition (C, C̄), the cut is defined as cut_G(C, C̄) = Σ_{i∈C, j∈C̄} w_ij. [sent-54, score-0.311]
27 This standard definition of the cut carries over naturally to a hypergraph H: cut_H(C, C̄) = Σ_{e∈E: e∩C≠∅, e∩C̄≠∅} w_e. (1) [sent-55, score-1.012]
28 Thus, the cut functional on a hypergraph is just the sum of the weights of the hyperedges which have vertices both in C and C̄. [sent-56, score-1.136]
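To make the two cut functionals concrete, the following is a minimal sketch (not part of the paper's code) of how cut_G and the hypergraph cut (1) could be evaluated for a given partition; the hypergraph is assumed to be stored as a list of vertex sets with one weight per hyperedge, and all names are illustrative.

import numpy as np

def graph_cut(W, C):
    # cut_G(C, C_bar): sum of w_ij over i in C, j in the complement (W symmetric)
    n = W.shape[0]
    in_C = np.zeros(n, dtype=bool)
    in_C[list(C)] = True
    return W[np.ix_(in_C, ~in_C)].sum()

def hypergraph_cut(edges, weights, C):
    # cut_H(C, C_bar): sum of w_e over hyperedges meeting both C and its complement, eq. (1)
    C = set(C)
    return sum(w for e, w in zip(edges, weights) if (set(e) & C) and (set(e) - C))

# toy example: 4 vertices, hyperedges {0,1,2} and {2,3}
edges, weights = [{0, 1, 2}, {2, 3}], [1.0, 0.5]
print(hypergraph_cut(edges, weights, C={0, 1}))  # only {0,1,2} is cut -> 1.0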
29 It is not biased towards a particular way the hyperedge is cut, that is, how many vertices of the hyperedge are in C resp. C̄. [sent-57, score-0.479]
30 This emphasizes that the vertices in a hyperedge belong together and we penalize every cut of a hyperedge with the same value. [sent-59, score-0.752]
31 In order to handle hypergraphs with existing methods developed for graphs, the focus in previous works [11, 12] has been on transforming the hypergraph into a graph. [sent-60, score-1.056]
32 In [11], they suggest using the clique expansion (CE), i.e. [sent-61, score-0.254]
33 every hyperedge e ∈ H is replaced with a fully connected subgraph where every edge in this subgraph has weight w_e/|e|. [sent-63, score-0.316]
34 This leads to the cut functional cut_CE, cut_CE(C, C̄) := Σ_{e∈E: e∩C≠∅, e∩C̄≠∅} (w_e/|e|) |e∩C| |e∩C̄|. (2) [sent-64, score-0.319]
35 Note that in contrast to the hypergraph cut (1), the value of cut_CE depends on the way each hyperedge is cut, since the term |e∩C| |e∩C̄| makes the weights dependent on the partition. [sent-65, score-1.483]
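For comparison, here is a sketch of the clique-expansion cut (2) under the same assumed representation as above; note how the contribution of a cut hyperedge now depends on |e∩C| and |e∩C̄|.

def clique_expansion_cut(edges, weights, C):
    # cut_CE(C, C_bar) = sum_e (w_e / |e|) * |e ∩ C| * |e ∩ C_bar|, eq. (2)
    C = set(C)
    total = 0.0
    for e, w in zip(edges, weights):
        a = len(set(e) & C)           # vertices of e inside C
        b = len(e) - a                # vertices of e in the complement
        total += w / len(e) * a * b   # zero unless e is actually cut
    return total

For a hyperedge with |e| = 4, splitting off a single vertex contributes (w_e/4)·1·3 whereas the balanced split contributes (w_e/4)·2·2, which is exactly the partition-dependent weighting discussed in the surrounding text.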
36 In particular, the smallest weight is attained if only a single vertex is split off, whereas the largest weight is attained if the partition of the hyperedge is most balanced. [sent-66, score-0.343]
37 In comparison to the hypergraph cut, this leads to a bias towards cuts that favor splitting off single vertices from a hyperedge, which from our point of view is an undesirable property for most applications. [sent-67, score-1.074]
38 We illustrate this with an example in Figure 1, where the minimum hypergraph cut (cutH ) leads to a balanced partition, whereas the minimum clique expansion cut (cutCE ) not only cuts an additional hyperedge but is also unbalanced. [sent-68, score-1.896]
39 Another argument against the clique expansion is computational complexity. [sent-70, score-0.254]
40 For large hyperedges the clique expansion leads to (almost) fully connected graphs which makes computations slow and is prohibitive for large hypergraphs. [sent-71, score-0.466]
41 We omit the discussion of the star graph approximation of hypergraphs discussed in [12] as it is shown there that the star graph expansion is very similar to the clique expansion. [sent-72, score-0.769]
42 This is in line with the result of [13], which states that in general there exists no graph with the same vertex set V which has for every partition (C, C̄) the same cut value as the hypergraph cut. [sent-74, score-1.086]
43 Figure 1 contrasts the minimum hypergraph cut cut_H with the minimum cut of the clique expansion cut_CE for edge weights w1 = w4 = 10 and w2 = w5 = 0.6. [sent-76, score-0.555]
44 The minimum hypergraph cut is (C1, C̄1), which is perfectly balanced. [sent-78, score-1.0]
45 Although cutting one hyperedge more and being unbalanced, (C2, C̄2) is the optimal cut for the clique expansion approximation. [sent-79, score-0.767]
46 Finally, note that for weighted 3-uniform hypergraphs it is always possible to find a corresponding graph such that any cut of the graph is equal to the corresponding cut of the hypergraph. [sent-80, score-1.012]
47 Then, W ∈ R^{|V|×|V|} defined as W = (1/2) H diag(w) H^T defines the weight matrix of a graph G = (V, W) where each cut of G has the same value as the corresponding hypergraph cut of H. [sent-84, score-1.334]
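The 3-uniform reduction can be checked numerically, reusing numpy and the graph_cut and hypergraph_cut helpers sketched above; the incidence-matrix construction below is an assumption of this sketch, not code from the paper.

def incidence_matrix(edges, n):
    # H in R^{|V| x |E|} with H[i, j] = 1 iff vertex i lies in hyperedge j
    H = np.zeros((n, len(edges)))
    for j, e in enumerate(edges):
        H[list(e), j] = 1.0
    return H

edges, w = [{0, 1, 2}, {1, 2, 3}], np.array([2.0, 1.0])   # 3-uniform example
H = incidence_matrix(edges, n=4)
W = 0.5 * H @ np.diag(w) @ H.T          # W = (1/2) H diag(w) H^T
np.fill_diagonal(W, 0.0)                # self-loops never contribute to a cut
C = {0, 1}
print(graph_cut(W, C), hypergraph_cut(edges, w, C))   # both equal 3.0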
48 For graphs G = (V, W), the total variation on graphs is defined as the Lovasz extension of the graph cut [18], given as TV_G : R^{|V|} → R, TV_G(f) = (1/2) Σ_{i,j=1}^{n} w_ij |f_i − f_j|. [sent-98, score-0.689]
49 The total variation TV_H : R^{|V|} → R on a hypergraph H = (V, E, w), defined as the Lovasz extension of the hypergraph cut Ŝ(C) = cut_H(C, C̄), is a convex function given by TV_H(f) = Σ_{e∈E} w_e (max_{i∈e} f_i − min_{j∈e} f_j) = Σ_{e∈E} w_e max_{i,j∈e} |f_i − f_j|. [sent-101, score-1.801]
50 Note that the total variation on a hypergraph reduces to the total variation on graphs if H is 2-uniform (standard graph). [sent-102, score-1.285]
51 There is an interesting relation of the total variation on hypergraphs to sparsity inducing group norms. [sent-103, score-0.457]
52 Namely, defining for each hyperedge e ∈ E the difference operator D_e : R^{|V|} → R^{|V|×|V|} by (D_e f)_ij = f_i − f_j if i, j ∈ e and 0 otherwise, TV_H can be written as TV_H(f) = Σ_{e∈E} w_e ||D_e f||_∞, which can be seen as inducing group-sparse structure on the gradient level. [sent-104, score-0.148]
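Both expressions for TV_H, the max–min form and the group-norm form with the operators D_e, are straightforward to evaluate; a sketch under the same assumed list-of-hyperedges representation:

def tv_hypergraph(edges, weights, f):
    # TV_H(f) = sum_e w_e * (max_{i in e} f_i - min_{j in e} f_j)
    return sum(w * (max(f[i] for i in e) - min(f[i] for i in e))
               for e, w in zip(edges, weights))

def tv_hypergraph_groupnorm(edges, weights, f):
    # equivalent form: sum_e w_e * ||D_e f||_inf with (D_e f)_{ij} = f_i - f_j for i, j in e
    return sum(w * max(abs(f[i] - f[j]) for i in e for j in e)
               for e, w in zip(edges, weights))

For an indicator vector f = 1_C both evaluate to cut_H(C, C̄), which is exactly the Lovasz-extension property.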
53 It is known that using the total variation on graphs as a regularization functional in semi-supervised learning (SSL) leads to very spiky solutions for small numbers of labeled points. [sent-107, score-0.307]
54 For graphs this is achieved by using the family of regularization functionals Ω_{G,p} : R^{|V|} → R, Ω_{G,p}(f) = (1/2) Σ_{i,j=1}^{n} w_ij |f_i − f_j|^p. [sent-109, score-0.254]
55 The regularization functionals Ω_{H,p} : R^{|V|} → R for a hypergraph H = (V, E, w) are defined for p ≥ 1 as Ω_{H,p}(f) = Σ_{e∈E} w_e (max_{i∈e} f_i − min_{j∈e} f_j)^p. [sent-114, score-0.961]
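The regularizer Ω_{H,p} only raises the per-hyperedge difference to the power p, so the previous sketch generalizes directly, with Ω_{H,1} = TV_H:

def omega_hypergraph(edges, weights, f, p=2):
    # Omega_{H,p}(f) = sum_e w_e * (max_{i in e} f_i - min_{j in e} f_j)**p, p >= 1
    return sum(w * (max(f[i] for i in e) - min(f[i] for i in e)) ** p
               for e, w in zip(edges, weights))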
56 Thus, the difference between the hypergraph cut and its approximations such as clique and star expansion carries over to ΩH,p and ΩGCE ,p , respectively. [sent-121, score-1.313]
57 3 Semi-supervised Learning. With the regularization functionals derived in the last section, we can immediately write down a formulation for two-class semi-supervised learning on hypergraphs similar to the well-known approaches of [19, 20]. [sent-122, score-0.45]
58 This is due to the effect that cutting “out” the labeled points leads to a much smaller cut than, e.g., a balanced cut. [sent-131, score-0.328]
59 In Section 2.1, we discussed the difference between the hypergraph cut (1) and the graph cut of the clique expansion (2) of the hypergraph and gave a simple example in Figure 1 where these cuts yield quite different results. [sent-136, score-2.363]
60 Clearly, this difference carries over to the famous normalized cut criterion introduced in [21, 22] for clustering of graphs with applications in image segmentation. [sent-137, score-0.491]
61 The ratio and normalized cut can be formulated as RCut(C, C̄) = cut_H(C, C̄) / (|C| |C̄|) and NCut(C, C̄) = cut_H(C, C̄) / (vol(C) vol(C̄)), which incorporate different balancing criteria. [sent-139, score-0.345]
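Given a fixed partition, both balanced cut criteria are simple ratios of the hypergraph cut and a balancing term. The sketch below assumes the vertex degree (and hence the volume) is d_i = Σ_{e: i∈e} w_e, which is one natural convention but is not spelled out in this extraction; it reuses hypergraph_cut from the earlier sketch.

def balanced_hypergraph_cuts(edges, weights, C, n):
    C = set(C)
    Cbar = set(range(n)) - C
    cut = hypergraph_cut(edges, weights, C)
    # assumed degree convention: d_i = sum of w_e over hyperedges containing vertex i
    deg = [sum(w for e, w in zip(edges, weights) if i in e) for i in range(n)]
    vol = lambda S: sum(deg[i] for i in S)
    rcut = cut / (len(C) * len(Cbar))     # RCut(C, C_bar)
    ncut = cut / (vol(C) * vol(Cbar))     # NCut(C, C_bar)
    return rcut, ncut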
62 Note that, in contrast to the normalized cut for graphs, the normalized hypergraph cut allows no relaxation into a linear eigenproblem (spectral relaxation). [sent-140, score-1.468]
63 Thus, we follow a recent line of research [14, 15, 16, 17] where it has been shown that the standard spectral relaxation of the normalized cut used in spectral clustering [22] is loose and that a tight, in fact exact, relaxation can be formulated in terms of a nonlinear eigenproblem. [sent-141, score-0.545]
64 In this section, we extend their approach to hypergraphs and consider general balanced hypergraph cuts Bcut(C, C̄) of the form Bcut(C, C̄) = cut_H(C, C̄) / Ŝ(C), where Ŝ : 2^V → R+ is a non-negative, symmetric set function (that is, Ŝ(C) = Ŝ(C̄)). [sent-144, score-1.168]
65 For the normalized cut one has Ŝ(C) = vol(C) vol(C̄), whereas for the Cheeger cut one has Ŝ(C) = min{vol(C), vol(C̄)}. [sent-145, score-0.657]
66 Our following result shows that the balanced hypergraph cut also has an exact relaxation into a continuous nonlinear eigenproblem [14]. [sent-147, score-1.1]
67 Let H = (V, E, w) be a finite, weighted hypergraph and S : R^{|V|} → R be the Lovasz extension of the symmetric, non-negative set function Ŝ : 2^V → R. [sent-150, score-0.761]
68 Then, it holds that min_{f∈R^{|V|}} ( Σ_{e∈E} w_e (max_{i∈e} f_i − min_{j∈e} f_j) ) / S(f) = min_{C⊂V} cut_H(C, C̄) / Ŝ(C). [sent-151, score-0.183]
69 Then, for the level sets C_t of f, min_{t∈R} cut_H(C_t, C̄_t) / Ŝ(C_t) ≤ ( Σ_{e∈E} w_e (max_{i∈e} f_i − min_{j∈e} f_j) ) / S(f). [sent-153, score-0.162]
70 The last part of the theorem shows that “optimal thresholding” (turning f ∈ RV into a partition) among all level sets of any f ∈ R|V | can only lead to a better or equal balanced hypergraph cut. [sent-154, score-0.755]
71 Note that the balancing functions of ratio/normalized cut and Cheeger cut are submodular. [sent-172, score-0.577]
72 Let m_e be the number of vertices in hyperedge e ∈ E. [sent-193, score-0.255]
73 The value c_i = Σ_{e∈E} H_{i,e} is the number of hyperedges the vertex i lies in. [sent-204, score-0.149]
74 Moreover, we introduce for every hyperedge e ∈ E the functional F_e(α_e) = λ w_e (max(α_e) − min(α_e))^2. [sent-208, score-0.251]
75 As we show in the supplementary material, the conjugate functions F_e^* are not indicator functions and we thus solve the corresponding proximal problems via proximal problems for F_e. [sent-210, score-0.16]
76 Note that the clique expansion leads for all datasets to a graph which is close to being fully connected as all datasets contain large hyperedges. [sent-215, score-0.384]
77 For covertype (6,7) the weight matrix needs over 10GB of memory, whereas the original hypergraph needs only 4MB. [sent-216, score-0.828]
78 For any σ > 0 and any α̃_e ∈ R^{m_e}, the proximal map prox_{(1/σ)F_e}(α̃_e) = arg min_{α_e ∈ R^{m_e}} { (1/2) ||α_e − α̃_e||_2^2 + (1/σ) λ w_e (max(α_e) − min(α_e))^2 } can be computed with O(m_e log m_e) arithmetic operations. [sent-219, score-0.14]
79 As in [11], a hyperedge of weight one is created by all data points which have the same value of a categorical feature. [sent-227, score-0.278]
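The hypergraph construction used in the experiments (one hyperedge of weight one per value of each categorical feature) can be sketched as follows; X is an assumed n×d integer array of categorical codes, and the singleton filter is an illustrative choice since such hyperedges can never be cut.

import numpy as np

def hyperedges_from_categorical(X):
    # one hyperedge (weight 1) per (feature, value): all rows sharing that value
    edges, weights = [], []
    n, d = X.shape
    for j in range(d):
        for v in np.unique(X[:, j]):
            e = set(np.flatnonzero(X[:, j] == v))
            if len(e) > 1:                    # singleton hyperedges can never be cut
                edges.append(e)
                weights.append(1.0)
    return edges, weights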
80 In [11], they suggest using a regularizer induced by the normalized Laplacian L_CE arising from the clique expansion, L_CE = I − D_CE^{-1/2} H W H^T D_CE^{-1/2}, where D_CE is a diagonal matrix with entries d_CE(i) = Σ_{e∈E} H_{i,e} w_e and W ∈ R^{|E|×|E|} is a diagonal matrix with entries w(e) = w_e/|e|. [sent-231, score-0.309]
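The baseline Laplacian L_CE can then be assembled directly from the incidence matrix. The sketch below follows the formula as reconstructed above (a diagonal W with entries w_e/|e| and degrees d_CE(i) = Σ_e H_{i,e} w_e), reuses incidence_matrix from the earlier sketch, and assumes every vertex lies in at least one hyperedge.

def clique_expansion_laplacian(edges, weights, n):
    H = incidence_matrix(edges, n)                        # |V| x |E| incidence matrix
    w = np.asarray(weights, dtype=float)
    W_E = np.diag(w / np.array([len(e) for e in edges]))  # w(e) = w_e / |e|
    d = H @ w                                             # d_CE(i) = sum_e H_ie * w_e
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return np.eye(n) - D_inv_sqrt @ H @ W_E @ H.T @ D_inv_sqrt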
81 For the largest example (covertype 6,7), where the clique expansion fails due to memory problems, our method takes 30-100s (depending on λ). [sent-236, score-0.254]
82 Our SSL methods based on Ω_{H,p}, p = 1, 2, consistently outperform the clique expansion technique of Zhou et al. [11] on all datasets except 20newsgroups. [sent-243, score-0.268]
83 A method based on pairwise interactions such as the clique expansion can better deal with such label noise, as the large hyperedges for this dataset accumulate the label noise. (Footnote 2: this is a modified version by Sam Roweis of the original 20 newsgroups dataset, available at http://www.) [sent-246, score-0.27]
84 (Footnote 3: communications with the authors of [11] could not clarify the difference to their results on 20newsgroups.) [sent-251, score-0.136]
85 On all other datasets we observe that incorporating hypergraph structure leads to much better results. [sent-252, score-0.746]
86 As expected, our squared TV functional (p = 2) slightly outperforms the total variation (p = 1), even though the difference is small. [sent-253, score-0.155]
87 We use the normalized hypergraph cut as the clustering objective. [sent-480, score-1.093]
88 For more than two clusters we recursively partition the hypergraph until the desired number of clusters is reached. [sent-481, score-0.733]
89 For comparison we use the normalized spectral clustering approach based on the Laplacian L_CE [11] (clique expansion). [sent-482, score-0.141]
90 The first part (first 6 columns) of the following table shows the clustering errors (majority vote on each cluster) of both methods as well as the normalized cuts achieved by these methods on the hypergraph and on the graph resulting from the clique expansion. [sent-483, score-1.103]
91 The number k is chosen as the smallest number for which the graph becomes connected and we compare results of normalized 1-spectral clustering [14] and the standard spectral clustering [22]. [sent-485, score-0.274]
92 Note that the employed hypergraph construction has no free parameter. [sent-486, score-0.713]
93 Table: clustering errors (%, majority vote per cluster) and normalized cuts on the datasets Mushrooms, Zoo, 20-newsgroup, covertype (4,5) and covertype (6,7), comparing our method and [11]. [sent-487, score-0.186]
94 First, we observe that our approach optimizing the normalized hypergraph cut yields better or similar results in terms of clustering errors compared to the clique expansion (except for 20-newsgroup, for the same reason given in the previous paragraph). [sent-532, score-1.347]
95 Moreover, our method sometimes has even smaller cuts on the graphs resulting from the clique expansion, although it does not directly optimize this objective in contrast to [11]. [sent-535, score-0.301]
96 Second, the comparison to a standard graph-based approach where the similarity structure is obtained using the Hamming distance on the categorical features shows that using hypergraph structure is indeed useful. [sent-537, score-0.745]
97 Nevertheless, we think that there is room for improvement regarding the construction of the hypergraph which is a topic for future research. [sent-538, score-0.713]
98 Modeling hypergraphs by graphs with the same mincut properties. [sent-629, score-0.414]
99 An inverse power method for nonlinear eigenproblems with applications in 1-spectral clustering and sparse PCA. [sent-634, score-0.139]
100 Beyond spectral clustering - tight relaxations of balanced graph cuts. [sent-644, score-0.209]
wordName wordTfidf (topN-words)
[('hypergraph', 0.713), ('hypergraphs', 0.343), ('cut', 0.273), ('hyperedge', 0.224), ('clique', 0.16), ('cuth', 0.119), ('lovasz', 0.119), ('tvh', 0.106), ('ssl', 0.097), ('expansion', 0.094), ('covertype', 0.093), ('hyperedges', 0.092), ('pdhg', 0.092), ('variation', 0.088), ('functionals', 0.074), ('proximal', 0.073), ('graphs', 0.071), ('fe', 0.071), ('vol', 0.07), ('cuts', 0.07), ('ke', 0.066), ('cutce', 0.066), ('lce', 0.066), ('clustering', 0.066), ('fi', 0.062), ('fj', 0.058), ('zoo', 0.058), ('prox', 0.053), ('graph', 0.053), ('mushrooms', 0.053), ('ratiodca', 0.053), ('rme', 0.053), ('ncut', 0.047), ('zhou', 0.046), ('balanced', 0.042), ('normalized', 0.041), ('dce', 0.04), ('spectral', 0.034), ('relaxation', 0.033), ('star', 0.033), ('regularization', 0.033), ('categorical', 0.032), ('cheeger', 0.032), ('laplacian', 0.031), ('extension', 0.031), ('convex', 0.031), ('vertices', 0.031), ('balancing', 0.031), ('ci', 0.03), ('hein', 0.029), ('vertex', 0.027), ('functional', 0.027), ('bcut', 0.026), ('setzer', 0.026), ('total', 0.026), ('carries', 0.026), ('proxg', 0.023), ('spiky', 0.023), ('eigenproblems', 0.023), ('rangapuram', 0.023), ('eigenproblem', 0.023), ('submodular', 0.022), ('tensor', 0.022), ('weight', 0.022), ('ct', 0.021), ('min', 0.021), ('partition', 0.02), ('labeled', 0.02), ('leads', 0.019), ('wij', 0.018), ('weighted', 0.017), ('inner', 0.017), ('splitting', 0.017), ('ihler', 0.017), ('semisupervised', 0.017), ('music', 0.016), ('nonlinear', 0.016), ('fully', 0.016), ('cutting', 0.016), ('pairwise', 0.016), ('relations', 0.015), ('loose', 0.015), ('positively', 0.015), ('label', 0.015), ('recommend', 0.015), ('datasets', 0.014), ('regularizer', 0.014), ('ps', 0.014), ('minimum', 0.014), ('arg', 0.014), ('connected', 0.014), ('imaging', 0.014), ('duality', 0.014), ('attained', 0.014), ('analogy', 0.014), ('supplementary', 0.014), ('difference', 0.014), ('edge', 0.014), ('tight', 0.014), ('subgraph', 0.013)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited
Author: Matthias Hein, Simon Setzer, Leonardo Jost, Syama Sundar Rangapuram
Abstract: Hypergraphs allow one to encode higher-order relationships in data and are thus a very flexible modeling tool. Current learning methods are either based on approximations of the hypergraphs via graphs or on tensor methods which are only applicable under special conditions. In this paper, we present a new learning framework on hypergraphs which fully uses the hypergraph structure. The key element is a family of regularization functionals based on the total variation on hypergraphs. 1
2 0.097404532 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning
Author: Xiao-Ming Wu, Zhenguo Li, Shih-Fu Chang
Abstract: We find that various well-known graph-based models exhibit a common important harmonic structure in its target function – the value of a vertex is approximately the weighted average of the values of its adjacent neighbors. Understanding of such structure and analysis of the loss defined over such structure help reveal important properties of the target function over a graph. In this paper, we show that the variation of the target function across a cut can be upper and lower bounded by the ratio of its harmonic loss and the cut cost. We use this to develop an analytical tool and analyze five popular graph-based models: absorbing random walks, partially absorbing random walks, hitting times, pseudo-inverse of the graph Laplacian, and eigenvectors of the Laplacian matrices. Our analysis sheds new insights into several open questions related to these models, and provides theoretical justifications and guidelines for their practical use. Simulations on synthetic and real datasets confirm the potential of the proposed theory and tool.
3 0.09636759 202 nips-2013-Multiclass Total Variation Clustering
Author: Xavier Bresson, Thomas Laurent, David Uminsky, James von Brecht
Abstract: Ideas from the image processing literature have recently motivated a new set of clustering algorithms that rely on the concept of total variation. While these algorithms perform well for bi-partitioning tasks, their recursive extensions yield unimpressive results for multiclass clustering tasks. This paper presents a general framework for multiclass total variation clustering that does not rely on recursion. The results greatly outperform previous total variation algorithms and compare well with state-of-the-art NMF approaches. 1
4 0.083679378 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction
Author: Jukka Corander, Tomi Janhunen, Jussi Rintanen, Henrik Nyman, Johan Pensar
Abstract: We investigate the problem of learning the structure of a Markov network from data. It is shown that the structure of such networks can be described in terms of constraints which enables the use of existing solver technology with optimization capabilities to compute optimal networks starting from initial scores computed from the data. To achieve efficient encodings, we develop a novel characterization of Markov network structure using a balancing condition on the separators between cliques forming the network. The resulting translations into propositional satisfiability and its extensions such as maximum satisfiability, satisfiability modulo theories, and answer set programming, enable us to prove optimal certain networks which have been previously found by stochastic search. 1
5 0.076409139 268 nips-2013-Reflection methods for user-friendly submodular optimization
Author: Stefanie Jegelka, Francis Bach, Suvrit Sra
Abstract: Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. Consequently, there is need for efficient optimization procedures for submodular functions, especially for minimization problems. While general submodular minimization is challenging, we propose a new method that exploits existing decomposability of submodular functions. In contrast to previous approaches, our method is neither approximate, nor impractical, nor does it need any cumbersome parameter tuning. Moreover, it is easy to implement and parallelize. A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution. This method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction. In our experiments, we illustrate the benefits of our method on two image segmentation tasks. 1
6 0.065632097 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
7 0.061324697 56 nips-2013-Better Approximation and Faster Algorithm Using the Proximal Average
8 0.053059634 73 nips-2013-Convex Relaxations for Permutation Problems
9 0.04653598 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
10 0.04638236 11 nips-2013-A New Convex Relaxation for Tensor Completion
11 0.045059975 170 nips-2013-Learning with Invariance via Linear Functionals on Reproducing Kernel Hilbert Space
12 0.041899126 158 nips-2013-Learning Multiple Models via Regularized Weighting
13 0.041499771 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
14 0.03794086 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel
15 0.037643857 78 nips-2013-Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions
16 0.034635741 313 nips-2013-Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization
17 0.030364772 192 nips-2013-Minimax Theory for High-dimensional Gaussian Mixtures with Sparse Mean Separation
18 0.029737648 75 nips-2013-Convex Two-Layer Modeling
19 0.029144783 87 nips-2013-Density estimation from unweighted k-nearest neighbor graphs: a roadmap
20 0.02903148 7 nips-2013-A Gang of Bandits
topicId topicWeight
[(0, 0.086), (1, 0.033), (2, 0.029), (3, 0.027), (4, 0.041), (5, 0.018), (6, -0.041), (7, -0.074), (8, 0.012), (9, -0.012), (10, -0.003), (11, -0.016), (12, 0.065), (13, -0.039), (14, 0.02), (15, 0.013), (16, -0.015), (17, 0.006), (18, -0.019), (19, -0.014), (20, 0.008), (21, 0.021), (22, 0.072), (23, -0.033), (24, -0.023), (25, -0.065), (26, 0.026), (27, 0.01), (28, -0.033), (29, 0.092), (30, -0.104), (31, 0.028), (32, 0.057), (33, 0.028), (34, -0.025), (35, -0.027), (36, 0.03), (37, -0.014), (38, -0.032), (39, 0.044), (40, 0.044), (41, 0.059), (42, 0.001), (43, 0.036), (44, -0.034), (45, 0.008), (46, -0.069), (47, -0.062), (48, 0.01), (49, 0.023)]
simIndex simValue paperId paperTitle
same-paper 1 0.91017532 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited
Author: Matthias Hein, Simon Setzer, Leonardo Jost, Syama Sundar Rangapuram
Abstract: Hypergraphs allow one to encode higher-order relationships in data and are thus a very flexible modeling tool. Current learning methods are either based on approximations of the hypergraphs via graphs or on tensor methods which are only applicable under special conditions. In this paper, we present a new learning framework on hypergraphs which fully uses the hypergraph structure. The key element is a family of regularization functionals based on the total variation on hypergraphs. 1
2 0.76965702 202 nips-2013-Multiclass Total Variation Clustering
Author: Xavier Bresson, Thomas Laurent, David Uminsky, James von Brecht
Abstract: Ideas from the image processing literature have recently motivated a new set of clustering algorithms that rely on the concept of total variation. While these algorithms perform well for bi-partitioning tasks, their recursive extensions yield unimpressive results for multiclass clustering tasks. This paper presents a general framework for multiclass total variation clustering that does not rely on recursion. The results greatly outperform previous total variation algorithms and compare well with state-of-the-art NMF approaches. 1
3 0.6690622 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning
Author: Xiao-Ming Wu, Zhenguo Li, Shih-Fu Chang
Abstract: We find that various well-known graph-based models exhibit a common important harmonic structure in its target function – the value of a vertex is approximately the weighted average of the values of its adjacent neighbors. Understanding of such structure and analysis of the loss defined over such structure help reveal important properties of the target function over a graph. In this paper, we show that the variation of the target function across a cut can be upper and lower bounded by the ratio of its harmonic loss and the cut cost. We use this to develop an analytical tool and analyze five popular graph-based models: absorbing random walks, partially absorbing random walks, hitting times, pseudo-inverse of the graph Laplacian, and eigenvectors of the Laplacian matrices. Our analysis sheds new insights into several open questions related to these models, and provides theoretical justifications and guidelines for their practical use. Simulations on synthetic and real datasets confirm the potential of the proposed theory and tool.
4 0.62987709 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
Author: James L. Sharpnack, Akshay Krishnamurthy, Aarti Singh
Abstract: The detection of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. Because this test is computationally infeasible, we provide a relaxation, called the Lovász extended scan statistic (LESS), that uses submodularity to approximate the intractable generalized likelihood ratio. We demonstrate a connection between LESS and maximum a-posteriori inference in Markov random fields, which provides us with a poly-time algorithm for LESS. Using electrical network theory, we are able to control type 1 error for LESS and prove conditions under which LESS is risk consistent. Finally, we consider specific graph models, the torus, k-nearest neighbor graphs, and ǫ-random graphs. We show that on these graphs our results provide near-optimal performance by matching our results to known lower bounds. 1
5 0.56088412 249 nips-2013-Polar Operators for Structured Sparse Estimation
Author: Xinhua Zhang, Yao-Liang Yu, Dale Schuurmans
Abstract: Structured sparse estimation has become an important technique in many areas of data analysis. Unfortunately, these estimators normally create computational difficulties that entail sophisticated algorithms. Our first contribution is to uncover a rich class of structured sparse regularizers whose polar operator can be evaluated efficiently. With such an operator, a simple conditional gradient method can then be developed that, when combined with smoothing and local optimization, significantly reduces training time vs. the state of the art. We also demonstrate a new reduction of polar to proximal maps that enables more efficient latent fused lasso. 1
6 0.55763978 87 nips-2013-Density estimation from unweighted k-nearest neighbor graphs: a roadmap
7 0.55568969 350 nips-2013-Wavelets on Graphs via Deep Learning
8 0.55544287 56 nips-2013-Better Approximation and Faster Algorithm Using the Proximal Average
9 0.54946226 215 nips-2013-On Decomposing the Proximal Map
10 0.52568364 73 nips-2013-Convex Relaxations for Permutation Problems
11 0.52368248 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
12 0.51586711 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation
13 0.48558518 268 nips-2013-Reflection methods for user-friendly submodular optimization
14 0.46197852 25 nips-2013-Adaptive Anonymity via $b$-Matching
15 0.43957773 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation
16 0.42627558 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction
17 0.41951594 214 nips-2013-On Algorithms for Sparse Multi-factor NMF
18 0.41915476 272 nips-2013-Regularized Spectral Clustering under the Degree-Corrected Stochastic Blockmodel
19 0.4162125 222 nips-2013-On the Linear Convergence of the Proximal Gradient Method for Trace Norm Regularization
20 0.41403642 170 nips-2013-Learning with Invariance via Linear Functionals on Reproducing Kernel Hilbert Space
topicId topicWeight
[(1, 0.296), (13, 0.01), (16, 0.023), (33, 0.127), (34, 0.14), (41, 0.033), (49, 0.016), (56, 0.059), (70, 0.026), (85, 0.038), (89, 0.023), (93, 0.034), (95, 0.033), (96, 0.011), (99, 0.032)]
simIndex simValue paperId paperTitle
1 0.74780357 112 nips-2013-Estimation Bias in Multi-Armed Bandit Algorithms for Search Advertising
Author: Min Xu, Tao Qin, Tie-Yan Liu
Abstract: In search advertising, the search engine needs to select the most profitable advertisements to display, which can be formulated as an instance of online learning with partial feedback, also known as the stochastic multi-armed bandit (MAB) problem. In this paper, we show that the naive application of MAB algorithms to search advertising for advertisement selection will produce sample selection bias that harms the search engine by decreasing expected revenue and “estimation of the largest mean” (ELM) bias that harms the advertisers by increasing game-theoretic player-regret. We then propose simple bias-correction methods with benefits to both the search engine and the advertisers. 1
same-paper 2 0.73923069 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited
Author: Matthias Hein, Simon Setzer, Leonardo Jost, Syama Sundar Rangapuram
Abstract: Hypergraphs allow one to encode higher-order relationships in data and are thus a very flexible modeling tool. Current learning methods are either based on approximations of the hypergraphs via graphs or on tensor methods which are only applicable under special conditions. In this paper, we present a new learning framework on hypergraphs which fully uses the hypergraph structure. The key element is a family of regularization functionals based on the total variation on hypergraphs. 1
3 0.69502819 324 nips-2013-The Fast Convergence of Incremental PCA
Author: Akshay Balsubramani, Sanjoy Dasgupta, Yoav Freund
Abstract: We consider a situation in which we see samples Xn ∈ Rd drawn i.i.d. from some distribution with mean zero and unknown covariance A. We wish to compute the top eigenvector of A in an incremental fashion - with an algorithm that maintains an estimate of the top eigenvector in O(d) space, and incrementally adjusts the estimate with each new data point that arrives. Two classical such schemes are due to Krasulina (1969) and Oja (1983). We give finite-sample convergence rates for both. 1
4 0.62902981 235 nips-2013-Online learning in episodic Markovian decision processes by relative entropy policy search
Author: Alexander Zimin, Gergely Neu
Abstract: We study the problem of online learning in finite episodic Markov decision processes (MDPs) where the loss function is allowed to change between episodes. The natural performance measure in this learning problem is the regret defined as the difference between the total loss of the best stationary policy and the total loss suffered by the learner. We assume that the learner is given access to a finite action space A and the state space X has a layered structure with L layers, so that state transitions are only possible between consecutive layers. We describe a variant of the recently proposed Relative Entropy Policy Search algorithm and show that its regret after T episodes is 2 L|X ||A|T log(|X ||A|/L) in the bandit setting and 2L T log(|X ||A|/L) in the full information setting, given that the learner has perfect knowledge of the transition probabilities of the underlying MDP. These guarantees largely improve previously known results under much milder assumptions and cannot be significantly improved under general assumptions. 1
5 0.62709337 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
Author: Tianbao Yang
Abstract: We present and study a distributed optimization algorithm by employing a stochastic dual coordinate ascent method. Stochastic dual coordinate ascent methods enjoy strong theoretical guarantees and often have better performances than stochastic gradient descent methods in optimizing regularized loss minimization problems. It still lacks of efforts in studying them in a distributed framework. We make a progress along the line by presenting a distributed stochastic dual coordinate ascent algorithm in a star network, with an analysis of the tradeoff between computation and communication. We verify our analysis by experiments on real data sets. Moreover, we compare the proposed algorithm with distributed stochastic gradient descent methods and distributed alternating direction methods of multipliers for optimizing SVMs in the same distributed framework, and observe competitive performances. 1
6 0.56327707 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture
7 0.56170803 173 nips-2013-Least Informative Dimensions
8 0.55957443 286 nips-2013-Robust learning of low-dimensional dynamics from large neural ensembles
9 0.55842435 201 nips-2013-Multi-Task Bayesian Optimization
10 0.55733943 346 nips-2013-Variational Inference for Mahalanobis Distance Metrics in Gaussian Process Regression
11 0.55727202 239 nips-2013-Optimistic policy iteration and natural actor-critic: A unifying view and a non-optimality result
12 0.55674452 143 nips-2013-Integrated Non-Factorized Variational Inference
13 0.55521226 53 nips-2013-Bayesian inference for low rank spatiotemporal neural receptive fields
14 0.55508202 294 nips-2013-Similarity Component Analysis
15 0.55388796 169 nips-2013-Learning to Prune in Metric and Non-Metric Spaces
16 0.55356163 312 nips-2013-Stochastic Gradient Riemannian Langevin Dynamics on the Probability Simplex
17 0.55335313 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data
18 0.55158478 152 nips-2013-Learning Efficient Random Maximum A-Posteriori Predictors with Non-Decomposable Loss Functions
19 0.55152375 115 nips-2013-Factorized Asymptotic Bayesian Inference for Latent Feature Models
20 0.55024439 105 nips-2013-Efficient Optimization for Sparse Gaussian Process Regression