nips nips2012 nips2012-337 knowledge-graph by maker-knowledge-mining

337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs


Source: pdf

Author: Vinay Jethava, Anders Martinsson, Chiranjib Bhattacharyya, Devdatt Dubhashi

Abstract: The Lovász ϑ function of a graph, a fundamental tool in combinatorial optimization and approximation algorithms, is computed by solving an SDP. In this paper we establish that the Lovász ϑ function is equivalent to a kernel learning problem related to the one-class SVM. This interesting connection opens up many opportunities bridging graph-theoretic algorithms and machine learning. We show that there exist graphs, which we call SVM − ϑ graphs, on which the Lovász ϑ function can be approximated well by a one-class SVM. This leads to novel use of SVM techniques for solving algorithmic problems in large graphs, e.g. identifying a planted clique of size Θ(√n) in a random graph G(n, 1/2). A classic approach for this problem involves computing the ϑ function; however, it is not scalable due to the SDP computation. We show that the random graph with a planted clique is an example of an SVM − ϑ graph. As a consequence, an SVM-based approach easily identifies the clique in large graphs and is competitive with the state-of-the-art. We introduce the notion of common orthogonal labelling and show that it can be computed by solving a Multiple Kernel Learning problem. It is further shown that such a labelling is extremely useful in identifying a large common dense subgraph in multiple graphs, which is known to be a computationally difficult problem. The proposed algorithm achieves an order of magnitude scalability compared to state-of-the-art methods.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 The Lovász ϑ function, SVMs and finding large dense subgraphs. Vinay Jethava ∗ Computer Science & Engineering Department, Chalmers University of Technology, 412 96, Goteborg, SWEDEN, jethava@chalmers.se [sent-1, score-0.453]

2 Abstract: The Lovász ϑ function of a graph, a fundamental tool in combinatorial optimization and approximation algorithms, is computed by solving an SDP. [sent-8, score-0.352]

3 In this paper we establish that the Lovász ϑ function is equivalent to a kernel learning problem related to the one-class SVM. [sent-9, score-0.299]

4 This interesting connection opens up many opportunities bridging graph theoretic algorithms and machine learning. [sent-10, score-0.204]

5 We show that there exist graphs, which we call SVM − ϑ graphs, on which the Lovász ϑ function can be approximated well by a one-class SVM. [sent-11, score-0.317]

6 This leads to novel use of SVM techniques for solving algorithmic problems in large graphs, e.g. [sent-12, score-0.267]

7 identifying a planted clique of size Θ(√n) in a random graph G(n, 1/2). [sent-14, score-0.902]

8 We show that the random graph with a planted clique is an example of an SVM − ϑ graph. [sent-16, score-0.835]

9 As a consequence, an SVM-based approach easily identifies the clique in large graphs and is competitive with the state-of-the-art. [sent-17, score-0.58]

10 We introduce the notion of common orthogonal labelling and show that it can be computed by solving a Multiple Kernel learning problem. [sent-18, score-0.547]

11 It is further shown that such a labelling is extremely useful in identifying a large common dense subgraph in multiple graphs, which is known to be a computationally difficult problem. [sent-19, score-0.866]

12 1 Introduction. The Lovász ϑ function [19] plays a fundamental role in modern combinatorial optimization and in various approximation algorithms on graphs; indeed Goemans was led to say "It seems all roads lead to ϑ" [10]. [sent-21, score-0.299]

13 This surprising connection opens up many opportunities which can benefit both graph theory and machine learning. [sent-24, score-0.204]

14 We give a new SDP characterization of the Lovász ϑ function, min_{K ∈ K(G)} ω(K) = ϑ(G), where ω(K) is computed by solving a one-class SVM. [sent-33, score-0.321]

15 The matrix K is a kernel matrix, associated with any orthogonal labelling of G. [sent-34, score-0.474]

16 Using an easy-to-compute orthogonal labelling we show that there exist graphs, which we call SVM − ϑ graphs, on which the Lovász ϑ function can be well approximated by solving a one-class SVM. [sent-37, score-0.79]

17 The problem of finding a large common dense subgraph in multiple graphs arises in a variety of domains including biology, the Internet, and the social sciences [18]. [sent-40, score-0.695]

18 We introduce the notion of common orthogonal labelling which can be used to develop a formulation which is close in spirit to a Multiple Kernel Learning based formulation. [sent-42, score-0.494]

19 Our results on the well-known DIMACS benchmark dataset show that it can identify large common dense subgraphs in a wide variety of settings, beyond the realm of state-of-the-art methods. [sent-43, score-0.238]

20 Lastly, in Section 5, we show that the famous planted clique problem can be easily solved for large graphs by solving a one-class SVM. [sent-46, score-0.974]

21 Many problems of interest in the area of machine learning can be reduced to the problem of detecting a planted clique. [sent-47, score-0.339]

22 The planted clique problem consists of identifying a large clique in a random graph. [sent-50, score-1.068]

23 There is an elegant approach for identifying the planted clique by computing the Lovász ϑ function [8]; however, it is not practical for large graphs as it requires solving an SDP. [sent-51, score-1.261]

24 We show that the graph associated with the planted clique problem is an SVM − ϑ graph, paving the way for identifying the clique by solving a one-class SVM. [sent-52, score-1.275]

25 Apart from the method based on computing the ϑ function, there are other methods for planted clique identification, which do not require solving an SDP [2, 7, 24]. [sent-53, score-0.734]

26 An eigenvalue of a graph G means an eigenvalue of the adjacency matrix of G. [sent-72, score-0.298]

27 Let G_S = (S, E_S) denote the subgraph induced by S ⊆ V in graph G; its density γ(G_S) is given by γ(G_S) = |E_S| / (|S| choose 2). [sent-78, score-0.526]

28 Let N_i(G) = {j ∈ V : (i, j) ∈ E} denote the set of neighbours of vertex i in graph G, and the degree of node i to be d_i(G) = |N_i(G)|. [sent-79, score-0.265]

29 An independent set in G (a clique in Ḡ) is a subset of vertices S ⊆ V for which no (every) pair of vertices has an edge in G (in Ḡ). [sent-80, score-0.565]
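To make the graph notation above concrete, here is a small illustrative sketch (NumPy assumed; the helper names are ours, not from the paper) that computes the induced-subgraph density γ(G_S) = |E_S| / (|S| choose 2) and the vertex degrees d_i(G) from a 0/1 adjacency matrix.

```python
import numpy as np

def subgraph_density(A, S):
    """Density of the subgraph induced by vertex set S: |E_S| / (|S| choose 2)."""
    S = np.asarray(sorted(S))
    A_S = A[np.ix_(S, S)]
    num_edges = A_S.sum() / 2.0                      # each undirected edge counted twice
    num_pairs = len(S) * (len(S) - 1) / 2.0
    return num_edges / num_pairs if num_pairs > 0 else 0.0

def degrees(A):
    """d_i(G) = |N_i(G)| for every vertex i."""
    return A.sum(axis=1)

# toy example: a triangle plus an isolated vertex
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 0, 0]])
print(subgraph_density(A, {0, 1, 2}))   # 1.0 (the triangle is a clique)
print(degrees(A))                        # [2 2 2 0]
```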

30 2 Lovász ϑ function and Kernel learning. Consider the problem of embedding a graph G = (V, E) on a d-dimensional unit sphere S^{d−1}. [sent-84, score-0.451]

31 The study of this problem was initiated in [19], which introduced the idea of orthogonal labelling: an orthogonal labelling of a graph G = (V, E) with |V| = n is a matrix U = [u_1, ... [sent-85, score-0.699]

32 ..., u_n] ∈ R^{d×n} such that u_i^T u_j = 0 whenever (i, j) ∈ E and u_i ∈ S^{d−1} for all i = 1, ..., n. [sent-88, score-0.278]

33 An orthogonal labelling defines an embedding of a graph on a d-dimensional unit sphere: for every vertex i there is a vector u_i on the unit sphere, and for every (i, j) ∈ E, u_i and u_j are orthogonal. [sent-92, score-0.908]

34 Using the notion of orthogonal labellings, [19] defined a function, famously known as the Lovász ϑ function, which upper bounds the size of the maximum independent set. [sent-93, score-0.413]

35 More specifically, for any graph G: α(G) ≤ ϑ(G), where α(G) is the size of the largest independent set. [sent-94, score-0.176]

36 However, the Lovász function ϑ(G) gives a tractable upper bound, and since then the Lovász ϑ function has been extensively used in solving a variety of algorithmic problems. [sent-96, score-0.589]

37 It may be useful to recall the definition of the Lovász ϑ function. [sent-99, score-0.268]

38 Denote the set of all possible orthogonal labellings of G by Lab(G) = {U = [u_1, ..., u_n] : u_i^T u_j = 0 ∀ (i, j) ∈ E, u_i ∈ S^{d−1}}. [sent-100, score-0.177]

39 ϑ(G) = min_{U ∈ Lab(G)} min_{c ∈ S^{d−1}} max_i 1/(c^T u_i)^2 (2). There exist several other equivalent definitions of ϑ; for a comprehensive discussion see [16]. [sent-104, score-0.146]

40 However, computation of the Lovász ϑ function is not practical even for moderately sized graphs, as it requires solving a semidefinite program on a matrix which is of the size of the graph. [sent-105, score-0.659]

41 For an undirected graph G = (V, E) with |V| = n, let K(G) := {K ∈ S^n_+ | K_ii = 1, i ∈ [n], K_ij = 0, (i, j) ∈ E}. Then ϑ(G) = min_{K ∈ K(G)} ω(K). Proof. [sent-109, score-0.154]

42 Hence by inspection it is clear that the columns of U define an orthogonal labelling of G. [sent-112, score-0.42]

43 Note that if U is a labelling then U' = U diag(ε) is also an orthogonal labelling for any sign vector ε = [ε_1, ..., ε_n] with ε_i ∈ {−1, +1}. [sent-116, score-0.738]

44 It thus suffices to consider only those labellings for which c^T u_i ≥ 0 for all i = 1, ..., n. [sent-123, score-0.178]

45 A consequence of the above result is an easily computable upper bound on ϑ(G): for any graph G, α(G) ≤ ϑ(G) ≤ ω(K) ∀ K ∈ K(G) (3). Since ω(K) is a convex quadratic program, it is indeed a computationally efficient alternative to the ϑ function. [sent-135, score-0.229]
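As a rough illustration of why ω(K) is cheap compared to an SDP, the following sketch evaluates it by projected gradient ascent. The objective form max_{α ≥ 0} 2 Σ_i α_i − α^T K α is our assumption of the one-class-SVM-style quadratic program (the paper's exact definition of ω(K) may differ), and the iteration count and step size are purely illustrative.

```python
import numpy as np

def omega(K, n_iters=2000):
    """Projected gradient ascent for max_{alpha >= 0} 2*sum(alpha) - alpha' K alpha.
    Assumed form of omega(K); returns the objective value and the maximizer alpha."""
    n = K.shape[0]
    alpha = np.zeros(n)
    step = 1.0 / (2.0 * np.linalg.eigvalsh(K)[-1])   # 1 / gradient Lipschitz constant
    for _ in range(n_iters):
        grad = 2.0 * np.ones(n) - 2.0 * K @ alpha     # gradient of the concave objective
        alpha = np.maximum(0.0, alpha + step * grad)  # project back onto alpha >= 0
    return 2.0 * alpha.sum() - alpha @ K @ alpha, alpha
```

Any off-the-shelf QP or SVM solver could be substituted; the point is only that the cost is a few matrix-vector products per iteration rather than a semidefinite program.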

46 In fact we will show that there exist families of graphs for which ϑ(G) can be approximated to within a constant factor by ω(K) for suitable K. [sent-136, score-0.295]

47 [20] For a graph G = (V, E) with |V| = n, let C ∈ S^n be a matrix with C_ij = 0 whenever (i, j) ∉ E. [sent-141, score-0.177]

48 If we fix C = A, the adjacency matrix, we obtain a very interesting orthogonal labelling, which we will refer to as the LS labelling, introduced in [20]. [sent-148, score-0.153]

49 Indeed there exists a family of graphs, called Q graphs, for which the LS labelling yields the interesting result α(G) = v(G, A); see [20]. [sent-149, score-0.532]

50 Indeed on a Q graph one does not need to solve an SDP, but can solve a one-class SVM, which has obvious computational benefits. [sent-150, score-0.154]

51 Inspired by this result, in the remaining part of the paper, we study this labelling more closely. [sent-151, score-0.318]

52 As a labelling is completely defined by the associated kernel matrix, we refer to the following kernel as the LS labelling: K = A/ρ + I where ρ ≥ −λ_n(A) (4). [sent-152, score-0.38]
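A minimal sketch of building the LS labelling kernel of eq. (4) from an adjacency matrix. Choosing ρ = −λ_min(A) plus a small slack is our own choice for illustration; the text only requires ρ ≥ −λ_n(A). Plugging the result into the ω(K) sketch above gives the SVM-based approximation discussed next.

```python
import numpy as np

def ls_kernel(A, eps=1e-8):
    """LS labelling kernel K = A / rho + I with rho >= -lambda_min(A), as in eq. (4)."""
    lam_min = np.linalg.eigvalsh(A)[0]      # smallest eigenvalue of the adjacency matrix
    rho = max(-lam_min, eps) + eps          # small slack keeps K positive semidefinite
    return A / rho + np.eye(A.shape[0])
```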

53 3 SVM − ϑ graphs: Graphs where the ϑ function can be approximated by SVM. We now introduce a class of graphs on which the ϑ function can be well approximated by ω(K) for K defined by (4). [sent-153, score-0.27]

54 A graph G is an SVM − ϑ graph if ω(K) ≤ (1 + O(1)) ϑ(G), where K is the LS labelling. [sent-156, score-0.308]

55 Such classes of graphs are interesting because on them one can approximate the Lovász ϑ function by solving an SVM instead of an SDP, which in turn can be extremely useful in the design and analysis of approximation algorithms. [sent-157, score-0.557]

56 We will demonstrate two examples of SVM − ϑ graphs. [sent-158, score-0.236]

57 The classical Erdős–Rényi random graph G(n, 1/2) has n vertices, and each edge (i, j) is present independently with probability 1/2. [sent-162, score-0.276]

58 Using the fact about degrees of vertices in G(n, 1/2), we know that a_i^T e = (n − 1)/2 + ∆_i with |∆_i| ≤ 2√(n log n) (8), where a_i is the ith row of the adjacency matrix A. [sent-184, score-0.202]

59 Similar arguments show also that other families of graphs such as the 11 families of pseudo–random graphs described in [17] are SVM − ϑ graphs. [sent-199, score-0.514]

60 For such functions one can show that if p* = max_{x ∈ C} g(x) < ∞ then p* − g(x) ≤ ‖∇g(x)‖^2 / (2t) for all x ∈ C. 4 Dense common subgraph detection. The problem of finding a large dense subgraph in multiple graphs has many applications [23, 22, 18]. [sent-203, score-0.981]

61 We introduce the notion of common orthogonal labelling, and show that it is indeed possible to recover dense regions in large graphs by solving an MKL problem. [sent-204, score-0.607]

62 Let {G^(1), ..., G^(M)} be a set of simple, undirected graphs G^(m) = (V, E^(m)) defined on a vertex set V = {1, ..., n}. [sent-209, score-0.263]

63 Find a common subgraph which is dense in all the graphs. [sent-213, score-0.481]

64 Most algorithms which attempt the problem of finding a dense region are enumerative in nature and hence do not scale well to finding large cliques. [sent-214, score-0.199]

65 In the remainder of this section, we focus on finding a large common sparse subgraph in a given collection of graphs, with the observation that this is equivalent to finding a large common dense subgraph in the set of complement graphs. [sent-222, score-0.86]

66 Given graphs G^(m), m = 1, ..., M, on a common vertex set V with |V| = n, a common orthogonal labelling of all the graphs is given by u_i ∈ S^{d−1} such that u_i^T u_j = 0 if (i, j) ∈ E^(m) for all m = 1, ..., M. [sent-228, score-0.906]

67 Taking a cue from the MKL literature, we pose a restricted version of the problem, namely min ω(K) subject to K = Σ_{m=1}^M δ_m K^(m), δ_m ≥ 0, Σ_{m=1}^M δ_m = 1 (12), where K^(m) is an orthogonal labelling of G^(m). [sent-238, score-0.442]

68 Direct verification shows that any feasible K is also a common orthogonal labelling. [sent-239, score-0.155]

69 Problem (13) is an optimization over t ∈ R and α_i ≥ 0, where K^(m) is the LS labelling for G^(m), for all m = 1, ..., M. [sent-248, score-0.318]

70 This result allows us to build a parameter-free common sparse subgraph (CSS) algorithm, shown in Figure 1, having the following advantages: it provides a theoretical bound on the subgraph density (Claim 4.1 [sent-253, score-0.668]

71 below); and it requires no parameters from the user beyond the set of graphs G^(1), ..., G^(M). [sent-254, score-0.214]

72 Let ᾱ_min,S = min_{i∈S} ᾱ_i^(m), where ᾱ_i^(m) = (Σ_{j ∈ N_i^(m)(G_S)} α*_j) / d_i^(m)(G_S) is the average of the support vector coefficients in the neighbourhood N_i^(m)(G_S) of vertex i in the induced subgraph G_S^(m), having degree d_i^(m)(G_S) = |N_i^(m)(G_S)|. [sent-259, score-0.348]

73 The subgraph G_T^(m) induced by T in graph G^(m) has density at most γ^(m), where γ^(m) = (1 − c) ρ^(m) / (ᾱ_min,SV (n_T − 1)) and c = min_{i∈SV} α*_i (14); here α* is obtained by using MKL solvers to solve eqn. [sent-263, score-0.653]

74 This is equivalent to defining an orthogonal labelling on the union graph of G^(1), ..., G^(M). [sent-271, score-0.574]
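A hedged sketch of the restricted MKL problem (12): since ω(Σ_m δ_m K^(m)) is a pointwise maximum of functions affine in δ, it is convex in δ, so a simple projected subgradient loop over the simplex, alternated with the ω(K) solver sketched earlier, can stand in for the dedicated MKL solvers used in the paper. The step size, iteration counts, and reuse of the earlier `omega` helper are illustrative assumptions, not the authors' algorithm.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex {delta >= 0, sum = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def common_labelling_mkl(kernels, outer_iters=50, step=0.1):
    """Sketch for min_{delta in simplex} omega(sum_m delta_m K^(m)), eq. (12).
    Assumes the `omega` helper sketched earlier is in scope."""
    M = len(kernels)
    delta = np.full(M, 1.0 / M)
    for _ in range(outer_iters):
        K = sum(d * Km for d, Km in zip(delta, kernels))
        _, alpha = omega(K)
        # subgradient of the inner maximum w.r.t. delta_m (Danskin): -alpha' K^(m) alpha
        grad = np.array([-(alpha @ Km @ alpha) for Km in kernels])
        delta = project_simplex(delta - step * grad)
    return delta, alpha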

75 5 Finding Planted Cliques in G(n, 1/2) graphs. Finding large cliques or independent sets is a computationally difficult problem even in random graphs. [sent-274, score-0.321]

76 Hidden planted clique: A random graph G(n, 1/2) is chosen first and then a clique of size k is introduced in the first 1, ..., k vertices. [sent-276, score-1.199]

77 [8] showed that if k = Ω(√n), then the hidden clique can be discovered in polynomial time by computing the Lovász ϑ function. [sent-281, score-0.649]

78 We consider the (equivalent) complement model Ḡ(n, 1/2, k) where an independent set is planted on the set of k vertices. [sent-283, score-0.379]

79 First we consider the expected case where all vertices outside the planted part S are adjacent to k/2 vertices in S and (n − k)/2 vertices outside S. [sent-294, score-0.688]

80 and all vertices in the planted part have degree (n − k)/2. [sent-295, score-0.366]

81 Now apply Chernoff bounds to conclude that with high probability, all vertices in S have degree (n − k)/2 ± √((n − k) log(n − k)), and those outside S are adjacent to k/2 ± √(k log k) vertices in S and to (n − k)/2 ± √((n − k) log(n − k)) vertices outside S. [sent-297, score-0.38]

82 The above theorem suggests that the planted independent set can be recovered by taking the top k values in the optimal solution. [sent-302, score-0.375]
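An end-to-end sketch of that recovery rule under the complement model: plant an independent set, form the LS kernel, solve the one-class SVM, and read off the indices of the top-k support values. It reuses the `ls_kernel` and `omega` helpers sketched earlier; the graph size, clique size, and iteration budget are illustrative, so recovery is not guaranteed at these small scales.

```python
import numpy as np

def planted_clique_recovery(n=400, k=60, seed=0):
    """Sketch: plant a clique of size k on the first k vertices of G(n, 1/2), then
    recover it as the top-k entries of the one-class SVM solution computed on the
    LS kernel of the complement graph (where the clique becomes an independent set)."""
    rng = np.random.default_rng(seed)
    A = np.triu(rng.integers(0, 2, size=(n, n)), 1)
    A = A + A.T                                   # G(n, 1/2)
    A[:k, :k] = 1 - np.eye(k)                     # plant a clique on vertices 0..k-1
    A_bar = 1 - A - np.eye(n)                     # complement: planted independent set
    _, alpha = omega(ls_kernel(A_bar))
    recovered = set(np.argsort(alpha)[-k:])       # take the top-k values of alpha
    return recovered == set(range(k))
```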

83 6 Experimental evaluation. Comparison with exhaustive approach [14]: We generate synthetic m = 3 random graphs over n vertices with average density δ = 0. [sent-310, score-0.382]

84 This is similar to the synthetic graphs generated in the original paper [see 14, Section 6. [sent-313, score-0.214]

85 It is interesting to note that CROCHET [14] took 2 hours and 9 hours for cliques of size k = 50 and k = 60 respectively, and failed to find a clique of size 100. [sent-321, score-0.581]

86 Common dense subgraph detection: We evaluate our algorithm for finding large dense regions on the DIMACS Challenge graphs [15], which is a comprehensive benchmark for testing clique finding and related algorithms. [sent-326, score-1.17]

87 For the families of dense graphs (brock, san, sanr), we focus on finding a large dense region in the complement of the original graphs. [sent-327, score-0.57]

88 We run Algorithm 1 using SimpleMKL to find a large common dense subgraph. [sent-328, score-0.195]

89 In order to evaluate the performance of our algorithm, we compute ā = max_m a^(m) and a̲ = min_m a^(m), where a^(m) = γ(G_T^(m)) / γ(G^(m)) is the relative density of the induced subgraph (compared to the original graph density), and n_T/N is the relative size of the induced subgraph compared to the original graph size. [sent-329, score-1.069]
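These evaluation quantities are simple to compute from the recovered vertex set T; a small sketch reusing `subgraph_density` from the notation example above (the function and variable names are ours):

```python
def relative_density_and_size(adjacencies, T):
    """Per-graph a^(m) = gamma(G_T^(m)) / gamma(G^(m)), plus the relative size n_T / N."""
    n = adjacencies[0].shape[0]
    a = [subgraph_density(A, T) / subgraph_density(A, range(n)) for A in adjacencies]
    return max(a), min(a), len(T) / n      # a_bar, a_underbar, n_T / N
```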

90 We note that our algorithm finds a large subgraph (large n_T/N) with higher density compared to the original graph in all of the DIMACS graph classes, making it suitable for finding large dense regions in multiple graphs. [sent-332, score-0.801]

91 The MKL experiments reported in Table 1 took less than 1 minute (for each graph family), while the algorithm in [14] aborts after several hours due to memory constraints. [sent-334, score-0.191]

92 Planted clique recovery: We generate 100 random graphs based on the planted clique model G(n, 1/2, k) where n = 30000 and hidden clique size k = 2t√n for each choice of t. [sent-335, score-1.674]

93 For t ≥ 2 we find perfect recovery of the clique on all the graphs, which is in agreement with Theorem 5. [sent-339, score-0.376]

94 It is worth noting that the approach takes 10 minutes to recover the clique in this graph of 30000 vertices, which is far beyond the scope of SDP-based procedures. [sent-341, score-0.597]

95 Table 1: Common dense subgraph recovery on multiple graphs in the DIMACS dataset. [sent-384, score-0.676]

96 In particular, we have demonstrated that finding a common dense region in multiple graphs can be solved by an MKL problem, while finding a large planted clique can be solved by a one-class SVM. [sent-388, score-1.142]

97 Finding and certifying a large hidden clique in a semirandom graph. [sent-429, score-0.381]

98 A convex quadratic characterization of the Lovász theta number. [sent-491, score-0.268]

99 Computational challenges with cliques, quasi-cliques and clique partitions in graphs. [sent-500, score-0.342]

100 Finding hidden cliques in linear time with high probability. [sent-509, score-0.146]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('clique', 0.342), ('planted', 0.339), ('labelling', 0.318), ('subgraph', 0.286), ('lov', 0.282), ('sz', 0.268), ('graphs', 0.214), ('svm', 0.158), ('graph', 0.154), ('dense', 0.142), ('sdp', 0.139), ('nt', 0.118), ('mkl', 0.109), ('cliques', 0.107), ('chalmers', 0.107), ('dimacs', 0.104), ('ui', 0.103), ('orthogonal', 0.102), ('gs', 0.101), ('vertices', 0.101), ('kkt', 0.098), ('alpha', 0.093), ('labellings', 0.075), ('lab', 0.071), ('minc', 0.069), ('kij', 0.069), ('goteborg', 0.064), ('ls', 0.062), ('nding', 0.059), ('enumerative', 0.057), ('common', 0.053), ('solving', 0.053), ('sweden', 0.052), ('adjacency', 0.051), ('uj', 0.05), ('finding', 0.05), ('sv', 0.049), ('vertex', 0.049), ('ni', 0.046), ('identifying', 0.045), ('density', 0.043), ('moderately', 0.043), ('subgraphs', 0.043), ('benny', 0.043), ('dubhashi', 0.043), ('jethava', 0.043), ('krivelevich', 0.043), ('induced', 0.043), ('semide', 0.041), ('complement', 0.04), ('maxi', 0.039), ('hidden', 0.039), ('coloring', 0.038), ('minm', 0.038), ('hours', 0.037), ('art', 0.037), ('sized', 0.036), ('theorem', 0.036), ('kii', 0.035), ('mink', 0.035), ('css', 0.035), ('di', 0.035), ('eigenvalue', 0.035), ('sn', 0.034), ('recovery', 0.034), ('establishes', 0.033), ('families', 0.032), ('erd', 0.031), ('kernel', 0.031), ('combinatorial', 0.031), ('feige', 0.03), ('sphere', 0.029), ('approximated', 0.028), ('gt', 0.028), ('claim', 0.028), ('log', 0.027), ('degree', 0.027), ('opportunities', 0.026), ('solved', 0.026), ('ee', 0.025), ('concave', 0.025), ('opens', 0.024), ('consequence', 0.024), ('exhaustive', 0.024), ('strongly', 0.023), ('outside', 0.023), ('matrix', 0.023), ('rd', 0.023), ('arguments', 0.022), ('namely', 0.022), ('regions', 0.022), ('size', 0.022), ('extremely', 0.022), ('recalling', 0.022), ('aij', 0.022), ('un', 0.022), ('comprehensive', 0.022), ('notion', 0.021), ('exist', 0.021), ('edge', 0.021)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs

Author: Vinay Jethava, Anders Martinsson, Chiranjib Bhattacharyya, Devdatt Dubhashi

Abstract: The Lovász ϑ function of a graph, a fundamental tool in combinatorial optimization and approximation algorithms, is computed by solving an SDP. In this paper we establish that the Lovász ϑ function is equivalent to a kernel learning problem related to the one-class SVM. This interesting connection opens up many opportunities bridging graph-theoretic algorithms and machine learning. We show that there exist graphs, which we call SVM − ϑ graphs, on which the Lovász ϑ function can be approximated well by a one-class SVM. This leads to novel use of SVM techniques for solving algorithmic problems in large graphs, e.g. identifying a planted clique of size Θ(√n) in a random graph G(n, 1/2). A classic approach for this problem involves computing the ϑ function; however, it is not scalable due to the SDP computation. We show that the random graph with a planted clique is an example of an SVM − ϑ graph. As a consequence, an SVM-based approach easily identifies the clique in large graphs and is competitive with the state-of-the-art. We introduce the notion of common orthogonal labelling and show that it can be computed by solving a Multiple Kernel Learning problem. It is further shown that such a labelling is extremely useful in identifying a large common dense subgraph in multiple graphs, which is known to be a computationally difficult problem. The proposed algorithm achieves an order of magnitude scalability compared to state-of-the-art methods.

2 0.25141692 69 nips-2012-Clustering Sparse Graphs

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

3 0.17077862 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

Author: Xianghang Liu, James Petterson, Tibério S. Caetano

Abstract: We present a new formulation for binary classification. Instead of relying on convex losses and regularizers such as in SVMs, logistic regression and boosting, or instead non-convex but continuous formulations such as those encountered in neural networks and deep belief networks, our framework entails a non-convex but discrete formulation, where estimation amounts to finding a MAP configuration in a graphical model whose potential functions are low-dimensional discrete surrogates for the misclassification loss. We argue that such a discrete formulation can naturally account for a number of issues that are typically encountered in either the convex or the continuous non-convex approaches, or both. By reducing the learning problem to a MAP inference problem, we can immediately translate the guarantees available for many inference settings to the learning problem itself. We empirically demonstrate in a number of experiments that this approach is promising in dealing with issues such as severe label noise, while still having global optimality guarantees. Due to the discrete nature of the formulation, it also allows for direct regularization through cardinality-based penalties, such as the 0 pseudo-norm, thus providing the ability to perform feature selection and trade-off interpretability and predictability in a principled manner. We also outline a number of open problems arising from the formulation. 1

4 0.13816828 328 nips-2012-Submodular-Bregman and the Lovász-Bregman Divergences with Applications

Author: Rishabh Iyer, Jeff A. Bilmes

Abstract: We introduce a class of discrete divergences on sets (equivalently binary vectors) that we call the submodular-Bregman divergences. We consider two kinds, defined either from tight modular upper or tight modular lower bounds of a submodular function. We show that the properties of these divergences are analogous to the (standard continuous) Bregman divergence. We demonstrate how they generalize many useful divergences, including the weighted Hamming distance, squared weighted Hamming, weighted precision, recall, conditional mutual information, and a generalized KL-divergence on sets. We also show that the generalized Bregman divergence on the Lovász extension of a submodular function, which we call the Lovász-Bregman divergence, is a continuous extension of a submodular Bregman divergence. We point out a number of applications, and in particular show that a proximal algorithm defined through the submodular Bregman divergence provides a framework for many mirror-descent style algorithms related to submodular function optimization. We also show that a generalization of the k-means algorithm using the Lovász-Bregman divergence is natural in clustering scenarios where ordering is important. A unique property of this algorithm is that computing the mean ordering is extremely efficient unlike other order-based distance measures.

5 0.12684888 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

Author: Emile Richard, Stephane Gaiffas, Nicolas Vayatis

Abstract: In the paper, we consider the problem of link prediction in time-evolving graphs. We assume that certain graph features, such as the node degree, follow a vector autoregressive (VAR) model and we propose to use this information to improve the accuracy of prediction. Our strategy involves a joint optimization procedure over the space of adjacency matrices and VAR matrices which takes into account both sparsity and low rank properties of the matrices. Oracle inequalities are derived and illustrate the trade-offs in the choice of smoothing parameters when modeling the joint effect of sparsity and low rank property. The estimate is computed efficiently using proximal methods through a generalized forward-backward agorithm. 1

6 0.12188089 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

7 0.11314036 284 nips-2012-Q-MKL: Matrix-induced Regularization in Multi-Kernel Learning with Applications to Neuroimaging

8 0.11293378 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

9 0.10809851 197 nips-2012-Learning with Recursive Perceptual Representations

10 0.10286197 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

11 0.085716948 260 nips-2012-Online Sum-Product Computation Over Trees

12 0.080093488 172 nips-2012-Latent Graphical Model Selection: Efficient Methods for Locally Tree-like Graphs

13 0.072734855 132 nips-2012-Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling

14 0.066641971 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

15 0.061003309 180 nips-2012-Learning Mixtures of Tree Graphical Models

16 0.06006825 214 nips-2012-Minimizing Sparse High-Order Energies by Submodular Vertex-Cover

17 0.05989797 335 nips-2012-The Bethe Partition Function of Log-supermodular Graphical Models

18 0.059573144 227 nips-2012-Multiclass Learning with Simplex Coding

19 0.05726254 206 nips-2012-Majorization for CRFs and Latent Likelihoods

20 0.055433109 188 nips-2012-Learning from Distributions via Support Measure Machines


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.164), (1, 0.055), (2, 0.05), (3, -0.107), (4, 0.019), (5, -0.004), (6, -0.033), (7, -0.058), (8, -0.162), (9, 0.065), (10, 0.053), (11, -0.039), (12, -0.011), (13, 0.044), (14, 0.09), (15, -0.064), (16, -0.015), (17, 0.006), (18, -0.09), (19, -0.028), (20, -0.009), (21, -0.04), (22, 0.027), (23, -0.079), (24, -0.007), (25, 0.131), (26, -0.141), (27, 0.011), (28, -0.029), (29, -0.13), (30, -0.068), (31, 0.068), (32, -0.109), (33, 0.084), (34, 0.057), (35, -0.12), (36, 0.166), (37, 0.041), (38, 0.061), (39, -0.002), (40, 0.004), (41, 0.004), (42, 0.077), (43, -0.015), (44, 0.029), (45, -0.029), (46, -0.056), (47, 0.013), (48, -0.077), (49, 0.117)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94321895 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs

Author: Vinay Jethava, Anders Martinsson, Chiranjib Bhattacharyya, Devdatt Dubhashi

Abstract: The Lovász ϑ function of a graph, a fundamental tool in combinatorial optimization and approximation algorithms, is computed by solving an SDP. In this paper we establish that the Lovász ϑ function is equivalent to a kernel learning problem related to the one-class SVM. This interesting connection opens up many opportunities bridging graph-theoretic algorithms and machine learning. We show that there exist graphs, which we call SVM − ϑ graphs, on which the Lovász ϑ function can be approximated well by a one-class SVM. This leads to novel use of SVM techniques for solving algorithmic problems in large graphs, e.g. identifying a planted clique of size Θ(√n) in a random graph G(n, 1/2). A classic approach for this problem involves computing the ϑ function; however, it is not scalable due to the SDP computation. We show that the random graph with a planted clique is an example of an SVM − ϑ graph. As a consequence, an SVM-based approach easily identifies the clique in large graphs and is competitive with the state-of-the-art. We introduce the notion of common orthogonal labelling and show that it can be computed by solving a Multiple Kernel Learning problem. It is further shown that such a labelling is extremely useful in identifying a large common dense subgraph in multiple graphs, which is known to be a computationally difficult problem. The proposed algorithm achieves an order of magnitude scalability compared to state-of-the-art methods.

2 0.66580093 132 nips-2012-Fiedler Random Fields: A Large-Scale Spectral Approach to Statistical Network Modeling

Author: Antonino Freno, Mikaela Keller, Marc Tommasi

Abstract: Statistical models for networks have been typically committed to strong prior assumptions concerning the form of the modeled distributions. Moreover, the vast majority of currently available models are explicitly designed for capturing some specific graph properties (such as power-law degree distributions), which makes them unsuitable for application to domains where the behavior of the target quantities is not known a priori. The key contribution of this paper is twofold. First, we introduce the Fiedler delta statistic, based on the Laplacian spectrum of graphs, which allows to dispense with any parametric assumption concerning the modeled network properties. Second, we use the defined statistic to develop the Fiedler random field model, which allows for efficient estimation of edge distributions over large-scale random networks. After analyzing the dependence structure involved in Fiedler random fields, we estimate them over several real-world networks, showing that they achieve a much higher modeling accuracy than other well-known statistical approaches.

3 0.57621759 186 nips-2012-Learning as MAP Inference in Discrete Graphical Models

Author: Xianghang Liu, James Petterson, Tibério S. Caetano

Abstract: We present a new formulation for binary classification. Instead of relying on convex losses and regularizers such as in SVMs, logistic regression and boosting, or instead non-convex but continuous formulations such as those encountered in neural networks and deep belief networks, our framework entails a non-convex but discrete formulation, where estimation amounts to finding a MAP configuration in a graphical model whose potential functions are low-dimensional discrete surrogates for the misclassification loss. We argue that such a discrete formulation can naturally account for a number of issues that are typically encountered in either the convex or the continuous non-convex approaches, or both. By reducing the learning problem to a MAP inference problem, we can immediately translate the guarantees available for many inference settings to the learning problem itself. We empirically demonstrate in a number of experiments that this approach is promising in dealing with issues such as severe label noise, while still having global optimality guarantees. Due to the discrete nature of the formulation, it also allows for direct regularization through cardinality-based penalties, such as the 0 pseudo-norm, thus providing the ability to perform feature selection and trade-off interpretability and predictability in a principled manner. We also outline a number of open problems arising from the formulation. 1

4 0.55723089 287 nips-2012-Random function priors for exchangeable arrays with applications to graphs and relational data

Author: James Lloyd, Peter Orbanz, Zoubin Ghahramani, Daniel M. Roy

Abstract: A fundamental problem in the analysis of structured relational data like graphs, networks, databases, and matrices is to extract a summary of the common structure underlying relations between individual entities. Relational data are typically encoded in the form of arrays; invariance to the ordering of rows and columns corresponds to exchangeable arrays. Results in probability theory due to Aldous, Hoover and Kallenberg show that exchangeable arrays can be represented in terms of a random measurable function which constitutes the natural model parameter in a Bayesian model. We obtain a flexible yet simple Bayesian nonparametric model by placing a Gaussian process prior on the parameter function. Efficient inference utilises elliptical slice sampling combined with a random sparse approximation to the Gaussian process. We demonstrate applications of the model to network data and clarify its relation to models in the literature, several of which emerge as special cases. 1

5 0.53586096 69 nips-2012-Clustering Sparse Graphs

Author: Yudong Chen, Sujay Sanghavi, Huan Xu

Abstract: We develop a new algorithm to cluster sparse unweighted graphs – i.e. partition the nodes into disjoint clusters so that there is higher density within clusters, and low across clusters. By sparsity we mean the setting where both the in-cluster and across cluster edge densities are very small, possibly vanishing in the size of the graph. Sparsity makes the problem noisier, and hence more difficult to solve. Any clustering involves a tradeoff between minimizing two kinds of errors: missing edges within clusters and present edges across clusters. Our insight is that in the sparse case, these must be penalized differently. We analyze our algorithm’s performance on the natural, classical and widely studied “planted partition” model (also called the stochastic block model); we show that our algorithm can cluster sparser graphs, and with smaller clusters, than all previous methods. This is seen empirically as well. 1

6 0.5270294 6 nips-2012-A Convex Formulation for Learning Scale-Free Networks via Submodular Relaxation

7 0.5252502 10 nips-2012-A Linear Time Active Learning Algorithm for Link Classification

8 0.52031034 196 nips-2012-Learning with Partially Absorbing Random Walks

9 0.51630515 199 nips-2012-Link Prediction in Graphs with Autoregressive Features

10 0.50084341 135 nips-2012-Forging The Graphs: A Low Rank and Positive Semidefinite Graph Learning Approach

11 0.48803255 328 nips-2012-Submodular-Bregman and the Lovász-Bregman Divergences with Applications

12 0.4854466 214 nips-2012-Minimizing Sparse High-Order Energies by Submodular Vertex-Cover

13 0.48469791 346 nips-2012-Topology Constraints in Graphical Models

14 0.45059851 253 nips-2012-On Triangular versus Edge Representations --- Towards Scalable Modeling of Networks

15 0.42391095 70 nips-2012-Clustering by Nonnegative Matrix Factorization Using Graph Random Walk

16 0.4161599 309 nips-2012-Semi-supervised Eigenvectors for Locally-biased Learning

17 0.41412675 157 nips-2012-Identification of Recurrent Patterns in the Activation of Brain Networks

18 0.40186867 326 nips-2012-Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses

19 0.39969036 317 nips-2012-Smooth-projected Neighborhood Pursuit for High-dimensional Nonparanormal Graph Estimation

20 0.39754057 197 nips-2012-Learning with Recursive Perceptual Representations


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.019), (21, 0.014), (38, 0.103), (42, 0.026), (54, 0.018), (64, 0.012), (74, 0.51), (76, 0.111), (80, 0.054), (92, 0.029)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.92983025 202 nips-2012-Locally Uniform Comparison Image Descriptor

Author: Andrew Ziegler, Eric Christiansen, David Kriegman, Serge J. Belongie

Abstract: Keypoint matching between pairs of images using popular descriptors like SIFT or a faster variant called SURF is at the heart of many computer vision algorithms including recognition, mosaicing, and structure from motion. However, SIFT and SURF do not perform well for real-time or mobile applications. As an alternative very fast binary descriptors like BRIEF and related methods use pairwise comparisons of pixel intensities in an image patch. We present an analysis of BRIEF and related approaches revealing that they are hashing schemes on the ordinal correlation metric Kendall’s tau. Here, we introduce Locally Uniform Comparison Image Descriptor (LUCID), a simple description method based on linear time permutation distances between the ordering of RGB values of two image patches. LUCID is computable in linear time with respect to the number of pixels and does not require floating point computation. 1

2 0.92326534 40 nips-2012-Analyzing 3D Objects in Cluttered Images

Author: Mohsen Hejrati, Deva Ramanan

Abstract: We present an approach to detecting and analyzing the 3D configuration of objects in real-world images with heavy occlusion and clutter. We focus on the application of finding and analyzing cars. We do so with a two-stage model; the first stage reasons about 2D shape and appearance variation due to within-class variation (station wagons look different than sedans) and changes in viewpoint. Rather than using a view-based model, we describe a compositional representation that models a large number of effective views and shapes using a small number of local view-based templates. We use this model to propose candidate detections and 2D estimates of shape. These estimates are then refined by our second stage, using an explicit 3D model of shape and viewpoint. We use a morphable model to capture 3D within-class variation, and use a weak-perspective camera model to capture viewpoint. We learn all model parameters from 2D annotations. We demonstrate state-of-the-art accuracy for detection, viewpoint estimation, and 3D shape reconstruction on challenging images from the PASCAL VOC 2011 dataset. 1

3 0.92239171 360 nips-2012-Visual Recognition using Embedded Feature Selection for Curvature Self-Similarity

Author: Angela Eigenstetter, Bjorn Ommer

Abstract: Category-level object detection has a crucial need for informative object representations. This demand has led to feature descriptors of ever increasing dimensionality like co-occurrence statistics and self-similarity. In this paper we propose a new object representation based on curvature self-similarity that goes beyond the currently popular approximation of objects using straight lines. However, like all descriptors using second order statistics, ours also exhibits a high dimensionality. Although improving discriminability, the high dimensionality becomes a critical issue due to lack of generalization ability and curse of dimensionality. Given only a limited amount of training data, even sophisticated learning algorithms such as the popular kernel methods are not able to suppress noisy or superfluous dimensions of such high-dimensional data. Consequently, there is a natural need for feature selection when using present-day informative features and, particularly, curvature self-similarity. We therefore suggest an embedded feature selection method for SVMs that reduces complexity and improves generalization capability of object models. By successfully integrating the proposed curvature self-similarity representation together with the embedded feature selection in a widely used state-of-the-art object detection framework we show the general pertinence of the approach. 1

same-paper 4 0.87213284 337 nips-2012-The Lovász ϑ function, SVMs and finding large dense subgraphs

Author: Vinay Jethava, Anders Martinsson, Chiranjib Bhattacharyya, Devdatt Dubhashi

Abstract: The Lovász ϑ function of a graph, a fundamental tool in combinatorial optimization and approximation algorithms, is computed by solving an SDP. In this paper we establish that the Lovász ϑ function is equivalent to a kernel learning problem related to the one-class SVM. This interesting connection opens up many opportunities bridging graph-theoretic algorithms and machine learning. We show that there exist graphs, which we call SVM − ϑ graphs, on which the Lovász ϑ function can be approximated well by a one-class SVM. This leads to novel use of SVM techniques for solving algorithmic problems in large graphs, e.g. identifying a planted clique of size Θ(√n) in a random graph G(n, 1/2). A classic approach for this problem involves computing the ϑ function; however, it is not scalable due to the SDP computation. We show that the random graph with a planted clique is an example of an SVM − ϑ graph. As a consequence, an SVM-based approach easily identifies the clique in large graphs and is competitive with the state-of-the-art. We introduce the notion of common orthogonal labelling and show that it can be computed by solving a Multiple Kernel Learning problem. It is further shown that such a labelling is extremely useful in identifying a large common dense subgraph in multiple graphs, which is known to be a computationally difficult problem. The proposed algorithm achieves an order of magnitude scalability compared to state-of-the-art methods.

5 0.83806676 339 nips-2012-The Time-Marginalized Coalescent Prior for Hierarchical Clustering

Author: Levi Boyles, Max Welling

Abstract: We introduce a new prior for use in Nonparametric Bayesian Hierarchical Clustering. The prior is constructed by marginalizing out the time information of Kingman’s coalescent, providing a prior over tree structures which we call the Time-Marginalized Coalescent (TMC). This allows for models which factorize the tree structure and times, providing two benefits: more flexible priors may be constructed and more efficient Gibbs type inference can be used. We demonstrate this on an example model for density estimation and show the TMC achieves competitive experimental results. 1

6 0.7906875 3 nips-2012-A Bayesian Approach for Policy Learning from Trajectory Preference Queries

7 0.72253609 185 nips-2012-Learning about Canonical Views from Internet Image Collections

8 0.72026795 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

9 0.71446067 274 nips-2012-Priors for Diversity in Generative Latent Variable Models

10 0.70239437 201 nips-2012-Localizing 3D cuboids in single-view images

11 0.68688703 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

12 0.67256367 8 nips-2012-A Generative Model for Parts-based Object Segmentation

13 0.65624237 210 nips-2012-Memorability of Image Regions

14 0.64509851 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

15 0.63568288 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves

16 0.62950516 303 nips-2012-Searching for objects driven by context

17 0.61836052 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

18 0.61400795 146 nips-2012-Graphical Gaussian Vector for Image Categorization

19 0.6037066 168 nips-2012-Kernel Latent SVM for Visual Recognition

20 0.5895707 81 nips-2012-Context-Sensitive Decision Forests for Object Detection