jmlr jmlr2013 jmlr2013-64 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Vinay Jethava, Anders Martinsson, Chiranjib Bhattacharyya, Devdatt Dubhashi
Abstract: In this paper we establish that the Lovász ϑ function on a graph can be restated as a kernel learning problem. We introduce the notion of SVM−ϑ graphs, on which the Lovász ϑ function can be approximated well by a Support Vector Machine (SVM). We show that Erdős–Rényi random G(n, p) graphs are SVM−ϑ graphs for log⁴ n / n ≤ p < 1. Even if we embed a large clique of size Θ(√(np/(1 − p))) in a G(n, p) graph, the resultant graph still remains an SVM−ϑ graph. This immediately suggests an SVM-based algorithm for recovering a large planted clique in random graphs. Associated with the ϑ function is the notion of orthogonal labellings. We introduce common orthogonal labellings, which extend the idea of orthogonal labellings to multiple graphs. This allows us to propose a Multiple Kernel Learning (MKL) based solution which is capable of identifying a large common dense subgraph in multiple graphs. Both in the planted clique case and in the common subgraph detection problem, the proposed solutions beat the state of the art by an order of magnitude. Keywords: orthogonal labellings of graphs, planted cliques, random graphs, common dense subgraph
Reference: text
sentIndex sentText sentNum sentScore
1 We show that Erdős–Rényi random G(n, p) graphs are SVM − ϑ graphs for log⁴ n / n ≤ p < 1. [sent-10, score-0.56]
2 Even if we embed a large clique of size Θ(√(np/(1 − p))) in a G(n, p) graph, the resultant graph still remains an SVM − ϑ graph. [sent-11, score-0.641]
3 This immediately suggests an SVM based algorithm for recovering a large planted clique in random graphs. [sent-12, score-0.61]
4 This allows us to propose a Multiple Kernel learning (MKL) based solution which is capable of identifying a large common dense subgraph in multiple graphs. [sent-15, score-0.48]
5 Both in the planted clique case and common subgraph detection problem the proposed solutions beat the state of the art by an order of magnitude. [sent-16, score-0.953]
6 Keywords: orthogonal labellings of graphs, planted cliques, random graphs, common dense subgraph. [sent-17, score-0.965]
7 1. Introduction. In a general graph many problems, such as computing the size of the largest clique or determining the chromatic number, are NP-hard (Garey and Johnson, 1979). [sent-18, score-0.455]
8 Consider the problem of finding a large planted clique in a random graph. [sent-26, score-0.61]
9 In particular we study the problem of finding a common dense subgraph in multiple graphs (Pardalos and Rebennack, 2010), a computationally challenging problem for large graphs. [sent-34, score-0.689]
10 We also study the problem of finding a hidden planted clique in a random graph. [sent-35, score-0.636]
11 Many problems in social network analysis can also be posed as a dense subgraph discovery problem (Newman et al. [sent-46, score-0.44]
12 In this paper we target two difficult versions of the dense subgraph recovery problem. [sent-54, score-0.44]
13 The problem of planted clique in a random graph is an instance of dense subgraph discovery in a random graph. [sent-55, score-1.236]
14 To cite a recent example, the problem of correlation detection was formulated as that of finding a planted clique in a large random graph (Devroye et al. [sent-58, score-0.796]
15 , Podolyan and Karypis, 2009), we study the problem of finding a large common dense subgraph in multiple graphs. [sent-62, score-0.48]
16 One of the main contributions of this paper is to show how the connection of ϑ to SVMs can be exploited to find a dense subgraph in multiple graphs. [sent-70, score-0.44]
17 We extend the idea of orthogonal labelling to multiple graphs by introducing the notion of common orthogonal labelling. [sent-71, score-0.495]
18 An extremely difficult instance of the dense subgraph recovery problem is to pose the question of finding a hidden clique in a random graph. [sent-79, score-0.735]
19 Another key contribution of this paper is to show that one can find a planted clique by solving an SVM. [sent-81, score-0.61]
20 In particular, we show that in a G(n, 1 − p) graph, even if we embed a clique of size k = Θ(√(n(1 − p)/p)), the resultant graph is an SVM − ϑ graph. [sent-82, score-0.641]
21 Furthermore, even if we embed a sparse random subgraph in a large random graph, the resultant graph turns out to be an SVM − ϑ graph. [sent-83, score-0.489]
22 Furthermore, we show that the graph associated with the planted clique problem also satisfies this property. [sent-93, score-0.796]
23 Let G_S = (S, E_S) denote the subgraph induced by S ⊆ V in graph G. [sent-141, score-0.538]
24 The study of this problem was initiated by Lovász (1979), who introduced the idea of orthogonal labelling: Definition 1 (Lovász, 1979) An orthogonal labelling of a graph G = (V, E) with |V| = n is a matrix U = [u1, . [sent-151, score-0.648]
25 Definition 4 (Luz, 1995) A graph G, with adjacency matrix A, is called a Q graph whenever ALPHA(G) = v(G, A) where v(G, A) is defined in Theorem 2. [sent-215, score-0.524]
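Definition 4 refers to v(G, A) (Theorem 2) and, later in the text, to the kernel K of Equation (2) and the value ω(K) of Equation (1), none of which is written out in the extracted sentences. The following numpy sketch shows the construction we believe is intended: the LS-labelling kernel K = A/ρ + I with ρ = −λ_min(A), and ω(K) taken as the value of the one-class SVM dual max_{α ≥ 0} 2∑α_i − αᵀKα; the projected-gradient solver is ours, not the solver used in the paper.

    import numpy as np

    def ls_kernel(A):
        # Assumed LS-labelling kernel: K = A/rho + I with rho = -lambda_min(A),
        # which makes K positive semidefinite.
        rho = -np.linalg.eigvalsh(A)[0]
        return A / rho + np.eye(A.shape[0])

    def omega(K, iters=5000):
        # Assumed SVM value omega(K) = max_{alpha >= 0} 2*sum(alpha) - alpha' K alpha,
        # solved by projected gradient ascent; step 1/(2*lambda_max(K)) keeps it stable.
        n = K.shape[0]
        alpha = np.ones(n) / n
        step = 1.0 / (2.0 * np.linalg.eigvalsh(K)[-1])
        for _ in range(iters):
            alpha = np.maximum(alpha + step * (2.0 - 2.0 * K.dot(alpha)), 0.0)
        return 2.0 * alpha.sum() - alpha.dot(K).dot(alpha), alpha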
26 In Section 4 we will use this characterization to show that random graphs with planted cliques are not Q graphs. [sent-224, score-0.6]
27 Before we discuss the applicability of the results obtained here to random graphs we study the interesting problem of finding a dense common subgraph in multiple graphs. [sent-245, score-0.689]
28 3. Finding Large Dense Regions in Multiple Graphs. The problem of finding a dense subgraph in a single graph is a computationally challenging problem (Pardalos and Rebennack, 2010). [sent-247, score-0.626]
29 An interesting method was proposed by Jiang and Pei (2009), which uses an enumerative strategy for finding a common dense subgraph in multiple graphs. [sent-251, score-0.512]
30 Find a common subgraph which is dense in all the graphs. [sent-287, score-0.48]
31 Section 3.2 presents our algorithm for recovery of a large common dense subgraph from multiple graphs. [sent-293, score-0.48]
32 3.1 Common Orthogonal Labelling and MKL Formulation. We begin by defining a common orthogonal labelling below: Definition 9 Given a set of simple undirected graphs G on a common vertex set V = {1, . [sent-296, score-0.519]
33 Let ϒ(G) denote the size of the maximum common independent set, that is, the subset of vertices CS ⊆ V of maximum possible cardinality for which the subgraph G_CS^(l) induced by CS in graph G^(l) is an independent set for all G^(l) ∈ G. [sent-309, score-0.619]
34 However, the relationship between minimum eigenvalue ρ(l) = −λn (A(l) ) of original graphs G(l) ∈ G, and minimum eigenvalue ρ∪ = −λn (A∪ ) of union graph G∪ is not clear, that is, ρ∪ can be greater or less than ρ(l) (see, e. [sent-365, score-0.471]
35 3.2 Subgraph Detection Based on Multiple Graphs. In the remainder of this section, we relate the optimal solution (support vectors) of ω(K) to the density of the related induced subgraph for the single-graph case, which we later extend to multiple graphs. [sent-388, score-0.586]
36 We extend this notion to general graphs by relating the density of the induced subgraph obtained by choosing vertices having “high” support through the KKT conditions. [sent-394, score-0.65]
37 Let ᾱ*_i(S) denote the average of the support vectors α*_j over the neighbourhood N_i(G_S) of node i in the subgraph G_S induced by S ⊆ V in graph G, and let ᾱ*_S be the minimum of ᾱ*_i(S) over all i ∈ S, that is, ᾱ*_i(S) = (∑_{j∈S} A_ij α*_j) / d_i(G_S) and ᾱ*_S = min_{i∈S} ᾱ*_i(S). [sent-397, score-0.592]
38 This result provides an upper bound on the density γ(GSc ) of the subgraph induced by set Sc in graph G for general c. [sent-405, score-0.586]
39 γ(G_SV) ≤ (1 − α_min)ρ / (ᾱ*_SV (n_SV − 1)). This provides a simple procedure for finding a sparse subgraph by selecting the subgraph G_SV induced by the set of non-zero support vectors SV. [sent-408, score-0.655]
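A minimal sketch of this selection rule, reusing ls_kernel and omega from the sketch above and assuming γ(G_S) is the usual edge density |E_S| / (n_S choose 2):

    def sv_subgraph(A, tol=1e-8, iters=5000):
        # Run the SVM on the assumed LS-labelling kernel, keep vertices with
        # non-zero support, and report the density of the induced subgraph.
        _, alpha = omega(ls_kernel(A), iters)
        sv = np.where(alpha > tol)[0]
        n_sv = len(sv)
        A_sv = A[np.ix_(sv, sv)]
        density = A_sv.sum() / (n_sv * (n_sv - 1)) if n_sv > 1 else 0.0
        return sv, density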
40 We now consider the problem of common dense subgraph recovery from multiple graphs based on the MKL formulation in (7). [sent-412, score-0.689]
41 The set T with cardinality n_T = |T| induces a subgraph G_T^(l) in graph G^(l) ∈ G having density at most γ(G_T^(l)) given by γ(G_T^(l)) ≤ (1 − α_min)ρ^(l) / (ᾱ^(l)_SV (n_T − 1)) ∀ G^(l) ∈ G. [sent-424, score-0.537]
42 The above result allows us to build a parameter-less common sparse subgraph (CSS) algorithm, shown in Algorithm 1, with the following advantages: it provides a theoretical bound on subgraph density, and it requires no parameters from the user beyond the set of graphs G. [sent-435, score-0.855]
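Algorithm 1 itself is not reproduced in the extracted sentences; the sketch below is a simplified stand-in that replaces the MKL step of (7) with a fixed uniform combination of the per-graph kernels, so the kernel weights are an assumption here rather than learned as in the paper.

    def common_sparse_subgraph(adjs, tol=1e-8, iters=5000):
        # adjs: list of adjacency matrices A^(l) on the same vertex set.
        # Uniformly average the assumed LS-labelling kernels (simplification of MKL),
        # solve the SVM once, and return the support-vector set as the candidate T.
        K = sum(ls_kernel(A) for A in adjs) / len(adjs)
        _, alpha = omega(K, iters)
        return np.where(alpha > tol)[0], alpha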
43 However, if nT is very large, that is, nT /N ≃ 1, the density of the induced subgraph is close to the average graph density. [sent-438, score-0.586]
44 More generally, one might be interested in a trade-off between the subgraph size n_T and the subgraph density γ(G_T^(l)). [sent-439, score-0.654]
45 Analogous to the simple graph case, the subgraph density can be improved by choosing the smaller set of nodes T_c := {i ∈ T : α*_i > c} ⊆ T. [sent-440, score-0.537]
46 Subsequently we show that G(n, p) graphs and G(n, p) graphs with planted cliques are SVM − ϑ graphs. [sent-453, score-0.809]
47 These results immediately show that one can identify planted cliques or planted subgraphs. [sent-454, score-0.732]
48 Definition 16 A family of graphs G = {G = (V, E)} is said to be an SVM − ϑ graph family if there exists a constant γ such that for any graph G ∈ G with |V| ≥ n0, the following holds: ω(K) ≤ γ ϑ(G), where ω(K) is defined in (1) and K is defined on G by (2). [sent-456, score-0.581]
49 We will demonstrate examples of such families of random graphs: the Erdős–Rényi random graph G(n, p) and a planted variation. [sent-459, score-0.582]
50 This property can be very useful in detecting planted cliques in dense graphs. [sent-509, score-0.528]
51 Definition 20 (Hidden Planted Clique) A random G(n, q) graph is chosen first and then a clique of size k is introduced on the first 1, . . . , k vertices. [sent-514, score-0.455]
52 In this paper we establish that for a large planted clique, G(n, 1 − p, k) is indeed an SVM − ϑ graph. [sent-538, score-0.61]
53 We study the planted clique problem where k is in the above regime. [sent-548, score-0.61]
54 This motivates Algorithm 2 for finding a planted clique in a random graph. [sent-611, score-0.61]
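A sketch of the recovery step we believe Algorithm 2 performs, based on the surrounding description (a clique in a dense graph is an independent set in its complement, and the planted vertices are read off as the k largest support values); the explicit complementation below is our assumption, reusing the helpers sketched earlier.

    def recover_planted_clique(A, k, iters=5000):
        # Work on the complement graph so the planted clique becomes a planted
        # independent set, run the SVM, and keep the k vertices with largest alpha.
        n = A.shape[0]
        A_comp = 1.0 - A - np.eye(n)
        _, alpha = omega(ls_kernel(A_comp), iters)
        return np.sort(np.argsort(-alpha)[:k])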
55 4.3 Finding Planted Subgraphs in Random Graphs. The above results show that the SVM procedure can recover a planted independent set in a sparse random graph, which is later exploited to solve the planted clique problem in a dense graph. [sent-620, score-1.088]
56 Let G(n, p, p′ , k) be a graph which has a planted G(k, p′ ) graph on the first k vertices of a random G(n, p) graph. [sent-623, score-0.754]
57 Indeed one can show that the graph with a planted subgraph is also an SVM − ϑ graph. [sent-628, score-0.83]
58 As in the planted clique case, Algorithm 2 recovers the planted subgraph. [sent-646, score-0.951]
59 For the planted clique case one needs to consider d = − random variables. [sent-682, score-0.61]
60 Experimental Evaluation. In Section 3 an algorithm was proposed which is capable of discovering a large common dense subgraph in a collection of graphs. [sent-690, score-0.48]
61 In Section 4.2 an algorithm for discovering a large planted clique in a single graph was discussed. [sent-692, score-0.796]
62 We also investigate a thresholding heuristic which improves induced subgraph density at the cost of subgraph size. [sent-703, score-0.703]
63 We evaluate the algorithm on the following classes of graphs: c-fat graphs, which are based on fault diagnosis problems and are relatively sparse; and p_hat graphs, which are generalizations of Erdős–Rényi random graphs and are characterized by a wider degree spread compared to classical G(n, p) graphs. [sent-709, score-0.689]
64 For the families of dense graphs (brock, san, sanr), we focus on finding a large dense region in the complement of the original graphs. [sent-714, score-0.508]
65 In order to evaluate the performance of our algorithm, we compute a = max_l a^(l) and ā = min_l a^(l), where a^(l) = γ(G_T^(l))/γ(G^(l)) is the relative density of the induced subgraph (compared to the original graph density), and n_T/N is the relative size of the induced subgraph compared to the original graph size. [sent-718, score-1.124]
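A small helper computing these scores (which of a and ā is the maximum is ambiguous in the extracted text, so the sketch simply returns both, plus the per-graph values):

    def relative_density_scores(adjs, T):
        # a^(l) = gamma(G_T^(l)) / gamma(G^(l)) for each graph; assumes |T| > 1.
        scores = []
        for A in adjs:
            n = A.shape[0]
            gamma_full = A.sum() / (n * (n - 1))
            A_T = A[np.ix_(T, T)]
            gamma_T = A_T.sum() / (len(T) * (len(T) - 1))
            scores.append(gamma_T / gamma_full)
        return max(scores), min(scores), scores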
66 Table 1: Common dense subgraph recovery on multiple graphs in the DIMACS data set. [sent-748, score-0.649]
67 Here a and ā denote the maximum and minimum relative density of the induced subgraph (relative to the density of the original graph); n_SV and n_T denote the number of support vectors and the size of the subset T ⊆ SV returned by Algorithm 1. [sent-749, score-0.448]
68 We note that our algorithm finds a large subgraph (large n_T/N) with higher density compared to the original graph in all of the DIMACS graph classes, making it suitable for finding large dense regions in multiple graphs. [sent-752, score-0.86]
69 We note that traditional set enumerative methods fail to handle dense subgraph recovery for the case when nT /N is large. [sent-753, score-0.472]
70 The results in Table 1 show that in the case of the c-fat and p_hat graph families, the induced subgraph density is significantly improved (evidenced by high a and ā), and the number of support vectors n_SV is a large fraction of N (n_SV/N ≃ 1/2). [sent-756, score-0.618]
71 Here a and ā denote the maximum and minimum relative density of the induced subgraph (relative to the density of the original graph). [sent-793, score-0.448]
72 On the other hand, for the brock and san graph families, the number of support vectors is equal to the overall graph size, n_SV ≃ N; consequently the relative density is 1, that is, γ(G_SV^(l)) = γ(G^(l)), which is not interesting. [sent-795, score-0.446]
73 Figure 2 shows the variation in density of the induced subgraph γ(G_Sc^(l)) relative to the original graph density γ(G^(l)) for all graphs G^(l) ∈ G at increasing subgraph sizes for the largest graph (resp. [sent-800, score-1.332]
74 Notice that in the case of the c-fat and p_hat graph families (Figures 2(a) and 2(c)), one can further improve graph densities across all graphs G^(l) ∈ G by choosing a higher value of c (and correspondingly, a smaller induced subgraph |S_c|). [sent-804, score-0.99]
75 We note that the minimum and maximum induced subgraph densities improve by choosing S_c instead of T, that is, a(S_c) ≥ a(T) and ā(S_c) ≥ ā(T) for all graph families. [sent-810, score-0.586]
76 It can be seen from Figure 2 that the induced subgraph density is not strictly monotonic with induced subgraph size |Sc |. [sent-811, score-0.752]
77 Figures (a)–(e) show the densities of the induced subgraph γ(G_Sc^(l)) relative to the original graph density γ(G^(l)) for all graphs G^(l) ∈ G at different values of c ∈ [0, 1] (i. [sent-831, score-0.795]
78 , different subgraph sizes |Sc |) for different DIMACS graph families. [sent-833, score-0.489]
79 Thus, one can choose a smaller induced subgraph S_c having higher induced subgraph density by selecting a higher value of the threshold c. [sent-835, score-0.752]
80 It is instructive to note that in all graph families, the graph with maximum relative density, for example, c-fat500 in Figure 2(a), is the graph with minimum average density among all graphs in G. [sent-836, score-0.792]
81 In other words, the MKL-based approach tries to find a dense region in the sparsest graph G^(l) ∈ G while making sure it is compatible with the remaining graphs in G. [sent-837, score-0.649]
82 5.3 Planted Clique Recovery on Random Graphs. We consider the case of an Erdős–Rényi graph with general p ≥ 1/2 and a planted clique of size k, that is, G(n, p, k). [sent-839, score-0.796]
83 5.3.1 Data Set. We generate n_s = 100 random graphs based on G(n, 1 − p, k) with a planted independent set of size k = 2t√(n(1 − p)/p) and p = (1/2) n^(−α), where n = 20000 and α ≥ 0. [sent-842, score-0.55]
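A sketch of this data generation step (function and parameter names are ours; a much smaller n than 20000 is advisable for the dense-matrix sketch below):

    def planted_independent_set_graph(n, p, k, seed=0):
        # Sample G(n, 1-p, k): edges appear independently with probability 1-p,
        # except that the first k vertices are forced to form an independent set
        # (equivalently, a planted clique of size k in the complement graph).
        rng = np.random.default_rng(seed)
        upper = np.triu((rng.random((n, n)) < 1.0 - p).astype(float), 1)
        A = upper + upper.T
        A[:k, :k] = 0.0
        return A

    # Example roughly matching the alpha = 0 setting (p = 1/2, k = 2t*sqrt(n), t = 2), at smaller scale:
    # A = planted_independent_set_graph(n=2000, p=0.5, k=int(4 * np.sqrt(2000)))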
84 Note that the case of α = 0 yields the familiar planted clique model G(n, 1/2, k) with k = 2t√n. [sent-847, score-0.61]
85 Figure 3: (a) shows f_r, the fraction of graphs for which the hidden clique is recovered exactly; and (b) the average F1-score measuring the quality of the recovered subset over n_s trials at each t (k = 2t√n). [sent-856, score-0.504]
86 5.3.3 Discussion. As predicted by Theorem 23, there exists some t0 > 0 for which the Lovász ϑ function is bounded by ω(K), and the planted clique can be recovered perfectly by selecting the top k support vectors in sorted descending order of α*_i. [sent-892, score-0.803]
87 We find experimentally that this approach recovers the planted clique exactly for t ≥ 2 for all c ∈ {0, . [sent-893, score-0.61]
88 , 10}, that is, for the random graph G(n, 1 − p, k) with p = (1/2) n^(−α) and a planted independent set of size k = 2t√(n(1 − p)/p). [sent-896, score-0.527]
89 In particular we discuss the case c = 0, which yields the Erdős–Rényi graph with p = 1/2 and a planted clique of size k = 2t√n. [sent-897, score-0.826]
90 Figure 3(a) shows the fraction of graphs for which the hidden clique is recovered exactly using the above procedure. [sent-898, score-0.504]
91 Extended Results on DIMACS Graph Families. Figure 5 shows the densities of the induced subgraph γ(G_Sc^(l)) relative to the original graph density γ(G^(l)) for all graphs G^(l) ∈ G at different values of c ∈ [0, 1] (i. [sent-912, score-0.795]
92 , different subgraph sizes |Sc |) for remaining graphs (other than those presented in Figure 2) in different DIMACS graph families. [sent-914, score-0.698]
93 Some Results Related to G(n, 1 − p, k). In this section we derive two results related to the planted clique problem. [sent-916, score-0.61]
94 Figures (a)–(i) show the densities of the induced subgraph γ(G_Sc^(l)) relative to the original graph density γ(G^(l)) for all graphs G^(l) ∈ G at different values of c ∈ [0, 1] (i. [sent-956, score-0.795]
95 , different subgraph sizes |Sc |) for remaining graphs in different DIMACS graph families. [sent-958, score-0.698]
96 Using this together with Lemma 31, we get that G is a Q graph with probability 1 − O(1/n) if ‖A − EA‖_2 + 9k²/(16n) + ln n + p ≤ kp − √(6kp ln n). [sent-995, score-0.494]
97 Application of Lemma 33 yields ‖A − EA‖_2 = O(√(np(1 − p))), so G is a Q graph if the following holds: kp ≥ 9k²/(16n) + O(√(np(1 − p))). [sent-996, score-0.518]
98 Next we note that for k ≥ (8/3) n^(2/3) p^(−1/3) (ln n)^(1/3) the following holds: √(6kp ln n) + ln n + O(n/k) = √(6kp ln n) (1 + O(√(ln n)/(k^(1/2) p^(1/2)) + n/(k^(3/2) p^(1/2) (ln n)^(1/2)))) = √(6kp ln n) (1 + o(1)), where np = Ω(log⁴ n) by assumption in the theorem. [sent-998, score-0.467]
99 This means that, almost surely, max_i |p r_i − 1| = |kp/(kp + O(√(kp ln n))) − 1| = O(√(kp ln n))/(kp + O(√(kp ln n))) = O(√(ln n/(kp))), where we use that ln n ≪ kp as noted above. [sent-1035, score-1.47]
100 High-dimensional random geometric graphs and their clique number. [sent-1136, score-0.478]
wordName wordTfidf (topN-words)
[('planted', 0.341), ('subgraph', 0.303), ('clique', 0.269), ('graphs', 0.209), ('lov', 0.193), ('sz', 0.193), ('graph', 0.186), ('kp', 0.166), ('artinsson', 0.159), ('ethava', 0.159), ('hattacharyya', 0.159), ('ubhashi', 0.159), ('labelling', 0.156), ('asz', 0.152), ('ense', 0.152), ('ubgraphs', 0.152), ('dense', 0.137), ('svm', 0.135), ('sc', 0.134), ('ov', 0.13), ('sv', 0.126), ('inding', 0.117), ('np', 0.112), ('labellings', 0.099), ('nsv', 0.099), ('alpha', 0.097), ('kkt', 0.095), ('adjacency', 0.095), ('gsc', 0.091), ('luz', 0.083), ('nt', 0.08), ('pri', 0.076), ('dimacs', 0.076), ('ln', 0.071), ('mkl', 0.07), ('gsv', 0.068), ('subgraphs', 0.066), ('ea', 0.064), ('gs', 0.058), ('feige', 0.058), ('sdp', 0.057), ('ai', 0.054), ('krauthgamer', 0.052), ('ni', 0.051), ('cliques', 0.05), ('induced', 0.049), ('density', 0.048), ('pei', 0.047), ('cqp', 0.045), ('deg', 0.045), ('orthogonal', 0.045), ('ls', 0.043), ('gt', 0.043), ('vertices', 0.041), ('ri', 0.04), ('common', 0.04), ('erd', 0.039), ('chalmers', 0.038), ('dubhashi', 0.038), ('gq', 0.038), ('eigenvalue', 0.038), ('jiang', 0.037), ('whenever', 0.034), ('ui', 0.033), ('schrijver', 0.032), ('enumerative', 0.032), ('hat', 0.032), ('pr', 0.032), ('lemma', 0.031), ('nc', 0.03), ('lab', 0.03), ('nyi', 0.03), ('theorem', 0.03), ('di', 0.029), ('vertex', 0.029), ('hush', 0.029), ('ek', 0.029), ('mini', 0.028), ('nding', 0.027), ('conv', 0.027), ('mcdiarmid', 0.027), ('brock', 0.026), ('bollob', 0.026), ('hidden', 0.026), ('regime', 0.026), ('finding', 0.026), ('johnson', 0.026), ('families', 0.025), ('min', 0.025), ('hoffman', 0.025), ('surely', 0.024), ('establishes', 0.024), ('frieze', 0.023), ('matrix', 0.023), ('ij', 0.023), ('css', 0.023), ('garey', 0.023), ('jethava', 0.023), ('juh', 0.023), ('kucera', 0.023)]
simIndex simValue paperId paperTitle
2 0.10767374 119 jmlr-2013-Variable Selection in High-Dimension with Random Designs and Orthogonal Matching Pursuit
Author: Antony Joseph
Abstract: The performance of orthogonal matching pursuit (OMP) for variable selection is analyzed for random designs. When contrasted with the deterministic case, since the performance is here measured after averaging over the distribution of the design matrix, one can have far less stringent sparsity constraints on the coefficient vector. We demonstrate that for exact sparse vectors, the performance of the OMP is similar to known results on the Lasso algorithm (Wainwright, 2009). Moreover, variable selection under a more relaxed sparsity assumption on the coefficient vector, whereby one has only control on the ℓ1 norm of the smaller coefficients, is also analyzed. As consequence of these results, we also show that the coefficient estimate satisfies strong oracle type inequalities. Keywords: high dimensional regression, greedy algorithms, Lasso, compressed sensing
3 0.094528399 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs
Author: Matthew J. Urry, Peter Sollich
Abstract: We consider learning on graphs, guided by kernels that encode similarity between vertices. Our focus is on random walk kernels, the analogues of squared exponential kernels in Euclidean spaces. We show that on large, locally treelike graphs these have some counter-intuitive properties, specifically in the limit of large kernel lengthscales. We consider using these kernels as covariance functions of Gaussian processes. In this situation one typically scales the prior globally to normalise the average of the prior variance across vertices. We demonstrate that, in contrast to the Euclidean case, this generically leads to significant variation in the prior variance across vertices, which is undesirable from a probabilistic modelling point of view. We suggest the random walk kernel should be normalised locally, so that each vertex has the same prior variance, and analyse the consequences of this by studying learning curves for Gaussian process regression. Numerical calculations as well as novel theoretical predictions for the learning curves using belief propagation show that one obtains distinctly different probabilistic models depending on the choice of normalisation. Our method for predicting the learning curves using belief propagation is significantly more accurate than previous approximations and should become exact in the limit of large random graphs. Keywords: Gaussian process, generalisation error, learning curve, cavity method, belief propagation, graph, random walk kernel
4 0.080822796 111 jmlr-2013-Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows
Author: Julien Mairal, Bin Yu
Abstract: We consider supervised learning problems where the features are embedded in a graph, such as gene expressions in a gene network. In this context, it is of much interest to automatically select a subgraph with few connected components; by exploiting prior knowledge, one can indeed improve the prediction performance or obtain results that are easier to interpret. Regularization or penalty functions for selecting features in graphs have recently been proposed, but they raise new algorithmic challenges. For example, they typically require solving a combinatorially hard selection problem among all connected subgraphs. In this paper, we propose computationally feasible strategies to select a sparse and well-connected subset of features sitting on a directed acyclic graph (DAG). We introduce structured sparsity penalties over paths on a DAG called “path coding” penalties. Unlike existing regularization functions that model long-range interactions between features in a graph, path coding penalties are tractable. The penalties and their proximal operators involve path selection problems, which we efficiently solve by leveraging network flow optimization. We experimentally show on synthetic, image, and genomic data that our approach is scalable and leads to more connected subgraphs than other regularization functions for graphs. Keywords: convex and non-convex optimization, network flow optimization, graph sparsity
5 0.077714287 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut
Author: Jun Wang, Tony Jebara, Shih-Fu Chang
Abstract: Graph-based semi-supervised learning (SSL) methods play an increasingly important role in practical machine learning systems, particularly in agnostic settings when no parametric information or other prior knowledge is available about the data distribution. Given the constructed graph represented by a weight matrix, transductive inference is used to propagate known labels to predict the values of all unlabeled vertices. Designing a robust label diffusion algorithm for such graphs is a widely studied problem and various methods have recently been suggested. Many of these can be formalized as regularized function estimation through the minimization of a quadratic cost. However, most existing label diffusion methods minimize a univariate cost with the classification function as the only variable of interest. Since the observed labels seed the diffusion process, such univariate frameworks are extremely sensitive to the initial label choice and any label noise. To alleviate the dependency on the initial observed labels, this article proposes a bivariate formulation for graph-based SSL, where both the binary label information and a continuous classification function are arguments of the optimization. This bivariate formulation is shown to be equivalent to a linearly constrained Max-Cut problem. Finally an efficient solution via greedy gradient Max-Cut (GGMC) is derived which gradually assigns unlabeled vertices to each class with minimum connectivity. Once convergence guarantees are established, this greedy Max-Cut based SSL is applied on both artificial and standard benchmark data sets where it obtains superior classification accuracy compared to existing state-of-the-art SSL methods. Moreover, GGMC shows robustness with respect to the graph construction method and maintains high accuracy over extensive experiments with various edge linking and weighting schemes. Keywords: graph transduction, semi-supervised learning, bivariate formulation, mixed integer programming, greedy
6 0.074855 92 jmlr-2013-Random Spanning Trees and the Prediction of Weighted Graphs
7 0.069553405 29 jmlr-2013-Convex and Scalable Weakly Labeled SVMs
8 0.056112073 71 jmlr-2013-Message-Passing Algorithms for Quadratic Minimization
9 0.05183699 84 jmlr-2013-PC Algorithm for Nonparanormal Graphical Models
10 0.048611209 116 jmlr-2013-Truncated Power Method for Sparse Eigenvalue Problems
11 0.048088886 36 jmlr-2013-Distributions of Angles in Random Packing on Spheres
12 0.046492621 35 jmlr-2013-Distribution-Dependent Sample Complexity of Large Margin Learning
13 0.046106987 114 jmlr-2013-The Rate of Convergence of AdaBoost
14 0.044757541 54 jmlr-2013-JKernelMachines: A Simple Framework for Kernel Machines
15 0.041999754 109 jmlr-2013-Stress Functions for Nonlinear Dimension Reduction, Proximity Analysis, and Graph Drawing
16 0.040137663 6 jmlr-2013-A Plug-in Approach to Neyman-Pearson Classification
17 0.040104285 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso
19 0.039754175 41 jmlr-2013-Experiment Selection for Causal Discovery
20 0.039661285 19 jmlr-2013-BudgetedSVM: A Toolbox for Scalable SVM Approximations
topicId topicWeight
[(0, -0.193), (1, 0.08), (2, 0.014), (3, 0.034), (4, 0.157), (5, 0.131), (6, 0.047), (7, 0.025), (8, -0.013), (9, -0.143), (10, 0.049), (11, 0.019), (12, 0.006), (13, 0.119), (14, 0.105), (15, 0.254), (16, 0.059), (17, -0.065), (18, 0.024), (19, -0.27), (20, 0.042), (21, -0.097), (22, -0.038), (23, -0.043), (24, -0.038), (25, 0.069), (26, -0.001), (27, 0.077), (28, -0.098), (29, -0.085), (30, -0.064), (31, 0.034), (32, 0.064), (33, 0.035), (34, 0.155), (35, 0.082), (36, 0.177), (37, 0.052), (38, -0.116), (39, 0.043), (40, 0.059), (41, -0.066), (42, 0.064), (43, -0.014), (44, -0.034), (45, -0.01), (46, 0.045), (47, -0.026), (48, 0.119), (49, -0.076)]
simIndex simValue paperId paperTitle
2 0.5399313 93 jmlr-2013-Random Walk Kernels and Learning Curves for Gaussian Process Regression on Random Graphs
Author: Matthew J. Urry, Peter Sollich
Abstract: We consider learning on graphs, guided by kernels that encode similarity between vertices. Our focus is on random walk kernels, the analogues of squared exponential kernels in Euclidean spaces. We show that on large, locally treelike graphs these have some counter-intuitive properties, specifically in the limit of large kernel lengthscales. We consider using these kernels as covariance functions of Gaussian processes. In this situation one typically scales the prior globally to normalise the average of the prior variance across vertices. We demonstrate that, in contrast to the Euclidean case, this generically leads to significant variation in the prior variance across vertices, which is undesirable from a probabilistic modelling point of view. We suggest the random walk kernel should be normalised locally, so that each vertex has the same prior variance, and analyse the consequences of this by studying learning curves for Gaussian process regression. Numerical calculations as well as novel theoretical predictions for the learning curves using belief propagation show that one obtains distinctly different probabilistic models depending on the choice of normalisation. Our method for predicting the learning curves using belief propagation is significantly more accurate than previous approximations and should become exact in the limit of large random graphs. Keywords: Gaussian process, generalisation error, learning curve, cavity method, belief propagation, graph, random walk kernel
3 0.52924246 119 jmlr-2013-Variable Selection in High-Dimension with Random Designs and Orthogonal Matching Pursuit
Author: Antony Joseph
Abstract: The performance of orthogonal matching pursuit (OMP) for variable selection is analyzed for random designs. When contrasted with the deterministic case, since the performance is here measured after averaging over the distribution of the design matrix, one can have far less stringent sparsity constraints on the coefficient vector. We demonstrate that for exact sparse vectors, the performance of the OMP is similar to known results on the Lasso algorithm (Wainwright, 2009). Moreover, variable selection under a more relaxed sparsity assumption on the coefficient vector, whereby one has only control on the ℓ1 norm of the smaller coefficients, is also analyzed. As consequence of these results, we also show that the coefficient estimate satisfies strong oracle type inequalities. Keywords: high dimensional regression, greedy algorithms, Lasso, compressed sensing
4 0.50250584 92 jmlr-2013-Random Spanning Trees and the Prediction of Weighted Graphs
Author: Nicolò Cesa-Bianchi, Claudio Gentile, Fabio Vitale, Giovanni Zappella
Abstract: We investigate the problem of sequentially predicting the binary labels on the nodes of an arbitrary weighted graph. We show that, under a suitable parametrization of the problem, the optimal number of prediction mistakes can be characterized (up to logarithmic factors) by the cutsize of a random spanning tree of the graph. The cutsize is induced by the unknown adversarial labeling of the graph nodes. In deriving our characterization, we obtain a simple randomized algorithm achieving in expectation the optimal mistake bound on any polynomially connected weighted graph. Our algorithm draws a random spanning tree of the original graph and then predicts the nodes of this tree in constant expected amortized time and linear space. Experiments on real-world data sets show that our method compares well to both global (Perceptron) and local (label propagation) methods, while being generally faster in practice. Keywords: online learning, learning on graphs, graph prediction, random spanning trees
5 0.4318957 99 jmlr-2013-Semi-Supervised Learning Using Greedy Max-Cut
Author: Jun Wang, Tony Jebara, Shih-Fu Chang
Abstract: Graph-based semi-supervised learning (SSL) methods play an increasingly important role in practical machine learning systems, particularly in agnostic settings when no parametric information or other prior knowledge is available about the data distribution. Given the constructed graph represented by a weight matrix, transductive inference is used to propagate known labels to predict the values of all unlabeled vertices. Designing a robust label diffusion algorithm for such graphs is a widely studied problem and various methods have recently been suggested. Many of these can be formalized as regularized function estimation through the minimization of a quadratic cost. However, most existing label diffusion methods minimize a univariate cost with the classification function as the only variable of interest. Since the observed labels seed the diffusion process, such univariate frameworks are extremely sensitive to the initial label choice and any label noise. To alleviate the dependency on the initial observed labels, this article proposes a bivariate formulation for graph-based SSL, where both the binary label information and a continuous classification function are arguments of the optimization. This bivariate formulation is shown to be equivalent to a linearly constrained Max-Cut problem. Finally an efficient solution via greedy gradient Max-Cut (GGMC) is derived which gradually assigns unlabeled vertices to each class with minimum connectivity. Once convergence guarantees are established, this greedy Max-Cut based SSL is applied on both artificial and standard benchmark data sets where it obtains superior classification accuracy compared to existing state-of-the-art SSL methods. Moreover, GGMC shows robustness with respect to the graph construction method and maintains high accuracy over extensive experiments with various edge linking and weighting schemes. Keywords: graph transduction, semi-supervised learning, bivariate formulation, mixed integer programming, greedy
6 0.40642506 29 jmlr-2013-Convex and Scalable Weakly Labeled SVMs
7 0.38045698 109 jmlr-2013-Stress Functions for Nonlinear Dimension Reduction, Proximity Analysis, and Graph Drawing
8 0.34639165 111 jmlr-2013-Supervised Feature Selection in Graphs with Path Coding Penalties and Network Flows
9 0.32934132 71 jmlr-2013-Message-Passing Algorithms for Quadratic Minimization
10 0.30676746 19 jmlr-2013-BudgetedSVM: A Toolbox for Scalable SVM Approximations
11 0.29782432 36 jmlr-2013-Distributions of Angles in Random Packing on Spheres
12 0.29549232 110 jmlr-2013-Sub-Local Constraint-Based Learning of Bayesian Networks Using A Joint Dependence Criterion
13 0.28958911 41 jmlr-2013-Experiment Selection for Causal Discovery
14 0.28003904 54 jmlr-2013-JKernelMachines: A Simple Framework for Kernel Machines
15 0.26847792 6 jmlr-2013-A Plug-in Approach to Neyman-Pearson Classification
16 0.26588565 116 jmlr-2013-Truncated Power Method for Sparse Eigenvalue Problems
17 0.24917772 84 jmlr-2013-PC Algorithm for Nonparanormal Graphical Models
18 0.24682193 61 jmlr-2013-Learning Theory Analysis for Association Rules and Sequential Event Prediction
19 0.23127402 35 jmlr-2013-Distribution-Dependent Sample Complexity of Large Margin Learning
topicId topicWeight
[(0, 0.024), (5, 0.13), (6, 0.036), (10, 0.056), (14, 0.026), (20, 0.01), (23, 0.037), (29, 0.409), (68, 0.039), (70, 0.021), (75, 0.024), (85, 0.04), (87, 0.023), (89, 0.015)]
simIndex simValue paperId paperTitle
Author: Alberto N. Escalante-B., Laurenz Wiskott
Abstract: Supervised learning from high-dimensional data, for example, multimedia data, is a challenging task. We propose an extension of slow feature analysis (SFA) for supervised dimensionality reduction called graph-based SFA (GSFA). The algorithm extracts a label-predictive low-dimensional set of features that can be post-processed by typical supervised algorithms to generate the final label or class estimation. GSFA is trained with a so-called training graph, in which the vertices are the samples and the edges represent similarities of the corresponding labels. A new weighted SFA optimization problem is introduced, generalizing the notion of slowness from sequences of samples to such training graphs. We show that GSFA computes an optimal solution to this problem in the considered function space and propose several types of training graphs. For classification, the most straightforward graph yields features equivalent to those of (nonlinear) Fisher discriminant analysis. Emphasis is on regression, where four different graphs were evaluated experimentally with a subproblem of face detection on photographs. The method proposed is promising particularly when linear models are insufficient as well as when feature selection is difficult. Keywords: slow feature analysis, feature extraction, classification, regression, pattern recognition, training graphs, nonlinear dimensionality reduction, supervised learning, implicitly supervised, high-dimensional data, image analysis
3 0.36682138 25 jmlr-2013-Communication-Efficient Algorithms for Statistical Optimization
Author: Yuchen Zhang, John C. Duchi, Martin J. Wainwright
Abstract: We analyze two communication-efficient algorithms for distributed optimization in statistical settings involving large-scale data sets. The first algorithm is a standard averaging method that distributes the N data samples evenly to m machines, performs separate minimization on each subset, and then averages the estimates. We provide a sharp analysis of this average mixture algorithm, showing that under a reasonable set of conditions, the combined parameter achieves √ mean-squared error (MSE) that decays as O (N −1 + (N/m)−2 ). Whenever m ≤ N, this guarantee matches the best possible rate achievable by a centralized algorithm having access to all N samples. The second algorithm is a novel method, based on an appropriate form of bootstrap subsampling. Requiring only a single round of communication, it has mean-squared error that decays as O (N −1 + (N/m)−3 ), and so is more robust to the amount of parallelization. In addition, we show that a stochastic gradient-based method attains mean-squared error decaying as O (N −1 + (N/m)−3/2 ), easing computation at the expense of a potentially slower MSE rate. We also provide an experimental evaluation of our methods, investigating their performance both on simulated data and on a large-scale regression problem from the internet search domain. In particular, we show that our methods can be used to efficiently solve an advertisement prediction problem from the Chinese SoSo Search Engine, which involves logistic regression with N ≈ 2.4 × 108 samples and d ≈ 740,000 covariates. Keywords: distributed learning, stochastic optimization, averaging, subsampling
4 0.36347219 105 jmlr-2013-Sparsity Regret Bounds for Individual Sequences in Online Linear Regression
Author: Sébastien Gerchinovitz
Abstract: We consider the problem of online linear regression on arbitrary deterministic sequences when the ambient dimension d can be much larger than the number of time rounds T . We introduce the notion of sparsity regret bound, which is a deterministic online counterpart of recent risk bounds derived in the stochastic setting under a sparsity scenario. We prove such regret bounds for an online-learning algorithm called SeqSEW and based on exponential weighting and data-driven truncation. In a second part we apply a parameter-free version of this algorithm to the stochastic setting (regression model with random design). This yields risk bounds of the same flavor as in Dalalyan and Tsybakov (2012a) but which solve two questions left open therein. In particular our risk bounds are adaptive (up to a logarithmic factor) to the unknown variance of the noise if the latter is Gaussian. We also address the regression model with fixed design. Keywords: sparsity, online linear regression, individual sequences, adaptive regret bounds
5 0.3622604 4 jmlr-2013-A Max-Norm Constrained Minimization Approach to 1-Bit Matrix Completion
Author: Tony Cai, Wen-Xin Zhou
Abstract: We consider in this paper the problem of noisy 1-bit matrix completion under a general non-uniform sampling distribution using the max-norm as a convex relaxation for the rank. A max-norm constrained maximum likelihood estimate is introduced and studied. The rate of convergence for the estimate is obtained. Information-theoretical methods are used to establish a minimax lower bound under the general sampling model. The minimax upper and lower bounds together yield the optimal rate of convergence for the Frobenius norm loss. Computational algorithms and numerical performance are also discussed. Keywords: 1-bit matrix completion, low-rank matrix, max-norm, trace-norm, constrained optimization, maximum likelihood estimate, optimal rate of convergence
6 0.36061615 102 jmlr-2013-Sparse Matrix Inversion with Scaled Lasso
8 0.35766977 119 jmlr-2013-Variable Selection in High-Dimension with Random Designs and Orthogonal Matching Pursuit
9 0.35734066 10 jmlr-2013-Algorithms and Hardness Results for Parallel Large Margin Learning
11 0.35717192 17 jmlr-2013-Belief Propagation for Continuous State Spaces: Stochastic Message-Passing with Quantitative Guarantees
12 0.35685861 72 jmlr-2013-Multi-Stage Multi-Task Feature Learning
13 0.35587224 48 jmlr-2013-Generalized Spike-and-Slab Priors for Bayesian Group Feature Selection Using Expectation Propagation
14 0.35521942 39 jmlr-2013-Efficient Active Learning of Halfspaces: An Aggressive Approach
15 0.35377496 114 jmlr-2013-The Rate of Convergence of AdaBoost
16 0.35300371 50 jmlr-2013-Greedy Feature Selection for Subspace Clustering
17 0.35222179 28 jmlr-2013-Construction of Approximation Spaces for Reinforcement Learning
18 0.35194719 73 jmlr-2013-Multicategory Large-Margin Unified Machines
19 0.3517929 9 jmlr-2013-A Widely Applicable Bayesian Information Criterion
20 0.35047677 14 jmlr-2013-Asymptotic Results on Adaptive False Discovery Rate Controlling Procedures Based on Kernel Estimators