nips nips2013 nips2013-207 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: James L. Sharpnack, Akshay Krishnamurthy, Aarti Singh
Abstract: The detection of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. Because this test is computationally infeasible, we provide a relaxation, called the Lovász extended scan statistic (LESS), that uses submodularity to approximate the intractable generalized likelihood ratio. We demonstrate a connection between LESS and maximum a-posteriori inference in Markov random fields, which provides us with a poly-time algorithm for LESS. Using electrical network theory, we are able to control type 1 error for LESS and prove conditions under which LESS is risk consistent. Finally, we consider specific graph models, the torus, k-nearest neighbor graphs, and ǫ-random graphs. We show that on these graphs our results provide near-optimal performance by matching our results to known lower bounds.
Reference: text
sentIndex sentText sentNum sentScore
1 Near-optimal Anomaly Detection in Graphs using Lovász Extended Scan Statistic Akshay Krishnamurthy Computer Science Department Carnegie Mellon University Pittsburgh, PA 15213 akshaykr@cs. [sent-1, score-0.164]
2 edu Abstract The detection of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. [sent-6, score-0.765]
3 Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. [sent-7, score-0.405]
4 In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. [sent-8, score-0.463]
5 Because this test is computationally infeasible, we provide a relaxation, called the Lovász extended scan statistic (LESS), that uses submodularity to approximate the intractable generalized likelihood ratio. [sent-9, score-1.144]
6 Finally, we consider specific graph models, the torus, k-nearest neighbor graphs, and ǫ-random graphs. [sent-12, score-0.205]
7 We show that on these graphs our results provide near-optimal performance by matching our results to known lower bounds. [sent-13, score-0.162]
8 1 Introduction Detecting anomalous activity refers to determining if we are observing merely noise (business as usual) or if there is some signal in the noise (anomalous activity). [sent-14, score-0.299]
9 Classically, anomaly detection focused on identifying rare behaviors and aberrant bursts in activity over a single data source or channel. [sent-15, score-0.323]
10 With this in mind, statistics needs to comprehensively address the detection of anomalous activity in graphs. [sent-17, score-0.404]
11 In this paper, we will study the detection of elevated activity in a graph with Gaussian noise. [sent-18, score-0.528]
12 By exploiting the sensor network structure (based on proximity), one can detect activity in networks when the activity is very faint. [sent-21, score-0.34]
13 Recent theoretical contributions in the statistical literature [1, 2] have detailed the inherent difficulty of such a testing problem but provide positive results only under restrictive conditions on the graph topology. [sent-22, score-0.277]
14 By combining knowledge from high-dimensional statistics, graph theory and mathematical programming, the characterization of detection algorithms over any graph topology by their statistical properties is possible. [sent-23, score-0.549]
15 Due to the combinatorial nature of graph based methods, problems can easily shift from having polynomial-time algorithms to having running times exponential in the size of the graph. [sent-25, score-0.273]
16 The applications of graph structured inference require that any method be scalable to large graphs. [sent-26, score-0.205]
17 1.1 Problem Setup Consider a connected, possibly weighted, directed graph G defined by a set of vertices V (|V| = p) and directed edges E (|E| = m), which are ordered pairs of vertices. [sent-29, score-0.293]
18 In other words, the set of activated vertices C has a small cut size in the graph G. [sent-37, score-0.329]
19 If such a test exists then H0 and H1 are asymptotically distinguishable, otherwise they are asymptotically indistinguishable (which occurs whenever the risk does not tend to 0). [sent-50, score-0.195]
20 We will be characterizing regimes for µ in which our test asymptotically distinguishes H0 from H1 . [sent-51, score-0.162]
21 1.2 Contributions Section 3 highlights what is known about the hypothesis testing problem (2); in particular, we provide a regime for µ in which H0 and H1 are asymptotically indistinguishable. [sent-62, score-0.188]
22 In Section 4.1, we derive the graph scan statistic from the generalized likelihood ratio principle, which we show to be a computationally intractable procedure. [sent-64, score-1.105]
23 In Section 4.2, we provide a relaxation of the graph scan statistic (GSS), the Lovász extended scan statistic (LESS), and we show that it can be computed with successive minimum s − t cut programs (a graph cut that separates a source vertex from a sink vertex). [sent-66, score-2.351]
24 In Section 6, we show that GSS and LESS can asymptotically distinguish H0 and H1 at signal-to-noise ratios close to the lowest possible for some important graph models. [sent-68, score-0.321]
25 We will return to MRFs in a later section, as they relate to our scan statistic. [sent-73, score-0.46]
26 The study of kernels over graphs began with the development of diffusion kernels [6], and was extended through Green’s functions on graphs [7]. [sent-75, score-0.438]
27 To the best of our knowledge, this paper is the first connection made between anomaly detection and MRFs. [sent-77, score-0.2]
28 Only recently have combinatorial structures such as graphs been proposed as the underlying structure of H1 . [sent-81, score-0.23]
29 In spatial statistics, it is common, when searching for anomalous activity, to scan over regions in the spatial domain, testing for elevated activity [12, 13]. [sent-84, score-0.858]
30 There have been scan statistics proposed for graphs, most notably the work of [14] in which the authors scan over neighborhoods of the graphs defined by the graph distance. [sent-85, score-1.287]
31 Other work has been done on the theory and algorithms for scan statistics over specific graph models, but these results are not easily generalizable to arbitrary graphs [15, 1]. [sent-86, score-0.827]
32 More recently, it has been found that scanning over all well connected regions of a graph can be computationally intractable, and so approximations to the intractable likelihood-based procedure have been studied [16, 17]. [sent-87, score-0.3]
33 We follow in this line of work, with a relaxation to the intractable generalized likelihood ratio test. [sent-88, score-0.244]
34 H0 and H1 in (2) are asymptotically indistinguishable if µ = o(min{√(ρ d_max log(p d_max²/ρ²)), √p}), where d_max is the maximum degree of graph G. [sent-93, score-0.424]
35 Now that a regime of asymptotic indistinguishability has been established, it is instructive to consider test statistics that do not take the graph into account (viz. [sent-94, score-0.305]
36 the statistics are unaffected by a change in the graph structure). [sent-95, score-0.205]
37 (2) The sum test statistic, Σ_{v∈[p]} y_v, asymptotically distinguishes H0 from H1 if µ = ω(p/|C|). [sent-100, score-0.232]
38 As opposed to these naive tests, one can scan over all clusters in C, performing individual likelihood ratio tests. [sent-101, score-0.549]
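As a concrete illustration, the following Python sketch simulates the sum statistic above alongside the max statistic (another common graph-agnostic detector, included here for illustration) under both hypotheses; neither uses the graph, which is exactly their weakness. The sizes, µ, and the cluster C are arbitrary toy choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
p, mu, C = 225, 3.0, list(range(15))   # toy problem: p vertices, activated cluster C

y_null = rng.standard_normal(p)        # H0: pure standard Gaussian noise
y_alt = rng.standard_normal(p)
y_alt[C] += mu                         # H1: mean elevated by mu on C

max_stat = lambda y: np.max(y)         # concentrates near sqrt(2 log p) under H0
sum_stat = lambda y: np.sum(y)         # powerful only when mu = omega(p/|C|)

for name, stat in [("max", max_stat), ("sum", sum_stat)]:
    print(name, round(stat(y_null), 2), round(stat(y_alt), 2))
```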
39 This is called the scan statistic, and it is known to be a computationally intractable combinatorial optimization. [sent-102, score-0.623]
40 Previously, two alternatives to the scan statistic have been developed: the spectral scan statistic [16], and one based on the uniform spanning tree wavelet basis [17]. [sent-103, score-1.583]
41 The former is indeed a relaxation of the ideal, computationally intractable, scan statistic, but in many important graph topologies, such as the lattice, provides sub-optimal statistical performance. [sent-104, score-0.762]
42 The uniform spanning tree wavelets in effect allow one to scan over a subclass of the class, C, but tend to provide worse performance (as we will see in section 6) than that presented in this work. [sent-105, score-0.639]
43 Because the alternative is indexed by sets, C ∈ C(ρ), with a low cut size, it is reasonable that the test statistic that we will derive results from a combinatorial optimization program. [sent-108, score-0.413]
44 In fact, we will show we can express the generalized likelihood ratio (GLR) statistic in terms of a modular program with submodular constraints. [sent-109, score-0.524]
45 With this in mind, we provide a convex relaxation, using the Lovász extension, to the ideal GLR statistic. [sent-111, score-0.164]
46 Before we proceed, we will introduce the reader to submodularity and the Lovász extension. [sent-115, score-0.218]
47 For any set, which we may as well take to be the vertex set [p], we say that a function F : {0, 1}^p → R is submodular if for any A, B ⊆ [p], F(A) + F(B) ≥ F(A ∩ B) + F(A ∪ B). [sent-117, score-0.169]
48 In this way, a submodular function experiences diminishing returns, as additions to large sets tend to be less dramatic than additions to small sets. [sent-119, score-0.231]
49 Moreover, for every submodular function there is a Lovász extension f : [0, 1]^p → R defined in the following way: for x ∈ [0, 1]^p, let x_{j_i} denote the i-th largest element of x; then f(x) = x_{j_1} F({j_1}) + Σ_{i=2}^p x_{j_i} (F({j_1, ..., j_i}) − F({j_1, ..., j_{i−1}})). [sent-121, score-0.307]
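To make the definition concrete, here is a short Python sketch (the function names are ours, not the paper's) that evaluates the Lovász extension of a set function supplied as a callable with F(∅) = 0 assumed; sorting x in decreasing order and telescoping the marginal gains implements the formula above.

```python
import numpy as np

def lovasz_extension(F, x):
    # Evaluate f(x) for a set function F with F(empty set) assumed to be 0:
    # sort x decreasingly and accumulate x_{j_i} * (F(S_i) - F(S_{i-1})).
    x = np.asarray(x, dtype=float)
    order = np.argsort(-x)             # j_1, ..., j_p: indices of decreasing x
    total, prev, S = 0.0, 0.0, set()
    for j in order:
        S.add(int(j))
        gain = F(frozenset(S)) - prev  # marginal gain F(S_i) - F(S_{i-1})
        total += x[j] * gain
        prev += gain
    return total

# The cut function of a graph is submodular; a tiny triangle as an example.
edges = [(0, 1), (1, 2), (2, 0)]
cut = lambda S: float(sum((u in S) != (v in S) for u, v in edges))
print(lovasz_extension(cut, [0.9, 0.5, 0.1]))  # equals the total variation, 1.6
```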
50 The following facts about Lovász extensions will be important. [sent-128, score-0.164]
51 [19] Let F be submodular and f be its Lovász extension. [sent-130, score-0.267]
52 In this situation, we would employ the likelihood ratio test because by the Neyman-Pearson lemma it is the uniformly most powerful test statistic. [sent-135, score-0.179]
53 We will work with the graph scan statistic (GSS), ŝ = max 1_C^⊤ y / √|C| s.t. C ∈ C(ρ). [sent-141, score-0.921]
54 In fact, it belongs to a class of programs, specifically modular programs with submodular constraints, that is known to contain NP-hard instantiations, such as the ratio cut program and the knapsack program [18]. [sent-146, score-0.472]
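On a tiny graph the GSS can still be computed exactly by enumeration, which makes the definition concrete even though it is useless at scale; this sketch drops the connectivity requirement on C and uses unit edge weights, both simplifying assumptions of ours.

```python
import numpy as np
from itertools import combinations

def gss_brute_force(y, edges, rho):
    # Exhaustive graph scan statistic: maximize 1_C^T y / sqrt(|C|) over all
    # subsets C whose cut (number of edges leaving C) is at most rho.
    p, best = len(y), -np.inf
    for size in range(1, p + 1):
        for C in combinations(range(p), size):
            S = set(C)
            cut = sum((u in S) != (v in S) for u, v in edges)
            if cut <= rho:
                best = max(best, sum(y[v] for v in C) / np.sqrt(size))
    return best

y = np.array([1.2, 0.8, -0.3, 0.1])
print(gss_brute_force(y, [(0, 1), (1, 2), (2, 3), (3, 0)], rho=2))  # 4-cycle
```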
55 4.2 Lovász Extended Scan Statistic It is common, when faced with combinatorial optimization programs that are computationally infeasible, to relax the domain from the discrete {0, 1}^p to a continuous domain, such as [0, 1]^p.
56 Generally, the hope is that optimizing the relaxation will approximate the combinatorial program well. [sent-150, score-0.209]
57 This will be accomplished by replacing it with its Lovász extension, ‖(∇x)_+‖_1 ≤ ρ. [sent-152, score-0.164]
58 We then form the relaxed program, which we will call the Lovász extended scan statistic (LESS), l̂ = max_t max_x x^⊤ y / √t s.t. x ∈ [0, 1]^p, 1^⊤ x ≤ t, ‖(∇x)_+‖_1 ≤ ρ. [sent-153, score-0.941]
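For exposition, the relaxed program can be checked on small graphs with a generic LP solver by introducing one auxiliary variable z_e ≥ (x_u − x_v)_+ per directed edge; the constraint set below follows our reconstruction of the program above, and the paper's actual solver (successive s-t cuts) is far more efficient than this brute sketch.

```python
import numpy as np
from scipy.optimize import linprog

def less_by_lp(y, edges, weights, rho):
    # For each t, maximize x^T y / sqrt(t) over x in [0,1]^p with 1^T x <= t
    # and sum_e w_e (x_u - x_v)_+ <= rho, via edge slacks z_e >= x_u - x_v.
    # edges are directed pairs, matching the problem setup above.
    p, m = len(y), len(edges)
    best = 0.0
    for t in range(1, p + 1):
        c = np.concatenate([-np.asarray(y, float), np.zeros(m)])  # minimize -x^T y
        A_ub, b_ub = [], []
        row = np.zeros(p + m); row[:p] = 1.0
        A_ub.append(row); b_ub.append(t)                          # 1^T x <= t
        for j, (u, v) in enumerate(edges):
            row = np.zeros(p + m)
            row[u], row[v], row[p + j] = 1.0, -1.0, -1.0          # x_u - x_v <= z_j
            A_ub.append(row); b_ub.append(0.0)
        row = np.zeros(p + m); row[p:] = weights
        A_ub.append(row); b_ub.append(rho)                        # sum w_e z_e <= rho
        res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                      bounds=[(0, 1)] * p + [(0, None)] * m)
        if res.success:
            best = max(best, -res.fun / np.sqrt(t))
    return best
```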
59 Much of the previous work on graph structured statistical procedures assumes a Markov random field (MRF) model, in which there are discrete labels assigned to each vertex in [p], and the observed variables {y_v}_{v∈[p]} are conditionally independent given these labels. [sent-158, score-0.271]
60 Such programs are called graph-representable in [20], and are known to be solvable in the binary case with s-t graph cuts. [sent-163, score-0.265]
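The binary construction is compact enough to sketch with networkx (naming is ours; the energy here is unary costs plus nonnegative disagreement penalties, the standard submodular case): terminal capacities encode the unary terms, inner edges encode the pairwise terms, and a minimum s-t cut gives the MAP labeling.

```python
import networkx as nx

def mrf_map_via_mincut(cost0, cost1, pair_w):
    # MAP for a binary MRF: minimize sum_v cost_{x_v}(v) + sum_{uv} w_uv [x_u != x_v],
    # all costs nonnegative. x_v = 1 iff v ends on the source side of the min cut.
    G = nx.DiGraph()
    s, t = "s", "t"
    for v in cost0:
        G.add_edge(s, v, capacity=cost0[v])  # cut iff v lands on sink side (x_v = 0)
        G.add_edge(v, t, capacity=cost1[v])  # cut iff v lands on source side (x_v = 1)
    for (u, v), w in pair_w.items():
        G.add_edge(u, v, capacity=w)         # cut iff u, v get different labels
        G.add_edge(v, u, capacity=w)
    cut_value, (source_side, _) = nx.minimum_cut(G, s, t)
    return {v: int(v in source_side) for v in cost0}

# Node 0 prefers label 0, node 1 prefers label 1; a weak tie lets them disagree.
print(mrf_map_via_mincut({0: 0.0, 1: 2.0}, {0: 2.0, 1: 0.0}, {(0, 1): 0.5}))
```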
61 5 Theoretical Analysis So far we have developed a lower bound to the hypothesis testing problem, shown that some common detectors do not meet this guarantee, and developed the Lovász extended scan statistic from first principles. [sent-169, score-1.0]
62 Previously, electrical network theory, specifically the effective resistances of edges in the graph, has been useful in describing the theoretical performance of a detector derived from uniform spanning tree wavelets [17]. [sent-171, score-0.625]
63 As it turns out, the performance of LESS is also dictated by the effective resistances of edges in the graph. [sent-172, score-0.356]
64 Effective resistances have been extensively studied in electrical network theory [23]. [sent-174, score-0.289]
65 The effective resistance between a source v ∈ V and a sink w ∈ V is the potential difference required to create a unit flow between them. [sent-182, score-0.252]
66 Hence, the effective resistance between v and w is r_{v,w} = (δ_v − δ_w)^⊤ ∆^† (δ_v − δ_w), where δ_v is the Dirac delta at v and ∆^† denotes the pseudoinverse of the graph Laplacian ∆. [sent-183, score-0.22]
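With ∆ the combinatorial graph Laplacian and ∆^† its Moore-Penrose pseudoinverse, this is a few lines of numpy; the dense pseudoinverse below is a small-p sketch, not how one would compute resistances at scale.

```python
import numpy as np

def effective_resistance(W, v, w):
    # r_{v,w} = (delta_v - delta_w)^T pinv(Laplacian) (delta_v - delta_w),
    # for a symmetric nonnegative weight matrix W of a connected graph.
    L = np.diag(W.sum(axis=1)) - W
    L_pinv = np.linalg.pinv(L)
    d = np.zeros(W.shape[0])
    d[v], d[w] = 1.0, -1.0
    return float(d @ L_pinv @ d)
```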
67 There is a close connection between effective resistances and random spanning trees. [sent-184, score-0.393]
68 The uniform spanning tree (UST) is a random spanning tree, chosen uniformly at random from the set of all distinct spanning trees. [sent-185, score-0.352]
69 The foundational Matrix-Tree theorem [24, 23] states that the probability of an edge, e, being included in the UST is equal to the edge weight times the effective resistance, W_e r_e. [sent-186, score-0.299]
70 The UST is an essential component of the proof of our main theorem, in that it provides a mechanism for unravelling the graph while still preserving the connectivity of the graph. [sent-187, score-0.205]
71 Let r_C = max{ Σ_{(u,v)∈E : u∈C, v∉C} W_{u,v} r_{(u,v)} : C ∈ C } be the maximum effective resistance of the boundary of a cluster C. [sent-190, score-0.247]
72 The graph scan statistic, with probability at least 1 − α, is smaller than ŝ ≤ √((r_C + 1) log p / 2) + √(2 log(p − 1) + 2 log 2 + 2 log(1/α)). (5)
73 2. Both the GSS and the LESS asymptotically distinguish H0 from H1 if µ/σ = ω(√(max{r_C log p, log p})). To summarize, we have established that the performance of the GSS and the LESS is dictated by the effective resistances of cuts in the graph. [sent-197, score-0.565]
74 Though Corollary 6 may seem mysterious, the guarantee in fact nearly matches the lower bound for many graph models, as we now show. [sent-199, score-0.205]
75 6 Specific Graph Models Theorem 5 shows that the effective resistance of the boundary plays a critical role in characterizing the distinguishability region of both the GSS and LESS. [sent-200, score-0.293]
76 On specific graph families, we can compute the effective resistances precisely, leading to concrete detection guarantees that, as we will see, nearly match the lower bound in many cases. [sent-201, score-0.632]
77 Foster’s theorem lends evidence to the fact that the effective resistances should be much smaller than the cut size: Theorem 7. [sent-205, score-0.391]
78 (Foster’s Theorem [25, 26]) Σ_{e∈E} r_e = p − 1. Roughly speaking, the effective resistance of an edge selected uniformly at random is ≈ (p − 1)/m = d^{−1}, so the effective resistance of a cut is ≈ ρ/d_ave. [sent-206, score-0.558]
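Foster's theorem is easy to sanity-check numerically; reusing effective_resistance from the sketch above, summing w_e r_e over the edges of a random weighted graph returns p − 1 up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 8
W = rng.random((p, p)); W = np.triu(W, 1); W = W + W.T  # random weighted complete graph

total = sum(W[u, v] * effective_resistance(W, u, v)
            for u in range(p) for v in range(u + 1, p))
print(total, p - 1)  # Foster: the sum over edges of w_e * r_e equals p - 1
```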
79 6.1 Edge Transitive Graphs An edge transitive graph, G, is one for which there is a graph automorphism mapping e_0 to e_1 for any pair of edges e_0, e_1. [sent-209, score-0.357]
80 Examples include the l-dimensional torus, the cycle, and the complete graph Kp . [sent-210, score-0.205]
81 The existence of these automorphisms implies that every edge has the same effective resistance, and by Foster’s theorem, we know that these resistances are exactly (p − 1)/m. [sent-211, score-0.335]
82 Moreover, since edge transitive graphs must be d-regular, we know that m = Θ(pd) so that r_e = Θ(1/d). [sent-212, score-0.279]
83 Thus as a corollary to Theorem 5 we have that both the GSS and LESS are near-optimal (optimal modulo logarithmic factors whenever ρ/d ≤ √p) on edge transitive graphs: Corollary 8. [sent-213, score-0.182]
84 Let G be an edge-transitive graph with common degree d. [sent-214, score-0.235]
85 The two most popular such graphs are symmetric k-nearest neighbor graphs and ǫ-graphs. [sent-218, score-0.324]
86 To form a k-nearest neighbor graph G_k, we associate each vertex i with a point z_i and we connect vertices i, j if z_i is amongst the k-nearest neighbors, in ℓ2, of z_j, or vice versa. [sent-228, score-0.324]
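A minimal construction of such a symmetric k-NN graph (a sketch with illustrative names; it assumes distinct points, so each point's nearest hit in the query is itself):

```python
import numpy as np
from scipy.spatial import cKDTree

def symmetric_knn_graph(Z, k):
    # Connect i and j if z_i is among the k nearest neighbors (in l2) of z_j,
    # or vice versa; returns a dense 0/1 adjacency matrix.
    n = len(Z)
    _, idx = cKDTree(Z).query(Z, k=k + 1)  # first column is the point itself
    A = np.zeros((n, n), dtype=int)
    for i in range(n):
        for j in idx[i, 1:]:
            A[i, j] = A[j, i] = 1          # symmetrize: "or vice versa"
    return A
```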
87 The graphs used are the square 2D Torus, kNN graph (k ≈ p^{1/4}), and ǫ-graph (with ǫ ≈ p^{−1/3}); with µ = 4, 4, 3 respectively, p = 225, and |C| ≈ p^{1/2}. [sent-231, score-0.367]
88 Let G_k be a k-NN graph with k/p → 0, k(k/p)^{2/D} → ∞, and suppose the density f meets the regularity conditions in [27]. [sent-233, score-0.24]
89 Then both the GSS and LESS distinguish H0 from H1 provided that µ = ω(√(max{(ρ/k) log p, log p})). If G_ǫ is an ǫ-graph with ǫ → 0 and nǫ^{D+2} → ∞, then both distinguish H0 from H1 provided that µ = ω(√(max{(ρ/(pǫ^D)) log p, log p})). The corollary follows immediately from Corollary 6 and the proofs in [17]. [sent-234, score-0.334]
90 Since under the regularity conditions the maximum degree is Θ(k) and Θ(pǫ^D) in k-NN and ǫ-graphs respectively, the corollary establishes the near optimality (again provided that ρ/d ≤ √p) of both test statistics. [sent-235, score-0.175]
91 Each experiment uses graphs with 225 vertices, and we report the true positive rate versus the false positive rate as the threshold varies (also known as the ROC). [sent-238, score-0.197]
92 For each graph model, LESS provides gains over the spectral scan statistic [16] and the UST wavelet detector [17]; the gains are significant except on the ǫ-graph, where they are more modest. [sent-239, score-0.728]
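A compact version of this experimental loop (sklearn's roc_curve is assumed; any of the statistics sketched above can be plugged in for stat, and the sizes are illustrative):

```python
import numpy as np
from sklearn.metrics import roc_curve

def roc_for_statistic(stat, p, C, mu, n_trials=200, seed=0):
    # Simulate y under H0 and H1, score with stat, and trace the ROC curve.
    rng = np.random.default_rng(seed)
    scores, labels = [], []
    for _ in range(n_trials):
        y0 = rng.standard_normal(p)                     # H0: noise only
        y1 = rng.standard_normal(p); y1[list(C)] += mu  # H1: elevated mean on C
        scores += [stat(y0), stat(y1)]
        labels += [0, 1]
    return roc_curve(labels, scores)  # (fpr, tpr, thresholds)
```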
93 7 Conclusions To summarize, while Corollary 6 characterizes the performance of GSS and LESS in terms of effective resistances, in many specific graph models, this can be translated into near-optimal detection guarantees for these test statistics. [sent-240, score-0.471]
94 We have demonstrated that the LESS provides guarantees similar to those of the computationally intractable generalized likelihood ratio test (GSS). [sent-241, score-0.256]
95 Furthermore, the LESS can be solved through successive graph cuts by relating it to MAP estimation in an MRF. [sent-242, score-0.253]
96 Diffusion kernels on graphs and other discrete input spaces. [sent-288, score-0.202]
97 A unified analytic framework based on minimum scan statistics for wireless ad hoc and sensor networks. [sent-335, score-0.518]
98 Changepoint detection over graphs with the spectral scan statistic. [sent-341, score-0.761]
99 Detecting activations over graphs using spanning tree wavelet bases. [sent-345, score-0.367]
100 What energy functions can be minimized via graph cuts? [sent-356, score-0.205]
wordName wordTfidf (topN-words)
[('scan', 0.46), ('gss', 0.367), ('statistic', 0.229), ('resistances', 0.206), ('graph', 0.205), ('lov', 0.199), ('sz', 0.164), ('graphs', 0.162), ('anomalous', 0.142), ('detection', 0.139), ('resistance', 0.138), ('activity', 0.123), ('spanning', 0.105), ('submodular', 0.103), ('ust', 0.101), ('rc', 0.093), ('glr', 0.092), ('effective', 0.082), ('program', 0.076), ('asymptotically', 0.075), ('testing', 0.072), ('cut', 0.071), ('yv', 0.07), ('transitive', 0.07), ('sharpnack', 0.069), ('torus', 0.069), ('combinatorial', 0.068), ('vertex', 0.066), ('relaxation', 0.065), ('corollary', 0.065), ('intractable', 0.063), ('wavelet', 0.063), ('anomaly', 0.061), ('elevated', 0.061), ('programs', 0.06), ('mrf', 0.059), ('sensor', 0.058), ('surveillance', 0.056), ('submodularity', 0.054), ('less', 0.054), ('ratio', 0.053), ('vertices', 0.053), ('aarti', 0.053), ('cuts', 0.048), ('edge', 0.047), ('electrical', 0.047), ('distinguishability', 0.046), ('kirchoff', 0.046), ('test', 0.045), ('activation', 0.044), ('distinguishes', 0.042), ('hypothesis', 0.041), ('der', 0.041), ('distinguish', 0.041), ('foster', 0.041), ('xji', 0.04), ('outbreak', 0.04), ('dual', 0.04), ('kernels', 0.04), ('detector', 0.04), ('log', 0.04), ('pittsburgh', 0.039), ('additions', 0.037), ('dmax', 0.037), ('potts', 0.037), ('akshay', 0.037), ('krishnamurthy', 0.037), ('wavelets', 0.037), ('arxiv', 0.037), ('tree', 0.037), ('network', 0.036), ('hypotheses', 0.036), ('likelihood', 0.036), ('false', 0.035), ('regularity', 0.035), ('edges', 0.035), ('markov', 0.035), ('extended', 0.034), ('null', 0.034), ('signal', 0.034), ('dictated', 0.033), ('knapsack', 0.033), ('alarm', 0.032), ('sink', 0.032), ('lv', 0.032), ('theorem', 0.032), ('computationally', 0.032), ('detecting', 0.032), ('derives', 0.031), ('snr', 0.031), ('mrfs', 0.031), ('carnegie', 0.03), ('mellon', 0.03), ('degree', 0.03), ('preprint', 0.029), ('instructive', 0.028), ('max', 0.027), ('generalized', 0.027), ('vs', 0.027), ('boundary', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999899 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
Author: James L. Sharpnack, Akshay Krishnamurthy, Aarti Singh
Abstract: The detection of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. Because this test is computationally infeasible, we provide a relaxation, called the Lovász extended scan statistic (LESS), that uses submodularity to approximate the intractable generalized likelihood ratio. We demonstrate a connection between LESS and maximum a-posteriori inference in Markov random fields, which provides us with a poly-time algorithm for LESS. Using electrical network theory, we are able to control type 1 error for LESS and prove conditions under which LESS is risk consistent. Finally, we consider specific graph models, the torus, k-nearest neighbor graphs, and ǫ-random graphs. We show that on these graphs our results provide near-optimal performance by matching our results to known lower bounds.
2 0.14235091 268 nips-2013-Reflection methods for user-friendly submodular optimization
Author: Stefanie Jegelka, Francis Bach, Suvrit Sra
Abstract: Recently, it has become evident that submodularity naturally captures widely occurring concepts in machine learning, signal processing and computer vision. Consequently, there is need for efficient optimization procedures for submodular functions, especially for minimization problems. While general submodular minimization is challenging, we propose a new method that exploits existing decomposability of submodular functions. In contrast to previous approaches, our method is neither approximate, nor impractical, nor does it need any cumbersome parameter tuning. Moreover, it is easy to implement and parallelize. A key component of our method is a formulation of the discrete submodular minimization problem as a continuous best approximation problem that is solved through a sequence of reflections, and its solution can be easily thresholded to obtain an optimal discrete solution. This method solves both the continuous and discrete formulations of the problem, and therefore has applications in learning, inference, and reconstruction. In our experiments, we illustrate the benefits of our method on two image segmentation tasks. 1
3 0.12292311 78 nips-2013-Curvature and Optimal Algorithms for Learning and Minimizing Submodular Functions
Author: Rishabh K. Iyer, Stefanie Jegelka, Jeff A. Bilmes
Abstract: We investigate three related and important problems connected to machine learning: approximating a submodular function everywhere, learning a submodular function (in a PAC-like setting [28]), and constrained minimization of submodular functions. We show that the complexity of all three problems depends on the “curvature” of the submodular function, and provide lower and upper bounds that refine and improve previous results [2, 6, 8, 27]. Our proof techniques are fairly generic. We either use a black-box transformation of the function (for approximation and learning), or a transformation of algorithms to use an appropriate surrogate function (for minimization). Curiously, curvature has been known to influence approximations for submodular maximization [3, 29], but its effect on minimization, approximation and learning has hitherto been open. We complete this picture, and also support our theoretical claims by empirical results. 1
4 0.12269153 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
Author: Marcelo Fiori, Pablo Sprechmann, Joshua Vogelstein, Pablo Muse, Guillermo Sapiro
Abstract: Graph matching is a challenging problem with very important applications in a wide range of fields, from image and video analysis to biological and biomedical problems. We propose a robust graph matching algorithm inspired in sparsityrelated techniques. We cast the problem, resembling group or collaborative sparsity formulations, as a non-smooth convex optimization problem that can be efficiently solved using augmented Lagrangian techniques. The method can deal with weighted or unweighted graphs, as well as multimodal data, where different graphs represent different types of data. The proposed approach is also naturally integrated with collaborative graph inference techniques, solving general network inference problems where the observed variables, possibly coming from different modalities, are not in correspondence. The algorithm is tested and compared with state-of-the-art graph matching techniques in both synthetic and real graphs. We also present results on multimodal graphs and applications to collaborative inference of brain connectivity from alignment-free functional magnetic resonance imaging (fMRI) data. The code is publicly available. 1
5 0.1081522 319 nips-2013-Submodular Optimization with Submodular Cover and Submodular Knapsack Constraints
Author: Rishabh K. Iyer, Jeff A. Bilmes
Abstract: We investigate two new optimization problems — minimizing a submodular function subject to a submodular lower bound constraint (submodular cover) and maximizing a submodular function subject to a submodular upper bound constraint (submodular knapsack). We are motivated by a number of real-world applications in machine learning including sensor placement and data subset selection, which require maximizing a certain submodular function (like coverage or diversity) while simultaneously minimizing another (like cooperative cost). These problems are often posed as minimizing the difference between submodular functions [9, 25] which is in the worst case inapproximable. We show, however, that by phrasing these problems as constrained optimization, which is more natural for many applications, we achieve a number of bounded approximation guarantees. We also show that both these problems are closely related and an approximation algorithm solving one can be used to obtain an approximation guarantee for the other. We provide hardness results for both problems thus showing that our approximation factors are tight up to log-factors. Finally, we empirically demonstrate the performance and good scalability properties of our algorithms. 1
6 0.1005016 350 nips-2013-Wavelets on Graphs via Deep Learning
7 0.096890427 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction
8 0.095036201 44 nips-2013-B-test: A Non-parametric, Low Variance Kernel Two-sample Test
9 0.094170243 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning
10 0.090636469 68 nips-2013-Confidence Intervals and Hypothesis Testing for High-Dimensional Statistical Models
11 0.08894904 87 nips-2013-Density estimation from unweighted k-nearest neighbor graphs: a roadmap
12 0.086703397 258 nips-2013-Projecting Ising Model Parameters for Fast Mixing
13 0.085516825 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation
14 0.084740035 289 nips-2013-Scalable kernels for graphs with continuous attributes
15 0.081625983 306 nips-2013-Speeding up Permutation Testing in Neuroimaging
16 0.076702684 125 nips-2013-From Bandits to Experts: A Tale of Domination and Independence
17 0.072765365 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields
18 0.072081923 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data
19 0.069789022 9 nips-2013-A Kernel Test for Three-Variable Interactions
20 0.068312712 184 nips-2013-Marginals-to-Models Reducibility
topicId topicWeight
[(0, 0.192), (1, 0.057), (2, 0.027), (3, 0.02), (4, 0.036), (5, 0.109), (6, -0.064), (7, -0.206), (8, -0.01), (9, -0.025), (10, 0.042), (11, -0.091), (12, 0.087), (13, -0.03), (14, 0.029), (15, 0.017), (16, 0.001), (17, -0.014), (18, -0.106), (19, 0.042), (20, -0.005), (21, -0.009), (22, 0.03), (23, -0.055), (24, -0.045), (25, 0.014), (26, 0.058), (27, 0.015), (28, -0.059), (29, 0.111), (30, 0.044), (31, -0.028), (32, 0.053), (33, 0.045), (34, 0.024), (35, -0.008), (36, 0.03), (37, -0.072), (38, -0.052), (39, 0.028), (40, 0.039), (41, 0.038), (42, 0.025), (43, 0.019), (44, -0.021), (45, -0.051), (46, -0.047), (47, -0.032), (48, 0.023), (49, 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.9426055 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
Author: James L. Sharpnack, Akshay Krishnamurthy, Aarti Singh
Abstract: The detection of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. Because this test is computationally infeasible, we provide a relaxation, called the Lovász extended scan statistic (LESS), that uses submodularity to approximate the intractable generalized likelihood ratio. We demonstrate a connection between LESS and maximum a-posteriori inference in Markov random fields, which provides us with a poly-time algorithm for LESS. Using electrical network theory, we are able to control type 1 error for LESS and prove conditions under which LESS is risk consistent. Finally, we consider specific graph models, the torus, k-nearest neighbor graphs, and ǫ-random graphs. We show that on these graphs our results provide near-optimal performance by matching our results to known lower bounds.
2 0.72304243 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
Author: Marcelo Fiori, Pablo Sprechmann, Joshua Vogelstein, Pablo Muse, Guillermo Sapiro
Abstract: Graph matching is a challenging problem with very important applications in a wide range of fields, from image and video analysis to biological and biomedical problems. We propose a robust graph matching algorithm inspired in sparsityrelated techniques. We cast the problem, resembling group or collaborative sparsity formulations, as a non-smooth convex optimization problem that can be efficiently solved using augmented Lagrangian techniques. The method can deal with weighted or unweighted graphs, as well as multimodal data, where different graphs represent different types of data. The proposed approach is also naturally integrated with collaborative graph inference techniques, solving general network inference problems where the observed variables, possibly coming from different modalities, are not in correspondence. The algorithm is tested and compared with state-of-the-art graph matching techniques in both synthetic and real graphs. We also present results on multimodal graphs and applications to collaborative inference of brain connectivity from alignment-free functional magnetic resonance imaging (fMRI) data. The code is publicly available. 1
3 0.7217558 328 nips-2013-The Total Variation on Hypergraphs - Learning on Hypergraphs Revisited
Author: Matthias Hein, Simon Setzer, Leonardo Jost, Syama Sundar Rangapuram
Abstract: Hypergraphs allow one to encode higher-order relationships in data and are thus a very flexible modeling tool. Current learning methods are either based on approximations of the hypergraphs via graphs or on tensor methods which are only applicable under special conditions. In this paper, we present a new learning framework on hypergraphs which fully uses the hypergraph structure. The key element is a family of regularization functionals based on the total variation on hypergraphs. 1
4 0.70811439 87 nips-2013-Density estimation from unweighted k-nearest neighbor graphs: a roadmap
Author: Ulrike Von Luxburg, Morteza Alamgir
Abstract: Consider an unweighted k-nearest neighbor graph on n points that have been sampled i.i.d. from some unknown density p on Rd . We prove how one can estimate the density p just from the unweighted adjacency matrix of the graph, without knowing the points themselves or any distance or similarity scores. The key insights are that local differences in link numbers can be used to estimate a local function of the gradient of p, and that integrating this function along shortest paths leads to an estimate of the underlying density. 1
5 0.68246228 35 nips-2013-Analyzing the Harmonic Structure in Graph-Based Learning
Author: Xiao-Ming Wu, Zhenguo Li, Shih-Fu Chang
Abstract: We find that various well-known graph-based models exhibit a common important harmonic structure in its target function – the value of a vertex is approximately the weighted average of the values of its adjacent neighbors. Understanding of such structure and analysis of the loss defined over such structure help reveal important properties of the target function over a graph. In this paper, we show that the variation of the target function across a cut can be upper and lower bounded by the ratio of its harmonic loss and the cut cost. We use this to develop an analytical tool and analyze five popular graph-based models: absorbing random walks, partially absorbing random walks, hitting times, pseudo-inverse of the graph Laplacian, and eigenvectors of the Laplacian matrices. Our analysis sheds new insights into several open questions related to these models, and provides theoretical justifications and guidelines for their practical use. Simulations on synthetic and real datasets confirm the potential of the proposed theory and tool.
6 0.65397066 350 nips-2013-Wavelets on Graphs via Deep Learning
7 0.63279432 25 nips-2013-Adaptive Anonymity via $b$-Matching
8 0.63127226 132 nips-2013-Global MAP-Optimality by Shrinking the Combinatorial Search Area with Convex Relaxation
9 0.62635148 182 nips-2013-Manifold-based Similarity Adaptation for Label Propagation
10 0.61472774 184 nips-2013-Marginals-to-Models Reducibility
11 0.60758317 291 nips-2013-Sensor Selection in High-Dimensional Gaussian Trees with Nuisances
12 0.5705902 8 nips-2013-A Graphical Transformation for Belief Propagation: Maximum Weight Matchings and Odd-Sized Cycles
13 0.56461078 151 nips-2013-Learning Chordal Markov Networks by Constraint Satisfaction
14 0.56273806 268 nips-2013-Reflection methods for user-friendly submodular optimization
15 0.55679297 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data
16 0.55628508 202 nips-2013-Multiclass Total Variation Clustering
17 0.53680456 332 nips-2013-Tracking Time-varying Graphical Structure
18 0.5322417 73 nips-2013-Convex Relaxations for Permutation Problems
19 0.53156 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding
20 0.53115451 300 nips-2013-Solving the multi-way matching problem by permutation synchronization
topicId topicWeight
[(2, 0.015), (16, 0.03), (33, 0.153), (34, 0.09), (36, 0.012), (41, 0.039), (49, 0.022), (56, 0.116), (70, 0.04), (85, 0.058), (89, 0.052), (93, 0.031), (95, 0.065), (98, 0.213)]
simIndex simValue paperId paperTitle
same-paper 1 0.83221644 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
Author: James L. Sharpnack, Akshay Krishnamurthy, Aarti Singh
Abstract: The detection of anomalous activity in graphs is a statistical problem that arises in many applications, such as network surveillance, disease outbreak detection, and activity monitoring in social networks. Beyond its wide applicability, graph structured anomaly detection serves as a case study in the difficulty of balancing computational complexity with statistical power. In this work, we develop from first principles the generalized likelihood ratio test for determining if there is a well connected region of activation over the vertices in the graph in Gaussian noise. Because this test is computationally infeasible, we provide a relaxation, called the Lovász extended scan statistic (LESS), that uses submodularity to approximate the intractable generalized likelihood ratio. We demonstrate a connection between LESS and maximum a-posteriori inference in Markov random fields, which provides us with a poly-time algorithm for LESS. Using electrical network theory, we are able to control type 1 error for LESS and prove conditions under which LESS is risk consistent. Finally, we consider specific graph models, the torus, k-nearest neighbor graphs, and ǫ-random graphs. We show that on these graphs our results provide near-optimal performance by matching our results to known lower bounds.
2 0.82724518 135 nips-2013-Heterogeneous-Neighborhood-based Multi-Task Local Learning Algorithms
Author: Yu Zhang
Abstract: All the existing multi-task local learning methods are defined on homogeneous neighborhood which consists of all data points from only one task. In this paper, different from existing methods, we propose local learning methods for multitask classification and regression problems based on heterogeneous neighborhood which is defined on data points from all tasks. Specifically, we extend the k-nearest-neighbor classifier by formulating the decision function for each data point as a weighted voting among the neighbors from all tasks where the weights are task-specific. By defining a regularizer to enforce the task-specific weight matrix to approach a symmetric one, a regularized objective function is proposed and an efficient coordinate descent method is developed to solve it. For regression problems, we extend the kernel regression to multi-task setting in a similar way to the classification case. Experiments on some toy data and real-world datasets demonstrate the effectiveness of our proposed methods.
3 0.75869715 229 nips-2013-Online Learning of Nonparametric Mixture Models via Sequential Variational Approximation
Author: Dahua Lin
Abstract: Reliance on computationally expensive algorithms for inference has been limiting the use of Bayesian nonparametric models in large scale applications. To tackle this problem, we propose a Bayesian learning algorithm for DP mixture models. Instead of following the conventional paradigm – random initialization plus iterative update, we take an progressive approach. Starting with a given prior, our method recursively transforms it into an approximate posterior through sequential variational approximation. In this process, new components will be incorporated on the fly when needed. The algorithm can reliably estimate a DP mixture model in one pass, making it particularly suited for applications with massive data. Experiments on both synthetic data and real datasets demonstrate remarkable improvement on efficiency – orders of magnitude speed-up compared to the state-of-the-art. 1
4 0.73755306 356 nips-2013-Zero-Shot Learning Through Cross-Modal Transfer
Author: Richard Socher, Milind Ganjoo, Christopher D. Manning, Andrew Ng
Abstract: This work introduces a model that can recognize objects in images even if no training data is available for the object class. The only necessary knowledge about unseen visual categories comes from unsupervised text corpora. Unlike previous zero-shot learning models, which can only differentiate between unseen classes, our model can operate on a mixture of seen and unseen classes, simultaneously obtaining state of the art performance on classes with thousands of training images and reasonable performance on unseen classes. This is achieved by seeing the distributions of words in texts as a semantic space for understanding what objects look like. Our deep learning model does not require any manually defined semantic or visual features for either words or images. Images are mapped to be close to semantic word vectors corresponding to their classes, and the resulting image embeddings can be used to distinguish whether an image is of a seen or unseen class. We then use novelty detection methods to differentiate unseen classes from seen classes. We demonstrate two novelty detection strategies; the first gives high accuracy on unseen classes, while the second is conservative in its prediction of novelty and keeps the seen classes’ accuracy high. 1
5 0.73153865 154 nips-2013-Learning Gaussian Graphical Models with Observed or Latent FVSs
Author: Ying Liu, Alan Willsky
Abstract: Gaussian Graphical Models (GGMs) or Gauss Markov random fields are widely used in many applications, and the trade-off between the modeling capacity and the efficiency of learning and inference has been an important research problem. In this paper, we study the family of GGMs with small feedback vertex sets (FVSs), where an FVS is a set of nodes whose removal breaks all the cycles. Exact inference such as computing the marginal distributions and the partition function has complexity O(k 2 n) using message-passing algorithms, where k is the size of the FVS, and n is the total number of nodes. We propose efficient structure learning algorithms for two cases: 1) All nodes are observed, which is useful in modeling social or flight networks where the FVS nodes often correspond to a small number of highly influential nodes, or hubs, while the rest of the networks is modeled by a tree. Regardless of the maximum degree, without knowing the full graph structure, we can exactly compute the maximum likelihood estimate with complexity O(kn2 + n2 log n) if the FVS is known or in polynomial time if the FVS is unknown but has bounded size. 2) The FVS nodes are latent variables, where structure learning is equivalent to decomposing an inverse covariance matrix (exactly or approximately) into the sum of a tree-structured matrix and a low-rank matrix. By incorporating efficient inference into the learning steps, we can obtain a learning algorithm using alternating low-rank corrections with complexity O(kn2 + n2 log n) per iteration. We perform experiments using both synthetic data as well as real data of flight delays to demonstrate the modeling capacity with FVSs of various sizes. 1
6 0.72890937 271 nips-2013-Regularized M-estimators with nonconvexity: Statistical and algorithmic theory for local optima
7 0.72175258 333 nips-2013-Trading Computation for Communication: Distributed Stochastic Dual Coordinate Ascent
8 0.72004265 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
9 0.71951735 233 nips-2013-Online Robust PCA via Stochastic Optimization
10 0.7190026 285 nips-2013-Robust Transfer Principal Component Analysis with Rank Constraints
11 0.71863216 142 nips-2013-Information-theoretic lower bounds for distributed statistical estimation with communication constraints
12 0.71818042 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries
13 0.71773815 282 nips-2013-Robust Multimodal Graph Matching: Sparse Coding Meets Graph Matching
14 0.7174114 268 nips-2013-Reflection methods for user-friendly submodular optimization
15 0.71599418 184 nips-2013-Marginals-to-Models Reducibility
16 0.71587431 149 nips-2013-Latent Structured Active Learning
17 0.71582097 116 nips-2013-Fantope Projection and Selection: A near-optimal convex relaxation of sparse PCA
18 0.71555263 186 nips-2013-Matrix factorization with binary components
19 0.71511579 73 nips-2013-Convex Relaxations for Permutation Problems
20 0.71427804 33 nips-2013-An Approximate, Efficient LP Solver for LP Rounding