nips nips2011 nips2011-81 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Kumar Sricharan, Alfred O. Hero
Abstract: Learning minimum volume sets of an underlying nominal distribution is a very effective approach to anomaly detection. Several approaches to learning minimum volume sets have been proposed in the literature, including the K-point nearest neighbor graph (K-kNNG) algorithm based on the geometric entropy minimization (GEM) principle [4]. The K-kNNG detector, while possessing several desirable characteristics, suffers from high computation complexity, and in [4] a simpler heuristic approximation, the leave-one-out kNNG (L1O-kNNG) was proposed. In this paper, we propose a novel bipartite k-nearest neighbor graph (BPkNNG) anomaly detection scheme for estimating minimum volume sets. Our bipartite estimator retains all the desirable theoretical properties of the K-kNNG, while being computationally simpler than the K-kNNG and the surrogate L1OkNNG detectors. We show that BP-kNNG is asymptotically consistent in recovering the p-value of each test point. Experimental results are given that illustrate the superior performance of BP-kNNG as compared to the L1O-kNNG and other state of the art anomaly detection schemes.
Reference: text
sentIndex sentText sentNum sentScore
1 Efficient anomaly detection using bipartite k-NN graphs. Kumar Sricharan, Department of EECS, University of Michigan, Ann Arbor, MI 48104, kksreddy@umich.edu [sent-1, score-0.969]
2 Abstract Learning minimum volume sets of an underlying nominal distribution is a very effective approach to anomaly detection. [sent-4, score-0.871]
3 Several approaches to learning minimum volume sets have been proposed in the literature, including the K-point nearest neighbor graph (K-kNNG) algorithm based on the geometric entropy minimization (GEM) principle [4]. [sent-5, score-0.416]
4 In this paper, we propose a novel bipartite k-nearest neighbor graph (BPkNNG) anomaly detection scheme for estimating minimum volume sets. [sent-7, score-1.215]
5 Our bipartite estimator retains all the desirable theoretical properties of the K-kNNG, while being computationally simpler than the K-kNNG and the surrogate L1OkNNG detectors. [sent-8, score-0.253]
6 Experimental results are given that illustrate the superior performance of BP-kNNG as compared to the L1O-kNNG and other state of the art anomaly detection schemes. [sent-10, score-0.747]
7 1 Introduction Given a training set of normal events, the anomaly detection problem aims to identify unknown, anomalous events that deviate from the normal set. [sent-11, score-0.844]
8 This novelty detection problem arises in applications where failure to detect anomalous activity could lead to catastrophic outcomes, for example, detection of faults in mission-critical systems, quality control in manufacturing, and medical diagnosis. [sent-12, score-0.532]
9 One class of algorithms assumes a family of parametrically defined nominal distributions. [sent-14, score-0.194]
10 The drawback of these algorithms is model mismatch: the assumed distribution need not be a correct representation of the nominal data, which can then lead to poor false alarm rates. [sent-16, score-0.674]
11 More recently, several non-parametric methods based on minimum volume (MV) set estimation have been proposed. [sent-17, score-0.136]
12 These methods aim to find the minimum volume set that recovers a certain probability mass α with respect to the unknown probability density of the nominal events. [sent-18, score-0.471]
13 Estimation of minimum volume sets is a difficult problem, especially for high dimensional data. [sent-20, score-0.136]
14 Both types of approaches involve explicit approximation of high dimensional quantities (the multivariate density function in the first case and the boundary of the minimum volume set in the second) and are therefore not easily applied to high dimensional problems. [sent-22, score-0.231]
15 However, the GEM based K-kNNG anomaly detection scheme proposed in [4] is computationally difficult. [sent-24, score-0.713]
16 To address this issue, a surrogate L1O-kNNG anomaly detection scheme was proposed in [4]. [sent-25, score-0.742]
17 In this paper, we use the GEM principle to develop a bipartite k-nearest neighbor (k-NN) graphbased anomaly detection algorithm. [sent-27, score-1.049]
18 K-LPE [13] and RRS [7] are anomaly detection methods which are also based on k-NN graphs. [sent-29, score-0.713]
19 L1O-kNNG, K-LPE and RRS do not use bipartite graphs. [sent-31, score-0.224]
20 We will show that the bipartite nature of BP-kNNG results in significant computational savings. [sent-32, score-0.224]
21 In addition, the K-LPE and RRS test statistics involve only the k-th nearest neighbor distance, while the statistic in BP-kNNG, like that of the L1O-kNNG, involves a summation of the power weighted distances of all the edges in the k-NN graph. [sent-33, score-0.229]
22 Finally, we will show that the mean square rate of convergence of p-values in BP-kNNG, $O(T^{-2/(2+d)})$, is faster than the convergence rate of K-LPE, $O(T^{-2/5} + T^{-6/(5d)})$, where T is the size of the nominal training sample and d is the dimension of the data. [sent-35, score-0.343]
23 In Section 2, we outline the statistical framework for minimum volume set anomaly detection. [sent-37, score-0.677]
24 In Section 3, we describe the GEM principle and the K-kNNG and L1O-kNNG anomaly detection schemes proposed in [4]. [sent-38, score-0.795]
25 Next, in Section 4, we develop our bipartite k-NN graph (BP-kNNG) method for anomaly detection. [sent-39, score-0.842]
26 We also show that our method compares favorably to other state of the art anomaly detection schemes when applied to real world data from the UCI repository [1]. [sent-42, score-0.855]
27 2 Statistical novelty detection The problem setup is as follows. [sent-44, score-0.223]
28 We seek to find a functional D and corresponding detection rule D(x) > 0 so that X is declared to be nominal if D(x) > 0 holds and anomalous otherwise. [sent-50, score-0.46]
29 We seek to further constrain the choice of D to allow as few false negatives as possible for a fixed allowance of false positives. [sent-52, score-0.404]
30 The anomaly detection problem can be formulated as testing the hypotheses $H_0: f = f_0$ versus $H_1: f \neq f_0$. [sent-58, score-0.713]
31 This requirement maintains the false positive rate at a level no greater than α. [sent-60, score-0.282]
32 The most suitable acceptance region $A_0$ from the collection $\mathcal{A}$ would be the set which minimizes the false negative rate. [sent-62, score-0.255]
33 In this case the false negative rate is bounded by $C\lambda(A)$, where $\lambda(\cdot)$ denotes Lebesgue measure. [sent-64, score-0.244]
34 The optimal acceptance region with a maximum false alarm rate α is therefore given by the minimum volume set of level α: $\Lambda_\alpha = \min\{\lambda(A) : \int_A f_0(x)\,dx \geq 1 - \alpha\}$. [sent-67, score-0.749]
35 Define the minimum entropy set of level α to be $\Omega_\alpha = \min\{H_\nu(A) : \int_A f_0(x)\,dx \geq 1 - \alpha\}$, where $H_\nu(A) = (1 - \nu)^{-1} \log \int_A f_0^\nu(x)\,dx$ is the Rényi ν-entropy of the density $f_0$ over the set A. [sent-68, score-0.26]
36 It can be shown that when $f_0$ is a Lebesgue density in $\mathbb{R}^d$, the minimum volume set and the minimum entropy set are equivalent, i.e., $\Lambda_\alpha = \Omega_\alpha$. [sent-69, score-0.358]
37 Therefore, the optimal decision rule for a given level of false alarm α is to declare an anomaly if $X \notin \Omega_\alpha$. [sent-72, score-1.147]
38 3 GEM principle In this section, we briefly review the geometric entropy minimization (GEM) principle [4] for determining minimum entropy sets $\Omega_\alpha$ of level α. [sent-75, score-0.318]
39 The GEM method directly estimates the critical region Ωα for detecting anomalies using minimum coverings of subsets of points in a nominal training sample. [sent-76, score-0.49]
40 Points in the training sample that are not covered by the K-point minimal graphs are identified as tail events. [sent-80, score-0.142]
41 For any subset $\mathcal{X}_{K,T}$, define the total power weighted edge length of the k-NN graph on $\mathcal{X}_{K,T}$, with power weighting γ (0 < γ < d), as $L_{kNN}(\mathcal{X}_{K,T}) = \sum_{i=1}^{K} \sum_{l=1}^{k} |e_{t_i}(l)|^\gamma$, where $\{t_1, \ldots, t_K\}$ index the points of $\mathcal{X}_{K,T}$ and $e_{t_i}(l)$ denotes the l-th nearest neighbor edge originating from $X_{t_i}$. [sent-87, score-0.155]
42 Define the K-kNNG graph to be the K-point k-NN graph having minimal length $\min_{\mathcal{X}_{K,T} \subset \mathcal{X}_T} L_{kNN}(\mathcal{X}_{K,T})$ over all $\binom{T}{K}$ subsets $\mathcal{X}_{K,T}$. [sent-91, score-0.199]
43 If $\mathcal{X}_T$ is an i.i.d. sample from a multivariate density $f_0(x)$ and if $\lim_{K,T \to \infty} K/T = \rho$, then the set $\mathcal{X}^*_{K,T}$ converges a.s. to the minimum entropy set $\Omega_{1-\rho}$. [sent-98, score-0.159]
44 This set can be used to perform anomaly detection. [sent-101, score-0.541]
45 3.1 K-kNNG anomaly detection Given a test sample X, denote the pooled sample $\mathcal{X}_{T+1} = \mathcal{X}_T \cup \{X\}$ and determine the K-kNNG graph $\mathcal{X}^*_{K,T+1}$ over $\mathcal{X}_{T+1}$. [sent-103, score-0.897]
46 Declare X to be an anomaly if $X \notin \mathcal{X}^*_{K,T+1}$ and nominal otherwise. [sent-104, score-0.735]
47 When the density $f_0$ is Lebesgue continuous, it follows from [4] that as K, T → ∞, this anomaly detection algorithm has a false alarm rate that converges to α = 1 − K/T and power that converges to that of the minimum volume set test of level α. [sent-105, score-1.666]
48 An identical detection scheme based on the K-minimal spanning tree has also been developed in [4]. [sent-106, score-0.201]
49 The K-kNNG anomaly detection scheme therefore offers a direct approach to detecting outliers while bypassing the more difficult problems of density estimation and level set estimation in high dimensions. [sent-107, score-0.893]
50 As a result, the K-kNNG method is not well suited for anomaly detection for large sample sizes. [sent-110, score-0.741]
51 However, the L1O-kNNG detects anomalies at a fixed false alarm rate 1/(T + 1), where T is the training sample size. [sent-115, score-0.754]
52 To detect anomalies at a higher false alarm rate $\alpha^*$, one would have to subsample the training set and use only $T^* = 1/\alpha^* - 1$ training samples (for example, $\alpha^* = 0.05$ allows only $T^* = 19$ samples). [sent-116, score-0.779]
53 In the next section, we propose a different GEM based algorithm that uses bipartite graphs. [sent-118, score-0.224]
54 The algorithm has a much faster runtime than the L1O-kNNG and, unlike the L1O-kNNG, is asymptotically consistent and can operate at any specified false alarm rate α. [sent-119, score-0.404]
55 Define the bipartite k-NN graph on $\{\mathcal{X}_{K,N}, \mathcal{X}_M\}$ to be the set of edges linking each $X_i \in \mathcal{X}_{K,N}$ to its k nearest neighbors in $\mathcal{X}_M$. [sent-123, score-0.375]
56 Define the total power weighted edge length of this bipartite k-NN graph, with power weighting γ (0 < γ < d) and a fixed number of edges s (1 ≤ s ≤ k) corresponding to each vertex $X_i \in \mathcal{X}_{K,N}$, to be $L_{s,k}(\mathcal{X}_{K,N}, \mathcal{X}_M) = \sum_{i=1}^{K} \sum_{l=k-s+1}^{k} |e_{t_i}(l)|^\gamma$, [sent-124, score-0.421]
57 where $\{t_1, \ldots, t_K\}$ index the points of $\mathcal{X}_{K,N}$ and $\{e_{t_i}(1), \ldots, e_{t_i}(k)\}$ are the k-NN edges in the bipartite graph originating from $X_{t_i} \in \mathcal{X}_{K,N}$. [sent-130, score-0.42]
58 Define the bipartite K-kNNG graph to be the one having minimal weighted length $\min_{\mathcal{X}_{K,N} \subset \mathcal{X}_N} L_{s,k}(\mathcal{X}_{K,N}, \mathcal{X}_M)$ over all $\binom{N}{K}$ subsets $\mathcal{X}_{K,N}$. [sent-131, score-0.346]
59 Using the theory of partitioned k-NN graph entropy estimators [11], it follows that as $k/M \to 0$ and $k, N \to \infty$, for fixed s, the set $\mathcal{X}^*_{K,N}$ converges a.s. to the minimum entropy set. [sent-133, score-0.172]
60 This suggests using the bipartite k-NN graph to detect anomalies in the following way. [sent-136, score-0.484]
61 Given a test point X, denote the pooled sample $\mathcal{X}_{N+1} = \mathcal{X}_N \cup \{X\}$ and determine the optimal bipartite K-kNNG graph $\mathcal{X}^*_{K,N+1}$ over $\{\mathcal{X}_{N+1}, \mathcal{X}_M\}$. [sent-137, score-0.38]
62 Now declare X to be an anomaly if $X \notin \mathcal{X}^*_{K,N+1}$ and nominal otherwise. [sent-138, score-0.823]
63 It is clear that by the GEM principle, this algorithm detects false alarms at a rate that converges to α = 1 − K/T and power that converges to that of the minimum volume set test of level α. [sent-139, score-0.607]
64 Because of the bipartite nature of the construction, this is equivalent to choosing $\mathcal{X}^*_{K,N+1}$. [sent-147, score-0.224]
65 This leads to the proposed BP-kNNG anomaly detection algorithm described by Algorithm 1. [sent-148, score-0.713]
66 4.1 BP-kNNG p-value estimates The p-value is a score between 0 and 1 associated with the likelihood that a given point $X_0$ comes from a specified nominal distribution. [sent-150, score-0.194]
67 The BP-kNNG generates an estimate of the p-value, as described in Algorithm 1 (anomaly detection scheme using bipartite k-NN graphs; a hedged end-to-end sketch is given below). [sent-151, score-0.428]
68 Input: training samples $\mathcal{X}_T$, test samples X, false alarm rate α. [sent-152, score-0.573]
69 Specifically, the true p-value associated with a given test point $X_0$ in a minimum volume set test is $p_{true}(X_0) = \int_{S(X_0)} f_0(z)\,dz$, where $S(X_0) = \{z : f_0(z) \leq f_0(X_0)\}$ and $E(X_0) = \{z : f_0(z) = f_0(X_0)\}$. [sent-158, score-0.238]
70 ptrue (X0 ) is the minimal level α at which X 0 would be rejected. [sent-159, score-0.179]
71 4.2 Asymptotic consistency and optimal convergence rates Here we prove that the BP-kNNG detector is asymptotically consistent by showing that, for a fixed number of edges s, $E[(p_{bp}(X_0) - p_{true}(X_0))^2] \to 0$ as $k/M \to 0$ and $k, N \to \infty$. [sent-162, score-0.317]
72 Bias: We first introduce the oracle p-value $p_{orac}(X_0) = (1/N) \sum_{X_i \in \mathcal{X}_N} \mathbf{1}(f_0(X_i) \leq f_0(X_0))$ and note that $E[p_{orac}(X_0)] = p_{true}(X_0)$. [sent-177, score-0.174]
73 The distance $e_{i(l)}$ of a point $X_i \in \mathcal{X}_N$ to its l-th nearest neighbor in $\mathcal{X}_M$ is related to the bipartite l-nearest neighbor density estimate $\hat{f}_l(X_i) = (l-1)/(M c_d e_{i(l)}^d)$ (Section 2. [sent-178, score-0.481]
74 Consistency of p-values: From (2) and (3), we obtain an asymptotic representation of the estimated p-value: $E[(p_{bp}(X_0) - p_{true}(X_0))^2] = O((k/M)^{2/d}) + O(1/k) + O(1/N)$. [sent-190, score-0.207]
75 This implies that $p_{bp}$ converges in mean square to $p_{true}$, for a fixed number of edges s, as $k/M \to 0$ and $k, N \to \infty$. [sent-191, score-0.213]
76 On the other hand, because of the bipartite construction of our k-NN graph, $d_k(X_i)$ for each $X_i \in \mathcal{X}_N$ needs to be computed and stored only once. [sent-203, score-0.262]
77 For a total of L query points, the overall runtime complexity of our algorithm is therefore much smaller than that of the L1O-kNNG, K-LPE and K-kNNG anomaly detection schemes ($O(dT(T^{(4+d)/(4+2d)} + L))$ compared to $O(dLT^2)$, $O(dLT^2)$ and $O(dLK^2\binom{T}{K})$, respectively). [sent-205, score-0.829]
78 5 Simulation comparisons We compare the L1O-kNNG and the bipartite K-kNNG schemes on a simulated data set. [sent-206, score-0.259]
79 The percentage of anomalies in the test set is therefore 20%. [sent-213, score-0.191]
80 Figure 1: (a) The labeled 'clairvoyant' curve is the ROC of the UMP anomaly detector. (b) Comparison of observed false alarm rates for L1O-kNNG and BP-kNNG with the desired false alarm rate. [sent-247, score-1.268]
81 Table 1: Description of data used in anomaly detection experiments; each row lists the nominal vs. anomalous class split and the anomaly percentage (e.g., classes 2,3,5,6,7 vs. class 1, 7%). [sent-254, score-0.713]
82 For this simple case the minimum volume set of level α is a disk centered at the origin with radius $\sqrt{2\sigma^2 \log(1/\alpha)}$. [sent-256, score-0.174]
83 In Fig. 1(a), we compare the detection performance of L1O-kNNG and BP-kNNG against the 'clairvoyant' UMP detector in terms of the ROC. [sent-263, score-0.208]
84 In Fig. 1(b), we note the close agreement between desired and observed false alarm rates for BP-kNNG. [sent-266, score-0.525]
85 Note that the L1O-kNNG significantly underestimates its false alarm rate for higher levels of true false alarm. [sent-267, score-0.724]
86 This significant savings in runtime is due to the fact that the bipartite graph does not have to be constructed separately for each new test instance; it suffices to construct it once on the entire data set. [sent-272, score-0.404]
87 One of the anomaly data generators is Mulcross [8] and the other four are from the UCI repository [1]. [sent-277, score-0.578]
88 Table 2: Comparison of anomaly detection schemes in terms of AUC and run-time for BP-kNNG (BP) against L1O-kNNG (L1O), K-LPE, MassAD (Mass), iForest (iF) and ORCA. [sent-319, score-0.155]
89 Table 3: Comparison of desired and observed false alarm rates for BP-kNNG. [sent-353, score-0.179]
90 We are unable to report the AUC for K-LPE because of the large processing time and for L1O-kNNG because it cannot operate at high false alarm rates. [sent-364, score-0.48]
91 In addition, BP-kNNG allows the specification of a threshold for anomaly detection at a desired false alarm rate. [sent-366, score-1.238]
92 This is corroborated by the results in Table 3, where we see that the observed false alarm rates across the different data sets are close to the desired false alarm rate. [sent-367, score-1.005]
93 6 Conclusions The geometric entropy minimization (GEM) principle was introduced in [4] to extract minimal set coverings that can be used to detect anomalies from a set of training samples. [sent-368, score-0.422]
94 In this paper we propose a bipartite k-nearest neighbor graph (BP-kNNG) anomaly detection algorithm based on the GEM principle. [sent-369, score-1.079]
95 We compared BP-kNNG against state of the art anomaly detection algorithms and showed that BP-kNNG compares favorably in terms of both ROC performance and computation time. [sent-371, score-0.783]
96 In addition, BP-kNNG enjoys several other advantages including the ability to detect anomalies at a desired false alarm rate. [sent-372, score-0.708]
97 In BP-kNNG, the p-values of each test point can also be easily computed via (1), making BP-kNNG readily extendable to incorporate false discovery rate constraints. [sent-373, score-0.295]
98 Geometric entropy minimization (gem) for anomaly detection and localization. [sent-399, score-0.772]
99 A computable plug-in estimator of minimum volume sets for novelty detection. [sent-417, score-0.187]
100 Anomaly detection with score functions based on nearest neighbor graphs. [sent-469, score-0.269]
wordName wordTfidf (topN-words)
[('anomaly', 0.541), ('gem', 0.289), ('alarm', 0.278), ('bipartite', 0.224), ('false', 0.202), ('nominal', 0.194), ('detection', 0.172), ('anomalies', 0.14), ('pbp', 0.135), ('orca', 0.119), ('iforest', 0.116), ('massad', 0.116), ('ump', 0.116), ('knng', 0.096), ('lof', 0.096), ('ptrue', 0.096), ('density', 0.095), ('anomalous', 0.094), ('na', 0.092), ('declare', 0.088), ('xn', 0.081), ('bp', 0.081), ('xm', 0.08), ('eti', 0.077), ('hero', 0.077), ('mulcross', 0.077), ('graph', 0.077), ('auc', 0.075), ('volume', 0.068), ('rrs', 0.068), ('minimum', 0.068), ('neighbor', 0.065), ('entropy', 0.059), ('bpknng', 0.058), ('lkn', 0.058), ('smtp', 0.058), ('ghz', 0.055), ('kdd', 0.053), ('acceptance', 0.053), ('runtime', 0.052), ('clairvoyant', 0.051), ('coverings', 0.051), ('shuttle', 0.051), ('novelty', 0.051), ('test', 0.051), ('mv', 0.049), ('principle', 0.047), ('outliers', 0.047), ('sigmod', 0.047), ('mass', 0.046), ('minimal', 0.045), ('xi', 0.045), ('lebesgue', 0.045), ('desired', 0.045), ('detect', 0.043), ('rate', 0.042), ('xt', 0.042), ('edges', 0.042), ('roc', 0.041), ('ei', 0.04), ('power', 0.039), ('arbor', 0.039), ('dlt', 0.039), ('opteron', 0.039), ('orac', 0.039), ('sricharan', 0.039), ('dt', 0.038), ('processor', 0.038), ('dk', 0.038), ('level', 0.038), ('training', 0.037), ('forest', 0.037), ('repository', 0.037), ('detector', 0.036), ('ex', 0.036), ('gb', 0.036), ('favorably', 0.036), ('converges', 0.036), ('schemes', 0.035), ('ddimensional', 0.034), ('michigan', 0.034), ('art', 0.034), ('asymptotically', 0.032), ('graphs', 0.032), ('nearest', 0.032), ('dx', 0.031), ('ting', 0.031), ('mining', 0.031), ('consistency', 0.03), ('asymptotic', 0.03), ('amd', 0.029), ('spanning', 0.029), ('query', 0.029), ('surrogate', 0.029), ('sample', 0.028), ('attack', 0.028), ('ann', 0.028), ('uci', 0.028), ('detects', 0.027), ('card', 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 81 nips-2011-Efficient anomaly detection using bipartite k-NN graphs
Author: Kumar Sricharan, Alfred O. Hero
Abstract: Learning minimum volume sets of an underlying nominal distribution is a very effective approach to anomaly detection. Several approaches to learning minimum volume sets have been proposed in the literature, including the K-point nearest neighbor graph (K-kNNG) algorithm based on the geometric entropy minimization (GEM) principle [4]. The K-kNNG detector, while possessing several desirable characteristics, suffers from high computation complexity, and in [4] a simpler heuristic approximation, the leave-one-out kNNG (L1O-kNNG) was proposed. In this paper, we propose a novel bipartite k-nearest neighbor graph (BPkNNG) anomaly detection scheme for estimating minimum volume sets. Our bipartite estimator retains all the desirable theoretical properties of the K-kNNG, while being computationally simpler than the K-kNNG and the surrogate L1OkNNG detectors. We show that BP-kNNG is asymptotically consistent in recovering the p-value of each test point. Experimental results are given that illustrate the superior performance of BP-kNNG as compared to the L1O-kNNG and other state of the art anomaly detection schemes.
2 0.25352332 110 nips-2011-Group Anomaly Detection using Flexible Genre Models
Author: Liang Xiong, Barnabás Póczos, Jeff G. Schneider
Abstract: An important task in exploring and analyzing real-world data sets is to detect unusual and interesting phenomena. In this paper, we study the group anomaly detection problem. Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. For this purpose, we propose the Flexible Genre Model (FGM). FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. We evaluate the effectiveness of FGM on both synthetic and real data sets including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies. 1
3 0.079553083 9 nips-2011-A More Powerful Two-Sample Test in High Dimensions using Random Projection
Author: Miles Lopes, Laurent Jacob, Martin J. Wainwright
Abstract: We consider the hypothesis testing problem of detecting a shift between the means of two multivariate normal distributions in the high-dimensional setting, allowing for the data dimension p to exceed the sample size n. Our contribution is a new test statistic for the two-sample test of means that integrates a random projection with the classical Hotelling T 2 statistic. Working within a high-dimensional framework that allows (p, n) → ∞, we first derive an asymptotic power function for our test, and then provide sufficient conditions for it to achieve greater power than other state-of-the-art tests. Using ROC curves generated from simulated data, we demonstrate superior performance against competing tests in the parameter regimes anticipated by our theoretical results. Lastly, we illustrate an advantage of our procedure with comparisons on a high-dimensional gene expression dataset involving the discrimination of different types of cancer. 1
4 0.068364747 199 nips-2011-On fast approximate submodular minimization
Author: Stefanie Jegelka, Hui Lin, Jeff A. Bilmes
Abstract: We are motivated by an application to extract a representative subset of machine learning training data and by the poor empirical performance we observe of the popular minimum norm algorithm. In fact, for our application, minimum norm can have a running time of about O(n7 ) (O(n5 ) oracle calls). We therefore propose a fast approximate method to minimize arbitrary submodular functions. For a large sub-class of submodular functions, the algorithm is exact. Other submodular functions are iteratively approximated by tight submodular upper bounds, and then repeatedly optimized. We show theoretical properties, and empirical results suggest significant speedups over minimum norm while retaining higher accuracies. 1
5 0.057896383 123 nips-2011-How biased are maximum entropy models?
Author: Jakob H. Macke, Iain Murray, Peter E. Latham
Abstract: Maximum entropy models have become popular statistical models in neuroscience and other areas in biology, and can be useful tools for obtaining estimates of mutual information in biological systems. However, maximum entropy models fit to small data sets can be subject to sampling bias; i.e. the true entropy of the data can be severely underestimated. Here we study the sampling properties of estimates of the entropy obtained from maximum entropy models. We show that if the data is generated by a distribution that lies in the model class, the bias is equal to the number of parameters divided by twice the number of observations. However, in practice, the true distribution is usually outside the model class, and we show here that this misspecification can lead to much larger bias. We provide a perturbative approximation of the maximally expected bias when the true model is out of model class, and we illustrate our results using numerical simulations of an Ising model; i.e. the second-order maximum entropy distribution on binary data. 1
6 0.055216238 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions
7 0.054518726 137 nips-2011-Iterative Learning for Reliable Crowdsourcing Systems
8 0.051710289 231 nips-2011-Randomized Algorithms for Comparison-based Search
9 0.050953738 233 nips-2011-Rapid Deformable Object Detection using Dual-Tree Branch-and-Bound
10 0.050361216 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
11 0.04929398 213 nips-2011-Phase transition in the family of p-resistances
12 0.047320209 45 nips-2011-Beating SGD: Learning SVMs in Sublinear Time
13 0.045414802 195 nips-2011-On Learning Discrete Graphical Models using Greedy Methods
14 0.044157103 1 nips-2011-$\theta$-MRF: Capturing Spatial and Semantic Structure in the Parameters for Scene Understanding
15 0.044104416 145 nips-2011-Learning Eigenvectors for Free
16 0.042292409 238 nips-2011-Relative Density-Ratio Estimation for Robust Distribution Comparison
17 0.042036001 87 nips-2011-Energetically Optimal Action Potentials
18 0.041045517 258 nips-2011-Sparse Bayesian Multi-Task Learning
19 0.040708385 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
20 0.04033101 31 nips-2011-An Application of Tree-Structured Expectation Propagation for Channel Decoding
topicId topicWeight
[(0, 0.141), (1, 0.013), (2, -0.031), (3, -0.023), (4, -0.009), (5, -0.078), (6, -0.063), (7, -0.032), (8, -0.033), (9, -0.015), (10, 0.026), (11, 0.014), (12, -0.013), (13, -0.014), (14, 0.01), (15, 0.043), (16, 0.068), (17, 0.013), (18, -0.002), (19, 0.083), (20, -0.043), (21, 0.085), (22, 0.073), (23, 0.053), (24, 0.093), (25, -0.04), (26, -0.014), (27, 0.082), (28, 0.017), (29, 0.022), (30, 0.053), (31, 0.028), (32, -0.067), (33, 0.13), (34, -0.127), (35, -0.033), (36, 0.101), (37, -0.037), (38, 0.007), (39, -0.007), (40, 0.027), (41, -0.081), (42, 0.146), (43, 0.061), (44, 0.053), (45, -0.082), (46, -0.019), (47, 0.187), (48, 0.018), (49, 0.1)]
simIndex simValue paperId paperTitle
same-paper 1 0.91607654 81 nips-2011-Efficient anomaly detection using bipartite k-NN graphs
Author: Kumar Sricharan, Alfred O. Hero
Abstract: Learning minimum volume sets of an underlying nominal distribution is a very effective approach to anomaly detection. Several approaches to learning minimum volume sets have been proposed in the literature, including the K-point nearest neighbor graph (K-kNNG) algorithm based on the geometric entropy minimization (GEM) principle [4]. The K-kNNG detector, while possessing several desirable characteristics, suffers from high computation complexity, and in [4] a simpler heuristic approximation, the leave-one-out kNNG (L1O-kNNG) was proposed. In this paper, we propose a novel bipartite k-nearest neighbor graph (BPkNNG) anomaly detection scheme for estimating minimum volume sets. Our bipartite estimator retains all the desirable theoretical properties of the K-kNNG, while being computationally simpler than the K-kNNG and the surrogate L1OkNNG detectors. We show that BP-kNNG is asymptotically consistent in recovering the p-value of each test point. Experimental results are given that illustrate the superior performance of BP-kNNG as compared to the L1O-kNNG and other state of the art anomaly detection schemes.
2 0.61809957 110 nips-2011-Group Anomaly Detection using Flexible Genre Models
Author: Liang Xiong, Barnabás Póczos, Jeff G. Schneider
Abstract: An important task in exploring and analyzing real-world data sets is to detect unusual and interesting phenomena. In this paper, we study the group anomaly detection problem. Unlike traditional anomaly detection research that focuses on data points, our goal is to discover anomalous aggregated behaviors of groups of points. For this purpose, we propose the Flexible Genre Model (FGM). FGM is designed to characterize data groups at both the point level and the group level so as to detect various types of group anomalies. We evaluate the effectiveness of FGM on both synthetic and real data sets including images and turbulence data, and show that it is superior to existing approaches in detecting group anomalies. 1
3 0.4926078 241 nips-2011-Scalable Training of Mixture Models via Coresets
Author: Dan Feldman, Matthew Faulkner, Andreas Krause
Abstract: How can we train a statistical mixture model on a massive data set? In this paper, we show how to construct coresets for mixtures of Gaussians and natural generalizations. A coreset is a weighted subset of the data, which guarantees that models fitting the coreset will also provide a good fit for the original data set. We show that, perhaps surprisingly, Gaussian mixtures admit coresets of size independent of the size of the data set. More precisely, we prove that a weighted set of O(dk3 /ε2 ) data points suffices for computing a (1 + ε)-approximation for the optimal model on the original n data points. Moreover, such coresets can be efficiently constructed in a map-reduce style computation, as well as in a streaming setting. Our results rely on a novel reduction of statistical estimation to problems in computational geometry, as well as new complexity results about mixtures of Gaussians. We empirically evaluate our algorithms on several real data sets, including a density estimation problem in the context of earthquake detection using accelerometers in mobile phones. 1
4 0.47701749 9 nips-2011-A More Powerful Two-Sample Test in High Dimensions using Random Projection
Author: Miles Lopes, Laurent Jacob, Martin J. Wainwright
Abstract: We consider the hypothesis testing problem of detecting a shift between the means of two multivariate normal distributions in the high-dimensional setting, allowing for the data dimension p to exceed the sample size n. Our contribution is a new test statistic for the two-sample test of means that integrates a random projection with the classical Hotelling T 2 statistic. Working within a high-dimensional framework that allows (p, n) → ∞, we first derive an asymptotic power function for our test, and then provide sufficient conditions for it to achieve greater power than other state-of-the-art tests. Using ROC curves generated from simulated data, we demonstrate superior performance against competing tests in the parameter regimes anticipated by our theoretical results. Lastly, we illustrate an advantage of our procedure with comparisons on a high-dimensional gene expression dataset involving the discrimination of different types of cancer. 1
5 0.46753722 117 nips-2011-High-Dimensional Graphical Model Selection: Tractable Graph Families and Necessary Conditions
Author: Animashree Anandkumar, Vincent Tan, Alan S. Willsky
Abstract: We consider the problem of Ising and Gaussian graphical model selection given n i.i.d. samples from the model. We propose an efficient threshold-based algorithm for structure estimation based on conditional mutual information thresholding. This simple local algorithm requires only loworder statistics of the data and decides whether two nodes are neighbors in the unknown graph. We identify graph families for which the proposed algorithm has low sample and computational complexities. Under some transparent assumptions, we establish that the proposed algorithm is −4 structurally consistent (or sparsistent) when the number of samples scales as n = Ω(Jmin log p), where p is the number of nodes and Jmin is the minimum edge potential. We also develop novel non-asymptotic techniques for obtaining necessary conditions for graphical model selection. Keywords: Graphical model selection, high-dimensional learning, local-separation property, necessary conditions, typical sets, Fano’s inequality.
6 0.46477085 95 nips-2011-Fast and Accurate k-means For Large Datasets
7 0.46418476 146 nips-2011-Learning Higher-Order Graph Structure with Features by Structure Penalty
8 0.4488866 64 nips-2011-Convergent Bounds on the Euclidean Distance
9 0.44859937 238 nips-2011-Relative Density-Ratio Estimation for Robust Distribution Comparison
10 0.44410011 123 nips-2011-How biased are maximum entropy models?
11 0.44282058 213 nips-2011-Phase transition in the family of p-resistances
12 0.43868023 306 nips-2011-t-divergence Based Approximate Inference
13 0.4144226 166 nips-2011-Maximal Cliques that Satisfy Hard Constraints with Application to Deformable Object Model Learning
14 0.41377416 78 nips-2011-Efficient Methods for Overlapping Group Lasso
15 0.40994555 67 nips-2011-Data Skeletonization via Reeb Graphs
16 0.40253809 147 nips-2011-Learning Patient-Specific Cancer Survival Distributions as a Sequence of Dependent Regressors
17 0.40210035 91 nips-2011-Exploiting spatial overlap to efficiently compute appearance distances between image windows
18 0.40154484 236 nips-2011-Regularized Laplacian Estimation and Fast Eigenvector Approximation
19 0.39325747 62 nips-2011-Continuous-Time Regression Models for Longitudinal Networks
20 0.36218384 60 nips-2011-Confidence Sets for Network Structure
topicId topicWeight
[(0, 0.037), (4, 0.077), (15, 0.318), (20, 0.031), (26, 0.025), (31, 0.079), (33, 0.014), (43, 0.066), (45, 0.122), (57, 0.021), (74, 0.037), (83, 0.053), (99, 0.027)]
simIndex simValue paperId paperTitle
same-paper 1 0.72829932 81 nips-2011-Efficient anomaly detection using bipartite k-NN graphs
Author: Kumar Sricharan, Alfred O. Hero
Abstract: Learning minimum volume sets of an underlying nominal distribution is a very effective approach to anomaly detection. Several approaches to learning minimum volume sets have been proposed in the literature, including the K-point nearest neighbor graph (K-kNNG) algorithm based on the geometric entropy minimization (GEM) principle [4]. The K-kNNG detector, while possessing several desirable characteristics, suffers from high computation complexity, and in [4] a simpler heuristic approximation, the leave-one-out kNNG (L1O-kNNG) was proposed. In this paper, we propose a novel bipartite k-nearest neighbor graph (BPkNNG) anomaly detection scheme for estimating minimum volume sets. Our bipartite estimator retains all the desirable theoretical properties of the K-kNNG, while being computationally simpler than the K-kNNG and the surrogate L1OkNNG detectors. We show that BP-kNNG is asymptotically consistent in recovering the p-value of each test point. Experimental results are given that illustrate the superior performance of BP-kNNG as compared to the L1O-kNNG and other state of the art anomaly detection schemes.
2 0.65533328 219 nips-2011-Predicting response time and error rates in visual search
Author: Bo Chen, Vidhya Navalpakkam, Pietro Perona
Abstract: A model of human visual search is proposed. It predicts both response time (RT) and error rates (RT) as a function of image parameters such as target contrast and clutter. The model is an ideal observer, in that it optimizes the Bayes ratio of target present vs target absent. The ratio is computed on the firing pattern of V1/V2 neurons, modeled by Poisson distributions. The optimal mechanism for integrating information over time is shown to be a ‘soft max’ of diffusions, computed over the visual field by ‘hypercolumns’ of neurons that share the same receptive field and have different response properties to image features. An approximation of the optimal Bayesian observer, based on integrating local decisions, rather than diffusions, is also derived; it is shown experimentally to produce very similar predictions to the optimal observer in common psychophysics conditions. A psychophyisics experiment is proposed that may discriminate between which mechanism is used in the human brain. A B C Figure 1: Visual search. (A) Clutter and camouflage make visual search difficult. (B,C) Psychologists and neuroscientists build synthetic displays to study visual search. In (B) the target ‘pops out’ (∆θ = 450 ), while in (C) the target requires more time to be detected (∆θ = 100 ) [1]. 1
3 0.58844578 159 nips-2011-Learning with the weighted trace-norm under arbitrary sampling distributions
Author: Rina Foygel, Ohad Shamir, Nati Srebro, Ruslan Salakhutdinov
Abstract: We provide rigorous guarantees on learning with the weighted trace-norm under arbitrary sampling distributions. We show that the standard weighted-trace norm might fail when the sampling distribution is not a product distribution (i.e. when row and column indexes are not selected independently), present a corrected variant for which we establish strong learning guarantees, and demonstrate that it works better in practice. We provide guarantees when weighting by either the true or empirical sampling distribution, and suggest that even if the true distribution is known (or is uniform), weighting by the empirical distribution may be beneficial. 1
4 0.51183546 231 nips-2011-Randomized Algorithms for Comparison-based Search
Author: Dominique Tschopp, Suhas Diggavi, Payam Delgosha, Soheil Mohajer
Abstract: This paper addresses the problem of finding the nearest neighbor (or one of the R-nearest neighbors) of a query object q in a database of n objects, when we can only use a comparison oracle. The comparison oracle, given two reference objects and a query object, returns the reference object most similar to the query object. The main problem we study is how to search the database for the nearest neighbor (NN) of a query, while minimizing the questions. The difficulty of this problem depends on properties of the underlying database. We show the importance of a characterization: combinatorial disorder D which defines approximate triangle n inequalities on ranks. We present a lower bound of Ω(D log D + D2 ) average number of questions in the search phase for any randomized algorithm, which demonstrates the fundamental role of D for worst case behavior. We develop 3 a randomized scheme for NN retrieval in O(D3 log2 n + D log2 n log log nD ) 3 questions. The learning requires asking O(nD3 log2 n + D log2 n log log nD ) questions and O(n log2 n/ log(2D)) bits to store.
5 0.50865769 118 nips-2011-High-dimensional regression with noisy and missing data: Provable guarantees with non-convexity
Author: Po-ling Loh, Martin J. Wainwright
Abstract: Although the standard formulations of prediction problems involve fully-observed and noiseless data drawn in an i.i.d. manner, many applications involve noisy and/or missing data, possibly involving dependencies. We study these issues in the context of high-dimensional sparse linear regression, and propose novel estimators for the cases of noisy, missing, and/or dependent data. Many standard approaches to noisy or missing data, such as those using the EM algorithm, lead to optimization problems that are inherently non-convex, and it is difficult to establish theoretical guarantees on practical algorithms. While our approach also involves optimizing non-convex programs, we are able to both analyze the statistical error associated with any global optimum, and prove that a simple projected gradient descent algorithm will converge in polynomial time to a small neighborhood of the set of global minimizers. On the statistical side, we provide non-asymptotic bounds that hold with high probability for the cases of noisy, missing, and/or dependent data. On the computational side, we prove that under the same types of conditions required for statistical consistency, the projected gradient descent algorithm will converge at geometric rates to a near-global minimizer. We illustrate these theoretical predictions with simulations, showing agreement with the predicted scalings. 1
6 0.50844896 64 nips-2011-Convergent Bounds on the Euclidean Distance
7 0.50671828 185 nips-2011-Newtron: an Efficient Bandit algorithm for Online Multiclass Prediction
8 0.50607711 80 nips-2011-Efficient Online Learning via Randomized Rounding
9 0.50502831 204 nips-2011-Online Learning: Stochastic, Constrained, and Smoothed Adversaries
10 0.50103873 149 nips-2011-Learning Sparse Representations of High Dimensional Data on Large Scale Dictionaries
11 0.50084358 106 nips-2011-Generalizing from Several Related Classification Tasks to a New Unlabeled Sample
12 0.50066191 199 nips-2011-On fast approximate submodular minimization
13 0.49986529 283 nips-2011-The Fixed Points of Off-Policy TD
14 0.49966037 29 nips-2011-Algorithms and hardness results for parallel large margin learning
15 0.49937546 22 nips-2011-Active Ranking using Pairwise Comparisons
16 0.49937174 258 nips-2011-Sparse Bayesian Multi-Task Learning
17 0.4983786 180 nips-2011-Multiple Instance Filtering
18 0.49796087 172 nips-2011-Minimax Localization of Structural Information in Large Noisy Matrices
19 0.49773306 150 nips-2011-Learning a Distance Metric from a Network
20 0.49624664 220 nips-2011-Prediction strategies without loss