nips nips2012 nips2012-223 knowledge-graph by maker-knowledge-mining

223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis


Source: pdf

Author: Ko-jen Hsiao, Kevin Xu, Jeff Calder, Alfred O. Hero

Abstract: We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. In most anomaly detection algorithms, the dissimilarity between data samples is calculated by a single criterion, such as Euclidean distance. However, in many cases there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such a case, multiple criteria can be defined, and one can test for anomalies by scalarizing the multiple criteria using a linear combination of them. If the importance of the different criteria is not known in advance, the algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we introduce a novel non-parametric multi-criteria anomaly detection method using Pareto depth analysis (PDA). PDA uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach scales linearly in the number of criteria and is provably better than linear combinations of the criteria.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. [sent-4, score-0.504]

2 In most anomaly detection algorithms, the dissimilarity between data samples is calculated by a single criterion, such as Euclidean distance. [sent-5, score-0.608]

3 However, in many cases there may not exist a single dissimilarity measure that captures all possible anomalous patterns. [sent-6, score-0.228]

4 In such a case, multiple criteria can be defined, and one can test for anomalies by scalarizing the multiple criteria using a linear combination of them. [sent-7, score-0.321]

5 In this paper, we introduce a novel non-parametric multi-criteria anomaly detection method using Pareto depth analysis (PDA). [sent-9, score-0.533]

6 PDA uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. [sent-10, score-0.25]

7 Many methods for anomaly detection have been developed using both parametric and non-parametric approaches. [sent-13, score-0.485]

8 For complex high-dimensional data, multiple dissimilarity measures corresponding to different criteria may be required to detect certain types of anomalies. [sent-15, score-0.227]

9 Multiple criteria, such as dissimilarity in object speeds or trajectory shapes, can be used to detect a greater range of anomalies than any single criterion. [sent-17, score-0.203]

10 In order to perform anomaly detection using these multiple criteria, one could first combine the dissimilarities using a linear combination. [sent-18, score-0.573]

11 Furthermore, when the weights are changed, the anomaly detection algorithm needs to be re-executed using the new weights. [sent-21, score-0.514]

12 In this paper we propose a novel non-parametric multi-criteria anomaly detection approach using Pareto depth analysis (PDA). [sent-22, score-0.533]

13 Hence, PDA is able to detect anomalies under multiple combinations of the criteria without explicitly forming these combinations. [sent-27, score-0.215]

14 Center: Dyads for the training samples (black dots) along with the first 20 Pareto fronts (green lines) under two criteria: |∆x| and |∆y|. [sent-29, score-0.311]

15 The Pareto fronts induce a partial ordering on the set of dyads. [sent-30, score-0.265]

16 Dyads associated with the test sample marked by the red circle concentrate around shallow fronts (near the lower left of the figure). [sent-31, score-0.331]

17 The PDA approach involves creating dyads corresponding to dissimilarities between pairs of data samples under all of the dissimilarity measures. [sent-33, score-0.586]

18 The first Pareto front (depth one) is the set of non-dominated dyads. [sent-35, score-0.178]

19 The second Pareto front (depth two) is obtained by removing these non-dominated dyads, i.e., [sent-36, score-0.178]

20 peeling off the first front, and recomputing the first Pareto front of those remaining. [sent-38, score-0.178]

21 In this way, each dyad is assigned to a Pareto front at some depth (see Fig. 1). [sent-40, score-0.359]
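The peeling construction above is just non-dominated sorting. Below is a minimal Python sketch of it, assuming dyads are rows of an array with one column per criterion and that smaller is better in every coordinate; the quadratic-time loop and the name `pareto_peel` are illustrative choices, not the paper's implementation.

```python
import numpy as np

def pareto_peel(points):
    """Assign each point a Pareto depth by repeatedly peeling off the set
    of non-dominated points (smaller is better in every coordinate)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    depths = np.zeros(n, dtype=int)
    remaining = np.arange(n)
    depth = 1
    while remaining.size > 0:
        pts = points[remaining]
        dominated = np.zeros(len(pts), dtype=bool)
        for i in range(len(pts)):
            others = np.delete(pts, i, axis=0)
            # Strict dominance: <= in all coordinates and < in at least one.
            dominated[i] = bool(np.any(
                np.all(others <= pts[i], axis=1) & np.any(others < pts[i], axis=1)
            ))
        depths[remaining[~dominated]] = depth   # current front
        remaining = remaining[dominated]        # peel it off and repeat
        depth += 1
    return depths
```

The fronts themselves are then the level sets of the returned depths, e.g. the first front is `points[pareto_peel(points) == 1]`.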

22 Nominal and anomalous samples are located near different Pareto front depths; thus computing the front depths of the dyads corresponding to a test sample can discriminate between nominal and anomalous samples. [sent-42, score-1.23]

23 Under the assumption that the multi-criteria dyads can be modeled as realizations from a smooth K-dimensional density, we provide a mathematical analysis of the behavior of the first Pareto front. [sent-44, score-0.408]

24 Furthermore, this theoretical prediction is experimentally validated by comparing PDA to several state-of-the-art anomaly detection algorithms in two experiments involving both synthetic and real data sets. [sent-46, score-0.497]

25 In Section 3 we provide an introduction to Pareto fronts and present a theoretical analysis of the properties of the first Pareto front. [sent-49, score-0.265]

26 Section 4 relates Pareto fronts to the multi-criteria anomaly detection problem, which leads to the PDA anomaly detection algorithm. [sent-50, score-1.235]

27 These methods typically formulate machine learning problems as multi-objective optimization problems where finding even the first Pareto front is quite difficult. [sent-53, score-0.178]

28 These methods differ from our use of Pareto optimality because we consider multiple Pareto fronts created from a finite set of items, so we do not need to employ sophisticated methods in order to find these fronts. [sent-54, score-0.313]

29 Hero and Fleury [4] introduced a method for gene ranking using Pareto fronts that is related to our approach. [sent-55, score-0.265]

30 The method ranks genes, in order of interest to a biologist, by creating Pareto fronts of the data samples, i.e.,… [sent-56, score-0.265]

31 In this paper, we consider Pareto fronts of dyads, which correspond to dissimilarities between pairs of data samples rather than the samples themselves, and use the distribution of dyads in Pareto fronts to perform multi-criteria anomaly detection rather than ranking. [sent-59, score-1.534]

32 A similar area is that of multiple kernel learning [8], which is typically applied to supervised learning problems, unlike the unsupervised anomaly detection setting we consider. [sent-66, score-0.506]

33 Finally, many other anomaly detection methods have previously been proposed. [sent-67, score-0.485]

34 …[2] both provide extensive surveys of different anomaly detection methods and applications. [sent-69, score-0.497]

35 Byers and Raftery [9] proposed to use the distance between a sample and its kth-nearest neighbor as the anomaly score for the sample; similarly, Angiulli and Pizzuti [10] and Eskin et al. [sent-71, score-0.42]

36 [12] used an anomaly score based on the local density of the k nearest neighbors of a sample. [sent-74, score-0.461]

37 Hero [13] and Sricharan and Hero [14] introduced non-parametric adaptive anomaly detection methods using geometric entropy minimization, based on random k-point minimal spanning trees and bipartite k-nearest neighbor (k-NN) graphs, respectively. [sent-75, score-0.485]

38 Zhao and Saligrama [15] proposed an anomaly detection algorithm k-LPE using local p-value estimation (LPE) based on a k-NN graph. [sent-76, score-0.485]

39 These k-NN anomaly detection schemes only depend on the data through the pairs of data points (dyads) that define the edges in the k-NN graphs. [sent-77, score-0.511]

40 All of the aforementioned methods are designed for single-criteria anomaly detection. [sent-78, score-0.371]

41 In the multicriteria setting, the single-criteria algorithms must be executed multiple times with different weights, unlike the PDA anomaly detection algorithm that we propose in Section 4. [sent-79, score-0.526]

42 Different choices of (nonnegative) weights in the linear combination could result in different minimizers; a set of items that are minimizers under some linear combination can then be created by using a grid search over the weights, for example. [sent-94, score-0.181]
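As a hedged sketch of this grid-search baseline (the function name, the weight grid, and the two-criteria restriction are our own assumptions), the code below collects the minimizers found by nonnegative linear combinations. Items on non-convex parts of the first Pareto front are never returned by any weight, which is the gap quantified later in Section 3.

```python
import numpy as np

def scalarized_minimizers(points, num_weights=50):
    """Items that minimize some nonnegative linear combination of two
    criteria, collected by a grid search over the weight simplex."""
    points = np.asarray(points, dtype=float)
    minimizers = set()
    for w in np.linspace(0.0, 1.0, num_weights):
        weights = np.array([w, 1.0 - w])        # nonnegative, sums to 1
        minimizers.add(int(np.argmin(points @ weights)))
    return minimizers
```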

43 The second Pareto front can be constructed by finding items that are not strictly dominated by any of the remaining items, which are members of the set $S \setminus F_1$. [sent-101, score-0.251]

44 More generally, define the ith Pareto front by $F_i = \text{Pareto front of the set } S \setminus \bigcup_{j=1}^{i-1} F_j$. [sent-102, score-0.356]

45 For convenience, we say that a Pareto front $F_i$ is deeper than $F_j$ if $i > j$. [sent-103, score-0.194]

46 3.1 Mathematical properties of Pareto fronts: The distribution of the number of points on the first Pareto front was first studied by Barndorff-Nielsen and Sobel in their seminal work [17]. [sent-105, score-0.469]

47 We will be concerned here with properties of the first Pareto front that are relevant to the PDA anomaly detection algorithm and thus have not yet been considered in the literature. [sent-107, score-0.663]

48 For a measurable set $A \subset \mathbb{R}^d$, we denote by $F_A$ the points on the first Pareto front of $Y_1, \dots, Y_n$… [sent-115, score-0.204]

49 Although this is a common motivation for Pareto methods, there are, to the best of our knowledge, no results in the literature regarding how many points on the Pareto front are missed by scalarization. [sent-130, score-0.204]

50 In the context of this paper, if some Pareto-optimal points are not identified, then the anomaly score (defined in section 4. [sent-138, score-0.428]

51 Hence the size of F \ L is a measure of how much the anomaly score is inflated and the degree to which Pareto methods will outperform linear scalarization. [sent-140, score-0.414]

52 Figure 2: Left: Non-convexities in the Pareto front induced by the geometry of the domain Ω (Theorem 1). [sent-189, score-0.198]

53 4 Multi-criteria anomaly detection: Assume that a training set $X_N = \{X_1, \dots, X_N\}$… [sent-205, score-0.503]

54 Given a test sample X, the objective of anomaly detection is to declare X to be an anomaly if X is significantly different from samples in XN . [sent-209, score-0.934]

55 Denote the dissimilarity between $X_i$ and $X_j$ computed using the measure corresponding to the lth criterion by $d_l(i, j)$. [sent-212, score-0.18]

56 For convenience, denote the set of all dyads by D and the space of all dyads, $\mathbb{R}_+^K$, by $\mathcal{D}$. [sent-225, score-0.816]

57 By the definition of strict dominance in Section 3, a dyad $D_{ij}$ strictly dominates another dyad $D_{i^*j^*}$ if $d_l(i, j) \le d_l(i^*, j^*)$ for all $l \in \{1, \dots, K\}$, with strict inequality for at least one $l$. [sent-226, score-0.396]

58 The first Pareto front F1 corresponds to the set of dyads from D that are not strictly dominated by any other dyads from D. [sent-230, score-1.034]

59 The second Pareto front F2 corresponds to the set of dyads from D \ F1 that are not strictly dominated by any other dyads from D \ F1 , and so on, as defined in Section 3. [sent-231, score-1.034]

60 Recall that we refer to Fi as a deeper front than Fj if i > j. [sent-232, score-0.194]

61 4.1 Pareto fronts of dyads: For each sample $X_n$, there are N − 1 dyads corresponding to its connections with the other N − 1 samples. [sent-234, score-1.099]

62 Define the set of N − 1 dyads associated with Xn by Dn . [sent-235, score-0.408]

63 If most dyads in Dn are located at shallow Pareto fronts, then the dissimilarities between Xn and the other N − 1 samples are small under some combination of the criteria. [sent-236, score-0.549]

64 This is the basic idea of the proposed multi-criteria anomaly detection method using PDA. [sent-238, score-0.485]

65 …$F_1, \dots, F_M$ of the dyads from the training set, where the total number of fronts M is the required number of fronts such that each dyad is a member of a front. [sent-242, score-1.089]

66 When a test sample X is obtained, we create new dyads corresponding to connections between X and training samples, as illustrated in Figure 1. [sent-243, score-0.478]

67 Similar to many other anomaly detection methods, we connect each test sample to its k nearest neighbors. [sent-244, score-0.558]

68 We create $s = \sum_{i=1}^{K} k_i$ new dyads, which we denote by the set… (Algorithm 1: PDA anomaly detection algorithm.) [sent-246, score-0.528]

69 Training phase: 1: for l = 1 → K do; 2: calculate pairwise dissimilarities $d_l(i, j)$ between all training samples $X_i$ and $X_j$; 3: create dyads $D_{ij} = [d_1(i, j), \dots, d_K(i, j)]$… [sent-247, score-0.557]
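A runnable sketch of this training phase, assuming the K dissimilarity measures are supplied as Python callables and reusing the illustrative `pareto_peel` helper from the earlier sketch; the name `train_pda` is hypothetical.

```python
import itertools
import numpy as np

def train_pda(X, criteria):
    """Training phase of Algorithm 1 (sketch): one dyad per pair of
    training samples, one coordinate per criterion, peeled into fronts."""
    pairs = list(itertools.combinations(range(len(X)), 2))
    dyads = np.array([[c(X[i], X[j]) for c in criteria] for i, j in pairs])
    depths = pareto_peel(dyads)  # illustrative helper from the earlier sketch
    return dyads, pairs, depths
```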

70 In other words, we create a dyad between X and $X_j$ if $X_j$ is among the $k_i$ nearest neighbors (footnote 1) of X in any criterion i. [sent-254, score-0.25]
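The test-phase dyad construction can be sketched as follows; the function name and the argument `k_list` (the per-criterion neighbor counts $k_i$) are illustrative. Duplicate dyads arising when a training sample is a neighbor under several criteria are kept, matching footnote 1 above.

```python
import numpy as np

def new_dyads_for_test(x, X_train, criteria, k_list):
    """New dyads between a test sample x and the k_i nearest training
    samples under each criterion i; returns an (s, K) array with
    s = sum(k_list), duplicate dyads included."""
    # Dissimilarities from x to every training sample under every criterion.
    D = np.array([[c(x, xt) for c in criteria] for xt in X_train])  # (N, K)
    dyads = []
    for i, k_i in enumerate(k_list):
        nearest = np.argsort(D[:, i])[:k_i]   # k_i nearest in criterion i
        dyads.extend(D[j] for j in nearest)
    return np.array(dyads)
```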

71 4.2 Anomaly detection using depths of dyads: In k-NN based anomaly detection algorithms such as those mentioned in Section 2, the anomaly score is a function of the k nearest neighbors to a test sample. [sent-262, score-1.519]

72 With multiple criteria, one could define an anomaly score by scalarization. [sent-263, score-0.423]

73 From the probabilistic properties of Pareto fronts discussed in Section 3. [sent-264, score-0.265]

74 This motivates us to develop a multi-criteria anomaly score using Pareto fronts. [sent-266, score-0.402]

75 We start with the observation from Figure 1 that dyads corresponding to a nominal test sample are typically located near shallower fronts than dyads corresponding to an anomalous test sample. [sent-267, score-1.371]

76 Each test sample is associated with s new dyads, where the ith dyad $D_i^{\text{new}}$ has depth $e_i$. [sent-268, score-0.254]

77 For each test sample X, we define the anomaly score v(X) to be the mean of the $e_i$'s, which corresponds to the average depth of the s dyads associated with X. [sent-269, score-0.931]

78 Thus the anomaly score can be easily computed and compared to the decision threshold σ using the test $v(X) = \frac{1}{s}\sum_{i=1}^{s} e_i \underset{H_0}{\overset{H_1}{\gtrless}} \sigma$. [sent-270, score-0.457]

79 Pseudocode for the PDA anomaly detector is shown in Algorithm 1. [sent-271, score-0.371]
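Putting the pieces together, here is a hedged sketch of the anomaly score and threshold test. The depth assigned to a new dyad is approximated as the index of the first training front containing no dyad that strictly dominates it; this is a simplification, and the paper's exact insertion rule may differ.

```python
import numpy as np

def dyad_depth(dyad, fronts):
    """Depth of a new dyad: index of the first front none of whose members
    strictly dominates it (simplified insertion rule)."""
    for depth, front in enumerate(fronts, start=1):
        dom = np.all(front <= dyad, axis=1) & np.any(front < dyad, axis=1)
        if not dom.any():
            return depth
    return len(fronts) + 1

def anomaly_score(new_dyads, fronts, sigma=None):
    """v(X): mean depth of the s new dyads; declare H1 (anomaly) if v > sigma."""
    v = float(np.mean([dyad_depth(d, fronts) for d in new_dyads]))
    return v if sigma is None else (v, v > sigma)
```

Here `fronts` is the list of per-depth dyad arrays, recoverable from the training sketch as `[dyads[depths == k] for k in range(1, depths.max() + 1)]`.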

80 Both the training time and the… (Footnote 1: If a training sample is one of the $k_i$ nearest neighbors in multiple criteria, then multiple copies of the dyad corresponding to the connection between the test sample and the training sample are created.) [sent-273, score-0.387]

81 …dyads as well, and it is supported by experimental results presented in Section 2 of the supplementary material. [sent-282, score-0.408]

82 To handle multiple criteria, other anomaly detection methods, such as the ones mentioned in Section 2, need to be re-executed multiple times using different (non-negative) linear combinations of the K criteria. [sent-316, score-0.568]

83 5 Experiments We compare the PDA method with four other nearest neighbor-based single-criterion anomaly detection algorithms mentioned in Section 2. [sent-320, score-0.522]

84 For these methods, we use linear combinations of the criteria with different weights selected by grid search to compare performance with PDA. [sent-321, score-0.184]

85 The anomalous samples are located just outside of this hypercube. [sent-325, score-0.184]

86 Each class differs from the nominal distribution in one of the four dimensions; the distribution in the anomalous dimension is uniform on [1, 1. [sent-327, score-0.197]

87 We draw 300 training samples from the nominal distribution followed by 100 test samples from a mixture of the nominal and anomalous distributions with a 0. [sent-329, score-0.353]

88 If the criteria are combined using linear combinations, the combined dissimilarity measure reduces to weighted squared Euclidean distance. [sent-332, score-0.196]

89 Right: A subset of the dyads for the training samples along with the first 100 Pareto fronts. [sent-368, score-0.454]

90 The fronts are highly non-convex, partially explaining the superior performance of PDA. [sent-369, score-0.265]

91 We use two criteria for computing the dissimilarity between trajectories. [sent-370, score-0.184]

92 For a more detailed comparison, the ROC curve for PDA and the attainable region for k-LPE (the region between the ROC curves corresponding to weights resulting in the best and worst AUCs) are shown in Figure 3 along with the first 100 Pareto fronts for PDA. [sent-383, score-0.327]

93 6 Conclusion In this paper we proposed a new multi-criteria anomaly detection method. [sent-386, score-0.485]

94 The proposed method uses Pareto depth analysis to compute the anomaly score of a test sample by examining the Pareto front depths of dyads corresponding to the test sample. [sent-387, score-1.123]

95 Dyads corresponding to an anomalous sample tended to be located at deeper fronts compared to dyads corresponding to a nominal sample. [sent-388, score-0.927]

96 Instead of choosing a specific weighting or performing a grid search on the weights for different dissimilarity measures, the proposed method can efficiently detect anomalies in a manner that scales linearly in the number of criteria. [sent-389, score-0.237]

97 We also thank Daniel DeWoskin for suggesting a fast algorithm for computing Pareto fronts in two criteria. [sent-393, score-0.265]

98 A geometric framework for unsupervised anomaly detection: Detecting intrusions in unlabeled data. [sent-458, score-0.371]

99 Geometric entropy minimization (GEM) for anomaly detection and localization. [sent-474, score-0.485]

100 Anomaly detection with score functions based on nearest neighbor graphs. [sent-485, score-0.182]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('pareto', 0.578), ('dyads', 0.408), ('anomaly', 0.371), ('pda', 0.337), ('fronts', 0.265), ('front', 0.178), ('anomalous', 0.133), ('dyad', 0.133), ('detection', 0.114), ('dissimilarity', 0.095), ('criteria', 0.089), ('nominal', 0.064), ('dissimilarities', 0.055), ('anomalies', 0.054), ('scalarization', 0.051), ('hero', 0.05), ('dl', 0.048), ('depth', 0.048), ('auc', 0.046), ('lpe', 0.041), ('item', 0.038), ('criterion', 0.037), ('fi', 0.037), ('nearest', 0.037), ('ei', 0.037), ('di', 0.036), ('dij', 0.036), ('pedestrian', 0.036), ('depths', 0.033), ('items', 0.033), ('trajectory', 0.032), ('score', 0.031), ('lof', 0.031), ('combinations', 0.029), ('weights', 0.029), ('nb', 0.028), ('trajectories', 0.028), ('samples', 0.028), ('aucs', 0.027), ('optimality', 0.027), ('ki', 0.027), ('points', 0.026), ('grid', 0.025), ('fl', 0.024), ('located', 0.023), ('roc', 0.022), ('neighbors', 0.022), ('strictly', 0.022), ('detect', 0.022), ('multiple', 0.021), ('yn', 0.021), ('attainable', 0.021), ('angiulli', 0.02), ('baryshnikov', 0.02), ('breunig', 0.02), ('chandola', 0.02), ('fleury', 0.02), ('multicriteria', 0.02), ('nbl', 0.02), ('pizzuti', 0.02), ('sobel', 0.02), ('yukich', 0.02), ('geometry', 0.02), ('minimizers', 0.02), ('rd', 0.02), ('training', 0.018), ('dominated', 0.018), ('shallow', 0.018), ('test', 0.018), ('byers', 0.018), ('hodge', 0.018), ('sricharan', 0.018), ('saligrama', 0.018), ('eskin', 0.018), ('sample', 0.018), ('combination', 0.017), ('views', 0.016), ('xn', 0.016), ('create', 0.016), ('choices', 0.016), ('near', 0.016), ('deeper', 0.016), ('ated', 0.016), ('raftery', 0.015), ('postponed', 0.015), ('yl', 0.015), ('fj', 0.014), ('declare', 0.014), ('median', 0.014), ('walking', 0.014), ('outlier', 0.013), ('linear', 0.012), ('curve', 0.012), ('surveys', 0.012), ('fth', 0.012), ('dominates', 0.012), ('concentrate', 0.012), ('theorem', 0.012), ('scales', 0.012), ('validated', 0.012)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

Author: Ko-jen Hsiao, Kevin Xu, Jeff Calder, Alfred O. Hero

Abstract: We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. In most anomaly detection algorithms, the dissimilarity between data samples is calculated by a single criterion, such as Euclidean distance. However, in many cases there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such a case, multiple criteria can be defined, and one can test for anomalies by scalarizing the multiple criteria using a linear combination of them. If the importance of the different criteria is not known in advance, the algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we introduce a novel non-parametric multi-criteria anomaly detection method using Pareto depth analysis (PDA). PDA uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach scales linearly in the number of criteria and is provably better than linear combinations of the criteria.

2 0.06913162 119 nips-2012-Entropy Estimations Using Correlated Symmetric Stable Random Projections

Author: Ping Li, Cun-hui Zhang

Abstract: Methods for efficiently estimating Shannon entropy of data streams have important applications in learning, data mining, and network anomaly detections (e.g., the DDoS attacks). For nonnegative data streams, the method of Compressed Counting (CC) [11, 13] based on maximally-skewed stable random projections can provide accurate estimates of the Shannon entropy using small storage. However, CC is no longer applicable when entries of data streams can be below zero, which is a common scenario when comparing two streams. In this paper, we propose an algorithm for entropy estimation in general data streams which allow negative entries. In our method, the Shannon entropy is approximated by the finite difference of two correlated frequency moments estimated from correlated samples of symmetric stable random variables. Interestingly, the estimator for the moment we recommend for entropy estimation barely has bounded variance itself, whereas the common geometric mean estimator (which has bounded higher-order moments) is not sufficient for entropy estimation. Our experiments confirm that this method is able to well approximate the Shannon entropy using small storage.

3 0.049840901 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

Author: Du Tran, Junsong Yuan

Abstract: Structured output learning has been successfully applied to object localization, where the mapping between an image and an object bounding box can be well captured. Its extension to action localization in videos, however, is much more challenging, because we need to predict the locations of the action patterns both spatially and temporally, i.e., identifying a sequence of bounding boxes that track the action in video. The problem becomes intractable due to the exponentially large size of the structured video space where actions could occur. We propose a novel structured learning approach for spatio-temporal action localization. The mapping between a video and a spatio-temporal action trajectory is learned. The intractable inference and learning problems are addressed by leveraging an efficient Max-Path search method, thus making it feasible to optimize the model over the whole structured space. Experiments on two challenging benchmark datasets show that our proposed method outperforms the state-of-the-art methods. 1

4 0.04670272 97 nips-2012-Diffusion Decision Making for Adaptive k-Nearest Neighbor Classification

Author: Yung-kyun Noh, Frank Park, Daniel D. Lee

Abstract: This paper sheds light on some fundamental connections of the diffusion decision making model of neuroscience and cognitive psychology with k-nearest neighbor classification. We show that conventional k-nearest neighbor classification can be viewed as a special problem of the diffusion decision model in the asymptotic situation. By applying the optimal strategy associated with the diffusion decision model, an adaptive rule is developed for determining appropriate values of k in k-nearest neighbor classification. Making use of the sequential probability ratio test (SPRT) and Bayesian analysis, we propose five different criteria for adaptively acquiring nearest neighbors. Experiments with both synthetic and real datasets demonstrate the effectiveness of our classification criteria.

5 0.04652613 344 nips-2012-Timely Object Recognition

Author: Sergey Karayev, Tobias Baumgartner, Mario Fritz, Trevor Darrell

Abstract: In a large visual multi-class detection framework, the timeliness of results can be crucial. Our method for timely multi-class detection aims to give the best possible performance at any single point after a start time; it is terminated at a deadline time. Toward this goal, we formulate a dynamic, closed-loop policy that infers the contents of the image in order to decide which detector to deploy next. In contrast to previous work, our method significantly diverges from the predominant greedy strategies, and is able to learn to take actions with deferred values. We evaluate our method with a novel timeliness measure, computed as the area under an Average Precision vs. Time curve. Experiments are conducted on the PASCAL VOC object detection dataset. If execution is stopped when only half the detectors have been run, our method obtains 66% better AP than a random ordering, and 14% better performance than an intelligent baseline. On the timeliness measure, our method obtains at least 11% better performance. Our method is easily extensible, as it treats detectors and classifiers as black boxes and learns from execution traces using reinforcement learning. 1

6 0.038580839 133 nips-2012-Finding Exemplars from Pairwise Dissimilarities via Simultaneous Sparse Recovery

7 0.036192603 303 nips-2012-Searching for objects driven by context

8 0.033736113 40 nips-2012-Analyzing 3D Objects in Cluttered Images

9 0.033701144 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

10 0.033407483 178 nips-2012-Learning Label Trees for Probabilistic Modelling of Implicit Feedback

11 0.032215454 81 nips-2012-Context-Sensitive Decision Forests for Object Detection

12 0.032097388 272 nips-2012-Practical Bayesian Optimization of Machine Learning Algorithms

13 0.030787801 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

14 0.029598033 145 nips-2012-Gradient Weights help Nonparametric Regressors

15 0.029516205 173 nips-2012-Learned Prioritization for Trading Off Accuracy and Speed

16 0.028953113 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

17 0.02823597 101 nips-2012-Discriminatively Trained Sparse Code Gradients for Contour Detection

18 0.02816214 201 nips-2012-Localizing 3D cuboids in single-view images

19 0.027728904 258 nips-2012-Online L1-Dictionary Learning with Application to Novel Document Detection

20 0.027528541 30 nips-2012-Accuracy at the Top


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.077), (1, -0.002), (2, -0.022), (3, -0.009), (4, 0.022), (5, -0.041), (6, 0.007), (7, 0.017), (8, -0.001), (9, 0.013), (10, -0.04), (11, -0.007), (12, 0.02), (13, -0.046), (14, 0.021), (15, 0.038), (16, 0.032), (17, -0.008), (18, -0.003), (19, -0.001), (20, 0.033), (21, -0.007), (22, 0.068), (23, -0.006), (24, -0.027), (25, 0.012), (26, 0.054), (27, -0.004), (28, 0.035), (29, -0.045), (30, 0.011), (31, 0.033), (32, 0.018), (33, 0.014), (34, -0.023), (35, 0.026), (36, 0.038), (37, 0.09), (38, -0.018), (39, 0.011), (40, -0.07), (41, 0.018), (42, -0.044), (43, -0.03), (44, 0.013), (45, -0.029), (46, -0.031), (47, 0.007), (48, 0.005), (49, 0.002)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.85420758 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

Author: Ko-jen Hsiao, Kevin Xu, Jeff Calder, Alfred O. Hero

Abstract: We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. In most anomaly detection algorithms, the dissimilarity between data samples is calculated by a single criterion, such as Euclidean distance. However, in many cases there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such a case, multiple criteria can be defined, and one can test for anomalies by scalarizing the multiple criteria using a linear combination of them. If the importance of the different criteria are not known in advance, the algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we introduce a novel non-parametric multi-criteria anomaly detection method using Pareto depth analysis (PDA). PDA uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach scales linearly in the number of criteria and is provably better than linear combinations of the criteria. 1

2 0.60179412 119 nips-2012-Entropy Estimations Using Correlated Symmetric Stable Random Projections

Author: Ping Li, Cun-hui Zhang

Abstract: Methods for efficiently estimating Shannon entropy of data streams have important applications in learning, data mining, and network anomaly detections (e.g., the DDoS attacks). For nonnegative data streams, the method of Compressed Counting (CC) [11, 13] based on maximally-skewed stable random projections can provide accurate estimates of the Shannon entropy using small storage. However, CC is no longer applicable when entries of data streams can be below zero, which is a common scenario when comparing two streams. In this paper, we propose an algorithm for entropy estimation in general data streams which allow negative entries. In our method, the Shannon entropy is approximated by the finite difference of two correlated frequency moments estimated from correlated samples of symmetric stable random variables. Interestingly, the estimator for the moment we recommend for entropy estimation barely has bounded variance itself, whereas the common geometric mean estimator (which has bounded higher-order moments) is not sufficient for entropy estimation. Our experiments confirm that this method is able to well approximate the Shannon entropy using small storage.

3 0.53980684 189 nips-2012-Learning from the Wisdom of Crowds by Minimax Entropy

Author: Dengyong Zhou, Sumit Basu, Yi Mao, John C. Platt

Abstract: An important way to make large training sets is to gather noisy labels from crowds of nonexperts. We propose a minimax entropy principle to improve the quality of these labels. Our method assumes that labels are generated by a probability distribution over workers, items, and labels. By maximizing the entropy of this distribution, the method naturally infers item confusability and worker expertise. We infer the ground truth by minimizing the entropy of this distribution, which we show minimizes the Kullback-Leibler (KL) divergence between the probability distribution and the unknown truth. We show that a simple coordinate descent scheme can optimize minimax entropy. Empirically, our results are substantially better than previously published methods for the same problem. 1

4 0.4867616 95 nips-2012-Density-Difference Estimation

Author: Masashi Sugiyama, Takafumi Kanamori, Taiji Suzuki, Marthinus D. Plessis, Song Liu, Ichiro Takeuchi

Abstract: We address the problem of estimating the difference between two probability densities. A naive approach is a two-step procedure of first estimating two densities separately and then computing their difference. However, such a two-step procedure does not necessarily work well because the first step is performed without regard to the second step and thus a small estimation error incurred in the first stage can cause a big error in the second stage. In this paper, we propose a single-shot procedure for directly estimating the density difference without separately estimating two densities. We derive a non-parametric finite-sample error bound for the proposed single-shot density-difference estimator and show that it achieves the optimal convergence rate. We then show how the proposed density-difference estimator can be utilized in L2 -distance approximation. Finally, we experimentally demonstrate the usefulness of the proposed method in robust distribution comparison such as class-prior estimation and change-point detection.

5 0.48292762 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

Author: Kumar Sricharan, Alfred O. Hero

Abstract: The problem of estimation of entropy functionals of probability densities has received much attention in the information theory, machine learning and statistics communities. Kernel density plug-in estimators are simple, easy to implement and widely used for estimation of entropy. However, for large feature dimension d, kernel plug-in estimators suffer from the curse of dimensionality: the MSE rate of convergence is glacially slow - of order O(T −γ/d ), where T is the number of samples, and γ > 0 is a rate parameter. In this paper, it is shown that for sufficiently smooth densities, an ensemble of kernel plug-in estimators can be combined via a weighted convex combination, such that the resulting weighted estimator has a superior parametric MSE rate of convergence of order O(T −1 ). Furthermore, it is shown that these optimal weights can be determined by solving a convex optimization problem which does not require training data or knowledge of the underlying density, and therefore can be performed offline. This novel result is remarkable in that, while each of the individual kernel plug-in estimators belonging to the ensemble suffer from the curse of dimensionality, by appropriate ensemble averaging we can achieve parametric convergence rates. 1

6 0.47981003 1 nips-2012-3D Object Detection and Viewpoint Estimation with a Deformable 3D Cuboid Model

7 0.47771674 201 nips-2012-Localizing 3D cuboids in single-view images

8 0.46427915 175 nips-2012-Learning High-Density Regions for a Generalized Kolmogorov-Smirnov Test in High-Dimensional Data

9 0.46180913 40 nips-2012-Analyzing 3D Objects in Cluttered Images

10 0.4592984 57 nips-2012-Bayesian estimation of discrete entropy with mixtures of stick-breaking priors

11 0.45241061 303 nips-2012-Searching for objects driven by context

12 0.44561735 279 nips-2012-Projection Retrieval for Classification

13 0.43077838 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

14 0.4238832 123 nips-2012-Exponential Concentration for Mutual Information Estimation with Application to Forests

15 0.42366666 106 nips-2012-Dynamical And-Or Graph Learning for Object Shape Modeling and Detection

16 0.42157149 30 nips-2012-Accuracy at the Top

17 0.41888157 49 nips-2012-Automatic Feature Induction for Stagewise Collaborative Filtering

18 0.41850927 357 nips-2012-Unsupervised Template Learning for Fine-Grained Object Recognition

19 0.4183822 344 nips-2012-Timely Object Recognition

20 0.41623834 59 nips-2012-Bayesian nonparametric models for bipartite graphs


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(0, 0.036), (1, 0.373), (11, 0.01), (17, 0.012), (21, 0.021), (38, 0.105), (42, 0.033), (49, 0.013), (55, 0.016), (74, 0.066), (76, 0.105), (80, 0.048), (92, 0.034)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.68416953 223 nips-2012-Multi-criteria Anomaly Detection using Pareto Depth Analysis

Author: Ko-jen Hsiao, Kevin Xu, Jeff Calder, Alfred O. Hero

Abstract: We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. In most anomaly detection algorithms, the dissimilarity between data samples is calculated by a single criterion, such as Euclidean distance. However, in many cases there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such a case, multiple criteria can be defined, and one can test for anomalies by scalarizing the multiple criteria using a linear combination of them. If the importance of the different criteria is not known in advance, the algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we introduce a novel non-parametric multi-criteria anomaly detection method using Pareto depth analysis (PDA). PDA uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach scales linearly in the number of criteria and is provably better than linear combinations of the criteria.

2 0.66040879 266 nips-2012-Patient Risk Stratification for Hospital-Associated C. diff as a Time-Series Classification Task

Author: Jenna Wiens, Eric Horvitz, John V. Guttag

Abstract: A patient’s risk for adverse events is affected by temporal processes including the nature and timing of diagnostic and therapeutic activities, and the overall evolution of the patient’s pathophysiology over time. Yet many investigators ignore this temporal aspect when modeling patient outcomes, considering only the patient’s current or aggregate state. In this paper, we represent patient risk as a time series. In doing so, patient risk stratification becomes a time-series classification task. The task differs from most applications of time-series analysis, like speech processing, since the time series itself must first be extracted. Thus, we begin by defining and extracting approximate risk processes, the evolving approximate daily risk of a patient. Once obtained, we use these signals to explore different approaches to time-series classification with the goal of identifying high-risk patterns. We apply the classification to the specific task of identifying patients at risk of testing positive for hospital acquired Clostridium difficile. We achieve an area under the receiver operating characteristic curve of 0.79 on a held-out set of several hundred patients. Our two-stage approach to risk stratification outperforms classifiers that consider only a patient’s current state (p<0.05). 1

3 0.56460297 43 nips-2012-Approximate Message Passing with Consistent Parameter Estimation and Applications to Sparse Learning

Author: Ulugbek Kamilov, Sundeep Rangan, Michael Unser, Alyson K. Fletcher

Abstract: We consider the estimation of an i.i.d. vector x ∈ Rn from measurements y ∈ Rm obtained by a general cascade model consisting of a known linear transform followed by a probabilistic componentwise (possibly nonlinear) measurement channel. We present a method, called adaptive generalized approximate message passing (Adaptive GAMP), that enables joint learning of the statistics of the prior and measurement channel along with estimation of the unknown vector x. Our method can be applied to a large class of learning problems including the learning of sparse priors in compressed sensing or identification of linear-nonlinear cascade models in dynamical systems and neural spiking processes. We prove that for large i.i.d. Gaussian transform matrices the asymptotic componentwise behavior of the adaptive GAMP algorithm is predicted by a simple set of scalar state evolution equations. This analysis shows that the adaptive GAMP method can yield asymptotically consistent parameter estimates, which implies that the algorithm achieves a reconstruction quality equivalent to the oracle algorithm that knows the correct parameter values. The adaptive GAMP methodology thus provides a systematic, general and computationally efficient method applicable to a large range of complex linear-nonlinear models with provable guarantees. 1

4 0.52976227 61 nips-2012-Best Arm Identification: A Unified Approach to Fixed Budget and Fixed Confidence

Author: Victor Gabillon, Mohammad Ghavamzadeh, Alessandro Lazaric

Abstract: We study the problem of identifying the best arm(s) in the stochastic multi-armed bandit setting. This problem has been studied in the literature from two different perspectives: fixed budget and fixed confidence. We propose a unifying approach that leads to a meta-algorithm called unified gap-based exploration (UGapE), with a common structure and similar theoretical analysis for these two settings. We prove a performance bound for the two versions of the algorithm showing that the two problems are characterized by the same notion of complexity. We also show how the UGapE algorithm as well as its theoretical analysis can be extended to take into account the variance of the arms and to multiple bandits. Finally, we evaluate the performance of UGapE and compare it with a number of existing fixed budget and fixed confidence algorithms. 1

5 0.5231207 188 nips-2012-Learning from Distributions via Support Measure Machines

Author: Krikamol Muandet, Kenji Fukumizu, Francesco Dinuzzo, Bernhard Schölkopf

Abstract: This paper presents a kernel-based discriminative learning framework on probability measures. Rather than relying on large collections of vectorial training examples, our framework learns using a collection of probability distributions that have been constructed to meaningfully represent training data. By representing these probability distributions as mean embeddings in the reproducing kernel Hilbert space (RKHS), we are able to apply many standard kernel-based learning techniques in straightforward fashion. To accomplish this, we construct a generalization of the support vector machine (SVM) called a support measure machine (SMM). Our analyses of SMMs provides several insights into their relationship to traditional SVMs. Based on such insights, we propose a flexible SVM (FlexSVM) that places different kernel functions on each training example. Experimental results on both synthetic and real-world data demonstrate the effectiveness of our proposed framework. 1

6 0.50889558 34 nips-2012-Active Learning of Multi-Index Function Models

7 0.4773052 209 nips-2012-Max-Margin Structured Output Regression for Spatio-Temporal Action Localization

8 0.44306201 139 nips-2012-Fused sparsity and robust estimation for linear models with unknown variance

9 0.42640087 273 nips-2012-Predicting Action Content On-Line and in Real Time before Action Onset – an Intracranial Human Study

10 0.42496136 260 nips-2012-Online Sum-Product Computation Over Trees

11 0.42311996 176 nips-2012-Learning Image Descriptors with the Boosting-Trick

12 0.4212718 235 nips-2012-Natural Images, Gaussian Mixtures and Dead Leaves

13 0.42112359 316 nips-2012-Small-Variance Asymptotics for Exponential Family Dirichlet Process Mixture Models

14 0.42098016 69 nips-2012-Clustering Sparse Graphs

15 0.42063543 149 nips-2012-Hierarchical Optimistic Region Selection driven by Curiosity

16 0.42004415 83 nips-2012-Controlled Recognition Bounds for Visual Learning and Exploration

17 0.41973948 112 nips-2012-Efficient Spike-Coding with Multiplicative Adaptation in a Spike Response Model

18 0.41954792 148 nips-2012-Hamming Distance Metric Learning

19 0.41899616 117 nips-2012-Ensemble weighted kernel estimators for multivariate entropy estimation

20 0.41895813 111 nips-2012-Efficient Sampling for Bipartite Matching Problems