nips nips2013 nips2013-261 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Mahito Sugiyama, Karsten Borgwardt
Abstract: Distance-based approaches to outlier detection are popular in data mining, as they do not require modeling the underlying probability distribution, which is particularly challenging for high-dimensional data. We present an empirical comparison of various approaches to distance-based outlier detection across a large number of datasets. We report the surprising observation that a simple, sampling-based scheme outperforms state-of-the-art techniques in terms of both efficiency and effectiveness. To better understand this phenomenon, we provide a theoretical analysis of why the sampling-based approach outperforms alternative methods based on k-nearest neighbor search. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Distance-based approaches to outlier detection are popular in data mining, as they do not require modeling the underlying probability distribution, which is particularly challenging for high-dimensional data. [sent-6, score-0.566]
2 We present an empirical comparison of various approaches to distance-based outlier detection across a large number of datasets. [sent-7, score-0.549]
3 We focus in this paper on the latter, the distance-based approaches, which define outliers as objects located far away from the remaining objects. [sent-13, score-0.284]
4 More specifically, given a metric space (M, d), each object x ∈ M receives a real-valued outlierness score q(x) via a function q : M → R; q(x) depends on the distances between x and the other objects in the dataset. [sent-14, score-0.38]
5 Then the top-κ objects with maximum outlierness scores are reported to be outliers. [sent-15, score-0.254]
6 For example, LOF (Local Outlier Factor) [7], which measures the outlierness of each object by the difference in local density between the object and its neighbors, has become one of the most popular outlier detection methods. [sent-17, score-0.876]
7 The main challenge, however, is its scalability since this approach potentially requires computation of all pairwise distances between objects in a dataset. [sent-18, score-0.14]
8 Here, through an extensive empirical analysis, we show that a surprisingly simple and rapid sampling-based outlier detection method outperforms state-of-the-art distance-based methods in terms of both efficiency and effectiveness. [sent-22, score-0.597]
9 The proposed method behaves as follows: it draws a small set of samples from a given set of objects and then measures the outlierness of each object as the distance from that object to its nearest neighbor in the sample set. [sent-23, score-0.345]
10 Intuitively, the sample set is employed as a telltale set, that is, it serves as an indicator of outlierness, as outliers should be significantly different from almost all objects by definition, including the objects in the sample set. [sent-24, score-0.407]
11 In addition, this method can be implemented in a one-pass manner with constant space complexity as we only have to store the sample set, which is ideal for analyzing massive datasets. [sent-26, score-0.057]
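To make this concrete, the following is a minimal sketch of the one-time sampling scheme in Python/NumPy. The function name q_sp, the defaults s = 20 and kappa, and the use of Euclidean distances via scipy's cdist are illustrative assumptions; this is not the authors' implementation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def q_sp(X, s=20, kappa=10, rng=None):
    """One-time sampling outlier score: draw a single random sample S(X) of size s,
    then score every object by its distance to the nearest member of the sample."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    sample_idx = rng.choice(n, size=min(s, n), replace=False)  # sample only once
    S = X[sample_idx]
    scores = cdist(X, S).min(axis=1)          # q_Sp(x) = min over x' in S(X) of d(x, x')
    # note: objects that happen to be in the sample score 0, as in the definition above
    top_kappa = np.argsort(-scores)[:kappa]   # report the top-kappa outlierness scores
    return scores, top_kappa
```

Only the s sampled objects need to be kept in memory and every other object is touched once, which is where the one-pass, constant-space property described above comes from.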
12 This paper is organized as follows: In Section 2, we describe our experimental design for the empirical comparison of different outlier detection strategies. [sent-27, score-0.549]
13 In Section 3, we review a number of state-of-the-art outlier detection methods which we used in our experiments, including our own proposal. [sent-28, score-0.549]
14 2 Experimental Design We present an extensive empirical analysis of state-of-the-art approaches for distance-based outlier detection and of our new approach, which are introduced in Section 3. [sent-30, score-0.549]
15 They are evaluated in terms of both scalability and effectiveness on synthetic and real-world datasets. [sent-31, score-0.058]
16 To evaluate the effectiveness of each method, we used the area under the precision-recall curve (AUPRC; equivalent to the average precision), which is a typical criterion to measure the success of outlier detection methods [1]. [sent-46, score-0.577]
17 It takes values from 0 to 1, where 1 is the best score, and quantifies whether the algorithm is able to retrieve outliers correctly. [sent-47, score-0.199]
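As a hedged illustration (not the authors' evaluation code), AUPRC can be computed as the average precision of the outlierness ranking, for example with scikit-learn; the toy labels and scores below are made up.

```python
import numpy as np
from sklearn.metrics import average_precision_score

y_true = np.array([0, 0, 0, 1, 0, 1])               # 1 marks a true outlier
scores = np.array([0.1, 0.2, 0.15, 0.9, 0.3, 0.7])  # outlierness scores q(x)
auprc = average_precision_score(y_true, scores)      # area under the precision-recall curve
print(auprc)  # 1.0 here, because both outliers are ranked above every inlier
```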
18 We collected 14 real-world datasets from the UCI machine learning repository [2], with a wide range of sizes and dimensions, whose properties are summarized in Table 1. [sent-50, score-0.081]
19 Most of them have been intensively used in the outlier detection literature. [sent-51, score-0.549]
20 In particular, KDD1999, originally used for the KDD Cup 1999, is one of the most popular benchmark datasets in outlier detection. [sent-52, score-0.481]
21 The task is to detect intrusions from network traffic data, and, as in [22], objects whose attribute "logged in" is positive were chosen as outliers. [sent-53, score-0.138]
22 For all datasets except KDD1999, we assume that objects from the smallest class are outliers, as these datasets were originally designed for classification rather than outlier detection. [sent-55, score-0.549]
23 For each dataset, inliers (non-outliers) were generated from a Gaussian mixture model with five equally weighted processes, resulting in five clusters. [sent-60, score-0.078]
24 The mean and the variance of each cluster were randomly set from the Gaussian distribution N (0, 1), and 30 outliers were generated from a uniform distribution in the range from the minimum to the maximum values of the inliers. [sent-61, score-0.212]
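A sketch of a comparable synthetic generator is given below; the dimensionality, the handling of negative draws for the spread (absolute value), and all default sizes are assumptions made only for illustration.

```python
import numpy as np

def make_synthetic(n_inliers=10000, n_outliers=30, dim=10, n_clusters=5, seed=0):
    rng = np.random.default_rng(seed)
    # five equally weighted Gaussian clusters; means and spreads drawn from N(0, 1)
    means = rng.normal(0.0, 1.0, size=(n_clusters, dim))
    stds = np.abs(rng.normal(0.0, 1.0, size=n_clusters)) + 1e-3  # keep spreads positive
    labels = rng.integers(n_clusters, size=n_inliers)
    inliers = means[labels] + stds[labels, None] * rng.normal(size=(n_inliers, dim))
    # outliers drawn uniformly between the minimum and maximum inlier values
    lo, hi = inliers.min(axis=0), inliers.max(axis=0)
    outliers = rng.uniform(lo, hi, size=(n_outliers, dim))
    X = np.vstack([inliers, outliers])
    y = np.r_[np.zeros(n_inliers, dtype=int), np.ones(n_outliers, dtype=int)]  # 1 = outlier
    return X, y
```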
25 3 Methods for Outlier Detection In the following, we will introduce the state-of-the-art methods in distance-based outlier detection, including our new sampling-based method. [sent-62, score-0.417]
26 Every method is formalized as a scoring function q : M → R on a metric space (M, d), which assigns a real-valued outlierness score to each object x in a given set of objects X . [sent-63, score-0.355]
27 This means that x is a DB(α, δ)-outlier in the sense of Knorr and Ng [11, 12]: at least a fraction α of all objects have a distance from x that is larger than δ. [sent-69, score-0.085]
28 Ramaswamy et al. [18] proposed to measure outlierness by the kth-nearest neighbor (kth-NN) distance. [sent-72, score-0.169]
29 The score qkthNN (x) of an object x is defined as qkthNN (x) := dk (x; X ), where dk (x; X ) is the distance between x and its kth-NN in X . [sent-73, score-0.181]
30 It has a parameter k to specify the kth-NN and an additional parameter κ to retrieve the top-κ objects with the largest outlierness scores. [sent-78, score-0.246]
31 We set k = 5, which is a default setting used in the literature [4, 6, 15, 16], and set κ to be twice the number of outliers for each dataset. [sent-79, score-0.181]
32 Note that in practice we usually do not know the exact number of outliers and have to set κ large enough. [sent-80, score-0.181]
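A minimal sketch of this exhaustive kth-NN score follows; the function name and the use of scikit-learn's NearestNeighbors are assumptions made for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def q_kth_nn(X, k=5, kappa=10):
    """Exhaustive kth-NN distance score d_k(x; X) for every object x, excluding x itself."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1 because each x is its own 0-th neighbor
    dist, _ = nn.kneighbors(X)
    scores = dist[:, k]                              # distance to the kth nearest other object
    top_kappa = np.argsort(-scores)[:kappa]          # top-kappa largest outlierness scores
    return scores, top_kappa
```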
33 3.2 Iterative sampling. Wu and Jermaine [21] proposed a sampling-based approach to efficiently approximate the kth-NN distance score qkthNN . [sent-82, score-0.105]
34 For each object x ∈ X , define qkthSp (x) := dk (x; Sx (X )), where Sx (X ) is a subset of X , which is randomly and iteratively sampled for each object x. [sent-83, score-0.175]
35 In addition, they introduced a random variable N = |O ∩ O′ | with two sets of top-κ outliers O and O′ with respect to qkthNN and qkthSp , and analyzed its expectation E(N ) and the variance Var(N ). [sent-84, score-0.181]
36 We implemented this method in C and set k = 5 and the sample size s = 20 unless stated otherwise. [sent-86, score-0.057]
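Below is a hedged sketch of this per-object sampling scheme (a fresh sample Sx for every object x); the function name, the exclusion of x from its own sample, and the defaults are illustrative assumptions (and it assumes s ≥ k).

```python
import numpy as np
from scipy.spatial.distance import cdist

def q_kth_sp(X, k=5, s=20, seed=0):
    """Approximate kth-NN score with iterative sampling: for each object x, draw a fresh
    random sample S_x of size s and return the kth-NN distance of x within that sample."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    scores = np.empty(n)
    for i in range(n):
        pool = np.delete(np.arange(n), i)              # re-sample for every single object
        idx = rng.choice(pool, size=min(s, n - 1), replace=False)
        d = np.sort(cdist(X[i:i + 1], X[idx])[0])
        scores[i] = d[k - 1]                           # kth smallest distance within the sample
    return scores
```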
37 We randomly and independently sample a subset S(X) ⊂ X only once and define qSp(x) := min_{x′ ∈ S(X)} d(x, x′) for each object x ∈ X . [sent-89, score-0.102]
38 Although this definition is closely related to Wu and Jermaine’s method qkthSp in the case of k = 1, our method performs sampling only once while their method performs sampling for each object. [sent-90, score-0.104]
39 We empirically show that this leads to significant differences in accuracy in outlier detection (see Section 4). [sent-91, score-0.549]
40 The outlierness of an object x is measured by the path length h(x) on the tree, and the score is normalized and averaged on t iTrees. [sent-104, score-0.27]
41 Finally, the outlierness score qtree (x) is defined as qtree(x) := 2^{−h̄(x)/c(s)}, where h̄(x) is the average of h(x) over the t iTrees and c(s) is defined as c(s) := 2H(s − 1) − 2(s − 1)/n, where H denotes the harmonic number. [sent-105, score-0.374]
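The normalization can be illustrated with the short sketch below. The helper names are assumptions, and the constant is written with the standard isolation-forest normalizer c(s) = 2H(s−1) − 2(s−1)/s (an assumption where the extracted text above reads "/n"); a practical alternative is scikit-learn's IsolationForest.

```python
import numpy as np

def harmonic(m):
    """H(m) = 1 + 1/2 + ... + 1/m, with H(0) = 0."""
    return float(np.sum(1.0 / np.arange(1, m + 1))) if m > 0 else 0.0

def itree_score(avg_path_length, s):
    """q_tree(x) = 2 ** (-hbar(x) / c(s)), where hbar(x) is the average path length of x
    over the t iTrees and c(s) normalizes by the expected path length for s points."""
    c = 2.0 * harmonic(s - 1) - 2.0 * (s - 1) / s
    return 2.0 ** (-avg_path_length / c)

print(itree_score(avg_path_length=3.0, s=256))  # shorter average paths give larger (more outlying) scores
```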
42 3.5 Local outlier factor (LOF). While LOF [7] is often characterized as density-based rather than distance-based, we still include it, as it is also based on pairwise distances and is known to be a prominent outlier detection method. [sent-110, score-0.991]
43 The local reachability density of x is defined as ρ(x) := |N_k(x)| ( Σ_{x′ ∈ N_k(x)} max{ d_k(x′; X), d(x, x′) } )^{−1} . [sent-112, score-0.066]
44 Then the local outlier factor (LOF) qLOF (x) is defined as the ratio of the local reachability density of x and the average of the local reachability densities of its k-nearest neighbors, that is, qLOF(x) := ( |N_k(x)|^{−1} Σ_{x′ ∈ N_k(x)} ρ(x′) ) ρ(x)^{−1} . [sent-113, score-0.514]
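As a hedged illustration (standard library code, not the authors' implementation), LOF with k = 5 can be computed with scikit-learn; the toy data below is made up.

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

X = np.random.default_rng(0).normal(size=(200, 3))   # toy data; replace with a real dataset
lof = LocalOutlierFactor(n_neighbors=5)
lof.fit(X)
q_lof = -lof.negative_outlier_factor_                 # LOF value; larger means more outlying
top_kappa = np.argsort(-q_lof)[:10]                   # indices of the top-10 ranked objects
```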
45 3.6 Angle-based outlier factor (ABOF). Kriegel et al. [sent-117, score-0.417]
46 The underlying idea is that, if x is an outlier, the variance of the angles spanned at x by pairs of the remaining objects becomes small. [sent-122, score-0.085]
47 Formally, for an object x ∈ X define qABOF(x) := Var_{y, y′ ∈ X} c(y − x, y′ − x). [sent-123, score-0.074]
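A naive O(n^3) sketch is shown below; it takes c(·, ·) to be the distance-weighted cosine used in the original ABOF definition of Kriegel et al., which is an assumption about the abbreviated c above, and it assumes no duplicate points.

```python
import numpy as np

def q_abof(X):
    """Angle-based outlier factor: for each x, the variance over pairs (y, y') of
    <y - x, y' - x> / (||y - x||^2 * ||y' - x||^2).  Naive O(n^3) version."""
    n = X.shape[0]
    scores = np.empty(n)
    for i in range(n):
        D = np.delete(X, i, axis=0) - X[i]        # vectors y - x for all y != x
        sq = (D ** 2).sum(axis=1)                 # squared norms ||y - x||^2
        W = (D @ D.T) / np.outer(sq, sq)          # distance-weighted cosines
        pairs = np.triu_indices(n - 1, k=1)       # unordered pairs with y != y'
        scores[i] = W[pairs].var()
    return scores  # note: for ABOF, outliers have the *smallest* values, so rank ascending
```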
48 The one-class support vector machine [19] classifies objects into inliers and outliers by introducing a hyperplane between them. [sent-138, score-0.344]
49 This classification can be turned into a ranking of outlierness by considering the signed distance to the separating hyperplane. [sent-139, score-0.143]
50 That is, the further an object is located in the outlier half space, the more likely it is to be a true outlier. [sent-140, score-0.509]
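A hedged sketch of turning a one-class SVM into such a ranking (scikit-learn's OneClassSVM; the kernel, ν, and the toy data are illustrative choices, not the authors' settings):

```python
import numpy as np
from sklearn.svm import OneClassSVM

X = np.random.default_rng(0).normal(size=(300, 5))   # toy data; replace with a real dataset
svm = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X)
q_svm = -svm.decision_function(X)                    # signed distance; larger = deeper in the outlier half-space
top_kappa = np.argsort(-q_svm)[:10]                  # top-10 most outlying objects
```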
51 [Table 1: dataset properties (size, dimensionality m, and number of outliers) for the 14 real-world datasets and the synthetic data; numeric entries omitted.] [sent-149, score-0.181]
52 Figure 1: Average area under the precision-recall curve (AUPRC) over all datasets with respect to changes in the number of samples s, for qSp (one-time sampling; our proposal) and qkthSp (iterative sampling by Wu and Jermaine [21]). [sent-152, score-0.082]
53 4 Experimental Results. 4.1 Sensitivity in sampling size and sampling scheme. We first analyze the parameter sensitivity of our method qSp with respect to changes in the sample size s. [sent-163, score-0.132]
54 These scores with varying sample sizes are plotted in Figure 1. [sent-167, score-0.081]
55 Our method shows robust performance over all sample sizes from 5 to 1000 and the average AUPRC varies by less than 2%. [sent-168, score-0.055]
56 Interestingly, the score is maximized at a rather small sample size (s = 20) and monotonically (slightly) decreases with increasing sample size. [sent-169, score-0.138]
57 Moreover, for every sample size, the one-time sampling qSp significantly outperforms the iterative sampling qkthSp (Wilcoxon signed-rank test, α = 0. [sent-170, score-0.177]
58 4.2 Scalability and effectiveness. Next we evaluate the scalability and effectiveness of the approaches introduced in Section 3 by systematically applying them to every dataset. [sent-174, score-0.086]
59 As we can see, our method qSp is the fastest among all methods; it can score more than five million objects within a few seconds. [sent-176, score-0.138]
60 The different costs of the two processes, sampling once plus a single nearest-neighbor search (qSp) versus re-sampling per object plus a kth-NN search (qkthSp), cause this difference. [sent-178, score-0.078]
61 [Table 2: Running time (in seconds). Averages over 10 trials are shown for the four probabilistic methods qkthSp, qSp, qtree, and qABOF.] [sent-179, score-0.089]
62 [Table 2 body: running times of qkthNN, qkthSp, qSp, qtree, qLOF, qABOF, and qSVM on Ionosphere, Arrhythmia, Wdbc, Mfeat, Isolet, Pima, Gaussian, Optdigits, Spambase, Statlog, Skin, Pamap2, Covtype, Kdd1999, Record, and Gaussian; numeric entries omitted.] [sent-181, score-0.089]
63 [Table of AUPRC values for the same methods and datasets; numeric entries omitted.] [sent-291, score-0.089]
65 The baseline qkthNN shows acceptable runtimes for large data only if the number of outliers is small. [sent-484, score-0.181]
66 It is interesting that qtree , which also uses one-time sampling like our method, shows better performance than exhaustive methods on average. [sent-489, score-0.165]
67 In contrast, qkthSp with iterative sampling is the worst in terms of RMSD among all methods. [sent-490, score-0.077]
68 Based on these observations we can conclude that (1) small sample sizes lead to the maximum average precision for qSp ; (2) one-time sampling leads to better results than iterative sampling; (3) one-time sampling leads to better results than exhaustive methods and is also much faster. [sent-491, score-0.208]
69 5 Theoretical Analysis. To understand why our new one-time sampling method qSp shows better performance than the other methods, we present a theoretical analysis that answers the following four questions: (1) What is the probability that qSp will correctly detect outliers? [sent-492, score-0.075]
70 Here we use the notion of Knorr and Ng’s DB(α, δ)-outliers [11, 12] and denote the set of DB(α, δ)-outliers by X (α; δ), that is, an object x ∈ X (α; δ) if |{ x′ ∈ X | d(x, x′ ) > δ }| ≥ αn holds. [sent-496, score-0.074]
71 We also define X̄(α; δ) := X \ X(α; δ) and, for simplicity, we call an element of X(α; δ) an outlier and an element of X̄(α; δ) an inlier unless otherwise noted. [sent-497, score-0.441]
72 First we introduce a partition of inliers into subsets (clusters) using the threshold δ. [sent-501, score-0.078]
73 Then if we focus on a cluster C ∈ P δ , the probability of discriminating an outlier from inliers contained in C can be bounded from below. [sent-503, score-0.549]
74 Theorem 1 For an outlier x ∈ X(α; δ) and a cluster C ∈ P_δ , we have Pr( ∀x′ ∈ C, qSp(x) > qSp(x′) ) ≥ α^s (1 − β^s) with β = (n − |C|)/n. [sent-505, score-0.448]
75 Moreover, if at least one object is sampled from the cluster C, qSp (x′ ) < δ holds for all x′ ∈ C. [sent-508, score-0.105]
76 For instance, if we assume that 5% of our data are outliers and fix α to be 0. [sent-511, score-0.181]
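To see how this lower bound behaves, the snippet below evaluates α^s (1 − β^s) for a few sample sizes; the particular α and β values are assumptions for illustration only (the concrete numbers in the truncated example above are not reproduced).

```python
def theorem1_bound(alpha, beta, s):
    """Lower bound on Pr(the outlier x outscores every inlier in the cluster C)."""
    return alpha ** s * (1.0 - beta ** s)

# alpha close to 1 (few outliers), beta = fraction of objects outside the cluster C
for s in (1, 2, 5, 10, 20, 50):
    print(s, round(theorem1_bound(alpha=0.95, beta=0.2, s=s), 4))
# the 1 - beta**s factor rises quickly while alpha**s decays slowly,
# so the bound peaks at a small sample size s
```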
77 Next we consider the task of correctly discriminating an outlier from all inliers. [sent-525, score-0.44]
78 This can be achieved if for each cluster C ∈ P δ at least one object x ∈ C is chosen in the sampling process. [sent-526, score-0.157]
79 For every outlier x ∈ X(α; δ) and sample size s ≥ l, we have Pr( ∀x′ ∈ X̄(α; δ), qSp(x) > qSp(x′) ) ≥ α^s Σ f(s1 , . [sent-535, score-0.445]
80 Furthermore, let I(α; δ) be a subset of X̄(α; δ) such that min_{x′ ∈ I(α;δ)} d(x, x′) > δ for every outlier x ∈ X(α; δ), and assume that P_δ is a δ-partition of I(α; δ) instead of all inliers X̄(α; δ). [sent-550, score-0.495]
81 If S(X ) ⊆ I(α; δ) and at least one object is sampled from each cluster C ∈ P δ , qSp (x) > qSp (x′ ) holds for all pairs of an outlier x and an inlier x′ . [sent-551, score-0.546]
82 It is notable that the bound B(γ; δ) is independent of the actual number of outliers and inliers, which is a desirable property when analyzing large datasets. [sent-582, score-0.181]
83 From the differentiation dg/ds, we can see that this function is maximized at s = log_β( log α / (log α + log β) ), with the natural assumption 0 < β < α < 1, and this optimal sample size s is small for large α and small β, for example, s = 6 for (α, β) = (0. [sent-586, score-0.057]
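The optimal sample size can be evaluated directly from this expression; the (α, β) pair below is an assumption for illustration (the concrete pair in the sentence above is truncated in this extraction).

```python
import math

def optimal_s(alpha, beta):
    """s maximizing g(s) = alpha**s * (1 - beta**s):
    s = log_beta( log(alpha) / (log(alpha) + log(beta)) )."""
    return math.log(math.log(alpha) / (math.log(alpha) + math.log(beta)), beta)

print(optimal_s(alpha=0.95, beta=0.5))  # roughly 3.9, i.e. a very small optimal sample size
```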
84 Moreover, as we already saw above the bound B(γ; δ) is also maximized at such small sample sizes for large γ. [sent-591, score-0.084]
85 This could be the reason why qSp works well for small sample sizes, as these are common values for α, β, and γ in outlier detection. [sent-592, score-0.445]
86 Define Z(x, x′ ) := Pr(qkthSp (x) > qkthSp (x′ )) for the iterative sampling method qkthSp . [sent-594, score-0.077]
87 Since we repeat sampling for each object in qkthSp , the probability Z(x, x′) for each x′ ∈ X̄(α; δ) is independent with respect to a fixed x ∈ X(α; δ). [sent-595, score-0.126]
88 Pr( ∀x ∈ X(α; δ), ∀x′ ∈ X̄(α; δ), qkthSp(x) > qkthSp(x′) ) ≤ min_{x ∈ X(α;δ)} Π_{x′ ∈ X̄(α;δ)} Z(x, x′). Although Z(x, x′) is typically close to 1 in outlier detection, the overall probability rapidly decreases if n is large. [sent-597, score-0.417]
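A quick numeric illustration of why such a product of near-one terms collapses for large n; the Z value and the counts are assumed purely for illustration.

```python
# if each of m independent per-inlier comparisons succeeds with probability Z,
# the probability that all m of them succeed is Z**m
Z = 0.999
for m in (10**2, 10**4, 10**6):
    print(m, Z ** m)
# 100      -> about 0.905
# 10000    -> about 4.5e-05
# 1000000  -> essentially 0
```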
89 Finally, let us consider the situation in which there exists a set of “true” outliers O ⊂ X given by an oracle. [sent-602, score-0.181]
90 This difference in detection ability could be a reason why qSp significantly outperforms qkthNN on average. [sent-611, score-0.152]
91 6 Conclusion. In this study, we have performed an extensive set of experiments to compare current distance-based outlier detection methods. [sent-612, score-0.549]
92 Since the approach reached its best performance with small sample sizes, it achieves dramatic speedups compared to exhaustive methods and is faster than all state-of-the-art methods for distance-based outlier detection. [sent-614, score-0.469]
93 We are optimistic that these results will contribute to the further improvement of outlier detection techniques. [sent-617, score-0.549]
94 A comparative study for outlier detection techniques in data mining. [sent-637, score-0.549]
95 Mining distance-based outliers in near linear time with randomization and a simple pruning rule. [sent-642, score-0.181]
96 Appearance-based object recognition using SVMs: Which kernel should I use? [sent-671, score-0.074]
97 A near-linear time approximation algorithm for angle-based outlier detection in high-dimensional data. [sent-728, score-0.549]
98 Efficient algorithms for mining outliers from large data sets. [sent-733, score-0.237]
99 On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms. [sent-759, score-0.549]
100 A survey on unsupervised outlier detection in highdimensional numerical data. [sent-770, score-0.571]
wordName wordTfidf (topN-words)
[('qsp', 0.635), ('outlier', 0.417), ('qkthsp', 0.325), ('qkthnn', 0.251), ('outliers', 0.181), ('outlierness', 0.143), ('detection', 0.132), ('jermaine', 0.089), ('qtree', 0.089), ('objects', 0.085), ('kriegel', 0.079), ('inliers', 0.078), ('lof', 0.078), ('object', 0.074), ('auprc', 0.074), ('knorr', 0.074), ('qabof', 0.074), ('qlof', 0.074), ('mfeat', 0.059), ('mining', 0.056), ('score', 0.053), ('isolet', 0.052), ('optdigits', 0.052), ('sampling', 0.052), ('arrhythmia', 0.044), ('auprcs', 0.044), ('qsvm', 0.044), ('rmsd', 0.044), ('statlog', 0.044), ('zimek', 0.044), ('pr', 0.042), ('sigkdd', 0.04), ('wdbc', 0.039), ('spambase', 0.039), ('pima', 0.039), ('reachability', 0.039), ('sl', 0.036), ('discovery', 0.035), ('covtype', 0.034), ('skin', 0.034), ('wu', 0.034), ('db', 0.033), ('pl', 0.033), ('ionosphere', 0.032), ('cluster', 0.031), ('datasets', 0.03), ('scalability', 0.03), ('bhaduri', 0.03), ('hawkins', 0.03), ('intrusions', 0.03), ('nms', 0.03), ('pagh', 0.03), ('schwabacher', 0.03), ('implemented', 0.029), ('maximized', 0.029), ('sample', 0.028), ('effectiveness', 0.028), ('ng', 0.027), ('sizes', 0.027), ('dk', 0.027), ('ams', 0.026), ('bay', 0.026), ('schubert', 0.026), ('scores', 0.026), ('neighbor', 0.026), ('distances', 0.025), ('acm', 0.025), ('iterative', 0.025), ('repository', 0.024), ('sx', 0.024), ('ramaswamy', 0.024), ('alfried', 0.024), ('krupp', 0.024), ('inlier', 0.024), ('exhaustive', 0.024), ('detect', 0.023), ('bases', 0.023), ('clusters', 0.023), ('discriminating', 0.023), ('karsten', 0.023), ('pham', 0.023), ('si', 0.022), ('highdimensional', 0.022), ('wilcoxon', 0.021), ('isolation', 0.021), ('uci', 0.02), ('sigmod', 0.02), ('anomaly', 0.02), ('outperforms', 0.02), ('knowledge', 0.02), ('ve', 0.019), ('record', 0.019), ('densities', 0.019), ('retrieve', 0.018), ('bingen', 0.018), ('cl', 0.018), ('located', 0.018), ('popular', 0.017), ('originally', 0.017), ('overcome', 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 1.0 261 nips-2013-Rapid Distance-Based Outlier Detection via Sampling
Author: Mahito Sugiyama, Karsten Borgwardt
Abstract: Distance-based approaches to outlier detection are popular in data mining, as they do not require modeling the underlying probability distribution, which is particularly challenging for high-dimensional data. We present an empirical comparison of various approaches to distance-based outlier detection across a large number of datasets. We report the surprising observation that a simple, sampling-based scheme outperforms state-of-the-art techniques in terms of both efficiency and effectiveness. To better understand this phenomenon, we provide a theoretical analysis of why the sampling-based approach outperforms alternative methods based on k-nearest neighbor search. 1
2 0.14473665 158 nips-2013-Learning Multiple Models via Regularized Weighting
Author: Daniel Vainsencher, Shie Mannor, Huan Xu
Abstract: We consider the general problem of Multiple Model Learning (MML) from data, from the statistical and algorithmic perspectives; this problem includes clustering, multiple regression and subspace clustering as special cases. A common approach to solving new MML problems is to generalize Lloyd’s algorithm for clustering (or Expectation-Maximization for soft clustering). However this approach is unfortunately sensitive to outliers and large noise: a single exceptional point may take over one of the models. We propose a different general formulation that seeks for each model a distribution over data points; the weights are regularized to be sufficiently spread out. This enhances robustness by making assumptions on class balance. We further provide generalization bounds and explain how the new iterations may be computed efficiently. We demonstrate the robustness benefits of our approach with some experimental results and prove for the important case of clustering that our approach has a non-trivial breakdown point, i.e., is guaranteed to be robust to a fixed percentage of adversarial unbounded outliers. 1
3 0.11210357 232 nips-2013-Online PCA for Contaminated Data
Author: Jiashi Feng, Huan Xu, Shie Mannor, Shuicheng Yan
Abstract: We consider the online Principal Component Analysis (PCA) where contaminated samples (containing outliers) are revealed sequentially to the Principal Components (PCs) estimator. Due to their sensitiveness to outliers, previous online PCA algorithms fail in this case and their results can be arbitrarily skewed by the outliers. Here we propose the online robust PCA algorithm, which is able to improve the PCs estimation upon an initial one steadily, even when faced with a constant fraction of outliers. We show that the final result of the proposed online RPCA has an acceptable degradation from the optimum. Actually, under mild conditions, online RPCA achieves the maximal robustness with a 50% breakdown point. Moreover, online RPCA is shown to be efficient for both storage and computation, since it need not re-explore the previous samples as in traditional robust PCA algorithms. This endows online RPCA with scalability for large scale data. 1
4 0.091922142 356 nips-2013-Zero-Shot Learning Through Cross-Modal Transfer
Author: Richard Socher, Milind Ganjoo, Christopher D. Manning, Andrew Ng
Abstract: This work introduces a model that can recognize objects in images even if no training data is available for the object class. The only necessary knowledge about unseen visual categories comes from unsupervised text corpora. Unlike previous zero-shot learning models, which can only differentiate between unseen classes, our model can operate on a mixture of seen and unseen classes, simultaneously obtaining state of the art performance on classes with thousands of training images and reasonable performance on unseen classes. This is achieved by seeing the distributions of words in texts as a semantic space for understanding what objects look like. Our deep learning model does not require any manually defined semantic or visual features for either words or images. Images are mapped to be close to semantic word vectors corresponding to their classes, and the resulting image embeddings can be used to distinguish whether an image is of a seen or unseen class. We then use novelty detection methods to differentiate unseen classes from seen classes. We demonstrate two novelty detection strategies; the first gives high accuracy on unseen classes, while the second is conservative in its prediction of novelty and keeps the seen classes’ accuracy high. 1
5 0.063419707 119 nips-2013-Fast Template Evaluation with Vector Quantization
Author: Mohammad Amin Sadeghi, David Forsyth
Abstract: Applying linear templates is an integral part of many object detection systems and accounts for a significant portion of computation time. We describe a method that achieves a substantial end-to-end speedup over the best current methods, without loss of accuracy. Our method is a combination of approximating scores by vector quantizing feature windows and a number of speedup techniques including cascade. Our procedure allows speed and accuracy to be traded off in two ways: by choosing the number of Vector Quantization levels, and by choosing to rescore windows or not. Our method can be directly plugged into any recognition system that relies on linear templates. We demonstrate our method to speed up the original Exemplar SVM detector [1] by an order of magnitude and Deformable Part models [2] by two orders of magnitude with no loss of accuracy. 1
6 0.061225642 84 nips-2013-Deep Neural Networks for Object Detection
7 0.053634193 358 nips-2013-q-OCSVM: A q-Quantile Estimator for High-Dimensional Distributions
8 0.049744625 284 nips-2013-Robust Spatial Filtering with Beta Divergence
9 0.043175716 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking
10 0.037650175 118 nips-2013-Fast Determinantal Point Process Sampling with Application to Clustering
11 0.03752793 355 nips-2013-Which Space Partitioning Tree to Use for Search?
12 0.034774303 207 nips-2013-Near-optimal Anomaly Detection in Graphs using Lovasz Extended Scan Statistic
13 0.033518735 138 nips-2013-Higher Order Priors for Joint Intrinsic Image, Objects, and Attributes Estimation
14 0.032795236 81 nips-2013-DeViSE: A Deep Visual-Semantic Embedding Model
15 0.02943578 176 nips-2013-Linear decision rule as aspiration for simple decision heuristics
16 0.028682649 233 nips-2013-Online Robust PCA via Stochastic Optimization
17 0.028627258 349 nips-2013-Visual Concept Learning: Combining Machine Vision and Bayesian Generalization on Concept Hierarchies
18 0.02818338 169 nips-2013-Learning to Prune in Metric and Non-Metric Spaces
19 0.027327118 243 nips-2013-Parallel Sampling of DP Mixture Models using Sub-Cluster Splits
20 0.026937053 162 nips-2013-Learning Trajectory Preferences for Manipulators via Iterative Improvement
topicId topicWeight
[(0, 0.092), (1, 0.035), (2, -0.019), (3, -0.008), (4, 0.042), (5, 0.01), (6, 0.003), (7, 0.008), (8, -0.057), (9, 0.023), (10, -0.069), (11, 0.047), (12, 0.023), (13, 0.038), (14, 0.016), (15, 0.029), (16, 0.008), (17, -0.049), (18, 0.038), (19, 0.024), (20, 0.014), (21, 0.011), (22, -0.036), (23, 0.004), (24, -0.025), (25, 0.065), (26, -0.003), (27, 0.019), (28, -0.019), (29, -0.025), (30, -0.027), (31, -0.0), (32, 0.058), (33, -0.005), (34, -0.023), (35, -0.048), (36, -0.031), (37, -0.031), (38, -0.044), (39, -0.021), (40, -0.075), (41, -0.007), (42, 0.032), (43, 0.025), (44, 0.055), (45, -0.137), (46, 0.076), (47, -0.009), (48, -0.072), (49, 0.06)]
simIndex simValue paperId paperTitle
same-paper 1 0.87509823 261 nips-2013-Rapid Distance-Based Outlier Detection via Sampling
Author: Mahito Sugiyama, Karsten Borgwardt
Abstract: Distance-based approaches to outlier detection are popular in data mining, as they do not require modeling the underlying probability distribution, which is particularly challenging for high-dimensional data. We present an empirical comparison of various approaches to distance-based outlier detection across a large number of datasets. We report the surprising observation that a simple, sampling-based scheme outperforms state-of-the-art techniques in terms of both efficiency and effectiveness. To better understand this phenomenon, we provide a theoretical analysis of why the sampling-based approach outperforms alternative methods based on k-nearest neighbor search. 1
2 0.58869547 232 nips-2013-Online PCA for Contaminated Data
Author: Jiashi Feng, Huan Xu, Shie Mannor, Shuicheng Yan
Abstract: We consider the online Principal Component Analysis (PCA) where contaminated samples (containing outliers) are revealed sequentially to the Principal Components (PCs) estimator. Due to their sensitiveness to outliers, previous online PCA algorithms fail in this case and their results can be arbitrarily skewed by the outliers. Here we propose the online robust PCA algorithm, which is able to improve the PCs estimation upon an initial one steadily, even when faced with a constant fraction of outliers. We show that the final result of the proposed online RPCA has an acceptable degradation from the optimum. Actually, under mild conditions, online RPCA achieves the maximal robustness with a 50% breakdown point. Moreover, online RPCA is shown to be efficient for both storage and computation, since it need not re-explore the previous samples as in traditional robust PCA algorithms. This endows online RPCA with scalability for large scale data. 1
3 0.58645886 158 nips-2013-Learning Multiple Models via Regularized Weighting
Author: Daniel Vainsencher, Shie Mannor, Huan Xu
Abstract: We consider the general problem of Multiple Model Learning (MML) from data, from the statistical and algorithmic perspectives; this problem includes clustering, multiple regression and subspace clustering as special cases. A common approach to solving new MML problems is to generalize Lloyd’s algorithm for clustering (or Expectation-Maximization for soft clustering). However this approach is unfortunately sensitive to outliers and large noise: a single exceptional point may take over one of the models. We propose a different general formulation that seeks for each model a distribution over data points; the weights are regularized to be sufficiently spread out. This enhances robustness by making assumptions on class balance. We further provide generalization bounds and explain how the new iterations may be computed efficiently. We demonstrate the robustness benefits of our approach with some experimental results and prove for the important case of clustering that our approach has a non-trivial breakdown point, i.e., is guaranteed to be robust to a fixed percentage of adversarial unbounded outliers. 1
4 0.52053183 119 nips-2013-Fast Template Evaluation with Vector Quantization
Author: Mohammad Amin Sadeghi, David Forsyth
Abstract: Applying linear templates is an integral part of many object detection systems and accounts for a significant portion of computation time. We describe a method that achieves a substantial end-to-end speedup over the best current methods, without loss of accuracy. Our method is a combination of approximating scores by vector quantizing feature windows and a number of speedup techniques including cascade. Our procedure allows speed and accuracy to be traded off in two ways: by choosing the number of Vector Quantization levels, and by choosing to rescore windows or not. Our method can be directly plugged into any recognition system that relies on linear templates. We demonstrate our method to speed up the original Exemplar SVM detector [1] by an order of magnitude and Deformable Part models [2] by two orders of magnitude with no loss of accuracy. 1
5 0.48848504 357 nips-2013-k-Prototype Learning for 3D Rigid Structures
Author: Hu Ding, Ronald Berezney, Jinhui Xu
Abstract: In this paper, we study the following new variant of prototype learning, called k-prototype learning problem for 3D rigid structures: Given a set of 3D rigid structures, find a set of k rigid structures so that each of them is a prototype for a cluster of the given rigid structures and the total cost (or dissimilarity) is minimized. Prototype learning is a core problem in machine learning and has a wide range of applications in many areas. Existing results on this problem have mainly focused on the graph domain. In this paper, we present the first algorithm for learning multiple prototypes from 3D rigid structures. Our result is based on a number of new insights to rigid structures alignment, clustering, and prototype reconstruction, and is practically efficient with quality guarantee. We validate our approach using two type of data sets, random data and biological data of chromosome territories. Experiments suggest that our approach can effectively learn prototypes in both types of data. 1
6 0.48776972 138 nips-2013-Higher Order Priors for Joint Intrinsic Image, Objects, and Attributes Estimation
7 0.48713261 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning
8 0.48039591 332 nips-2013-Tracking Time-varying Graphical Structure
9 0.4626815 276 nips-2013-Reshaping Visual Datasets for Domain Adaptation
10 0.45696947 190 nips-2013-Mid-level Visual Element Discovery as Discriminative Mode Seeking
11 0.45695069 358 nips-2013-q-OCSVM: A q-Quantile Estimator for High-Dimensional Distributions
12 0.44250944 233 nips-2013-Online Robust PCA via Stochastic Optimization
13 0.44063193 163 nips-2013-Learning a Deep Compact Image Representation for Visual Tracking
14 0.41789871 337 nips-2013-Transportability from Multiple Environments with Limited Experiments
15 0.41712821 84 nips-2013-Deep Neural Networks for Object Detection
16 0.41478336 343 nips-2013-Unsupervised Structure Learning of Stochastic And-Or Grammars
17 0.41473228 166 nips-2013-Learning invariant representations and applications to face verification
18 0.40954465 275 nips-2013-Reservoir Boosting : Between Online and Offline Ensemble Learning
19 0.40779823 335 nips-2013-Transfer Learning in a Transductive Setting
20 0.40513483 326 nips-2013-The Power of Asymmetry in Binary Hashing
topicId topicWeight
[(16, 0.056), (33, 0.091), (34, 0.078), (36, 0.013), (41, 0.018), (49, 0.024), (56, 0.079), (70, 0.047), (76, 0.367), (85, 0.033), (89, 0.03), (93, 0.036), (95, 0.02), (99, 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.64313942 261 nips-2013-Rapid Distance-Based Outlier Detection via Sampling
Author: Mahito Sugiyama, Karsten Borgwardt
Abstract: Distance-based approaches to outlier detection are popular in data mining, as they do not require modeling the underlying probability distribution, which is particularly challenging for high-dimensional data. We present an empirical comparison of various approaches to distance-based outlier detection across a large number of datasets. We report the surprising observation that a simple, sampling-based scheme outperforms state-of-the-art techniques in terms of both efficiency and effectiveness. To better understand this phenomenon, we provide a theoretical analysis of why the sampling-based approach outperforms alternative methods based on k-nearest neighbor search. 1
2 0.60869396 359 nips-2013-Σ-Optimality for Active Learning on Gaussian Random Fields
Author: Yifei Ma, Roman Garnett, Jeff Schneider
Abstract: A common classifier for unlabeled nodes on undirected graphs uses label propagation from the labeled nodes, equivalent to the harmonic predictor on Gaussian random fields (GRFs). For active learning on GRFs, the commonly used V-optimality criterion queries nodes that reduce the L2 (regression) loss. V-optimality satisfies a submodularity property showing that greedy reduction produces a (1 − 1/e) globally optimal solution. However, L2 loss may not characterise the true nature of 0/1 loss in classification problems and thus may not be the best choice for active learning. We consider a new criterion we call Σ-optimality, which queries the node that minimizes the sum of the elements in the predictive covariance. Σ-optimality directly optimizes the risk of the surveying problem, which is to determine the proportion of nodes belonging to one class. In this paper we extend submodularity guarantees from V-optimality to Σ-optimality using properties specific to GRFs. We further show that GRFs satisfy the suppressor-free condition in addition to the conditional independence inherited from Markov random fields. We test Σoptimality on real-world graphs with both synthetic and real data and show that it outperforms V-optimality and other related methods on classification. 1
3 0.57033682 14 nips-2013-A Stability-based Validation Procedure for Differentially Private Machine Learning
Author: Kamalika Chaudhuri, Staal A. Vinterbo
Abstract: Differential privacy is a cryptographically motivated definition of privacy which has gained considerable attention in the algorithms, machine-learning and datamining communities. While there has been an explosion of work on differentially private machine learning algorithms, a major barrier to achieving end-to-end differential privacy in practical machine learning applications is the lack of an effective procedure for differentially private parameter tuning, or, determining the parameter value, such as a bin size in a histogram, or a regularization parameter, that is suitable for a particular application. In this paper, we introduce a generic validation procedure for differentially private machine learning algorithms that apply when a certain stability condition holds on the training algorithm and the validation performance metric. The training data size and the privacy budget used for training in our procedure is independent of the number of parameter values searched over. We apply our generic procedure to two fundamental tasks in statistics and machine-learning – training a regularized linear classifier and building a histogram density estimator that result in end-toend differentially private solutions for these problems. 1
4 0.56633765 115 nips-2013-Factorized Asymptotic Bayesian Inference for Latent Feature Models
Author: Kohei Hayashi, Ryohei Fujimaki
Abstract: This paper extends factorized asymptotic Bayesian (FAB) inference for latent feature models (LFMs). FAB inference has not been applicable to models, including LFMs, without a specific condition on the Hessian matrix of a complete loglikelihood, which is required to derive a “factorized information criterion” (FIC). Our asymptotic analysis of the Hessian matrix of LFMs shows that FIC of LFMs has the same form as those of mixture models. FAB/LFMs have several desirable properties (e.g., automatic hidden states selection and parameter identifiability) and empirically perform better than state-of-the-art Indian Buffet processes in terms of model selection, prediction, and computational efficiency. 1
5 0.55065614 250 nips-2013-Policy Shaping: Integrating Human Feedback with Reinforcement Learning
Author: Shane Griffith, Kaushik Subramanian, Jonathan Scholz, Charles Isbell, Andrea L. Thomaz
Abstract: A long term goal of Interactive Reinforcement Learning is to incorporate nonexpert human feedback to solve complex tasks. Some state-of-the-art methods have approached this problem by mapping human information to rewards and values and iterating over them to compute better control policies. In this paper we argue for an alternate, more effective characterization of human feedback: Policy Shaping. We introduce Advise, a Bayesian approach that attempts to maximize the information gained from human feedback by utilizing it as direct policy labels. We compare Advise to state-of-the-art approaches and show that it can outperform them and is robust to infrequent and inconsistent human feedback.
6 0.45537367 149 nips-2013-Latent Structured Active Learning
7 0.44051033 102 nips-2013-Efficient Algorithm for Privately Releasing Smooth Queries
8 0.40672606 177 nips-2013-Local Privacy and Minimax Bounds: Sharp Rates for Probability Estimation
9 0.40650952 100 nips-2013-Dynamic Clustering via Asymptotics of the Dependent Dirichlet Process Mixture
10 0.40638781 2 nips-2013-(Nearly) Optimal Algorithms for Private Online Learning in Full-information and Bandit Settings
11 0.40449783 121 nips-2013-Firing rate predictions in optimal balanced networks
12 0.40419498 238 nips-2013-Optimistic Concurrency Control for Distributed Unsupervised Learning
13 0.40390098 45 nips-2013-BIG & QUIC: Sparse Inverse Covariance Estimation for a Million Variables
14 0.40332618 350 nips-2013-Wavelets on Graphs via Deep Learning
15 0.40330774 204 nips-2013-Multiscale Dictionary Learning for Estimating Conditional Distributions
16 0.40323851 56 nips-2013-Better Approximation and Faster Algorithm Using the Proximal Average
17 0.40237617 249 nips-2013-Polar Operators for Structured Sparse Estimation
18 0.40201426 97 nips-2013-Distributed Submodular Maximization: Identifying Representative Elements in Massive Data
19 0.40191412 141 nips-2013-Inferring neural population dynamics from multiple partial recordings of the same neural circuit
20 0.40189099 104 nips-2013-Efficient Online Inference for Bayesian Nonparametric Relational Models