nips nips2009 nips2009-257 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Marco Cuturi, Jean-Philippe Vert, Alexandre d'Aspremont
Abstract: We propose new methodologies to detect anomalies in discrete-time processes taking values in a probability space. These methods are based on the inference of functionals whose evaluations on successive states visited by the process are stationary and have low autocorrelations. Deviations from this behavior are used to flag anomalies. The candidate functionals are estimated in a subspace of a reproducing kernel Hilbert space associated with the original probability space considered. We provide experimental results on simulated datasets which show that these techniques compare favorably with other algorithms.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We propose new methodologies to detect anomalies in discrete-time processes taking values in a probability space. [sent-5, score-0.305]
2 These methods are based on the inference of functionals whose evaluations on successive states visited by the process are stationary and have low autocorrelations. [sent-6, score-0.54]
3 The candidate functionals are estimated in a subspace of a reproducing kernel Hilbert space associated with the original probability space considered. [sent-8, score-0.464]
4 For independent and identically distributed (i.i.d.) observations, anomaly detection is usually referred to as outlier detection and is in many ways equivalent to density estimation. [sent-14, score-0.326]
5 Several density estimators have been used in this context and we refer the reader to the exhaustive review in [1]. [sent-15, score-0.064]
6 Among such techniques, methods which estimate non-parametric alarm functions in reproducing kernel Hilbert spaces (rkHs) are particularly relevant to our work. [sent-16, score-0.222]
7 They form alarm functions of the type f(·) = Σ_{i∈I} c_i k(x_i, ·), where k is a positive definite kernel and (c_i)_{i∈I} is a family of coefficients paired with a family (x_i)_{i∈I} of previously observed data points. [sent-17, score-0.192]
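As a hedged illustration (not code from the paper), the sketch below evaluates an alarm function of this generic RKHS form; the Gaussian kernel, support points and coefficients are placeholders, since in practice the coefficients would come from the kernel methods discussed next.

```python
import numpy as np

def gaussian_kernel(x, y, gamma=1.0):
    """Gaussian RBF kernel k(x, y) = exp(-gamma * ||x - y||^2), an illustrative p.d. kernel."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

def alarm_function(z, support_points, coeffs, gamma=1.0):
    """Evaluate f(z) = sum_i c_i k(x_i, z), the generic RKHS alarm-function form."""
    return sum(c * gaussian_kernel(x, z, gamma) for c, x in zip(coeffs, support_points))

# Toy usage: random support points x_i and coefficients c_i.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
c = rng.normal(size=50)
print(alarm_function(rng.normal(size=4), X, c))
```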
8 Two well known kernel methods have been used so far for this purpose, namely kernel principal component analysis (kPCA) [2] and one-class support vector machines (ocSVM) [3]. [sent-19, score-0.166]
9 The ocSVM is a popular density estimation tool and it is thus not surprising that it has already found successful applications to detect anomalies in i.i.d. data. [sent-20, score-0.274]
10 kPCA can also be used to detect outliers as described in [5], where an outlier is defined as any point far enough from the boundaries of an ellipsoid in the rkHs containing most of the observed points. [sent-23, score-0.113]
11 These outlier detection methods can also be applied to dynamical systems. [sent-24, score-0.151]
12 We now monitor discrete time stochastic processes Z = (Zt )t∈ N taking values in a space Z and, based on previous observations zt−1 , · · · , z0 , we seek to detect whether a new observation zt abnormally deviates from the usual dynamics of the system. [sent-25, score-0.615]
13 In practice, anomaly detection then involves a two-step procedure. [sent-29, score-0.225]
14 It first produces an estimator Ẑt of the conditional expectation of Zt given Zt−1 to extract an empirical estimator for the residues ε̂t = Zt − Ẑt. [sent-30, score-0.052]
15 Under an i.i.d. assumption, abnormal residues can then be used to flag anomalies. [sent-33, score-0.084]
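The following sketch illustrates this two-step residual procedure for multivariate data, under the assumption of a linear VAR(1) predictor fitted by least squares and a simple n-sigma rule on residual norms; neither modelling choice is prescribed by the paper.

```python
import numpy as np

def var1_residuals(Z):
    """Fit a linear one-step predictor Z_t ~ A Z_{t-1} by least squares and
    return the empirical residues eps_t = Z_t - Zhat_t."""
    X, Y = Z[:-1], Z[1:]
    A, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return Y - X @ A

def flag_anomalies(residuals, n_sigma=3.0):
    """Flag time steps whose residual norm deviates from the bulk by n_sigma
    standard deviations (a simple stand-in for a proper test on the residues)."""
    norms = np.linalg.norm(residuals, axis=1)
    return norms > norms.mean() + n_sigma * norms.std()

rng = np.random.default_rng(1)
Z = np.cumsum(rng.normal(size=(200, 4)), axis=0)   # toy multivariate series
print(flag_anomalies(var1_residuals(Z)).sum(), "points flagged")
```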
16 This approach and advanced extensions can be used both for multivariate data [6, 7] and linear processes in functional spaces [8] using spaces of Hölderian functions. [sent-34, score-0.178]
17 The main contribution of our paper is to propose an estimation approach of alarm functionals that can be used on arbitrary Hilbert spaces and which bypasses the estimation of residues ε̂t ∈ Z by focusing directly on suitable properties for alarm functionals. [sent-35, score-0.666]
18 Detecting anomalies in a sequence generated by white noise is a task which is arguably easier than detecting anomalies in arbitrary time-series. [sent-37, score-0.55]
19 In this sense, we look for functionals α such that α(Zt ) exhibits a stationary behavior with low autocorrelations, ideally white noise, which can be used in turn to flag an anomaly whenever α(Zt ) departs from normality. [sent-38, score-0.729]
20 We call functionals α that strike a good balance between exhibiting a low autocovariance of order 1 and a high variance on successive values Zt white functionals of the process Z. [sent-39, score-0.773]
21 Our definition can be naturally generalized to higher autocovariance orders, as will become clear in the remainder of the paper. [sent-40, score-0.079]
22 For a multivariate stochastic process X = (Xt )t∈Z taking values in Rd , X is said to be cointegrated if there exists a vector a of Rd such that (aT Xt )t∈Z is stationary. [sent-42, score-0.066]
23 In this work we discard the immediate interpretability of the weights associated with linear functionals aT Xt to focus instead on functionals α in a rkHs H such that α(Zt ) is stationary, and use this property to detect anomalies. [sent-44, score-0.811]
24 In Section 2, we study different criteria to measure the autocorrelation of a process, directly inspired by min/max autocorrelation factors [10] and the seminal work of Box and Tiao [11] on cointegration. [sent-46, score-0.476]
25 We study the asymptotic properties of finite-sample estimators of these criteria in Section 3 and discuss the practical estimation of white functionals in Section 4. [sent-47, score-0.676]
26 2 Criteria to define white functionals. Consider a process Z = (Zt)t∈Z taking values in a probability space Z. [sent-49, score-0.514]
27 Z will be mainly considered in this work through its mapping onto a rkHs H associated with a bounded and continuous kernel k on Z × Z. [sent-50, score-0.059]
28 For two elements α and β of H we write α ⊗ β for their tensor product, namely the linear map of H onto itself such that α ⊗ β : x ↦ ⟨α, x⟩_H β. [sent-53, score-0.099]
29 Using the notations of [14] we write C = Ep [φt ⊗ φt ], D = Ep [φt ⊗ φt+1 ], respectively for the covariance and autocovariance of order 1 of φt . [sent-54, score-0.16]
30 The following definitions introduce two criteria which quantify how related two successive evaluations of α(Zt) are. [sent-57, score-0.177]
31 Given an element α of H such that ⟨α, Cα⟩_H > 0, γ(α) is the absolute autocorrelation of α(Z) of order 1, γ(α) = |corr(α(Zt), α(Zt+1))| = |⟨α, Dα⟩_H| / ⟨α, Cα⟩_H. [sent-59, score-0.215]
32 If we assume further that φ is an autoregressive Hilbertian process of order 1 [14], ARH(1) for short, there exists a compact operator ρ : H → H and an H-valued strong white noise (εt)t∈Z such that φt+1 = ρφt + εt. [sent-62, score-0.317]
33 In their seminal work, Box and Tiao [11] quantify the predictability of the linear functionals of a vector autoregressive process in terms of variance ratios. [sent-63, score-0.631]
34 The following definition is a direct adaptation of that principle to autoregressive processes in Hilbert spaces. [sent-64, score-0.115]
35 We then have C = ρCρ∗ + Cε, where for any linear operator A of H, A∗ is its adjoint. [sent-66, score-0.098]
36 Given an element α of H such that ⟨α, Cα⟩_H > 0, the predictability λ(α) is the quotient λ(α) = var(⟨α, ρφt⟩_H) / var(⟨α, φt⟩_H) = ⟨α, ρCρ∗α⟩_H / ⟨α, Cα⟩_H = ⟨α, DC^{−1}D∗α⟩_H / ⟨α, Cα⟩_H. [sent-68, score-0.179]
37 For any linear operator A of H and any non-zero element x of H, write R(A, x) for the Rayleigh quotient R(A, x) = ⟨x, Ax⟩_H / ⟨x, x⟩_H. [sent-73, score-0.193]
38 x, x H R(A, x) = We use the notations in [12] and introduce the normalized cross-covariance (or rather auto1 1 covariance in the context of this paper) operator V = C − 2 DC − 2 . [sent-74, score-0.128]
39 Note that for any skew-symmetric operator A, that is A = −A∗, we have ⟨x, Ax⟩_H = ⟨A∗x, x⟩_H = −⟨Ax, x⟩_H = 0, and thus R(A, x) = R((A + A∗)/2, x). [sent-75, score-0.098]
40 Minimizing γ is a more challenging problem since the operator V + V ∗ is not necessarily positive definite. [sent-79, score-0.098]
41 The formulation of γ and λ as Rayleigh quotients is also useful to obtain the asymptotic convergence of their empirical counterparts (Section 3) and to draw comparisons with kernel-CCA (Section 5). [sent-83, score-0.059]
42 3 Asymptotics and matrix expressions for empirical estimators of γ and λ. [sent-84, score-0.064]
43 (Footnotes: i.i.d. H-random variables; ARH(1) processes and, more generally, stationary linear processes in Hilbert spaces [14, Section 8].) [sent-89, score-0.153]
44 Assume that V is a compact operator, lim_{n→∞} ǫn = 0 and lim_{n→∞} …; then, writing ‖·‖_S for the Hilbert–Schmidt operator norm, lim_{n→∞} ‖Vn − V‖_S = 0. [sent-93, score-0.208]
45 The mapped observations φ1, . . . , φn which define the empirical estimators above also span a subspace Hn in H which can be used to estimate white functionals. [sent-112, score-0.207]
46 We write K for the original (n + 1) × (n + 1) Gram matrix [k(zi, zj)]i,j and K̄ for its centered counterpart K̄ = (In − (1/n) 1n,n) K (In − (1/n) 1n,n) = [⟨φ̄i, φ̄j⟩_H]i,j. [sent-114, score-0.096]
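A minimal sketch of this centering step, assuming the standard centering matrix I − (1/n) 1 1^T applied over all rows:

```python
import numpy as np

def centered_gram(Z, kernel):
    """Return the Gram matrix K = [k(z_i, z_j)] and its centered counterpart
    Kbar = (I - 1/n 11^T) K (I - 1/n 11^T)."""
    n = len(Z)
    K = np.array([[kernel(zi, zj) for zj in Z] for zi in Z])
    H = np.eye(n) - np.ones((n, n)) / n
    return K, H @ K @ H
```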
47 4 Selecting white functionals in practice. Both γ(α) and λ(α) are proxies to quantify the independence of successive observations α(Zt). [sent-125, score-0.606]
48 Namely, functions with low γ and λ are likely to have low autocorrelations and be stationary when evaluated on the process Z, and the same can be said of functions with low γn and λn asymptotically. [sent-126, score-0.226]
49 However, when H is of high or infinite dimension, the direct minimization of γn and λn is likely to result in degenerate functions which may have extremely low autocovariance on Z but very low variance as well. [sent-127, score-0.18]
50 We select white functionals with this trade-off in mind, such that ⟨α, Cα⟩_H is not negligible and γ or λ is low at the same time. [sent-128, score-0.523]
51 Since the rank of the operator Vn is actually n − 1, we are even guaranteed to find in Hn a minimizer for γn and another for λn with respectively zero predictability and zero absolute autocorrelation. [sent-129, score-0.203]
52 Namely, suppose Cn can be decomposed as Cn = Σ_{i=1}^{n} gi ei ⊗ ei, where (ei) is an orthonormal basis of eigenvectors with eigenvalues in decreasing order g1 ≥ g2 ≥ · · · ≥ gn ≥ 0. [sent-132, score-0.177]
53 Equations (3) and (4) express γn(β) and λn(β) as matrix Rayleigh quotients in a coefficient vector b, with denominator bT(G + nǫn I)b. We define two different functions of Hp, βmac and βBT, as the functionals in Hp whose coefficients correspond to the eigenvector with minimal (absolute) eigenvalue of the two Rayleigh quotients of Equations (3) and (4) respectively. [sent-149, score-0.496]
54 We call these functionals the minimum autocorrelation (MAC) and Box-Tiao (BT) functionals of Z. [sent-150, score-0.947]
55 • Input: n + 1 observations z0, · · · , zn ∈ Z of a time-series Z, a p.d. [sent-152, score-0.069]
56 kernel k on Z × Z and a parameter p (we propose an experimental methodology to set p in Section 6.3). [sent-154, score-0.059]
57 • Output: a real-valued function f(·) = Σ_{i=0}^{n} ci k(zi, ·) that is a white functional of Z. [sent-155, score-0.212]
58 • Algorithm: – Compute the (n + 1) × (n + 1) kernel matrix K, center it and drop the first row and column to obtain K. [sent-156, score-0.059]
59 – Compute Ep = U diag(v1, · · · , vp)^{−1/2} and G = (1/n) diag(v1, · · · , vp). [sent-158, score-0.08]
60 – Compute the matrix numerator N and denominator D of either Equation (3) or Equation (4) and recover the eigenvector b with minimal absolute eigenvalue of the generalized eigenvalue problem (N, D). – Set a = Ep b ∈ Rn. [sent-159, score-0.157]
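Because the matrix expressions of Equations (3) and (4) are not reproduced in this extract, the sketch below reconstructs the procedure directly from the definitions of γ and λ: project the centered feature maps onto the p leading eigen-directions of K̄, form empirical covariance and lag-1 autocovariance matrices in that basis, and keep the generalized eigenvector with minimal absolute eigenvalue. The regularization placement and normalizations are assumptions, not the paper's verbatim formulas.

```python
import numpy as np
from scipy.linalg import eigh

def white_functional(K, p=10, eps=1e-3, criterion="BT"):
    """Sketch of MAC / Box-Tiao white-functional estimation from a kernel matrix K
    over z_0, ..., z_n. Returns coefficients a of f(.) = sum_i a_i kbar(z_i, .)."""
    m = K.shape[0]
    H = np.eye(m) - np.ones((m, m)) / m
    Kbar = H @ K @ H                                   # centered Gram matrix

    vals, vecs = np.linalg.eigh(Kbar)
    top = np.argsort(vals)[::-1][:p]                   # p leading eigenpairs
    v, U = vals[top], vecs[:, top]                     # assumes p <= effective rank (all v > 0)
    Ep = U / np.sqrt(v)                                # coordinates of unit-norm eigenfunctions

    B = Kbar @ Ep                                      # <phibar_t, e_j>_H for each observation t
    X0, X1 = B[:-1], B[1:]
    n = X0.shape[0]
    G = X0.T @ X0 / n                                  # empirical covariance in the basis
    D = X0.T @ X1 / n                                  # empirical lag-1 autocovariance

    if criterion == "MAC":
        N = (D + D.T) / 2.0                            # symmetrized autocovariance (gamma numerator)
    else:                                              # Box-Tiao numerator D C^{-1} D^*
        N = D @ np.linalg.solve(G + n * eps * np.eye(p), D.T)   # regularized inverse (assumption)
    denom = G + n * eps * np.eye(p)

    w, V = eigh(N, denom)                              # generalized eigenproblem (N, denom)
    b = V[:, np.argmin(np.abs(w))]                     # eigenvector with minimal absolute eigenvalue
    return Ep @ b                                      # coefficients over the observations
```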
61 A natural baseline is to apply PCA directly to the observations z0, . . . , zn and consider its eigenvector with smallest eigenvalue to detect cointegrated relationships in the process Zt. [sent-164, score-0.233]
62 Although the criterion considered by PCA, namely variance, disregards the temporal structure of the observations and only focuses on the values spanned by the process, this technique is useful to get rid of all non-stationary components of Zt . [sent-166, score-0.088]
63 On the other hand, kernel-PCA [2], a non-parametric extension of PCA, can be naturally applied for anomaly detection in an i.i.d. setting. [sent-167, score-0.225]
64 It is thus natural to use kernel-PCA, namely an eigenfunction with low variance, and hope that it will have low autocorrelation to define white functionals of a process. [sent-171, score-0.84]
65 Our experiments show that this is indeed the case and, in agreement with [5], seem to [sent-172, score-0.131]
66 indicate that the eigenfunctions which lie at the very low end of the spectrum, usually discarded as noise and less studied in the literature, can prove useful for anomaly detection tasks. (Recall that if (ui, vi) are eigenvalue and eigenvector pairs of K̄, the matrix E of coordinates of the eigenfunctions ei expressed in the n points φ̄1, . . . , φ̄n is U diag(v1, . . . , vn)^{−1/2}.) [sent-176, score-0.298]
67 Indeed, the operator V V ∗ used in this work to define λ is used in the context of kernel-CCA to extract one of the two functions which maximally correlate two samples, the other function being obtained from V ∗ V . [sent-178, score-0.098]
68 First, in the context of this paper, V is an autocorrelation operator while the authors of [12] consider normalized covariances between two different samples; [sent-180, score-0.287]
69 second, while kernel-CCA maximizes the Rayleigh quotient of V V∗, we look for eigenfunctions which lie at the lower end of the spectrum of the same operator. [sent-182, score-0.081]
70 A possible extension of our work is to look for two functionals f and g which, rather than maximize the correlation of two distinct samples as is the case in CCA, are estimated to minimize the correlation between g(zt ) and f (zt+1 ). [sent-183, score-0.379]
71 6.1 Generating sample paths polluted by anomalies. We consider in this experimental section a simulated dynamical system perturbed by arbitrary anomalies. [sent-186, score-0.339]
72 To this end, we use the Lotka-Volterra equations to generate time-series quantifying the populations of different species competing for common resources. [sent-187, score-0.065]
73 For all other timestamps tk < t < tk+1, the system follows the usual dynamics of Equation (5). [sent-212, score-0.125]
74 Anomalies violate the usual dynamics in two different ways: first, they ignore the usual dynamical equations and the current location of the process to create instead purely random increments; second, depending on the magnitude of σδ relative to σǫ , such anomalies may induce unusual jumps. [sent-213, score-0.473]
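Since Equation (5) is not reproduced in this extract, the sketch below generates a comparable sample path from a generic competitive Lotka-Volterra system with Gaussian perturbations of scale σε, replacing the increment at chosen anomaly times by a purely random jump of scale σδ with direction drawn in {−1, 1}^d; the growth rates, interaction matrix and step size are illustrative assumptions.

```python
import numpy as np

def simulate_lv_with_anomalies(T=200, d=4, sigma_eps=0.05, sigma_delta=0.5,
                               anomaly_times=(), dt=0.01, seed=0):
    """Toy competitive Lotka-Volterra path with Gaussian perturbations; at anomaly
    times the increment ignores the dynamics and becomes a random jump.
    r and A below are illustrative choices, not those of the paper."""
    rng = np.random.default_rng(seed)
    r = np.full(d, 1.0)                            # growth rates (assumed)
    A = 0.5 * np.ones((d, d)) + 0.5 * np.eye(d)    # competition matrix (assumed)
    z = np.empty((T, d))
    z[0] = rng.uniform(0.2, 0.8, size=d)
    for t in range(1, T):
        if t in anomaly_times:
            dz = sigma_delta * rng.choice([-1.0, 1.0], size=d)  # random jump, ignores dynamics
        else:
            x = z[t - 1]
            dz = dt * r * x * (1.0 - A @ x) + sigma_eps * rng.normal(size=d)
        z[t] = np.maximum(z[t - 1] + dz, 0.0)      # keep populations non-negative
    return z

path = simulate_lv_with_anomalies(anomaly_times={50, 120, 170})
```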
75 Figure 1: The figure on the top plots a sample path of length 200 of a 4-dimensional Lotka-Volterra dynamic system with perturbations drawn with σε = . [sent-234, score-0.055]
76 The data is split between 80 regular observations and 120 observations polluted by 10 anomalies. [sent-237, score-0.119]
77 All four functionals have been estimated using ρ = 1, and we highlight by a red dot the values they take when an anomaly is actually observed. [sent-238, score-0.533]
78 Writing ∆zi = zi − zi−1, we use the following mixture of kernels k: k(zi, zj) = ρ e^{−100 ‖∆zi − ∆zj‖²} + (1 − ρ) e^{−10 ‖zi − zj‖²}, (6) with ρ ∈ [0, 1]. [sent-242, score-0.161]
79 With ρ = .5, k accounts for both the state of the system and its most recent increments, while only increments are considered for ρ = 1. [sent-245, score-0.085]
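A direct translation of the kernel of Equation (6), assuming the squared Euclidean norm:

```python
import numpy as np

def mixture_kernel(z_i, z_j, dz_i, dz_j, rho=0.5):
    """Kernel of Equation (6): rho * exp(-100 ||dz_i - dz_j||^2)
    + (1 - rho) * exp(-10 ||z_i - z_j||^2), with dz_t = z_t - z_{t-1}."""
    inc = np.exp(-100.0 * np.sum((dz_i - dz_j) ** 2))
    state = np.exp(-10.0 * np.sum((z_i - z_j) ** 2))
    return rho * inc + (1.0 - rho) * state
```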
80 Anomalies can be detected with both criteria, since they can be tracked down when the process visits unusual regions or undergoes abrupt and atypical changes. [sent-246, score-0.09]
81 While the MAC functional is defined and estimated in order to behave as closely as possible to random i.i.d. [sent-249, score-0.069]
82 noise, the BT functional βBT is tuned to be stationary as discussed in [11]. [sent-251, score-0.121]
83 In order to obtain a white functional from βBT it is possible to model the time series βBT(zt) as a unidimensional autoregressive model, that is, estimate (on the training sample again) coefficients r1, r2, . . . , rq such that [sent-252, score-0.295]
84 β̂BT(zt) = Σ_{i=1}^{q} ri βBT(zt−i). Both the order q and the autoregressive coefficients can be estimated on the training sample with standard AR packages, using for instance Schwarz's criterion to select q. [sent-256, score-0.084]
85 In practice however we use the residuals ε̂BT,t = βBT(zt) − Σ_{i=1}^{q} ri βBT(zt−i) to define the Box-Tiao residuals functional, which we write β̃BT. [sent-259, score-0.192]
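A least-squares sketch of this post-processing step, with a fixed AR order q chosen here only for illustration (in practice q would be selected with Schwarz's criterion or a standard AR package):

```python
import numpy as np

def ar_residuals(scores, q=2):
    """Fit an AR(q) model s_t ~ sum_i r_i s_{t-i} by least squares on the training
    scores and return the residuals (Box-Tiao residuals functional values)."""
    s = np.asarray(scores, dtype=float)
    X = np.column_stack([s[q - i - 1:len(s) - i - 1] for i in range(q)])  # lags s_{t-1..t-q}
    y = s[q:]
    r, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ r              # eps_t = beta_BT(z_t) - sum_i r_i beta_BT(z_{t-i})
```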
86 The detection rate naturally increases with the size of the anomaly, to the extent that the task becomes only a gap detection problem when σδ becomes closer to 0. [sent-293, score-0.142]
87 6.3 Parameter selection methodology and numerical results. The BT functional βBT and its residuals functional β̃BT, the MAC function βmac, the one-class SVM focSVM and the (p + 1)th eigenfunction ep+1 are estimated on a set of 400 observations. [sent-297, score-0.149]
88 We set p through the rule that the p first directions must carry at least 98% of the total variance of Cn, that is, p is the first integer such that Σ_{i=1}^{p} gi > 0.98 Σ_{i=1}^{n} gi. [sent-298, score-0.068]
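This rule translates into a few lines, assuming the eigenvalues gi of Cn are available:

```python
import numpy as np

def choose_p(eigenvalues, threshold=0.98):
    """Smallest p such that the p leading eigenvalues carry at least
    `threshold` of the total variance."""
    g = np.sort(np.asarray(eigenvalues))[::-1]
    ratio = np.cumsum(g) / g.sum()
    return int(np.searchsorted(ratio, threshold) + 1)
```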
89 The BT and MAC functionals additionally require the use of a regularization term ǫn which we select by finding the best ridge regressor of φt+1 given φt through a 4-fold cross validation procedure on the training set. [sent-302, score-0.379]
90 For βBT, β̃BT, βmac and the kPCA functional ep+1 we use their respective empirical mean µ and standard deviation σ on the training set to rescale and whiten their output on the test set, namely consider values (f(z) − µ)/σ. [sent-303, score-0.146]
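The rescaling step in a short helper (only training statistics are used, as stated above):

```python
import numpy as np

def whiten_scores(train_scores, test_scores):
    """Rescale test outputs with the training mean and standard deviation: (f(z) - mu) / sigma."""
    mu, sigma = np.mean(train_scores), np.std(train_scores)
    return (np.asarray(test_scores) - mu) / sigma
```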
91 Although more elaborate anomaly detection schemes on such unidimensional time-series might be considered, for the sake of simplicity we treat directly these raw outputs as alarm scores. [sent-304, score-0.357]
92 Having the correct labels for anomalies on one hand and the scores for all detectors on the other, we vary the threshold at which an alarm is raised to produce ROC curves. [sent-305, score-0.319]
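A self-contained sketch of this ROC construction from labels and alarm scores (for detectors that flag departures in both directions, as the bounding-tube remark below suggests, one may threshold |score| instead):

```python
import numpy as np

def roc_curve(scores, labels):
    """Sweep the alarm threshold over the observed scores and return
    (false positive rate, true positive rate) arrays; labels are 1 for anomalies."""
    scores, labels = np.asarray(scores), np.asarray(labels).astype(bool)
    thresholds = np.sort(np.unique(scores))[::-1]
    fpr, tpr = [], []
    for thr in thresholds:
        alarm = scores >= thr
        tpr.append((alarm & labels).sum() / max(labels.sum(), 1))
        fpr.append((alarm & ~labels).sum() / max((~labels).sum(), 1))
    return np.array(fpr), np.array(tpr)

def auc(fpr, tpr):
    """Area under the ROC curve by trapezoidal integration."""
    order = np.argsort(fpr)
    x, y = np.asarray(fpr)[order], np.asarray(tpr)[order]
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))
```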
93 6.4 Discussion. In the experimental setting, anomalies can be characterized as unusual increments between two successive states of an otherwise smooth dynamical system. [sent-315, score-0.436]
94 Anomalies are unusual due to their size, controlled by σδ, and their directions, sampled in {−1, 1}^4. [sent-316, score-0.063]
95 When the step σδ is relatively small, it is difficult to correctly flag an anomaly without taking into account the system's dynamics, as illustrated by the relatively poor performance of the ocSVM and the kPCA compared to the BT, BTres and MAC functions. [sent-317, score-0.18]
96 Conversely, when σδ is large, anomalies can be more simply discriminated as large gaps. [sent-318, score-0.221]
97 We can hypothesize two reasons for this: first, white functionals may be less useful than a simple ocSVM with an adequate kernel in such a regime, which puts little emphasis on dynamics. [sent-320, score-0.516]
98 Second, in this study the BT and MAC functions flag anomalies whenever an evaluation goes outside of a certain bounding tube. [sent-321, score-0.221]
99 One-class novelty detection for seizure analysis from intracranial EEG. [sent-355, score-0.101]
100 Asymptotic normality for the empirical estimator of the autocorrelation operator of an ARH(1) process. [sent-432, score-0.321]
wordName wordTfidf (topN-words)
[('zt', 0.434), ('functionals', 0.379), ('bt', 0.362), ('anomalies', 0.221), ('autocorrelation', 0.189), ('mac', 0.177), ('cn', 0.168), ('ocsvm', 0.157), ('anomaly', 0.154), ('vn', 0.152), ('ep', 0.127), ('white', 0.108), ('criterions', 0.098), ('operator', 0.098), ('alarm', 0.098), ('kt', 0.092), ('kpca', 0.084), ('autoregressive', 0.084), ('arh', 0.079), ('autocovariance', 0.079), ('predictability', 0.079), ('hn', 0.076), ('dn', 0.074), ('detection', 0.071), ('functional', 0.069), ('rayleigh', 0.069), ('ag', 0.067), ('auc', 0.065), ('estimators', 0.064), ('unusual', 0.063), ('hilbert', 0.062), ('hp', 0.062), ('ztk', 0.059), ('kernel', 0.059), ('zi', 0.058), ('increments', 0.056), ('detect', 0.053), ('stationary', 0.052), ('residues', 0.052), ('write', 0.051), ('dynamical', 0.05), ('namely', 0.048), ('paragraph', 0.047), ('eigenvalue', 0.046), ('ei', 0.046), ('successive', 0.046), ('zj', 0.045), ('quotient', 0.044), ('eigenfunction', 0.044), ('rkhs', 0.044), ('tk', 0.042), ('stationarity', 0.042), ('xt', 0.041), ('observations', 0.04), ('vp', 0.04), ('eigenvector', 0.039), ('gi', 0.039), ('autocorrelations', 0.039), ('cointegrated', 0.039), ('cointegration', 0.039), ('orfe', 0.039), ('polluted', 0.039), ('tiao', 0.039), ('spaces', 0.039), ('species', 0.038), ('eigenfunctions', 0.037), ('low', 0.036), ('residuals', 0.036), ('span', 0.035), ('ci', 0.035), ('unidimensional', 0.034), ('normality', 0.034), ('quantify', 0.033), ('writing', 0.032), ('dc', 0.032), ('quotients', 0.032), ('abnormal', 0.032), ('processes', 0.031), ('outliers', 0.03), ('cients', 0.03), ('outlier', 0.03), ('notations', 0.03), ('novelty', 0.03), ('variance', 0.029), ('diag', 0.029), ('dynamics', 0.029), ('zn', 0.029), ('ax', 0.029), ('system', 0.029), ('usual', 0.028), ('var', 0.028), ('pca', 0.028), ('box', 0.027), ('asymptotic', 0.027), ('process', 0.027), ('equations', 0.027), ('reproducing', 0.026), ('lim', 0.026), ('absolute', 0.026), ('dynamic', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 257 nips-2009-White Functionals for Anomaly Detection in Dynamical Systems
Author: Marco Cuturi, Jean-Philippe Vert, Alexandre d'Aspremont
Abstract: We propose new methodologies to detect anomalies in discrete-time processes taking values in a probability space. These methods are based on the inference of functionals whose evaluations on successive states visited by the process are stationary and have low autocorrelations. Deviations from this behavior are used to flag anomalies. The candidate functionals are estimated in a subspace of a reproducing kernel Hilbert space associated with the original probability space considered. We provide experimental results on simulated datasets which show that these techniques compare favorably with other algorithms.
2 0.2483559 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning
Author: Chonghai Hu, Weike Pan, James T. Kwok
Abstract: Regularized risk minimization often involves non-smooth optimization, either because of the loss function (e.g., hinge loss) or the regularizer (e.g., ℓ1 -regularizer). Gradient methods, though highly scalable and easy to implement, are known to converge slowly. In this paper, we develop a novel accelerated gradient method for stochastic optimization while still preserving their computational simplicity and scalability. The proposed algorithm, called SAGE (Stochastic Accelerated GradiEnt), exhibits fast convergence rates on stochastic composite optimization with convex or strongly convex objectives. Experimental results show that SAGE is faster than recent (sub)gradient methods including FOLOS, SMIDAS and SCD. Moreover, SAGE can also be extended for online learning, resulting in a simple algorithm but with the best regret bounds currently known for these problems. 1
3 0.13587742 34 nips-2009-Anomaly Detection with Score functions based on Nearest Neighbor Graphs
Author: Manqi Zhao, Venkatesh Saligrama
Abstract: We propose a novel non-parametric adaptive anomaly detection algorithm for high dimensional data based on score functions derived from nearest neighbor graphs on n-point nominal data. Anomalies are declared whenever the score of a test sample falls below α, which is supposed to be the desired false alarm level. The resulting anomaly detector is shown to be asymptotically optimal in that it is uniformly most powerful for the specified false alarm level, α, for the case when the anomaly density is a mixture of the nominal and a known density. Our algorithm is computationally efficient, being linear in dimension and quadratic in data size. It does not require choosing complicated tuning parameters or function approximation classes and it can adapt to local structure such as local change in dimensionality. We demonstrate the algorithm on both artificial and real data sets in high dimensional feature spaces.
4 0.13117281 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes
Author: Alan S. Willsky, Erik B. Sudderth, Michael I. Jordan, Emily B. Fox
Abstract: We propose a Bayesian nonparametric approach to the problem of modeling related time series. Using a beta process prior, our approach is based on the discovery of a set of latent dynamical behaviors that are shared among multiple time series. The size of the set and the sharing pattern are both inferred from data. We develop an efficient Markov chain Monte Carlo inference method that is based on the Indian buffet process representation of the predictive distribution of the beta process. In particular, our approach uses the sum-product algorithm to efficiently compute Metropolis-Hastings acceptance probabilities, and explores new dynamical behaviors via birth/death proposals. We validate our sampling algorithm using several synthetic datasets, and also demonstrate promising results on unsupervised segmentation of visual motion capture data.
5 0.10203169 94 nips-2009-Fast Learning from Non-i.i.d. Observations
Author: Ingo Steinwart, Andreas Christmann
Abstract: We prove an oracle inequality for generic regularized empirical risk minimization algorithms learning from α-mixing processes. To illustrate this oracle inequality, we use it to derive learning rates for some learning methods including least squares SVMs. Since the proof of the oracle inequality uses recent localization ideas developed for independent and identically distributed (i.i.d.) processes, it turns out that these learning rates are close to the optimal rates known in the i.i.d. case. 1
6 0.099735945 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization
7 0.094393827 213 nips-2009-Semi-supervised Learning using Sparse Eigenfunction Bases
8 0.085169397 168 nips-2009-Non-stationary continuous dynamic Bayesian networks
9 0.08214698 183 nips-2009-Optimal context separation of spiking haptic signals by second-order somatosensory neurons
10 0.080516346 3 nips-2009-AUC optimization and the two-sample problem
11 0.07816802 255 nips-2009-Variational Inference for the Nested Chinese Restaurant Process
12 0.072201774 212 nips-2009-Semi-Supervised Learning in Gigantic Image Collections
13 0.068117693 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis
14 0.066703118 207 nips-2009-Robust Nonparametric Regression with Metric-Space Valued Output
15 0.066202268 238 nips-2009-Submanifold density estimation
16 0.065179564 220 nips-2009-Slow Learners are Fast
17 0.065103605 92 nips-2009-Fast Graph Laplacian Regularized Kernel Learning via Semidefinite–Quadratic–Linear Programming
18 0.060078923 64 nips-2009-Data-driven calibration of linear estimators with minimal penalties
19 0.058018405 8 nips-2009-A Fast, Consistent Kernel Two-Sample Test
20 0.057264622 182 nips-2009-Optimal Scoring for Unsupervised Learning
topicId topicWeight
[(0, -0.18), (1, 0.073), (2, 0.037), (3, 0.032), (4, 0.099), (5, -0.011), (6, 0.078), (7, 0.058), (8, -0.031), (9, -0.052), (10, 0.085), (11, 0.027), (12, -0.015), (13, -0.059), (14, -0.107), (15, -0.155), (16, -0.11), (17, -0.018), (18, 0.078), (19, 0.047), (20, 0.05), (21, 0.157), (22, -0.116), (23, -0.037), (24, -0.108), (25, 0.057), (26, -0.019), (27, 0.032), (28, 0.065), (29, 0.104), (30, 0.057), (31, 0.06), (32, -0.025), (33, 0.142), (34, 0.017), (35, -0.071), (36, 0.167), (37, -0.165), (38, 0.141), (39, -0.1), (40, -0.092), (41, -0.125), (42, -0.013), (43, -0.171), (44, 0.021), (45, -0.084), (46, -0.195), (47, -0.078), (48, 0.03), (49, 0.06)]
simIndex simValue paperId paperTitle
same-paper 1 0.95363647 257 nips-2009-White Functionals for Anomaly Detection in Dynamical Systems
Author: Marco Cuturi, Jean-philippe Vert, Alexandre D'aspremont
Abstract: We propose new methodologies to detect anomalies in discrete-time processes taking values in a probability space. These methods are based on the inference of functionals whose evaluations on successive states visited by the process are stationary and have low autocorrelations. Deviations from this behavior are used to flag anomalies. The candidate functionals are estimated in a subspace of a reproducing kernel Hilbert space associated with the original probability space considered. We provide experimental results on simulated datasets which show that these techniques compare favorably with other algorithms.
2 0.60654211 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning
Author: Chonghai Hu, Weike Pan, James T. Kwok
Abstract: Regularized risk minimization often involves non-smooth optimization, either because of the loss function (e.g., hinge loss) or the regularizer (e.g., ℓ1 -regularizer). Gradient methods, though highly scalable and easy to implement, are known to converge slowly. In this paper, we develop a novel accelerated gradient method for stochastic optimization while still preserving their computational simplicity and scalability. The proposed algorithm, called SAGE (Stochastic Accelerated GradiEnt), exhibits fast convergence rates on stochastic composite optimization with convex or strongly convex objectives. Experimental results show that SAGE is faster than recent (sub)gradient methods including FOLOS, SMIDAS and SCD. Moreover, SAGE can also be extended for online learning, resulting in a simple algorithm but with the best regret bounds currently known for these problems. 1
3 0.56150949 34 nips-2009-Anomaly Detection with Score functions based on Nearest Neighbor Graphs
Author: Manqi Zhao, Venkatesh Saligrama
Abstract: We propose a novel non-parametric adaptive anomaly detection algorithm for high dimensional data based on score functions derived from nearest neighbor graphs on n-point nominal data. Anomalies are declared whenever the score of a test sample falls below α, which is supposed to be the desired false alarm level. The resulting anomaly detector is shown to be asymptotically optimal in that it is uniformly most powerful for the specified false alarm level, α, for the case when the anomaly density is a mixture of the nominal and a known density. Our algorithm is computationally efficient, being linear in dimension and quadratic in data size. It does not require choosing complicated tuning parameters or function approximation classes and it can adapt to local structure such as local change in dimensionality. We demonstrate the algorithm on both artificial and real data sets in high dimensional feature spaces.
4 0.48809934 3 nips-2009-AUC optimization and the two-sample problem
Author: Nicolas Vayatis, Marine Depecker, Stéphan J. Clémençon
Abstract: The purpose of the paper is to explore the connection between multivariate homogeneity tests and AUC optimization. The latter problem has recently received much attention in the statistical learning literature. From the elementary observation that, in the two-sample problem setup, the null assumption corresponds to the situation where the area under the optimal ROC curve is equal to 1/2, we propose a two-stage testing method based on data splitting. A nearly optimal scoring function in the AUC sense is first learnt from one of the two half-samples. Data from the remaining half-sample are then projected onto the real line and eventually ranked according to the scoring function computed at the first stage. The last step amounts to performing a standard Mann-Whitney Wilcoxon test in the onedimensional framework. We show that the learning step of the procedure does not affect the consistency of the test as well as its properties in terms of power, provided the ranking produced is accurate enough in the AUC sense. The results of a numerical experiment are eventually displayed in order to show the efficiency of the method. 1
5 0.43851441 94 nips-2009-Fast Learning from Non-i.i.d. Observations
Author: Ingo Steinwart, Andreas Christmann
Abstract: We prove an oracle inequality for generic regularized empirical risk minimization algorithms learning from α-mixing processes. To illustrate this oracle inequality, we use it to derive learning rates for some learning methods including least squares SVMs. Since the proof of the oracle inequality uses recent localization ideas developed for independent and identically distributed (i.i.d.) processes, it turns out that these learning rates are close to the optimal rates known in the i.i.d. case. 1
6 0.39697406 217 nips-2009-Sharing Features among Dynamical Systems with Beta Processes
7 0.38137928 172 nips-2009-Nonparametric Bayesian Texture Learning and Synthesis
8 0.33739072 238 nips-2009-Submanifold density estimation
9 0.32944003 215 nips-2009-Sensitivity analysis in HMMs with application to likelihood maximization
10 0.32687488 8 nips-2009-A Fast, Consistent Kernel Two-Sample Test
11 0.31495807 182 nips-2009-Optimal Scoring for Unsupervised Learning
12 0.31317377 92 nips-2009-Fast Graph Laplacian Regularized Kernel Learning via Semidefinite–Quadratic–Linear Programming
13 0.30821106 79 nips-2009-Efficient Recovery of Jointly Sparse Vectors
14 0.30070025 177 nips-2009-On Learning Rotations
15 0.2884379 10 nips-2009-A Gaussian Tree Approximation for Integer Least-Squares
16 0.28427097 213 nips-2009-Semi-supervised Learning using Sparse Eigenfunction Bases
17 0.28410369 183 nips-2009-Optimal context separation of spiking haptic signals by second-order somatosensory neurons
18 0.28180707 255 nips-2009-Variational Inference for the Nested Chinese Restaurant Process
19 0.27874735 33 nips-2009-Analysis of SVM with Indefinite Kernels
20 0.2767767 207 nips-2009-Robust Nonparametric Regression with Metric-Space Valued Output
topicId topicWeight
[(24, 0.026), (25, 0.049), (35, 0.065), (36, 0.082), (39, 0.027), (58, 0.525), (61, 0.021), (71, 0.039), (86, 0.058), (91, 0.027)]
simIndex simValue paperId paperTitle
1 0.98598826 43 nips-2009-Bayesian estimation of orientation preference maps
Author: Sebastian Gerwinn, Leonard White, Matthias Kaschube, Matthias Bethge, Jakob H. Macke
Abstract: Imaging techniques such as optical imaging of intrinsic signals, 2-photon calcium imaging and voltage sensitive dye imaging can be used to measure the functional organization of visual cortex across different spatial and temporal scales. Here, we present Bayesian methods based on Gaussian processes for extracting topographic maps from functional imaging data. In particular, we focus on the estimation of orientation preference maps (OPMs) from intrinsic signal imaging data. We model the underlying map as a bivariate Gaussian process, with a prior covariance function that reflects known properties of OPMs, and a noise covariance adjusted to the data. The posterior mean can be interpreted as an optimally smoothed estimate of the map, and can be used for model based interpolations of the map from sparse measurements. By sampling from the posterior distribution, we can get error bars on statistical properties such as preferred orientations, pinwheel locations or pinwheel counts. Finally, the use of an explicit probabilistic model facilitates interpretation of parameters and quantitative model comparisons. We demonstrate our model both on simulated data and on intrinsic signaling data from ferret visual cortex. 1
Author: John Wright, Arvind Ganesh, Shankar Rao, Yigang Peng, Yi Ma
Abstract: Principal component analysis is a fundamental operation in computational data analysis, with myriad applications ranging from web search to bioinformatics to computer vision and image analysis. However, its performance and applicability in real scenarios are limited by a lack of robustness to outlying or corrupted observations. This paper considers the idealized “robust principal component analysis” problem of recovering a low rank matrix A from corrupted observations D = A + E. Here, the corrupted entries E are unknown and the errors can be arbitrarily large (modeling grossly corrupted observations common in visual and bioinformatic data), but are assumed to be sparse. We prove that most matrices A can be efficiently and exactly recovered from most error sign-and-support patterns by solving a simple convex program, for which we give a fast and provably convergent algorithm. Our result holds even when the rank of A grows nearly proportionally (up to a logarithmic factor) to the dimensionality of the observation space and the number of errors E grows in proportion to the total number of entries in the matrix. A by-product of our analysis is the first proportional growth results for the related problem of completing a low-rank matrix from a small fraction of its entries. Simulations and real-data examples corroborate the theoretical results, and suggest potential applications in computer vision. 1
same-paper 3 0.97101003 257 nips-2009-White Functionals for Anomaly Detection in Dynamical Systems
Author: Marco Cuturi, Jean-philippe Vert, Alexandre D'aspremont
Abstract: We propose new methodologies to detect anomalies in discrete-time processes taking values in a probability space. These methods are based on the inference of functionals whose evaluations on successive states visited by the process are stationary and have low autocorrelations. Deviations from this behavior are used to flag anomalies. The candidate functionals are estimated in a subspace of a reproducing kernel Hilbert space associated with the original probability space considered. We provide experimental results on simulated datasets which show that these techniques compare favorably with other algorithms.
4 0.96241719 67 nips-2009-Directed Regression
Author: Yi-hao Kao, Benjamin V. Roy, Xiang Yan
Abstract: When used to guide decisions, linear regression analysis typically involves estimation of regression coefficients via ordinary least squares and their subsequent use to make decisions. When there are multiple response variables and features do not perfectly capture their relationships, it is beneficial to account for the decision objective when computing regression coefficients. Empirical optimization does so but sacrifices performance when features are well-chosen or training data are insufficient. We propose directed regression, an efficient algorithm that combines merits of ordinary least squares and empirical optimization. We demonstrate through a computational study that directed regression can generate significant performance gains over either alternative. We also develop a theory that motivates the algorithm. 1
5 0.95790488 223 nips-2009-Sparse Metric Learning via Smooth Optimization
Author: Yiming Ying, Kaizhu Huang, Colin Campbell
Abstract: In this paper we study the problem of learning a low-rank (sparse) distance matrix. We propose a novel metric learning model which can simultaneously conduct dimension reduction and learn a distance matrix. The sparse representation involves a mixed-norm regularization which is non-convex. We then show that it can be equivalently formulated as a convex saddle (min-max) problem. From this saddle representation, we develop an efficient smooth optimization approach [17] for sparse metric learning, although the learning model is based on a nondifferentiable loss function. Finally, we run experiments to validate the effectiveness and efficiency of our sparse metric learning model on various datasets.
6 0.94373626 81 nips-2009-Ensemble Nystrom Method
7 0.78819221 36 nips-2009-Asymptotic Analysis of MAP Estimation via the Replica Method and Compressed Sensing
8 0.77881122 147 nips-2009-Matrix Completion from Noisy Entries
9 0.76236427 177 nips-2009-On Learning Rotations
10 0.74415225 254 nips-2009-Variational Gaussian-process factor analysis for modeling spatio-temporal data
11 0.73955166 243 nips-2009-The Ordered Residual Kernel for Robust Motion Subspace Clustering
12 0.72492403 62 nips-2009-Correlation Coefficients are Insufficient for Analyzing Spike Count Dependencies
13 0.71653086 224 nips-2009-Sparse and Locally Constant Gaussian Graphical Models
14 0.71600968 228 nips-2009-Speeding up Magnetic Resonance Image Acquisition by Bayesian Multi-Slice Adaptive Compressed Sensing
15 0.71229345 61 nips-2009-Convex Relaxation of Mixture Regression with Efficient Algorithms
16 0.7073034 22 nips-2009-Accelerated Gradient Methods for Stochastic Optimization and Online Learning
17 0.70566553 20 nips-2009-A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers
18 0.70563316 163 nips-2009-Neurometric function analysis of population codes
19 0.70417595 35 nips-2009-Approximating MAP by Compensating for Structural Relaxations
20 0.70305747 202 nips-2009-Regularized Distance Metric Learning:Theory and Algorithm