
142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants


Source: pdf

Author: Volker Roth

Abstract: The problem of detecting “atypical objects” or “outliers” is one of the classical topics in (robust) statistics. Recently, it has been proposed to address this problem by means of one-class SVM classifiers. The main conceptual shortcoming of most one-class approaches, however, is that in a strict sense they are unable to detect outliers, since the expected fraction of outliers has to be specified in advance. The method presented in this paper overcomes this problem by relating kernelized one-class classification to Gaussian density estimation in the induced feature space. Having established this relation, it is possible to identify “atypical objects” by quantifying their deviations from the Gaussian model. For RBF kernels it is shown that the Gaussian model is “rich enough” in the sense that it asymptotically provides an unbiased estimator for the true density. In order to overcome the inherent model selection problem, a cross-validated likelihood criterion for selecting all free model parameters is applied.
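The abstract's core recipe is to model the data as a Gaussian in the RBF-induced feature space and to flag points whose Mahalanobis-type deviation from that model is large. Below is a minimal sketch of that idea, not the paper's implementation: the function names are illustrative, the data are synthetic, and the regularization parameter is fixed by hand rather than selected by the cross-validated likelihood criterion the paper advocates.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    # k(x, y) = exp(-gamma * ||x - y||^2)
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def kernel_mahalanobis_scores(X, gamma=0.5, reg=1e-3):
    """Regularized Mahalanobis-type distances to the mean of a Gaussian
    model fitted in the RBF-induced feature space (illustrative helper,
    not the paper's code)."""
    n = len(X)
    K = rbf_kernel(X, X, gamma)
    # Center the kernel matrix, i.e. center the mapped data in feature space.
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H
    # Eigendecomposition of the centered kernel matrix yields the
    # kernel-PCA coordinates of each point (eigh returns ascending order).
    w, V = np.linalg.eigh(Kc)
    w, V = w[::-1], V[:, ::-1]
    keep = w > 1e-10               # discard numerically zero directions
    w, V = w[keep], V[:, keep]
    proj = Kc @ (V / np.sqrt(w))   # row i = feature-space coordinates of point i
    var = w / n                    # sample variance along each principal axis
    # Large score = large deviation from the Gaussian model = "atypical".
    return (proj ** 2 / (var + reg)).sum(axis=1)

# Toy usage: one point planted far from a Gaussian bulk.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), [[6.0, 6.0]]])
scores = kernel_mahalanobis_scores(X)
print(np.argsort(scores)[-3:])     # indices of the most atypical points
```

On the toy data the planted point (index 100) receives the largest score; in the paper's setting both the kernel width and the regularizer would instead be chosen by the cross-validated likelihood criterion mentioned in the abstract.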


reference text

[1] R. Duda, P. Hart, and D. Stork. Pattern Classification. Wiley & Sons, 2001.

[2] J. Fox. Applied Regression, Linear Models, and Related Methods. Sage, 1997.

[3] T. Hastie, A. Buja, and R. Tibshirani. Penalized discriminant analysis. Annals of Statistics, 23:73–102, 1995.

[4] P.J. Huber. Robust Statistics. Wiley, 1981.

[5] M. Kendall and A. Stuart. The Advanced Theory of Statistics, volume 1. Macmillan, 1977.

[6] G.P. Lepage. VEGAS: An adaptive multidimensional integration program. Technical Report CLNS-80/447, Cornell University, 1980.

[7] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, and K.-R. Müller. Fisher discriminant analysis with kernels. In Y.-H. Hu, J. Larsen, E. Wilson, and S. Douglas, editors, Neural Networks for Signal Processing IX, pages 41–48. IEEE, 1999.

[8] J. Moody. The effective number of parameters: An analysis of generalisation and regularisation in nonlinear learning systems. In J. Moody, S. Hanson, and R. Lippmann, editors, NIPS 4, 1992.

[9] W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery. Numerical Recipes in C. Cambridge University Press, 1992.

[10] V. Roth and V. Steinhage. Nonlinear discriminant analysis using kernel functions. In S.A. Solla, T.K. Leen, and K.-R. Müller, editors, NIPS 12, pages 568–574. MIT Press, 2000.

[11] B. Schölkopf, S. Mika, C. Burges, P. Knirsch, K.-R. Müller, G. Rätsch, and A. Smola. Input space vs. feature space in kernel-based methods. IEEE Trans. Neural Networks, 10(5), 1999.

[12] B. Schölkopf, R.C. Williamson, A. Smola, and J. Shawe-Taylor. SV estimation of a distribution's support. In S. Solla, T. Leen, and K.-R. Müller, editors, NIPS 12, pages 582–588. MIT Press, 2000.

[13] M.J. van der Laan, S. Dudoit, and S. Keles. Asymptotic optimality of likelihood-based cross-validation. Statistical Applications in Genetics and Molecular Biology, 3(1), 2004.