nips nips2004 nips2004-142 knowledge-graph by maker-knowledge-mining

142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants


Source: pdf

Author: Volker Roth

Abstract: The problem of detecting “atypical objects” or “outliers” is one of the classical topics in (robust) statistics. Recently, it has been proposed to address this problem by means of one-class SVM classifiers. The main conceptual shortcoming of most one-class approaches, however, is that in a strict sense they are unable to detect outliers, since the expected fraction of outliers has to be specified in advance. The method presented in this paper overcomes this problem by relating kernelized one-class classification to Gaussian density estimation in the induced feature space. Having established this relation, it is possible to identify “atypical objects” by quantifying their deviations from the Gaussian model. For RBF kernels it is shown that the Gaussian model is “rich enough” in the sense that it asymptotically provides an unbiased estimator for the true density. In order to overcome the inherent model selection problem, a cross-validated likelihood criterion for selecting all free model parameters is applied. 1

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 The main conceptual shortcoming of most one-class approaches, however, is that in a strict sense they are unable to detect outliers, since the expected fraction of outliers has to be specified in advance. [sent-5, score-0.422]

2 The method presented in this paper overcomes this problem by relating kernelized one-class classification to Gaussian density estimation in the induced feature space. [sent-6, score-0.319]

3 For RBF kernels it is shown that the Gaussian model is “rich enough” in the sense that it asymptotically provides an unbiased estimator for the true density. [sent-8, score-0.262]

4 In order to overcome the inherent model selection problem, a cross-validated likelihood criterion for selecting all free model parameters is applied. [sent-9, score-0.332]

5 1 Introduction A one-class classifier attempts to find a separating boundary between a data set and the rest of the feature space. [sent-10, score-0.206]

6 A natural application of such a classifier is estimating a contour line of the underlying data density for a certain quantile value. [sent-11, score-0.434]

7 Such contour lines may be used to separate “typical” objects from “atypical” ones. [sent-12, score-0.208]

8 Thus, a useful application scenario would be to find a boundary which separates the jointly distributed objects from the outliers. [sent-14, score-0.2]

9 Usually no labeled samples from the outlier class are available at all, and it is even unknown if there are any outliers present. [sent-16, score-0.37]

10 The fundamental problem of the one-class approach lies in the fact that outlier detection is a (partially) unsupervised task which has been “squeezed” into a classification framework. [sent-20, score-0.167]

11 This paper aims at overcoming this problem by linking kernel-based one-class classifiers to Gaussian density estimation in the induced feature space. [sent-22, score-0.319]

12 The main technical ingredient of our method is the one-class kernel Fisher discriminant classifier (OC-KFD), for which the relation to Gaussian density estimation is shown. [sent-26, score-0.431]

13 From the classification side, the OC-KFD-based model inherits a simple complexity-control mechanism based on regularization techniques. [sent-27, score-0.123]

14 The explicit relation to Gaussian density estimation, on the other hand, makes it possible to formalize the notion of atypical objects by observing deviations from the Gaussian model. [sent-28, score-0.766]

15 It is clear that these deviations will heavily depend on the chosen model parameters. [sent-29, score-0.124]

16 In order to derive an objective characterization of atypical objects it is, thus, necessary to select a suitable model in advance. [sent-30, score-0.516]

17 2 Gaussian density estimation and one-class LDA Let X denote the n × d data matrix which contains the n input vectors xi ∈ Rd as rows. [sent-32, score-0.261]

18 It has been proposed to estimate a one-class decision boundary by separating the dataset from the origin [12], which effectively coincides with replicating all xi with opposite sign and separating X and −X. [sent-33, score-0.246]

19 Typically, a ν-SVM classifier with RBF kernel function is used. [sent-34, score-0.13]

20 The parameter ν bounds the expected number of outliers and must be selected a priori. [sent-35, score-0.292]

21 The latter has the advantage that it is closely related to Gaussian density estimation in the induced feature space. [sent-38, score-0.319]

22 By making this relation explicit, outliers can be identified without specifying the expected fraction of outliers in advance. [sent-39, score-0.679]

23 We start with a linear discriminant analysis (LDA) model, and then kernels will be introduced. [sent-40, score-0.112]

24 Without loss of generality we assume that the sample mean µ+ ≡ (1/n) Σi xi ≠ 0, so that the sample means of the positive data and the negative data differ: µ+ ≠ µ−. [sent-42, score-0.161]

25 Such a ridge regression model assumes a penalized total covariance of the form T̂ = (1/(2n)) · (X⊤X + (−X)⊤(−X)) + γI = (1/n) · X⊤X + γI. [sent-57, score-0.139]

26 If the transformation is carried out implicitly by introducing a Mercer kernel k(xi, xj), we arrive at an equivalent problem in terms of the kernel matrix K = ΦΦ⊤ and the expansion coefficients α: α̂ = (K + γI)⁻¹ y. (3) [sent-66, score-0.316]
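
A minimal sketch of this step in Python (not code from the paper; the toy data, the ±1 targets from the replication trick, and the kernel width are illustrative assumptions):

```python
# Sketch: OC-KFD expansion coefficients alpha_hat = (K + gamma*I)^(-1) y
# for an RBF kernel, as in (3).  Toy data and +/-1 targets via the
# replication trick (X vs. -X) are illustrative assumptions.
import numpy as np

def rbf_kernel(A, B, sigma):
    """k(a, b) = exp(-||a - b||^2 / sigma)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / sigma)

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                 # n x d data matrix
Z = np.vstack([X, -X])                       # replicate with opposite sign
y = np.concatenate([np.ones(50), -np.ones(50)])

sigma, gamma = 2.0, 0.1
K = rbf_kernel(Z, Z, sigma)                  # kernel matrix K
alpha = np.linalg.solve(K + gamma * np.eye(len(Z)), y)   # expansion coefficients
```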

27 From [11] it follows that the mapped vectors can be represented in Rn as φ(x) = K^(−1/2) k(x), where k(x) denotes the kernel vector k(x) = (k(x, x1), …, k(x, xn))⊤. [sent-67, score-0.166]

28 Equation (4) establishes the desired link between OC-KFD and Gaussian density estimation, since for our outlier detection mechanism only Mahalanobis distances are needed. [sent-72, score-0.465]

29 While it seems to be rather complicated to estimate a density by the above procedure, the main benefit over directly estimating the mean and the covariance lies in the inherent complexity regulation properties of ridge regression. [sent-73, score-0.275]

30 Such a complexity control mechanism is of particular importance in highly nonlinear kernel models. [sent-74, score-0.168]

31 Moreover, for ridge-regression models it is possible to analytically calculate the effective degrees of freedom, a quantity that will be of particular interest when it comes to detecting outliers. [sent-75, score-0.279]

32 3 Detecting outliers Let us assume that the model is completely specified, i.e. [sent-76, score-0.297]

33 both the kernel function k(·, ·) and the regularization parameter γ are fixed. [sent-78, score-0.13]

34 The central lemma that helps us to detect outliers can be found in most statistical textbooks: Lemma 1. [sent-79, score-0.332]

35 Let X be a d-dimensional Gaussian random vector with mean µ and covariance Σ. Then ∆ ≡ (X − µ)⊤ Σ⁻¹ (X − µ) follows a chi-square (χ²) distribution on d degrees of freedom. [sent-81, score-0.128]
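
A quick numerical illustration of this lemma (my own sanity check, not from the paper): Mahalanobis distances of samples drawn from a d-dimensional Gaussian should match the χ²_d quantiles.

```python
# Illustration of lemma 1: Mahalanobis distances of Gaussian samples are
# (approximately) chi-square distributed on d degrees of freedom.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
d, n = 5, 2000
mu = np.zeros(d)
A = rng.normal(size=(d, d))
Sigma = A @ A.T + np.eye(d)                  # some SPD covariance matrix
X = rng.multivariate_normal(mu, Sigma, size=n)

diff = X - mu
delta = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(Sigma), diff)

# Empirical quantiles should be close to the chi2_d quantiles.
print(np.quantile(delta, [0.5, 0.9]))
print(stats.chi2.ppf([0.5, 0.9], df=d))
```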

36 For the penalized regression models, it might be more appropriate to use the effective degrees of freedom df instead of d in the above lemma. [sent-82, score-0.697]

37 In the case of one-class LDA with ridge penalties we can easily estimate it as df = trace(X(X⊤X + γI)⁻¹X⊤) [8], which for a kernel model translates into df = trace(K(K + γI)⁻¹). [sent-83, score-0.915]

38 The intuitive interpretation of the quantity df is the following: denoting by V the matrix of eigenvectors of K and by λ1, …, λn the corresponding eigenvalues, the fitted values ŷ read ŷ = V diag{δi = λi/(λi + γ)} V⊤ y. [sent-84, score-0.403]

39 If the ordered eigenvalues decrease rapidly, however, the values δi are either close to zero or close to one, and df determines the number of terms that are “essentially different” from zero. [sent-86, score-0.343]
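
A short sketch of this computation (illustrative; K, y, and γ are assumed given, e.g. from the earlier snippet):

```python
# Shrinkage view of the effective degrees of freedom:
# df = trace(K (K + gamma*I)^(-1)) = sum_i lambda_i / (lambda_i + gamma).
import numpy as np

def effective_df(K, y, gamma):
    lam, V = np.linalg.eigh(K)               # eigenvalues lambda_i, eigenvectors V
    delta = lam / (lam + gamma)              # shrinkage factors delta_i in [0, 1)
    y_hat = V @ (delta * (V.T @ y))          # fitted values y_hat = V diag(delta) V' y
    df = delta.sum()                         # equals trace(K (K + gamma*I)^(-1))
    return df, y_hat
```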

40 From lemma 1 we conclude that if the data are well described by a Gaussian model in the kernel feature space, the observed Mahalanobis distances should look like a sample from a χ²-distribution with df degrees of freedom. [sent-90, score-0.971]

41 A graphical way to test this hypothesis is to plot the observed quantiles against the theoretical χ² quantiles, which in the ideal case gives a straight line. [sent-91, score-0.203]

42 Such a quantile-quantile plot is constructed as follows: Let ∆(i) denote the observed Mahalanobis distances ordered from lowest to highest, and pi the cumulative proportion before each ∆(i), given by pi = (i − 1/2)/n. [sent-92, score-0.248]

43 Let further zi = F⁻¹(pi) denote the theoretical quantile at position pi, where F is the cumulative χ²-distribution function. [sent-93, score-0.293]
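
A sketch of this construction (illustrative; `delta` is an array of observed Mahalanobis distances and `df` the effective degrees of freedom, both assumed given):

```python
# Quantile-quantile construction: Delta_(i) vs. z_i = F^(-1)(p_i),
# with p_i = (i - 1/2)/n and F the cumulative chi2_df distribution.
import numpy as np
from scipy import stats

def qq_points(delta, df):
    n = len(delta)
    delta_sorted = np.sort(delta)            # Delta_(i), ordered lowest to highest
    p = (np.arange(1, n + 1) - 0.5) / n      # cumulative proportions p_i
    z = stats.chi2.ppf(p, df=df)             # theoretical quantiles z_i
    return z, delta_sorted                   # ideally these lie on a straight line
```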

44 Deviations from linearity can be formalized by fitting a linear model on the observed quantiles and calculating confidence intervals around the fit. [sent-95, score-0.267]

45 A potential problem of this approach is that the outliers themselves heavily influence the quantile-quantile fit. [sent-97, score-0.255]

46 For estimating confidence intervals around the fit we use the standard formula (see [2, 5]) σ(∆(i)) = b · (χ²(zi))⁻¹ · √(pi(1 − pi)/n), (7) which can be intuitively understood as the product of the slope b and the standard error of the quantiles. [sent-102, score-0.19]

47 A 100(1 − ε)% envelope around the fit is then defined as ∆(i) ± zε/2 σ(∆(i)), where zε/2 is the 1 − ε/2 quantile of the standard normal distribution. [sent-103, score-0.253]
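
A sketch of the envelope computation (illustrative and hedged: the ordinary least-squares fit used here stands in for the paper's robust fit, and reading χ²(zi) in (7) as the χ²_df density is my interpretation):

```python
# 100(1 - eps)% envelope around the linear QQ fit, following (7) as read above.
import numpy as np
from scipy import stats

def qq_envelope(z, delta_sorted, df, eps=0.01):
    n = len(z)
    p = (np.arange(1, n + 1) - 0.5) / n
    b, a = np.polyfit(z, delta_sorted, 1)    # slope b and intercept of the fit
    fit = a + b * z
    se = b / stats.chi2.pdf(z, df=df) * np.sqrt(p * (1 - p) / n)   # sigma(Delta_(i))
    z_eps = stats.norm.ppf(1 - eps / 2)      # standard-normal quantile z_{eps/2}
    lower, upper = fit - z_eps * se, fit + z_eps * se
    outliers = (delta_sorted < lower) | (delta_sorted > upper)
    return outliers, lower, upper
```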

48 The choice of the confidence level ε is somewhat arbitrary, and from a conceptual viewpoint one might even argue that the problem of specifying one free parameter (i.e. [sent-104, score-0.129]

49 the expected fraction of outliers) has simply been transferred into the problem of specifying another one. [sent-106, score-0.132]

50 In practice, however, selecting ε is a much more intuitive procedure than guessing the fraction of outliers. [sent-107, score-0.107]

51 4 Model selection In our model the data are first mapped into some feature space, in which a Gaussian model is then fitted. [sent-110, score-0.239]

52 Mahalanobis distances to the mean of this Gaussian are computed by evaluating (4). [sent-111, score-0.123]

53 The feature space mapping is implicitly defined by the kernel function, for which we assume that it is parametrized by a kernel parameter σ. [sent-112, score-0.45]

54 For selecting all free parameters in (4), we are, thus, left with the problem of selecting θ = (σ, γ) . [sent-113, score-0.156]

55 For kernels that map into a space with dimension p > n, however, two problems arise: (i) the subspace spanned by the mapped samples varies with different sample sizes; (ii) the whole feature space is not accessible to vectors mapped from the input space. [sent-116, score-0.202]

56 As a consequence, it is difficult to find a “proper” normalization of the Gaussian density in the induced feature space. [sent-117, score-0.314]

57 We propose to avoid this problem by considering the likelihood in the input space rather than in the feature space, i.e. [sent-118, score-0.151]

58 we are looking for a properly normalized density model p(x|·) in Rd such that p(x|·) has the same contour lines as the Gaussian model in the feature space: p(xi|·) = p(xj|·) ⇔ p(φ(xi)|·) = p(φ(xj)|·). [sent-120, score-0.383]

59 Note that this density model in the input space has the same form as our Gaussian model in the feature space, except for the different normalization constant Z. [sent-122, score-0.352]
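
One compact way to write such a model (my reading of the construction; the paper's eqs. (4) and (8) are not reproduced in this summary, and ∆θ(x) below is shorthand for the feature-space Mahalanobis distance of eq. (4)):

$$ p_n(x \mid X_n, \theta) \;=\; \frac{1}{Z(\theta)}\,\exp\!\Big(-\tfrac{1}{2}\,\Delta_\theta(x)\Big), $$

so that the contour lines of $p_n$ in the input space coincide with those of the Gaussian model in the feature space, and only the constant $Z(\theta)$ remains to be determined.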

60 Computing this constant Z requires us to solve a normalization integral over the whole d-dimensional input space. [sent-123, score-0.094]

61 Since in general this integral is not analytically tractable for nonlinear kernel models, we propose to approximate Z by a Monte Carlo sampling method. [sent-124, score-0.176]
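
A sketch of one such Monte Carlo estimate (a standard importance-sampling scheme under stated assumptions, not necessarily the paper's exact procedure; `q_unnorm` is an assumed callable returning unnormalized density values, and the widened Gaussian proposal is an illustrative choice):

```python
# Importance-sampling estimate of Z = integral of q(x) dx over R^d.
import numpy as np

def mc_normalization(q_unnorm, X, n_samples=100_000, seed=0):
    rng = np.random.default_rng(seed)
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False) * 4.0              # widened Gaussian proposal
    samples = rng.multivariate_normal(mu, cov, size=n_samples)

    d = X.shape[1]
    diff = samples - mu
    inv_cov = np.linalg.inv(cov)
    log_prop = (-0.5 * np.einsum('ij,jk,ik->i', diff, inv_cov, diff)
                - 0.5 * (d * np.log(2 * np.pi) + np.linalg.slogdet(cov)[1]))

    weights = q_unnorm(samples) / np.exp(log_prop)   # q(x_s) / proposal(x_s)
    return weights.mean()                            # estimate of Z
```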

62 By using the CV likelihood framework we are guaranteed to (asymptotically) perform as well as the best model in the parametrized family. [sent-126, score-0.217]

63 Thus, the question arises whether the family of densities defined by a Gaussian model in a kernel-induced feature space is “rich enough” such that no systematic errors occur. [sent-127, score-0.125]

64 As σ → 0, pn(x|Xn, θ) converges to a Parzen window with vanishing kernel width: pn(x|Xn, θ) → (1/n) Σi δ(x − xi). [sent-131, score-0.451]

65 The latter, however, is “rich enough” in the sense that it contains models which in the limit σ → 0 converge to an unbiased estimator for every continuous p(x). [sent-138, score-0.125]

66 Since contour lines of pn(x) are contour lines of a Gaussian model in the feature space, the Mahalanobis distances are expected to follow a χ² distribution, and atypical objects can be detected by observing the distribution of the empirical Mahalanobis distances as described in the last section. [sent-139, score-1.181]

67 This is actually the case, since there exist decay rates for the kernel width σ such that n grows at a higher rate than the effective degrees of freedom df: Lemma 3. [sent-141, score-0.834]

68 Let k(xi, xj) = exp(−‖xi − xj‖²/σ) and pn(x|Xn, σ, γ) be defined by (8). [sent-142, score-0.288]

69 If σ ≤ 1 decays like O(n^(−1/2)), and for fixed γ ≤ 1, the ratio df/n → 0 as n → ∞. [sent-143, score-0.343]

70 5 Experiments The performance of the proposed method is demonstrated for an outlier detection task in the field of face recognition. [sent-146, score-0.167]

71 The database contains ten different images of each of 40 distinct subjects, taken under different lighting conditions and with different facial expressions and facial details (glasses / no glasses). [sent-152, score-0.135]

72 All the images are taken against a homogeneous background with the subjects in an upright, frontal position. [sent-154, score-0.104]

73 In this experiment we additionally corrupted the dataset by including two images in which we have artificially changed normal glasses to “sunglasses” as can be seen in figure 1. [sent-155, score-0.183]

74 The goal is to demonstrate that the proposed method is able to identify these two atypical images without any problem-dependent prior assumptions. [sent-156, score-0.404]

75 Figure 1: Original and corrupted images with in-painted “sunglasses”. [sent-157, score-0.114]

76 Each of the 402 images is characterized by a 10-dimensional vector which contains the projections onto the leading 10 eigenfaces (eigenfaces are simply the eigenvectors of the images treated as pixel-wise vectorial objects). [sent-158, score-0.156]
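
A sketch of this feature-extraction step (illustrative; `images` is assumed to be an array of flattened pixel vectors, one row per image):

```python
# Project images onto the leading 10 eigenfaces (eigenvectors of the
# centred image matrix, obtained here via an SVD).
import numpy as np

def eigenface_features(images, n_components=10):
    mean_face = images.mean(axis=0)
    centred = images - mean_face
    U, S, Vt = np.linalg.svd(centred, full_matrices=False)
    eigenfaces = Vt[:n_components]                   # (10, n_pixels)
    return centred @ eigenfaces.T                    # (n_images, 10) projections
```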

77 These vectors are fed into an RBF kernel of the form k(xi, xj) = exp(−‖xi − xj‖²/σ). [sent-159, score-0.313]

78 A simple 2-fold cross-validation scheme is used: the dataset is randomly split into a training set and a test set of equal size, the model is built from the training set (including the numerical solution of the normalization integral), and finally the likelihood is evaluated on the test set. [sent-161, score-0.158]
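
A sketch of this selection loop (illustrative; the helper `log_density(theta, X_train, X_test)`, which would fit the OC-KFD/Gaussian model on the training half, including the Monte Carlo normalization, and return the summed test log-likelihood, is an assumption, as is the parameter grid):

```python
# 2-fold cross-validated likelihood selection of theta = (sigma, gamma).
import numpy as np

def select_parameters(X, log_density, sigmas, gammas, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    half = len(X) // 2
    train, test = X[idx[:half]], X[idx[half:]]

    best_theta, best_ll = None, -np.inf
    for sigma in sigmas:
        for gamma in gammas:
            ll = log_density((sigma, gamma), train, test)
            if ll > best_ll:
                best_theta, best_ll = (sigma, gamma), ll
    return best_theta, best_ll
```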

79 Both the test likelihood and the corresponding model complexity measured in terms of the effective degrees of freedom (df ) are plotted in figure 2. [sent-164, score-0.424]

80 The df -curve, however, shows a similar plateau, indicating that all these models have comparable complexity. [sent-166, score-0.343]

81 This suggestion is indeed confirmed by the results in figure 2, where we compared the quantile-quantile plot for the maximum likelihood parameter value with that of a slightly suboptimal model. [sent-168, score-0.134]

82 Both quantile plots look very similar, and in both cases two objects clearly fall outside a 99% envelope around the linear fit. [sent-169, score-0.382]

83 Outside the plateau (no figure due to space limitations) the number of objects considered as outliers drastically increases in the overfitting regime (σ too small), or decreases to zero in the underfitting regime (σ too large). [sent-170, score-0.29]

84 In figure 3 again the quantile plot for the most likely model is depicted. [sent-171, score-0.283]

85 This time, however, both objects identified as outliers are related to the corresponding original images, which in fact are the artificially corrupted ones. [sent-172, score-0.439]

86 Both the effective degrees of freedom df = Σi λi/(λi + γ) and the Mahalanobis distances in eq. [sent-176, score-0.78]

87 The dotted line shows the corresponding effective degrees of freedom (df ). [sent-178, score-0.314]

88 Left + right panels: quantile plot for optimal model (left) and slightly suboptimal model (right). [sent-179, score-0.325]

89 Figure 3: Quantile plot (χ² quantiles on the x-axis) with linear fit (solid) and envelopes (99% and 99. [sent-182, score-0.203]

90 6 Conclusion Detecting outliers by way of one-class classifiers aims at finding a boundary that separates “typical” objects in a data sample from the “atypical” ones. [sent-190, score-0.5]

91 For the purpose of outlier detection, however, the availability of such prior information seems to be an unrealistic (or even contradictory) assumption. [sent-192, score-0.115]

92 The method proposed in this paper overcomes this shortcoming by using a one-class KFD classifier which is directly related to Gaussian density estimation in the induced feature space. [sent-193, score-0.357]

93 The model benefits from both the built-in classification method and the explicit parametric density model: from the former it inherits the simple complexity regulation mechanism based on only two tuning parameters. [sent-194, score-0.298]

94 Moreover, within the classification framework it is possible to quantify the model complexity in terms of the effective degrees of freedom df . [sent-195, score-0.699]

95 Since the density model is parametrized by both the kernel function and the regularization constant, it is necessary to select these free parameters before the outlier detection phase. [sent-197, score-0.635]

96 This parameter selection is achieved by observing the cross-validated likelihood for different parameter values, and choosing those parameters which maximize this quantity. [sent-198, score-0.14]

97 The theoretical motivation for this selection process follows from [13] where it has been shown that the cross-validation selector asymptotically performs as well as the so-called benchmark selector which selects the best model contained in the parametrized family of models. [sent-199, score-0.472]

98 Moreover, for RBF kernels it is shown in lemma 2 that the corresponding model family is “rich enough” in the sense that it contains an unbiased estimator for the true density (as long as it is continuous) in the limit of vanishing kernel width. [sent-200, score-0.589]

99 Lemma 3 shows that there exist decay rates for the kernel width such that the ratio of effective degrees of freedom and sample size approaches zero. [sent-201, score-0.536]

100 The experiment on detecting persons wearing sunglasses within a collection of rather heterogeneous face images effectively demonstrates that the proposed method is able to detect atypical objects without prior assumptions on the expected number of outliers. [sent-202, score-0.742]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('atypical', 0.345), ('df', 0.343), ('outliers', 0.255), ('mahalanobis', 0.238), ('quantile', 0.175), ('quantiles', 0.137), ('density', 0.137), ('kernel', 0.13), ('objects', 0.129), ('degrees', 0.128), ('distances', 0.123), ('freedom', 0.121), ('selector', 0.115), ('vegas', 0.115), ('outlier', 0.115), ('parametrized', 0.107), ('pn', 0.105), ('lda', 0.095), ('gaussian', 0.093), ('xn', 0.087), ('detecting', 0.086), ('sunglasses', 0.086), ('rbf', 0.085), ('feature', 0.083), ('deviations', 0.082), ('contour', 0.079), ('tting', 0.077), ('lemma', 0.077), ('discriminant', 0.074), ('xi', 0.071), ('boundary', 0.071), ('glasses', 0.069), ('plateau', 0.069), ('likelihood', 0.068), ('plot', 0.066), ('effective', 0.065), ('unbiased', 0.063), ('dence', 0.063), ('estimator', 0.062), ('denoting', 0.06), ('pi', 0.059), ('images', 0.059), ('unpenalized', 0.058), ('cv', 0.057), ('ridge', 0.057), ('asymptotically', 0.057), ('xj', 0.056), ('rich', 0.055), ('fraction', 0.055), ('corrupted', 0.055), ('fisher', 0.054), ('classi', 0.054), ('estimation', 0.053), ('free', 0.052), ('detection', 0.052), ('selecting', 0.052), ('separating', 0.052), ('intervals', 0.05), ('kfd', 0.05), ('formal', 0.049), ('normalization', 0.048), ('width', 0.047), ('regime', 0.046), ('induced', 0.046), ('muller', 0.046), ('parzen', 0.046), ('hardly', 0.046), ('ingredients', 0.046), ('integral', 0.046), ('sample', 0.045), ('subjects', 0.045), ('estimating', 0.043), ('solla', 0.043), ('zurich', 0.043), ('inherits', 0.043), ('orthogonal', 0.042), ('model', 0.042), ('penalized', 0.04), ('specifying', 0.04), ('envelope', 0.04), ('unexpected', 0.04), ('vanishing', 0.04), ('overcome', 0.04), ('shortcoming', 0.038), ('eigenfaces', 0.038), ('mika', 0.038), ('facial', 0.038), ('regulation', 0.038), ('kernels', 0.038), ('mechanism', 0.038), ('around', 0.038), ('relation', 0.037), ('er', 0.037), ('expected', 0.037), ('gure', 0.037), ('cially', 0.037), ('conceptual', 0.037), ('mapped', 0.036), ('selection', 0.036), ('observing', 0.036)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000002 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants

Author: Volker Roth

Abstract: The problem of detecting “atypical objects” or “outliers” is one of the classical topics in (robust) statistics. Recently, it has been proposed to address this problem by means of one-class SVM classifiers. The main conceptual shortcoming of most one-class approaches, however, is that in a strict sense they are unable to detect outliers, since the expected fraction of outliers has to be specified in advance. The method presented in this paper overcomes this problem by relating kernelized one-class classification to Gaussian density estimation in the induced feature space. Having established this relation, it is possible to identify “atypical objects” by quantifying their deviations from the Gaussian model. For RBF kernels it is shown that the Gaussian model is “rich enough” in the sense that it asymptotically provides an unbiased estimator for the true density. In order to overcome the inherent model selection problem, a cross-validated likelihood criterion for selecting all free model parameters is applied. 1

2 0.15272905 92 nips-2004-Kernel Methods for Implicit Surface Modeling

Author: Joachim Giesen, Simon Spalinger, Bernhard Schölkopf

Abstract: We describe methods for computing an implicit model of a hypersurface that is given only by a finite sampling. The methods work by mapping the sample points into a reproducing kernel Hilbert space and then determining regions in terms of hyperplanes. 1

3 0.13683605 168 nips-2004-Semigroup Kernels on Finite Sets

Author: Marco Cuturi, Jean-philippe Vert

Abstract: Complex objects can often be conveniently represented by finite sets of simpler components, such as images by sets of patches or texts by bags of words. We study the class of positive definite (p.d.) kernels for two such objects that can be expressed as a function of the merger of their respective sets of components. We prove a general integral representation of such kernels and present two particular examples. One of them leads to a kernel for sets of points living in a space endowed itself with a positive definite kernel. We provide experimental results on a benchmark experiment of handwritten digits image classification which illustrate the validity of the approach. 1

4 0.13482465 188 nips-2004-The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space

Author: Robert Jenssen, Deniz Erdogmus, Jose Principe, Torbjørn Eltoft

Abstract: A new distance measure between probability density functions (pdfs) is introduced, which we refer to as the Laplacian pdf distance. The Laplacian pdf distance exhibits a remarkable connection to Mercer kernel based learning theory via the Parzen window technique for density estimation. In a kernel feature space defined by the eigenspectrum of the Laplacian data matrix, this pdf distance is shown to measure the cosine of the angle between cluster mean vectors. The Laplacian data matrix, and hence its eigenspectrum, can be obtained automatically based on the data at hand, by optimal Parzen window selection. We show that the Laplacian pdf distance has an interesting interpretation as a risk function connected to the probability of error. 1

5 0.12399817 59 nips-2004-Efficient Kernel Discriminant Analysis via QR Decomposition

Author: Tao Xiong, Jieping Ye, Qi Li, Ravi Janardan, Vladimir Cherkassky

Abstract: Linear Discriminant Analysis (LDA) is a well-known method for feature extraction and dimension reduction. It has been used widely in many applications such as face recognition. Recently, a novel LDA algorithm based on QR Decomposition, namely LDA/QR, has been proposed, which is competitive in terms of classification accuracy with other LDA algorithms, but it has much lower costs in time and space. However, LDA/QR is based on linear projection, which may not be suitable for data with nonlinear structure. This paper first proposes an algorithm called KDA/QR, which extends the LDA/QR algorithm to deal with nonlinear data by using the kernel operator. Then an efficient approximation of KDA/QR called AKDA/QR is proposed. Experiments on face image data show that the classification accuracy of both KDA/QR and AKDA/QR are competitive with Generalized Discriminant Analysis (GDA), a general kernel discriminant analysis algorithm, while AKDA/QR has much lower time and space costs. 1

6 0.11907038 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning

7 0.10775658 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics

8 0.10729741 178 nips-2004-Support Vector Classification with Input Data Uncertainty

9 0.10435312 197 nips-2004-Two-Dimensional Linear Discriminant Analysis

10 0.096849382 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition

11 0.095792957 127 nips-2004-Neighbourhood Components Analysis

12 0.095349103 98 nips-2004-Learning Gaussian Process Kernels via Hierarchical Bayes

13 0.093552627 187 nips-2004-The Entire Regularization Path for the Support Vector Machine

14 0.093218304 9 nips-2004-A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning

15 0.089261323 105 nips-2004-Log-concavity Results on Gaussian Process Methods for Supervised and Unsupervised Learning

16 0.088203095 62 nips-2004-Euclidean Embedding of Co-Occurrence Data

17 0.087175943 164 nips-2004-Semi-supervised Learning by Entropy Minimization

18 0.087027572 145 nips-2004-Parametric Embedding for Class Visualization

19 0.085799545 125 nips-2004-Multiple Relational Embedding

20 0.085737087 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.297), (1, 0.105), (2, -0.093), (3, 0.043), (4, -0.008), (5, 0.018), (6, -0.078), (7, -0.112), (8, 0.091), (9, 0.029), (10, 0.011), (11, -0.01), (12, 0.046), (13, -0.024), (14, -0.009), (15, -0.008), (16, -0.069), (17, -0.014), (18, 0.075), (19, 0.069), (20, -0.083), (21, -0.027), (22, 0.106), (23, 0.086), (24, 0.043), (25, -0.093), (26, 0.016), (27, 0.056), (28, -0.098), (29, 0.113), (30, -0.076), (31, -0.052), (32, -0.039), (33, -0.021), (34, -0.047), (35, -0.004), (36, 0.078), (37, -0.042), (38, -0.06), (39, 0.025), (40, -0.02), (41, -0.048), (42, 0.044), (43, 0.076), (44, 0.016), (45, 0.079), (46, -0.006), (47, -0.04), (48, 0.181), (49, 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94941622 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants

Author: Volker Roth

Abstract: The problem of detecting “atypical objects” or “outliers” is one of the classical topics in (robust) statistics. Recently, it has been proposed to address this problem by means of one-class SVM classifiers. The main conceptual shortcoming of most one-class approaches, however, is that in a strict sense they are unable to detect outliers, since the expected fraction of outliers has to be specified in advance. The method presented in this paper overcomes this problem by relating kernelized one-class classification to Gaussian density estimation in the induced feature space. Having established this relation, it is possible to identify “atypical objects” by quantifying their deviations from the Gaussian model. For RBF kernels it is shown that the Gaussian model is “rich enough” in the sense that it asymptotically provides an unbiased estimator for the true density. In order to overcome the inherent model selection problem, a cross-validated likelihood criterion for selecting all free model parameters is applied. 1

2 0.65773952 127 nips-2004-Neighbourhood Components Analysis

Author: Jacob Goldberger, Geoffrey E. Hinton, Sam T. Roweis, Ruslan Salakhutdinov

Abstract: In this paper we propose a novel method for learning a Mahalanobis distance measure to be used in the KNN classification algorithm. The algorithm directly maximizes a stochastic variant of the leave-one-out KNN score on the training set. It can also learn a low-dimensional linear embedding of labeled data that can be used for data visualization and fast classification. Unlike other methods, our classification model is non-parametric, making no assumptions about the shape of the class distributions or the boundaries between them. The performance of the method is demonstrated on several data sets, both for metric learning and linear dimensionality reduction. 1

3 0.6572715 188 nips-2004-The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space

Author: Robert Jenssen, Deniz Erdogmus, Jose Principe, Torbjørn Eltoft

Abstract: A new distance measure between probability density functions (pdfs) is introduced, which we refer to as the Laplacian pdf distance. The Laplacian pdf distance exhibits a remarkable connection to Mercer kernel based learning theory via the Parzen window technique for density estimation. In a kernel feature space defined by the eigenspectrum of the Laplacian data matrix, this pdf distance is shown to measure the cosine of the angle between cluster mean vectors. The Laplacian data matrix, and hence its eigenspectrum, can be obtained automatically based on the data at hand, by optimal Parzen window selection. We show that the Laplacian pdf distance has an interesting interpretation as a risk function connected to the probability of error. 1

4 0.6422019 168 nips-2004-Semigroup Kernels on Finite Sets

Author: Marco Cuturi, Jean-philippe Vert

Abstract: Complex objects can often be conveniently represented by finite sets of simpler components, such as images by sets of patches or texts by bags of words. We study the class of positive definite (p.d.) kernels for two such objects that can be expressed as a function of the merger of their respective sets of components. We prove a general integral representation of such kernels and present two particular examples. One of them leads to a kernel for sets of points living in a space endowed itself with a positive definite kernel. We provide experimental results on a benchmark experiment of handwritten digits image classification which illustrate the validity of the approach. 1

5 0.63728511 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition

Author: Laurent Zwald, Gilles Blanchard, Pascal Massart, Régis Vert

Abstract: This paper investigates the effect of Kernel Principal Component Analysis (KPCA) within the classification framework, essentially the regularization properties of this dimensionality reduction method. KPCA has been previously used as a pre-processing step before applying an SVM but we point out that this method is somewhat redundant from a regularization point of view and we propose a new algorithm called Kernel Projection Machine to avoid this redundancy, based on an analogy with the statistical framework of regression for a Gaussian white noise model. Preliminary experimental results show that this algorithm reaches the same performances as an SVM. 1

6 0.63419592 59 nips-2004-Efficient Kernel Discriminant Analysis via QR Decomposition

7 0.55050468 94 nips-2004-Kernels for Multi--task Learning

8 0.54308099 92 nips-2004-Kernel Methods for Implicit Surface Modeling

9 0.54181594 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics

10 0.53031737 105 nips-2004-Log-concavity Results on Gaussian Process Methods for Supervised and Unsupervised Learning

11 0.51312184 201 nips-2004-Using the Equivalent Kernel to Understand Gaussian Process Regression

12 0.51158684 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning

13 0.50859684 136 nips-2004-On Semi-Supervised Classification

14 0.49748981 125 nips-2004-Multiple Relational Embedding

15 0.49137321 98 nips-2004-Learning Gaussian Process Kernels via Hierarchical Bayes

16 0.48994434 60 nips-2004-Efficient Kernel Machines Using the Improved Fast Gauss Transform

17 0.47638783 11 nips-2004-A Second Order Cone programming Formulation for Classifying Missing Data

18 0.4763388 145 nips-2004-Parametric Embedding for Class Visualization

19 0.47532466 14 nips-2004-A Topographic Support Vector Machine: Classification Using Local Label Configurations

20 0.46844822 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(13, 0.122), (15, 0.166), (26, 0.066), (31, 0.038), (33, 0.168), (35, 0.027), (39, 0.025), (50, 0.051), (76, 0.252)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.96454251 132 nips-2004-Nonlinear Blind Source Separation by Integrating Independent Component Analysis and Slow Feature Analysis

Author: Tobias Blaschke, Laurenz Wiskott

Abstract: In contrast to the equivalence of linear blind source separation and linear independent component analysis it is not possible to recover the original source signal from some unknown nonlinear transformations of the sources using only the independence assumption. Integrating the objectives of statistical independence and temporal slowness removes this indeterminacy leading to a new method for nonlinear blind source separation. The principle of temporal slowness is adopted from slow feature analysis, an unsupervised method to extract slowly varying features from a given observed vectorial signal. The performance of the algorithm is demonstrated on nonlinearly mixed speech data. 1

2 0.93152648 30 nips-2004-Binet-Cauchy Kernels

Author: Alex J. Smola, S.v.n. Vishwanathan

Abstract: We propose a family of kernels based on the Binet-Cauchy theorem and its extension to Fredholm operators. This includes as special cases all currently known kernels derived from the behavioral framework, diffusion processes, marginalized kernels, kernels on graphs, and the kernels on sets arising from the subspace angle approach. Many of these kernels can be seen as the extrema of a new continuum of kernel functions, which leads to numerous new special cases. As an application, we apply the new class of kernels to the problem of clustering of video sequences with encouraging results. 1

same-paper 3 0.87440228 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants

Author: Volker Roth

Abstract: The problem of detecting “atypical objects” or “outliers” is one of the classical topics in (robust) statistics. Recently, it has been proposed to address this problem by means of one-class SVM classifiers. The main conceptual shortcoming of most one-class approaches, however, is that in a strict sense they are unable to detect outliers, since the expected fraction of outliers has to be specified in advance. The method presented in this paper overcomes this problem by relating kernelized one-class classification to Gaussian density estimation in the induced feature space. Having established this relation, it is possible to identify “atypical objects” by quantifying their deviations from the Gaussian model. For RBF kernels it is shown that the Gaussian model is “rich enough” in the sense that it asymptotically provides an unbiased estimator for the true density. In order to overcome the inherent model selection problem, a cross-validated likelihood criterion for selecting all free model parameters is applied. 1

4 0.8511138 60 nips-2004-Efficient Kernel Machines Using the Improved Fast Gauss Transform

Author: Changjiang Yang, Ramani Duraiswami, Larry S. Davis

Abstract: The computation and memory required for kernel machines with N training samples is at least O(N 2 ). Such a complexity is significant even for moderate size problems and is prohibitive for large datasets. We present an approximation technique based on the improved fast Gauss transform to reduce the computation to O(N ). We also give an error bound for the approximation, and provide experimental results on the UCI datasets. 1

5 0.78341794 104 nips-2004-Linear Multilayer Independent Component Analysis for Large Natural Scenes

Author: Yoshitatsu Matsuda, Kazunori Yamaguchi

Abstract: In this paper, linear multilayer ICA (LMICA) is proposed for extracting independent components from quite high-dimensional observed signals such as large-size natural scenes. There are two phases in each layer of LMICA. One is the mapping phase, where a one-dimensional mapping is formed by a stochastic gradient algorithm which makes more highlycorrelated (non-independent) signals be nearer incrementally. Another is the local-ICA phase, where each neighbor (namely, highly-correlated) pair of signals in the mapping is separated by the MaxKurt algorithm. Because LMICA separates only the highly-correlated pairs instead of all ones, it can extract independent components quite efficiently from appropriate observed signals. In addition, it is proved that LMICA always converges. Some numerical experiments verify that LMICA is quite efficient and effective in large-size natural image processing.

6 0.75918895 5 nips-2004-A Harmonic Excitation State-Space Approach to Blind Separation of Speech

7 0.73767561 25 nips-2004-Assignment of Multiplicative Mixtures in Natural Images

8 0.73554099 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees

9 0.73490667 181 nips-2004-Synergies between Intrinsic and Synaptic Plasticity in Individual Model Neurons

10 0.73449725 131 nips-2004-Non-Local Manifold Tangent Learning

11 0.73425305 16 nips-2004-Adaptive Discriminative Generative Model and Its Applications

12 0.73086804 168 nips-2004-Semigroup Kernels on Finite Sets

13 0.72981602 178 nips-2004-Support Vector Classification with Input Data Uncertainty

14 0.72918665 31 nips-2004-Blind One-microphone Speech Separation: A Spectral Learning Approach

15 0.72901928 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition

16 0.72705865 177 nips-2004-Supervised Graph Inference

17 0.72470272 4 nips-2004-A Generalized Bradley-Terry Model: From Group Competition to Individual Skill

18 0.72280318 28 nips-2004-Bayesian inference in spiking neurons

19 0.72242928 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection

20 0.72179329 124 nips-2004-Multiple Alignment of Continuous Time Series