nips nips2004 nips2004-59 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tao Xiong, Jieping Ye, Qi Li, Ravi Janardan, Vladimir Cherkassky
Abstract: Linear Discriminant Analysis (LDA) is a well-known method for feature extraction and dimension reduction. It has been used widely in many applications such as face recognition. Recently, a novel LDA algorithm based on QR Decomposition, namely LDA/QR, has been proposed, which is competitive in terms of classification accuracy with other LDA algorithms, but it has much lower costs in time and space. However, LDA/QR is based on linear projection, which may not be suitable for data with nonlinear structure. This paper first proposes an algorithm called KDA/QR, which extends the LDA/QR algorithm to deal with nonlinear data by using the kernel operator. Then an efficient approximation of KDA/QR called AKDA/QR is proposed. Experiments on face image data show that the classification accuracy of both KDA/QR and AKDA/QR is competitive with Generalized Discriminant Analysis (GDA), a general kernel discriminant analysis algorithm, while AKDA/QR has much lower time and space costs. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Linear Discriminant Analysis (LDA) is a well-known method for feature extraction and dimension reduction. [sent-11, score-0.112]
2 It has been used widely in many applications such as face recognition. [sent-12, score-0.131]
3 Recently, a novel LDA algorithm based on QR Decomposition, namely LDA/QR, has been proposed, which is competitive in terms of classification accuracy with other LDA algorithms, but it has much lower costs in time and space. [sent-13, score-0.192]
4 This paper first proposes an algorithm called KDA/QR, which extends the LDA/QR algorithm to deal with nonlinear data by using the kernel operator. [sent-15, score-0.286]
5 Experiments on face image data show that the classification accuracy of both KDA/QR and AKDA/QR is competitive with Generalized Discriminant Analysis (GDA), a general kernel discriminant analysis algorithm, while AKDA/QR has much lower time and space costs. [sent-17, score-0.741]
6 It has been used widely in many applications such as face recognition [2]. [sent-19, score-0.17]
7 Classical LDA aims to find an optimal transformation by minimizing the within-class distance and maximizing the between-class distance simultaneously, thus achieving maximum discrimination. [sent-20, score-0.072]
8 The optimal transformation can be readily obtained by computing the eigen-decomposition of the scatter matrices. [sent-21, score-0.216]
9 To deal with such a limitation, nonlinear extensions through kernel functions have been proposed. [sent-23, score-0.224]
10 The main idea of kernel-based methods is to map the input data to a feature space through a nonlinear mapping, where the inner products in the feature space can be computed by a kernel function without knowing the nonlinear mapping explicitly [9]. [sent-24, score-0.433]
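As a small illustration of this kernel trick, the sketch below (not from the paper; the function name and the Gaussian form are our own choices) computes all pairwise feature-space inner products directly from a kernel function, without ever constructing the nonlinear mapping:

```python
import numpy as np

def gaussian_kernel_matrix(A, sigma):
    """K[i, j] = <Phi(a_i), Phi(a_j)> = exp(-||a_i - a_j||^2 / sigma) for a d x n data
    matrix A whose columns are samples; the mapping Phi itself is never formed."""
    sq_norms = np.sum(A * A, axis=0)                                    # ||a_i||^2 per column
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (A.T @ A)  # ||a_i - a_j||^2
    return np.exp(-np.maximum(sq_dists, 0.0) / sigma)                   # clip round-off negatives
```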
11 To our knowledge, there are few efficient algorithms for general kernel-based discriminant analysis — most known algorithms effectively scale as O(n^3), where n is the sample size. [sent-26, score-0.49]
12 Moreover, experiments in [11, 12] show that the classification accuracy of LDA/QR is competitive with other LDA algorithms. [sent-35, score-0.128]
13 In this paper, we first propose an algorithm, namely KDA/QR, which is a nonlinear extension of LDA/QR. [sent-36, score-0.093]
14 Since KDA/QR involves the whole kernel matrix, which is not scalable for large datasets, we also propose an approximation of KDA/QR, namely AKDA/QR. [sent-37, score-0.197]
15 A distinct property of AKDA/QR is that it scales as O(ndc), where n is the size of the data, d is the dimension of the data, and c is the number of classes. [sent-38, score-0.091]
16 We apply the proposed algorithms to face image datasets and compare them with LDA/QR and with Generalized Discriminant Analysis (GDA) [1], a general method for kernel discriminant analysis. [sent-39, score-0.642]
17 Experiments show that: (1) AKDA/QR is competitive with KDA/QR and GDA in classification; (2) both KDA/QR and AKDA/QR outperform LDA/QR in classification; and (3) AKDA/QR has much lower costs in time and space than GDA. [sent-40, score-0.129]
18 The first stage maximizes the separation between different classes via QR Decomposition [4]. [sent-43, score-0.124]
19 The second stage addresses the issue of minimizing the within-class distance, while maintaining low time/space complexity. [sent-44, score-0.124]
20 Let A ∈ ℝ^{d×n} be the data matrix, where each column a_i is a vector in d-dimensional space. [sent-45, score-0.139]
21 Assume A is partitioned into c classes {Π_i}_{i=1}^c, and the size of the i-th class is |Π_i| = n_i. [sent-46, score-0.187]
22 The first stage of LDA/QR aims to solve the following optimization problem: G = arg max_{G: G^t G = I} trace(G^t S_b G). [sent-49, score-0.207]
23 The solution can also be obtained through QR decomposition of the centroid matrix C [12], where C = [m_1, m_2, ..., m_c] consists of the c class centroids. [sent-52, score-0.286]
24 B ← Y^t Y; /*Reduced between-class scatter matrix*/ [sent-58, score-0.139]
25 Compute the c eigenvectors φ_i of (T + μI_c)^{-1} B with decreasing eigenvalues; [sent-60, score-0.07]
26 Then G = QV, for any orthogonal matrix V, solves the optimization problem in Eq. [sent-63, score-0.162]
27 Note that the choice of the orthogonal matrix V is arbitrary, since trace(G^t S_b G) = trace(V^t G^t S_b G V) for any orthogonal matrix V. [sent-65, score-0.23]
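A minimal NumPy sketch of this first stage, assuming the data layout defined above (columns of A are samples); the function and variable names are our own:

```python
import numpy as np

def lda_qr_stage1(A, labels):
    """Stage 1 of LDA/QR (sketch): economy-size QR of the d x c centroid matrix."""
    classes = np.unique(labels)
    # Centroid matrix C = [m_1, ..., m_c], one column per class.
    C = np.column_stack([A[:, labels == k].mean(axis=1) for k in classes])
    # Q (d x c) has orthonormal columns spanning range(C); R is upper triangular.
    Q, R = np.linalg.qr(C)
    return Q
```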
28 The second stage of LDA/QR refines the first stage by addressing the issue of minimizing the within-class distance. [sent-66, score-0.248]
29 It incorporates the within-class scatter information by applying a relaxation scheme on V (relaxing V from an orthogonal matrix to an arbitrary matrix). [sent-67, score-0.254]
30 In the second stage of LDA/QR, we look for a transformation matrix G such that G = QV , for some V . [sent-68, score-0.233]
31 Since G^t S_b G = V^t (Q^t S_b Q) V, G^t S_w G = V^t (Q^t S_w Q) V, and G^t S_t G = V^t (Q^t S_t Q) V, the original problem of finding the optimal G is equivalent to finding V, with B = Q^t S_b Q, W = Q^t S_w Q, and T = Q^t S_t Q as the “reduced” between-class, within-class and total scatter matrices, respectively. [sent-71, score-0.139]
32 Note that B has much smaller size than the original scatter matrix S_b (similarly for W and T). [sent-72, score-0.245]
33 The optimal V can be computed efficiently using many existing LDA-based methods, since we are dealing with matrices B, W, and T of size c by c. [sent-73, score-0.105]
34 We can compute the optimal V by simply applying regularized LDA; that is, we compute V by solving a small eigenvalue problem on (W + μI_c)^{-1} B or (T + μI_c)^{-1} B (note that T = B + W), for some positive constant μ [3]. [sent-74, score-0.091]
35 We use the total scatter instead of the within-class scatter in Lines 4, 6, and 7, mainly for convenience of presentation of the kernel methods in Section 3 and Section 4. [sent-76, score-0.442]
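Continuing the stage-1 sketch above, here is a hedged sketch of the second stage: form the reduced c x c scatter matrices through Q and solve the small regularized eigenvalue problem on (T + μI_c)^{-1} B. The names and the default value of μ are our own assumptions.

```python
import numpy as np

def lda_qr_stage2(A, labels, Q, mu=1e-3):
    """Stage 2 of LDA/QR (sketch): relax V from orthogonal to arbitrary via regularized LDA."""
    classes = np.unique(labels)
    m = A.mean(axis=1, keepdims=True)                     # global centroid
    # H_b has columns sqrt(n_k)(m_k - m), so S_b = H_b H_b^t; H_t = A - m e^t, so S_t = H_t H_t^t.
    Hb = np.column_stack([np.sqrt((labels == k).sum()) *
                          (A[:, labels == k].mean(axis=1) - m.ravel()) for k in classes])
    Ht = A - m
    Y = Hb.T @ Q                                          # B = Y^t Y = Q^t S_b Q
    Z = Ht.T @ Q                                          # T = Z^t Z = Q^t S_t Q
    B, T = Y.T @ Y, Z.T @ Z
    # c eigenvectors of (T + mu I_c)^{-1} B, ordered by decreasing eigenvalue.
    evals, evecs = np.linalg.eig(np.linalg.solve(T + mu * np.eye(T.shape[0]), B))
    V = np.real(evecs[:, np.argsort(-np.real(evals))])
    return Q @ V                                          # final transformation G = Q V
```

A new sample a is then reduced to c dimensions via G^t a.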
36 3 Kernel discriminant analysis via QR-decomposition (KDA/QR) In this section, the KDA/QR algorithm, a nonlinear extension of LDA/QR through kernel functions, is presented. [sent-77, score-0.501]
37 Let Φ be a mapping to the feature space and Φ(A) be the data matrix in the feature space. [sent-78, score-0.181]
38 Then, the centroid matrix C^Φ in the feature space is C^Φ = [m_1^Φ, ..., m_c^Φ] = [(1/n_1) Σ_{i∈Π_1} Φ(a_i), ..., (1/n_c) Σ_{i∈Π_c} Φ(a_i)]. (2) [sent-79, score-0.388]
39 The global centroid in the feature space can be computed as m^Φ = (1/n) Σ_{i=1}^c n_i m_i^Φ. [sent-80, score-0.331]
40 To maximize the between-class distance in the feature space, as discussed in Section 2, we perform QR decomposition on C^Φ, i.e., C^Φ = Q^Φ R^Φ. [sent-81, score-0.169]
41 A key observation is that R^Φ can be obtained, via (C^Φ)^t C^Φ = (R^Φ)^t R^Φ, by applying the Cholesky decomposition to (C^Φ)^t C^Φ [4]. [sent-84, score-0.156]
42 Here C^Φ = Φ(A) M, where Φ(A) = [Φ(a_1), ..., Φ(a_n)] and the i-th column of M is (0, ..., 0, 1/n_i, ..., 1/n_i, 0, ..., 0)^t. [sent-88, score-0.201]
43 Let K be the kernel matrix with K(i, j) = ⟨Φ(a_i), Φ(a_j)⟩. [sent-89, score-0.164]
44 Compute the c eigenvectors φ_i^Φ of (T^Φ + μI_c)^{-1} B^Φ, with decreasing eigenvalues; [sent-100, score-0.07]
45 The matrices Y^Φ, Z^Φ, B^Φ, and W^Φ in the feature space (corresponding to the second stage in LDA/QR) can be computed as follows. [sent-103, score-0.25]
46 In the feature space, we have H_b^Φ = C^Φ N, where the i-th column of N is (0, ..., √n_i, ..., 0)^t − (√n_i/n)(n_1, ..., n_c)^t. [sent-104, score-0.359]
47 We proceed by computing the c eigenvectors {φ_i^Φ}_{i=1}^c of (T^Φ + μI_c)^{-1} B^Φ. [sent-108, score-0.068]
48 The final transformation matrix can be computed as G^Φ = Q^Φ V^Φ = C^Φ (R^Φ)^{-1} V^Φ. [sent-110, score-0.15]
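Assembling the steps of this section, the following is a hedged NumPy sketch of the core KDA/QR computation given a precomputed kernel matrix K (e.g., from the Gaussian-kernel sketch earlier); the function signature, the default μ, and the returned coefficient matrix are our own choices, not the paper's interface.

```python
import numpy as np

def kda_qr(K, labels, mu=1e-3):
    """Sketch of KDA/QR; K is the n x n kernel matrix with K[i, j] = <Phi(a_i), Phi(a_j)>."""
    n = K.shape[0]
    classes = np.unique(labels)
    c = len(classes)
    counts = np.array([(labels == k).sum() for k in classes], dtype=float)
    M = np.zeros((n, c))                      # column j averages class j, so C^Phi = Phi(A) M
    for j, k in enumerate(classes):
        M[labels == k, j] = 1.0 / counts[j]
    G_cent = M.T @ K @ M                      # (C^Phi)^t C^Phi, a c x c matrix
    R = np.linalg.cholesky(G_cent).T          # upper-triangular R^Phi with G_cent = R^t R
    Rinv = np.linalg.inv(R)
    KM = K @ M
    # Z^Phi = (I - (1/n) e e^t) K M R^{-1}, computed in O(n c^2) without the centering matrix.
    Z = (KM - KM.mean(axis=0, keepdims=True)) @ Rinv
    # N has i-th column (0, ..., sqrt(n_i), ..., 0)^t - (sqrt(n_i)/n)(n_1, ..., n_c)^t,
    # so H_b^Phi = C^Phi N and Y^Phi = N^t (C^Phi)^t C^Phi R^{-1}.
    N = np.diag(np.sqrt(counts)) - np.outer(counts, np.sqrt(counts)) / n
    Y = N.T @ G_cent @ Rinv
    B, T = Y.T @ Y, Z.T @ Z                   # reduced between-class and total scatter
    evals, evecs = np.linalg.eig(np.linalg.solve(T + mu * np.eye(c), B))
    V = np.real(evecs[:, np.argsort(-np.real(evals))])
    return M @ Rinv @ V                       # coefficient matrix; see note below
```

With coef = kda_qr(K, labels), the training data are projected as K @ coef, and a new point x as [K(a_1, x), ..., K(a_n, x)] @ coef, since G^Φ = C^Φ (R^Φ)^{-1} V^Φ = Φ(A) M (R^Φ)^{-1} V^Φ.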
49 3.1 Complexity analysis of KDA/QR. The cost to form the kernel matrix in Line 1 is O(n^2 d). [sent-114, score-0.272]
50 The Cholesky decomposition in Line 3 takes O(c^3) [4]. [sent-116, score-0.162]
51 Line 4 takes O(c^3), as M^t K M is already computed in Line 2. [sent-117, score-0.088]
52 In Line 5, the computation of Z^Φ = E^t K M (R^Φ)^{-1} = (I − (1/n) e e^t) K M (R^Φ)^{-1} = K M (R^Φ)^{-1} − (1/n) e (e^t K M) (R^Φ)^{-1}, in the given order, takes O(nc^2), assuming K M is kept from Line 2. [sent-118, score-0.092]
53 Hence, the total complexity of the kernel LDA/QR algorithm is O(n^2 d). [sent-120, score-0.195]
54 Omitting the cost of evaluating the kernel matrix K, which is required in all kernel-based algorithms, the total cost is O(n^2). [sent-121, score-0.237]
55 Note that all other general discriminant analysis algorithms scale as O(n^3). [sent-122, score-0.305]
56 Note that the bottleneck of KDA/QR is the explicit formation of the large kernel matrix K for the computation of (C^Φ)^t C^Φ in Line 2 of Algorithm 2. [sent-124, score-0.237]
57 Mathematically, the optimal x*_j can be computed by solving the following optimization problem: min_{x_j ∈ ℝ^d} ||Φ(x_j) − (1/n_j) Σ_{i∈Π_j} Φ(a_i)||^2, for j = 1, ..., c. (5) [sent-128, score-0.183]
58 Consider the Gaussian kernel function exp(−||x − y||^2/σ), where σ is the bandwidth parameter. [sent-133, score-0.164]
59 The optimization problem in (5) is convex if, for each j = 1, ..., c and for all i ∈ Π_j, (2/σ)||x_j − a_i||^2 ≤ 1. (6) Proof. [sent-134, score-0.186]
60 It is easy to check that, for the Gaussian kernel, the optimization problem in (5) reduces to min_{x_j ∈ ℝ^d} f(x_j), for j = 1, ..., c, (7) where f(x) = Σ_{i∈Π_j} f_i(x) and f_i(x) = −exp(−||x − a_i||^2/σ). [sent-135, score-0.446]
61 The Hessian matrix of f_i(x) is H(f_i) = (2/σ) exp(−||x − a_i||^2/σ) (I − (2/σ)(x − a_i)(x − a_i)^t). [sent-136, score-0.49]
62 It is easy to show that if (2/σ)||x − a_i||^2 ≤ 1 for all i ∈ Π_j, then H(f_i) is positive semi-definite, that is, f_i(x) is convex. [sent-137, score-0.212]
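To fill in the omitted step, here is a short derivation in our notation of why this condition suffices:

```latex
H(f_i) = \tfrac{2}{\sigma}\, e^{-\|x-a_i\|^2/\sigma}
         \Big(I - \tfrac{2}{\sigma}(x-a_i)(x-a_i)^t\Big),
\qquad
\operatorname{eig}\!\Big(I - \tfrac{2}{\sigma}(x-a_i)(x-a_i)^t\Big)
  = \Big\{\underbrace{1,\dots,1}_{d-1},\; 1 - \tfrac{2}{\sigma}\|x-a_i\|^2\Big\}.
```

Hence H(f_i) ⪰ 0 exactly when (2/σ)||x − a_i||^2 ≤ 1, and f(x) = Σ_{i∈Π_j} f_i(x), a sum of convex functions, is then convex.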
63 For applications involving high-dimensional data, such as face recognition, σ is usually large (typically ranging from thousands to hundreds of thousands [13]), and the condition in Lemma 4. [sent-139, score-0.187]
64 A key observation is that, for relatively large σ, the centroid of each class in the original space will map very close to the centroid in the feature space [9], which can serve as an approximate solution of the optimization problem in (7). [sent-142, score-0.447]
65 Experiments show that choosing x*_j = (1/n_j) Σ_{i∈Π_j} a_i produces results close to those obtained by solving the optimization problem in (7). [sent-143, score-0.24]
66 With this choice of x*_j for j = 1, ..., c, the centroid matrix C^Φ can be approximated by Ĉ^Φ ≈ [Φ(x*_1), ..., Φ(x*_c)]. [sent-148, score-0.162]
67 Compute x*_j = (1/n_j) Σ_{i∈Π_j} a_i, for j = 1, ..., c; [sent-152, score-0.070]
68 Compute the c eigenvectors φ̂_i^Φ of (T̂^Φ + μI_c)^{-1} B̂^Φ, with decreasing eigenvalues; [sent-160, score-0.07]
69 The Cholesky decomposition of K̂ gives us R̂^Φ via K̂ = (R̂^Φ)^t R̂^Φ. [sent-164, score-0.115]
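Below is a hedged sketch of AKDA/QR with a Gaussian kernel, following our reading of the steps above: the input-space class means play the role of x*_j, the c x c matrix K̂ replaces the full kernel matrix, and only the n x c block of kernel values between the samples and the c approximate centroids is formed, which is where the O(ndc) cost comes from. All names are our own.

```python
import numpy as np

def akda_qr(A, labels, sigma, mu=1e-3):
    """Sketch of AKDA/QR with a Gaussian kernel; A is d x n with samples as columns."""
    classes = np.unique(labels)
    n, c = A.shape[1], len(classes)
    counts = np.array([(labels == k).sum() for k in classes], dtype=float)
    X = np.column_stack([A[:, labels == k].mean(axis=1) for k in classes])  # x*_j as class means

    def gauss(U, W):                           # Gaussian kernel block between two column sets
        sq = (U * U).sum(0)[:, None] + (W * W).sum(0)[None, :] - 2.0 * (U.T @ W)
        return np.exp(-np.maximum(sq, 0.0) / sigma)

    K_hat = gauss(X, X)                        # c x c: (C_hat^Phi)^t C_hat^Phi
    K_nc = gauss(A, X)                         # n x c block, the O(ndc) step
    R = np.linalg.cholesky(K_hat).T            # K_hat = R^t R
    Rinv = np.linalg.inv(R)
    Z = (K_nc - K_nc.mean(axis=0, keepdims=True)) @ Rinv    # analogue of Z^Phi
    M = np.zeros((n, c))                       # C^Phi = Phi(A) M, as in KDA/QR
    for j, k in enumerate(classes):
        M[labels == k, j] = 1.0 / counts[j]
    N = np.diag(np.sqrt(counts)) - np.outer(counts, np.sqrt(counts)) / n
    Y = N.T @ (M.T @ K_nc) @ Rinv              # our reading: H_b^Phi projected onto Q_hat^Phi
    B, T = Y.T @ Y, Z.T @ Z
    evals, evecs = np.linalg.eig(np.linalg.solve(T + mu * np.eye(c), B))
    V = np.real(evecs[:, np.argsort(-np.real(evals))])
    return X, Rinv @ V            # project a point x via gauss(x.reshape(-1, 1), X) @ (Rinv @ V)
```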
70 4.1 Complexity analysis of AKDA/QR. It takes O(dn) in Line 1. [sent-170, score-0.082]
71 The construction of the matrix K̂ in Line 2 takes O(c^2 d). [sent-171, score-0.12]
72 It then takes O(c^3) and O(nc^2) for the matrix multiplications in Lines 6 and 7, respectively. [sent-174, score-0.12]
73 Table 1 lists the time and space complexities of several dimension reduction algorithms. [sent-177, score-0.136]
74 It is clear from the table that AKDA/QR is more efficient than other kernel based methods. [sent-178, score-0.197]
75 Note that both KDA/QR and AKDA/QR have two parameters: σ for the kernel function and µ for the regularization. [sent-181, score-0.164]
76 We randomly select p samples of each person from the dataset for training and the rest for testing. [sent-187, score-0.07]
77 Figure 1: Comparison of classification accuracy on PIX (left) and AR (right); the x-axis in each panel is the number of training samples per class. [sent-202, score-0.172]
78 We repeat the experiments 20 times and report the average recognition accuracy of each method. [sent-204, score-0.087]
79 Datasets: We use the following three datasets in our study, which are publicly available: PIX contains 300 face images of 30 persons. [sent-209, score-0.202]
80 We subsample the images down to a size of 100 × 100 = 10000; ORL is a well-known dataset for face recognition. [sent-211, score-0.284]
81 It contains ten different face images of 40 persons, for a total of 400 images. [sent-212, score-0.166]
82 The image size is 92 × 112 = 10304; AR is a large face image dataset. [sent-213, score-0.246]
83 This subset contains 1638 face images of 126 persons. [sent-215, score-0.166]
84 We subsample the images down to a size of 60 × 40 = 2400. [sent-217, score-0.113]
85 LDA/QR: In this experiment, we compare the performance of AKDA/QR and KDA/QR with that of several other linear dimension reduction algorithms, including PCA and LDA/QR, on two face datasets. [sent-220, score-0.256]
86 The results are shown in Figure 1, where the x-axis denotes the number of samples per class in the training set and the y-axis denotes the classification accuracy. [sent-223, score-0.062]
87 It is known that the images in the AR dataset contain fairly large areas of occlusion due to sunglasses and scarves, which makes linear algorithms such as LDA/QR less effective. [sent-227, score-0.103]
88 Another interesting observation is that the approximate AKDA/QR algorithm is competitive with its exact version KDA/QR in all cases. [sent-228, score-0.111]
89 The comparison is made on the ORL face dataset, as the result of GDA on ORL is available in [5]. [sent-231, score-0.131]
90 The main observation from this experiment is that both KDA/QR and AKDA/QR are competitive with GDA, while AKDA/QR is much more efficient than GDA (see Table 1). [sent-234, score-0.08]
91 Similar to the first experiment, Table 2 shows that KDA/QR and AKDA/QR consistently outperform the PCA and LDA/QR algorithms in terms of recognition accuracy. [sent-235, score-0.116]
92 6 Conclusions In this paper, we first present a general kernel discriminant analysis algorithm, called KDA/QR. [sent-236, score-0.441]
93 Table 2: Comparison of classification accuracy on the ORL face image dataset. [sent-267, score-0.22]
94 Our experimental results show that the accuracy achieved by the two algorithms is very competitive with that of GDA, a general kernel discriminant analysis algorithm, while AKDA/QR is much more efficient. [sent-271, score-0.562]
95 Kernel-based optimized feature vectors selection and discriminant analysis for face recognition. [sent-307, score-0.462]
96 A mathematical programming approach to the kernel Fisher algorithm. [sent-314, score-0.164]
97 An improved training algorithm for kernel Fisher discriminants. [sent-330, score-0.195]
98 Nonlinear component analysis as a kernel eigenvalue problem. [sent-341, score-0.23]
99 LDA/QR: An efficient and effective dimension reduction algorithm and its theoretical foundation. [sent-346, score-0.128]
100 IDR/QR: An incremental dimension reduction algorithm via QR decomposition. [sent-355, score-0.128]
wordName wordTfidf (topN-words)
[('gda', 0.366), ('km', 0.27), ('sb', 0.243), ('discriminant', 0.242), ('hb', 0.216), ('ht', 0.187), ('qr', 0.172), ('kernel', 0.164), ('centroid', 0.157), ('cholesky', 0.147), ('ndc', 0.141), ('scatter', 0.139), ('lda', 0.139), ('ai', 0.139), ('ic', 0.134), ('face', 0.131), ('stage', 0.124), ('decomposition', 0.115), ('sw', 0.112), ('gt', 0.107), ('nc', 0.104), ('pca', 0.1), ('qt', 0.099), ('janardan', 0.098), ('minnesota', 0.098), ('orl', 0.098), ('ye', 0.098), ('st', 0.089), ('kopf', 0.084), ('ktc', 0.084), ('ktz', 0.084), ('competitive', 0.08), ('ni', 0.079), ('mika', 0.075), ('hw', 0.074), ('pix', 0.074), ('matrix', 0.073), ('fi', 0.073), ('qv', 0.067), ('army', 0.067), ('line', 0.065), ('ar', 0.062), ('nonlinear', 0.06), ('dimension', 0.058), ('kfda', 0.056), ('xiong', 0.056), ('mc', 0.056), ('feature', 0.054), ('nj', 0.054), ('outperform', 0.049), ('ece', 0.049), ('irc', 0.049), ('jieping', 0.049), ('accuracy', 0.048), ('optimization', 0.047), ('takes', 0.047), ('subsample', 0.045), ('cse', 0.045), ('eet', 0.045), ('rd', 0.043), ('ith', 0.043), ('generalized', 0.043), ('eigenvectors', 0.042), ('ird', 0.042), ('irn', 0.042), ('sher', 0.042), ('orthogonal', 0.042), ('xj', 0.041), ('image', 0.041), ('computed', 0.041), ('dataset', 0.04), ('trace', 0.04), ('classi', 0.039), ('complexities', 0.039), ('recognition', 0.039), ('reduction', 0.039), ('transformation', 0.036), ('datasets', 0.036), ('aims', 0.036), ('images', 0.035), ('analysis', 0.035), ('table', 0.033), ('namely', 0.033), ('size', 0.033), ('class', 0.032), ('lines', 0.032), ('algorithm', 0.031), ('matrices', 0.031), ('eigenvalue', 0.031), ('sch', 0.03), ('samples', 0.03), ('compute', 0.03), ('kernels', 0.03), ('eigenvalues', 0.029), ('decreasing', 0.028), ('algorithms', 0.028), ('thousands', 0.028), ('ef', 0.028), ('department', 0.027), ('proceed', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 59 nips-2004-Efficient Kernel Discriminant Analysis via QR Decomposition
Author: Tao Xiong, Jieping Ye, Qi Li, Ravi Janardan, Vladimir Cherkassky
Abstract: Linear Discriminant Analysis (LDA) is a well-known method for feature extraction and dimension reduction. It has been used widely in many applications such as face recognition. Recently, a novel LDA algorithm based on QR Decomposition, namely LDA/QR, has been proposed, which is competitive in terms of classification accuracy with other LDA algorithms, but it has much lower costs in time and space. However, LDA/QR is based on linear projection, which may not be suitable for data with nonlinear structure. This paper first proposes an algorithm called KDA/QR, which extends the LDA/QR algorithm to deal with nonlinear data by using the kernel operator. Then an efficient approximation of KDA/QR called AKDA/QR is proposed. Experiments on face image data show that the classification accuracy of both KDA/QR and AKDA/QR are competitive with Generalized Discriminant Analysis (GDA), a general kernel discriminant analysis algorithm, while AKDA/QR has much lower time and space costs. 1
2 0.38620266 197 nips-2004-Two-Dimensional Linear Discriminant Analysis
Author: Jieping Ye, Ravi Janardan, Qi Li
Abstract: Linear Discriminant Analysis (LDA) is a well-known scheme for feature extraction and dimension reduction. It has been used widely in many applications involving high-dimensional data, such as face recognition and image retrieval. An intrinsic limitation of classical LDA is the so-called singularity problem, that is, it fails when all scatter matrices are singular. A well-known approach to deal with the singularity problem is to apply an intermediate dimension reduction stage using Principal Component Analysis (PCA) before LDA. The algorithm, called PCA+LDA, is used widely in face recognition. However, PCA+LDA has high costs in time and space, due to the need for an eigen-decomposition involving the scatter matrices. In this paper, we propose a novel LDA algorithm, namely 2DLDA, which stands for 2-Dimensional Linear Discriminant Analysis. 2DLDA overcomes the singularity problem implicitly, while achieving efficiency. The key difference between 2DLDA and classical LDA lies in the model for data representation. Classical LDA works with vectorized representations of data, while the 2DLDA algorithm works with data in matrix representation. To further reduce the dimension by 2DLDA, the combination of 2DLDA and classical LDA, namely 2DLDA+LDA, is studied, where LDA is preceded by 2DLDA. The proposed algorithms are applied on face recognition and compared with PCA+LDA. Experiments show that 2DLDA and 2DLDA+LDA achieve competitive recognition accuracy, while being much more efficient. 1
3 0.12399817 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants
Author: Volker Roth
Abstract: The problem of detecting “atypical objects” or “outliers” is one of the classical topics in (robust) statistics. Recently, it has been proposed to address this problem by means of one-class SVM classifiers. The main conceptual shortcoming of most one-class approaches, however, is that in a strict sense they are unable to detect outliers, since the expected fraction of outliers has to be specified in advance. The method presented in this paper overcomes this problem by relating kernelized one-class classification to Gaussian density estimation in the induced feature space. Having established this relation, it is possible to identify “atypical objects” by quantifying their deviations from the Gaussian model. For RBF kernels it is shown that the Gaussian model is “rich enough” in the sense that it asymptotically provides an unbiased estimator for the true density. In order to overcome the inherent model selection problem, a cross-validated likelihood criterion for selecting all free model parameters is applied. 1
4 0.10681912 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
Author: Xiaojin Zhu, Jaz Kandola, Zoubin Ghahramani, John D. Lafferty
Abstract: We present an algorithm based on convex optimization for constructing kernels for semi-supervised learning. The kernel matrices are derived from the spectral decomposition of graph Laplacians, and combine labeled and unlabeled data in a systematic fashion. Unlike previous work using diffusion kernels and Gaussian random field kernels, a nonparametric kernel approach is presented that incorporates order constraints during optimization. This results in flexible kernels and avoids the need to choose among different parametric forms. Our approach relies on a quadratically constrained quadratic program (QCQP), and is computationally feasible for large datasets. We evaluate the kernels on real datasets using support vector machines, with encouraging results. 1
5 0.099667668 168 nips-2004-Semigroup Kernels on Finite Sets
Author: Marco Cuturi, Jean-philippe Vert
Abstract: Complex objects can often be conveniently represented by finite sets of simpler components, such as images by sets of patches or texts by bags of words. We study the class of positive definite (p.d.) kernels for two such objects that can be expressed as a function of the merger of their respective sets of components. We prove a general integral representation of such kernels and present two particular examples. One of them leads to a kernel for sets of points living in a space endowed itself with a positive definite kernel. We provide experimental results on a benchmark experiment of handwritten digits image classification which illustrate the validity of the approach. 1
6 0.094265677 54 nips-2004-Distributed Information Regularization on Graphs
7 0.089931853 102 nips-2004-Learning first-order Markov models for control
8 0.089120165 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
9 0.086712487 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition
10 0.086607553 24 nips-2004-Approximately Efficient Online Mechanism Design
11 0.085662588 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics
12 0.085190661 127 nips-2004-Neighbourhood Components Analysis
13 0.084134243 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
14 0.083852448 98 nips-2004-Learning Gaussian Process Kernels via Hierarchical Bayes
15 0.081233904 68 nips-2004-Face Detection --- Efficient and Rank Deficient
16 0.075588353 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
17 0.071236953 18 nips-2004-Algebraic Set Kernels with Application to Inference Over Local Image Representations
18 0.070213079 188 nips-2004-The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space
19 0.069867983 8 nips-2004-A Machine Learning Approach to Conjoint Analysis
20 0.069320515 163 nips-2004-Semi-parametric Exponential Family PCA
topicId topicWeight
[(0, -0.228), (1, 0.084), (2, -0.019), (3, -0.029), (4, -0.033), (5, -0.019), (6, -0.13), (7, -0.182), (8, 0.168), (9, 0.11), (10, -0.057), (11, -0.103), (12, -0.095), (13, 0.131), (14, 0.028), (15, -0.127), (16, -0.24), (17, -0.038), (18, 0.277), (19, 0.157), (20, 0.159), (21, -0.132), (22, 0.113), (23, 0.17), (24, 0.101), (25, 0.064), (26, -0.142), (27, 0.016), (28, -0.021), (29, 0.089), (30, -0.023), (31, 0.027), (32, 0.051), (33, 0.016), (34, 0.002), (35, 0.061), (36, 0.092), (37, -0.107), (38, -0.03), (39, 0.049), (40, -0.036), (41, 0.107), (42, 0.039), (43, -0.011), (44, -0.007), (45, 0.004), (46, 0.011), (47, 0.036), (48, -0.023), (49, 0.022)]
simIndex simValue paperId paperTitle
1 0.93544745 197 nips-2004-Two-Dimensional Linear Discriminant Analysis
Author: Jieping Ye, Ravi Janardan, Qi Li
Abstract: Linear Discriminant Analysis (LDA) is a well-known scheme for feature extraction and dimension reduction. It has been used widely in many applications involving high-dimensional data, such as face recognition and image retrieval. An intrinsic limitation of classical LDA is the so-called singularity problem, that is, it fails when all scatter matrices are singular. A well-known approach to deal with the singularity problem is to apply an intermediate dimension reduction stage using Principal Component Analysis (PCA) before LDA. The algorithm, called PCA+LDA, is used widely in face recognition. However, PCA+LDA has high costs in time and space, due to the need for an eigen-decomposition involving the scatter matrices. In this paper, we propose a novel LDA algorithm, namely 2DLDA, which stands for 2-Dimensional Linear Discriminant Analysis. 2DLDA overcomes the singularity problem implicitly, while achieving efficiency. The key difference between 2DLDA and classical LDA lies in the model for data representation. Classical LDA works with vectorized representations of data, while the 2DLDA algorithm works with data in matrix representation. To further reduce the dimension by 2DLDA, the combination of 2DLDA and classical LDA, namely 2DLDA+LDA, is studied, where LDA is preceded by 2DLDA. The proposed algorithms are applied on face recognition and compared with PCA+LDA. Experiments show that 2DLDA and 2DLDA+LDA achieve competitive recognition accuracy, while being much more efficient. 1
same-paper 2 0.93168068 59 nips-2004-Efficient Kernel Discriminant Analysis via QR Decomposition
Author: Tao Xiong, Jieping Ye, Qi Li, Ravi Janardan, Vladimir Cherkassky
Abstract: Linear Discriminant Analysis (LDA) is a well-known method for feature extraction and dimension reduction. It has been used widely in many applications such as face recognition. Recently, a novel LDA algorithm based on QR Decomposition, namely LDA/QR, has been proposed, which is competitive in terms of classification accuracy with other LDA algorithms, but it has much lower costs in time and space. However, LDA/QR is based on linear projection, which may not be suitable for data with nonlinear structure. This paper first proposes an algorithm called KDA/QR, which extends the LDA/QR algorithm to deal with nonlinear data by using the kernel operator. Then an efficient approximation of KDA/QR called AKDA/QR is proposed. Experiments on face image data show that the classification accuracy of both KDA/QR and AKDA/QR are competitive with Generalized Discriminant Analysis (GDA), a general kernel discriminant analysis algorithm, while AKDA/QR has much lower time and space costs. 1
3 0.59230411 127 nips-2004-Neighbourhood Components Analysis
Author: Jacob Goldberger, Geoffrey E. Hinton, Sam T. Roweis, Ruslan Salakhutdinov
Abstract: In this paper we propose a novel method for learning a Mahalanobis distance measure to be used in the KNN classification algorithm. The algorithm directly maximizes a stochastic variant of the leave-one-out KNN score on the training set. It can also learn a low-dimensional linear embedding of labeled data that can be used for data visualization and fast classification. Unlike other methods, our classification model is non-parametric, making no assumptions about the shape of the class distributions or the boundaries between them. The performance of the method is demonstrated on several data sets, both for metric learning and linear dimensionality reduction. 1
4 0.44333744 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants
Author: Volker Roth
Abstract: The problem of detecting “atypical objects” or “outliers” is one of the classical topics in (robust) statistics. Recently, it has been proposed to address this problem by means of one-class SVM classifiers. The main conceptual shortcoming of most one-class approaches, however, is that in a strict sense they are unable to detect outliers, since the expected fraction of outliers has to be specified in advance. The method presented in this paper overcomes this problem by relating kernelized one-class classification to Gaussian density estimation in the induced feature space. Having established this relation, it is possible to identify “atypical objects” by quantifying their deviations from the Gaussian model. For RBF kernels it is shown that the Gaussian model is “rich enough” in the sense that it asymptotically provides an unbiased estimator for the true density. In order to overcome the inherent model selection problem, a cross-validated likelihood criterion for selecting all free model parameters is applied. 1
5 0.38525483 94 nips-2004-Kernels for Multi--task Learning
Author: Charles A. Micchelli, Massimiliano Pontil
Abstract: This paper provides a foundation for multi–task learning using reproducing kernel Hilbert spaces of vector–valued functions. In this setting, the kernel is a matrix–valued function. Some explicit examples will be described which go beyond our earlier results in [7]. In particular, we characterize classes of matrix– valued kernels which are linear and are of the dot product or the translation invariant type. We discuss how these kernels can be used to model relations between the tasks and present linear multi–task learning algorithms. Finally, we present a novel proof of the representer theorem for a minimizer of a regularization functional which is based on the notion of minimal norm interpolation. 1
6 0.37042874 68 nips-2004-Face Detection --- Efficient and Rank Deficient
7 0.36861107 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition
8 0.34339273 18 nips-2004-Algebraic Set Kernels with Application to Inference Over Local Image Representations
9 0.33190787 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
10 0.31210729 106 nips-2004-Machine Learning Applied to Perception: Decision Images for Gender Classification
11 0.29847947 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
12 0.29288787 168 nips-2004-Semigroup Kernels on Finite Sets
13 0.28325802 60 nips-2004-Efficient Kernel Machines Using the Improved Fast Gauss Transform
14 0.27927995 205 nips-2004-Who's In the Picture
15 0.27478799 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics
16 0.26974493 177 nips-2004-Supervised Graph Inference
17 0.26955968 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
18 0.26816779 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
19 0.26204807 182 nips-2004-Synergistic Face Detection and Pose Estimation with Energy-Based Models
20 0.25533393 8 nips-2004-A Machine Learning Approach to Conjoint Analysis
topicId topicWeight
[(13, 0.06), (15, 0.691), (26, 0.037), (33, 0.088), (35, 0.012), (39, 0.021)]
simIndex simValue paperId paperTitle
1 0.97880816 50 nips-2004-Dependent Gaussian Processes
Author: Phillip Boyle, Marcus Frean
Abstract: Gaussian processes are usually parameterised in terms of their covariance functions. However, this makes it difficult to deal with multiple outputs, because ensuring that the covariance matrix is positive definite is problematic. An alternative formulation is to treat Gaussian processes as white noise sources convolved with smoothing kernels, and to parameterise the kernel instead. Using this, we extend Gaussian processes to handle multiple, coupled outputs. 1
same-paper 2 0.97084796 59 nips-2004-Efficient Kernel Discriminant Analysis via QR Decomposition
Author: Tao Xiong, Jieping Ye, Qi Li, Ravi Janardan, Vladimir Cherkassky
Abstract: Linear Discriminant Analysis (LDA) is a well-known method for feature extraction and dimension reduction. It has been used widely in many applications such as face recognition. Recently, a novel LDA algorithm based on QR Decomposition, namely LDA/QR, has been proposed, which is competitive in terms of classification accuracy with other LDA algorithms, but it has much lower costs in time and space. However, LDA/QR is based on linear projection, which may not be suitable for data with nonlinear structure. This paper first proposes an algorithm called KDA/QR, which extends the LDA/QR algorithm to deal with nonlinear data by using the kernel operator. Then an efficient approximation of KDA/QR called AKDA/QR is proposed. Experiments on face image data show that the classification accuracy of both KDA/QR and AKDA/QR are competitive with Generalized Discriminant Analysis (GDA), a general kernel discriminant analysis algorithm, while AKDA/QR has much lower time and space costs. 1
3 0.96098799 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
Author: Le Lu, Gregory D. Hager, Laurent Younes
Abstract: Visual action recognition is an important problem in computer vision. In this paper, we propose a new method to probabilistically model and recognize actions of articulated objects, such as hand or body gestures, in image sequences. Our method consists of three levels of representation. At the low level, we first extract a feature vector invariant to scale and in-plane rotation by using the Fourier transform of a circular spatial histogram. Then, spectral partitioning [20] is utilized to obtain an initial clustering; this clustering is then refined using a temporal smoothness constraint. Gaussian mixture model (GMM) based clustering and density estimation in the subspace of linear discriminant analysis (LDA) are then applied to thousands of image feature vectors to obtain an intermediate level representation. Finally, at the high level we build a temporal multiresolution histogram model for each action by aggregating the clustering weights of sampled images belonging to that action. We discuss how this high level representation can be extended to achieve temporal scaling invariance and to include Bi-gram or Multi-gram transition information. Both image clustering and action recognition/segmentation results are given to show the validity of our three tiered representation.
4 0.95247865 203 nips-2004-Validity Estimates for Loopy Belief Propagation on Binary Real-world Networks
Author: Joris M. Mooij, Hilbert J. Kappen
Abstract: We introduce a computationally efficient method to estimate the validity of the BP method as a function of graph topology, the connectivity strength, frustration and network size. We present numerical results that demonstrate the correctness of our estimates for the uniform random model and for a real-world network (“C. Elegans”). Although the method is restricted to pair-wise interactions, no local evidence (zero “biases”) and binary variables, we believe that its predictions correctly capture the limitations of BP for inference and MAP estimation on arbitrary graphical models. Using this approach, we find that BP always performs better than MF. Especially for large networks with broad degree distributions (such as scale-free networks) BP turns out to significantly outperform MF. 1
5 0.91725314 92 nips-2004-Kernel Methods for Implicit Surface Modeling
Author: Joachim Giesen, Simon Spalinger, Bernhard Schölkopf
Abstract: We describe methods for computing an implicit model of a hypersurface that is given only by a finite sampling. The methods work by mapping the sample points into a reproducing kernel Hilbert space and then determining regions in terms of hyperplanes. 1
6 0.91679156 197 nips-2004-Two-Dimensional Linear Discriminant Analysis
7 0.90532029 183 nips-2004-Temporal-Difference Networks
8 0.83182025 9 nips-2004-A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning
9 0.77481192 201 nips-2004-Using the Equivalent Kernel to Understand Gaussian Process Regression
10 0.76540089 148 nips-2004-Probabilistic Computation in Spiking Populations
11 0.753667 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
12 0.7519294 168 nips-2004-Semigroup Kernels on Finite Sets
13 0.74354273 188 nips-2004-The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space
14 0.72012371 178 nips-2004-Support Vector Classification with Input Data Uncertainty
15 0.71932441 30 nips-2004-Binet-Cauchy Kernels
16 0.71745032 94 nips-2004-Kernels for Multi--task Learning
17 0.71738398 98 nips-2004-Learning Gaussian Process Kernels via Hierarchical Bayes
18 0.71714824 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
19 0.70834905 18 nips-2004-Algebraic Set Kernels with Application to Inference Over Local Image Representations
20 0.7037164 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation