nips nips2004 nips2004-59 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Tao Xiong, Jieping Ye, Qi Li, Ravi Janardan, Vladimir Cherkassky
Abstract: Linear Discriminant Analysis (LDA) is a well-known method for feature extraction and dimension reduction. It has been used widely in many applications such as face recognition. Recently, a novel LDA algorithm based on QR Decomposition, namely LDA/QR, has been proposed, which is competitive in terms of classification accuracy with other LDA algorithms, but it has much lower costs in time and space. However, LDA/QR is based on linear projection, which may not be suitable for data with nonlinear structure. This paper first proposes an algorithm called KDA/QR, which extends the LDA/QR algorithm to deal with nonlinear data by using the kernel operator. Then an efficient approximation of KDA/QR called AKDA/QR is proposed. Experiments on face image data show that the classification accuracy of both KDA/QR and AKDA/QR is competitive with Generalized Discriminant Analysis (GDA), a general kernel discriminant analysis algorithm, while AKDA/QR has much lower time and space costs. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Linear Discriminant Analysis (LDA) is a well-known method for feature extraction and dimension reduction. [sent-11, score-0.112]
2 It has been used widely in many applications such as face recognition. [sent-12, score-0.131]
3 Recently, a novel LDA algorithm based on QR Decomposition, namely LDA/QR, has been proposed, which is competitive in terms of classification accuracy with other LDA algorithms, but it has much lower costs in time and space. [sent-13, score-0.192]
4 This paper first proposes an algorithm called KDA/QR, which extends the LDA/QR algorithm to deal with nonlinear data by using the kernel operator. [sent-15, score-0.286]
5 Experiments on face image data show that the classification accuracy of both KDA/QR and AKDA/QR is competitive with Generalized Discriminant Analysis (GDA), a general kernel discriminant analysis algorithm, while AKDA/QR has much lower time and space costs. [sent-17, score-0.741]
6 It has been used widely in many applications such as face recognition [2]. [sent-19, score-0.17]
7 Classical LDA aims to find an optimal transformation by minimizing the within-class distance and maximizing the between-class distance simultaneously, thus achieving maximum discrimination. [sent-20, score-0.072]
8 The optimal transformation can be readily obtained by computing the eigen-decomposition of the scatter matrices. [sent-21, score-0.216]
9 To deal with such a limitation, nonlinear extensions through kernel functions have been proposed. [sent-23, score-0.224]
10 The main idea of kernel-based methods is to map the input data to a feature space through a nonlinear mapping, where the inner products in the feature space can be computed by a kernel function without knowing the nonlinear mapping explicitly [9]. [sent-24, score-0.433]
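As a small illustration of this kernel trick, the sketch below (not from the paper; the function name and the Gaussian form are our own choices) computes all pairwise feature-space inner products directly from a kernel function, without ever constructing the nonlinear mapping:

```python
import numpy as np

def gaussian_kernel_matrix(A, sigma):
    """K[i, j] = <Phi(a_i), Phi(a_j)> = exp(-||a_i - a_j||^2 / sigma) for a d x n data
    matrix A whose columns are samples; the mapping Phi itself is never formed."""
    sq_norms = np.sum(A * A, axis=0)                                    # ||a_i||^2 per column
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (A.T @ A)  # ||a_i - a_j||^2
    return np.exp(-np.maximum(sq_dists, 0.0) / sigma)                   # clip round-off negatives
```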
11 To our knowledge, there are few efficient algorithms for general kernel-based discriminant analysis — most known algorithms effectively scale as O(n^3), where n is the sample size. [sent-26, score-0.49]
12 Moreover, experiments in [11, 12] show that the classification accuracy of LDA/QR is competitive with other LDA algorithms. [sent-35, score-0.128]
13 In this paper, we first propose an algorithm, namely KDA/QR, which is a nonlinear extension of LDA/QR. [sent-36, score-0.093]
14 Since KDA/QR involves the whole kernel matrix, which is not scalable for large datasets, we also propose an approximation of KDA/QR, namely AKDA/QR. [sent-37, score-0.197]
15 A distinct property of AKDA/QR is that it scales as O(ndc), where n is the size of the data, d is the dimension of the data, and c is the number of classes. [sent-38, score-0.091]
16 We apply the proposed algorithms to face image datasets and compare them with LDA/QR and with Generalized Discriminant Analysis (GDA) [1], a general method for kernel discriminant analysis. [sent-39, score-0.642]
17 Experiments show that: (1) AKDA/QR is competitive with KDA/QR and GDA in classification; (2) both KDA/QR and AKDA/QR outperform LDA/QR in classification; and (3) AKDA/QR has much lower costs in time and space than GDA. [sent-40, score-0.129]
18 The first stage maximizes the separation between different classes via QR Decomposition [4]. [sent-43, score-0.124]
19 The second stage addresses the issue of minimizing the within-class distance, while maintaining low time/space complexity. [sent-44, score-0.124]
20 Let A ∈ ℝ^{d×n} be the data matrix, where each column a_i is a vector in d-dimensional space. [sent-45, score-0.139]
21 Assume A is partitioned into c classes {Π_i}_{i=1}^c, and the size of the i-th class is |Π_i| = n_i. [sent-46, score-0.187]
22 The first stage of LDA/QR aims to solve the following optimization problem: G = arg max_{G: G^t G = I} trace(G^t S_b G). [sent-49, score-0.207]
23 The solution can also be obtained through QR decomposition of the centroid matrix C [12], where C = [m_1, m_2, ..., m_c] consists of the c class centroids. [sent-52, score-0.286]
24 B ← Y^t Y; /*Reduced between-class scatter matrix*/ [sent-58, score-0.139]
25 Compute the c eigenvectors φ_i of (T + μI_c)^{-1} B with decreasing eigenvalues; [sent-60, score-0.07]
26 Then G = QV, for any orthogonal matrix V, solves the optimization problem in Eq. [sent-63, score-0.162]
27 Note that the choice of the orthogonal matrix V is arbitrary, since trace(G^t S_b G) = trace(V^t G^t S_b G V) for any orthogonal matrix V. [sent-65, score-0.23]
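A minimal NumPy sketch of this first stage, assuming the data layout defined above (columns of A are samples); the function and variable names are our own:

```python
import numpy as np

def lda_qr_stage1(A, labels):
    """Stage 1 of LDA/QR (sketch): economy-size QR of the d x c centroid matrix."""
    classes = np.unique(labels)
    # Centroid matrix C = [m_1, ..., m_c], one column per class.
    C = np.column_stack([A[:, labels == k].mean(axis=1) for k in classes])
    # Q (d x c) has orthonormal columns spanning range(C); R is upper triangular.
    Q, R = np.linalg.qr(C)
    return Q
```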
28 The second stage of LDA/QR refines the first stage by addressing the issue of minimizing the within-class distance. [sent-66, score-0.248]
29 It incorporates the within-class scatter information by applying a relaxation scheme on V (relaxing V from an orthogonal matrix to an arbitrary matrix). [sent-67, score-0.254]
30 In the second stage of LDA/QR, we look for a transformation matrix G such that G = QV , for some V . [sent-68, score-0.233]
31 Since G^t S_b G = V^t (Q^t S_b Q) V, G^t S_w G = V^t (Q^t S_w Q) V, and G^t S_t G = V^t (Q^t S_t Q) V, the original problem of finding the optimal G is equivalent to finding V, with B = Q^t S_b Q, W = Q^t S_w Q, and T = Q^t S_t Q as the “reduced” between-class, within-class and total scatter matrices, respectively. [sent-71, score-0.139]
32 Note that B has much smaller size than the original scatter matrix S_b (similarly for W and T). [sent-72, score-0.245]
33 The optimal V can be computed efficiently using many existing LDA-based methods, since we are dealing with matrices B, W, and T of size c by c. [sent-73, score-0.105]
34 We can compute the optimal V by simply applying regularized LDA; that is, we compute V by solving a small eigenvalue problem on (W + μI_c)^{-1} B or (T + μI_c)^{-1} B (note that T = B + W), for some positive constant μ [3]. [sent-74, score-0.091]
35 We use the total scatter instead of the within-class scatter in Lines 4, 6, and 7, mainly for convenience of presentation of the kernel methods in Section 3 and Section 4. [sent-76, score-0.442]
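Continuing the stage-1 sketch above, here is a hedged sketch of the second stage: form the reduced c x c scatter matrices through Q and solve the small regularized eigenvalue problem on (T + μI_c)^{-1} B. The names and the default value of μ are our own assumptions.

```python
import numpy as np

def lda_qr_stage2(A, labels, Q, mu=1e-3):
    """Stage 2 of LDA/QR (sketch): relax V from orthogonal to arbitrary via regularized LDA."""
    classes = np.unique(labels)
    m = A.mean(axis=1, keepdims=True)                     # global centroid
    # H_b has columns sqrt(n_k)(m_k - m), so S_b = H_b H_b^t; H_t = A - m e^t, so S_t = H_t H_t^t.
    Hb = np.column_stack([np.sqrt((labels == k).sum()) *
                          (A[:, labels == k].mean(axis=1) - m.ravel()) for k in classes])
    Ht = A - m
    Y = Hb.T @ Q                                          # B = Y^t Y = Q^t S_b Q
    Z = Ht.T @ Q                                          # T = Z^t Z = Q^t S_t Q
    B, T = Y.T @ Y, Z.T @ Z
    # c eigenvectors of (T + mu I_c)^{-1} B, ordered by decreasing eigenvalue.
    evals, evecs = np.linalg.eig(np.linalg.solve(T + mu * np.eye(T.shape[0]), B))
    V = np.real(evecs[:, np.argsort(-np.real(evals))])
    return Q @ V                                          # final transformation G = Q V
```

A new sample a is then reduced to c dimensions via G^t a.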
36 3 Kernel discriminant analysis via QR-decomposition (KDA/QR) In this section, the KDA/QR algorithm, a nonlinear extension of LDA/QR through kernel functions, is presented. [sent-77, score-0.501]
37 Let Φ be a mapping to the feature space and Φ(A) be the data matrix in the feature space. [sent-78, score-0.181]
38 Then, the centroid matrix C^Φ in the feature space is C^Φ = [m_1^Φ, ..., m_c^Φ] = [(1/n_1) Σ_{i∈Π_1} Φ(a_i), ..., (1/n_c) Σ_{i∈Π_c} Φ(a_i)]. (2) [sent-79, score-0.388]
39 The global centroid in the feature space can be computed as m^Φ = (1/n) Σ_{i=1}^c n_i m_i^Φ. [sent-80, score-0.331]
40 To maximize the between-class distance in the feature space, as discussed in Section 2, we perform QR decomposition on C^Φ, i.e., C^Φ = Q^Φ R^Φ. [sent-81, score-0.169]
41 A key observation is that R^Φ can be obtained, via (C^Φ)^t C^Φ = (R^Φ)^t R^Φ, by applying the Cholesky decomposition to (C^Φ)^t C^Φ [4]. [sent-84, score-0.156]
42 Here C^Φ = Φ(A) M, where Φ(A) = [Φ(a_1), ..., Φ(a_n)] and the i-th column of M is (0, ..., 0, 1/n_i, ..., 1/n_i, 0, ..., 0)^t. [sent-88, score-0.201]
43 Let K be the kernel matrix with K(i, j) = ⟨Φ(a_i), Φ(a_j)⟩. [sent-89, score-0.164]
44 Compute the c eigenvectors φ_i^Φ of (T^Φ + μI_c)^{-1} B^Φ, with decreasing eigenvalues; [sent-100, score-0.07]
45 The matrices Y^Φ, Z^Φ, B^Φ, and W^Φ in the feature space (corresponding to the second stage in LDA/QR) can be computed as follows. [sent-103, score-0.25]
46 In the feature space, we have H_b^Φ = C^Φ N, where the i-th column of N is (0, ..., √n_i, ..., 0)^t − (√n_i/n)(n_1, ..., n_c)^t. [sent-104, score-0.359]
47 We proceed by computing the c eigenvectors {φ_i^Φ}_{i=1}^c of (T^Φ + μI_c)^{-1} B^Φ. [sent-108, score-0.068]
48 The final transformation matrix can be computed as G^Φ = Q^Φ V^Φ = C^Φ (R^Φ)^{-1} V^Φ. [sent-110, score-0.15]
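Assembling the steps of this section, the following is a hedged NumPy sketch of the core KDA/QR computation given a precomputed kernel matrix K (e.g., from the Gaussian-kernel sketch earlier); the function signature, the default μ, and the returned coefficient matrix are our own choices, not the paper's interface.

```python
import numpy as np

def kda_qr(K, labels, mu=1e-3):
    """Sketch of KDA/QR; K is the n x n kernel matrix with K[i, j] = <Phi(a_i), Phi(a_j)>."""
    n = K.shape[0]
    classes = np.unique(labels)
    c = len(classes)
    counts = np.array([(labels == k).sum() for k in classes], dtype=float)
    M = np.zeros((n, c))                      # column j averages class j, so C^Phi = Phi(A) M
    for j, k in enumerate(classes):
        M[labels == k, j] = 1.0 / counts[j]
    G_cent = M.T @ K @ M                      # (C^Phi)^t C^Phi, a c x c matrix
    R = np.linalg.cholesky(G_cent).T          # upper-triangular R^Phi with G_cent = R^t R
    Rinv = np.linalg.inv(R)
    KM = K @ M
    # Z^Phi = (I - (1/n) e e^t) K M R^{-1}, computed in O(n c^2) without the centering matrix.
    Z = (KM - KM.mean(axis=0, keepdims=True)) @ Rinv
    # N has i-th column (0, ..., sqrt(n_i), ..., 0)^t - (sqrt(n_i)/n)(n_1, ..., n_c)^t,
    # so H_b^Phi = C^Phi N and Y^Phi = N^t (C^Phi)^t C^Phi R^{-1}.
    N = np.diag(np.sqrt(counts)) - np.outer(counts, np.sqrt(counts)) / n
    Y = N.T @ G_cent @ Rinv
    B, T = Y.T @ Y, Z.T @ Z                   # reduced between-class and total scatter
    evals, evecs = np.linalg.eig(np.linalg.solve(T + mu * np.eye(c), B))
    V = np.real(evecs[:, np.argsort(-np.real(evals))])
    return M @ Rinv @ V                       # coefficient matrix; see note below
```

With coef = kda_qr(K, labels), the training data are projected as K @ coef, and a new point x as [K(a_1, x), ..., K(a_n, x)] @ coef, since G^Φ = C^Φ (R^Φ)^{-1} V^Φ = Φ(A) M (R^Φ)^{-1} V^Φ.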
49 3.1 Complexity analysis of KDA/QR. The cost to form the kernel matrix in Line 1 is O(n^2 d). [sent-114, score-0.272]
50 The Cholesky decomposition in Line 3 takes O(c^3) [4]. [sent-116, score-0.162]
51 Line 4 takes O(c^3), as M^t K M is already computed in Line 2. [sent-117, score-0.088]
52 In Line 5, the computation of Z^Φ = E^t K M (R^Φ)^{-1} = (I − (1/n) e e^t) K M (R^Φ)^{-1} = K M (R^Φ)^{-1} − (1/n) e (e^t K M) (R^Φ)^{-1}, in the given order, takes O(nc^2), assuming K M is kept from Line 2. [sent-118, score-0.092]
53 Hence, the total complexity of the kernel LDA/QR algorithm is O(n^2 d). [sent-120, score-0.195]
54 Omitting the cost of evaluating the kernel matrix K, which is required in all kernel-based algorithms, the total cost is O(n^2). [sent-121, score-0.237]
55 Note that all other general discriminant analysis algorithms scale as O(n^3). [sent-122, score-0.305]
56 Note that the bottleneck of KDA/QR is the explicit formation of the large kernel matrix K for the computation of (C^Φ)^t C^Φ in Line 2 of Algorithm 2. [sent-124, score-0.237]
57 Mathematically, the optimal x*_j can be computed by solving the following optimization problem: min_{x_j ∈ ℝ^d} ||Φ(x_j) − (1/n_j) Σ_{i∈Π_j} Φ(a_i)||^2, for j = 1, ..., c. (5) [sent-128, score-0.183]
58 Consider the Gaussian kernel function exp(−||x − y||^2/σ), where σ is the bandwidth parameter. [sent-133, score-0.164]
59 The optimization problem in (5) is convex if, for each j = 1, ..., c and for all i ∈ Π_j, (2/σ)||x_j − a_i||^2 ≤ 1. (6) Proof. [sent-134, score-0.186]
60 It is easy to check that, for the Gaussian kernel, the optimization problem in (5) reduces to min_{x_j ∈ ℝ^d} f(x_j), for j = 1, ..., c, (7) where f(x) = Σ_{i∈Π_j} f_i(x) and f_i(x) = −exp(−||x − a_i||^2/σ). [sent-135, score-0.446]
61 The Hessian matrix of f_i(x) is H(f_i) = (2/σ) exp(−||x − a_i||^2/σ) (I − (2/σ)(x − a_i)(x − a_i)^t). [sent-136, score-0.49]
62 It is easy to show that if (2/σ)||x − a_i||^2 ≤ 1 for all i ∈ Π_j, then H(f_i) is positive semi-definite, that is, f_i(x) is convex. [sent-137, score-0.212]
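To fill in the omitted step, here is a short derivation in our notation of why this condition suffices:

```latex
H(f_i) = \tfrac{2}{\sigma}\, e^{-\|x-a_i\|^2/\sigma}
         \Big(I - \tfrac{2}{\sigma}(x-a_i)(x-a_i)^t\Big),
\qquad
\operatorname{eig}\!\Big(I - \tfrac{2}{\sigma}(x-a_i)(x-a_i)^t\Big)
  = \Big\{\underbrace{1,\dots,1}_{d-1},\; 1 - \tfrac{2}{\sigma}\|x-a_i\|^2\Big\}.
```

Hence H(f_i) ⪰ 0 exactly when (2/σ)||x − a_i||^2 ≤ 1, and f(x) = Σ_{i∈Π_j} f_i(x), a sum of convex functions, is then convex.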
63 For applications involving high-dimensional data, such as face recognition, σ is usually large (typically ranging from thousands to hundreds of thousands [13]), and the condition in Lemma 4. [sent-139, score-0.187]
64 A key observation is that, for relatively large σ, the centroid of each class in the original space will map very close to the centroid in the feature space [9], which can serve as an approximate solution of the optimization problem in (7). [sent-142, score-0.447]
65 Experiments show that choosing x*_j = (1/n_j) Σ_{i∈Π_j} a_i produces results close to those obtained by solving the optimization problem in (7). [sent-143, score-0.24]
66 With this choice of x*_j for j = 1, ..., c, the centroid matrix C^Φ can be approximated by Ĉ^Φ ≈ [Φ(x*_1), ..., Φ(x*_c)]. [sent-148, score-0.162]
67 Compute x*_j = (1/n_j) Σ_{i∈Π_j} a_i, for j = 1, ..., c; [sent-152, score-0.070]
68 Compute the c eigenvectors φ̂_i^Φ of (T̂^Φ + μI_c)^{-1} B̂^Φ, with decreasing eigenvalues; [sent-160, score-0.07]
69 The Cholesky decomposition of K̂ gives us R̂^Φ via K̂ = (R̂^Φ)^t R̂^Φ. [sent-164, score-0.115]
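Below is a hedged sketch of AKDA/QR with a Gaussian kernel, following our reading of the steps above: the input-space class means play the role of x*_j, the c x c matrix K̂ replaces the full kernel matrix, and only the n x c block of kernel values between the samples and the c approximate centroids is formed, which is where the O(ndc) cost comes from. All names are our own.

```python
import numpy as np

def akda_qr(A, labels, sigma, mu=1e-3):
    """Sketch of AKDA/QR with a Gaussian kernel; A is d x n with samples as columns."""
    classes = np.unique(labels)
    n, c = A.shape[1], len(classes)
    counts = np.array([(labels == k).sum() for k in classes], dtype=float)
    X = np.column_stack([A[:, labels == k].mean(axis=1) for k in classes])  # x*_j as class means

    def gauss(U, W):                           # Gaussian kernel block between two column sets
        sq = (U * U).sum(0)[:, None] + (W * W).sum(0)[None, :] - 2.0 * (U.T @ W)
        return np.exp(-np.maximum(sq, 0.0) / sigma)

    K_hat = gauss(X, X)                        # c x c: (C_hat^Phi)^t C_hat^Phi
    K_nc = gauss(A, X)                         # n x c block, the O(ndc) step
    R = np.linalg.cholesky(K_hat).T            # K_hat = R^t R
    Rinv = np.linalg.inv(R)
    Z = (K_nc - K_nc.mean(axis=0, keepdims=True)) @ Rinv    # analogue of Z^Phi
    M = np.zeros((n, c))                       # C^Phi = Phi(A) M, as in KDA/QR
    for j, k in enumerate(classes):
        M[labels == k, j] = 1.0 / counts[j]
    N = np.diag(np.sqrt(counts)) - np.outer(counts, np.sqrt(counts)) / n
    Y = N.T @ (M.T @ K_nc) @ Rinv              # our reading: H_b^Phi projected onto Q_hat^Phi
    B, T = Y.T @ Y, Z.T @ Z
    evals, evecs = np.linalg.eig(np.linalg.solve(T + mu * np.eye(c), B))
    V = np.real(evecs[:, np.argsort(-np.real(evals))])
    return X, Rinv @ V            # project a point x via gauss(x.reshape(-1, 1), X) @ (Rinv @ V)
```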
70 4.1 Complexity analysis of AKDA/QR. It takes O(dn) in Line 1. [sent-170, score-0.082]
71 The construction of the matrix K̂ in Line 2 takes O(c^2 d). [sent-171, score-0.12]
72 It then takes O(c^3) and O(nc^2) for the matrix multiplications in Lines 6 and 7, respectively. [sent-174, score-0.12]
73 Table 1 lists the time and space complexities of several dimension reduction algorithms. [sent-177, score-0.136]
74 It is clear from the table that AKDA/QR is more efficient than other kernel based methods. [sent-178, score-0.197]
75 Note that both KDA/QR and AKDA/QR have two parameters: σ for the kernel function and µ for the regularization. [sent-181, score-0.164]
76 We randomly select p samples of each person from the dataset for training and the rest for testing. [sent-187, score-0.07]
77 Figure 1: Comparison of classification accuracy on PIX (left) and AR (right); the x-axis in each panel is the number of training samples per class. [sent-202, score-0.172]
78 We repeat the experiments 20 times and report the average recognition accuracy of each method. [sent-204, score-0.087]
79 Datasets: We use the following three datasets in our study, which are publicly available: PIX contains 300 face images of 30 persons. [sent-209, score-0.202]
80 We subsample the images down to a size of 100 × 100 = 10000; ORL is a well-known dataset for face recognition. [sent-211, score-0.284]
81 It contains ten different face images of 40 persons, for a total of 400 images. [sent-212, score-0.166]
82 The image size is 92 × 112 = 10304; AR is a large face image dataset. [sent-213, score-0.246]
83 This subset contains 1638 face images of 126 persons. [sent-215, score-0.166]
84 We subsample the images down to a size of 60 × 40 = 2400. [sent-217, score-0.113]
85 LDA/QR: In this experiment, we compare the performance of AKDA/QR and KDA/QR with that of several other linear dimension reduction algorithms, including PCA and LDA/QR, on two face datasets. [sent-220, score-0.256]
86 The results are shown in Figure 1, where the x-axis denotes the number of samples per class in the training set and the y-axis denotes the classification accuracy. [sent-223, score-0.062]
87 It is known that the images in the AR dataset contain fairly large areas of occlusion due to sunglasses and scarves, which makes linear algorithms such as LDA/QR less effective. [sent-227, score-0.103]
88 Another interesting observation is that the approximate AKDA/QR algorithm is competitive with its exact version KDA/QR in all cases. [sent-228, score-0.111]
89 The comparison is made on the ORL face dataset, as the result of GDA on ORL is available in [5]. [sent-231, score-0.131]
90 The main observation from this experiment is that both KDA/QR and AKDA/QR are competitive with GDA, while AKDA/QR is much more efficient than GDA (see Table 1). [sent-234, score-0.08]
91 Similar to the first experiment, Table 2 shows that KDA/QR and AKDA/QR consistently outperform the PCA and LDA/QR algorithms in terms of recognition accuracy. [sent-235, score-0.116]
92 6 Conclusions In this paper, we first present a general kernel discriminant analysis algorithm, called KDA/QR. [sent-236, score-0.441]
93 Table 2: Comparison of classification accuracy on the ORL face image dataset. [sent-267, score-0.22]
94 Our experimental results show that the accuracy achieved by the two algorithms is very competitive with that of GDA, a general kernel discriminant analysis algorithm, while AKDA/QR is much more efficient. [sent-271, score-0.562]
95 Kernel-based optimized feature vectors selection and discriminant analysis for face recognition. [sent-307, score-0.462]
96 A mathematical programming approach to the kernel Fisher algorithm. [sent-314, score-0.164]
97 An improved training algorithm for kernel Fisher discriminants. [sent-330, score-0.195]
98 Nonlinear component analysis as a kernel eigenvalue problem. [sent-341, score-0.23]
99 LDA/QR: An efficient and effective dimension reduction algorithm and its theoretical foundation. [sent-346, score-0.128]
100 IDR/QR: An incremental dimension reduction algorithm via QR decomposition. [sent-355, score-0.128]
wordName wordTfidf (topN-words)
[('gda', 0.366), ('km', 0.27), ('sb', 0.243), ('discriminant', 0.242), ('hb', 0.216), ('ht', 0.187), ('qr', 0.172), ('kernel', 0.164), ('centroid', 0.157), ('cholesky', 0.147), ('ndc', 0.141), ('scatter', 0.139), ('lda', 0.139), ('ai', 0.139), ('ic', 0.134), ('face', 0.131), ('stage', 0.124), ('decomposition', 0.115), ('sw', 0.112), ('gt', 0.107), ('nc', 0.104), ('pca', 0.1), ('qt', 0.099), ('janardan', 0.098), ('minnesota', 0.098), ('orl', 0.098), ('ye', 0.098), ('st', 0.089), ('kopf', 0.084), ('ktc', 0.084), ('ktz', 0.084), ('competitive', 0.08), ('ni', 0.079), ('mika', 0.075), ('hw', 0.074), ('pix', 0.074), ('matrix', 0.073), ('fi', 0.073), ('qv', 0.067), ('army', 0.067), ('line', 0.065), ('ar', 0.062), ('nonlinear', 0.06), ('dimension', 0.058), ('kfda', 0.056), ('xiong', 0.056), ('mc', 0.056), ('feature', 0.054), ('nj', 0.054), ('outperform', 0.049), ('ece', 0.049), ('irc', 0.049), ('jieping', 0.049), ('accuracy', 0.048), ('optimization', 0.047), ('takes', 0.047), ('subsample', 0.045), ('cse', 0.045), ('eet', 0.045), ('rd', 0.043), ('ith', 0.043), ('generalized', 0.043), ('eigenvectors', 0.042), ('ird', 0.042), ('irn', 0.042), ('sher', 0.042), ('orthogonal', 0.042), ('xj', 0.041), ('image', 0.041), ('computed', 0.041), ('dataset', 0.04), ('trace', 0.04), ('classi', 0.039), ('complexities', 0.039), ('recognition', 0.039), ('reduction', 0.039), ('transformation', 0.036), ('datasets', 0.036), ('aims', 0.036), ('images', 0.035), ('analysis', 0.035), ('table', 0.033), ('namely', 0.033), ('size', 0.033), ('class', 0.032), ('lines', 0.032), ('algorithm', 0.031), ('matrices', 0.031), ('eigenvalue', 0.031), ('sch', 0.03), ('samples', 0.03), ('compute', 0.03), ('kernels', 0.03), ('eigenvalues', 0.029), ('decreasing', 0.028), ('algorithms', 0.028), ('thousands', 0.028), ('ef', 0.028), ('department', 0.027), ('proceed', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 59 nips-2004-Efficient Kernel Discriminant Analysis via QR Decomposition
Author: Tao Xiong, Jieping Ye, Qi Li, Ravi Janardan, Vladimir Cherkassky
Abstract: Linear Discriminant Analysis (LDA) is a well-known method for feature extraction and dimension reduction. It has been used widely in many applications such as face recognition. Recently, a novel LDA algorithm based on QR Decomposition, namely LDA/QR, has been proposed, which is competitive in terms of classification accuracy with other LDA algorithms, but it has much lower costs in time and space. However, LDA/QR is based on linear projection, which may not be suitable for data with nonlinear structure. This paper first proposes an algorithm called KDA/QR, which extends the LDA/QR algorithm to deal with nonlinear data by using the kernel operator. Then an efficient approximation of KDA/QR called AKDA/QR is proposed. Experiments on face image data show that the classification accuracy of both KDA/QR and AKDA/QR are competitive with Generalized Discriminant Analysis (GDA), a general kernel discriminant analysis algorithm, while AKDA/QR has much lower time and space costs. 1
2 0.38620266 197 nips-2004-Two-Dimensional Linear Discriminant Analysis
Author: Jieping Ye, Ravi Janardan, Qi Li
Abstract: Linear Discriminant Analysis (LDA) is a well-known scheme for feature extraction and dimension reduction. It has been used widely in many applications involving high-dimensional data, such as face recognition and image retrieval. An intrinsic limitation of classical LDA is the so-called singularity problem, that is, it fails when all scatter matrices are singular. A well-known approach to deal with the singularity problem is to apply an intermediate dimension reduction stage using Principal Component Analysis (PCA) before LDA. The algorithm, called PCA+LDA, is used widely in face recognition. However, PCA+LDA has high costs in time and space, due to the need for an eigen-decomposition involving the scatter matrices. In this paper, we propose a novel LDA algorithm, namely 2DLDA, which stands for 2-Dimensional Linear Discriminant Analysis. 2DLDA overcomes the singularity problem implicitly, while achieving efficiency. The key difference between 2DLDA and classical LDA lies in the model for data representation. Classical LDA works with vectorized representations of data, while the 2DLDA algorithm works with data in matrix representation. To further reduce the dimension by 2DLDA, the combination of 2DLDA and classical LDA, namely 2DLDA+LDA, is studied, where LDA is preceded by 2DLDA. The proposed algorithms are applied on face recognition and compared with PCA+LDA. Experiments show that 2DLDA and 2DLDA+LDA achieve competitive recognition accuracy, while being much more efficient. 1
3 0.12399817 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants
Author: Volker Roth
Abstract: The problem of detecting “atypical objects” or “outliers” is one of the classical topics in (robust) statistics. Recently, it has been proposed to address this problem by means of one-class SVM classifiers. The main conceptual shortcoming of most one-class approaches, however, is that in a strict sense they are unable to detect outliers, since the expected fraction of outliers has to be specified in advance. The method presented in this paper overcomes this problem by relating kernelized one-class classification to Gaussian density estimation in the induced feature space. Having established this relation, it is possible to identify “atypical objects” by quantifying their deviations from the Gaussian model. For RBF kernels it is shown that the Gaussian model is “rich enough” in the sense that it asymptotically provides an unbiased estimator for the true density. In order to overcome the inherent model selection problem, a cross-validated likelihood criterion for selecting all free model parameters is applied. 1
4 0.10681912 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
Author: Xiaojin Zhu, Jaz Kandola, Zoubin Ghahramani, John D. Lafferty
Abstract: We present an algorithm based on convex optimization for constructing kernels for semi-supervised learning. The kernel matrices are derived from the spectral decomposition of graph Laplacians, and combine labeled and unlabeled data in a systematic fashion. Unlike previous work using diffusion kernels and Gaussian random field kernels, a nonparametric kernel approach is presented that incorporates order constraints during optimization. This results in flexible kernels and avoids the need to choose among different parametric forms. Our approach relies on a quadratically constrained quadratic program (QCQP), and is computationally feasible for large datasets. We evaluate the kernels on real datasets using support vector machines, with encouraging results. 1
5 0.099667668 168 nips-2004-Semigroup Kernels on Finite Sets
Author: Marco Cuturi, Jean-philippe Vert
Abstract: Complex objects can often be conveniently represented by finite sets of simpler components, such as images by sets of patches or texts by bags of words. We study the class of positive definite (p.d.) kernels for two such objects that can be expressed as a function of the merger of their respective sets of components. We prove a general integral representation of such kernels and present two particular examples. One of them leads to a kernel for sets of points living in a space endowed itself with a positive definite kernel. We provide experimental results on a benchmark experiment of handwritten digits image classification which illustrate the validity of the approach. 1
6 0.094265677 54 nips-2004-Distributed Information Regularization on Graphs
7 0.089931853 102 nips-2004-Learning first-order Markov models for control
8 0.089120165 189 nips-2004-The Power of Selective Memory: Self-Bounded Learning of Prediction Suffix Trees
9 0.086712487 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition
10 0.086607553 24 nips-2004-Approximately Efficient Online Mechanism Design
11 0.085662588 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics
12 0.085190661 127 nips-2004-Neighbourhood Components Analysis
13 0.084134243 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
14 0.083852448 98 nips-2004-Learning Gaussian Process Kernels via Hierarchical Bayes
15 0.081233904 68 nips-2004-Face Detection --- Efficient and Rank Deficient
16 0.075588353 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
17 0.071236953 18 nips-2004-Algebraic Set Kernels with Application to Inference Over Local Image Representations
18 0.070213079 188 nips-2004-The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space
19 0.069867983 8 nips-2004-A Machine Learning Approach to Conjoint Analysis
20 0.069320515 163 nips-2004-Semi-parametric Exponential Family PCA
topicId topicWeight
[(0, -0.228), (1, 0.084), (2, -0.019), (3, -0.029), (4, -0.033), (5, -0.019), (6, -0.13), (7, -0.182), (8, 0.168), (9, 0.11), (10, -0.057), (11, -0.103), (12, -0.095), (13, 0.131), (14, 0.028), (15, -0.127), (16, -0.24), (17, -0.038), (18, 0.277), (19, 0.157), (20, 0.159), (21, -0.132), (22, 0.113), (23, 0.17), (24, 0.101), (25, 0.064), (26, -0.142), (27, 0.016), (28, -0.021), (29, 0.089), (30, -0.023), (31, 0.027), (32, 0.051), (33, 0.016), (34, 0.002), (35, 0.061), (36, 0.092), (37, -0.107), (38, -0.03), (39, 0.049), (40, -0.036), (41, 0.107), (42, 0.039), (43, -0.011), (44, -0.007), (45, 0.004), (46, 0.011), (47, 0.036), (48, -0.023), (49, 0.022)]
simIndex simValue paperId paperTitle
1 0.93544745 197 nips-2004-Two-Dimensional Linear Discriminant Analysis
Author: Jieping Ye, Ravi Janardan, Qi Li
Abstract: Linear Discriminant Analysis (LDA) is a well-known scheme for feature extraction and dimension reduction. It has been used widely in many applications involving high-dimensional data, such as face recognition and image retrieval. An intrinsic limitation of classical LDA is the so-called singularity problem, that is, it fails when all scatter matrices are singular. A well-known approach to deal with the singularity problem is to apply an intermediate dimension reduction stage using Principal Component Analysis (PCA) before LDA. The algorithm, called PCA+LDA, is used widely in face recognition. However, PCA+LDA has high costs in time and space, due to the need for an eigen-decomposition involving the scatter matrices. In this paper, we propose a novel LDA algorithm, namely 2DLDA, which stands for 2-Dimensional Linear Discriminant Analysis. 2DLDA overcomes the singularity problem implicitly, while achieving efficiency. The key difference between 2DLDA and classical LDA lies in the model for data representation. Classical LDA works with vectorized representations of data, while the 2DLDA algorithm works with data in matrix representation. To further reduce the dimension by 2DLDA, the combination of 2DLDA and classical LDA, namely 2DLDA+LDA, is studied, where LDA is preceded by 2DLDA. The proposed algorithms are applied on face recognition and compared with PCA+LDA. Experiments show that 2DLDA and 2DLDA+LDA achieve competitive recognition accuracy, while being much more efficient. 1
same-paper 2 0.93168068 59 nips-2004-Efficient Kernel Discriminant Analysis via QR Decomposition
Author: Tao Xiong, Jieping Ye, Qi Li, Ravi Janardan, Vladimir Cherkassky
Abstract: Linear Discriminant Analysis (LDA) is a well-known method for feature extraction and dimension reduction. It has been used widely in many applications such as face recognition. Recently, a novel LDA algorithm based on QR Decomposition, namely LDA/QR, has been proposed, which is competitive in terms of classification accuracy with other LDA algorithms, but it has much lower costs in time and space. However, LDA/QR is based on linear projection, which may not be suitable for data with nonlinear structure. This paper first proposes an algorithm called KDA/QR, which extends the LDA/QR algorithm to deal with nonlinear data by using the kernel operator. Then an efficient approximation of KDA/QR called AKDA/QR is proposed. Experiments on face image data show that the classification accuracy of both KDA/QR and AKDA/QR are competitive with Generalized Discriminant Analysis (GDA), a general kernel discriminant analysis algorithm, while AKDA/QR has much lower time and space costs. 1
3 0.59230411 127 nips-2004-Neighbourhood Components Analysis
Author: Jacob Goldberger, Geoffrey E. Hinton, Sam T. Roweis, Ruslan Salakhutdinov
Abstract: In this paper we propose a novel method for learning a Mahalanobis distance measure to be used in the KNN classification algorithm. The algorithm directly maximizes a stochastic variant of the leave-one-out KNN score on the training set. It can also learn a low-dimensional linear embedding of labeled data that can be used for data visualization and fast classification. Unlike other methods, our classification model is non-parametric, making no assumptions about the shape of the class distributions or the boundaries between them. The performance of the method is demonstrated on several data sets, both for metric learning and linear dimensionality reduction. 1
4 0.44333744 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants
Author: Volker Roth
Abstract: The problem of detecting “atypical objects” or “outliers” is one of the classical topics in (robust) statistics. Recently, it has been proposed to address this problem by means of one-class SVM classifiers. The main conceptual shortcoming of most one-class approaches, however, is that in a strict sense they are unable to detect outliers, since the expected fraction of outliers has to be specified in advance. The method presented in this paper overcomes this problem by relating kernelized one-class classification to Gaussian density estimation in the induced feature space. Having established this relation, it is possible to identify “atypical objects” by quantifying their deviations from the Gaussian model. For RBF kernels it is shown that the Gaussian model is “rich enough” in the sense that it asymptotically provides an unbiased estimator for the true density. In order to overcome the inherent model selection problem, a cross-validated likelihood criterion for selecting all free model parameters is applied. 1
5 0.38525483 94 nips-2004-Kernels for Multi--task Learning
Author: Charles A. Micchelli, Massimiliano Pontil
Abstract: This paper provides a foundation for multi–task learning using reproducing kernel Hilbert spaces of vector–valued functions. In this setting, the kernel is a matrix–valued function. Some explicit examples will be described which go beyond our earlier results in [7]. In particular, we characterize classes of matrix– valued kernels which are linear and are of the dot product or the translation invariant type. We discuss how these kernels can be used to model relations between the tasks and present linear multi–task learning algorithms. Finally, we present a novel proof of the representer theorem for a minimizer of a regularization functional which is based on the notion of minimal norm interpolation. 1
6 0.37042874 68 nips-2004-Face Detection --- Efficient and Rank Deficient
7 0.36861107 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition
8 0.34339273 18 nips-2004-Algebraic Set Kernels with Application to Inference Over Local Image Representations
9 0.33190787 133 nips-2004-Nonparametric Transforms of Graph Kernels for Semi-Supervised Learning
10 0.31210729 106 nips-2004-Machine Learning Applied to Perception: Decision Images for Gender Classification
11 0.29847947 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
12 0.29288787 168 nips-2004-Semigroup Kernels on Finite Sets
13 0.28325802 60 nips-2004-Efficient Kernel Machines Using the Improved Fast Gauss Transform
14 0.27927995 205 nips-2004-Who's In the Picture
15 0.27478799 134 nips-2004-Object Classification from a Single Example Utilizing Class Relevance Metrics
16 0.26974493 177 nips-2004-Supervised Graph Inference
17 0.26955968 3 nips-2004-A Feature Selection Algorithm Based on the Global Minimization of a Generalization Error Bound
18 0.26816779 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
19 0.26204807 182 nips-2004-Synergistic Face Detection and Pose Estimation with Energy-Based Models
20 0.25533393 8 nips-2004-A Machine Learning Approach to Conjoint Analysis
topicId topicWeight
[(13, 0.06), (15, 0.691), (26, 0.037), (33, 0.088), (35, 0.012), (39, 0.021)]
simIndex simValue paperId paperTitle
1 0.97880816 50 nips-2004-Dependent Gaussian Processes
Author: Phillip Boyle, Marcus Frean
Abstract: Gaussian processes are usually parameterised in terms of their covariance functions. However, this makes it difficult to deal with multiple outputs, because ensuring that the covariance matrix is positive definite is problematic. An alternative formulation is to treat Gaussian processes as white noise sources convolved with smoothing kernels, and to parameterise the kernel instead. Using this, we extend Gaussian processes to handle multiple, coupled outputs. 1
same-paper 2 0.97084796 59 nips-2004-Efficient Kernel Discriminant Analysis via QR Decomposition
Author: Tao Xiong, Jieping Ye, Qi Li, Ravi Janardan, Vladimir Cherkassky
Abstract: Linear Discriminant Analysis (LDA) is a well-known method for feature extraction and dimension reduction. It has been used widely in many applications such as face recognition. Recently, a novel LDA algorithm based on QR Decomposition, namely LDA/QR, has been proposed, which is competitive in terms of classification accuracy with other LDA algorithms, but it has much lower costs in time and space. However, LDA/QR is based on linear projection, which may not be suitable for data with nonlinear structure. This paper first proposes an algorithm called KDA/QR, which extends the LDA/QR algorithm to deal with nonlinear data by using the kernel operator. Then an efficient approximation of KDA/QR called AKDA/QR is proposed. Experiments on face image data show that the classification accuracy of both KDA/QR and AKDA/QR are competitive with Generalized Discriminant Analysis (GDA), a general kernel discriminant analysis algorithm, while AKDA/QR has much lower time and space costs. 1
3 0.96098799 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
Author: Le Lu, Gregory D. Hager, Laurent Younes
Abstract: Visual action recognition is an important problem in computer vision. In this paper, we propose a new method to probabilistically model and recognize actions of articulated objects, such as hand or body gestures, in image sequences. Our method consists of three levels of representation. At the low level, we first extract a feature vector invariant to scale and in-plane rotation by using the Fourier transform of a circular spatial histogram. Then, spectral partitioning [20] is utilized to obtain an initial clustering; this clustering is then refined using a temporal smoothness constraint. Gaussian mixture model (GMM) based clustering and density estimation in the subspace of linear discriminant analysis (LDA) are then applied to thousands of image feature vectors to obtain an intermediate level representation. Finally, at the high level we build a temporal multiresolution histogram model for each action by aggregating the clustering weights of sampled images belonging to that action. We discuss how this high level representation can be extended to achieve temporal scaling invariance and to include Bi-gram or Multi-gram transition information. Both image clustering and action recognition/segmentation results are given to show the validity of our three tiered representation.
4 0.95247865 203 nips-2004-Validity Estimates for Loopy Belief Propagation on Binary Real-world Networks
Author: Joris M. Mooij, Hilbert J. Kappen
Abstract: We introduce a computationally efficient method to estimate the validity of the BP method as a function of graph topology, the connectivity strength, frustration and network size. We present numerical results that demonstrate the correctness of our estimates for the uniform random model and for a real-world network (“C. Elegans”). Although the method is restricted to pair-wise interactions, no local evidence (zero “biases”) and binary variables, we believe that its predictions correctly capture the limitations of BP for inference and MAP estimation on arbitrary graphical models. Using this approach, we find that BP always performs better than MF. Especially for large networks with broad degree distributions (such as scale-free networks) BP turns out to significantly outperform MF. 1
5 0.91725314 92 nips-2004-Kernel Methods for Implicit Surface Modeling
Author: Joachim Giesen, Simon Spalinger, Bernhard Schölkopf
Abstract: We describe methods for computing an implicit model of a hypersurface that is given only by a finite sampling. The methods work by mapping the sample points into a reproducing kernel Hilbert space and then determining regions in terms of hyperplanes. 1
6 0.91679156 197 nips-2004-Two-Dimensional Linear Discriminant Analysis
7 0.90532029 183 nips-2004-Temporal-Difference Networks
8 0.83182025 9 nips-2004-A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning
9 0.77481192 201 nips-2004-Using the Equivalent Kernel to Understand Gaussian Process Regression
10 0.76540089 148 nips-2004-Probabilistic Computation in Spiking Populations
11 0.753667 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
12 0.7519294 168 nips-2004-Semigroup Kernels on Finite Sets
13 0.74354273 188 nips-2004-The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space
14 0.72012371 178 nips-2004-Support Vector Classification with Input Data Uncertainty
15 0.71932441 30 nips-2004-Binet-Cauchy Kernels
16 0.71745032 94 nips-2004-Kernels for Multi--task Learning
17 0.71738398 98 nips-2004-Learning Gaussian Process Kernels via Hierarchical Bayes
18 0.71714824 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
19 0.70834905 18 nips-2004-Algebraic Set Kernels with Application to Inference Over Local Image Representations
20 0.7037164 55 nips-2004-Distributed Occlusion Reasoning for Tracking with Nonparametric Belief Propagation