nips nips2004 nips2004-197 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Jieping Ye, Ravi Janardan, Qi Li
Abstract: Linear Discriminant Analysis (LDA) is a well-known scheme for feature extraction and dimension reduction. It has been used widely in many applications involving high-dimensional data, such as face recognition and image retrieval. An intrinsic limitation of classical LDA is the so-called singularity problem, that is, it fails when all scatter matrices are singular. A well-known approach to deal with the singularity problem is to apply an intermediate dimension reduction stage using Principal Component Analysis (PCA) before LDA. The algorithm, called PCA+LDA, is used widely in face recognition. However, PCA+LDA has high costs in time and space, due to the need for an eigen-decomposition involving the scatter matrices. In this paper, we propose a novel LDA algorithm, namely 2DLDA, which stands for 2-Dimensional Linear Discriminant Analysis. 2DLDA overcomes the singularity problem implicitly, while achieving efficiency. The key difference between 2DLDA and classical LDA lies in the model for data representation. Classical LDA works with vectorized representations of data, while the 2DLDA algorithm works with data in matrix representation. To further reduce the dimension by 2DLDA, the combination of 2DLDA and classical LDA, namely 2DLDA+LDA, is studied, where LDA is preceded by 2DLDA. The proposed algorithms are applied on face recognition and compared with PCA+LDA. Experiments show that 2DLDA and 2DLDA+LDA achieve competitive recognition accuracy, while being much more efficient. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Linear Discriminant Analysis (LDA) is a well-known scheme for feature extraction and dimension reduction. [sent-7, score-0.069]
2 It has been used widely in many applications involving high-dimensional data, such as face recognition and image retrieval. [sent-8, score-0.205]
3 An intrinsic limitation of classical LDA is the so-called singularity problem, that is, it fails when all scatter matrices are singular. [sent-9, score-0.422]
4 A well-known approach to deal with the singularity problem is to apply an intermediate dimension reduction stage using Principal Component Analysis (PCA) before LDA. [sent-10, score-0.256]
5 The algorithm, called PCA+LDA, is used widely in face recognition. [sent-11, score-0.129]
6 However, PCA+LDA has high costs in time and space, due to the need for an eigen-decomposition involving the scatter matrices. [sent-12, score-0.106]
7 2DLDA overcomes the singularity problem implicitly, while achieving efficiency. [sent-14, score-0.096]
8 The key difference between 2DLDA and classical LDA lies in the model for data representation. [sent-15, score-0.147]
9 Classical LDA works with vectorized representations of data, while the 2DLDA algorithm works with data in matrix representation. [sent-16, score-0.094]
10 To further reduce the dimension by 2DLDA, the combination of 2DLDA and classical LDA, namely 2DLDA+LDA, is studied, where LDA is preceded by 2DLDA. [sent-17, score-0.226]
11 The proposed algorithms are applied on face recognition and compared with PCA+LDA. [sent-18, score-0.131]
12 1 Introduction Linear Discriminant Analysis [2, 4] is a well-known scheme for feature extraction and dimension reduction. [sent-20, score-0.069]
13 It has been used widely in many applications such as face recognition [1], image retrieval [6], microarray data classification [3], etc. [sent-21, score-0.189]
14 The optimal projection (transformation) can be readily computed by applying the eigendecomposition on the scatter matrices. [sent-23, score-0.145]
15 An intrinsic limitation of classical LDA is that its objective function requires the nonsingularity of one of the scatter matrices. [sent-24, score-0.235]
16 For many applications, such as face recognition, all scatter matrices in question can be singular since the data is from a very high-dimensional space, and in general, the dimension exceeds the number of data points. [sent-25, score-0.355]
17 This is known as the undersampled or singularity problem [5]. [sent-26, score-0.144]
18 Among these LDA extensions, PCA+LDA has received a lot of attention, especially for face recognition [1]. [sent-29, score-0.131]
19 In this two-stage algorithm, an intermediate dimension reduction stage using PCA is applied before LDA. [sent-30, score-0.16]
20 Unlike classical LDA, we consider the projection of the data onto a space which is the tensor product of two vector spaces. [sent-36, score-0.216]
21 We formulate our dimension reduction problem as an optimization problem in Section 3. [sent-37, score-0.108]
22 Unlike classical LDA, there is no closed form solution for the optimization problem; instead, we derive a heuristic, namely 2DLDA. [sent-38, score-0.175]
23 To further reduce the dimension, which is desirable for efficient querying, we consider the combination of 2DLDA and LDA, namely 2DLDA+LDA, where the dimension of the space transformed by 2DLDA is further reduced by LDA. [sent-39, score-0.205]
24 We perform experiments on three well-known face datasets to evaluate the effectiveness of 2DLDA and 2DLDA+LDA and compare with PCA+LDA, which is used widely in face recognition. [sent-40, score-0.269]
25 Our experiments show that: (1) 2DLDA is applicable to high-dimensional undersampled data such as face images, i.e., it implicitly avoids the singularity problem encountered in classical LDA; [sent-41, score-0.154]
26 and (2) 2DLDA and 2DLDA+LDA have distinctly lower costs in time and space than PCA+LDA, and achieve classification accuracy that is competitive with PCA+LDA. [sent-43, score-0.364]
27 2 An overview of LDA In this section, we give a brief overview of classical LDA. [sent-44, score-0.154]
28 Given a data matrix A ∈ IR^{N×n}, classical LDA aims to find a transformation G ∈ IR^{N×ℓ} that maps each column a_i of A, for 1 ≤ i ≤ n, in the N-dimensional space to a vector b_i in the ℓ-dimensional space. [sent-46, score-0.38]
29 Equivalently, classical LDA aims to find a vector space G spanned by {g_i}_{i=1}^{ℓ}, where G = [g_1, · · · , g_ℓ], such that each a_i is projected onto G by (g_1^T a_i, · · · , g_ℓ^T a_i)^T ∈ IR^ℓ. [sent-48, score-0.453]
30 Assume that the original data in A is partitioned into k classes as A = {Π_1, · · · , Π_k}, where Π_i contains n_i data points from the ith class, and Σ_{i=1}^{k} n_i = n. [sent-49, score-0.25]
31 Classical LDA aims to find the optimal transformation G such that the class structure of the original high-dimensional space is preserved in the low-dimensional space. [sent-50, score-0.134]
32 In the low-dimensional space resulting from the linear transformation G (or the linear projection onto the vector space G), the within-class and between-class scatter matrices become S_w^L = G^T S_w G and S_b^L = G^T S_b G. [sent-54, score-0.21]
33 An optimal transformation G would maximize trace(S_b^L) and minimize trace(S_w^L). [sent-55, score-0.056]
34 Common optimizations in classical discriminant analysis include (see [4]): max_G trace((S_w^L)^{-1} S_b^L) and min_G trace((S_b^L)^{-1} S_w^L). [sent-56, score-0.178]
35 Therefore, the reduced dimension by classical LDA is at most k − 1. [sent-61, score-0.237]
36 A stable way to compute the eigen-decomposition is to apply SVD on the scatter matrices. [sent-62, score-0.091]
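As an illustration of the computation just described, here is a minimal NumPy/SciPy sketch of classical LDA (not the authors' implementation; function and variable names are illustrative): it builds S_w and S_b and solves the generalized eigenvalue problem S_b g = λ S_w g, which is exactly where the nonsingularity requirement on S_w enters.

```python
import numpy as np
from scipy.linalg import eigh

def classical_lda(A, labels, ell):
    """A: N x n data matrix (columns are samples); returns G of shape N x ell."""
    N, n = A.shape
    M = A.mean(axis=1, keepdims=True)                  # global mean
    S_w = np.zeros((N, N))
    S_b = np.zeros((N, N))
    for cls in np.unique(labels):
        A_c = A[:, labels == cls]
        M_c = A_c.mean(axis=1, keepdims=True)          # class mean
        S_w += (A_c - M_c) @ (A_c - M_c).T             # within-class scatter
        S_b += A_c.shape[1] * (M_c - M) @ (M_c - M).T  # between-class scatter
    # Generalized symmetric eigenproblem S_b g = lambda S_w g; requires S_w to be
    # nonsingular, which is exactly what fails in the undersampled (singular) case.
    evals, evecs = eigh(S_b, S_w)
    order = np.argsort(evals)[::-1]
    return evecs[:, order[:ell]]                       # top-ell discriminant directions
```

In the undersampled setting S_w is singular and the last step breaks down, which is what motivates PCA+LDA and the 2DLDA approach of Section 3.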
37 Note that a limitation of classical LDA in many applications involving undersampled data, such as text documents and images, is that at least one of the scatter matrices is required to be nonsingular. [sent-64, score-0.376]
38 Several extensions, including pseudo-inverse LDA, regularized LDA, and PCA+LDA, were proposed in the past to deal with the singularity problem. [sent-65, score-0.096]
39 3 2-Dimensional LDA The key difference between classical LDA and the 2DLDA that we propose in this paper is in the representation of data. [sent-67, score-0.168]
40 While classical LDA uses the vectorized representation, 2DLDA works with data in matrix representation. [sent-68, score-0.201]
41 We will see later in this section that the matrix representation in 2DLDA leads to an eigendecomposition on matrices with much smaller sizes. [sent-69, score-0.168]
42 More specifically, 2DLDA involves the eigen-decomposition of matrices with sizes r×r and c×c, which are much smaller than the matrices in classical LDA. [sent-70, score-0.308]
43 Unlike classical LDA, 2DLDA considers the (ℓ1 × ℓ2)-dimensional space L ⊗ R, which is the tensor product of the following two spaces: L spanned by {u_i}_{i=1}^{ℓ1} and R spanned by {v_i}_{i=1}^{ℓ2}. [sent-72, score-0.208]
44 Define two matrices L = [u_1, · · · , u_{ℓ1}] ∈ IR^{r×ℓ1} and R = [v_1, · · · , v_{ℓ2}] ∈ IR^{c×ℓ2}. [sent-73, score-0.091]
45 Then the projection of X ∈ IR^{r×c} onto the space L ⊗ R is L^T X R ∈ IR^{ℓ1×ℓ2}. [sent-74, score-0.064]
46 Let A_i ∈ IR^{r×c}, for i = 1, · · · , n, be the n images in the dataset, clustered into classes Π_1, · · · , Π_k, where Π_i has n_i images. [sent-75, score-0.153]
47 Let M_i = (1/n_i) Σ_{X∈Π_i} X be the mean of the ith class, 1 ≤ i ≤ k, and M = (1/n) Σ_{i=1}^{k} Σ_{X∈Π_i} X be the global mean. [sent-76, score-0.134]
48 In 2DLDA, we consider images as two-dimensional signals and aim to find two transformation matrices L ∈ IR^{r×ℓ1} and R ∈ IR^{c×ℓ2} that map each A_i ∈ IR^{r×c}, for 1 ≤ i ≤ n, to a matrix B_i ∈ IR^{ℓ1×ℓ2} such that B_i = L^T A_i R. [sent-77, score-0.189]
49 Like classical LDA, 2DLDA aims to find the optimal transformations (projections) L and R such that the class structure of the original high-dimensional space is preserved in the low-dimensional space. [sent-78, score-0.24]
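A small NumPy sketch (names are illustrative, not from the paper) of the matrix representation just described: images stay r × c, the class and global means are matrices, and each sample is projected bilinearly as B_i = L^T A_i R.

```python
import numpy as np

def class_and_global_means(images, labels):
    """images: array of shape (n, r, c); labels: length-n class labels."""
    M_i = {cls: images[labels == cls].mean(axis=0) for cls in np.unique(labels)}  # r x c each
    M = images.mean(axis=0)                                                       # global mean, r x c
    return M_i, M

def project(A_i, L, R):
    """Bilinear projection of an r x c image onto the (l1 x l2)-dimensional space."""
    return L.T @ A_i @ R                                                          # shape l1 x l2
```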
50 A natural similarity metric between matrices is the Frobenius norm [8]. [sent-79, score-0.091]
51 Under this metric, the (squared) within-class and between-class distances D_w and D_b can be computed as follows: D_w = Σ_{i=1}^{k} Σ_{X∈Π_i} ||X − M_i||_F^2, D_b = Σ_{i=1}^{k} n_i ||M_i − M||_F^2. [sent-80, score-0.099]
52 Using the property of the trace, that is, trace(M M^T) = ||M||_F^2 for any matrix M, we can rewrite D_w and D_b as follows: D_w = trace( Σ_{i=1}^{k} Σ_{X∈Π_i} (X − M_i)(X − M_i)^T ), D_b = trace( Σ_{i=1}^{k} n_i (M_i − M)(M_i − M)^T ). [sent-81, score-0.419]
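As a quick illustration (a NumPy sketch with illustrative names), D_w and D_b can be evaluated directly from Frobenius norms, and the trace identity used for the rewrite can be checked numerically.

```python
import numpy as np

def scatter_distances(images, labels):
    """Squared within-class (D_w) and between-class (D_b) distances under the Frobenius norm."""
    M = images.mean(axis=0)
    D_w, D_b = 0.0, 0.0
    for cls in np.unique(labels):
        X_c = images[labels == cls]
        M_c = X_c.mean(axis=0)
        D_w += sum(np.linalg.norm(X - M_c, 'fro') ** 2 for X in X_c)
        D_b += len(X_c) * np.linalg.norm(M_c - M, 'fro') ** 2
    return D_w, D_b

# trace(M M^T) = ||M||_F^2 for any matrix M:
M = np.random.randn(4, 3)
assert np.isclose(np.trace(M @ M.T), np.linalg.norm(M, 'fro') ** 2)
```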
53 i=1 In the low-dimensional space resulting from the linear transformations L and R, the withinclass and between-class distances become ˜ Dw k = LT (X − Mi )RRT (X − Mi )T L , trace i=1 X∈Πi ˜ Db k = ni LT (Mi − M )RRT (Mi − M )T L . [sent-82, score-0.283]
54 trace i=1 ˜ ˜ The optimal transformations L and R would maximize Db and minimize Dw . [sent-83, score-0.185]
55 More specifically, for a fixed R, we can compute the optimal L by solving an optimization problem similar to the one in Eq. [sent-85, score-0.055]
56 Computation of L: For a fixed R, D̃_w and D̃_b can be rewritten as D̃_w = trace(L^T S_w^R L) and D̃_b = trace(L^T S_b^R L). Algorithm 2DLDA(A_1, · · · , A_n, ℓ1, ℓ2) — Input: A_1, · · · , A_n, ℓ1, ℓ2; Output: L, R, B_1, · · · , B_n. [sent-91, score-0.313]
57 Step 1 of the algorithm computes the mean M_i of the ith class for each i as M_i = (1/n_i) Σ_{X∈Π_i} X. [sent-92, score-0.154]
58 In the jth iteration, S_w^R ← Σ_{i=1}^{k} Σ_{X∈Π_i} (X − M_i) R_{j−1} R_{j−1}^T (X − M_i)^T and S_b^R ← Σ_{i=1}^{k} n_i (M_i − M) R_{j−1} R_{j−1}^T (M_i − M)^T; L_j is then formed from the leading eigenvectors of (S_w^R)^{-1} S_b^R. [sent-96, score-0.099]
59 Next, S_w^L ← Σ_{i=1}^{k} Σ_{X∈Π_i} (X − M_i)^T L_j L_j^T (X − M_i) and S_b^L ← Σ_{i=1}^{k} n_i (M_i − M)^T L_j L_j^T (M_i − M); R_j is then formed from the leading eigenvectors of (S_w^L)^{-1} S_b^L. [sent-99, score-0.099]
60 Here, for the current R, S_w^R = Σ_{i=1}^{k} Σ_{X∈Π_i} (X − M_i) R R^T (X − M_i)^T and S_b^R = Σ_{i=1}^{k} n_i (M_i − M) R R^T (M_i − M)^T. [sent-106, score-0.099]
61 By Eq. (1), the optimal L can be computed by solving the following optimization problem: max_L trace( (L^T S_w^R L)^{-1} (L^T S_b^R L) ). [sent-108, score-0.204]
62 The solution can be obtained by solving the following generalized eigenvalue problem: S_w^R x = λ S_b^R x. [sent-109, score-0.051]
63 Note that the size of the matrices S_w^R and S_b^R is r × r, which is much smaller than the size of the matrices S_w and S_b in classical LDA. [sent-111, score-0.308]
64 A key observation is that D̃_w and D̃_b can be rewritten as D̃_w = trace(R^T S_w^L R) and D̃_b = trace(R^T S_b^L R), where S_w^L = Σ_{i=1}^{k} Σ_{X∈Π_i} (X − M_i)^T L L^T (X − M_i) and S_b^L = Σ_{i=1}^{k} n_i (M_i − M)^T L L^T (M_i − M). [sent-113, score-0.433]
65 This follows from the following property of the trace: trace(AB) = trace(BA), for any two matrices A and B. [sent-114, score-0.091]
66 Similarly, the optimal R can be computed by solving the following optimization problem: max_R trace( (R^T S_w^L R)^{-1} (R^T S_b^L R) ). [sent-115, score-0.204]
67 Note that the size of the matrices S_w^L and S_b^L is c × c, much smaller than S_w and S_b. [sent-118, score-0.091]
68 Since the number of rows (r) and the number of columns (c) of an image A_i are generally comparable, i.e. [sent-124, score-0.057]
69 The key to the low space complexity of the algorithm is that the matrices S_w^R, S_b^R, S_w^L, and S_b^L can be formed by reading the matrices A_i incrementally. [sent-130, score-0.239]
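Putting the two updates together, the following is an illustrative NumPy/SciPy sketch of the alternating 2DLDA procedure (not the authors' code); it assumes the r × r and c × c within-class matrices are nonsingular, which their small size makes far more plausible than in classical LDA, and it uses a simple identity-like initialization for R.

```python
import numpy as np
from scipy.linalg import eigh

def two_d_lda(images, labels, l1, l2, n_iter=5):
    """images: (n, r, c) array; returns L (r x l1), R (c x l2) and projections B (n, l1, l2)."""
    n, r, c = images.shape
    classes = np.unique(labels)
    M = images.mean(axis=0)
    M_i = {cls: images[labels == cls].mean(axis=0) for cls in classes}
    R = np.eye(c, l2)                        # full-rank starting value for R (an assumption)
    for _ in range(n_iter):
        # Fix R: form the r x r matrices S_w^R, S_b^R and take leading eigenvectors.
        SwR = sum((X - M_i[cls]) @ R @ R.T @ (X - M_i[cls]).T
                  for cls in classes for X in images[labels == cls])
        SbR = sum((labels == cls).sum() * (M_i[cls] - M) @ R @ R.T @ (M_i[cls] - M).T
                  for cls in classes)
        w, V = eigh(SbR, SwR)                # generalized eigenproblem of size r x r
        L = V[:, np.argsort(w)[::-1][:l1]]
        # Fix L: form the c x c matrices S_w^L, S_b^L and take leading eigenvectors.
        SwL = sum((X - M_i[cls]).T @ L @ L.T @ (X - M_i[cls])
                  for cls in classes for X in images[labels == cls])
        SbL = sum((labels == cls).sum() * (M_i[cls] - M).T @ L @ L.T @ (M_i[cls] - M)
                  for cls in classes)
        w, V = eigh(SbL, SwL)                # generalized eigenproblem of size c x c
        R = V[:, np.argsort(w)[::-1][:l2]]
    B = np.array([L.T @ A @ R for A in images])
    return L, R, B
```

The eigenproblems here are only r × r and c × c, matching the complexity discussion above, and the scatter matrices are accumulated one image at a time, which is the source of the low memory footprint.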
70 3.1 2DLDA+LDA As mentioned in the Introduction, PCA is commonly applied as an intermediate dimension-reduction stage before LDA to overcome the singularity problem of classical LDA. [sent-132, score-0.292]
71 In this section, we consider the combination of 2DLDA and LDA, namely 2DLDA+LDA, where the dimension by 2DLDA is further reduced by LDA, since small reduced dimension is desirable for efficient querying. [sent-133, score-0.27]
72 More specifically, in the first stage of 2DLDA+LDA, each image A_i ∈ IR^{r×c} is reduced to B_i ∈ IR^{d×d} by 2DLDA, with d < min(r, c). [sent-134, score-0.123]
73 In the second stage, each B_i is first transformed to a vector b_i ∈ IR^{d^2} (matrix-to-vector alignment), then b_i is further reduced to b_i^L ∈ IR^{k−1} by LDA with k − 1 < d^2, where k is the number of classes. [sent-135, score-0.218]
74 Here, “matrix-to-vector alignment” means that the matrix is transformed to a vector by concatenating all its rows together consecutively. [sent-136, score-0.074]
75 The time complexity of the first stage by 2DLDA is O(ndN I). [sent-137, score-0.081]
76 The second stage applies classical LDA to data in d^2-dimensional space, hence takes O(n(d^2)^2), assuming n > d^2. [sent-138, score-0.172]
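A sketch of the two-stage 2DLDA+LDA pipeline just described, assuming a two_d_lda routine like the one sketched above (passed in as two_d_lda_fn) and using scikit-learn's LinearDiscriminantAnalysis as a stand-in for the classical LDA stage; the function names are illustrative, not from the paper.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def two_d_lda_plus_lda(images, labels, d, two_d_lda_fn):
    # Stage 1: 2DLDA reduces each r x c image to a d x d matrix B_i.
    L, R, B = two_d_lda_fn(images, labels, d, d)
    # Matrix-to-vector alignment: concatenate the rows of each B_i (row-major flatten).
    b = B.reshape(len(images), -1)           # shape (n, d*d)
    # Stage 2: classical LDA on the d^2-dimensional vectors, down to at most k-1 dims.
    lda = LinearDiscriminantAnalysis()
    b_low = lda.fit_transform(b, labels)     # shape (n, k-1)
    return b_low, (L, R, lda)
```

With d around 4 to 6, as in the experiments below, the second-stage LDA operates on only d^2 = 16 to 36 features.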
77 4 Experiments In this section, we experimentally evaluate the performance of 2DLDA and 2DLDA+LDA on face images and compare with PCA+LDA, used widely in face recognition. [sent-140, score-0.272]
78 Datasets: We use three face datasets in our study: PIX, ORL, and PIE, which are publicly available. [sent-145, score-0.14]
79 We subsample the images down to a size of 100 × 100 = 10000. [sent-151, score-0.061]
80 PIE is a subset of the CMU–PIE face image dataset (available at http://www. [sent-159, score-0.162]
81 We subsample the images down to a size of 220 × 175 = 38500. [sent-166, score-0.061]
82 Figure 1: Effect of the number of iterations on 2DLDA and 2DLDA+LDA using the three face datasets; PIX, ORL and PIE (from left to right). [sent-193, score-0.205]
83 It is clear that both accuracy curves are stable with respect to the number of iterations. [sent-197, score-0.071]
84 In general, the accuracy curves of 2DLDA+LDA are slightly more stable than those of 2DLDA. [sent-198, score-0.071]
85 The impact of the value of the reduced dimension d: In this experiment, we study the effect of the value of d on 2DLDA and 2DLDA+LDA, where the value of d determines the dimensionality in the transformed space by 2DLDA. [sent-202, score-0.157]
86 We did extensive experiments using different values of d on the face image datasets. [sent-203, score-0.141]
87 The results are summarized in Figure 2, where the x-axis denotes the values of d (between 1 and 15) and the y-axis denotes the classification accuracy with 1-Nearest-Neighbor as the classifier. [sent-204, score-0.055]
88 As shown in Figure 2, the accuracy curves on all datasets stabilize around d = 4 to 6. [sent-205, score-0.089]
89 Comparison on classification accuracy and efficiency: In this experiment, we evaluate the effectiveness of the proposed algorithms in terms of classification accuracy and efficiency and compare with PCA+LDA. [sent-206, score-0.11]
90 Hence 2DLDA+LDA is a more effective dimension reduction algorithm than PCA+LDA, as it is competitive with PCA+LDA in classification and yields the same number of reduced dimensions in the transformed space, while it has much lower time and space costs. [sent-211, score-0.217]
91 5 Conclusions An efficient algorithm, namely 2DLDA, is presented for dimension reduction. [sent-212, score-0.1]
92 The key difference between 2DLDA and LDA is that 2DLDA works on the matrix representation of images directly, while LDA uses a vector representation. [sent-214, score-0.12]
93 2DLDA has asymptotically minimum memory requirements, and lower time complexity than LDA, which is desirable for large face datasets, while it implicitly avoids the singularity problem encountered in classical LDA. [sent-215, score-0.412]
94 We also study the combination of 2DLDA and LDA, namely 2DLDA+LDA, where the dimension by 2DLDA is further reduced by LDA. [sent-216, score-0.142]
95 Experiments show that 2DLDA and 2DLDA+LDA are competitive with PCA+LDA, in terms of classification accuracy, while they have significantly lower time and space costs. [sent-217, score-0.055]
96 Figure 2: Effect of the value of the reduced dimension d on 2DLDA and 2DLDA+LDA using the three face datasets; PIX, ORL and PIE (from left to right). [sent-242, score-0.217]
97 Table 2: Comparison on classification accuracy and efficiency: "—" means that PCA+LDA is not applicable for PIE, due to its large size. [sent-256, score-0.055]
98 Note that PCA+LDA involves an eigendecomposition of the scatter matrices, which requires the whole data matrix to reside in main memory. [sent-257, score-0.131]
99 Two-dimensional PCA: a new approach to appearance-based face representation and recognition. [sent-310, score-0.127]
100 GPCA: An efficient dimension reduction scheme for image compression and retrieval. [sent-320, score-0.125]
wordName wordTfidf (topN-words)
[('lda', 0.698), ('sw', 0.331), ('sb', 0.321), ('mi', 0.193), ('pca', 0.15), ('trace', 0.149), ('classical', 0.126), ('pie', 0.121), ('db', 0.112), ('face', 0.106), ('ni', 0.099), ('singularity', 0.096), ('lt', 0.091), ('matrices', 0.091), ('dw', 0.09), ('ai', 0.079), ('scatter', 0.075), ('bi', 0.073), ('dimension', 0.069), ('orl', 0.066), ('pix', 0.066), ('ir', 0.062), ('irr', 0.06), ('rrt', 0.06), ('accuracy', 0.055), ('janardan', 0.053), ('discriminant', 0.052), ('rj', 0.05), ('undersampled', 0.048), ('stage', 0.046), ('reduced', 0.042), ('ye', 0.039), ('nonsingular', 0.039), ('transformation', 0.039), ('images', 0.037), ('army', 0.036), ('image', 0.035), ('ith', 0.035), ('datasets', 0.034), ('vectorized', 0.034), ('irn', 0.034), ('eigendecomposition', 0.034), ('iterations', 0.033), ('lj', 0.032), ('namely', 0.031), ('transformed', 0.03), ('ndn', 0.03), ('onto', 0.029), ('gt', 0.029), ('classi', 0.028), ('sec', 0.028), ('rt', 0.026), ('tensor', 0.026), ('jieping', 0.026), ('minnesota', 0.026), ('aims', 0.025), ('recognition', 0.025), ('competitive', 0.024), ('subsample', 0.024), ('cse', 0.024), ('intermediate', 0.024), ('widely', 0.023), ('matrix', 0.022), ('bn', 0.022), ('ird', 0.022), ('rows', 0.022), ('dataset', 0.021), ('key', 0.021), ('ciency', 0.021), ('reduction', 0.021), ('representation', 0.021), ('spanned', 0.02), ('limitation', 0.02), ('solving', 0.02), ('complexity', 0.02), ('class', 0.02), ('projection', 0.019), ('transformations', 0.019), ('cation', 0.019), ('works', 0.019), ('optimization', 0.018), ('optimal', 0.017), ('desirable', 0.017), ('preserved', 0.017), ('eigenvectors', 0.017), ('classes', 0.017), ('eigenvalue', 0.016), ('involving', 0.016), ('implicitly', 0.016), ('svd', 0.016), ('avoids', 0.016), ('stable', 0.016), ('space', 0.016), ('generalized', 0.015), ('rewritten', 0.015), ('extensions', 0.015), ('time', 0.015), ('overview', 0.014), ('intrinsic', 0.014), ('singular', 0.014)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000001 197 nips-2004-Two-Dimensional Linear Discriminant Analysis
Author: Jieping Ye, Ravi Janardan, Qi Li
Abstract: Linear Discriminant Analysis (LDA) is a well-known scheme for feature extraction and dimension reduction. It has been used widely in many applications involving high-dimensional data, such as face recognition and image retrieval. An intrinsic limitation of classical LDA is the so-called singularity problem, that is, it fails when all scatter matrices are singular. A well-known approach to deal with the singularity problem is to apply an intermediate dimension reduction stage using Principal Component Analysis (PCA) before LDA. The algorithm, called PCA+LDA, is used widely in face recognition. However, PCA+LDA has high costs in time and space, due to the need for an eigen-decomposition involving the scatter matrices. In this paper, we propose a novel LDA algorithm, namely 2DLDA, which stands for 2-Dimensional Linear Discriminant Analysis. 2DLDA overcomes the singularity problem implicitly, while achieving efficiency. The key difference between 2DLDA and classical LDA lies in the model for data representation. Classical LDA works with vectorized representations of data, while the 2DLDA algorithm works with data in matrix representation. To further reduce the dimension by 2DLDA, the combination of 2DLDA and classical LDA, namely 2DLDA+LDA, is studied, where LDA is preceded by 2DLDA. The proposed algorithms are applied on face recognition and compared with PCA+LDA. Experiments show that 2DLDA and 2DLDA+LDA achieve competitive recognition accuracy, while being much more efficient. 1
2 0.38620266 59 nips-2004-Efficient Kernel Discriminant Analysis via QR Decomposition
Author: Tao Xiong, Jieping Ye, Qi Li, Ravi Janardan, Vladimir Cherkassky
Abstract: Linear Discriminant Analysis (LDA) is a well-known method for feature extraction and dimension reduction. It has been used widely in many applications such as face recognition. Recently, a novel LDA algorithm based on QR Decomposition, namely LDA/QR, has been proposed, which is competitive in terms of classification accuracy with other LDA algorithms, but it has much lower costs in time and space. However, LDA/QR is based on linear projection, which may not be suitable for data with nonlinear structure. This paper first proposes an algorithm called KDA/QR, which extends the LDA/QR algorithm to deal with nonlinear data by using the kernel operator. Then an efficient approximation of KDA/QR called AKDA/QR is proposed. Experiments on face image data show that the classification accuracy of both KDA/QR and AKDA/QR are competitive with Generalized Discriminant Analysis (GDA), a general kernel discriminant analysis algorithm, while AKDA/QR has much lower time and space costs. 1
3 0.17698289 127 nips-2004-Neighbourhood Components Analysis
Author: Jacob Goldberger, Geoffrey E. Hinton, Sam T. Roweis, Ruslan Salakhutdinov
Abstract: In this paper we propose a novel method for learning a Mahalanobis distance measure to be used in the KNN classification algorithm. The algorithm directly maximizes a stochastic variant of the leave-one-out KNN score on the training set. It can also learn a low-dimensional linear embedding of labeled data that can be used for data visualization and fast classification. Unlike other methods, our classification model is non-parametric, making no assumptions about the shape of the class distributions or the boundaries between them. The performance of the method is demonstrated on several data sets, both for metric learning and linear dimensionality reduction. 1
4 0.11249659 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
Author: Le Lu, Gregory D. Hager, Laurent Younes
Abstract: Visual action recognition is an important problem in computer vision. In this paper, we propose a new method to probabilistically model and recognize actions of articulated objects, such as hand or body gestures, in image sequences. Our method consists of three levels of representation. At the low level, we first extract a feature vector invariant to scale and in-plane rotation by using the Fourier transform of a circular spatial histogram. Then, spectral partitioning [20] is utilized to obtain an initial clustering; this clustering is then refined using a temporal smoothness constraint. Gaussian mixture model (GMM) based clustering and density estimation in the subspace of linear discriminant analysis (LDA) are then applied to thousands of image feature vectors to obtain an intermediate level representation. Finally, at the high level we build a temporal multiresolution histogram model for each action by aggregating the clustering weights of sampled images belonging to that action. We discuss how this high level representation can be extended to achieve temporal scaling invariance and to include Bi-gram or Multi-gram transition information. Both image clustering and action recognition/segmentation results are given to show the validity of our three tiered representation.
5 0.10558408 87 nips-2004-Integrating Topics and Syntax
Author: Thomas L. Griffiths, Mark Steyvers, David M. Blei, Joshua B. Tenenbaum
Abstract: Statistical approaches to language learning typically focus on either short-range syntactic dependencies or long-range semantic dependencies between words. We present a generative model that uses both kinds of dependencies, and can be used to simultaneously find syntactic classes and semantic topics despite having no representation of syntax or semantics beyond statistical dependency. This model is competitive on tasks like part-of-speech tagging and document classification with models that exclusively use short- and long-range dependencies respectively. 1
6 0.10435312 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants
7 0.072049066 163 nips-2004-Semi-parametric Exponential Family PCA
8 0.06897556 124 nips-2004-Multiple Alignment of Continuous Time Series
9 0.067214727 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment
10 0.064488895 169 nips-2004-Sharing Clusters among Related Groups: Hierarchical Dirichlet Processes
11 0.063664995 18 nips-2004-Algebraic Set Kernels with Application to Inference Over Local Image Representations
12 0.061768282 114 nips-2004-Maximum Likelihood Estimation of Intrinsic Dimension
13 0.059833415 113 nips-2004-Maximum-Margin Matrix Factorization
14 0.059295166 161 nips-2004-Self-Tuning Spectral Clustering
15 0.057595823 68 nips-2004-Face Detection --- Efficient and Rank Deficient
16 0.056673773 145 nips-2004-Parametric Embedding for Class Visualization
17 0.055983983 66 nips-2004-Exponential Family Harmoniums with an Application to Information Retrieval
18 0.052840512 106 nips-2004-Machine Learning Applied to Perception: Decision Images for Gender Classification
19 0.05251161 178 nips-2004-Support Vector Classification with Input Data Uncertainty
20 0.04584942 2 nips-2004-A Direct Formulation for Sparse PCA Using Semidefinite Programming
topicId topicWeight
[(0, -0.15), (1, 0.058), (2, -0.04), (3, -0.09), (4, 0.004), (5, 0.009), (6, -0.157), (7, -0.091), (8, 0.229), (9, 0.131), (10, -0.019), (11, -0.131), (12, -0.156), (13, 0.121), (14, 0.091), (15, -0.155), (16, -0.264), (17, 0.027), (18, 0.203), (19, 0.161), (20, 0.13), (21, -0.129), (22, 0.136), (23, 0.183), (24, 0.198), (25, 0.117), (26, -0.171), (27, -0.022), (28, -0.013), (29, 0.112), (30, 0.029), (31, -0.03), (32, 0.016), (33, 0.026), (34, 0.048), (35, 0.113), (36, 0.124), (37, -0.138), (38, -0.062), (39, 0.05), (40, -0.037), (41, 0.115), (42, 0.017), (43, -0.04), (44, -0.01), (45, 0.024), (46, -0.016), (47, -0.022), (48, -0.051), (49, -0.039)]
simIndex simValue paperId paperTitle
same-paper 1 0.98052019 197 nips-2004-Two-Dimensional Linear Discriminant Analysis
Author: Jieping Ye, Ravi Janardan, Qi Li
Abstract: Linear Discriminant Analysis (LDA) is a well-known scheme for feature extraction and dimension reduction. It has been used widely in many applications involving high-dimensional data, such as face recognition and image retrieval. An intrinsic limitation of classical LDA is the so-called singularity problem, that is, it fails when all scatter matrices are singular. A well-known approach to deal with the singularity problem is to apply an intermediate dimension reduction stage using Principal Component Analysis (PCA) before LDA. The algorithm, called PCA+LDA, is used widely in face recognition. However, PCA+LDA has high costs in time and space, due to the need for an eigen-decomposition involving the scatter matrices. In this paper, we propose a novel LDA algorithm, namely 2DLDA, which stands for 2-Dimensional Linear Discriminant Analysis. 2DLDA overcomes the singularity problem implicitly, while achieving efficiency. The key difference between 2DLDA and classical LDA lies in the model for data representation. Classical LDA works with vectorized representations of data, while the 2DLDA algorithm works with data in matrix representation. To further reduce the dimension by 2DLDA, the combination of 2DLDA and classical LDA, namely 2DLDA+LDA, is studied, where LDA is preceded by 2DLDA. The proposed algorithms are applied on face recognition and compared with PCA+LDA. Experiments show that 2DLDA and 2DLDA+LDA achieve competitive recognition accuracy, while being much more efficient. 1
2 0.79640311 59 nips-2004-Efficient Kernel Discriminant Analysis via QR Decomposition
Author: Tao Xiong, Jieping Ye, Qi Li, Ravi Janardan, Vladimir Cherkassky
Abstract: Linear Discriminant Analysis (LDA) is a well-known method for feature extraction and dimension reduction. It has been used widely in many applications such as face recognition. Recently, a novel LDA algorithm based on QR Decomposition, namely LDA/QR, has been proposed, which is competitive in terms of classification accuracy with other LDA algorithms, but it has much lower costs in time and space. However, LDA/QR is based on linear projection, which may not be suitable for data with nonlinear structure. This paper first proposes an algorithm called KDA/QR, which extends the LDA/QR algorithm to deal with nonlinear data by using the kernel operator. Then an efficient approximation of KDA/QR called AKDA/QR is proposed. Experiments on face image data show that the classification accuracy of both KDA/QR and AKDA/QR are competitive with Generalized Discriminant Analysis (GDA), a general kernel discriminant analysis algorithm, while AKDA/QR has much lower time and space costs. 1
3 0.56599921 127 nips-2004-Neighbourhood Components Analysis
Author: Jacob Goldberger, Geoffrey E. Hinton, Sam T. Roweis, Ruslan Salakhutdinov
Abstract: In this paper we propose a novel method for learning a Mahalanobis distance measure to be used in the KNN classification algorithm. The algorithm directly maximizes a stochastic variant of the leave-one-out KNN score on the training set. It can also learn a low-dimensional linear embedding of labeled data that can be used for data visualization and fast classification. Unlike other methods, our classification model is non-parametric, making no assumptions about the shape of the class distributions or the boundaries between them. The performance of the method is demonstrated on several data sets, both for metric learning and linear dimensionality reduction. 1
4 0.30694097 142 nips-2004-Outlier Detection with One-class Kernel Fisher Discriminants
Author: Volker Roth
Abstract: The problem of detecting “atypical objects” or “outliers” is one of the classical topics in (robust) statistics. Recently, it has been proposed to address this problem by means of one-class SVM classifiers. The main conceptual shortcoming of most one-class approaches, however, is that in a strict sense they are unable to detect outliers, since the expected fraction of outliers has to be specified in advance. The method presented in this paper overcomes this problem by relating kernelized one-class classification to Gaussian density estimation in the induced feature space. Having established this relation, it is possible to identify “atypical objects” by quantifying their deviations from the Gaussian model. For RBF kernels it is shown that the Gaussian model is “rich enough” in the sense that it asymptotically provides an unbiased estimator for the true density. In order to overcome the inherent model selection problem, a cross-validated likelihood criterion for selecting all free model parameters is applied. 1
5 0.27701658 87 nips-2004-Integrating Topics and Syntax
Author: Thomas L. Griffiths, Mark Steyvers, David M. Blei, Joshua B. Tenenbaum
Abstract: Statistical approaches to language learning typically focus on either short-range syntactic dependencies or long-range semantic dependencies between words. We present a generative model that uses both kinds of dependencies, and can be used to simultaneously find syntactic classes and semantic topics despite having no representation of syntax or semantics beyond statistical dependency. This model is competitive on tasks like part-of-speech tagging and document classification with models that exclusively use short- and long-range dependencies respectively. 1
6 0.25973484 68 nips-2004-Face Detection --- Efficient and Rank Deficient
7 0.24803983 66 nips-2004-Exponential Family Harmoniums with an Application to Information Retrieval
8 0.24508262 106 nips-2004-Machine Learning Applied to Perception: Decision Images for Gender Classification
9 0.24273926 205 nips-2004-Who's In the Picture
10 0.24044526 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
11 0.23025575 163 nips-2004-Semi-parametric Exponential Family PCA
12 0.22487494 18 nips-2004-Algebraic Set Kernels with Application to Inference Over Local Image Representations
13 0.21054992 182 nips-2004-Synergistic Face Detection and Pose Estimation with Energy-Based Models
14 0.19509065 93 nips-2004-Kernel Projection Machine: a New Tool for Pattern Recognition
15 0.19017102 169 nips-2004-Sharing Clusters among Related Groups: Hierarchical Dirichlet Processes
16 0.18803637 143 nips-2004-PAC-Bayes Learning of Conjunctions and Classification of Gene-Expression Data
17 0.18081386 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
18 0.17929739 90 nips-2004-Joint Probabilistic Curve Clustering and Alignment
19 0.17512134 94 nips-2004-Kernels for Multi--task Learning
20 0.1746614 2 nips-2004-A Direct Formulation for Sparse PCA Using Semidefinite Programming
topicId topicWeight
[(5, 0.098), (13, 0.092), (15, 0.427), (26, 0.064), (31, 0.017), (33, 0.083), (35, 0.017), (39, 0.053), (50, 0.015)]
simIndex simValue paperId paperTitle
1 0.95518845 59 nips-2004-Efficient Kernel Discriminant Analysis via QR Decomposition
Author: Tao Xiong, Jieping Ye, Qi Li, Ravi Janardan, Vladimir Cherkassky
Abstract: Linear Discriminant Analysis (LDA) is a well-known method for feature extraction and dimension reduction. It has been used widely in many applications such as face recognition. Recently, a novel LDA algorithm based on QR Decomposition, namely LDA/QR, has been proposed, which is competitive in terms of classification accuracy with other LDA algorithms, but it has much lower costs in time and space. However, LDA/QR is based on linear projection, which may not be suitable for data with nonlinear structure. This paper first proposes an algorithm called KDA/QR, which extends the LDA/QR algorithm to deal with nonlinear data by using the kernel operator. Then an efficient approximation of KDA/QR called AKDA/QR is proposed. Experiments on face image data show that the classification accuracy of both KDA/QR and AKDA/QR are competitive with Generalized Discriminant Analysis (GDA), a general kernel discriminant analysis algorithm, while AKDA/QR has much lower time and space costs. 1
2 0.95350277 50 nips-2004-Dependent Gaussian Processes
Author: Phillip Boyle, Marcus Frean
Abstract: Gaussian processes are usually parameterised in terms of their covariance functions. However, this makes it difficult to deal with multiple outputs, because ensuring that the covariance matrix is positive definite is problematic. An alternative formulation is to treat Gaussian processes as white noise sources convolved with smoothing kernels, and to parameterise the kernel instead. Using this, we extend Gaussian processes to handle multiple, coupled outputs. 1
same-paper 3 0.95069504 197 nips-2004-Two-Dimensional Linear Discriminant Analysis
Author: Jieping Ye, Ravi Janardan, Qi Li
Abstract: Linear Discriminant Analysis (LDA) is a well-known scheme for feature extraction and dimension reduction. It has been used widely in many applications involving high-dimensional data, such as face recognition and image retrieval. An intrinsic limitation of classical LDA is the so-called singularity problem, that is, it fails when all scatter matrices are singular. A well-known approach to deal with the singularity problem is to apply an intermediate dimension reduction stage using Principal Component Analysis (PCA) before LDA. The algorithm, called PCA+LDA, is used widely in face recognition. However, PCA+LDA has high costs in time and space, due to the need for an eigen-decomposition involving the scatter matrices. In this paper, we propose a novel LDA algorithm, namely 2DLDA, which stands for 2-Dimensional Linear Discriminant Analysis. 2DLDA overcomes the singularity problem implicitly, while achieving efficiency. The key difference between 2DLDA and classical LDA lies in the model for data representation. Classical LDA works with vectorized representations of data, while the 2DLDA algorithm works with data in matrix representation. To further reduce the dimension by 2DLDA, the combination of 2DLDA and classical LDA, namely 2DLDA+LDA, is studied, where LDA is preceded by 2DLDA. The proposed algorithms are applied on face recognition and compared with PCA+LDA. Experiments show that 2DLDA and 2DLDA+LDA achieve competitive recognition accuracy, while being much more efficient. 1
4 0.94774318 13 nips-2004-A Three Tiered Approach for Articulated Object Action Modeling and Recognition
Author: Le Lu, Gregory D. Hager, Laurent Younes
Abstract: Visual action recognition is an important problem in computer vision. In this paper, we propose a new method to probabilistically model and recognize actions of articulated objects, such as hand or body gestures, in image sequences. Our method consists of three levels of representation. At the low level, we first extract a feature vector invariant to scale and in-plane rotation by using the Fourier transform of a circular spatial histogram. Then, spectral partitioning [20] is utilized to obtain an initial clustering; this clustering is then refined using a temporal smoothness constraint. Gaussian mixture model (GMM) based clustering and density estimation in the subspace of linear discriminant analysis (LDA) are then applied to thousands of image feature vectors to obtain an intermediate level representation. Finally, at the high level we build a temporal multiresolution histogram model for each action by aggregating the clustering weights of sampled images belonging to that action. We discuss how this high level representation can be extended to achieve temporal scaling invariance and to include Bi-gram or Multi-gram transition information. Both image clustering and action recognition/segmentation results are given to show the validity of our three tiered representation.
5 0.92737496 203 nips-2004-Validity Estimates for Loopy Belief Propagation on Binary Real-world Networks
Author: Joris M. Mooij, Hilbert J. Kappen
Abstract: We introduce a computationally efficient method to estimate the validity of the BP method as a function of graph topology, the connectivity strength, frustration and network size. We present numerical results that demonstrate the correctness of our estimates for the uniform random model and for a real-world network (“C. Elegans”). Although the method is restricted to pair-wise interactions, no local evidence (zero “biases”) and binary variables, we believe that its predictions correctly capture the limitations of BP for inference and MAP estimation on arbitrary graphical models. Using this approach, we find that BP always performs better than MF. Especially for large networks with broad degree distributions (such as scale-free networks) BP turns out to significantly outperform MF. 1
6 0.92101514 92 nips-2004-Kernel Methods for Implicit Surface Modeling
7 0.91061378 183 nips-2004-Temporal-Difference Networks
8 0.8619951 9 nips-2004-A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning
9 0.82991385 201 nips-2004-Using the Equivalent Kernel to Understand Gaussian Process Regression
10 0.81423891 148 nips-2004-Probabilistic Computation in Spiking Populations
11 0.80786753 79 nips-2004-Hierarchical Eigensolver for Transition Matrices in Spectral Methods
12 0.80028349 168 nips-2004-Semigroup Kernels on Finite Sets
13 0.78235805 178 nips-2004-Support Vector Classification with Input Data Uncertainty
14 0.78084677 188 nips-2004-The Laplacian PDF Distance: A Cost Function for Clustering in a Kernel Feature Space
15 0.7778039 110 nips-2004-Matrix Exponential Gradient Updates for On-line Learning and Bregman Projection
16 0.77351362 94 nips-2004-Kernels for Multi--task Learning
17 0.76681286 30 nips-2004-Binet-Cauchy Kernels
18 0.76485759 18 nips-2004-Algebraic Set Kernels with Application to Inference Over Local Image Representations
19 0.76254809 98 nips-2004-Learning Gaussian Process Kernels via Hierarchical Bayes
20 0.76031083 68 nips-2004-Face Detection --- Efficient and Rank Deficient