nips nips2008 nips2008-200 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Minh H. Nguyen, Fernando Torre
Abstract: Kernel Principal Component Analysis (KPCA) is a popular generalization of linear PCA that allows non-linear feature extraction. In KPCA, data in the input space is mapped to higher (usually) dimensional feature space where the data can be linearly modeled. The feature space is typically induced implicitly by a kernel function, and linear PCA in the feature space is performed via the kernel trick. However, due to the implicitness of the feature space, some extensions of PCA such as robust PCA cannot be directly generalized to KPCA. This paper presents a technique to overcome this problem, and extends it to a unified framework for treating noise, missing data, and outliers in KPCA. Our method is based on a novel cost function to perform inference in KPCA. Extensive experiments, in both synthetic and real data, show that our algorithm outperforms existing methods. 1
Reference: text
sentIndex sentText sentNum sentScore
1 The feature space is typically induced implicitly by a kernel function, and linear PCA in the feature space is performed via the kernel trick. [sent-4, score-0.354]
2 However, due to the implicitness of the feature space, some extensions of PCA such as robust PCA cannot be directly generalized to KPCA. [sent-5, score-0.219]
3 This paper presents a technique to overcome this problem, and extends it to a unified framework for treating noise, missing data, and outliers in KPCA. [sent-6, score-0.507]
4 Unfortunately, realistic visual data is often corrupted by undesirable artifacts due to occlusion (e. [sent-13, score-0.228]
5 Therefore, robustness to noise, missing data, and outliers is a desirable property for algorithms in computer vision. [sent-30, score-0.55]
6 Figure 1: Several types of data corruption and results of our method. Figure 2: Using the KPCA principal subspace (panel labels: Input Space, Feature Space). [sent-32, score-0.231]
7 (Figure 1 caption, continued) b) corruption by additive Gaussian noise, c) missing data, d) hand occlusion, e) specular reflection. [sent-34, score-0.335]
8 Throughout the years, several extensions of PCA have been proposed to address the problems of outliers and missing data, see [6] for a review. [sent-36, score-0.507]
9 However, it still remains unclear how to generalize those extensions to KPCA, since directly migrating robust PCA techniques to KPCA is not possible due to the implicitness of the feature space. [sent-37, score-0.219]
10 To overcome this problem, in this paper, we propose Robust KPCA (RKPCA), a unified framework for denoising images, recovering missing data, and handling intra-sample outliers. [sent-38, score-0.43]
11 Robust computation in RKPCA does not suffer from the implicitness of the feature space because of a novel cost function for reconstructing “clean” images from corrupted data. [sent-39, score-0.299]
12 The proposed cost function is composed of two terms, requiring the reconstructed image to be close to the KPCA principal subspace as well as to the input sample. [sent-40, score-0.471]
13 We show that robustness can be naturally achieved by using robust functions to measure the closeness between the reconstructed and the input data. [sent-41, score-0.282]
14 KPCA and pre-image: KPCA [19, 18, 20] is a non-linear extension of principal component analysis (PCA) using kernel methods. [sent-43, score-0.293]
15 The kernel represents an implicit mapping of the data to a (usually) higher dimensional space where linear PCA is performed. [sent-44, score-0.131]
16 The mapping function ϕ : X → H is implicitly induced by a kernel function k : X × X → ℜ that defines the similarity between data in the input space. [sent-46, score-0.128]
17 One can show that if k(·, ·) is a kernel then the function ϕ(·) and the feature space H exist; furthermore k(x, y) = ⟨ϕ(x), ϕ(y)⟩ [18]. [sent-47, score-0.177]
18 However, directly performing linear PCA in the feature space might not be feasible because the feature space typically has very high (possibly infinite) dimensionality. [sent-48, score-0.146]
19 Let k(·, ·) denote a kernel function, and K denote the kernel matrix (element ij of K is kij = k(di , dj )). [sent-54, score-0.278]
20 KPCA is computed in closed form by finding the first m eigenvectors (ai’s) corresponding to the largest eigenvalues (λi’s) of the kernel matrix K (i. [sent-55, score-0.129]
21 Assume x is a data point in the input space, and let Pϕ(x) denote the projection of ϕ(x) onto the principal subspace {vi}_{i=1}^m. [sent-64, score-0.255]
22 Because {vi}_{i=1}^m is a set of orthonormal vectors, we have Pϕ(x) = Σ_{i=1}^{m} ⟨ϕ(x), vi⟩ vi. [sent-65, score-0.132]
23 The reconstruction error (in feature space) is given by: Eproj(x) = ||ϕ(x) − Pϕ(x)||² = ⟨ϕ(x), ϕ(x)⟩ − Σ_{i=1}^{m} ⟨ϕ(x), vi⟩² = k(x, x) − r(x)^T M r(x),  (1)  where r(x) = Γ^T ϕ(x) and M = Σ_{i=1}^{m} ai ai^T. [sent-66, score-0.207]
24 The pre-image of the projection is the z ∈ X that satisfies ϕ(z) = Pϕ(x); z is also referred to as the KPCA reconstruction of x. [sent-67, score-0.201]
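To make the KPCA computation and the projection error of Eq. (1) concrete, the following is a minimal numpy sketch with an RBF kernel. It ignores feature-space centering for brevity, and the data, kernel width gamma, and number of components m are illustrative assumptions, not values from the paper.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    # Pairwise RBF kernel: k(a, b) = exp(-gamma * ||a - b||^2).
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

rng = np.random.default_rng(0)
D = rng.normal(size=(100, 5))              # training data, one sample per row
K = rbf_kernel(D, D)                       # kernel (Gram) matrix

# Eigendecomposition of K; keep the m leading eigenvectors.
lam, A = np.linalg.eigh(K)                 # eigh returns eigenvalues in ascending order
lam, A = lam[::-1][:5], A[:, ::-1][:, :5]  # top m = 5
A = A / np.sqrt(lam)                       # scale so each v_i = sum_j A[j, i] phi(d_j) has unit norm
M = A @ A.T                                # M = sum_i a_i a_i^T

def e_proj(x):
    # Reconstruction error in feature space, Eq. (1): k(x, x) - r(x)^T M r(x).
    r = rbf_kernel(D, x[None, :]).ravel()  # r(x) = [k(d_1, x), ..., k(d_n, x)]
    return 1.0 - r @ M @ r                 # k(x, x) = 1 for the RBF kernel

x = rng.normal(size=5)
print(e_proj(x))                           # small when x lies close to the principal subspace
```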
25 However, the pre-image of Pϕ(x) usually does not exist, so finding the KPCA reconstruction of x means finding z such that ϕ(z) is as close to Pϕ(x) as possible. [sent-68, score-0.158]
26 Schölkopf et al [17] and Mika et al [13] propose to approximate the reconstruction of x by arg min_z ||ϕ(z) − Pϕ(x)||². [sent-70, score-0.597]
27 Two other objective functions have been proposed by Kwok & Tsang [10] and Bakir et al [2]. [sent-71, score-0.179]
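As one concrete instance of approximating arg min_z ||ϕ(z) − Pϕ(x)||², the sketch below implements a simplified, uncentered fixed-point iteration of the kind proposed by Mika et al [13] for the RBF kernel. The data, kernel width, and number of components are assumptions for illustration, and the update can be numerically fragile when the weights nearly cancel.

```python
import numpy as np

gamma = 0.1
rng = np.random.default_rng(0)
D = rng.normal(size=(100, 5))                            # training data
K = np.exp(-gamma * ((D[:, None] - D[None]) ** 2).sum(-1))
lam, A = np.linalg.eigh(K)
A = A[:, ::-1][:, :5] / np.sqrt(lam[::-1][:5])           # top m = 5 (uncentered) KPCA eigenvectors
M = A @ A.T

def k_vec(z):
    # k(z, d_i) for every training point d_i.
    return np.exp(-gamma * ((D - z) ** 2).sum(-1))

def preimage(x, n_iter=50):
    # P phi(x) = sum_i c_i phi(d_i) with c = M r(x); Mika-style fixed point for the RBF kernel:
    #   z <- sum_i c_i k(z, d_i) d_i / sum_i c_i k(z, d_i)
    c = M @ k_vec(x)
    z = x.copy()                                         # initialize at the (possibly noisy) input
    for _ in range(n_iter):
        w = c * k_vec(z)
        if abs(w.sum()) < 1e-12:                         # fragile when the weights nearly cancel
            break
        z = (w[:, None] * D).sum(0) / w.sum()
    return z

x = rng.normal(size=5)
print(preimage(x))
```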
28 KPCA-based algorithms for dealing with noise, outliers and missing data: Over the years, several methods extending KPCA to deal with noise, outliers, or missing data have been proposed. [sent-73, score-0.839]
29 Mika et al [13], Kwok & Tsang [10], and Bakir et al [2] show how denoising can be achieved by using the pre-image. [sent-74, score-0.449]
30 Firstly, because the input image x is noisy, the similarity measurement between x and other data points di (i.e. [sent-76, score-0.172]
31 k(x, di), the kernel) might be adversely affected, biasing the KPCA reconstruction of x. [sent-78, score-0.237]
32 In (b), we seek z such that ϕ(z) is close to both ϕ(x) and the principal subspace. [sent-93, score-0.178]
33 Current KPCA reconstruction methods equally weigh all the features (i. [sent-94, score-0.166]
34 Some [7, 22, 1] only consider robustness of the principal subspace; they do not address robust fitting. [sent-98, score-0.293]
35 Lu et al [12] present an iterative approach to handle outliers in training data. [sent-99, score-0.462]
36 At each iteration, the KPCA model is built, and the data points that have the highest reconstruction errors are regarded as outliers and discarded from the training set. [sent-100, score-0.376]
37 However, this approach does not handle intra-sample outliers (outliers that occur at a pixel level [6]). [sent-101, score-0.284]
38 Several other approaches have also been considered: Berar et al [3] propose to use KPCA with polynomial kernels to handle missing data. [sent-102, score-0.543]
39 Sanguinetti & Lawrence [16] propose an elegant framework to handle missing data. [sent-105, score-0.338]
40 This paper presents a novel cost function that unifies the treatment of noise, missing data and outliers in KPCA. [sent-108, score-0.542]
41 KPCA reconstruction revisited: Given an image x ∈ X , Fig. [sent-111, score-0.212]
42 Mathematically, the task is to find a point z ∈ X such that ϕ(z) is in the principal subspace (denoted PS) and ϕ(z) is as close to ϕ(x) as possible. [sent-113, score-0.254]
43 In other words, finding the KPCA reconstruction of x amounts to optimizing: arg min_z ||ϕ(z) − ϕ(x)||² s.t. ϕ(z) ∈ PS. [sent-114, score-0.158]
44 There is a common relaxation approach used by existing methods for computing the KPCA reconstruction of x. [sent-118, score-0.159]
45 This approach conceptually involves two steps: (i) finding Pϕ(x), which is the closest point to ϕ(x) among all the points in the principal subspace, and (ii) finding z such that ϕ(z) is as close to Pϕ(x) as possible. [sent-119, score-0.178]
46 If the L2 norm is used to measure the closeness between ϕ(z) and Pϕ(x), the resulting KPCA reconstruction is arg min_z ||ϕ(z) − Pϕ(x)||². [sent-122, score-0.226]
47 This approach for KPCA reconstruction is not robust. [sent-123, score-0.135]
48 For example, if x is corrupted with intra-sample outliers (e. [sent-124, score-0.32]
49 The KPCA reconstruction of x is taken as: arg min_z ||ϕ(x) − ϕ(z)||² + C ||ϕ(z) − Pϕ(z)||², where the second term is Eproj(z).  (3) [sent-130, score-0.158]
50 Algorithm 1 (RKPCA for missing attribute values in training data). Input: training data D, number of iterations m, number of partitions k. [sent-131, score-0.158]
51 Initialize: missing values by the means of known values. [sent-132, score-0.296]
52 Partition D into D1, ..., Dk; for i = 1 to k do: Train RKPCA using data D \ Di; Run RKPCA fitting for Di with known missing attributes. [sent-136, score-0.296]
53 end for; Update missing values of D; end for. Intuitively, the above cost function requires the KPCA reconstruction of x to be a point z such that ϕ(z) is close to both ϕ(x) and the principal subspace. [sent-137, score-0.644]
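A schematic, runnable version of Algorithm 1 is sketched below. Because a full RKPCA implementation is out of scope for a short example, a plain linear-PCA reconstruction stands in for the "Train RKPCA" and "Run RKPCA fitting" steps; the loop structure (mean initialization, k partitions, m outer iterations, updating only the missing entries) follows the algorithm as described. All function names, data, and parameter values are assumptions for illustration.

```python
import numpy as np

def impute_missing(D, mask, m=5, k=5, n_comp=3, seed=0):
    """Schematic Algorithm 1: fill missing entries of D (mask == True marks missing values).
    Linear-PCA reconstruction stands in for the RKPCA train/fit steps."""
    D = D.copy()
    # Initialize: missing values by the means of the known values of each attribute.
    col_means = np.nanmean(np.where(mask, np.nan, D), axis=0)
    D[mask] = np.take(col_means, np.where(mask)[1])

    n = D.shape[0]
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    for _ in range(m):                               # outer iterations
        for idx in folds:                            # partition D into D_1, ..., D_k
            train = np.delete(np.arange(n), idx)
            mu = D[train].mean(axis=0)
            # "Train" on D \ D_i: principal subspace of the held-in data.
            _, _, Vt = np.linalg.svd(D[train] - mu, full_matrices=False)
            V = Vt[:n_comp].T
            # "Fit" D_i: reconstruct from the subspace and update only the missing entries.
            recon = mu + (D[idx] - mu) @ V @ V.T
            D[idx] = np.where(mask[idx], recon, D[idx])
    return D

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 8))
mask = rng.random(X.shape) < 0.2                     # ~20% missing, positions assumed known
print(impute_missing(X, mask)[:2])
```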
54 In short, the KPCA reconstruction of x can be taken as: arg min_z E0(x, z) + C Eproj(z).  (4) [sent-146, score-0.158]
55 By choosing appropriate forms for E0, one can use KPCA to handle noise, missing data, and intra-sample outliers. [sent-147, score-0.377]
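The sketch below illustrates Eq. (4) for a single input x, with Eproj(z) from Eq. (1) and one possible robust choice for E0 (a per-feature Geman-McClure penalty measured in the input space). The choice of E0, the hyper-parameters, and the use of a generic L-BFGS optimizer are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np
from scipy.optimize import minimize

gamma, C, sigma2 = 0.1, 1.0, 0.5                 # illustrative hyper-parameters

rng = np.random.default_rng(0)
D = rng.normal(size=(80, 5))                     # training data
K = np.exp(-gamma * ((D[:, None] - D[None]) ** 2).sum(-1))
lam, A = np.linalg.eigh(K)
A = A[:, ::-1][:, :4] / np.sqrt(lam[::-1][:4])   # top m = 4 (uncentered) KPCA eigenvectors
M = A @ A.T

def e_proj(z):
    # Eq. (1): k(z, z) - r(z)^T M r(z), with k(z, z) = 1 for the RBF kernel.
    r = np.exp(-gamma * ((D - z) ** 2).sum(-1))
    return 1.0 - r @ M @ r

def e0(x, z):
    # One possible robust E0: a Geman-McClure penalty on each input-space feature.
    e2 = (x - z) ** 2
    return (e2 / (e2 + sigma2)).sum()

def rkpca_reconstruct(x):
    # Eq. (4): arg min_z E0(x, z) + C * Eproj(z), solved here with a generic optimizer.
    return minimize(lambda z: e0(x, z) + C * e_proj(z), x0=x.copy(), method="L-BFGS-B").x

x_noisy = D[0].copy()
x_noisy[:2] += 5.0                               # two grossly corrupted entries (intra-sample outliers)
print(rkpca_reconstruct(x_noisy))
```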
56 Dealing with missing data in testing samples: Assume the KPCA has been learned from complete and noise-free data. [sent-150, score-0.381]
57 Dealing with intra-sample outliers in testing samples: To handle intra-sample outliers, we could use a robust function for E0. [sent-153, score-0.397]
58 Dealing with missing data and intra-sample outliers in training data: Previous sections have shown how to deal with outliers and missing data in the testing set (assuming KPCA has been learned from a clean training set). [sent-157, score-1.178]
59 If we have missing data in the training samples [6], a simple approach is to iteratively alternate between estimating the missing values and updating the KPCA principal subspace until convergence. [sent-158, score-0.853]
60 An algorithm for handling intra-sample outliers in training data could be constructed similarly. [sent-160, score-0.284]
61 Alternatively, a kernel matrix could be computed ignoring the missing values, that is, each kij = exp(−γ2 ||Wi Wj (xi − xj)||²), where γ2 = 1/trace(Wi Wj). [sent-161, score-0.423]
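A small sketch of this missing-value-aware kernel is given below, with Wi a diagonal 0/1 matrix marking the known attributes of xi. The normalization γ2 = 1/trace(Wi Wj) follows the reconstruction of the garbled formula above and should be treated as an assumption.

```python
import numpy as np

def missing_aware_kernel(X, W):
    """K[i, j] = exp(-gamma2 * ||W_i W_j (x_i - x_j)||^2), with gamma2 = 1 / trace(W_i W_j).
    X: data matrix (missing positions may hold arbitrary values).
    W: 0/1 mask of the same shape, 1 where the attribute is known."""
    n = X.shape[0]
    K = np.ones((n, n))
    for i in range(n):
        for j in range(n):
            w = W[i] * W[j]                      # diagonal of W_i W_j
            t = w.sum()                          # trace(W_i W_j): attributes known in both points
            if t > 0:
                diff = w * (X[i] - X[j])
                K[i, j] = np.exp(-(diff @ diff) / t)
            # if t == 0 there is no shared information; K[i, j] is left at 1 in this sketch
    return K

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
W = (rng.random(X.shape) > 0.25).astype(float)   # roughly 25% of the attributes missing
print(missing_aware_kernel(X, W))
```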
62 Note that optimizing this function is not harder than optimizing the objective function used by Mika et al [13]. [sent-171, score-0.179]
63 In the case of missing data (some entries in the diagonal of W, and therefore of W², will be zero), missing components of x would not affect the computation of u and z. [sent-187, score-0.632]
64 Entries corresponding to the missing components of the resulting z will be pixel-weighted combinations of the training data. [sent-188, score-0.326]
65 Similar to the observation of Mika et al [13], the second term of vector u pulls z towards a single Gaussian cluster. [sent-190, score-0.179]
66 The attraction force generated by a training data point di reflects the correlation between ϕ(z) and ϕ(di ), the correlation between ϕ(z) and eigenvectors vj ’s, and the contributions of ϕ(di ) to the eigenvectors. [sent-191, score-0.157]
67 RKPCA for intra-sample outliers: In this section, we compare RKPCA with three approaches for handling intra-sample outliers, namely (i) Robust Linear PCA [6], (ii) Mika et al’s KPCA reconstruction [13], and (iii) Kwok & Tsang’s KPCA reconstruction [10]. [sent-200, score-0.703]
68 We only make use of the directly-illuminated frontal face images under five different expressions (smile, disgust, squint, surprise and scream), see Fig. [sent-203, score-0.149]
69 A shape-normalized face is generated for every face by warping it towards the mean shape using an affine transformation. [sent-209, score-0.212]
70 The mean shape is used as the face mask and the values inside the mask are vectorized. [sent-212, score-0.158]
71 For each occlusion size and test image pair, we generate a synthetic occlusion (see sentence 75 below). [Figure 4: a) 68 landmarks, b) a shape-normalized face, c) synthetic occlusion. Figure 5 panels: Energy 80%, Energy 95%.] [sent-214, score-0.274]
72 [Figure 5 table residue: the compared methods are Base Line, Mika et al, Kwok & Tsang, Robust PCA, and Ours.] [sent-234, score-0.179]
73 The statistics are available for three types of face regions (whole face, occluded region, and non-occluded region), different occlusion sizes, and different energy settings. [sent-416, score-0.357]
74 Our method consistently outperforms other methods for different occlusion sizes and energy levels. [sent-417, score-0.307]
75 Specifically, we generate a square occlusion window of that size, drawing the pixel values randomly from 0 to 255. [sent-418, score-0.189]
76 A synthetic testing image is then created by pasting the occlusion window at a random position. [sent-419, score-0.323]
77 Fig. 4c displays such an image with an occlusion size of 20. [sent-421, score-0.235]
78 For every synthetic testing image and each of the four algorithms, we compute the mean (at pixel level) of the absolute differences between the reconstructed image and the original test image without occlusion. [sent-422, score-0.431]
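This evaluation metric is simply a per-image mean absolute pixel difference; a minimal version is sketched below with synthetic images as placeholders.

```python
import numpy as np

def mean_abs_error(reconstructed, ground_truth):
    # Mean over pixels of the absolute difference between the two images.
    return np.mean(np.abs(reconstructed.astype(float) - ground_truth.astype(float)))

rng = np.random.default_rng(0)
img_true = rng.integers(0, 256, size=(64, 64))                          # stand-in ground truth
img_rec = np.clip(img_true + rng.normal(0, 5, size=(64, 64)), 0, 255)   # stand-in reconstruction
print(mean_abs_error(img_rec, img_true))
```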
79 Base Line is the method that does nothing; the reconstructed images are exactly the same as the input testing images. [sent-430, score-0.22]
80 As shown in Fig. 5, our method consistently outperforms the others for all energy levels and occlusion sizes (using the whole-face statistics). [sent-432, score-0.307]
81 RKPCA for incomplete training data: To compare the ability of our algorithm to handle missing attributes in training data against other methods, we perform experiments on the well-known Oil Flow dataset [4]. [sent-443, score-0.436]
82 We test our algorithm with different amounts of missing data (from 5% to 50%) and repeat each experiment 50 times. [sent-446, score-0.296]
83 We run Algorithm 1 to recover the values of the missing attributes, with m = 25, k = 10, γ = 0. [sent-448, score-0.296]
84 Table 1: Reconstruction errors for 5 different methods and 10 probabilities of missing values for the Oil Flow dataset. [sent-453, score-0.296]
85 Our method outperforms other methods for all levels of missing data. [sent-454, score-0.355]
86 The mean method is a widely used heuristic where the missing value of an attribute of a data point is filled by the mean of known values of the same attribute of other data points. [sent-473, score-0.401]
87 The 1-NN method is another widely used heuristic in which the missing values are replaced by the values of the nearest point, where the pairwise distance is calculated using only the attributes with known values. [sent-474, score-0.357]
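The two baselines described above can be written compactly as follows; the data and missingness mask are synthetic placeholders, and entries missing in both a point and its nearest neighbour are left untouched in this sketch.

```python
import numpy as np

def mean_impute(X, mask):
    # Replace each missing value with the mean of the known values of that attribute.
    Xf = X.copy()
    col_means = np.nanmean(np.where(mask, np.nan, X), axis=0)
    Xf[mask] = np.take(col_means, np.where(mask)[1])
    return Xf

def one_nn_impute(X, mask):
    # Replace missing values with those of the nearest point, where the distance is
    # computed only over attributes known in both points.
    Xf = X.copy()
    for i in np.where(mask.any(axis=1))[0]:
        best, best_d = None, np.inf
        for j in range(len(X)):
            if j == i:
                continue
            common = (~mask[i]) & (~mask[j])
            if not common.any():
                continue
            d = np.mean((X[i, common] - X[j, common]) ** 2)
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            fill = mask[i] & (~mask[best])       # copy only values the neighbour actually has
            Xf[i, fill] = X[best, fill]
    return Xf

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 6))
mask = rng.random(X.shape) < 0.1
print(mean_impute(X, mask)[0], one_nn_impute(X, mask)[0])
```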
88 As shown in Table 1, our method outperforms the other methods for all levels of missing data. [sent-477, score-0.355]
89 RKPCA for denoising: This section describes denoising experiments on the Multi-PIE database with additive Gaussian noise. [sent-479, score-0.182]
90 For a fair evaluation, we only compare our algorithm with Mika et al ’s, Kwok & Tsang’s and Linear PCA. [sent-480, score-0.179]
91 Figure 6: a) original image, b) corrupted by Gaussian noise, c) denoised using PCA, d) using Mika et al, e) using the Kwok & Tsang method, f) result of our method. [sent-484, score-0.158]
92 The set of images used in these experiments is exactly the same as those in the occlusion experiments described in Sec. [sent-485, score-0.201]
93 An example of a pair of clean and corrupted images is shown in Fig. [sent-490, score-0.168]
94 For every synthetic testing image, we compute the mean (at pixel level) of the absolute difference between the denoised image and the ground-truth. [sent-492, score-0.24]
95 Table 2: Results of image denoising on the Multi-PIE database. [sent-500, score-0.168]
96 In fact, the quantitative results show that our method is marginally better than Mika et al ’s method and substantially better than the other two. [sent-537, score-0.225]
97 As shown in Fig. 6c-f, the reconstructed image of our method preserves many more fine details than the others. [sent-539, score-0.235]
98 Conclusion: In this paper, we have proposed Robust Kernel PCA, a unified framework for handling noise, occlusion and missing data. [sent-540, score-0.497]
99 The cost function requires the reconstructed data point to be close to the original data point as well as to the principal subspace. [sent-542, score-0.294]
100 Therefore, the implicitness of the feature space is avoided and optimization is possible. [sent-545, score-0.151]
wordName wordTfidf (topN-words)
[('kpca', 0.613), ('missing', 0.296), ('pca', 0.22), ('outliers', 0.211), ('mika', 0.204), ('rkpca', 0.175), ('tsang', 0.175), ('occlusion', 0.158), ('principal', 0.155), ('kwok', 0.136), ('reconstruction', 0.135), ('al', 0.135), ('face', 0.106), ('kernel', 0.104), ('robust', 0.095), ('denoising', 0.091), ('reconstructed', 0.081), ('implicitness', 0.078), ('image', 0.077), ('subspace', 0.076), ('di', 0.071), ('corrupted', 0.07), ('energy', 0.067), ('vi', 0.066), ('sanguinetti', 0.062), ('clean', 0.055), ('lkopf', 0.052), ('na', 0.051), ('lawrence', 0.051), ('testing', 0.049), ('dj', 0.047), ('flow', 0.047), ('oil', 0.047), ('bakir', 0.047), ('sch', 0.047), ('feature', 0.046), ('et', 0.044), ('eds', 0.044), ('denoised', 0.044), ('robustness', 0.043), ('handling', 0.043), ('images', 0.043), ('whole', 0.043), ('handle', 0.042), ('attribute', 0.041), ('diagonal', 0.04), ('closeness', 0.039), ('berar', 0.039), ('eproj', 0.039), ('intrasample', 0.039), ('pkpca', 0.039), ('specular', 0.039), ('synthetic', 0.039), ('attributes', 0.038), ('ps', 0.038), ('noise', 0.036), ('outperforms', 0.036), ('dealing', 0.036), ('smola', 0.035), ('cost', 0.035), ('component', 0.034), ('torre', 0.034), ('uncorrupted', 0.034), ('adversely', 0.031), ('ppca', 0.031), ('attraction', 0.031), ('synthetically', 0.031), ('weigh', 0.031), ('pixel', 0.031), ('uni', 0.03), ('training', 0.03), ('minz', 0.029), ('illumination', 0.029), ('landmarks', 0.029), ('nding', 0.028), ('region', 0.028), ('space', 0.027), ('ller', 0.026), ('lu', 0.026), ('occluded', 0.026), ('mask', 0.026), ('cmu', 0.026), ('kernels', 0.026), ('base', 0.026), ('ai', 0.026), ('letters', 0.025), ('bishop', 0.025), ('eigenvectors', 0.025), ('relaxation', 0.024), ('dn', 0.024), ('input', 0.024), ('zi', 0.024), ('column', 0.023), ('arg', 0.023), ('tsch', 0.023), ('kij', 0.023), ('sizes', 0.023), ('close', 0.023), ('method', 0.023), ('exp', 0.022)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999958 200 nips-2008-Robust Kernel Principal Component Analysis
Author: Minh H. Nguyen, Fernando Torre
Abstract: Kernel Principal Component Analysis (KPCA) is a popular generalization of linear PCA that allows non-linear feature extraction. In KPCA, data in the input space is mapped to higher (usually) dimensional feature space where the data can be linearly modeled. The feature space is typically induced implicitly by a kernel function, and linear PCA in the feature space is performed via the kernel trick. However, due to the implicitness of the feature space, some extensions of PCA such as robust PCA cannot be directly generalized to KPCA. This paper presents a technique to overcome this problem, and extends it to a unified framework for treating noise, missing data, and outliers in KPCA. Our method is based on a novel cost function to perform inference in KPCA. Extensive experiments, in both synthetic and real data, show that our algorithm outperforms existing methods. 1
2 0.26082489 238 nips-2008-Theory of matching pursuit
Author: Zakria Hussain, John S. Shawe-taylor
Abstract: We analyse matching pursuit for kernel principal components analysis (KPCA) by proving that the sparse subspace it produces is a sample compression scheme. We show that this bound is tighter than the KPCA bound of Shawe-Taylor et al [7] and highly predictive of the size of the subspace needed to capture most of the variance in the data. We analyse a second matching pursuit algorithm called kernel matching pursuit (KMP) which does not correspond to a sample compression scheme. However, we give a novel bound that views the choice of subspace of the KMP algorithm as a compression scheme and hence provide a VC bound to upper bound its future loss. Finally we describe how the same bound can be applied to other matching pursuit related algorithms. 1
3 0.10233375 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization
Author: Yuhong Guo
Abstract: Recently, supervised dimensionality reduction has been gaining attention, owing to the realization that data labels are often available and indicate important underlying structure in the data. In this paper, we present a novel convex supervised dimensionality reduction approach based on exponential family PCA, which is able to avoid the local optima of typical EM learning. Moreover, by introducing a sample-based approximation to exponential family models, it overcomes the limitation of the prevailing Gaussian assumptions of standard PCA, and produces a kernelized formulation for nonlinear supervised dimensionality reduction. A training algorithm is then devised based on a subgradient bundle method, whose scalability can be gained using a coordinate descent procedure. The advantage of our global optimization approach is demonstrated by empirical results over both synthetic and real data. 1
4 0.096723802 31 nips-2008-Bayesian Exponential Family PCA
Author: Shakir Mohamed, Zoubin Ghahramani, Katherine A. Heller
Abstract: Principal Components Analysis (PCA) has become established as one of the key tools for dimensionality reduction when dealing with real valued data. Approaches such as exponential family PCA and non-negative matrix factorisation have successfully extended PCA to non-Gaussian data types, but these techniques fail to take advantage of Bayesian inference and can suffer from problems of overfitting and poor generalisation. This paper presents a fully probabilistic approach to PCA, which is generalised to the exponential family, based on Hybrid Monte Carlo sampling. We describe the model which is based on a factorisation of the observed data matrix, and show performance of the model on both synthetic and real data. 1
5 0.082657896 148 nips-2008-Natural Image Denoising with Convolutional Networks
Author: Viren Jain, Sebastian Seung
Abstract: We present an approach to low-level vision that combines two main ideas: the use of convolutional networks as an image processing architecture and an unsupervised learning procedure that synthesizes training samples from specific noise models. We demonstrate this approach on the challenging problem of natural image denoising. Using a test set with a hundred natural images, we find that convolutional networks provide comparable and in some cases superior performance to state of the art wavelet and Markov random field (MRF) methods. Moreover, we find that a convolutional network offers similar performance in the blind denoising setting as compared to other techniques in the non-blind setting. We also show how convolutional networks are mathematically related to MRF approaches by presenting a mean field theory for an MRF specially designed for image denoising. Although these approaches are related, convolutional networks avoid computational difficulties in MRF approaches that arise from probabilistic learning and inference. This makes it possible to learn image processing architectures that have a high degree of representational power (we train models with over 15,000 parameters), but whose computational expense is significantly less than that associated with inference in MRF approaches with even hundreds of parameters. 1 Background Low-level image processing tasks include edge detection, interpolation, and deconvolution. These tasks are useful both in themselves, and as a front-end for high-level visual tasks like object recognition. This paper focuses on the task of denoising, defined as the recovery of an underlying image from an observation that has been subjected to Gaussian noise. One approach to image denoising is to transform an image from pixel intensities into another representation where statistical regularities are more easily captured. For example, the Gaussian scale mixture (GSM) model introduced by Portilla and colleagues is based on a multiscale wavelet decomposition that provides an effective description of local image statistics [1, 2]. Another approach is to try and capture statistical regularities of pixel intensities directly using Markov random fields (MRFs) to define a prior over the image space. Initial work used handdesigned settings of the parameters, but recently there has been increasing success in learning the parameters of such models from databases of natural images [3, 4, 5, 6, 7, 8]. Prior models can be used for tasks such as image denoising by augmenting the prior with a noise model. Alternatively, an MRF can be used to model the probability distribution of the clean image conditioned on the noisy image. This conditional random field (CRF) approach is said to be discriminative, in contrast to the generative MRF approach. Several researchers have shown that the CRF approach can outperform generative learning on various image restoration and labeling tasks [9, 10]. CRFs have recently been applied to the problem of image denoising as well [5]. 1 The present work is most closely related to the CRF approach. Indeed, certain special cases of convolutional networks can be seen as performing maximum likelihood inference on a CRF [11]. The advantage of the convolutional network approach is that it avoids a general difficulty with applying MRF-based methods to image analysis: the computational expense associated with both parameter estimation and inference in probabilistic models. 
For example, naive methods of learning MRFbased models involve calculation of the partition function, a normalization factor that is generally intractable for realistic models and image dimensions. As a result, a great deal of research has been devoted to approximate MRF learning and inference techniques that meliorate computational difficulties, generally at the cost of either representational power or theoretical guarantees [12, 13]. Convolutional networks largely avoid these difficulties by posing the computational task within the statistical framework of regression rather than density estimation. Regression is a more tractable computation and therefore permits models with greater representational power than methods based on density estimation. This claim will be argued for with empirical results on the denoising problem, as well as mathematical connections between MRF and convolutional network approaches. 2 Convolutional Networks Convolutional networks have been extensively applied to visual object recognition using architectures that accept an image as input and, through alternating layers of convolution and subsampling, produce one or more output values that are thresholded to yield binary predictions regarding object identity [14, 15]. In contrast, we study networks that accept an image as input and produce an entire image as output. Previous work has used such architectures to produce images with binary targets in image restoration problems for specialized microscopy data [11, 16]. Here we show that similar architectures can also be used to produce images with the analog fluctuations found in the intensity distributions of natural images. Network Dynamics and Architecture A convolutional network is an alternating sequence of linear filtering and nonlinear transformation operations. The input and output layers include one or more images, while intermediate layers contain “hidden
6 0.080292746 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
7 0.079006791 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
8 0.074827418 216 nips-2008-Sparse probabilistic projections
9 0.072922796 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning
10 0.070820034 202 nips-2008-Robust Regression and Lasso
11 0.070197314 61 nips-2008-Diffeomorphic Dimensionality Reduction
12 0.063439988 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
13 0.062818855 42 nips-2008-Cascaded Classification Models: Combining Models for Holistic Scene Understanding
14 0.059703127 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization
15 0.059070136 78 nips-2008-Exact Convex Confidence-Weighted Learning
16 0.058662184 113 nips-2008-Kernelized Sorting
17 0.058528014 57 nips-2008-Deflation Methods for Sparse PCA
18 0.057891618 27 nips-2008-Artificial Olfactory Brain for Mixture Identification
19 0.057825256 130 nips-2008-MCBoost: Multiple Classifier Boosting for Perceptual Co-clustering of Images and Visual Features
20 0.054682296 191 nips-2008-Recursive Segmentation and Recognition Templates for 2D Parsing
topicId topicWeight
[(0, -0.178), (1, -0.083), (2, 0.016), (3, 0.023), (4, 0.039), (5, 0.003), (6, -0.052), (7, -0.045), (8, 0.007), (9, -0.008), (10, 0.142), (11, -0.088), (12, 0.044), (13, 0.08), (14, 0.063), (15, -0.075), (16, 0.099), (17, -0.03), (18, 0.039), (19, -0.111), (20, 0.071), (21, -0.085), (22, 0.041), (23, -0.054), (24, -0.114), (25, -0.12), (26, -0.105), (27, -0.034), (28, -0.195), (29, 0.198), (30, 0.074), (31, -0.14), (32, -0.015), (33, 0.081), (34, 0.061), (35, 0.07), (36, 0.053), (37, -0.074), (38, 0.008), (39, -0.001), (40, 0.041), (41, 0.008), (42, 0.114), (43, -0.017), (44, 0.012), (45, -0.182), (46, 0.011), (47, -0.008), (48, -0.028), (49, -0.013)]
simIndex simValue paperId paperTitle
same-paper 1 0.93000174 200 nips-2008-Robust Kernel Principal Component Analysis
Author: Minh H. Nguyen, Fernando Torre
Abstract: Kernel Principal Component Analysis (KPCA) is a popular generalization of linear PCA that allows non-linear feature extraction. In KPCA, data in the input space is mapped to higher (usually) dimensional feature space where the data can be linearly modeled. The feature space is typically induced implicitly by a kernel function, and linear PCA in the feature space is performed via the kernel trick. However, due to the implicitness of the feature space, some extensions of PCA such as robust PCA cannot be directly generalized to KPCA. This paper presents a technique to overcome this problem, and extends it to a unified framework for treating noise, missing data, and outliers in KPCA. Our method is based on a novel cost function to perform inference in KPCA. Extensive experiments, in both synthetic and real data, show that our algorithm outperforms existing methods. 1
2 0.76775312 238 nips-2008-Theory of matching pursuit
Author: Zakria Hussain, John S. Shawe-taylor
Abstract: We analyse matching pursuit for kernel principal components analysis (KPCA) by proving that the sparse subspace it produces is a sample compression scheme. We show that this bound is tighter than the KPCA bound of Shawe-Taylor et al [7] and highly predictive of the size of the subspace needed to capture most of the variance in the data. We analyse a second matching pursuit algorithm called kernel matching pursuit (KMP) which does not correspond to a sample compression scheme. However, we give a novel bound that views the choice of subspace of the KMP algorithm as a compression scheme and hence provide a VC bound to upper bound its future loss. Finally we describe how the same bound can be applied to other matching pursuit related algorithms. 1
3 0.67378914 31 nips-2008-Bayesian Exponential Family PCA
Author: Shakir Mohamed, Zoubin Ghahramani, Katherine A. Heller
Abstract: Principal Components Analysis (PCA) has become established as one of the key tools for dimensionality reduction when dealing with real valued data. Approaches such as exponential family PCA and non-negative matrix factorisation have successfully extended PCA to non-Gaussian data types, but these techniques fail to take advantage of Bayesian inference and can suffer from problems of overfitting and poor generalisation. This paper presents a fully probabilistic approach to PCA, which is generalised to the exponential family, based on Hybrid Monte Carlo sampling. We describe the model which is based on a factorisation of the observed data matrix, and show performance of the model on both synthetic and real data. 1
4 0.65288067 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization
Author: Yuhong Guo
Abstract: Recently, supervised dimensionality reduction has been gaining attention, owing to the realization that data labels are often available and indicate important underlying structure in the data. In this paper, we present a novel convex supervised dimensionality reduction approach based on exponential family PCA, which is able to avoid the local optima of typical EM learning. Moreover, by introducing a sample-based approximation to exponential family models, it overcomes the limitation of the prevailing Gaussian assumptions of standard PCA, and produces a kernelized formulation for nonlinear supervised dimensionality reduction. A training algorithm is then devised based on a subgradient bundle method, whose scalability can be gained using a coordinate descent procedure. The advantage of our global optimization approach is demonstrated by empirical results over both synthetic and real data. 1
5 0.54862082 61 nips-2008-Diffeomorphic Dimensionality Reduction
Author: Christian Walder, Bernhard Schölkopf
Abstract: This paper introduces a new approach to constructing meaningful lower dimensional representations of sets of data points. We argue that constraining the mapping between the high and low dimensional spaces to be a diffeomorphism is a natural way of ensuring that pairwise distances are approximately preserved. Accordingly we develop an algorithm which diffeomorphically maps the data near to a lower dimensional subspace and then projects onto that subspace. The problem of solving for the mapping is transformed into one of solving for an Eulerian flow field which we compute using ideas from kernel methods. We demonstrate the efficacy of our approach on various real world data sets. 1
6 0.45544758 216 nips-2008-Sparse probabilistic projections
7 0.41954231 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
8 0.41565043 68 nips-2008-Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection
9 0.38805717 57 nips-2008-Deflation Methods for Sparse PCA
10 0.35844824 62 nips-2008-Differentiable Sparse Coding
11 0.35407734 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition
12 0.35121584 148 nips-2008-Natural Image Denoising with Convolutional Networks
13 0.3471891 113 nips-2008-Kernelized Sorting
14 0.33607265 218 nips-2008-Spectral Clustering with Perturbed Data
15 0.33600453 83 nips-2008-Fast High-dimensional Kernel Summations Using the Monte Carlo Multipole Method
16 0.32970244 188 nips-2008-QUIC-SVD: Fast SVD Using Cosine Trees
17 0.32475868 226 nips-2008-Supervised Dictionary Learning
18 0.31893274 27 nips-2008-Artificial Olfactory Brain for Mixture Identification
19 0.31027952 126 nips-2008-Localized Sliced Inverse Regression
20 0.30337104 3 nips-2008-A Massively Parallel Digital Learning Processor
topicId topicWeight
[(6, 0.065), (7, 0.087), (12, 0.043), (28, 0.129), (57, 0.137), (59, 0.012), (63, 0.032), (77, 0.09), (78, 0.014), (83, 0.043), (99, 0.24)]
simIndex simValue paperId paperTitle
same-paper 1 0.78862 200 nips-2008-Robust Kernel Principal Component Analysis
Author: Minh H. Nguyen, Fernando Torre
Abstract: Kernel Principal Component Analysis (KPCA) is a popular generalization of linear PCA that allows non-linear feature extraction. In KPCA, data in the input space is mapped to higher (usually) dimensional feature space where the data can be linearly modeled. The feature space is typically induced implicitly by a kernel function, and linear PCA in the feature space is performed via the kernel trick. However, due to the implicitness of the feature space, some extensions of PCA such as robust PCA cannot be directly generalized to KPCA. This paper presents a technique to overcome this problem, and extends it to a unified framework for treating noise, missing data, and outliers in KPCA. Our method is based on a novel cost function to perform inference in KPCA. Extensive experiments, in both synthetic and real data, show that our algorithm outperforms existing methods. 1
2 0.67095625 216 nips-2008-Sparse probabilistic projections
Author: Cédric Archambeau, Francis R. Bach
Abstract: We present a generative model for performing sparse probabilistic projections, which includes sparse principal component analysis and sparse canonical correlation analysis as special cases. Sparsity is enforced by means of automatic relevance determination or by imposing appropriate prior distributions, such as generalised hyperbolic distributions. We derive a variational Expectation-Maximisation algorithm for the estimation of the hyperparameters and show that our novel probabilistic approach compares favourably to existing techniques. We illustrate how the proposed method can be applied in the context of cryptoanalysis as a preprocessing tool for the construction of template attacks. 1
3 0.66857266 27 nips-2008-Artificial Olfactory Brain for Mixture Identification
Author: Mehmet K. Muezzinoglu, Alexander Vergara, Ramon Huerta, Thomas Nowotny, Nikolai Rulkov, Henry Abarbanel, Allen Selverston, Mikhail Rabinovich
Abstract: The odor transduction process has a large time constant and is susceptible to various types of noise. Therefore, the olfactory code at the sensor/receptor level is in general a slow and highly variable indicator of the input odor in both natural and artificial situations. Insects overcome this problem by using a neuronal device in their Antennal Lobe (AL), which transforms the identity code of olfactory receptors to a spatio-temporal code. This transformation improves the decision of the Mushroom Bodies (MBs), the subsequent classifier, in both speed and accuracy. Here we propose a rate model based on two intrinsic mechanisms in the insect AL, namely integration and inhibition. Then we present a MB classifier model that resembles the sparse and random structure of insect MB. A local Hebbian learning procedure governs the plasticity in the model. These formulations not only help to understand the signal conditioning and classification methods of insect olfactory systems, but also can be leveraged in synthetic problems. Among them, we consider here the discrimination of odor mixtures from pure odors. We show on a set of records from metal-oxide gas sensors that the cascade of these two new models facilitates fast and accurate discrimination of even highly imbalanced mixtures from pure odors. 1
4 0.66185021 208 nips-2008-Shared Segmentation of Natural Scenes Using Dependent Pitman-Yor Processes
Author: Erik B. Sudderth, Michael I. Jordan
Abstract: We develop a statistical framework for the simultaneous, unsupervised segmentation and discovery of visual object categories from image databases. Examining a large set of manually segmented scenes, we show that object frequencies and segment sizes both follow power law distributions, which are well modeled by the Pitman–Yor (PY) process. This nonparametric prior distribution leads to learning algorithms which discover an unknown set of objects, and segmentation methods which automatically adapt their resolution to each image. Generalizing previous applications of PY processes, we use Gaussian processes to discover spatially contiguous segments which respect image boundaries. Using a novel family of variational approximations, our approach produces segmentations which compare favorably to state-of-the-art methods, while simultaneously discovering categories shared among natural scenes. 1
5 0.65952945 100 nips-2008-How memory biases affect information transmission: A rational analysis of serial reproduction
Author: Jing Xu, Thomas L. Griffiths
Abstract: Many human interactions involve pieces of information being passed from one person to another, raising the question of how this process of information transmission is affected by the capacities of the agents involved. In the 1930s, Sir Frederic Bartlett explored the influence of memory biases in “serial reproduction” of information, in which one person’s reconstruction of a stimulus from memory becomes the stimulus seen by the next person. These experiments were done using relatively uncontrolled stimuli such as pictures and stories, but suggested that serial reproduction would transform information in a way that reflected the biases inherent in memory. We formally analyze serial reproduction using a Bayesian model of reconstruction from memory, giving a general result characterizing the effect of memory biases on information transmission. We then test the predictions of this account in two experiments using simple one-dimensional stimuli. Our results provide theoretical and empirical justification for the idea that serial reproduction reflects memory biases. 1
6 0.65600783 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning
7 0.6505869 192 nips-2008-Reducing statistical dependencies in natural signals using radial Gaussianization
8 0.6485101 93 nips-2008-Global Ranking Using Continuous Conditional Random Fields
9 0.64724314 158 nips-2008-Offline Handwriting Recognition with Multidimensional Recurrent Neural Networks
10 0.643197 62 nips-2008-Differentiable Sparse Coding
11 0.64042234 236 nips-2008-The Mondrian Process
12 0.64024651 116 nips-2008-Learning Hybrid Models for Image Annotation with Partially Labeled Data
13 0.64016873 66 nips-2008-Dynamic visual attention: searching for coding length increments
14 0.63984746 118 nips-2008-Learning Transformational Invariants from Natural Movies
15 0.63946944 234 nips-2008-The Infinite Factorial Hidden Markov Model
16 0.6337164 197 nips-2008-Relative Performance Guarantees for Approximate Inference in Latent Dirichlet Allocation
17 0.63077068 232 nips-2008-The Conjoint Effect of Divisive Normalization and Orientation Selectivity on Redundancy Reduction
18 0.63032115 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
19 0.62978846 138 nips-2008-Modeling human function learning with Gaussian processes
20 0.62789196 246 nips-2008-Unsupervised Learning of Visual Sense Models for Polysemous Words