nips nips2008 nips2008-238 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Zakria Hussain, John S. Shawe-taylor
Abstract: We analyse matching pursuit for kernel principal components analysis (KPCA) by proving that the sparse subspace it produces is a sample compression scheme. We show that this bound is tighter than the KPCA bound of Shawe-Taylor et al [7] and highly predictive of the size of the subspace needed to capture most of the variance in the data. We analyse a second matching pursuit algorithm called kernel matching pursuit (KMP) which does not correspond to a sample compression scheme. However, we give a novel bound that views the choice of subspace of the KMP algorithm as a compression scheme and hence provide a VC bound to upper bound its future loss. Finally we describe how the same bound can be applied to other matching pursuit related algorithms. 1
Reference: text
sentIndex sentText sentNum sentScore
1 Theory of matching pursuit Zakria Hussain and John Shawe-Taylor Department of Computer Science University College London, UK {z. [sent-1, score-0.402]
2 uk Abstract We analyse matching pursuit for kernel principal components analysis (KPCA) by proving that the sparse subspace it produces is a sample compression scheme. [sent-6, score-1.483]
3 We show that this bound is tighter than the KPCA bound of Shawe-Taylor et al [7] and highly predictive of the size of the subspace needed to capture most of the variance in the data. [sent-7, score-0.485]
4 We analyse a second matching pursuit algorithm called kernel matching pursuit (KMP) which does not correspond to a sample compression scheme. [sent-8, score-1.604]
5 However, we give a novel bound that views the choice of subspace of the KMP algorithm as a compression scheme and hence provide a VC bound to upper bound its future loss. [sent-9, score-1.257]
6 Finally we describe how the same bound can be applied to other matching pursuit related algorithms. [sent-10, score-0.597]
7 1 Introduction Matching pursuit refers to a family of algorithms that generate a set of bases for learning in a greedy fashion. [sent-11, score-0.271]
8 A good example of this approach is the matching pursuit algorithm [4]. [sent-12, score-0.431]
9 Viewed from this angle sparse kernel principal components analysis (PCA) looks for a small number of kernel basis vectors in order to maximise the Rayleigh quotient. [sent-13, score-0.807]
10 The algorithm was proposed by [8] and motivated by matching pursuit [4], but to our knowledge sparse PCA has not been analysed theoretically. [sent-14, score-0.578]
11 In this paper we show that sparse PCA (KPCA) is a sample compression scheme and can be bounded using the size of the compression set [3, 2] which is the set of training examples used in the construction of the KPCA subspace. [sent-15, score-1.323]
12 A related algorithm called kernel matching pursuit (KMP) [10] is a sparse version of least squares regression but without the property of being a compression scheme. [sent-17, score-1.254]
13 However, we use the number of basis vectors constructed by KMP to help upper bound the loss of the KMP algorithm using the VC dimension. [sent-18, score-0.462]
14 This bound is novel in that it is applied in an empirically chosen low dimensional hypothesis space and applies independently of the actual dimension of the ambient feature space (including one constructed from the Gaussian kernel). [sent-19, score-0.247]
15 Finally we also show that the KMP bound can be applied to a sparse kernel canonical correlation analysis that uses a similar matching pursuit technique. [sent-21, score-0.876]
16 For the principal components analysis the data is a sample $S = \{x_i\}_{i=1}^m$. (Footnote 1: the algorithm was proposed as a low rank kernel approximation – however the algorithm turns out to be a sparse kernel PCA, to be shown.) [sent-26, score-0.765]
17 For simplicity we always assume that the examples are already projected into the kernel defined feature space, so that the kernel matrix K has entries $K[i, j] = \langle x_i, x_j \rangle$. [sent-28, score-0.448]
18 For the sample compression analysis the compression function Λ induced by a sample compression learning algorithm A on training set S is the map Λ : S −→ Λ(S) such that the compression set Λ(S) ⊂ S is returned by A. [sent-41, score-2.221]
19 A reconstruction function Φ is a mapping from a compression set Λ(S) to a set F of functions Φ : Λ(S) −→ F . [sent-42, score-0.533]
20 Therefore, a sample compression scheme is a reconstruction function Φ mapping a compression set Λ(S) to some set of functions F such that A(S) = Φ(Λ(S)). [sent-44, score-1.135]
21 If F is the set of Boolean-valued or Real-valued functions then the sample compression scheme is said to be a classification or regression algorithm, respectively. [sent-45, score-0.634]
22 1 Sparse kernel principal components analysis Principal components analysis [6] can be expressed as the following maximisation problem: $\max_w \frac{w^\top X^\top X w}{w^\top w}$, (1) where w is the weight vector. [sent-47, score-0.418]
23 In a sparse KPCA algorithm we would like to find a sparsely represented vector $\tilde{w} = X[i, :]^\top \alpha$, that is a linear combination of a small number of training examples indexed by vector i. [sent-48, score-0.246]
24 Clearly maximising the quantity above will lead to the maximisation of the generalised eigenvalues corresponding to α – and hence a sparse subset of the original KPCA problem. [sent-50, score-0.218]
25 The procedure involves choosing basis vectors that maximise the Rayleigh quotient without the set of eigenvectors. [sent-53, score-0.34]
26 Basis vectors are chosen iteratively until some pre-specified number k of vectors has been chosen. [sent-54, score-0.216]
27 An orthogonalisation of the kernel matrix at each step ensures future potential basis vectors will be orthogonal to those already chosen. [sent-55, score-0.379]
28 The quotient to maximise is: $\max_i \rho_i = \frac{e_i^\top K^2 e_i}{e_i^\top K e_i}$, (2) where $e_i$ is the ith unit vector. [sent-56, score-0.831]
29 After this maximisation we need to orthogonalise (deflate) the kernel matrix to create a projection into the space orthogonal to the basis vectors chosen to ensure we find the maximum variance of the data in the projected space. [sent-57, score-0.565]
30 Let $\tau = K[:, i] = XX^\top e_i$ where $e_i$ is the ith unit vector. [sent-59, score-0.335]
31 We know that primal PCA deflation can be carried out with respect to the features in the following way: $\hat{X} = X\left(I - \frac{uu^\top}{u^\top u}\right)$, where u is the projection direction defined by the chosen eigenvector and $\hat{X}$ is the deflated matrix. [sent-60, score-0.341]
32 However, in sparse KPCA, $u = X^\top e_i$ because the projection directions are simply the examples in X. [sent-61, score-0.369]
33 Therefore, for sparse KPCA we have: $\hat{X}\hat{X}^\top = X\left(I - \frac{uu^\top}{u^\top u}\right)\left(I - \frac{uu^\top}{u^\top u}\right)X^\top = XX^\top - \frac{XX^\top e_i e_i^\top XX^\top}{e_i^\top XX^\top e_i} = K - \frac{K[:, i]K[:, i]^\top}{K[i, i]}$. [sent-62, score-0.629]
34 Therefore, given a kernel matrix K the deflated kernel matrix $\hat{K}$ can be computed as follows: $\hat{K} = K - \frac{\tau\tau^\top}{K[i_k, i_k]}$, (3) where $\tau = K[:, i_k]$ and $i_k$ denotes the latest element in the vector i. [sent-63, score-1.361]
35 Algorithm 1: A matching pursuit algorithm for kernel principal components analysis (i. [sent-67, score-0.725]
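To make Algorithm 1 concrete, the following is a minimal NumPy sketch of the greedy selection and deflation loop described above (Equations (2) and (3)). It is an illustration written for this summary rather than the authors' own code; the function name sparse_kpca_mp and the numerical guard on the diagonal are our additions.

import numpy as np

def sparse_kpca_mp(K, k):
    # Greedily pick k basis indices for sparse KPCA: at each step choose the
    # index maximising the Rayleigh quotient (e_i' K^2 e_i)/(e_i' K e_i) of
    # Equation (2), then deflate the kernel matrix as in Equation (3).
    K = K.astype(float).copy()
    chosen = []
    for _ in range(k):
        num = np.einsum('ij,ij->i', K, K)   # e_i' K^2 e_i = ||K[i, :]||^2
        den = np.diag(K).copy()
        den[den <= 1e-12] = np.inf          # guard: skip already-deflated directions
        i_k = int(np.argmax(num / den))
        chosen.append(i_k)
        tau = K[:, i_k].copy()
        K = K - np.outer(tau, tau) / tau[i_k]   # deflation, Equation (3)
    return chosen

Calling sparse_kpca_mp(K, k) on an m × m kernel matrix returns the index vector i of the k chosen training examples.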
36 However, their motivation comes from the stance of finding a low rank matrix approximation of the kernel matrix. [sent-72, score-0.296]
37 Their algorithm finds the set of indices i and the projection matrix T . [sent-74, score-0.21]
38 However, the use of T in computing the low rank matrix approximation seems to imply the need for additional information from outside of the chosen basis vectors in order to construct this approximation. [sent-75, score-0.355]
39 However, we show that a projection into the space defined solely by the chosen indices is enough to reconstruct the kernel matrix and does not require any extra information. [sent-76, score-0.376]
40 An orthogonal projection $P_i(\phi(x_j))$ of a feature vector $\phi(x_j)$ into a subspace defined only by the set of indices i can be expressed as: $P_i(\phi(x_j)) = \tilde{X}^\top(\tilde{X}\tilde{X}^\top)^{-1}\tilde{X}\phi(x_j)$, where $\tilde{X} = X[i, :]$ are the training examples from data matrix X indexed by i. [sent-78, score-0.34]
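Since $\tilde{X}\phi(x_j) = K[i, j]$ and $\tilde{X}\tilde{X}^\top = K[i, i]$, the squared residual of this projection can be computed from kernel entries alone as $K[j, j] - K[j, i] K[i, i]^{-1} K[i, j]$; this is the quantity that later appears as the loss in the compression bound. The snippet below is a small sketch of that computation (the function name projection_residuals is ours).

import numpy as np

def projection_residuals(K, idx):
    # ||phi(x_j) - P_i(phi(x_j))||^2 for every j, from the kernel matrix only:
    #   K[j, j] - K[j, i] K[i, i]^{-1} K[i, j]
    K_ii = K[np.ix_(idx, idx)]
    K_ji = K[:, idx]
    sol = np.linalg.solve(K_ii, K_ji.T)          # solve instead of inverting
    return np.diag(K) - np.einsum('ji,ij->j', K_ji, sol)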
41 The sparse kernel principal components analysis algorithm is a compression scheme. [sent-82, score-0.95]
42 We now prove that Smola and Schölkopf's low rank matrix approximation algorithm [8] (without sub-sampling) is equivalent to sparse kernel principal components analysis presented in this paper (Algorithm 1). [sent-86, score-0.57]
43 2 In their book, Smola and Schölkopf redefine their kernel approximation in the same way as we have done [5], however they do not make the connection that it is a compression scheme (see Claim 1). [sent-89, score-0.768]
44 Let K be the kernel matrix and let K[:, i] be the ith column of the kernel matrix. [sent-91, score-0.409]
45 2 of their paper that their algorithm 2 finds a low rank approximation of the kernel matrix such that it minimises the Frobenius norm $\|X - \tilde{X}\|^2_{\mathrm{Frob}} = \mathrm{tr}\{K - \tilde{K}\}$, where $\tilde{X}$ is the low rank approximation of X. [sent-97, score-0.444]
46 We would like to show that the maximum reduction in the Frobenius norm between the kernel K and its projection $\tilde{K}$ is in actual fact achieved by the choice of basis vectors that maximise the Rayleigh quotient and deflate according to Equation (3). [sent-99, score-0.578]
47 The last term of the final equation corresponds exactly to the Rayleigh quotient of Equation (2). [sent-103, score-0.529]
48 Therefore the maximisation of the Rayleigh quotient does indeed correspond to the maximum reduction in the Frobenius norm between the approximated matrix $\tilde{X}$ and X. [sent-104, score-0.235]
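This equivalence is easy to check numerically. The toy snippet below (our own illustration on random data, not taken from the paper) verifies that the trace reduction produced by one deflation step equals the Rayleigh quotient of the chosen index.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
K = X @ X.T                                          # linear kernel on toy data
i = 3                                                # any candidate index
rho = (K[i] @ K[i]) / K[i, i]                        # Rayleigh quotient, Eq. (2)
K_hat = K - np.outer(K[:, i], K[:, i]) / K[i, i]     # deflation, Eq. (3)
print(np.isclose(np.trace(K) - np.trace(K_hat), rho))   # prints True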
49 2 A generalisation error bound for sparse kernel principal components analysis We use the sample compression framework of [3] to bound the generalisation error of the sparse KPCA algorithm. [sent-106, score-1.642]
50 Note that kernel PCA bounds [7] do not use sample compression in order to bound the true error. [sent-107, score-0.957]
51 As pointed out above, we use the simple fact that this algorithm can be viewed as a compression scheme. [sent-108, score-0.541]
52 That said the usual application of compression bounds has been for classification algorithms, while here we are considering a subspace method. [sent-110, score-0.613]
53 Let Ak be any learning algorithm having a reconstruction function that maps compression sets to subspaces. [sent-112, score-0.562]
54 Consider the case where we have a compression set of size k. [sent-115, score-0.512]
55 Then we have $\binom{m}{k}$ different ways of choosing the compression set. [sent-116, score-0.512]
56 Given δ confidence we apply Hoeffding's bound to the m − k points not in the compression set once for each choice by setting it equal to $\delta/\binom{m}{k}$. [sent-117, score-0.738]
57 We now consider the application of the above bound to sparse KPCA. [sent-120, score-0.31]
58 Let the corresponding loss function be defined as $\ell(A_t(S))(x) = \|x - P_{i_t}(x)\|^2$, where x is a test point and $P_{i_t}(x)$ its projection into the subspace determined by the set $i_t$ of indices returned by $A_t(S)$. [sent-121, score-0.264]
59 Thus we can give a more specific loss bound in the case where we use a Gaussian kernel in the sparse kernel principal components analysis. [sent-122, score-0.834]
60 Using a Gaussian kernel and all of the definitions from Theorem 2, we get the following bound: $E[\ell(A(S))] \le \min_{1 \le t \le k} \left\{ \frac{1}{m-t} \sum_{i=1}^{m-t} \|x_i - P_{i_t}(x_i)\|^2 + \sqrt{\frac{1}{2(m-t)} \left( t \ln \frac{em}{t} + \ln \frac{2m}{\delta} \right)} \right\}$. Note that R corresponds to the smallest radius of a ball that encloses all of the training points. [sent-124, score-0.314]
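As a rough illustration, the displayed bound can be evaluated for every candidate subspace size t and minimised, which is how the best number of dimensions is read off in the experiments below. The sketch assumes our reconstruction of the (garbled) display above is correct, so the exact constants should be checked against the paper; the function name and the residual_means argument (the average squared residual of the m − t points outside each compression set) are ours.

import numpy as np

def skpca_compression_bound(residual_means, m, delta=0.05):
    # residual_means[t-1]: (1/(m-t)) * sum of squared residuals of the m - t
    # points outside the compression set of size t (e.g. from
    # projection_residuals above).
    bounds = []
    for t, emp in enumerate(residual_means, start=1):
        penalty = np.sqrt((t * np.log(np.e * m / t) + np.log(2 * m / delta))
                          / (2 * (m - t)))
        bounds.append(emp + penalty)
    best_t = int(np.argmin(bounds)) + 1
    return best_t, bounds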
61 We now compare the sample compression bound proposed above for sparse KPCA with the kernel PCA bound introduced by [7]. [sent-126, score-1.231]
62 The left hand side of Figure 1 shows plots for the test error residuals (for the Boston housing data set) together with its upper bounds computed using the bound of [7] and the sample compression bound of Corollary 1. [sent-127, score-1.157]
63 The sample compression bound is much tighter than the KPCA bound and also non-trivial (unlike the KPCA bound). [sent-128, score-0.982]
64 The sample compression bound is at its lowest point after 43 basis vectors have been added. [sent-129, score-0.927]
65 We carry out an extra toy experiment to help assess whether or not this is true and to show that the sample compression bound can help indicate when the principal components have captured most of the actual data. [sent-132, score-0.913]
66 From the right plot of Figure 1 we see that the test residual keeps dropping at a constant rate after 50 basis vectors have been added. [sent-135, score-0.233]
67 The compression bound picks 46 dimensions with the largest eigenvalues; however, the KPCA bound of [7] is much more optimistic and is at its lowest point after 30 basis vectors, suggesting erroneously that SKPCA has captured most of the data in 30 dimensions. [sent-136, score-1.049]
68 Therefore, as well as being tighter and non-trivial, the compression bound is much better at predicting the best choice for the number of dimensions to use with sparse KPCA. [sent-137, score-0.899]
69 Note that we carried out this experiment without randomly permuting the projections into a subspace because SKPCA is rotation invariant and will always choose the principal components with the largest eigenvalues. [sent-138, score-0.245]
[Figure 1: Bound plots for sparse kernel PCA, showing the PCA bound, the sample compression bound and the test residual (left: Boston housing; right: toy data).]
72 3 Kernel matching pursuit Unfortunately, the theory of the last section, where we gave a sample compression bound for SKPCA, cannot be applied to KMP. [sent-154, score-1.159]
73 This is because the algorithm needs information from outside of the compression set in order to construct its regressors and make predictions. [sent-155, score-0.587]
74 However, we can use a VC argument together with a sample compression trick in order to derive a bound for KMP in terms of the level of sparsity achieved, by viewing the sparsity achieved in the feature space as a compression scheme. [sent-156, score-1.335]
75 1 A generalisation error bound for kernel matching pursuit VC bounds have commonly been used to bound learning algorithms whose hypothesis spaces are infinite. [sent-159, score-1.075]
76 In the kernel matching pursuit algorithm this translates directly into the number of basis vectors chosen and hence admits a standard VC argument. [sent-164, score-0.772]
77 The natural loss function for KMP is a regression loss; however, in order to use standard VC bounds we map the regression loss into a classification loss in the following way. [sent-165, score-0.298]
78 Given the error $\ell(f) = |f(x) - y|$ for a regression function f between training example x and regression output y we can define, for some fixed positive scalar $\alpha \in \mathbb{R}$, the corresponding true classification loss (error) as $\ell_\alpha(f) = \mathbb{I}\{|f(x) - y| > \alpha\}$. [sent-168, score-0.197]
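A minimal sketch of the empirical version of this thresholded loss (the function name alpha_loss is ours):

import numpy as np

def alpha_loss(predictions, targets, alpha):
    # Fraction of points whose absolute regression error exceeds alpha;
    # each point contributes a binary 0/1 loss.
    errors = np.abs(np.asarray(predictions) - np.asarray(targets))
    return float(np.mean(errors > alpha))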
79 Now that we have a loss function that is binary we can make a simple sample compression argument, that counts the number of possible subspaces, together with a traditional VC style bound to upper bound the expected loss of KMP. [sent-170, score-1.132]
80 To help keep the notation consistent with earlier definitions we will denote the indices of the chosen basis vectors by i. [sent-171, score-0.236]
81 Given these definitions and the bound of Vapnik and Chervonenkis [9] we can upper bound the true loss of KMP as follows. [sent-173, score-0.482]
82 Let A be the regression algorithm of KMP, m the size of the training set S and k the number of chosen basis vectors indexed by i. [sent-176, score-0.282]
83 Then with probability 1 − δ over the generation of the training set S the expected loss $E[\ell(\cdot)]$ of algorithm A can be bounded by $E[\ell(A(S))] \le \frac{2}{m-k-t}\left[(k+1)\log\frac{4e(m-k-t)}{k+1} + k\log\frac{em}{k} + t\log\frac{e(m-k)}{t} + \log\frac{2m^2}{\delta}\right]$. [sent-178, score-0.223]
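For illustration, the bound can be evaluated as a function of the training set size m, the number of chosen basis vectors k, and the number t of α-errors on the points outside the compression set. The formula below follows our reconstruction of the garbled display in the theorem, so the constants should be checked against the paper before relying on it; the function name is ours.

import numpy as np

def kmp_vc_compression_bound(m, k, t, delta=0.05):
    # (2/(m-k-t)) * [ (k+1) log(4e(m-k-t)/(k+1)) + k log(em/k)
    #                 + t log(e(m-k)/t) + log(2 m^2 / delta) ]
    assert m > k + t and k > 0
    terms = ((k + 1) * np.log(4 * np.e * (m - k - t) / (k + 1))
             + k * np.log(np.e * m / k)
             + (t * np.log(np.e * (m - k) / t) if t > 0 else 0.0)
             + np.log(2 * m ** 2 / delta))
    return 2.0 * terms / (m - k - t)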
84 First consider a fixed size k for the compression set and number of errors t. [sent-180, score-0.512]
85 Let $S_1 = \{x_{i_1}, \ldots, x_{i_k}\}$ denote the compression set, $S_2 = \{x_{i_{k+1}}, \ldots, x_{i_{k+t}}\}$ the set of points erred on in training and $\bar{S} = S \setminus (S_1 \cup S_2)$ the points outside of the compression set ($S_1$) and training error set ($S_2$). [sent-187, score-0.731]
86 Suppose that the first k points form the compression set and the next t are the errors of the KMP regressor. [sent-188, score-0.543]
87 We now need to consider all of the ways that the k basis vectors and t error points might have occurred and apply the union bound over all of these possibilities. [sent-190, score-0.395]
88 This is the first upper bound on the generalisation error for KMP that we are aware of and as such we cannot compare the bound against any others. [sent-197, score-0.499]
89 [Figure 2: Plot of KMP bound against its test error, as a function of the level of sparsity.] [sent-207, score-0.228]
90 This motivates a training algorithm for KMP that would use the bound as the minimisation criterion and stop once the bound fails to become smaller. [sent-213, score-0.495]
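A hedged sketch of such a stopping rule is given below. It reuses alpha_loss and kmp_vc_compression_bound from the earlier sketches and assumes a routine kmp_step that greedily adds one basis vector and returns the updated predictor; kmp_step is hypothetical, standing in for whatever KMP implementation is available.

def train_kmp_with_bound(S_x, S_y, alpha, delta=0.05, k_max=50):
    # Grow the KMP basis one vector at a time and stop as soon as the
    # generalisation bound stops decreasing.
    m = len(S_y)
    best, best_k = float('inf'), 0
    chosen, predictor = [], None
    for k in range(1, k_max + 1):
        chosen, predictor = kmp_step(S_x, S_y, chosen)   # hypothetical helper
        # t: number of alpha-errors among the m - k points outside the basis
        t = int(round(alpha_loss(predictor(S_x), S_y, alpha) * (m - k)))
        b = kmp_vc_compression_bound(m, k, t, delta)
        if b >= best:
            return best_k            # bound failed to become smaller: stop
        best, best_k = b, k
    return best_k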
91 4 Extensions The same approach that we have used for bounding the performance of kernel matching pursuit can be used to bound a matching pursuit version of kernel canonical correlation analysis (KCCA) [6]. [sent-215, score-1.327]
92 This again means that the overall algorithm fails to be a compression scheme as side information is required. [sent-217, score-0.607]
93 However, we can use the same approach described for KMP to bound the expected fit of the projections from the two views. [sent-218, score-0.221]
94 Then with probability 1 − δ over the generation of the paired training sets $S \subseteq X \times Y$ the expected loss $E[\ell(\cdot)]$ of algorithm A can be bounded by $E[\ell(A(S))] \le \frac{2}{m-k-t}\left[(k+1)\log\frac{4e(m-k-t)}{k+1} + k\log\frac{em}{k} + t\log\frac{e(m-k)}{t} + \log\frac{2m^2}{\delta}\right]$. [sent-224, score-0.256]
95 5 Discussion Matching pursuit is a meta-scheme for creating learning algorithms for a variety of tasks. [sent-225, score-0.271]
96 We have presented novel techniques that make it possible to analyse this style of algorithm using a combination of compression scheme ideas and more traditional learning theory. [sent-226, score-0.648]
97 We have shown how sparse KPCA is in fact a compression scheme and demonstrated bounds that are able to accurately guide dimension selection in some cases. [sent-227, score-0.703]
98 We have also used the techniques to bound the performance of the kernel matching pursuit (KMP) algorithm, to reinforce the generality of the approach, and to indicate how the approach can be extended to a matching pursuit version of KCCA. [sent-228, score-1.192]
99 The results in this paper imply that the performance of any learning algorithm from the matching pursuit family can be analysed using a combination of sparse and traditional learning bounds. [sent-229, score-0.578]
100 The bounds give a general theoretical justification of the framework and suggest potential applications of matching pursuit methods to other learning tasks such as novelty detection, ranking and so on. [sent-230, score-0.438]
wordName wordTfidf (topN-words)
[('compression', 0.512), ('kmp', 0.424), ('kpca', 0.282), ('pursuit', 0.271), ('bound', 0.195), ('kernel', 0.164), ('ei', 0.151), ('ik', 0.141), ('matching', 0.131), ('xx', 0.126), ('pca', 0.122), ('sparse', 0.115), ('uu', 0.106), ('quotient', 0.106), ('vc', 0.091), ('rayleigh', 0.091), ('maximise', 0.088), ('principal', 0.087), ('maximisation', 0.081), ('ate', 0.081), ('skpca', 0.081), ('basis', 0.076), ('projection', 0.074), ('tr', 0.072), ('vectors', 0.07), ('loss', 0.066), ('subspace', 0.065), ('xik', 0.061), ('generalisation', 0.06), ('indices', 0.059), ('residual', 0.059), ('pit', 0.053), ('sample', 0.05), ('matrix', 0.048), ('dimensions', 0.047), ('frobenius', 0.047), ('outside', 0.046), ('analyse', 0.045), ('training', 0.044), ('ak', 0.044), ('components', 0.043), ('housing', 0.043), ('xj', 0.043), ('scheme', 0.04), ('em', 0.04), ('rank', 0.038), ('smola', 0.036), ('bounds', 0.036), ('fy', 0.035), ('minimises', 0.035), ('boston', 0.035), ('ith', 0.033), ('paired', 0.033), ('pi', 0.033), ('sparsity', 0.033), ('ln', 0.033), ('minimisation', 0.032), ('reordered', 0.032), ('analysed', 0.032), ('ated', 0.032), ('regression', 0.032), ('chosen', 0.031), ('points', 0.031), ('cruz', 0.03), ('vapnik', 0.03), ('sch', 0.03), ('tighter', 0.03), ('nitions', 0.03), ('pr', 0.03), ('algorithm', 0.029), ('examples', 0.029), ('sparsely', 0.029), ('hyperplanes', 0.029), ('ation', 0.029), ('plot', 0.028), ('fx', 0.027), ('nystr', 0.027), ('lkopf', 0.027), ('plots', 0.027), ('side', 0.026), ('projections', 0.026), ('md', 0.026), ('santa', 0.026), ('toy', 0.026), ('upper', 0.026), ('approximation', 0.025), ('carried', 0.024), ('lowest', 0.024), ('residuals', 0.024), ('generation', 0.023), ('error', 0.023), ('parallel', 0.023), ('eigenvalues', 0.022), ('style', 0.022), ('low', 0.021), ('fix', 0.021), ('reconstruction', 0.021), ('orthogonal', 0.021), ('bounded', 0.021), ('theorem', 0.021)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 238 nips-2008-Theory of matching pursuit
Author: Zakria Hussain, John S. Shawe-taylor
Abstract: We analyse matching pursuit for kernel principal components analysis (KPCA) by proving that the sparse subspace it produces is a sample compression scheme. We show that this bound is tighter than the KPCA bound of Shawe-Taylor et al [7] and highly predictive of the size of the subspace needed to capture most of the variance in the data. We analyse a second matching pursuit algorithm called kernel matching pursuit (KMP) which does not correspond to a sample compression scheme. However, we give a novel bound that views the choice of subspace of the KMP algorithm as a compression scheme and hence provide a VC bound to upper bound its future loss. Finally we describe how the same bound can be applied to other matching pursuit related algorithms. 1
2 0.26082489 200 nips-2008-Robust Kernel Principal Component Analysis
Author: Minh H. Nguyen, Fernando Torre
Abstract: Kernel Principal Component Analysis (KPCA) is a popular generalization of linear PCA that allows non-linear feature extraction. In KPCA, data in the input space is mapped to higher (usually) dimensional feature space where the data can be linearly modeled. The feature space is typically induced implicitly by a kernel function, and linear PCA in the feature space is performed via the kernel trick. However, due to the implicitness of the feature space, some extensions of PCA such as robust PCA cannot be directly generalized to KPCA. This paper presents a technique to overcome this problem, and extends it to a unified framework for treating noise, missing data, and outliers in KPCA. Our method is based on a novel cost function to perform inference in KPCA. Extensive experiments, in both synthetic and real data, show that our algorithm outperforms existing methods. 1
3 0.19842245 199 nips-2008-Risk Bounds for Randomized Sample Compressed Classifiers
Author: Mohak Shah
Abstract: We derive risk bounds for the randomized classifiers in Sample Compression setting where the classifier-specification utilizes two sources of information viz. the compression set and the message string. By extending the recently proposed Occam’s Hammer principle to the data-dependent settings, we derive point-wise versions of the bounds on the stochastic sample compressed classifiers and also recover the corresponding classical PAC-Bayes bound. We further show how these compare favorably to the existing results.
4 0.093921788 216 nips-2008-Sparse probabilistic projections
Author: Cédric Archambeau, Francis R. Bach
Abstract: We present a generative model for performing sparse probabilistic projections, which includes sparse principal component analysis and sparse canonical correlation analysis as special cases. Sparsity is enforced by means of automatic relevance determination or by imposing appropriate prior distributions, such as generalised hyperbolic distributions. We derive a variational Expectation-Maximisation algorithm for the estimation of the hyperparameters and show that our novel probabilistic approach compares favourably to existing techniques. We illustrate how the proposed method can be applied in the context of cryptoanalysis as a preprocessing tool for the construction of template attacks. 1
5 0.092741951 113 nips-2008-Kernelized Sorting
Author: Novi Quadrianto, Le Song, Alex J. Smola
Abstract: Object matching is a fundamental operation in data analysis. It typically requires the definition of a similarity measure between the classes of objects to be matched. Instead, we develop an approach which is able to perform matching by requiring a similarity measure only within each of the classes. This is achieved by maximizing the dependency between matched pairs of observations by means of the Hilbert Schmidt Independence Criterion. This problem can be cast as one of maximizing a quadratic assignment problem with special structure and we present a simple algorithm for finding a locally optimal solution. 1
6 0.09047509 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization
7 0.090301998 62 nips-2008-Differentiable Sparse Coding
8 0.089140229 215 nips-2008-Sparse Signal Recovery Using Markov Random Fields
9 0.078032434 161 nips-2008-On the Complexity of Linear Prediction: Risk Bounds, Margin Bounds, and Regularization
10 0.077087782 2 nips-2008-A Convex Upper Bound on the Log-Partition Function for Binary Distributions
11 0.077027172 239 nips-2008-Tighter Bounds for Structured Estimation
12 0.076336749 57 nips-2008-Deflation Methods for Sparse PCA
13 0.075556703 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
14 0.074356444 80 nips-2008-Extended Grassmann Kernels for Subspace-Based Learning
15 0.072376654 56 nips-2008-Deep Learning with Kernel Regularization for Visual Recognition
16 0.069478519 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
17 0.06696777 51 nips-2008-Convergence and Rate of Convergence of a Manifold-Based Dimension Reduction Algorithm
18 0.065521896 40 nips-2008-Bounds on marginal probability distributions
19 0.065031983 250 nips-2008-Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning
20 0.063993111 106 nips-2008-Inferring rankings under constrained sensing
topicId topicWeight
[(0, -0.203), (1, -0.061), (2, -0.112), (3, 0.103), (4, 0.032), (5, 0.005), (6, -0.04), (7, -0.044), (8, -0.027), (9, 0.014), (10, 0.12), (11, -0.084), (12, 0.114), (13, 0.124), (14, 0.037), (15, -0.004), (16, 0.166), (17, 0.022), (18, 0.062), (19, -0.117), (20, 0.102), (21, -0.132), (22, 0.045), (23, -0.095), (24, -0.124), (25, -0.069), (26, -0.092), (27, -0.036), (28, -0.089), (29, 0.184), (30, 0.123), (31, -0.102), (32, -0.06), (33, 0.044), (34, 0.01), (35, 0.097), (36, 0.084), (37, -0.099), (38, -0.012), (39, 0.088), (40, 0.04), (41, 0.039), (42, 0.094), (43, 0.043), (44, -0.051), (45, -0.163), (46, -0.108), (47, -0.017), (48, -0.013), (49, -0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.93003738 238 nips-2008-Theory of matching pursuit
Author: Zakria Hussain, John S. Shawe-taylor
Abstract: We analyse matching pursuit for kernel principal components analysis (KPCA) by proving that the sparse subspace it produces is a sample compression scheme. We show that this bound is tighter than the KPCA bound of Shawe-Taylor et al [7] and highly predictive of the size of the subspace needed to capture most of the variance in the data. We analyse a second matching pursuit algorithm called kernel matching pursuit (KMP) which does not correspond to a sample compression scheme. However, we give a novel bound that views the choice of subspace of the KMP algorithm as a compression scheme and hence provide a VC bound to upper bound its future loss. Finally we describe how the same bound can be applied to other matching pursuit related algorithms. 1
2 0.79413128 200 nips-2008-Robust Kernel Principal Component Analysis
Author: Minh H. Nguyen, Fernando Torre
Abstract: Kernel Principal Component Analysis (KPCA) is a popular generalization of linear PCA that allows non-linear feature extraction. In KPCA, data in the input space is mapped to higher (usually) dimensional feature space where the data can be linearly modeled. The feature space is typically induced implicitly by a kernel function, and linear PCA in the feature space is performed via the kernel trick. However, due to the implicitness of the feature space, some extensions of PCA such as robust PCA cannot be directly generalized to KPCA. This paper presents a technique to overcome this problem, and extends it to a unified framework for treating noise, missing data, and outliers in KPCA. Our method is based on a novel cost function to perform inference in KPCA. Extensive experiments, in both synthetic and real data, show that our algorithm outperforms existing methods. 1
3 0.5578118 227 nips-2008-Supervised Exponential Family Principal Component Analysis via Convex Optimization
Author: Yuhong Guo
Abstract: Recently, supervised dimensionality reduction has been gaining attention, owing to the realization that data labels are often available and indicate important underlying structure in the data. In this paper, we present a novel convex supervised dimensionality reduction approach based on exponential family PCA, which is able to avoid the local optima of typical EM learning. Moreover, by introducing a sample-based approximation to exponential family models, it overcomes the limitation of the prevailing Gaussian assumptions of standard PCA, and produces a kernelized formulation for nonlinear supervised dimensionality reduction. A training algorithm is then devised based on a subgradient bundle method, whose scalability can be gained using a coordinate descent procedure. The advantage of our global optimization approach is demonstrated by empirical results over both synthetic and real data. 1
4 0.55286676 31 nips-2008-Bayesian Exponential Family PCA
Author: Shakir Mohamed, Zoubin Ghahramani, Katherine A. Heller
Abstract: Principal Components Analysis (PCA) has become established as one of the key tools for dimensionality reduction when dealing with real valued data. Approaches such as exponential family PCA and non-negative matrix factorisation have successfully extended PCA to non-Gaussian data types, but these techniques fail to take advantage of Bayesian inference and can suffer from problems of overfitting and poor generalisation. This paper presents a fully probabilistic approach to PCA, which is generalised to the exponential family, based on Hybrid Monte Carlo sampling. We describe the model which is based on a factorisation of the observed data matrix, and show performance of the model on both synthetic and real data. 1
5 0.50831699 199 nips-2008-Risk Bounds for Randomized Sample Compressed Classifiers
Author: Mohak Shah
Abstract: We derive risk bounds for the randomized classifiers in Sample Compression setting where the classifier-specification utilizes two sources of information viz. the compression set and the message string. By extending the recently proposed Occam’s Hammer principle to the data-dependent settings, we derive point-wise versions of the bounds on the stochastic sample compressed classifiers and also recover the corresponding classical PAC-Bayes bound. We further show how these compare favorably to the existing results.
6 0.48204559 216 nips-2008-Sparse probabilistic projections
7 0.43849018 83 nips-2008-Fast High-dimensional Kernel Summations Using the Monte Carlo Multipole Method
8 0.43202516 61 nips-2008-Diffeomorphic Dimensionality Reduction
9 0.41993046 68 nips-2008-Efficient Direct Density Ratio Estimation for Non-stationarity Adaptation and Outlier Detection
10 0.41650182 113 nips-2008-Kernelized Sorting
11 0.41281787 106 nips-2008-Inferring rankings under constrained sensing
12 0.40595448 188 nips-2008-QUIC-SVD: Fast SVD Using Cosine Trees
13 0.39474925 215 nips-2008-Sparse Signal Recovery Using Markov Random Fields
14 0.37091225 2 nips-2008-A Convex Upper Bound on the Log-Partition Function for Binary Distributions
15 0.34835297 57 nips-2008-Deflation Methods for Sparse PCA
16 0.34740153 189 nips-2008-Rademacher Complexity Bounds for Non-I.I.D. Processes
17 0.34662354 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
18 0.33746648 62 nips-2008-Differentiable Sparse Coding
19 0.3322559 250 nips-2008-Weighted Sums of Random Kitchen Sinks: Replacing minimization with randomization in learning
20 0.32964593 218 nips-2008-Spectral Clustering with Perturbed Data
topicId topicWeight
[(6, 0.067), (7, 0.082), (12, 0.036), (15, 0.012), (16, 0.011), (25, 0.278), (28, 0.148), (57, 0.049), (59, 0.017), (63, 0.031), (71, 0.028), (77, 0.089), (78, 0.013), (83, 0.06)]
simIndex simValue paperId paperTitle
1 0.87157613 204 nips-2008-Self-organization using synaptic plasticity
Author: Vicenç Gómez, Andreas Kaltenbrunner, Vicente López, Hilbert J. Kappen
Abstract: Large networks of spiking neurons show abrupt changes in their collective dynamics resembling phase transitions studied in statistical physics. An example of this phenomenon is the transition from irregular, noise-driven dynamics to regular, self-sustained behavior observed in networks of integrate-and-fire neurons as the interaction strength between the neurons increases. In this work we show how a network of spiking neurons is able to self-organize towards a critical state for which the range of possible inter-spike-intervals (dynamic range) is maximized. Self-organization occurs via synaptic dynamics that we analytically derive. The resulting plasticity rule is defined locally so that global homeostasis near the critical state is achieved by local regulation of individual synapses. 1
same-paper 2 0.76047403 238 nips-2008-Theory of matching pursuit
Author: Zakria Hussain, John S. Shawe-taylor
Abstract: We analyse matching pursuit for kernel principal components analysis (KPCA) by proving that the sparse subspace it produces is a sample compression scheme. We show that this bound is tighter than the KPCA bound of Shawe-Taylor et al [7] and highly predictive of the size of the subspace needed to capture most of the variance in the data. We analyse a second matching pursuit algorithm called kernel matching pursuit (KMP) which does not correspond to a sample compression scheme. However, we give a novel bound that views the choice of subspace of the KMP algorithm as a compression scheme and hence provide a VC bound to upper bound its future loss. Finally we describe how the same bound can be applied to other matching pursuit related algorithms. 1
3 0.71835732 24 nips-2008-An improved estimator of Variance Explained in the presence of noise
Author: Ralf M. Haefner, Bruce G. Cumming
Abstract: A crucial part of developing mathematical models of information processing in the brain is the quantification of their success. One of the most widely-used metrics yields the percentage of the variance in the data that is explained by the model. Unfortunately, this metric is biased due to the intrinsic variability in the data. We derive a simple analytical modification of the traditional formula that significantly improves its accuracy (as measured by bias) with similar or better precision (as measured by mean-square error) in estimating the true underlying Variance Explained by the model class. Our estimator advances on previous work by a) accounting for overfitting due to free model parameters mitigating the need for a separate validation data set, b) adjusting for the uncertainty in the noise estimate and c) adding a conditioning term. We apply our new estimator to binocular disparity tuning curves of a set of macaque V1 neurons and find that on a population level almost all of the variance unexplained by Gabor functions is attributable to noise. 1
4 0.60354519 216 nips-2008-Sparse probabilistic projections
Author: Cédric Archambeau, Francis R. Bach
Abstract: We present a generative model for performing sparse probabilistic projections, which includes sparse principal component analysis and sparse canonical correlation analysis as special cases. Sparsity is enforced by means of automatic relevance determination or by imposing appropriate prior distributions, such as generalised hyperbolic distributions. We derive a variational Expectation-Maximisation algorithm for the estimation of the hyperparameters and show that our novel probabilistic approach compares favourably to existing techniques. We illustrate how the proposed method can be applied in the context of cryptoanalysis as a preprocessing tool for the construction of template attacks. 1
5 0.59676683 79 nips-2008-Exploring Large Feature Spaces with Hierarchical Multiple Kernel Learning
Author: Francis R. Bach
Abstract: For supervised and unsupervised learning, positive definite kernels allow to use large and potentially infinite dimensional feature spaces with a computational cost that only depends on the number of observations. This is usually done through the penalization of predictor functions by Euclidean or Hilbertian norms. In this paper, we explore penalizing by sparsity-inducing norms such as the ℓ1 -norm or the block ℓ1 -norm. We assume that the kernel decomposes into a large sum of individual basis kernels which can be embedded in a directed acyclic graph; we show that it is then possible to perform kernel selection through a hierarchical multiple kernel learning framework, in polynomial time in the number of selected kernels. This framework is naturally applied to non linear variable selection; our extensive simulations on synthetic datasets and datasets from the UCI repository show that efficiently exploring the large feature space through sparsity-inducing norms leads to state-of-the-art predictive performance.
6 0.59182203 205 nips-2008-Semi-supervised Learning with Weakly-Related Unlabeled Data : Towards Better Text Categorization
7 0.59162992 202 nips-2008-Robust Regression and Lasso
8 0.59120792 245 nips-2008-Unlabeled data: Now it helps, now it doesn't
9 0.58967495 62 nips-2008-Differentiable Sparse Coding
10 0.58960724 194 nips-2008-Regularized Learning with Networks of Features
11 0.58865684 135 nips-2008-Model Selection in Gaussian Graphical Models: High-Dimensional Consistency of \boldmath$\ell 1$-regularized MLE
12 0.5869742 87 nips-2008-Fitted Q-iteration by Advantage Weighted Regression
13 0.58673471 175 nips-2008-PSDBoost: Matrix-Generation Linear Programming for Positive Semidefinite Matrices Learning
14 0.58515531 63 nips-2008-Dimensionality Reduction for Data in Multiple Feature Representations
15 0.58476585 49 nips-2008-Clusters and Coarse Partitions in LP Relaxations
16 0.58460349 21 nips-2008-An Homotopy Algorithm for the Lasso with Online Observations
17 0.58387554 195 nips-2008-Regularized Policy Iteration
18 0.58366495 173 nips-2008-Optimization on a Budget: A Reinforcement Learning Approach
19 0.58343458 37 nips-2008-Biasing Approximate Dynamic Programming with a Lower Discount Factor
20 0.5831461 196 nips-2008-Relative Margin Machines