
238 nips-2008-Theory of matching pursuit


Source: pdf

Author: Zakria Hussain, John Shawe-Taylor

Abstract: We analyse matching pursuit for kernel principal components analysis (KPCA) by proving that the sparse subspace it produces is a sample compression scheme. We show that this bound is tighter than the KPCA bound of Shawe-Taylor et al. [7] and highly predictive of the size of the subspace needed to capture most of the variance in the data. We analyse a second matching pursuit algorithm, kernel matching pursuit (KMP), which does not correspond to a sample compression scheme. However, we give a novel bound that views the choice of subspace of the KMP algorithm as a compression scheme, and hence provide a VC bound to upper bound its future loss. Finally, we describe how the same bound can be applied to other matching pursuit related algorithms.
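To give a concrete sense of the greedy selection step underlying kernel matching pursuit [10], here is a minimal illustrative sketch: at each iteration it picks the kernel-matrix column most correlated with the current residual of the targets and subtracts its least-squares contribution. This is an assumption-laden simplification (plain squared loss, no back-fitting), not the authors' exact formulation or bound.

```python
import numpy as np

def kernel_matching_pursuit(K, y, n_terms):
    """Greedy KMP sketch: select columns of the kernel matrix K that best
    reduce the squared residual of the targets y.

    Illustrative only; the paper's algorithm and compression-scheme analysis
    are more refined than this plain greedy loop.
    """
    residual = y.astype(float).copy()
    selected, coefs = [], []
    for _ in range(n_terms):
        # Score each unused column by its normalized correlation with the residual.
        scores = np.full(K.shape[1], -np.inf)
        for j in range(K.shape[1]):
            if j in selected:
                continue
            col = K[:, j]
            denom = col @ col
            if denom > 0:
                scores[j] = (col @ residual) ** 2 / denom
        j_best = int(np.argmax(scores))
        col = K[:, j_best]
        # Least-squares coefficient for the chosen basis function.
        alpha = (col @ residual) / (col @ col)
        residual = residual - alpha * col
        selected.append(j_best)
        coefs.append(alpha)
    return selected, coefs, residual
```

The indices in `selected` play the role of the sparse subspace whose size the paper's bounds relate to generalization: the smaller the set of chosen basis functions that drives the residual down, the tighter the resulting compression-style bound.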


reference text

[1] M. Anthony. Partitioning points by parallel planes. Discrete Mathematics, 282:17–21, 2004.

[2] S. Floyd and M. Warmuth. Sample compression, learnability, and the Vapnik-Chervonenkis dimension. Machine Learning, 21(3):269–304, 1995.

[3] N. Littlestone and M. K. Warmuth. Relating data compression and learnability. Technical report, University of California Santa Cruz, Santa Cruz, CA, 1986.

[4] S. Mallat and Z. Zhang. Matching pursuit with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12):3397–3415, 1993.

[5] B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.

[6] J. Shawe-Taylor and N. Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, U.K., 2004.

[7] J. Shawe-Taylor, C. K. I. Williams, N. Cristianini, and J. Kandola. On the eigenspectrum of the Gram matrix and the generalization error of kernel-PCA. IEEE Transactions on Information Theory, 51(7):2510–2522, 2005.

[8] A. J. Smola and B. Schölkopf. Sparse greedy matrix approximation for machine learning. In Proceedings of the 17th International Conference on Machine Learning, pages 911–918. Morgan Kaufmann, San Francisco, CA, 2000.

[9] V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and its Applications, 16(2):264–280, 1971.

[10] P. Vincent and Y. Bengio. Kernel matching pursuit. Machine Learning, 48:165–187, 2002.

[11] C. K. I. Williams and M. Seeger. Using the Nyström method to speed up kernel machines. In Advances in Neural Information Processing Systems, volume 13, pages 682–688. MIT Press, 2001.