
91 jmlr-2011-The Sample Complexity of Dictionary Learning


Source: pdf

Author: Daniel Vainsencher, Shie Mannor, Alfred M. Bruckstein

Abstract: A large set of signals can sometimes be described sparsely using a dictionary, that is, every element can be represented as a linear combination of few elements from the dictionary. Algorithms for various signal processing applications, including classification, denoising and signal separation, learn a dictionary from a given set of signals to be represented. Can we expect the error in representing a previously unseen signal from the same source by such a dictionary to be of similar magnitude to the errors on the given examples? We assume signals are generated from a fixed distribution, and study these questions from a statistical learning theory perspective. We develop generalization bounds on the quality of the learned dictionary for two types of constraints on the coefficient selection, as measured by the expected L2 error in representation when the dictionary is used. For the case of l1-regularized coefficient selection we provide a generalization bound of the order of O(√(np ln(mλ)/m)), where n is the dimension, p is the number of elements in the dictionary, λ is a bound on the l1 norm of the coefficient vector and m is the number of samples, which complements existing results. For the case of representing a new signal as a combination of at most k dictionary elements, we provide a bound of the order O(√(np ln(mk)/m)) under an assumption on the closeness to orthogonality of the dictionary (a low Babel function). We further show that this assumption holds for most dictionaries in high dimensions in a strong probabilistic sense. Our results also include bounds that converge as 1/m, not previously known for this problem. We provide similar results in a general setting using kernels with weak smoothness requirements.

Keywords: dictionary learning, generalization bound, sparse representation
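To make the object of these bounds concrete, here is a minimal sketch of the quantity they control: the gap between the empirical and held-out L2 representation error of a dictionary learned with l1-regularized coefficient selection. This is not the authors' algorithm; it assumes NumPy and scikit-learn are available, and the synthetic data and every parameter (n, p, m, alpha, max_iter) are illustrative choices of this sketch only.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

rng = np.random.default_rng(0)

# Synthetic signals: sparse combinations of a hidden unit-norm dictionary,
# plus a little noise. Sizes (n, p, m) are illustrative only.
n, p, m = 20, 50, 600
D_true = rng.standard_normal((n, p))
D_true /= np.linalg.norm(D_true, axis=0)
codes = rng.standard_normal((m, p)) * (rng.random((m, p)) < 0.1)
X = codes @ D_true.T + 0.01 * rng.standard_normal((m, n))
X_train, X_test = X[:400], X[400:]

# Learn a dictionary; coefficients are selected by l1-regularized (Lasso)
# coding, the first constraint type considered in the abstract.
dl = DictionaryLearning(n_components=p, transform_algorithm="lasso_lars",
                        alpha=0.1, max_iter=50, random_state=0)
code_train = dl.fit_transform(X_train)
code_test = dl.transform(X_test)

def mean_l2_error(X, code, atoms):
    """Average l2 distance between signals and their reconstructions."""
    return np.linalg.norm(X - code @ atoms, axis=1).mean()

# The generalization bounds control the expected difference between these
# two numbers, at a rate of roughly sqrt(np ln(m * lambda) / m).
print("train error:", mean_l2_error(X_train, code_train, dl.components_))
print("test error: ", mean_l2_error(X_test, code_test, dl.components_))
```

With m large relative to np, the two printed errors should be close; this closeness, as a function of the sample size, is exactly the sample complexity question the paper formalizes.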

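The bound for k-element representations assumes the dictionary is close to orthogonal, as measured by the Babel function μ1(k) of Tropp (2004, cited below): the largest possible sum of absolute inner products between one atom and k others. Below is a short sketch of that standard definition, assuming unit-norm atoms and NumPy; the random dictionary at the end merely echoes the abstract's claim that high-dimensional random dictionaries typically have a low Babel function.

```python
import numpy as np

def babel(D, k):
    """Babel function mu_1(k) of a dictionary D (n x p, columns = atoms):
    the maximum over atoms d_i of the largest sum of |<d_i, d_j>| over
    any k other atoms. Small values certify near-orthogonality."""
    D = D / np.linalg.norm(D, axis=0)     # normalize atoms to unit l2 norm
    G = np.abs(D.T @ D)                   # absolute Gram matrix (p x p)
    np.fill_diagonal(G, 0.0)              # drop each atom's self-product
    top_k = np.sort(G, axis=1)[:, -k:]    # k largest entries in each row
    return top_k.sum(axis=1).max()

# A random dictionary in high dimension: its Babel function is typically
# small, the regime in which the abstract's k-sparse bound applies.
rng = np.random.default_rng(0)
print(babel(rng.standard_normal((256, 512)), k=5))
```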

reference text

Michal Aharon, Michael Elad, and Alfred M. Bruckstein. K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation. IEEE Transactions on Signal Processing, 54(11):4311–4322, 2006.

Martin Anthony and Peter L. Bartlett. Neural Network Learning: Theoretical Foundations. Cambridge University Press, 1999.

Peter L. Bartlett, Tamás Linder, and Gábor Lugosi. The minimax distortion redundancy in empirical quantizer design. IEEE Transactions on Information Theory, 44(5):1802–1813, 1998.

Peter L. Bartlett, Olivier Bousquet, and Shahar Mendelson. Local Rademacher complexities. Annals of Statistics, 33:1497–1537, 2005.

Gérard Biau, Luc Devroye, and Gábor Lugosi. On the performance of clustering in Hilbert spaces. IEEE Transactions on Information Theory, 54(2):781–790, 2008.

Gilles Blanchard, Olivier Bousquet, and Laurent Zwald. Statistical properties of kernel principal component analysis. Machine Learning, 66(2):259–294, 2007.

Alfred M. Bruckstein, David L. Donoho, and Michael Elad. From sparse solutions of systems of equations to sparse modeling of signals and images. SIAM Review, 51(1):34–81, 2009.

Emmanuel J. Candès and Terence Tao. Near-optimal signal recovery from random projections: Universal encoding strategies? IEEE Transactions on Information Theory, 52(12):5406–5425, 2006.

Scott S. Chen, David L. Donoho, and Michael A. Saunders. Atomic decomposition by basis pursuit. SIAM Review, 43(1):129–159, 2001.

Sheng Chen, Stephen A. Billings, and Wan Luo. Orthogonal least squares methods and their application to non-linear system identification. International Journal of Control, 50(5):1873–1896, 1989.

Felipe Cucker and Stephen Smale. On the mathematical foundations of learning. Bulletin of the American Mathematical Society, 39(1):1–50, 2002.

Geoff Davis, Stéphane Mallat, and Marco Avellaneda. Adaptive greedy approximations. Constructive Approximation, 13(1):57–98, 1997.

David L. Donoho and Michael Elad. Optimally sparse representation in general (nonorthogonal) dictionaries via ℓ1 minimization. Proceedings of the National Academy of Sciences, 100(5):2197–2202, 2003.

Richard J. Duffin and Albert C. Schaeffer. A class of nonharmonic Fourier series. Transactions of the American Mathematical Society, 72:341–366, 1952.

Ralf Herbrich and Robert Williamson. Algorithmic luckiness. Journal of Machine Learning Research, 3:175–212, 2003.

Roger A. Horn and Charles R. Johnson. Matrix Analysis. Cambridge University Press, 1990.

Kenneth Kreutz-Delgado, Joseph F. Murray, Bhaskar D. Rao, Kjersti Engan, Te-Won Lee, and Terrence J. Sejnowski. Dictionary learning algorithms for sparse representation. Neural Computation, 15(2):349–396, 2003.

Honglak Lee, Alexis Battle, Rajat Raina, and Andrew Y. Ng. Efficient sparse coding algorithms. In Advances in Neural Information Processing Systems 19, pages 801–808. MIT Press, Cambridge, MA, 2007.

Paul Lévy. Problèmes concrets d'analyse fonctionnelle. Gauthier-Villars, Paris, 1951.

Michael S. Lewicki and Terrence J. Sejnowski. Learning overcomplete representations. Neural Computation, 12(2):337–365, 2000.

Julien Mairal, Francis Bach, Jean Ponce, and Guillermo Sapiro. Online learning for matrix factorization and sparse coding. Journal of Machine Learning Research, 11:19–60, 2010.

Andreas Maurer and Massimiliano Pontil. K-dimensional coding schemes in Hilbert spaces. IEEE Transactions on Information Theory, 56(11):5839–5846, November 2010.

Shahar Mendelson. A few notes on statistical learning theory. In Advanced Lectures on Machine Learning, pages 1–40. Springer, 2003.

Bruno A. Olshausen and David J. Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision Research, 37:3311–3325, 1997.

Gabriel Peyré. Sparse modeling of textures. Journal of Mathematical Imaging and Vision, 34(1):17–31, 2009.

Matan Protter and Michael Elad. Sparse and redundant representations and motion-estimation-free algorithm for video denoising. In Wavelets XII, Proceedings of the SPIE, 6701:43, 2007.

John Shawe-Taylor and Nello Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, 2004.

John Shawe-Taylor, Christopher K. I. Williams, Nello Cristianini, and Jaz Kandola. On the eigenspectrum of the Gram matrix and the generalization error of kernel-PCA. IEEE Transactions on Information Theory, 51(7):2510–2522, 2005.

Joel A. Tropp. Greed is good: Algorithmic results for sparse approximation. IEEE Transactions on Information Theory, 50(10):2231–2242, 2004.

John Wright, Allen Y. Yang, Arvind Ganesh, S. Shankar Sastry, and Yi Ma. Robust face recognition via sparse representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):210–227, 2009.