jmlr jmlr2008 jmlr2008-51 jmlr2008-51-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andreas Maurer
Abstract: A method is introduced to learn and represent similarity with linear operators in kernel-induced Hilbert spaces. Transferring error bounds for vector-valued large-margin classifiers to the setting of Hilbert-Schmidt operators leads to dimension-free bounds on a risk functional for linear representations and motivates a regularized objective functional. Minimization of this objective is effected by a simple technique of stochastic gradient descent. The resulting representations are tested on transfer problems in image processing, involving plane and spatial geometric invariants, handwritten characters and face recognition.
Keywords: learning similarity, similarity, transfer learning
M. Anthony and P. Bartlett. Learning in Neural Networks: Theoretical Foundations. Cambridge University Press, 1999.
A. Argyriou, T. Evgeniou, and M. Pontil. Multi-task feature learning. In Advances in Neural Information Processing Systems, 2006.
P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 2002.
P. Bartlett, O. Bousquet and S. Mendelson. Local Rademacher complexities. Available online: http://www.stat.berkeley.edu/~bartlett/papers/bbm-lrc-02b.pdf.
A. Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall. Learning a Mahalanobis metric from equivalence constraints. Journal of Machine Learning Research, 6:937–965, 2005.
J. Baxter. Theoretical models of learning to learn. In Learning to Learn, S. Thrun and L. Pratt, Eds., Springer, 1998.
J. Baxter. A model of inductive bias learning. Journal of Artificial Intelligence Research, 12:149–198, 2000.
R. Caruana. Multitask learning. In Learning to Learn, S. Thrun and L. Pratt, Eds., Springer, 1998.
S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively, with application to face verification. In CVPR, 2005.
N. Cristianini and J. Shawe-Taylor. Support Vector Machines. Cambridge University Press, 2000.
T. Evgeniou and M. Pontil. Regularized multi-task learning. In Proc. Conference on Knowledge Discovery and Data Mining, 2004.
F. Fleuret and G. Blanchard. Pattern recognition from one example by chopping. In Advances in Neural Information Processing Systems, 2005.
K. Fukunaga. Introduction to Statistical Pattern Classification. Academic Press, 1990.
J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood component analysis. In Advances in Neural Information Processing Systems, 2004.
V. Koltchinskii and D. Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. The Annals of Statistics, 30(1):1–50, 2002.
A. Maurer. Generalization bounds for subspace selection and hyperbolic PCA. In Subspace, Latent Structure and Feature Selection. LNCS 3940:185–197, Springer, 2006a.
A. Maurer. Bounds for linear multi-task learning. Journal of Machine Learning Research, 7:117–139, 2006b.
A. Maurer. Learning to compare using operator-valued large-margin classifiers. In Advances in Neural Information Processing Systems, 2006c.
C. McDiarmid. Concentration. In Probabilistic Methods of Algorithmic Discrete Mathematics, pages 195–248. Springer, Berlin, 1998.
M. Reed and B. Simon. Functional analysis. In Methods of Mathematical Physics, Academic Press, 1980.
A. Robins. Transfer in cognition. In Learning to Learn, S. Thrun and L. Pratt, Eds., Springer, 1998.
J. Shawe-Taylor and N. Cristianini. Estimating the moments of a random vector. In Proceedings of GRETSI 2003 Conference I, pages 47–52, 2003.
B. Simon. Trace Ideals and Their Applications. Cambridge University Press, London, 1979.
S. Thrun and T. M. Mitchell. Learning one more thing. In Proceedings of IJCAI, 1995.
S. Thrun. Lifelong learning algorithms. In Learning to Learn, S. Thrun and L. Pratt, Eds., Springer, 1998.
E. P. Xing, A. Y. Ng, M. I. Jordan and S. Russell. Distance metric learning, with application to clustering with side information. In S. Becker, S. Thrun, and K. Obermayer, Eds., Advances in Neural Information Processing Systems 14, MIT Press, Cambridge, MA, 2002.
J. Ye and T. Xiong. Computational and theoretical analysis of null-space and orthogonal linear discriminant analysis. Journal of Machine Learning Research, 7:1183–1204, 2006.