
51 jmlr-2008-Learning Similarity with Operator-valued Large-margin Classifiers

Source: pdf

Author: Andreas Maurer

Abstract: A method is introduced to learn and represent similarity with linear operators in kernel induced Hilbert spaces. Transferring error bounds for vector valued large-margin classifiers to the setting of Hilbert-Schmidt operators leads to dimension free bounds on a risk functional for linear representations and motivates a regularized objective functional. Minimization of this objective is effected by a simple technique of stochastic gradient descent. The resulting representations are tested on transfer problems in image processing, involving plane and spatial geometric invariants, handwritten characters and face recognition.

Keywords: learning similarity, similarity, transfer learning

reference text

M. Anthony and P. Bartlett. Learning in Neural Networks: Theoretical Foundations. Cambridge University Press, 1999.
A. Argyriou, T. Evgeniou, and M. Pontil. Multi-task feature learning. In Advances in Neural Information Processing Systems, 2006.
P. L. Bartlett and S. Mendelson. Rademacher and Gaussian complexities: Risk bounds and structural results. Journal of Machine Learning Research, 2002.
P. Bartlett, O. Bousquet, and S. Mendelson. Local Rademacher complexities. Available online: http://www.stat.berkeley.edu/~bartlett/papers/bbm-lrc-02b.pdf.
A. Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall. Learning a Mahalanobis metric from equivalence constraints. Journal of Machine Learning Research, 6:937–965, 2005.
J. Baxter. Theoretical models of learning to learn. In Learning to Learn, S. Thrun and L. Pratt, Eds., Springer, 1998.
J. Baxter. A model of inductive bias learning. Journal of Artificial Intelligence Research, 12:149–198, 2000.
R. Caruana. Multitask learning. In Learning to Learn, S. Thrun and L. Pratt, Eds., Springer, 1998.
S. Chopra, R. Hadsell, and Y. LeCun. Learning a similarity metric discriminatively, with application to face verification. In CVPR, 2005.
N. Cristianini and J. Shawe-Taylor. Support Vector Machines. Cambridge University Press, 2000.
T. Evgeniou and M. Pontil. Regularized multi-task learning. In Proc. Conference on Knowledge Discovery and Data Mining, 2004.
F. Fleuret and G. Blanchard. Pattern recognition from one example by chopping. In Advances in Neural Information Processing Systems, 2005.
K. Fukunaga. Introduction to Statistical Pattern Classification. Academic Press, 1990.
J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood component analysis. In Advances in Neural Information Processing Systems, 2004.
V. Koltchinskii and D. Panchenko. Empirical margin distributions and bounding the generalization error of combined classifiers. The Annals of Statistics, 30(1):1–50, 2002.
A. Maurer. Generalization bounds for subspace selection and hyperbolic PCA. In Subspace, Latent Structure and Feature Selection. LNCS 3940:185–197, Springer, 2006a.
A. Maurer. Bounds for linear multi-task learning. Journal of Machine Learning Research, 7:117–139, 2006b.
A. Maurer. Learning to compare using operator-valued large-margin classifiers. In Advances in Neural Information Processing Systems, 2006c.
C. McDiarmid. Concentration. In Probabilistic Methods of Algorithmic Discrete Mathematics, pages 195–248. Springer, Berlin, 1998.
M. Reed and B. Simon. Functional analysis. In Methods of Mathematical Physics, Academic Press, 1980.
A. Robins. Transfer in cognition. In Learning to Learn, S. Thrun and L. Pratt, Eds., Springer, 1998.
J. Shawe-Taylor and N. Cristianini. Estimating the moments of a random vector. In Proceedings of GRETSI 2003 Conference I, pages 47–52, 2003.
B. Simon. Trace Ideals and Their Applications. Cambridge University Press, London, 1979.
S. Thrun and T. M. Mitchell. Learning one more thing. In Proceedings of IJCAI, 1995.
S. Thrun. Lifelong learning algorithms. In Learning to Learn, S. Thrun and L. Pratt, Eds., Springer, 1998.
E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell. Distance metric learning, with application to clustering with side information. In S. Becker, S. Thrun, and K. Obermayer, Eds., Advances in Neural Information Processing Systems 14, MIT Press, Cambridge, MA, 2002.
J. Ye and T. Xiong. Computational and theoretical analysis of null-space and orthogonal linear discriminant analysis. Journal of Machine Learning Research, 7:1183–1204, 2006.