
NIPS 2012, Paper 188: Learning from Distributions via Support Measure Machines


Source: pdf

Author: Krikamol Muandet, Kenji Fukumizu, Francesco Dinuzzo, Bernhard Schölkopf

Abstract: This paper presents a kernel-based discriminative learning framework on probability measures. Rather than relying on large collections of vectorial training examples, our framework learns using a collection of probability distributions that have been constructed to meaningfully represent training data. By representing these probability distributions as mean embeddings in the reproducing kernel Hilbert space (RKHS), we are able to apply many standard kernel-based learning techniques in a straightforward fashion. To accomplish this, we construct a generalization of the support vector machine (SVM) called a support measure machine (SMM). Our analysis of SMMs provides several insights into their relationship to traditional SVMs. Based on these insights, we propose a flexible SVM (FlexSVM) that places a different kernel function on each training example. Experimental results on both synthetic and real-world data demonstrate the effectiveness of our proposed framework.
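
The construction sketched in the abstract can be made concrete. Below is a minimal illustration (not the authors' code) of the idea: each training example is a bag of samples from a distribution P, represented by its empirical kernel mean embedding, and the linear kernel between two embeddings reduces to the expected kernel K(P, Q) = E_{x~P} E_{z~Q} k(x, z), estimated by averaging pairwise kernel values between bags. The Gaussian kernel, bag sizes, bandwidth, and the use of scikit-learn's precomputed-kernel SVC are illustrative assumptions, not details taken from the paper.

import numpy as np
from sklearn.svm import SVC

def expected_gaussian_kernel(X, Z, gamma=1.0):
    # Empirical estimate of E_{x~P} E_{z~Q} k(x, z) for sample bags X ~ P, Z ~ Q,
    # with a Gaussian kernel k(x, z) = exp(-gamma * ||x - z||^2).
    sq = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    return np.exp(-gamma * sq).mean()

def gram_matrix(bags_a, bags_b, gamma=1.0):
    # Kernel matrix between two lists of sample bags (one bag per distribution).
    return np.array([[expected_gaussian_kernel(A, B, gamma) for B in bags_b]
                     for A in bags_a])

rng = np.random.default_rng(0)
# Each training example is a bag of 20 points drawn from a class-dependent Gaussian.
bags = [rng.normal(loc=y, scale=0.5, size=(20, 2)) for y in (0, 1) for _ in range(30)]
labels = np.array([y for y in (0, 1) for _ in range(30)])

K = gram_matrix(bags, bags)
clf = SVC(kernel="precomputed").fit(K, labels)  # SVM on mean embeddings: a linear-kernel SMM
print("training accuracy:", clf.score(K, labels))

An SVM trained with this expected kernel is the simplest instance of an SMM (a linear kernel on the mean embeddings); richer variants apply a further nonlinear kernel on top of the embeddings.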


reference text

[1] Y. H. Yang and T. Speed. Design issues for cDNA microarray experiments. Nat. Rev. Genet., 3(8):579–588, 2002.

[2] T. Jebara, R. Kondor, and A. Howard. Probability product kernels. Journal of Machine Learning Research, 5:819–844, 2004.

[3] A. Bhattacharyya. On a measure of divergence between two statistical populations defined by their probability distributions. Bull. Calcutta Math. Soc., 35:99–109, 1943.

[4] P. J. Moreno, P. P. Ho, and N. Vasconcelos. A Kullback-Leibler divergence based kernel for SVM classification in multimedia applications. In Advances in Neural Information Processing Systems 16. MIT Press, 2004.

[5] M. Hein and O. Bousquet. Hilbertian metrics and positive definite kernels on probability measures. In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics, pages 136–143, 2005.

[6] M. Cuturi, K. Fukumizu, and J.-P. Vert. Semigroup kernels on measures. Journal of Machine Learning Research, 6:1169–1198, 2005.

[7] A. F. T. Martins, N. A. Smith, E. P. Xing, P. M. Q. Aguiar, and M. A. T. Figueiredo. Nonextensive information theoretic kernels on measures. Journal of Machine Learning Research, 10:935–975, 2009.

[8] A. Berlinet and C. Thomas-Agnan. Reproducing Kernel Hilbert Spaces in Probability and Statistics. Kluwer Academic Publishers, 2004.

[9] A. Smola, A. Gretton, L. Song, and B. Schölkopf. A Hilbert space embedding for distributions. In Proceedings of the 18th International Conference on Algorithmic Learning Theory, pages 13–31. Springer-Verlag, 2007.

[10] B. K. Sriperumbudur, A. Gretton, K. Fukumizu, B. Schölkopf, and G. R. G. Lanckriet. Hilbert space embeddings and metrics on probability measures. Journal of Machine Learning Research, 11:1517–1561, 2010.

[11] B. Schölkopf, R. Herbrich, and A. J. Smola. A generalized representer theorem. In COLT '01/EuroCOLT '01, pages 416–426. Springer-Verlag, 2001.

[12] F. Dinuzzo and B. Schölkopf. The representer theorem for Hilbert spaces: a necessary and sufficient condition. In Advances in Neural Information Processing Systems 25, pages 189–196, 2012.

[13] I. Steinwart. On the influence of the kernel on the consistency of support vector machines. Journal of Machine Learning Research, 2:67–93, 2001.

[14] A. Christmann and I. Steinwart. Universal kernels on non-standard input spaces. In Advances in Neural Information Processing Systems 23, pages 406–414, 2010.

[15] N. A. Mehta and A. G. Gray. Generative and latent mean map kernels. CoRR, abs/1005.0188, 2010.

[16] G. Blanchard, G. Lee, and C. Scott. Generalizing from several related classification tasks to a new unlabeled sample. In Advances in Neural Information Processing Systems 24, pages 2178–2186, 2011.

[17] P. K. Shivaswamy, C. Bhattacharyya, and A. J. Smola. Second order cone programming approaches for handling missing and uncertain data. Journal of Machine Learning Research, 7:1283–1314, 2006.

[18] H. S. Anderson and M. R. Gupta. Expected kernel for missing features in support vector machines. In IEEE Statistical Signal Processing Workshop, pages 285–288, 2011.

[19] L. Fei-Fei and P. Perona. A Bayesian hierarchical model for learning natural scene categories. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 524–531, 2005.

[20] D. G. Lowe. Object recognition from local scale-invariant features. In Proceedings of the International Conference on Computer Vision, pages 1150–1157, 1999.

[21] A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In Proceedings of the International Conference on Computer Vision, pages 606–613, 2009.