
44 jmlr-2006-Large Scale Transductive SVMs


Source: pdf

Author: Ronan Collobert, Fabian Sinz, Jason Weston, Léon Bottou

Abstract: We show how the concave-convex procedure can be applied to transductive SVMs, which traditionally require solving a combinatorial search problem. This provides for the first time a highly scalable algorithm in the nonlinear case. Detailed experiments verify the utility of our approach. Software is available at http://www.kyb.tuebingen.mpg.de/bs/people/fabee/transduction.html. Keywords: transduction, transductive SVMs, semi-supervised learning, CCCP
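For context, the concave-convex procedure (CCCP) named in the abstract and keywords is the general optimization scheme of Yuille and Rangarajan (2002), listed in the reference text below: the non-convex objective is split into a convex part plus a concave part, and at each iteration the concave part is replaced by its tangent upper bound, so that only a convex problem remains to be solved. A minimal sketch of the generic CCCP update (not the paper's specific transductive SVM decomposition) is:

  J(\theta) = J_{\mathrm{vex}}(\theta) + J_{\mathrm{cav}}(\theta),
  \qquad
  \theta^{t+1} = \arg\min_{\theta} \; J_{\mathrm{vex}}(\theta) + \nabla J_{\mathrm{cav}}(\theta^{t}) \cdot \theta .

Because J_cav is concave, its linearization at \theta^t is an upper bound on J_cav that is tight at \theta^t, so each convex subproblem can only decrease (or leave unchanged) the original objective. The paper applies this idea to the non-convex transductive SVM loss in order to avoid the combinatorial search over labels of the unlabeled points.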


reference text

M. Belkin and P. Niyogi. Using manifold structure for partially labelled classification. In Advances in Neural Information Processing Systems. MIT Press, 2002.
K. Bennett and A. Demiriz. Semi-supervised support vector machines. In M. S. Kearns, S. A. Solla, and D. A. Cohn, editors, Advances in Neural Information Processing Systems 12, pages 368–374. MIT Press, Cambridge, MA, 1998.
T. De Bie and N. Cristianini. Convex methods for transduction. In S. Thrun, L. Saul, and B. Schölkopf, editors, Advances in Neural Information Processing Systems 16. MIT Press, Cambridge, MA, 2004.
A. Bordes, S. Ertekin, J. Weston, and L. Bottou. Fast kernel classifiers with online and active learning. Journal of Machine Learning Research, 6, May 2005. http://jmlr.csail.mit.edu/papers/v6/bordes05a.html.
B. E. Boser, I. M. Guyon, and V. N. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pages 144–152, Pittsburgh, PA, 1992. ACM Press.
L. Bottou, C. Cortes, and V. Vapnik. On the effective VC dimension. Technical Report bottoueffvc.ps.Z, Neuroprose (ftp://archive.cse.ohio-state.edu/pub/neuroprose), 1994.
O. Chapelle and A. Zien. Semi-supervised classification by low density separation. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, 2005.
O. Chapelle, J. Weston, and B. Schölkopf. Cluster kernels for semi-supervised learning. Neural Information Processing Systems 15, 2002.
R. Collobert, F. Sinz, J. Weston, and L. Bottou. Trading convexity for scalability. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning, pages 201–208, New York, NY, USA, 2006. ACM Press.
P. Derbeko, R. El-Yaniv, and R. Meir. Explicit learning curves for transduction and application to clustering and compression algorithms. Journal of Artificial Intelligence Research, 22:117–142, 2004.
G. Fung and O. Mangasarian. Semi-supervised support vector machines for unlabeled data classification. In Optimisation Methods and Software, pages 1–14. Kluwer Academic Publishers, Boston, 2001.
T. Graepel, R. Herbrich, and K. Obermayer. Bayesian transduction. In Advances in Neural Information Processing Systems 12, NIPS, pages 456–462, 2000.
T. Joachims. Making large-scale support vector machine learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods. The MIT Press, 1999a.
T. Joachims. Transductive inference for text classification using support vector machines. In International Conference on Machine Learning, ICML, 1999b.
S. Keerthi and D. DeCoste. A modified finite Newton method for fast solution of large scale linear SVMs. Journal of Machine Learning Research, 6:341–361, 2005.
S. S. Keerthi and C.-J. Lin. Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Computation, 15:1667–1689, 2003.
N. Krause and Y. Singer. Leveraging the margin more carefully. In International Conference on Machine Learning, ICML, 2004.
N. D. Lawrence and M. I. Jordan. Semi-supervised learning via Gaussian processes. In Advances in Neural Information Processing Systems, NIPS. MIT Press, 2005.
H. A. Le Thi. Analyse numérique des algorithmes de l'optimisation D.C. Approches locales et globale. Codes et simulations numériques en grande dimension. Applications. PhD thesis, INSA, Rouen, 1994.
Y. LeCun, L. Bottou, G. B. Orr, and K.-R. Müller. Efficient backprop. In G. B. Orr and K.-R. Müller, editors, Neural Networks: Tricks of the Trade, pages 9–50. Springer, 1998.
D. D. Lewis, Y. Yang, T. Rose, and F. Li. RCV1: A new benchmark collection for text categorization research. Journal of Machine Learning Research, 5:361–397, 2004. URL http://www.jmlr.org/papers/volume5/lewis04a/lewis04a.pdf.
L. Mason, P. L. Bartlett, and J. Baxter. Improved generalization through explicit optimization of margins. Machine Learning, 38(3):243–255, 2000.
J. C. Platt. Fast training of support vector machines using sequential minimal optimization. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods. The MIT Press, 1999.
S. A. Nene, S. K. Nayar, and H. Murase. Columbia Object Image Library (COIL-20). Technical Report CUCS-005-96, Columbia University, USA, February 1996.
B. Schölkopf, A. Smola, and K.-R. Müller. Kernel principal component analysis. In Proceedings of ICANN'97, Springer Lecture Notes in Computer Science, page 583, 1997.
X. Shen, G. C. Tseng, X. Zhang, and W. H. Wong. On ψ-learning. Journal of the American Statistical Association, 98(463):724–734, 2003.
V. Sindhwani, P. Niyogi, and M. Belkin. Beyond the point cloud: from transductive to semi-supervised learning. In International Conference on Machine Learning, ICML, 2005.
A. J. Smola, S. V. N. Vishwanathan, and T. Hofmann. Kernel methods for missing variables. In Proceedings of the Tenth International Workshop on Artificial Intelligence and Statistics, 2005.
M. Szummer and T. Jaakkola. Partially labeled classification with Markov random walks. In Advances in Neural Information Processing Systems 14, 2001.
V. Vapnik. The Nature of Statistical Learning Theory. Springer, second edition, 1995.
V. N. Vapnik. Estimation of Dependences Based on Empirical Data. Springer Series in Statistics. Springer Verlag, 1982.
J. Weston, C. Leslie, D. Zhou, A. Elisseeff, and W. S. Noble. Cluster kernels for semi-supervised protein classification. Advances in Neural Information Processing Systems 17, 2003.
L. Xu, J. Neufeld, B. Larson, and D. Schuurmans. Maximum margin clustering. In L. K. Saul, Y. Weiss, and L. Bottou, editors, Advances in Neural Information Processing Systems 17, pages 1537–1544. MIT Press, Cambridge, MA, 2005.
A. L. Yuille and A. Rangarajan. The concave-convex procedure (CCCP). In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002. MIT Press.
D. Zhou, O. Bousquet, T. N. Lal, J. Weston, and B. Schölkopf. Learning with local and global consistency. In Advances in Neural Information Processing Systems, NIPS, 2004.
X. Zhu, Z. Ghahramani, and J. Lafferty. Semi-supervised learning using Gaussian fields and harmonic functions. In The Twentieth International Conference on Machine Learning, pages 912–919, 2003.