
Convex Learning with Invariances (NIPS 2007, paper 62)


Author: Choon H. Teo, Amir Globerson, Sam T. Roweis, Alex J. Smola

Abstract: Incorporating invariances into a learning algorithm is a common problem in machine learning. We provide a convex formulation which can deal with arbitrary loss functions and arbitrary invariances. In addition, it is a drop-in replacement for most kernel optimization algorithms, including solvers of the SVMStruct family. The advantage of our setting is that it relies on column generation instead of modifying the underlying optimization problem directly.
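
To make the column-generation idea in the abstract concrete, here is a minimal sketch in Python. It is not the authors' implementation; the function names, the choice of hinge loss, and the stopping rule are all assumptions. For each training point it searches the point's invariance set for the transformation that currently incurs the largest loss, adds that worst-case variant to a working set, and re-solves the convex problem on the enlarged set, leaving the base solver itself untouched.

    import numpy as np

    def hinge_loss(w, x, y):
        # Hinge loss of linear model w on one (x, y) pair, with y in {-1, +1}.
        return max(0.0, 1.0 - y * float(np.dot(w, x)))

    def worst_variant(w, x, y, transforms):
        # Loss-maximizing invariant variant of x under the given transforms.
        return max((t(x) for t in transforms),
                   key=lambda v: hinge_loss(w, v, y))

    def learn_with_invariances(X, y, transforms, solve, rounds=20, tol=1e-6):
        # Column generation: the convex solver `solve` stays untouched; we
        # only grow its training set with worst-case transformed examples.
        working = list(zip(X, y))   # start from the original examples
        w = solve(working)
        for _ in range(rounds):
            added = False
            for x_i, y_i in zip(X, y):
                v = worst_variant(w, x_i, y_i, transforms)
                # Generate a new "column" only if it is strictly worse than
                # the example we already enforce.
                if hinge_loss(w, v, y_i) > hinge_loss(w, x_i, y_i) + tol:
                    working.append((v, y_i))
                    added = True
            if not added:
                break               # no invariance is violated any more
            w = solve(working)      # re-solve on the enlarged working set
        return w

Here solve stands in for any off-the-shelf convex trainer (for instance a linear SVM solver in the spirit of [10, 12, 15]) and transforms for the invariance set, e.g. small image shifts; both are placeholders supplied by the application.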


References

[1] Y. Abu-Mostafa. A method for learning from hints. In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, NIPS 5, 1992.

[2] C. Bhattacharyya, K. S. Pannagadatta, and A. J. Smola. A second order cone programming formulation for classifying missing data. In L. K. Saul, Y. Weiss, and L. Bottou, editors, NIPS 17, 2005.

[3] C. J. C. Burges. Geometry and invariance in kernel based methods. In B. Schölkopf, C. J. C. Burges, and A. J. Smola, editors, Advances in Kernel Methods — Support Vector Learning, pages 89–116, Cambridge, MA, 1999. MIT Press.

[4] N. Dalvi, P. Domingos, Mausam, S. Sanghai, and D. Verma. Adversarial classification. In KDD, 2004.

[5] D. DeCoste and B. Schölkopf. Training invariant support vector machines. Machine Learning, 46:161–190, 2002.

[6] M. Ferraro and T. M. Caelli. Lie transformation groups, integral transforms, and invariant pattern recognition. Spatial Vision, 8:33–44, 1994.

[7] A. Globerson and S. Roweis. Nightmare at test time: Robust learning by feature deletion. In ICML, 2006.

[8] T. Graepel and R. Herbrich. Invariant pattern recognition by semidefinite programming machines. In S. Thrun, L. Saul, and B. Schölkopf, editors, NIPS 16, 2004.

[9] G. E. Hinton. Learning translation invariant recognition in massively parallel networks. In Proceedings of the Conference on Parallel Architectures and Languages Europe, pages 1–13. Springer, 1987.

[10] T. Joachims. Training linear SVMs in linear time. In KDD, 2006.

[11] Y. LeCun, L. D. Jackel, L. Bottou, A. Brunot, C. Cortes, J. S. Denker, H. Drucker, I. Guyon, U. A. Müller, E. Säckinger, P. Simard, and V. Vapnik. Comparison of learning algorithms for handwritten digit recognition. In F. Fogelman-Soulié and P. Gallinari, editors, ICANN, 1995.

[12] S. Shalev-Shwartz, Y. Singer, and N. Srebro. Pegasos: Primal estimated sub-gradient solver for SVM. In ICML, 2007.

[13] P. Simard, Y. LeCun, and J. Denker. Efficient pattern recognition using a new transformation distance. In S. J. Hanson, J. D. Cowan, and C. L. Giles, editors, NIPS 5, 1993.

[14] B. Taskar, C. Guestrin, and D. Koller. Max-margin Markov networks. In S. Thrun, L. Saul, and B. Schölkopf, editors, NIPS 16, 2004.

[15] C.H. Teo, Q. Le, A.J. Smola, and S.V.N. Vishwanathan. A scalable modular convex solver for regularized risk minimization. In KDD, 2007.

[16] I. Tsochantaridis, T. Joachims, T. Hofmann, and Y. Altun. Large margin methods for structured and interdependent output variables. J. Mach. Learn. Res., 6:1453–1484, 2005.