
Maximum Margin Algorithms with Boolean Kernels (JMLR 2005)


Source: pdf

Authors: Roni Khardon, Rocco A. Servedio

Abstract: Recent work has introduced Boolean kernels with which one can learn linear threshold functions over a feature space containing all conjunctions of length up to k (for any $1 \le k \le n$) over the original n Boolean features in the input space. This motivates the question of whether maximum margin algorithms such as Support Vector Machines can learn Disjunctive Normal Form expressions in the Probably Approximately Correct (PAC) learning model by using this kernel. We study this question, as well as a variant in which structural risk minimization (SRM) is performed with the class hierarchy taken over the length of conjunctions. We show that maximum margin algorithms using the Boolean kernels do not PAC learn t(n)-term DNF for any $t(n) = \omega(1)$, even when used with such an SRM scheme. We also consider PAC learning under the uniform distribution and show that if the kernel uses conjunctions of length $\tilde{\omega}(\sqrt{n})$, then the maximum margin hypothesis will fail on the uniform distribution as well. Our results concretely illustrate that margin-based algorithms may overfit when learning simple target functions with natural kernels.

Keywords: computational learning theory, kernel methods, PAC learning, Boolean functions
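The feature space described in the abstract has $\sum_{l=0}^{k} \binom{n}{l} 2^l$ coordinates, so it is only practical through a kernel. A minimal sketch, assuming the standard construction from the Boolean-kernel literature (each variable appears at most once per conjunction, negated or unnegated, and the empty conjunction is included as a constant feature): a conjunction is satisfied by both x and y only if every variable it mentions takes the same value in x and y, so the kernel depends only on $\mathrm{same}(x, y)$, the number of positions where x and y agree, giving $K_k(x, y) = \sum_{l=0}^{k} \binom{\mathrm{same}(x,y)}{l}$. The function names below are hypothetical, and the brute-force enumeration exists only to check the closed form on small n; this is not code from the paper.

```python
from itertools import combinations, product
from math import comb


def same(x, y):
    """Number of positions where the Boolean vectors x and y agree."""
    return sum(1 for a, b in zip(x, y) if a == b)


def conj_kernel(x, y, k):
    """Implicit inner product over all conjunctions of at most k literals.

    Assumes the empty conjunction (l = 0) is a constant feature; runs in O(n + k).
    """
    s = same(x, y)
    return sum(comb(s, l) for l in range(k + 1))  # comb(s, l) == 0 when l > s


def explicit_features(x, k):
    """Brute-force feature map: one 0/1 coordinate per conjunction of <= k literals."""
    n = len(x)
    feats = []
    for l in range(k + 1):
        for variables in combinations(range(n), l):
            # sign 1 means literal x_i, sign 0 means literal (not x_i)
            for signs in product((1, 0), repeat=l):
                satisfied = all(x[i] == s for i, s in zip(variables, signs))
                feats.append(int(satisfied))
    return feats


if __name__ == "__main__":
    # Check the closed form against the explicit feature space for n = 4.
    n = 4
    for k in range(1, n + 1):
        for x in product((0, 1), repeat=n):
            for y in product((0, 1), repeat=n):
                fx, fy = explicit_features(x, k), explicit_features(y, k)
                assert conj_kernel(x, y, k) == sum(a * b for a, b in zip(fx, fy))
    print("closed form matches explicit features")
```

With k = n the sum collapses to $2^{\mathrm{same}(x,y)}$, which is why the kernel stays cheap even though the explicit space grows exponentially in n.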


reference text

S. Ben-David, N. Eiron, and H.-U. Simon. Limitations of learning via embeddings in Euclidean half spaces. Journal of Machine Learning Research, 3:441–461, 2002.
A. Blum. Separating distribution-free and mistake-bound learning models over the Boolean domain. SIAM Journal on Computing, 23(5):990–1000, 1994.
A. Blum, M. Furst, J. Jackson, M. Kearns, Y. Mansour, and S. Rudich. Weakly learning DNF and characterizing statistical query learning using Fourier analysis. In Proceedings of the Twenty-Sixth Annual Symposium on Theory of Computing, pages 253–262, 1994.
A. Blum and S. Rudich. Fast learning of k-term DNF formulas with queries. Journal of Computer and System Sciences, 51(3):367–373, 1995.
A. Blumer, A. Ehrenfeucht, D. Haussler, and M. Warmuth. Occam's razor. Information Processing Letters, 24:377–380, 1987.
B. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144–152, 1992.
N. Bshouty. A subexponential exact learning algorithm for DNF using equivalence queries. Information Processing Letters, 59:37–39, 1996.
N. Bshouty and C. Tamon. On the Fourier spectrum of monotone functions. Journal of the ACM, 43(4):747–770, 1996.
J. Forster, N. Schmitt, H.-U. Simon, and T. Suttorp. Estimating the optimal margins of embeddings in Euclidean half spaces. Machine Learning, 51(3):263–281, 2003.
T. Friess, N. Cristianini, and C. Campbell. The kernel adatron algorithm: a fast and simple learning procedure for support vector machine. In Proceedings of the 15th International Conference on Machine Learning, pages 188–196, 1998.
C. Gentile. A new approximate maximal margin classification algorithm. Journal of Machine Learning Research, 2:213–242, 2001.
T. Hancock and Y. Mansour. Learning monotone k-µ DNF formulas on product distributions. In Proceedings of the Fourth Annual Conference on Computational Learning Theory, pages 179–193, 1991.
J. Jackson. An efficient membership-query algorithm for learning DNF with respect to the uniform distribution. Journal of Computer and System Sciences, 55:414–440, 1997.
M. Kearns and U. Vazirani. An introduction to computational learning theory. MIT Press, Cambridge, MA, 1994.
R. Khardon. On using the Fourier transform to learn disjoint DNF. Information Processing Letters, 49:219–222, 1994.
R. Khardon, D. Roth, and R. Servedio. Efficiency versus convergence of Boolean kernels for online learning algorithms. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002. MIT Press.
A. Klivans and R. Servedio. Learning DNF in time $2^{\tilde{O}(n^{1/3})}$. In Proceedings of the Thirty-Third Annual Symposium on Theory of Computing, pages 258–265, 2001.
A. Kowalczyk, A. J. Smola, and R. C. Williamson. Kernel machines and Boolean functions. In T. G. Dietterich, S. Becker, and Z. Ghahramani, editors, Advances in Neural Information Processing Systems 14, Cambridge, MA, 2002. MIT Press.
L. Kucera, A. Marchetti-Spaccamela, and M. Protassi. On learning monotone DNF formulae under uniform distributions. Information and Computation, 110:84–95, 1994.
E. Kushilevitz and Y. Mansour. Learning decision trees using the Fourier spectrum. SIAM J. on Computing, 22(6):1331–1348, 1993.
E. Kushilevitz and D. Roth. On learning visual concepts and DNF formulae. In Proceedings of the Sixth Annual Conference on Computational Learning Theory, pages 317–326, 1993.
N. Linial, Y. Mansour, and N. Nisan. Constant depth circuits, Fourier transform and learnability. Journal of the ACM, 40(3):607–620, 1993.
Y. Mansour. An $O(n^{\log \log n})$ learning algorithm for DNF under the uniform distribution. Journal of Computer and System Sciences, 50:543–550, 1995.
M. Minsky and S. Papert. Perceptrons: an introduction to computational geometry. MIT Press, Cambridge, MA, 1968.
K. Sadohara. Learning of Boolean functions using support vector machines. In Proc. of the 12th International Conference on Algorithmic Learning Theory, pages 106–118. Springer, 2001. LNAI 2225.
Y. Sakai and A. Maruoka. Learning monotone log-term DNF formulas under the uniform distribution. Theory of Computing Systems, 33:17–33, 2000.
R. Servedio. On PAC learning using Winnow, Perceptron, and a Perceptron-like algorithm. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, pages 296–307, 1999.
R. Servedio. On learning monotone DNF under product distributions. In Proceedings of the Fourteenth Annual Conference on Computational Learning Theory, pages 473–489, 2001.
J. Shawe-Taylor, P. Bartlett, R. Williamson, and M. Anthony. Structural risk minimization over data-dependent hierarchies. IEEE Transactions on Information Theory, 44(5):1926–1940, 1998.
J. Shawe-Taylor and N. Cristianini. An introduction to support vector machines. Cambridge University Press, 2000.
J. Tarui and T. Tsukiji. Learning DNF by approximating inclusion-exclusion formulae. In Proceedings of the Fourteenth Conference on Computational Complexity, pages 215–220, 1999.
L. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.
K. Verbeurgt. Learning DNF under the uniform distribution in quasi-polynomial time. In Proceedings of the Third Annual Workshop on Computational Learning Theory, pages 314–326, 1990.
K. Verbeurgt. Learning sub-classes of monotone DNF on the uniform distribution. In Proceedings of the Ninth Conference on Algorithmic Learning Theory, pages 385–399, 1998.
C. Watkins. Kernels from matching operations. Technical Report CSD-TR-98-07, Computer Science Department, Royal Holloway, University of London, 1999.