Paper: jmlr2007-58 (reference knowledge graph)
Source: pdf
Authors: Roni Khardon, Gabriel Wachman
Abstract: A large number of variants of the Perceptron algorithm have been proposed and partially evaluated in recent work. One type of algorithm aims for noise tolerance by replacing the last hypothesis of the perceptron with another hypothesis or with a vote among hypotheses. Another type simply adds a margin term to the perceptron in order to increase robustness and accuracy, as done in support vector machines. A third type borrows further from support vector machines and constrains the update function of the perceptron in ways that mimic soft-margin techniques. The performance of these algorithms, and the potential for combining different techniques, has not been studied in depth. This paper provides such an experimental study and reveals some interesting facts about the algorithms. In particular, the perceptron with margin is an effective method for tolerating noise and stabilizing the algorithm. This is surprising since the margin in itself is not designed or used for noise tolerance, and there are no known guarantees for such performance. In most cases, similar performance is obtained by the voted perceptron, which has the advantage that it does not require parameter selection. Techniques using soft-margin ideas are run-time intensive and do not give additional performance benefits. The results also highlight the difficulty of automatic parameter selection, which is required with some of these variants.

Keywords: perceptron algorithm, on-line learning, noise tolerance, kernel methods
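To make the two variants highlighted in the abstract concrete, the following is a minimal NumPy sketch of the perceptron with margin and the voted perceptron. It is an illustration only, not the authors' implementation: the function names (train_margin_perceptron, train_voted_perceptron, predict_voted), the default values for margin and epochs, and the assumption that X is a dense array with labels y in {-1, +1} are all ours, not from the paper.

import numpy as np

def train_margin_perceptron(X, y, margin=0.1, epochs=10):
    # Perceptron with margin: update not only on mistakes but whenever
    # the functional margin y_i * <w, x_i> falls at or below a fixed
    # threshold. `margin` (and `epochs`) must be selected by the user,
    # which is the parameter-selection issue the abstract points out.
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(w, x_i) <= margin:  # margin violation, not just a mistake
                w = w + y_i * x_i
    return w

def train_voted_perceptron(X, y, epochs=10):
    # Voted perceptron: run the ordinary mistake-driven perceptron, but
    # store every intermediate hypothesis together with the number of
    # examples it survived. No margin parameter is needed.
    w = np.zeros(X.shape[1])
    count = 0
    hypotheses = []  # list of (weight vector, survival count) pairs
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(w, x_i) <= 0:  # ordinary perceptron mistake
                if count > 0:
                    hypotheses.append((w.copy(), count))
                w = w + y_i * x_i
                count = 1
            else:
                count += 1
    hypotheses.append((w.copy(), count))
    return hypotheses

def predict_voted(hypotheses, x):
    # Prediction is a weighted vote of all stored hypotheses.
    vote = sum(c * np.sign(np.dot(w, x)) for w, c in hypotheses)
    return 1.0 if vote >= 0 else -1.0

The contrast drawn in the abstract is visible directly in the sketch: the margin variant trades a parameter that must be tuned for a single final hypothesis, while the voted variant avoids any parameter at the cost of storing and evaluating many hypotheses at prediction time.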
References:

B. Boser, I. Guyon, and V. Vapnik. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pages 144–152, 1992.
N. Cesa-Bianchi, A. Conconi, and C. Gentile. On the generalization ability of on-line learning algorithms. IEEE Transactions on Information Theory, 50(9):2050–2057, 2004.
N. Cesa-Bianchi, A. Conconi, and C. Gentile. A second-order perceptron algorithm. SIAM Journal on Computing, 34(3):640–668, 2005.
W. Cohen. Fast effective rule induction. In Proceedings of the Twelfth International Conference on Machine Learning, pages 115–123, 1995.
M. Collins. Ranking algorithms for named entity extraction: Boosting and the voted perceptron. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, pages 489–496, 2002.
K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7:551–585, 2006.
N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines. Cambridge University Press, 2000.
O. Dekel and Y. Singer. Data-driven online to batch conversions. In Advances in Neural Information Processing Systems 18 (Proceedings of NIPS 2005), 2005.
T. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: Bagging, boosting, and randomization. Machine Learning, 40:139–158, 2000.
T. Dietterich, M. Kearns, and Y. Mansour. Applying the weak learning framework to understand and improve C4.5. In Proceedings of the International Conference on Machine Learning, pages 96–104, 1996.
Y. Freund and R. Schapire. Large margin classification using the perceptron algorithm. Machine Learning, 37:277–296, 1999.
T. Friess, N. Cristianini, and C. Campbell. The kernel adatron algorithm: A fast and simple learning procedure for support vector machines. In Proceedings of the International Conference on Machine Learning, pages 188–196, 1998.
S. Gallant. Perceptron-based learning algorithms. IEEE Transactions on Neural Networks, 1(2):179–191, 1990.
A. Garg and D. Roth. Margin distribution and learning algorithms. In Proceedings of the International Conference on Machine Learning, pages 210–217, 2003.
C. Gentile. A new approximate maximal margin classification algorithm. Journal of Machine Learning Research, 2:213–242, 2001.
T. Graepel, R. Herbrich, and R. Williamson. From margin to sparsity. In Advances in Neural Information Processing Systems 13 (Proceedings of NIPS 2000), pages 210–216, 2000.
T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning. MIT Press, 1999.
M. Kearns, M. Li, L. Pitt, and L. G. Valiant. Recent results on Boolean concept learning. In Proceedings of the Fourth International Workshop on Machine Learning, pages 337–352, 1987.
M. Kearns, R. Schapire, and L. Sellie. Toward efficient agnostic learning. Machine Learning, 17(2/3):115–141, 1994.
J. Kivinen, A. J. Smola, and R. C. Williamson. Online learning with kernels. IEEE Transactions on Signal Processing, 52(8):2165–2176, 2004.
A. Kowalczyk, A. Smola, and R. Williamson. Kernel machines and Boolean functions. In Advances in Neural Information Processing Systems 14 (Proceedings of NIPS 2001), pages 439–446, 2001.
W. Krauth and M. Mézard. Learning algorithms with optimal stability in neural networks. Journal of Physics A, 20(11):745–752, 1987.
Y. Li. Selective voting for perceptron-like online learning. In Proceedings of the International Conference on Machine Learning, pages 559–566, 2000.
Y. Li and P. Long. The relaxed online maximum margin algorithm. Machine Learning, 46:361–387, 2002.
Y. Li, H. Zaragoza, R. Herbrich, J. Shawe-Taylor, and J. Kandola. The perceptron algorithm with uneven margins. In Proceedings of the International Conference on Machine Learning, pages 379–386, 2002.
N. Littlestone. From on-line to batch learning. In Proceedings of the Conference on Computational Learning Theory, pages 269–284, 1989.
T. M. Mitchell. Machine Learning. McGraw-Hill, 1997.
D. J. Newman, S. Hettich, C. L. Blake, and C. J. Merz. UCI repository of machine learning databases, 1998. URL http://www.ics.uci.edu/~mlearn/MLRepository.html.
A. B. Novikoff. On convergence proofs on perceptrons. Symposium on the Mathematical Theory of Automata, 12:615–622, 1962.
F. Rosenblatt. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65:386–407, 1958.
R. A. Servedio. On PAC learning using Winnow, Perceptron, and a Perceptron-like algorithm. In Proceedings of the Twelfth Annual Conference on Computational Learning Theory, pages 296–307, 1999.
S. Shalev-Shwartz and Y. Singer. A new perspective on an old perceptron algorithm. In Proceedings of the Sixteenth Annual Conference on Computational Learning Theory, pages 264–278, 2005.
P. Tsampouka and J. Shawe-Taylor. Analysis of generic perceptron-like large margin classifiers. In Proceedings of the European Conference on Machine Learning, pages 750–758, 2005.
L. G. Valiant. A theory of the learnable. Communications of the ACM, 27(11):1134–1142, 1984.