
111 nips-2000-Regularized Winnow Methods


Source: pdf

Author: Tong Zhang

Abstract: In theory, the Winnow multiplicative update has certain advantages over the Perceptron additive update when there are many irrelevant attributes. Recently, much effort has been devoted to enhancing the Perceptron algorithm with regularization, leading to a class of linear classification methods called support vector machines. Similarly, the regularization idea can be applied to the Winnow algorithm, yielding methods we call regularized Winnows. We show that the resulting methods relate to the basic Winnow in much the same way that a support vector machine relates to the Perceptron. We investigate algorithmic issues and learning properties of the derived methods. Experimental results are also provided to illustrate the different methods.
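
To make the abstract's contrast concrete, here is a minimal Python sketch of the two mistake-driven updates it compares: the Perceptron's additive rule and a normalized, exponentiated-gradient form of Winnow. This is an illustration under stated assumptions, not the paper's regularized formulation; the labels y in {-1, +1}, the learning rate eta, and the simplex normalization are illustrative choices.

import numpy as np

def perceptron_update(w, x, y, eta=1.0):
    # Additive update: on a mistake, shift the weights by eta * y * x.
    if y * np.dot(w, x) <= 0:
        w = w + eta * y * x
    return w

def winnow_update(w, x, y, eta=1.0):
    # Multiplicative update: on a mistake, scale each weight by
    # exp(eta * y * x_i), then renormalize onto the probability simplex.
    if y * np.dot(w, x) <= 0:
        w = w * np.exp(eta * y * x)
        w = w / w.sum()
    return w

# Typical initializations: w = np.zeros(d) for the Perceptron and
# w = np.ones(d) / d for normalized Winnow. When many attributes are
# irrelevant, the multiplicative update concentrates weight on the
# relevant coordinates faster, which is the advantage the abstract cites.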


reference text

[1] J.K. Anlauf and M. Biehl. The AdaTron: an adaptive perceptron algorithm. Europhys. Lett., 10(7):687-692, 1989.

[2] C. Cortes and V.N. Vapnik. Support-vector networks. Machine Learning, 20:273-297, 1995.

[3] I. Dagan, Y. Karov, and D. Roth. Mistake-driven learning in text categorization. In Proceedings of the Second Conference on Empirical Methods in NLP, 1997.

[4] A. Grove and D. Roth. Linear concepts and hidden variables. Machine Learning, 2000. To appear; early version appeared in NIPS-10.

[5] A.J. Grove, N. Littlestone, and D. Schuurmans. General convergence results for linear discriminant updates. In Proc. 10th Annu. Conf. on Comput. Learning Theory, pages 171-183, 1997.

[6] T. Jaakkola, M. Meila, and T. Jebara. Maximum entropy discrimination. In S.A. Solla, T.K. Leen, and K.-R. Müller, editors, Advances in Neural Information Processing Systems 12, pages 470-476. MIT Press, 2000.

[7] T.S. Jaakkola, M. Diekhans, and D. Haussler. A discriminative framework for detecting remote protein homologies. Journal of Computational Biology, to appear.

[8] W. Kinzel. Statistical mechanics of the perceptron with maximal stability. In Lecture Notes in Physics, volume 368, pages 175-188. Springer-Verlag, 1990.

[9] J. Kivinen and M.K. Warmuth. Additive versus exponentiated gradient updates for linear prediction. Information and Computation, 132:1-64, 1997.

[10] N. Littlestone. Learning quickly when irrelevant attributes abound: a new linear-threshold algorithm. Machine Learning, 2:285-318, 1988.

[11] M. Opper. Learning times of neural networks: Exact solution for a perceptron algorithm. Phys. Rev. A, 38(7):3824-3826, 1988.

[12] F. Rosenblatt. Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms. Spartan, New York, 1962.

[13] B. Schölkopf, C.J.C. Burges, and A.J. Smola, editors. Advances in Kernel Methods: Support Vector Learning. The MIT Press, 1999.

[14] J. Shawe-Taylor, P.L. Bartlett, R.C. Williamson, and M. Anthony. Structural risk minimization over data-dependent hierarchies. IEEE Trans. Inf. Theory, 44(5):1926-1940, 1998.

[15] V.N. Vapnik. Statistical Learning Theory. John Wiley & Sons, New York, 1998.

[16] T. Zhang. Analysis of regularized linear functions for classification problems. Technical Report RC-21572, IBM, 1999. Abstract in NIPS'99, pp. 370-376.