nips nips2004 nips2004-119 nips2004-119-reference knowledge-graph by maker-knowledge-mining

119 nips-2004-Mistake Bounds for Maximum Entropy Discrimination


Source: pdf

Author: Philip M. Long, Xinyu Wu

Abstract: We establish a mistake bound for an ensemble method for classification based on maximizing the entropy of voting weights subject to margin constraints. The bound is the same as a general bound proved for the Weighted Majority Algorithm, and similar to bounds for other variants of Winnow. We prove a more refined bound that leads to a nearly optimal algorithm for learning disjunctions, again, based on the maximum entropy principle. We describe a simplification of the on-line maximum entropy method in which, after each iteration, the margin constraints are replaced with a single linear inequality. The simplified algorithm, which takes a similar form to Winnow, achieves the same mistake bounds. 1


reference text

[1] D. Angluin and L. Valiant. Fast probabilistic algorithms for Hamiltonion circuits and matchings. Journal of Computer and System Sciences, 18(2):155–193, 1979.

[2] A. L. Berger, S. Della Pietra, and V. J. Della Pietra. A maximum entropy approach to natural language processing. Computational Linguistics, 22(1):39–71, 1996.

[3] D. Blackwell. An analog of the minimax theorem for vector payoffs. Pac. J. Math., 6:1–8, 1956.

[4] A. Blum, 2002. http://www-2.cs.cmu.edu/∼avrim/ML02/lect0418.txt.

[5] N. Cesa-Bianchi, A. Krogh, and M. Warmuth. Bounds on approximate steepest descent for likelihood maximization in exponential families. IEEE Transactions on Information Theory, 40(4):1215–1218, 1994.

[6] T. Cover and J. Thomas. Elements of Information Theory. Wiley, 1991.

[7] K. Crammer, O. Dekel, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. NIPS, 2003.

[8] Koby Crammer and Yoram Singer. Ultraconservative online algorithms for multiclass problems. In COLT, pages 99–115, 2001.

[9] I. Csisz´ r. I-divergence geometry of probability distributions and minimization proba lems. Annals of Probability, 3:146–158, 1975.

[10] D. P. Dubhashi and A. Panconesi. Concentration of measure for the analysis of randomized algorithms, 1998. Monograph.

[11] Geoffrey J. Gordon. Regret bounds for prediction problems. In Proc. 12th Annu. Conf. on Comput. Learning Theory, pages 29–40. ACM Press, New York, NY, 1999.

[12] D. P. Helmbold, N. Littlestone, and P. M. Long. On-line learning with linear loss constraints. Information and Computation, 161(2):140–171, 2000.

[13] M. Herbster and M. K. Warmuth. Tracking the best linear predictor. Journal of Machine Learning Research, 1:281–309, 2001.

[14] T. Jaakkola, M. Meila, and T. Jebara. Maximum entropy discrimination. NIPS, 1999.

[15] J. Kivinen and M. Warmuth. Boosting as entropy projection. COLT, 1999.

[16] J. Kivinen and M. K. Warmuth. Additive versus exponentiated gradient updates for linear prediction. Information and Computation, 132(1):1–63, 1997.

[17] J. Langford, M. Seeger, and N. Megiddo. An improved predictive accuracy bound for averaging classifiers. ICML, pages 290–297, 2001.

[18] Y. Li and P. M. Long. The relaxed online maximum margin algorithm. Machine Learning, 46(1-3):361–387, 2002.

[19] N. Littlestone. Mistake Bounds and Logarithmic Linear-threshold Learning Algorithms. PhD thesis, UC Santa Cruz, 1989.

[20] N. Littlestone, P. M. Long, and M. K. Warmuth. On-line learning of linear functions. Computational Complexity, 5:1–23, 1995. Preliminary version in STOC’91.

[21] N. Littlestone and M. K. Warmuth. The weighted majority algorithm. Information and Computation, 108:212–261, 1994.

[22] A. B. J. Novikoff. On convergence proofs on perceptrons. In Proceedings of the Symposium on the Mathematical Theory of Automata, pages 615–622, 1962.

[23] M. S. Pinsker. Information and Information Stability of Random Variables and Processes. Holden-Day, 1964.

[24] F. Topsoe. Some inequalities for information divergence and related measures of discrimination. IEEE Trans. Inform. Theory, 46(4):1602–1609, 2001.

[25] P. Vaidya. A new algorithm for minimizing convex functions over convex sets. FOCS, pages 338–343, 1989.

[26] T. Zhang. Regularized winnow methods. NIPS, pages 703–709, 2000.

[27] T. Zhang. A sequential approximation bound for some sample-dependent convex optimization problems with applications in learning. COLT, pages 65–81, 2001.