NIPS 2010, Paper 269 (source: PDF)
Author: Uwe Dick, Peter Haider, Thomas Vanck, Michael Brückner, Tobias Scheffer
Abstract: We study a setting in which Poisson processes generate sequences of decision-making events. The optimization goal is allowed to depend on the rate of decision outcomes; the rate may depend on a potentially long backlog of events and decisions. We model the problem as a Poisson process with a throttling policy that enforces a data-dependent rate limit, and we reduce the learning problem to a convex optimization problem that can be solved efficiently. This problem setting matches applications in which the damage caused by an attacker grows as a function of the rate of unsuppressed hostile events. We report on experiments on abuse detection for an email service.
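To make the setting concrete, the following is a minimal sketch of the two ingredients the abstract names: a homogeneous Poisson process generating event times, and a throttling policy that suppresses events so that the admitted rate stays below a limit depending on the recent backlog. All function names and the sliding-window policy are illustrative assumptions; the paper's actual policy is learned by solving a convex optimization problem, not hand-coded like this.

```python
import random

def poisson_arrivals(rate, horizon, rng):
    """Sample event times of a homogeneous Poisson process on [0, horizon)
    by accumulating exponentially distributed inter-arrival gaps."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= horizon:
            return times
        times.append(t)

def throttle(times, max_rate, window):
    """Illustrative sliding-window throttle: suppress any event that would
    push the admitted count in the trailing `window` above max_rate * window,
    so the decision depends on the backlog of previously admitted events."""
    admitted = []
    for t in times:
        recent = [s for s in admitted if t - s < window]
        if len(recent) < max_rate * window:
            admitted.append(t)
    return admitted

rng = random.Random(0)
events = poisson_arrivals(rate=5.0, horizon=100.0, rng=rng)
kept = throttle(events, max_rate=2.0, window=1.0)
```

With an arrival rate of 5 events per time unit and a cap of 2 admitted events per unit window, roughly half the events are suppressed; in the abuse-detection application, the hostile events among the suppressed ones are the ones that no longer cause damage.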
[1] J.A. Bagnell, S. Kakade, A. Ng, and J. Schneider. Policy search by dynamic programming. Advances in Neural Information Processing Systems, 16, 2004.
[2] D. Blatt and A.O. Hero. From weighted classification to policy search. Advances in Neural Information Processing Systems, 18, 2006.
[3] C. Dimitrakakis and M.G. Lagoudakis. Rollout sampling approximate policy iteration. Machine Learning, 72(3):157–171, 2008.
[4] M. Ghavamzadeh and Y. Engel. Bayesian policy gradient algorithms. Advances in Neural Information Processing Systems, 19, 2007.
[5] D.L. Jagerman, B. Melamed, and W. Willinger. Stochastic modeling of traffic processes. Frontiers in queueing: models, methods and problems, pages 271–370, 1996.
[6] M.G. Lagoudakis and R. Parr. Reinforcement learning as classification: Leveraging modern classifiers. In Proceedings of the 20th International Conference on Machine Learning, 2003.
[7] J. Langford and B. Zadrozny. Relating reinforcement learning performance to classification performance. In Proceedings of the 22nd International Conference on Machine Learning, 2005.
[8] R.S. Sutton, D. McAllester, S. Singh, and Y. Mansour. Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems, 12, 2000.