Author: Guy Shani, Christopher Meek
Abstract: An automated recovery system is a key component in a large data center. Such a system typically employs a hand-made controller created by an expert. While such controllers capture many important aspects of the recovery process, they are often not systematically optimized to reduce costs such as server downtime. In this paper we describe a passive policy learning approach for improving existing recovery policies without exploration. We explain how to use data gathered from the interactions of the hand-made controller with the system to create an improved controller. We suggest learning an indefinite-horizon Partially Observable Markov Decision Process, a model for decision making under uncertainty, and solving it using a point-based algorithm. We describe the complete process, from data gathering through model learning and model checking to computing a policy.
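The abstract proposes learning an indefinite-horizon POMDP from controller interaction data and solving it with a point-based algorithm such as point-based value iteration [10, 13]. As a minimal illustrative sketch only (not the authors' implementation, and omitting the indefinite-horizon, action-based-termination machinery of [4] as well as the model-learning and model-checking steps), the Python fragment below shows a tabular POMDP belief update and one point-based backup over a fixed set of belief points; the arrays T, Z, R, the problem dimensions, and the randomly generated beliefs are all hypothetical placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A, O, gamma = 4, 3, 2, 0.95                  # states, actions, observations, discount

# Hypothetical model parameters; in the paper's setting these would instead be
# estimated from logs of the hand-made controller's interactions with the system.
T = rng.dirichlet(np.ones(S), size=(A, S))      # T[a, s, :]  = P(s' | s, a)
Z = rng.dirichlet(np.ones(O), size=(A, S))      # Z[a, s', :] = P(o | s', a)
R = rng.uniform(-1.0, 0.0, size=(S, A))         # e.g. negative recovery/downtime cost

def belief_update(b, a, o):
    """Bayes filter: b'(s') is proportional to P(o | s', a) * sum_s P(s' | s, a) b(s)."""
    b_next = Z[a, :, o] * (b @ T[a])
    return b_next / b_next.sum()

def point_based_backup(b, alphas):
    """One value-iteration backup at belief point b against the current alpha-vector set."""
    best_val, best_alpha = -np.inf, None
    for a in range(A):
        alpha_a = R[:, a].copy()
        for o in range(O):
            # For each observation, keep the alpha-vector that is best at b
            # after the (a, o)-weighted one-step lookahead.
            gains = [gamma * (T[a] * Z[a, :, o]) @ al for al in alphas]
            alpha_a += gains[int(np.argmax([b @ g for g in gains]))]
        if b @ alpha_a > best_val:
            best_val, best_alpha = b @ alpha_a, alpha_a
    return best_alpha

# Usage: a belief update after taking action 0 and observing 1,
# then a few sweeps of backups over a fixed set of belief points.
b1 = belief_update(np.full(S, 1.0 / S), a=0, o=1)
beliefs = [rng.dirichlet(np.ones(S)) for _ in range(10)]
alphas = [np.zeros(S)]
for _ in range(20):
    alphas = [point_based_backup(b, alphas) for b in beliefs]
print("value at uniform belief:", max(np.full(S, 1.0 / S) @ al for al in alphas))
```

In the paper's setting, the rewards would encode costs such as server downtime, and the learned model would pass the model-checking step described in the paper before a policy is computed from the resulting alpha-vectors.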
[1] Leonard E. Baum, Ted Petrie, George Soules, and Norman Weiss. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. The Annals of Mathematical Statistics, 41(1):164–171, 1970.
[2] Lonnie Chrisman. Reinforcement learning with perceptual aliasing: The perceptual distinctions approach. In Proceedings of the Tenth National Conference on Artificial Intelligence (AAAI), pages 183–188. AAAI Press, 1992.
[3] Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. Bayesian Data Analysis. Chapman and Hall, 1996.
[4] Eric A. Hansen. Indefinite-horizon POMDPs with action-based termination. In AAAI, pages 1237–1242, 2007.
[5] David Heckerman, John S. Breese, and Koos Rommelse. Decision-theoretic troubleshooting. Commun. ACM, 38(3):49–57, 1995.
[6] Michael Isard. Autopilot: automatic data center management. Operating Systems Review, 41(2):60–67, 2007.
[7] Kaustubh R. Joshi, William H. Sanders, Matti A. Hiltunen, and Richard D. Schlichting. Automatic model-driven recovery in distributed systems. In SRDS, pages 25–38, 2005.
[8] Michael L. Littman and Nishkam Ravi. An instance-based state representation for network repair. In Proceedings of the Nineteenth National Conference on Artificial Intelligence (AAAI), pages 287–292, 2004.
[9] Andrew Kachites McCallum. Reinforcement learning with selective perception and hidden state. PhD thesis, University of Rochester, 1996. Supervisor: Dana Ballard.
[10] Joelle Pineau, Geoffrey Gordon, and Sebastian Thrun. Point-based value iteration: An anytime algorithm for POMDPs. In International Joint Conference on Artificial Intelligence (IJCAI), pages 1025–1032, August 2003.
[11] Guy Shani and Ronen I. Brafman. Resolving perceptual aliasing in the presence of noisy sensors. In NIPS, 2004.
[12] R. D. Smallwood and E. J. Sondik. The optimal control of partially observable Markov decision processes over a finite horizon. Operations Research, 21:1071–1098, 1973.
[13] Matthijs T. J. Spaan and Nikos Vlassis. Perseus: Randomized point-based value iteration for POMDPs. Journal of Artificial Intelligence Research, 24:195–220, 2005.
[14] Richard S. Sutton and Andrew Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
[15] Daan Wierstra and Marco Wiering. Utile distinction hidden Markov models. In ICML ’04: Proceedings of the twenty-first international conference on Machine learning, page 108, New York, NY, USA, 2004. ACM.
[16] Valentina Bayer Zubek and Thomas G. Dietterich. Integrating learning from examples into the search for diagnostic policies. J. Artif. Intell. Res. (JAIR), 24:263–303, 2005.