nips nips2008 nips2008-173 nips2008-173-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Paul L. Ruvolo, Ian Fasel, Javier R. Movellan
Abstract: Many popular optimization algorithms, such as the Levenberg-Marquardt algorithm (LMA), use heuristic-based “controllers” that modulate the behavior of the optimizer during the optimization process. For example, in the LMA a damping parameter λ is dynamically modified based on a set of rules that were developed using heuristic arguments. Reinforcement learning (RL) is a machine learning approach for learning optimal controllers from examples and is thus an obvious candidate for improving the heuristic-based controllers implicit in the most popular and heavily used optimization algorithms. Improving the performance of off-the-shelf optimizers is particularly important for time-constrained optimization problems. For example, the LMA has become popular for many real-time computer vision problems, including object tracking from video, where only a small amount of time can be allocated to the optimizer on each incoming video frame. Here we show that a popular modern reinforcement learning technique using a very simple state space can dramatically improve the performance of general-purpose optimizers such as the LMA. Surprisingly, the controllers learned for a particular domain also work well in very different optimization domains. For example, we used RL methods to train a new controller for the damping parameter of the LMA. This controller was trained on a collection of classic, relatively small, non-linear regression problems. The modified LMA performed better than the standard LMA on these problems. The controller also dramatically outperformed the standard LMA on a difficult computer vision problem for which it had not been trained. Thus the controller appears to have extracted control rules that are not merely domain-specific but generalize across a range of optimization domains.
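To make the role of the damping controller concrete, the following is a minimal sketch (in Python/NumPy, not the authors' implementation) of a Levenberg-Marquardt loop in which the λ-update is isolated behind a small `damping_controller` hook. The function names, the factor-of-10 update rule, and the toy regression example are illustrative assumptions; an RL-learned policy of the kind described in the abstract would simply replace `heuristic_damping` at that hook.

```python
import numpy as np

def heuristic_damping(lmbda, step_accepted):
    # Classic Marquardt-style heuristic: shrink damping after a successful
    # step, grow it after a failed one (factors of 10 are a common convention).
    return lmbda / 10.0 if step_accepted else lmbda * 10.0

def lma(residuals, jacobian, x0, lmbda=1e-3, n_iters=50,
        damping_controller=heuristic_damping):
    """Minimal Levenberg-Marquardt loop with a pluggable damping controller.

    damping_controller(lmbda, step_accepted) returns the next damping value;
    an RL-learned policy could be dropped in here in place of the heuristic.
    """
    x = np.asarray(x0, dtype=float)
    err = 0.5 * np.sum(residuals(x) ** 2)
    for _ in range(n_iters):
        r = residuals(x)
        J = jacobian(x)
        JtJ = J.T @ J
        A = JtJ + lmbda * np.diag(np.diag(JtJ))   # damped normal equations
        g = J.T @ r
        try:
            delta = np.linalg.solve(A, -g)
        except np.linalg.LinAlgError:
            lmbda = damping_controller(lmbda, False)
            continue
        new_err = 0.5 * np.sum(residuals(x + delta) ** 2)
        accepted = new_err < err
        if accepted:
            x, err = x + delta, new_err
        lmbda = damping_controller(lmbda, accepted)
    return x

# Illustrative usage: fit y = exp(a * t) to noisy data.
if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 20)
    y = np.exp(0.7 * t) + 0.01 * np.random.randn(t.size)
    res = lambda p: np.exp(p[0] * t) - y
    jac = lambda p: (t * np.exp(p[0] * t)).reshape(-1, 1)
    print(lma(res, jac, x0=[0.0]))
```

The design point is only that the damping update is a self-contained decision made once per iteration from simple observations (e.g., whether the last step reduced the error), which is what makes it amenable to being learned as a policy over a very simple state space.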
[1] K. Levenberg, “A method for the solution of certain problems in least squares,” Quarterly of Applied Mathematics, 1944.
[2] D. Marquardt, “An algorithm for least-squares estimation of nonlinear parameters,” SIAM Journal on Applied Mathematics, 1963.
[3] V. V. Miagkikh and W. F. Punch III, “Global search in combinatorial optimization using reinforcement learning algorithms,” in Proceedings of the Congress on Evolutionary Computation, vol. 1. IEEE Press, 1999, pp. 189–196.
[4] Y. Zhang, “Solving large-scale linear programs by interior-point methods under the MATLAB environment,” Optimization Methods and Software, vol. 10, pp. 1–31, 1998.
[5] L. M. Gambardella and M. Dorigo, “Ant-Q: A reinforcement learning approach to the traveling salesman problem,” in International Conference on Machine Learning, 1995, pp. 252–260.
[6] J. A. Boyan and A. W. Moore, “Learning evaluation functions for global optimization and boolean satisfiability,” in AAAI/IAAI, 1998, pp. 3–10.
[7] R. Moll, T. J. Perkins, and A. G. Barto, “Machine learning for subproblem selection,” in ICML ’00: Proceedings of the Seventeenth International Conference on Machine Learning. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2000, pp. 615–622.
[8] M. I. Lourakis and A. A. Argyros, “Is Levenberg-Marquardt the most efficient optimization algorithm for implementing bundle adjustment?” Proceedings of ICCV, 2005.
[9] D. Cristinacce and T. F. Cootes, “Feature detection and tracking with constrained local models,” BMVC, pp. 929–938, 2006.
[10] M. Pollefeys, L. V. Gool, M. Vergauwen, F. Verbiest, K. Cornelis, J. Tops, and R. Koch, “Visual modeling with a hand-held camera,” IJCV, vol. 59, no. 3, pp. 207–232, 2004.
[11] P. Beardsley, P. Torr, and A. Zisserman, “3D model acquisition from extended image sequences,” Proceedings of ECCV, pp. 683–695, 1996.
[12] M. Lagoudakis and R. Parr, “Least-squares policy iteration,” Journal of Machine Learning Research, 2003.
[13] H. B. Nielsen, “UCTP test problems for unconstrained optimization,” Technical Report, Technical University of Denmark, 2000.
[14] P. Viola and M. Jones, “Robust real-time object detection,” International Journal of Computer Vision, 2002.
[15] P. Bühlmann and B. Yu, “Boosting with the L2 loss: Regression and classification,” Journal of the American Statistical Association, 2003.