
137 nips-2013-High-Dimensional Gaussian Process Bandits


Source: pdf

Author: Josip Djolonga, Andreas Krause, Volkan Cevher

Abstract: Many applications in machine learning require optimizing unknown functions defined over a high-dimensional space from noisy samples that are expensive to obtain. We address this notoriously hard challenge, under the assumptions that the function varies only along some low-dimensional subspace and is smooth (i.e., it has a low norm in a Reproducing Kernel Hilbert Space). In particular, we present the SI-BO algorithm, which leverages recent low-rank matrix recovery techniques to learn the underlying subspace of the unknown function and applies Gaussian Process Upper Confidence sampling for optimization of the function. We carefully calibrate the exploration–exploitation tradeoff by allocating the sampling budget to subspace estimation and function optimization, and obtain the first subexponential cumulative regret bounds and convergence rates for Bayesian optimization in high dimensions under noisy observations. Numerical results demonstrate the effectiveness of our approach in difficult scenarios.
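
To make the two-phase idea in the abstract concrete, the following is a minimal, hypothetical Python sketch, not the authors' SI-BO implementation: here the hidden subspace is estimated from finite-difference gradient samples followed by an SVD (a stand-in for the low-rank matrix recovery step the abstract refers to), and a plain NumPy GP-UCB loop with an RBF kernel then optimizes over the learned low-dimensional coordinates. All function names (estimate_subspace, gp_ucb) and parameter choices are illustrative assumptions.

```python
# Hedged sketch of "learn a low-dimensional subspace, then run GP-UCB on it".
# This is NOT the SI-BO algorithm itself; it only illustrates the two phases.
import numpy as np

rng = np.random.default_rng(0)


def estimate_subspace(f, dim, k, n_samples=50, h=1e-3, noise=0.0):
    """Estimate a k-dimensional active subspace of f: R^dim -> R.

    Gradients are approximated by forward differences at random points; the
    top-k right singular vectors of the stacked gradients span the estimate.
    """
    grads = np.zeros((n_samples, dim))
    for i in range(n_samples):
        x = rng.uniform(-1.0, 1.0, size=dim)
        fx = f(x) + noise * rng.standard_normal()
        for j in range(dim):
            e = np.zeros(dim)
            e[j] = h
            grads[i, j] = (f(x + e) + noise * rng.standard_normal() - fx) / h
    _, _, vt = np.linalg.svd(grads, full_matrices=False)
    return vt[:k].T  # dim x k orthonormal basis of the estimated subspace


def gp_ucb(g, k, budget=30, beta=2.0, lengthscale=0.5, noise_var=1e-4):
    """Plain GP-UCB over [-1, 1]^k with an RBF kernel and random candidate sets."""
    def kern(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / lengthscale ** 2)

    X = rng.uniform(-1.0, 1.0, size=(1, k))
    y = np.array([g(X[0])])
    for _ in range(budget - 1):
        cand = rng.uniform(-1.0, 1.0, size=(256, k))
        K = kern(X, X) + noise_var * np.eye(len(X))
        Ks = kern(cand, X)
        mu = Ks @ np.linalg.solve(K, y)                       # posterior mean
        var = 1.0 - np.einsum("ij,ij->i", Ks, np.linalg.solve(K, Ks.T).T)
        ucb = mu + np.sqrt(beta * np.maximum(var, 0.0))       # upper confidence bound
        x_next = cand[np.argmax(ucb)]
        X = np.vstack([X, x_next])
        y = np.append(y, g(x_next))
    return X[np.argmax(y)], y.max()


if __name__ == "__main__":
    D, K = 20, 2
    A_true = np.linalg.qr(rng.standard_normal((D, K)))[0]    # hidden 2-dim subspace
    f = lambda x: -np.sum((A_true.T @ x - 0.3) ** 2)          # varies only along A_true

    A_hat = estimate_subspace(f, D, K, noise=1e-3)            # phase 1: learn subspace
    z_best, val_best = gp_ucb(lambda z: f(A_hat @ z), K)      # phase 2: GP-UCB on it
    print("best value found:", val_best)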


Reference text

[1] D. Lizotte, T. Wang, M. Bowling, and D. Schuurmans. Automatic gait optimization with Gaussian process regression. In Proc. of IJCAI, pages 944–949, 2007.

[2] J. Bergstra and Y. Bengio. Random search for hyper-parameter optimization. The Journal of Machine Learning Research, 13:281–305, 2012.

[3] J. Snoek, H. Larochelle, and R. P. Adams. Practical Bayesian optimization of machine learning algorithms. In Neural Information Processing Systems, 2012.

[4] K.-C. Li. Sliced inverse regression for dimension reduction. Journal of the American Statistical Association, 86(414):316–327, 1991.

[5] G. Raskutti, M.J. Wainwright, and B. Yu. Minimax rates of estimation for high-dimensional linear regression over ℓq-balls. Information Theory, IEEE Transactions on, 57(10):6976–6994, 2011.

[6] S. Mukherjee, Q. Wu, and D. Zhou. Learning gradients on manifolds. Bernoulli, 16(1):181–207, 2010.

[7] H. Tyagi and V. Cevher. Learning non-parametric basis independent models from point queries via low-rank methods. Applied and Computational Harmonic Analysis, 2014.

[8] H. Robbins. Some aspects of the sequential design of experiments. Bulletin of the American Mathematical Society, 58(5):527–535, 1952.

[9] P. Auer. Using confidence bounds for exploitation-exploration trade-offs. The Journal of Machine Learning Research, 3:397–422, 2003.

[10] R. Kleinberg, A. Slivkins, and E. Upfal. Multi-armed bandits in metric spaces. In STOC, pages 681–690, 2008.

[11] S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvári. Online optimization in X-armed bandits. In NIPS, 2008.

[12] N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Information-theoretic regret bounds for Gaussian process optimization in the bandit setting. IEEE Transactions on Information Theory, 58(5):3250–3265, May 2012.

[13] E. Brochu, V.M. Cora, and N. De Freitas. A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. arXiv preprint arXiv:1012.2599, 2010.

[14] J. Močkus. On Bayesian methods for seeking the extremum. In Optimization Techniques: IFIP Technical Conference, Novosibirsk, July 1–7, 1974, pages 400–404. Springer, 1975.

[15] A.D. Bull. Convergence rates of efficient global optimization algorithms. The Journal of Machine Learning Research, 12:2879–2904, 2011.

[16] A. Carpentier and R. Munos. Bandit theory meets compressed sensing for high dimensional stochastic linear bandit. Journal of Machine Learning Research - Proceedings Track, 22:190–198, 2012.

[17] Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári. Online-to-confidence-set conversions and application to sparse stochastic bandits. In Conference on Artificial Intelligence and Statistics (AISTATS), 2012.

[18] B. Chen, R. Castro, and A. Krause. Joint optimization and variable selection of high-dimensional Gaussian processes. In Proc. International Conference on Machine Learning (ICML), 2012.

[19] Z. Wang, M. Zoghi, F. Hutter, D. Matheson, and N. de Freitas. Bayesian optimization in high dimensions via random embeddings. In Proc. of IJCAI, 2013.

[20] H. Tyagi and B. Gärtner. Continuum armed bandit problem of few variables in high dimensions. CoRR, abs/1304.5793, 2013.

[21] R.A. DeVore and G.G. Lorentz. Constructive approximation, volume 303. Springer Verlag, 1993.

[22] M. Fornasier, K. Schnass, and J. Vybiral. Learning functions of few arbitrary linear parameters in high dimensions. Foundations of Computational Mathematics, pages 1–34, 2012.

[23] B. Schölkopf and A.J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, 2001.

[24] E.J. Candès and Y. Plan. Tight oracle inequalities for low-rank matrix recovery from a minimal number of noisy random measurements. Information Theory, IEEE Transactions on, 57(4):2342–2359, 2011.

[25] P. A. Wedin. Perturbation bounds in connection with singular value decomposition. BIT Numerical Mathematics, 12(1):99–111, 1972.

[26] G.W. Stewart and J. Sun. Matrix Perturbation Theory, volume 175. Academic Press New York, 1990.

[27] J. G. Daugman. Uncertainty relation for resolution in space, spatial frequency, and orientation optimized by two-dimensional visual cortical filters. Optical Society of America, Journal, A: Optics and Image Science, 2:1160–1169, 1985.