
NIPS 2013, Paper 201: Multi-Task Bayesian Optimization


Source: pdf

Author: Kevin Swersky, Jasper Snoek, Ryan P. Adams

Abstract: Bayesian optimization has recently been proposed as a framework for automatically tuning the hyperparameters of machine learning models and has been shown to yield state-of-the-art performance with impressive ease and efficiency. In this paper, we explore whether it is possible to transfer the knowledge gained from previous optimizations to new tasks in order to find optimal hyperparameter settings more efficiently. Our approach is based on extending multi-task Gaussian processes to the framework of Bayesian optimization. We show that this method significantly speeds up the optimization process when compared to the standard single-task approach. We further propose a straightforward extension of our algorithm in order to jointly minimize the average error across multiple tasks and demonstrate how this can be used to greatly speed up k-fold cross-validation. Lastly, we propose an adaptation of a recently developed acquisition function, entropy search, to the cost-sensitive, multi-task setting. We demonstrate the utility of this new acquisition function by leveraging a small dataset to explore hyperparameter settings for a large dataset. Our algorithm dynamically chooses which dataset to query in order to yield the most information per unit cost.
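
As a rough, non-authoritative sketch of the idea summarized in the abstract (not the authors' implementation), the Python below combines a multi-task Gaussian process, here an intrinsic-coregionalization kernel K((x,t),(x',t')) = k(x,x') * B[t,t'], with expected improvement on a target task, so that cheap observations on a related task inform which hyperparameter setting to try next. Every name (sq_exp, multitask_kernel, gp_posterior, expected_improvement), the toy objective, the fixed lengthscale and noise, and the hand-set inter-task covariance B are illustrative assumptions; the paper learns the task covariance from data and also proposes a cost-sensitive entropy-search acquisition rather than plain expected improvement.

# Minimal sketch, assuming a fixed inter-task covariance B and a squared-exponential
# kernel over hyperparameter settings. Not the authors' code.
import numpy as np
from scipy.stats import norm

def sq_exp(X1, X2, lengthscale=0.5):
    # Squared-exponential kernel between two sets of hyperparameter settings.
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / lengthscale ** 2)

def multitask_kernel(X1, t1, X2, t2, B, lengthscale=0.5):
    # Intrinsic-coregionalization form: K((x,t),(x',t')) = k(x,x') * B[t,t'].
    return sq_exp(X1, X2, lengthscale) * B[np.ix_(t1, t2)]

def gp_posterior(X, t, y, Xs, ts, B, noise=1e-4):
    # GP posterior mean and variance at candidate points (Xs, ts),
    # given observations y at (X, t) across all tasks.
    K = multitask_kernel(X, t, X, t, B) + noise * np.eye(len(y))
    Ks = multitask_kernel(Xs, ts, X, t, B)
    Kss = multitask_kernel(Xs, ts, Xs, ts, B)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks.T)
    mean = Ks @ alpha
    var = np.clip(np.diag(Kss) - (v ** 2).sum(0), 1e-12, None)
    return mean, var

def expected_improvement(mean, var, best):
    # Expected improvement for minimization on the target task.
    s = np.sqrt(var)
    z = (best - mean) / s
    return (best - mean) * norm.cdf(z) + s * norm.pdf(z)

# Toy usage: many cheap observations on task 0 and a few on target task 1.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(8, 1))            # hyperparameter settings tried so far
t = np.array([0, 0, 0, 0, 0, 1, 1, 1])        # which task each setting was run on
y = np.sin(3 * X[:, 0]) + 0.3 * t             # stand-in for validation errors
B = np.array([[1.0, 0.8], [0.8, 1.0]])        # assumed inter-task covariance
Xs = np.linspace(0, 1, 50)[:, None]           # candidate settings on the target task
mean, var = gp_posterior(X, t, y, Xs, np.ones(50, dtype=int), B)
ei = expected_improvement(mean, var, y[t == 1].min())
print("next candidate for the target task:", Xs[np.argmax(ei), 0])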


reference text

[1] Eric Brochu, Tyson Brochu, and Nando de Freitas. A Bayesian interactive optimization approach to procedural animation design. In ACM SIGGRAPH/Eurographics Symposium on Computer Animation, 2010.

[2] Niranjan Srinivas, Andreas Krause, Sham Kakade, and Matthias Seeger. Gaussian process optimization in the bandit setting: no regret and experimental design. In ICML, 2010.

[3] Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. Sequential model-based optimization for general algorithm configuration. In Learning and Intelligent Optimization 5, 2011.

[4] M. A. Osborne, R. Garnett, and S. J. Roberts. Gaussian processes for global optimization. In LION, 2009.

[5] James Bergstra, Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. Algorithms for hyper-parameter optimization. In NIPS, 2011.

[6] Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. Practical Bayesian optimization of machine learning algorithms. In NIPS, 2012.

[7] James Bergstra, Daniel Yamins, and David Cox. Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In ICML, 2013.

[8] Carl E. Rasmussen and Christopher Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.

[9] Iain Murray and Ryan P. Adams. Slice sampling covariance hyperparameters of latent Gaussian models. In NIPS. 2010.

[10] Andre G. Journel and Charles J. Huijbregts. Mining Geostatistics. Academic Press, London, 1978.

[11] Pierre Goovaerts. Geostatistics for natural resources evaluation. Oxford University Press, 1997.

[12] Matthias Seeger, Yee-Whye Teh, and Michael I. Jordan. Semiparametric latent factor models. In AISTATS, 2005.

[13] Edwin V. Bonilla, Kian Ming A. Chai, and Christopher K. I. Williams. Multi-task Gaussian process prediction. In NIPS, 2008.

[14] Mauricio A. Alvarez and Neil D. Lawrence. Computationally efficient convolved multiple output Gaussian processes. Journal of Machine Learning Research, 12, 2011.

[15] Jonas Mockus, Vytautas Tiesis, and Antanas Zilinskas. The application of Bayesian methods for seeking the extremum. Towards Global Optimization, 2, 1978.

[16] Matthew Hoffman, Eric Brochu, and Nando de Freitas. Portfolio allocation for Bayesian optimization. In UAI, 2011.

[17] Donald R. Jones. A taxonomy of global optimization methods based on response surfaces. Journal of Global Optimization, 21, 2001.

[18] Philipp Hennig and Christian J. Schuler. Entropy search for information-efficient global optimization. Journal of Machine Learning Research, 13, 2012.

[19] Andreas Krause and Cheng Soon Ong. Contextual Gaussian process bandit optimization. In NIPS, 2011.

[20] Rémi Bardenet, Mátyás Brendel, Balázs Kégl, and Michèle Sebag. Collaborative hyperparameter tuning. In ICML, 2013.

[21] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Bo Wu, and Andrew Y. Ng. Reading digits in natural images with unsupervised feature learning. In NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.

[22] Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, Department of Computer Science, University of Toronto, 2009.

[23] Pierre Sermanet, Soumith Chintala, and Yann LeCun. Convolutional neural networks applied to house numbers digit classification. In ICPR, 2012.

[24] Adam Coates, Honglak Lee, and Andrew Y. Ng. An analysis of single-layer networks in unsupervised feature learning. In AISTATS, 2011.

[25] Robert Gens and Pedro Domingos. Discriminative learning of sum-product networks. In NIPS, 2012.

[26] Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint, 2012.

[27] Liefeng Bo, Xiaofeng Ren, and Dieter Fox. Unsupervised feature learning for RGB-D based object recognition. In ISER, 2012.

[28] Ruslan Salakhutdinov and Andriy Mnih. Probabilistic matrix factorization. In NIPS, 2008.

[29] Jonathan L. Herlocker, Joseph A. Konstan, Al Borchers, and John Riedl. An algorithmic framework for performing collaborative filtering. In ACM SIGIR Conference on Research and Development in Information Retrieval, 1999.

[30] Matthew Hoffman, David M. Blei, and Francis Bach. Online learning for latent Dirichlet allocation. In NIPS, 2010.