jmlr2012-38-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Philipp Hennig, Christian J. Schuler
Abstract: Contemporary global optimization algorithms are based on local measures of utility rather than on a probability measure over the location and value of the optimum. They thus attempt to collect low function values, not to learn about the optimum. The reason for the absence of probabilistic global optimizers is that the corresponding inference problem is intractable in several ways. This paper develops desiderata for probabilistic optimization algorithms, then presents a concrete algorithm that tackles each of the computational intractabilities with a sequence of approximations and explicitly addresses the decision problem of maximizing information gain from each evaluation.
Keywords: optimization, probability, information, Gaussian processes, expectation propagation
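To make the abstract's proposal concrete, the following is a minimal, hypothetical Python sketch of the loop it describes: maintain a Gaussian-process belief over the objective, derive from it a distribution p_min over the location of the minimum, and evaluate next wherever the expected entropy of p_min falls the most. Every specific choice below (the fixed candidate grid, the squared-exponential kernel, the Monte Carlo estimate of p_min, the three-point "fantasy" quadrature, and all function names) is an illustrative assumption, not the paper's implementation; the paper replaces such brute-force estimates with expectation-propagation approximations precisely because they are otherwise intractable.

import numpy as np

def se_kernel(A, B, ell=0.3, sf=1.0):
    # Squared-exponential covariance between row-wise point sets A (n,d) and B (m,d).
    diff = A[:, None, :] - B[None, :, :]
    return sf**2 * np.exp(-0.5 * np.sum(diff**2, axis=2) / ell**2)

def gp_posterior(X, y, Xs, noise=1e-4):
    # Standard GP regression: posterior mean and covariance on candidate points Xs.
    K = se_kernel(X, X) + noise * np.eye(len(X))
    Ks, Kss = se_kernel(X, Xs), se_kernel(Xs, Xs)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    V = np.linalg.solve(L, Ks)
    return Ks.T @ alpha, Kss - V.T @ V

def p_min(mu, cov, n_samples=2000, rng=np.random.default_rng(0)):
    # Monte Carlo estimate of the belief over where the minimum lies:
    # sample functions from the posterior, count where each attains its minimum.
    f = rng.multivariate_normal(mu, cov + 1e-9 * np.eye(len(mu)), size=n_samples)
    return np.bincount(np.argmin(f, axis=1), minlength=len(mu)) / n_samples

def entropy(p, eps=1e-12):
    return -np.sum(p * np.log(p + eps))

def next_evaluation(X, y, Xs):
    # Greedy one-step lookahead: choose the candidate whose fantasized
    # observation most reduces the entropy of p_min, averaged (crudely,
    # via a three-point quadrature) over the GP's predictive distribution.
    mu, cov = gp_posterior(X, y, Xs)
    h0 = entropy(p_min(mu, cov))
    gains = []
    for i in range(len(Xs)):
        h = 0.0
        for z in (-1.0, 0.0, 1.0):
            y_fantasy = mu[i] + z * np.sqrt(max(cov[i, i], 1e-12))
            mu2, cov2 = gp_posterior(np.vstack([X, Xs[i:i + 1]]),
                                     np.append(y, y_fantasy), Xs)
            h += entropy(p_min(mu2, cov2)) / 3.0
        gains.append(h0 - h)
    return Xs[int(np.argmax(gains))]

Note the contrast the abstract draws: an expected-improvement style rule would rank candidates by how low a function value they promise, whereas the loop above ranks them purely by how much an evaluation there is expected to teach us about the minimum's location.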
References:
R.J. Adler. The Geometry of Random Fields. Wiley, 1981.
R.J. Adler. An introduction to continuity, extrema, and related topics for general Gaussian processes. Lecture Notes-Monograph Series, 12:i–iii+v–vii+ix+1–55, 1990.
Benoît. Note sur une méthode de résolution des équations normales provenant de l'application de la méthode des moindres carrés à un système d'équations linéaires en nombre inférieur à celui des inconnues. Application de la méthode à la résolution d'un système défini d'équations linéaires. (Procédé du Commandant Cholesky). Bulletin Géodésique, 7(1):67–77, 1924.
S.P. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
C.G. Broyden. A class of methods for solving nonlinear simultaneous equations. Mathematics of Computation, 19(92):577–593, 1965.
R.H. Byrd, M.E. Hribar, and J. Nocedal. An interior point algorithm for large-scale nonlinear programming. SIAM Journal on Optimization, 9(4):877–900, 1999.
R.H. Byrd, J.C. Gilbert, and J. Nocedal. A trust region method based on interior point techniques for nonlinear programming. Mathematical Programming, 89(1):149–185, 2000.
T.F. Coleman and Y. Li. On the convergence of interior-reflective Newton methods for nonlinear minimization subject to bounds. Mathematical Programming, 67(1):189–224, 1994.
T.F. Coleman and Y. Li. An interior trust region approach for nonlinear minimization subject to bounds. SIAM Journal on Optimization, 6(2):418–445, 1996.
R.T. Cox. Probability, frequency and reasonable expectation. American Journal of Physics, 14(1):1–13, 1946.
J. Cunningham, P. Hennig, and S. Lacoste-Julien. Gaussian probabilities and expectation propagation. Preprint at arXiv:1111.6832 [stat.ML], November 2011. Under review.
R. Fletcher. A new approach to variable metric algorithms. The Computer Journal, 13(3):317, 1970.
D. Goldfarb. A family of variable metric updates derived by variational means. Mathematics of Computation, 24(109):23–26, 1970.
S. Grünewälder, J.Y. Audibert, M. Opper, and J. Shawe-Taylor. Regret bounds for Gaussian process bandit problems. In Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS), 2010.
N. Hansen and A. Ostermeier. Completely derandomized self-adaptation in evolution strategies. Evolutionary Computation, 9(2):159–195, 2001.
P. Hennig. Optimal reinforcement learning for Gaussian systems. In Advances in Neural Information Processing Systems, 2011.
M.R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49(6):409–436, 1952.
K. Itô. On stochastic differential equations. Memoirs of the American Mathematical Society, 4, 1951.
E.T. Jaynes and G.L. Bretthorst. Probability Theory: The Logic of Science. Cambridge University Press, 2003.
D.R. Jones, M. Schonlau, and W.J. Welch. Efficient global optimization of expensive black-box functions. Journal of Global Optimization, 13(4):455–492, 1998.
R. Kleinberg. Nearly tight bounds for the continuum-armed bandit problem. Advances in Neural Information Processing Systems, 18, 2005.
A.N. Kolmogorov. Grundbegriffe der Wahrscheinlichkeitsrechnung. Ergebnisse der Mathematik und ihrer Grenzgebiete, 2, 1933.
D.G. Krige. A statistical approach to some basic mine valuation and allied problems at the Witwatersrand. Master's thesis, University of Witwatersrand, 1951.
S. Kullback and R.A. Leibler. On information and sufficiency. Annals of Mathematical Statistics, 22(1):79–86, 1951.
H. Lazard-Holly and A. Holly. Computation of the probability that a d-dimensional normal variable belongs to a polyhedral cone with arbitrary vertex. Technical report, Mimeo, 2003.
D.J. Lizotte. Practical Bayesian Optimization. PhD thesis, University of Alberta, 2008.
D.J.C. MacKay. Choice of basis for Laplace approximation. Machine Learning, 33(1):77–86, 1998a.
D.J.C. MacKay. Introduction to Gaussian processes. NATO ASI Series F Computer and Systems Sciences, 168:133–166, 1998b.
B. Matérn. Spatial variation. Meddelanden från Statens Skogsforskningsinstitut, 49(5), 1960.
T.P. Minka. Deriving quadrature rules from Gaussian processes. Technical report, Statistics Department, Carnegie Mellon University, 2000.
T.P. Minka. Expectation propagation for approximate Bayesian inference. In Proceedings of the 17th Conference on Uncertainty in Artificial Intelligence, pages 362–369, San Francisco, CA, USA, 2001. Morgan Kaufmann. ISBN 1-55860-800-1.
I. Murray and R.P. Adams. Slice sampling covariance hyperparameters of latent Gaussian models. Preprint at arXiv:1006.0868, 2010.
J. Nocedal and S.J. Wright. Numerical Optimization. Springer Verlag, 1999.
M.A. Osborne, R. Garnett, and S.J. Roberts. Gaussian processes for global optimization. In 3rd International Conference on Learning and Intelligent Optimization (LION3), 2009.
R.L. Plackett. A reduction formula for normal multivariate integrals. Biometrika, 41(3-4):351, 1954.
M.J.D. Powell. The convergence of variable metric methods for nonlinearly constrained optimization calculations. Nonlinear Programming, 3(0):27–63, 1978a.
M.J.D. Powell. A fast algorithm for nonlinearly constrained optimization calculations. Numerical Analysis, pages 144–157, 1978b.
C.E. Rasmussen and C.K.I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
L.M. Schmitt. Theory of genetic algorithms II: models for genetic operators over the string-tensor representation of populations and convergence to global optima for arbitrary fitness function under scaling. Theoretical Computer Science, 310(1-3):181–231, 2004.
M. Seeger. Expectation propagation for exponential families. Technical report, U.C. Berkeley, 2008.
D.F. Shanno. Conditioning of quasi-Newton methods for function minimization. Mathematics of Computation, 24(111):647–656, 1970.
C.E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27, 1948.
N. Srinivas, A. Krause, S. Kakade, and M. Seeger. Gaussian process optimization in the bandit setting: No regret and experimental design. In International Conference on Machine Learning, 2010.
M.B. Thompson and R.M. Neal. Slice sampling with adaptive multivariate steps: The shrinking-rank method. Preprint at arXiv:1011.4722, 2010.
G.E. Uhlenbeck and L.S. Ornstein. On the theory of the Brownian motion. Physical Review, 36(5):823, 1930.
A.W. van der Vaart and J.H. van Zanten. Information rates of nonparametric Gaussian process methods. Journal of Machine Learning Research, 12:2095–2119, 2011.
J. Villemonteix, E. Vazquez, and E. Walter. An informational approach to the global optimization of expensive-to-evaluate functions. Journal of Global Optimization, 44(4):509–534, 2009.
R.A. Waltz, J.L. Morales, J. Nocedal, and D. Orban. An interior algorithm for nonlinear optimization that combines line search and trust region steps. Mathematical Programming, 107(3):391–408, 2006.
N. Wiener and P. Masani. The prediction theory of multivariate stochastic processes. Acta Mathematica, 98(1):111–150, 1957.