jmlr jmlr2013 jmlr2013-90 jmlr2013-90-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Philipp Hennig, Martin Kiefel
Abstract: Four decades after their invention, quasi-Newton methods are still state of the art in unconstrained numerical optimization. Although not usually interpreted thus, these are learning algorithms that fit a local quadratic approximation to the objective function. We show that many, including the most popular, quasi-Newton methods can be interpreted as approximations of Bayesian linear regression under varying prior assumptions. This new notion elucidates some shortcomings of classical algorithms, and lights the way to a novel nonparametric quasi-Newton method, which is able to make more efficient use of available information at computational cost similar to its predecessors. Keywords: optimization, numerical analysis, probability, Gaussian processes
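The abstract describes quasi-Newton methods as learning algorithms that fit a local quadratic model to the objective from observed gradient differences. As a concrete illustration, here is a minimal Python sketch of the classical BFGS rank-2 update, whose defining property is the secant condition B_new s = y; this is the textbook formula, not the nonparametric method the paper proposes, and the names (bfgs_update, the toy Hessian A) are illustrative assumptions.

    import numpy as np

    def bfgs_update(B, s, y):
        # Rank-2 BFGS update of the Hessian estimate B (illustrative sketch).
        # s = x_new - x_old (step), y = grad_new - grad_old (gradient change).
        # The result satisfies the secant condition B_new @ s = y, i.e. the
        # quadratic model is fit to the newly observed gradient information.
        Bs = B @ s
        return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

    # Check on a toy quadratic f(x) = 0.5 * x @ A @ x, whose true Hessian is A.
    A = np.array([[3.0, 1.0], [1.0, 2.0]])
    grad = lambda x: A @ x
    x_old, x_new = np.zeros(2), np.array([1.0, -0.5])
    s, y = x_new - x_old, grad(x_new) - grad(x_old)
    B_new = bfgs_update(np.eye(2), s, y)
    assert np.allclose(B_new @ s, y)  # secant condition holds

In the paper's reading, rank-2 updates of this form arise as approximations of Bayesian linear regression on the Hessian, with the particular update formula corresponding to a particular choice of prior.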
S.P. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
C.G. Broyden. A class of methods for solving nonlinear simultaneous equations. Math. Comp., 19(92):577–593, 1965.
C.G. Broyden. Quasi-Newton methods and their application to function minimization. Math. Comp., 21:368–381, 1967.
C.G. Broyden. A new double-rank minimization algorithm. Notices of the American Mathematical Society, 16:670, 1969.
C.G. Broyden. The convergence of a class of double-rank minimization algorithms 1: General considerations. IMA Journal of Applied Mathematics, 6(1):76, 1970.
W.C. Davidon. Variable metric method for minimization. Technical report, Argonne National Laboratory, Ill., 1959.
J.E. Dennis Jr. and J.J. Moré. Quasi-Newton methods, motivation and theory. SIAM Review, 19(1):46–89, 1977.
R. Fletcher. A new approach to variable metric algorithms. The Computer Journal, 13(3):317, 1970.
R. Fletcher and M.J.D. Powell. A rapidly convergent descent method for minimization. The Computer Journal, 6(2):163–168, 1963.
A. Genz. Numerical computation of rectangular bivariate and trivariate normal and t probabilities. Statistics and Computing, 14(3):251–260, 2004.
D. Goldfarb. A family of variable metric updates derived by variational means. Math. Comp., 24(109):23–26, 1970.
J. Greenstadt. Variations on variable-metric methods. Math. Comp., 24:1–22, 1970.
P. Hennig. Fast probabilistic optimization from noisy gradients. In Proceedings of the 30th International Conference on Machine Learning (ICML), 2013.
M.R. Hestenes and E. Stiefel. Methods of conjugate gradients for solving linear systems. Journal of Research of the National Bureau of Standards, 49(6):409–436, 1952.
H. Lütkepohl. Handbook of Matrices. Wiley, 1996.
T.P. Minka. Old and new matrix algebra useful for statistics. Technical report, MIT Media Lab Note, 2000a.
T.P. Minka. Deriving quadrature rules from Gaussian processes. Technical report, Statistics Department, Carnegie Mellon University, 2000b.
J. Nocedal. Updating quasi-Newton matrices with limited storage. Math. Comp., 35(151):773–782, 1980.
J. Nocedal and S.J. Wright. Numerical Optimization. Springer Verlag, 1999.
J.D. Pearson. Variable metric methods of minimisation. The Computer Journal, 12(2):171–178, 1969.
J. Peltonen. Personal communication, 2012.
M.J.D. Powell. A new algorithm for unconstrained optimization. In O.L. Mangasarian and K. Ritter, editors, Nonlinear Programming. Academic Press, 1970.
C.E. Rasmussen and C.K.I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
R.P. Savage. The space of positive definite matrices and Gromov's invariant. Transactions of the AMS, 274(1):239–263, November 1982.
D.F. Shanno. Conditioning of quasi-Newton methods for function minimization. Math. Comp., 24(111):647–656, 1970.