nips nips2013 nips2013-345 nips2013-345-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Chong Wang, Xi Chen, Alex Smola, Eric Xing
Abstract: Stochastic gradient optimization is a class of widely used algorithms for training machine learning models. To optimize an objective, it uses a noisy gradient computed from random data samples instead of the true gradient computed from the entire dataset. However, when the variance of the noisy gradient is large, the algorithm might spend much time bouncing around, leading to slower convergence and worse performance. In this paper, we develop a general approach that uses control variates for variance reduction in stochastic gradient optimization. Data statistics such as low-order moments (pre-computed or estimated online) are used to form the control variate. We demonstrate how to construct the control variate for two practical problems using stochastic gradient optimization. One is convex: MAP estimation for logistic regression. The other is non-convex: stochastic variational inference for latent Dirichlet allocation. On both problems, our approach shows faster convergence and better performance than the classical approach.
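The control-variate idea summarized in the abstract can be illustrated with a small sketch. This is not the paper's construction; it is a simplified scalar example under stated assumptions: the per-sample gradient is that of a one-dimensional logistic loss with every label fixed to +1, the control variate is simply the data point x (whose mean, a first-order moment, is known), and the scaling coefficient `a` is computed from the whole dataset for illustration only. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def grad(w, x):
    # Per-sample gradient of log(1 + exp(-w * x)) with respect to w
    # (assumption: scalar feature, label fixed to +1).
    return -x * sigmoid(-w * x)


def mean(values):
    return sum(values) / len(values)


def variance(values):
    m = mean(values)
    return mean([(v - m) ** 2 for v in values])


def control_variate_gradients(w, xs):
    # Control variate h(x) = x, whose expectation over the dataset is x_bar.
    x_bar = mean(xs)
    gs = [grad(w, x) for x in xs]
    g_bar = mean(gs)
    # Variance-minimizing coefficient a = Cov(g, h) / Var(h).
    cov_gx = mean([(g - g_bar) * (x - x_bar) for g, x in zip(gs, xs)])
    a = cov_gx / variance(xs)
    # Corrected gradient g - a * (h - E[h]); its mean equals that of g.
    corrected = [g - a * (x - x_bar) for g, x in zip(gs, xs)]
    return gs, corrected


random.seed(0)
xs = [random.gauss(1.0, 2.0) for _ in range(1000)]
plain, corrected = control_variate_gradients(0.5, xs)
print("mean (plain, corrected):", mean(plain), mean(corrected))
print("variance (plain, corrected):", variance(plain), variance(corrected))
```

Because the correction term has zero mean over the dataset, the corrected per-sample gradient keeps the same expectation as the plain one, while its variance shrinks by a factor of one minus the squared correlation between the gradient and the control variate. In a real stochastic setting the coefficient would have to be estimated from samples rather than from the full dataset.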
[1] Spall, J. Introduction to stochastic search and optimization: Estimation, simulation, and control. John Wiley and Sons, 2003.
[2] Bottou, L. Stochastic learning. In O. Bousquet, U. von Luxburg, eds., Advanced Lectures on Machine Learning, Lecture Notes in Artificial Intelligence, LNAI 3176, pages 146–168. Springer Verlag, Berlin, 2004.
[3] Ross, S. M. Simulation. Elsevier, fourth edn., 2006.
[4] Nemirovski, A., A. Juditsky, G. Lan, et al. Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4):1574–1609, 2009.
[5] Paisley, J., D. Blei, M. Jordan. Variational Bayesian inference with stochastic search. In International Conference on Machine Learning. 2012.
[6] Lan, G. An optimal method for stochastic composite optimization. Mathematical Programming, 133:365–397, 2012.
[7] Chen, X., Q. Lin, J. Pena. Optimal regularized dual averaging methods for stochastic optimization. In Advances in Neural Information Processing Systems (NIPS). 2012.
[8] Boyd, S., L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
[9] Schaul, T., S. Zhang, Y. LeCun. No More Pesky Learning Rates. ArXiv e-prints, 2012.
[10] Ranganath, R., C. Wang, D. M. Blei, et al. An adaptive learning rate for stochastic variational inference. In International Conference on Machine Learning. 2013.
[11] Hoffman, M., D. Blei, F. Bach. Online inference for latent Dirichlet allocation. In Neural Information Processing Systems. 2010.
[12] Teh, Y., M. Jordan, M. Beal, et al. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2007.
[13] Wang, C., J. Paisley, D. Blei. Online variational inference for the hierarchical Dirichlet process. In International Conference on Artificial Intelligence and Statistics. 2011.
[14] Seung, D., L. Lee. Algorithms for non-negative matrix factorization. In Neural Information Processing Systems. 2001.
[15] Bishop, C. Pattern Recognition and Machine Learning. Springer New York, 2006.
[16] Jaakkola, T., M. Jordan. A variational approach to Bayesian logistic regression models and their extensions. In International Workshop on Artificial Intelligence and Statistics. 1996.
[17] Blei, D., A. Ng, M. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
[18] Blei, D., J. Lafferty. Topic models. In A. Srivastava, M. Sahami, eds., Text Mining: Theory and Applications. Taylor and Francis, 2009.
[19] Jordan, M., Z. Ghahramani, T. Jaakkola, et al. Introduction to variational methods for graphical models. Machine Learning, 37:183–233, 1999.
[20] Amari, S. Natural gradient works efficiently in learning. Neural computation, 10(2):251–276, 1998.
[21] Asuncion, A., M. Welling, P. Smyth, et al. On smoothing and inference for topic models. In Uncertainty in Artificial Intelligence. 2009.
[22] Hoffman, M., D. Blei, C. Wang, et al. Stochastic Variational Inference. Journal of Machine Learning Research, 2013.