nips2012-300-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Suvrit Sra
Abstract: We study a class of large-scale, nonsmooth, and nonconvex optimization problems. In particular, we focus on nonconvex problems with composite objectives. This class includes the extensively studied class of convex composite objective problems as a subclass. To solve composite nonconvex problems we introduce a powerful new framework based on asymptotically nonvanishing errors, avoiding the common stronger assumption of vanishing errors. Within our new framework we derive both batch and incremental proximal splitting algorithms. To our knowledge, our work is the first to develop and analyze incremental nonconvex proximal-splitting algorithms, even if we were to disregard the ability to handle nonvanishing errors. We illustrate one instance of our general framework by showing an application to large-scale nonsmooth matrix factorization.
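The abstract refers to proximal splitting for composite objectives of the form min_x f(x) + g(x), where f is smooth (possibly nonconvex) and g is nonsmooth but has a cheap proximity operator. As a minimal illustrative sketch (not the paper's specific algorithm or instantiation), a batch forward-backward step with an l1 regularizer might look as follows in Python; the l1 choice of g, the step-size rule, and all names here are assumptions for illustration:

import numpy as np

def soft_threshold(v, t):
    # Proximity operator of t * ||.||_1 (soft-thresholding).
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_splitting(grad_f, x0, step=1e-2, reg=0.1, iters=500):
    # Forward-backward iteration: x <- prox_{step*g}(x - step * grad_f(x)).
    # Here g(x) = reg * ||x||_1, so the prox is soft-thresholding.
    x = x0.copy()
    for _ in range(iters):
        x = soft_threshold(x - step * grad_f(x), step * reg)
    return x

# Toy example (assumed, not from the paper): sparse least squares,
# f(x) = 0.5 * ||A x - b||^2, with step 1/L where L = ||A||_2^2 is the
# Lipschitz constant of grad f.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
b = rng.standard_normal(50)
x_hat = prox_splitting(lambda x: A.T @ (A @ x - b),
                       x0=np.zeros(100),
                       step=1.0 / np.linalg.norm(A, 2) ** 2)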
[1] H. Attouch, J. Bolte, and B. F. Svaiter. Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward-backward splitting, and regularized Gauss-Seidel methods. Mathematical Programming, Series A, Aug. 2011. Online First.
[2] F. Bach, R. Jenatton, J. Mairal, and G. Obozinski. Convex optimization with sparsity-inducing norms. In S. Sra, S. Nowozin, and S. J. Wright, editors, Optimization for Machine Learning. MIT Press, 2011.
[3] A. Beck and M. Teboulle. A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems. SIAM J. Imaging Sciences, 2(1):183–202, 2009.
[4] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, second edition, 1999.
[5] D. P. Bertsekas. Incremental Gradient, Subgradient, and Proximal Methods for Convex Optimization: A Survey. Technical Report LIDS-P-2848, MIT, August 2010.
[6] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, March 2004.
[7] F. H. Clarke. Optimization and nonsmooth analysis. John Wiley & Sons, Inc., 1983.
[8] P. L. Combettes and J.-C. Pesquet. Proximal Splitting Methods in Signal Processing. arXiv:0912.3522v4, May 2010.
[9] P. L. Combettes and V. R. Wajs. Signal recovery by proximal forward-backward splitting. Multiscale Modeling and Simulation, 4(4):1168–1200, 2005.
[10] T. A. Davis and Y. Hu. The University of Florida Sparse Matrix Collection. ACM Transactions on Mathematical Software, 2011. To appear.
[11] J. Duchi and Y. Singer. Online and Batch Learning using Forward-Backward Splitting. J. Mach. Learning Res. (JMLR), Sep. 2009.
[12] Y. M. Ermoliev and V. I. Norkin. Stochastic generalized gradient method for nonconvex nonsmooth stochastic optimization. Cybernetics and Systems Analysis, 34:196–215, 1998.
[13] M. A. T. Figueiredo, R. D. Nowak, and S. J. Wright. Gradient Projection for Sparse Reconstruction: Application to Compressed Sensing and Other Inverse Problems. IEEE J. Selected Topics in Sig. Proc., 1(4):586–597, 2007.
[14] M. Fukushima and H. Mine. A generalized proximal point algorithm for certain non-convex minimization problems. Int. J. Systems Science, 12(8):989–1000, 1981.
[15] E. M. Gafni and D. P. Bertsekas. Two-metric projection methods for constrained optimization. SIAM Journal on Control and Optimization, 22(6):936–964, 1984.
[16] A. A. Gaivoronski. Convergence properties of backpropagation for neural nets via theory of stochastic gradient methods. Part 1. Optimization Methods and Software, 4(2):117–134, 1994.
[17] S. Haykin. Neural Networks: A Comprehensive Foundation. Prentice Hall PTR, 1st edition, 1994.
[18] K. Kreutz-Delgado, J. F. Murray, B. D. Rao, K. Engan, T.-W. Lee, and T. J. Sejnowski. Dictionary learning algorithms for sparse representation. Neural Computation, 15:349–396, 2003.
[19] D. Kundur and D. Hatzinakos. Blind image deconvolution. IEEE Signal Processing Magazine, 13(3), May 1996.
[20] D. D. Lee and H. S. Seung. Algorithms for Nonnegative Matrix Factorization. In NIPS, 2000.
[21] K.C. Lee, J. Ho, and D. Kriegman. Acquiring linear subspaces for face recognition under variable lighting. IEEE Trans. Pattern Anal. Mach. Intelligence, 27(5):684–698, 2005.
[22] J. Liu and J. Ye. Efficient Euclidean projections in linear time. In ICML, Jun. 2009.
[23] J. Mairal, F. Bach, J. Ponce, and G. Sapiro. Online Learning for Matrix Factorization and Sparse Coding. JMLR, 11:19–60, 2010.
[24] J. J. Moreau. Fonctions convexes duales et points proximaux dans un espace hilbertien. C. R. Acad. Sci. Paris Sér. A Math., 255:2897–2899, 1962.
[25] Y. Nesterov. Introductory Lectures on Convex Optimization: A Basic Course. Springer, 2004.
[26] Y. Nesterov. Gradient methods for minimizing composite objective function. Technical Report 2007/76, Université catholique de Louvain, September 2007.
[27] R. T. Rockafellar. Monotone operators and the proximal point algorithm. SIAM J. Control and Optimization, 14, 1976.
[28] R. T. Rockafellar and R. J.-B. Wets. Variational analysis. Springer, 1998.
[29] M. V. Solodov. Convergence analysis of perturbed feasible descent methods. J. Optimization Theory and Applications, 93(2):337–353, 1997.
[30] M. V. Solodov. Incremental gradient algorithms with stepsizes bounded away from zero. Computational Optimization and Applications, 11:23–35, 1998.
[31] M. V. Solodov and S. K. Zavriev. Error stability properties of generalized gradient-type algorithms. J. Optimization Theory and Applications, 98(3):663–680, 1998.
[32] S. Sra. Nonconvex proximal-splitting: Batch and incremental algorithms. arXiv:1109.0258v2, Sep. 2012.
[33] K.-K. Sung. Learning and Example Selection for Object and Pattern Recognition. PhD thesis, MIT, 1996.
[34] L. Xiao. Dual averaging method for regularized stochastic learning and online optimization. In NIPS, 2009.