nips nips2010 nips2010-195 nips2010-195-reference knowledge-graph by maker-knowledge-mining
Author: Uri Shalit, Daphna Weinshall, Gal Chechik
Abstract: When learning models that are represented in matrix form, enforcing a low-rank constraint can dramatically improve the memory and run time complexity, while providing a natural regularization of the model. However, naive approaches for minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low-rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is hard to compute, and so is the projection operator that approximates it, we describe another second-order retraction that can be computed efficiently, with run time and memory complexity of O((n + m)k) for a rank-k matrix of dimension m × n, given rank-one gradients. We use this algorithm, LORETA, to learn a matrix-form similarity measure over pairs of documents represented as high-dimensional vectors. LORETA improves the mean average precision over a passive-aggressive approach in a factorized model, and also improves over a full model trained over pre-selected features using the same memory requirements. LORETA also showed consistent improvement over standard methods in a large (1600 classes) multi-label image classification task.
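To make the "gradient step, then return to the low-rank set" pattern concrete, the following is a minimal NumPy sketch of the naive baseline that the abstract describes as prohibitively time consuming: a rank-one gradient step followed by projection onto rank-k matrices with a truncated SVD. It is not LORETA's efficient retraction, which the abstract does not spell out. The function names, the step size, and the choice of x y^T as the rank-one gradient are illustrative assumptions.

```python
import numpy as np


def project_to_rank_k(W, k):
    """Closest rank-k matrix to W in Frobenius norm, via truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep the k largest singular values and their singular vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def naive_low_rank_step(W, x, y, step_size, k):
    """One online update: rank-one gradient step, then projection to rank k.

    Assumption for illustration: the gradient of the loss at W is the
    rank-one matrix x y^T. Each call costs a full SVD, which is the
    expense the abstract's retraction is meant to avoid.
    """
    W_stepped = W - step_size * np.outer(x, y)  # rank can grow to k + 1
    return project_to_rank_k(W_stepped, k)


# Toy usage with hypothetical sizes.
rng = np.random.default_rng(0)
n_rows, n_cols, k = 6, 5, 2
W = rng.standard_normal((n_rows, k)) @ rng.standard_normal((k, n_cols))
x = rng.standard_normal(n_rows)
y = rng.standard_normal(n_cols)
W_next = naive_low_rank_step(W, x, y, step_size=0.1, k=k)
```

Storing W as an explicit n_rows × n_cols array, as done here for readability, also gives up the memory savings the abstract reports; a practical implementation would presumably keep only rank-k factors.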
[1] B.K. Natarajan. Sparse approximate solutions to linear systems. SIAM Journal on Computing, 24(2):227–234, 1995.
[2] M. Fazel, H. Hindi, and S. Boyd. Rank minimization and applications in system theory. In Proceedings of the 2004 American Control Conference, pages 3273–3278. IEEE, 2005.
[3] B. Bai, J. Weston, R. Collobert, and D. Grangier. Supervised semantic indexing. Advances in Information Retrieval, pages 761–765, 2009.
[4] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton Univ Press, 2008.
[5] P.-A. Absil and Jérôme Malick. Projection-like retractions on matrix manifolds. Technical Report UCL-INMA-2010.038, Department of Mathematical Engineering, Université catholique de Louvain, July 2010.
[6] B. Vandereycken and S. Vandewalle. A Riemannian optimization approach for computing low-rank solutions of Lyapunov equations. SIAM Journal on Matrix Analysis and Applications, 31:2553, 2010.
[7] D. Grangier and S. Bengio. A discriminative kernel-based model to rank images from text queries. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30:1371–1384, 2008.
[8] G. Chechik, V. Sharma, U. Shalit, and S. Bengio. Large scale online learning of image similarity through ranking. Journal of Machine Learning Research, 11:1109–1135, 2010.
[9] C.D. Meyer. Generalized inversion of modified matrices. SIAM Journal on Applied Mathematics, 24(3):315–323, 1973.
[10] M. Journée, F. Bach, P.-A. Absil, and R. Sepulchre. Low-rank optimization on the cone of positive semidefinite matrices. SIAM Journal on Optimization, 20:2327–2351, 2010.
[11] M. Fazel. Matrix rank minimization with applications. PhD thesis, Electrical Engineering Department, Stanford University, 2002.
[12] Benjamin Recht, Maryam Fazel, and Pablo A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.
[13] B. Kulis, M.A. Sustik, and I.S. Dhillon. Low-rank kernel learning with Bregman matrix divergences. The Journal of Machine Learning Research, 10:341–376, 2009.
[14] R. Meka, P. Jain, C. Caramanis, and I.S. Dhillon. Rank minimization via online learning. In Proceedings of the 25th International Conference on Machine Learning, pages 656–663, 2008.
[15] K. Lang. Learning to filter netnews. In Proceedings of the 12th International Conference on Machine Learning, pages 331–339, 1995.
[16] J. Deng, W. Dong, R. Socher, L.J. Li, K. Li, and L. Fei-Fei. ImageNet: a large-scale hierarchical image database. In Proceedings of the 22nd IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[17] L. Yang. An overview of distance metric learning. Technical report, School of Computer Science, Carnegie Mellon University, 2007.
[18] Y. Yang and J.O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the 14th International Conference on Machine Learning, pages 412–420, 1997.
[19] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7:551–585, 2006.
[20] P. Jain, B. Kulis, I.S. Dhillon, and K. Grauman. Online metric learning and fast similarity search. Advances in Neural Information Processing Systems, pages 761–768, 2008.
[21] Sham M. Kakade, Shai Shalev-Shwartz, and Ambuj Tewari. Regularization techniques for learning with matrices, 2010. Preprint.