Online Learning in the Embedded Manifold of Low-Rank Matrices (JMLR, 2012)
Authors: Uri Shalit, Daphna Weinshall, Gal Chechik
Abstract: When learning models that are represented in matrix form, enforcing a low-rank constraint can dramatically improve the memory and run-time complexity, while providing a natural regularization of the model. However, naive approaches to minimizing functions over the set of low-rank matrices are either prohibitively time consuming (repeated singular value decomposition of the matrix) or numerically unstable (optimizing a factored representation of the low-rank matrix). We build on recent advances in optimization over manifolds, and describe an iterative online learning procedure, consisting of a gradient step, followed by a second-order retraction back to the manifold. While the ideal retraction is costly to compute, and so is the projection operator that approximates it, we describe another retraction that can be computed efficiently. It has run-time and memory complexity of O((n + m)k) for a rank-k matrix of dimension m × n, when using an online procedure with rank-one gradients. We use this algorithm, LORETA, to learn a matrix-form similarity measure over pairs of documents represented as high-dimensional vectors. LORETA improves the mean average precision over a passive-aggressive approach in a factorized model, and also improves over a full model trained on pre-selected features using the same memory requirements. We further adapt LORETA to learn positive semi-definite low-rank matrices, providing an online algorithm for low-rank metric learning. LORETA also shows consistent improvement over standard weakly supervised methods in a large multi-label image classification task (1600 classes and 1 million images, using ImageNet).
Keywords: low rank, Riemannian manifolds, metric learning, retractions, multitask learning, online learning
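To make the procedure concrete, here is a minimal NumPy sketch of the generic iteration the abstract describes: a rank-one gradient step on a bilinear similarity score, followed by a retraction back to the manifold of rank-k matrices. The SVD-based retraction and the hinge-style loss are illustrative stand-ins of my own (the function names, the margin of 1, and the learning rate lr are assumptions, not the paper's notation). LORETA's contribution is precisely to replace this dense truncated SVD, which costs O(nm·min(n, m)), with an efficiently computable second-order retraction on a factored representation, at O((n + m)k) per step.

    import numpy as np

    def svd_retraction(W, k):
        # Projection-like retraction: map W to the nearest rank-k matrix.
        # This dense truncated SVD is the costly step LORETA avoids.
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        return (U[:, :k] * s[:k]) @ Vt[:k]

    def online_step(W, x, y, label, k, lr=0.1):
        # One online update of a bilinear similarity S(x, y) = x^T W y.
        # A margin violation yields a rank-one gradient: the outer product x y^T.
        if label * (x @ W @ y) < 1.0:
            W = W + lr * label * np.outer(x, y)   # gradient step; leaves the manifold
        return svd_retraction(W, k)               # retract back to rank k

Each call returns a matrix of rank at most k; an online procedure of this shape, with the retraction carried out on the factors of W rather than on the full matrix, is what keeps LORETA's per-step cost linear in n + m.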
Algorithm 7: Rank-one pseudo-inverse update (following Meyer, 1973)
Input: a full-rank matrix A ∈ R_*^{n×k} and its pseudo-inverse A† ∈ R^{k×n}; vectors c ∈ R^{n×1}, d ∈ R^{k×1}.
Output: the matrix Z† ∈ R^{k×n}, the pseudo-inverse of A + cd^T.

Compute (matrix dimensions at right):
    v = A†c                                                      k×1
    β = 1 + d^T v                                                1×1
    n = A†^T d                                                   n×1
    n̂ = A†n                                                      k×1
    w = c − Av                                                   n×1
if β ≠ 0 and w ≠ 0:
    Ĝ = (1/β) n̂ w^T                                              k×n
    s = β / (‖w‖²‖n‖² + β²)                                      1×1
    t = (‖w‖²/β) n̂ + v                                           k×1
    G = Ĝ − s · t ((‖n‖²/β) w + n)^T                             k×n
elseif β = 0 and w ≠ 0:
    G = −(1/‖n‖²) n̂ n^T − (1/‖w‖²) v w^T                         k×n
elseif β ≠ 0 and w = 0:
    G = −(1/β) v n^T                                             k×n
elseif β = 0 and w = 0:
    G = ((v^T A† n)/(‖v‖²‖n‖²)) v n^T − (1/‖v‖²) v (v^T A†) − (1/‖n‖²) (A†n) n^T    k×n
endif
Z† = A† + G                                                      k×n

(A NumPy sketch of this update appears after the reference list below.)

References

P.-A. Absil and J. Malick. Projection-like retractions on matrix manifolds. Technical Report UCL-INMA-2010.038, Department of Mathematical Engineering, Université catholique de Louvain, July 2010.
P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, 2008.
B. Bai, J. Weston, R. Collobert, and D. Grangier. Supervised semantic indexing. Advances in Information Retrieval, pages 761–765, 2009.
A. Bar-Hillel, T. Hertz, N. Shental, and D. Weinshall. Learning a Mahalanobis metric from equivalence constraints. Journal of Machine Learning Research, 6(1):937–965, 2006.
J. Briët, F.M. de Oliveira Filho, and F. Vallentin. The Grothendieck problem with rank constraint. In Proceedings of the 19th International Symposium on Mathematical Theory of Networks and Systems (MTNS), 2010.
G. Chechik, V. Sharma, U. Shalit, and S. Bengio. Large scale online learning of image similarity through ranking. Journal of Machine Learning Research, 11:1109–1135, 2010.
K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer. Online passive-aggressive algorithms. Journal of Machine Learning Research, 7:551–585, 2006.
J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In Proceedings of the 22nd IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255, 2009.
J. Deng, A. Berg, and L. Fei-Fei. Hierarchical semantic indexing for large scale image retrieval. In Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition, pages 785–792, 2011.
M.P. do Carmo. Riemannian Geometry. Birkhäuser, 1992.
L. Eldén and B. Savas. A Newton–Grassmann method for computing the best multilinear rank-(r1, r2, r3) approximation of a tensor. SIAM Journal on Matrix Analysis and Applications, 31(2):248–271, 2009.
M. Fazel. Matrix Rank Minimization with Applications. PhD thesis, Electrical Engineering Department, Stanford University, 2002.
M. Fazel, H. Hindi, and S. Boyd. Rank minimization and applications in system theory. In Proceedings of the 2004 American Control Conference, pages 3273–3278. IEEE, 2005.
A. Globerson and S. Roweis. Metric learning by collapsing classes. In Advances in Neural Information Processing Systems, volume 18, page 451, 2006.
J. Goldberger, S. Roweis, G. Hinton, and R. Salakhutdinov. Neighbourhood components analysis. In Advances in Neural Information Processing Systems, volume 17, 2005.
D. Grangier and S. Bengio. A discriminative kernel-based model to rank images from text queries. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30:1371–1384, 2008.
M. Ishteva, L. De Lathauwer, P.-A. Absil, and S. Van Huffel. Best low multilinear rank approximation of higher-order tensors, based on the Riemannian trust-region scheme. SIAM Journal on Matrix Analysis and Applications, 32(1):115–132, 2011.
P. Jain, B. Kulis, I.S. Dhillon, and K. Grauman. Online metric learning and fast similarity search. In Advances in Neural Information Processing Systems, volume 20, pages 761–768, 2008.
P. Jain, R. Meka, and I. Dhillon. Guaranteed rank minimization via singular value projection. In Advances in Neural Information Processing Systems, volume 24, pages 937–945, 2011.
M. Journée, F. Bach, P.-A. Absil, and R. Sepulchre. Low-rank optimization on the cone of positive semidefinite matrices. SIAM Journal on Optimization, 20(5):2327–2351, 2010a.
M. Journée, Y. Nesterov, P. Richtárik, and R. Sepulchre. Generalized power method for sparse principal component analysis. Journal of Machine Learning Research, 11:517–553, 2010b.
S.M. Kakade, S. Shalev-Shwartz, and A. Tewari. Regularization techniques for learning with matrices, 2010. arXiv preprint arXiv:0910.0610v2.
R.H. Keshavan, A. Montanari, and S. Oh. Matrix completion from noisy entries. Journal of Machine Learning Research, 11:2057–2078, 2010.
B. Kulis, M.A. Sustik, and I.S. Dhillon. Low-rank kernel learning with Bregman matrix divergences. Journal of Machine Learning Research, 10:341–376, 2009.
K. Lang. Learning to filter netnews. In Proceedings of the 12th International Conference on Machine Learning, pages 331–339, 1995.
C.D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, Cambridge, UK, 2008.
R. Meka, P. Jain, C. Caramanis, and I.S. Dhillon. Rank minimization via online learning. In Proceedings of the 25th International Conference on Machine Learning, pages 656–663, 2008.
C.D. Meyer. Generalized inversion of modified matrices. SIAM Journal on Applied Mathematics, 24(3):315–323, 1973.
G. Meyer, S. Bonnabel, and R. Sepulchre. Regression on fixed-rank positive semidefinite matrices: a Riemannian approach. Journal of Machine Learning Research, 12:593–625, 2011.
B.K. Natarajan. Sparse approximate solutions to linear systems. SIAM Journal on Computing, 24(2):227–234, 1995.
S. Negahban and M.J. Wainwright. Estimation of (near) low-rank matrices with noise and high-dimensional scaling. In Proceedings of the 27th International Conference on Machine Learning, pages 823–830, 2010.
C. Oberlin and S.J. Wright. Active set identification in nonlinear programming. SIAM Journal on Optimization, 17(2):577–605, 2007.
K.B. Petersen and M.S. Pedersen. The Matrix Cookbook, October 2008.
B. Recht, M. Fazel, and P.A. Parrilo. Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3):471–501, 2010.
S. Shalev-Shwartz, Y. Singer, and A.Y. Ng. Online and batch learning of pseudo-metrics. In Proceedings of the Twenty-First International Conference on Machine Learning, page 94. ACM, 2004.
U. Shalit, D. Weinshall, and G. Chechik. Online learning in the manifold of low-rank matrices. In Advances in Neural Information Processing Systems 23, pages 2128–2136. MIT Press, 2010.
B. Vandereycken and S. Vandewalle. A Riemannian optimization approach for computing low-rank solutions of Lyapunov equations. SIAM Journal on Matrix Analysis and Applications, 31(5):2553–2579, 2010.
B. Vandereycken, P.-A. Absil, and S. Vandewalle. Embedded geometry of the set of symmetric positive semidefinite matrices of fixed rank. In Proceedings of the IEEE/SP 15th Workshop on Statistical Signal Processing (SSP'09), pages 389–392. IEEE, 2009.
K.Q. Weinberger and L.K. Saul. Distance metric learning for large margin nearest neighbor classification. Journal of Machine Learning Research, 10:207–244, 2009.
J. Weston, S. Bengio, and N. Usunier. Wsabie: Scaling up to large vocabulary image annotation. In Proceedings of the 22nd International Joint Conference on Artificial Intelligence (IJCAI-11), 2011.
E.P. Xing, A.Y. Ng, M.I. Jordan, and S. Russell. Distance metric learning, with application to clustering with side-information. In Advances in Neural Information Processing Systems, volume 15, pages 505–512. MIT Press, 2002.
L. Yang. An overview of distance metric learning. Technical report, School of Computer Science, Carnegie Mellon University, 2007.
Y. Yang and J.O. Pedersen. A comparative study on feature selection in text categorization. In Proceedings of the 14th International Conference on Machine Learning, pages 412–420, 1997.
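Because the case analysis in Algorithm 7 above is easy to get wrong, the following NumPy transcription may be useful. It is a sketch of my own (the function name, the tolerance tol, and the final self-check are assumptions, not code from the paper), and it can be validated directly against numpy.linalg.pinv.

    import numpy as np

    def rank_one_pinv_update(A, A_pinv, c, d, tol=1e-12):
        # Pseudo-inverse of A + c d^T, given A (n x k, full rank k) and its
        # pseudo-inverse A_pinv, following the case analysis of Algorithm 7.
        v = A_pinv @ c                  # k-vector
        beta = 1.0 + d @ v              # scalar
        n = A_pinv.T @ d                # n-vector
        n_hat = A_pinv @ n              # k-vector
        w = c - A @ v                   # n-vector: part of c outside range(A)
        nn, ww = n @ n, w @ w           # squared norms ||n||^2, ||w||^2
        if abs(beta) > tol and ww > tol:
            s = beta / (ww * nn + beta ** 2)
            t = (ww / beta) * n_hat + v
            q = (nn / beta) * w + n
            G = np.outer(n_hat, w) / beta - s * np.outer(t, q)
        elif ww > tol:                  # beta = 0 and w != 0
            G = -np.outer(n_hat, n) / nn - np.outer(v, w) / ww
        elif abs(beta) > tol:           # beta != 0 and w = 0
            G = -np.outer(v, n) / beta
        else:                           # beta = 0 and w = 0
            vv = v @ v
            G = ((v @ n_hat) / (vv * nn)) * np.outer(v, n) \
                - np.outer(v, v @ A_pinv) / vv - np.outer(n_hat, n) / nn
        return A_pinv + G

    # Sanity check against a direct pseudo-inverse computation:
    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 3))
    c, d = rng.standard_normal(8), rng.standard_normal(3)
    Z_pinv = rank_one_pinv_update(A, np.linalg.pinv(A), c, d)
    assert np.allclose(Z_pinv, np.linalg.pinv(A + np.outer(c, d)))

On a full-rank A the four branches cover exactly the cases of Meyer (1973) for d in the row space of A, a condition that always holds here because A has full rank k.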