NIPS 2010, paper 70
Authors: Armand Joulin, Jean Ponce, Francis R. Bach
Abstract: Dimensionality reduction is commonly used in the setting of multi-label supervised classification to control the learning capacity and to provide a meaningful representation of the data. We introduce a simple forward probabilistic model which is a multinomial extension of reduced rank regression, and show that this model provides a probabilistic interpretation of discriminative clustering methods, with added benefits in terms of the number of hyperparameters and of optimization. While the expectation-maximization (EM) algorithm is commonly used to learn these probabilistic models, it usually leads to local maxima because it relies on a non-convex cost function. To avoid this problem, we introduce a local approximation of this cost function, which in turn leads to a quadratic non-convex optimization problem over a product of simplices. To maximize these quadratic functions, we propose an efficient algorithm based on convex relaxations and low-rank representations of the data, capable of handling large-scale problems. Experiments on text document classification show that the new model outperforms other supervised dimensionality reduction methods, while simulations on unsupervised clustering show that our probabilistic formulation has better properties than existing discriminative clustering methods.
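To make the geometry concrete: optimizing a quadratic over a product of simplices means each block of variables must stay non-negative and sum to one, so the basic building block of any first-order scheme is Euclidean projection onto the simplex. The sketch below is a generic illustration of that subproblem, assuming a standard sort-based projection and a plain projected-gradient ascent loop on a single simplex; it is not the paper's convex relaxation, and the function names are hypothetical. Note that maximizing a quadratic over a simplex (a "standard QP") is NP-hard in general, so local ascent of this kind only finds a stationary point.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1},
    using the classical sort-and-threshold algorithm (O(n log n))."""
    u = np.sort(v)[::-1]                    # sort descending
    css = np.cumsum(u) - 1.0                # cumulative sums minus the target total
    ind = np.arange(1, len(v) + 1)
    cond = u - css / ind > 0                # coordinates kept positive
    rho = ind[cond][-1]                     # last index satisfying the condition
    theta = css[cond][-1] / rho             # shift that enforces sum-to-one
    return np.maximum(v - theta, 0.0)

def local_maximize_quadratic(A, x0, step=0.05, iters=500):
    """Projected gradient ascent on x -> x^T A x over the simplex.
    A local heuristic only: non-convex maximization, no global guarantee."""
    x = project_simplex(np.asarray(x0, dtype=float))
    sym = A + A.T                           # gradient of x^T A x
    for _ in range(iters):
        x = project_simplex(x + step * sym @ x)
    return x

# With A = I, the maximizers are the simplex vertices; ascent from an
# interior point drifts to the vertex of the largest initial coordinate.
x = local_maximize_quadratic(np.eye(3), [0.6, 0.3, 0.1])
```

For a product of K simplices, as in the abstract's formulation, the same projection is simply applied row-wise to a K-row matrix of variables after each gradient step.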