Source: pdf
Author: Martin O. Larsson, Johan Ugander
Abstract: Latent variable mixture models are a powerful tool for exploring the structure in large datasets. A common challenge for interpreting such models is a desire to impose sparsity, the natural assumption that each data point contains only a few latent features. Since mixture distributions are constrained in their L1 norm, typical sparsity techniques based on L1 regularization become toothless, and concave regularization becomes necessary. Unfortunately, concave regularization typically results in EM algorithms that must perform problematic non-concave M-step maximizations. In this work, we introduce a technique for circumventing this difficulty, using the so-called Mountain Pass Theorem to provide easily verifiable conditions under which the M-step is well-behaved despite the lack of concavity. We also develop a correspondence between logarithmic regularization and what we term the pseudo-Dirichlet distribution, a generalization of the ordinary Dirichlet distribution well-suited for inducing sparsity. We demonstrate our approach on a text corpus, inferring a sparse topic mixture model for 2,406 weblogs.
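A short worked equation makes the abstract's argument concrete (a sketch based only on the abstract; the shifted density below and the symbol $\epsilon$ are assumptions about the pseudo-Dirichlet form, not quoted from the paper body). Because mixture weights $\theta = (\theta_1, \dots, \theta_K)$ live on the probability simplex, an L1 penalty is constant and therefore cannot prefer sparse solutions:

$$\|\theta\|_1 = \sum_{k=1}^{K} |\theta_k| = \sum_{k=1}^{K} \theta_k = 1 \qquad \text{for every } \theta \text{ in the simplex}.$$

A concave logarithmic penalty avoids this degeneracy. One natural density realizing the abstract's log-regularization correspondence, assuming a shift parameter $\epsilon > 0$, is

$$p(\theta) \;\propto\; \prod_{k=1}^{K} (\theta_k + \epsilon)^{\alpha_k - 1},$$

whose log-prior contributes $\sum_{k} (\alpha_k - 1)\log(\theta_k + \epsilon)$ to the M-step objective. For $\alpha_k < 1$ this term rewards weights near zero, inducing sparsity, yet remains finite at $\theta_k = 0$, unlike the ordinary Dirichlet ($\epsilon = 0$) with $\alpha_k < 1$; the cost is that the M-step objective is no longer concave, which is where the Mountain Pass Theorem conditions come in.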
[1] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42:177–196, 2001.
[2] D.M. Blei, A.Y. Ng, and M.I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
[3] A. Bosch, A. Zisserman, and X. Munoz. Scene Classification via pLSA. In European Conference on Computer Vision, 2006.
[4] I. Psorakis and B. Sheldon. Soft Partitioning in Networks via Bayesian Non-negative Matrix Factorization. In NIPS, 2010.
[5] C. Ding, T. Li, and W. Peng. Nonnegative matrix factorization and probabilistic latent semantic indexing: Equivalence, chi-square statistic, and a hybrid method. In Proceedings of AAAI ’06, volume 21, page 342, 2006.
[6] E. Gaussier and C. Goutte. Relation between PLSA and NMF and implications. In Proceedings of ACM SIGIR, pages 601–602. ACM, 2005.
[7] A. Asuncion, M. Welling, P. Smyth, and Y.W. Teh. On smoothing and inference for topic models. In Proc. of the 25th Conference on Uncertainty in Artificial Intelligence, pages 27–34, 2009.
[8] A. Gelman. Bayesian data analysis. CRC Press, 2004.
[9] R. Courant. Dirichlet’s principle, conformal mapping, and minimal surfaces. Interscience, New York, 1950.
[10] Y. Jabri. The Mountain Pass Theorem: Variants, Generalizations and Some Applications. Cambridge University Press, 2003.
[11] J. Schler, M. Koppel, S. Argamon, and J. Pennebaker. Effects of age and gender on blogging. In Proc. of the AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs, pages 191–197, 2006.
[12] S. Bird, E. Klein, and E. Loper. Natural language processing with Python. O’Reilly Media, 2009.
[13] E.J. Candès, M.B. Wakin, and S.P. Boyd. Enhancing sparsity by reweighted ℓ1 minimization. Journal of Fourier Analysis and Applications, 14:877–905, 2008.