Source: pdf
Author: Martin O. Larsson, Johan Ugander
Abstract: Latent variable mixture models are a powerful tool for exploring the structure in large datasets. A common challenge for interpreting such models is a desire to impose sparsity, the natural assumption that each data point contains only a few latent features. Since mixture distributions are constrained in their L1 norm, typical sparsity techniques based on L1 regularization become toothless, and concave regularization becomes necessary. Unfortunately, concave regularization typically results in EM algorithms that must perform problematic non-concave M-step maximizations. In this work, we introduce a technique for circumventing this difficulty, using the so-called Mountain Pass Theorem to provide easily verifiable conditions under which the M-step is well-behaved despite the lack of concavity. We also develop a correspondence between logarithmic regularization and what we term the pseudo-Dirichlet distribution, a generalization of the ordinary Dirichlet distribution well-suited for inducing sparsity. We demonstrate our approach on a text corpus, inferring a sparse topic mixture model for 2,406 weblogs.
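A short worked equation makes the abstract's argument concrete (a sketch based only on the abstract; the shifted density below and the symbol $\epsilon$ are assumptions about the pseudo-Dirichlet form, not quoted from the paper body). Because mixture weights $\theta = (\theta_1, \dots, \theta_K)$ live on the probability simplex, an L1 penalty is constant and therefore cannot prefer sparse solutions:

$$\|\theta\|_1 = \sum_{k=1}^{K} |\theta_k| = \sum_{k=1}^{K} \theta_k = 1 \qquad \text{for every } \theta \text{ in the simplex}.$$

A concave logarithmic penalty avoids this degeneracy. One natural density realizing the abstract's log-regularization correspondence, assuming a shift parameter $\epsilon > 0$, is

$$p(\theta) \;\propto\; \prod_{k=1}^{K} (\theta_k + \epsilon)^{\alpha_k - 1},$$

whose log-prior contributes $\sum_{k} (\alpha_k - 1)\log(\theta_k + \epsilon)$ to the M-step objective. For $\alpha_k < 1$ this term rewards weights near zero, inducing sparsity, yet remains finite at $\theta_k = 0$, unlike the ordinary Dirichlet ($\epsilon = 0$) with $\alpha_k < 1$; the cost is that the M-step objective is no longer concave, which is where the Mountain Pass Theorem conditions come in.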
[1] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine Learning, 42:177–196, 2001.
[2] D.M. Blei, A.Y. Ng, and M.I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
[3] A. Bosch, A. Zisserman, and X. Munoz. Scene Classification via pLSA. In European Conference on Computer Vision, 2006.
[4] I. Psorakis and B. Sheldon. Soft Partitioning in Networks via Bayesian Non-negative Matrix Factorization. In NIPS, 2010.
[5] C. Ding, T. Li, and W. Peng. Nonnegative matrix factorization and probabilistic latent semantic indexing: Equivalence, chi-square statistic, and a hybrid method. In Proceedings of AAAI ’06, volume 21, page 342, 2006.
[6] E. Gaussier and C. Goutte. Relation between PLSA and NMF and implications. In Proceedings of ACM SIGIR, pages 601–602. ACM, 2005.
[7] A. Asuncion, M. Welling, P. Smyth, and Y.W. Teh. On smoothing and inference for topic models. In Proc. of the 25th Conference on Uncertainty in Artificial Intelligence, pages 27–34, 2009.
[8] A. Gelman. Bayesian data analysis. CRC Press, 2004.
[9] R. Courant. Dirichlet’s principle, conformal mapping, and minimal surfaces. Interscience, New York, 1950.
[10] Y. Jabri. The Mountain Pass Theorem: Variants, Generalizations and Some Applications. Cambridge University Press, 2003.
[11] J. Schler, M. Koppel, S. Argamon, and J. Pennebaker. Effects of age and gender on blogging. In Proc. of the AAAI Spring Symposium on Computational Approaches for Analyzing Weblogs, pages 191–197, 2006.
[12] S. Bird, E. Klein, and E. Loper. Natural language processing with Python. O’Reilly Media, 2009.
[13] E.J. Candès, M.B. Wakin, and S.P. Boyd. Enhancing sparsity by reweighted ℓ1 minimization. Journal of Fourier Analysis and Applications, 14:877–905, 2008.