nips2007-181 reference knowledge-graph by maker-knowledge-mining
Source: pdf
Authors: Madhusudana Shashanka, Bhiksha Raj, Paris Smaragdis
Abstract: An important problem in many fields is the analysis of counts data to extract meaningful latent components. Methods like Probabilistic Latent Semantic Analysis (PLSA) and Latent Dirichlet Allocation (LDA) have been proposed for this purpose. However, they are limited in the number of components they can extract and lack an explicit provision to control the “expressiveness” of the extracted components. In this paper, we present a learning formulation to address these limitations by employing the notion of sparsity. We start with the PLSA framework and use an entropic prior in a maximum a posteriori formulation to enforce sparsity. We show that this allows the extraction of overcomplete sets of latent components which better characterize the data. We present experimental evidence of the utility of such representations.
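The abstract describes the formulation only at a high level. As a rough illustration of the kind of machinery involved, the snippet below is a minimal NumPy/SciPy sketch (not the authors' code) of PLSA-style EM on a counts matrix in which the M-step for the per-document mixture weights is replaced by a MAP update under Brand's entropic prior [3], solved with the Lambert W function [4]. The function names, the sparsity parameter `beta`, the bisection over the Lagrange multiplier, and all numerical safeguards are assumptions of this sketch, not details taken from the paper.

```python
import numpy as np
from scipy.special import lambertw


def entropic_map(omega, beta):
    """MAP estimate of a multinomial theta from expected counts omega under
    Brand's entropic prior p(theta) ~ exp(beta * sum_i theta_i * log(theta_i));
    beta > 0 favours low-entropy (sparse) distributions."""
    omega = np.asarray(omega, dtype=float)
    if beta <= 0 or beta >= omega.max():
        # No sparsity requested, or beta too aggressive for this sketch:
        # fall back to the ordinary maximum-likelihood M-step.
        return omega / omega.sum()
    pos = omega > 0
    w = omega[pos]
    theta = np.zeros_like(omega)

    def theta_of(lam):
        # Stationarity  omega_i/theta_i + beta*(1 + log theta_i) + lam = 0  solves to
        # theta_i = -omega_i / (beta * W_{-1}(-(omega_i/beta) * exp(1 + lam/beta))),
        # using the lower real branch of the Lambert W function when beta > 0.
        z = np.maximum(-(w / beta) * np.exp(1.0 + lam / beta), -np.exp(-1.0))
        return -w / (beta * lambertw(z, k=-1).real)

    # sum(theta_of(lam)) increases with lam, so bisect the Lagrange multiplier
    # until the normalisation constraint sum(theta) = 1 is met.
    lam_hi = np.min(-beta * (2.0 + np.log(w / beta)))  # largest lam with a real W_{-1}
    lam_lo = lam_hi + beta - 2.0 * w.sum()             # loose bound where sum(theta) < 1
    for _ in range(80):
        lam = 0.5 * (lam_lo + lam_hi)
        if theta_of(lam).sum() > 1.0:
            lam_hi = lam
        else:
            lam_lo = lam
    t = theta_of(0.5 * (lam_lo + lam_hi))
    theta[pos] = t / t.sum()
    return theta


def sparse_plsa(V, n_topics, beta=0.0, n_iter=100, seed=0):
    """EM for the model P_t(f) = sum_z P_t(z) P(f|z) on a count matrix V (features x docs).
    beta = 0 gives ordinary PLSA; beta > 0 sparsifies the mixture weights P_t(z)."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    Pf_z = rng.random((F, n_topics))        # P(f|z), one column per latent component
    Pf_z /= Pf_z.sum(axis=0)
    Pz_t = rng.random((n_topics, T))        # P_t(z), one column per document
    Pz_t /= Pz_t.sum(axis=0)
    for _ in range(n_iter):
        R = V / np.maximum(Pf_z @ Pz_t, 1e-300)   # V(f,t) / P_t(f)
        acc_fz = Pf_z * (R @ Pz_t.T)              # expected counts sum_t V(f,t) P_t(z|f)
        acc_zt = Pz_t * (Pf_z.T @ R)              # expected counts sum_f V(f,t) P_t(z|f)
        Pf_z = acc_fz / np.maximum(acc_fz.sum(axis=0), 1e-300)
        Pz_t = np.column_stack([entropic_map(acc_zt[:, t], beta) for t in range(T)])
    return Pf_z, Pz_t
```

With `beta = 0` this reduces to ordinary PLSA [6] (equivalently, multiplicative NMF updates [8]); raising `beta` drives many of the weights P_t(z) toward zero, which is the mechanism the abstract appeals to for making overcomplete sets of components (more components than input dimensions) usable.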
[1] DM Blei and JD Lafferty. Correlated Topic Models. In NIPS, 2006.
[2] DM Blei, AY Ng, and MI Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022, 2003.
[3] ME Brand. Pattern Discovery via Entropy Minimization. In Uncertainty 99: AISTATS 99, 1999.
[4] RM Corless, GH Gonnet, DEG Hare, DJ Jeffrey, and DE Knuth. On the Lambert W Function. Advances in Computational Mathematics, 1996.
[5] DJ Field. What is the Goal of Sensory Coding? Neural Computation, 1994.
[6] T Hofmann. Unsupervised Learning by Probabilistic Latent Semantic Analysis. Machine Learning, 42:177–196, 2001.
[7] PO Hoyer. Non-negative Matrix Factorization with Sparseness Constraints. Journal of Machine Learning Research, 5, 2004.
[8] DD Lee and HS Seung. Algorithms for Non-negative Matrix Factorization. In NIPS, 2001.
[9] J Skilling. Classic Maximum Entropy. In J Skilling, editor, Maximum Entropy and Bayesian Methods. Kluwer Academic, 1989.