nips nips2012 nips2012-19 nips2012-19-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Anima Anandkumar, Yi-Kai Liu, Daniel J. Hsu, Dean P. Foster, Sham M. Kakade
Abstract: Topic modeling is a generalization of clustering that posits that observations (words in a document) are generated by multiple latent factors (topics), as opposed to just one. This increased representational power comes at the cost of a more challenging unsupervised learning problem of estimating the topic-word distributions when only words are observed, and the topics are hidden. This work provides a simple and efficient learning procedure that is guaranteed to recover the parameters for a wide class of topic models, including Latent Dirichlet Allocation (LDA). For LDA, the procedure correctly recovers both the topic-word distributions and the parameters of the Dirichlet prior over the topic mixtures, using only trigram statistics (i.e., third-order moments, which may be estimated with documents containing just three words). The method, called Excess Correlation Analysis, is based on a spectral decomposition of low-order moments via two singular value decompositions (SVDs). Moreover, the algorithm is scalable, since the SVDs are carried out only on k × k matrices, where k is the number of latent factors (topics) and is typically much smaller than the dimension of the observation (word) space.
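To make the two-SVD structure concrete, below is a minimal NumPy sketch of the linear-algebraic skeleton the abstract describes: whiten a (modified) second-order moment down to k dimensions, then diagonalize a whitened, randomly projected third-order moment. The function name `eca_sketch`, the inputs `pairs` and `triples`, and the plain d × d SVD used for whitening are illustrative assumptions rather than the paper's reference implementation; the actual Excess Correlation Analysis uses Dirichlet-corrected moments and arranges the computation so that only k × k SVDs are required (see [8] and [37] for the precise algorithm).

```python
import numpy as np

def eca_sketch(pairs, triples, k, seed=0):
    """Hypothetical illustration of the two-SVD skeleton described above.

    pairs   : (d, d)    estimate of a (modified) second-order moment
    triples : (d, d, d) estimate of a (modified) third-order moment
    k       : number of topics, with k << d
    Returns a (d, k) matrix whose columns span the recovered topic directions
    (up to permutation and scale).  The Dirichlet corrections that define the
    actual ECA moments are omitted here.
    """
    rng = np.random.default_rng(seed)
    d = pairs.shape[0]

    # SVD 1: rank-k whitening of the pairs matrix, so W.T @ pairs @ W ~ I_k.
    # (The paper avoids this d x d SVD via projection; kept here for clarity.)
    U, s, _ = np.linalg.svd(pairs)
    W = U[:, :k] / np.sqrt(s[:k])                       # (d, k)

    # Contract the third-order moment with a random direction eta, then
    # whiten: everything after this point lives in k dimensions.
    eta = rng.standard_normal(d)
    M = W.T @ np.einsum('ijk,k->ij', triples, eta) @ W  # (k, k)

    # SVD 2: decompose the small k x k matrix (symmetrized for stability);
    # mapping its eigenvectors back through the whitening recovers the
    # topic directions in the original d-dimensional word space.
    _, V = np.linalg.eigh((M + M.T) / 2)
    return np.linalg.pinv(W.T) @ V                      # (d, k)
```

The key point is that after whitening, all remaining work happens in the k-dimensional topic space, which is what keeps the method scalable when k is much smaller than the vocabulary size.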
[1] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. JMLR, 3:993–1022, 2003.
[2] R. A. Redner and H. F. Walker. Mixture densities, maximum likelihood and the EM algorithm. SIAM Review, 26(2):195–239, 1984.
[3] A. Asuncion, P. Smyth, M. Welling, D. Newman, I. Porteous, and S. Triglia. Distributed Gibbs sampling for latent variable models. In Scaling Up Machine Learning: Parallel and Distributed Approaches. Cambridge University Press, 2011.
[4] M. D. Hoffman, D. M. Blei, and F. Bach. Online learning for latent Dirichlet allocation. In NIPS, 2010.
[5] Thomas Hofmann. Probabilistic latent semantic analysis. In UAI, 1999.
[6] Daniel D. Lee and H. Sebastian Seung. Learning the parts of objects by nonnegative matrix factorization. Nature, 401, 1999.
[7] K. Pearson. Contributions to the mathematical theory of evolution. Phil. Trans. of the Royal Society, London, A., 1894.
[8] A. Anandkumar, D. P. Foster, D. Hsu, S. M. Kakade, and Y.-K. Liu. Two SVDs suffice: Spectral decompositions for probabilistic topic models and latent Dirichlet allocation, 2012. arXiv:1204.6703.
[9] Christos H. Papadimitriou, Prabhakar Raghavan, Hisao Tamaki, and Santosh Vempala. Latent semantic indexing: A probabilistic analysis. J. Comput. Syst. Sci., 61(2), 2000.
[10] S. Dasgupta. Learning mixtures of Gaussians. In FOCS, 1999.
[11] S. Dasgupta and L. Schulman. A two-round variant of EM for Gaussian mixtures. In UAI, 2000.
[12] S. Arora and R. Kannan. Learning mixtures of arbitrary Gaussians. In STOC, 2001.
[13] S. Vempala and G. Wang. A spectral algorithm for learning mixtures of distributions. In FOCS, 2002.
[14] R. Kannan, H. Salmasian, and S. Vempala. The spectral method for general mixture models. In COLT, 2005.
[15] D. Achlioptas and F. McSherry. On spectral learning of mixtures of distributions. In COLT, 2005.
[16] K. Chaudhuri and S. Rao. Learning mixtures of product distributions using correlations and independence. In COLT, 2008.
[17] S. C. Brubaker and S. Vempala. Isotropic PCA and affine-invariant clustering. In FOCS, 2008.
[18] K. Chaudhuri, S. M. Kakade, K. Livescu, and K. Sridharan. Multi-view clustering via canonical correlation analysis. In ICML, 2009.
[19] A. T. Kalai, A. Moitra, and G. Valiant. Efficiently learning mixtures of two Gaussians. In STOC, 2010.
[20] M. Belkin and K. Sinha. Polynomial learning of distribution families. In FOCS, 2010.
[21] A. Moitra and G. Valiant. Settling the polynomial learnability of mixtures of Gaussians. In FOCS, 2010.
[22] A. Anandkumar, D. Hsu, and S. M. Kakade. A method of moments for mixture models and hidden Markov models. In COLT, 2012.
[23] S. Arora, R. Ge, and A. Moitra. Learning topic models – going beyond SVD. In FOCS, 2012.
[24] J. T. Chang. Full reconstruction of Markov models on evolutionary trees: Identifiability and consistency. Mathematical Biosciences, 137:51–73, 1996.
[25] E. Mossel and S. Roch. Learning nonsingular phylogenies and hidden Markov models. Annals of Applied Probability, 16(2):583–614, 2006.
[26] D. Hsu, S. M. Kakade, and T. Zhang. A spectral algorithm for learning hidden Markov models. In COLT, 2009.
[27] Jean-François Cardoso and Pierre Comon. Independent component analysis, a survey of some algebraic methods. In IEEE International Symposium on Circuits and Systems, pages 93–96, 1996.
[28] P. Comon and C. Jutten. Handbook of Blind Source Separation: Independent Component Analysis and Applications. Academic Press. Elsevier, 2010.
[29] Alan M. Frieze, Mark Jerrum, and Ravi Kannan. Learning linear transformations. In FOCS, 1996.
[30] P. Q. Nguyen and O. Regev. Learning a parallelepiped: Cryptanalysis of GGH and NTRU signatures. Journal of Cryptology, 22(2):139–160, 2009.
[31] S. Arora, R. Ge, A. Moitra, and S. Sachdeva. Provable ICA with unknown Gaussian noise, and implications for Gaussian mixtures and autoencoders. In NIPS, 2012.
[32] R. Ando and T. Zhang. Two-view feature generation model for semi-supervised learning. In ICML, 2007.
[33] Sham M. Kakade and Dean P. Foster. Multi-view regression via canonical correlation analysis. In COLT, 2007.
[34] H. Hotelling. The most predictable criterion. Journal of Educational Psychology, 26(2):139–142, 1935.
[35] Mark Steyvers and Tom Griffiths. Probabilistic topic models. In T. Landauer, D. McNamara, S. Dennis, and W. Kintsch, editors, Latent Semantic Analysis: A Road to Meaning. Lawrence Erlbaum, 2006.
[36] A. Bunse-Gerstner, R. Byers, and V. Mehrmann. Numerical methods for simultaneous diagonalization. SIAM Journal on Matrix Analysis and Applications, 14(4):927–949, 1993.
[37] A. Anandkumar, R. Ge, D. Hsu, S. M. Kakade, and M. Telgarsky. Tensor decompositions for learning latent variable models, 2012. arXiv:1210.7559.