Author: Jan Gasthaus, Yee W. Teh
Abstract: The sequence memoizer is a model for sequence data with state-of-the-art performance on language modeling and compression. We propose a number of improvements to the model and inference algorithm, including an enlarged range of hyperparameters, a memory-efficient representation, and inference algorithms operating on the new representation. Our derivations are based on precise definitions of the various processes that will also allow us to provide an elementary proof of the “mysterious” coagulation and fragmentation properties used in the original paper on the sequence memoizer by Wood et al. (2009). We present some experimental results supporting our improvements.
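For readers unfamiliar with the coagulation property mentioned in the abstract, the following is a compact sketch of the marginalization identity it yields (Theorem 1 of Wood et al. [1], building on Pitman [4] and Ho, James, and Lau [5]); the notation PY(d, c, H), for a Pitman-Yor process with discount d, concentration c, and base distribution H, is standard but not fixed by this abstract. With both concentration parameters set to zero,

\[
G_1 \sim \mathrm{PY}(d_1,\, 0,\, H), \qquad
G_2 \mid G_1 \sim \mathrm{PY}(d_2,\, 0,\, G_1)
\;\;\Longrightarrow\;\;
G_2 \sim \mathrm{PY}(d_1 d_2,\, 0,\, H).
\]

The discounts multiply along a chain of such processes, which is what allows the sequence memoizer to marginalize out non-branching contexts and store only the nodes of a compact suffix tree.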
[1] F. Wood, C. Archambeau, J. Gasthaus, L. F. James, and Y. W. Teh. A stochastic memoizer for sequence data. In Proceedings of the International Conference on Machine Learning, volume 26, pages 1129–1136, 2009.
[2] J. Gasthaus, F. Wood, and Y. W. Teh. Lossless compression based on the Sequence Memoizer. In James A. Storer and Michael W. Marcellin, editors, Data Compression Conference, pages 337–345, Los Alamitos, CA, USA, 2010. IEEE Computer Society.
[3] Y. W. Teh. A Bayesian interpretation of interpolated Kneser-Ney. Technical Report TRA2/06, School of Computing, National University of Singapore, 2006.
[4] J. Pitman. Coalescents with multiple collisions. Annals of Probability, 27:1870–1902, 1999.
[5] M. W. Ho, L. F. James, and J. W. Lau. Coagulation fragmentation laws induced by general coagulations of two-parameter Poisson-Dirichlet processes. http://arxiv.org/abs/math.PR/0601608, 2006.
[6] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.
[7] P. Blunsom, T. Cohn, S. Goldwater, and M. Johnson. A note on the implementation of hierarchical Dirichlet processes. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, pages 337–340, Suntec, Singapore, August 2009. Association for Computational Linguistics.
[8] J. Pitman and M. Yor. The two-parameter Poisson-Dirichlet distribution derived from a stable subordinator. Annals of Probability, 25:855–900, 1997.
[9] H. Ishwaran and L. F. James. Gibbs sampling methods for stick-breaking priors. Journal of the American Statistical Association, 96(453):161–173, 2001.
[10] L. C. Hsu and P. J.-S. Shiue. A unified approach to generalized Stirling numbers. Advances in Applied Mathematics, 20:366–384, 1998.
[11] Y. W. Teh. A hierarchical Bayesian language model based on Pitman-Yor processes. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 985–992, 2006.
[12] Y. Bengio, R. Ducharme, P. Vincent, and C. Jauvin. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137–1155, 2003.