
NIPS 2007, Paper 197: The Infinite Markov Model


Source: pdf

Authors: Daichi Mochihashi, Eiichiro Sumita

Abstract: We present a nonparametric Bayesian method for estimating variable-order Markov processes up to a theoretically infinite order. By extending a stick-breaking prior, which is usually defined on a unit interval, “vertically” to the trees of infinite depth associated with a hierarchical Chinese restaurant process, our model directly infers the hidden orders of the Markov dependencies from which each symbol originated. Experiments on character and word sequences in natural language showed that the model performs comparably to an exponentially large full-order model while being far more efficient in both computation time and space. We expect that this basic model will also extend to variable-order hierarchical clustering of general data.
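To make the “vertical” stick-breaking prior concrete, the following is a minimal Python sketch, not the authors' implementation: descending the context tree, a symbol stops at depth i with probability q_i, so the hidden Markov order n has P(n = i) = q_i * prod_{j<i} (1 - q_j), with each q drawn from a Beta prior. The Beta hyperparameters and the truncation depth are illustrative assumptions; in the full model each tree node keeps its own q, shared across the symbols observed there, whereas this sketch resamples q at every step for brevity.

```python
# Minimal sketch of a "vertical" stick-breaking prior over Markov orders.
# Hyperparameters are illustrative assumptions, not the paper's settings.
import random
from collections import Counter

ALPHA, BETA = 1.0, 1.0  # assumed Beta hyperparameters for stopping probabilities

def sample_order(max_depth=50):
    """Draw a Markov order: stop at depth i w.p. q_i * prod_{j<i} (1 - q_j)."""
    for depth in range(max_depth):
        q = random.betavariate(ALPHA, BETA)  # per-depth stopping probability
        if random.random() < q:              # stop descending: order = depth
            return depth
    return max_depth                          # finite truncation, for the sketch only

# Empirical distribution over sampled orders; mass decays with depth.
print(Counter(sample_order() for _ in range(10000)).most_common(10))
```

With ALPHA = BETA = 1, the expected stopping probability at each depth is 1/2, so the sampled order is geometric with mean 1; increasing BETA shifts mass toward deeper, higher-order dependencies.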


References

[1] C. E. Shannon. A mathematical theory of communication. Bell System Technical Journal, 27:379–423, 623–656, 1948.

[2] Alberto Apostolico and Gill Bejerano. Optimal amnesic probabilistic automata, or, how to learn and classify proteins in linear time and space. Journal of Computational Biology, 7:381–393, 2000.

[3] F. M. J. Willems, Y. M. Shtarkov, and T. J. Tjalkens. The Context-Tree Weighting Method: Basic Properties. IEEE Transactions on Information Theory, 41:653–664, 1995.

[4] Frederick Jelinek. Statistical Methods for Speech Recognition. Language, Speech, and Communication Series. MIT Press, 1998.

[5] Peter Bühlmann and Abraham J. Wyner. Variable Length Markov Chains. The Annals of Statistics, 27(2):480–513, 1999.

[6] Fernando Pereira, Yoram Singer, and Naftali Tishby. Beyond Word N-grams. In Proc. of the Third Workshop on Very Large Corpora, pages 95–106, 1995.

[7] Dana Ron, Yoram Singer, and Naftali Tishby. The Power of Amnesia. In Advances in Neural Information Processing Systems, volume 6, pages 176–183, 1994.

[8] Andreas Stolcke. Entropy-based Pruning of Backoff Language Models. In Proc. of DARPA Broadcast News Transcription and Understanding Workshop, pages 270–274, 1998.

[9] Yee Whye Teh. A Bayesian Interpretation of Interpolated Kneser-Ney. Technical Report TRA2/06, School of Computing, NUS, 2006.

[10] Sharon Goldwater, Thomas L. Griffiths, and Mark Johnson. Interpolating Between Types and Tokens by Estimating Power-Law Generators. In Advances in Neural Information Processing Systems, volume 18, 2005.

[11] Daniel Sleator and Robert Tarjan. Self-Adjusting Binary Search Trees. Journal of the ACM, 32(3):652–686, 1985.

[12] Joshua T. Goodman. A Bit of Progress in Language Modeling, Extended Version. Technical Report MSR-TR-2001-72, Microsoft Research, 2001.

[13] Thomas P. Minka. Estimating a Dirichlet distribution, 2000. http://research.microsoft.com/~minka/papers/dirichlet/.

[14] Mark Girolami and Ata Kabán. Simplicial Mixtures of Markov Chains: Distributed Modelling of Dynamic User Profiles. In Advances in Neural Information Processing Systems, volume 16, 2003.

[15] R. Daniel Mauldin, William D. Sudderth, and S. C. Williams. Polya Trees and Random Distributions. The Annals of Statistics, 20(3):1203–1221, 1992.