nips nips2010 nips2010-215 nips2010-215-reference knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Nicholas Bartlett, Frank Wood, David Tax
Abstract: We propose a novel Bayesian nonparametric approach to learning with probabilistic deterministic finite automata (PDFA). We define and develop a sampler for a PDFA with an infinite number of states which we call the probabilistic deterministic infinite automata (PDIA). Posterior predictive inference in this model, given a finite training sequence, can be interpreted as averaging over multiple PDFAs of varying structure, where each PDFA is biased towards having few states. We suggest that our method for averaging over PDFAs is a novel approach to predictive distribution smoothing. We test PDIA inference both on PDFA structure learning and on both natural language and DNA data prediction tasks. The results suggest that the PDIA presents an attractive compromise between the computational cost of hidden Markov models and the storage requirements of hierarchically smoothed Markov models. 1
[1] R. Carrasco and J. Oncina. Learning stochastic regular grammars by means of a state merging method. Grammatical Inference and Applications, pages 139–152, 1994.
[2] L. Carroll. Alice’s Adventures in Wonderland. http://www.gutenberg.org/etext/11. Macmillan, 1865. URL
[3] P. Dupont, F. Denis, and Y. Esposito. Links between probabilistic automata and hidden Markov models: probability distributions, learning models and induction algorithms. Pattern recognition, 38(9):1349– 1371, 2005.
[4] J. Feldman and J.F. Hanna. The structure of responses to a sequence of binary events. Journal of Mathematical Psychology, 3(2):371–387, 1966.
[5] J. Gasthaus, F. Wood, and Y. W. Teh. Lossless compression based on the Sequence Memoizer. In Data Compression Conference 2010, pages 337–345, 2010.
[6] A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian data analysis. Chapman & Hall, New York, 1995.
[7] D. J. C. MacKay and L.C. Bauman Peto. A hierarchical Dirichlet language model. Natural language engineering, 1(2):289–307, 1995.
[8] K. Murphy. Hidden Markov model (HMM) toolbox for Matlab, http://www.cs.ubc.ca/ murphyk/Software/HMM/hmm.html. 2005. URL
[9] M.O. Rabin. Probabilistic automata. Information and control, 6(3):230–245, 1963.
[10] L. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77:257–286, 1989.
[11] A.S. Reber. Implicit learning of artificial grammars. Journal of verbal learning and verbal behavior, 6 (6):855–863, 1967.
[12] D. Ron, Y. Singer, and N. Tishby. The power of amnesia: Learning probabilistic automata with variable memory length. Machine learning, 25(2):117–149, 1996.
[13] C.R. Shalizi and K.L. Shalizi. Blind construction of optimal nonlinear recursive predictors for discrete sequences. In Proceedings of the 20th conference on Uncertainty in Artificial Intelligence, pages 504–511. UAI Press, 2004.
[14] Y. W. Teh. A hierarchical Bayesian language model based on Pitman-Yor processes. In Proceedings of the Association for Computational Linguistics, pages 985–992, 2006.
[15] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.
[16] F. Thollard. Improving probabilistic grammatical inference core algorithms with post-processing techniques. In Eighteenth International Conference on Machine Learning, pages 561–568, 2001.
[17] F. Thollard, P. Dupont, and C. del la Higuera. Probabilistic DFA inference using Kullback-Leibler divergence and minimality. In Seventeenth International Conference on Machine Learning, pages 975–982. Citeseer, 2000.
[18] F. Wood, C. Archambeau, J. Gasthaus, L. James, and Y. W. Teh. A stochastic memoizer for sequence data. In Proceedings of the 26th International Conference on Machine Learning, pages 1129–1136, Montreal, Canada, 2009. 9