Paper: nips2013-298 (NIPS 2013)
Source: pdf
Author: Anirban Roychowdhury, Ke Jiang, Brian Kulis
Abstract: Small-variance asymptotics provide an emerging technique for obtaining scalable combinatorial algorithms from rich probabilistic models. We present a small-variance asymptotic analysis of the Hidden Markov Model and its infinite-state Bayesian nonparametric extension. Starting with the standard HMM, we first derive a "hard" inference algorithm analogous to k-means that arises when particular variances in the model tend to zero. This analysis is then extended to the Bayesian nonparametric case, yielding a simple, scalable, and flexible algorithm for discrete-state sequence data with a non-fixed number of states. We also derive the corresponding combinatorial objective functions arising from our analysis, which involve a k-means-like term along with penalties based on state transitions and the number of states. A key motivation for such algorithms is that, particularly in the nonparametric setting, standard probabilistic inference algorithms lack scalability and are heavily dependent on good initialization. A number of results on synthetic and real data sets demonstrate the advantages of the proposed framework.
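To make the abstract's description concrete, the following is a minimal illustrative sketch (not the authors' reference implementation) of a parametric "hard" HMM step of the kind suggested by the small-variance analysis: a Viterbi-style dynamic program minimizes a k-means-like squared-distance term plus a fixed penalty for each state transition, and the state means are then re-estimated as centroids. The function name hard_hmm, the penalty parameter lam, and the squared-Euclidean emission cost are assumptions made for illustration only.

    # Hedged sketch of a parametric hard-HMM iteration (illustrative, not from the paper).
    import numpy as np

    def hard_hmm(X, k, lam=1.0, n_iter=20, seed=0):
        rng = np.random.default_rng(seed)
        T, d = X.shape
        mu = X[rng.choice(T, size=k, replace=False)].astype(float)  # initialize means from data
        for _ in range(n_iter):
            # Emission cost: squared Euclidean distance of each observation to each state mean (T x k).
            cost = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
            # Viterbi-style DP: dp[t, j] = best cost of a state path ending in state j at time t,
            # paying `lam` whenever consecutive states differ.
            dp = np.zeros((T, k))
            back = np.zeros((T, k), dtype=int)
            dp[0] = cost[0]
            for t in range(1, T):
                trans = dp[t - 1][:, None] + lam * (1 - np.eye(k))  # k x k transition costs
                back[t] = trans.argmin(axis=0)
                dp[t] = trans.min(axis=0) + cost[t]
            # Backtrack the minimizing state sequence.
            z = np.empty(T, dtype=int)
            z[-1] = dp[-1].argmin()
            for t in range(T - 2, -1, -1):
                z[t] = back[t + 1, z[t + 1]]
            # k-means-like update: each mean becomes the centroid of its assigned observations.
            for j in range(k):
                if np.any(z == j):
                    mu[j] = X[z == j].mean(axis=0)
        return z, mu

For example, z, mu = hard_hmm(X, k=3, lam=2.0) segments a real-valued sequence X of shape (T, d) into three states, with larger lam discouraging frequent state switches; the nonparametric variant described in the abstract would additionally penalize the creation of new states.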
[1] C. E. Antoniak. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. The Annals of Statistics, 2(6):1152–1174, 1974.
[2] A. Banerjee, S. Merugu, I. S. Dhillon, and J. Ghosh. Clustering with Bregman divergences. Journal of Machine Learning Research, 6:1705–1749, 2005.
[3] M. J. Beal, Z. Ghahramani, and C. E. Rasmussen. The infinite hidden Markov model. In Advances in Neural Information Processing Systems, 2002.
[4] C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.
[5] T. Broderick, B. Kulis, and M. I. Jordan. MAD-Bayes: MAP-based asymptotic derivations from Bayes. In Proceedings of the 30th International Conference on Machine Learning, 2013.
[6] J. V. Gael, Y. Saatci, Y. W. Teh, and Z. Ghahramani. Beam sampling for the infinite hidden Markov model. In Proceedings of the 25th International Conference on Machine Learning, 2008.
[7] K. Jiang, B. Kulis, and M. I. Jordan. Small-variance asymptotics for exponential family Dirichlet process mixture models. In Advances in Neural Information Processing Systems, 2012.
[8] B. Kulis and M. I. Jordan. Revisiting k-means: New algorithms via Bayesian nonparametrics. In Proceedings of the 29th International Conference on Machine Learning, 2012.
[9] L. R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.
[10] S. Roweis. EM algorithms for PCA and SPCA. In Advances in Neural Information Processing Systems, 1998.
[11] E. Sudderth. Toward reliable Bayesian nonparametric learning. In NIPS Workshop on Bayesian Nonparametric Models for Reliable Planning and Decision-Making Under Uncertainty, 2012.
[12] Y. W. Teh, M. I. Jordan, M. J. Beal, and D. M. Blei. Hierarchical Dirichlet processes. Journal of the American Statistical Association, 101(476):1566–1581, 2006.
[13] M. E. Tipping and C. M. Bishop. Probabilistic principal component analysis. Journal of Royal Statistical Society, Series B, 21(3):611–622, 1999.
[14] S. Tong and D. Koller. Restricted Bayes optimal classifiers. In Proc. 17th AAAI Conference, 2000. 9