
48 nips-2005-Context as Filtering


Source: pdf

Author: Daichi Mochihashi, Yuji Matsumoto

Abstract: Long-distance language modeling is important not only in speech recognition and machine translation, but also in high-dimensional discrete sequence modeling in general. However, the problem of context length has so far been largely neglected, and a naïve bag-of-words history has typically been employed in natural language processing. In contrast, in this paper we view topic shifts within a text as a latent stochastic process to give an explicit probabilistic generative model that has partial exchangeability. We propose an online inference algorithm using particle filters to recognize topic shifts and thereby employ the most appropriate length of context automatically. Experiments on the BNC corpus showed consistent improvement over previous methods involving no chronological order.


reference text

[1] Jay M. Ponte and W. Bruce Croft. A Language Modeling Approach to Information Retrieval. In Proc. of SIGIR ’98, pages 275–281, 1998.

[2] David Cohn and Thomas Hofmann. The Missing Link: a probabilistic model of document content and hypertext connectivity. In Advances in Neural Information Processing Systems (NIPS), 2001.

[3] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

[4] Daniel Gildea and Thomas Hofmann. Topic-based Language Models Using EM. In Proc. of EUROSPEECH ’99, pages 2167–2170, 1999.

[5] Takuya Mishina and Mikio Yamamoto. Context adaptation using variational Bayesian learning for ngram models based on probabilistic LSA. IEICE Trans. on Inf. and Sys., J87-D-II(7):1409–1417, 2004.

[6] Zoubin Ghahramani and Michael I. Jordan. Factorial Hidden Markov Models. In Advances in Neural Information Processing Systems (NIPS), volume 8, pages 472–478. MIT Press, 1995.

[7] Yuguo Chen and Tze Leung Lai. Sequential Monte Carlo Methods for Filtering and Smoothing in Hidden Markov Models. Discussion Paper 03-19, Institute of Statistics and Decision Sciences, Duke University, 2003.

[8] H. Chernoff and S. Zacks. Estimating the Current Mean of a Normal Distribution Which is Subject to Changes in Time. Annals of Mathematical Statistics, 35:999–1018, 1964.

[9] Yi-Ching Yao. Estimation of a noisy discrete-time step function: Bayes and empirical Bayes approaches. Annals of Statistics, 12:1434–1447, 1984.

[10] Arnaud Doucet, Nando de Freitas, and Neil Gordon. Sequential Monte Carlo Methods in Practice. Statistics for Engineering and Information Science. Springer-Verlag, 2001.

[11] Mikio Yamamoto and Kugatsu Sadamitsu. Dirichlet Mixtures in Text Modeling. CS Technical Report CS-TR-05-1, University of Tsukuba, 2005. http://www.mibel.cs.tsukuba.ac.jp/~myama/pdf/dm.pdf.

[12] Kamal Nigam, Andrew K. McCallum, Sebastian Thrun, and Tom M. Mitchell. Text Classification from Labeled and Unlabeled Documents using EM. Machine Learning, 39(2/3):103–134, 2000.

[13] Thomas P. Minka. Estimating a Dirichlet distribution, 2000. http://research.microsoft.com/~minka/papers/dirichlet/.

[14] K. Sjölander, K. Karplus, M. P. Brown, R. Hughey, A. Krogh, I. S. Mian, and D. Haussler. Dirichlet Mixtures: A Method for Improved Detection of Weak but Significant Protein Sequence Homology. Computing Applications in the Biosciences, 12(4):327–345, 1996.

[15] D. J. C. MacKay and L. Peto. A Hierarchical Dirichlet Language Model. Natural Language Engineering, 1(3):1–19, 1994.