
218 nips-2012-Mixing Properties of Conditional Markov Chains with Unbounded Feature Functions


Source: pdf

Author: Mathieu Sinn, Bei Chen

Abstract: Conditional Markov Chains (also known as Linear-Chain Conditional Random Fields in the literature) are a versatile class of discriminative models for the distribution of a sequence of hidden states conditional on a sequence of observable variables. Large-sample properties of Conditional Markov Chains were first studied in [1]. This paper extends that work in two directions: first, mixing properties of models with unbounded feature functions are established; second, necessary conditions for model identifiability and for the uniqueness of maximum likelihood estimates are given.
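For orientation, a sketch of the standard linear-chain CRF parameterization from [2] is given below; the symbols (feature functions f_k, weights λ_k, normalizer Z) are generic notation chosen here for illustration and not necessarily the notation used in the paper.

\[
p_\lambda(y_{1:n} \mid x_{1:n}) \;=\; \frac{1}{Z_\lambda(x_{1:n})}\,
\exp\!\Bigg( \sum_{t=1}^{n} \sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, x_{1:n}, t) \Bigg),
\qquad
Z_\lambda(x_{1:n}) \;=\; \sum_{y'_{1:n}} \exp\!\Bigg( \sum_{t=1}^{n} \sum_{k} \lambda_k\, f_k(y'_{t-1}, y'_t, x_{1:n}, t) \Bigg).
\]

In this form, the "unbounded feature functions" of the title correspond, roughly, to feature functions f_k that are not assumed to be uniformly bounded in the observations x.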


reference text

[1] Sinn, M. & Poupart, P. (2011) Asymptotic theory for linear-chain conditional random fields. In Proc. of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS).

[2] Lafferty, J., McCallum, A. & Pereira, F. (2001) Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proc. of the 18th International Conference on Machine Learning (ICML).

[3] Sutton, C. & McCallum, A. (2006) An introduction to conditional random fields for relational learning. In: Getoor, L. & Taskar, B. (editors), Introduction to Statistical Relational Learning. Cambridge, MA: MIT Press.

[4] Hofmann, T., Schölkopf, B. & Smola, A.J. (2008) Kernel methods in machine learning. The Annals of Statistics, Vol. 36, No. 3, 1171-1220.

[5] Xiang, R. & Neville, J. (2011) Relational learning with one network: an asymptotic analysis. In Proc. of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS).

[6] Seneta, E. (2006) Non-Negative Matrices and Markov Chains. Revised Edition. New York, NY: Springer.

[7] Wainwright, M.J. & Jordan, M.I. (2008) Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, Vol. 1, Nos. 1-2, 1-305.

[8] Cornfeld, I.P., Fomin, S.V. & Sinai, Y.G. (1982) Ergodic Theory. Berlin, Germany: Springer.

[9] Orey, S. (1991) Markov chains with stochastically stationary transition probabilities. The Annals of Probability, Vol. 19, No. 3, 907-928.

[10] Hernández-Lerma, O. & Lasserre, J.B. (2003) Markov Chains and Invariant Probabilities. Basel, Switzerland: Birkhäuser.

[11] Foguel, S.R. (1969) The Ergodic Theory of Markov Processes. Princeton, NJ: Van Nostrand.

[12] Samson, P.-M. (2000) Concentration of measure inequalities for Markov chains and Φ-mixing processes. The Annals of Probability, Vol. 28, No. 1, 416-461.

[13] Kontorovich, L. & Ramanan, K. (2008) Concentration inequalities for dependent random variables via the martingale method. The Annals of Probability, Vol. 36, No. 6, 2126-2158.

[14] Sha, F. & Pereira, F. (2003) Shallow parsing with conditional random fields. In Proc. of the Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics (HLT-NAACL).

[15] Lehmann, E.L. (1999) Elements of Large-Sample Theory. New York, NY: Springer.

[16] Hoefel, G. & Elkan, C. (2008) Learning a two-stage SVM/CRF sequence classifier. In Proc. of the 17th ACM International Conference on Information and Knowledge Management (CIKM).