
Unsupervised Spectral Learning of Finite State Transducers


Source: pdf

Author: Raphael Bailly, Xavier Carreras, Ariadna Quattoni

Abstract: Finite-State Transducers (FSTs) are a standard tool for modeling paired input-output sequences and are used in numerous applications, ranging from computational biology to natural language processing. Recently, Balle et al. [4] presented a spectral algorithm for learning FSTs from samples of aligned input-output sequences. In this paper we address the more realistic, yet more challenging, setting where the alignments are unknown to the learning algorithm. We frame FST learning as finding a low-rank Hankel matrix satisfying constraints derived from observable statistics. Under this formulation, we provide identifiability results for FST distributions. Then, following previous work on rank minimization, we propose a regularized convex relaxation of this objective, based on minimizing a nuclear-norm penalty subject to linear constraints, which can be solved efficiently.
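
The nuclear-norm relaxation the abstract describes is a generic rank-minimization device. As a rough illustration only (not the authors' algorithm), the Python sketch below recovers a low-rank "Hankel-like" matrix by minimizing a nuclear-norm penalty via proximal gradient descent. It assumes simple entrywise observations in place of the paper's linear constraints on observable statistics, and the names svt and complete_hankel are hypothetical.

    import numpy as np

    def svt(M, tau):
        # Singular value thresholding: the proximal operator of tau * ||.||_*.
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        return (U * np.maximum(s - tau, 0.0)) @ Vt

    def complete_hankel(observed, mask, lam=0.01, step=1.0, iters=500):
        # Proximal gradient (ISTA) on:
        #   0.5 * ||mask * (H - observed)||_F^2  +  lam * ||H||_*
        # step = 1.0 is safe here: the data-fit gradient is 1-Lipschitz.
        H = np.zeros_like(observed)
        for _ in range(iters):
            grad = mask * (H - observed)          # gradient of the data-fit term
            H = svt(H - step * grad, step * lam)  # shrink singular values
        return H

    # Toy run: a rank-1 matrix with roughly 60% of its entries observed.
    rng = np.random.default_rng(0)
    true = np.outer(rng.random(6), rng.random(6))
    mask = (rng.random(true.shape) < 0.6).astype(float)
    est = complete_hankel(mask * true, mask)
    print("relative error:", np.linalg.norm(est - true) / np.linalg.norm(true))

The paper's convex formulation differs in that the entrywise mask is replaced by linear constraints tying Hankel entries to statistics of unaligned input-output pairs; the singular-value shrinkage step on the nuclear norm is the shared ingredient.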


reference text

[1] R. Bailly, X. Carreras, F. M. Luque, and A. Quattoni. Unsupervised spectral learning of WCFGs as low-rank matrix completion. In Proc. of EMNLP, 2013.

[2] R. Bailly, F. Denis, and L. Ralaivola. Grammatical inference as a principal component analysis problem. In Proc. of ICML, 2009.

[3] B. Balle and M. Mohri. Spectral learning of general weighted automata via constrained matrix completion. In Proc. of NIPS, 2012.

[4] B. Balle, A. Quattoni, and X. Carreras. A spectral learning algorithm for finite state transducers. In Proc. of ECML-PKDD, 2011.

[5] B. Balle, A. Quattoni, and X. Carreras. Local loss optimization in operator models: A new insight into spectral learning. In Proc. of ICML, pages 1879–1886, 2012.

[6] M. Bernard, J.-C. Janodet, and M. Sebban. A discriminative model of stochastic edit distance in the form of a conditional transducer. Grammatical Inference: Algorithms and Applications, volume 4201, 2006.

[7] B. Boots, S. Siddiqi, and G. Gordon. Closing the learning-planning loop with predictive state representations. International Journal of Robotics Research, 2011.

[8] F. Casacuberta. Inference of finite-state transducers by using regular grammars and morphisms. Grammatical Inference: Algorithms and Applications, volume 1891, 2000.

[9] A. Clark. Partially supervised learning of morphology with stochastic transducers. In Proc. of NLPRS, pages 341–348, 2001.

[10] S. B. Cohen, K. Stratos, M. Collins, D. P. Foster, and L. Ungar. Spectral learning of latent-variable PCFGs. In Proc. of ACL, pages 223–231, 2012.

[11] J. Eisner. Parameter estimation for probabilistic finite-state transducers. In Proc. of ACL, pages 1–8, 2002.

[12] G. W. Stewart and J.-G. Sun. Matrix Perturbation Theory. Academic Press, 1990.

[13] M. Fazel. Matrix rank minimization with applications. PhD thesis, Stanford University, Electrical Engineering Dept., 2002.

[14] D. Hsu, S. M. Kakade, and T. Zhang. A spectral algorithm for learning hidden Markov models. In Proc. of COLT, 2009.

[15] M. Mohri. Finite-state transducers in language and speech processing. Computational Linguistics, 23(2):269–311, 1997.

[16] A. P. Parikh, L. Song, and E. P. Xing. A spectral algorithm for latent tree graphical models. In Proc. of ICML, 2011.

[17] S. M. Siddiqi, B. Boots, and G. J. Gordon. Reduced-rank hidden Markov models. In Proc. of AISTATS, 2010.

[18] L. Song, B. Boots, S. Siddiqi, G. Gordon, and A. Smola. Hilbert space embeddings of hidden Markov models. In Proc. of ICML, 2010.

[19] L. Zwald and G. Blanchard. On the convergence of eigenspaces in kernel principal component analysis. In Proc. of NIPS, 2005.