jmlr jmlr2010 jmlr2010-52 jmlr2010-52-reference knowledge-graph by maker-knowledge-mining

52 jmlr-2010-Incremental Sigmoid Belief Networks for Grammar Learning


Source: pdf

Author: James Henderson, Ivan Titov

Abstract: We propose a class of Bayesian networks appropriate for structured prediction problems where the Bayesian network’s model structure is a function of the predicted output structure. These incremental sigmoid belief networks (ISBNs) make decoding possible because inference with partial output structures does not require summing over the unboundedly many compatible model structures, due to their directed edges and incrementally specified model structure. ISBNs are specifically targeted at challenging structured prediction problems such as natural language parsing, where learning the domain’s complex statistical dependencies benefits from large numbers of latent variables. While exact inference in ISBNs with large numbers of latent variables is not tractable, we propose two efficient approximations. First, we demonstrate that a previous neural network parsing model can be viewed as a coarse mean-field approximation to inference with ISBNs. We then derive a more accurate but still tractable variational approximation, which proves effective in artificial experiments. We compare the effectiveness of these models on a benchmark natural language parsing task, where they achieve accuracy competitive with the state-of-the-art. The model which is a closer approximation to an ISBN has better parsing accuracy, suggesting that ISBNs are an appropriate abstract model of natural language grammar learning. Keywords: Bayesian networks, dynamic Bayesian networks, grammar learning, natural language parsing, neural networks


reference text

Dan M. Bikel. Intricacies of Collins’ parsing model. Computational Linguistics, 30(4):479–511, 2004. Leon Bottou. Une approche th´ oretique de l’apprentissage connexionniste: Applications a la ree ` connaissance de la parole. PhD thesis, Universit´ de Paris XI, Paris, France, 1991. e Eugene Charniak. A maximum-entropy-inspired parser. In Proceedings of the First Meeting of North American Chapter of Association for Computational Linguistics, pages 132–139, Seattle, Washington, 2000. Eugene Charniak and Mark Johnson. Coarse-to-fine n-best parsing and MaxEnt discriminative reranking. In Proceedings of the 43rd Meeting of Association for Computational Linguistics, pages 173–180, Ann Arbor, MI, 2005. Michael Collins. Head-Driven Statistical Models for Natural Language Parsing. PhD thesis, University of Pennsylvania, Philadelphia, PA, 1999. Persi Diaconis and Bradley Efron. Computer-intensive methods in statistics. Scientific American, pages 116–130, 1983. Richard Durbin, Sean R. Eddy, Anders Krogh, and Graeme J. Mitchison. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids. Cambridge University Press, Cambridge, UK, 1998. Jenny R. Finkel, Alex Kleeman, and Christopher D. Manning. Efficient feature-based, conditional random field parsing. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, pages 959–967, Columbus, Ohio, June 2008. Stuart Geman and Donald Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721– 741, 1984. Andrea Gesmundo, James Henderson, Paola Merlo, and Ivan Titov. A latent variable model of synchronous syntactic-semantic parsing for multiple languages. In Proceedings of the 13th Conference on Computational Natural Language Learning: Shared Task, pages 37–42, 2009. James Henderson. Inducing history representations for broad coverage statistical parsing. In Proceedings of the joint meeting of North American Chapter of the Association for Computational Linguistics and the Human Language Technology Conference, pages 103–110, Edmonton, Canada, 2003. James Henderson. Discriminative training of a neural network statistical parser. In Proceedings of the 42nd Meeting of Association for Computational Linguistics, Barcelona, Spain, 2004. James Henderson, Paola Merlo, Gabriele Musillo, and Ivan Titov. A latent variable model of synchronous parsing for syntactic and semantic dependencies. In Proceedings of the 12th Conference on Computational Natural Language Learning, pages 178–182, Manchester, UK, 2008. 3567 H ENDERSON AND T ITOV Geoffrey E. Hinton, Peter Dayan, Brendan Frey, and Radford Neal. The wake-sleep algorithm for unsupervised neural networks. Science, 268:1158–1161, 1995. Mark Johnson. PCFG models of linguistic tree representations. Computational Linguistics, 24(4): 613–632, 1998. Michael I. Jordan, Zoubin Ghahramani, Tommi Jaakkola, and Lawrence K. Saul. An introduction to variational methods for graphical models. In Michael I. Jordan, editor, Learning in Graphical Models. MIT Press, Cambridge, MA, 1999. Terry Koo and Michael Collins. Hidden-variable models for discriminative reranking. In Proceedings of the conference on Human Language Technology and Empirical Methods in Natural Language Processing, pages 507–514, 2005. Kenichi Kurihara and Taisuke Sato. An application of the variational bayesian approach to probabilistic context-free grammars. In Proceedings of the International Joint Conference on Natural Language Processing, Workshop: Beyond Shallow Analyses, 2004. John Lafferty, Andrew McCallum, and Fernando Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Proceedings of the 18th International Conference on Machine Learning, pages 282–289. Morgan Kaufmann, San Francisco, CA, 2001. Percy Liang, Slav Petrov, Michael Jordan, and Dan Klein. The infinite PCFG using hierarchical dirichlet processes. In Proceedings of the Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, pages 688–697, 2007. Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330, 1993. Takuya Matsuzaki, Yusuke Miyao, and Jun’ichi Tsujii. Probabilistic CFG with latent annotations. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, pages 75–82, 2005. Kevin P. Murphy. Dynamic Belief Networks: Representation, Inference and Learning. PhD thesis, University of California, Berkeley, CA, 2002. Gabriele Musillo and Paola Merlo. Unlexicalised hidden variable models of split dependency grammars. In Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics, 2008. Radford Neal. Connectionist learning of belief networks. Artificial Intelligence, 56:71–113, 1992. Slav Petrov and Dan Klein. Improved inference for unlexicalized parsing. In Proceedings of the Conference on Human Language Technology and North American chapter of the Association for Computational Linguistics, pages 404–411, 2007. Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. Learning accurate, compact, and interpretable tree annotation. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 433–440, 2006. 3568 I NCREMENTAL S IGMOID B ELIEF N ETWORKS Detlef Prescher. Head-driven PCFGs with latent-head statistics. In Proceedings of the 9th International Workshop on Parsing Technology, pages 115–124, 2005. William H. Press, Brian Flannery, Saul A. Teukolsky, and William T. Vetterling. Numerical Recipes. Cambridge University Press, Cambridge, UK, 1996. Adwait Ratnaparkhi. A maximum entropy model for part-of-speech tagging. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 133–142, Univ. of Pennsylvania, PA, 1996. Stefan Riezler, Tracy H. King, Ronald M. Kaplan, Richard Crouch, John T. Maxwell, III, and Mark Johnson. Parsing the Wall Street Journal using a Lexical-Functional Grammar and discriminative estimation techniques. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, pages 271–278, 2002. Brian Roark and Mark Johnson. Efficient probabilistic top-down and left-corner parsing. In Proceedings of the 37th Meeting of Association for Computational Linguistics, pages 421–428, 1999. Khashayar Rohanimanesh, Michael Wick, and Andrew McCallum. Inference and learning in large factor graphs with adaptive proposal distributions and a rank-based objective. Technical Report UM-CS-2009-008, University of Massachusetts, 2009. David E. Rumelhart, Geoffrey E. Hinton, and Ronald J. Williams. Learning internal representations by error propagation. In D. E. Rumelhart and J. L. McClelland, editors, Parallel Distributed Processing, Vol 1, pages 318–362. MIT Press, Cambridge, MA, 1986. Brian Sallans. Reinforcement Learning for Factored Markov Decision Processes. PhD thesis, University of Toronto, Toronto, Canada, 2002. Lawrence K. Saul and Michael I. Jordan. A mean field learning algorithm for unsupervised neural networks. In Michael I. Jordan, editor, Learning in Graphical Models, pages 541–554. MIT Press, Cambridge, MA, 1999. Lawrence K. Saul, Tommi Jaakkola, and Michael I. Jordan. Mean field theory for sigmoid belief networks. Journal of Artificial Intelligence Research, 4:61–76, 1996. Virginia Savova and Leon Peshkin. Dependency parsing with dynamic Bayesian network. In AAAI, 20th National Conference on Artificial Intelligence, pages 1112–1117, Pittsburgh, Pennsylvania, 2005. Khalil Sima’an. Computational complexity of probabilistic disambiguation. Grammars, 5:125–151, 1992. Eljas Soisalon-Soininen and Esko Ukkonen. A method for transforming grammars into LL(k) form. Acta Informatica, 12:339–369, 1979. Ben Taskar, Dan Klein, Michael Collins, Daphne Koller, and Christopher D. Manning. Max-margin parsing. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, Barcelona, Spain, 2004. 3569 H ENDERSON AND T ITOV Ivan Titov and James Henderson. Loss minimization in parse reranking. In Proceedings of the 2006 Conference on Empirical Methods in Natural Language Processing, pages 560–567, 2006. Ivan Titov and James Henderson. A latent variable model for generative dependency parsing. In Proceedings of the 10th International Conference on Parsing Technologies, pages 144–155, 2007. Joseph Turian and I. Dan Melamed. Constituent parsing by classification. In Proceedings of the Ninth International Workshop on Parsing Technology, pages 141–151, 2005. Joseph Turian and I. Dan Melamed. Advances in discriminative parsing. In Proceedings of the Annual Meeting of the Association for Computational Linguistics and the International Conference on Computational Linguistics, Sydney, Australia, 2006. Joseph Turian, Benjamin Wellington, and I. Dan Melamed. Scalable discriminative learning for natural language parsing and translation. In Proc. 20th Conference on Neural Information Processing Systems, Vancouver, Canada, 2006. 3570