
275. Parsing with Compositional Vector Grammars (ACL 2013)

Source: pdf

Authors: Richard Socher; John Bauer; Christopher D. Manning; Andrew Y. Ng

Abstract: Natural language parsing has typically been done with small sets of discrete categories such as NP and VP, but this representation captures neither the full syntactic nor the full semantic richness of linguistic phrases. Attempts to improve on it by lexicalizing phrases or splitting categories only partly address the problem, at the cost of huge feature spaces and sparseness. Instead, we introduce a Compositional Vector Grammar (CVG), which combines PCFGs with a syntactically untied recursive neural network that learns syntactico-semantic, compositional vector representations. The CVG improves the PCFG of the Stanford Parser by 3.8% to obtain an F1 score of 90.4%. It is fast to train and, implemented approximately as an efficient reranker, it is about 20% faster than the current Stanford factored parser. The CVG learns a soft notion of head words and improves performance on the types of ambiguities that require semantic information, such as PP attachments.
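As a rough illustration of what "syntactically untied" and "reranker" could mean in practice, the following toy sketch keeps one composition matrix per pair of child categories and adds a vector-based node score to a PCFG rule log-probability. It is not the authors' implementation. The additive scoring form, the function names (`make_params`, `compose`, `score_tree`, `rerank`), the 4-dimensional vectors, the random untrained weights, the rule probabilities, and the example trees are all assumptions made for illustration only.

```python
import math
import random

DIM = 4  # toy vector size (assumption; real models use larger vectors)

def make_params(category_pairs, dim=DIM, seed=0):
    """One (W, v) pair per child-category pair: the 'untied' weights."""
    rng = random.Random(seed)
    params = {}
    for pair in category_pairs:
        W = [[rng.uniform(-0.1, 0.1) for _ in range(2 * dim)] for _ in range(dim)]
        v = [rng.uniform(-0.1, 0.1) for _ in range(dim)]
        params[pair] = (W, v)
    return params

def compose(params, left_cat, left_vec, right_cat, right_vec):
    """Parent vector = tanh(W [left; right]); node score = v . parent."""
    W, v = params[(left_cat, right_cat)]
    children = left_vec + right_vec  # list concatenation, length 2 * DIM
    parent = [math.tanh(sum(w * x for w, x in zip(row, children))) for row in W]
    score = sum(vi * pi for vi, pi in zip(v, parent))
    return parent, score

def score_tree(params, tree, rule_logprob):
    """A leaf is (category, vector); an internal node is (category, left, right)."""
    if len(tree) == 2:
        return tree[1], 0.0
    cat, left, right = tree
    lvec, lscore = score_tree(params, left, rule_logprob)
    rvec, rscore = score_tree(params, right, rule_logprob)
    parent, node_score = compose(params, left[0], lvec, right[0], rvec)
    total = lscore + rscore + node_score + rule_logprob[(cat, left[0], right[0])]
    return parent, total

def rerank(params, candidates, rule_logprob):
    """Pick the highest-scoring tree from a candidate list (e.g. PCFG k-best)."""
    return max(candidates, key=lambda t: score_tree(params, t, rule_logprob)[1])

# Toy PP-attachment example: "ate spaghetti with (a) fork" (made-up vectors).
ate, spaghetti = ("V", [0.1, 0.2, 0.0, 0.3]), ("NP", [0.4, 0.1, 0.2, 0.0])
with_, fork = ("P", [0.0, 0.3, 0.1, 0.1]), ("NP", [0.2, 0.0, 0.4, 0.1])
pp = ("PP", with_, fork)
tree_a = ("VP", ("VP", ate, spaghetti), pp)  # PP attaches to the verb phrase
tree_b = ("VP", ate, ("NP", spaghetti, pp))  # PP attaches to the noun phrase

params = make_params([("V", "NP"), ("VP", "PP"), ("P", "NP"), ("NP", "PP")])
rule_logprob = {  # hypothetical PCFG rule probabilities
    ("VP", "V", "NP"): math.log(0.5),
    ("PP", "P", "NP"): math.log(0.9),
    ("VP", "VP", "PP"): math.log(0.2),
    ("NP", "NP", "PP"): math.log(1e-4),
}
best = rerank(params, [tree_a, tree_b], rule_logprob)
```

With untrained weights, the rule probabilities dominate the choice here. In a trained model the vector-based scores would be what lets semantic information influence decisions such as PP attachment.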

reference text

Y. Bengio, R. Ducharme, P. Vincent, and C. Janvin. 2003. A neural probabilistic language model. Journal of Machine Learning Research, 3:1137–1155.
P. F. Brown, P. V. deSouza, R. L. Mercer, V. J. Della Pietra, and J. C. Lai. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18.
C. Callison-Burch. 2008. Syntactic constraints on paraphrases extracted from parallel corpora. In Proceedings of EMNLP, pages 196–205.
E. Charniak and M. Johnson. 2005. Coarse-to-fine n-best parsing and maxent discriminative reranking. In ACL.
E. Charniak. 2000. A maximum-entropy-inspired parser. In Proceedings of ACL, pages 132–139.
M. Collins. 1997. Three generative, lexicalised models for statistical parsing. In ACL.
M. Collins. 2003. Head-driven statistical models for natural language parsing. Computational Linguistics, 29(4):589–637.
R. Collobert and J. Weston. 2008. A unified architecture for natural language processing: deep neural networks with multitask learning. In Proceedings of ICML, pages 160–167.
F. Costa, P. Frasconi, V. Lombardo, and G. Soda. 2003. Towards incremental parsing of natural language using recursive neural networks. Applied Intelligence.
J. Duchi, E. Hazan, and Y. Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. JMLR, 12, July.
J. L. Elman. 1991. Distributed representations, simple recurrent networks, and grammatical structure. Machine Learning, 7(2-3):195–225.
J. R. Finkel, A. Kleeman, and C. D. Manning. 2008. Efficient, feature-based, conditional random field parsing. In Proceedings of ACL, pages 959–967.
D. Gildea and M. Palmer. 2002. The necessity of parsing for predicate argument recognition. In Proceedings of ACL, pages 239–246.
C. Goller and A. Küchler. 1996. Learning task-dependent distributed representations by backpropagation through structure. In Proceedings of the International Conference on Neural Networks.
J. Goodman. 1998. Parsing Inside-Out. Ph.D. thesis, MIT.
D. Hall and D. Klein. 2012. Training factored PCFGs with expectation propagation. In EMNLP.
J. Henderson. 2003. Neural network probability estimation for broad coverage parsing. In Proceedings of EACL.
J. Henderson. 2004. Discriminative training of a neural network statistical parser. In ACL.
L. Huang and D. Chiang. 2005. Better k-best parsing. In Proceedings of the 9th International Workshop on Parsing Technologies (IWPT 2005).
E. H. Huang, R. Socher, C. D. Manning, and A. Y. Ng. 2012. Improving Word Representations via Global Context and Multiple Word Prototypes. In ACL.
D. Kartsaklis, M. Sadrzadeh, and S. Pulman. 2012. A unified sentence space for categorical distributional-compositional semantics: Theory and experiments. In Proceedings of the 24th International Conference on Computational Linguistics (COLING): Posters.
D. Klein and C. D. Manning. 2003a. Accurate unlexicalized parsing. In Proceedings of ACL, pages 423–430.
D. Klein and C. D. Manning. 2003b. Fast exact inference with a factored model for natural language parsing. In NIPS.
J. K. Kummerfeld, D. Hall, J. R. Curran, and D. Klein. 2012. Parser showdown at the Wall Street corral: An empirical investigation of error types in parser output. In EMNLP.
Q. V. Le, J. Ngiam, Z. Chen, D. Chia, P. W. Koh, and A. Y. Ng. 2010. Tiled convolutional neural networks. In NIPS.
T. Matsuzaki, Y. Miyao, and J. Tsujii. 2005. Probabilistic CFG with latent annotations. In ACL.
D. McClosky, E. Charniak, and M. Johnson. 2006. Effective self-training for parsing. In NAACL.
S. Menchetti, F. Costa, P. Frasconi, and M. Pontil. 2005. Wide coverage natural language processing using kernel methods and neural networks for structured data. Pattern Recognition Letters, 26(12):1896–1906.
T. Mikolov, W. Yih, and G. Zweig. 2013. Linguistic regularities in continuous space word representations. In HLT-NAACL.
S. Petrov and D. Klein. 2007. Improved inference for unlexicalized parsing. In NAACL.
S. Petrov, L. Barrett, R. Thibaux, and D. Klein. 2006. Learning accurate, compact, and interpretable tree annotation. In Proceedings of ACL, pages 433–440.
N. Ratliff, J. A. Bagnell, and M. Zinkevich. 2007. (Online) subgradient methods for structured prediction. In Eleventh International Conference on Artificial Intelligence and Statistics (AIStats).
R. Socher, C. D. Manning, and A. Y. Ng. 2010. Learning continuous phrase representations and syntactic parsing with recursive neural networks. In Proceedings of the NIPS-2010 Deep Learning and Unsupervised Feature Learning Workshop.
R. Socher, E. H. Huang, J. Pennington, A. Y. Ng, and C. D. Manning. 2011a. Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection. In NIPS. MIT Press.
R. Socher, C. Lin, A. Y. Ng, and C. D. Manning. 2011b. Parsing Natural Scenes and Natural Language with Recursive Neural Networks. In ICML.
R. Socher, B. Huval, C. D. Manning, and A. Y. Ng. 2012. Semantic Compositionality Through Recursive Matrix-Vector Spaces. In EMNLP.
B. Taskar, D. Klein, M. Collins, D. Koller, and C. Manning. 2004. Max-margin parsing. In Proceedings of EMNLP, pages 1–8.
I. Titov and J. Henderson. 2006. Porting statistical parsers with data-defined kernels. In CoNLL-X.
I. Titov and J. Henderson. 2007. Constituent parsing with incremental sigmoid belief networks. In ACL.
J. Turian, L. Ratinov, and Y. Bengio. 2010. Word representations: a simple and general method for semi-supervised learning. In Proceedings of ACL, pages 384–394.
P. D. Turney and P. Pantel. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37:141–188.