
114 emnlp-2010-Unsupervised Parse Selection for HPSG


Authors: Rebecca Dridan; Timothy Baldwin

Abstract: Parser disambiguation with precision grammars generally takes place via statistical ranking of the parse yield of the grammar using a supervised parse selection model. In the standard process, the parse selection model is trained over a hand-disambiguated treebank, meaning that without a significant investment of effort to produce the treebank, parse selection is not possible. Furthermore, as treebanking is generally streamlined with parse selection models, creating the initial treebank without a model requires more resources than subsequent treebanks. In this work, we show that, by taking advantage of the constrained nature of these HPSG grammars, we can learn a discriminative parse selection model from raw text in a purely unsupervised fashion. This allows us to bootstrap the treebanking process and provide better parsers faster and with fewer resources.

reference text

Jason Baldridge. 2008. Weakly supervised supertagging with grammar-informed initialization. In Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pages 57–64, Manchester, UK.
Srinivas Bangalore and Aravind K. Joshi. 1999. Supertagging: an approach to almost parsing. Computational Linguistics, 25(2):237–265.
Emily M. Bender, Dan Flickinger, and Stephan Oepen. 2002. The grammar matrix: An open-source starter-kit for the rapid development of cross-linguistically consistent broad-coverage precision grammars. In Proceedings of the Workshop on Grammar Engineering and Evaluation at the 19th International Conference on Computational Linguistics, pages 8–14, Taipei, Taiwan.
Emily M. Bender. 2008. Evaluating a crosslinguistic grammar resource: A case study of Wambaya. In Proceedings of the 46th Annual Meeting of the ACL, pages 977–985, Columbus, USA.
Philip Blunsom. 2007. Structured Classification for Multilingual Natural Language Processing. Ph.D. thesis, Department of Computer Science and Software Engineering, the University of Melbourne.
Eric Brill. 1995. Unsupervised learning of disambiguation rules for part of speech tagging. In Proceedings of the Third Workshop on Very Large Corpora, pages 1–13, Cambridge, USA.
Ted Briscoe and John Carroll. 2006. Evaluating the accuracy of an unlexicalised statistical parser on the PARC DepBank. In Proceedings of the 44th Annual Meeting of the ACL and the 21st International Conference on Computational Linguistics, pages 41–48, Sydney, Australia.
David Carter. 1997. The treebanker: a tool for supervised training of parsed corpora. In Proceedings of a Workshop on Computational Environments for Grammar Development and Linguistic Engineering, pages 9–15, Madrid, Spain.
Eugene Charniak and Mark Johnson. 2005. Coarse-to-fine n-best parsing and maxent discriminative reranking. In Proceedings of the 43rd Annual Meeting of the ACL, pages 173–180, Ann Arbor, USA.
Stephen Clark and James R. Curran. 2006. Partial training for a lexicalized-grammar parser. In Proceedings of the Human Language Technology Conference of the North American Chapter of the ACL (NAACL), pages 144–151, New York City, USA.
Stephen Clark and James R. Curran. 2007a. Formalism-independent parser evaluation with CCG and DepBank. In Proceedings of the 45th Annual Meeting of the ACL, pages 248–255, Prague, Czech Republic.
Stephen Clark and James R. Curran. 2007b. Wide-coverage efficient statistical parsing with CCG and log-linear models. Computational Linguistics, 33(4):493–552.
Ann Copestake, Dan Flickinger, Ivan A. Sag, and Carl Pollard. 2005. Minimal recursion semantics: An introduction. Research on Language and Computation, 3(4):281–332.
Mary Dalrymple. 2006. How much can part-of-speech tagging help parsing? Natural Language Engineering, 12(4):373–389.
Rebecca Dridan. 2009. Using lexical statistics to improve HPSG parsing. Ph.D. thesis, Saarland University.
Dan Flickinger. 2002. On building a more efficient grammar by exploiting types. In Stephan Oepen, Dan Flickinger, Jun’ichi Tsujii, and Hans Uszkoreit, editors, Collaborative Language Engineering, pages 1–17. Stanford: CSLI Publications.
Tadayoshi Hara, Yusuke Miyao, and Jun’ichi Tsujii. 2007. Evaluating impact of re-training a lexical disambiguation model on domain adaptation of an HPSG parser. In Proceedings of the 10th International Conference on Parsing Technology (IWPT 2007), pages 11–22, Prague, Czech Republic.
Julia Hockenmaier and Mark Steedman. 2007. CCGbank: A corpus of CCG derivations and dependency structures extracted from the Penn Treebank. Computational Linguistics, 33(3):355–396, September.
Mark Johnson. 2007. Why doesn’t EM find good HMM POS-taggers? In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), pages 296–305, Prague, Czech Republic.
Robert Malouf. 2002. A comparison of algorithms for maximum entropy parameter estimation. In Proceedings of the 6th Conference on Natural Language Learning, Taipei, Taiwan.
David McClosky, Eugene Charniak, and Mark Johnson. 2006. Effective self-training for parsing. In Proceedings of the Human Language Technology Conference of the NAACL, pages 152–159, New York City, USA.
Bernard Merialdo. 1994. Tagging English text with a probabilistic model. Computational Linguistics, 20(2):155–171.
Yusuke Miyao and Jun’ichi Tsujii. 2008. Feature forest models for probabilistic HPSG parsing. Computational Linguistics, 34(1):35–80.
Yusuke Miyao, Kenji Sagae, and Jun’ichi Tsujii. 2007. Towards framework-independent evaluation of deep linguistic parsers. In Proceedings of the GEAF 2007 Workshop, Palo Alto, California.
Yusuke Miyao, Rune Sætre, Kenji Sagae, Takuya Matsuzaki, and Jun’ichi Tsujii. 2008. Task-oriented evaluation of syntactic parsers and their representations. In Proceedings of the 46th Annual Meeting of the ACL, pages 46–54, Columbus, USA.
Stephan Oepen and John Carroll. 2000. Ambiguity packing in constraint-based parsing - practical results. In Proceedings of the 1st Conference of the North American Chapter of the Association for Computational Linguistics, pages 162–169, Seattle, USA.
Stephan Oepen, Dan Flickinger, Kristina Toutanova, and Christopher D. Manning. 2004. LinGO Redwoods: A rich and dynamic treebank for HPSG. Journal of Research in Language and Computation, 2(4):575–596.
Stephan Oepen. 2001. [incr tsdb()] – competence and performance laboratory. User manual, Computational Linguistics, Saarland University, Saarbrücken, Germany.
Carl Pollard and Ivan A. Sag. 1994. Head-Driven Phrase Structure Grammar. University of Chicago Press, Chicago, USA.
Sujith Ravi, Jason Baldridge, and Kevin Knight. 2010a. Minimized models and grammar-informed initialization for supertagging with highly ambiguous lexicons. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 495–503, Uppsala, Sweden.
Sujith Ravi, Ashish Vaswani, Kevin Knight, and David Chiang. 2010b. Fast, greedy model minimization for unsupervised tagging. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 940–948, Beijing, China.
Laura Rimell and Stephen Clark. 2008. Adapting a lexicalized-grammar parser to contrasting domains. In Proceedings of the 2008 Conference on Empirical Methods in Natural Language Processing (EMNLP 2008), pages 475–484, Honolulu, USA.
Yves Schabes and Aravind K. Joshi. 1991. Parsing with lexicalized tree adjoining grammar. In Masaru Tomita, editor, Current Issues in Parsing Technology, chapter 3, pages 25–48. Kluwer.
Melanie Siegel and Emily M. Bender. 2002. Efficient deep processing of Japanese. In Proceedings of the 3rd Workshop on Asian Language Resources and International Standardization, Coling 2002 Post-Conference Workshop, Taipei, Taiwan.
Yasuhito Tanaka. 2001. Compilation of a multilingual parallel corpus. In Proceedings of PACLING 2001, pages 265–268, Kitakyushu, Japan.
Kristina Toutanova, Christopher D. Manning, Stuart M. Shieber, Dan Flickinger, and Stephan Oepen. 2002. Parse disambiguation for a rich HPSG grammar. In First Workshop on Treebanks and Linguistic Theories (TLT2002), pages 253–263.
Alexander Yeh. 2000. More accurate tests for the statistical significance of result differences. In Proceedings of the 18th International Conference on Computational Linguistics (COLING 2000), pages 947–953, Saarbrücken, Germany.
Yi Zhang, Stephan Oepen, and John Carroll. 2007. Efficiency in unification-based n-best parsing. In Proceedings of the 10th International Conference on Parsing Technologies (IWPT 2007), pages 48–59, Prague, Czech Republic.