
124 emnlp-2012-Three Dependency-and-Boundary Models for Grammar Induction

Source: pdf

Author: Valentin I. Spitkovsky ; Hiyan Alshawi ; Daniel Jurafsky

Abstract: We present a new family of models for unsupervised parsing, Dependency and Boundary models, that use cues at constituent boundaries to inform head-outward dependency tree generation. We build on three intuitions that are explicit in phrase-structure grammars but only implicit in standard dependency formulations: (i) Distributions of words that occur at sentence boundaries, such as English determiners, resemble constituent edges. (ii) Punctuation at sentence boundaries further helps distinguish full sentences from fragments like headlines and titles, allowing us to model grammatical differences between complete and incomplete sentences. (iii) Sentence-internal punctuation boundaries help with longer-distance dependencies, since punctuation correlates with constituent edges. Our models induce state-of-the-art dependency grammars for many languages without special knowledge of optimal input sentence lengths or biased, manually-tuned initializers.

reference text

H. Alshawi, S. Bangalore, and S. Douglas. 2000. Learning dependency translation models as collections of finite-state head transducers. Computational Linguistics, 26.
H. Alshawi. 1996a. Head automata for speech translation. In ICSLP.
H. Alshawi. 1996b. Method and apparatus for an improved language recognition system. US Patent 1999/5870706.
J. K. Baker. 1979. Trainable grammars for speech recognition. In Speech Communication Papers for the 97th Meeting of the Acoustical Society of America.
Y. Bengio, J. Louradour, R. Collobert, and J. Weston. 2009. Curriculum learning. In ICML.
J. Berant, Y. Gross, M. Mussel, B. Sandbank, E. Ruppin, and S. Edelman. 2006. Boosting unsupervised grammar induction by splitting complex sentences on function words. In BUCLD.
T. Berg-Kirkpatrick and D. Klein. 2010. Phylogenetic grammar induction. In ACL.
T. Berg-Kirkpatrick, A. Bouchard-Côté, J. DeNero, and D. Klein. 2010. Painless unsupervised learning with features. In NAACL-HLT.
A. Bhattacharyya. 1943. On a measure of divergence between two statistical populations defined by their probability distributions. BCMS, 35.
D. M. Bikel. 2004. Intricacies of Collins’ parsing model. Computational Linguistics, 30.
A. Blum and T. Mitchell. 1998. Combining labeled and unlabeled data with co-training. In COLT.
M. R. Brent and J. M. Siskind. 2001. The role of exposure to isolated words in early vocabulary development. Cognition, 81.
P. F. Brown, V. J. Della Pietra, S. A. Della Pietra, and R. L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19.
S. Buchholz and E. Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In CoNLL.
G. Carroll and E. Charniak. 1992. Two experiments on learning probabilistic dependency grammars from corpora. Technical report, Brown University.
S. B. Cohen and N. A. Smith. 2009. Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction. In NAACL-HLT.
S. B. Cohen and N. A. Smith. 2010. Viterbi training for PCFGs: Hardness results and competitiveness of uniform initialization. In ACL.
S. B. Cohen, D. Das, and N. A. Smith. 2011. Unsupervised structure prediction with non-parallel multilingual guidance. In EMNLP.
M. Collins. 1997. Three generative, lexicalised models for statistical parsing. In ACL.
M. Collins. 1999. Head-Driven Statistical Models for Natural Language Parsing. Ph.D. thesis, University of Pennsylvania.
M. Collins. 2003. Head-driven statistical models for natural language parsing. Computational Linguistics, 29.
J. Eisner and G. Satta. 1999. Efficient parsing for bilexical context-free grammars and head-automaton grammars. In ACL.
J. M. Eisner. 1996. An empirical comparison of probability models for dependency grammar. Technical report, IRCS.
J. Eisner. 2000. Bilexical grammars and their cubic-time parsing algorithms. In H. C. Bunt and A. Nijholt, editors, Advances in Probabilistic and Other Parsing Technologies. Kluwer Academic Publishers.
J. L. Elman. 1993. Learning and development in neural networks: The importance of starting small. Cognition, 48.
R. Frank. 2000. From regular to context-free to mildly context-sensitive tree rewriting systems: The path of child language acquisition. In A. Abeillé and O. Rambow, editors, Tree Adjoining Grammars: Formalisms, Linguistic Analysis and Processing. CSLI Publications.
J. Gillenwater, K. Ganchev, J. Graça, B. Taskar, and F. Pereira. 2009. Sparsity in grammar induction. In NIPS: Grammar Induction, Representation of Language and Language Learning.
J. Gillenwater, K. Ganchev, J. Graça, F. Pereira, and B. Taskar. 2010. Posterior sparsity in unsupervised dependency parsing. Technical report, University of Pennsylvania.
K. Gimpel and N. A. Smith. 2011. Concavity and initialization for unsupervised dependency grammar induction. Technical report, CMU.
C. Hänig. 2010. Improvements in unsupervised co-occurrence based parsing. In CoNLL.
W. P. Headden, III, M. Johnson, and D. McClosky. 2009. Improving unsupervised dependency parsing with richer contexts and smoothing. In NAACL-HLT.
D. Klein and C. D. Manning. 2004. Corpus-based induction of syntactic structure: Models of dependency and constituency. In ACL.
K. A. Krueger and P. Dayan. 2009. Flexible shaping: How learning in small steps helps. Cognition, 110.
M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19.
D. Mareček and Z. Žabokrtský. 2011. Gibbs sampling with treeness constraint in unsupervised dependency parsing. In ROBUS.
D. McClosky. 2008. Modeling valence effects in unsupervised grammar induction. Technical report, Brown University.
R. McDonald, S. Petrov, and K. Hall. 2011. Multi-source transfer of delexicalized dependency parsers. In EMNLP.
T. Naseem, H. Chen, R. Barzilay, and M. Johnson. 2010. Using universal linguistic knowledge to guide grammar induction. In EMNLP.
M. S. Nikulin. 2002. Hellinger distance. In M. Hazewinkel, editor, Encyclopaedia of Mathematics. Kluwer Academic Publishers.
J. Nivre, J. Hall, S. Kübler, R. McDonald, J. Nilsson, S. Riedel, and D. Yuret. 2007. The CoNLL 2007 shared task on dependency parsing. In EMNLP-CoNLL.
M. A. Paskin. 2001a. Cubic-time parsing and learning algorithms for grammatical bigram models. Technical report, UCB.
M. A. Paskin. 2001b. Grammatical bigrams. In NIPS.
F. Pereira and Y. Schabes. 1992. Inside-outside reestimation from partially bracketed corpora. In ACL.
E. Ponvert, J. Baldridge, and K. Erk. 2010. Simple unsupervised identification of low-level constituents. In ICSC.
M. S. Rasooli and H. Faili. 2012. Fast unsupervised dependency parsing with arc-standard transitions. In ROBUS-UNSUP.
R. Samdani, M.-W. Chang, and D. Roth. 2012. Unified expectation maximization. In NAACL-HLT.
Y. Seginer. 2007. Learning Syntactic Structure. Ph.D. thesis, University of Amsterdam.
A. Søgaard. 2011a. Data point selection for cross-language adaptation of dependency parsers. In ACL.
A. Søgaard. 2011b. From ranked words to dependency trees: two-stage unsupervised non-projective dependency parsing. In TextGraphs.
V. I. Spitkovsky, H. Alshawi, and D. Jurafsky. 2009. Baby Steps: How “Less is More” in unsupervised dependency parsing. In NIPS: Grammar Induction, Representation of Language and Language Learning.
V. I. Spitkovsky, H. Alshawi, and D. Jurafsky. 2011a. Lateen EM: Unsupervised training with multiple objectives, applied to dependency grammar induction. In EMNLP.
V. I. Spitkovsky, H. Alshawi, and D. Jurafsky. 2011b. Punctuation: Making a point in unsupervised dependency parsing. In CoNLL.
V. I. Spitkovsky, H. Alshawi, and D. Jurafsky. 2012a. Bootstrapping dependency grammar inducers from incomplete sentence fragments via austere models: the “wabi-sabi” of unsupervised parsing. In submission.
V. I. Spitkovsky, H. Alshawi, and D. Jurafsky. 2012b. Capitalization cues improve dependency grammar induction. In WILS.
K. Tu and V. Honavar. 2011. On the utility of curricula in unsupervised learning of probabilistic grammars. In IJCAI.
F. Xia and M. Palmer. 2001. Converting dependency structures to phrase structures. In HLT.