Venue: EMNLP 2010 (paper emnlp2010-60)
Source: pdf
Author: Roi Reichart; Ari Rappoport
Abstract: We introduce a novel training algorithm for unsupervised grammar induction, called Zoomed Learning. Given a training set T and a test set S, the algorithm aims to identify pairs of subsets (Ti, Si), with Ti drawn from T and Si drawn from S, such that an unsupervised parser trained on Ti performs better on its paired test subset Si than the same parser trained on the entire training set T. A successful application of zoomed learning improves overall performance on the full test set S. We study the algorithm's effect on the leading algorithm for fully unsupervised parsing (Seginer, 2007) in three English domains (WSJ, BROWN and GENIA) and show that it improves the parser's F-score by up to 4.47%.
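The abstract defines zoomed learning by its objective, not by how the pairs (Ti, Si) are found. The Python sketch below illustrates only that objective. It rests on several assumptions that do not come from the paper. Parser outputs are taken as precomputed sets of unlabeled bracket spans. Every sentence outside all Si keeps the output of the parser trained on the full T. Evaluation uses micro-averaged unlabeled bracket F-score. The function names `bracket_f1` and `zoomed_outputs` are hypothetical, and the toy data is invented. The sketch does not implement the pair-selection procedure or Seginer's (2007) parser.

```python
from typing import Dict, Iterable, List, Set, Tuple

# A constituent is an unlabeled (start, end) token span; labels are ignored.
Span = Tuple[int, int]
# Sentence id -> set of brackets (either gold or a parser's prediction).
Parses = Dict[str, Set[Span]]


def bracket_f1(pred: Parses, gold: Parses, ids: Iterable[str]) -> float:
    """Micro-averaged unlabeled bracket F-score over the sentences in `ids`."""
    tp = n_pred = n_gold = 0
    for sid in ids:
        tp += len(pred[sid] & gold[sid])
        n_pred += len(pred[sid])
        n_gold += len(gold[sid])
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 2 * precision * recall / (precision + recall)


def zoomed_outputs(full: Parses, pairs: List[Tuple[Parses, Set[str]]]) -> Parses:
    """Hypothetical combination step (an assumption, not the paper's method):
    each test subset S_i takes the output of the parser trained on its paired
    T_i; every other sentence keeps the output of the parser trained on T."""
    combined = dict(full)  # shallow copy, so `full` is left unchanged
    for subset_parses, s_i in pairs:
        for sid in s_i:
            combined[sid] = subset_parses[sid]
    return combined


# Toy gold brackets and parser outputs (invented for illustration only).
gold = {"s1": {(0, 3), (1, 3)}, "s2": {(0, 4), (0, 2), (2, 4)}, "s3": {(0, 2)}}
full = {"s1": {(0, 3), (1, 3)}, "s2": {(0, 4), (1, 3), (1, 4)}, "s3": {(0, 2)}}
# Output of a parser trained on a subset T_1, used only on its paired S_1 = {"s2"}.
zoom_1 = {"s2": {(0, 4), (0, 2), (1, 4)}}
pairs = [(zoom_1, {"s2"})]

test_ids = list(gold)
baseline = bracket_f1(full, gold, test_ids)
zoomed = bracket_f1(zoomed_outputs(full, pairs), gold, test_ids)
print(f"trained on T: {baseline:.3f}  zoomed: {zoomed:.3f}")
```

With this toy data the script prints `trained on T: 0.667  zoomed: 0.833`. The pair qualifies because the T_1 parser beats the full-T parser on S_1 alone. Since the other sentences are untouched, that local gain carries over to the score on all of S.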
References:
Markus Becker and Miles Osborne, 2005. A two-stage method for active learning of statistical grammars. IJCAI ’05.
Rens Bod, 2006a. An all-subtrees approach to unsupervised parsing. ACL-COLING ’06.
Rens Bod, 2006b. Unsupervised parsing with U-DOP. CoNLL ’06.
Rens Bod, 2007. Is the end of supervised parsing in sight? ACL ’07.
Leo Breiman, 1996. Bagging predictors. Machine Learning, 24(2):123–140.
Rich Caruana and Alexandru Niculescu-Mizil, 2006. An empirical comparison of supervised learning algorithms. ICML ’06.
Alexander Clark, 2001. Unsupervised language acquisition: theory and practice. Ph.D. thesis, University of Sussex.
Alexander Clark, 2003. Combining distributional and morphological information for part of speech induction. EACL ’03.
Shay Cohen, Kevin Gimpel and Noah Smith, 2008. Logistic normal priors for unsupervised probabilistic grammar induction. NIPS ’08.
Shay Cohen and Noah Smith, 2009. Shared logistic normal distributions for soft parameter tying in unsupervised grammar induction. NAACL ’09.
David Cohn, Les Atlas and Richard Ladner, 1994. Improving generalization with active learning. Machine Learning, 15(2):201–221.
Simon Dennis, 2005. An exemplar-based approach to unsupervised parsing. CogSci ’05.
W. N. Francis and H. Kucera, 1979. Manual of information to accompany a standard corpus of present-day edited American English, for use with digital computers. Department of Linguistics, Brown University Press, Providence, RI.
Yoav Freund and Robert E. Schapire, 1996. Experiments with a new boosting algorithm. ICML ’96.
William Headden III, Mark Johnson and David McClosky, 2009. Improving unsupervised dependency parsing with richer contexts and smoothing. NAACL ’09.
John Henderson and Eric Brill, 2000. Bagging and boosting a treebank parser. NAACL ’00.
Rebecca Hwa, 2004. Sample selection for statistical parsing. Computational Linguistics, 30(3):253–276.
Daisuke Kawahara and Kiyotaka Uchimoto, 2008. Learning reliability of parses for domain adaptation of dependency parsing. IJCNLP ’08.
Dan Klein and Christopher Manning, 2002. A generative constituent-context model for improved grammar induction. ACL ’02.
Dan Klein and Christopher Manning, 2004. Corpus-based induction of syntactic structure: Models of dependency and constituency. ACL ’04.
Dan Klein, 2005. The unsupervised learning of natural language structure. Ph.D. thesis, Stanford University.
Jin-Dong Kim, Tomoko Ohta, Yuka Tateisi and Jun’ichi Tsujii, 2003. GENIA corpus: a semantically annotated corpus for bio-textmining. Bioinformatics, 19 (supplement: 11th ISMB):i180–i182, Oxford University Press.
Mitchell Marcus, Beatrice Santorini and Mary Ann Marcinkiewicz, 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19(2):313–330.
David McClosky, Eugene Charniak and Mark Johnson, 2006. Reranking and self-training for parser adaptation. ACL-COLING ’06.
Sujith Ravi, Kevin Knight and Radu Soricut, 2008. Automatic prediction of parser accuracy. EMNLP ’08.
Roi Reichart and Ari Rappoport, 2007. An ensemble method for selection of high quality parses. ACL ’07.
Roi Reichart and Ari Rappoport, 2009a. Sample selection for statistical parsers: cognitively driven algorithms and evaluation measures. CoNLL ’09.
Roi Reichart and Ari Rappoport, 2009b. Automatic selection of high quality parses created by a fully unsupervised parser. CoNLL ’09.
Yoav Seginer, 2007. Fast unsupervised incremental parsing. ACL ’07.
Noah Smith and Jason Eisner, 2006. Annealing structural bias in multilingual weighted grammar induction. ACL-COLING ’06.
Valentin Spitkovsky, Hiyan Alshawi and Daniel Jurafsky, 2010. From baby steps to leapfrog: how “less is more” in unsupervised dependency parsing. NAACL ’10.
Ricardo Vilalta and Irina Rish, 2003. A decomposition of classes via clustering to explain and improve naive Bayes. ECML ’03.
Alexander Yates, Stefan Schoenmackers and Oren Etzioni, 2006. Detecting parser errors using web-based semantic filters. EMNLP ’06.