
Semi-Supervised Feature Transformation for Dependency Parsing (EMNLP 2013, paper 168)


Source: pdf

Authors: Wenliang Chen, Min Zhang, Yue Zhang

Abstract: In current dependency parsing models, conventional features (i.e., base features) are defined over surface words and part-of-speech tags in a relatively high-dimensional feature space. They may therefore suffer from data sparseness and show less discriminative power on unseen data. In this paper, we propose a novel semi-supervised approach to this problem that transforms the base features into high-level features (i.e., meta features) with the help of a large amount of automatically parsed data. The final parser uses the meta features together with the base features. Our studies indicate that the proposed approach is very effective at handling unseen data and features. Experiments on Chinese and English data sets show that the final parser achieves the best reported accuracy on the Chinese data and accuracy comparable to the best known parsers on the English data.
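The abstract does not specify how base features are mapped to meta features. As a rough illustration only, the Python sketch below assumes one plausible scheme: count each base feature in the automatically parsed data, rank the features within each template by frequency, and replace a base feature with its template name plus a coarse frequency label. The labels ("H", "M", "L", and "O" for unseen), the thresholds `top_ratio` and `mid_ratio`, the template name `hw_dw`, and the function names are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical representation: a base feature is (template_id, value),
# e.g. ("hw_dw", "saw|dog") for a head-word/dependent-word pair.
BaseFeature = Tuple[str, str]


def build_feature_buckets(
    auto_parsed_features: Iterable[BaseFeature],
    top_ratio: float = 0.1,  # hypothetical threshold for the "H" bucket
    mid_ratio: float = 0.3,  # hypothetical threshold for the "M" bucket
) -> Dict[BaseFeature, str]:
    """Assign each base feature seen in auto-parsed data a frequency bucket.

    Within each template, features are ranked by count (ties broken by value);
    the top `top_ratio` fraction gets "H", ranks up to `mid_ratio` get "M",
    and the remainder gets "L".
    """
    counts = Counter(auto_parsed_features)

    # Group (feature, count) pairs by template so ranking is per template.
    by_template: Dict[str, List[Tuple[BaseFeature, int]]] = {}
    for feat, count in counts.items():
        by_template.setdefault(feat[0], []).append((feat, count))

    buckets: Dict[BaseFeature, str] = {}
    for items in by_template.values():
        items.sort(key=lambda pair: (-pair[1], pair[0][1]))
        n = len(items)
        top_n = round(top_ratio * n)  # integer cutoffs avoid float edge cases
        mid_n = round(mid_ratio * n)
        for rank, (feat, _) in enumerate(items):
            if rank < top_n:
                buckets[feat] = "H"
            elif rank < mid_n:
                buckets[feat] = "M"
            else:
                buckets[feat] = "L"
    return buckets


def meta_feature(feat: BaseFeature, buckets: Dict[BaseFeature, str]) -> str:
    """Map a base feature to a hypothetical meta feature "template:bucket".

    Base features never seen in the auto-parsed data get the label "O".
    """
    return f"{feat[0]}:{buckets.get(feat, 'O')}"
```

Because a rare or unseen word pair collapses into a shared label such as `hw_dw:L` or `hw_dw:O`, a parser can still assign it a weight learned from many other features in the same bucket. This is one way a transformation of this kind could ease data sparseness, under the assumptions stated above.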


reference text

R. K. Ando and T. Zhang. 2005. A high-performance semi-supervised learning method for text chunking. ACL.
Bernd Bohnet. 2010. Top accuracy and fast dependency parsing is not a contradiction. In Proceedings of the 23rd International Conference on Computational Linguistics (Coling 2010), pages 89–97, Beijing, China, August. Coling 2010 Organizing Committee.
S. Buchholz and E. Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In Proc. of CoNLL-X. SIGNLL.
Xavier Carreras. 2007. Experiments with a higher-order projective dependency parser. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 957–961, Prague, Czech Republic, June. Association for Computational Linguistics.
Eugene Charniak, Don Blaheta, Niyu Ge, Keith Hall, John Hale, and Mark Johnson. 2000. BLLIP 1987–89 WSJ Corpus Release 1, LDC2000T43. Linguistic Data Consortium.
Wenliang Chen, Jun’ichi Kazama, Kiyotaka Uchimoto, and Kentaro Torisawa. 2009. Improving dependency parsing with subtrees from auto-parsed data. In Proceedings of EMNLP 2009, pages 570–579, Singapore, August.
Wenliang Chen, Min Zhang, and Haizhou Li. 2012. Utilizing dependency language models for graph-based dependency parsing models. In Proceedings of ACL 2012, Korea, July.
Koby Crammer and Yoram Singer. 2003. Ultraconservative online algorithms for multiclass problems. J. Mach. Learn. Res., 3:951–991.
Xiangyu Duan, Jun Zhao, and Bo Xu. 2007. Probabilistic models for action-based Chinese dependency parsing. In Proceedings of ECML/PKDD, Warsaw, Poland.
J. Eisner. 1996. Three new probabilistic models for dependency parsing: An exploration. In Proceedings of COLING 1996, pages 340–345.
Jun Hatori, Takuya Matsuzaki, Yusuke Miyao, and Jun’ichi Tsujii. 2011. Incremental joint POS tagging and dependency parsing in Chinese. In Proceedings of 5th International Joint Conference on Natural Language Processing, pages 1216–1224, Chiang Mai, Thailand, November. Asian Federation of Natural Language Processing.
Chu-Ren Huang. 2009. Tagged Chinese Gigaword Version 2.0, LDC2009T14. Linguistic Data Consortium.
Terry Koo and Michael Collins. 2010. Efficient third-order dependency parsers. In Proceedings of ACL 2010, pages 1–11, Uppsala, Sweden, July. Association for Computational Linguistics.
T. Koo, X. Carreras, and M. Collins. 2008. Simple semi-supervised dependency parsing. In Proceedings of ACL-08: HLT, Columbus, Ohio, June.
Canasai Kruengkrai, Kiyotaka Uchimoto, Jun’ichi Kazama, Yiou Wang, Kentaro Torisawa, and Hitoshi Isahara. 2009. An error-driven word-character hybrid model for joint Chinese word segmentation and POS tagging. In Proceedings of ACL-IJCNLP 2009, pages 513–521, Suntec, Singapore, August. Association for Computational Linguistics.
Zhenghua Li, Min Zhang, Wanxiang Che, Ting Liu, Wenliang Chen, and Haizhou Li. 2011. Joint models for Chinese POS tagging and dependency parsing. In Proceedings of EMNLP 2011, UK, July.
Zhenghua Li, Min Zhang, Wanxiang Che, and Ting Liu. 2012. A separately passive-aggressive training algorithm for joint POS tagging and dependency parsing. In Proceedings of the 24th International Conference on Computational Linguistics (Coling 2012), Mumbai, India. Coling 2012 Organizing Committee.
Mitchell P. Marcus, Beatrice Santorini, and Mary Ann Marcinkiewicz. 1993. Building a large annotated corpus of English: the Penn Treebank. Computational Linguistics, 19(2):313–330.
D. McClosky, E. Charniak, and M. Johnson. 2006. Reranking and self-training for parser adaptation. In Proceedings of Coling-ACL, pages 337–344.
R. McDonald and J. Nivre. 2007. Characterizing the errors of data-driven dependency parsing models. In Proceedings of EMNLP-CoNLL, pages 122–131.
Ryan McDonald and Fernando Pereira. 2006. Online learning of approximate dependency parsing algorithms. In Proceedings of EACL 2006, pages 81–88.
Ryan McDonald, Koby Crammer, and Fernando Pereira. 2005. Online large-margin training of dependency parsers. In Proceedings of ACL 2005, pages 91–98. Association for Computational Linguistics.
Joakim Nivre and Mario Scholz. 2004. Deterministic dependency parsing of English text. In Proc. of the 20th Intern. Conf. on Computational Linguistics (COLING), pages 64–70.
J. Nivre, J. Hall, S. Kübler, R. McDonald, J. Nilsson, S. Riedel, and D. Yuret. 2007. The CoNLL 2007 shared task on dependency parsing. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 915–932.
Adwait Ratnaparkhi. 1996. A maximum entropy model for part-of-speech tagging. In Proceedings of EMNLP 1996, pages 133–142.
K. Sagae and J. Tsujii. 2007. Dependency parsing and domain adaptation with LR models and parser ensembles. In Proceedings of the CoNLL Shared Task Session of EMNLP-CoNLL 2007, pages 1044–1050.
Jun Suzuki and Hideki Isozaki. 2008. Semi-supervised sequential labeling and segmentation using Giga-word scale unlabeled data. In Proceedings of ACL-08: HLT, pages 665–673, Columbus, Ohio, June. Association for Computational Linguistics.
Jun Suzuki, Hideki Isozaki, Xavier Carreras, and Michael Collins. 2009. An empirical study of semi-supervised structured conditional models for dependency parsing. In Proceedings of EMNLP 2009, pages 551–560, Singapore, August. Association for Computational Linguistics.
Jun Suzuki, Hideki Isozaki, and Masaaki Nagata. 2011. Learning condensed feature representations from large unsupervised data sets for supervised learning. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, pages 636–641, Portland, Oregon, USA, June. Association for Computational Linguistics.
Nianwen Xue, Fei Xia, Fu-dong Chiou, and Martha Palmer. 2005. Building a Large Annotated Chinese Corpus: the Penn Chinese Treebank. Journal of Natural Language Engineering, 11(2):207–238.
Hiroyasu Yamada and Yuji Matsumoto. 2003. Statistical dependency analysis with support vector machines. In Proceedings of IWPT 2003, pages 195–206.
Y. Zhang and S. Clark. 2008. A tale of two parsers: Investigating and combining graph-based and transition-based dependency parsing. In Proceedings of EMNLP 2008, pages 562–571, Honolulu, Hawaii, October.
Yue Zhang and Joakim Nivre. 2011. Transition-based dependency parsing with rich non-local features. In Proceedings of ACL-HLT 2011, pages 188–193, Portland, Oregon, USA, June. Association for Computational Linguistics.
Guangyou Zhou, Jun Zhao, Kang Liu, and Li Cai. 2011. Exploiting web-derived selectional preference to improve statistical dependency parsing. In Proceedings of ACL-HLT 2011, pages 1556–1565, Portland, Oregon, USA, June. Association for Computational Linguistics.