24 acl-2010-Active Learning-Based Elicitation for Semi-Supervised Word Alignment

Source: pdf

Authors: Vamshi Ambati; Stephan Vogel; Jaime Carbonell

Abstract: Semi-supervised word alignment aims to improve the accuracy of automatic word alignment by incorporating full or partial manual alignments. Motivated by standard active learning query sampling frameworks such as uncertainty, margin, and query-by-committee sampling, we propose multiple query strategies for the alignment link selection task. Our experiments show that actively selecting uncertain and informative links reduces the overall manual effort involved in eliciting alignment link data for training a semi-supervised word aligner.
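The abstract names three standard active learning query strategies. The sketch below illustrates how each could rank candidate alignment links for manual elicitation. It assumes that every candidate link (source index, target index) already carries a posterior probability from some aligner, and that a "committee" is simply a list of such posterior tables from different aligners. The function names, the 0.5 vote threshold, and the per-source-word margin definition are hypothetical choices for illustration, not the paper's exact formulation.

```python
import math
from typing import Dict, List, Tuple

Link = Tuple[int, int]  # (source word index, target word index)


def binary_entropy(p: float) -> float:
    """Entropy in bits of a link being correct with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def uncertainty_select(posteriors: Dict[Link, float], k: int) -> List[Link]:
    """Uncertainty sampling: the k links whose posterior has the highest entropy."""
    ranked = sorted(posteriors, key=lambda link: binary_entropy(posteriors[link]),
                    reverse=True)
    return ranked[:k]


def margin_select(posteriors: Dict[Link, float], k: int) -> List[Link]:
    """Margin sampling (hypothetical per-source-word variant): for each source
    word, subtract the second-best candidate's posterior from the best one's.
    Returns the best candidate link of the k source words with smallest margin."""
    by_source: Dict[int, List[Tuple[float, Link]]] = {}
    for link, p in posteriors.items():
        by_source.setdefault(link[0], []).append((p, link))
    scored: List[Tuple[float, Link]] = []
    for cands in by_source.values():
        cands.sort(reverse=True)  # highest posterior first
        second = cands[1][0] if len(cands) > 1 else 0.0
        scored.append((cands[0][0] - second, cands[0][1]))
    scored.sort()  # smallest margin (most ambiguous) first
    return [link for _, link in scored[:k]]


def committee_select(committee: List[Dict[Link, float]], k: int,
                     threshold: float = 0.5) -> List[Link]:
    """Query-by-committee: each member votes a link 'in' if its posterior
    exceeds the threshold; links are ranked by vote entropy (disagreement)."""
    links = sorted(set().union(*committee))  # deterministic tie order

    def vote_entropy(link: Link) -> float:
        yes = sum(1 for member in committee if member.get(link, 0.0) > threshold)
        return binary_entropy(yes / len(committee))

    return sorted(links, key=vote_entropy, reverse=True)[:k]
```

A toy usage: with posteriors `{(0, 0): 0.95, (0, 1): 0.05, (1, 1): 0.55, (1, 2): 0.40, (2, 2): 0.50}`, uncertainty sampling picks `(2, 2)` and `(1, 1)` first, because their posteriors sit closest to 0.5, while the confidently aligned `(0, 0)` is never queried. Selected links would then be shown to an annotator and fed back to the semi-supervised aligner.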

reference text

John Blatz, Erin Fitzgerald, George Foster, Simona Gandrabur, Cyril Goutte, Alex Kulesza, Alberto Sanchis, and Nicola Ueffing. 2004. Confidence estimation for machine translation. In Proceedings of Coling 2004, pages 315–321, Geneva, Switzerland, Aug 23–Aug 27. COLING.
Peter F. Brown, Vincent J. Della Pietra, Stephen A. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 19(2):263–311.
Chris Callison-Burch, David Talbot, and Miles Osborne. 2004. Statistical machine translation with word- and sentence-aligned parallel corpora. In ACL 2004, page 175, Morristown, NJ, USA. Association for Computational Linguistics.
Colin Cherry and Dekang Lin. 2006. Soft syntactic constraints for word alignment through discriminative training. In Proceedings of the COLING/ACL on Main Conference Poster Sessions, pages 105–112, Morristown, NJ, USA.
Pinar Donmez and Jaime G. Carbonell. 2008. Optimizing estimated loss reduction for active sampling in rank learning. In ICML '08: Proceedings of the 25th International Conference on Machine Learning, pages 248–255, New York, NY, USA. ACM.
Alexander Fraser and Daniel Marcu. 2006. Semi-supervised training for statistical word alignment. In ACL-44: Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, pages 769–776, Morristown, NJ, USA. Association for Computational Linguistics.
Alexander Fraser and Daniel Marcu. 2007a. Getting the structure right for word alignment: LEAF. In Proceedings of the 2007 Joint Conference on EMNLP-CoNLL, pages 51–60.
Alexander Fraser and Daniel Marcu. 2007b. Measuring word alignment quality for statistical machine translation. Computational Linguistics, 33(3):293–303.
Yoav Freund, Sebastian H. Seung, Eli Shamir, and Naftali Tishby. 1997. Selective sampling using the query by committee algorithm. Machine Learning, 28(2-3):133–168.
Qin Gao and Stephan Vogel. 2008. Parallel implementations of word alignment tool. In Software Engineering, Testing, and Quality Assurance for Natural Language Processing, pages 49–57, Columbus, Ohio, June. Association for Computational Linguistics.
Gholamreza Haffari and Anoop Sarkar. 2009. Active learning for multilingual statistical machine translation. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 181–189, Suntec, Singapore, August. Association for Computational Linguistics.
Fei Huang. 2009. Confidence measure for word alignment. In Proceedings of the Joint ACL and IJCNLP, pages 932–940, Suntec, Singapore, August. Association for Computational Linguistics.
Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of HLT/NAACL, Edmonton, Canada.
Philipp Koehn, Hieu Hoang, Alexandra Birch Mayne, Christopher Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In ACL Demonstration Session.
Alon Lavie and Abhaya Agarwal. 2007. Meteor: an automatic metric for MT evaluation with high levels of correlation with human judgments. In WMT 2007, pages 228–231, Morristown, NJ, USA.
David D. Lewis and Jason Catlett. 1994. Heterogeneous uncertainty sampling for supervised learning. In Proceedings of the Eleventh International Conference on Machine Learning, pages 148–156. Morgan Kaufmann.
Hieu T. Nguyen and Arnold Smeulders. 2004. Active learning using pre-clustering. In ICML.
Franz Josef Och and Hermann Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, pages 19–51.
Kishore Papineni, Salim Roukos, Todd Ward, and Wei-Jing Zhu. 2002. Bleu: a method for automatic evaluation of machine translation. In ACL 2002, pages 311–318, Morristown, NJ, USA.
Tobias Scheffer, Christian Decomain, and Stefan Wrobel. 2001. Active hidden Markov models for information extraction. In IDA '01: Proceedings of the 4th International Conference on Advances in Intelligent Data Analysis, pages 309–318, London, UK. Springer-Verlag.
Simon Tong and Daphne Koller. 2002. Support vector machine active learning with applications to text classification. Journal of Machine Learning Research, pages 45–66.
Nicola Ueffing and Hermann Ney. 2007. Word-level confidence estimation for machine translation. Computational Linguistics, 33(1):9–40.
Hua Wu, Haifeng Wang, and Zhanyi Liu. 2006. Boosting statistical word alignment using labeled and unlabeled data. In Proceedings of the COLING/ACL on Main Conference Poster Sessions, pages 913–920, Morristown, NJ, USA. Association for Computational Linguistics.