
133 acl-2010-Hierarchical Search for Word Alignment

Source: pdf

Author: Jason Riesa ; Daniel Marcu

Abstract: We present a simple yet powerful hierarchical search algorithm for automatic word alignment. Our algorithm induces a forest of alignments from which we can efficiently extract a ranked k-best list. We score a given alignment within the forest with a flexible, linear discriminative model incorporating hundreds of features, and trained on a relatively small amount of annotated data. We report results on Arabic-English word alignment and translation tasks. Our model outperforms a GIZA++ Model-4 baseline by 6.3 points in F-measure, yielding a 1.1 BLEU score increase over a state-of-the-art syntax-based machine translation system.
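The scoring and evaluation ideas named in the abstract can be illustrated with a minimal sketch. It assumes a flat list of candidate alignments rather than the paper's alignment forest, and the feature names, weights, and toy data are hypothetical. It shows a linear model score (a weighted sum of feature values), a k-best ranking, and a balanced F-measure over alignment links; the paper's actual features and its exact F-measure variant are not specified here.

```python
import heapq

def score(features, weights):
    """Linear model score: weighted sum of feature values."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

def k_best(candidates, weights, k):
    """Return the k highest-scoring (alignment, features) candidates."""
    return heapq.nlargest(k, candidates, key=lambda c: score(c[1], weights))

def f_measure(predicted, gold):
    """Balanced F-measure over sets of (source_index, target_index) links."""
    if not predicted or not gold:
        return 0.0
    correct = len(predicted & gold)
    if correct == 0:
        return 0.0
    precision = correct / len(predicted)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)

# Hypothetical toy data: feature names and weights are illustrative only.
weights = {"lexical_prob": 1.5, "distortion": -0.5}
candidates = [
    (frozenset({(0, 0), (1, 1)}), {"lexical_prob": 0.9, "distortion": 0.0}),
    (frozenset({(0, 1), (1, 0)}), {"lexical_prob": 0.4, "distortion": 2.0}),
    (frozenset({(0, 0)}), {"lexical_prob": 0.6, "distortion": 0.0}),
]
gold = {(0, 0), (1, 1)}

best_alignment, _ = k_best(candidates, weights, k=1)[0]
print(sorted(best_alignment), f_measure(best_alignment, gold))
```

A flat candidate list keeps the example short; a forest-based search would instead combine partial alignments bottom-up and keep a k-best list at each node.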

reference text

Phil Blunsom and Trevor Cohn. 2006. Discriminative Word Alignment with Conditional Random Fields. In Proceedings of the 44th Annual Meeting of the ACL. Sydney, Australia.
Peter F. Brown, Stephen A. Della Pietra, Vincent J. Della Pietra, and Robert L. Mercer. 1993. The mathematics of statistical machine translation: Parameter estimation. Computational Linguistics, 19(2):263–312. MIT Press. Cambridge, MA. USA.
Colin Cherry and Dekang Lin. 2006. Soft Syntactic Constraints for Word Alignment through Discriminative Training. In Proceedings of the 44th Annual Meeting of the ACL. Sydney, Australia.
David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, 33(2):201–228. MIT Press. Cambridge, MA. USA.
David Chiang, Yuval Marton, and Philip Resnik. 2008. Online Large-Margin Training of Syntactic and Structural Translation Features. In Proceedings of EMNLP. Honolulu, HI. USA.
Michael Collins. 2003. Head-Driven Statistical Models for Natural Language Parsing. Computational Linguistics, 29(4):589–637. MIT Press. Cambridge, MA. USA.
Michael Collins. 2002. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the Conference on Empirical Methods in Natural Language Processing.
John DeNero and Dan Klein. 2007. Tailoring Word Alignments to Syntactic Machine Translation. In Proceedings of the 45th Annual Meeting of the ACL. Prague, Czech Republic.
Alexander Fraser and Daniel Marcu. 2007. Getting the Structure Right for Word Alignment: LEAF. In Proceedings of EMNLP-CoNLL. Prague, Czech Republic.
Victoria Fossum, Kevin Knight, and Steven Abney. 2008. Using Syntax to Improve Word Alignment Precision for Syntax-Based Machine Translation. In Proceedings of the Third Workshop on Statistical Machine Translation. Columbus, Ohio.
Dan Klein and Christopher D. Manning. 2001. Parsing and Hypergraphs. In Proceedings of the 7th International Workshop on Parsing Technologies. Beijing, China.
Aria Haghighi, John Blitzer, and Dan Klein. 2009. Better Word Alignments with Supervised ITG Models. In Proceedings of ACL-IJCNLP 2009. Singapore.
Liang Huang and David Chiang. 2005. Better k-best Parsing. In Proceedings of the 9th International Workshop on Parsing Technologies. Vancouver, BC. Canada.
Liang Huang and David Chiang. 2007. Forest Rescoring: Faster Decoding with Integrated Language Models. In Proceedings of the 45th Annual Meeting of the ACL. Prague, Czech Republic.
Liang Huang. 2008. Forest Reranking: Discriminative Parsing with Non-Local Features. In Proceedings of the 46th Annual Meeting of the ACL. Columbus, OH. USA.
Michel Galley, Mark Hopkins, Kevin Knight, and Daniel Marcu. 2004. What’s in a Translation Rule? In Proceedings of NAACL.
Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, and Ignacio Thayer. 2006. Scalable Inference and Training of Context-Rich Syntactic Models. In Proceedings of the 44th Annual Meeting of the ACL. Sydney, Australia.
Abraham Ittycheriah and Salim Roukos. 2005. A maximum entropy word aligner for Arabic-English machine translation. In Proceedings of HLT-EMNLP. Vancouver, BC. Canada.
Simon Lacoste-Julien, Ben Taskar, Dan Klein, and Michael I. Jordan. 2006. Word alignment via Quadratic Assignment. In Proceedings of HLT-EMNLP. New York, NY. USA.
Yang Liu, Qun Liu, and Shouxun Lin. 2005. Log-linear Models for Word Alignment. In Proceedings of the 43rd Annual Meeting of the ACL. Ann Arbor, Michigan. USA.
Robert C. Moore. 2005. A Discriminative Framework for Word Alignment. In Proceedings of EMNLP. Vancouver, BC. Canada.
Robert C. Moore, Wen-tau Yih, and Andreas Bode. 2006. Improved Discriminative Bilingual Word Alignment. In Proceedings of the 44th Annual Meeting of the ACL. Sydney, Australia.
Franz Josef Och and Hermann Ney. 2003. A Systematic Comparison of Various Statistical Alignment Models. Computational Linguistics, 29(1):19–52. MIT Press. Cambridge, MA. USA.
Slav Petrov, Leon Barrett, Romain Thibaux, and Dan Klein. 2006. Learning Accurate, Compact, and Interpretable Tree Annotation. In Proceedings of the 44th Annual Meeting of the ACL. Sydney, Australia.
Kishore Papineni, Salim Roukos, T. Ward, and W-J. Zhu. 2002. BLEU: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the ACL. Philadelphia, PA. USA.
Ben Taskar, Simon Lacoste-Julien, and Dan Klein. 2005. A Discriminative Matching Approach to Word Alignment. In Proceedings of HLT-EMNLP. Vancouver, BC. Canada.
David Talbot and Thorsten Brants. 2008. Randomized Language Models via Perfect Hash Functions. In Proceedings of ACL-08: HLT. Columbus, OH. USA.
Dekai Wu. 1997. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics, 23(3):377–404. MIT Press. Cambridge, MA. USA.