
108 emnlp-2011: Quasi-Synchronous Phrase Dependency Grammars for Machine Translation


Source: pdf

Authors: Kevin Gimpel; Noah A. Smith

Abstract: We present a quasi-synchronous dependency grammar (Smith and Eisner, 2006) for machine translation in which the leaves of the tree are phrases rather than words as in previous work (Gimpel and Smith, 2009). This formulation allows us to combine structural components of phrase-based and syntax-based MT in a single model. We describe a method of extracting phrase dependencies from parallel text using a target-side dependency parser. For decoding, we describe a coarse-to-fine approach based on lattice dependency parsing of phrase lattices. We demonstrate performance improvements for Chinese-English and Urdu-English translation over a phrase-based baseline. We also investigate the use of unsupervised dependency parsers, reporting encouraging preliminary results.


reference text

[Figure 2: (a) Moses translation output along with γ, φ, and a; an English gloss is shown above the Chinese sentence, and above the gloss, the dependency parse from the Stanford parser. (b) QPDG system output with the additional structure τφ. (c) Reference translations.]

[Footnote, truncated: "...correctly translated and reordered, but the system was nonetheless able to use it to improve the fluency of the output."]

We discuss these types of improvements further in §8.

7.2 Unsupervised Parsing

Our results thus far use supervised parsers for both Chinese and English, but parsers are available for only a small fraction of the languages we would like to translate. Fortunately, unsupervised dependency grammar induction has improved substantially in recent years. While attachment accuracies on standard treebank test sets are still relatively low, parsers that do not match treebank annotations closely may still perform well in extrinsic applications. We believe that syntax-based MT offers a compelling platform for the development and extrinsic evaluation of unsupervised parsers.

In this paper, we use the standard dependency model with valence (DMV; Klein and Manning, 2004). When training is initialized using the output of a simpler, concave dependency model, the DMV can approach state-of-the-art unsupervised accuracy (Gimpel and Smith, 2011). For English, the resulting parser achieves 53.1% attachment accuracy on Section 23 of the Penn Treebank (Marcus et al., 1993), approaching the 55.7% accuracy of a recent state-of-the-art unsupervised model (Blunsom and Cohn, 2010). The Chinese parser, initialized and trained the same way, achieves 44.4%, the highest reported accuracy on the Chinese Treebank (Xue et al., 2004) test set.

Most unsupervised grammar induction models assume gold-standard POS tags and sentences stripped of punctuation. We therefore use the Stanford tagger (Toutanova et al., 2003) to obtain tags for both English and Chinese, parse the sentences without punctuation using the DMV, and then attach punctuation tokens to the root word of the tree in a post-processing step (a sketch of this step appears at the end of this subsection). For English, the predicted parents agreed with those of TurboParser for 48.7% of the tokens in the corpus.

We considered all four scenarios: supervised and unsupervised English parsing paired with supervised and unsupervised Chinese parsing. Table 6 shows the results.

[Table 6: Results when using unsupervised dependency parsers; rows give the Chinese parser (supervised vs. unsupervised) and columns the English parser. Cells contain averaged % BLEU on the three test sets and % BLEU on tuning data (MT03) in parentheses; the Moses baseline is 31.33 (33.84).]

Surprisingly, we achieve our best results when using the unsupervised English parser in place of the supervised one (+0.79 BLEU over Moses) while keeping the Chinese parser supervised. Competitive performance is also obtained with the unsupervised Chinese parser and the supervised English parser (+0.53 over Moses).
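The punctuation-attachment step above involves a little bookkeeping when the punctuation-free parse is mapped back onto the original token positions. Below is a minimal sketch of that step, assuming a hypothetical dmv_parse function as a stand-in for a trained DMV parser (the stub returns a trivial chain just so the example runs); this is a sketch of the idea, not the paper's code.

```python
import string

PUNCT = set(string.punctuation)

def dmv_parse(tokens):
    """Stand-in for a trained DMV parser (hypothetical): returns, for each
    token, the index of its head, or -1 for the root. This stub returns a
    trivial chain so that the sketch runs end to end."""
    return [-1] + list(range(len(tokens) - 1))

def parse_with_punct(tokens):
    """Strip punctuation, parse the remaining words with the DMV, then
    attach each punctuation token to the root word in post-processing."""
    content_idx = [i for i, t in enumerate(tokens) if t not in PUNCT]
    small_heads = dmv_parse([tokens[i] for i in content_idx])
    root = content_idx[small_heads.index(-1)]
    # Map the punctuation-free parse back onto original token positions.
    heads = [None] * len(tokens)
    for small_i, orig_i in enumerate(content_idx):
        h = small_heads[small_i]
        heads[orig_i] = -1 if h == -1 else content_idx[h]
    # Post-processing step: every punctuation token hangs off the root word.
    for i, t in enumerate(tokens):
        if t in PUNCT:
            heads[i] = root
    return heads

print(parse_with_punct(["us", "to", "boost", "peace", "efforts", "."]))
# [-1, 0, 1, 2, 3, 0]
```

Any real DMV implementation would replace the stub; the point of the sketch is the index bookkeeping around it.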
However, when using unsupervised parsers for both languages, performance was below that of Moses. During tuning for this configuration, we found that MERT struggled to find good parameter estimates, typically converging to suboptimal solutions after a small number of iterations. We believe this is due to the large number of features (37), the noise in the parse trees, and known instabilities of MERT. In future work we plan to experiment with training algorithms that are more stable and that can handle larger numbers of features.

8 Analysis

To understand what our model learns during MERT, we computed the feature vector of the best derivation for each sentence in the tuning data at both the start and end of tuning. Table 7 shows these feature values averaged across all tuning sentences.

[Table 7: Feature values of best derivations, averaged across all sentences in the MT03 tuning set, both before MERT (column 2) and after (column 3). "Same" versions of the tree-to-tree configuration features are shown; the rarer "swap" features showed a similar trend.]

The first four features are the configurations from Fig. 1, in order from left to right. From these rows, we observe that the model learns to encourage swapping when generating right children and to penalize swapping for left children. In addition to objects, right children in English are often prepositional phrases, relative clauses, or other modifiers; as noted above, Chinese generally places these modifiers before their heads, requiring reordering during translation. The model appears to be learning this reordering behavior. From the second set of features, we see that the model learns to favor producing dependency trees that are mostly isomorphic to the source tree, preferring root-root and parent-child configurations at the expense of most others.

9 Discussion

Looking at the BLEU score differences between the two systems, the unigram precisions were typically equal or only slightly different, while the precisions for higher-order n-grams accounted for the bulk of the improvement (a sketch of this per-order breakdown appears at the end of this section). This suggests that our system is not finding substantially better translations for individual words in the input, but rather is reordering the existing translations. This is not surprising given our choice of features, which focus on syntactic language modeling and syntax-based reordering.

The obvious next step for our framework is to incorporate bilingual rules that include source syntax (Quirk et al., 2005), target syntax (Shen et al., 2008), and syntax on both sides. Our framework allows integrating all of these and other types of structure, with the ultimate goal of combining the strengths of multiple approaches to translation in a single model.
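As a concrete rendering of the per-order precision breakdown above, the following minimal sketch (not the paper's evaluation code) computes clipped n-gram precisions, as in BLEU, separately for n = 1 through 4. It assumes a single reference per hypothesis for brevity, whereas standard MT test sets typically provide several.

```python
from collections import Counter

def ngram_precision(hyps, refs, n):
    """Corpus-level modified (clipped) n-gram precision of hypothesis
    token lists `hyps` against single-reference token lists `refs`."""
    matched = total = 0
    for hyp, ref in zip(hyps, refs):
        hyp_counts = Counter(tuple(hyp[i:i + n]) for i in range(len(hyp) - n + 1))
        ref_counts = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
        matched += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total += sum(hyp_counts.values())
    return matched / total if total else 0.0

hyps = [["us", "to", "boost", "peace", "efforts"]]
refs = [["us", "to", "step", "up", "peace", "efforts"]]
for n in range(1, 5):
    # Unigram precision can be flat across systems while higher-order
    # precisions differ, which is the pattern discussed in Section 9.
    print(n, round(ngram_precision(hyps, refs, n), 3))
```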
Acknowledgments

We thank Chris Dyer and the anonymous reviewers for helpful comments that improved this paper. This research was supported in part by the NSF through grant IIS-0844507, the U.S. Army Research Laboratory and the U.S. Army Research Office under contract/grant number W911NF-10-1-0533, and Sandia National Laboratories (fellowship to K. Gimpel).

References

M. Auli, A. Lopez, H. Hoang, and P. Koehn. 2009. A systematic analysis of translation model search spaces. In Proc. of the Fourth Workshop on Statistical Machine Translation.
S. Bergsma and C. Cherry. 2010. Fast and accurate arc filtering for dependency parsing. In Proc. of COLING.
P. Blunsom and T. Cohn. 2010. Unsupervised induction of tree substitution grammars for dependency parsing. In Proc. of EMNLP.
P. F. Brown, P. V. deSouza, R. L. Mercer, V. J. Della Pietra, and J. C. Lai. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18.
X. Carreras and M. Collins. 2009. Non-projective parsing for statistical machine translation. In Proc. of EMNLP.
P. Chang, M. Galley, and C. Manning. 2008. Optimizing Chinese word segmentation for machine translation performance. In Proc. of the Third Workshop on Statistical Machine Translation.
S. Chen and J. Goodman. 1998. An empirical study of smoothing techniques for language modeling. Technical Report TR-10-98, Harvard University.
D. Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proc. of ACL.
D. Das and N. A. Smith. 2009. Paraphrase identification as probabilistic quasi-synchronous recognition. In Proc. of ACL-IJCNLP.
S. DeNeefe, K. Knight, W. Wang, and D. Marcu. 2007. What can syntax-based MT learn from phrase-based MT? In Proc. of EMNLP-CoNLL.
J. DeNero, S. Kumar, C. Chelba, and F. J. Och. 2010. Model combination for machine translation. In Proc. of NAACL.
M. Dymetman and N. Cancedda. 2010. Intersecting hierarchical and phrase-based models of translation: Formal aspects and algorithms. In Proc. of SSST-4.
J. Eisner, E. Goldlust, and N. A. Smith. 2005. Compiling Comp Ling: Practical weighted dynamic programming and the Dyna language. In Proc. of HLT-EMNLP.
J. Eisner. 1996. Three new probabilistic models for dependency parsing: An exploration. In Proc. of COLING.
R. W. Floyd. 1962. Algorithm 97: Shortest path. Communications of the ACM, 5(6).
M. Galley and C. D. Manning. 2009. Quadratic-time dependency parsing for machine translation. In Proc. of ACL-IJCNLP.
M. Galley and C. D. Manning. 2010. Accurate non-hierarchical phrase-based translation. In Proc. of NAACL.
M. Galley, J. Graehl, K. Knight, D. Marcu, S. DeNeefe, W. Wang, and I. Thayer. 2006. Scalable inference and training of context-rich syntactic translation models. In Proc. of COLING-ACL.
K. Gimpel and N. A. Smith. 2009. Feature-rich translation by quasi-synchronous lattice parsing. In Proc. of EMNLP.
K. Gimpel and N. A. Smith. 2011. Concavity and initialization for unsupervised dependency grammar induction. Technical report, Carnegie Mellon University.
L. Huang, K. Knight, and A. Joshi. 2006. Statistical syntax-directed translation with extended domain of locality. In Proc. of AMTA.
D. Klein and C. D. Manning. 2004. Corpus-based induction of syntactic structure: Models of dependency and constituency. In Proc. of ACL.
P. Koehn, F. J. Och, and D. Marcu. 2003. Statistical phrase-based translation. In Proc. of HLT-NAACL.
P. Koehn, A. Axelrod, A. Birch Mayne, C. Callison-Burch, M. Osborne, and D. Talbot. 2005. Edinburgh system description for the 2005 IWSLT speech translation evaluation. In Proc. of IWSLT.
P. Koehn, H. Hoang, A. Birch, C. Callison-Burch, M. Federico, N. Bertoldi, B. Cowan, W. Shen, C. Moran, R. Zens, C. Dyer, O. Bojar, A. Constantin, and E. Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Proc. of ACL (demo session).
R. Levy and C. D. Manning. 2003. Is it harder to parse Chinese, or the Chinese Treebank? In Proc. of ACL.
P. Liang. 2005. Semi-supervised learning for natural language. Master's thesis, Massachusetts Institute of Technology.
Y. Liu, H. Mi, Y. Feng, and Q. Liu. 2009. Joint decoding with multiple translation models. In Proc. of ACL-IJCNLP.
A. Lopez. 2008. Tera-scale translation models via pattern matching. In Proc. of COLING.
W. Macherey, F. Och, I. Thayer, and J. Uszkoreit. 2008. Lattice-based minimum error rate training for statistical machine translation. In Proc. of EMNLP.
M. P. Marcus, B. Santorini, and M. A. Marcinkiewicz. 1993. Building a large annotated corpus of English: The Penn Treebank. Computational Linguistics, 19:313–330.
A. F. T. Martins, N. A. Smith, and E. P. Xing. 2009. Concise integer linear programming formulations for dependency parsing. In Proc. of ACL.
M.-J. Nederhof. 2003. Weighted deductive parsing and Knuth's algorithm. Computational Linguistics, 29(1).
F. J. Och and H. Ney. 2002. Discriminative training and maximum entropy models for statistical machine translation. In Proc. of ACL.
F. J. Och and H. Ney. 2003. A systematic comparison of various statistical alignment models. Computational Linguistics, 29(1).
F. J. Och. 2003. Minimum error rate training for statistical machine translation. In Proc. of ACL.
K. Papineni, S. Roukos, T. Ward, and W.-J. Zhu. 2002. BLEU: a method for automatic evaluation of machine translation. In Proc. of ACL.
S. Petrov. 2009. Coarse-to-Fine Natural Language Processing. Ph.D. thesis, University of California at Berkeley.
C. Quirk, A. Menezes, and C. Cherry. 2005. Dependency treelet translation: Syntactically informed phrasal SMT. In Proc. of ACL.
L. Shen, J. Xu, and R. Weischedel. 2008. A new string-to-dependency machine translation algorithm with a target dependency language model. In Proc. of ACL.
A. Sixtus and S. Ortmanns. 1999. High quality word graphs using forward-backward pruning. In Proc. of the IEEE Int. Conf. on Acoustics, Speech and Signal Processing.
D. A. Smith and J. Eisner. 2006. Quasi-synchronous grammars: Alignment by soft projection of syntactic dependencies. In Proc. of the HLT-NAACL Workshop on Statistical Machine Translation.
D. A. Smith and J. Eisner. 2009. Parser adaptation and projection with quasi-synchronous grammar features. In Proc. of EMNLP.
A. Stolcke. 2002. SRILM—an extensible language modeling toolkit. In Proc. of ICSLP.
K. Toutanova, D. Klein, C. D. Manning, and Y. Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proc. of HLT-NAACL.
R. Tromble, S. Kumar, F. Och, and W. Macherey. 2008. Lattice Minimum Bayes-Risk decoding for statistical machine translation. In Proc. of EMNLP.
N. Ueffing, F. J. Och, and H. Ney. 2002. Generation of word graphs in statistical machine translation. In Proc. of EMNLP.
M. Wang, N. A. Smith, and T. Mitamura. 2007. What is the Jeopardy model? A quasi-synchronous grammar for QA. In Proc. of EMNLP-CoNLL.
Y. Wu, Q. Zhang, X. Huang, and L. Wu. 2009. Phrase dependency parsing for opinion mining. In Proc. of EMNLP.
N. Xue, F. Xia, F.-D. Chiou, and M. Palmer. 2004. The Penn Chinese Treebank: Phrase structure annotation of a large corpus. Natural Language Engineering, 10(4):1–30.
A. Zollmann and A. Venugopal. 2006. Syntax augmented machine translation via chart parsing. In Proc. of the NAACL 2006 Workshop on Statistical Machine Translation.
A. Zollmann, A. Venugopal, F. J. Och, and J. Ponte. 2008. A systematic comparison of phrase-based, hierarchical and syntax-augmented statistical MT. In Proc. of COLING.