emnlp emnlp2010 emnlp2010-98 knowledge-graph by maker-knowledge-mining

98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions


Source: pdf

Author: Zhongqiang Huang ; Martin Cmejrek ; Bowen Zhou

Abstract: In this paper, we present a novel approach to enhance hierarchical phrase-based machine translation systems with linguistically motivated syntactic features. Rather than directly using treebank categories as in previous studies, we learn a set of linguistically-guided latent syntactic categories automatically from a source-side parsed, word-aligned parallel corpus, based on the hierarchical structure among phrase pairs as well as the syntactic structure of the source side. In our model, each X nonterminal in a SCFG rule is decorated with a real-valued feature vector computed based on its distribution of latent syntactic categories. These feature vectors are utilized at decoding time to measure the similarity between the syntactic analysis of the source side and the syntax of the SCFG rules that are applied to derive translations. Our approach maintains the advantages of hierarchical phrase-based translation systems while at the same time naturally incorporates soft syntactic constraints.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 In this paper, we present a novel approach to enhance hierarchical phrase-based machine translation systems with linguistically motivated syntactic features. [sent-3, score-0.509]

2 In our model, each X nonterminal in a SCFG rule is decorated with a real-valued feature vector computed based on its distribution of latent syntactic categories. [sent-5, score-0.644]

3 These feature vectors are utilized at decoding time to measure the similarity between the syntactic analysis of the source side and the syntax of the SCFG rules that are applied to derive translations. [sent-6, score-0.55]

4 Our approach maintains the advantages of hierarchical phrase-based translation systems while at the same time naturally incorporates soft syntactic constraints. [sent-7, score-0.586]

5 …(Wu, 1997; Chiang, 2007) extract synchronous grammars from parallel corpora based on the hierarchical structure of natural language pairs without any explicit linguistic knowledge or annotations. [sent-26, score-0.448]

6 On the one hand, hierarchical phrase-based models do not suffer from errors in syntactic constraints that are unavoidable in linguistically syntax-based models. [sent-28, score-0.445]

7 On the other hand, when properly used, syntactic constraints can provide invaluable benefits to improve translation quality. [sent-34, score-0.331]

8 …significantly outperform hierarchical phrase-based models when using forest-based rule extraction together with forest-based decoding. [sent-37, score-0.314]

9 Chiang (2010) also obtained significant improvement over his hierarchical baseline by using syntactic parse trees on both source and target sides to induce fuzzy (not exact) tree-to-tree rules and by also allowing syntactically mismatched substitutions. [sent-38, score-0.7]

10 In this paper, we augment rules in hierarchical phrase-based translation systems with novel syntactic features. [sent-39, score-0.535]

11 In our model, two symbolically different sequences of syntactic categories could have a high similarity score in the feature vector representation if they are syntactically similar, and a low score otherwise. [sent-44, score-0.489]

12 In decoding, these feature vectors are utilized to measure the similarity between the syntactic analysis of the source side and the syntax of the SCFG rules that are applied to derive translations. [sent-45, score-0.55]

13 Our approach maintains the advantages of hierarchical phrase-based translation systems while at the same time naturally incorporates soft syntactic constraints. [sent-46, score-0.586]

14 Section 3 presents an overview of our approach, followed by Section 4 describing the hierarchical structure of aligned phrase pairs and Section 5 describing how to induce latent syntactic categories. [sent-50, score-0.823]

15 …(i.e., rule) rewrites a nonterminal into a pair of strings, γ and α, where γ (or α) contains terminal and nonterminal symbols from the source (or target) language and there is a one-to-one correspondence between the nonterminal symbols on both sides. [sent-56, score-0.625]

16 Two example English-to-Chinese translation rules are represented as follows: X → ⟨give the pen to me, 钢笔 给 我⟩ (1) and X → ⟨give X1 to me, X1 给 我⟩ (2). The SCFG rules of hierarchical phrase-based models are extracted automatically from corpora of word-aligned parallel sentence pairs (Brown et al. [sent-58, score-0.791]
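
The rule shape above lends itself to a compact data structure. The following is a minimal, hypothetical sketch (the SCFGRule class and its field names are ours, not the paper's), with "X1"-style tokens standing in for co-indexed nonterminals:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SCFGRule:
    """A synchronous rule X -> <source, target> with co-indexed nonterminals."""
    source: List[str]  # tokens; "X1", "X2", ... mark co-indexed nonterminals
    target: List[str]

    def nonterminals(self) -> List[str]:
        # Every co-indexed nonterminal must occur on both sides.
        return [t for t in self.source if t.startswith("X") and t[1:].isdigit()]

# Rules (1) and (2) from the running example:
rule1 = SCFGRule(["give", "the", "pen", "to", "me"], ["钢笔", "给", "我"])
rule2 = SCFGRule(["give", "X1", "to", "me"], ["X1", "给", "我"])
```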

17 Widely adopted in phrase-based models (Och and Ney, 2004), a pair of consecutive sequences of words from E and F is a phrase pair if all words are aligned only within the sequences and not to any word outside. [sent-62, score-0.511]

18 We call a sequence of words a phrase if it corresponds to either side of a phrase pair, and a non-phrase otherwise. [sent-63, score-0.491]

19 We call the phrase pairs with all boundary words aligned tight phrase pairs (Zhang et al. [sent-65, score-0.853]

20 A tight phrase pair is the minimal phrase pair among all that share the same set of alignment links. [sent-67, score-0.773]
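
The consistency and tightness conditions in the last few sentences can be made concrete. Below is a minimal sketch (the function names and link-set encoding are ours, not the paper's), assuming 0-indexed word positions and inclusive spans [i1, i2] × [j1, j2]:

```python
from typing import Set, Tuple

Alignment = Set[Tuple[int, int]]  # (source position, target position) links

def is_phrase_pair(links: Alignment, i1: int, i2: int, j1: int, j2: int) -> bool:
    """Consistency: every link touching either span lies inside both spans."""
    if not any(i1 <= i <= i2 and j1 <= j <= j2 for i, j in links):
        return False  # at least one link must fall inside
    return all((i1 <= i <= i2) == (j1 <= j <= j2) for i, j in links)

def is_tight(links: Alignment, i1: int, i2: int, j1: int, j2: int) -> bool:
    """Tight: a phrase pair whose boundary words are all aligned."""
    if not is_phrase_pair(links, i1, i2, j1, j2):
        return False
    srcs = {i for i, j in links if j1 <= j <= j2}
    tgts = {j for i, j in links if i1 <= i <= i2}
    return min(srcs) == i1 and max(srcs) == i2 and min(tgts) == j1 and max(tgts) == j2

links = {(0, 1), (1, 0), (2, 2)}
print(is_tight(links, 0, 2, 0, 2))  # True
print(is_tight(links, 0, 2, 0, 1))  # False: word 2 is aligned outside the target span
```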

21 Figure 1 (b) highlights the tight phrase pairs in the example sentence pair. [sent-68, score-0.516]

22 Figure 1: An example of a word-aligned sentence pair (a) with tight phrase pairs marked in a matrix representation (b). [sent-69, score-0.597]

23 In the second step, abstract rules are extracted from tight phrase pairs that contain other tight phrase pairs by replacing the sub phrase pairs with co-indexed X nonterminals. [sent-72, score-1.488]

24 In our example above, rule (2) can be extracted from rule (1) with the following sub phrase pair: X → ⟨the pen, 钢笔⟩. The use of a unified X nonterminal makes hierarchical phrase-based models flexible at capturing non-local reordering of phrases. [sent-76, score-0.868]
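
As a toy illustration of that substitution (the helper and example below are ours; the string-level replace is a shortcut, whereas real extraction operates on alignment spans):

```python
def abstract_rule(src, tgt, sub_src, sub_tgt, index=1):
    """Replace one occurrence of a sub phrase pair with a co-indexed nonterminal."""
    nt = f"X{index}"
    s = " ".join(src).replace(" ".join(sub_src), nt, 1).split()
    t = " ".join(tgt).replace(" ".join(sub_tgt), nt, 1).split()
    return s, t

# Deriving rule (2) from rule (1) with the sub phrase pair <the pen, 钢笔>:
print(abstract_rule(["give", "the", "pen", "to", "me"], ["钢笔", "给", "我"],
                    ["the", "pen"], ["钢笔"]))
# (['give', 'X1', 'to', 'me'], ['X1', '给', '我'])
```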

25 Suppose a phrase pair with "I am reading a book" on the source side, where X1 is abstracted from the noun phrase pair… [sent-79, score-0.726]

26 …because the nonterminal X1 in the rule was abstracted from a noun phrase on the source side of the training data and would thus be better (more informative) to be applied to phrases of the same type. [sent-84, score-0.648]

27 Zollmann and Venugopal (2006) attempted to address this problem by annotating phrase pairs with treebank categories based on automatic parse trees. [sent-86, score-0.533]

28 …(e.g., NP+V for "she went" and DT\NP for "great wall", a noun phrase with a missing determiner on the left) to annotate phrase pairs that do not align with syntactic constituents. [sent-89, score-0.632]

29 Their hard syntactic constraint requires that the nonterminals should match exactly to rewrite with a rule, which could rule out potentially correct derivations due to errors in the syntactic parses as well as to data sparsity. [sent-90, score-0.518]

30 For example, NP cannot be instantiated with phrase pairs of type DT+NN, in spite of their syntactic similarity. [sent-91, score-0.451]

31 (2009) addressed this problem by directly introducing soft syntactic preferences into SCFG rules using preference grammars, but they had to face the computational challenges of large preference vectors. [sent-93, score-0.331]

32 Chiang (2010) also avoided hard constraints and took a soft alternative that directly models the cost of mismatched rule substitutions. [sent-94, score-0.324]

33 (Section 3: Approach Overview) In this work, we take a different approach to introducing linguistic syntax into hierarchical phrase-based translation systems and impose soft syntactic constraints between derivation rules and the syntactic parse of the sentence to be translated. [sent-96, score-1.093]

34 For each phrase pair extracted from a sentence pair of a source-side parsed parallel corpus, we abstract its syntax by the sequence of highest root categories, which we call a tag sequence, that exactly dominates the syntactic tree fragments of the source-side phrase. [sent-97, score-0.968]

35 The tag sequence for “the pen” is simply “NP” because it is a noun phrase, while the phrase “give the pen” is dominated by a verb followed by a noun phrase, and thus its tag sequence is “VBP NP”. [sent-99, score-0.559]
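
A small sketch of how such a tag sequence might be read off a parse tree (the nested-tuple tree encoding and function names are ours): emit the label of each highest node lying entirely inside the phrase span, and recurse through nodes that straddle its boundary:

```python
from typing import List, Tuple, Union

# A parse node is (label, children) for internal nodes or (tag, word) for leaves.
Node = Tuple[str, Union[str, list]]

def span_len(node: Node) -> int:
    label, kids = node
    return 1 if isinstance(kids, str) else sum(span_len(k) for k in kids)

def tag_sequence(node: Node, start: int, i: int, j: int, out: List[str]) -> None:
    """Append the highest categories exactly covering words [i, j) to `out`."""
    end = start + span_len(node)
    if start >= j or end <= i:
        return                    # disjoint from the phrase span
    if i <= start and end <= j:
        out.append(node[0])       # fully inside: emit this category, stop here
        return
    for kid in (node[1] if not isinstance(node[1], str) else []):
        tag_sequence(kid, start, i, j, out)  # straddles the boundary: descend
        start += span_len(kid)

tree = ("VP", [("VBP", "give"),
               ("NP", [("DT", "the"), ("NN", "pen")]),
               ("PP", [("TO", "to"), ("NP", [("PRP", "me")])])])
seq: List[str] = []
tag_sequence(tree, 0, 0, 3, seq)  # words 0..2 = "give the pen"
print(seq)                         # ['VBP', 'NP']
```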

36 The syntax of each X nonterminal in a SCFG rule can then be… (In the case of a non-tight phrase pair, we only abstract and compare the syntax of the largest tight part.) [sent-101, score-0.755]

37 …characterized by the distribution of tag sequences (pX(ts1), · · · , pX(tsm)), based on the phrase pairs it is abstracted from. [sent-107, score-0.523]

38 Suppose we have a collection of n latent syntactic categories C = {c1, · · · , cn}. [sent-110, score-0.434]

39 …} means that the latent syntactic categories c1, c2, … [sent-116, score-0.434]

40 Similarly, we can represent the syntax of each X nonterminal in a rule with a feature vector F~(X), computed as the sum of the feature vectors of tag… (Other measures such as KL-divergence in the probability space are also feasible.) [sent-122, score-0.679]

41 …sequences weighted by the distribution of tag sequences of the nonterminal X: F~(X) = Σ_{ts ∈ TS} pX(ts) · F~(ts). Now we can impose soft syntactic constraints using these feature vectors when a SCFG rule is used to translate a parsed source sentence. [sent-123, score-1.141]
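
In code, the weighted sum and the decode-time similarity could look like this minimal sketch (names are ours; the uniform back-off for unseen tag sequences is our reading of the footnote quoted in sentence 46 below, and the exact normalization is not spelled out here):

```python
import numpy as np

def nonterminal_vector(tag_seq_dist, tag_seq_vecs, n):
    """F~(X) = sum over tag sequences ts of pX(ts) * F~(ts)."""
    uniform = np.full(n, 1.0 / n)   # assumed back-off for unseen tag sequences
    vec = np.zeros(n)
    for ts, p in tag_seq_dist.items():
        vec += p * tag_seq_vecs.get(ts, uniform)
    return vec

def soft_syntax_feature(rule_vec, span_vec):
    """Dot-product similarity between a rule nonterminal and a parsed source span."""
    return float(np.dot(rule_vec, span_vec))
```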

42 In our approach, the set of latent syntactic categories is automatically induced from a source-side parsed, word-aligned parallel corpus based on the hierarchical structure among phrase pairs along with the syntactic parse of the source side. [sent-125, score-1.221]

43 …how to identify the hierarchical structures among all phrase pairs in a sentence pair, and how to induce the latent syntactic categories from the hierarchy to syntactically explain the phrase pairs. [sent-128, score-1.133]

44 (Section 4: Alignment-based Hierarchy) The aforementioned abstract rule extraction algorithm of Chiang (2007) is based on the property that a tight phrase pair can contain other tight phrase pairs. [sent-129, score-1.005]

45 Given two non-disjoint tight phrase pairs that share at least one common alignment link, there are only two relationships: either one completely includes another or they do not include one another but have a non-empty overlap, which we call a nontrivial overlap. [sent-130, score-0.551]

46 In the second case, the intersection, differences, and union of the two phrase pairs are… (A normalized uniform feature vector is used for tag sequences of parsed test sentences that are not seen in the training corpus.) [sent-131, score-0.611]

47 Figure 2: A decomposition tree of tight phrase pairs with all tight phrase pairs listed on the right. [sent-132, score-1.256]

48 also tight phrase pairs (see Figure 1 (b) for example), and the two phrase pairs, as well as their intersection and differences, are all sub phrase pairs of their union. [sent-134, score-1.07]

49 (2008) exploited this property to construct a hierarchical decomposition tree (Bui-Xuan et al. [sent-136, score-0.404]

50 , 2005) of phrase pairs from a sentence pair to extract all phrase pairs in linear time. [sent-137, score-0.685]

51 In this paper, we focus on learning the syntactic dependencies along the hierarchy of phrase pairs. [sent-138, score-0.373]

52 Let P be the set of tight phrase pairs extracted from a sentence pair. [sent-140, score-0.516]

53 We call a sequentially-ordered list L = (p1, · · · , pk) of unique phrase pairs pi ∈ P a chain if every two successive phrase pairs in L have a non-trivial overlap. [sent-141, score-0.636]

54 Note that any sub-sequence of phrase pairs in a chain generates a tight phrase pair. [sent-143, score-0.729]

55 In particular, chain L generates a tight phrase pair τ(L) that corresponds exactly to the union of the alignment links in p ∈ L. [sent-144, score-0.543]

56 …chains maximal phrase pairs and call the other phrase pairs non-maximal. [sent-146, score-0.693]

57 Nonmaximal phrase pairs always overlap non-trivially with some other phrase pairs while maximal phrase pairs do not, and it can be shown that any nonmaximal phrase pair can be generated by a sequence of maximal phrase pairs. [sent-147, score-1.617]

58 Note that the largest tight phrase pair that includes all alignment links in A is also a maximal phrase pair. [sent-148, score-0.781]

59 The phrase pairs can be sequentially ordered first by the boundary positions of the source-side phrase and then by the boundary positions of the target-side phrase. [sent-149, score-0.553]

60 Lemma 1 Given two different maximal phrase pairs p1 and p2, exactly one of the following alternatives is true: p1 and p2 are disjoint, p1 is a sub phrase pair of p2, or p2 is a sub phrase pair of p1. [sent-159, score-1.057]

61 All of the tight phrase pairs of a sentence pair can be extracted directly from the nodes of the decomposition tree (these phrase pairs are maximal), or generated by sequences of consecutive sibling nodes (these phrase pairs are non-maximal). [sent-161, score-1.509]

62 Figure 2 shows the decomposition tree as well as all of the tight phrase pairs that can be extracted from the example sentence pair in Figure 1. [sent-162, score-0.821]

63 We then abstract the tree's nodes with two symbols, X for phrases and B for non-phrases, and call the result the decomposition tree of the source-side phrases. [sent-165, score-0.378]

64 We further recursively binarize the decomposition tree into a binarized decomposition forest such that all phrases are directly represented as nodes in the forest. [sent-167, score-0.548]

65 The binarized decomposition forest compactly encodes the hierarchical structure among phrases and non-phrases. [sent-169, score-0.504]

66 In order to bring in syntactic constraints, we annotate the nodes in the decomposition forest with syntactic observations based on the automatic syntactic parse tree of the source side. [sent-171, score-0.857]

67 We call the resulting forest a syntactic decomposition forest. [sent-175, score-0.4]

68 Figure 3 (d) shows two syntactic decomposition trees of the forest based on the parse tree in Figure 3 (b). [sent-176, score-0.556]

69 We will next describe how to learn finer-grained X and B categories based on the hierarchical syntactic constraints. [sent-177, score-0.46]

70 Such a grammar can be derived from the set of syntactic decomposition forests extracted from a source-side parsed parallel corpus, with rule probability scores estimated as the relative frequencies of the production and emission rules. [sent-180, score-0.745]

71 The X and B nonterminals in the grammar are coarse representations of phrases and non-phrases and do not carry any syntactic information at all. [sent-181, score-0.416]

72 The motivation is to let the latent categories learn different preferences of (emitted) syntactic categories as well as structural dependencies along the hierarchy so that they can carry syntactic information. [sent-183, score-0.757]

73 The learned Xi’s represent syntactically-induced finer-grained categories of phrases and are used as the set of latent syntactic categories C described in Section 3. [sent-185, score-0.565]

74 (2006) introduced latent variables to learn finer-grained distinctions of treebank categories for parsing, and Huang et al. [sent-189, score-0.33]

75 Each binary production rule is now associated with a 3-dimensional matrix of probabilities, and each emission rule is associated with a 1-dimensional array of probabilities. [sent-194, score-0.354]
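
Concretely, the parameter shapes implied by that remark might look like the following sketch (n = 16 matches the experiments reported below; the random initialization and variable names are ours):

```python
import numpy as np

n = 16  # latent categories per coarse symbol (X or B), as in the experiments below

# Each binary production U -> V W carries a 3-dimensional matrix of probabilities:
# rule_probs[a, b, c] ~ P(U_a -> V_b W_c), normalized over (b, c) for each a.
rule_probs = np.random.dirichlet(np.ones(n * n), size=n).reshape(n, n, n)

# Each emission rule carries a 1-dimensional array, one entry per latent split:
# emit_probs[a] ~ P(U_a emits the observed treebank category).
emit_probs = np.random.rand(n)
```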

76 Recall that our decomposition forests are fully binarized (except the root). [sent-196, score-0.339]

77 Given a forest F with root node R, we denote e(U) the emitted syntactic category at node U, and LR(U) (or PL(W), or PR(V)) the set of node pairs (V, W) (or (U, V), or (U, W)) such that ⟨(V, W), U⟩ is a hyperedge of the forest. [sent-198, score-0.462]

78 Let Ux be the latent syntactic category of node U. [sent-200, score-0.361]

79 Once a grammar is learned, for each such node with a corresponding tag sequence ts in forest F, we compute the posterior probability that the latent category of node U is Xi as: P(Xi|ts) = POUT(Ui) · PIN(Ui) / PIN(R). This contributes P(Xi|ts) evidence that tag sequence ts belongs to an Xi category. [sent-208, score-1.251]
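
Assuming the inside (PIN) and outside (POUT) scores have already been computed over the binarized forest, the posterior above reduces to a few lines. This sketch (our own naming) keeps one score per latent category at each node:

```python
import numpy as np

def latent_category_posterior(p_in, p_out, node, root):
    """P(Xi | ts) = POUT(Ui) * PIN(Ui) / PIN(R), one value per latent category i.

    p_in[u] and p_out[u] are arrays holding the inside/outside score of each
    latent category at node u; the total inside mass at the root normalizes.
    """
    return p_out[node] * p_in[node] / p_in[root].sum()
```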

80 The baseline system is our implementation of the hierarchical phrase-based model of Chiang (2007), and it includes basic features such as rule and lexicalized rule translation probabilities, language model scores, rule counts, etc. [sent-224, score-0.705]

81 In this study, we induce 16 latent categories for both X and B nonterminals. [sent-234, score-0.323]

82 Our approach identifies ∼180k unique tag sequences for the English side of phrase pairs in both tasks. [sent-235, score-0.523]

83 As shown by the examples in Table 2, the syntactic feature vector representation is able to identify similar and dissimilar tag sequences. [sent-236, score-0.368]

84 Notice that our latent categories are learned automatically to maximize the likelihood of the training forests extracted based on alignment and are not explicitly instructed to discriminate between syntactically different tag sequences. [sent-238, score-0.649]

85 Our approach is not guaranteed to always assign similar feature vectors to syntactically similar tag sequences. [sent-239, score-0.311]

86 However, as the experimental results show below, the latent categories are able to capture some similarities among tag sequences that are beneficial for translation. [sent-240, score-0.506]

87 Therefore, there is more potential gain from using syntax features to rule out unlikely derivations of longer sentences, while phrasal rules might be adequate for shorter sentences, leaving less room for syntax to help as in the case of the English-to-Chinese task. [sent-264, score-0.443]

88 (Section 7: Discussions) The incorporation of the syntactic feature into the hierarchical phrase-based translation system also brings in additional memory load and computational cost. [sent-265, score-0.491]

89 In the worst case, our approach requires storing one feature vector for each tag sequence and one feature vector for each nonterminal of a SCFG rule, with the latter taking the majority of the extra memory storage. [sent-266, score-0.435]

90 We observed that about 90% of the X nonterminals in the rules only have one tag sequence, and thus the required memory space can be significantly reduced by only storing a pointer to the feature vector of the tag sequence for these nonterminals. [sent-267, score-0.534]
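
A minimal sketch of that storage trick (names are hypothetical): nonterminals with a single tag sequence just reference the shared per-tag-sequence vector, and only the remaining ~10% materialize their own mixture:

```python
import numpy as np

def assign_rule_vectors(rule_dists, tag_seq_vecs):
    """rule_dists maps a nonterminal id to its {tag_sequence: probability} dict."""
    vectors = {}
    for nt_id, dist in rule_dists.items():
        if len(dist) == 1:                          # the common case (~90%)
            (ts,) = dist
            vectors[nt_id] = tag_seq_vecs[ts]       # shared reference, no copy
        else:
            vectors[nt_id] = sum(p * tag_seq_vecs[ts]
                                 for ts, p in dist.items())
    return vectors
```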

91 Our approach also requires computing one dot-product of two feature vectors for each nonterminal when a SCFG rule is applied to a source span. [sent-268, score-0.43]

92 There are other successful investigations to impose soft syntactic constraints to hierarchical phrase-based models by either introducing syntaxbased rule features such as the prior derivation model of Zhou et al. [sent-277, score-0.76]

93 This work is an initial effort to investigate latent syntactic categories to enhance hierarchical phrasebased translation models, and there are many directions to continue this line of research. [sent-284, score-0.737]

94 In this case, target side parse trees could also be used alone or together with the source side parse trees to induce the latent syntactic categories. [sent-286, score-0.719]

95 Third, in addition to the treebank categories obtained by syntactic parsing, lexical cues directly available in sentence pairs could also be explored to guide the learning of latent categories. [sent-288, score-0.6]

96 Last but not least, it would be interesting to investigate discriminative training approaches to learn latent categories that directly optimize translation quality. [sent-289, score-0.408]

97 (Section 8: Conclusion) We have presented a novel approach to enhance hierarchical phrase-based machine translation systems with real-valued linguistically motivated feature vectors. [sent-290, score-0.399]

98 Our approach maintains the advantages of hierarchical phrase-based translation systems while at the same time naturally incorporates soft syntactic constraints. [sent-291, score-0.586]

99 We will continue this line of research and exploit better ways to learn syntax and apply syntactic constraints to machine translation. [sent-293, score-0.321]

100 Prior derivation models for formally syntax-based translation using linguistically syntactic parsing and tree kernels. [sent-423, score-0.426]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('ts', 0.256), ('ux', 0.255), ('scfg', 0.241), ('jj', 0.225), ('tight', 0.214), ('nn', 0.184), ('phrase', 0.181), ('hierarchical', 0.18), ('nonterminal', 0.168), ('decomposition', 0.16), ('latent', 0.154), ('adjp', 0.151), ('syntactic', 0.149), ('tag', 0.137), ('rule', 0.134), ('categories', 0.131), ('dt', 0.128), ('translation', 0.123), ('pairs', 0.121), ('syntax', 0.113), ('forests', 0.106), ('huang', 0.105), ('soft', 0.099), ('chiang', 0.094), ('forest', 0.091), ('maximal', 0.089), ('vb', 0.089), ('syntactically', 0.086), ('nonterminals', 0.086), ('np', 0.085), ('sequences', 0.084), ('rules', 0.083), ('pair', 0.081), ('side', 0.077), ('pout', 0.075), ('vywz', 0.075), ('binarized', 0.073), ('sub', 0.071), ('tree', 0.064), ('xiong', 0.064), ('pen', 0.064), ('parallel', 0.061), ('constraints', 0.059), ('pts', 0.058), ('zhongqiang', 0.058), ('node', 0.058), ('linguistically', 0.057), ('syntaxbased', 0.056), ('wz', 0.056), ('parse', 0.055), ('venugopal', 0.054), ('zollmann', 0.054), ('sequence', 0.052), ('impose', 0.05), ('vp', 0.05), ('rb', 0.049), ('vectors', 0.049), ('parsed', 0.049), ('abstracted', 0.048), ('treebank', 0.045), ('emission', 0.045), ('synchronous', 0.044), ('ui', 0.044), ('pl', 0.043), ('dissimilar', 0.043), ('symbol', 0.043), ('hierarchy', 0.043), ('mi', 0.042), ('grammars', 0.042), ('production', 0.041), ('mary', 0.041), ('xi', 0.04), ('source', 0.04), ('feature', 0.039), ('induce', 0.038), ('cn', 0.038), ('brochure', 0.038), ('heber', 0.038), ('hgive', 0.038), ('hyperedge', 0.038), ('intj', 0.038), ('nonmaximal', 0.038), ('tsm', 0.038), ('harper', 0.038), ('emitted', 0.038), ('trees', 0.037), ('reading', 0.037), ('och', 0.036), ('maintains', 0.035), ('boundary', 0.035), ('alignment', 0.035), ('cd', 0.034), ('pin', 0.034), ('galley', 0.034), ('derivation', 0.033), ('vy', 0.032), ('deyi', 0.032), ('mismatched', 0.032), ('uv', 0.032), ('chain', 0.032)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 1.0000004 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

Author: Zhongqiang Huang ; Martin Cmejrek ; Bowen Zhou

Abstract: In this paper, we present a novel approach to enhance hierarchical phrase-based machine translation systems with linguistically motivated syntactic features. Rather than directly using treebank categories as in previous studies, we learn a set of linguistically-guided latent syntactic categories automatically from a source-side parsed, word-aligned parallel corpus, based on the hierarchical structure among phrase pairs as well as the syntactic structure of the source side. In our model, each X nonterminal in a SCFG rule is decorated with a real-valued feature vector computed based on its distribution of latent syntactic categories. These feature vectors are utilized at decoding time to measure the similarity between the syntactic analysis of the source side and the syntax of the SCFG rules that are applied to derive translations. Our approach maintains the advantages of hierarchical phrase-based translation systems while at the same time naturally incorporates soft syntactic constraints.

2 0.26191053 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

Author: Adria de Gispert ; Juan Pino ; William Byrne

Abstract: We report on investigations into hierarchical phrase-based translation grammars based on rules extracted from posterior distributions over alignments of the parallel text. Rather than restrict rule extraction to a single alignment, such as Viterbi, we instead extract rules based on posterior distributions provided by the HMM word-to-word alignment model. We define translation grammars progressively by adding classes of rules to a basic phrase-based system. We assess these grammars in terms of their expressive power, measured by their ability to align the parallel text from which their rules are extracted, and the quality of the translations they yield. In Chinese-to-English translation, we find that rule extraction from posteriors gives translation improvements. We also find that grammars with rules with only one nonterminal, when extracted from posteriors, can outperform more complex grammars extracted from Viterbi alignments. Finally, we show that the best way to exploit source-to-target and target-to-source alignment models is to build two separate systems and combine their output translation lattices.

3 0.24417637 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

Author: Hui Zhang ; Min Zhang ; Haizhou Li ; Eng Siong Chng

Abstract: This paper studies two issues, non-isomorphic structure translation and target syntactic structure usage, for statistical machine translation in the context of forest-based tree-to-tree-sequence translation. For the first issue, we propose a novel non-isomorphic translation framework to capture more non-isomorphic structure mappings than traditional tree-based and tree-sequence-based translation methods. For the second issue, we propose a parallel space searching method to generate hypotheses using the tree-to-string model and evaluate their syntactic goodness using the tree-to-tree/tree-sequence model. This not only reduces the search complexity by merging spurious-ambiguity translation paths and solves the data sparseness issue in training, but also serves as a syntax-based target language model for better grammatical generation. Experimental results on the benchmark data show our proposed two solutions are very effective, achieving significant performance improvement over baselines when applied to different translation models.

4 0.15882677 94 emnlp-2010-SCFG Decoding Without Binarization

Author: Mark Hopkins ; Greg Langmead

Abstract: Conventional wisdom dictates that synchronous context-free grammars (SCFGs) must be converted to Chomsky Normal Form (CNF) to ensure cubic time decoding. For arbitrary SCFGs, this is typically accomplished via the synchronous binarization technique of (Zhang et al., 2006). A drawback to this approach is that it inflates the constant factors associated with decoding, and thus the practical running time. (DeNero et al., 2009) tackle this problem by defining a superset of CNF called Lexical Normal Form (LNF), which also supports cubic time decoding under certain implicit assumptions. In this paper, we make these assumptions explicit, and in doing so, show that LNF can be further expanded to a broader class of grammars (called “scope3”) that also supports cubic-time decoding. By simply pruning non-scope-3 rules from a GHKM-extracted grammar, we obtain better translation performance than synchronous binarization.

5 0.14736709 42 emnlp-2010-Efficient Incremental Decoding for Tree-to-String Translation

Author: Liang Huang ; Haitao Mi

Abstract: Syntax-based translation models should in principle be efficient with polynomially-sized search space, but in practice they are often embarrassingly slow, partly due to the cost of language model integration. In this paper we borrow from phrase-based decoding the idea to generate a translation incrementally left-to-right, and show that for tree-to-string models, with a clever encoding of derivation history, this method runs in average-case polynomial-time in theory, and linear-time with beam search in practice (whereas phrase-based decoding is exponential-time in theory and quadratic-time in practice). Experiments show that, with comparable translation quality, our tree-to-string system (in Python) can run more than 30 times faster than the phrase-based system Moses (in C++).

6 0.13972357 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

7 0.12354478 76 emnlp-2010-Maximum Entropy Based Phrase Reordering for Hierarchical Phrase-Based Translation

8 0.12307151 50 emnlp-2010-Facilitating Translation Using Source Language Paraphrase Lattices

9 0.12159118 99 emnlp-2010-Statistical Machine Translation with a Factorized Grammar

10 0.11458925 47 emnlp-2010-Example-Based Paraphrasing for Improved Phrase-Based Statistical Machine Translation

11 0.11371531 36 emnlp-2010-Discriminative Word Alignment with a Function Word Reordering Model

12 0.10829654 96 emnlp-2010-Self-Training with Products of Latent Variable Grammars

13 0.10419252 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

14 0.10382172 97 emnlp-2010-Simple Type-Level Unsupervised POS Tagging

15 0.10332266 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

16 0.10204995 33 emnlp-2010-Cross Language Text Classification by Model Translation and Semi-Supervised Learning

17 0.098864906 114 emnlp-2010-Unsupervised Parse Selection for HPSG

18 0.098641448 39 emnlp-2010-EMNLP 044

19 0.096495286 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

20 0.093751237 88 emnlp-2010-On Dual Decomposition and Linear Programming Relaxations for Natural Language Processing


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.338), (1, -0.139), (2, 0.253), (3, -0.06), (4, 0.177), (5, 0.028), (6, 0.014), (7, -0.017), (8, -0.139), (9, -0.065), (10, 0.14), (11, -0.112), (12, -0.048), (13, -0.044), (14, -0.083), (15, -0.042), (16, -0.121), (17, -0.063), (18, 0.034), (19, 0.12), (20, -0.081), (21, -0.044), (22, 0.07), (23, -0.127), (24, -0.021), (25, -0.066), (26, 0.008), (27, 0.106), (28, 0.057), (29, 0.011), (30, 0.058), (31, -0.016), (32, -0.144), (33, -0.022), (34, -0.017), (35, 0.069), (36, 0.013), (37, -0.08), (38, -0.086), (39, -0.074), (40, 0.109), (41, 0.023), (42, -0.061), (43, 0.077), (44, 0.032), (45, 0.088), (46, -0.026), (47, 0.056), (48, -0.032), (49, 0.017)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96595562 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

Author: Zhongqiang Huang ; Martin Cmejrek ; Bowen Zhou

Abstract: In this paper, we present a novel approach to enhance hierarchical phrase-based machine translation systems with linguistically motivated syntactic features. Rather than directly using treebank categories as in previous studies, we learn a set of linguistically-guided latent syntactic categories automatically from a source-side parsed, word-aligned parallel corpus, based on the hierarchical structure among phrase pairs as well as the syntactic structure of the source side. In our model, each X nonterminal in a SCFG rule is decorated with a real-valued feature vector computed based on its distribution of latent syntactic categories. These feature vectors are utilized at decoding time to measure the similarity between the syntactic analysis of the source side and the syntax of the SCFG rules that are applied to derive translations. Our approach maintains the advantages of hierarchical phrase-based translation systems while at the same time naturally incorporates soft syntactic constraints.

2 0.74662471 94 emnlp-2010-SCFG Decoding Without Binarization

Author: Mark Hopkins ; Greg Langmead

Abstract: Conventional wisdom dictates that synchronous context-free grammars (SCFGs) must be converted to Chomsky Normal Form (CNF) to ensure cubic time decoding. For arbitrary SCFGs, this is typically accomplished via the synchronous binarization technique of (Zhang et al., 2006). A drawback to this approach is that it inflates the constant factors associated with decoding, and thus the practical running time. (DeNero et al., 2009) tackle this problem by defining a superset of CNF called Lexical Normal Form (LNF), which also supports cubic time decoding under certain implicit assumptions. In this paper, we make these assumptions explicit, and in doing so, show that LNF can be further expanded to a broader class of grammars (called “scope3”) that also supports cubic-time decoding. By simply pruning non-scope-3 rules from a GHKM-extracted grammar, we obtain better translation performance than synchronous binarization.

3 0.72917056 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

Author: Hui Zhang ; Min Zhang ; Haizhou Li ; Eng Siong Chng

Abstract: This paper studies two issues, non-isomorphic structure translation and target syntactic structure usage, for statistical machine translation in the context of forest-based tree-to-tree-sequence translation. For the first issue, we propose a novel non-isomorphic translation framework to capture more non-isomorphic structure mappings than traditional tree-based and tree-sequence-based translation methods. For the second issue, we propose a parallel space searching method to generate hypotheses using the tree-to-string model and evaluate their syntactic goodness using the tree-to-tree/tree-sequence model. This not only reduces the search complexity by merging spurious-ambiguity translation paths and solves the data sparseness issue in training, but also serves as a syntax-based target language model for better grammatical generation. Experimental results on the benchmark data show our proposed two solutions are very effective, achieving significant performance improvement over baselines when applied to different translation models.

4 0.65824628 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

Author: Adria de Gispert ; Juan Pino ; William Byrne

Abstract: We report on investigations into hierarchical phrase-based translation grammars based on rules extracted from posterior distributions over alignments of the parallel text. Rather than restrict rule extraction to a single alignment, such as Viterbi, we instead extract rules based on posterior distributions provided by the HMM word-to-word alignment model. We define translation grammars progressively by adding classes of rules to a basic phrase-based system. We assess these grammars in terms of their expressive power, measured by their ability to align the parallel text from which their rules are extracted, and the quality of the translations they yield. In Chinese-to-English translation, we find that rule extraction from posteriors gives translation improvements. We also find that grammars with rules with only one nonterminal, when extracted from posteriors, can outperform more complex grammars extracted from Viterbi alignments. Finally, we show that the best way to exploit source-to-target and target-to-source alignment models is to build two separate systems and combine their output translation lattices.

5 0.56190854 76 emnlp-2010-Maximum Entropy Based Phrase Reordering for Hierarchical Phrase-Based Translation

Author: Zhongjun He ; Yao Meng ; Hao Yu

Abstract: Hierarchical phrase-based (HPB) translation provides a powerful mechanism to capture both short and long distance phrase reorderings. However, the phrase reorderings lack of contextual information in conventional HPB systems. This paper proposes a contextdependent phrase reordering approach that uses the maximum entropy (MaxEnt) model to help the HPB decoder select appropriate reordering patterns. We classify translation rules into several reordering patterns, and build a MaxEnt model for each pattern based on various contextual features. We integrate the MaxEnt models into the HPB model. Experimental results show that our approach achieves significant improvements over a standard HPB system on large-scale translation tasks. On Chinese-to-English translation, , the absolute improvements in BLEU (caseinsensitive) range from 1.2 to 2.1.

6 0.55094332 42 emnlp-2010-Efficient Incremental Decoding for Tree-to-String Translation

7 0.50411254 99 emnlp-2010-Statistical Machine Translation with a Factorized Grammar

8 0.48205462 106 emnlp-2010-Top-Down Nearly-Context-Sensitive Parsing

9 0.43817741 113 emnlp-2010-Unsupervised Induction of Tree Substitution Grammars for Dependency Parsing

10 0.41528064 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

11 0.37852025 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

12 0.37588552 118 emnlp-2010-Utilizing Extra-Sentential Context for Parsing

13 0.37486276 50 emnlp-2010-Facilitating Translation Using Source Language Paraphrase Lattices

14 0.35806635 36 emnlp-2010-Discriminative Word Alignment with a Function Word Reordering Model

15 0.35780096 5 emnlp-2010-A Hybrid Morpheme-Word Representation for Machine Translation of Morphologically Rich Languages

16 0.35527635 47 emnlp-2010-Example-Based Paraphrasing for Improved Phrase-Based Statistical Machine Translation

17 0.34890774 96 emnlp-2010-Self-Training with Products of Latent Variable Grammars

18 0.33266947 29 emnlp-2010-Combining Unsupervised and Supervised Alignments for MT: An Empirical Study

19 0.32960591 114 emnlp-2010-Unsupervised Parse Selection for HPSG

20 0.32097855 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(3, 0.012), (10, 0.014), (12, 0.033), (29, 0.151), (30, 0.032), (32, 0.019), (52, 0.081), (56, 0.069), (66, 0.073), (72, 0.042), (76, 0.055), (77, 0.018), (79, 0.011), (83, 0.012), (87, 0.039), (96, 0.26)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.81638873 98 emnlp-2010-Soft Syntactic Constraints for Hierarchical Phrase-Based Translation Using Latent Syntactic Distributions

Author: Zhongqiang Huang ; Martin Cmejrek ; Bowen Zhou

Abstract: In this paper, we present a novel approach to enhance hierarchical phrase-based machine translation systems with linguistically motivated syntactic features. Rather than directly using treebank categories as in previous studies, we learn a set of linguistically-guided latent syntactic categories automatically from a source-side parsed, word-aligned parallel corpus, based on the hierarchical structure among phrase pairs as well as the syntactic structure of the source side. In our model, each X nonterminal in a SCFG rule is decorated with a real-valued feature vector computed based on its distribution of latent syntactic categories. These feature vectors are utilized at decoding time to measure the similarity between the syntactic analysis of the source side and the syntax of the SCFG rules that are applied to derive translations. Our approach maintains the advantages of hierarchical phrase-based translation systems while at the same time naturally incorporates soft syntactic constraints.

2 0.58381349 11 emnlp-2010-A Semi-Supervised Approach to Improve Classification of Infrequent Discourse Relations Using Feature Vector Extension

Author: Hugo Hernault ; Danushka Bollegala ; Mitsuru Ishizuka

Abstract: Several recent discourse parsers have employed fully-supervised machine learning approaches. These methods require human annotators to beforehand create an extensive training corpus, which is a time-consuming and costly process. On the other hand, unlabeled data is abundant and cheap to collect. In this paper, we propose a novel semi-supervised method for discourse relation classification based on the analysis of cooccurring features in unlabeled data, which is then taken into account for extending the feature vectors given to a classifier. Our experimental results on the RST Discourse Treebank corpus and Penn Discourse Treebank indicate that the proposed method brings a significant improvement in classification accuracy and macro-average F-score when small training datasets are used. For instance, with training sets of c.a. 1000 labeled instances, the proposed method brings improvements in accuracy and macro-average F-score up to 50% compared to a baseline classifier. We believe that the proposed method is a first step towards detecting low-occurrence relations, which is useful for domains with a lack of annotated data.

3 0.58299112 57 emnlp-2010-Hierarchical Phrase-Based Translation Grammars Extracted from Alignment Posterior Probabilities

Author: Adria de Gispert ; Juan Pino ; William Byrne

Abstract: We report on investigations into hierarchical phrase-based translation grammars based on rules extracted from posterior distributions over alignments of the parallel text. Rather than restrict rule extraction to a single alignment, such as Viterbi, we instead extract rules based on posterior distributions provided by the HMM word-to-word alignment model. We define translation grammars progressively by adding classes of rules to a basic phrase-based system. We assess these grammars in terms of their expressive power, measured by their ability to align the parallel text from which their rules are extracted, and the quality of the translations they yield. In Chinese-to-English translation, we find that rule extraction from posteriors gives translation improvements. We also find that grammars with rules with only one nonterminal, when extracted from posteriors, can outperform more complex grammars extracted from Viterbi alignments. Finally, we show that the best way to exploit source-to-target and target-to-source alignment models is to build two separate systems and combine their output translation lattices.

4 0.57454109 94 emnlp-2010-SCFG Decoding Without Binarization

Author: Mark Hopkins ; Greg Langmead

Abstract: Conventional wisdom dictates that synchronous context-free grammars (SCFGs) must be converted to Chomsky Normal Form (CNF) to ensure cubic time decoding. For arbitrary SCFGs, this is typically accomplished via the synchronous binarization technique of (Zhang et al., 2006). A drawback to this approach is that it inflates the constant factors associated with decoding, and thus the practical running time. (DeNero et al., 2009) tackle this problem by defining a superset of CNF called Lexical Normal Form (LNF), which also supports cubic time decoding under certain implicit assumptions. In this paper, we make these assumptions explicit, and in doing so, show that LNF can be further expanded to a broader class of grammars (called “scope3”) that also supports cubic-time decoding. By simply pruning non-scope-3 rules from a GHKM-extracted grammar, we obtain better translation performance than synchronous binarization.

5 0.56532335 89 emnlp-2010-PEM: A Paraphrase Evaluation Metric Exploiting Parallel Texts

Author: Chang Liu ; Daniel Dahlmeier ; Hwee Tou Ng

Abstract: We present PEM, the first fully automatic metric to evaluate the quality of paraphrases, and consequently, that of paraphrase generation systems. Our metric is based on three criteria: adequacy, fluency, and lexical dissimilarity. The key component in our metric is a robust and shallow semantic similarity measure based on pivot language N-grams that allows us to approximate adequacy independently of lexical similarity. Human evaluation shows that PEM achieves high correlation with human judgments.

6 0.56433201 78 emnlp-2010-Minimum Error Rate Training by Sampling the Translation Lattice

7 0.56140208 87 emnlp-2010-Nouns are Vectors, Adjectives are Matrices: Representing Adjective-Noun Constructions in Semantic Space

8 0.55826968 18 emnlp-2010-Assessing Phrase-Based Translation Models with Oracle Decoding

9 0.55504823 105 emnlp-2010-Title Generation with Quasi-Synchronous Grammar

10 0.55311865 7 emnlp-2010-A Mixture Model with Sharing for Lexical Semantics

11 0.55308962 42 emnlp-2010-Efficient Incremental Decoding for Tree-to-String Translation

12 0.55305243 116 emnlp-2010-Using Universal Linguistic Knowledge to Guide Grammar Induction

13 0.55245024 65 emnlp-2010-Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification

14 0.54793119 77 emnlp-2010-Measuring Distributional Similarity in Context

15 0.5449608 86 emnlp-2010-Non-Isomorphic Forest Pair Translation

16 0.54483098 29 emnlp-2010-Combining Unsupervised and Supervised Alignments for MT: An Empirical Study

17 0.54429531 34 emnlp-2010-Crouching Dirichlet, Hidden Markov Model: Unsupervised POS Tagging with Context Local Tag Generation

18 0.54227781 99 emnlp-2010-Statistical Machine Translation with a Factorized Grammar

19 0.54143441 96 emnlp-2010-Self-Training with Products of Latent Variable Grammars

20 0.53996879 52 emnlp-2010-Further Meta-Evaluation of Broad-Coverage Surface Realization