emnlp emnlp2011 emnlp2011-108 knowledge-graph by maker-knowledge-mining

108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation


Source: pdf

Author: Kevin Gimpel ; Noah A. Smith

Abstract: We present a quasi-synchronous dependency grammar (Smith and Eisner, 2006) for machine translation in which the leaves of the tree are phrases rather than words as in previous work (Gimpel and Smith, 2009). This formulation allows us to combine structural components of phrase-based and syntax-based MT in a single model. We describe a method of extracting phrase dependencies from parallel text using a target-side dependency parser. For decoding, we describe a coarse-to-fine approach based on lattice dependency parsing of phrase lattices. We demonstrate performance improvements for Chinese-English and UrduEnglish translation over a phrase-based baseline. We also investigate the use of unsupervised dependency parsers, reporting encouraging preliminary results.

Reference: text


Summary: the most important sentences generated by tfidf model

sentIndex sentText sentNum sentScore

1 We present a quasi-synchronous dependency grammar (Smith and Eisner, 2006) for machine translation in which the leaves of the tree are phrases rather than words as in previous work (Gimpel and Smith, 2009). [sent-4, score-0.684]

2 We describe a method of extracting phrase dependencies from parallel text using a target-side dependency parser. [sent-6, score-0.692]

3 For decoding, we describe a coarse-to-fine approach based on lattice dependency parsing of phrase lattices. [sent-7, score-0.979]

4 We also investigate the use of unsupervised dependency parsers, reporting encouraging preliminary results. [sent-9, score-0.357]

5 We propose a model in which phrases are organized into a tree structure inspired by dependency syntax. [sent-20, score-0.479]

6 Instead of standard dependency trees in which words are vertices, our trees have phrases as vertices. [sent-21, score-0.487]

7 We describe a simple heuristic to extract phrase dependencies from an aligned parallel corpus parsed on the target side, and use them to compute target-side tree features. [sent-22, score-0.563]

8 We define additional string-to-tree features and, if a source-side dependency parser is available, tree-to-tree features to capture properties of how phrase dependencies interact with reordering. [sent-23, score-0.832]

9 The decoder involves generating a phrase lattice (Ueffing et al. [sent-26, score-0.613]

10 , 2002) in a coarse pass using a phrase-based model, followed by lattice dependency parsing of the phrase lattice. [sent-27, score-1.129]

11 This approach allows us to feasibly explore the combined search space of segmentations, phrase alignments, and target phrase dependency trees. [sent-28, score-0.877]

12 We also describe experiments in which we replace supervised dependency parsers with unsupervised parsers, reporting promising results: using a supervised Chinese parser and a state-of-the-art unsupervised English parser provides our best results, giving an averaged gain of +0. [sent-32, score-0.58]

13 2 Related Work We previously applied quasi-synchronous grammar to machine translation (Gimpel and Smith, 2009), but that system performed translation fundamentally at the word level. [sent-37, score-0.322]

14 Aside from QG, there have been many efforts to use dependency syntax in machine translation. [sent-40, score-0.398]

15 (2005) used a source-side dependency parser and projected automatic parses across word alignments in order to model dependency syntax on phrase pairs. [sent-42, score-0.975]

16 (2008) presented an extension to Hiero (Chiang, 2005) in which rules have target-side dependency syntax and therefore enable the use of a dependency language model. [sent-44, score-0.655]

17 More recently, researchers have sought the benefits of dependency syntax while preserving the advantages of phrase-based models, such as efficiency and coverage. [sent-45, score-0.351]

18 Galley and Manning (2009) loosened standard assumptions about dependency parsing so that the efficient left-to-right decoding procedure of phrase-based translation could be retained while a dependency language model is incorporated. [sent-46, score-0.848]

19 Carreras and Collins (2009) presented a string-to-dependency system that permits non-projective dependency trees (thereby allowing a larger space of translations) and uses a rule extraction procedure that includes rules for every phrase in the phrase table. [sent-47, score-0.854]

20 We take an additional step in this direction by working with dependency grammars on the phrases themselves, thereby bringing together the structural components of phrase-based and dependency-based MT in a single model. [sent-48, score-0.445]

21 Quasi-synchronous grammar makes no restrictions on the form of the target monolingual grammar, though dependency grammars have been used in most previous applications of QG (Wang et al. [sent-57, score-0.499]

22 We previously presented a word-based machine translation model based on a quasi-synchronous dependency grammar. [sent-59, score-0.421]

23 Therefore, we use a dependency grammar in which the leaves are phrases rather than words. [sent-61, score-0.491]

24 We define a phrase dependency grammar as a model p(φ, τφ | t) over the joint space of segmentations of a sentence into phrases and dependency trees on the phrases. [sent-62, score-1.138]

25 [Section heading: Phrase dependency grammars] ...analysis of the problem of intersecting phrase-based and hierarchical translation models, but do not provide experimental results. [sent-63, score-0.509]

26 When used for translation modeling, they allow us to capture phenomena like local reordering and idiomatic translations within each phrase as well as long-distance re- lationships among the phrases in a sentence. [sent-67, score-0.597]

27 We then define a quasi-synchronous phrase dependency grammar (QPDG) as a conditional model p(t, γ, φ, τφ, a | s, τs) that induces a probabilistic monolingual phrase dependency grammar over sentences inspired by the source sentence and (lexical) dependency tree. [sent-68, score-1.664]

28 The source and target sentences are segmented into phrases and the phrases are aligned in a one-to-one alignment. [sent-69, score-0.331]

29 However, we never commit to a source phrase dependency tree, instead using a source lexical dependency tree output by a dependency parser, so our alignment variable a is a function from target tree nodes (phrases in φ) to source phrases in γ, which might not be source tree nodes. [sent-72, score-1.903]

30 The features in our model may consider a large number of source phrase dependency trees as long as they are consistent with τs . [sent-73, score-0.722]
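
To make the notation above concrete, here is a minimal Python data-structure sketch of one derivation scored by the QPDG model; the class name, field names, and span representation are illustrative assumptions, not the paper's notation.

    from dataclasses import dataclass
    from typing import Dict, List, Tuple

    Span = Tuple[int, int]  # (start, end) word offsets, end exclusive

    @dataclass
    class QPDGDerivation:
        # One point in the search space: a target string t, source and target
        # segmentations gamma and phi, a phrase dependency tree tau_phi over
        # the target phrases, and the alignment a from target phrases to
        # source phrases.
        target_words: List[str]        # t
        source_phrases: List[Span]     # gamma (need not be nodes of tau_s)
        target_phrases: List[Span]     # phi
        phrase_tree: List[int]         # tau_phi: head phrase index, -1 for root
        alignment: Dict[int, int]      # a: target phrase index -> source phrase index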

31 , 2007), including four phrase table probability features, a phrase penalty feature, an n-gram language model, a distortion cost, six lexicalized reordering features, and a word penalty feature. [sent-75, score-0.642]

32 We now describe in detail the additional features [figure caption fragment: ... 2 words in one of the phrases (dependencies in which one phrase is entirely punctuation are not shown)] [sent-76, score-0.407]

33 in our model that are used to score phrase dependency trees. [sent-78, score-0.558]

34 , 2008; Galley and Manning, 2009), though unlike previous work our features model both the phrase segmentation and dependency structure. [sent-85, score-0.653]

35 However, there do not currently exist treebanks with annotated phrase “syntax.” [table caption fragment: phrase “made up” for each direction, sorted by the conditional probability of the child phrase given the parent phrase and direction] [sent-87, score-1.09]

36 Our solution is to use a standard supervised dependency parser and extract phrase dependencies using bilingual information. [sent-89, score-0.724]

37 Given the set of extracted phrase pairs for a sentence, denote by W the set of unique target-side phrases among them. [sent-92, score-0.353]

38 We parse the target sentence with a dependency parser and, for each pair of phrases u, v ∈ W, we extract a phrase dependency (along with its direction) if u and v do not overlap and there is at least one lexical dependency between a word in u and a word in v. [sent-93, score-1.463]

39 If there are lexical dependencies in both directions, we extract a phrase dependency only for the single longest one. [sent-94, score-0.737]

40 Since we use a projective dependency parser, the longest lexical dependency between two phrases is guaranteed to be unique. [sent-95, score-0.786]
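
The extraction heuristic described in the last few sentences can be sketched roughly as follows; the span representation and the tie handling are assumptions, and punctuation-only phrases are not treated specially here. The resulting counts feed the relative-frequency estimates used below.

    from collections import Counter

    def spans_overlap(a, b):
        return a[0] < b[1] and b[0] < a[1]

    def extract_phrase_dependencies(heads, phrases):
        # heads[i]: index of the head of target word i (-1 for the root);
        # phrases: (start, end) spans of the unique target-side phrases W
        # extracted for this sentence (end exclusive).
        deps = Counter()
        for u in phrases:                                  # candidate head phrase
            for v in phrases:                              # candidate child phrase
                if u == v or spans_overlap(u, v):
                    continue
                links = [(heads[d], d) for d in range(*v)
                         if u[0] <= heads[d] < u[1]]       # lexical deps u -> v
                if not links:
                    continue
                rev = [(heads[d], d) for d in range(*u)
                       if v[0] <= heads[d] < v[1]]         # lexical deps v -> u
                # if both directions occur, keep only the direction holding the
                # single longest lexical dependency (unique for projective trees)
                if rev and max(abs(h - d) for h, d in rev) > \
                           max(abs(h - d) for h, d in links):
                    continue
                direction = 'left' if v[1] <= u[0] else 'right'
                deps[(u, v, direction)] += 1
        return deps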

41 Table 2 shows a listing of the most frequent phrase dependencies extracted (lexical dependencies are omitted). [sent-96, score-0.454]

42 We note that during training we never explicitly commit to any single phrase dependency tree for a target sentence. [sent-97, score-0.739]

43 Rather, we extract phrase dependencies from all phrase dependency trees consistent with the word alignments and the lexical dependency tree. [sent-98, score-1.291]

44 Thus we treat phrase dependency trees analogously to phrase segmentations in standard phrase extraction. [sent-99, score-1.155]

45 (2009) used a shallow parser to convert lexical dependencies from a dependency parser into phrase dependencies. [sent-102, score-0.823]

46 For phrase dependencies of the form ⟨u, v, d⟩, where u is the head phrase, v is the child phrase, and d ∈ {left, right} is the direction, we then estimate conditional probabilities p(v|u, d) using relative frequency estimation. [sent-103, score-0.354]

47 (1) where d(i) = I[τφ(i) − i > 0] is the direction of the dependency arc. [sent-109, score-0.304]

48 The max expression protects unseen parent-child phrase dependencies from causing the score to be negative infinity. [sent-111, score-0.354]

49 Our motivation is a desire for the features to be used to prefer one derivation over another but not to rule out a derivation completely if it merely happens to contain a dependency unobserved in the training data. [sent-112, score-0.398]
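
A rough sketch of the relative-frequency estimation and of a floored feature in the spirit of the max expression; the exact functional form of Eq. 1 and the offset value are assumptions, not the paper's.

    import math
    from collections import Counter

    def estimate_phrase_dep_probs(dep_counts):
        # dep_counts: Counter over (head phrase u, child phrase v, direction d)
        # from the extraction step; returns p(v | u, d) = c(u, v, d) / c(u, d).
        totals = Counter()
        for (u, v, d), c in dep_counts.items():
            totals[(u, d)] += c
        return {(u, v, d): c / totals[(u, d)]
                for (u, v, d), c in dep_counts.items()}

    def phrase_dep_feature(probs, u, v, d, offset=10.0):
        # Observed dependencies contribute a non-negative boost tied to
        # log p(v | u, d); unseen ones contribute 0 rather than log 0 = -inf,
        # so they cannot rule a derivation out on their own.
        p = probs.get((u, v, d), 0.0)
        return max(0.0, offset + math.log(p)) if p > 0.0 else 0.0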

50 Whenever we extract a phrase dependency, we extract the longest lexical dependency contained within it. [sent-115, score-0.637]

51 For all ⟨parent, child, direction⟩ lexical dependency tuples ⟨x, y, d⟩, we estimate conditional probabilities plex(y|x, d) from the parsed corpus using relative frequency estimation. [sent-116, score-0.373]

52 Then, for a phrase dependency with longest lexical dependency ⟨x, y, d⟩, we add a feature for plex(y|x, d) to the model, using a formula similar to Eq. [sent-117, score-1.015]

53 Different instances of a phrase dependency may have different lexical dependencies extracted with them. [sent-119, score-0.691]

54 We add the lexical weight for the most frequent, breaking ties by choosing the lexical dependency that maximizes p(y|x, d), as was also done by Koehn et al. [sent-120, score-0.37]
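
The lexical-weight choice just described (most frequent longest lexical dependency, ties broken by the higher plex value) can be sketched as follows; the data structures are assumptions.

    def pick_lexical_dependency(lex_counts, plex):
        # lex_counts: Counter over the longest lexical dependency tuples (x, y, d)
        # observed with one particular phrase dependency; plex: plex(y | x, d).
        # Most frequent tuple wins; ties are broken by the higher plex value.
        return max(lex_counts.items(),
                   key=lambda kv: (kv[1], plex.get(kv[0], 0.0)))[0]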

55 In all, we include 4 target tree features: one for phrase dependencies, one for lexical dependencies, [footnote 4: The reasoning here is that whenever we use a phrase dependency that we have observed in the training data, we want to boost the score of the translation.] [sent-122, score-1.026]

56 If we used log-probabilities, each observed dependency would incur a penalty. [sent-123, score-0.304]

57 String-to-Tree Configurations. We consider features that count instances of reordering configurations involving phrase dependencies. [sent-127, score-0.54]

58 For example, when building a parent-child phrase dependency with the child to the left, one feature value is incremented if their aligned source-side phrases are in the same order. [sent-129, score-0.695]

59 We begin with features for each of the quasi-synchronous configurations from Smith and Eisner (2006), adapted to phrase dependency grammars. [sent-134, score-0.752]

60 [Footnote 6] We actually include two versions of each configuration feature other than “root-root”: one for the source phrases being in the same order as the target phrases and one for them being swapped. [sent-140, score-0.476]

61 Given a pair of source words, one with index j in source phrase a(τφ(i)) and the other with index k in source phrase a(i), we have a parent-child configuration if τs(k) = j; if τs(j) = k, a child-parent configuration is present. [sent-143, score-0.926]

62 Therefore, only one configuration feature fires for each phrase dependency attachment. [sent-148, score-0.703]
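
One way the configuration test of the previous sentences could be coded is sketched below; the configuration inventory beyond parent-child and child-parent, the precedence order, and the omission of the same-order/swapped distinction are assumptions loosely following Smith and Eisner (2006).

    def qg_configuration(tau_s, j, k):
        # tau_s[i]: head index of source word i (-1 for the root).  j is a word
        # index inside the source phrase aligned to the target parent phrase,
        # k a word index inside the source phrase aligned to the target child.
        if tau_s[k] == j:
            return 'parent-child'
        if tau_s[j] == k:
            return 'child-parent'
        if tau_s[j] >= 0 and tau_s[j] == tau_s[k]:
            return 'sibling'
        if tau_s[k] >= 0 and tau_s[tau_s[k]] == j:
            return 'grandparent-grandchild'
        return 'other'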

63 Finally, we include features that consider the dependency path distance between phrases in the source-side dependency tree that are aligned to parent-child pairs in τφ. [sent-149, score-0.912]

64 We include a feature that sums, for each target phrase i, the inverse of the minimum undirected path length between each word in a(i) and each word in a(τφ(i)). [sent-150, score-0.507]

65 The minimum undirected path length is defined as the number of dependency arcs that must be crossed to travel from one word to the other in τs. [sent-151, score-0.532]
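
A breadth-first-search sketch of the minimum undirected path length and of the summed inverse-distance feature; the span arguments and the handling of identical or unreachable words are assumptions.

    from collections import deque

    def undirected_path_length(tau_s, src, dst):
        # Number of dependency arcs crossed to travel from word src to word dst
        # in tau_s (tau_s[i] = head of i, -1 for the root), ignoring direction.
        if src == dst:
            return 0
        adj = [[] for _ in tau_s]
        for child, head in enumerate(tau_s):
            if head >= 0:
                adj[child].append(head)
                adj[head].append(child)
        seen, frontier = {src}, deque([(src, 0)])
        while frontier:
            node, dist = frontier.popleft()
            for nxt in adj[node]:
                if nxt == dst:
                    return dist + 1
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, dist + 1))
        return None  # unreachable; should not happen in a well-formed tree

    def path_distance_feature(tau_s, parent_src_span, child_src_span):
        # Sum of inverse path lengths between every word of the source phrase
        # aligned to the target phrase and every word of the phrase aligned to
        # its parent; zero-length and unreachable pairs are simply skipped.
        total = 0.0
        for w1 in range(*child_src_span):
            for w2 in range(*parent_src_span):
                dist = undirected_path_length(tau_s, w1, w2)
                if dist:
                    total += 1.0 / dist
        return total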

66 We follow Gimpel and Smith (2009) in constructing a lattice to represent Gs,τs and using lattice parsing to search for the best derivation, but we construct the lattice differently and employ a coarse-to-fine strategy (Petrov, 2009) to speed up decoding. [sent-158, score-1.139]

67 It has become common in recent years for MT researchers to exploit efficient data structures for encoding concise representations of the pruned search space of the model, such as phrase lattices for phrase-based MT (Ueffing et al. [sent-159, score-0.375]

68 Each edge in a phrase lattice corresponds to a phrase pair and each path through the lattice corresponds to a tuple ⟨t, γ, φ, a⟩ for the input s. [sent-163, score-1.356]

69 To also maximize over τφ, we perform lattice dependency parsing, which allows us to search over the space of tuples ⟨t, γ, φ, a, τφ⟩. [sent-166, score-0.663]

70 Given the lattice and Gs,τs, lattice parsing is a straightforward generalization of the standard arc-factored dynamic programming algorithm from Eisner (1996). [sent-167, score-0.421]

71 The lattice parsing algorithm requires O(E²V) time and O(E² + VE) space, where E is the number of edges in the lattice and V is the number of nodes. [sent-168, score-0.852]

72 Typical phrase lattices might easily contain tens of thousands of nodes and edges, making exact search prohibitively expensive for all but the smallest lattices. [sent-169, score-0.375]

73 Pass 1: Lattice Pruning After generating phrase lattices using a phrase-based MT system, we prune lattice edges using forward-backward pruning (Sixtus and Ortmanns, 1999), which has also been used in previous work using phrase lattices (Tromble et al. [sent-173, score-1.227]

74 This pruning method computes the max-marginal for each lattice edge, which is the score of the best full path that uses that edge. [sent-175, score-0.48]

75 Max-marginals [footnote 7: To prevent confusion, we use the term edge to refer to a phrase lattice edge and arc to refer to a parent-child dependency in the phrase dependency tree.] [sent-176, score-1.585]

76 offer the advantage that the best path in the lattice is preserved during pruning. [sent-177, score-0.434]
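
Forward-backward (max-marginal) pruning over an acyclic phrase lattice might look roughly like this; node numbering, edge ordering, and the relative threshold are assumptions.

    def prune_lattice(num_nodes, edges, start, finals, threshold):
        # edges: (u, v, score) triples with nodes numbered in topological order
        # and the list sorted by source node u.  The max-marginal of an edge is
        # the score of the best complete path through it; edges within
        # `threshold` of the best path score are kept, so the 1-best path survives.
        NEG = float('-inf')
        fwd = [NEG] * num_nodes
        bwd = [NEG] * num_nodes
        fwd[start] = 0.0
        for u, v, s in edges:                      # forward best-path scores
            if fwd[u] > NEG:
                fwd[v] = max(fwd[v], fwd[u] + s)
        for f in finals:
            bwd[f] = 0.0
        for u, v, s in reversed(edges):            # backward best-path scores
            if bwd[v] > NEG:
                bwd[u] = max(bwd[u], bwd[v] + s)
        best = max(fwd[f] for f in finals)
        return [(u, v, s) for (u, v, s) in edges
                if fwd[u] + s + bwd[v] >= best - threshold]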

77 Pass 2: Parent Ranking Given a pruned lattice, we then remove some candidate dependency arcs from consideration. [sent-181, score-0.382]

78 It is common in dependency parsing to use a coarse model to rank the top k parents for each word, and to only consider these during parsing (Martins et al. [sent-182, score-0.531]

79 Unlike string parsing, our phrase lattices impose several types of constraints on allowable arcs. [sent-184, score-0.375]

80 For example, each node in the phrase lattice is annotated with a coverage vector—a bit vector indicating which words in the source sentence have been translated—which implies a topological ordering of the nodes. [sent-185, score-0.681]

81 This algorithm also tells us whether each edge is reachable from each other edge, allowing us to immediately prune dependency arcs between edges that are unreachable from each other. [sent-187, score-0.509]
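
The reachability check between lattice edges can be pre-computed with a transitive closure over lattice nodes, in the spirit of the Floyd-Warshall step mentioned above; the representation of edges as node pairs is an assumption.

    def transitive_closure(num_nodes, node_edges):
        # node_edges: (u, v) pairs of lattice nodes; returns reach[u][v] = True
        # if v is reachable from u (reflexively), Floyd-Warshall style.
        reach = [[False] * num_nodes for _ in range(num_nodes)]
        for i in range(num_nodes):
            reach[i][i] = True
        for u, v in node_edges:
            reach[u][v] = True
        for k in range(num_nodes):
            for i in range(num_nodes):
                if reach[i][k]:
                    for j in range(num_nodes):
                        if reach[k][j]:
                            reach[i][j] = True
        return reach

    def edges_compatible(reach, e1, e2):
        # e1, e2: lattice edges as (start_node, end_node).  A dependency arc
        # between them is only considered if they can lie on one common path.
        return reach[e1[1]][e2[0]] or reach[e2[1]][e1[0]]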

82 In lattice parsing, however, most lattice edges will not be assigned any parent. [sent-190, score-0.79]

83 Certain lattice edges are much more likely to be contained within paths, so we allow some edges to have more candidate parent edges than others. [sent-191, score-0.649]

84 We introduce hyperparameters α, β, and µ to denote, respectively, the minimum, maximum, and average number of parent edges to be considered for each lattice edge (α ≤ µ ≤ β). [sent-192, score-0.56]

85 scores (using the QPDG features and their weights ψ) and choose the top µE of these arcs while ensuring that each edge has at least α and at most β potential parent edges. [sent-194, score-0.315]
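
A greedy sketch of the parent-ranking step with the α, β, and µ hyperparameters; the paper's exact selection procedure may differ, and the tie-breaking here is an assumption.

    def select_parent_candidates(arcs, num_edges, alpha, beta, mu):
        # arcs: (child_edge, parent_edge, coarse_score) triples that already
        # passed the reachability and lattice-constraint checks.  Keep about
        # mu * num_edges arcs overall while giving each child edge at least
        # alpha and at most beta candidate parents.
        by_child = {}
        for arc in sorted(arcs, key=lambda a: -a[2]):
            by_child.setdefault(arc[0], []).append(arc)
        kept = []
        for cands in by_child.values():            # guarantee the minimum alpha
            kept.extend(cands[:alpha])
        counts = {child: min(len(c), alpha) for child, c in by_child.items()}
        remaining = sorted((a for c in by_child.values() for a in c[alpha:]),
                           key=lambda a: -a[2])
        budget = max(0, int(mu * num_edges) - len(kept))
        for arc in remaining:                      # fill up to the global budget
            if budget == 0:
                break
            if counts[arc[0]] < beta:
                kept.append(arc)
                counts[arc[0]] += 1
                budget -= 1
        return kept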

86 We weight agenda items by the sum of their scores and the Floyd-Warshall best path scores both from the start node of the lattice to the beginning of the item and from the end of the item to any final node. [sent-200, score-0.541]

87 We first use MERT to train parameters for the coarse phrase-based model used to generate phrase lattices. [sent-212, score-0.357]

88 We initialize λ to the default Moses feature weights and for ψ we initialize the two target phrase dependency weights to 0. [sent-215, score-0.769]

89 We used this baseline Moses system to generate phrase lattices for our system, so our model includes all of the Moses features in addition to the QPDG features. [results table fragment omitted] [sent-235, score-0.429]

90 We note that computing our features requires parsing the target (English) side of the parallel text, but not the source side. [sent-255, score-0.321]

91 An additional cluster was created for all other words; this allowed us to use phrase dependency cluster features even for out-ofvocabulary words. [sent-262, score-0.612]

92 We used a max phrase length of 7 when extracting phrase dependencies to match the max phrase length used in phrase extraction. [sent-263, score-1.116]

93 Approximately 87M unique phrase dependencies were extracted from the ZH-EN data and 7M from the UR-EN data. [sent-264, score-0.354]

94 [Figure 2 residue: example outputs such as “us to boost peace efforts after palestinian elections”, “bush : us set to boost peace”, “us to step up peace”] Figure 2: (a) Moses translation output along with γ, φ, and a. [sent-300, score-1.034]

95 An English gloss is shown above the Chinese sentence and above the gloss is shown the dependency parse from the Stanford parser. [sent-301, score-0.338]

96 Fortunately, unsupervised dependency grammar induction has improved substantially in recent years due to a flurry of recent research. [sent-307, score-0.445]

97 From the second set of features, we see that the model learns to favor producing dependency trees that are mostly isomorphic to the source tree, by favoring root-root and parent-child configurations at the expense of most others. [sent-352, score-0.554]

98 Unsupervised induction of tree substitution grammars for dependency parsing. [sent-386, score-0.422]

99 Concavity and initialization for unsupervised dependency grammar induction. [sent-527, score-0.445]

100 A new string-to-dependency machine translation algorithm with a target dependency language model. [sent-699, score-0.486]


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('lattice', 0.359), ('dependency', 0.304), ('qpdg', 0.276), ('phrase', 0.254), ('gimpel', 0.18), ('moses', 0.161), ('smith', 0.145), ('configurations', 0.14), ('qg', 0.126), ('chinese', 0.123), ('lattices', 0.121), ('translation', 0.117), ('bush', 0.116), ('configuration', 0.107), ('coarse', 0.103), ('dependencies', 0.1), ('bleu', 0.099), ('phrases', 0.099), ('reordering', 0.092), ('eisner', 0.09), ('palestinian', 0.09), ('grammar', 0.088), ('mert', 0.087), ('galley', 0.087), ('peace', 0.083), ('mt', 0.08), ('arcs', 0.078), ('tree', 0.076), ('path', 0.075), ('ht', 0.074), ('parent', 0.074), ('tuning', 0.073), ('koehn', 0.072), ('edges', 0.072), ('agenda', 0.072), ('czdeax', 0.069), ('election', 0.069), ('lfb', 0.069), ('source', 0.068), ('parser', 0.066), ('och', 0.065), ('target', 0.065), ('gs', 0.062), ('parsing', 0.062), ('decoding', 0.061), ('fine', 0.058), ('zollmann', 0.056), ('edge', 0.055), ('features', 0.054), ('weights', 0.054), ('ueffing', 0.054), ('unsupervised', 0.053), ('pass', 0.047), ('efforts', 0.047), ('deneefe', 0.047), ('segmentations', 0.047), ('dmv', 0.047), ('syntax', 0.047), ('longest', 0.046), ('pruning', 0.046), ('elatt', 0.046), ('elections', 0.046), ('greunneer', 0.046), ('imce', 0.046), ('intersecting', 0.046), ('sixtus', 0.046), ('stre', 0.046), ('turboparser', 0.046), ('martins', 0.044), ('tromble', 0.044), ('grammars', 0.042), ('distortion', 0.042), ('trees', 0.042), ('segmentation', 0.041), ('boost', 0.04), ('derivation', 0.04), ('urdu', 0.04), ('commit', 0.04), ('minimum', 0.039), ('prepositional', 0.039), ('side', 0.038), ('parsers', 0.038), ('shen', 0.038), ('feature', 0.038), ('english', 0.038), ('treebank', 0.037), ('brown', 0.036), ('undirected', 0.036), ('plex', 0.036), ('auli', 0.036), ('items', 0.035), ('quirk', 0.035), ('translations', 0.035), ('parse', 0.034), ('heuristic', 0.034), ('parallel', 0.034), ('lexical', 0.033), ('army', 0.033), ('bergsma', 0.033), ('levy', 0.033)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.9999997 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation

Author: Kevin Gimpel ; Noah A. Smith

Abstract: We present a quasi-synchronous dependency grammar (Smith and Eisner, 2006) for machine translation in which the leaves of the tree are phrases rather than words as in previous work (Gimpel and Smith, 2009). This formulation allows us to combine structural components of phrase-based and syntax-based MT in a single model. We describe a method of extracting phrase dependencies from parallel text using a target-side dependency parser. For decoding, we describe a coarse-to-fine approach based on lattice dependency parsing of phrase lattices. We demonstrate performance improvements for Chinese-English and UrduEnglish translation over a phrase-based baseline. We also investigate the use of unsupervised dependency parsers, reporting encouraging preliminary results.

2 0.22144899 5 emnlp-2011-A Fast Re-scoring Strategy to Capture Long-Distance Dependencies

Author: Anoop Deoras ; Tomas Mikolov ; Kenneth Church

Abstract: A re-scoring strategy is proposed that makes it feasible to capture more long-distance dependencies in the natural language. Two pass strategies have become popular in a number of recognition tasks such as ASR (automatic speech recognition), MT (machine translation) and OCR (optical character recognition). The first pass typically applies a weak language model (n-grams) to a lattice and the second pass applies a stronger language model to N best lists. The stronger language model is intended to capture more longdistance dependencies. The proposed method uses RNN-LM (recurrent neural network language model), which is a long span LM, to rescore word lattices in the second pass. A hill climbing method (iterative decoding) is proposed to search over islands of confusability in the word lattice. An evaluation based on Broadcast News shows speedups of 20 over basic N best re-scoring, and word error rate reduction of 8% (relative) on a highly competitive setup.

3 0.21592066 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation

Author: Jun Xie ; Haitao Mi ; Qun Liu

Abstract: Dependency structure, as a first step towards semantics, is believed to be helpful to improve translation quality. However, previous works on dependency structure based models typically resort to insertion operations to complete translations, which make it difficult to specify ordering information in translation rules. In our model of this paper, we handle this problem by directly specifying the ordering information in head-dependents rules which represent the source side as head-dependents relations and the target side as strings. The head-dependents rules require only substitution operation, thus our model requires no heuristics or separate ordering models of the previous works to control the word order of translations. Large-scale experiments show that our model performs well on long distance reordering, and outperforms the state-of-the-art constituency-to-string model (+1.47 BLEU on average) and hierarchical phrase-based model (+0.46 BLEU on average) on two Chinese-English NIST test sets without resort to phrases or parse forest. For the first time, a source dependency structure based model catches up with and surpasses the state-of-the-art translation models.

4 0.20995505 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation

Author: Yang Gao ; Philipp Koehn ; Alexandra Birch

Abstract: Long-distance reordering remains one of the biggest challenges facing machine translation. We derive soft constraints from the source dependency parsing to directly address the reordering problem for the hierarchical phrase-based model. Our approach significantly improves Chinese–English machine translation on a large-scale task by 0.84 BLEU points on average. Moreover, when we switch the tuning function from BLEU to the LRscore which promotes reordering, we observe total improvements of 1.21 BLEU, 1.30 LRscore and 3.36 TER over the baseline. On average our approach improves reordering precision and recall by 6.9 and 0.3 absolute points, respectively, and is found to be especially effective for long-distance reordering.

5 0.18012381 136 emnlp-2011-Training a Parser for Machine Translation Reordering

Author: Jason Katz-Brown ; Slav Petrov ; Ryan McDonald ; Franz Och ; David Talbot ; Hiroshi Ichikawa ; Masakazu Seno ; Hideto Kazawa

Abstract: We propose a simple training regime that can improve the extrinsic performance of a parser, given only a corpus of sentences and a way to automatically evaluate the extrinsic quality of a candidate parse. We apply our method to train parsers that excel when used as part of a reordering component in a statistical machine translation system. We use a corpus of weakly-labeled reference reorderings to guide parser training. Our best parsers contribute significant improvements in subjective translation quality while their intrinsic attachment scores typically regress.

6 0.17179762 4 emnlp-2011-A Fast, Accurate, Non-Projective, Semantically-Enriched Parser

7 0.16472979 44 emnlp-2011-Domain Adaptation via Pseudo In-Domain Data Selection

8 0.16286625 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers

9 0.16229418 125 emnlp-2011-Statistical Machine Translation with Local Language Models

10 0.15885092 13 emnlp-2011-A Word Reordering Model for Improved Machine Translation

11 0.15211105 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing

12 0.15108024 50 emnlp-2011-Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation

13 0.14821976 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax

14 0.14731102 10 emnlp-2011-A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions

15 0.14715932 22 emnlp-2011-Better Evaluation Metrics Lead to Better Machine Translation

16 0.14509869 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives

17 0.14263982 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training

18 0.14058898 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming

19 0.13904485 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation

20 0.13896364 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, 0.41), (1, 0.299), (2, 0.09), (3, 0.042), (4, -0.022), (5, 0.01), (6, 0.075), (7, -0.076), (8, 0.033), (9, -0.021), (10, 0.037), (11, 0.116), (12, 0.091), (13, 0.024), (14, 0.047), (15, 0.081), (16, 0.054), (17, 0.004), (18, 0.127), (19, -0.002), (20, 0.069), (21, -0.023), (22, -0.077), (23, 0.005), (24, -0.081), (25, 0.068), (26, -0.052), (27, -0.052), (28, -0.044), (29, 0.028), (30, 0.044), (31, -0.077), (32, 0.097), (33, -0.014), (34, 0.117), (35, -0.077), (36, -0.018), (37, 0.007), (38, 0.107), (39, 0.013), (40, 0.041), (41, -0.007), (42, 0.138), (43, -0.006), (44, -0.085), (45, -0.184), (46, 0.058), (47, 0.077), (48, -0.048), (49, -0.08)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.96303469 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation

Author: Kevin Gimpel ; Noah A. Smith

Abstract: We present a quasi-synchronous dependency grammar (Smith and Eisner, 2006) for machine translation in which the leaves of the tree are phrases rather than words as in previous work (Gimpel and Smith, 2009). This formulation allows us to combine structural components of phrase-based and syntax-based MT in a single model. We describe a method of extracting phrase dependencies from parallel text using a target-side dependency parser. For decoding, we describe a coarse-to-fine approach based on lattice dependency parsing of phrase lattices. We demonstrate performance improvements for Chinese-English and UrduEnglish translation over a phrase-based baseline. We also investigate the use of unsupervised dependency parsers, reporting encouraging preliminary results.

2 0.73517299 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation

Author: Jun Xie ; Haitao Mi ; Qun Liu

Abstract: Dependency structure, as a first step towards semantics, is believed to be helpful to improve translation quality. However, previous works on dependency structure based models typically resort to insertion operations to complete translations, which make it difficult to specify ordering information in translation rules. In our model of this paper, we handle this problem by directly specifying the ordering information in head-dependents rules which represent the source side as head-dependents relations and the target side as strings. The head-dependents rules require only substitution operation, thus our model requires no heuristics or separate ordering models of the previous works to control the word order of translations. Large-scale experiments show that our model performs well on long distance reordering, and outperforms the state-of-the-art constituency-to-string model (+1.47 BLEU on average) and hierarchical phrase-based model (+0.46 BLEU on average) on two Chinese-English NIST test sets without resort to phrases or parse forest. For the first time, a source dependency structure based model catches up with and surpasses the state-of-the-art translation models.

3 0.67326474 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation

Author: Yang Gao ; Philipp Koehn ; Alexandra Birch

Abstract: Long-distance reordering remains one of the biggest challenges facing machine translation. We derive soft constraints from the source dependency parsing to directly address the reordering problem for the hierarchical phrase-based model. Our approach significantly improves Chinese–English machine translation on a large-scale task by 0.84 BLEU points on average. Moreover, when we switch the tuning function from BLEU to the LRscore which promotes reordering, we observe total improvements of 1.21 BLEU, 1.30 LRscore and 3.36 TER over the baseline. On average our approach improves reordering precision and recall by 6.9 and 0.3 absolute points, respectively, and is found to be especially effective for long-distance reordering.

4 0.63931292 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming

Author: Kristian Woodsend ; Mirella Lapata

Abstract: Text simplification aims to rewrite text into simpler versions, and thus make information accessible to a broader audience. Most previous work simplifies sentences using handcrafted rules aimed at splitting long sentences, or substitutes difficult words using a predefined dictionary. This paper presents a data-driven model based on quasi-synchronous grammar, a formalism that can naturally capture structural mismatches and complex rewrite operations. We describe how such a grammar can be induced from Wikipedia and propose an integer linear programming model for selecting the most appropriate simplification from the space of possible rewrites generated by the grammar. We show experimentally that our method creates simplifications that significantly reduce the reading difficulty of the input, while maintaining grammaticality and preserving its meaning.

5 0.62413406 5 emnlp-2011-A Fast Re-scoring Strategy to Capture Long-Distance Dependencies

Author: Anoop Deoras ; Tomas Mikolov ; Kenneth Church

Abstract: A re-scoring strategy is proposed that makes it feasible to capture more long-distance dependencies in the natural language. Two pass strategies have become popular in a number of recognition tasks such as ASR (automatic speech recognition), MT (machine translation) and OCR (optical character recognition). The first pass typically applies a weak language model (n-grams) to a lattice and the second pass applies a stronger language model to N best lists. The stronger language model is intended to capture more longdistance dependencies. The proposed method uses RNN-LM (recurrent neural network language model), which is a long span LM, to rescore word lattices in the second pass. A hill climbing method (iterative decoding) is proposed to search over islands of confusability in the word lattice. An evaluation based on Broadcast News shows speedups of 20 over basic N best re-scoring, and word error rate reduction of 8% (relative) on a highly competitive setup.

6 0.6017561 65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction

7 0.58903486 66 emnlp-2011-Hierarchical Phrase-based Translation Representations

8 0.55738544 103 emnlp-2011-Parser Evaluation over Local and Non-Local Deep Dependencies in a Large Corpus

9 0.54258227 4 emnlp-2011-A Fast, Accurate, Non-Projective, Semantically-Enriched Parser

10 0.53851688 100 emnlp-2011-Optimal Search for Minimum Error Rate Training

11 0.53072244 95 emnlp-2011-Multi-Source Transfer of Delexicalized Dependency Parsers

12 0.52937895 50 emnlp-2011-Evaluating Dependency Parsing: Robust and Heuristics-Free Cross-Annotation Evaluation

13 0.51198637 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation

14 0.50924337 102 emnlp-2011-Parse Correction with Specialized Models for Difficult Attachment Types

15 0.5049454 74 emnlp-2011-Inducing Sentence Structure from Parallel Corpora for Reordering

16 0.50241041 93 emnlp-2011-Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation

17 0.49830171 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing

18 0.49277046 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax

19 0.49059439 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives

20 0.48428947 44 emnlp-2011-Domain Adaptation via Pseudo In-Domain Data Selection


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(15, 0.011), (19, 0.125), (23, 0.12), (35, 0.013), (36, 0.031), (37, 0.056), (45, 0.051), (53, 0.04), (54, 0.036), (57, 0.013), (62, 0.033), (64, 0.067), (66, 0.04), (69, 0.04), (79, 0.053), (82, 0.03), (85, 0.02), (87, 0.015), (90, 0.021), (96, 0.074), (98, 0.041)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.88371021 68 emnlp-2011-Hypotheses Selection Criteria in a Reranking Framework for Spoken Language Understanding

Author: Marco Dinarelli ; Sophie Rosset

Abstract: Reranking models have been successfully applied to many tasks of Natural Language Processing. However, there are two aspects of this approach that need a deeper investigation: (i) Assessment of hypotheses generated for reranking at classification phase: baseline models generate a list of hypotheses and these are used for reranking without any assessment; (ii) Detection of cases where reranking models provide a worse result: the best hypothesis provided by the reranking model is assumed to be always the best result. In some cases the reranking model provides an incorrect hypothesis while the baseline best hypothesis is correct, especially when baseline models are accurate. In this paper we propose solutions for these two aspects: (i) a semantic inconsistency metric to select possibly more correct n-best hypotheses, from a large set generated by an SLU baseline model. The selected hypotheses are reranked applying a state-of-the-art model based on Partial Tree Kernels, which encode SLU hypotheses in Support Vector Machines with complex structured features; (ii) finally, we apply a decision strategy, based on confidence values, to select the final hypothesis between the first ranked hypothesis provided by the baseline SLU model and the first ranked hypothesis provided by the re-ranker. We show the effectiveness of these solutions presenting comparative results obtained reranking hypotheses generated by a very accurate Conditional Random Field model. We evaluate our approach on the French MEDIA corpus. The results show significant improvements with respect to current state-of-the-art and previous re-ranking models.

same-paper 2 0.85067689 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation

Author: Kevin Gimpel ; Noah A. Smith

Abstract: We present a quasi-synchronous dependency grammar (Smith and Eisner, 2006) for machine translation in which the leaves of the tree are phrases rather than words as in previous work (Gimpel and Smith, 2009). This formulation allows us to combine structural components of phrase-based and syntax-based MT in a single model. We describe a method of extracting phrase dependencies from parallel text using a target-side dependency parser. For decoding, we describe a coarse-to-fine approach based on lattice dependency parsing of phrase lattices. We demonstrate performance improvements for Chinese-English and UrduEnglish translation over a phrase-based baseline. We also investigate the use of unsupervised dependency parsers, reporting encouraging preliminary results.

3 0.78370231 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction

Author: Sebastian Riedel ; Andrew McCallum

Abstract: Extracting biomedical events from literature has attracted much recent attention. The bestperforming systems so far have been pipelines of simple subtask-specific local classifiers. A natural drawback of such approaches are cascading errors introduced in early stages of the pipeline. We present three joint models of increasing complexity designed to overcome this problem. The first model performs joint trigger and argument extraction, and lends itself to a simple, efficient and exact inference algorithm. The second model captures correlations between events, while the third model ensures consistency between arguments of the same event. Inference in these models is kept tractable through dual decomposition. The first two models outperform the previous best joint approaches and are very competitive with respect to the current state-of-theart. The third model yields the best results reported so far on the BioNLP 2009 shared task, the BioNLP 2011 Genia task and the BioNLP 2011Infectious Diseases task.

4 0.77009493 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation

Author: Yang Gao ; Philipp Koehn ; Alexandra Birch

Abstract: Long-distance reordering remains one of the biggest challenges facing machine translation. We derive soft constraints from the source dependency parsing to directly address the reordering problem for the hierarchical phrase-based model. Our approach significantly improves Chinese–English machine translation on a large-scale task by 0.84 BLEU points on average. Moreover, when we switch the tuning function from BLEU to the LRscore which promotes reordering, we observe total improvements of 1.21 BLEU, 1.30 LRscore and 3.36 TER over the baseline. On average our approach improves reordering precision and recall by 6.9 and 0.3 absolute points, respectively, and is found to be especially effective for long-distance reordering.

5 0.7681132 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax

Author: Jiajun Zhang ; Feifei Zhai ; Chengqing Zong

Abstract: Due to its explicit modeling of the grammaticality of the output via target-side syntax, the string-to-tree model has been shown to be one of the most successful syntax-based translation models. However, a major limitation of this model is that it does not utilize any useful syntactic information on the source side. In this paper, we analyze the difficulties of incorporating source syntax in a string-to-tree model. We then propose a new way to use the source syntax in a fuzzy manner, both in source syntactic annotation and in rule matching. We further explore three algorithms in rule matching: 0-1 matching, likelihood matching, and deep similarity matching. Our method not only guarantees grammatical output with an explicit target tree, but also enables the system to choose the proper translation rules via fuzzy use of the source syntax. Our extensive experiments have shown significant improvements over the state-of-the-art string-to-tree system.

6 0.76395512 136 emnlp-2011-Training a Parser for Machine Translation Reordering

7 0.76230407 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features

8 0.75200593 66 emnlp-2011-Hierarchical Phrase-based Translation Representations

9 0.74706984 134 emnlp-2011-Third-order Variational Reranking on Packed-Shared Dependency Forests

10 0.74473685 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming

11 0.7431919 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing

12 0.74128246 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives

13 0.73958927 13 emnlp-2011-A Word Reordering Model for Improved Machine Translation

14 0.73917377 79 emnlp-2011-Lateen EM: Unsupervised Training with Multiple Objectives, Applied to Dependency Grammar Induction

15 0.7385422 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation

16 0.73801708 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation

17 0.73652244 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning

18 0.73511243 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances

19 0.73482376 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing

20 0.73181325 35 emnlp-2011-Correcting Semantic Collocation Errors with L1-induced Paraphrases