acl acl2010 acl2010-169 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: David Chiang
Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a simple approach that uses both source and target syntax for significant improvements in translation accuracy.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. [sent-2, score-0.151]
2 These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. [sent-3, score-0.393]
3 But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. [sent-4, score-0.38]
4 We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a simple approach that uses both source and target syntax for significant improvements in translation accuracy. [sent-5, score-0.444]
5 1 Introduction Statistical translation models that use synchronous context-free grammars (SCFGs) or related formalisms to try to capture the recursive structure of language have been widely adopted over the last few years. [sent-6, score-0.287]
6 The simplest of these (Chiang, 2005) make no use of information from syntactic theories or syntactic annotations, whereas others have successfully incorporated syntactic information on the target side (Galley et al. [sent-7, score-0.391]
7 In this paper, we explore the reasons why tree-to-tree translation has been challenging, and how source syntax and target syntax might be used together. [sent-17, score-0.55]
8 Drawing on previous successful attempts to relax syntactic constraints during grammar extraction in various ways (Zhang et al. [sent-18, score-0.309]
9 , 2009; Zollmann and Venugopal, 2006), we compare several methods for extracting a synchronous grammar from tree-to-tree data. [sent-20, score-0.197]
10 One confounding factor in such a comparison is that some methods generate many new syntactic categories, making it more difficult to satisfy syntactic constraints at decoding time. [sent-21, score-0.205]
11 2 Grammar extraction A synchronous tree-substitution grammar (STSG) is a set of rules or elementary tree pairs (γ, α), where: • γ is a tree whose interior labels are source-language nonterminal symbols and whose frontier labels are source-language nonterminal symbols or terminal symbols (words). [sent-25, score-1.156]
12 The nonterminal-labeled frontier nodes are called substitution nodes, conventionally marked with an arrow (↓). [sent-26, score-0.332]
13 • α is a tree of the same form except with target-language instead of source-language symbols. [sent-27, score-0.107]
14 Figure 1: Rule (γ2, α2) is substituted into rule (γ1, α1) to yield (γ3, α3). [sent-60, score-0.14]
15 • The substitution nodes of γ are aligned bijectively with those of α. [sent-62, score-0.305]
16 • The terminal-labeled frontier nodes of γ are aligned (many-to-many) with those of α. [sent-63, score-0.175]
17 In the substitution operation, an aligned pair of substitution nodes is rewritten with an elementary tree pair. [sent-64, score-0.801]
18 The labels of the substitution nodes must match the root labels of the elementary trees with which they are rewritten (but we will relax this constraint below). [sent-65, score-0.801]
19 See Figure 1 for examples of elementary tree pairs and substitution. [sent-66, score-0.28]
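To make the STSG definition above concrete, here is a minimal Python sketch of elementary tree pairs and the substitution operation, including the matching constraint; all names (Tree, RulePair, substitute) are illustrative assumptions, not the paper's implementation.

from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Tree:
    label: str
    children: List["Tree"] = field(default_factory=list)
    is_subst_site: bool = False  # nonterminal frontier node, marked with an arrow in the paper

@dataclass
class RulePair:
    """An elementary tree pair (gamma, alpha); links records the bijection
    between substitution nodes as (index in gamma, index in alpha)."""
    gamma: Tree
    alpha: Tree
    links: List[Tuple[int, int]]

def subst_sites(t: Tree) -> List[Tree]:
    """Collect substitution nodes in left-to-right order."""
    if t.is_subst_site:
        return [t]
    sites: List[Tree] = []
    for c in t.children:
        sites.extend(subst_sites(c))
    return sites

def substitute(parent: RulePair, link: Tuple[int, int], child: RulePair) -> None:
    """Rewrite one aligned pair of substitution sites with an elementary tree
    pair, enforcing the matching constraint (relaxed later in the paper)."""
    i, j = link
    g_site = subst_sites(parent.gamma)[i]
    a_site = subst_sites(parent.alpha)[j]
    assert g_site.label == child.gamma.label, "source root must match site label"
    assert a_site.label == child.alpha.label, "target root must match site label"
    g_site.children, g_site.is_subst_site = child.gamma.children, False
    a_site.children, a_site.is_subst_site = child.alpha.children, False

Substituting (γ2, α2) into (γ1, α1) as in Figure 1 then corresponds to a single substitute call on one aligned pair of sites.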
20 2.1 Exact tree-to-tree extraction The use of STSGs for translation was proposed in the Data-Oriented Parsing literature (Poutsma, 2000; Hearne and Way, 2003) and by Eisner (2003). [sent-68, score-0.292]
21 As tree constraints are added, the number of phrase pairs drops. [sent-70, score-0.223]
22 Percentages are relative to the maximum number of nested phrase pairs. [sent-72, score-0.138]
23 , 2008), we obtain the following grammar extraction method, which we call exact tree-to-tree extraction. [sent-76, score-0.345]
24 Given a pair of source- and target-language parse trees with a word alignment between their leaves, identify all the phrase pairs (f̄, ē), i.e. [sent-77, score-0.171]
25 Then the extracted grammar is the smallest STSG G satisfying: • If (γ, α) is a pair of subtrees of a training example and the frontiers of γ and α form a phrase pair, then (γ, α) is a rule in G. [sent-109, score-0.315]
26 • If (γ2, α2) ∈ G, (γ3, α3) ∈ G, and (γ1, α1) is an elementary tree pair such that substituting (γ2, α2) into (γ1, α1) results in (γ3, α3), then (γ1, α1) is a rule in G. [sent-110, score-0.376]
27 For example, consider the training example in Figure 2, from which the elementary tree pairs shown in Figure 1 can be extracted. [sent-111, score-0.28]
28 The elementary tree pairs (γ2, α2) and (γ3, α3) are rules in G because their yields are phrase pairs, and (γ1, α1) results from subtracting (γ2, α2) from (γ3, α3). [sent-112, score-0.455]
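The first condition of the extraction method can be sketched as follows, under a stated simplification: we enumerate only the fully lexicalized rules, i.e. subtree pairs whose frontiers form a phrase pair, and omit the subtraction step that introduces substitution sites. The tuple tree encoding and helper names are assumptions, not the paper's code.

def subtrees_with_spans(tree, start=0):
    """Return (subtree, span) pairs for all subtrees, plus the next frontier
    position; trees are (label, *children) tuples, leaves are strings."""
    if isinstance(tree, str):
        return [], start + 1
    subs, pos = [], start
    for child in tree[1:]:
        child_subs, pos = subtrees_with_spans(child, pos)
        subs.extend(child_subs)
    subs.append((tree, (start, pos)))
    return subs, pos

def consistent(f_span, e_span, align):
    """The standard phrase-pair condition: at least one alignment link inside
    the pair of spans, and no link with exactly one endpoint inside."""
    inside = [(i, j) for (i, j) in align
              if f_span[0] <= i < f_span[1] and e_span[0] <= j < e_span[1]]
    crossing = [(i, j) for (i, j) in align
                if (f_span[0] <= i < f_span[1]) != (e_span[0] <= j < e_span[1])]
    return bool(inside) and not crossing

def exact_lexical_rules(f_tree, e_tree, align):
    """Emit (gamma, alpha) for every pair of subtrees whose frontiers form a
    phrase pair."""
    f_subs, _ = subtrees_with_spans(f_tree)
    e_subs, _ = subtrees_with_spans(e_tree)
    return [(g, a) for (g, fs) in f_subs for (a, es) in e_subs
            if consistent(fs, es, align)]

# e.g. two two-word trees with a monotone alignment yield NP/NP, VP/VP, S/S:
f = ("S", ("NP", "ta"), ("VP", "pao"))
e = ("S", ("NP", "he"), ("VP", "runs"))
assert len(exact_lexical_rules(f, e, {(0, 0), (1, 1)})) == 3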
29 2.2 Fuzzy tree-to-tree extraction Exact tree-to-tree translation requires that translation rules deal with syntactic constituents on both the source and target side, which reduces the number of eligible phrases. [sent-114, score-0.798]
30 The first line shows the number of phrase-pair occurrences that are extracted in the absence of syntactic constraints, and the second line shows the maximum number of nested phrase-pair occurrences, which is the most that exact syntax-based extraction can achieve. [sent-116, score-0.373]
31 Whereas tree-to-string extraction and string-to-tree extraction permit 70–80% of the maximum possible number of phrase pairs, tree-to-tree extraction only permits 60–70%. [sent-117, score-0.495]
32 We can see that moving from human annotations to automatic annotations decreases not only the absolute number of phrase pairs, but also the percentage of phrases that pass the syntactic filters. [sent-119, score-0.19]
33 (2006), in a more systematic study, find that, of sentences where the tree-to-tree constraint blocks rule extraction, the majority are due to parser errors. [sent-121, score-0.19]
34 (2009) extract rules from pairs of forests. (Footnote 1: The first 2000 sentences from the GALE Phase 4 Chinese Parallel Word Alignment and Tagging Part 1 (LDC2009E83) and the Chinese News Translation Text Part 1 (LDC2005T06), respectively.) [sent-123, score-0.147]
35 Since a packed forest is much more likely to include the correct tree, it is less likely that parser errors will cause good rules to be filtered out. [sent-126, score-0.16]
36 However, even on human-annotated data, tree-to-tree extraction misses many rules, and many such rules would seem to be useful. [sent-127, score-0.244]
37 therefore argue that in order to extract as many rules as possible, a more powerful formalism than synchronous CFG/TSG is required: for example, generalized multitext grammar (Melamed et al. [sent-138, score-0.353]
38 But the problem illustrated in Figure 2 does not reflect a very deep fact about syntax or cross-lingual divergences, but rather choices in annotation style that interact badly with the exact tree-to-tree extraction heuristic. [sent-140, score-0.348]
39 Thus even in the gold-standard parse trees, phrase structure can be underspecified (like the flat IP above) or uncertain (like the PP attachment above). [sent-148, score-0.163]
40 Synchronous tree-sequence–substitution grammar (STSSG) allows either side of a rule to comprise a sequence of trees instead of a single tree (Zhang et al. [sent-150, score-0.527]
41 In the substitution operation, a sequence of sister substitution nodes is rewritten with a tree sequence of equal length (see Figure 3a). [sent-152, score-0.629]
42 Any STSSG can be converted into an equivalent STSG via the creation of virtual nodes (see Figure 3b): for every elementary tree sequence with roots X1, . . . , Xn, create a virtual node with the complex label X1 ∗ · · · ∗ Xn immediately dominating the old roots, and replace every sequence of substitution sites X1, . . . , Xn with a single substitution site X1 ∗ · · · ∗ Xn. [sent-154, score-0.298]
43 Figure 3: (a) Example tree-sequence substitution grammar and (b) its equivalent SAMT-style tree-substitution grammar. [sent-205, score-0.303]
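A small sketch of the virtual-node conversion just described, assuming trees encoded as (label, *children) tuples; the helper names are hypothetical.

def join_sequence(trees):
    """Group a tree sequence rooted in X1, ..., Xn under a virtual node
    with the complex label X1*...*Xn."""
    return ("*".join(t[0] for t in trees),) + tuple(trees)

def collapse_site_sequence(site_labels):
    """Replace a sequence of sister substitution sites X1, ..., Xn by a
    single site with the product label."""
    return "*".join(site_labels)

# e.g. the two-tree sequence (NP, VP) becomes a single tree rooted in NP*VP:
assert join_sequence([("NP", "he"), ("VP", ("VBZ", "runs"))])[0] == "NP*VP"
assert collapse_site_sequence(["NP", "VP"]) == "NP*VP"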
46 In addition, SAMT drops the requirement that the Xi are sisters, and uses categories X / Y (an X missing a Y on the right) and Y\ X (an X missing a Y on the left) in the style of categorial grammar (Bar-Hillel, 1953). [sent-211, score-0.158]
47 Both STSSG and SAMT are examples of what we might call fuzzy tree-to-tree extraction. [sent-213, score-0.288]
48 Moreover, we allow the product categories X1 ∗ · · · ∗ Xn to be of any length n, and we allow the slash categories to take any number of arguments on either side. [sent-215, score-0.11]
49 Thus every phrase can be assigned a (possibly very complex) syntactic category, so that fuzzy tree-to-tree extraction does not lose any rules relative to string-to-string extraction. [sent-216, score-0.669]
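The category-assignment step might be sketched as follows, assuming a constituents map from spans (with lo < hi) to labels; for brevity the sketch handles product categories and single-argument slash categories only, and its search order is one plausible choice rather than the paper's.

def categorize(span, constituents):
    """Assign a (possibly complex) category to an arbitrary span."""
    lo, hi = span
    if (lo, hi) in constituents:                      # plain constituent: X
        return constituents[(lo, hi)]
    parts = decompose(lo, hi, constituents)           # product category X1*...*Xn
    if parts:
        return "*".join(parts)
    for (l, h), x in constituents.items():            # slash categories
        if l == lo and h > hi and (hi, h) in constituents:
            return x + "/" + constituents[(hi, h)]    # X/Y: an X missing a Y on the right
        if h == hi and l < lo and (l, lo) in constituents:
            return constituents[(l, lo)] + "\\" + x   # Y\X: an X missing a Y on the left
    return None

def decompose(lo, hi, constituents):
    """Split [lo, hi) into a concatenation of constituents, or return None."""
    if lo == hi:
        return []
    for (l, h), x in constituents.items():
        if l == lo and h <= hi:
            rest = decompose(h, hi, constituents)
            if rest is not None:
                return [x] + rest
    return None

# e.g.:
assert categorize((0, 2), {(0, 1): "NP", (1, 2): "VP"}) == "NP*VP"
assert categorize((0, 2), {(0, 3): "VP", (2, 3): "NP"}) == "VP/NP"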
50 On the other hand, if several rules are extracted that differ only in their nonterminal labels, only the most-frequent rule is kept, and its count is the total count of all the rules. [sent-217, score-0.301]
51 This means that there is a one-to-one correspondence between the rules extracted by fuzzy tree-to-tree extraction and hierarchical string-to-string extraction. [sent-218, score-0.587]
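A sketch of this collapsing step, using a hypothetical skeleton() that erases nonterminal labels to recover the underlying string-to-string rule:

import re
from collections import Counter, defaultdict

def skeleton(rule):
    """Erase nonterminal labels (here written in brackets) from both sides,
    leaving the string-to-string shape of the rule."""
    src, tgt = rule
    wipe = lambda s: re.sub(r"\[[^]]+\]", "[X]", s)
    return (wipe(src), wipe(tgt))

def collapse(rule_counts):
    """Among rules differing only in nonterminal labels, keep the most
    frequent labeling and give it the total count of all variants."""
    by_skeleton = defaultdict(Counter)
    for rule, n in rule_counts.items():
        by_skeleton[skeleton(rule)][rule] += n
    kept = {}
    for variants in by_skeleton.values():
        best, _ = variants.most_common(1)[0]
        kept[best] = sum(variants.values())
    return kept

# e.g. two labelings of the same string-to-string rule collapse to one entry:
counts = {("[NP] de [NN]", "[NN] of [NP]"): 5,
          ("[DNP] de [NN]", "[NN] of [NP-C]"): 2}
assert collapse(counts) == {("[NP] de [NN]", "[NN] of [NP]"): 7}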
52 2.3 Nesting phrases Fuzzy tree-to-tree extraction (like string-to-string extraction) generates many times more rules than exact tree-to-tree extraction does. [sent-220, score-0.539]
53 In Figure 2, we observed that the flat structure of the Chinese IP prevented exact tree-to-tree extraction from extracting a rule containing just part of the IP; examples (3)–(5) give such partial spans (e.g., [PP zaì . . . měiyuán]). [sent-221, score-0.429]
54 Fuzzy tree-to-tree extraction allows any of these to be the source side of a rule. [sent-233, score-0.376]
55 We might think of it as effectively restructuring the trees by inserting nodes with complex labels. [sent-234, score-0.117]
56 However, it is not possible to represent this restructuring with a single tree (see Figure 4). [sent-235, score-0.107]
57 In other words, exact tree-to-tree extraction commits to a single structural analysis but fuzzy tree-to-tree extraction pursues many restructured analyses at once. [sent-238, score-0.671]
58 Iterate through all the phrase pairs (f̄, ē) in the following order: 1. [sent-241, score-0.116]
59 For each phrase pair, accept it if it does not cross any previously accepted phrase pair; otherwise, reject it. [sent-246, score-0.144]
60 Because this heuristic produces a set of nesting phrases, we can represent them all in a single restructured tree. [sent-247, score-0.107]
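A sketch of this greedy filter; the exact ordering criteria are elided in this extraction, so the sort key is left to the caller as an assumption:

def crosses(a, b):
    """Two spans cross iff they overlap without nesting."""
    (a1, a2), (b1, b2) = a, b
    return a1 < b1 < a2 < b2 or b1 < a1 < b2 < a2

def pair_crosses(p, q):
    """Phrase pairs ((f1, f2), (e1, e2)) conflict if either side crosses."""
    return crosses(p[0], q[0]) or crosses(p[1], q[1])

def nesting_phrases(phrase_pairs, key):
    """Visit phrase pairs in priority order; accept each unless it crosses a
    previously accepted pair. The result nests, so it is representable as a
    single restructured tree."""
    accepted = []
    for p in sorted(phrase_pairs, key=key):
        if not any(pair_crosses(p, q) for q in accepted):
            accepted.append(p)
    return accepted

# e.g. (0,2) and (1,3) cross, so only one survives alongside the nesting (0,3):
pairs = [((0, 2), (0, 2)), ((1, 3), (1, 3)), ((0, 3), (0, 3))]
assert nesting_phrases(pairs, key=lambda p: p[0][1] - p[0][0]) \
       == [((0, 2), (0, 2)), ((0, 3), (0, 3))]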
61 3 Decoding In decoding, the rules extracted during training must be reassembled to form a derivation whose source side matches the input sentence. [sent-249, score-0.382]
62 In the exact tree-to-tree approach, whenever substitution is performed, the root labels of the substituted trees must match the labels of the substitution nodes—call this the matching constraint. [sent-250, score-0.853]
63 Because this constraint must be satisfied on both the source and target side, it can become difficult to generalize well from training examples to new input sentences. [sent-251, score-0.237]
64 Still, only derivations that satisfy the matching constraint are included in the summation. [sent-254, score-0.16]
65 But in some cases we may want to soften the matching constraint itself. [sent-255, score-0.163]
66 Some syntactic categories are similar enough to be considered compatible: for example, if a rule rooted in VBD (past-tense verb) could substitute into a site labeled VBZ (present-tense verb), it might still generate correct output. [sent-256, score-0.412]
67 This is all the more true with the addition of SAMT-style categories: for example, if a rule rooted in ADVP ∗ VP could substitute into a site labeled VP, it would very likely generate correct output. [sent-257, score-0.292]
68 Figure 4: Fuzzy tree-to-tree extraction effectively restructures the Chinese tree from Figure 2 in two ways but does not commit to either one. [sent-318, score-0.248]
69 • matchf counts the number of substitutions where the label of the source side of the substitution site matches the root label of the source side of the rule, and ¬matchf counts those where the labels do not match. [sent-319, score-1.23]
70 • substf X→Y counts the number of substitutions where the label of the source side of the substitution site is X and the root label of the source side of the rule is Y; matche and subste X→Y are defined analogously on the target side. [sent-320, score-1.208]
71 • rootX,X′ counts the number of rules whose root label on the source side is X and whose root label on the target side is X′. [sent-322, score-0.844]
72 For example, in the derivation of Figure 1, the following features would fire: matchf = 1, substf NP→NP = 1, matche = 1, subste NP→NP = 1, rootNP,NP = 1. The decoding algorithm then operates as in hierarchical phrase-based translation. [sent-323, score-0.265]
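A sketch of how these features might be accumulated per substitution during decoding; the feature names follow the paper, while the Counter-based representation is an assumption:

from collections import Counter

def substitution_features(site_f, root_f, site_e, root_e):
    """Features fired by one substitution: binary match indicators plus the
    finer-grained subst features, on both source (f) and target (e) sides."""
    feats = Counter()
    feats["matchf" if site_f == root_f else "¬matchf"] += 1
    feats["matche" if site_e == root_e else "¬matche"] += 1
    feats["substf:%s→%s" % (site_f, root_f)] += 1
    feats["subste:%s→%s" % (site_e, root_e)] += 1
    return feats

def root_features(root_f, root_e):
    """Fired once per rule: the pair of source and target root labels."""
    return Counter({"root:%s,%s" % (root_f, root_e): 1})

# e.g. substituting an NP-rooted rule (on both sides) into NP sites fires
# matchf, matche, substf:NP→NP and subste:NP→NP once each, as in Figure 1:
feats = substitution_features("NP", "NP", "NP", "NP") + root_features("NP", "NP")
assert feats["matchf"] == feats["subste:NP→NP"] == feats["root:NP,NP"] == 1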
73 The decoder has to store in each hypothesis the source and target root labels of the partial derivation, but these labels are used for calculating feature vectors only and not for checking well-formedness of derivations. [sent-324, score-0.499]
74 We parsed the source sides of both parallel texts using the Berkeley parser (Petrov et al. [sent-332, score-0.161]
75 Rules with nonterminals were extracted from a subset of the data (labeled “Core” in Table 2), and rules without nonterminals were extracted from the full parallel text. [sent-335, score-0.291]
76 For exact tree-to-tree extraction, we used simpler settings: no limit on initial phrase size or unaligned words, and a maximum of 7 frontier nodes on the source side. [sent-337, score-0.514]
77 All systems used the glue rule (Chiang, 2005), which allows the decoder, working bottom-up, to stop building hierarchical structure and instead concatenate partial translations without any reordering. [sent-338, score-0.303]
78 , 2009), but with bigram features (source and target word) instead of trigram features (source and target word and neighboring source word). [sent-345, score-0.261]
79 This limit controls how deeply nested the tree structures built by the decoder are, and we want to see whether adding syntactic information leads to more complex structures. [sent-353, score-0.349]
80 But with fuzzy tree-to-tree extraction, we obtained an improvement of +0. [sent-357, score-0.288]
81 Applying the heuristic for nesting phrases reduced the grammar sizes dramatically (by a factor of 2. [sent-361, score-0.263]
82 2 for Arabic) but, interestingly, had almost no effect on translation quality: a slight decrease in BLEU on the Arabic-English development set and no significant difference on the other sets. [sent-363, score-0.151]
83 This suggests that the strength of fuzzy tree-to-tree extraction lies in its ability to break up flat structures and to reconcile the source and target trees with each other, rather than in multiple restructurings of the training trees. [sent-364, score-0.718]
84 3 Rule usage We then took a closer look at the behavior of the string-to-string and fuzzy tree-to-tree grammars (without the nesting heuristic). [sent-366, score-0.437]
85 Because the rules of these grammars are in one-to-one correspondence, we can analyze the string-to-string system’s derivations as though they had syntactic categories. [sent-367, score-0.268]
86 First, Table 4 shows that the system using the tree-to-tree grammar used the glue rule much less and performed more matching substitutions. [sent-368, score-0.403]
87 Several changes appear to have to do with definiteness of NPs: on the English side, adding the syntax features encourages matching substitutions of type DT \ NP-C (anarthrous NP), but discourages DT \ NP-C and NN from substituting into NP-C and vice versa. [sent-372, score-0.248]
88 For example, a translation with the rewriting NP-C → DT \ NP-C begins with “24th meeting of the Standing Committee. [sent-373, score-0.151]
89 ,” but the system using the fuzzy tree-to-tree grammar changes this to “The 24th meeting of the Standing Committee. [sent-376, score-0.391]
90 ” The root features had a less noticeable effect on rule choice. (Table 3: On both the Chinese-English and Arabic-English translation tasks, fuzzy tree-to-tree extraction outperforms exact tree-to-tree extraction and string-to-string extraction.) [sent-380, score-0.925]
91 One interesting change was that the frequency of rules with Chinese root VP / IP and English root VP / S-C increased from 0. [sent-383, score-0.449]
92 Indeed, we have found that the model learns on its own to choose syntactically richer and more well-formed structures, demonstrating that source- and target-side syntax can be used together profitably as long as they are not allowed to overconstrain the translation model. [sent-387, score-0.257]
93 Table 4: Moving from string-to-string (s-to-s) extraction to fuzzy tree-to-tree (t-to-t) extraction decreases glue rule usage and increases the frequency of matching substitutions. [sent-393, score-0.87]
94 Table 5: Comparison of frequency of source-side rewrites in Chinese-English translation between string-to-string (s-to-s) and fuzzy tree-to-tree (t-to-t) grammars. [sent-418, score-0.531]
95 The label “entity” stands for handwritten rules for named entities and numbers. [sent-420, score-0.201]
96 Table 6: Comparison of frequency of target-side rewrites in Chinese-English translation between string-to-string (s-to-s) and fuzzy tree-to-tree (t-to-t) grammars. [sent-451, score-0.531]
97 The label “entity” stands for handwritten rules for named entities and numbers. [sent-453, score-0.201]
98 Improving syntax driven translation models by re-structuring divergent and non-isomorphic parse tree structures. [sent-456, score-0.364]
99 Scalable inference and training of context-rich syntactic translation models. [sent-513, score-0.216]
100 Syntax-driven learning of sub-sentential translation equivalents and translation rules from parsed parallel corpora. [sent-528, score-0.453]
wordName wordTfidf (topN-words)
[('fuzzy', 0.288), ('substitution', 0.2), ('nch', 0.16), ('np', 0.159), ('translation', 0.151), ('vp', 0.15), ('extraction', 0.141), ('rule', 0.14), ('sh', 0.135), ('elementary', 0.129), ('side', 0.122), ('source', 0.113), ('chiang', 0.111), ('glue', 0.108), ('tree', 0.107), ('nesting', 0.107), ('stssg', 0.107), ('za', 0.107), ('iyu', 0.107), ('syntax', 0.106), ('nn', 0.104), ('chinese', 0.103), ('grammar', 0.103), ('rules', 0.103), ('root', 0.103), ('site', 0.101), ('exact', 0.101), ('loose', 0.099), ('ip', 0.096), ('synchronous', 0.094), ('pp', 0.093), ('rewrites', 0.092), ('venugopal', 0.092), ('arabicenglish', 0.091), ('matchf', 0.091), ('samt', 0.091), ('wellington', 0.091), ('iw', 0.09), ('substitutions', 0.09), ('dt', 0.089), ('zollmann', 0.087), ('nnp', 0.077), ('decoding', 0.075), ('pu', 0.074), ('target', 0.074), ('liu', 0.073), ('phrase', 0.072), ('galley', 0.072), ('labels', 0.071), ('frontier', 0.07), ('nonterminals', 0.07), ('decoder', 0.067), ('nested', 0.066), ('syntactic', 0.065), ('stsg', 0.065), ('nodes', 0.062), ('soften', 0.061), ('zh', 0.061), ('lavie', 0.06), ('rewritten', 0.06), ('derivations', 0.058), ('nonterminal', 0.058), ('packed', 0.057), ('trees', 0.055), ('categories', 0.055), ('hierarchical', 0.055), ('hearne', 0.053), ('multitext', 0.053), ('matche', 0.053), ('phrases', 0.053), ('matching', 0.052), ('unaligned', 0.052), ('label', 0.052), ('kevin', 0.051), ('substitute', 0.051), ('constraint', 0.05), ('oy', 0.049), ('vamshi', 0.049), ('gale', 0.048), ('parallel', 0.048), ('counterpart', 0.047), ('tie', 0.047), ('flat', 0.047), ('melamed', 0.046), ('ashish', 0.046), ('fossum', 0.046), ('handwritten', 0.046), ('derivation', 0.044), ('knight', 0.044), ('attachment', 0.044), ('phrasebased', 0.044), ('limit', 0.044), ('pairs', 0.044), ('ambati', 0.043), ('aligned', 0.043), ('grammars', 0.042), ('talbot', 0.041), ('standing', 0.041), ('pauls', 0.041), ('newswire', 0.041)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 169 acl-2010-Learning to Translate with Source and Target Syntax
Author: David Chiang
Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a sim- ple approach that uses both source and target syntax for significant improvements in translation accuracy.
2 0.25580779 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
Author: Elif Yamangil ; Stuart M. Shieber
Abstract: We describe our experiments with training algorithms for tree-to-tree synchronous tree-substitution grammar (STSG) for monolingual translation tasks such as sentence compression and paraphrasing. These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments, yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism. We formalize nonparametric Bayesian STSG with epsilon alignment in full generality, and provide a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression. We achieve improvements against a number of baselines, including expectation maximization and variational Bayes training, illustrating the merits of nonparametric inference over the space of grammars as opposed to sparse parametric inference with a fixed grammar.
3 0.25351197 75 acl-2010-Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar
Author: Yoshihide Kato ; Shigeki Matsubara
Abstract: This paper proposes a method of correcting annotation errors in a treebank. By using a synchronous grammar, the method transforms parse trees containing annotation errors into the ones whose errors are corrected. The synchronous grammar is automatically induced from the treebank. We report an experimental result of applying our method to the Penn Treebank. The result demonstrates that our method corrects syntactic annotation errors with high precision.
4 0.2204856 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
Author: Hailong Cao ; Eiichiro Sumita
Abstract: Source language parse trees offer very useful but imperfect reordering constraints for statistical machine translation. A lot of effort has been made for soft applications of syntactic constraints. We alternatively propose the selective use of syntactic constraints. A classifier is built automatically to decide whether a node in the parse trees should be used as a reordering constraint or not. Using this information yields a 0.8 BLEU point improvement over a full constraint-based system.
5 0.22009732 69 acl-2010-Constituency to Dependency Translation with Forests
Author: Haitao Mi ; Qun Liu
Abstract: Tree-to-string systems (and their forestbased extensions) have gained steady popularity thanks to their simplicity and efficiency, but there is a major limitation: they are unable to guarantee the grammaticality of the output, which is explicitly modeled in string-to-tree systems via targetside syntax. We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model, which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality. Medium-scale experiments show an absolute and statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system even with fewer rules. This is also the first time that a treeto-tree model can surpass tree-to-string counterparts.
6 0.21571934 243 acl-2010-Tree-Based and Forest-Based Translation
7 0.19871958 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
8 0.19334736 133 acl-2010-Hierarchical Search for Word Alignment
9 0.18702522 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
10 0.18530589 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation
11 0.17420188 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
12 0.16639355 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
13 0.16202222 21 acl-2010-A Tree Transducer Model for Synchronous Tree-Adjoining Grammars
14 0.15832756 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels
15 0.15660624 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
16 0.13756198 200 acl-2010-Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing
17 0.13648492 54 acl-2010-Boosting-Based System Combination for Machine Translation
18 0.13538651 87 acl-2010-Discriminative Modeling of Extraction Sets for Machine Translation
19 0.13247541 56 acl-2010-Bridging SMT and TM with Translation Recommendation
20 0.13095714 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
topicId topicWeight
[(0, -0.339), (1, -0.276), (2, 0.073), (3, -0.0), (4, -0.149), (5, -0.038), (6, 0.176), (7, 0.053), (8, -0.136), (9, -0.079), (10, 0.132), (11, -0.099), (12, 0.06), (13, -0.118), (14, 0.001), (15, 0.005), (16, 0.061), (17, -0.054), (18, 0.002), (19, 0.078), (20, -0.077), (21, 0.046), (22, 0.105), (23, 0.117), (24, 0.005), (25, -0.077), (26, -0.021), (27, -0.089), (28, -0.036), (29, -0.049), (30, -0.034), (31, -0.034), (32, 0.014), (33, -0.021), (34, -0.005), (35, 0.022), (36, -0.015), (37, 0.025), (38, -0.024), (39, 0.04), (40, -0.001), (41, 0.041), (42, -0.017), (43, -0.009), (44, 0.074), (45, -0.057), (46, 0.008), (47, -0.039), (48, 0.0), (49, 0.063)]
simIndex simValue paperId paperTitle
same-paper 1 0.9665243 169 acl-2010-Learning to Translate with Source and Target Syntax
Author: David Chiang
Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a sim- ple approach that uses both source and target syntax for significant improvements in translation accuracy.
2 0.86066514 75 acl-2010-Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar
Author: Yoshihide Kato ; Shigeki Matsubara
Abstract: This paper proposes a method of correcting annotation errors in a treebank. By using a synchronous grammar, the method transforms parse trees containing annotation errors into the ones whose errors are corrected. The synchronous grammar is automatically induced from the treebank. We report an experimental result of applying our method to the Penn Treebank. The result demonstrates that our method corrects syntactic annotation errors with high precision.
3 0.83426988 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
Author: Elif Yamangil ; Stuart M. Shieber
Abstract: We describe our experiments with training algorithms for tree-to-tree synchronous tree-substitution grammar (STSG) for monolingual translation tasks such as sentence compression and paraphrasing. These translation tasks are characterized by the relative ability to commit to parallel parse trees and availability of word alignments, yet the unavailability of large-scale data, calling for a Bayesian tree-to-tree formalism. We formalize nonparametric Bayesian STSG with epsilon alignment in full generality, and provide a Gibbs sampling algorithm for posterior inference tailored to the task of extractive sentence compression. We achieve improvements against a number of baselines, including expectation maximization and variational Bayes training, illustrating the merits of nonparametric inference over the space of grammars as opposed to sparse parametric inference with a fixed grammar.
4 0.77542567 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
Author: Hailong Cao ; Eiichiro Sumita
Abstract: Source language parse trees offer very useful but imperfect reordering constraints for statistical machine translation. A lot of effort has been made for soft applications of syntactic constraints. We alternatively propose the selective use of syntactic constraints. A classifier is built automatically to decide whether a node in the parse trees should be used as a reordering constraint or not. Using this information yields a 0.8 BLEU point improvement over a full constraint-based system.
5 0.76673245 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
Author: Xianchao Wu ; Takuya Matsuzaki ; Jun'ichi Tsujii
Abstract: Tree-to-string translation rules are widely used in linguistically syntax-based statistical machine translation systems. In this paper, we propose to use deep syntactic information for obtaining fine-grained translation rules. A head-driven phrase structure grammar (HPSG) parser is used to obtain the deep syntactic information, which includes a fine-grained description of the syntactic property and a semantic representation of a sentence. We extract fine-grained rules from aligned HPSG tree/forest-string pairs and use them in our tree-to-string and string-to-tree systems. Extensive experiments on largescale bidirectional Japanese-English trans- lations testified the effectiveness of our approach.
6 0.7435419 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
7 0.72348779 69 acl-2010-Constituency to Dependency Translation with Forests
8 0.71931523 21 acl-2010-A Tree Transducer Model for Synchronous Tree-Adjoining Grammars
9 0.64673012 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation
10 0.63467121 243 acl-2010-Tree-Based and Forest-Based Translation
11 0.59225929 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
12 0.59059078 84 acl-2010-Detecting Errors in Automatically-Parsed Dependency Relations
13 0.56707609 67 acl-2010-Computing Weakest Readings
14 0.5554294 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
15 0.5480991 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
16 0.53612673 110 acl-2010-Exploring Syntactic Structural Features for Sub-Tree Alignment Using Bilingual Tree Kernels
17 0.50784701 95 acl-2010-Efficient Inference through Cascades of Weighted Tree Transducers
18 0.49934793 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
19 0.49060392 201 acl-2010-Pseudo-Word for Phrase-Based Machine Translation
20 0.47696391 133 acl-2010-Hierarchical Search for Word Alignment
topicId topicWeight
[(1, 0.011), (4, 0.018), (14, 0.014), (16, 0.019), (18, 0.024), (25, 0.133), (26, 0.038), (33, 0.022), (39, 0.01), (40, 0.013), (42, 0.02), (44, 0.012), (59, 0.138), (73, 0.037), (76, 0.015), (78, 0.066), (83, 0.089), (84, 0.02), (91, 0.073), (98, 0.148)]
simIndex simValue paperId paperTitle
same-paper 1 0.9507314 169 acl-2010-Learning to Translate with Source and Target Syntax
Author: David Chiang
Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a sim- ple approach that uses both source and target syntax for significant improvements in translation accuracy.
2 0.9169004 53 acl-2010-Blocked Inference in Bayesian Tree Substitution Grammars
Author: Trevor Cohn ; Phil Blunsom
Abstract: Learning a tree substitution grammar is very challenging due to derivational ambiguity. Our recent approach used a Bayesian non-parametric model to induce good derivations from treebanked input (Cohn et al., 2009), biasing towards small grammars composed of small generalisable productions. In this paper we present a novel training method for the model using a blocked Metropolis-Hastings sampler in place of the previous method’s local Gibbs sampler. The blocked sampler makes considerably larger moves than the local sampler and consequently con- verges in less time. A core component of the algorithm is a grammar transformation which represents an infinite tree substitution grammar in a finite context free grammar. This enables efficient blocked inference for training and also improves the parsing algorithm. Both algorithms are shown to improve parsing accuracy.
3 0.91558748 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar
Author: Timothy A. D. Fowler ; Gerald Penn
Abstract: The definition of combinatory categorial grammar (CCG) in the literature varies quite a bit from author to author. However, the differences between the definitions are important in terms of the language classes of each CCG. We prove that a wide range of CCGs are strongly context-free, including the CCG of CCGbank and of the parser of Clark and Curran (2007). In light of these new results, we train the PCFG parser of Petrov and Klein (2007) on CCGbank and achieve state of the art results in supertagging accuracy, PARSEVAL measures and dependency accuracy.
4 0.91253698 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection
Author: Linlin Li ; Benjamin Roth ; Caroline Sporleder
Abstract: This paper presents a probabilistic model for sense disambiguation which chooses the best sense based on the conditional probability of sense paraphrases given a context. We use a topic model to decompose this conditional probability into two conditional probabilities with latent variables. We propose three different instantiations of the model for solving sense disambiguation problems with different degrees of resource availability. The proposed models are tested on three different tasks: coarse-grained word sense disambiguation, fine-grained word sense disambiguation, and detection of literal vs. nonliteral usages of potentially idiomatic expressions. In all three cases, we outper- form state-of-the-art systems either quantitatively or statistically significantly.
5 0.91149682 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
Author: Mohit Bansal ; Dan Klein
Abstract: We present a simple but accurate parser which exploits both large tree fragments and symbol refinement. We parse with all fragments of the training set, in contrast to much recent work on tree selection in data-oriented parsing and treesubstitution grammar learning. We require only simple, deterministic grammar symbol refinement, in contrast to recent work on latent symbol refinement. Moreover, our parser requires no explicit lexicon machinery, instead parsing input sentences as character streams. Despite its simplicity, our parser achieves accuracies of over 88% F1 on the standard English WSJ task, which is competitive with substantially more complicated state-of-theart lexicalized and latent-variable parsers. Additional specific contributions center on making implicit all-fragments parsing efficient, including a coarse-to-fine inference scheme and a new graph encoding.
6 0.91079831 71 acl-2010-Convolution Kernel over Packed Parse Forest
7 0.90693915 172 acl-2010-Minimized Models and Grammar-Informed Initialization for Supertagging with Highly Ambiguous Lexicons
8 0.90119302 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
9 0.90112758 261 acl-2010-Wikipedia as Sense Inventory to Improve Diversity in Web Search Results
10 0.89880276 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
11 0.89804828 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
12 0.89607441 114 acl-2010-Faster Parsing by Supertagger Adaptation
13 0.89283073 130 acl-2010-Hard Constraints for Grammatical Function Labelling
14 0.89127886 248 acl-2010-Unsupervised Ontology Induction from Text
15 0.89030129 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
16 0.89007807 69 acl-2010-Constituency to Dependency Translation with Forests
17 0.88903409 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation
18 0.88808131 162 acl-2010-Learning Common Grammar from Multilingual Corpus
19 0.88725817 109 acl-2010-Experiments in Graph-Based Semi-Supervised Learning Methods for Class-Instance Acquisition
20 0.88645375 70 acl-2010-Contextualizing Semantic Representations Using Syntactically Enriched Vector Models