acl acl2011 acl2011-217 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Taro Watanabe ; Eiichiro Sumita
Abstract: The state-of-the-art system combination method for machine translation (MT) is based on confusion networks constructed by aligning hypotheses with regard to word similarities. We introduce a novel system combination framework in which hypotheses are encoded as a confusion forest, a packed forest representing alternative trees. The forest is generated using syntactic consensus among parsed hypotheses: First, MT outputs are parsed. Second, a context free grammar is learned by extracting a set of rules that constitute the parse trees. Third, a packed forest is generated starting from the root symbol of the extracted grammar through non-terminal rewriting. The new hypothesis is produced by searching the best derivation in the forest. Experimental results on the WMT10 system combination shared task yield comparable performance to the conventional confusion network based method with smaller space.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract The state-of-the-art system combination method for machine translation (MT) is based on confusion networks constructed by aligning hypotheses with regard to word similarities. [sent-5, score-0.951]
2 We introduce a novel system combination framework in which hypotheses are encoded as a confusion forest, a packed forest representing alternative trees. [sent-6, score-1.371]
3 The forest is generated using syntactic consensus among parsed hypotheses: First, MT outputs are parsed. [sent-7, score-0.796]
4 Third, a packed forest is generated starting from the root symbol of the extracted grammar through non-terminal rewriting. [sent-9, score-0.76]
5 Experimental results on the WMT10 system combination shared task yield comparable performance to the conventional confusion network based method with smaller space. [sent-11, score-0.717]
6 1 Introduction System combination techniques take advantage of consensus among multiple systems and have been widely used in fields such as speech recognition (Fiscus, 1997; Mangu et al. [sent-12, score-0.242]
7 One of the state-of-the-art system combination methods for MT is based on confusion networks, which are compact graph-based structures representing multiple hypotheses (Bangalore et al. [sent-14, score-0.734]
8 First, one skeleton or backbone sentence is selected. [sent-17, score-0.161]
9 Then, other hypotheses are aligned against the skeleton, forming a lattice with each arc representing alternative word candidates. [sent-18, score-0.27]
10 , 2007) in which alignment is measured by an evaluation metric, such as translation error rate (TER) (Snover et al. [sent-22, score-0.151]
11 The new translation hypothesis is generated by selecting the best path through the network. [sent-24, score-0.182]
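To make the path-selection step concrete, here is a minimal, hypothetical sketch in Python: the confusion network is treated as a sequence of slots, each holding alternative words (or the empty symbol) with scores. The words and scores below are invented for illustration; real systems also score paths with non-local features such as n-gram language models, which this per-slot argmax ignores.

```python
# A minimal sketch of best-path selection in a confusion network
# ("sausage" lattice).  With only local slot scores, the best path
# reduces to an argmax per slot; epsilon arcs emit nothing.
EPS = "<eps>"

def best_path(network):
    """network: list of slots, each a dict mapping a word (or EPS) to a score."""
    words = []
    for slot in network:
        best = max(slot, key=slot.get)  # highest-scoring alternative in this slot
        if best != EPS:
            words.append(best)
    return " ".join(words)

network = [
    {"I": 4.0},
    {"saw": 3.0, "watched": 1.0},
    {"the": 3.0, EPS: 1.0},
    {"blue": 1.0, "green": 2.0, EPS: 1.0},
    {"forest": 2.0, "trees": 1.0},
]
print(best_path(network))  # -> "I saw the green forest"
```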
12 We present a novel method for system combination which exploits the syntactic similarity of system outputs. [sent-25, score-0.162]
13 Instead of constructing a string-based confusion network, we generate a packed forest (Billot and Lang, 1989; Mi et al. [sent-26, score-1.05]
14 The packed forest, or confusion forest, is constructed by merging the MT outputs with regard to their syntactic consensus. [sent-28, score-0.305]
15 We employ a grammar-based method to generate the confusion forest: First, system outputs are parsed. [sent-29, score-0.529]
16 Third, a packed forest is generated using a variant of Earley’s algorithm (Earley, 1970) starting from the unique root symbol. [sent-31, score-0.666]
17 New hypotheses are selected by searching the best derivation in the forest. [sent-32, score-0.198]
18 Spurious ambiguity during the generation step is further reduced by encoding tree-local contextual information in each non-terminal symbol, such as parent and sibling labels, using the state representation in Earley’s algorithm. [sent-34, score-0.144]
19 , 2010), we found comparable performance to the conventional confusion network based system combination in two language pairs, and statistically significant improvements in the others. [sent-38, score-0.717]
20 First, we will review the state-of-the-art method which is a system combination framework based on confusion networks (§2). [sent-39, score-0.611]
21 Then, we will describe the method based on confusion forest (§3) and present related work in consensus translations (§4). [sent-41, score-1.079]
22 2 Combination by Confusion Network The system combination framework based on confusion network starts from computing pairwise alignment between hypotheses by taking one hypothesis as a reference. [sent-43, score-1.025]
23 Other hypotheses are aligned against the skeleton using the pairwise alignment. [sent-51, score-0.397]
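As an illustration of the pairwise alignment step, the sketch below aligns a hypothesis against a skeleton using plain Levenshtein edit distance; TER, used in the cited work, additionally allows block shifts, which are omitted here for brevity. The example sentences are hypothetical.

```python
# A minimal sketch of pairwise word alignment against a skeleton via
# dynamic-programming edit distance, followed by a backtrace into
# aligned word pairs; None marks an epsilon (insertion/deletion).
def align(skeleton, hyp):
    n, m = len(skeleton), len(hyp)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (skeleton[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    pairs, i, j = [], n, m  # backtrace from the bottom-right cell
    while i > 0 or j > 0:
        if i and j and d[i][j] == d[i - 1][j - 1] + (skeleton[i - 1] != hyp[j - 1]):
            pairs.append((skeleton[i - 1], hyp[j - 1])); i -= 1; j -= 1
        elif i and d[i][j] == d[i - 1][j] + 1:
            pairs.append((skeleton[i - 1], None)); i -= 1
        else:
            pairs.append((None, hyp[j - 1])); j -= 1
    return pairs[::-1]

print(align("I saw the forest".split(), "I saw the green trees".split()))
```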
24 Figure 1(b) illustrates an example of a confusion network constructed from the four hypotheses in Figure 1(a), assuming the first hypothesis is selected as our skeleton. [sent-52, score-0.908]
25 The network consists of several arcs, each of which represents an alternative word at that position, including the empty symbol, ϵ. [sent-53, score-0.181]
26 This pairwise alignment strategy is prone to spurious insertions and repetitions due to alignment errors such as in Figure 1(a) in which “green” in the third hypothesis is aligned with “forest” in the skeleton. [sent-54, score-0.272]
27 (2008) introduces an incremental method so that hypotheses are aligned incrementally to the growing confusion network, not only the skeleton hypothesis. [sent-56, score-0.645]
[Figure 1: An example confusion network construction. (a) Pairwise alignment using the first starred hypothesis as a skeleton. (b) Confusion network from (a). (c) Incrementally constructed confusion network.]
31 The confusion network construction is largely influenced by the skeleton selection, which determines the global word reordering of a new hypothesis. [sent-105, score-0.79]
32 This large grammatical difference may produce a longer sentence with spuriously inserted words, as in “I saw the blue trees was found” in Figure 1(c). [sent-107, score-0.146]
33 (2007b) partially resolved the problem by constructing a large network in which each hypothesis was treated as a skeleton and the multiple networks were merged into a single network. [sent-109, score-0.496]
34 3 Combination by Confusion Forest The confusion network approach to system combination encodes multiple hypotheses into a compact lattice structure by using word-level consensus. [sent-110, score-0.953]
35 Likewise, we propose to encode multiple hypotheses into a confusion forest: a packed forest that represents multiple parse trees in polynomial space (Billot and Lang, 1989; Mi et al. [sent-111, score-1.33]
36 , 2008). Syntactic consensus is realized by sharing tree fragments. [sent-112, score-0.227]
37 Figure 2: An example packed forest representing the hypotheses in Figure 1(a). [sent-150, score-0.835]
38 The forest is represented as a hypergraph, which is exploited in parsing (Klein and Manning, 2001; Huang and Chiang, 2005) and machine translation (Chiang, 2007; Huang and Chiang, 2007). [sent-152, score-0.708]
39 Figure 2 presents an example packed forest for the parsed hypotheses in Figure 1(a). [sent-159, score-0.888]
40 Given system outputs, we employ the following grammar-based approach for constructing a confusion forest: First, MT outputs are parsed. [sent-163, score-0.568]
41 Third, a forest is generated from the unique root symbol of the extracted grammar through non-terminal rewriting. [sent-165, score-0.631]
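The second step (its sentence is omitted from this excerpt) learns a CFG by reading one rule off every internal node of each parse tree. Below is a minimal sketch of that rule-extraction step, assuming hypothetical trees encoded as nested (label, child, ...) tuples; generation then rewrites non-terminals from the root S using these rules, with the forest height capped as described below.

```python
# A minimal sketch of the grammar-collection step: every internal node
# of each parsed hypothesis contributes one CFG rule (label -> child
# labels), with counts accumulated across hypotheses.
from collections import Counter

def extract_rules(tree, rules):
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    rules[(label, rhs)] += 1
    for c in children:
        if not isinstance(c, str):      # recurse below non-terminals only
            extract_rules(c, rules)

rules = Counter()
t1 = ("S", ("NP", "I"), ("VP", ("VBD", "saw"), ("NP", ("DT", "the"), ("NN", "forest"))))
t2 = ("S", ("NP", "I"), ("VP", ("VBD", "saw"), ("NP", ("DT", "the"), ("NNS", "trees"))))
for t in (t1, t2):
    extract_rules(t, rules)
for (lhs, rhs), count in rules.items():
    print(lhs, "->", " ".join(rhs), count)
```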
42 Figure 3 presents the deductive inference rules (Goodman, 1999) for our generation algorithm. [sent-168, score-0.127]
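Figure 3 itself is not reproduced in this excerpt. For reference, the classic Earley deduction rules over an input string w_1..w_n are shown below in standard item notation (A, B, α, β, γ, G are not the paper's exact symbols); the paper's generation variant adapts these, emitting terminals rather than scanning an input string.

```latex
% Classic Earley deduction rules (items [rule-with-dot, start, end]).
\begin{align*}
\text{Predict:}\quad &
  \frac{[A \to \alpha \bullet B \beta,\, i,\, j]}
       {[B \to \bullet \gamma,\, j,\, j]} \quad (B \to \gamma) \in G \\[4pt]
\text{Scan:}\quad &
  \frac{[A \to \alpha \bullet w_{j+1} \beta,\, i,\, j]}
       {[A \to \alpha\, w_{j+1} \bullet \beta,\, i,\, j+1]} \\[4pt]
\text{Complete:}\quad &
  \frac{[A \to \alpha \bullet B \beta,\, i,\, k] \qquad [B \to \gamma \bullet,\, k,\, j]}
       {[A \to \alpha B \bullet \beta,\, i,\, j]}
\end{align*}
```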
43 Thus, the height of the forest is constrained in the prediction step not to exceed H, which is empirically set to 1.5 times the maximum height of the parsed system outputs. [sent-177, score-0.577] [sent-178, score-0.133]
45 2 Tree Annotation The grammar compiled from the parsed trees is local in that it can represent a finite number of sentences translated from a specific input sentence. [sent-180, score-0.132]
46 Figure 4(a) presents an example parse tree with each symbol replaced by its Earley state, as in Figure 4(b). [sent-184, score-0.164]
47 The encoded context is limited by the vertical and horizontal Markovization (Klein and Manning, 2003). [sent-187, score-0.213]
48 We define the vertical order v, in which a label is limited to memorizing only v previous prediction steps. [sent-188, score-0.149]
49 No limit on the horizontal and vertical Markovization orders implies memorizing all the deductions, and yields a confusion forest representing the union of the parse trees through the grammar collection and generation processes. [sent-193, score-1.336]
50 More relaxed horizontal orders allow more reordering of subtrees in a confusion forest by discarding the sibling context in each prediction step. [sent-194, score-1.151]
51 Likewise, constraining the vertical order generates a deeper forest by ignoring the sequence of symbols leading to a particular node. [sent-195, score-0.629]
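A minimal sketch of vertical/horizontal label annotation in the style of Klein and Manning (2003), assuming hypothetical nested-tuple trees; the paper realizes the equivalent context restriction on Earley states rather than on the trees directly. Each non-terminal remembers up to v ancestor labels (vertical) and up to h preceding sibling labels (horizontal).

```python
# A minimal sketch of vertical/horizontal Markovization of non-terminal
# labels: annotate each label with bounded ancestor and sibling context.
def annotate(tree, v, h, ancestors=(), siblings=()):
    if isinstance(tree, str):          # terminal: leave words untouched
        return tree
    label, children = tree[0], tree[1:]
    new_label = label
    if ancestors:
        new_label += "^" + ",".join(ancestors[-v:])   # vertical context
    if siblings:
        new_label += "/" + ",".join(siblings[-h:])    # horizontal context
    anc = ancestors + (label,)
    new_children, seen = [], ()
    for c in children:
        new_children.append(annotate(c, v, h, anc, seen))
        if not isinstance(c, str):
            seen = seen + (c[0],)      # record preceding sibling labels
    return (new_label,) + tuple(new_children)

t = ("S", ("NP", "I"), ("VP", ("VBD", "saw"), ("NP", ("DT", "the"), ("NN", "forest"))))
print(annotate(t, v=1, h=1))
```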
52 3 Forest Rescoring From the packed forest F, new k-best derivations are extracted from all possible derivations D by efficient forest-based algorithms for k-best parsing (Huang and Chiang, 2005). [sent-197, score-0.744]
53 [Figure 4: (a) an example tree in the forest; (b) Earley’s state annotated tree for (a).] [sent-230, score-0.575]
54 Then, k-best derivations are extracted from the rescored forest using algorithm 3 of Huang and Chiang (2005). [sent-234, score-0.542]
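The 1-best case of this search is ordinary Viterbi over the hypergraph; k-best extraction lazily extends the same recursion (Algorithm 3 of Huang and Chiang, 2005). Below is a minimal 1-best sketch over a hypothetical toy forest; the node names, edges, and weights are invented.

```python
# A minimal 1-best (Viterbi) sketch over an acyclic hypergraph: the best
# derivation at a node maximizes edge weight times the best scores of
# its tail nodes.
import functools

edges = [                      # (head node, tail nodes, edge weight)
    ("NP1", (), 0.9), ("VP1", (), 0.5), ("VP2", (), 0.7),
    ("S", ("NP1", "VP1"), 1.0), ("S", ("NP1", "VP2"), 0.8),
]
incoming = {}
for head, tails, w in edges:
    incoming.setdefault(head, []).append((tails, w))

@functools.lru_cache(maxsize=None)
def best(node):
    """Return (score, derivation) of the best derivation rooted at node."""
    candidates = []
    for tails, w in incoming[node]:
        score, subderivs = w, []
        for t in tails:
            s, d = best(t)
            score *= s
            subderivs.append(d)
        candidates.append((score, (node, tuple(subderivs))))
    return max(candidates)

print(best("S"))   # -> (0.504, ('S', (('NP1', ()), ('VP2', ()))))
```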
55 One of the simplest forms is a sentence-based combination in which hypotheses are simply reranked without merging (Nomoto, 2004). [sent-236, score-0.343]
56 Frederking and Nirenburg (1994) proposed a phrasal combination by merging hypotheses in a chart structure, while others depended on confusion networks, or similar structures, as a building block for merging hypotheses at the word level (Bangalore et al. [sent-237, score-1.045]
57 Our work is the first to explicitly exploit syntactic similarity for system combination by merging hypotheses into a syntactic packed forest. [sent-242, score-0.511]
58 The confusion forest approach may suffer from parsing errors, just as the confusion network construction is influenced by alignment errors. [sent-243, score-1.634]
59 Even with parsing errors, we can still take a tree fragment-level consensus as long as the parser is consistent, in that similar syntactic mistakes would be made for similar hypotheses. [sent-244, score-0.236]
60 (2007a) describe a re-generation approach to consensus translation in which a phrasal translation table is constructed from the MT outputs aligned with an input source sentence. [sent-246, score-0.545]
61 Instead of generating forests from semantic representations (Langkilde, 2000), we generate forests from a CFG encoding the consensus among parsed hypotheses. [sent-250, score-0.359]
62 (2009) present joint decoding in which a translation forest is constructed from two distinct MT systems, tree-to-string and string-to-string, by merging forest outputs. [sent-252, score-1.254]
63 Their merging method is either translation-level, in which no new translation is generated, or derivation-level, in which rules sharing the same left-hand side are used in both systems. [sent-253, score-0.226]
64 Our work is similar in that a new forest is constructed by sharing rules among systems, although their work involves no consensus translation and requires structures internal to each system, such as model combinations (DeNero et al. [sent-254, score-0.908]
65 The system outputs are retokenized to match the Penn Treebank standard, parsed by the Stanford Parser (Klein and Manning, 2003), and lower-cased. [sent-265, score-0.169]
66 We implemented our confusion forest system combination using an in-house hypergraph-based toolkit, cicada, which is motivated by generic weighted logic programming (Lopez, 2009) and was originally developed for a synchronous-CFG based machine translation system (Chiang, 2007). [sent-266, score-1.228]
67 Input to our system is a collection of hypergraphs, a set of parsed hypotheses, from which rules are extracted and a new forest is generated as described in Section 3. [sent-267, score-0.631]
68 Our baseline, also implemented in cicada, is a confusion network-based system combination method (§2) which incrementally aligns hypotheses to the growing network using the TER aligner (Rosti et al. [sent-268, score-0.734]
69 After performing epsilon removal, the network is transformed into a forest by parsing with the monotone rules S → X, S → S X, and X → x. [sent-270, score-0.759]
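A minimal sketch of this monotone transform, assuming a hypothetical slot-list encoding of the epsilon-free network: each slot becomes an X node with one X → x edge per alternative word, chained left-branching via S → X and S → S X.

```python
# A minimal sketch of turning a confusion network into a forest with
# the monotone rules S -> X, S -> S X, and X -> x.
def network_to_forest(slots):
    """slots: list of lists of alternative words; returns hyperedges."""
    hyperedges = []
    for i, words in enumerate(slots):
        for w in words:                                    # X -> x
            hyperedges.append((("X", i), (), f"X -> {w}"))
    hyperedges.append((("S", 0), (("X", 0),), "S -> X"))   # S -> X
    for i in range(1, len(slots)):                         # S -> S X
        hyperedges.append((("S", i), (("S", i - 1), ("X", i)), "S -> S X"))
    return hyperedges

for edge in network_to_forest([["I"], ["saw", "watched"], ["the"], ["forest", "trees"]]):
    print(edge)
```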
70 We employ M confidence measures hsm(d) for M systems, which basically count the number of rules used in d that were originally extracted from the m-th system hypothesis (Rosti et al. [sent-278, score-0.178]
71 Our baseline confusion network system has an additional penalty feature, hp(m), which is the total number of edits required to construct a confusion network using the m-th system hypothesis as a skeleton, normalized by the number of nodes in the network (Rosti et al. [sent-288, score-1.555]
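A minimal sketch of the system confidence features, with a hypothetical rule-to-origin-systems map standing in for the provenance recorded during rule extraction:

```python
# A minimal sketch of per-system confidence features: count how many
# rules in derivation d were originally extracted from system m's
# hypothesis.  The provenance map and rules are invented for illustration.
def confidence_features(derivation_rules, provenance, num_systems):
    feats = [0] * num_systems
    for rule in derivation_rules:
        for m in provenance.get(rule, ()):   # systems that contributed rule
            feats[m] += 1
    return feats

provenance = {"S -> NP VP": {0, 1}, "NP -> I": {0}, "VP -> VBD NP": {1}}
print(confidence_features(["S -> NP VP", "NP -> I"], provenance, num_systems=2))
# -> [2, 1]
```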
72 3 Results Table 2 compares our confusion forest approach (CF) with different orders against a confusion network (CN) and the max/min systems, measured by BLEU (Papineni et al. [sent-291, score-1.515]
73 We vary the horizontal orders, h = 1, 2, ∞ with vertical orders of v = 3, 4, ∞. [sent-293, score-0.255]
74 In general, lower horizontal and vertical order leads to lower BLEU. [sent-300, score-0.213]
75 [Table 2 caption:] CN for confusion network and CF for confusion forest with different vertical (v) and horizontal (h) Markovization orders. [sent-354, score-1.728]
76 The gains achievable by CF over simple reranking are small, at most 2-3 points, indicating that only small variations are encoded in confusion forests. [sent-356, score-0.441]
77 We also observed that a lower horizontal and vertical order leads to better BLEU potentials. [sent-357, score-0.213]
78 As discussed in §3.2, the higher horizontal and vertical order implies more faithfulness to the original parse trees. [sent-359, score-0.255]
79 Introducing new tree fragments to confusion forests leads to new phrasal translations with enlarged forests, as presented in Table 4, measured by the average number of hyperedges. [Table 4: average forest sizes for cz-en, de-en, es-en, and fr-en; the numeric rows were lost in extraction.] [sent-360, score-0.594]
80 The low potential in German should be interpreted in the light of the extremely large confusion network in Table 4. [sent-387, score-0.594]
81 We postulate that the divergence in German hypotheses yields wrong alignments, which in turn produce larger networks with incorrect hypotheses. [sent-388, score-0.273]
82 Table 4 also shows that CN produces a forest that is an order of magnitude larger than those created by CFs. [sent-389, score-0.508]
83 Although we cannot directly relate the runtime and the number of hyperedges in CN and CFs, since the shapes of the forests are different, CN requires more space to encode the hypotheses than CFs do. [sent-390, score-0.331]
84 CN may generate shorter hypotheses, whereas CF prefers longer hypotheses as we decrease the vertical order. [sent-392, score-0.319]
85 6 Conclusion We presented a confusion forest based method for system combination in which system outputs are merged into a packed forest using their syntactic consensus. [Footnote 3: We measure the hypergraph size before intersecting with non-local features, like n-gram language models.] [sent-394, score-1.883]
86 The forest construction is treated as generation from a CFG compiled from the parsed outputs. [sent-397, score-0.633]
87 Our experiments indicate comparable performance to a strong confusion network baseline with smaller space, and statistically significant gains in some language pairs. [sent-398, score-0.594]
88 To our knowledge, this is the first work to directly introduce syntactic consensus to system combination by encoding multiple system outputs into a single forest structure. [sent-399, score-0.905]
89 We believe that the confusion forest based approach to system combination has future exploration potential. [sent-400, score-1.044]
90 2 which would be helpful in discriminating hypotheses in larger forests. [sent-402, score-0.198]
91 We would also like to analyze the trade-offs, if any, between parsing errors and confusion forest construction by controlling the parsing quality. [sent-403, score-0.999]
92 As an alternative to the grammar-based forest generation, we are investigating an edit distance measure for tree alignment, such as the tree edit distance (Bille, 2005), which basically computes insertions/deletions/replacements of nodes in trees. [sent-404, score-0.642]
93 Indirect-HMM-based hypothesis alignment for combining outputs from machine translation systems. [sent-458, score-0.306]
94 Efficient minimum error rate training and minimum bayes-risk decoding for translation hypergraphs and lattices. [sent-489, score-0.18]
95 An empirical study on computing consensus translations from multiple machine translation systems. [sent-514, score-0.3]
96 Finding consensus in speech recognition: word error minimization and other applications of confusion networks. [sent-518, score-0.571]
97 Computing consensus translation from multiple machine translation systems using enhanced hypotheses alignment. [sent-522, score-0.564]
98 Incremental hypothesis alignment for building confusion networks with application to machine translation system combination. [sent-550, score-0.756]
99 Consensus network decoding for statistical machine translation system combination. [sent-571, score-0.359]
100 A study of translation edit rate with targeted human annotation. [sent-575, score-0.132]
wordName wordTfidf (topN-words)
[('forest', 0.508), ('confusion', 0.413), ('cfv', 0.225), ('hypotheses', 0.198), ('network', 0.181), ('earley', 0.17), ('consensus', 0.158), ('rosti', 0.14), ('skeleton', 0.133), ('packed', 0.129), ('vertical', 0.121), ('translation', 0.104), ('snp', 0.102), ('vp', 0.093), ('horizontal', 0.092), ('combination', 0.084), ('hypothesis', 0.078), ('outputs', 0.077), ('cf', 0.077), ('networks', 0.075), ('forests', 0.074), ('npp', 0.072), ('matusov', 0.066), ('cn', 0.066), ('vbd', 0.064), ('langkilde', 0.062), ('jayaraman', 0.061), ('merging', 0.061), ('hyperedges', 0.059), ('bleu', 0.058), ('sim', 0.057), ('hypergraph', 0.057), ('deductive', 0.057), ('chiang', 0.056), ('symbol', 0.055), ('billot', 0.054), ('spuriously', 0.054), ('parsed', 0.053), ('saw', 0.052), ('huang', 0.049), ('alignment', 0.047), ('markovization', 0.044), ('green', 0.043), ('matsoukas', 0.043), ('spyros', 0.043), ('orders', 0.042), ('parse', 0.042), ('cfg', 0.041), ('cicada', 0.041), ('vnbpd', 0.041), ('height', 0.041), ('hypergraphs', 0.041), ('trees', 0.04), ('tree', 0.039), ('np', 0.039), ('generation', 0.039), ('system', 0.039), ('grammar', 0.039), ('parsing', 0.039), ('german', 0.038), ('sibling', 0.038), ('macherey', 0.038), ('mt', 0.038), ('lattice', 0.038), ('constructed', 0.038), ('translations', 0.038), ('irene', 0.036), ('mangu', 0.036), ('frederking', 0.036), ('tails', 0.036), ('aligner', 0.035), ('decoding', 0.035), ('aligned', 0.034), ('derivations', 0.034), ('bangalore', 0.034), ('spurious', 0.034), ('walked', 0.033), ('ovf', 0.033), ('construction', 0.033), ('pairwise', 0.032), ('dreyer', 0.031), ('rules', 0.031), ('kumar', 0.03), ('sharing', 0.03), ('reordering', 0.03), ('phrasal', 0.03), ('asru', 0.03), ('mth', 0.03), ('snover', 0.029), ('root', 0.029), ('merged', 0.029), ('czech', 0.029), ('klein', 0.029), ('rescoring', 0.028), ('backbone', 0.028), ('hyperedge', 0.028), ('achievable', 0.028), ('edit', 0.028), ('prediction', 0.028), ('state', 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999982 217 acl-2011-Machine Translation System Combination by Confusion Forest
Author: Taro Watanabe ; Eiichiro Sumita
Abstract: The state-of-the-art system combination method for machine translation (MT) is based on confusion networks constructed by aligning hypotheses with regard to word similarities. We introduce a novel system combination framework in which hypotheses are encoded as a confusion forest, a packed forest representing alternative trees. The forest is generated using syntactic consensus among parsed hypotheses: First, MT outputs are parsed. Second, a context free grammar is learned by extracting a set of rules that constitute the parse trees. Third, a packed forest is generated starting from the root symbol of the extracted grammar through non-terminal rewriting. The new hypothesis is produced by searching the best derivation in the forest. Experimental results on the WMT10 system combination shared task yield comparable performance to the conventional confusion network based method with smaller space.
2 0.33189252 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
Author: Xianchao Wu ; Takuya Matsuzaki ; Jun'ichi Tsujii
Abstract: In the present paper, we propose the effective usage of function words to generate generalized translation rules for forest-based translation. Given aligned forest-string pairs, we extract composed tree-to-string translation rules that account for multiple interpretations of both aligned and unaligned target function words. In order to constrain the exhaustive attachments of function words, we limit to bind them to the nearby syntactic chunks yielded by a target dependency parser. Therefore, the proposed approach can not only capture source-tree-to-target-chunk correspondences but can also use forest structures that compactly encode an exponential number of parse trees to properly generate target function words during decoding. Extensive experiments involving large-scale English-toJapanese translation revealed a significant im- provement of 1.8 points in BLEU score, as compared with a strong forest-to-string baseline system.
3 0.25597712 61 acl-2011-Binarized Forest to String Translation
Author: Hao Zhang ; Licheng Fang ; Peng Xu ; Xiaoyun Wu
Abstract: Tree-to-string translation is syntax-aware and efficient but sensitive to parsing errors. Forestto-string translation approaches mitigate the risk of propagating parser errors into translation errors by considering a forest of alternative trees, as generated by a source language parser. We propose an alternative approach to generating forests that is based on combining sub-trees within the first best parse through binarization. Provably, our binarization forest can cover any non-consitituent phrases in a sentence but maintains the desirable property that for each span there is at most one nonterminal so that the grammar constant for decoding is relatively small. For the purpose of reducing search errors, we apply the synchronous binarization technique to forest-tostring decoding. Combining the two techniques, we show that using a fast shift-reduce parser we can achieve significant quality gains in NIST 2008 English-to-Chinese track (1.3 BLEU points over a phrase-based system, 0.8 BLEU points over a hierarchical phrase-based system). Consistent and significant gains are also shown in WMT 2010 in the English to German, French, Spanish and Czech tracks.
4 0.20740314 30 acl-2011-Adjoining Tree-to-String Translation
Author: Yang Liu ; Qun Liu ; Yajuan Lu
Abstract: We introduce synchronous tree adjoining grammars (TAG) into tree-to-string translation, which converts a source tree to a target string. Without reconstructing TAG derivations explicitly, our rule extraction algorithm directly learns tree-to-string rules from aligned Treebank-style trees. As tree-to-string translation casts decoding as a tree parsing problem rather than parsing, the decoder still runs fast when adjoining is included. Less than 2 times slower, the adjoining tree-tostring system improves translation quality by +0.7 BLEU over the baseline system only allowing for tree substitution on NIST ChineseEnglish test sets.
5 0.18091027 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation
Author: Jingbo Zhu ; Tong Xiao
Abstract: To address the parse error issue for tree-tostring translation, this paper proposes a similarity-based decoding generation (SDG) solution by reconstructing similar source parse trees for decoding at the decoding time instead of taking multiple source parse trees as input for decoding. Experiments on Chinese-English translation demonstrated that our approach can achieve a significant improvement over the standard method, and has little impact on decoding speed in practice. Our approach is very easy to implement, and can be applied to other paradigms such as tree-to-tree models. 1
6 0.17606963 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation
8 0.14272368 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
9 0.13615757 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
10 0.13400264 220 acl-2011-Minimum Bayes-risk System Combination
11 0.13207509 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
12 0.1086195 152 acl-2011-How Much Can We Gain from Supervised Word Alignment?
13 0.10173962 339 acl-2011-Word Alignment Combination over Multiple Word Segmentation
14 0.091323912 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
15 0.089341938 174 acl-2011-Insights from Network Structure for Text Mining
16 0.084580638 29 acl-2011-A Word-Class Approach to Labeling PSCFG Rules for Machine Translation
17 0.081540368 90 acl-2011-Crowdsourcing Translation: Professional Quality from Non-Professionals
18 0.08130303 313 acl-2011-Two Easy Improvements to Lexical Weighting
19 0.079696901 16 acl-2011-A Joint Sequence Translation Model with Integrated Reordering
20 0.079476722 216 acl-2011-MEANT: An inexpensive, high-accuracy, semi-automatic metric for evaluating translation utility based on semantic roles
topicId topicWeight
[(0, 0.223), (1, -0.211), (2, 0.104), (3, -0.019), (4, 0.023), (5, 0.031), (6, -0.166), (7, -0.064), (8, -0.037), (9, -0.032), (10, -0.059), (11, -0.036), (12, 0.001), (13, -0.015), (14, -0.006), (15, -0.055), (16, 0.024), (17, -0.019), (18, -0.043), (19, 0.017), (20, -0.031), (21, -0.043), (22, 0.01), (23, 0.14), (24, 0.078), (25, -0.108), (26, 0.025), (27, 0.095), (28, -0.044), (29, 0.032), (30, -0.085), (31, -0.111), (32, -0.033), (33, 0.027), (34, -0.006), (35, 0.033), (36, -0.126), (37, -0.025), (38, -0.107), (39, -0.13), (40, 0.027), (41, 0.05), (42, -0.019), (43, 0.04), (44, 0.03), (45, 0.039), (46, -0.004), (47, -0.096), (48, -0.01), (49, 0.008)]
simIndex simValue paperId paperTitle
same-paper 1 0.9344579 217 acl-2011-Machine Translation System Combination by Confusion Forest
Author: Taro Watanabe ; Eiichiro Sumita
Abstract: The state-of-the-art system combination method for machine translation (MT) is based on confusion networks constructed by aligning hypotheses with regard to word similarities. We introduce a novel system combination framework in which hypotheses are encoded as a confusion forest, a packed forest representing alternative trees. The forest is generated using syntactic consensus among parsed hypotheses: First, MT outputs are parsed. Second, a context free grammar is learned by extracting a set of rules that constitute the parse trees. Third, a packed forest is generated starting from the root symbol of the extracted grammar through non-terminal rewriting. The new hypothesis is produced by searching the best derivation in the forest. Experimental results on the WMT10 system combination shared task yield comparable performance to the conventional confusion network based method with smaller space.
2 0.83123195 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
Author: Xianchao Wu ; Takuya Matsuzaki ; Jun'ichi Tsujii
Abstract: In the present paper, we propose the effective usage of function words to generate generalized translation rules for forest-based translation. Given aligned forest-string pairs, we extract composed tree-to-string translation rules that account for multiple interpretations of both aligned and unaligned target function words. In order to constrain the exhaustive attachments of function words, we limit to bind them to the nearby syntactic chunks yielded by a target dependency parser. Therefore, the proposed approach can not only capture source-tree-to-target-chunk correspondences but can also use forest structures that compactly encode an exponential number of parse trees to properly generate target function words during decoding. Extensive experiments involving large-scale English-toJapanese translation revealed a significant im- provement of 1.8 points in BLEU score, as compared with a strong forest-to-string baseline system.
3 0.82485813 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation
Author: Jingbo Zhu ; Tong Xiao
Abstract: To address the parse error issue for tree-tostring translation, this paper proposes a similarity-based decoding generation (SDG) solution by reconstructing similar source parse trees for decoding at the decoding time instead of taking multiple source parse trees as input for decoding. Experiments on Chinese-English translation demonstrated that our approach can achieve a significant improvement over the standard method, and has little impact on decoding speed in practice. Our approach is very easy to implement, and can be applied to other paradigms such as tree-to-tree models. 1
4 0.77620101 30 acl-2011-Adjoining Tree-to-String Translation
Author: Yang Liu ; Qun Liu ; Yajuan Lu
Abstract: We introduce synchronous tree adjoining grammars (TAG) into tree-to-string translation, which converts a source tree to a target string. Without reconstructing TAG derivations explicitly, our rule extraction algorithm directly learns tree-to-string rules from aligned Treebank-style trees. As tree-to-string translation casts decoding as a tree parsing problem rather than parsing, the decoder still runs fast when adjoining is included. Less than 2 times slower, the adjoining tree-tostring system improves translation quality by +0.7 BLEU over the baseline system only allowing for tree substitution on NIST ChineseEnglish test sets.
5 0.73327088 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
Author: Ashish Vaswani ; Haitao Mi ; Liang Huang ; David Chiang
Abstract: Most statistical machine translation systems rely on composed rules (rules that can be formed out of smaller rules in the grammar). Though this practice improves translation by weakening independence assumptions in the translation model, it nevertheless results in huge, redundant grammars, making both training and decoding inefficient. Here, we take the opposite approach, where we only use minimal rules (those that cannot be formed out of other rules), and instead rely on a rule Markov model of the derivation history to capture dependencies between minimal rules. Large-scale experiments on a state-of-the-art tree-to-string translation system show that our approach leads to a slimmer model, a faster decoder, yet the same translation quality (measured using B ) as composed rules.
6 0.72714841 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
7 0.70977509 61 acl-2011-Binarized Forest to String Translation
8 0.68823808 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation
9 0.68345165 290 acl-2011-Syntax-based Statistical Machine Translation using Tree Automata and Tree Transducers
10 0.6776731 220 acl-2011-Minimum Bayes-risk System Combination
11 0.60479593 154 acl-2011-How to train your multi bottom-up tree transducer
12 0.60139763 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
13 0.56175274 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
14 0.54155046 330 acl-2011-Using Derivation Trees for Treebank Error Detection
15 0.50127482 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
16 0.49968025 173 acl-2011-Insertion Operator for Bayesian Tree Substitution Grammars
17 0.49538431 28 acl-2011-A Statistical Tree Annotator and Its Applications
18 0.47385153 106 acl-2011-Dual Decomposition for Natural Language Processing
19 0.44883189 146 acl-2011-Goodness: A Method for Measuring Machine Translation Confidence
20 0.43074679 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
topicId topicWeight
[(5, 0.026), (17, 0.097), (26, 0.025), (28, 0.01), (31, 0.012), (37, 0.072), (39, 0.075), (41, 0.048), (55, 0.023), (59, 0.025), (62, 0.26), (72, 0.036), (91, 0.043), (96, 0.17)]
simIndex simValue paperId paperTitle
1 0.92701477 74 acl-2011-Combining Indicators of Allophony
Author: Luc Boruta
Abstract: Allophonic rules are responsible for the great variety in phoneme realizations. Infants can not reliably infer abstract word representations without knowledge of their native allophonic grammar. We explore the hypothesis that some properties of infants’ input, referred to as indicators, are correlated with allophony. First, we provide an extensive evaluation of individual indicators that rely on distributional or lexical information. Then, we present a first evaluation of the combination of indicators of different types, considering both logical and numerical combinations schemes. Though distributional and lexical indicators are not redundant, straightforward combinations do not outperform individual indicators.
2 0.86622149 338 acl-2011-Wikulu: An Extensible Architecture for Integrating Natural Language Processing Techniques with Wikis
Author: Daniel Bar ; Nicolai Erbs ; Torsten Zesch ; Iryna Gurevych
Abstract: We present Wikulu1, a system focusing on supporting wiki users with their everyday tasks by means of an intelligent interface. Wikulu is implemented as an extensible architecture which transparently integrates natural language processing (NLP) techniques with wikis. It is designed to be deployed with any wiki platform, and the current prototype integrates a wide range of NLP algorithms such as keyphrase extraction, link discovery, text segmentation, summarization, or text similarity. Additionally, we show how Wikulu can be applied for visually analyzing the results of NLP algorithms, educational purposes, and enabling semantic wikis.
same-paper 3 0.82117033 217 acl-2011-Machine Translation System Combination by Confusion Forest
Author: Taro Watanabe ; Eiichiro Sumita
Abstract: The state-of-the-art system combination method for machine translation (MT) is based on confusion networks constructed by aligning hypotheses with regard to word similarities. We introduce a novel system combination framework in which hypotheses are encoded as a confusion forest, a packed forest representing alternative trees. The forest is generated using syntactic consensus among parsed hypotheses: First, MT outputs are parsed. Second, a context free grammar is learned by extracting a set of rules that constitute the parse trees. Third, a packed forest is generated starting from the root symbol of the extracted grammar through non-terminal rewriting. The new hypothesis is produced by searching the best derivation in the forest. Experimental results on the WMT10 system combination shared task yield comparable performance to the conventional confusion network based method with smaller space.
4 0.76756471 325 acl-2011-Unsupervised Word Alignment with Arbitrary Features
Author: Chris Dyer ; Jonathan H. Clark ; Alon Lavie ; Noah A. Smith
Abstract: We introduce a discriminatively trained, globally normalized, log-linear variant of the lexical translation models proposed by Brown et al. (1993). In our model, arbitrary, nonindependent features may be freely incorporated, thereby overcoming the inherent limitation of generative models, which require that features be sensitive to the conditional independencies of the generative process. However, unlike previous work on discriminative modeling of word alignment (which also permits the use of arbitrary features), the parameters in our models are learned from unannotated parallel sentences, rather than from supervised word alignments. Using a variety of intrinsic and extrinsic measures, including translation performance, we show our model yields better alignments than generative baselines in a number of language pairs.
5 0.75099778 45 acl-2011-Aspect Ranking: Identifying Important Product Aspects from Online Consumer Reviews
Author: Jianxing Yu ; Zheng-Jun Zha ; Meng Wang ; Tat-Seng Chua
Abstract: In this paper, we dedicate to the topic of aspect ranking, which aims to automatically identify important product aspects from online consumer reviews. The important aspects are identified according to two observations: (a) the important aspects of a product are usually commented by a large number of consumers; and (b) consumers’ opinions on the important aspects greatly influence their overall opinions on the product. In particular, given consumer reviews of a product, we first identify the product aspects by a shallow dependency parser and determine consumers’ opinions on these aspects via a sentiment classifier. We then develop an aspect ranking algorithm to identify the important aspects by simultaneously considering the aspect frequency and the influence of consumers’ opinions given to each aspect on their overall opinions. The experimental results on 11 popular products in four domains demonstrate the effectiveness of our approach. We further apply the aspect ranking results to the application ofdocumentlevel sentiment classification, and improve the performance significantly.
6 0.67090809 30 acl-2011-Adjoining Tree-to-String Translation
7 0.66402942 154 acl-2011-How to train your multi bottom-up tree transducer
8 0.65254605 241 acl-2011-Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation
9 0.64782953 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
10 0.64773566 61 acl-2011-Binarized Forest to String Translation
11 0.64500511 327 acl-2011-Using Bilingual Parallel Corpora for Cross-Lingual Textual Entailment
12 0.64447749 121 acl-2011-Event Discovery in Social Media Feeds
13 0.64425498 32 acl-2011-Algorithm Selection and Model Adaptation for ESL Correction Tasks
14 0.64403975 28 acl-2011-A Statistical Tree Annotator and Its Applications
15 0.64265913 117 acl-2011-Entity Set Expansion using Topic information
16 0.64145434 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
17 0.64134932 254 acl-2011-Putting it Simply: a Context-Aware Approach to Lexical Simplification
18 0.64118314 15 acl-2011-A Hierarchical Pitman-Yor Process HMM for Unsupervised Part of Speech Induction
19 0.64097089 11 acl-2011-A Fast and Accurate Method for Approximate String Search
20 0.64030468 141 acl-2011-Gappy Phrasal Alignment By Agreement