acl acl2011 acl2011-123 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Alexander M. Rush ; Michael Collins
Abstract: We describe an exact decoding algorithm for syntax-based statistical translation. The approach uses Lagrangian relaxation to decompose the decoding problem into tractable subproblems, thereby avoiding exhaustive dynamic programming. The method recovers exact solutions, with certificates of optimality, on over 97% of test examples; it has comparable speed to state-of-the-art decoders.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract We describe an exact decoding algorithm for syntax-based statistical translation. [sent-4, score-0.295]
2 The approach uses Lagrangian relaxation to decompose the decoding problem into tractable subproblems, thereby avoiding exhaustive dynamic programming. [sent-5, score-0.549]
3 , 2008)) corresponds to the intersection of a (weighted) hypergraph with an n-gram language model. [sent-12, score-0.698]
4 Exact dynamic programming algorithms for the problem are well known (Bar-Hillel et al., 1964). [sent-16, score-0.203]
5 This paper describes an efficient algorithm for exact decoding of synchronous grammar models for translation. [sent-19, score-0.345]
6 , 1964) by using Lagrangian relaxation to decompose the decoding problem into the following sub-problems: 1. Dynamic programming over the weighted hypergraph. [sent-29, score-0.415]
7 2. Application of an all-pairs shortest path algorithm to a directed graph derived from the weighted hypergraph. [sent-33, score-0.343]
8 Informally, the first decoding algorithm incorporates the weights and hard constraints on translations from the synchronous grammar, while the second decoding algorithm is used to integrate language model scores. [sent-35, score-0.58]
9 Lagrange multipliers are used to enforce agreement between the structures produced by the two decoding algorithms. [sent-36, score-0.332]
10 The algorithm uses a subgradient method to minimize a dual function. [sent-39, score-0.316]
11 The dual corresponds to a particular linear programming (LP) relaxation of the original decoding problem. [sent-40, score-0.674]
12 The method will recover an exact solution, with a certificate of optimality, if the underlying LP relaxation has an integral solution. [sent-41, score-0.385]
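To make the subgradient step concrete, the sketch below shows a generic subgradient loop for minimizing a dual of this kind. It is illustrative only, not the paper's implementation: solve_inner, the violation vector, and the 1/t step-size schedule are all assumptions.

    # Generic subgradient minimization of a dual L(u) (illustrative sketch).
    # solve_inner(u) must return (y, g): the inner argmax y and a subgradient
    # g of L at u, here taken to be the vector of constraint violations.
    def subgradient_minimize(solve_inner, num_dims, iterations=200, step=1.0):
        u = [0.0] * num_dims
        y = None
        for t in range(1, iterations + 1):
            y, g = solve_inner(u)
            if all(gi == 0 for gi in g):   # no violations: certificate of optimality
                return y, u
            delta = step / t               # a standard diminishing step-size schedule
            u = [ui - delta * gi for ui, gi in zip(u, g)]
        return y, u                        # last solution if no certificate is found

Returning the last y when no certificate is found mirrors the fallback described in the experiments section below.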
13 The second technical contribution of this paper is to describe a method that iteratively tightens the underlying LP relaxation until an exact solution is produced. [sent-43, score-0.41]
14 We do this by gradually introducing constraints to step 1 (dynamic programming over the hypergraph), while still maintaining efficiency. [sent-44, score-0.265]
15 The method is comparable in speed to state-of-the-art decoding algorithms; for example, over 70% of the test examples are decoded in 2 seconds or less. [sent-49, score-0.211]
16 2 Related Work A variety of approximate decoding algorithms have been explored for syntax-based translation systems, including cube-pruning (Chiang, 2007; Huang and Chiang, 2007), left-to-right decoding with beam search (Watanabe et al. [sent-52, score-0.336]
17 (2009) show that exact FST decoding is feasible for a phrase-based system with limited reordering (the MJ1 model (Kumar and Byrne, 2005)), and de Gispert et al. [sent-57, score-0.23]
18 (2010) show that exact FST decoding is feasible for a specific class of hierarchical grammars (shallow-1 grammars). [sent-58, score-0.23]
19 Lagrangian relaxation is a classical technique in combinatorial optimization (Korte and Vygen, 2008). [sent-61, score-0.268]
20 Lagrange multipliers are used to add linear constraints to an existing problem that can be solved using a combinatorial algorithm; the resulting dual function is then minimized, for example using subgradient methods. [sent-62, score-0.526]
21 In recent work, dual decomposition—a special case of Lagrangian relaxation, where the linear constraints enforce agreement between two or more models—has been applied to inference in Markov random fields (Wainwright et al. [sent-63, score-0.311]
22 The first step is to take an input sentence in the source language, and from this to create a hypergraph (sometimes called a translation forest) that represents the set of possible translations (strings in the target language) and derivations under the grammar. [sent-75, score-0.504]
23 For example, in the system of (Chiang, 2005), the hypergraph is created as follows: first, the source side of the synchronous grammar is used to create a parse forest over the source language string. [sent-77, score-0.399]
24 Chiang’s method uses a synchronous context-free grammar, but the hypergraph formalism is applicable to a broad range of other grammatical formalisms, for example dependency grammars (e. [sent-79, score-0.399]
25 We will assume that the hypergraph is acyclic: intuitively this will mean that no derivation (as defined below) contains the same vertex more than once (see (Martin et al., 1990)). [sent-111, score-0.569]
26 A derivation is represented by a vector y = {yr : r ∈ I} where yv = 1 if vertex v is used in the derivation and yv = 0 otherwise (similarly ye = 1 if edge e is used in the derivation, ye = 0 otherwise). [sent-119, score-0.931]
27 Thus y is a vector in {0, 1}^|I|. A valid derivation satisfies the following constraints: [sent-120, score-0.263]
28 Each derivation y in the hypergraph will imply an ordered sequence of leaves v1 . . . vn. [sent-130, score-0.638]
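As a small illustration of how a derivation determines its leaf sequence, the sketch below recursively expands hyperedges left to right. The head_edge encoding (mapping each non-leaf vertex of the derivation to its single incoming hyperedge) is an assumed convenience, not the paper's data structure.

    # Read off the ordered leaf sequence of one derivation (sketch).
    # head_edge[v] = (v, tail) gives the derivation's incoming hyperedge at v.
    def ordered_leaves(v, head_edge):
        if v not in head_edge:             # leaves have no incoming edge
            return [v]
        _, tail = head_edge[v]
        leaves = []
        for child in tail:                 # tail vertices are ordered left-to-right
            leaves.extend(ordered_leaves(child, head_edge))
        return leaves

Acyclicity, as assumed above, guarantees the recursion terminates.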
29 In a weighted hypergraph problem, we assume a parameter vector θ = {θr : r ∈ I}. [sent-139, score-0.349]
30 For any derivation y, with leaves s(y) = v1, v2, . . . , [sent-159, score-0.289]
31 vn, it is the case that: (1) v1 = 2 and vn = 3; (2) the leaves 2 and 3 cannot appear at any other position in the strings s(y) for y ∈ Y; (3) l(2) = <s>, where <s> is the start symbol in the language model; (4) l(3) = </s>, where </s> is the end symbol. [sent-162, score-0.234]
32 4 A Simple Lagrangian Relaxation Algorithm We now give a Lagrangian relaxation algorithm for integration of a hypergraph with a bigram language model, in cases where the hypergraph satisfies the following simplifying assumption (Assumption 4.1). [sent-165, score-1.141]
33 For any two leaves v and w, it is either the case that: 1) for all derivations y such that v and w are both in the sequence l(y), v precedes w; or 2) for all derivations y such that v and w are both in l(y), w precedes v. [sent-167, score-0.236]
34 The algorithm then involves the following steps: (1) For each leaf v, find the previous leaf w that maximizes the score θ(w, v) − u(w) (call this leaf α∗(v), and define αv = θ(α∗(v), v) − u(α∗(v))). [sent-174, score-0.656]
35 (2) Find the highest scoring derivation using dynamic programming. The assumption generalizes in the obvious way to k'th-order language models: e.g. [sent-175, score-0.243]
36 for trigram models we assume that v1 = 2, v2 = 3, vn = 4, l(2) = l(3) = <s>, l(4) = </s>. [sent-177, score-0.222]
37 It is easy to come up with examples that violate this assumption: for example, a hypergraph with edges ⟨⟨4, 5⟩, 1⟩ and ⟨⟨5, 4⟩, 1⟩ violates the assumption. [sent-178, score-0.451]
38 (3) If the output derivation from step 2 has the same set of bigrams as those from step 1, then we have an exact solution to the problem. [sent-182, score-0.421]
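Putting steps (1)-(3) together, one iteration might look like the hedged sketch below. Here B is the set of candidate bigrams, theta and u carry the bigram scores and multipliers, and best_derivation stands in for the hypergraph dynamic program of step (2); all of these names are assumptions for illustration.

    # One iteration of the simple bigram algorithm (sketch). Assumes every
    # non-initial leaf has at least one candidate bigram in B.
    def simple_iteration(B, theta, u, best_derivation):
        alpha, alpha_star = {}, {}
        for (w, v) in B:                   # step 1: best predecessor per leaf
            score = theta[(w, v)] - u[w]
            if v not in alpha or score > alpha[v]:
                alpha[v], alpha_star[v] = score, w
        leaves = best_derivation(alpha)    # step 2: DP with per-leaf bonuses
        dp_bigrams = set(zip(leaves, leaves[1:]))
        lm_bigrams = {(alpha_star[v], v) for v in leaves[1:]}
        if dp_bigrams == lm_bigrams:       # step 3: agreement => exact solution
            return leaves
        return None                        # disagreement: update the multipliers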
39 Steps 1 and 2 can be performed efficiently; in particular, we avoid the classical dynamic programming intersection, instead relying on dynamic programming over the original, simple hypergraph. [sent-184, score-0.406]
40 1 above, Y = {y : y satisfies constraints C0, C1, C2}, where the constraint definitions are: (C0) The yv and ye variables form a derivation in the hypergraph, as defined in section 3. [sent-193, score-0.75]
41 (C1) For all v ∈ VL such that v ≠ 2, yv = Σ_{w:⟨w,v⟩∈B} y(w, v); (C2) for all v ∈ VL such that v ≠ 3, yv = Σ_{w:⟨v,w⟩∈B} y(v, w). [sent-194, score-0.56]
42 C1 states that each leaf in a derivation has exactly one incoming bigram, and that each leaf not in the derivation has 0 incoming bigrams; C2 states that each leaf in a derivation has exactly one outgoing bigram, and that each leaf not in the derivation has 0 outgoing bigrams. [sent-196, score-1.447]
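The following check restates C1/C2 directly; the dictionary encodings (y_leaf for the yv values, y_bigram for the y(w, v) values) and the start/end leaf indices 2 and 3 follow the conventions above, but the function itself is only an illustrative sketch.

    # Verify constraints C1 (incoming) and C2 (outgoing) for the leaf variables.
    def satisfies_c1_c2(y_leaf, y_bigram, start=2, end=3):
        for v, used in y_leaf.items():
            incoming = sum(c for (w, x), c in y_bigram.items() if x == v)
            outgoing = sum(c for (x, w), c in y_bigram.items() if x == v)
            if v != start and incoming != used:   # C1: one incoming bigram if used
                return False
            if v != end and outgoing != used:     # C2: one outgoing bigram if used
                return False
        return True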
43 Next, define Y′ as Y′ = {y : y satisfies constraints C0 and C1}. In this definition we have dropped the C2 constraints. [sent-203, score-0.223]
44 At each point the algorithm finds yt = arg maxy∈Y′ L(ut−1, y), where ut−1 are the Lagrange multipliers from the previous iteration. [sent-207, score-0.3]
45 If yt satisfies the C2 constraints in addition to C0 and C1, then it is returned as the output from the algorithm. [sent-208, score-0.265]
46 Intuitively, these updates encourage the values of yv and Σ_{w:⟨v,w⟩∈B} y(v, w) to be equal; formally, these updates correspond to subgradient steps. [sent-210, score-0.416]
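A hedged sketch of this update follows: for each leaf v, the multiplier u(v) moves in proportion to the gap between yv and the total outgoing-bigram count, with delta as the step size. The dictionary encodings are assumptions carried over from the sketch above.

    # Subgradient-style multiplier update (illustrative sketch).
    def update_multipliers(u, y_leaf, y_bigram, delta):
        out_count = {}
        for (v, w), c in y_bigram.items():        # outgoing bigram mass per leaf
            out_count[v] = out_count.get(v, 0) + c
        return {v: u[v] - delta * (y_leaf.get(v, 0) - out_count.get(v, 0))
                for v in u}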
47 • Using dynamic programming, find values for the yv and ye variables that form a valid derivation, and that maximize f′(y) = Σ_v (βv + αv) yv + Σ_e βe ye. [sent-213, score-0.458]
48 In the first step we compute the highest scoring incoming bigram for each leaf v. [sent-216, score-0.309]
49 In the second step we use conventional dynamic programming over the hypergraph to find an optimal derivation that incorporates weights from the first step. [sent-217, score-0.779]
50 The most important formal results are: 1) for any value of u, L(u) ≥ f(y∗) (hence the dual value provides an upper bound on the optimal primal value); 2) under an appropriate choice of the step sizes δt, the subgradient algorithm is guaranteed to converge to the minimum of L(u) (i.e., [sent-222, score-0.548]
51 we will minimize the upper bound, making it as tight as possible); 3) if at any point the algorithm in figure 1 finds a yt that satisfies the C2 constraints, then this is guaranteed to be the optimal primal solution. [sent-224, score-0.346]
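Result 1) is ordinary weak duality. Under the notation above, one plausible form of the Lagrangian (dualizing the dropped C2 constraints, consistent with the updates described earlier) makes the bound explicit; this reconstruction is an assumption, written in LaTeX:

    L(u) \;=\; \max_{y \in Y'} \Big( f(y) \;+\; \sum_{v} u(v)\,\Big( y_v \;-\; \sum_{w:\langle v,w\rangle \in B} y(v,w) \Big) \Big) \;\ge\; \max_{y \in Y} f(y) \;=\; f(y^*)

The inequality holds because Y ⊆ Y′ and every y ∈ Y makes each penalty term vanish under C2.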
52 1 A Sketch of the Algorithm A crucial idea in the new algorithm is that of paths between leaves in hypergraph derivations. [sent-235, score-0.614]
53 In addition, we will define g(y) = ⟨p0, v1, p1, v2, p2, v3, p3, . . . , pn−1, vn, pn⟩, where each pi is a path in the derivation between leaves vi and vi+1. [sent-240, score-0.604]
54 The path traces through the nonterminals that are between the two leaves in the tree. [sent-241, score-0.265]
55 The mapping from a derivation y to a path g(y) can be performed using the algorithm in figure 2. [sent-249, score-0.391]
56 For a given derivation y, define E(y) = {e : ye = 1}, and use E(y) as the set of input edges to this algorithm. [sent-250, score-0.347]
57 The output from the algorithm will be a set of states S, and a set of directed edges T, which together fully define the path g(y). [sent-251, score-0.306]
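One way to picture the states of g(y) is as an Euler tour of the derivation tree: a ⟨v ↓⟩ state is emitted on entering a vertex and a ⟨v ↑⟩ state on leaving it, so the path between consecutive leaves is a segment of the tour. The sketch below illustrates this picture only; it is not the figure-2 algorithm itself, and the children map is a hypothetical encoding.

    # Emit the down/up state sequence of a derivation tree (Euler tour sketch).
    def tour_states(v, children, out):
        out.append((v, "down"))            # corresponds to the <v down> state
        for c in children.get(v, []):
            tour_states(c, children, out)
        out.append((v, "up"))              # corresponds to the <v up> state
        return out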
58 In the simple algorithm, the first step was to predict the previous leaf for each leaf v, under a score that combined a language model score with a Lagrange multiplier score (i.e., θ(w, v) − u(w)). [sent-252, score-0.41]
59 In this section we describe an algorithm that for each leaf v again predicts the previous leaf, but in addition predicts the full path back to that leaf. [sent-255, score-0.395]
60 For example, rather than making a prediction for leaf 5 that it should be preceded by leaf 4, we would also predict the path ⟨4 ↑⟩ ⟨4 ↑, 2 ↑⟩ ⟨2 ↑, 5 ↓⟩ ⟨5 ↓⟩ between these two leaves. [sent-256, score-0.509]
61 Lagrange multipliers will be used to enforce consistency between these predictions (both paths and previous words) and a valid derivation. [sent-257, score-0.308]
62 The result is a directed graph (S, T) that contains all possible paths for valid derivations in (V, E) (it also contains additional, ill-formed paths). [sent-259, score-0.242]
63 1 A trigram path p is p = ⟨v1, p1, v2, p2, v3⟩ where: a) v1, v2, v3 ∈ VL; b) p1 is a path (sequence of states) between nodes ⟨v1 ↑⟩ and ⟨v2 ↓⟩ in the graph (S, T); c) p2 is a path between nodes ⟨v2 ↑⟩ and ⟨v3 ↓⟩ in the graph (S, T). [sent-261, score-0.671]
64 We define P to be the set of all trigram paths in (S, T). [sent-262, score-0.242]
65 Constraint D1 simply states that ys = 1 iff there is exactly one edge e in the derivation such that s ∈ S(e). [sent-272, score-0.258]
66 The remaining constraints enforce consistency between the leaves in the trigram paths and the yv values. [sent-274, score-0.382]
67 The Lagrangian relaxation algorithm is then derived in a similar way to before. [sent-276, score-0.333]
68 The key step in the algorithm at each iteration is to compute arg maxy∈Y′ L(ut−1, y). (Figure 4: the full Lagrangian relaxation algorithm.) [sent-280, score-0.435]
69 That is, for each v, compute the highest scoring trigram path ending in v. [sent-287, score-0.253]
70 Find values for the yv, ye and ys variables that form a valid derivation, and that maximize f′(y) = Σ_v (βv + αv) yv + Σ_e βe ye + Σ_s βs ys. [sent-289, score-0.214]
71 3. Set yp = 1 iff yv3(p) = 1 and p = α∗(v3(p)). The first step involves finding the highest scoring incoming trigram path for each leaf v. [sent-290, score-0.587]
72 This step can be performed efficiently using the Floyd-Warshall all-pairs shortest path algorithm (Floyd, 1962) over the graph (S, T); the details are given in the appendix. [sent-291, score-0.432]
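For reference, the textbook Floyd-Warshall recurrence is shown below; it handles positive and negative weights (absent negative cycles), and node weights such as u(s) can be folded into the incident edge weights beforehand. Finding a highest-scoring path amounts to running it on negated weights.

    # Standard Floyd-Warshall all-pairs shortest paths on a dense matrix.
    # w[i][j] holds the edge weight from i to j, or float("inf") if absent.
    def floyd_warshall(w):
        n = len(w)
        d = [row[:] for row in w]
        for k in range(n):                 # allow k as an intermediate node
            for i in range(n):
                for j in range(n):
                    if d[i][k] + d[k][j] < d[i][j]:
                        d[i][j] = d[i][k] + d[k][j]
        return d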
73 The second step involves conventional dynamic programming over the hypergraph (V, E) (it is simple to integrate the βs terms into this algorithm). [sent-293, score-0.349]
74 In the third step, the path variables yp are filled in. [sent-294, score-0.267]
75 The main steps of the algorithm are: 1) construction of the graph (S, T); 2) at each iteration, dynamic programming over the hypergraph (V, E); 3) at each iteration, all-pairs shortest path algorithms over the graph (S, T). [sent-297, score-0.989]
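A skeletal view of how these pieces compose per iteration, with all names hypothetical:

    # Skeleton of one outer iteration of the full algorithm (sketch).
    def full_iteration(u, all_pairs_paths, hypergraph_dp, fill_path_vars):
        alpha = all_pairs_paths(u)         # best incoming trigram path per leaf
        y = hypergraph_dp(alpha)           # dynamic programming over (V, E)
        return fill_path_vars(y, alpha)    # step 3: set the y_p variables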
76 The other steps—hypergraph dynamic programming, and all-pairs shortest path—are widely known algorithms that are simple to implement. [sent-301, score-0.202]
77 (e.g., see (Korte and Vygen, 2008)), L is the dual function for a particular LP relaxation arising from the definition of Y′ and the additional constraints D3–D6. [sent-305, score-0.383]
78 In some cases the LP relaxation has an integral solution, in which case the algorithm will return an optimal solution yt. [sent-306, score-0.426]
79 In other cases, when the LP relaxation has a fractional solution, the subgradient algorithm will still converge to the minimum of L, but the primal solutions yt will move between a number of solutions. [sent-307, score-0.772]
80 For a given y ∈ Y′, for any v with yv = 1, we can recover the previous two leaves (the trigram ending in v) from either the path variables yp, or the hypergraph variables ye. [sent-309, score-1.133]
81 Specifically, define v−1(v, y) to be the leaf preceding v in the trigram path p with yp = 1 and v3(p) = v, and v−2(v, y) to be the leaf two positions before v in the trigram path p with yp = 1 and v3(p) = v. [sent-310, score-1.062]
82 For a consistent solution, we require v−1(v, y) = v′−1(v, y) and v−2(v, y) = v′−2(v, y) for all v with yv = 1. [sent-314, score-0.28]
83 Unfortunately, explicitly enforcing all of these constraints would require exhaustive dynamic programming over the hypergraph using the (Bar-Hillel et al., 1964) construction. [sent-315, score-0.696]
84 Given a partition π of the leaves, we then add the following constraints to Y′: π(v−1(v, y)) = π(v′−1(v, y)) and π(v−2(v, y)) = π(v′−2(v, y)) for all v such that yv = 1. [sent-323, score-0.386]
85 To do this we implement the following steps: 1) run the subgradient algorithm until L is close to convergence; 2) then run the subgradient algorithm for m further iterations, keeping track of all pairs of leaf nodes that violate the constraints (i.e., pairs for which the two predictions disagree). [sent-328, score-0.721]
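A hedged sketch of the bookkeeping in steps 1)-2) follows; run_iteration and disagreements are assumed hooks into the main algorithm, and the returned pair set would drive the choice of the partition π.

    # Track constraint-violating leaf pairs over m extra iterations (sketch).
    def collect_violations(run_iteration, disagreements, m):
        violated = set()
        for _ in range(m):
            y = run_iteration()
            violated |= disagreements(y)   # leaf pairs whose predictions differ
        return violated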
86 7 Experiments We report experiments on translation from Chinese to English, using a tree-to-string model. (In fact, in our experiments we use the original hypergraph to compute admissible outside scores for an exact A* search algorithm for this problem.) [sent-338, score-0.539]
87 LR = Lagrangian relaxation; DP = exhaustive dynamic programming; ILP = integer linear programming; LP = linear programming (LP does not recover an exact solution). [sent-348, score-0.398]
88 In cases where the method does not converge within 200 iterations, we can return the best primal solution yt found by the algorithm during those iterations. [sent-361, score-0.331]
89 Figure 5 gives information on decoding time for our method and two other exact decoding methods: integer linear programming (using constraints D0–D6), and exhaustive dynamic programming using the construction of (Bar-Hillel et al., 1964). [sent-367, score-0.868]
90 Figure 5 also gives a speed comparison of our method to a linear programming (LP) solver that solves the LP relaxation defined by constraints D0–D6. [sent-386, score-0.518]
91 The Lagrangian relaxation method, when run without the tightening method of section 6, is solving a dual of the problem being solved by the LP solver. [sent-388, score-0.519]
92 Inspection of the tightening procedure shows that the number of partitions required (the parameter q) is generally quite small: 59% of examples that require tightening require q ≤ 6. [sent-392, score-0.342]
93 8 Conclusion We have described a Lagrangian relaxation algorithm for exact decoding of syntactic translation models, and shown that it is significantly more efficient than other exact algorithms for decoding tree-to-string models. [sent-394, score-0.835]
94 Additionally, our experiments used a trigram language model; however, the constraints in figure 3 generalize to higher-order language models. [sent-397, score-0.208]
95 This will allow us to apply shortest path algorithms to the graph, even though the weights u(s) and v(s) can be positive or negative. [sent-403, score-0.22]
96 The p∗u and scoreu∗ values can be calculated using an all-pairs shortest path algorithm, with weights u(s) on nodes in the graph. [sent-408, score-0.22]
97 Similarly, p∗v and scorev∗ can be computed using all-pairs shortest path with weights v(s) on the nodes. [sent-409, score-0.22]
98 Having calculated these values, define T(v) for any leaf v to be the set of trigrams (x, y, v) such that: 1) x, y ∈ VL; 2) there is a path from ⟨x ↑⟩ to ⟨y ↓⟩ and from ⟨y ↑⟩ to ⟨v ↓⟩ in the graph (S, T). [sent-410, score-0.263]
99 On dual decomposition and linear programming relaxations for natural language processing. [sent-519, score-0.298]
100 A fast finite-state relaxation method for enforcing global constraints on sequence decoding. [sent-546, score-0.374]
wordName wordTfidf (topN-words)
[('hypergraph', 0.349), ('yv', 0.28), ('relaxation', 0.268), ('lagrangian', 0.224), ('vl', 0.181), ('leaf', 0.179), ('derivation', 0.175), ('lp', 0.157), ('path', 0.151), ('lagrange', 0.149), ('decoding', 0.147), ('subgradient', 0.136), ('tightening', 0.136), ('multipliers', 0.132), ('maxy', 0.121), ('vn', 0.12), ('dual', 0.115), ('leaves', 0.114), ('vi', 0.11), ('cube', 0.108), ('programming', 0.107), ('constraints', 0.106), ('trigram', 0.102), ('dynamic', 0.096), ('yt', 0.096), ('paths', 0.086), ('exact', 0.083), ('ye', 0.082), ('yp', 0.072), ('arg', 0.071), ('rush', 0.07), ('shortest', 0.069), ('algorithm', 0.065), ('chiang', 0.064), ('satisfies', 0.063), ('derivations', 0.061), ('solutions', 0.061), ('solution', 0.059), ('graph', 0.058), ('primal', 0.057), ('iglesias', 0.056), ('scoreu', 0.056), ('huang', 0.056), ('define', 0.054), ('converge', 0.054), ('enforce', 0.053), ('step', 0.052), ('sontag', 0.052), ('ys', 0.051), ('ps', 0.051), ('synchronous', 0.05), ('iteration', 0.05), ('ha', 0.049), ('yhe', 0.049), ('bigram', 0.047), ('korte', 0.045), ('hv', 0.045), ('vertex', 0.045), ('variables', 0.044), ('gispert', 0.043), ('xv', 0.043), ('translation', 0.042), ('fst', 0.041), ('assumption', 0.04), ('decomposition', 0.039), ('partitions', 0.038), ('exhaustive', 0.038), ('linear', 0.037), ('allpairs', 0.037), ('banga', 0.037), ('isil', 0.037), ('maxw', 0.037), ('scorev', 0.037), ('sys', 0.037), ('vygen', 0.037), ('vyv', 0.037), ('wainwright', 0.037), ('eye', 0.037), ('hypergraphs', 0.037), ('ut', 0.037), ('valid', 0.037), ('formal', 0.036), ('jaakkola', 0.036), ('edges', 0.036), ('bi', 0.036), ('steps', 0.036), ('fractional', 0.035), ('sketch', 0.035), ('pruning', 0.034), ('integral', 0.034), ('violate', 0.034), ('bound', 0.034), ('pe', 0.033), ('vki', 0.033), ('upper', 0.033), ('finds', 0.032), ('edge', 0.032), ('seconds', 0.032), ('examples', 0.032), ('incoming', 0.031)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000011 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
Author: Alexander M. Rush ; Michael Collins
Abstract: We describe an exact decoding algorithm for syntax-based statistical translation. The approach uses Lagrangian relaxation to decompose the decoding problem into tractable subproblems, thereby avoiding exhaustive dynamic programming. The method recovers exact solutions, with certificates of optimality, on over 97% of test examples; it has comparable speed to state-of-the-art decoders.
2 0.2934567 106 acl-2011-Dual Decomposition for Natural Language Processing
Author: Alexander M. Rush and Michael Collins
Abstract: unkown-abstract
3 0.11993725 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation
Author: Nan Duan ; Mu Li ; Ming Zhou
Abstract: This paper presents hypothesis mixture decoding (HM decoding), a new decoding scheme that performs translation reconstruction using hypotheses generated by multiple translation systems. HM decoding involves two decoding stages: first, each component system decodes independently, with the explored search space kept for use in the next step; second, a new search space is constructed by composing existing hypotheses produced by all component systems using a set of rules provided by the HM decoder itself, and a new set of model independent features are used to seek the final best translation from this new search space. Few assumptions are made by our approach about the underlying component systems, enabling us to leverage SMT models based on arbitrary paradigms. We compare our approach with several related techniques, and demonstrate significant BLEU improvements in large-scale Chinese-to-English translation tasks.
4 0.1193803 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
Author: Xianchao Wu ; Takuya Matsuzaki ; Jun'ichi Tsujii
Abstract: In the present paper, we propose the effective usage of function words to generate generalized translation rules for forest-based translation. Given aligned forest-string pairs, we extract composed tree-to-string translation rules that account for multiple interpretations of both aligned and unaligned target function words. In order to constrain the exhaustive attachments of function words, we limit to bind them to the nearby syntactic chunks yielded by a target dependency parser. Therefore, the proposed approach can not only capture source-tree-to-target-chunk correspondences but can also use forest structures that compactly encode an exponential number of parse trees to properly generate target function words during decoding. Extensive experiments involving large-scale English-toJapanese translation revealed a significant im- provement of 1.8 points in BLEU score, as compared with a strong forest-to-string baseline system.
5 0.11567044 202 acl-2011-Learning Hierarchical Translation Structure with Linguistic Annotations
Author: Markos Mylonakis ; Khalil Sima'an
Abstract: While it is generally accepted that many translation phenomena are correlated with linguistic structures, employing linguistic syntax for translation has proven a highly non-trivial task. The key assumption behind many approaches is that translation is guided by the source and/or target language parse, employing rules extracted from the parse tree or performing tree transformations. These approaches enforce strict constraints and might overlook important translation phenomena that cross linguistic constituents. We propose a novel flexible modelling approach to introduce linguistic information of varying granularity from the source side. Our method induces joint probability synchronous grammars and estimates their parameters, by select- ing and weighing together linguistically motivated rules according to an objective function directly targeting generalisation over future data. We obtain statistically significant improvements across 4 different language pairs with English as source, mounting up to +1.92 BLEU for Chinese as target.
6 0.11549632 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
7 0.11269854 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation
8 0.11147678 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition
9 0.10613239 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
10 0.10048192 30 acl-2011-Adjoining Tree-to-String Translation
11 0.099593788 5 acl-2011-A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing
12 0.098840699 171 acl-2011-Incremental Syntactic Language Models for Phrase-based Translation
13 0.09750402 61 acl-2011-Binarized Forest to String Translation
14 0.091323912 217 acl-2011-Machine Translation System Combination by Confusion Forest
15 0.088794224 187 acl-2011-Jointly Learning to Extract and Compress
16 0.085614949 206 acl-2011-Learning to Transform and Select Elementary Trees for Improved Syntax-based Machine Translations
17 0.08469855 323 acl-2011-Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
18 0.081479669 250 acl-2011-Prefix Probability for Probabilistic Synchronous Context-Free Grammars
19 0.081179932 234 acl-2011-Optimal Head-Driven Parsing Complexity for Linear Context-Free Rewriting Systems
20 0.077641256 150 acl-2011-Hierarchical Text Classification with Latent Concepts
topicId topicWeight
[(0, 0.192), (1, -0.119), (2, 0.021), (3, -0.075), (4, 0.009), (5, -0.019), (6, -0.142), (7, 0.022), (8, -0.053), (9, -0.005), (10, 0.034), (11, 0.078), (12, 0.027), (13, 0.086), (14, -0.063), (15, 0.022), (16, -0.009), (17, -0.05), (18, -0.054), (19, -0.037), (20, 0.062), (21, -0.011), (22, 0.134), (23, 0.078), (24, -0.071), (25, -0.065), (26, -0.02), (27, 0.032), (28, 0.005), (29, 0.073), (30, -0.014), (31, 0.043), (32, -0.06), (33, 0.012), (34, -0.039), (35, 0.091), (36, 0.04), (37, -0.012), (38, -0.157), (39, -0.115), (40, 0.093), (41, -0.017), (42, -0.037), (43, -0.081), (44, 0.143), (45, -0.179), (46, -0.037), (47, -0.084), (48, 0.075), (49, 0.045)]
simIndex simValue paperId paperTitle
same-paper 1 0.94185525 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
Author: Alexander M. Rush ; Michael Collins
Abstract: We describe an exact decoding algorithm for syntax-based statistical translation. The approach uses Lagrangian relaxation to decompose the decoding problem into tractable subproblems, thereby avoiding exhaustive dynamic programming. The method recovers exact solutions, with certificates of optimality, on over 97% of test examples; it has comparable speed to state-of-the-art decoders.
2 0.87796974 106 acl-2011-Dual Decomposition for Natural Language Processing
Author: Alexander M. Rush and Michael Collins
Abstract: unkown-abstract
3 0.56408596 155 acl-2011-Hypothesis Mixture Decoding for Statistical Machine Translation
Author: Nan Duan ; Mu Li ; Ming Zhou
Abstract: This paper presents hypothesis mixture decoding (HM decoding), a new decoding scheme that performs translation reconstruction using hypotheses generated by multiple translation systems. HM decoding involves two decoding stages: first, each component system decodes independently, with the explored search space kept for use in the next step; second, a new search space is constructed by composing existing hypotheses produced by all component systems using a set of rules provided by the HM decoder itself, and a new set of model independent features are used to seek the final best translation from this new search space. Few assumptions are made by our approach about the underlying component systems, enabling us to leverage SMT models based on arbitrary paradigms. We compare our approach with several related techniques, and demonstrate significant BLEU improvements in large-scale Chinese-to-English translation tasks.
4 0.51852816 166 acl-2011-Improving Decoding Generalization for Tree-to-String Translation
Author: Jingbo Zhu ; Tong Xiao
Abstract: To address the parse error issue for tree-tostring translation, this paper proposes a similarity-based decoding generation (SDG) solution by reconstructing similar source parse trees for decoding at the decoding time instead of taking multiple source parse trees as input for decoding. Experiments on Chinese-English translation demonstrated that our approach can achieve a significant improvement over the standard method, and has little impact on decoding speed in practice. Our approach is very easy to implement, and can be applied to other paradigms such as tree-to-tree models. 1
5 0.50991303 220 acl-2011-Minimum Bayes-risk System Combination
Author: Jesus Gonzalez-Rubio ; Alfons Juan ; Francisco Casacuberta
Abstract: We present minimum Bayes-risk system combination, a method that integrates consensus decoding and system combination into a unified multi-system minimum Bayes-risk (MBR) technique. Unlike other MBR methods that re-rank translations of a single SMT system, MBR system combination uses the MBR decision rule and a linear combination of the component systems’ probability distributions to search for the minimum risk translation among all the finite-length strings over the output vocabulary. We introduce expected BLEU, an approximation to the BLEU score that allows to efficiently apply MBR in these conditions. MBR system combination is a general method that is independent of specific SMT models, enabling us to combine systems with heterogeneous structure. Experiments show that our approach bring significant improvements to single-system-based MBR decoding and achieves comparable results to different state-of-the-art system combination methods.
6 0.49972597 234 acl-2011-Optimal Head-Driven Parsing Complexity for Linear Context-Free Rewriting Systems
7 0.49377492 235 acl-2011-Optimal and Syntactically-Informed Decoding for Monolingual Phrase-Based Alignment
8 0.48565024 217 acl-2011-Machine Translation System Combination by Confusion Forest
9 0.47353297 268 acl-2011-Rule Markov Models for Fast Tree-to-String Translation
10 0.4572683 154 acl-2011-How to train your multi bottom-up tree transducer
11 0.45681405 239 acl-2011-P11-5002 k2opt.pdf
12 0.44770896 187 acl-2011-Jointly Learning to Extract and Compress
13 0.44470501 5 acl-2011-A Comparison of Loopy Belief Propagation and Dual Decomposition for Integrated CCG Supertagging and Parsing
14 0.43772206 221 acl-2011-Model-Based Aligner Combination Using Dual Decomposition
15 0.43255335 323 acl-2011-Unsupervised Part-of-Speech Tagging with Bilingual Graph-Based Projections
16 0.42620304 342 acl-2011-full-for-print
17 0.41789648 303 acl-2011-Tier-based Strictly Local Constraints for Phonology
18 0.41399387 150 acl-2011-Hierarchical Text Classification with Latent Concepts
19 0.40861502 30 acl-2011-Adjoining Tree-to-String Translation
20 0.40073359 110 acl-2011-Effective Use of Function Words for Rule Generalization in Forest-Based Translation
topicId topicWeight
[(5, 0.015), (17, 0.058), (26, 0.261), (37, 0.095), (39, 0.051), (41, 0.058), (55, 0.027), (59, 0.043), (62, 0.013), (64, 0.063), (72, 0.024), (91, 0.041), (96, 0.129)]
simIndex simValue paperId paperTitle
1 0.95515853 105 acl-2011-Dr Sentiment Knows Everything!
Author: Amitava Das ; Sivaji Bandyopadhyay
Abstract: Sentiment analysis is one of the hot demanding research areas since last few decades. Although a formidable amount of research have been done, the existing reported solutions or available systems are still far from perfect or do not meet the satisfaction level of end users’ . The main issue is the various conceptual rules that govern sentiment and there are even more clues (possibly unlimited) that can convey these concepts from realization to verbalization of a human being. Human psychology directly relates to the unrevealed clues and governs the sentiment realization of us. Human psychology relates many things like social psychology, culture, pragmatics and many more endless intelligent aspects of civilization. Proper incorporation of human psychology into computational sentiment knowledge representation may solve the problem. In the present paper we propose a template based online interactive gaming technology, called Dr Sentiment to automatically create the PsychoSentiWordNet involving internet population. The PsychoSentiWordNet is an extension of SentiWordNet that presently holds human psychological knowledge on a few aspects along with sentiment knowledge.
2 0.92636722 115 acl-2011-Engkoo: Mining the Web for Language Learning
Author: Matthew R. Scott ; Xiaohua Liu ; Ming Zhou ; Microsoft Engkoo Team
Abstract: This paper presents Engkoo 1, a system for exploring and learning language. It is built primarily by mining translation knowledge from billions of web pages - using the Internet to catch language in motion. Currently Engkoo is built for Chinese users who are learning English; however the technology itself is language independent and can be extended in the future. At a system level, Engkoo is an application platform that supports a multitude of NLP technologies such as cross language retrieval, alignment, sentence classification, and statistical machine translation. The data set that supports this system is primarily built from mining a massive set of bilingual terms and sentences from across the web. Specifically, web pages that contain both Chinese and English are discovered and analyzed for parallelism, extracted and formulated into clear term definitions and sample sentences. This approach allows us to build perhaps the world’s largest lexicon linking both Chinese and English together - at the same time covering the most up-to-date terms as captured by the net.
3 0.90796816 259 acl-2011-Rare Word Translation Extraction from Aligned Comparable Documents
Author: Emmanuel Prochasson ; Pascale Fung
Abstract: We present a first known result of high precision rare word bilingual extraction from comparable corpora, using aligned comparable documents and supervised classification. We incorporate two features, a context-vector similarity and a co-occurrence model between words in aligned documents in a machine learning approach. We test our hypothesis on different pairs of languages and corpora. We obtain very high F-Measure between 80% and 98% for recognizing and extracting correct translations for rare terms (from 1to 5 occurrences). Moreover, we show that our system can be trained on a pair of languages and test on a different pair of languages, obtaining a F-Measure of 77% for the classification of Chinese-English translations using a training corpus of Spanish-French. Our method is therefore even potentially applicable to low resources languages without training data.
4 0.90742838 253 acl-2011-PsychoSentiWordNet
Author: Amitava Das
Abstract: Sentiment analysis is one of the hot demanding research areas since last few decades. Although a formidable amount of research has been done but still the existing reported solutions or available systems are far from perfect or to meet the satisfaction level of end user's. The main issue may be there are many conceptual rules that govern sentiment, and there are even more clues (possibly unlimited) that can convey these concepts from realization to verbalization of a human being. Human psychology directly relates to the unrevealed clues; govern the sentiment realization of us. Human psychology relates many things like social psychology, culture, pragmatics and many more endless intelligent aspects of civilization. Proper incorporation of human psychology into computational sentiment knowledge representation may solve the problem. PsychoSentiWordNet is an extension over SentiWordNet that holds human psychological knowledge and sentiment knowledge simultaneously. 1
5 0.8794803 333 acl-2011-Web-Scale Features for Full-Scale Parsing
Author: Mohit Bansal ; Dan Klein
Abstract: Counts from large corpora (like the web) can be powerful syntactic cues. Past work has used web counts to help resolve isolated ambiguities, such as binary noun-verb PP attachments and noun compound bracketings. In this work, we first present a method for generating web count features that address the full range of syntactic attachments. These features encode both surface evidence of lexical affinities as well as paraphrase-based cues to syntactic structure. We then integrate our features into full-scale dependency and constituent parsers. We show relative error reductions of7.0% over the second-order dependency parser of McDonald and Pereira (2006), 9.2% over the constituent parser of Petrov et al. (2006), and 3.4% over a non-local constituent reranker.
same-paper 6 0.83198678 123 acl-2011-Exact Decoding of Syntactic Translation Models through Lagrangian Relaxation
7 0.77698463 70 acl-2011-Clustering Comparable Corpora For Bilingual Lexicon Extraction
8 0.74632204 258 acl-2011-Ranking Class Labels Using Query Sessions
9 0.72054112 271 acl-2011-Search in the Lost Sense of "Query": Question Formulation in Web Search Queries and its Temporal Changes
10 0.70992386 182 acl-2011-Joint Annotation of Search Queries
11 0.70909142 331 acl-2011-Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation
12 0.70622826 137 acl-2011-Fine-Grained Class Label Markup of Search Queries
13 0.6975885 181 acl-2011-Jigs and Lures: Associating Web Queries with Structured Entities
14 0.69479299 256 acl-2011-Query Weighting for Ranking Model Adaptation
15 0.69016206 36 acl-2011-An Efficient Indexer for Large N-Gram Corpora
16 0.67927587 34 acl-2011-An Algorithm for Unsupervised Transliteration Mining with an Application to Word Alignment
17 0.67521691 193 acl-2011-Language-independent compound splitting with morphological operations
18 0.6684342 177 acl-2011-Interactive Group Suggesting for Twitter
19 0.66463804 191 acl-2011-Knowledge Base Population: Successful Approaches and Challenges
20 0.66417944 246 acl-2011-Piggyback: Using Search Engines for Robust Cross-Domain Named Entity Recognition