emnlp emnlp2011 emnlp2011-51 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yin-Wen Chang ; Michael Collins
Abstract: This paper describes an algorithm for exact decoding of phrase-based translation models, based on Lagrangian relaxation. The method recovers exact solutions, with certificates of optimality, on over 99% of test examples. The method is much more efficient than approaches based on linear programming (LP) or integer linear programming (ILP) solvers: these methods are not feasible for anything other than short sentences. We compare our method to MOSES (Koehn et al., 2007), and give precise estimates of the number and magnitude of search errors that MOSES makes.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper describes an algorithm for exact decoding of phrase-based translation models, based on Lagrangian relaxation. [sent-3, score-0.4]
2 The method is much more efficient than approaches based on linear programming (LP) or integer linear programming (ILP) solvers: these methods are not feasible for anything other than short sentences. [sent-5, score-0.337]
3 The decoding problem for phrase-based models is NP-hard;1 because of this, previous work has generally focused on approximate search methods, for example variants of beam search, for decoding. [sent-12, score-0.325]
4 This paper describes an algorithm for exact decoding of phrase-based models, based on Lagrangian relaxation (Lemaréchal, 2001). [sent-13, score-0.657]
5 The core of the algorithm is a dynamic program for phrase-based translation which is efficient, but which allows some ill-formed translations. [sent-14, score-0.465]
6 Lagrangian relaxation is used to enforce the constraint that each word is translated exactly once. (Footnote 1: we refer here to the phrase-based models of Koehn et al.) [sent-16, score-0.388]
7 A subgradient algorithm is used to optimize the dual problem arising from the relaxation. [sent-23, score-0.427]
8 The first technical contribution of this paper is the basic Lagrangian relaxation algorithm. [sent-24, score-0.352]
9 By the usual guarantees for Lagrangian relaxation, if this algorithm converges to a solution where all constraints are satisfied (i.e. [sent-25, score-0.283]
10 , where each word is translated exactly once), then the solution is guaranteed to be optimal. [sent-27, score-0.29]
11 For some source-language sentences, however, the underlying relaxation is loose, and the algorithm will not converge. [sent-28, score-0.4]
12 The second technical contribution of this paper is a method that incrementally adds constraints to the underlying dynamic program, thereby tightening the relaxation until an exact solution is recovered. [sent-29, score-0.938]
13 We compare to a linear programming (LP)/integer linear programming (ILP) based decoder. [sent-37, score-0.292]
14 Our method is much more efficient: LP or ILP decoding is not feasible for anything other than short sentences,2 whereas the average decoding time for our method (for sentences of length 1-50 words) is 121 seconds per sentence. [sent-38, score-0.362]
15 2 Related Work Lagrangian relaxation is a classical technique for solving combinatorial optimization problems (Korte and Vygen, 2008; Lemaréchal, 2001). [sent-46, score-0.41]
16 Recently, Rush and Collins (2011) describe a Lagrangian relaxation algorithm for decoding for syntactic translation; the algorithmic construction described in the current paper is, however, very different in nature to this work. [sent-54, score-0.581]
17 Beam search algorithms (Koehn et al., 2003) are the most commonly used decoding algorithms for phrase-based models. [sent-56, score-0.229]
18 Exact decoding via integer linear programming (ILP) for IBM model 4 (Brown et al. [sent-59, score-0.372]
19 Zaslavskiy et al. (2009) formulate the phrase-based decoding problem as a traveling salesman problem (TSP), and take advantage of existing exact and approximate approaches designed for TSP. [sent-64, score-0.325]
20 Their translation experiment uses a bigram language model and applies an approximate algorithm for TSP. [sent-65, score-0.196]
21 The idea of incrementally adding constraints to tighten a relaxation until it is exact is a core idea in combinatorial optimization. [sent-70, score-0.691]
22 3 The Phrase-based Translation Model This section establishes notation for phrase-based translation models, and gives a definition of the decoding problem. [sent-73, score-0.276]
23 For two consecutive phrases pk = (s, t, e) and pk+1 = (s′, t′, e′), the distortion distance is defined as δ(t, s′) = |t + 1 − s′|. [sent-101, score-0.29]
24 The score for a translation is then defined as f(y) = h(e(y)) + Σ_{k=1}^{L} g(pk) + Σ_{k=1}^{L−1} η × δ(t(pk), s(pk+1)), where η ∈ R is often referred to as the distortion penalty, and typically takes a negative value. [sent-102, score-0.216]
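To make the scoring function concrete, here is a minimal Python sketch; `lm_score` (standing in for h), `phrase_score` (standing in for g), and the `(s, t, e)` phrase triple are hypothetical interfaces assumed for illustration, not the paper's code.

```python
def delta(t, s_next):
    """Distortion distance between consecutive phrases: |t + 1 - s'|."""
    return abs(t + 1 - s_next)

def derivation_score(phrases, lm_score, phrase_score, eta=-1.0):
    """Sketch of f(y) = h(e(y)) + sum_k g(p_k) + sum_k eta * delta(t(p_k), s(p_{k+1})).

    Each phrase is a triple (s, t, e); `lm_score` scores the concatenated
    English string and `phrase_score` scores a single phrase (both assumed
    to be supplied by the model).  eta is the distortion penalty, typically
    negative.
    """
    english = " ".join(e for (_, _, e) in phrases)
    score = lm_score(english)
    score += sum(phrase_score(p) for p in phrases)
    for k in range(len(phrases) - 1):
        _, t, _ = phrases[k]
        s_next, _, _ = phrases[k + 1]
        score += eta * delta(t, s_next)
    return score
```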
25 An exact dynamic programming algorithm for this problem uses states (w1, w2, b, r), where (w1, w2) is a target-language bigram that the partial translation ended with, b is a bit-string denoting which source-language words have been translated, and r is the end position of the previous phrase. [sent-124, score-0.699]
26 The bigram (w1, w2) is needed for calculation of trigram language model scores; r is needed to enforce the distortion limit, and to calculate distortion costs. [sent-128, score-0.295]
27 Since the number of possible bit-strings is exponential in the length of the sentence, exhaustive dynamic programming is in general intractable. [sent-130, score-0.309]
28 Instead, people commonly use heuristic search methods such as beam search for decoding. [sent-131, score-0.197]
29 4 A Decoding Algorithm based on Lagrangian Relaxation We now describe a decoding algorithm for phrase-based translation, based on Lagrangian relaxation. [sent-133, score-0.229]
30 We first describe a dynamic program for decoding which is efficient, but which relaxes the y(i) = 1 constraints described in the previous section. [sent-136, score-0.605]
31 We then describe the Lagrangian relaxation algorithm, which introduces Lagrange multipliers for each constraint of the form y(i) = 1, and uses a subgradient algorithm to minimize the dual arising from the relaxation. [sent-137, score-0.908]
32 4.1 An Efficient Dynamic Program As described in the previous section, our goal is to find the optimal translation y∗ = arg max_{y∈Y} f(y). [sent-140, score-0.194]
33 We now give the dynamic program that defines Y0. [sent-151, score-0.322]
34 The main idea will be to replace bit-strings (as described in the previous section) by a much smaller number of dynamic programming states. [sent-152, score-0.309]
35 Specifically, the states of the new dynamic program will be tuples (w1, w2, n, l,m, r). [sent-153, score-0.389]
36 The integer n is the number of words that have been translated thus far in the dynamic programming algorithm. [sent-155, score-0.498]
37 The pair (l, m) specifies a span x_l . . . x_m in the source-language sentence; this span is the last contiguous span of words that have been translated thus far. [sent-159, score-0.346]
38 The dynamic program can be viewed as a shortest-path problem in a directed graph, with nodes in the graph corresponding to states (w1, w2 , n, l, m, r) . [sent-160, score-0.389]
39 The transition for a phrase p = (s, t, e) with e = e_1 . . . e_M is: (w1′, w2′) = (w2, e_1) if M = 1, or (e_{M−1}, e_M) if M ≥ 2; n′ = n + t − s + 1; (l′, m′) = (l, t) if s = m + 1, otherwise (s, t); r′ = t. The new target-language bigram (w1′, w2′) is the last two words of the partial translation after including phrase p. [sent-170, score-0.199]
40 The score of the transition is given by a sum of the phrase translation score g(p), the language model score, and the distortion cost η × δ(r, s). [sent-174, score-0.307]
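A minimal sketch of the state tuple and transition of this relaxed dynamic program, under the assumptions that phrases are (s, t, e) triples and that the distortion limit is d; the helper is illustrative only and omits the scoring terms, which would be added exactly as described above.

```python
from collections import namedtuple

# State of the relaxed dynamic program: last target-language bigram (w1, w2),
# number of source words translated so far n, last contiguous translated
# span (l, m), and end position r of the previous phrase.
State = namedtuple("State", ["w1", "w2", "n", "l", "m", "r"])

def advance(state, phrase, d=4):
    """Apply one phrase p = (s, t, e) to a state, returning the new state,
    or None if the distortion limit d is violated.  Scores (g(p), language
    model, eta * delta(r, s)) are omitted here for brevity."""
    s, t, e = phrase
    words = e.split()
    if abs(state.r + 1 - s) > d:                  # distortion limit
        return None
    # New bigram: (w2, e1) if the phrase has one word, else (e_{M-1}, e_M).
    if len(words) >= 2:
        w1, w2 = words[-2], words[-1]
    else:
        w1, w2 = state.w2, words[-1]
    n = state.n + t - s + 1                       # words translated so far
    # Last contiguous span: extend it if the phrase is adjacent, else reset.
    l, m = (state.l, t) if s == state.m + 1 else (s, t)
    return State(w1, w2, n, l, m, t)
```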
41 Other dynamic programs, and definitions of Y0, are possible: for example, an alternative would be to use a dynamic program with states (w1, w2, n, r). [sent-198, score-0.583]
42 In experiments we have found that including the previous span (l, m) in the dynamic program leads to faster convergence of the subgradient algorithm described in the next section, and in general to more stable results. [sent-200, score-0.648]
43 This is in spite of the dynamic program being larger; it is no doubt due to Y0 being a better approximation of Y. [sent-201, score-0.322]
44 4.2 The Lagrangian Relaxation Algorithm We now describe the Lagrangian relaxation decoding algorithm for the phrase-based model. [sent-203, score-0.581]
45 The problem can be stated as: arg max_{y∈Y0} f(y) such that ∀i, y(i) = 1. We use Lagrangian relaxation (Korte and Vygen, 2008) to deal with the y(i) = 1 constraints. [sent-207, score-0.352]
46 We now describe an algorithm that solves the dual problem. [sent-214, score-0.239]
47 By standard results for Lagrangian relaxation (Korte and Vygen, 2008), L(u) is a convex function; it can be minimized by a subgradient method. [sent-215, score-0.509]
48 A subgradient method is an iterative method for minimizing L(u), which performs updates u_t ← u_{t−1} − α_t γ_{u_{t−1}}, where α_t > 0 is the step size for the t'th subgradient step. [sent-220, score-0.384]
49 The maximization y_u = arg max_{y∈Y0} ( f(y) + Σ_i u(i)(y(i) − 1) ) needed to compute the subgradient can be carried out by the dynamic program described in the previous section. [sent-224, score-0.322]
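A minimal sketch of the subgradient loop, assuming a function `decode_relaxed(u)` that runs the dynamic program over Y0 on the modified objective f(y) + Σ_i u(i)(y(i) − 1) and returns the word counts y_u(i) of the best derivation together with its objective value L(u); both the function and the simple diminishing step size are assumptions, not the paper's implementation (the paper's step-size rule is described in its Appendix A).

```python
def lagrangian_relaxation(decode_relaxed, n_words, max_iters=200, alpha0=1.0):
    """Subgradient minimization of the dual L(u).  Returns (u, counts) where
    counts[i] == 1 for all i certifies an optimal solution, or (u, None) if
    the relaxation did not converge within max_iters iterations."""
    u = [0.0] * n_words
    for t in range(1, max_iters + 1):
        # dual_value = L(u); the paper adapts the step size using it (Appendix A).
        counts, dual_value = decode_relaxed(u)
        # Subgradient of L at u: gamma(i) = y_u(i) - 1.
        gamma = [counts[i] - 1 for i in range(n_words)]
        if all(g == 0 for g in gamma):
            return u, counts              # every word translated exactly once
        alpha = alpha0 / t                # simple diminishing step size
        u = [u[i] - alpha * gamma[i] for i in range(n_words)]
    return u, None
```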
50 4.3 Properties We now give some theorems stating formal properties of the Lagrangian relaxation algorithm. [sent-228, score-0.392]
51 These results for Lagrangian relaxation are well known: for completeness, we state them here. [sent-229, score-0.352]
52 Define y∗ = arg max_{y∈Y} f(y). Our first theorem states that the dual function provides an upper bound on the score for the optimal translation, f(y∗): Theorem 1. [sent-231, score-0.438]
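Written out under the notation above (a standard Lagrangian relaxation statement; the paper's exact presentation may differ slightly), the dual is L(u) = max_{y∈Y0} ( f(y) + Σ_{i=1}^{N} u(i)(y(i) − 1) ), and Theorem 1 asserts that L(u) ≥ f(y∗) for every u ∈ R^N, since y∗ ∈ Y0 and y∗(i) = 1 for all i.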
53 For each value of t we show the dual value L(ut−1), the derivation yt, and the number of times each word is translated, yt(i) for i = 1 . . . N. [sent-245, score-0.246]
54 For each phrase in a derivation we show the English string e, together with the span (s, t): for example, the first phrase in the first derivation has English string the quality and, and span (3, 6). [sent-249, score-0.374]
55 Our final theorem states that if at any iteration the algorithm finds a solution yt such that yt(i) = 1 for i = 1 . . . N, then yt is the optimal solution. [sent-254, score-0.47]
56 But y∗ = arg max_{y∈Y} f(y), and yu ∈ Y, hence we must also have f(yu) ≤ f(y∗). [sent-276, score-0.215]
57 In some cases, however, the algorithm in Figure 1 may not return a solution yt such that yt(i) = 1 for all i. [sent-278, score-0.275]
58 In the second case, the underlying relaxation may not be tight, in that there may not be any settings u for the Lagrange multipliers such that yu(i) = 1 for all i. [sent-281, score-0.481]
59 Section 5 describes a method for tightening the underlying relaxation by introducing hard constraints (of the form y(i) = 1 for selected values of i). [sent-282, score-0.573]
60 We will see that this method is highly effective in tightening the relaxation until the algorithm converges to an optimal solution. [sent-283, score-0.59]
61 At each iteration, the Lagrange multipliers are updated to encourage each word to be translated once. [sent-287, score-0.273]
62 On this example, the algorithm converges to a solution where all words are translated exactly once, and the solution is guaranteed to be optimal. [sent-288, score-0.471]
63 We now describe a method that incrementally tightens the Lagrangian relaxation algorithm until it provides an exact answer. [sent-293, score-0.528]
64 In cases that do not converge, we introduce hard constraints to force certain words to be translated exactly once in the dynamic programming solver. [sent-294, score-0.622]
65 The step size α_t > 0 decreases each time the dual value increases from one iteration to the next; see Appendix A. [sent-303, score-0.238]
66 We can again run a Lagrangian relaxation algorithm, using the set YC0 in place of Y0. [sent-320, score-0.382]
67 We will use Lagrange multipliers u(i) to enforce the constraints y(i) = 1 for i ∉ C. [sent-321, score-0.231]
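One natural way to realize the constrained set YC0, consistent with the description here (an assumption about the construction, not the paper's code), is to pair each dynamic-program state with a set recording which of the constrained positions in C have already been translated; transitions that would translate a constrained word a second time are blocked, and a goal state must cover all of C. A sketch, written as a wrapper around a base transition function such as the `advance` helper above:

```python
def advance_constrained(base_advance, state, covered, phrase, C):
    """Extend a base transition `base_advance(state, phrase)` with a frozenset
    `covered` of positions in the constraint set C that are already translated.
    Returns (new_state, new_covered), or None if the move is disallowed."""
    s, t, _ = phrase
    touched = frozenset(i for i in range(s, t + 1) if i in C)
    if touched & covered:                 # a constrained word translated twice
        return None
    new_state = base_advance(state, phrase)
    if new_state is None:                 # e.g. distortion limit violated
        return None
    return new_state, covered | touched
```

Because the extra bookkeeping is only over C, the state space stays small as long as few constraints are added.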
68 However, assume that we have not yet generated a solution yt such that yt(i) = 1 for all i. [sent-329, score-0.227]
69 In this case we have some evidence that the relaxation may not be tight, and that we need to add some constraints. [sent-330, score-0.352]
70 To answer this question, we run the subgradient algorithm for K more iterations. [sent-332, score-0.266]
71 In an initial phase the algorithm runs subgradient steps, while the dual is still improving. [sent-345, score-0.396]
72 If at any stage the algorithm finds a solution y∗ such that y∗(i) = 1 for all i, then this is the solution to our original problem, arg max_{y∈Y} f(y). [sent-347, score-0.252]
73 (Footnote 4: formal justification for the method comes from the relationship between Lagrangian relaxation and linear programming relaxations.) [sent-360, score-0.498]
74 In cases where the relaxation is not tight, the subgradient method will essentially move between solutions whose convex combination forms a fractional solution to an underlying LP relaxation (Nedić and Ozdaglar, 2009). [sent-361, score-1.023]
75 Note that a maximum of 3 constraints are added at each recursive call, but that fewer than 3 constraints are added in cases where fewer than 3 constraints have count(i) > 0. [sent-374, score-0.306]
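A sketch of the selection step just described, assuming `recent_counts` is a list holding, for each of the K extra iterations, the vector y_t(i) of how many times each word was translated (hypothetical bookkeeping around the subgradient loop above):

```python
def select_constraints(recent_counts, C, max_new=3):
    """Count how often each unconstrained word violated y(i) = 1 over the last
    K iterations and add up to `max_new` of the most violated words to C."""
    n_words = len(recent_counts[0])
    count = [0] * n_words
    for y in recent_counts:                   # one y_t vector per iteration
        for i in range(n_words):
            if i not in C and y[i] != 1:
                count[i] += 1
    candidates = [i for i in range(n_words) if count[i] > 0]
    candidates.sort(key=lambda i: -count[i])  # most violated first
    return set(C) | set(candidates[:max_new])
```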
76 We do this by recording the first and second best dual values L(u′) and L(u″) in the sequence of Lagrange multipliers u1, u2, . . [sent-381, score-0.32]
77 When C ≠ ∅, A* search can be used for decoding, with the dynamic program for Y0 providing admissible estimates for the dynamic program for YC0. [sent-393, score-0.697]
78 We compare to MOSES (Koehn et al., 2007), a phrase-based decoder using beam search, and to a general-purpose integer linear programming (ILP) solver, which solves the problem exactly. [sent-398, score-0.282]
79 The distortion limit d is set to be four, and we prune the phrase translation table to have 10 English phrases per German phrase. [sent-403, score-0.267]
80 7%) sentences, the method converges without adding hard constraints to tighten the relaxation. [sent-409, score-0.243]
81 As expected, decoding times increase as the length of sentences, and the number of constraints required, increase. [sent-414, score-0.283]
82 Y0 indicates the dynamic program using the set Y0 as defined in Section 4.1, [sent-430, score-0.322]
83 and Y00 indicates the dynamic program using states (w1, w2, n, r). [sent-432, score-0.322]
84 1 Comparison to an LP/ILP solver To compare to a linear programming (LP) or integer linear programming (ILP) solver, we can implement the dynamic program (search over the set Y0) through linear constraints, with a linear objective. [sent-435, score-0.753]
85 Hence we can encode our relaxation within an LP or ILP. [sent-437, score-0.352]
86 We also compare to an LP or ILP where the dynamic program makes use of states (w1, w2, n, r), i.e. [sent-439, score-0.389]
87 , the span (l, m) is dropped, making the dynamic program smaller. [sent-441, score-0.403]
88 Results for MOSES-nogc: Table 5 shows the number of examples where MOSES-nogc fails to give a translation, and the number of search errors for those cases where it does give a translation, for a range of beam sizes. [sent-484, score-0.211]
89 A search error is defined as a case where our algorithm produces an exact solution that has higher score than the output from MOSES-nogc. [sent-485, score-0.254]
90 We also show the translation score from MOSES, and from the optimal derivation, for those sentences where a search error is made. [sent-495, score-0.197]
91 13%) when the beam size is 100, and no search errors when the beam size is 200, 1,000, or 10,000. [sent-502, score-0.268]
92 The BLEU scores under the two decoders are almost identical; hence while MOSES makes a significant proportion of search errors, these search errors appear to be benign in terms of their impact on BLEU scores, at least for this particular translation model. [sent-506, score-0.277]
93 7 Conclusions We have described an exact decoding algorithm for phrase-based translation models, using Lagrangian relaxation. (Table 7: BLEU score comparisons for MOSES-nogc across beam sizes.) [sent-508, score-0.4]
94 Possible extensions to the approach include methods that incorporate the Lagrangian relaxation formulation within learning algorithms for statistical MT: we see this as an interesting avenue for future research. [sent-514, score-0.352]
95 Approximate primal solutions and rate analysis for dual subgradient methods. [sent-576, score-0.397]
96 Revisiting optimal decoding for machine translation IBM model 4. [sent-602, score-0.325]
97 Exact decoding of syntactic translation models through Lagrangian relaxation. [sent-608, score-0.276]
98 On dual decomposition and linear programming relaxations for natural language processing. [sent-612, score-0.368]
99 Word reordering and a dynamic programming beam search algorithm for statistical machine translation. [sent-626, score-0.501]
100 A fast finite-state relaxation method for enforcing global constraints on sequence decoding. [sent-635, score-0.454]
wordName wordTfidf (topN-words)
[('lagrangian', 0.401), ('relaxation', 0.352), ('dynamic', 0.194), ('dual', 0.191), ('decoding', 0.181), ('pk', 0.169), ('yu', 0.165), ('moses', 0.163), ('subgradient', 0.157), ('yt', 0.15), ('translated', 0.144), ('lagrange', 0.129), ('multipliers', 0.129), ('program', 0.128), ('distortion', 0.121), ('ilp', 0.117), ('programming', 0.115), ('lp', 0.113), ('koehn', 0.112), ('constraints', 0.102), ('korte', 0.099), ('xiu', 0.099), ('translation', 0.095), ('maxy', 0.094), ('beam', 0.091), ('converge', 0.088), ('tightening', 0.085), ('theorem', 0.081), ('span', 0.081), ('vygen', 0.079), ('solution', 0.077), ('exact', 0.076), ('ut', 0.07), ('states', 0.067), ('lemar', 0.059), ('rush', 0.058), ('combinatorial', 0.058), ('tommi', 0.057), ('converges', 0.056), ('derivation', 0.055), ('tillmann', 0.054), ('bigram', 0.053), ('search', 0.053), ('incrementally', 0.052), ('tighten', 0.051), ('sontag', 0.051), ('phrase', 0.051), ('arg', 0.05), ('solutions', 0.049), ('optimal', 0.049), ('algorithm', 0.048), ('iteration', 0.047), ('aryg', 0.046), ('integer', 0.045), ('gap', 0.043), ('decoders', 0.043), ('riedel', 0.041), ('transition', 0.04), ('jaakkola', 0.04), ('contiguous', 0.04), ('atvhee', 0.04), ('echal', 0.04), ('nedi', 0.04), ('theorems', 0.04), ('equality', 0.038), ('guaranteed', 0.036), ('christoph', 0.036), ('fractional', 0.036), ('xl', 0.036), ('tight', 0.036), ('constraint', 0.036), ('fails', 0.034), ('hard', 0.034), ('salesman', 0.034), ('traveling', 0.034), ('zaslavskiy', 0.034), ('argmy', 0.034), ('ayx', 0.034), ('limt', 0.034), ('stroudsburg', 0.034), ('och', 0.034), ('exactly', 0.033), ('philipp', 0.033), ('errors', 0.033), ('solver', 0.032), ('iterations', 0.031), ('decomposition', 0.031), ('nikos', 0.031), ('certificates', 0.031), ('komodakis', 0.031), ('recovers', 0.031), ('wainwright', 0.031), ('arising', 0.031), ('blackwood', 0.031), ('germann', 0.031), ('linear', 0.031), ('bleu', 0.03), ('run', 0.03), ('magnitude', 0.03), ('josef', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000005 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation
Author: Yin-Wen Chang ; Michael Collins
Abstract: This paper describes an algorithm for exact decoding of phrase-based translation models, based on Lagrangian relaxation. The method recovers exact solutions, with certificates of optimality, on over 99% of test examples. The method is much more efficient than approaches based on linear programming (LP) or integer linear programming (ILP) solvers: these methods are not feasible for anything other than short sentences. We compare our method to MOSES (Koehn et al., 2007), and give precise estimates of the number and magnitude of search errors that MOSES makes.
2 0.27856314 45 emnlp-2011-Dual Decomposition with Many Overlapping Components
Author: Andre Martins ; Noah Smith ; Mario Figueiredo ; Pedro Aguiar
Abstract: Dual decomposition has been recently proposed as a way of combining complementary models, with a boost in predictive power. However, in cases where lightweight decompositions are not readily available (e.g., due to the presence of rich features or logical constraints), the original subgradient algorithm is inefficient. We sidestep that difficulty by adopting an augmented Lagrangian method that accelerates model consensus by regularizing towards the averaged votes. We show how first-order logical constraints can be handled efficiently, even though the corresponding subproblems are no longer combinatorial, and report experiments in dependency parsing, with state-of-the-art results. 1
3 0.20232347 49 emnlp-2011-Entire Relaxation Path for Maximum Entropy Problems
Author: Moshe Dubiner ; Yoram Singer
Abstract: We discuss and analyze the problem of finding a distribution that minimizes the relative entropy to a prior distribution while satisfying max-norm constraints with respect to an observed distribution. This setting generalizes the classical maximum entropy problems as it relaxes the standard constraints on the observed values. We tackle the problem by introducing a re-parametrization in which the unknown distribution is distilled to a single scalar. We then describe a homotopy between the relaxation parameter and the distribution characterizing parameter. The homotopy also reveals an aesthetic symmetry between the prior distribution and the observed distribution. We then use the reformulated problem to describe a space and time efficient algorithm for tracking the entire relaxation path. Our derivations are based on a compact geometric view of the relaxation path as a piecewise linear function in a two dimensional space of the relaxation-characterization parameters. We demonstrate the usability of our approach by applying the problem to Zipfian distributions over a large alphabet.
4 0.13904485 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
Author: Kevin Gimpel ; Noah A. Smith
Abstract: We present a quasi-synchronous dependency grammar (Smith and Eisner, 2006) for machine translation in which the leaves of the tree are phrases rather than words as in previous work (Gimpel and Smith, 2009). This formulation allows us to combine structural components of phrase-based and syntax-based MT in a single model. We describe a method of extracting phrase dependencies from parallel text using a target-side dependency parser. For decoding, we describe a coarse-to-fine approach based on lattice dependency parsing of phrase lattices. We demonstrate performance improvements for Chinese-English and Urdu-English translation over a phrase-based baseline. We also investigate the use of unsupervised dependency parsers, reporting encouraging preliminary results.
5 0.11236904 125 emnlp-2011-Statistical Machine Translation with Local Language Models
Author: Christof Monz
Abstract: Part-of-speech language modeling is commonly used as a component in statistical machine translation systems, but there is mixed evidence that its usage leads to significant improvements. We argue that its limited effectiveness is due to the lack of lexicalization. We introduce a new approach that builds a separate local language model for each word and part-of-speech pair. The resulting models lead to more context-sensitive probability distributions and we also exploit the fact that different local models are used to estimate the language model probability of each word during decoding. Our approach is evaluated for Arabic- and Chinese-to-English translation. We show that it leads to statistically significant improvements for multiple test sets and also across different genres, when compared against a competitive baseline and a system using a part-of-speech model.
6 0.11209669 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction
7 0.11126115 65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction
8 0.11031885 100 emnlp-2011-Optimal Search for Minimum Error Rate Training
9 0.11016459 76 emnlp-2011-Language Models for Machine Translation: Original vs. Translated Texts
10 0.10339844 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing
11 0.1003205 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation
12 0.097994231 44 emnlp-2011-Domain Adaptation via Pseudo In-Domain Data Selection
13 0.095581494 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
14 0.093825668 66 emnlp-2011-Hierarchical Phrase-based Translation Representations
15 0.089113906 9 emnlp-2011-A Non-negative Matrix Factorization Based Approach for Active Dual Supervision from Document and Word Labels
16 0.077635109 10 emnlp-2011-A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions
17 0.076576874 13 emnlp-2011-A Word Reordering Model for Improved Machine Translation
18 0.074158557 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
19 0.073637642 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation
20 0.072153889 96 emnlp-2011-Multilayer Sequence Labeling
topicId topicWeight
[(0, 0.237), (1, 0.137), (2, 0.064), (3, -0.076), (4, 0.09), (5, -0.055), (6, 0.058), (7, -0.396), (8, -0.056), (9, -0.07), (10, 0.029), (11, -0.054), (12, 0.245), (13, -0.253), (14, -0.049), (15, -0.143), (16, -0.07), (17, -0.069), (18, -0.032), (19, -0.089), (20, -0.084), (21, -0.088), (22, -0.02), (23, -0.066), (24, -0.124), (25, 0.065), (26, 0.025), (27, -0.162), (28, -0.025), (29, -0.038), (30, -0.042), (31, 0.052), (32, 0.041), (33, -0.016), (34, -0.02), (35, -0.028), (36, -0.044), (37, -0.087), (38, -0.088), (39, -0.042), (40, -0.051), (41, -0.011), (42, -0.029), (43, 0.075), (44, -0.037), (45, -0.044), (46, 0.09), (47, -0.051), (48, -0.04), (49, -0.002)]
simIndex simValue paperId paperTitle
same-paper 1 0.94640619 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation
Author: Yin-Wen Chang ; Michael Collins
Abstract: This paper describes an algorithm for exact decoding of phrase-based translation models, based on Lagrangian relaxation. The method recovers exact solutions, with certificates of optimality, on over 99% of test examples. The method is much more efficient than approaches based on linear programming (LP) or integer linear programming (ILP) solvers: these methods are not feasible for anything other than short sentences. We compare our method to MOSES (Koehn et al., 2007), and give precise estimates of the number and magnitude of search errors that MOSES makes.
2 0.84135544 45 emnlp-2011-Dual Decomposition with Many Overlapping Components
Author: Andre Martins ; Noah Smith ; Mario Figueiredo ; Pedro Aguiar
Abstract: Dual decomposition has been recently proposed as a way of combining complementary models, with a boost in predictive power. However, in cases where lightweight decompositions are not readily available (e.g., due to the presence of rich features or logical constraints), the original subgradient algorithm is inefficient. We sidestep that difficulty by adopting an augmented Lagrangian method that accelerates model consensus by regularizing towards the averaged votes. We show how first-order logical constraints can be handled efficiently, even though the corresponding subproblems are no longer combinatorial, and report experiments in dependency parsing, with state-of-the-art results. 1
3 0.82667685 49 emnlp-2011-Entire Relaxation Path for Maximum Entropy Problems
Author: Moshe Dubiner ; Yoram Singer
Abstract: We discuss and analyze the problem of finding a distribution that minimizes the relative entropy to a prior distribution while satisfying max-norm constraints with respect to an observed distribution. This setting generalizes the classical maximum entropy problems as it relaxes the standard constraints on the observed values. We tackle the problem by introducing a re-parametrization in which the unknown distribution is distilled to a single scalar. We then describe a homotopy between the relaxation parameter and the distribution characterizing parameter. The homotopy also reveals an aesthetic symmetry between the prior distribution and the observed distribution. We then use the reformulated problem to describe a space and time efficient algorithm for tracking the entire relaxation path. Our derivations are based on a compact geometric view of the relaxation path as a piecewise linear function in a two dimensional space of the relaxation-characterization parameters. We demonstrate the usability of our approach by applying the problem to Zipfian distributions over a large alphabet.
4 0.42902535 66 emnlp-2011-Hierarchical Phrase-based Translation Representations
Author: Gonzalo Iglesias ; Cyril Allauzen ; William Byrne ; Adria de Gispert ; Michael Riley
Abstract: This paper compares several translation representations for a synchronous context-free grammar parse including CFGs/hypergraphs, finite-state automata (FSA), and pushdown automata (PDA). The representation choice is shown to determine the form and complexity of target LM intersection and shortest-path algorithms that follow. Intersection, shortest path, FSA expansion and RTN replacement algorithms are presented for PDAs. Chinese-to-English translation experiments using HiFST and HiPDT, FSA and PDA-based decoders, are presented using admissible (or exact) search, possible for HiFST with compact SCFG rulesets and HiPDT with compact LMs. For large rulesets with large LMs, we introduce a two-pass search strategy which we then analyze in terms of search errors and translation performance.
5 0.42080304 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction
Author: Sebastian Riedel ; Andrew McCallum
Abstract: Extracting biomedical events from literature has attracted much recent attention. The best-performing systems so far have been pipelines of simple subtask-specific local classifiers. A natural drawback of such approaches are cascading errors introduced in early stages of the pipeline. We present three joint models of increasing complexity designed to overcome this problem. The first model performs joint trigger and argument extraction, and lends itself to a simple, efficient and exact inference algorithm. The second model captures correlations between events, while the third model ensures consistency between arguments of the same event. Inference in these models is kept tractable through dual decomposition. The first two models outperform the previous best joint approaches and are very competitive with respect to the current state-of-the-art. The third model yields the best results reported so far on the BioNLP 2009 shared task, the BioNLP 2011 Genia task and the BioNLP 2011 Infectious Diseases task.
6 0.4058978 100 emnlp-2011-Optimal Search for Minimum Error Rate Training
7 0.34581119 65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction
8 0.33153173 76 emnlp-2011-Language Models for Machine Translation: Original vs. Translated Texts
9 0.32487833 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
10 0.30221543 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation
11 0.30022046 44 emnlp-2011-Domain Adaptation via Pseudo In-Domain Data Selection
12 0.29292965 9 emnlp-2011-A Non-negative Matrix Factorization Based Approach for Active Dual Supervision from Document and Word Labels
13 0.28482321 125 emnlp-2011-Statistical Machine Translation with Local Language Models
14 0.28415936 5 emnlp-2011-A Fast Re-scoring Strategy to Capture Long-Distance Dependencies
15 0.25532252 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing
16 0.25382149 22 emnlp-2011-Better Evaluation Metrics Lead to Better Machine Translation
17 0.24782069 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming
18 0.2368149 25 emnlp-2011-Cache-based Document-level Statistical Machine Translation
19 0.23458396 3 emnlp-2011-A Correction Model for Word Alignments
20 0.23270206 13 emnlp-2011-A Word Reordering Model for Improved Machine Translation
topicId topicWeight
[(23, 0.097), (36, 0.024), (37, 0.017), (45, 0.047), (53, 0.04), (54, 0.021), (57, 0.011), (62, 0.031), (64, 0.081), (65, 0.012), (66, 0.017), (69, 0.377), (79, 0.044), (82, 0.021), (85, 0.021), (87, 0.011), (96, 0.033), (98, 0.017)]
simIndex simValue paperId paperTitle
same-paper 1 0.84310925 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation
Author: Yin-Wen Chang ; Michael Collins
Abstract: This paper describes an algorithm for exact decoding of phrase-based translation models, based on Lagrangian relaxation. The method recovers exact solutions, with certificates of optimality, on over 99% of test examples. The method is much more efficient than approaches based on linear programming (LP) or integer linear programming (ILP) solvers: these methods are not feasible for anything other than short sentences. We compare our method to MOSES (Koehn et al., 2007), and give precise estimates of the number and magnitude of search errors that MOSES makes.
2 0.79540604 109 emnlp-2011-Random Walk Inference and Learning in A Large Scale Knowledge Base
Author: Ni Lao ; Tom Mitchell ; William W. Cohen
Abstract: We consider the problem of performing learning and inference in a large scale knowledge base containing imperfect knowledge with incomplete coverage. We show that a soft inference procedure based on a combination of constrained, weighted, random walks through the knowledge base graph can be used to reliably infer new beliefs for the knowledge base. More specifically, we show that the system can learn to infer different target relations by tuning the weights associated with random walks that follow different paths through the graph, using a version of the Path Ranking Algorithm (Lao and Cohen, 2010b). We apply this approach to a knowledge base of approximately 500,000 beliefs extracted imperfectly from the web by NELL, a never-ending language learner (Carlson et al., 2010). This new system improves significantly over NELL’s earlier Horn-clause learning and inference method: it obtains nearly double the precision at rank 100, and the new learning method is also applicable to many more inference tasks.
3 0.73886287 31 emnlp-2011-Computation of Infix Probabilities for Probabilistic Context-Free Grammars
Author: Mark-Jan Nederhof ; Giorgio Satta
Abstract: The notion of infix probability has been introduced in the literature as a generalization of the notion of prefix (or initial substring) probability, motivated by applications in speech recognition and word error correction. For the case where a probabilistic context-free grammar is used as language model, methods for the computation of infix probabilities have been presented in the literature, based on various simplifying assumptions. Here we present a solution that applies to the problem in its full generality.
4 0.73500675 73 emnlp-2011-Improving Bilingual Projections via Sparse Covariance Matrices
Author: Jagadeesh Jagarlamudi ; Raghavendra Udupa ; Hal Daume III ; Abhijit Bhole
Abstract: Mapping documents into an interlingual representation can help bridge the language barrier of cross-lingual corpora. Many existing approaches are based on word co-occurrences extracted from aligned training data, represented as a covariance matrix. In theory, such a covariance matrix should represent semantic equivalence, and should be highly sparse. Unfortunately, the presence of noise leads to dense covariance matrices which in turn leads to suboptimal document representations. In this paper, we explore techniques to recover the desired sparsity in covariance matrices in two ways. First, we explore word association measures and bilingual dictionaries to weigh the word pairs. Later, we explore different selection strategies to remove the noisy pairs based on the association scores. Our experimental results on the task of aligning comparable documents shows the efficacy of sparse covariance matrices on two data sets from two different language pairs.
5 0.55237085 66 emnlp-2011-Hierarchical Phrase-based Translation Representations
Author: Gonzalo Iglesias ; Cyril Allauzen ; William Byrne ; Adria de Gispert ; Michael Riley
Abstract: This paper compares several translation representations for a synchronous context-free grammar parse including CFGs/hypergraphs, finite-state automata (FSA), and pushdown automata (PDA). The representation choice is shown to determine the form and complexity of target LM intersection and shortest-path algorithms that follow. Intersection, shortest path, FSA expansion and RTN replacement algorithms are presented for PDAs. Chinese-to-English translation experiments using HiFST and HiPDT, FSA and PDA-based decoders, are presented using admissible (or exact) search, possible for HiFST with compact SCFG rulesets and HiPDT with compact LMs. For large rulesets with large LMs, we introduce a two-pass search strategy which we then analyze in terms of search errors and translation performance.
6 0.46705705 45 emnlp-2011-Dual Decomposition with Many Overlapping Components
7 0.45139942 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
8 0.44042727 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
9 0.43799454 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction
10 0.43024617 49 emnlp-2011-Entire Relaxation Path for Maximum Entropy Problems
11 0.42656916 65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction
12 0.42568091 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation
13 0.42280355 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
14 0.41606766 122 emnlp-2011-Simple Effective Decipherment via Combinatorial Optimization
15 0.41582671 3 emnlp-2011-A Correction Model for Word Alignments
16 0.41280288 13 emnlp-2011-A Word Reordering Model for Improved Machine Translation
17 0.40690529 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming
18 0.40656644 134 emnlp-2011-Third-order Variational Reranking on Packed-Shared Dependency Forests
19 0.4047122 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
20 0.4036234 128 emnlp-2011-Structured Relation Discovery using Generative Models