emnlp emnlp2011 emnlp2011-66 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Gonzalo Iglesias ; Cyril Allauzen ; William Byrne ; Adria de Gispert ; Michael Riley
Abstract: This paper compares several translation representations for a synchronous context-free grammar parse including CFGs/hypergraphs, finite-state automata (FSA), and pushdown automata (PDA). The representation choice is shown to determine the form and complexity of target LM intersection and shortest-path algorithms that follow. Intersection, shortest path, FSA expansion and RTN replacement algorithms are presented for PDAs. Chinese-to-English translation experiments using HiFST and HiPDT, FSA and PDA-based decoders, are presented using admissible (or exact) search, possible for HiFST with compact SCFG rulesets and HiPDT with compact LMs. For large rulesets with large LMs, we introduce a two-pass search strategy which we then analyze in terms of search errors and translation performance.
Reference: text
sentIndex sentText sentNum sentScore
1 Abstract This paper compares several translation representations for a synchronous context-free grammar parse including CFGs/hypergraphs, finite-state automata (FSA), and pushdown automata (PDA). [sent-10, score-0.549]
2 The representation choice is shown to determine the form and complexity of target LM intersection and shortest-path algorithms that follow. [sent-11, score-0.27]
3 Intersection, shortest path, FSA expansion and RTN replacement algorithms are presented for PDAs. [sent-12, score-0.276]
4 Chinese-to-English translation experiments using HiFST and HiPDT, FSA and PDA-based decoders, are presented using admissible (or exact) search, possible for HiFST with compact SCFG rulesets and HiPDT with compact LMs. [sent-13, score-0.265]
5 For large rulesets with large LMs, we introduce a two-pass search strategy which we then analyze in terms of search errors and translation performance. [sent-14, score-0.227]
6 1 Introduction Hierarchical phrase-based translation, using a synchronous context-free translation grammar (SCFG) together with an n-gram target language model (LM), is a popular approach in machine translation (Chiang, 2007). [sent-15, score-0.328]
7 The translation and language model combination with the highest-probability path: L̂ = argmax_{l∈L} L. Of course, decoding requires explicit data representations and algorithms for combining and searching them. [sent-27, score-0.277]
8 In all of the approaches we consider here, s is applied to G using the CYK algorithm in Step 1, and M is represented by a finite automaton in Step 2. [sent-28, score-0.213]
9 Since {s} is a finite language and we assume throughout that G does not allow unbounded insertions, T and L are, in fact, regular languages. [sent-30, score-0.156]
10 In this case, weighted finite-state intersection and single-source shortest path algorithms (using negative log probabilities) can be used to solve Steps 2 and 3 (Mohri, 2009). [sent-32, score-0.303]
11 In this case, hypergraph intersection with a finite automaton and hypergraph shortest path algorithms can be used to solve Steps 2 and 3 (Huang, 2008). [sent-37, score-0.756]
12 In this paper, we will consider another representation for the context-free languages T and L as well: pushdown automata (PDA) Tp and Lp, familiar from formal language theory. [sent-39, score-0.268]
13 In Section 2 we will describe PDA intersection with a finite automaton and PDA shortest-path algorithms that can be used to solve Steps 2 and 3. [sent-42, score-0.366]
14 As presented so far, the search performed in Step 3 is admissible (or exact): the true shortest path is found. [sent-44, score-0.295]
15 Further, such pruning can greatly complicate any complexity analysis of the underlying representations and algorithms. [sent-47, score-0.183]
16 We show that the PDA representation is particularly suited for decoding with large SCFGs and compact LMs. [sent-50, score-0.161]
17 We present Chinese-English translation results under the FSA and PDA translation representations. [sent-51, score-0.252]
18 We describe a two-pass translation strategy which we have developed to allow use of the PDA representation in large-scale translation. [sent-52, score-0.17]
19 In the first pass, translation is done using a lattice-generating version of the shortest path algorithm. [sent-53, score-0.35]
20 The full translation grammar is used but with a compact, entropy-pruned version (Stolcke, 1998) of the full language model. [sent-54, score-0.202]
21 This first step uses admissible pruning and lattice generation under the compact language model. [sent-55, score-0.213]
22 We then investigate a translation grammar which is large enough that exact translation under the FSA representation is not possible. [sent-58, score-0.415]
23 We find that translation is possible using the two-pass strategy with the PDA translation representation and that gains in BLEU score result from using the larger translation grammar. [sent-59, score-0.422]
24 The challenge is to find algorithms that can be made to work with large translation grammars and large language models. [sent-62, score-0.152]
25 Search errors in hierarchical translation, and in translation more generally, have not been as extensively studied; this is undoubtedly due to the difficulties inherent in finding exact translations for use in comparison. [sent-64, score-0.22]
26 For Hiero translation, an extensive comparison of search errors between the cube pruning and FSA implementations was presented by Iglesias et al. [sent-68, score-0.145]
27 2 Pushdown Automata In this section, we formally define pushdown automata and give intersection, shortest-path and related algorithms that will be needed later. [sent-77, score-0.25]
28 Informally, pushdown automata are finite automata that have been augmented with a stack. [sent-78, score-0.417]
29 Our equivalent representation allows a transition to be labeled by a stack operation or a regular input symbol but not both. [sent-84, score-0.252]
30 The advantage of this representation is that it is identical to the finite automaton representation except that certain symbols (the parentheses) have special semantics. [sent-86, score-0.301]
31 More formally, let A and Ā be two finite alphabets such that there exists a bijection f from A to Ā. [sent-93, score-0.196]
32 Intuitively, f maps an open parenthesis to its corresponding close parenthesis. [sent-94, score-0.141]
33 Let A and B be two finite alphabets such that B ⊆ A; we define the mapping rB : A∗ → B∗ as the mapping that erases from a string x1 · · · xn all symbols not in B. [sent-101, score-0.167]
34 A path π accepts the string x ∈ Σ∗ if it is a balanced accepting path such that rΣ(i[π]) = x. [sent-122, score-0.326]
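To make these two conditions concrete, here is a minimal Python sketch (not from the paper) that checks whether the label sequence of a candidate path is balanced and, if so, returns the accepted string by erasing parentheses, i.e. the projection rΣ. The dictionaries OPEN and CLOSE are assumed to pair each open parenthesis with its matching close parenthesis (the bijection f), and epsilon labels are represented as None.

```python
def accepts_string(labels, OPEN, CLOSE):
    """Check that a path's labels are balanced and recover the accepted string.

    OPEN maps each open parenthesis to its matching close parenthesis; CLOSE
    maps back.  Epsilon labels are represented as None.  Returns the accepted
    string (parentheses and epsilons erased, i.e. the projection r_Sigma) or
    None if the parenthesis sequence is not balanced."""
    stack = []
    for a in labels:
        if a in OPEN:                          # open parenthesis: push it
            stack.append(a)
        elif a in CLOSE:                       # close parenthesis: must match the top
            if not stack or OPEN[stack.pop()] != a:
                return None
    if stack:                                  # unmatched open parentheses remain
        return None
    return [a for a in labels if a is not None and a not in OPEN and a not in CLOSE]
```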
35 A weighted language is recognizable by a weighted pushdown automaton iff it is context-free. [sent-124, score-0.35]
36 A PDA T has a bounded stack if there exists K ∈ N such that for any sub-path π of any balanced path in T: |cΠ(rΠ∪Π̄(i[π]))| ≤ K. [sent-127, score-0.137]
37 A weighted finite automaton (FSA) can be viewed as a PDA where the open and close parentheses alphabets are empty; see (Mohri, 2009) for a standalone definition. [sent-130, score-0.394]
38 The set of states of T′ is the set of pairs (q, z) that can be reached from an initial state by transitions defined as above. [sent-137, score-0.164]
39 The condition that T has a bounded stack ensures that this set is finite (since it implies that for any (q, z), |z| ≤ K). [sent-138, score-0.251]
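A minimal sketch of this expansion, under the representation assumed throughout these examples: arcs are (source, label, weight, destination) tuples, OPEN maps each open parenthesis to its matching close parenthesis (CLOSE is the inverse), and states of the expanded FSA are pairs (q, z) of a PDA state and a tuple of pending open parentheses. The max_depth guard is only a safety net for the sketch; the bounded-stack condition is what actually keeps the reachable set finite.

```python
from collections import defaultdict

def expand_pda(arcs, OPEN, CLOSE, start, max_depth=64):
    """Expand a bounded-stack PDA into an FSA whose states are pairs (q, z):
    q is a PDA state and z a tuple of pending open parentheses.  Parenthesis
    arcs become epsilon (None) arcs once the stack bookkeeping lives in the
    state; a close parenthesis is kept only if it matches the top of z."""
    out = defaultdict(list)
    for a in arcs:
        out[a[0]].append(a)

    fsa_arcs, seen, todo = [], set(), [(start, ())]
    while todo:
        q, z = todo.pop()
        if (q, z) in seen:
            continue
        seen.add((q, z))
        for (_, label, w, r) in out[q]:
            if label in OPEN:
                nz = z + (label,)
                if len(nz) > max_depth:        # safety net only; bounded stack => finite
                    continue
                fsa_arcs.append(((q, z), None, w, (r, nz)))
                todo.append((r, nz))
            elif label in CLOSE:
                if z and OPEN[z[-1]] == label:  # only matching close parentheses survive
                    fsa_arcs.append(((q, z), None, w, (r, z[:-1])))
                    todo.append((r, z[:-1]))
            else:                               # ordinary symbol or epsilon: copy as-is
                fsa_arcs.append(((q, z), label, w, (r, z)))
                todo.append((r, z))
    # a PDA final state q corresponds to the FSA final state (q, ()) with empty stack
    return fsa_arcs, (start, ())
```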
40 3 Intersection Algorithm The class of weighted pushdown automata is closed under intersection with weighted finite automata (Bar-Hillel et al. [sent-143, score-0.65]
41 Considering a pair (T1, T2) where one element is an FSA and the other a PDA, there exists a PDA T1 ∩ T2, the intersection of T1 and T2, such that for all x ∈ Σ∗: (T1 ∩ T2)(x) = T1(x) + T2(x). [sent-145, score-0.156]
42 Given a transition e1 = (q1, a, w1, q1′) in T1, transitions out of (q1, q2) in T are obtained using the following rules. [sent-152, score-0.148]
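The rules themselves are not reproduced in this extract; the Python sketch below shows the standard pair construction they describe, under the assumption that only the PDA component moves on parenthesis (and epsilon) labels, that shared symbols must be matched by both machines, and that weights (negative log probabilities) are added. It naively enumerates all state pairs; a real implementation would build only pairs reachable from the start pair, and LM automata with backoff or failure arcs need the usual special handling.

```python
from collections import defaultdict

def intersect_pda_fsa(pda_arcs, OPEN, CLOSE, fsa_arcs, pda_start, fsa_start):
    """Pair construction for PDA-FSA intersection: parenthesis and epsilon (None)
    labels advance only the PDA component, shared symbols must be read by both
    machines, and weights are added.  The result keeps the same parenthesis
    alphabets, so it is again a PDA."""
    fsa_out = defaultdict(list)
    fsa_states = {fsa_start}
    for (q, a, w, r) in fsa_arcs:
        fsa_out[(q, a)].append((w, r))
        fsa_states.update((q, r))

    arcs = []
    for (q1, a, w1, r1) in pda_arcs:
        if a is None or a in OPEN or a in CLOSE:
            for q2 in fsa_states:               # PDA side moves alone
                arcs.append(((q1, q2), a, w1, (r1, q2)))
        else:
            for q2 in fsa_states:               # both sides must read the symbol
                for (w2, r2) in fsa_out[(q2, a)]:
                    arcs.append(((q1, q2), a, w1 + w2, (r1, r2)))
    return arcs, (pda_start, fsa_start)
```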
43 4 Shortest Distance and Path Algorithms A shortest path in a PDA T is a balanced accepting path with minimal weight and the shortest distance in T is the weight of such a path. [sent-161, score-0.558]
44 We show that when T has a bounded stack, the shortest distance and shortest path can be computed in O(|T|^3 log |T|) time and O(|T|^2) space (assuming T has no negative weights). [sent-162, score-0.379]
45 |T|3 TO|)(|T tim|2)e Given a state s in T with at least one incoming open parenthesis transition, we denote by Cs the set of states that can be reached from s by a balanced path. [sent-163, score-0.283]
46 If s has several incoming open parenthesis transitions, a naive implementation might lead to the states in Cs being visited up to exponentially many times. [sent-164, score-0.203]
47 The basic idea of the algorithm is to memoize the shortest distance from s to states in Cs. [sent-165, score-0.151]
48 While the queue is not empty, a state is dequeued and its outgoing transitions examined (line 5-9). [sent-168, score-0.191]
49 When the considered transition e is labeled by a close parenthesis, it is remembered that it balances all incoming open parentheses in s matching i[e] by adding e to B[s, i[e]] (lines 11-12). [sent-170, score-0.17]
50 Second, the space required for storing B is at most in O(|E|^2) since for each open parenthesis transition e, the size of B[n[e], i[e]] is O(|E|) in the worst case. [sent-175, score-0.171]
51 The accumulated number of transitions examined at line 16 is in O(N|Q||E|^2) in the worst case, where N denotes the maximal number of times a state is inserted in the queue for a given call of GETDISTANCE. [sent-177, score-0.191]
52 Assuming the cost of a queue operation is Γ(n) for a queue containing n elements, the worst-case time complexity of the algorithm can then be expressed as O(N|T|^3 Γ(|T|)). [sent-178, score-0.197]
53 When T has no negative weights, a shortest-first queue discipline leads to a time complexity in O(|T|^3 log |T|). [sent-180, score-0.135]
54 Moreover, for each close parenthesis transition e, there exists a unique open parenthesis transition e′ such that e ∈ B[n[e′], i[e′]]. [sent-189, score-0.345]
55 The algorithm can be modified to compute the shortest path by keeping track of parent pointers. [sent-191, score-0.224]
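A simplified Python sketch of this idea, not the paper's pseudocode: shortest distances inside the balanced component headed at a state s are computed once with a Dijkstra-style relaxation and memoized, and every open parenthesis entering s reuses them as a summary edge to the targets of the matching close parentheses. It assumes non-negative weights, a bounded stack, and that no component is reachable from itself through an open parenthesis; the re-queueing that gives the factor N in the analysis above is omitted.

```python
import heapq
import itertools
from collections import defaultdict

def pda_shortest_distance(arcs, OPEN, CLOSE, start):
    """Memoized balanced-path shortest distances, a simplified sketch of the
    GetDistance idea.  Arcs are (source, label, weight, destination) tuples;
    OPEN maps open to close parentheses and CLOSE is the inverse."""
    out = defaultdict(list)
    for a in arcs:
        out[a[0]].append(a)

    tie = itertools.count()     # tiebreaker so heap never compares states directly
    memo = {}                   # s -> {q: shortest balanced-path distance from s to q}
    closers = {}                # s -> {open label: close arcs found inside s's component}

    def get_distance(s):
        if s in memo:
            return memo[s]
        dist, by_open = {s: 0.0}, defaultdict(list)
        memo[s], closers[s] = dist, by_open
        heap = [(0.0, next(tie), s)]
        while heap:
            d, _, q = heapq.heappop(heap)
            if d > dist.get(q, float("inf")):
                continue
            for (_, label, w, r) in out[q]:
                if label in CLOSE:
                    # a close parenthesis exits this component; record which open
                    # parenthesis it balances (the role played by the table B above)
                    by_open[CLOSE[label]].append((q, label, w, r))
                elif label in OPEN:
                    # summarize the nested component headed at r and jump straight
                    # to the destinations of its matching close parentheses
                    sub = get_distance(r)
                    for (cq, _, cw, cr) in closers[r][label]:
                        nd = d + w + sub[cq] + cw
                        if nd < dist.get(cr, float("inf")):
                            dist[cr] = nd
                            heapq.heappush(heap, (nd, next(tie), cr))
                else:
                    nd = d + w  # ordinary symbol or epsilon
                    if nd < dist.get(r, float("inf")):
                        dist[r] = nd
                        heapq.heappush(heap, (nd, next(tie), r))
        return dist

    # dist[q] for a final state q is the shortest distance over balanced accepting paths
    return get_distance(start)
```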
56 5 Replacement Algorithm A recursive transition network (RTN) can be specified by (N, Σ, (Tν)ν∈N, S) where N is an alphabet of nonterminals, Σ is the input alphabet, (Tν)ν∈N is a family of FSAs with input alphabet Σ ∪ N, and S ∈ N is the root nonterminal. [sent-193, score-0.17]
57 A string x ∈ Σ∗ is accepted by R if there exists an accepting path π in TS such that recursively replacing any transition with input label ν ∈ N by an accepting path in Tν leads to a path π∗ with input x. [sent-194, score-0.579]
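The replacement construction itself is not spelled out in this extract; the sketch below shows the standard version in Python, under an assumed RTN encoding rtns that maps each machine name to a dict with "arcs", "start" and "finals" (all names are illustrative). Every arc labeled with a nonterminal ν becomes an open-parenthesis arc into Tν's start state plus close-parenthesis arcs from Tν's final states back to the return state, with a fresh parenthesis pair per call site so that returns are unambiguous. Applying the expansion sketch above to this output corresponds to the HiFST-style conversion discussed later, while the PDA shortest-distance sketch can search it without expansion.

```python
def rtn_to_pda(rtns, root):
    """Replacement sketch: build a PDA from an RTN given as a hypothetical dict
    rtns[name] = {"arcs": [(q, label, w, r)], "start": q0, "finals": [(qf, wf)]}."""
    arcs, OPEN, CLOSE = [], {}, {}
    fresh = 0
    for name, fsa in rtns.items():
        for (q, label, w, r) in fsa["arcs"]:
            if label in rtns:                          # nonterminal arc: replace it
                op, cl = ("(", fresh), (")", fresh)    # fresh parenthesis pair per call site
                fresh += 1
                OPEN[op], CLOSE[cl] = cl, op
                arcs.append(((name, q), op, w, (label, rtns[label]["start"])))
                for f, fw in rtns[label]["finals"]:
                    arcs.append(((label, f), cl, fw, (name, r)))
            else:                                      # terminal or epsilon arc: copy
                arcs.append(((name, q), label, w, (name, r)))
    start = (root, rtns[root]["start"])
    finals = [((root, f), fw) for f, fw in rtns[root]["finals"]]
    return arcs, OPEN, CLOSE, start, finals
```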
58 Figure 3 shows several alternative representations of T: Figure 3a shows the hypergraph representation of this grammar; there is a 1:1 correspondence between each production in the CFG and each hyperedge in the hypergraph. [sent-205, score-0.171]
59 Figure 3c shows the pushdown automaton representation generated from the RTN with the replacement algorithm of Section 2. [sent-209, score-0.327]
60 Since s is a finite language and G does not allow unbounded insertion, Tp has a bounded stack and T is, in fact, a regular language. [sent-211, score-0.293]
61 Figure 3d shows the finite-state automaton representation of T generated from the PDA using the expansion algorithm of Section 2. [sent-212, score-0.139]
62 The HiFST decoder converts its RTN translation representation immediately into the finite-state representation using an algorithm equivalent to converting the RTN into a PDA followed by PDA expansion. [sent-214, score-0.251]
63 The FSA representation can require additional time and space since the PDA expansion can be exponential. [sent-219, score-0.169]
64 Intersection: The intersection of a CFG T with a finite automaton M can be performed by the classical Bar-Hillel algorithm (Bar-Hillel et al. [sent-221, score-0.34]
65 The shortest path algorithm on the hypergraph, RTN, and FSA representations requires linear time and space (given the underlying acyclicity) (Huang, 2008; Mohri, 2009). [sent-229, score-0.182]
66 The FSA representation favors smaller target translation sets and larger language models. [sent-242, score-0.17]
67 Should a PDA shortest path algorithm with better complexity be found, this conclusion could change. [sent-243, score-0.297]
68 The principal difference between the two decoders is where the finite-state expansion step is done. [sent-250, score-0.161]
69 In HiPDT, this expansion is delayed as late as possible - in the output of the shortest path algorithm. [sent-252, score-0.319]
70 Another possible configuration is to expand after the LM intersection step but before the shortest path algorithm; in practice this is quite similar to HiFST. [sent-253, score-0.351]
71 We call a translation grammar the set of rules extracted from this process. [sent-277, score-0.202]
72 We extract two translation grammars: • A restricted grammar where we apply the following additional constraint: rules are only considered if they have a forward translation probability p > 0. [sent-278, score-0.328]
73 As will be discussed later, the interest of this grammar is that decoding under it can be exact, that is, without any pruning in search. [sent-281, score-0.267]
74 This is a superset of the previous grammar, and exact search under it is not feasible for HiFST: pruning is required in search. [sent-284, score-0.188]
75 This produces a translation lattice in the topmost cell that contains hypotheses with exact scores under the translation grammar and M1θ. [sent-305, score-0.429]
76 We remove the M1θ scores from the pruned translation lattices and reapply M1, moving the word penalty back to the original value obtained in MERT. [sent-309, score-0.216]
77 Additionally, we can rescore the translation lattices obtained in steps 1 or 5 with the larger language model M2. [sent-312, score-0.188]
78 Note that if β = ∞ or if θ = 0, the translation lattices obtained in step 1 should be identical to the ones of step 5. [sent-314, score-0.188]
79 While the goal is to increase θ to reduce the size of the language model used at Step 3, β will have to increase accordingly so as to avoid pruning away desirable hypotheses in Step 4. [sent-315, score-0.137]
80 If β defines a sufficiently wide beam to contain the hypotheses which would be favoured by M1, faster decoding with M1θ would be possible without incurring search errors relative to M1. [sent-316, score-0.143]
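To make the score bookkeeping of this two-pass strategy concrete, here is a toy, self-contained Python rendering over an n-best list instead of a WFST lattice; the function name and the list representation are illustrative only, and lm_pruned and lm_full stand for negative-log-probability scorers under M1θ and M1.

```python
def two_pass_rescore(lattice, lm_pruned, lm_full, beta):
    """Toy rendering of the two-pass strategy: 'lattice' is an n-best list of
    (hypothesis, grammar_cost) pairs standing in for the first-pass translation
    lattice.  The real system works on WFST lattices, but the bookkeeping of
    Steps 3-6 is the same."""
    # Step 3: exact first-pass scores under the grammar and the pruned LM M1^theta.
    first = [(h, g, g + lm_pruned(h)) for h, g in lattice]

    # Step 4: likelihood-based pruning at beam width beta around the best score;
    # beta must be wide enough that hypotheses favoured by M1 survive.
    best = min(c for _, _, c in first)
    kept = [(h, g) for h, g, c in first if c <= best + beta]

    # Steps 5-6: drop the M1^theta scores and reapply the unpruned M1 (a larger
    # LM M2 would simply be another scorer substituted here).
    second = [(h, g + lm_full(h)) for h, g in kept]
    return min(second, key=lambda t: t[1])[0]
```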
81 5 Entropy-Pruned LM in Rescoring In Table 3 we show translation performance under grammar G1 for different values of θ. [sent-318, score-0.202]
82 Performance is reported after first-pass decoding with M1θ (see step 3 in Section 4), after rescoring with M1 (see step 5) and after rescoring with M2 (see step 6). [sent-319, score-0.255]
83 Under translation grammar G1, HiFST is able to generate an FSA with the entire space of possible candidate hypotheses. [sent-321, score-0.232]
84 6 Hiero with PDAs and FSAs In this section we contrast HiFST with HiPDT under the same translation grammar and entropy-pruned language models. [sent-336, score-0.202]
85 Under the constrained grammar G1 their performance is identical as both decoders can generate the entire search space which can then be rescored with M1 or M2 as shown in the previous section. [sent-337, score-0.207]
86 Therefore, we now focus on the unconstrained grammar G2, where exact search is not feasible for HiFST. [sent-338, score-0.154]
87 If this limit is reached in decoding, the decoder fails for that sentence. Performance in subsequent rescoring with M1 and M2 after likelihood-based pruning of the translation lattices for various β is also reported. [sent-340, score-0.385]
88 Exact search with HiFST is not possible under these conditions: pruning during search would be required. [sent-357, score-0.18]
89 For HiFST, these include expansion into an FSA (Expand) and subsequent intersection with the language model (Compose). [sent-360, score-0.222]
90 For HiPDT, these include PDA intersection with the language model (Compose) and subsequent expansion into an FSA (Expand), using algorithms described in Section 2. [sent-361, score-0.248]
91 Table 4 shows the number of times each decoder succeeds in finding a hypothesis given the memory limit, and the operations being carried out when they fail to do so, when decoding with various M1θ. [sent-362, score-0.154]
92 7 Discussion and Conclusion HiFST fails to decode mainly because the expansion into an FST leads to far too big search spaces (e. [sent-374, score-0.215]
93 Even when HiFST can expand the search space into an FST, the decoder still has to compose with the language model, which is also critical in terms of memory usage (fails 536 times). [sent-379, score-0.169]
94 In contrast, HiPDT creates a PDA, which is a more compact representation of the search space and allows efficient intersection with the language model before expansion into an FST. [sent-380, score-0.367]
95 Nevertheless, the complexity of the language model is critical for the PDA intersection and especially the PDA expansion into an FST (fails 403 times for θ=7. [sent-382, score-0.295]
96 With the algorithms presented in this paper, decoding with PDAs is possible for any translation grammar as long as an entropy-pruned LM is used. [sent-385, score-0.337]
97 On the other hand, HiFST cannot decode under large translation grammars, thus requiring pruning during lattice construction, but it can apply an unpruned LM in this process. [sent-387, score-0.306]
98 But without pruning in search, expansion directly into an FST would lead to an explosion in terms of memory usage. [sent-389, score-0.241]
99 Hierarchical phrase-based translation with weighted finite state transducers and shallow-n grammars. [sent-451, score-0.342]
100 A weighted finite state transducer translation template model for statistical machine translation. [sent-504, score-0.342]
wordName wordTfidf (topN-words)
[('pda', 0.513), ('fsa', 0.342), ('hifst', 0.275), ('rtn', 0.214), ('hipdt', 0.198), ('pushdown', 0.145), ('iglesias', 0.137), ('intersection', 0.127), ('translation', 0.126), ('shortest', 0.116), ('finite', 0.114), ('pruning', 0.11), ('path', 0.108), ('parenthesis', 0.107), ('automaton', 0.099), ('stack', 0.098), ('gispert', 0.095), ('expansion', 0.095), ('rescoring', 0.087), ('allauzen', 0.083), ('hypergraph', 0.083), ('decoding', 0.081), ('transitions', 0.08), ('automata', 0.079), ('riley', 0.079), ('accepting', 0.079), ('fst', 0.079), ('drosde', 0.076), ('pdas', 0.076), ('grammar', 0.076), ('complexity', 0.073), ('transition', 0.068), ('decoders', 0.066), ('mohri', 0.066), ('queue', 0.062), ('lattices', 0.062), ('ss', 0.061), ('getdistance', 0.061), ('lm', 0.058), ('tp', 0.056), ('scfg', 0.056), ('weighted', 0.053), ('alphabets', 0.053), ('cfg', 0.051), ('alphabet', 0.051), ('hierarchical', 0.051), ('state', 0.049), ('gonzalo', 0.048), ('adri', 0.048), ('fails', 0.046), ('banga', 0.046), ('dyck', 0.046), ('eduardo', 0.046), ('representation', 0.044), ('representations', 0.044), ('exact', 0.043), ('regular', 0.042), ('aho', 0.041), ('parentheses', 0.041), ('fsas', 0.039), ('replacement', 0.039), ('decode', 0.039), ('bounded', 0.039), ('chiang', 0.038), ('cyril', 0.037), ('decoder', 0.037), ('admissible', 0.036), ('compose', 0.036), ('memory', 0.036), ('compact', 0.036), ('huang', 0.035), ('william', 0.035), ('search', 0.035), ('states', 0.035), ('open', 0.034), ('deng', 0.033), ('ullman', 0.033), ('rb', 0.033), ('lattice', 0.031), ('balanced', 0.031), ('arto', 0.031), ('fsts', 0.031), ('hershberger', 0.031), ('kuich', 0.031), ('ljolje', 0.031), ('petre', 0.031), ('rulesets', 0.031), ('salomaa', 0.031), ('usage', 0.03), ('space', 0.03), ('quadratic', 0.029), ('row', 0.029), ('exists', 0.029), ('pruned', 0.028), ('rush', 0.027), ('incoming', 0.027), ('hypotheses', 0.027), ('en', 0.026), ('algorithms', 0.026), ('yonggang', 0.026)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999887 66 emnlp-2011-Hierarchical Phrase-based Translation Representations
Author: Gonzalo Iglesias ; Cyril Allauzen ; William Byrne ; Adria de Gispert ; Michael Riley
Abstract: This paper compares several translation representations for a synchronous context-free grammar parse including CFGs/hypergraphs, finite-state automata (FSA), and pushdown automata (PDA). The representation choice is shown to determine the form and complexity of target LM intersection and shortest-path algorithms that follow. Intersection, shortest path, FSA expansion and RTN replacement algorithms are presented for PDAs. Chinese-to-English translation experiments using HiFST and HiPDT, FSA and PDA-based decoders, are presented using admissible (or exact) search, possible for HiFST with compact SCFG rulesets and HiPDT with compact LMs. For large rulesets with large LMs, we introduce a two-pass search strategy which we then analyze in terms of search errors and translation performance.
2 0.13686393 5 emnlp-2011-A Fast Re-scoring Strategy to Capture Long-Distance Dependencies
Author: Anoop Deoras ; Tomas Mikolov ; Kenneth Church
Abstract: A re-scoring strategy is proposed that makes it feasible to capture more long-distance dependencies in the natural language. Two pass strategies have become popular in a number of recognition tasks such as ASR (automatic speech recognition), MT (machine translation) and OCR (optical character recognition). The first pass typically applies a weak language model (n-grams) to a lattice and the second pass applies a stronger language model to N best lists. The stronger language model is intended to capture more longdistance dependencies. The proposed method uses RNN-LM (recurrent neural network language model), which is a long span LM, to rescore word lattices in the second pass. A hill climbing method (iterative decoding) is proposed to search over islands of confusability in the word lattice. An evaluation based on Broadcast News shows speedups of 20 over basic N best re-scoring, and word error rate reduction of 8% (relative) on a highly competitive setup.
3 0.12622602 52 emnlp-2011-Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing
Author: Shay B. Cohen ; Carlos Gomez-Rodriguez ; Giorgio Satta
Abstract: We describe a generative model for nonprojective dependency parsing based on a simplified version of a transition system that has recently appeared in the literature. We then develop a dynamic programming parsing algorithm for our model, and derive an insideoutside algorithm that can be used for unsupervised learning of non-projective dependency trees.
4 0.11622929 31 emnlp-2011-Computation of Infix Probabilities for Probabilistic Context-Free Grammars
Author: Mark-Jan Nederhof ; Giorgio Satta
Abstract: The notion of infix probability has been introduced in the literature as a generalization of the notion of prefix (or initial substring) probability, motivated by applications in speech recognition and word error correction. For the case where a probabilistic context-free grammar is used as language model, methods for the computation of infix probabilities have been presented in the literature, based on various simplifying assumptions. Here we present a solution that applies to the problem in its full generality.
5 0.099295571 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
Author: Kevin Gimpel ; Noah A. Smith
Abstract: We present a quasi-synchronous dependency grammar (Smith and Eisner, 2006) for machine translation in which the leaves of the tree are phrases rather than words as in previous work (Gimpel and Smith, 2009). This formulation allows us to combine structural components of phrase-based and syntax-based MT in a single model. We describe a method of extracting phrase dependencies from parallel text using a target-side dependency parser. For decoding, we describe a coarse-to-fine approach based on lattice dependency parsing of phrase lattices. We demonstrate performance improvements for Chinese-English and UrduEnglish translation over a phrase-based baseline. We also investigate the use of unsupervised dependency parsers, reporting encouraging preliminary results.
6 0.094149537 65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction
7 0.093825668 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation
8 0.078838773 10 emnlp-2011-A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions
9 0.076774605 44 emnlp-2011-Domain Adaptation via Pseudo In-Domain Data Selection
10 0.072768241 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
11 0.065291896 125 emnlp-2011-Statistical Machine Translation with Local Language Models
12 0.06459254 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation
13 0.064227276 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
14 0.064219162 22 emnlp-2011-Better Evaluation Metrics Lead to Better Machine Translation
15 0.062896542 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
16 0.060961396 93 emnlp-2011-Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation
17 0.058252595 100 emnlp-2011-Optimal Search for Minimum Error Rate Training
18 0.055232301 134 emnlp-2011-Third-order Variational Reranking on Packed-Shared Dependency Forests
19 0.053907394 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing
20 0.050282612 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search
topicId topicWeight
[(0, 0.176), (1, 0.109), (2, 0.045), (3, -0.075), (4, 0.037), (5, -0.062), (6, -0.054), (7, -0.197), (8, 0.035), (9, -0.063), (10, 0.038), (11, 0.124), (12, 0.001), (13, -0.011), (14, 0.016), (15, 0.004), (16, 0.056), (17, 0.039), (18, 0.075), (19, -0.077), (20, 0.192), (21, -0.116), (22, 0.074), (23, -0.114), (24, -0.026), (25, 0.16), (26, -0.086), (27, -0.017), (28, 0.124), (29, 0.024), (30, -0.148), (31, -0.026), (32, 0.014), (33, 0.217), (34, 0.145), (35, 0.185), (36, 0.052), (37, -0.008), (38, -0.08), (39, 0.137), (40, 0.053), (41, 0.108), (42, 0.01), (43, 0.062), (44, -0.069), (45, -0.122), (46, -0.034), (47, -0.012), (48, -0.016), (49, 0.007)]
simIndex simValue paperId paperTitle
same-paper 1 0.92413366 66 emnlp-2011-Hierarchical Phrase-based Translation Representations
Author: Gonzalo Iglesias ; Cyril Allauzen ; William Byrne ; Adria de Gispert ; Michael Riley
Abstract: This paper compares several translation representations for a synchronous context-free grammar parse including CFGs/hypergraphs, finite-state automata (FSA), and pushdown automata (PDA). The representation choice is shown to determine the form and complexity of target LM intersection and shortest-path algorithms that follow. Intersection, shortest path, FSA expansion and RTN replacement algorithms are presented for PDAs. Chinese-to-English translation experiments using HiFST and HiPDT, FSA and PDA-based decoders, are presented using admissible (or exact) search, possible for HiFST with compact SCFG rulesets and HiPDT with compact LMs. For large rulesets with large LMs, we introduce a two-pass search strategy which we then analyze in terms of search errors and translation performance.
2 0.67450947 52 emnlp-2011-Exact Inference for Generative Probabilistic Non-Projective Dependency Parsing
Author: Shay B. Cohen ; Carlos Gomez-Rodriguez ; Giorgio Satta
Abstract: We describe a generative model for nonprojective dependency parsing based on a simplified version of a transition system that has recently appeared in the literature. We then develop a dynamic programming parsing algorithm for our model, and derive an insideoutside algorithm that can be used for unsupervised learning of non-projective dependency trees.
3 0.64833695 31 emnlp-2011-Computation of Infix Probabilities for Probabilistic Context-Free Grammars
Author: Mark-Jan Nederhof ; Giorgio Satta
Abstract: The notion of infix probability has been introduced in the literature as a generalization of the notion of prefix (or initial substring) probability, motivated by applications in speech recognition and word error correction. For the case where a probabilistic context-free grammar is used as language model, methods for the computation of infix probabilities have been presented in the literature, based on various simplifying assumptions. Here we present a solution that applies to the problem in its full generality.
4 0.54466134 5 emnlp-2011-A Fast Re-scoring Strategy to Capture Long-Distance Dependencies
Author: Anoop Deoras ; Tomas Mikolov ; Kenneth Church
Abstract: A re-scoring strategy is proposed that makes it feasible to capture more long-distance dependencies in the natural language. Two pass strategies have become popular in a number of recognition tasks such as ASR (automatic speech recognition), MT (machine translation) and OCR (optical character recognition). The first pass typically applies a weak language model (n-grams) to a lattice and the second pass applies a stronger language model to N best lists. The stronger language model is intended to capture more longdistance dependencies. The proposed method uses RNN-LM (recurrent neural network language model), which is a long span LM, to rescore word lattices in the second pass. A hill climbing method (iterative decoding) is proposed to search over islands of confusability in the word lattice. An evaluation based on Broadcast News shows speedups of 20 over basic N best re-scoring, and word error rate reduction of 8% (relative) on a highly competitive setup.
5 0.49716091 65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction
Author: Andrea Gesmundo ; James Henderson
Abstract: State of the art Tree Structures Prediction techniques rely on bottom-up decoding. These approaches allow the use of context-free features and bottom-up features. We discuss the limitations of mainstream techniques in solving common Natural Language Processing tasks. Then we devise a new framework that goes beyond Bottom-up Decoding, and that allows a better integration of contextual features. Furthermore we design a system that addresses these issues and we test it on Hierarchical Machine Translation, a well known tree structure prediction problem. The structure of the proposed system allows the incorporation of non-bottom-up features and relies on a more sophisticated decoding approach. We show that the proposed approach can find better translations using a smaller portion of the search space.
6 0.39741272 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation
7 0.3348603 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
8 0.27495939 93 emnlp-2011-Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation
9 0.27063715 100 emnlp-2011-Optimal Search for Minimum Error Rate Training
10 0.25154528 18 emnlp-2011-Analyzing Methods for Improving Precision of Pivot Based Bilingual Dictionaries
11 0.24484365 44 emnlp-2011-Domain Adaptation via Pseudo In-Domain Data Selection
12 0.22618781 10 emnlp-2011-A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions
13 0.22338445 25 emnlp-2011-Cache-based Document-level Statistical Machine Translation
14 0.21876466 22 emnlp-2011-Better Evaluation Metrics Lead to Better Machine Translation
15 0.21637078 146 emnlp-2011-Unsupervised Structure Prediction with Non-Parallel Multilingual Guidance
16 0.20907417 134 emnlp-2011-Third-order Variational Reranking on Packed-Shared Dependency Forests
17 0.20811635 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation
18 0.20619085 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
19 0.20028488 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search
20 0.19777028 34 emnlp-2011-Corpus-Guided Sentence Generation of Natural Images
topicId topicWeight
[(23, 0.087), (36, 0.02), (37, 0.059), (45, 0.057), (49, 0.011), (53, 0.031), (54, 0.036), (57, 0.017), (62, 0.018), (63, 0.238), (64, 0.024), (66, 0.026), (69, 0.092), (79, 0.048), (82, 0.047), (87, 0.017), (90, 0.019), (96, 0.027), (98, 0.019)]
simIndex simValue paperId paperTitle
same-paper 1 0.77315068 66 emnlp-2011-Hierarchical Phrase-based Translation Representations
Author: Gonzalo Iglesias ; Cyril Allauzen ; William Byrne ; Adria de Gispert ; Michael Riley
Abstract: This paper compares several translation representations for a synchronous context-free grammar parse including CFGs/hypergraphs, finite-state automata (FSA), and pushdown automata (PDA). The representation choice is shown to determine the form and complexity of target LM intersection and shortest-path algorithms that follow. Intersection, shortest path, FSA expansion and RTN replacement algorithms are presented for PDAs. Chinese-to-English translation experiments using HiFST and HiPDT, FSA and PDA-based decoders, are presented using admissible (or exact) search, possible for HiFST with compact SCFG rulesets and HiPDT with compact LMs. For large rulesets with large LMs, we introduce a two-pass search strategy which we then analyze in terms of search errors and translation performance.
2 0.5520187 109 emnlp-2011-Random Walk Inference and Learning in A Large Scale Knowledge Base
Author: Ni Lao ; Tom Mitchell ; William W. Cohen
Abstract: We consider the problem of performing learning and inference in a large scale knowledge base containing imperfect knowledge with incomplete coverage. We show that a soft inference procedure based on a combination of constrained, weighted, random walks through the knowledge base graph can be used to reliably infer new beliefs for the knowledge base. More specifically, we show that the system can learn to infer different target relations by tuning the weights associated with random walks that follow different paths through the graph, using a version of the Path Ranking Algorithm (Lao and Cohen, 2010b). We apply this approach to a knowledge base of approximately 500,000 beliefs extracted imperfectly from the web by NELL, a never-ending language learner (Carlson et al., 2010). This new system improves significantly over NELL’s earlier Horn-clause learning and inference method: it obtains nearly double the precision at rank 100, and the new learning method is also applicable to many more inference tasks.
3 0.55188948 31 emnlp-2011-Computation of Infix Probabilities for Probabilistic Context-Free Grammars
Author: Mark-Jan Nederhof ; Giorgio Satta
Abstract: The notion of infix probability has been introduced in the literature as a generalization of the notion of prefix (or initial substring) probability, motivated by applications in speech recognition and word error correction. For the case where a probabilistic context-free grammar is used as language model, methods for the computation of infix probabilities have been presented in the literature, based on various simplifying assumptions. Here we present a solution that applies to the problem in its full generality.
4 0.55161601 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation
Author: Yin-Wen Chang ; Michael Collins
Abstract: This paper describes an algorithm for exact decoding of phrase-based translation models, based on Lagrangian relaxation. The method recovers exact solutions, with certificates of optimality, on over 99% of test examples. The method is much more efficient than approaches based on linear programming (LP) or integer linear programming (ILP) solvers: these methods are not feasible for anything other than short sentences. We compare our method to MOSES (Koehn et al., 2007), and give precise estimates of the number and magnitude of search errors that MOSES makes.
5 0.53908658 73 emnlp-2011-Improving Bilingual Projections via Sparse Covariance Matrices
Author: Jagadeesh Jagarlamudi ; Raghavendra Udupa ; Hal Daume III ; Abhijit Bhole
Abstract: Mapping documents into an interlingual representation can help bridge the language barrier of cross-lingual corpora. Many existing approaches are based on word co-occurrences extracted from aligned training data, represented as a covariance matrix. In theory, such a covariance matrix should represent semantic equivalence, and should be highly sparse. Unfortunately, the presence of noise leads to dense covariance matrices which in turn leads to suboptimal document representations. In this paper, we explore techniques to recover the desired sparsity in covariance matrices in two ways. First, we explore word association measures and bilingual dictionaries to weigh the word pairs. Later, we explore different selection strategies to remove the noisy pairs based on the association scores. Our experimental results on the task of aligning comparable documents shows the efficacy of sparse covariance matrices on two data sets from two different language pairs.
6 0.49482989 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
7 0.49387297 68 emnlp-2011-Hypotheses Selection Criteria in a Reranking Framework for Spoken Language Understanding
8 0.49155116 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation
9 0.48997051 107 emnlp-2011-Probabilistic models of similarity in syntactic context
10 0.48458925 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning
11 0.48356533 46 emnlp-2011-Efficient Subsampling for Training Complex Language Models
12 0.48071125 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction
13 0.48038548 111 emnlp-2011-Reducing Grounded Learning Tasks To Grammatical Inference
14 0.47850528 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
15 0.47611502 65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction
16 0.47571841 8 emnlp-2011-A Model of Discourse Predictions in Human Sentence Processing
17 0.47540331 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
18 0.47493905 85 emnlp-2011-Learning to Simplify Sentences with Quasi-Synchronous Grammar and Integer Programming
19 0.47363332 87 emnlp-2011-Lexical Generalization in CCG Grammar Induction for Semantic Parsing
20 0.4708162 35 emnlp-2011-Correcting Semantic Collocation Errors with L1-induced Paraphrases