emnlp emnlp2011 emnlp2011-65 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Andrea Gesmundo ; James Henderson
Abstract: State of the art Tree Structures Prediction techniques rely on bottom-up decoding. These approaches allow the use of context-free features and bottom-up features. We discuss the limitations of mainstream techniques in solving common Natural Language Processing tasks. Then we devise a new framework that goes beyond Bottom-up Decoding, and that allows a better integration of contextual features. Furthermore we design a system that addresses these issues and we test it on Hierarchical Machine Translation, a well known tree structure prediction problem. The structure of the proposed system allows the incorporation of non-bottom-up features and relies on a more sophisticated decoding approach. We show that the proposed approach can find better translations using a smaller portion of the search space.
Reference: text
sentIndex sentText sentNum sentScore
1 Then we devise a new framework that goes beyond Bottom-up Decoding, and that allows a better integration of contextual features. [sent-6, score-0.169]
2 The structure of the proposed system allows the incorporation of non-bottom-up features and relies on a more sophisticated decoding approach. [sent-8, score-0.199]
3 We show that the proposed approach can find better translations using a smaller portion of the search space. [sent-9, score-0.148]
4 2 Beyond Bottom-up Decoding. TSP decoding requires scoring candidate trees, cost(t). [sent-24, score-0.232]
5 CKY-like BD approaches build candidate trees in a bottom-up fashion, allowing the use of Dynamic Programming techniques to simplify the search space by merging sub-trees with the same state, and also easing application of pruning techniques (such as Cube Pruning, e. [sent-31, score-0.207]
6 An item is a triple that contains a span, a postcondition and a carry. [sent-38, score-0.17]
7 The span contains the indexes of the starting and ending input words delimiting the continuous sequence covered by the sub-tree represented by the item. [sent-39, score-0.165]
8 The carry, κ, stores extra information required to correctly score the non-local interactions of the item when it is linked in a broader context (for HMT with LM the carry consists of boundary words that will form new n-grams). [sent-41, score-0.299]
9 In HMT the interaction cost includes the LM score of new n-grams generated by connecting the children's sub-spans with terminals of r. [sent-43, score-0.274]
10 Notice that formula (3) is equal to formula (2) for items that cover the whole input sequence. [sent-44, score-0.206]
11 In many TSP applications, the search space is too large to allow an exhaustive search and therefore pruning techniques must be used. [sent-45, score-0.226]
12 It is not always possible to compute non-local features exactly while computing the score of partial derivations, since partial derivations miss part of the context. [sent-47, score-0.182]
13 Formula (3) accounts for the interaction between r and sub-items ι1 and ι2, but it does not integrate the cost relative to the interaction between the item and the surrounding context. [sent-48, score-0.436]
14 Therefore the item score computed in a bottom-up fashion is an approximation of the score the item has in a broader context. [sent-49, score-0.364]
15 For example, in HMT the LM score for n-grams that partially overlap the item's span cannot be computed exactly since the surrounding words are not known. [sent-50, score-0.185]
16 Basing pruning decisions on approximated scores can introduce search errors. [sent-51, score-0.167]
17 It is possible to reduce search errors using heuristics based on future cost estimation. [sent-52, score-0.187]
18 In general the estimation of the interaction between ι and the surrounding context is a function of the carry, κ. [sent-53, score-0.161]
19 In HMT it is possible to estimate the cost of n-grams that partially overlap ι’s span considering the boundary words. [sent-54, score-0.293]
20 We can obtain the heuristic cost for an item, ι, by adding to formula (3) a factor, est(κ), that estimates the interaction with the missing context: heuristicCost(ι) = cost(ι) + est(κ) (4), and use heuristicCost(ι) to guide BD pruning decisions. [sent-55, score-0.629]
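For concreteness, a minimal sketch (not the authors' implementation) of how an item's bottom-up score and a carry-based estimate could be combined as in formula (4); the Item fields and the lm_estimate function are hypothetical placeholders:

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class Item:
        span: Tuple[int, int]       # (i, j): indexes of the first and last covered input words
        cost: float                 # bottom-up score of the sub-tree, as in formula (3)
        carry: Tuple[str, ...]      # boundary words needed to score n-grams crossing the span

    def lm_estimate(carry: Tuple[str, ...]) -> float:
        # Hypothetical est(kappa): estimated LM cost of n-grams that partially
        # overlap the item's span, computed from the boundary words alone.
        return -0.5 * len(carry)    # placeholder estimate

    def heuristic_cost(item: Item) -> float:
        # formula (4): heuristicCost(item) = cost(item) + est(carry)
        return item.cost + lm_estimate(item.carry)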
21 However, even if a good interaction estimation is available, in practice it is not possible to avoid search errors while pruning. [sent-56, score-0.188]
22 Thus a bottom-up decoder without context needs to build all translations for “I go home”, introducing the possibility of pruning errors. [sent-77, score-0.259]
23 The items are created and scored in topological order. [sent-82, score-0.135]
24 The ordering constraint can be formally stated as: an item covering the span [i, j] must be processed after items covering sub-spans [h, k] with h ≥ i and k ≤ j. [sent-83, score-0.418]
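Stated as code, the ordering constraint is just a sub-span test; a sketch assuming spans are (start, end) index pairs:

    def is_subspan(sub, span):
        # [h, k] is a sub-span of [i, j] iff h >= i and k <= j; bottom-up decoding
        # processes every such sub-span before the item covering [i, j].
        (h, k), (i, j) = sub, span
        return h >= i and k <= j

    assert is_subspan((2, 3), (1, 4))       # must be processed earlier
    assert not is_subspan((0, 3), (1, 4))   # not a sub-span, so no ordering is implied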
25 Now let us investigate how the decoding algorithm could change if we remove the ordering constraint. [sent-87, score-0.222]
26 Removing the ordering constraint would lead to the occurrence of cases in which an item is processed before all child items have been processed. [sent-88, score-0.446]
27 For example, we could imagine creating and scoring an item, ι, with postcondition X and span [i, j], linking the rule instantiation r : X → AB with only the left sub-item, ιA, while information for the right sub-item, ιB, is still missing. [sent-89, score-0.401]
28 In this case, we can rely on local and partial contextual features to score ι. [sent-90, score-0.138]
29 Afterwards, it is possible to process ιB using the parent item, ι, as a source of additional information about the parent context and the yield of the sibling ιA. [sent-91, score-0.332]
30 This approach can avoid search errors in cases where pruning at the parent level can be correctly done using only local and partial yield context, while pruning at the child level needs extra non-bottom-up context to make a better pruning decision. [sent-92, score-0.767]
31 For example, consider the translation of the English sentence "I run" into French using the following synchronous grammar: r1 : S → X1X2 | X1X2; r2 : X → I | Je; r3 : X → run | course; r4 : X → run | courir; r5 : X → run | cours; r6 : X → run | courons. [sent-93, score-0.263]
32 If the beam is not big enough the correct translation could be pruned. [sent-98, score-0.283]
33 Instead, if we follow a non-bottom-up approach, as described in Figure 1, we can: 1) first translate "I"; 2) then create an item using r1 with a missing right child; 3) finally choose the correct translation for "run", using r1 to access a wider context. [sent-100, score-0.341]
34 Notice that with this undirected approach it is possible to reach the correct translation using only a beam size of 1 (Figure 1: Example of undirected decoding for HMT). [sent-101, score-0.616]
35 Notice that the parent link at step 3 is fundamental to correctly disambiguate the translation for “run”. [sent-103, score-0.301]
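The following toy sketch (hypothetical LM scores, not the paper's model) makes the point concrete: scored in isolation the four translations of "run" are indistinguishable, but once "Je" is available through the parent item built with r1, a bigram LM can prefer "cours":

    # Target sides of rules r3-r6 for the English word "run".
    run_candidates = ["course", "courir", "cours", "courons"]

    # Hypothetical bigram LM scores (higher is better).
    bigram_score = {("Je", "cours"): -0.5, ("Je", "course"): -3.0,
                    ("Je", "courir"): -4.0, ("Je", "courons"): -4.5}

    # Bottom-up, without context: no n-gram crosses the item's span, so all
    # candidates get the same score and beam size 1 may prune the correct one.
    isolated = {cand: 0.0 for cand in run_candidates}

    # Undirected decoding: "I" -> "Je" was already selected, and the parent item
    # (rule r1) exposes it as left context when the right child is scored.
    with_parent = {cand: bigram_score[("Je", cand)] for cand in run_candidates}
    print(max(with_parent, key=with_parent.get))   # -> cours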
36 Undirected items, ˚ι, are built by linking rule instantiations with elements in L ≡ {left child, right child, parent}; for example, ˚ι ≡ ⟨ι1, r, ιp⟩ is built by linking r with the left child, ι1, and the parent, ιp. [sent-106, score-0.137]
37 We denote with L˚ι+ the set of links for which the undirected item, ˚ι, has a connection, and with L˚ι− the set of missing links. [sent-107, score-0.209]
38 The undirected-postcondition is a set of strings, one string for each of ˚ι's missing links, l ∈ L˚ι−. [sent-109, score-0.156]
39 Each string represents the non-terminal related to one of the missing links available for expansion. [sent-110, score-0.209]
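A data-structure sketch of an undirected item under these definitions (the field and property names are assumptions, not the paper's notation):

    from dataclasses import dataclass, field
    from typing import Dict, Optional, Tuple

    LINKS = ("left_child", "right_child", "parent")

    @dataclass
    class UndirectedItem:
        rule: str                                   # the rule instantiation r
        span: Tuple[int, int]
        links: Dict[str, Optional["UndirectedItem"]] = field(
            default_factory=lambda: {l: None for l in LINKS})
        postcondition: Dict[str, str] = field(default_factory=dict)  # one non-terminal per missing link

        @property
        def connected(self):    # L+: links for which a connection exists
            return {l for l, v in self.links.items() if v is not None}

        @property
        def missing(self):      # L-: links still available for expansion
            return {l for l, v in self.links.items() if v is None}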
40 In the proposed framework, the selection of undirected-items can be done in any order, for example: in a first step selecting an undirected-item for span s1, then selecting an undirected-item for span s2, and in a third step selecting a second undirected-item for s1, and so on. [sent-113, score-0.27]
41 This allows the system to decide the decoding order based on the candidates' scores, so that candidates with higher confidence can be selected earlier and used as context for candidates with lower confidence. [sent-115, score-0.325]
42 Having all candidates in the same queue introduces comparability issues. [sent-116, score-0.297]
43 In CKY, candidates are comparable since each span is processed separately and each candidate is scored with the estimation of the yield score. [sent-117, score-0.453]
44 Instead, in the proposed framework, the unique queue contains candidates relative to different nodes and with different context scope. [sent-118, score-0.275]
45 To ensure comparability, we can associate to candidate undirected-items a heuristic score of the full derivation: heuristicCost(˚ι) = cost(˚ι) + est(˚ι) (6), where est(˚ι) estimates the cost of the missing branches of the derivation as a function of ˚ι's partial structure and carry. [sent-119, score-0.45]
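To make the comparability concrete, a sketch of a single priority queue keyed by this full-derivation heuristic cost; the cost and est callables stand in for the partial-structure score of formula (5) and the missing-branch estimate of formula (6), and the counter only breaks ties (Python's heapq pops the smallest key, hence the negation):

    import heapq, itertools

    _tie_breaker = itertools.count()

    def push(queue, item, cost, est):
        # formula (6): heuristicCost(item) = cost(item) + est(item); negate so the
        # candidate with the best heuristic cost is popped first.
        key = -(cost(item) + est(item))
        heapq.heappush(queue, (key, next(_tie_breaker), item))

    def pop_best(queue):
        _, _, item = heapq.heappop(queue)
        return item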
46 In this framework, the queue can be initialized with a candidate for each rule instantiation. [sent-120, score-0.184]
47 These initializing candidates have no context information and can be scored using only local features. [sent-121, score-0.237]
48 A generic decoding algorithm can loop selecting the candidate undirected-item with the highest score, ι, and then propagating its information to neighboring candidates, which can update using ι as context. [sent-122, score-0.199]
49 In this general framework the link to the parent node is not treated differently from links to children. [sent-123, score-0.33]
50 While in CKY the information is always passed from children to parent, in Undirected-CKY the information can be propagated in any direction, and any decoding order is allowed. [sent-124, score-0.159]
51 In our actual instantiation we apply constraints on the initialization step and on the propagation policy, and fix a search beam of size k. [sent-128, score-0.349]
52 At line 2 the queue of undirected-item candidates, Q, is initialized with only leaf rules. [sent-135, score-0.176]
53 At line 4 the candidate with the highest score, ι, is popped from Q. [sent-137, score-0.149]
54 Line 5 checks if ˚ι is within the beam width: if ˚ι has a span for which k candidates were already popped, then ˚ι is dropped and a new iteration is begun. [sent-138, score-0.55]
55 From line 7 to line 10 the algorithm deals with the generation of new candidate undirected-items. [sent-140, score-0.174]
56 Line 7 checks if ˚ι has both children links; if not, a new decoding iteration is begun. [sent-141, score-0.226]
57 At line 9, the set of new candidates, Cˆ, is built by linking r with ˚ι and any context already available in the out-forest. [sent-143, score-0.15]
58 Finally, between line 10 and line 12, each element of Cˆ is inserted in Q after checking that it is within the beam width: if it has a span for which k candidates were already popped, it does not make sense to insert it in Q, since it would surely be discarded at line 5. [sent-144, score-0.726]
59 In our current method, only the best possible parent context is used because it only provides context for ranking candidates, as discussed at the end of this section. [sent-146, score-0.214]
60 In contrast, a different candidate is generated for each possible other child in 2), as well as for the case where no other child is included in the undirected-item. [sent-147, score-0.366]
61 Notice that the if statement at line 7 and the way new undirected-items are created at line 9 enforce that each undirected-item covers a contiguous span. [sent-149, score-0.176]
62 An undirected-item that is missing a child link cannot be used as child context but can be used as parent context since it is added to the out-forest at line 6 before the if statement at line 7. [sent-150, score-0.912]
63 Furthermore, the if statements at line 5 and line 11 check that no more than k candidates are selected for each span, but the algorithm does not require the selection of exactly k candidates per span as in CKY. [sent-151, score-0.537]
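Putting the lines above together, a compact sketch of the constrained loop (initialization with leaf rules, pop, per-span beam check, out-forest update, candidate generation); the helpers leaf_candidates, has_both_children, link_with_context and the item .span field are assumptions standing in for the paper's operations, not its actual code:

    import heapq, itertools
    from collections import defaultdict

    def ud_decode(leaf_candidates, has_both_children, link_with_context,
                  heuristic_cost, k):
        counter = itertools.count()
        Q = []                                    # single queue over all spans
        for cand in leaf_candidates():            # line 2: only leaf rules
            heapq.heappush(Q, (-heuristic_cost(cand), next(counter), cand))

        popped_per_span = defaultdict(int)
        out_forest = []

        while Q:                                  # line 3
            _, _, item = heapq.heappop(Q)         # line 4: best candidate
            if popped_per_span[item.span] >= k:   # line 5: beam-width check
                continue
            popped_per_span[item.span] += 1
            out_forest.append(item)               # line 6: now usable as parent context

            if not has_both_children(item):       # line 7: cannot yet expand upward
                continue

            for new in link_with_context(item, out_forest):     # line 9: new candidates
                # lines 10-12: skip candidates whose span already used up its beam,
                # since they would be discarded at line 5 anyway.
                if popped_per_span[new.span] < k:
                    heapq.heappush(Q, (-heuristic_cost(new), next(counter), new))

        return out_forest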
64 The queue of candidates, Q, is ordered according to the heuristic cost of formula (6). [sent-152, score-0.345]
65 The score of the candidate partial structure is accounted for by the factor cost(˚ι), computed according to formula (5). [sent-153, score-0.236]
66 The factor est(˚ι) accounts for the estimation of the missing part of the derivation. [sent-154, score-0.225]
67 In HMT it is possible to exhaustively represent and search the context-free-forest (ignoring the LM), which is done in the Forest Rescoring framework before our task of decoding with the LM. [sent-163, score-0.263]
68 We exploit this context-free-forest to compute localCost(˚ι, l): for missing child links the localCost(·) is the Inside score computed using the (max, +) semiring (also known as the Viterbi score), and for missing parent links the localCost(·) is the corresponding Outside score. [sent-164, score-0.781]
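A sketch of the (max, +) Inside (Viterbi) computation on a context-free forest of the kind that could supply localCost(·) for missing child links; the forest encoding (node -> list of (rule_score, children) hyperedges) is an assumption:

    from functools import lru_cache

    def viterbi_inside(forest, root):
        # forest maps a node to its incoming hyperedges; each hyperedge is a pair
        # (rule_score, list_of_child_nodes). Terminal leaves are simply absent.
        @lru_cache(maxsize=None)
        def inside(node):
            edges = forest.get(node)
            if not edges:
                return 0.0
            # (max, +) semiring: best edge score plus the best scores of its tails
            return max(score + sum(inside(c) for c in children)
                       for score, children in edges)
        return inside(root)

    toy = {"S": [(-1.0, ["X1", "X2"])], "X1": [(-0.2, [])], "X2": [(-0.5, [])]}
    print(viterbi_inside(toy, "S"))   # -> -1.7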
69 The est(·) factor accounts for the words generated by the missing branch and their interaction with the span covered by ˚ι. [sent-166, score-0.454]
70 To compute the expected interaction cost we use the boundary word information contained in ˚ι's carry, as done in BD. [sent-167, score-0.293]
71 To estimate the LM cost of the missing branch we use an estimation function, conditioned on the missing span length, whose parameters are tuned on held-out data with gradient descent, using the search score as objective function. [sent-168, score-0.784]
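As an illustration, a minimal length-conditioned estimator of the kind described; the linear form and the finite-difference update are assumptions (the paper only states that the parameters are tuned on held-out data with gradient descent, using the search score as objective):

    def est_missing_branch(missing_span_length, params):
        # Estimated LM cost of the words produced by a missing branch, conditioned
        # only on how many source words that branch still has to cover.
        a, b = params
        return a * missing_span_length + b

    def tune_step(params, objective, lr=0.01, eps=1e-4):
        # One gradient-descent step on a scalar objective of the parameters,
        # using finite differences purely for the sake of a self-contained sketch.
        grads = []
        for i in range(len(params)):
            bumped = list(params)
            bumped[i] += eps
            grads.append((objective(bumped) - objective(params)) / eps)
        return [p - lr * g for p, g in zip(params, grads)]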
72 Therefore we ensure that candidates embedded in the UD out-forest would have the same score if they were scored by BD. [sent-170, score-0.255]
73 We don’t need to worry about differences derived from the missing context estimation factor, est(·), since this factor is only cxotn essitdiemreadti ownh filaec sorting t·)he, queue, Q, aacctcoorrd i-s ing to the heuristicCost(·). [sent-171, score-0.289]
74 This is the case for candidates that are scored with no missing child and no parent link, because in that case scoring function (3) for BD is equivalent to scoring function (5) for UD. [sent-173, score-0.379]
75 Instead, for candidates that are scored with a parent link, we remove the parent link factor from the cost(·) function when inserting the candidate in the out-forest. [sent-174, score-0.623]
76 And for candidates that are scored with a missing child, we adjust the score once the link to the missing child is created in the out-forest. [sent-175, score-0.678]
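A sketch of the bookkeeping this implies; the additive decomposition and the function names are assumptions, the only point being that parent-link and missing-child contributions are removed or added so that UD and BD assign the same score to the same derivation:

    def cost_when_entering_out_forest(candidate_cost, parent_link_factor):
        # Candidates scored with a parent link: strip that factor before the
        # candidate is stored in the out-forest.
        return candidate_cost - parent_link_factor

    def cost_when_child_attached(stored_cost, child_interaction_cost):
        # Candidates scored with a missing child: once the link to that child is
        # created in the out-forest, fold its contribution back in.
        return stored_cost + child_interaction_cost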
77 In this way UD and BD score the same derivation with the same score and can be regarded as two ways to explore the same search space. [sent-176, score-0.19]
78 5 Experiments. In this section we test the presented algorithm, and empirically show that it produces better translations while searching a smaller portion of the search space. [sent-177, score-0.148]
79 Since we compare two different decoding strategies that rely on the same training technique, the evaluation is primarily based on search errors rather than on BLEU. [sent-189, score-0.218]
80 We compare the two systems on a variety of beam sizes between 1 and 16. [sent-190, score-0.214]
81 Figure 2 reports a comparison of the translation quality for the two systems in relation to the beam size. [sent-191, score-0.283]
82 The white area represents the portion of sentences for which the two systems found a translation with the same search score. [sent-193, score-0.167]
83 With beam 1 the two systems obviously have a similar behavior, since both systems stop investigating the candidates for a node after having selected the best candidate immediately available. [sent-194, score-0.388]
84 With beam 4, we observe that UD is able to find a better translation for 63. [sent-197, score-0.283]
85 For searches that employ a beam bigger than 8, we notice that the UD advantage slightly decreases (Figure 3: Search score evolution for BD and UD). [sent-200, score-0.314]
86 We can understand this behavior considering that as the beam increases the two systems get closer to exhaustive search. [sent-202, score-0.214]
87 Figure 3 plots the search score variation for different beam sizes. [sent-204, score-0.323]
88 We can see that UD search leads to an average search score that is consistently better than the one computed for BD. [sent-205, score-0.168]
89 In the graph we also report the performance obtained using BD with beam 32. [sent-214, score-0.214]
90 38 with beam 16: UD reaches a clearly higher BLEU score using half the beam size. [sent-217, score-0.516]
91 This is due to the fact that BD, using Cube Pruning, must select k candidates for each node. [sent-222, score-0.134]
92 With beam 16, the hypergraphs produced by UD contain on average 4. [sent-228, score-0.27]
93 Therefore UD is able to find better translations even if exploring a smaller portion of the search space. [sent-230, score-0.148]
94 We compare BD with a beam of 16 and UD with a beam of 8, so that the two systems have comparable search scores. [sent-234, score-0.487]
95 For longer sentences, instead, the number of candidates considered during decoding grows exponentially with the size of the sentence, and UD needs to maintain a unique queue whose size is not bounded by the beam size k, unlike the queues used in BD's Cube Pruning. [sent-237, score-0.616]
96 In conclusion we can assert that, even if exploring a smaller portion of the search space, UD often finds a translation that is better than the one found with standard BD. [sent-239, score-0.167]
97 UD’s higher accuracy is due to its sophisticated search strategy that allows a more efficient integration of contextual features. [sent-240, score-0.191]
98 6 Future Work. In the proposed framework the link to the parent node is not treated differently from links to child nodes; the information in the hypergraph can be propagated in any direction. [sent-242, score-0.564]
99 Furthermore, considering that the proposed framework lets the system decide the decoding order, we could design a system that explicitly learns to infer the decoding order at training time. [sent-245, score-0.363]
100 Compared to a state-of-the-art HMT decoder, the presented system produces higher quality translations while searching a smaller portion of the search space, empirically showing that the bottom-up approximation of contextual features is a limitation for NLP tasks like HMT. [sent-250, score-0.233]
wordName wordTfidf (topN-words)
[('ud', 0.487), ('bd', 0.282), ('hmt', 0.25), ('beam', 0.214), ('child', 0.163), ('decoding', 0.159), ('missing', 0.156), ('parent', 0.15), ('tsp', 0.146), ('span', 0.135), ('candidates', 0.134), ('cost', 0.128), ('localcost', 0.125), ('item', 0.116), ('est', 0.114), ('queue', 0.109), ('pruning', 0.108), ('gesmundo', 0.104), ('cky', 0.099), ('interaction', 0.096), ('undirected', 0.087), ('cube', 0.085), ('heuristiccost', 0.083), ('link', 0.082), ('instantiation', 0.076), ('formula', 0.071), ('scored', 0.071), ('hypergraph', 0.071), ('translation', 0.069), ('line', 0.067), ('branch', 0.067), ('items', 0.064), ('ordering', 0.063), ('contextest', 0.062), ('nach', 0.062), ('lm', 0.061), ('search', 0.059), ('hypergraphs', 0.056), ('ich', 0.056), ('comparability', 0.054), ('postcondition', 0.054), ('nonlocal', 0.054), ('andrea', 0.053), ('links', 0.053), ('linking', 0.051), ('notice', 0.05), ('translations', 0.05), ('score', 0.05), ('cdec', 0.049), ('contextual', 0.049), ('anyway', 0.045), ('forest', 0.045), ('framework', 0.045), ('integration', 0.043), ('popped', 0.042), ('canpop', 0.042), ('chappelier', 0.042), ('courir', 0.042), ('gehe', 0.042), ('ideograms', 0.042), ('newundirecteditems', 0.042), ('rxi', 0.042), ('taln', 0.042), ('undirecteditems', 0.042), ('unige', 0.042), ('xcost', 0.042), ('dyer', 0.041), ('candidate', 0.04), ('processed', 0.04), ('sophisticated', 0.04), ('partial', 0.039), ('portion', 0.039), ('carry', 0.039), ('run', 0.038), ('reaches', 0.038), ('heuristic', 0.037), ('decoder', 0.036), ('factor', 0.036), ('xx', 0.036), ('langmead', 0.036), ('corazza', 0.036), ('caraballo', 0.036), ('mainstream', 0.036), ('rule', 0.035), ('guided', 0.035), ('estimation', 0.033), ('scoring', 0.033), ('parsing', 0.033), ('go', 0.033), ('limitations', 0.033), ('rescoring', 0.032), ('broader', 0.032), ('devise', 0.032), ('worry', 0.032), ('bleu', 0.032), ('context', 0.032), ('regarded', 0.031), ('doesn', 0.03), ('boundary', 0.03), ('indexes', 0.03)]
simIndex simValue paperId paperTitle
same-paper 1 1.0000017 65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction
Author: Andrea Gesmundo ; James Henderson
Abstract: State of the art Tree Structures Prediction techniques rely on bottom-up decoding. These approaches allow the use of context-free features and bottom-up features. We discuss the limitations of mainstream techniques in solving common Natural Language Processing tasks. Then we devise a new framework that goes beyond Bottom-up Decoding, and that allows a better integration of contextual features. Furthermore we design a system that addresses these issues and we test it on Hierarchical Machine Translation, a well known tree structure prediction problem. The structure of the proposed system allows the incorporation of non-bottom-up features and relies on a more sophisticated decoding approach. We show that the proposed approach can find better translations using a smaller portion of the search space.
2 0.11719673 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
Author: Xinyan Xiao ; Yang Liu ; Qun Liu ; Shouxun Lin
Abstract: Although discriminative training guarantees to improve statistical machine translation by incorporating a large amount of overlapping features, it is hard to scale up to large data due to decoding complexity. We propose a new algorithm to generate translation forest of training data in linear time with the help of word alignment. Our algorithm also alleviates the oracle selection problem by ensuring that a forest always contains derivations that exactly yield the reference translation. With millions of features trained on 519K sentences in 0.03 second per sentence, our system achieves significant improvement by 0.84 BLEU over the baseline system on the NIST Chinese-English test sets.
3 0.11126115 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation
Author: Yin-Wen Chang ; Michael Collins
Abstract: This paper describes an algorithm for exact decoding of phrase-based translation models, based on Lagrangian relaxation. The method recovers exact solutions, with certificates of optimality, on over 99% of test examples. The method is much more efficient than approaches based on linear programming (LP) or integer linear programming (ILP) solvers: these methods are not feasible for anything other than short sentences. We compare our method to MOSES (Koehn et al., 2007), and give precise estimates of the number and magnitude of search errors that MOSES makes.
4 0.10925495 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
Author: Kevin Gimpel ; Noah A. Smith
Abstract: We present a quasi-synchronous dependency grammar (Smith and Eisner, 2006) for machine translation in which the leaves of the tree are phrases rather than words as in previous work (Gimpel and Smith, 2009). This formulation allows us to combine structural components of phrase-based and syntax-based MT in a single model. We describe a method of extracting phrase dependencies from parallel text using a target-side dependency parser. For decoding, we describe a coarse-to-fine approach based on lattice dependency parsing of phrase lattices. We demonstrate performance improvements for Chinese-English and UrduEnglish translation over a phrase-based baseline. We also investigate the use of unsupervised dependency parsers, reporting encouraging preliminary results.
5 0.10518668 5 emnlp-2011-A Fast Re-scoring Strategy to Capture Long-Distance Dependencies
Author: Anoop Deoras ; Tomas Mikolov ; Kenneth Church
Abstract: A re-scoring strategy is proposed that makes it feasible to capture more long-distance dependencies in the natural language. Two pass strategies have become popular in a number of recognition tasks such as ASR (automatic speech recognition), MT (machine translation) and OCR (optical character recognition). The first pass typically applies a weak language model (n-grams) to a lattice and the second pass applies a stronger language model to N best lists. The stronger language model is intended to capture more longdistance dependencies. The proposed method uses RNN-LM (recurrent neural network language model), which is a long span LM, to rescore word lattices in the second pass. A hill climbing method (iterative decoding) is proposed to search over islands of confusability in the word lattice. An evaluation based on Broadcast News shows speedups of 20 over basic N best re-scoring, and word error rate reduction of 8% (relative) on a highly competitive setup.
6 0.10167141 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation
7 0.094149537 66 emnlp-2011-Hierarchical Phrase-based Translation Representations
8 0.092636362 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing
9 0.086953811 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
10 0.084373116 134 emnlp-2011-Third-order Variational Reranking on Packed-Shared Dependency Forests
11 0.08309903 10 emnlp-2011-A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions
12 0.083059765 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search
13 0.079054616 93 emnlp-2011-Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation
14 0.077610575 13 emnlp-2011-A Word Reordering Model for Improved Machine Translation
15 0.076336786 100 emnlp-2011-Optimal Search for Minimum Error Rate Training
16 0.073773533 125 emnlp-2011-Statistical Machine Translation with Local Language Models
17 0.069426984 44 emnlp-2011-Domain Adaptation via Pseudo In-Domain Data Selection
18 0.068652131 22 emnlp-2011-Better Evaluation Metrics Lead to Better Machine Translation
19 0.064885184 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation
20 0.064557828 83 emnlp-2011-Learning Sentential Paraphrases from Bilingual Parallel Corpora for Text-to-Text Generation
topicId topicWeight
[(0, 0.216), (1, 0.123), (2, 0.054), (3, -0.074), (4, 0.04), (5, -0.067), (6, -0.021), (7, -0.147), (8, 0.065), (9, -0.043), (10, 0.043), (11, 0.087), (12, -0.04), (13, -0.016), (14, -0.005), (15, -0.034), (16, 0.02), (17, 0.008), (18, 0.081), (19, 0.025), (20, 0.076), (21, 0.021), (22, 0.024), (23, -0.021), (24, -0.097), (25, -0.002), (26, 0.005), (27, -0.001), (28, 0.0), (29, -0.014), (30, 0.056), (31, -0.042), (32, -0.076), (33, 0.196), (34, 0.093), (35, -0.007), (36, -0.095), (37, 0.145), (38, 0.098), (39, 0.088), (40, -0.043), (41, -0.044), (42, -0.039), (43, 0.093), (44, -0.112), (45, -0.034), (46, -0.049), (47, -0.049), (48, 0.14), (49, -0.236)]
simIndex simValue paperId paperTitle
same-paper 1 0.94729966 65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction
Author: Andrea Gesmundo ; James Henderson
Abstract: State of the art Tree Structures Prediction techniques rely on bottom-up decoding. These approaches allow the use of context-free features and bottom-up features. We discuss the limitations of mainstream techniques in solving common Natural Language Processing tasks. Then we devise a new framework that goes beyond Bottom-up Decoding, and that allows a better integration of contextual features. Furthermore we design a system that addresses these issues and we test it on Hierarchical Machine Translation, a well known tree structure prediction problem. The structure of the proposed system allows the incorporation of non-bottom-up features and relies on a more sophisticated decoding approach. We show that the proposed approach can find better translations using a smaller portion of the search space.
2 0.64669609 5 emnlp-2011-A Fast Re-scoring Strategy to Capture Long-Distance Dependencies
Author: Anoop Deoras ; Tomas Mikolov ; Kenneth Church
Abstract: A re-scoring strategy is proposed that makes it feasible to capture more long-distance dependencies in the natural language. Two pass strategies have become popular in a number of recognition tasks such as ASR (automatic speech recognition), MT (machine translation) and OCR (optical character recognition). The first pass typically applies a weak language model (n-grams) to a lattice and the second pass applies a stronger language model to N best lists. The stronger language model is intended to capture more longdistance dependencies. The proposed method uses RNN-LM (recurrent neural network language model), which is a long span LM, to rescore word lattices in the second pass. A hill climbing method (iterative decoding) is proposed to search over islands of confusability in the word lattice. An evaluation based on Broadcast News shows speedups of 20 over basic N best re-scoring, and word error rate reduction of 8% (relative) on a highly competitive setup.
3 0.56386161 66 emnlp-2011-Hierarchical Phrase-based Translation Representations
Author: Gonzalo Iglesias ; Cyril Allauzen ; William Byrne ; Adria de Gispert ; Michael Riley
Abstract: This paper compares several translation representations for a synchronous context-free grammar parse including CFGs/hypergraphs, finite-state automata (FSA), and pushdown automata (PDA). The representation choice is shown to determine the form and complexity of target LM intersection and shortest-path algorithms that follow. Intersection, shortest path, FSA expansion and RTN replacement algorithms are presented for PDAs. Chinese-toEnglish translation experiments using HiFST and HiPDT, FSA and PDA-based decoders, are presented using admissible (or exact) search, possible for HiFST with compact SCFG rulesets and HiPDT with compact LMs. For large rulesets with large LMs, we introduce a two-pass search strategy which we then analyze in terms of search errors and translation performance.
4 0.49906141 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search
Author: Yue Zhang ; Stephen Clark
Abstract: Machine-produced text often lacks grammaticality and fluency. This paper studies grammaticality improvement using a syntax-based algorithm based on CCG. The goal of the search problem is to find an optimal parse tree among all that can be constructed through selection and ordering of the input words. The search problem, which is significantly harder than parsing, is solved by guided learning for best-first search. In a standard word ordering task, our system gives a BLEU score of 40.1, higher than the previous result of 33.7 achieved by a dependency-based system.
5 0.44723076 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
Author: Xinyan Xiao ; Yang Liu ; Qun Liu ; Shouxun Lin
Abstract: Although discriminative training guarantees to improve statistical machine translation by incorporating a large amount of overlapping features, it is hard to scale up to large data due to decoding complexity. We propose a new algorithm to generate translation forest of training data in linear time with the help of word alignment. Our algorithm also alleviates the oracle selection problem by ensuring that a forest always contains derivations that exactly yield the reference translation. With millions of features trained on 519K sentences in 0.03 second per sentence, our system achieves significant improvement by 0.84 BLEU over the baseline system on the NIST Chinese-English test sets.
6 0.42050946 93 emnlp-2011-Minimum Imputed-Risk: Unsupervised Discriminative Training for Machine Translation
7 0.4179706 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
8 0.40121916 134 emnlp-2011-Third-order Variational Reranking on Packed-Shared Dependency Forests
9 0.3903614 51 emnlp-2011-Exact Decoding of Phrase-Based Translation Models through Lagrangian Relaxation
10 0.38620529 100 emnlp-2011-Optimal Search for Minimum Error Rate Training
11 0.36868572 75 emnlp-2011-Joint Models for Chinese POS Tagging and Dependency Parsing
12 0.36684906 15 emnlp-2011-A novel dependency-to-string model for statistical machine translation
13 0.36638379 20 emnlp-2011-Augmenting String-to-Tree Translation Models with Fuzzy Use of Source-side Syntax
14 0.35914221 47 emnlp-2011-Efficient retrieval of tree translation examples for Syntax-Based Machine Translation
15 0.35419455 19 emnlp-2011-Approximate Scalable Bounded Space Sketch for Large Data NLP
16 0.34724364 10 emnlp-2011-A Probabilistic Forest-to-String Model for Language Generation from Typed Lambda Calculus Expressions
17 0.33604962 86 emnlp-2011-Lexical Co-occurrence, Statistical Significance, and Word Association
18 0.30157113 74 emnlp-2011-Inducing Sentence Structure from Parallel Corpora for Reordering
19 0.29866093 13 emnlp-2011-A Word Reordering Model for Improved Machine Translation
20 0.29333678 121 emnlp-2011-Semi-supervised CCG Lexicon Extension
topicId topicWeight
[(23, 0.121), (36, 0.04), (37, 0.037), (45, 0.071), (53, 0.03), (54, 0.03), (62, 0.034), (64, 0.017), (66, 0.034), (69, 0.035), (78, 0.301), (79, 0.051), (82, 0.02), (87, 0.013), (90, 0.014), (96, 0.033), (98, 0.028)]
simIndex simValue paperId paperTitle
same-paper 1 0.74168628 65 emnlp-2011-Heuristic Search for Non-Bottom-Up Tree Structure Prediction
Author: Andrea Gesmundo ; James Henderson
Abstract: State of the art Tree Structures Prediction techniques rely on bottom-up decoding. These approaches allow the use of context-free features and bottom-up features. We discuss the limitations of mainstream techniques in solving common Natural Language Processing tasks. Then we devise a new framework that goes beyond Bottom-up Decoding, and that allows a better integration of contextual features. Furthermore we design a system that addresses these issues and we test it on Hierarchical Machine Translation, a well known tree structure prediction problem. The structure of the proposed system allows the incorporation of non-bottom-up features and relies on a more sophisticated decoding approach. We show that the proposed approach can find better translations using a smaller portion of the search space.
2 0.71191788 87 emnlp-2011-Lexical Generalization in CCG Grammar Induction for Semantic Parsing
Author: Tom Kwiatkowski ; Luke Zettlemoyer ; Sharon Goldwater ; Mark Steedman
Abstract: We consider the problem of learning factored probabilistic CCG grammars for semantic parsing from data containing sentences paired with logical-form meaning representations. Traditional CCG lexicons list lexical items that pair words and phrases with syntactic and semantic content. Such lexicons can be inefficient when words appear repeatedly with closely related lexical content. In this paper, we introduce factored lexicons, which include both lexemes to model word meaning and templates to model systematic variation in word usage. We also present an algorithm for learning factored CCG lexicons, along with a probabilistic parse-selection model. Evaluations on benchmark datasets demonstrate that the approach learns highly accurate parsers, whose generalization performance benefits greatly from the lexical factoring.
3 0.49332786 108 emnlp-2011-Quasi-Synchronous Phrase Dependency Grammars for Machine Translation
Author: Kevin Gimpel ; Noah A. Smith
Abstract: We present a quasi-synchronous dependency grammar (Smith and Eisner, 2006) for machine translation in which the leaves of the tree are phrases rather than words as in previous work (Gimpel and Smith, 2009). This formulation allows us to combine structural components of phrase-based and syntax-based MT in a single model. We describe a method of extracting phrase dependencies from parallel text using a target-side dependency parser. For decoding, we describe a coarse-to-fine approach based on lattice dependency parsing of phrase lattices. We demonstrate performance improvements for Chinese-English and UrduEnglish translation over a phrase-based baseline. We also investigate the use of unsupervised dependency parsers, reporting encouraging preliminary results.
4 0.48633078 68 emnlp-2011-Hypotheses Selection Criteria in a Reranking Framework for Spoken Language Understanding
Author: Marco Dinarelli ; Sophie Rosset
Abstract: Reranking models have been successfully applied to many tasks of Natural Language Processing. However, there are two aspects of this approach that need a deeper investigation: (i) Assessment of hypotheses generated for reranking at classification phase: baseline models generate a list of hypotheses and these are used for reranking without any assessment; (ii) Detection of cases where reranking models provide a worst result: the best hypothesis provided by the reranking model is assumed to be always the best result. In some cases the reranking model provides an incorrect hypothesis while the baseline best hypothesis is correct, especially when baseline models are accurate. In this paper we propose solutions for these two aspects: (i) a semantic inconsistency metric to select possibly more correct n-best hypotheses, from a large set generated by an SLU baseline model. The selected hypotheses are reranked applying a state-of-the-art model based on Partial Tree Kernels, which encode SLU hypotheses in Support Vector Machines with complex structured features; (ii) finally, we apply a decision strategy, based on confidence values, to select the final hypothesis between the first ranked hypothesis provided by the baseline SLU model and the first ranked hypothesis provided by the re-ranker. We show the effectiveness of these solutions presenting comparative results obtained reranking hypotheses generated by a very accurate Conditional Random Field model. We evaluate our approach on the French MEDIA corpus. The results show significant improvements with respect to current state-of-the-art and previous re-ranking models.
5 0.48542449 35 emnlp-2011-Correcting Semantic Collocation Errors with L1-induced Paraphrases
Author: Daniel Dahlmeier ; Hwee Tou Ng
Abstract: We present a novel approach for automatic collocation error correction in learner English which is based on paraphrases extracted from parallel corpora. Our key assumption is that collocation errors are often caused by semantic similarity in the first language (L1-language) of the writer. An analysis of a large corpus of annotated learner English confirms this assumption. We evaluate our approach on real-world learner data and show that L1-induced paraphrases outperform traditional approaches based on edit distance, homophones, and WordNet synonyms.
6 0.48532364 66 emnlp-2011-Hierarchical Phrase-based Translation Representations
7 0.48456368 123 emnlp-2011-Soft Dependency Constraints for Reordering in Hierarchical Phrase-Based Translation
8 0.48265013 1 emnlp-2011-A Bayesian Mixture Model for PoS Induction Using Multiple Features
9 0.48238558 136 emnlp-2011-Training a Parser for Machine Translation Reordering
10 0.47926539 132 emnlp-2011-Syntax-Based Grammaticality Improvement using CCG and Guided Search
11 0.47922978 53 emnlp-2011-Experimental Support for a Categorical Compositional Distributional Model of Meaning
12 0.47918051 128 emnlp-2011-Structured Relation Discovery using Generative Models
13 0.47879094 58 emnlp-2011-Fast Generation of Translation Forest for Large-Scale SMT Discriminative Training
14 0.47845972 22 emnlp-2011-Better Evaluation Metrics Lead to Better Machine Translation
15 0.47762448 46 emnlp-2011-Efficient Subsampling for Training Complex Language Models
16 0.47677159 28 emnlp-2011-Closing the Loop: Fast, Interactive Semi-Supervised Annotation With Queries on Features and Instances
17 0.47639745 111 emnlp-2011-Reducing Grounded Learning Tasks To Grammatical Inference
18 0.4763298 137 emnlp-2011-Training dependency parsers by jointly optimizing multiple objectives
19 0.47588736 59 emnlp-2011-Fast and Robust Joint Models for Biomedical Event Extraction
20 0.47540617 98 emnlp-2011-Named Entity Recognition in Tweets: An Experimental Study