acl acl2012 acl2012-108 knowledge-graph by maker-knowledge-mining

108 acl-2012-Hierarchical Chunk-to-String Translation


Source: pdf

Author: Yang Feng ; Dongdong Zhang ; Mu Li ; Qun Liu

Abstract: We present a hierarchical chunk-to-string translation model, which can be seen as a compromise between the hierarchical phrase-based model and the tree-to-string model, to combine the merits of the two models. With the help of shallow parsing, our model learns rules consisting of words and chunks and meanwhile introduces syntactic cohesion. Under the weighted synchronous context-free grammar defined by these rules, our model searches for the best translation derivation and yields the target translation simultaneously. Our experiments show that our model significantly outperforms the hierarchical phrase-based model and the tree-to-string model on English-Chinese translation tasks.

Reference: text


Summary: the most important sentences generated by the tfidf model

sentIndex sentText sentNum sentScore

1 Hierarchical Chunk-to-String Translation∗ Yang Feng† Dongdong Zhang‡ Mu Li‡ Ming Zhou‡ Qun Liu⋆ † Department of Computer Science, University of Sheffield ‡ Microsoft Research Asia dozhang@microsoft. [sent-1, score-0.169]

2 Abstract We present a hierarchical chunk-to-string translation model, which can be seen as a compromise between the hierarchical phrase-based model and the tree-to-string model, to combine the merits of the two models. [sent-9, score-1.018]

3 With the help of shallow parsing, our model learns rules consisting of words and chunks and meanwhile introduces syntactic cohesion. [sent-10, score-0.592]

4 Under the weighted synchronous context-free grammar defined by these rules, our model searches for the best translation derivation and yields the target translation simultaneously. [sent-11, score-0.703]

5 Our experiments show that our model significantly outperforms the hierarchical phrase-based model and the tree-to-string model on English-Chinese translation tasks. [sent-12, score-0.548]

6 ∗This work was done when the first author visited Microsoft Research Asia as an intern. [sent-15, score-0.045]

7 However, it is often desirable to consider syntactic constituents of subphrases, e.g. [sent-16, score-0.134]

8 the hierarchical phrase X → ⟨X1 for X2, X2 de X1⟩ can be applied to both of the following strings in Figure 1, “A request for a purchase of shares” and “filed for bankruptcy”, yielding the translations “goumai gufen de shenqing” and “pochan de shenqing”, respectively. [sent-18, score-0.5]

9 In the former, “A request” is an NP and this rule acts correctly, while in the latter “filed” is a VP and this rule gives a wrong reordering. [sent-19, score-0.242]

10 If we restrict the first X on the right-hand side to NP, this kind of error can be avoided. [sent-20, score-0.112]
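The effect of restricting a rule slot to NP can be illustrated with a minimal sketch (the function and tag tuples below are hypothetical, not the paper's code): the rule X → ⟨X1 for X2, X2 de X1⟩ matches any subphrase pair, while an NP-restricted variant rejects the “filed for bankruptcy” case.

```python
# A minimal sketch (names illustrative, not from the paper) of how
# constraining a rule's first nonterminal to NP blocks the wrong application.

def rule_applies(rule_slot_tags, span_tags):
    """Check that each nonterminal slot's required tag matches the tag of
    the subphrase filling it; the generic tag 'X' matches anything."""
    return all(req == "X" or req == got
               for req, got in zip(rule_slot_tags, span_tags))

# Rule X -> <X1 for X2, X2 de X1>, with the first slot restricted to NP:
generic_rule = ("X", "X")      # matches any subphrase pair
np_rule      = ("NP", "X")     # first gap must be an NP

# "A request [for a purchase of shares]": first subphrase is an NP
print(rule_applies(np_rule, ("NP", "NP")))   # True  -> correct reordering kept
# "filed [for bankruptcy]": first subphrase is a VP
print(rule_applies(np_rule, ("VP", "NP")))   # False -> wrong reordering avoided
```

The generic rule still applies to both spans; only the NP-restricted variant distinguishes them.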

11 The tree-to-string model (Liu et al., 2006) introduces linguistic syntax via the source parse to direct word reordering, especially long-distance reordering. [sent-23, score-0.224]

12 Furthermore, this model is formalised as Tree Substitution Grammars, so it observes syntactic cohesion. [sent-24, score-0.177]

13 Syntactic cohesion means that the translation of a string covered by a subtree in a source parse tends to be continuous. [sent-25, score-0.426]

14 Fox (2002) shows that translation between English and French satisfies cohesion in the majority of cases. [sent-26, score-0.304]

15 Many previous works show promising results under the assumption that syntactic cohesion explains almost all translation movement for some language pairs (Wu, 1997; Yamada and Knight, 2001; Eisner, 2003; Graehl and Knight, 2004; Quirk et al. [sent-27, score-0.354]

16 This leads to data sparseness and vulnerability to parse errors. [sent-33, score-0.193]

17 In this paper, we present a hierarchical chunk-to-string translation model to combine the merits of the two models. [sent-34, score-0.602]

18 Instead of parse trees, our model introduces linguistic information in the form of chunks, so it does not need to consider the internal structure of chunks or their roles in the main sentence. [sent-35, score-0.311]

19 Based on shallow parsing results, it learns rules consisting of either words (terminals) or chunks (nonterminals), where adjacent chunks are packed into one nonterminal. [sent-36, score-0.842]

20 It searches for the best derivation through the SCFG-motivated space defined by these rules and generates the target translation simultaneously. [sent-37, score-0.515]

21 In some sense, our model can be seen as a compromise between the hierarchical phrase-based model and the tree-to-string model. Specifically, compared with the hierarchical phrase-based model, it integrates linguistic syntax and satisfies syntactic cohesion. [sent-38, score-0.913]

22 Compared with the tree-to-string model, it only needs to perform shallow parsing, which introduces fewer parsing errors. [sent-39, score-0.16]

23 Besides, our model allows a nonterminal in a rule to cover several chunks, which can alleviate data sparseness and the influence of parsing errors. [sent-40, score-0.494]

24 We refine our hierarchical chunk-to-string model into two models: a loose model (Section 2. [sent-41, score-0.438]

25 1) which is more similar to the hierarchical phrase-based model and a tight model (Section 2. [sent-42, score-0.588]

26 The experiments show that on the 2008 NIST English-Chinese MT translation test set, both the loose model and the tight model outperform the hierarchical phrase-based model and the tree-to-string model, with the loose model having a better performance. [sent-44, score-1.251]

27 In terms of speed, the tight model runs faster; its speed ranks between the tree-to-string model and the hierarchical phrase-based model. [sent-45, score-0.627]

28 Figure 1: A running example of two sentences. [sent-46, score-0.048]

29 For each sentence, the first row gives the chunk sequence. [sent-47, score-0.346]

30 Figure 2: An example of shallow parsing: (a) a parse tree; (b) the chunk sequence obtained from the parse tree. [sent-48, score-1.689]

31 2 Modeling Shallow parsing (also called chunking) is an analysis of a sentence which identifies the constituents (noun groups, verbs, verb groups, etc.) but specifies neither their internal structures nor their roles in the main sentence. [sent-49, score-0.28]

32 In Figure 1, we give the chunk sequence in the first row for each sentence. [sent-50, score-0.388]

33 We treat shallow parsing as a sequence labeling task, and a sentence f can have many possible different chunk label sequences. [sent-51, score-0.604]
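A chunk label sequence of the kind shown in Figure 2 uses B-/I- prefixes to mark chunk beginnings and continuations. A small sketch of decoding such a sequence into chunks (the function name is hypothetical; the tag convention is the one the figure illustrates):

```python
# Turn a B-/I- chunk label sequence into (chunk_tag, words) groups,
# following the convention of Figure 2 (e.g. "B-NP", "I-NP", "B-VBZ").

def tags_to_chunks(words, tags):
    chunks = []
    for word, tag in zip(words, tags):
        kind, _, label = tag.partition("-")   # "B-NP" -> ("B", "NP")
        if kind == "B" or not chunks:
            chunks.append((label, [word]))    # "B-": start a new chunk
        else:
            chunks[-1][1].append(word)        # "I-": continue current chunk
    return [(label, " ".join(ws)) for label, ws in chunks]

words = ["The", "bank", "has", "filed", "for", "bankruptcy"]
tags  = ["B-NP", "I-NP", "B-VBZ", "B-VBN", "B-IN", "B-NP"]
print(tags_to_chunks(words, tags))
# [('NP', 'The bank'), ('VBZ', 'has'), ('VBN', 'filed'),
#  ('IN', 'for'), ('NP', 'bankruptcy')]
```

Since many label sequences are possible for one sentence, a shallow parser scores them and the best one is passed on to translation.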

34 An SCFG produces a derivation by starting with a pair of start symbols and recursively rewriting every two coindexed nonterminals with the corresponding components of a matched rule. [sent-54, score-0.391]

35 A derivation yields a pair of strings on the right-hand side which are translations of each other. [sent-55, score-0.472]

36 In a weighted SCFG, each rule has a weight and the total weight of a derivation is the product of the weights of the rules used by the derivation. [sent-56, score-0.394]
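The derivation-weight definition above is just a product of rule weights; in practice decoders accumulate it in log space for numerical stability. A minimal sketch (the helper is illustrative, not the paper's code):

```python
import math

# Weight of a derivation under a weighted SCFG: the product of the weights
# of the rules it uses, computed here in log space to avoid underflow.

def derivation_weight(rule_weights):
    return math.exp(sum(math.log(w) for w in rule_weights))

print(round(derivation_weight([0.5, 0.4, 0.9]), 3))  # 0.18
```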

37 A translation may be produced by many different derivations and we only use the best derivation to evaluate its probability. [sent-57, score-0.341]

38 We further refine our hierarchical chunk-to-string model into two models: a loose model which is more similar to the hierarchical phrase-based model and a tight model which is more similar to the tree-to-string model. [sent-59, score-1.221]

39 The two models differ in the form of rules and the way of estimating rule probabilities. [sent-60, score-0.171]

40 For decoding, we employ the same algorithm for both models: given a test sentence, the decoder first performs shallow parsing to get the best chunk sequence, then applies a CYK parsing algorithm with beam search. [sent-61, score-0.688]
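The CYK-with-beam-search step can be sketched as a bottom-up chart over the chunk sequence, where every span keeps at most `beam` scored hypotheses. Everything below (function names, the toy grammar, the score penalty) is a hypothetical illustration of the general scheme, not the paper's decoder:

```python
import itertools

def cyk_beam_decode(chunks, translate_span, combine, beam=5):
    """CYK over the chunk sequence: chart[i, j] keeps at most `beam`
    scored hypotheses (score, translation) for the span [i, j)."""
    chart = {}
    n = len(chunks)
    for width in range(1, n + 1):          # bottom-up over span widths
        for i in range(n - width + 1):
            j = i + width
            cands = list(translate_span(i, j))          # direct rule matches
            for k in range(i + 1, j):                   # binary combinations
                for a, b in itertools.product(chart[i, k], chart[k, j]):
                    cands.append(combine(a, b))
            chart[i, j] = sorted(cands, key=lambda h: -h[0])[:beam]
    return chart[0, n][0]                               # best hypothesis

# Toy grammar: a single chunk "translates" to its lowercased tag, and
# combining two hypotheses concatenates them with a small score penalty.
chunks = ["NP", "VBZ", "VBN"]

def translate_span(i, j):
    if j - i == 1:
        yield (1.0, chunks[i].lower())

def combine(a, b):
    return (a[0] * b[0] * 0.9, a[1] + " " + b[1])

print(cyk_beam_decode(chunks, translate_span, combine))
```

The beam cutoff at each span is what keeps the search tractable while still exploring reorderings licensed by the rules.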

41 1 A Loose Model In our model, we employ rules containing nonterminals to handle long-distance reordering where boundary words play an important role. [sent-63, score-0.363]

42 So for subphrases which cover more than one chunk, we maintain only the boundary chunks: we bundle the adjacent chunks into one nonterminal, denoted by the first chunk tag immediately followed by “-” and then the last chunk tag. [sent-64, score-1.404]
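The naming scheme just described can be sketched in a few lines (the function name is hypothetical; the tag labels follow Figure 2):

```python
# Bundle a run of adjacent chunks into one nonterminal named by its
# boundary chunk tags: first tag, "-", last tag.

def bundle_nonterminal(chunk_tags):
    if len(chunk_tags) == 1:
        return chunk_tags[0]
    return chunk_tags[0] + "-" + chunk_tags[-1]

print(bundle_nonterminal(["NP"]))                 # NP
print(bundle_nonterminal(["VBZ", "VBN", "PP"]))   # VBZ-PP
```

Because only the boundary tags are kept, many different internal chunk sequences map to the same nonterminal, which is what alleviates data sparseness and the influence of chunking errors.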


similar papers computed by tfidf model

tfidf for this paper:

wordName wordTfidf (topN-words)

[('chunk', 0.3), ('hierarchical', 0.261), ('filed', 0.241), ('subphrases', 0.241), ('chunks', 0.221), ('np', 0.188), ('tight', 0.179), ('translation', 0.177), ('loose', 0.169), ('derivation', 0.164), ('bankruptcy', 0.158), ('scfg', 0.154), ('nonterminals', 0.143), ('shallow', 0.129), ('cohesion', 0.127), ('sheffield', 0.121), ('shenqing', 0.121), ('nonterminal', 0.116), ('vp', 0.109), ('rule', 0.1), ('cro', 0.095), ('merits', 0.09), ('compromise', 0.09), ('vbn', 0.09), ('vbz', 0.085), ('constituents', 0.084), ('terminals', 0.08), ('feng', 0.08), ('parsing', 0.08), ('parse', 0.077), ('model', 0.074), ('ft', 0.074), ('request', 0.074), ('roles', 0.072), ('rules', 0.071), ('denoting', 0.069), ('mi', 0.067), ('side', 0.066), ('phrasebased', 0.065), ('strings', 0.065), ('sparseness', 0.063), ('got', 0.062), ('cover', 0.061), ('bank', 0.059), ('production', 0.059), ('syntax', 0.058), ('searches', 0.056), ('refine', 0.055), ('synchronous', 0.055), ('asia', 0.054), ('vulnerable', 0.053), ('iuqun', 0.053), ('bel', 0.053), ('khk', 0.053), ('observes', 0.053), ('purchase', 0.053), ('reordering', 0.052), ('employ', 0.052), ('syntactic', 0.05), ('rle', 0.048), ('aln', 0.048), ('doe', 0.048), ('microsoft', 0.048), ('get', 0.047), ('row', 0.046), ('specify', 0.046), ('boundary', 0.045), ('longdistance', 0.045), ('conveniently', 0.045), ('hk', 0.045), ('kd', 0.045), ('mul', 0.045), ('sin', 0.045), ('visited', 0.045), ('string', 0.045), ('knight', 0.044), ('introduces', 0.044), ('internal', 0.044), ('matched', 0.044), ('tree', 0.043), ('besides', 0.043), ('nn', 0.043), ('adjacent', 0.042), ('vbd', 0.042), ('acts', 0.042), ('dongdong', 0.042), ('xc', 0.042), ('mingzhou', 0.042), ('reorderings', 0.042), ('grammars', 0.042), ('sequence', 0.042), ('rewrites', 0.04), ('followed', 0.039), ('learns', 0.039), ('speed', 0.039), ('groups', 0.039), ('occurrences', 0.039), ('fox', 0.039), ('etc', 0.039), ('packed', 0.039)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.99999982 108 acl-2012-Hierarchical Chunk-to-String Translation

Author: Yang Feng ; Dongdong Zhang ; Mu Li ; Qun Liu

Abstract: We present a hierarchical chunk-to-string translation model, which can be seen as a compromise between the hierarchical phrase-based model and the tree-to-string model, to combine the merits of the two models. With the help of shallow parsing, our model learns rules consisting of words and chunks and meanwhile introduces syntactic cohesion. Under the weighted synchronous context-free grammar defined by these rules, our model searches for the best translation derivation and yields the target translation simultaneously. Our experiments show that our model significantly outperforms the hierarchical phrase-based model and the tree-to-string model on English-Chinese translation tasks.

2 0.17243892 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?

Author: Hui Zhang ; David Chiang

Abstract: Syntax-based translation models that operate on the output of a source-language parser have been shown to perform better if allowed to choose from a set of possible parses. In this paper, we investigate whether this is because it allows the translation stage to overcome parser errors or to override the syntactic structure itself. We find that it is primarily the latter, but that under the right conditions, the translation stage does correct parser errors, improving parsing accuracy on the Chinese Treebank.

3 0.16813083 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation

Author: Tong Xiao ; Jingbo Zhu ; Hao Zhang ; Qiang Li

Abstract: We present a new open source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierarchical phrase-based model, and various syntax-based models. The key innovation provided by the toolkit is that the decoder can work with various grammars and offers different choices of decoding algorithms, such as phrase-based decoding, decoding as parsing/tree-parsing and forest-based decoding. Moreover, several useful utilities were distributed with the toolkit, including a discriminative reordering model, a simple and fast language model, and an implementation of minimum error rate training for weight tuning.

4 0.15788887 128 acl-2012-Learning Better Rule Extraction with Translation Span Alignment

Author: Jingbo Zhu ; Tong Xiao ; Chunliang Zhang

Abstract: This paper presents an unsupervised approach to learning translation span alignments from parallel data that improves syntactic rule extraction by deleting spurious word alignment links and adding new valuable links based on bilingual translation span correspondences. Experiments on Chinese-English translation demonstrate improvements over standard methods for tree-to-string and tree-to-tree translation.

5 0.1569604 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation

Author: Arianna Bisazza ; Marcello Federico

Abstract: This paper presents a novel method to suggest long word reorderings to a phrase-based SMT decoder. We address language pairs where long reordering concentrates on few patterns, and use fuzzy chunk-based rules to predict likely reorderings for these phenomena. Then we use reordered n-gram LMs to rank the resulting permutations and select the n-best for translation. Finally we encode these reorderings by modifying selected entries of the distortion cost matrix, on a per-sentence basis. In this way, we expand the search space by a much finer degree than if we simply raised the distortion limit. The proposed techniques are tested on Arabic-English and German-English using well-known SMT benchmarks.

6 0.15017906 204 acl-2012-Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation

7 0.15006971 109 acl-2012-Higher-order Constituent Parsing and Parser Combination

8 0.14467408 105 acl-2012-Head-Driven Hierarchical Phrase-based Translation

9 0.11961356 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation

10 0.11008604 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT

11 0.10485014 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation

12 0.10172882 141 acl-2012-Maximum Expected BLEU Training of Phrase and Lexicon Translation Models

13 0.099013604 140 acl-2012-Machine Translation without Words through Substring Alignment

14 0.0956508 3 acl-2012-A Class-Based Agreement Model for Generating Accurately Inflected Translations

15 0.095023386 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

16 0.093959391 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing

17 0.093865916 139 acl-2012-MIX Is Not a Tree-Adjoining Language

18 0.093481414 131 acl-2012-Learning Translation Consensus with Structured Label Propagation

19 0.090766959 143 acl-2012-Mixing Multiple Translation Models in Statistical Machine Translation

20 0.0873513 116 acl-2012-Improve SMT Quality with Automatically Extracted Paraphrase Rules


similar papers computed by lsi model

lsi for this paper:

topicId topicWeight

[(0, -0.238), (1, -0.184), (2, 0.001), (3, -0.031), (4, -0.039), (5, -0.114), (6, -0.028), (7, 0.101), (8, 0.051), (9, 0.028), (10, 0.011), (11, -0.09), (12, -0.027), (13, 0.003), (14, 0.009), (15, -0.082), (16, -0.02), (17, -0.019), (18, 0.009), (19, -0.13), (20, 0.028), (21, 0.108), (22, -0.118), (23, 0.091), (24, -0.057), (25, 0.033), (26, -0.017), (27, -0.154), (28, 0.044), (29, -0.044), (30, 0.028), (31, 0.056), (32, -0.142), (33, 0.042), (34, -0.123), (35, -0.007), (36, -0.06), (37, 0.017), (38, -0.02), (39, -0.069), (40, 0.078), (41, 0.03), (42, 0.003), (43, -0.057), (44, 0.067), (45, -0.064), (46, -0.032), (47, -0.016), (48, 0.043), (49, -0.03)]

similar papers list:

simIndex simValue paperId paperTitle

same-paper 1 0.94693518 108 acl-2012-Hierarchical Chunk-to-String Translation

Author: Yang Feng ; Dongdong Zhang ; Mu Li ; Qun Liu

Abstract: We present a hierarchical chunk-to-string translation model, which can be seen as a compromise between the hierarchical phrase-based model and the tree-to-string model, to combine the merits of the two models. With the help of shallow parsing, our model learns rules consisting of words and chunks and meanwhile introduces syntactic cohesion. Under the weighted synchronous context-free grammar defined by these rules, our model searches for the best translation derivation and yields the target translation simultaneously. Our experiments show that our model significantly outperforms the hierarchical phrase-based model and the tree-to-string model on English-Chinese translation tasks.

2 0.778781 105 acl-2012-Head-Driven Hierarchical Phrase-based Translation

Author: Junhui Li ; Zhaopeng Tu ; Guodong Zhou ; Josef van Genabith

Abstract: This paper presents an extension of Chiang’s hierarchical phrase-based (HPB) model, called Head-Driven HPB (HD-HPB), which incorporates head information in translation rules to better capture syntax-driven information, as well as improved reordering between any two neighboring non-terminals at any stage of a derivation to explore a larger reordering search space. Experiments on Chinese-English translation on four NIST MT test sets show that the HD-HPB model significantly outperforms Chiang’s model with average gains of 1.91 points absolute in BLEU.

3 0.65528876 25 acl-2012-An Exploration of Forest-to-String Translation: Does Translation Help or Hurt Parsing?

Author: Hui Zhang ; David Chiang

Abstract: Syntax-based translation models that operate on the output of a source-language parser have been shown to perform better if allowed to choose from a set of possible parses. In this paper, we investigate whether this is because it allows the translation stage to overcome parser errors or to override the syntactic structure itself. We find that it is primarily the latter, but that under the right conditions, the translation stage does correct parser errors, improving parsing accuracy on the Chinese Treebank.

4 0.65397966 155 acl-2012-NiuTrans: An Open Source Toolkit for Phrase-based and Syntax-based Machine Translation

Author: Tong Xiao ; Jingbo Zhu ; Hao Zhang ; Qiang Li

Abstract: We present a new open source toolkit for phrase-based and syntax-based machine translation. The toolkit supports several state-of-the-art models developed in statistical machine translation, including the phrase-based model, the hierarchical phrase-based model, and various syntax-based models. The key innovation provided by the toolkit is that the decoder can work with various grammars and offers different choices of decoding algorithms, such as phrase-based decoding, decoding as parsing/tree-parsing and forest-based decoding. Moreover, several useful utilities were distributed with the toolkit, including a discriminative reordering model, a simple and fast language model, and an implementation of minimum error rate training for weight tuning.

5 0.6421721 128 acl-2012-Learning Better Rule Extraction with Translation Span Alignment

Author: Jingbo Zhu ; Tong Xiao ; Chunliang Zhang

Abstract: This paper presents an unsupervised approach to learning translation span alignments from parallel data that improves syntactic rule extraction by deleting spurious word alignment links and adding new valuable links based on bilingual translation span correspondences. Experiments on Chinese-English translation demonstrate improvements over standard methods for tree-to-string and tree-to-tree translation.

6 0.62005728 204 acl-2012-Translation Model Size Reduction for Hierarchical Phrase-based Statistical Machine Translation

7 0.58182496 19 acl-2012-A Ranking-based Approach to Word Reordering for Statistical Machine Translation

8 0.55320913 131 acl-2012-Learning Translation Consensus with Structured Label Propagation

9 0.53543466 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation

10 0.52152658 57 acl-2012-Concept-to-text Generation via Discriminative Reranking

11 0.51572448 185 acl-2012-Strong Lexicalization of Tree Adjoining Grammars

12 0.509359 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

13 0.50446159 162 acl-2012-Post-ordering by Parsing for Japanese-English Statistical Machine Translation

14 0.49671653 127 acl-2012-Large-Scale Syntactic Language Modeling with Treelets

15 0.48930132 139 acl-2012-MIX Is Not a Tree-Adjoining Language

16 0.48768416 147 acl-2012-Modeling the Translation of Predicate-Argument Structure for SMT

17 0.4873665 109 acl-2012-Higher-order Constituent Parsing and Parser Combination

18 0.4541373 38 acl-2012-Bayesian Symbol-Refined Tree Substitution Grammars for Syntactic Parsing

19 0.45345506 107 acl-2012-Heuristic Cube Pruning in Linear Time

20 0.44366848 143 acl-2012-Mixing Multiple Translation Models in Statistical Machine Translation


similar papers computed by lda model

lda for this paper:

topicId topicWeight

[(26, 0.051), (28, 0.066), (30, 0.054), (37, 0.025), (39, 0.041), (53, 0.297), (57, 0.015), (74, 0.024), (82, 0.033), (84, 0.013), (85, 0.026), (90, 0.1), (92, 0.056), (94, 0.062), (99, 0.063)]

similar papers list:

simIndex simValue paperId paperTitle

1 0.80324006 51 acl-2012-Collective Generation of Natural Image Descriptions

Author: Polina Kuznetsova ; Vicente Ordonez ; Alexander Berg ; Tamara Berg ; Yejin Choi

Abstract: We present a holistic data-driven approach to image description generation, exploiting the vast amount of (noisy) parallel image data and associated natural language descriptions available on the web. More specifically, given a query image, we retrieve existing human-composed phrases used to describe visually similar images, then selectively combine those phrases to generate a novel description for the query image. We cast the generation process as constraint optimization problems, collectively incorporating multiple interconnected aspects of language composition for content planning, surface realization and discourse structure. Evaluation by human annotators indicates that our final system generates more semantically correct and linguistically appealing descriptions than two nontrivial baselines.

same-paper 2 0.71576399 108 acl-2012-Hierarchical Chunk-to-String Translation

Author: Yang Feng ; Dongdong Zhang ; Mu Li ; Qun Liu

Abstract: We present a hierarchical chunk-to-string translation model, which can be seen as a compromise between the hierarchical phrase-based model and the tree-to-string model, to combine the merits of the two models. With the help of shallow parsing, our model learns rules consisting of words and chunks and meanwhile introduces syntactic cohesion. Under the weighted synchronous context-free grammar defined by these rules, our model searches for the best translation derivation and yields the target translation simultaneously. Our experiments show that our model significantly outperforms the hierarchical phrase-based model and the tree-to-string model on English-Chinese translation tasks.

3 0.47304162 83 acl-2012-Error Mining on Dependency Trees

Author: Claire Gardent ; Shashi Narayan

Abstract: In recent years, error mining approaches were developed to help identify the most likely sources of parsing failures in parsing systems using handcrafted grammars and lexicons. However, the techniques they use to enumerate and count n-grams build on the sequential nature of a text corpus and do not easily extend to structured data. In this paper, we propose an algorithm for mining trees and apply it to detect the most likely sources of generation failure. We show that this tree mining algorithm permits identifying not only errors in the generation system (grammar, lexicon) but also mismatches between the structures contained in the input and the input structures expected by our generator, as well as a few idiosyncrasies/errors in the input data.

4 0.47076729 80 acl-2012-Efficient Tree-based Approximation for Entailment Graph Learning

Author: Jonathan Berant ; Ido Dagan ; Meni Adler ; Jacob Goldberger

Abstract: Learning entailment rules is fundamental in many semantic-inference applications and has been an active field of research in recent years. In this paper we address the problem of learning transitive graphs that describe entailment rules between predicates (termed entailment graphs). We first identify that entailment graphs exhibit a “tree-like” property and are very similar to a novel type of graph termed forest-reducible graph. We utilize this property to develop an iterative efficient approximation algorithm for learning the graph edges, where each iteration takes linear time. We compare our approximation algorithm to a recently-proposed state-of-the-art exact algorithm and show that it is more efficient and scalable both theoretically and empirically, while its output quality is close to that given by the optimal solution of the exact algorithm.

5 0.46972954 76 acl-2012-Distributional Semantics in Technicolor

Author: Elia Bruni ; Gemma Boleda ; Marco Baroni ; Nam Khanh Tran

Abstract: Our research aims at building computational models of word meaning that are perceptually grounded. Using computer vision techniques, we build visual and multimodal distributional models and compare them to standard textual models. Our results show that, while visual models with state-of-the-art computer vision techniques perform worse than textual models in general tasks (accounting for semantic relatedness), they are as good or better models of the meaning of words with visual correlates such as color terms, even in a nontrivial task that involves nonliteral uses of such words. Moreover, we show that visual and textual information are tapping on different aspects of meaning, and indeed combining them in multimodal models often improves performance.

6 0.46871755 6 acl-2012-A Comprehensive Gold Standard for the Enron Organizational Hierarchy

7 0.46748823 139 acl-2012-MIX Is Not a Tree-Adjoining Language

8 0.46447083 40 acl-2012-Big Data versus the Crowd: Looking for Relationships in All the Right Places

9 0.46365497 22 acl-2012-A Topic Similarity Model for Hierarchical Phrase-based Translation

10 0.4635461 173 acl-2012-Self-Disclosure and Relationship Strength in Twitter Conversations

11 0.46265081 191 acl-2012-Temporally Anchored Relation Extraction

12 0.46167693 218 acl-2012-You Had Me at Hello: How Phrasing Affects Memorability

13 0.46095955 148 acl-2012-Modified Distortion Matrices for Phrase-Based Statistical Machine Translation

14 0.46090326 140 acl-2012-Machine Translation without Words through Substring Alignment

15 0.46077579 84 acl-2012-Estimating Compact Yet Rich Tree Insertion Grammars

16 0.46015409 206 acl-2012-UWN: A Large Multilingual Lexical Knowledge Base

17 0.46006352 29 acl-2012-Assessing the Effect of Inconsistent Assessors on Summarization Evaluation

18 0.45945662 123 acl-2012-Joint Feature Selection in Distributed Stochastic Learning for Large-Scale Discriminative Training in SMT

19 0.45943293 174 acl-2012-Semantic Parsing with Bayesian Tree Transducers

20 0.4591969 28 acl-2012-Aspect Extraction through Semi-Supervised Modeling