acl acl2010 acl2010-243 knowledge-graph by maker-knowledge-mining
Source: pdf
Author: Yang Liu ; Liang Huang
Abstract: unkown-abstract
Reference: text
sentIndex sentText sentNum sentScore
1 1 Introduction The past several years have witnessed rapid advances in syntax-based machine translation, which exploits natural language syntax to guide translation. [sent-3, score-0.329]
2 Depending on the type of input, most of these efforts can be divided into two broad categories: (a) string-based systems whose input is a string, which is simultaneously parsed and translated by a synchronous grammar (Wu, 1997; Chiang, 2005; Galley et al. [sent-4, score-0.377]
3 , 2006), and (b) tree-based systems whose input is already a parse tree to be directly converted into a target tree or string (Lin, 2004; Ding and Palmer, 2005; Quirk et al. [sent-5, score-0.421]
4 Compared with their string-based counterparts, tree-based systems offer many attractive features: they are much faster in decoding (linear time vs. [sent-9, score-0.656]
5 cubic time), do not require sophisticated binarization (Zhang et al. [sent-10, score-0.207]
6 , 2006), and can use separate grammars for parsing and translation (e. [sent-11, score-0.383]
7 g., a context-free grammar for the former and a tree substitution grammar for the latter). [sent-13, score-0.344]
8 However, despite these advantages, most tree-based systems suffer from a major drawback: they only use 1-best parse trees to direct translation, which potentially introduces translation mistakes due to parsing errors (Quirk and Corston-Oliver, 2006). [sent-14, score-0.996]
9 This situation becomes worse for resource-poor source languages without enough treebank data to train a high-accuracy parser. [sent-15, score-0.049]
10 This problem can be alleviated elegantly by using packed forests (Huang, 2008), which encode exponentially many parse trees in polynomial space (a minimal forest sketch is given after this sentence list). [sent-16, score-1.059]
11 , 2008; Mi and Huang, 2008) thus take a packed forest instead of a parse tree as an input. [sent-18, score-0.698]
12 In addition, packed forests can also be used for translation rule extraction, which helps alleviate the propagation of parsing errors into the rule set. [sent-19, score-1.305]
13 Forest-based translation can be regarded as a compromise between the string-based and tree-based methods, while combining the advantages of both: [sent-20, score-0.435]
14 decoding is still fast, yet does not commit to a single parse. [sent-21, score-0.647]
15 Surprisingly, translating a forest of millions of trees is even faster than translating 30 individual trees, and offers significantly better translation quality. [sent-22, score-1.078]
16 2 Content Overview This tutorial surveys tree-based and forest-based translation methods. [sent-24, score-0.465]
17 For each approach, we will discuss the two fundamental tasks: decoding, which performs the actual translation, and rule extraction, which learns translation rules from real-world data automatically. [sent-25, score-0.603]
18 Finally, we will introduce some more recent developments in tree-based and forest-based translation, such as tree-sequence-based models, tree-to-tree models, joint parsing and translation, and faster decoding algorithms. [sent-26, score-1.125]
19 We will conclude our talk by pointing out some directions for future work. [sent-27, score-0.165]
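The introduction above rests on two concrete points: a packed forest is a hypergraph that stores exponentially many parse trees in polynomial space, and extracting the best tree from it is cheap. The Python sketch below is illustrative only and is not taken from the tutorial or from any cited system; the Forest class, the node labels, spans, and scores are all hypothetical, and a real forest-based decoder would additionally apply translation rules with pruning (e.g., cube pruning) rather than plain Viterbi tree extraction.

    # Hypothetical toy packed forest: a hypergraph whose nodes share subtrees,
    # so polynomially many nodes/hyperedges can encode exponentially many trees.
    from collections import defaultdict

    class Forest:
        def __init__(self):
            # node -> list of incoming hyperedges; each hyperedge is (tail_nodes, score)
            self.incoming = defaultdict(list)

        def add_edge(self, head, tails, score):
            self.incoming[head].append((tuple(tails), score))

        def count_trees(self, node):
            """Number of distinct trees rooted at `node` (leaves count as 1)."""
            edges = self.incoming[node]
            if not edges:
                return 1
            total = 0
            for tails, _ in edges:
                prod = 1
                for t in tails:
                    prod *= self.count_trees(t)
                total += prod
            return total

        def viterbi(self, node):
            """Return (best score, best tree) rooted at `node`, summing hyperedge scores."""
            edges = self.incoming[node]
            if not edges:
                return 0.0, node
            best = None
            for tails, score in edges:
                subs = [self.viterbi(t) for t in tails]
                total = score + sum(s for s, _ in subs)
                tree = (node, [t for _, t in subs])
                if best is None or total > best[0]:
                    best = (total, tree)
            return best

    if __name__ == "__main__":
        f = Forest()
        # Two readings of a PP-attachment ambiguity share the same leaf nodes.
        f.add_edge("VP[1,5]", ["V[1,2]", "NP[2,3]", "PP[3,5]"], score=1.0)  # high attachment
        f.add_edge("NP[2,5]", ["NP[2,3]", "PP[3,5]"], score=0.5)
        f.add_edge("VP[1,5]", ["V[1,2]", "NP[2,5]"], score=0.8)            # low attachment
        print(f.count_trees("VP[1,5]"))  # -> 2 trees packed into one small hypergraph
        print(f.viterbi("VP[1,5]"))      # -> (1.3, ...): the low-attachment reading wins

In this toy example the two readings share their leaf nodes; that sharing is what lets a forest encoding millions of trees stay compact enough to translate faster than a short list of individual trees.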
wordName wordTfidf (topN-words)
[('decoding', 0.355), ('packed', 0.316), ('translation', 0.301), ('treebased', 0.202), ('faster', 0.191), ('forest', 0.182), ('huang', 0.173), ('forests', 0.153), ('quirk', 0.133), ('rule', 0.116), ('elegantly', 0.11), ('lhuang', 0.11), ('yliu', 0.11), ('alleviated', 0.11), ('troduce', 0.11), ('bining', 0.11), ('tree', 0.106), ('tutorial', 0.105), ('advantages', 0.103), ('overview', 0.102), ('trees', 0.102), ('mi', 0.098), ('translating', 0.097), ('realworld', 0.095), ('parse', 0.094), ('asscolcia', 0.09), ('cgeom', 0.09), ('jtuulytor', 0.09), ('witnessed', 0.085), ('binarization', 0.085), ('parsing', 0.082), ('pra', 0.082), ('sciences', 0.08), ('developments', 0.079), ('motivations', 0.079), ('commit', 0.079), ('compromise', 0.076), ('cubic', 0.076), ('ofthese', 0.076), ('liu', 0.075), ('sweden', 0.074), ('mistakes', 0.072), ('southern', 0.072), ('drawback', 0.072), ('uppsala', 0.072), ('grammar', 0.071), ('pointing', 0.07), ('extraction', 0.069), ('propagation', 0.066), ('string', 0.066), ('attractive', 0.065), ('af', 0.065), ('academy', 0.065), ('exponentially', 0.065), ('alleviate', 0.063), ('rapid', 0.062), ('ding', 0.062), ('counterparts', 0.062), ('galley', 0.059), ('sc', 0.059), ('surveys', 0.059), ('regarded', 0.058), ('talk', 0.057), ('polynomial', 0.057), ('millions', 0.056), ('institute', 0.056), ('synchronous', 0.055), ('substitution', 0.055), ('offers', 0.052), ('encodes', 0.052), ('exploits', 0.052), ('suffer', 0.052), ('surprisingly', 0.051), ('chiang', 0.051), ('situation', 0.049), ('converted', 0.049), ('pruning', 0.049), ('errors', 0.049), ('efforts', 0.048), ('learns', 0.047), ('palmer', 0.046), ('sophisticated', 0.046), ('ct', 0.046), ('extensions', 0.046), ('offer', 0.045), ('california', 0.044), ('fundamental', 0.044), ('helps', 0.043), ('guide', 0.043), ('simultaneously', 0.043), ('liang', 0.043), ('broad', 0.042), ('introduces', 0.042), ('translated', 0.042), ('past', 0.041), ('former', 0.041), ('fo', 0.04), ('yang', 0.039), ('popular', 0.039), ('conclude', 0.038)]
simIndex simValue paperId paperTitle
same-paper 1 0.99999994 243 acl-2010-Tree-Based and Forest-Based Translation
Author: Yang Liu ; Liang Huang
Abstract: unkown-abstract
2 0.28440502 69 acl-2010-Constituency to Dependency Translation with Forests
Author: Haitao Mi ; Qun Liu
Abstract: Tree-to-string systems (and their forest-based extensions) have gained steady popularity thanks to their simplicity and efficiency, but there is a major limitation: they are unable to guarantee the grammaticality of the output, which is explicitly modeled in string-to-tree systems via target-side syntax. We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model, which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality. Medium-scale experiments show an absolute and statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system even with fewer rules. This is also the first time that a tree-to-tree model can surpass tree-to-string counterparts.
3 0.26038891 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
Author: Xianchao Wu ; Takuya Matsuzaki ; Jun'ichi Tsujii
Abstract: Tree-to-string translation rules are widely used in linguistically syntax-based statistical machine translation systems. In this paper, we propose to use deep syntactic information for obtaining fine-grained translation rules. A head-driven phrase structure grammar (HPSG) parser is used to obtain the deep syntactic information, which includes a fine-grained description of the syntactic property and a semantic representation of a sentence. We extract fine-grained rules from aligned HPSG tree/forest-string pairs and use them in our tree-to-string and string-to-tree systems. Extensive experiments on large-scale bidirectional Japanese-English translations testified to the effectiveness of our approach.
4 0.21988545 71 acl-2010-Convolution Kernel over Packed Parse Forest
Author: Min Zhang ; Hui Zhang ; Haizhou Li
Abstract: This paper proposes a convolution forest kernel to effectively explore rich structured features embedded in a packed parse forest. As opposed to the convolution tree kernel, the proposed forest kernel does not have to commit to a single best parse tree, and is thus able to explore very large object spaces and much more structured features embedded in a forest. This makes the proposed kernel more robust against parsing errors and data sparseness issues than the convolution tree kernel. The paper presents the formal definition of the convolution forest kernel and also illustrates the algorithm to efficiently compute the proposed convolution forest kernel. Experimental results on two NLP applications, relation extraction and semantic role labeling, show that the proposed forest kernel significantly outperforms the baseline of the convolution tree kernel.
5 0.21571934 169 acl-2010-Learning to Translate with Source and Target Syntax
Author: David Chiang
Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a simple approach that uses both source and target syntax for significant improvements in translation accuracy.
6 0.20147851 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
7 0.15364327 54 acl-2010-Boosting-Based System Combination for Machine Translation
8 0.12914643 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
9 0.12820192 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future
10 0.12192748 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation
11 0.12014322 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
13 0.10848286 93 acl-2010-Dynamic Programming for Linear-Time Incremental Parsing
14 0.10623505 133 acl-2010-Hierarchical Search for Word Alignment
15 0.10163417 31 acl-2010-Annotation
16 0.10162408 260 acl-2010-Wide-Coverage NLP with Linguistically Expressive Grammars
17 0.098400764 240 acl-2010-Training Phrase Translation Models with Leaving-One-Out
18 0.097624779 77 acl-2010-Cross-Language Document Summarization Based on Machine Translation Quality Prediction
19 0.096080638 51 acl-2010-Bilingual Sense Similarity for Statistical Machine Translation
20 0.091739014 75 acl-2010-Correcting Errors in a Treebank Based on Synchronous Tree Substitution Grammar
topicId topicWeight
[(0, -0.208), (1, -0.243), (2, 0.05), (3, 0.008), (4, -0.128), (5, -0.074), (6, 0.126), (7, 0.022), (8, -0.258), (9, -0.009), (10, 0.124), (11, -0.058), (12, 0.143), (13, -0.02), (14, -0.016), (15, 0.125), (16, -0.067), (17, -0.009), (18, -0.039), (19, 0.009), (20, 0.091), (21, -0.04), (22, -0.192), (23, -0.088), (24, -0.156), (25, 0.014), (26, 0.02), (27, 0.043), (28, 0.051), (29, -0.071), (30, -0.07), (31, -0.116), (32, -0.074), (33, -0.045), (34, -0.077), (35, -0.17), (36, -0.033), (37, 0.082), (38, 0.085), (39, 0.047), (40, -0.026), (41, 0.009), (42, -0.009), (43, -0.075), (44, 0.008), (45, 0.005), (46, -0.058), (47, 0.017), (48, 0.001), (49, 0.015)]
simIndex simValue paperId paperTitle
same-paper 1 0.96689606 243 acl-2010-Tree-Based and Forest-Based Translation
Author: Yang Liu ; Liang Huang
Abstract: unkown-abstract
2 0.81202203 118 acl-2010-Fine-Grained Tree-to-String Translation Rule Extraction
Author: Xianchao Wu ; Takuya Matsuzaki ; Jun'ichi Tsujii
Abstract: Tree-to-string translation rules are widely used in linguistically syntax-based statistical machine translation systems. In this paper, we propose to use deep syntactic information for obtaining fine-grained translation rules. A head-driven phrase structure grammar (HPSG) parser is used to obtain the deep syntactic information, which includes a fine-grained description of the syntactic property and a semantic representation of a sentence. We extract fine-grained rules from aligned HPSG tree/forest-string pairs and use them in our tree-to-string and string-to-tree systems. Extensive experiments on large-scale bidirectional Japanese-English translations testified to the effectiveness of our approach.
3 0.73510814 69 acl-2010-Constituency to Dependency Translation with Forests
Author: Haitao Mi ; Qun Liu
Abstract: Tree-to-string systems (and their forest-based extensions) have gained steady popularity thanks to their simplicity and efficiency, but there is a major limitation: they are unable to guarantee the grammaticality of the output, which is explicitly modeled in string-to-tree systems via target-side syntax. We thus propose to combine the advantages of both, and present a novel constituency-to-dependency translation model, which uses constituency forests on the source side to direct the translation, and dependency trees on the target side (as a language model) to ensure grammaticality. Medium-scale experiments show an absolute and statistically significant improvement of +0.7 BLEU points over a state-of-the-art forest-based tree-to-string system even with fewer rules. This is also the first time that a tree-to-tree model can surpass tree-to-string counterparts.
4 0.70792365 265 acl-2010-cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models
Author: Chris Dyer ; Adam Lopez ; Juri Ganitkevitch ; Jonathan Weese ; Ferhan Ture ; Phil Blunsom ; Hendra Setiawan ; Vladimir Eidelman ; Philip Resnik
Abstract: We present cdec, an open source framework for decoding, aligning with, and training a number of statistical machine translation models, including word-based models, phrase-based models, and models based on synchronous context-free grammars. Using a single unified internal representation for translation forests, the decoder strictly separates model-specific translation logic from general rescoring, pruning, and inference algorithms. From this unified representation, the decoder can extract not only the 1- or k-best translations, but also alignments to a reference, or the quantities necessary to drive discriminative training using gradient-based or gradient-free optimization techniques. Its efficient C++ implementation means that memory use and runtime performance are significantly better than comparable decoders.
5 0.5485068 169 acl-2010-Learning to Translate with Source and Target Syntax
Author: David Chiang
Abstract: Statistical translation models that try to capture the recursive structure of language have been widely adopted over the last few years. These models make use of varying amounts of information from linguistic theory: some use none at all, some use information about the grammar of the target language, some use information about the grammar of the source language. But progress has been slower on translation models that are able to learn the relationship between the grammars of both the source and target language. We discuss the reasons why this has been a challenge, review existing attempts to meet this challenge, and show how some old and new ideas can be combined into a simple approach that uses both source and target syntax for significant improvements in translation accuracy.
6 0.52836406 71 acl-2010-Convolution Kernel over Packed Parse Forest
7 0.49998057 9 acl-2010-A Joint Rule Selection Model for Hierarchical Phrase-Based Translation
8 0.48384684 115 acl-2010-Filtering Syntactic Constraints for Statistical Machine Translation
9 0.47726303 54 acl-2010-Boosting-Based System Combination for Machine Translation
10 0.47050813 48 acl-2010-Better Filtration and Augmentation for Hierarchical Phrase-Based Translation Rules
11 0.4411158 211 acl-2010-Simple, Accurate Parsing with an All-Fragments Grammar
12 0.40784261 235 acl-2010-Tools for Multilingual Grammar-Based Translation on the Web
13 0.40729949 260 acl-2010-Wide-Coverage NLP with Linguistically Expressive Grammars
15 0.39651409 67 acl-2010-Computing Weakest Readings
16 0.39245802 98 acl-2010-Efficient Staggered Decoding for Sequence Labeling
17 0.39155838 249 acl-2010-Unsupervised Search for the Optimal Segmentation for Statistical Machine Translation
18 0.38501504 97 acl-2010-Efficient Path Counting Transducers for Minimum Bayes-Risk Decoding of Statistical Machine Translation Lattices
19 0.3610667 21 acl-2010-A Tree Transducer Model for Synchronous Tree-Adjoining Grammars
20 0.35021076 46 acl-2010-Bayesian Synchronous Tree-Substitution Grammar Induction and Its Application to Sentence Compression
topicId topicWeight
[(25, 0.13), (42, 0.011), (44, 0.473), (59, 0.111), (73, 0.023), (78, 0.029), (83, 0.065), (98, 0.067)]
simIndex simValue paperId paperTitle
same-paper 1 0.8827495 243 acl-2010-Tree-Based and Forest-Based Translation
Author: Yang Liu ; Liang Huang
Abstract: unkown-abstract
2 0.74701309 165 acl-2010-Learning Script Knowledge with Web Experiments
Author: Michaela Regneri ; Alexander Koller ; Manfred Pinkal
Abstract: We describe a novel approach to unsupervised learning of the events that make up a script, along with constraints on their temporal ordering. We collect natural-language descriptions of script-specific event sequences from volunteers over the Internet. Then we compute a graph representation of the script’s temporal structure using a multiple sequence alignment algorithm. The evaluation of our system shows that we outperform two informed baselines.
3 0.70705408 210 acl-2010-Sentiment Translation through Lexicon Induction
Author: Christian Scheible
Abstract: The translation of sentiment information is a task from which sentiment analysis systems can benefit. We present a novel, graph-based approach using SimRank, a well-established vertex similarity algorithm to transfer sentiment information between a source language and a target language graph. We evaluate this method in comparison with SO-PMI.
4 0.46110272 86 acl-2010-Discourse Structure: Theory, Practice and Use
Author: Bonnie Webber ; Markus Egg ; Valia Kordoni
Abstract: unkown-abstract
5 0.42735898 206 acl-2010-Semantic Parsing: The Task, the State of the Art and the Future
Author: Rohit J. Kate ; Yuk Wah Wong
Abstract: unkown-abstract
6 0.40800485 37 acl-2010-Automatic Evaluation Method for Machine Translation Using Noun-Phrase Chunking
7 0.39712381 247 acl-2010-Unsupervised Event Coreference Resolution with Rich Linguistic Features
8 0.39644843 31 acl-2010-Annotation
9 0.39328057 69 acl-2010-Constituency to Dependency Translation with Forests
10 0.38748339 260 acl-2010-Wide-Coverage NLP with Linguistically Expressive Grammars
11 0.38272268 71 acl-2010-Convolution Kernel over Packed Parse Forest
12 0.38058174 237 acl-2010-Topic Models for Word Sense Disambiguation and Token-Based Idiom Detection
13 0.37915736 225 acl-2010-Temporal Information Processing of a New Language: Fast Porting with Minimal Resources
14 0.37405825 218 acl-2010-Structural Semantic Relatedness: A Knowledge-Based Method to Named Entity Disambiguation
15 0.37352899 106 acl-2010-Event-Based Hyperspace Analogue to Language for Query Expansion
16 0.3727026 128 acl-2010-Grammar Prototyping and Testing with the LinGO Grammar Matrix Customization System
17 0.37260208 120 acl-2010-Fully Unsupervised Core-Adjunct Argument Classification
18 0.3722398 44 acl-2010-BabelNet: Building a Very Large Multilingual Semantic Network
19 0.3692334 224 acl-2010-Talking NPCs in a Virtual Game World
20 0.36919156 23 acl-2010-Accurate Context-Free Parsing with Combinatory Categorial Grammar